OSF Preprints Scraper

Pricing

from $3.00 / 1,000 results

Developer

Crawler Bros

Maintained by Community

Scrape preprints from the Open Science Framework (OSF) using its public REST API — no authentication or proxy required.

What It Does

This actor extracts preprint metadata from OSF's preprint archive, which hosts over 190,000 open-access scholarly works across disciplines including psychology, medicine, social sciences, engineering, and more. It supports filtering by tags, subjects, and provider, as well as direct ID-based lookup.

Key Features

  • No authentication required — uses the public OSF API
  • Two modes: search/browse preprints or fetch specific ones by ID
  • Filter by tags, subjects, or provider (e.g., PsyArXiv, SocArXiv, MedArXiv)
  • Pagination handled automatically — retrieves up to 1,000 records per run
  • Clean structured output with camelCase field names
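The automatic pagination listed above can be pictured with a small loop. This is a minimal sketch, not the actor's actual code. It assumes each API page is a JSON:API style object with a `data` list and a `links.next` URL, and it takes a hypothetical `fetch_page` callable so the example runs offline against fake pages.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Assumed page shape: {"data": [...], "links": {"next": <url or None>}}
Page = Dict[str, Any]


def iter_records(
    first_url: str,
    fetch_page: Callable[[str], Page],
    max_items: int = 50,
) -> Iterator[dict]:
    """Yield records page by page until max_items is reached or pages run out."""
    url: Optional[str] = first_url
    yielded = 0
    while url and yielded < max_items:
        page = fetch_page(url)
        for record in page.get("data", []):
            if yielded >= max_items:
                return
            yield record
            yielded += 1
        # Follow the "next" link; a missing or null link ends the loop.
        url = (page.get("links") or {}).get("next")


# Offline stand-in for an HTTP call: two fake pages keyed by a fake URL.
FAKE_PAGES = {
    "page-1": {"data": [{"id": "a"}, {"id": "b"}], "links": {"next": "page-2"}},
    "page-2": {"data": [{"id": "c"}], "links": {"next": None}},
}

ids = [r["id"] for r in iter_records("page-1", FAKE_PAGES.__getitem__, max_items=3)]
```

In a real client, `fetch_page` would wrap an HTTP GET that returns the decoded JSON body. The cap plays the role of the actor's 1,000-record limit per run.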

Input Fields

| Field | Type | Description |
|---|---|---|
| mode | Select | searchPreprints (default) or getById |
| searchQuery | String | Filter preprints by tag (e.g. machine learning) |
| subjectFilter | String | Filter by subject text (e.g. Medicine and Health Sciences) |
| provider | String | Filter by provider (e.g. psyarxiv, socarxiv, osf) |
| preprintIds | Array | List of OSF preprint IDs (for getById mode) |
| maxItems | Integer | Max number of results (1–1000, default 50) |
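As an illustration of how these fields fit together, the sketch below assembles and checks a run input before it is submitted. The field names and the 1–1000 range come from the table above. The helper `build_run_input` is hypothetical, and the rule that getById needs at least one ID is an assumption, not documented actor behaviour.

```python
from typing import Any, Dict, List, Optional

VALID_MODES = {"searchPreprints", "getById"}


def build_run_input(
    mode: str = "searchPreprints",
    search_query: Optional[str] = None,
    subject_filter: Optional[str] = None,
    provider: Optional[str] = None,
    preprint_ids: Optional[List[str]] = None,
    max_items: int = 50,
) -> Dict[str, Any]:
    """Return a run-input dict using the actor's camelCase field names."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if not 1 <= max_items <= 1000:
        raise ValueError("maxItems must be between 1 and 1000")
    # Assumption: getById is only meaningful with at least one ID.
    if mode == "getById" and not preprint_ids:
        raise ValueError("getById mode needs at least one preprint ID")

    run_input: Dict[str, Any] = {"mode": mode, "maxItems": max_items}
    optional = {
        "searchQuery": search_query,
        "subjectFilter": subject_filter,
        "provider": provider,
        "preprintIds": preprint_ids,
    }
    # Leave unset fields out so the actor's own defaults apply.
    run_input.update({k: v for k, v in optional.items() if v is not None})
    return run_input


example_input = build_run_input(
    search_query="machine learning", provider="psyarxiv", max_items=100
)
```

The resulting dict can be pasted into the actor's input form as JSON or passed as the run input through an API client.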

Provider Examples

Popular OSF preprint providers you can filter by:

| Provider ID | Description |
|---|---|
| osf | General OSF preprints |
| psyarxiv | Psychology |
| socarxiv | Social sciences |
| medarxiv | Medicine |
| eartharxiv | Earth sciences |
| engrxiv | Engineering |
| biorxiv | Biology |
| ecsarxiv | Electrochemical Society |

Output Fields

Each item in the dataset contains:

| Field | Type | Description |
|---|---|---|
| preprintId | String | Unique OSF preprint ID (e.g. abc12_v2) |
| title | String | Title of the preprint |
| description | String | Abstract or summary |
| doi | String | Digital Object Identifier |
| datePublished | String | Publication date (ISO 8601) |
| dateCreated | String | Creation date (ISO 8601) |
| dateModified | String | Last modified date (ISO 8601) |
| tags | Array | Author-assigned tags |
| isPublished | Boolean | Whether the preprint is publicly published |
| provider | String | Provider ID (e.g. psyarxiv) |
| subjects | Array | Subject classifications |
| license | String | License name (e.g. CC-By Attribution 4.0) |
| sourceUrl | String | Direct URL to the preprint on OSF |
| recordType | String | Always "preprint" |
| scrapedAt | String | Timestamp when the record was scraped |

Example Output

{
  "preprintId": "snveb_v2",
  "title": "Beyond the Resume: Comparing the Predictive Power of Personality Assessments",
  "description": "This study examines employee turnover prediction using machine learning...",
  "doi": "10.31234/osf.io/snveb_v2",
  "datePublished": "2026-05-26T13:58:36.783000Z",
  "dateCreated": "2026-05-25T09:31:34.214181Z",
  "dateModified": "2026-05-26T13:58:36.814700Z",
  "tags": ["Machine learning", "Employee turnover", "Explainable AI"],
  "isPublished": true,
  "provider": "psyarxiv",
  "subjects": ["Industrial and Organizational Psychology", "Quantitative Methods"],
  "sourceUrl": "https://osf.io/preprints/psyarxiv/snveb_v2/",
  "recordType": "preprint",
  "scrapedAt": "2026-05-30T10:00:00.000000+00:00"
}
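A record like this can be consumed with the standard library alone. The sketch below parses a trimmed copy of the example, converts the ISO 8601 strings to timezone-aware datetimes, and builds a resolver link from the DOI. The helper `parse_timestamp` is illustrative, and the `https://doi.org/` prefix is the general DOI resolver rather than something the actor emits.

```python
import json
from datetime import datetime

# Trimmed copy of the example record above.
RAW = """
{
  "preprintId": "snveb_v2",
  "doi": "10.31234/osf.io/snveb_v2",
  "datePublished": "2026-05-26T13:58:36.783000Z",
  "scrapedAt": "2026-05-30T10:00:00.000000+00:00"
}
"""


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp into an aware datetime.

    A trailing "Z" is rewritten as "+00:00" so the call also works on
    Python versions older than 3.11.
    """
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


record = json.loads(RAW)
published = parse_timestamp(record["datePublished"])
scraped = parse_timestamp(record["scrapedAt"])

# Whole days between publication and scraping.
lag_days = (scraped - published).days
doi_url = "https://doi.org/" + record["doi"]
```

Note that the example mixes two UTC spellings (`Z` and `+00:00`), so normalizing both through one helper keeps comparisons consistent.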

Use Cases

  • Academic research: Track preprints in specific fields
  • Literature reviews: Collect papers by subject or tag for systematic reviews
  • Trend analysis: Monitor publication rates by subject over time
  • Citation tracking: Gather DOIs for downstream citation analysis
  • Content aggregation: Build databases of open-access scholarly works
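The trend-analysis use case can be sketched as a simple aggregation over the dataset. The function below counts records per publication month for one subject, relying only on the `subjects` and `datePublished` output fields. The sample records are made up for illustration and are not real OSF data.

```python
from collections import Counter
from typing import Dict, Iterable


def monthly_counts(records: Iterable[dict], subject: str) -> Dict[str, int]:
    """Count records per YYYY-MM of datePublished for a single subject."""
    counts = Counter(
        r["datePublished"][:7]  # "2026-05-26T..." -> "2026-05"
        for r in records
        if r.get("datePublished") and subject in r.get("subjects", [])
    )
    return dict(sorted(counts.items()))


# Hypothetical records shaped like the actor's output.
sample = [
    {"datePublished": "2026-04-02T08:00:00.000000Z", "subjects": ["Quantitative Methods"]},
    {"datePublished": "2026-05-11T09:30:00.000000Z", "subjects": ["Quantitative Methods"]},
    {"datePublished": "2026-05-26T13:58:36.783000Z", "subjects": ["Quantitative Methods"]},
    {"datePublished": "2026-05-27T10:00:00.000000Z", "subjects": ["Social Psychology"]},
    {"datePublished": None, "subjects": ["Quantitative Methods"]},
]

trend = monthly_counts(sample, "Quantitative Methods")
```

Slicing the first seven characters works because ISO 8601 timestamps start with the year and month; records without a publication date are skipped.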

FAQs

Q: Does this require an API key? A: No. The OSF public API is freely accessible without authentication.

Q: How many results can I get? A: Up to 1,000 per run. OSF has 190,000+ preprints total.

Q: Can I filter by date? A: Not directly via this actor's inputs. You can filter by tag and subject, then sort results by datePublished in post-processing.
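The post-processing step mentioned in this answer might look like the following. It is a minimal sketch that keeps records whose `datePublished` falls inside a half-open date window and returns them newest first. The helper names and sample records are illustrative only.

```python
from datetime import datetime, timezone
from typing import Iterable, List


def _parse(value: str) -> datetime:
    """Parse ISO 8601, accepting a trailing "Z" on older Python versions."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def published_between(
    records: Iterable[dict], start: datetime, end: datetime
) -> List[dict]:
    """Keep records with start <= datePublished < end, newest first."""
    kept = [
        r
        for r in records
        if r.get("datePublished") and start <= _parse(r["datePublished"]) < end
    ]
    return sorted(kept, key=lambda r: _parse(r["datePublished"]), reverse=True)


# Hypothetical records shaped like the actor's output.
sample = [
    {"preprintId": "aaaaa_v1", "datePublished": "2026-04-30T23:59:59.000000Z"},
    {"preprintId": "bbbbb_v1", "datePublished": "2026-05-01T00:00:00.000000Z"},
    {"preprintId": "ccccc_v1", "datePublished": "2026-05-26T13:58:36.783000Z"},
    {"preprintId": "ddddd_v1", "datePublished": None},
]

may_2026 = published_between(
    sample,
    start=datetime(2026, 5, 1, tzinfo=timezone.utc),
    end=datetime(2026, 6, 1, tzinfo=timezone.utc),
)
may_ids = [r["preprintId"] for r in may_2026]
```

The window bounds must be timezone-aware, since comparing them with the parsed UTC timestamps would otherwise raise a `TypeError`.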

Q: What's the difference between providers? A: Different academic communities host preprint servers on OSF (e.g., PsyArXiv for psychology). Using the provider filter restricts results to that community.

Q: Are all preprints peer-reviewed? A: No. Preprints are typically posted before peer review. The isPublished field indicates that the preprint is publicly published on the OSF server, not that it has passed journal peer review.

Q: How current is the data? A: The OSF API returns live data. New preprints appear within hours of submission.