Pricing

Pay per usage

OSF Preprint Scraper

Scrapes preprints from the Open Science Framework API by keyword search.

Developer

Donny

Last modified: 12 hours ago

OSF Preprint Search Scraper

What it does

Scrapes preprints from the Open Science Framework API by keyword search.

This actor connects to the public OSF API, fetches the preprint records that match your search query, and stores them in a normalized dataset on the Apify platform. It handles pagination automatically, so you can collect up to maxResults records without managing page offsets yourself. Built-in error handling, request timeouts, and input validation help keep runs reliable.
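The pagination behaviour described above can be sketched as follows. This is a minimal illustration, not the actor's actual source: it assumes the API returns JSON:API-style pages with a "data" list and a "links.next" URL, and the names collect_results and fetch_json are hypothetical.

```python
import json
import urllib.request
from typing import Callable, Optional


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Fetch one page and decode it as JSON, with a request timeout."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def collect_results(
    fetch_page: Callable[[str], dict],
    first_url: str,
    max_results: int,
) -> list:
    """Follow next-page links until max_results items are collected."""
    results: list = []
    url: Optional[str] = first_url
    while url and len(results) < max_results:
        page = fetch_page(url)
        for item in page.get("data", []):
            results.append(item)
            if len(results) >= max_results:
                break
        # Assumption: the next page URL is exposed under links.next
        # and is null/absent on the last page.
        url = (page.get("links") or {}).get("next")
    return results
```

Passing the fetcher in as an argument keeps the loop testable without network access; in a real run, fetch_json would be supplied as fetch_page.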

Why use this actor

Manually querying APIs and handling pagination, rate limits, and data normalization is tedious and error-prone. This actor automates the entire process. Simply provide your search parameters, set the maximum number of results you want, and let the actor handle the rest. The data is stored in a structured dataset that you can export as JSON, CSV, or Excel. You can integrate this actor into larger workflows using the Apify API, schedule it for recurring data collection, or trigger it from your own applications via webhooks.
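As a rough sketch of the API integration mentioned above, the helper below starts a run and reads back its dataset. It assumes the official apify-client Python package; the helper name run_search is hypothetical, and the actor ID and API token are placeholders you would replace with your own values.

```python
def run_search(client, actor_id: str, search_query: str, max_results: int = 100) -> list:
    """Start the actor, wait for it to finish, and return the dataset items."""
    run_input = {"searchQuery": search_query, "maxResults": max_results}
    # call() starts the run and blocks until it finishes.
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("The actor run did not return any run details")
    # Each run writes its results to a default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Example usage (requires `pip install apify-client` and a valid token):
# from apify_client import ApifyClient
# client = ApifyClient("<YOUR_API_TOKEN>")
# items = run_search(client, "<username>/<actor-name>", "open science", 50)
```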

Input parameters

searchQuery (string, required): The search term to query. Default: "test".
maxResults (integer, optional): Maximum number of results to return. Default: 100. Range: 1-1000.

All inputs are validated at startup with sensible defaults applied when values are missing. The actor will log warnings for any misconfigured options and continue with safe defaults rather than failing outright.
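The validation behaviour above might look roughly like this. It is a sketch under stated assumptions, not the actor's real code: the function name validate_input is hypothetical, and clamping out-of-range values (rather than resetting them to the default) is an assumed interpretation of "safe defaults".

```python
import logging

logger = logging.getLogger("osf_input")

DEFAULT_QUERY = "test"
DEFAULT_MAX_RESULTS = 100
MIN_RESULTS, MAX_RESULTS = 1, 1000


def validate_input(raw: dict) -> dict:
    """Return a cleaned input dict, warning instead of failing on bad values."""
    query = raw.get("searchQuery")
    if not isinstance(query, str) or not query.strip():
        logger.warning("searchQuery missing or empty; using %r", DEFAULT_QUERY)
        query = DEFAULT_QUERY

    max_results = raw.get("maxResults", DEFAULT_MAX_RESULTS)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_results, bool) or not isinstance(max_results, int):
        logger.warning("maxResults is not an integer; using %d", DEFAULT_MAX_RESULTS)
        max_results = DEFAULT_MAX_RESULTS
    elif not MIN_RESULTS <= max_results <= MAX_RESULTS:
        # Assumption: out-of-range values are clamped to the documented range.
        clamped = min(max(max_results, MIN_RESULTS), MAX_RESULTS)
        logger.warning("maxResults %d out of range; using %d", max_results, clamped)
        max_results = clamped

    return {"searchQuery": query.strip(), "maxResults": max_results}
```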

Output data

Each result in the dataset contains the following fields:

preprintId: The identifier of the preprint
title: The title of the preprint
description: The description (abstract) of the preprint
dateCreated: The date the preprint was created, as an ISO 8601 timestamp
dateModified: The date the preprint was last modified, as an ISO 8601 timestamp
url: The URL of the preprint
provider: The preprint provider that hosts the preprint

All string fields are null-checked for consistency: a missing or undefined value is stored as null, never as an empty string.
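One way to picture the null handling is the sketch below. The function name normalize_record is hypothetical, and treating blank strings as missing is an assumption; the field names are the ones listed above.

```python
OUTPUT_FIELDS = (
    "preprintId",
    "title",
    "description",
    "dateCreated",
    "dateModified",
    "url",
    "provider",
)


def normalize_record(raw: dict) -> dict:
    """Return a record with every output field present, using None for gaps."""
    record = {}
    for field in OUTPUT_FIELDS:
        value = raw.get(field)  # missing keys become None
        if isinstance(value, str):
            # Assumption: whitespace-only strings count as missing.
            value = value.strip() or None
        record[field] = value
    return record
```

Because every record carries the same seven keys, exports to CSV or Excel get a stable set of columns even when a preprint lacks some metadata.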

Example output

{
    "preprintId": "12345",
    "title": "Example Title",
    "description": "Example Description",
    "dateCreated": "2025-01-15T00:00:00Z",
    "dateModified": "2025-01-15T00:00:00Z",
    "url": "https://example.com/item",
    "provider": "12345"
}

Pricing

This actor is available on the Apify platform with usage-based pricing. Each run costs approximately $0.005 to start, plus roughly $0.01 per result collected. Actual costs depend on the number of results, API response times, and memory allocation. You can control costs by setting the maxResults parameter to limit the number of results collected per run. For high-volume use cases, consider running the actor on a schedule during off-peak hours to optimize platform resource usage.
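For budgeting, the approximate figures above can be turned into a quick estimate. This is only a back-of-the-envelope sketch: the function name estimate_run_cost is hypothetical, and it ignores the other cost factors mentioned above (API response times and memory allocation).

```python
STARTUP_COST_USD = 0.005     # approximate cost per actor start
COST_PER_RESULT_USD = 0.01   # approximate cost per collected result


def estimate_run_cost(results: int, runs: int = 1) -> float:
    """Estimate the total cost in USD for `runs` runs of `results` results each."""
    return round(runs * (STARTUP_COST_USD + results * COST_PER_RESULT_USD), 4)
```

Under these figures, a single run at the default maxResults of 100 comes to about $1.005 (0.005 + 100 × 0.01), and a run at the maximum of 1000 results to about $10.005.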

More scrapers from brave_paradise

Check out other data collection actors by brave_paradise on the Apify Store. We offer a wide range of specialized data scrapers and automation tools covering research databases, package registries, job boards, and many more public data sources. Each actor is designed with the same high-quality standards: robust error handling, automatic pagination, clean structured output, and transparent pricing.

Visit the brave_paradise profile on Apify to explore the full collection.
