Arxiv Paper Search
Pricing
Pay per usage
Search and discover research papers on ArXiv with structured output. Fast, reliable, and cost-effective.
What it does
ArXiv Paper Search lets you programmatically search ArXiv, the world's largest open-access preprint repository, and extract research paper metadata. The actor queries the ArXiv API to retrieve titles, author lists, abstracts, publication dates, subject categories, PDF links, and DOIs. You can search by topic keywords, filter by author name, and control how results are sorted. This makes it easy to build datasets of academic research papers for analysis, monitoring, or integration into other workflows.
Why use it
Keeping up with the rapidly growing volume of academic research is a significant challenge for researchers, data scientists, and technology professionals. This actor automates the discovery process by letting you programmatically query ArXiv and receive structured, machine-readable results. Whether you need to track new publications in a specific field, monitor a particular author's output, or build a comprehensive bibliography for a literature review, this actor eliminates the manual effort of browsing and copying data from ArXiv's web interface.
How it works
- The actor builds a search query from the provided keywords and optional author name filter.
- It sends the query to the ArXiv API at `https://export.arxiv.org/api/query` with the specified sort order and result limit.
- The XML response from ArXiv is parsed to extract individual paper entries.
- For each paper, it extracts the title, authors, abstract, publication date, subject categories, PDF URL, and DOI.
- All extracted data is structured and pushed to the Apify dataset for download or further processing.
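
The steps above can be sketched in plain Python using only the standard library. This is a minimal illustration, not the actor's actual source: the function names (`build_query_url`, `parse_feed`, `search`), the use of the `all:` and `au:` field prefixes, and the fixed descending sort order are all assumptions. The Apify dataset push is omitted.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

API_URL = "https://export.arxiv.org/api/query"
# XML namespaces used by the ArXiv Atom feed.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}


def build_query_url(search_query, author_name="", max_results=25, sort_by="relevance"):
    """Compose an API URL. The all:/au: prefixes are an assumed query style."""
    terms = f"all:{search_query}"
    if author_name:
        terms += f' AND au:"{author_name}"'
    params = {
        "search_query": terms,
        "start": 0,
        "max_results": max_results,
        "sortBy": sort_by,
        "sortOrder": "descending",  # assumption: newest or most relevant first
    }
    return f"{API_URL}?{urllib.parse.urlencode(params)}"


def _text(entry, path):
    """Return whitespace-normalized text of a child element, or ''."""
    node = entry.find(path, NS)
    return " ".join(node.text.split()) if node is not None and node.text else ""


def parse_feed(xml_data):
    """Turn an Atom feed (str or bytes) into records shaped like the output fields."""
    root = ET.fromstring(xml_data)
    papers = []
    for entry in root.findall("atom:entry", NS):
        pdf_url = ""
        for link in entry.findall("atom:link", NS):
            if link.get("title") == "pdf":
                pdf_url = link.get("href", "")
        names = entry.findall("atom:author/atom:name", NS)
        papers.append({
            "title": _text(entry, "atom:title"),
            "authors": ", ".join(" ".join(n.text.split()) for n in names if n.text),
            "abstract": _text(entry, "atom:summary"),
            "publishedDate": _text(entry, "atom:published"),
            "categories": [c.get("term") for c in entry.findall("atom:category", NS)],
            "pdfUrl": pdf_url,
            "doi": _text(entry, "arxiv:doi"),
        })
    return papers


def search(search_query, **kwargs):
    """Fetch and parse one page of results (performs a network request)."""
    url = build_query_url(search_query, **kwargs)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_feed(resp.read())
```

Calling `search("large language models", max_results=5, sort_by="submittedDate")` would return a list of dictionaries with the same keys as the output fields listed further down.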
Input parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| searchQuery | String | large language models | Search term for finding papers |
| authorName | String | (empty) | Optional filter by author name |
| maxResults | Integer | 25 | Maximum number of papers to return (1-100) |
| sortBy | String | relevance | Sort order: relevance, lastUpdatedDate, or submittedDate |
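
To make the defaults and limits in the table concrete, here is a small sketch that fills in missing parameters and checks the documented constraints before a run. The `normalize_input` helper is hypothetical, and whether the actor itself rejects or clamps out-of-range values is not stated, so raising `ValueError` here is an assumption.

```python
# Defaults and allowed values copied from the input parameter table.
DEFAULTS = {
    "searchQuery": "large language models",
    "authorName": "",
    "maxResults": 25,
    "sortBy": "relevance",
}
SORT_OPTIONS = {"relevance", "lastUpdatedDate", "submittedDate"}


def normalize_input(raw):
    """Merge user input over the defaults and validate it (hypothetical helper)."""
    unknown = set(raw) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    merged = {**DEFAULTS, **raw}
    max_results = merged["maxResults"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(max_results, bool) or not isinstance(max_results, int):
        raise ValueError("maxResults must be an integer")
    if not 1 <= max_results <= 100:
        raise ValueError("maxResults must be between 1 and 100")
    if merged["sortBy"] not in SORT_OPTIONS:
        raise ValueError(f"sortBy must be one of {sorted(SORT_OPTIONS)}")
    return merged


run_input = normalize_input({"searchQuery": "cs.CL", "sortBy": "submittedDate"})
```

The resulting `run_input` dictionary has all four keys set and can be serialized to JSON as the actor input.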
Output fields
| Field | Type | Description |
|---|---|---|
| title | String | Paper title |
| authors | String | Comma-separated list of authors |
| abstract | String | Paper abstract |
| publishedDate | String | Publication date |
| categories | Array | ArXiv subject categories |
| pdfUrl | String | Direct link to PDF download |
| doi | String | DOI identifier if available |
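
As a rough illustration of working with records in this shape, the sketch below splits the comma-separated `authors` string into a list and groups paper titles by category. The sample records and both helper names are made up for the example. Splitting on commas is a simplification that would mis-handle any author name that itself contains a comma.

```python
from collections import defaultdict

# Hypothetical records following the output field table.
records = [
    {"title": "Paper A", "authors": "Ada Lovelace, Alan Turing",
     "categories": ["cs.AI", "cs.CL"], "doi": ""},
    {"title": "Paper B", "authors": "Grace Hopper",
     "categories": ["cs.CL"], "doi": "10.0000/example"},
]


def split_authors(record):
    """Turn the comma-separated authors string into a list of names."""
    return [name.strip() for name in record["authors"].split(",") if name.strip()]


def group_by_category(items):
    """Map each ArXiv category code to the titles of papers tagged with it."""
    groups = defaultdict(list)
    for item in items:
        for category in item["categories"]:
            groups[category].append(item["title"])
    return dict(groups)


by_category = group_by_category(records)
with_doi = [r["title"] for r in records if r["doi"]]
```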
Cost estimate
This actor uses only API calls with no browser rendering, making it extremely cost-efficient. A typical run costs under $0.001 in platform credits. The default 512 MB memory is sufficient for all queries.
Tips
- Use ArXiv category codes in your search (e.g., "cs.AI" for artificial intelligence, "cs.CL" for computational linguistics) for precise filtering.
- Set `sortBy` to `submittedDate` to find the most recent papers in your field.
- Schedule daily runs to stay updated on new publications in your area of research.
- Combine with Hugging Face Model Scraper to match papers with their corresponding model implementations.
- Also check out OpenAI Status Monitor for monitoring AI service availability.
