ArXiv Preprint Paper Search
Pricing
from $2.00 / 1,000 papers fetched
Search and extract preprint research papers from the ArXiv open-access repository. Query over 2.4 million academic papers across physics, mathematics, computer science, biology, economics, and more with structured JSON output, no API key required.
recordType
Optional
Record discriminator (stable, additive): paper | research-brief | research-map | corpus-assessment | topic-health | role-coverage | landscape-summary | dashboard | paper-delta | no-results | error
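Because recordType is a discriminator, a consumer can split a mixed result set by type before processing it. The following is a minimal sketch, assuming the results have already been loaded as a list of dicts; the helper name group_by_record_type and the sample records are hypothetical and not part of the actor.

```python
from collections import defaultdict


def group_by_record_type(records):
    """Group output records by their recordType discriminator."""
    groups = defaultdict(list)
    for record in records:
        # Every field is optional, so fall back to a placeholder key.
        groups[record.get("recordType", "unknown")].append(record)
    return dict(groups)


# Hypothetical sample records, for illustration only.
sample = [
    {"recordType": "paper", "arxivId": "2401.02385v2"},
    {"recordType": "role-coverage", "role": "survey", "status": "missing"},
    {"recordType": "paper", "arxivId": "0000.00000v1"},
]
groups = group_by_record_type(sample)
papers = groups.get("paper", [])
```

Since the discriminator is described as additive, treating unrecognized values as just another group (rather than raising) seems the safer default.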
schemaVersion
Optional
Output schema version (semver)
arxivId
Optional
ArXiv paper identifier with version suffix (e.g. 2401.02385v2)
title
Optional
Paper title
abstract
Optional
Paper abstract/summary
published
Optional
Original submission date (ISO 8601)
updated
Optional
Most recent revision date (ISO 8601)
primaryCategory
Optional
Primary ArXiv category (e.g. cs.AI)
categories
Optional
All ArXiv categories assigned to the paper
pdfUrl
Optional
Direct URL to the PDF
absUrl
Optional
URL to the abstract page
doi
Optional
Digital Object Identifier (an arXiv-minted 10.48550/arXiv DOI does not indicate external publication)
journalRef
Optional
Journal reference — present when the preprint was published in a venue
comment
Optional
Author-provided comment (venue acceptance, page count, code links)
extractedAt
Optional
ISO timestamp when the record was extracted
versionCount
Optional
Number of arXiv versions posted (from the vN suffix)
publicationStatus
Optional
Stable enum: published (journal_ref or external DOI) | accepted (venue acceptance in the comment) | preprint
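The rule described for publicationStatus can be sketched roughly as follows. This is an illustration only, not the actor's implementation: the function name and the keyword pattern used to spot venue acceptance in the comment are assumptions.

```python
import re

# arXiv-minted DOIs share this prefix and do not count as external publication.
ARXIV_DOI_PREFIX = "10.48550/arxiv"

# Assumption: a simple keyword match stands in for the real acceptance detection.
ACCEPTED_PATTERN = re.compile(r"\b(accepted|to appear)\b", re.IGNORECASE)


def classify_publication_status(record):
    """Return 'published', 'accepted', or 'preprint' for a paper record."""
    doi = (record.get("doi") or "").lower()
    has_external_doi = bool(doi) and not doi.startswith(ARXIV_DOI_PREFIX)
    if record.get("journalRef") or has_external_doi:
        return "published"
    if ACCEPTED_PATTERN.search(record.get("comment") or ""):
        return "accepted"
    return "preprint"
```

For example, a record whose only DOI is 10.48550/arXiv.2401.02385 would be classified as preprint under this sketch, while one carrying a journalRef would be classified as published.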
peerReviewStatus
Optional
Stable enum: published | accepted | preprint-only
venue
Optional
Detected publication or acceptance venue, when present
hasCode
Optional
A code/repository link was found in the comment or abstract
codeUrl
Optional
First detected code repository URL, if any
withdrawn
Optional
Author comment marks the paper as withdrawn
revisionActivity
Optional
Stable enum: single-version | revised | heavily-revised
recencyDays
Optional
Days since the paper was first posted
freshness
Optional
Stable enum: cutting-edge (<90d) | recent (<1y) | established (<3y) | older
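The freshness buckets follow directly from recencyDays. A minimal sketch of the thresholds listed above, assuming a year is counted as 365 days (the exact day count used by the actor is not stated); the function name is hypothetical.

```python
def classify_freshness(recency_days):
    """Map days since first posting to a freshness bucket."""
    if recency_days < 90:
        return "cutting-edge"
    if recency_days < 365:       # assumption: 1 year = 365 days
        return "recent"
    if recency_days < 3 * 365:   # assumption: 3 years = 1,095 days
        return "established"
    return "older"
```

Under these assumptions, a paper first posted 89 days ago is cutting-edge, and one posted 90 days ago is recent.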
crossListed
Optional
Paper carries more than one ArXiv category
interdisciplinary
Optional
Categories span more than one top-level archive (e.g. cs + math)
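Both crossListed and interdisciplinary can be derived from the categories list. The sketch below assumes the top-level archive is the part of the category code before the first dot (so cs.AI belongs to cs); the helper names are hypothetical.

```python
def top_level_archive(category):
    """Return the archive part of a category code, e.g. 'cs.AI' -> 'cs'."""
    # Assumption: codes without a dot (e.g. 'hep-th') are their own archive.
    return category.split(".", 1)[0]


def cross_listing_flags(categories):
    """Derive the crossListed and interdisciplinary flags from category codes."""
    archives = {top_level_archive(c) for c in categories}
    return {
        "crossListed": len(categories) > 1,
        "interdisciplinary": len(archives) > 1,
    }
```

A paper listed in cs.AI and cs.LG would be cross-listed but not interdisciplinary, whereas cs.AI plus math.OC would set both flags.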
relevanceScore
Optional
0-100 search-relevance axis (rank-derived) — how well the paper matched the query
maturityScore
Optional
0-100 trust/maturity axis — peer-review status, code, revisions, collaboration. Distinct from relevanceScore.
maturityFactors
Optional
Breakdown of the maturity score: [{ factor, points }]
maturityTier
Optional
Stable enum: peer-reviewed | venue-accepted | established-preprint | fresh-preprint
priorityScore
Optional
0-100 mode-weighted ordering scalar (the field to sort by)
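Since priorityScore is the field intended for ordering, a consumer might sort records as sketched below. The function name is hypothetical, and placing records without a score last is an assumption rather than documented behaviour.

```python
def sort_by_priority(records):
    """Sort records by priorityScore, highest first."""
    def score(record):
        value = record.get("priorityScore")
        # Assumption: records with a missing or null score sort last.
        return -1 if value is None else value

    return sorted(records, key=score, reverse=True)


# Hypothetical sample records, for illustration only.
sample = [
    {"arxivId": "a", "priorityScore": 40},
    {"arxivId": "b"},
    {"arxivId": "c", "priorityScore": 85},
]
ordered_ids = [r["arxivId"] for r in sort_by_priority(sample)]
# ordered_ids == ["c", "a", "b"]
```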
citationRisk
Optional
Stable enum: low | medium | high — risk of citing this paper as-is
citationRiskReasons
Optional
Plain-English reasons behind the citation-risk level
ragSafe
Optional
Safe to index into a RAG/LLM corpus (has a substantive abstract, not withdrawn)
ragSafeReason
Optional
Why the paper is or is not RAG-safe
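One way to use ragSafe is as a gate before indexing. The following sketch keeps only paper records flagged as RAG-safe and builds a simple text payload from the title and abstract; the function name and the shape of the returned documents are assumptions made for illustration.

```python
def select_rag_documents(records):
    """Keep RAG-safe paper records and build plain-text documents from them."""
    documents = []
    for record in records:
        if record.get("recordType") != "paper":
            continue
        if record.get("ragSafe") is not True:
            # Missing or false ragSafe is treated as "do not index".
            continue
        title = record.get("title") or ""
        abstract = record.get("abstract") or ""
        documents.append({
            "id": record.get("canonicalArxivId"),
            "text": f"{title}\n\n{abstract}",
        })
    return documents
```

Treating a missing ragSafe value as unsafe is a conservative choice here, since every field in the schema is optional.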
recommendedAction
Optional
Stable enum: cite-safely | cite-as-accepted | cite-with-caveat | verify-publication | read-first | skip-withdrawn
why
Optional
Plain-English reasons for the recommendation
signalReason
Optional
Reasoning chain behind the classification (publication status, peer review, versions, code, recency, maturity)
isLandmark
Optional
Earliest paper of its top-level archive within this result set
landmarkReason
Optional
Why the paper was tagged a landmark
summary
Optional
LLM-quotable one-line summary (≤280 chars)
canonicalArxivId
Optional
ArXiv ID without the version suffix (stable identity across versions)
version
Optional
Version number of this record (from the vN suffix)
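The relationship between arxivId, canonicalArxivId, and version can be illustrated by splitting the vN suffix off the identifier. This is a minimal sketch with a hypothetical function name; it assumes the suffix, when present, is a trailing "v" followed by digits.

```python
import re

# Trailing version suffix: 'v' followed by one or more digits.
_VERSIONED_ID = re.compile(r"(.+?)v(\d+)")


def split_arxiv_id(arxiv_id):
    """Split an arXiv ID into (canonical ID, version number or None)."""
    match = _VERSIONED_ID.fullmatch(arxiv_id)
    if match is None:
        # No version suffix: the ID is already canonical.
        return arxiv_id, None
    return match.group(1), int(match.group(2))


canonical_id, version = split_arxiv_id("2401.02385v2")
# canonical_id == "2401.02385", version == 2
```

Keying any deduplication or storage on the canonical ID rather than the versioned one keeps a paper's identity stable when a new revision is posted.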
statusConfidence
Optional
Confidence in the publication-status classification: high | medium | low
venueNormalized
Optional
Parsed venue: { raw, venueName, venueYear, venueType (conference|journal|workshop|unknown), confidence }
categoryNames
Optional
Human-readable names for the ArXiv category codes (null-safe; code echoed when unknown)
codeUrls
Optional
All detected code/repository URLs
codeHost
Optional
Host of the first detected code URL (e.g. github.com)
paperLifecycle
Optional
Lifecycle flags: { withdrawn, superseded, replacementHint, errataHint, statusConfidence }
citation
Optional
Deterministic citation companion: { preferredCitationTarget, citationWarning, versionAwareCitationNote, bibtexKey, bibtex } (when includeCitationFields). BibTeX only — style-formatted strings are intentionally not generated.
evidence
Optional
Inspectable evidence ledger: { statusSignals[], riskSignals[], scoreTrace[] } (when includeEvidenceLedger)
paperType
Optional
Deterministic paper type: survey | benchmark | dataset | methodology | theoretical | empirical | position-paper | unknown
isSurvey
Optional
True when the title/abstract identify a survey/review/overview
surveyConfidence
Optional
Confidence the paper is a survey: high | medium | low
foundationalCandidate
Optional
Earliest ESTABLISHED (published/accepted or revised) paper of its archive WITHIN THIS RESULT SET — a metadata-only candidate, not a citation-based seminal/importance claim
researchRole
Optional
The paper's role in a reading plan: survey | foundational | benchmark | dataset | state-of-art | reproducible | emerging | historical | methodology
foundationalReason
Optional
Why the paper was tagged foundational
firstAuthor
Optional
First author name
lastAuthor
Optional
Last author name (often the senior/PI author)
largeCollaboration
Optional
True when 10+ authors (large collaboration)
role
Optional
On role-coverage records: the research role (survey / foundational / benchmark / dataset / methodology / state-of-art / reproducible)
status
Optional
On role-coverage records: covered | missing
count
Optional
On role-coverage records: number of papers in the result set with this role
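The role, status, and count fields on role-coverage records make it straightforward to report gaps in a reading plan. A minimal sketch, assuming one role-coverage record per role; the function name and sample records are hypothetical.

```python
def missing_roles(records):
    """List research roles that no paper in the result set covers."""
    return [
        record["role"]
        for record in records
        if record.get("recordType") == "role-coverage"
        and record.get("status") == "missing"
        and "role" in record
    ]


# Hypothetical sample records, for illustration only.
sample = [
    {"recordType": "role-coverage", "role": "survey", "status": "covered", "count": 2},
    {"recordType": "role-coverage", "role": "benchmark", "status": "missing", "count": 0},
    {"recordType": "paper", "arxivId": "2401.02385v2"},
]
gaps = missing_roles(sample)
# gaps == ["benchmark"]
```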