Build a RAG-Ready Corpus from arXiv
Created by
Ryan Clinton
Filters arXiv into a RAG-safe corpus: drops withdrawn and unstable preprints, keeps papers that are safe to index, and reports a corpus-quality score.
ArXiv Preprint Paper Search (ryanclinton/arxiv-paper-search)
Input
Search Query: all:transformer architectures
Analysis Mode: rag
Exclude Withdrawn: true
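The input above can be expressed as a JSON payload. The following is a minimal sketch, not the Actor's documented schema: the key names `searchQuery`, `analysisMode`, and `excludeWithdrawn` are assumptions derived from the display labels, so check the Actor's input schema for the real names before using them.

```python
import json


def build_run_input(query, mode="rag", exclude_withdrawn=True):
    """Build an input payload mirroring the three fields shown above.

    The dictionary keys are hypothetical; only the values come from the page.
    """
    if not query.strip():
        raise ValueError("query must not be empty")
    return {
        "searchQuery": query,                   # assumed key for "Search Query"
        "analysisMode": mode,                   # assumed key for "Analysis Mode"
        "excludeWithdrawn": exclude_withdrawn,  # assumed key for "Exclude Withdrawn"
    }


run_input = build_run_input("all:transformer architectures")
print(json.dumps(run_input, indent=2))
```

Keeping the payload in a small helper makes it easy to vary the query while leaving the analysis mode and the withdrawn-paper filter at the values shown here.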
Output fields
Title
Status
Peer review
Cite risk
Action
Venue
Versions
RAG-safe
Abstract
01. Sign up on Apify
Create your Apify account to access the ArXiv Preprint Paper Search.
02. Start the run
The Actor starts running automatically based on the input.
03. Receive the output
Monitor the progress in real time. You will be notified as soon as your dataset is complete and ready for review.
04. Integrate into your workflow
The final output is delivered in JSON, CSV, or Excel format, ready to be plugged into your workflow.
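As an illustration of the integration step, here is a minimal sketch that reads a JSON export and keeps only the records flagged as RAG-safe. The field keys `title` and `ragSafe` are assumptions based on the output field labels, and the two sample records are made-up placeholders, not real Actor output.

```python
import json


def load_rag_safe(json_text, flag_key="ragSafe"):
    """Parse a JSON array of records and keep those whose flag is true.

    `flag_key` is a hypothetical name for the "RAG-safe" output field.
    """
    records = json.loads(json_text)
    return [record for record in records if record.get(flag_key) is True]


# Placeholder data standing in for an exported dataset.
sample = json.dumps([
    {"title": "Example paper A", "ragSafe": True},
    {"title": "Example paper B", "ragSafe": False},
])

kept = load_rag_safe(sample)
print([record["title"] for record in kept])
```

Filtering on the flag before indexing means that only papers the Actor marked as safe reach the retrieval corpus; records with a missing flag are dropped as well.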
