Crossref Scraper
Pricing: Pay per event
Transform how you access scholarly data with our Crossref scraper. This intelligent automation tool gathers titles, authors, abstracts, and citations in seconds, giving researchers, librarians, and academics the accurate, up-to-date publication information they need without lifting a finger.
Rating: 5.0 (2 ratings)
Developer: ParseForge
Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 4 days ago
Crossref Scraper
Supercharge your academic research with our comprehensive Crossref Scraper! Automate the collection of scholarly publication data, including titles, authors, abstracts, citations, and metadata, from millions of academic works. Extract complete bibliographic information in multiple citation formats (BibTeX, RIS, APA, Harvard, IEEE, MLA, Vancouver, Chicago) along with the full metadata JSON. Perfect for researchers, librarians, and academics who need accurate, current publication data without manual work.
Target Audience: Researchers, academics, librarians, students, publishers, research institutions, bibliographic database managers
Primary Use Cases: Literature reviews, citation management, bibliographic database building, research analysis, publication tracking
What Does Crossref Scraper Do?
This tool collects comprehensive academic publication data from Crossref, supporting both direct URL scraping and search-based collection. It delivers:
- Complete publication details (title, DOI, type, publication date, publisher, container title)
- Author information (formatted author names with affiliations)
- Publication metadata (volume, issue, page numbers, ISSN, ISBN)
- Abstracts (with HTML and XML tags stripped)
- Multiple citation formats (BibTeX, RIS, APA, Harvard, IEEE, MLA, Vancouver, Chicago)
- Citation metrics (reference count, cited-by count)
- Complete metadata JSON (full API response for each work)
- Subject classifications
- And more
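Crossref often stores abstracts as JATS XML (for example `<jats:p>` elements), which is why they need cleaning before they are readable. The scraper's own cleaning code is not published here; the following is a minimal sketch, assuming a standard-library approach with `html.parser` that keeps text nodes, turns block-level closing tags into spaces, and collapses whitespace. `clean_abstract`, `_TextCollector`, and `BLOCK_TAGS` are hypothetical names.

```python
import re
from html.parser import HTMLParser

# Closing any of these tags ends a text block, so a space is inserted there.
BLOCK_TAGS = {"p", "title", "sec", "div", "br", "li"}


class _TextCollector(HTMLParser):
    """Keeps text nodes and drops every tag, including namespaced jats:* tags."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decodes &amp;, &lt;, ...
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)

    def handle_endtag(self, tag: str) -> None:
        # "jats:p" -> "p": ignore the namespace prefix when checking the tag
        if tag.split(":")[-1] in BLOCK_TAGS:
            self.parts.append(" ")


def clean_abstract(raw: str) -> str:
    """Strip HTML/XML markup from an abstract and normalize whitespace."""
    parser = _TextCollector()
    parser.feed(raw)
    parser.close()
    text = "".join(parser.parts)
    return re.sub(r"\s+", " ", text).strip()


print(clean_abstract(
    "<jats:title>Abstract</jats:title>"
    "<jats:p>Water (H<jats:sub>2</jats:sub>O) &amp; ice.</jats:p>"
))  # → Abstract Water (H2O) & ice.
```

Inline tags such as `<jats:sub>` are dropped without adding a space, so "H2O" stays one word, while paragraph and title boundaries become spaces.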
Business Value: Build comprehensive bibliographic databases, automate citation management, conduct systematic literature reviews, track publication metrics, and analyze research trends with current data from Crossref, one of the largest registries of scholarly metadata.
Input
The scraper supports two input methods: a direct Crossref search URL or search filters.
Method 1: Direct URL (Recommended)
- startUrl - Direct Crossref search results URL to scrape. Example:
https://search.crossref.org/search/works?q=software&from_ui=yes#
How to Get Your Start URL:
- Go to search.crossref.org
- Enter your search query in the search box
- Apply any sorting options you want (Relevance, Publication Year, or Default)
- Copy the complete URL from your browser's address bar
- Paste it as your startUrl input
Pro Tip: When you provide a startUrl, the scraper automatically extracts the query and sort parameters from the URL. You can also provide additional input filters that will override the URL parameters if needed.
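To illustrate the parameter extraction described above, here is a minimal sketch using Python's `urllib.parse`. It is not the Actor's actual code: `params_from_start_url` is a hypothetical name, and it assumes the query appears as the `q` parameter (as in the example URL) and a chosen sort order as a `sort` parameter, which the example URL does not show.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def params_from_start_url(start_url: str) -> dict:
    """Extract the search query and optional sort value from a Crossref search URL.

    Hypothetical helper: assumes the query is in "q" (as in the example URL)
    and that a chosen sort order shows up as a "sort" parameter.
    """
    parsed = urlparse(start_url)
    if parsed.netloc != "search.crossref.org":
        raise ValueError(f"not a search.crossref.org URL: {start_url}")
    qs = parse_qs(parsed.query)  # the trailing "#" fragment is parsed separately

    def first(name: str) -> Optional[str]:
        values = qs.get(name)
        return values[0] if values else None

    return {"query": first("q"), "sort": first("sort")}


print(params_from_start_url(
    "https://search.crossref.org/search/works?q=software&from_ui=yes#"
))  # → {'query': 'software', 'sort': None}
```

Unrelated UI parameters such as `from_ui` are simply ignored, and `parse_qs` decodes `+` to a space, so `q=machine+learning` yields the query "machine learning".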
Method 2: Search Filters
- query - Search term for Crossref works (required for search-based scraping). Example: 'software', 'machine learning', 'climate change'
- sort - How to sort the results: Default (no sort), Relevance (score), or Publication Year (year). If omitted, Default (no sort) is used.
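For context on how `query` and `sort` might reach Crossref, the sketch below builds a request URL for Crossref's public REST API (`https://api.crossref.org/works`, which accepts `query`, `rows`, `sort`, and `order` parameters). This is an assumption about one possible implementation, not the Actor's documented behavior; in particular, mapping `year` to the API's `published` sort key is a guess, and `build_works_url` is a hypothetical name.

```python
from typing import Optional
from urllib.parse import urlencode

API_BASE = "https://api.crossref.org/works"

# Input sort value -> REST API sort key. Mapping "year" to "published" is an
# assumption; the Actor's real mapping is not documented.
SORT_MAP = {"score": "score", "year": "published"}


def build_works_url(query: str, sort: Optional[str] = None, rows: int = 20) -> str:
    """Build a Crossref REST API /works search URL (illustrative only)."""
    params = {"query": query, "rows": min(rows, 1000)}  # API allows at most 1000 rows
    if sort:  # None or "" means Default (no sort)
        if sort not in SORT_MAP:
            raise ValueError(f"unsupported sort value: {sort!r}")
        params["sort"] = SORT_MAP[sort]
        params["order"] = "desc"  # best match / newest first
    return f"{API_BASE}?{urlencode(params)}"


print(build_works_url("machine learning", sort="year", rows=10))
# → https://api.crossref.org/works?query=machine+learning&rows=10&sort=published&order=desc
```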
Common Settings
- maxItems - Maximum number of works to collect. Free users: required, maximum 50. Paid users: optional, maximum 1,000,000; leave it empty to collect all available results up to that limit. Prefilled value: 10
Important Input Rules:
- You can use startUrl OR search filters, but not both. If you provide both, the scraper will return an error.
- The maxItems field is independent of the input method and works with either approach (it is required for free users).
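The two rules above, plus the maxItems limits, can be expressed as a small validation function. This is a hypothetical sketch that mirrors the documented rules; the Actor's real validation and error messages may differ, and the `is_paid` flag stands in for however the platform determines your plan.

```python
def validate_input(run_input: dict, is_paid: bool) -> None:
    """Hypothetical check mirroring the documented input rules.

    Raises ValueError when the input breaks a rule; returns None otherwise.
    """
    has_url = bool(run_input.get("startUrl"))
    has_filters = any(run_input.get(key) for key in ("query", "sort"))

    # Rule 1: startUrl and search filters are mutually exclusive.
    if has_url and has_filters:
        raise ValueError("Use startUrl OR search filters, not both.")
    if not has_url and not run_input.get("query"):
        raise ValueError("Provide either startUrl or query.")

    # Rule 2: maxItems limits depend on the plan.
    max_items = run_input.get("maxItems")
    limit = 1_000_000 if is_paid else 50
    if max_items is None:
        if not is_paid:
            raise ValueError("Free users must set maxItems (maximum 50).")
    elif not 1 <= max_items <= limit:
        raise ValueError(f"maxItems must be between 1 and {limit}.")


# Both example inputs from this page pass for a free user.
validate_input(
    {"startUrl": "https://search.crossref.org/search/works?q=software&from_ui=yes#",
     "maxItems": 10},
    is_paid=False,
)
validate_input({"query": "machine learning", "sort": "year", "maxItems": 10}, is_paid=False)
```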
Here's what the input configuration looks like in JSON:
Using Direct URL:
```json
{
  "startUrl": "https://search.crossref.org/search/works?q=software&from_ui=yes#",
  "maxItems": 10
}
```
Using Search Filters:
```json
{
  "query": "machine learning",
  "sort": "year",
  "maxItems": 10
}
```
Pro Tip: Use direct URLs for maximum flexibility. Set up your search query and sorting options on the Crossref website, then copy the URL. The scraper extracts the query and sort parameters from it automatically.
Output
After the Actor finishes its run, you'll get a dataset with the output. The number of items depends on the maxItems value you set. You can download the results as Excel, HTML, XML, JSON, or CSV.
Here's an example of the Crossref data you'll get when you scrape publication listings:
```json
{
  "doi": "10.59350/pz2ks-5sb91",
  "title": "International Research Software Conference",
  "url": "https://doi.org/10.59350/pz2ks-5sb91",
  "type": "Posted Content",
  "publicationDate": "2025-10-15",
  "authors": "Research Software Alliance",
  "publisher": "Front Matter",
  "containerTitle": "Software Impacts",
  "volume": "12",
  "issue": "3",
  "page": "100-115",
  "issn": "2665-9638",
  "isbn": "978-0-123456-78-9",
  "abstract": "7–8 September 2026 | Sheffield, UK & online (co-located with RSECon26) Join a global gathering of leaders and change-makers advancing strategic coordination, sustainability, and collaboration across the research software community.",
  "bibtex": "@article{2025, title={International Research Software Conference}, url={http://dx.doi.org/10.59350/pz2ks-5sb91}, DOI={10.59350/pz2ks-5sb91}, publisher={Front Matter}, year={2025}, month=oct }",
  "ris": [
    "TY - GENERIC",
    "DO - 10.59350/pz2ks-5sb91",
    "UR - http://dx.doi.org/10.59350/pz2ks-5sb91",
    "TI - International Research Software Conference",
    "PY - 2025",
    "DA - 2025/10/15",
    "PB - Front Matter",
    "ER -"
  ],
  "apa": "(2025). International Research Software Conference. https://doi.org/10.59350/pz2ks-5sb91",
  "harvard": "2025, International Research Software Conference, Front Matter, viewed <http://dx.doi.org/10.59350/pz2ks-5sb91>.",
  "ieee": "[1]\"International Research Software Conference,\" Oct. 2025, doi: 10.59350/pz2ks-5sb91.",
  "mla": "International Research Software Conference. Oct. 2025. Crossref, https://doi.org/10.59350/pz2ks-5sb91.",
  "vancouver": "1.International Research Software Conference. 2025 Oct 15; Available from: http://dx.doi.org/10.59350/pz2ks-5sb91",
  "chicago": "\"International Research Software Conference,\" October 15, 2025. https://doi.org/10.59350/pz2ks-5sb91.",
  "referenceCount": 0,
  "citedByCount": 0,
  "metadataJsonUrl": "https://api.crossref.org/v1/works/10.59350%2Fpz2ks-5sb91?mailto=search%40crossref.org",
  "metadataJson": {
    "status": "ok",
    "message-type": "work",
    "message-version": "1.0.0",
    "message": {
      "DOI": "10.59350/pz2ks-5sb91",
      "title": ["International Research Software Conference"],
      "publisher": "Front Matter",
      "type": "posted-content"
    }
  },
  "scrapedTimestamp": "2025-12-02T16:05:20.993Z"
}
```
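Because `ris` is an array of lines and `bibtex` is a single string, turning dataset items into files for a reference manager takes only a few lines. A minimal sketch, assuming you have exported the dataset as JSON and loaded it as a list of dicts; `ris_record` and `bibtex_library` are hypothetical helpers, and `sample` is a shortened copy of the example item.

```python
def ris_record(item: dict) -> str:
    """Join one work's RIS lines (stored as an array) into a single record."""
    return "\n".join(item.get("ris", [])) + "\n"


def bibtex_library(items: list) -> str:
    """Concatenate BibTeX entries into one .bib document, skipping empty ones."""
    entries = [item["bibtex"] for item in items if item.get("bibtex")]
    return "\n\n".join(entries) + "\n"


# Shortened version of the example output above.
sample = {
    "bibtex": "@article{2025, title={International Research Software Conference}, year={2025} }",
    "ris": ["TY - GENERIC", "TI - International Research Software Conference", "ER -"],
}
print(ris_record(sample), end="")
print(bibtex_library([sample, {"bibtex": ""}]), end="")
```

For a full export, write `"".join(ris_record(it) for it in items)` to a `.ris` file and `bibtex_library(items)` to a `.bib` file; each RIS record already ends with its `ER` line, so records can be concatenated directly.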
What You Get: Complete bibliographic data, including publication details, authors, abstracts, citation formats ready for use in reference managers, citation metrics, and the full metadata JSON for in-depth research analysis. Every citation format comes pre-formatted and ready to use.
Download Options: CSV, Excel, JSON, and the other export formats listed above, for easy analysis in your research tools
Output Field Order: Fields are grouped for readability: primary identification first (DOI, title, URL), then core publication information (type, publication date, authors, publisher), followed by detailed specifications (volume, issue, page, ISSN, ISBN), supporting information (abstract), citation formats, metrics, and finally metadata.
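If you post-process the JSON export yourself, the nested `ris` array and `metadataJson` object need flattening before they fit in a spreadsheet. The sketch below writes CSV in the field order described above using the standard `csv` module. The column list is copied from the example output; joining RIS lines with newlines and serializing `metadataJson` as a JSON string are assumptions, not necessarily what the Actor's own CSV export does.

```python
import csv
import io
import json

# Column order copied from the example output on this page.
FIELD_ORDER = [
    "doi", "title", "url", "type", "publicationDate", "authors", "publisher",
    "containerTitle", "volume", "issue", "page", "issn", "isbn", "abstract",
    "bibtex", "ris", "apa", "harvard", "ieee", "mla", "vancouver", "chicago",
    "referenceCount", "citedByCount", "metadataJsonUrl", "metadataJson",
    "scrapedTimestamp",
]


def items_to_csv(items: list) -> str:
    """Flatten dataset items into CSV text with one row per work."""
    buf = io.StringIO()
    # Missing fields become empty cells; unknown extra fields are dropped.
    writer = csv.DictWriter(buf, fieldnames=FIELD_ORDER, extrasaction="ignore")
    writer.writeheader()
    for item in items:
        row = dict(item)  # copy so the caller's item is not modified
        if isinstance(row.get("ris"), list):
            row["ris"] = "\n".join(row["ris"])  # one quoted, multi-line cell
        if isinstance(row.get("metadataJson"), dict):
            row["metadataJson"] = json.dumps(row["metadataJson"])
        writer.writerow(row)
    return buf.getvalue()


csv_text = items_to_csv([{
    "doi": "10.59350/pz2ks-5sb91",
    "ris": ["TY - GENERIC", "ER -"],
    "metadataJson": {"status": "ok"},
}])
print(csv_text)
```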
Why Choose the Crossref Scraper?
- Comprehensive Data Collection: Get 30+ data fields including publication details, authors, abstracts, citation formats, and metrics
- Multiple Citation Formats: Extract 8 different citation formats (BibTeX, RIS, APA, Harvard, IEEE, MLA, Vancouver, Chicago) ready for use in reference managers
- Complete Metadata: Full metadata JSON for each work, providing access to all available Crossref data
- Current Data: Direct API access ensures current information from Crossref
- Large Scale: Collect up to 1,000,000 works with automatic pagination
- Citation Metrics: Includes reference count and cited-by count for impact analysis
- Flexible Input: Use direct URLs (with automatic parameter extraction) or search filters for maximum convenience
- Smart Parameter Extraction: Automatically extracts query and sort parameters from direct URLs, saving you time
- Clean Data: Abstracts are automatically stripped of HTML and XML tags for easy reading
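Crossref's REST API supports deep paging with a `cursor` parameter: the first request sends `cursor=*`, and each response's `message.next-cursor` value is passed to the next request until a page comes back with no items. Assuming the Actor paginates this way (the page does not say), the loop looks roughly like the sketch below. `fetch_page` is a hypothetical callback that would perform the HTTP GET against `https://api.crossref.org/works` with your query, `rows`, and `cursor`; injecting it keeps the sketch runnable offline, and `FAKE_PAGES` is made-up test data.

```python
from typing import Callable, Iterator, Optional

# Hypothetical callback: given a cursor and page size, perform the HTTP GET
# against https://api.crossref.org/works (with your query) and return the JSON.
FetchPage = Callable[[str, int], dict]


def iter_works(fetch_page: FetchPage, max_items: Optional[int],
               rows: int = 1000) -> Iterator[dict]:
    """Yield works page by page using Crossref's cursor-based deep paging."""
    cursor = "*"  # "*" starts a new cursor session
    yielded = 0
    while cursor and (max_items is None or yielded < max_items):
        message = fetch_page(cursor, rows)["message"]
        items = message.get("items", [])
        if not items:  # an empty page means the results are exhausted
            return
        for work in items:
            if max_items is not None and yielded >= max_items:
                return
            yield work
            yielded += 1
        cursor = message.get("next-cursor")


# Offline stand-in for the API: pages keyed by cursor (made-up data).
FAKE_PAGES = {
    "*": {"message": {"items": [{"DOI": "10.1/a"}, {"DOI": "10.1/b"}], "next-cursor": "c1"}},
    "c1": {"message": {"items": [{"DOI": "10.1/c"}], "next-cursor": "c2"}},
    "c2": {"message": {"items": [], "next-cursor": "c3"}},
}


def fake_fetch(cursor: str, rows: int) -> dict:
    return FAKE_PAGES[cursor]


print([w["DOI"] for w in iter_works(fake_fetch, max_items=None)])
# → ['10.1/a', '10.1/b', '10.1/c']
```

Checking `max_items` before each request means the loop stops without fetching an extra page once the limit is reached.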
Time Savings: Save 10-15 hours per week compared to manual bibliographic research
Efficiency: Takes a fraction of the time of manual citation collection
Data Quality: 30+ fields per publication, pulled from current Crossref records
How to Use
- Sign Up: Create a free account with $5 in free platform credit (takes 2 minutes)
- Find the Scraper: Visit the Crossref Scraper page
- Set Input: Either paste a direct Crossref search URL (recommended) or configure search filters (query, sort)
- Run It: Click "Start" and let it collect your data
- Download Data: Get your results in the "Dataset" tab as CSV, Excel, or JSON
Total Time: About 5 minutes of setup, plus 10-30 minutes for data collection
No Technical Skills Required: Everything is point-and-click
Business Use Cases
Researchers & Academics:
- Conduct systematic literature reviews
- Build comprehensive bibliographic databases
- Track publication metrics and citations
- Analyze research trends and patterns
Librarians & Information Professionals:
- Maintain current bibliographic databases
- Support researchers with citation data
- Track publication collections
- Generate citation reports
Publishers & Journals:
- Monitor competitor publications
- Track citation metrics
- Analyze publication trends
- Build publication databases
Students & Graduate Researchers:
- Collect references for thesis work
- Build citation databases
- Track relevant publications
- Export citations to reference managers
Research Institutions:
- Monitor institutional publications
- Track research impact
- Build publication repositories
- Support research administration
Frequently Asked Questions
Q: How does it work? A: Crossref Scraper is easy to use and requires no technical knowledge. Simply provide a direct URL or configure search filters and let the tool collect the data automatically.
Q: How accurate is the data? A: We collect data directly from Crossref's API in real time, so you get the most current information available. Accuracy ultimately depends on the metadata that publishers deposit with Crossref.
Q: Can I get citations in different formats? A: Yes! The scraper automatically extracts 8 different citation formats (BibTeX, RIS, APA, Harvard, IEEE, MLA, Vancouver, Chicago) for each publication, ready to use in reference managers.
Q: What's the difference between using a direct URL vs search filters? A: Direct URLs are recommended because the scraper automatically extracts the query and sort parameters from them. You can also provide additional input filters that will override URL parameters if needed. Search filters give you more control but require you to specify each parameter individually.
Q: Can I schedule regular runs? A: Yes! Use Apify schedules (set up in Apify Console or through the Apify API) to run the scraper daily, weekly, or monthly. Perfect for ongoing publication monitoring and database updates.
Q: What if I need help? A: Our support team is available 24/7. Contact us through the Apify platform.
Q: Is my data secure? A: Absolutely. All data is encrypted in transit and at rest. We never share your data with third parties.
Q: How many citation formats are included? A: The scraper extracts 8 citation formats: BibTeX, RIS (as an array), APA, Harvard, IEEE, MLA, Vancouver, and Chicago. All formats are cleaned and ready to use.
Q: What citation styles are used? A: Harvard uses the Swinburne University of Technology style, Chicago uses the full-note bibliography style, and all other formats use their standard styles.
Q: Can I get the complete metadata? A: Yes! Each work includes a metadataJson field containing the complete API response from Crossref, giving you access to all available metadata.
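Fields that are not flattened into top-level columns can be read straight from metadataJson. A minimal sketch with a hypothetical `raw_field` helper; it assumes the raw record sits under `metadataJson.message`, as in the example output, and uses Crossref field names such as `type` and `is-referenced-by-count`. The item below is an abbreviated copy of the example, so keys it lacks fall back to the default.

```python
def raw_field(item: dict, key: str, default=None):
    """Read one field from the raw Crossref record kept in metadataJson."""
    message = (item.get("metadataJson") or {}).get("message") or {}
    return message.get(key, default)


# Abbreviated item, based on the example output above.
item = {
    "citedByCount": 0,
    "metadataJson": {
        "status": "ok",
        "message-type": "work",
        "message": {"DOI": "10.59350/pz2ks-5sb91", "type": "posted-content"},
    },
}
print(raw_field(item, "type"))  # → posted-content
print(raw_field(item, "is-referenced-by-count", 0))  # → 0 (key absent here)
```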
Recommended Actors
Looking for more data collection tools? Check out these related actors:
| Actor | Description | Link |
|---|---|---|
| GSA eLibrary Scraper | Collects government publication data from GSA eLibrary | https://apify.com/parseforge/gsa-elibrary-scraper |
| GreatSchools Scraper | Collects school ratings and reviews from GreatSchools.org | https://apify.com/parseforge/greatschools-scraper |
| PR Newswire Scraper | Extracts press releases and news data from PR Newswire | https://apify.com/parseforge/pr-newswire-scraper |
| Hubspot Marketplace Scraper | Extracts business app data from HubSpot marketplace | https://apify.com/parseforge/hubspot-marketplace-scraper |
| Hugging Face Model Scraper | Collects machine learning model data from Hugging Face | https://apify.com/parseforge/hugging-face-model-scraper |
Pro Tip: Browse our complete collection of data collection actors to find the perfect tool for your business needs.
Integrate Crossref Scraper with any app and automate your workflow
Last but not least, Crossref Scraper can be connected with almost any cloud service or web app thanks to integrations on the Apify platform.
Alternatively, you can use webhooks to carry out an action whenever an event occurs, e.g. get a notification whenever Crossref Scraper successfully finishes a run.
Need Help? Our support team is here to help you get the most out of this tool.
Disclaimer: This Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by Crossref or any of its subsidiaries. All trademarks mentioned are the property of their respective owners.