OpenAlex Scraper

Pricing

Pay per event

Optimize your academic research with our comprehensive OpenAlex scraper! Obtain complete academic information, including publication dates, DOI links, open access status, and citation metrics. Ideal for researchers, academic institutions, and data analysts who need accurate data without manual work.

Rating

5.0

(1)

Developer

ParseForge

Maintained by Community

Actor stats

0

Bookmarked

2

Total users

1

Monthly active users

4 days ago

Last modified

📚 OpenAlex Scraper

🚀 Supercharge your academic research with comprehensive scholarly works data collection from OpenAlex!

This tool collects scholarly publications, research papers, and academic data from OpenAlex's open catalog, supporting both researchers and institutions. It delivers complete publication metadata, author information, citation metrics, open access status, and research concepts automatically. Perfect for researchers, librarians, academic institutions, and data analysts who need accurate, current scholarly data without manual work.

Target Audience: Academic researchers, librarians, information professionals, academic institutions, data analysts, and research companies

Primary Use Cases: Literature reviews, citation tracking, research database building, academic market research, publication monitoring, and research impact analysis

📊 What Does OpenAlex Scraper Do?

This tool collects scholarly works data from OpenAlex, supporting both research analysis and academic intelligence gathering. It delivers:

  • Complete publication metadata (titles, abstracts, DOIs, publication dates, publication years)
  • Author information with affiliations, ORCID IDs, and corresponding author status
  • Institution data and research networks
  • Citation counts and impact metrics (FWCI, citation percentiles, citation counts by year)
  • Open access status and PDF availability
  • Research concepts and topic classifications with subfields, fields, and domains
  • Primary topics and keywords with relevance scores
  • Related and referenced works
  • Funding information and grants
  • Bibliographic details (volume, issue, pages)
  • Publication locations and sources
  • And much more

Business Value: Perfect for researchers conducting literature reviews, academic institutions tracking publications, librarians managing collections, and data analysts building research databases. Save hours of manual data collection and get comprehensive academic intelligence automatically.

📋 Input

To start OpenAlex web scraping, simply fill in the input form. You can scrape OpenAlex using two approaches:

🔗 Option 1: Direct URL

  • startUrl - Direct OpenAlex API URL to scrape from. Example: https://api.openalex.org/works?page=1&sort=cited_by_count:desc or https://api.openalex.org/works?filter=authorships.author.id:A2208157607

πŸ” Option 2: Search Filters

  • Title & Abstract - Search query to find works in titles and abstracts. Can be a single string or array of strings (e.g., "machine learning" or ["machine learning", "deep learning"])
  • Author ID - Filter works by specific author(s). Can be a single ID or array of IDs (e.g., "A2208157607" or ["A2208157607", "A5010062957"])
  • Funder ID - Filter works by funding organization(s). Can be a single ID or array of IDs
  • Institution ID - Filter works by specific institution(s). Can be a single ID or array of IDs (e.g., "I114527120" or ["I114527120", "I420000000"])
  • Topic ID - Filter works by research topic/concept(s). Can be a single ID or array of IDs
  • Keyword ID - Filter works by keyword(s). Can be a single ID or array of IDs
  • Type - Filter by work type(s). Can be a single type or array of types (e.g., "article" or ["article", "book"])
  • Open Access - Filter for Open Access works only
  • From Publication Date - Filter works published on or after this date (YYYY-MM-DD or YYYY)
  • To Publication Date - Filter works published on or before this date (YYYY-MM-DD or YYYY)
  • Sort - Sort results by Citation Count, Citation Percentile, FWCI, Title, or Year

Note: When using arrays for filters (Author ID, Funder ID, Institution ID, Topic ID, Keyword ID, Type, Title & Abstract), the scraper uses OR logic (|) to find works matching any of the provided values.
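
To illustrate the OR logic, the sketch below builds an OpenAlex filter string in which multiple values for one field are joined with `|`, and different fields are joined with `,` (which the OpenAlex API treats as AND). The helper names `build_filter` and `build_url` are hypothetical, and the mapping from input fields to the filter keys `authorships.author.id` and `type` is an assumption about how such a translation might look, not the Actor's actual implementation.

```python
from urllib.parse import urlencode

# Assumed mapping from Actor input fields to OpenAlex filter keys.
# The Actor's real internal mapping is not documented on this page.
FILTER_KEYS = {
    "authorId": "authorships.author.id",
    "workType": "type",
}


def build_filter(actor_input: dict) -> str:
    """Join a field's values with '|' (OR) and separate fields with ',' (AND)."""
    parts = []
    for field, key in FILTER_KEYS.items():
        value = actor_input.get(field)
        if value is None:
            continue
        # Accept either a single string or a list of strings.
        values = [value] if isinstance(value, str) else list(value)
        parts.append(f"{key}:{'|'.join(values)}")
    return ",".join(parts)


def build_url(actor_input: dict) -> str:
    """Build a works URL; ':', '|' and ',' are left unescaped for readability."""
    query = urlencode({"filter": build_filter(actor_input)}, safe=":|,")
    return f"https://api.openalex.org/works?{query}"


example = {"authorId": ["A2208157607", "A5010062957"], "workType": "article"}
print(build_url(example))
```

A URL produced this way could also be passed as startUrl instead of filling in the filter fields.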

Common Settings:

  • maxItems - Maximum number of works to collect. Free users: required, at most 50. Paid users: optional, at most 1,000,000; leave empty for no limit.

Note: You can use either startUrl OR the filter fields, but not both. The startUrl approach is for direct API URLs, while filter fields let you build searches step by step.
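
The rules above can be checked before starting a run. The following is a minimal sketch, assuming the field names shown in the JSON examples on this page; the function `validate_input`, its `paid` flag, and the error messages are hypothetical and are not part of the Actor.

```python
# Filter field names taken from the JSON examples on this page.
FILTER_FIELDS = {
    "search", "authorId", "workType", "topicId", "isOA",
    "fromPublicationDate", "toPublicationDate",
}
FREE_MAX_ITEMS = 50
PAID_MAX_ITEMS = 1_000_000


def validate_input(actor_input: dict, paid: bool = False) -> None:
    """Raise ValueError if the input breaks the documented rules."""
    used_filters = FILTER_FIELDS.intersection(actor_input)
    if "startUrl" in actor_input and used_filters:
        raise ValueError(
            f"Use startUrl or filter fields, not both: {sorted(used_filters)}"
        )

    max_items = actor_input.get("maxItems")
    if max_items is None:
        if not paid:
            raise ValueError("maxItems is required for free users")
        return  # paid users may leave maxItems empty

    limit = PAID_MAX_ITEMS if paid else FREE_MAX_ITEMS
    if not 1 <= max_items <= limit:
        raise ValueError(f"maxItems must be between 1 and {limit}")


# A valid free-tier input passes silently.
validate_input({"search": "machine learning", "maxItems": 10})
```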

Here's what the input configuration looks like in JSON:

{
  "startUrl": "https://api.openalex.org/works?page=1&sort=cited_by_count:desc",
  "maxItems": 10
}

Or using search filters:

{
  "search": "machine learning",
  "fromPublicationDate": "2020-01-01",
  "toPublicationDate": "2023-12-31",
  "isOA": true,
  "workType": "article",
  "sort": "cited_by_count:desc",
  "maxItems": 10
}

Or with multiple values (arrays) for filters that support it:

{
  "workType": ["article", "book"],
  "topicId": ["C41008148", "C121332964"],
  "authorId": ["A2208157607", "A5010062957"],
  "search": ["machine learning", "deep learning"],
  "isOA": true,
  "sort": "cited_by_count:desc",
  "maxItems": 10
}

📊 Output

After the Actor finishes its run, you'll get a dataset with the output. The size of the dataset depends on the number of results you requested. You can download the results as an Excel, HTML, XML, JSON, or CSV document.

Here's an example of scraped OpenAlex data you'll get if you decide to scrape scholarly publications:

{
  "workUrl": "https://openalex.org/W3038568908",
  "title": "Radiation Resistant Camera System for Monitoring Deuterium Plasma Discharges in the Large Helical Device",
  "doiUrl": "https://doi.org/10.1585/pfr.15.2402039",
  "publicationDate": "2020-06-08",
  "publicationYear": 2020,
  "workType": "article",
  "citedByCount": 801215,
  "abstract": "Radiation resistant camera system was constructed for monitoring deuterium plasma discharges...",
  "language": "en",
  "openAccessIsOa": true,
  "openAccessStatus": "diamond",
  "openAccessUrl": "https://www.jstage.jst.go.jp/article/pfr/15/0/15_2402039/_pdf",
  "authors": [
    {
      "id": "https://openalex.org/A5039600762",
      "displayName": "M. Shoji",
      "orcid": "https://orcid.org/0000-0003-0655-7347",
      "authorPosition": "first",
      "isCorresponding": true,
      "countries": ["JP"]
    }
  ],
  "authorNames": ["M. Shoji"],
  "institutions": [
    {
      "id": "https://openalex.org/I4210108322",
      "displayName": "National Institute for Fusion Science",
      "countryCode": "JP"
    }
  ],
  "institutionNames": ["National Institute for Fusion Science"],
  "primaryLocation": {
    "sourceDisplayName": "Plasma and Fusion Research",
    "sourceType": "journal",
    "landingPageUrl": "https://doi.org/10.1585/pfr.15.2402039",
    "pdfUrl": "https://www.jstage.jst.go.jp/article/pfr/15/0/15_2402039/_pdf",
    "isOa": true
  },
  "concepts": [
    {
      "id": "https://openalex.org/C41008148",
      "displayName": "Computer science",
      "level": 0,
      "score": 0.95
    }
  ],
  "conceptNames": ["Computer science"],
  "fwci": 1214.93,
  "citationNormalizedPercentile": {
    "value": 1.0,
    "is_in_top_1_percent": true,
    "is_in_top_10_percent": true
  },
  "biblio": {
    "volume": "15",
    "issue": "0",
    "first_page": "2402039",
    "last_page": "2402039"
  },
  "scrapedTimestamp": "2024-01-15T10:30:00.000Z"
}
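
If you download the dataset as JSON, records like the one above can be flattened into a smaller table for analysis. The sketch below is one possible approach using only the Python standard library; the column selection and the helper names `to_row` and `to_csv` are illustrative choices, and `SAMPLE` is a trimmed copy of the example record rather than real scraper output.

```python
import csv
import io

# Trimmed copy of the example record above (only the fields used below).
SAMPLE = {
    "title": "Radiation Resistant Camera System for Monitoring Deuterium "
             "Plasma Discharges in the Large Helical Device",
    "doiUrl": "https://doi.org/10.1585/pfr.15.2402039",
    "publicationYear": 2020,
    "citedByCount": 801215,
    "openAccessIsOa": True,
    "authorNames": ["M. Shoji"],
    "institutionNames": ["National Institute for Fusion Science"],
}

COLUMNS = ["title", "doi", "year", "cited_by", "is_oa", "authors", "institutions"]


def to_row(work: dict) -> dict:
    """Pick a few fields and join list-valued fields into single strings."""
    return {
        "title": work.get("title"),
        "doi": work.get("doiUrl"),
        "year": work.get("publicationYear"),
        "cited_by": work.get("citedByCount"),
        "is_oa": work.get("openAccessIsOa"),
        "authors": "; ".join(work.get("authorNames", [])),
        "institutions": "; ".join(work.get("institutionNames", [])),
    }


def to_csv(works: list) -> str:
    """Serialize the flattened rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    for work in works:
        writer.writerow(to_row(work))
    return buffer.getvalue()


print(to_csv([SAMPLE]))
```

Missing fields come through as empty cells, since `to_row` reads every key with `get`.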

What You Get: Complete scholarly works data including publication details, author information, citation metrics, open access status, research concepts, related works, funding information, and bibliographic details. Perfect for building research databases, conducting literature reviews, tracking citations, and analyzing academic trends.

Download Options: CSV, Excel, or JSON formats for easy analysis and integration with research tools.

⚡ Why Choose the OpenAlex Scraper?

  • Comprehensive Data Collection: Get complete publication metadata, author affiliations, citation counts, research concepts, and funding information in one automated process
  • Time Savings: Automate hours of manual data collection and research database building. What would take weeks of manual work can be completed in minutes
  • Accurate Academic Intelligence: Access OpenAlex's open catalog with over 250M scholarly works from 250k sources with current data
  • Flexible Filtering: Search by title and abstract text, authors, institutions, funders, topics, keywords, work types, publication dates, and open access status
  • Open Access Information: Identify open access publications and available PDFs automatically
  • Complete Details: Get full work details including abstracts, bibliographic information, citation metrics, and research networks

Time Savings: Save days of manual research work. Process thousands of publications automatically without manual data entry.

Efficiency: Build comprehensive research databases in minutes instead of weeks. Automate literature reviews and citation tracking.

🔧 How to Use

  1. Sign Up: Create a free account with $5 credit (takes 2 minutes)
  2. Find the Scraper: Visit the OpenAlex Scraper page
  3. Set Input: Add your search criteria or startUrl (we'll show you exactly what to enter)
  4. Run It: Click "Start" and let it collect your data
  5. Download Data: Get your results in the "Dataset" tab as CSV, Excel, or JSON

Total Time: Less than 5 minutes from sign up to downloaded data

No Technical Skills Required: Everything is simple and intuitive

💼 Business Use Cases

Academic Researchers:

  • Conduct comprehensive literature reviews
  • Track citations and research impact
  • Identify related works and research networks
  • Build personal research databases
  • Monitor research trends in your field

Librarians and Information Professionals:

  • Manage institutional publication collections
  • Track faculty publications and research output
  • Build research databases and catalogs
  • Monitor open access availability
  • Support research services

Academic Institutions:

  • Track institutional research output
  • Analyze publication trends and impact
  • Monitor faculty research activities
  • Build institutional research databases
  • Support research assessment and reporting

Data Analysts and Research Companies:

  • Build comprehensive research databases
  • Analyze academic trends and patterns
  • Conduct market research in academic publishing
  • Track research funding and grants
  • Support business intelligence for academic markets

❓ Frequently Asked Questions

Q: How does it work? A: OpenAlex Scraper is easy to use and requires no technical knowledge. Simply configure your search parameters or provide a startUrl, and let the tool collect the data automatically from OpenAlex's API. The scraper handles pagination automatically and fetches complete details for each work.

Q: How accurate is the data? A: Data comes directly from OpenAlex's open catalog, which indexes over 250M scholarly works from 250k sources. The data is updated regularly and includes comprehensive academic metadata with full work details.

Q: Can I schedule regular runs? A: Yes! Use the Apify API or scheduler to run the scraper regularly and keep your research databases up to date automatically.

Q: What if I need help? A: Our support team is here to help you get the most out of this tool. Contact us through the Apify platform.

Q: Is my data secure? A: Yes, all data processing happens securely on Apify's platform. Your data is private and only accessible to you.

Q: Can I filter by multiple criteria? A: Yes! You can combine multiple filters like author, institution, topic, date range, and open access status to find exactly what you need.
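
For readers curious about the automatic pagination mentioned in the first answer: the OpenAlex API supports cursor paging, where the first request uses `cursor=*` and each response carries `meta.next_cursor` for the following page. The sketch below shows that loop in general terms. It is not the Actor's code; `iter_works`, `fetch_page`, and the fake pages are hypothetical stand-ins so the example runs without network access.

```python
from typing import Callable, Iterator, Optional


def iter_works(
    fetch_page: Callable[[str], dict],
    max_items: Optional[int] = None,
) -> Iterator[dict]:
    """Yield works page by page until the cursor runs out or max_items is hit."""
    cursor: Optional[str] = "*"  # OpenAlex starts cursor paging with '*'
    yielded = 0
    while cursor:
        page = fetch_page(cursor)
        results = page.get("results", [])
        if not results:
            break
        for work in results:
            yield work
            yielded += 1
            if max_items is not None and yielded >= max_items:
                return
        cursor = page.get("meta", {}).get("next_cursor")


# Stand-in for an HTTP call: maps a cursor to a canned API-like response.
FAKE_PAGES = {
    "*": {"meta": {"next_cursor": "abc"}, "results": [{"id": "W1"}, {"id": "W2"}]},
    "abc": {"meta": {"next_cursor": None}, "results": [{"id": "W3"}]},
}


def fake_fetch(cursor: str) -> dict:
    return FAKE_PAGES[cursor]


print([work["id"] for work in iter_works(fake_fetch)])
```

In real use, `fetch_page` would issue the HTTP request with the given cursor and return the parsed JSON body.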

🔌 Integrate OpenAlex Scraper with any app and automate your workflow

Last but not least, OpenAlex Scraper can be connected with almost any cloud service or web app thanks to integrations on the Apify platform.

Alternatively, you can use webhooks to carry out an action whenever an event occurs, e.g. get a notification whenever OpenAlex Scraper successfully finishes a run.

Looking for more data collection tools? Check out these related actors:

Actor | Description | Link
PR Newswire Scraper | Collects press releases and news content from PR Newswire | https://apify.com/parseforge/pr-newswire-scraper
HubSpot Marketplace Scraper | Extracts business app data from HubSpot marketplace | https://apify.com/parseforge/hubspot-marketplace-scraper
FINRA BrokerCheck Scraper | Collects financial broker and advisor information from FINRA | https://apify.com/parseforge/finra-brokercheck-scraper
GSA eLibrary Scraper | Extracts government services and solutions data from GSA eLibrary | https://apify.com/parseforge/gsa-elibrary-scraper
Hugging Face Model Scraper | Collects AI model information from Hugging Face | https://apify.com/parseforge/hugging-face-model-scraper

Pro Tip: 💡 Browse our complete collection of data collection actors to find the perfect tool for your business needs.

Need Help? Our support team is here to help you get the most out of this tool.


⚠️ Disclaimer: This Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by OpenAlex or any of its subsidiaries. All trademarks mentioned are the property of their respective owners.