Open Citations Scraper

Comprehensive OpenCitations scraper for extracting citation and reference data from the OpenCitations API. Built for researchers, academics, and data scientists who need automated access to citation networks, bibliographic metadata, and citation analysis data.

Pricing

Pay per event

Developer

ParseForge

Maintained by Community

📚 OpenCitations Scraper

🚀 Extract citation networks and bibliographic metadata from OpenCitations in seconds. Search by DOI, PMID, or OMID. No coding, no API keys required.

🕒 Last updated: 2026-04-23 · 📊 20 fields · 🔍 Citations and references modes · 📄 Optional detailed metadata

OpenCitations is an open scholarly infrastructure providing free access to citation data from millions of academic publications. This scraper collects citation relationships, self-citation flags, and optional bibliographic metadata (authors, titles, venues, publication dates) for any publication identified by DOI, PubMed ID, or OpenCitations Meta ID. Choose between citations mode (who cited this work) and references mode (what this work cites) to map research influence in either direction.

Researchers, bibliometric analysts, and data scientists use this actor to build citation networks, track research impact, identify self-citations, and analyze how knowledge flows between publications. Instead of querying the OpenCitations API manually and parsing responses, you get clean, structured data exported as JSON, CSV, or Excel. With metadata enabled, every record includes the citing and cited entity IDs, creation date, timespan, self-citation flags, plus the full title, authors, publication date, venue, and publisher.

| 🎯 Target Audience | 💡 Use Cases |
| --- | --- |
| Bibliometric analysts | Map citation networks and measure impact |
| Academic researchers | Track who cites your publications |
| University administrators | Evaluate research impact for departments |
| Science policy makers | Analyze knowledge flow between institutions |
| Data scientists | Build citation graph datasets for analysis |
| Librarians | Enrich catalog records with citation data |

📋 What the OpenCitations Scraper does

  • 🔍 DOI-based search to find citations or references for any published work
  • 🆔 PMID support for biomedical publications indexed in PubMed
  • 📋 OMID support for OpenCitations internal identifier lookups
  • 🔄 Bidirectional search with citations (incoming) and references (outgoing) modes
  • 📊 Self-citation detection with flags for author and journal self-citations
  • 📝 Optional metadata including titles, authors, venues, and publication dates

The scraper queries the OpenCitations API with your identifier and search type, retrieves all matching citation relationships, and extracts structured data for each record. When metadata is enabled, it also fetches detailed bibliographic information for each citing or cited work. Results include unique citation identifiers (OCI), entity IDs, creation dates, timespans, self-citation flags, and full publication metadata.
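
The README does not document how the actor talks to OpenCitations internally. As a rough illustration of the "identifier plus search type" lookup described above, the sketch below builds a request URL for one identifier. The base URL is an assumption about the public OpenCitations Index API, and `build_query_url` is a hypothetical helper, not part of the actor.

```python
from urllib.parse import quote

# Assumed base URL of the public OpenCitations Index API (v2). The actor's
# real endpoint and internals are not documented in this README.
BASE_URL = "https://api.opencitations.net/index/v2"

VALID_SEARCH_TYPES = {"citations", "references"}
VALID_PREFIXES = {"doi", "pmid", "omid"}


def build_query_url(prefix: str, value: str, search_type: str = "citations") -> str:
    """Return a hypothetical request URL for one identifier and direction."""
    if prefix not in VALID_PREFIXES:
        raise ValueError(f"unsupported identifier type: {prefix}")
    if search_type not in VALID_SEARCH_TYPES:
        raise ValueError(f"unsupported search type: {search_type}")
    # An identifier may already carry its own prefix (e.g. "omid:br/0614...").
    if value.startswith(prefix + ":"):
        value = value[len(prefix) + 1:]
    # Keep "/" unescaped so DOIs and OMIDs stay readable; escape the rest.
    return f"{BASE_URL}/{search_type}/{prefix}:{quote(value, safe='/')}"


print(build_query_url("doi", "10.1016/j.jmb.2005.08.075"))
print(build_query_url("omid", "omid:br/06140242082", "references"))
```

Citations mode and references mode differ only in the path segment, which is why a single identifier can be explored in either direction.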

💡 Why it matters: Manually collecting citation data from OpenCitations involves API queries, pagination, and metadata enrichment. This scraper handles everything automatically, delivering structured citation networks ready for analysis, visualization, or integration with other research tools.


🎬 Full Demo

🚧 Coming soon...


⚙️ Input

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| maxItems | integer | No | Max records to collect. Free: up to 10. Paid: up to 1,000,000 |
| doi | string | No | Digital Object Identifier (e.g., 10.1016/j.jmb.2005.08.075) |
| pmid | string | No | PubMed ID for biomedical publications |
| omid | string | No | OpenCitations Meta Identifier (e.g., omid:br/06140242082) |
| searchType | string | No | Search direction: citations (incoming) or references (outgoing) |
| includeMetadata | boolean | No | Fetch detailed metadata (title, authors, date) for each record |

Example 1: Get citations for a DOI

{
    "doi": "10.1016/j.jmb.2005.08.075",
    "searchType": "citations",
    "includeMetadata": true,
    "maxItems": 50
}

Example 2: Get references from a PubMed article

{
    "pmid": "16325459",
    "searchType": "references",
    "includeMetadata": true,
    "maxItems": 100
}

⚠️ Good to Know: Provide one identifier (DOI, PMID, or OMID), not multiple. Enabling metadata makes the scraper slower but provides full bibliographic details for each citation. The default search type is "citations" (incoming citations).
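
If you generate inputs from a script, it can help to enforce the "one identifier only" rule before starting a run. The following is a minimal sketch: `build_run_input` is a hypothetical helper, and its defaults for `include_metadata` and `max_items` are assumptions chosen for illustration. Only the `citations` default for the search type comes from the note above.

```python
from typing import Any, Optional


def build_run_input(
    doi: Optional[str] = None,
    pmid: Optional[str] = None,
    omid: Optional[str] = None,
    search_type: str = "citations",   # documented default direction
    include_metadata: bool = False,   # assumed default, not documented
    max_items: int = 10,              # assumed default, matches the free limit
) -> dict[str, Any]:
    """Build an actor input object, rejecting zero or multiple identifiers."""
    identifiers = {"doi": doi, "pmid": pmid, "omid": omid}
    provided = {key: value for key, value in identifiers.items() if value}
    if len(provided) != 1:
        raise ValueError("provide exactly one of doi, pmid, or omid")
    if search_type not in ("citations", "references"):
        raise ValueError("searchType must be 'citations' or 'references'")
    if max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    return {
        **provided,
        "searchType": search_type,
        "includeMetadata": include_metadata,
        "maxItems": max_items,
    }


# Reproduces Example 2 from this section.
print(build_run_input(pmid="16325459", search_type="references",
                      include_metadata=True, max_items=100))
```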


📊 Output

🧾 Schema

| Emoji | Field | Type | Description |
| --- | --- | --- | --- |
| 📝 | oci | string | Unique Open Citation Identifier |
| 👤 | citing | string | Identifier of the citing entity |
| 👤 | cited | string | Identifier of the cited entity |
| 📅 | creationDate | string | When the citation relationship was recorded |
| ⏱️ | timespan | string | Time between publication dates |
| 📊 | journalSelfCitation | boolean | Whether the citation is within the same journal |
| 📊 | authorSelfCitation | boolean | Whether the author cites their own work |
| 📝 | title | string | Publication title (with metadata enabled) |
| 👥 | authors | string | Author names (with metadata enabled) |
| 📅 | publicationDate | string | Publication date (with metadata enabled) |
| 📖 | volume | string | Journal volume |
| 📄 | issue | string | Journal issue |
| 📍 | venue | string | Journal or venue name |
| 🏷️ | publicationType | string | Type of publication |
| 📄 | page | string | Page range |
| 🏢 | publisher | string | Publisher name |
| ✏️ | editor | string | Editor name |
| 🆔 | workId | string | Internal work identifier |
| | scrapedAt | string | Collection timestamp |
| ⚠️ | error | string | Error message if processing failed |
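
To show how the two self-citation flags in this schema might be used, here is a small sketch that counts author and journal self-citations in a list of records. It assumes the flags arrive as real booleans, as the schema states, and skips records that carry an `error` value. The sample records are invented for illustration and are not actual actor output.

```python
from typing import Any, Iterable


def summarize_self_citations(items: Iterable[dict[str, Any]]) -> dict[str, int]:
    """Count self-citations, ignoring records that failed to process."""
    summary = {"total": 0, "author_self": 0, "journal_self": 0, "independent": 0}
    for item in items:
        if item.get("error"):
            continue  # failed records carry no usable flags
        summary["total"] += 1
        # Compare against True so missing or non-boolean values count as False.
        is_author = item.get("authorSelfCitation") is True
        is_journal = item.get("journalSelfCitation") is True
        summary["author_self"] += is_author
        summary["journal_self"] += is_journal
        if not (is_author or is_journal):
            summary["independent"] += 1
    return summary


# Invented sample records using the field names from the schema.
sample = [
    {"oci": "a", "authorSelfCitation": True, "journalSelfCitation": False},
    {"oci": "b", "authorSelfCitation": False, "journalSelfCitation": True},
    {"oci": "c", "authorSelfCitation": False, "journalSelfCitation": False},
    {"oci": "d", "authorSelfCitation": True, "journalSelfCitation": True},
    {"error": "lookup failed"},
]
print(summarize_self_citations(sample))
```

The `independent` count is one possible basis for an adjusted citation metric that excludes both kinds of self-citation.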

📦 Sample records


✨ Why choose this Actor

| Feature | Details |
| --- | --- |
| 🔍 Three identifier types | Search by DOI, PubMed ID, or OpenCitations Meta ID |
| 🔄 Bidirectional search | Find incoming citations or outgoing references |
| 📊 Self-citation detection | Flags for author and journal self-citations |
| 📝 Optional metadata | Full bibliographic details when enabled |
| 🆓 Open data | All OpenCitations data is freely available |
| 📦 Flexible export | JSON, CSV, or Excel output |
| ⚡ Automatic pagination | Handles large citation networks automatically |

📊 Map citation networks for any publication with up to 1,000,000 records per run, including self-citation detection and full metadata.


📈 How it compares to alternatives

| Feature | This Actor | Manual API Queries | Generic Scrapers |
| --- | --- | --- | --- |
| DOI, PMID, and OMID support | | Manual | |
| Self-citation detection | | | |
| Optional metadata enrichment | | Manual | |
| Bidirectional search | | Manual | |
| Bulk collection (up to 1M records) | | Manual | |
| Structured JSON/CSV output | | JSON only | Varies |
| Scheduled runs | | | |

Get structured citation data at scale without writing API code or managing pagination.


🚀 How to use

  1. Create an Apify account - Sign up free with $5 credit
  2. Open the OpenCitations Scraper - Navigate to the actor page on Apify
  3. Enter a DOI, PMID, or OMID - Provide the identifier for the publication you want to analyze
  4. Choose search type and options - Select citations or references mode and enable metadata if needed
  5. Click Start - The actor collects citation relationships and delivers structured data

⏱️ A typical run with 50 citations completes in under 1 minute.


💼 Business use cases

📊 Bibliometric Analysis
  • Map citation networks for research impact assessment
  • Identify self-citations to calculate adjusted metrics
  • Track citation accumulation over time
  • Compare citation patterns across disciplines
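
As one way to track citation accumulation over time, the sketch below groups records by year. It assumes `creationDate` is a string that starts with a four-digit year (for example `2019`, `2019-03`, or `2019-03-14`); the README does not specify the exact format, so records that do not match are skipped. `citations_per_year` is a hypothetical helper.

```python
from collections import Counter
from typing import Any, Iterable


def citations_per_year(items: Iterable[dict[str, Any]]) -> dict[int, int]:
    """Count records per year, based on the leading year of creationDate."""
    counts: Counter = Counter()
    for item in items:
        date = str(item.get("creationDate") or "")
        year = date[:4]
        # Assumption: a usable date begins with four digits.
        if len(year) == 4 and year.isdigit():
            counts[int(year)] += 1
    return dict(sorted(counts.items()))


# Invented sample records for illustration.
records = [
    {"creationDate": "2019-03"},
    {"creationDate": "2019"},
    {"creationDate": "2021-11-02"},
    {"creationDate": None},
    {},
]
print(citations_per_year(records))
```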
🎓 Academic Research
  • Build citation graphs for literature reviews
  • Track who is citing your publications
  • Identify influential papers in your field
  • Analyze reference patterns in competitor research
🏛️ Research Administration
  • Evaluate faculty research impact for reviews
  • Track department-level citation metrics
  • Monitor publication influence across programs
  • Build reporting dashboards for stakeholders
📈 Data Science
  • Build citation graph datasets for network analysis
  • Train models on citation prediction tasks
  • Analyze knowledge flow between research fields
  • Create visualization datasets for research mapping
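
For the network-analysis use cases above, a citation graph can be assembled from the `citing` and `cited` fields alone. The following is a minimal standard-library sketch; `build_citation_graph` and `in_degree` are hypothetical helpers, and the identifiers in the sample are placeholders rather than real OpenCitations IDs.

```python
from collections import Counter, defaultdict
from typing import Any, Iterable


def build_citation_graph(items: Iterable[dict[str, Any]]) -> dict[str, set[str]]:
    """Map each citing work to the set of works it cites."""
    graph: defaultdict[str, set[str]] = defaultdict(set)
    for item in items:
        citing, cited = item.get("citing"), item.get("cited")
        if citing and cited:
            graph[citing].add(cited)  # sets drop duplicate edges
    return dict(graph)


def in_degree(graph: dict[str, set[str]]) -> dict[str, int]:
    """Count how many distinct works cite each cited work."""
    counts: Counter = Counter()
    for targets in graph.values():
        counts.update(targets)
    return dict(counts)


# Placeholder identifiers, not real OpenCitations IDs.
edges = [
    {"citing": "A", "cited": "B"},
    {"citing": "C", "cited": "B"},
    {"citing": "A", "cited": "C"},
    {"citing": "A", "cited": "B"},  # duplicate edge
    {"error": "lookup failed"},     # skipped: no citing or cited value
]
graph = build_citation_graph(edges)
print(graph)
print(in_degree(graph))
```

The resulting dictionary can be passed to a graph library of your choice if you need centrality measures or visualization.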


🌟 Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

  • Empirical datasets for papers, thesis work, and coursework
  • Longitudinal studies tracking changes across snapshots
  • Reproducible research with cited, versioned data pulls
  • Classroom exercises on data analysis and ethical scraping

🎨 Personal and creative

  • Side projects, portfolio demos, and indie app launches
  • Data visualizations, dashboards, and infographics
  • Content research for bloggers, YouTubers, and podcasters
  • Hobbyist collections and personal trackers

🤝 Non-profit and civic

  • Transparency reporting and accountability projects
  • Advocacy campaigns backed by public-interest data
  • Community-run databases for local issues
  • Investigative journalism on public records

🧪 Experimentation

  • Prototype AI and machine-learning pipelines with real data
  • Validate product-market hypotheses before engineering spend
  • Train small domain-specific models on niche corpora
  • Test dashboard concepts with live input

💰 How much does it cost?

Apify gives you $5 in free monthly credits on the Apify Free plan, enough to test OpenCitations Scraper and pull a real sample dataset. For ongoing usage:

  • Starter plan ($49/month) — Recommended for individuals running OpenCitations Scraper regularly. Includes higher concurrency and larger datasets.
  • Scale plan ($499/month) — Recommended for teams running OpenCitations Scraper at production scale.

Pay-Per-Event pricing means you only pay for what you actually use. Failed runs are never charged. See the Pricing tab on this Actor's page for exact event prices.

💡 Tips for using OpenCitations Scraper

  • Start with a small maxItems (3-10) to validate output format before running larger jobs.
  • Use Apify Schedules to run OpenCitations Scraper on a recurring basis and keep your dataset fresh.
  • Export via Integrations: Apify connects to Google Sheets, Airbyte, Make, Zapier, and direct webhooks — pipe your data anywhere.
  • Monitor with webhooks: trigger downstream workflows the moment a run finishes.
  • Re-run failed items: if any individual records error out, re-run with their inputs only. Failed events are not charged.

⚖️ Is it legal to scrape OpenCitations data?

Generally yes. OpenCitations Scraper only collects publicly available data. In hiQ Labs v. LinkedIn, US courts indicated that scraping publicly accessible data is unlikely to violate the Computer Fraud and Abuse Act, and the practice is widely used for research, market analysis, and business intelligence.

However, you are responsible for:

  • Respecting the source website's Terms of Service.
  • Complying with GDPR, CCPA, and other applicable data-protection laws when personal data is involved.
  • Not republishing copyrighted content without permission.

If you have specific compliance concerns, consult your legal team. See the Apify legal docs for more.

❓ Frequently Asked Questions

🔌 Automating OpenCitations Scraper

Integrate the OpenCitations Scraper into your workflow using the Apify API or client libraries.

Node.js:

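// Run as an ES module (top-level await) with the apify-client package installed.
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start the actor and wait for the run to finish.
const run = await client.actor("parseforge/open-citations-scraper").call({
    doi: "10.1016/j.jmb.2005.08.075",
    searchType: "citations",
    includeMetadata: true,
    maxItems: 100
});

// Fetch the collected records from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);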

Python:

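# Requires the apify-client package (pip install apify-client).
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Start the actor and wait for the run to finish.
run = client.actor("parseforge/open-citations-scraper").call(run_input={
    "doi": "10.1016/j.jmb.2005.08.075",
    "searchType": "citations",
    "includeMetadata": True,
    "maxItems": 100
})

# Read the collected records from the run's default dataset.
items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
print(items)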
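
If you would rather produce a CSV inside your own script than download one from Apify, the `items` list from the Python example can be written with the standard `csv` module. This is a sketch only: `items_to_csv` is a hypothetical helper, the column order simply follows the first appearance of each key, and the sample rows are invented.

```python
import csv
import io
from typing import Any, Iterable, Optional


def items_to_csv(items: Iterable[dict[str, Any]],
                 fieldnames: Optional[list[str]] = None) -> str:
    """Serialize records to CSV text, leaving missing fields empty."""
    rows = list(items)
    if fieldnames is None:
        # Union of all keys, in first-seen order.
        fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented rows; real records contain the fields listed in the Output schema.
rows = [
    {"oci": "1-2", "citing": "a"},
    {"oci": "3-4", "cited": "b"},
]
print(items_to_csv(rows))
```

Passing an explicit `fieldnames` list fixes the column order and drops any fields you do not need.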

Schedules: Set up recurring runs to monitor citation growth for your publications. Configure weekly or monthly schedules from the Apify Console to track new citations automatically.

🔌 Integrate with any app

  • 🔗 Make (Integromat) - Connect citation data to Google Sheets, Notion, or any of 1,500+ apps
  • 🔗 Zapier - Trigger workflows when new citations are detected
  • 🔗 Slack - Get notified when new citations appear for your publications
  • 🔗 Airbyte - Stream citation data into your data warehouse
  • 🔗 GitHub - Store citation datasets in repositories for version control
  • 🔗 Google Drive - Automatically save CSV exports to shared folders

| Actor | Description |
| --- | --- |
| Crossref Scraper | Extract DOI metadata for 155M+ research publications |
| PubMed Citation Scraper | Extract publication metadata from PubMed for biomedical research |
| Open Library Scraper | Search and download book data from the Internet Archive |
| ROR Scraper | Collect research organization data from ROR |
| US Census Bureau Scraper | Extract demographic and economic data from the Census Bureau |

💡 Pro Tip: Combine the OpenCitations Scraper with the Crossref Scraper to get both citation networks and full publication metadata for each cited work.


🆘 Need Help? Open our contact form and we will get back to you within 24 hours. We are happy to help with custom setups, integrations, or feature requests.


Disclaimer: This actor is not affiliated with, endorsed by, or connected to OpenCitations. It accesses publicly available data through the OpenCitations API. Use responsibly and in accordance with applicable terms of service.