CVF Papers Scraper

Scrape research papers from openaccess.thecvf.com (CVPR, ICCV, WACV)

Developer: Stas Persiianenko

📖 What does it do?

The CVF Papers Scraper extracts structured metadata from openaccess.thecvf.com, the official open-access repository of the Computer Vision Foundation. It covers papers from the three major CVF-sponsored conferences:

CVPR (Conference on Computer Vision and Pattern Recognition): annual, covered from 2013; the largest CV conference
ICCV (International Conference on Computer Vision): biennial (odd years), covered from 2013
WACV (Winter Conference on Applications of Computer Vision): annual, covered from 2020

What you get for each paper:

Full paper title
Authors list (as both an array and a formatted string)
Conference name and year
Direct PDF download URL
Supplemental materials URL (when available)
arXiv preprint URL (when available)
Complete BibTeX citation
Page numbers from the proceedings
Full abstract text (optional — requires one extra request per paper)

All data is extracted directly from the static HTML — no JavaScript rendering required, no login, no authentication.

👥 Who is it for?

🎓 Computer vision researchers and PhD students

Building a literature review on a CV topic? Tracking what got accepted at CVPR 2024? Use this actor to bulk-download paper metadata instead of clicking through thousands of entries manually. Filter by conference and year to get exactly the batch you need.

📊 Scientometricians and bibliometrics analysts

Studying publication trends in computer vision, measuring author collaboration networks, tracking how research topics evolve across CVPR/ICCV/WACV — start with structured, machine-readable paper data covering 10+ years of the field's most prestigious venues.

🤖 AI and ML practitioners building datasets

Creating fine-tuning corpora from CV abstracts, building citation graphs, or constructing benchmarks from paper lists — this actor delivers clean JSON at scale with no manual effort.

🏢 Research teams and enterprise R&D labs

Monitoring what competitors or academic collaborators are publishing, building internal research intelligence dashboards, or feeding paper data into RAG (retrieval-augmented generation) systems for literature QA.

📚 Academic librarians and information professionals

Maintaining curated databases of CV research, populating institutional repositories, or building subject guides — all with properly formatted BibTeX citations and direct PDF links.

🛠️ Developers building research tools

Creating paper recommendation engines, topic clustering tools, author disambiguation systems, or academic search interfaces — this actor gives you the raw structured data to build on.

🚀 Why use it?

Fast and cheap — openaccess.thecvf.com is fully server-side rendered with no anti-bot measures. One HTTP request per conference fetches all paper listings. No browser needed, no proxy required.
Complete coverage — CVPR 2013–2025 (~100,000 papers total), ICCV 2013–2025 (~40,000 papers), WACV 2020–2026 (~15,000 papers)
BibTeX included — Every paper's complete citation is available on the listing page, ready to paste into your reference manager
arXiv cross-linking — Where available, the arXiv preprint URL is extracted so you can access unrestricted versions
Abstract fetching — Enable includeAbstract to get full abstract text from each paper's detail page
Structured output — Clean JSON with typed fields (year as integer, authors as array), ready for downstream processing

📊 Data fields extracted

Field	Type	Description
`title`	string	Full paper title
`authors`	string[]	Author names as an array
`authorsString`	string	Authors joined as a single comma-separated string
`conference`	string	Conference code (`CVPR`, `ICCV`, or `WACV`)
`year`	number	Conference year (e.g., `2024`)
`pages`	string | null	Page range in proceedings (e.g., `"4864-4873"`)
`paperUrl`	string	URL to the paper detail page on CVF open access
`pdfUrl`	string | null	Direct URL to the PDF file
`suppUrl`	string | null	URL to supplemental materials (PDF or ZIP)
`arxivUrl`	string | null	arXiv preprint URL (when listed)
`bibtex`	string | null	Complete BibTeX citation string
`abstract`	string | null	Full abstract text (only when `includeAbstract: true`)
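
The field table above can be mirrored as a typed record for downstream code. The following is a minimal sketch: the names `CvfPaper`, `REQUIRED_NON_NULL`, and `missing_fields` are hypothetical helpers for illustration and are not part of the actor.

```python
from typing import List, Optional, TypedDict


class CvfPaper(TypedDict):
    """One dataset item, mirroring the field table (hypothetical type name)."""
    title: str
    authors: List[str]
    authorsString: str
    conference: str            # "CVPR", "ICCV", or "WACV"
    year: int
    pages: Optional[str]
    paperUrl: str
    pdfUrl: Optional[str]
    suppUrl: Optional[str]
    arxivUrl: Optional[str]
    bibtex: Optional[str]
    abstract: Optional[str]    # null unless includeAbstract was true


# Fields the table lists without "| null".
REQUIRED_NON_NULL = ("title", "authors", "authorsString", "conference", "year", "paperUrl")


def missing_fields(item: dict) -> List[str]:
    """Return fields that are absent, or null where the table says non-null."""
    problems = [key for key in CvfPaper.__annotations__ if key not in item]
    problems += [key for key in REQUIRED_NON_NULL if key in item and item[key] is None]
    return problems
```

Running `missing_fields` over a downloaded dataset is one way to catch schema drift before the data reaches a pipeline.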

💰 Pricing

This scraper uses Pay-Per-Event (PPE) pricing — you pay only for papers actually extracted, not for compute time.

What you pay for	FREE	BRONZE	SILVER	GOLD	PLATINUM	DIAMOND
Run started (one-time)	$0.005	$0.005	$0.005	$0.005	$0.005	$0.005
Each paper extracted	$0.00115	$0.00100	$0.00078	$0.00060	$0.00040	$0.00028

Cost examples (at BRONZE, $0.001 per paper plus the one-time $0.005 run-start fee):

100 papers (quick test or small workshop): ~$0.11
1,000 papers (single conference track): ~$1.01
5,000 papers (several conference years): ~$5.01
15,000 papers (large multi-year, multi-conference batch): ~$15.01

With abstract fetching (includeAbstract: true): each paper requires one additional HTTP request, but the PPE price stays the same — only time and network overhead increase slightly.

Free plan: Apify's free tier includes $5 of monthly credit, enough for ~4,300 papers per month at FREE-tier pricing ($0.00115 per paper), with no credit card required.
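
The arithmetic behind these figures can be sketched as follows. The prices are copied from the pricing table; the function names are hypothetical, and the budget helper assumes the whole budget is spent in a single run (one run-start fee).

```python
RUN_START_FEE = 0.005  # one-time "Run started" event, identical on every tier

# Per-paper event price by Apify plan tier, copied from the pricing table.
PRICE_PER_PAPER = {
    "FREE": 0.00115,
    "BRONZE": 0.00100,
    "SILVER": 0.00078,
    "GOLD": 0.00060,
    "PLATINUM": 0.00040,
    "DIAMOND": 0.00028,
}


def estimate_cost(papers: int, tier: str = "BRONZE") -> float:
    """Estimated cost in USD of one run that extracts `papers` papers."""
    return RUN_START_FEE + papers * PRICE_PER_PAPER[tier]


def papers_within_budget(budget: float, tier: str = "FREE") -> int:
    """How many papers a single run can extract for `budget` USD."""
    return max(0, int((budget - RUN_START_FEE) / PRICE_PER_PAPER[tier]))


print(f"{estimate_cost(1000, 'BRONZE'):.3f}")   # 1.005
print(papers_within_budget(5.0, "FREE"))         # 4343
```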

🛠️ How to use it

Step 1: Choose conferences and years

Configure which conferences and years to scrape. You can mix and match:

{
    "conferences": ["CVPR", "ICCV"],
    "years": [2023, 2024]
}

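This requests four conference/year combinations: CVPR 2023, CVPR 2024, ICCV 2023, and ICCV 2024. Because ICCV runs only in odd years, ICCV 2024 is skipped with a log message, leaving 3 batches.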

Step 2: Set a result limit

Use maxResults to control cost and run time. Start with a small number (e.g., 50) to test:

{
    "conferences": ["WACV"],
    "years": [2024],
    "maxResults": 50
}

Set to a large value (e.g., 999999) for unlimited results.

Step 3: Optionally fetch abstracts

Enable includeAbstract to retrieve the full abstract from each paper's detail page:

{
    "conferences": ["CVPR"],
    "years": [2024],
    "includeAbstract": true,
    "maxResults": 100
}

Note: Abstract fetching is ~10× slower because it requires one extra HTTP request per paper.

Step 4: Run and download

Click Start and wait for completion. Download results as JSON, CSV, or Excel from the Dataset tab.

📥 Input parameters

`conferences` — Which conferences to scrape

Array of conference codes. Options: CVPR, ICCV, WACV. Default: ["CVPR"].

{ "conferences": ["CVPR", "ICCV", "WACV"] }

Pass "all" as a string shorthand to scrape all three conferences for the selected years.

Available years per conference:

CVPR: 2013–2025 (annual)
ICCV: 2013, 2015, 2017, 2019, 2021, 2023, 2025 (biennial, odd years only)
WACV: 2020–2026 (annual)

If you specify a year that isn't available for a conference (e.g., ICCV 2024), it will be skipped with a log message.
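
To preview which conference/year combinations a given input would produce, the availability list above can be encoded directly. This is a sketch of the documented behaviour, not the actor's own code; `AVAILABLE_YEARS` and `plan_batches` are hypothetical names.

```python
from typing import List, Sequence, Tuple, Union

# Years listed as available per conference in this section.
AVAILABLE_YEARS = {
    "CVPR": set(range(2013, 2026)),      # 2013-2025, annual
    "ICCV": set(range(2013, 2026, 2)),   # odd years 2013-2025
    "WACV": set(range(2020, 2027)),      # 2020-2026, annual
}

Batch = Tuple[str, int]


def plan_batches(
    conferences: Union[str, Sequence[str]], years: Sequence[int]
) -> Tuple[List[Batch], List[Batch]]:
    """Split the requested combinations into (valid, skipped) batches."""
    if conferences == "all":             # string shorthand for all three
        conferences = list(AVAILABLE_YEARS)
    valid: List[Batch] = []
    skipped: List[Batch] = []
    for conf in conferences:
        for year in years:
            available = year in AVAILABLE_YEARS.get(conf, set())
            (valid if available else skipped).append((conf, year))
    return valid, skipped


valid, skipped = plan_batches(["CVPR", "ICCV"], [2023, 2024])
print(valid)    # [('CVPR', 2023), ('CVPR', 2024), ('ICCV', 2023)]
print(skipped)  # [('ICCV', 2024)]
```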

`years` — Which years to scrape

Array of integer years. Default: [2024].

{ "years": [2022, 2023, 2024] }

`maxResults` — Limit output size

Maximum number of papers to extract across all selected conferences and years. Default: 500.

{ "maxResults": 1000 }

Set to a large value (e.g., 999999) for unlimited. Papers are extracted in listing order (top to bottom on the CVF page).

`includeAbstract` — Fetch abstract text

When true, the actor fetches each paper's detail page to extract the full abstract. Default: false.

{ "includeAbstract": true }

This makes the run slower (one extra HTTP request per paper) but enables full-text search, text analysis, and semantic similarity use cases.

`maxRequestRetries` — Retry count

Number of retry attempts for failed HTTP requests. Default: 3. Range: 1–10.

{ "maxRequestRetries": 5 }
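
When building inputs programmatically, it can help to start from the documented defaults and check values before starting a run. The helper below is an illustrative sketch: `build_input` is a hypothetical name, and the checks only cover the constraints stated in this section.

```python
import copy
from typing import Any, Dict

# Defaults as documented for each input parameter.
DEFAULTS: Dict[str, Any] = {
    "conferences": ["CVPR"],
    "years": [2024],
    "maxResults": 500,
    "includeAbstract": False,
    "maxRequestRetries": 3,
}
VALID_CONFERENCES = {"CVPR", "ICCV", "WACV"}


def build_input(**overrides: Any) -> Dict[str, Any]:
    """Merge overrides into the documented defaults and validate them."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    run_input = {**copy.deepcopy(DEFAULTS), **overrides}

    conferences = run_input["conferences"]
    if conferences != "all" and not set(conferences) <= VALID_CONFERENCES:
        raise ValueError("conferences must be 'all' or a subset of CVPR, ICCV, WACV")
    if not 1 <= run_input["maxRequestRetries"] <= 10:
        raise ValueError("maxRequestRetries must be between 1 and 10")
    return run_input


print(build_input(conferences=["WACV"], maxResults=50))
```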

📤 Output examples

Paper without abstract

{
    "title": "DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery",
    "authors": ["Yixuan Zhu", "Ao Li", "Yansong Tang", "Wenliang Zhao", "Jie Zhou", "Jiwen Lu"],
    "authorsString": "Yixuan Zhu, Ao Li, Yansong Tang, Wenliang Zhao, Jie Zhou, Jiwen Lu",
    "conference": "CVPR",
    "year": 2024,
    "pages": "1101-1110",
    "paperUrl": "https://openaccess.thecvf.com/content/CVPR2024/html/Zhu_DPMesh_Exploiting_Diffusion_Prior_for_Occluded_Human_Mesh_Recovery_CVPR_2024_paper.html",
    "pdfUrl": "https://openaccess.thecvf.com/content/CVPR2024/papers/Zhu_DPMesh_Exploiting_Diffusion_Prior_for_Occluded_Human_Mesh_Recovery_CVPR_2024_paper.pdf",
    "suppUrl": "https://openaccess.thecvf.com/content/CVPR2024/supplemental/Zhu_DPMesh_Exploiting_Diffusion_CVPR_2024_supplemental.zip",
    "arxivUrl": "http://arxiv.org/abs/2404.01424",
    "bibtex": "@InProceedings{Zhu_2024_CVPR,\n    author    = {Zhu, Yixuan and Li, Ao and ...},\n    title     = {DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery},\n    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},\n    month     = {June},\n    year      = {2024},\n    pages     = {1101-1110}\n}",
    "abstract": null
}

Paper with abstract

{
    "title": "Seeing the World through Your Eyes",
    "authors": ["Hadi Alzayer", "Kevin Zhang", "Brandon Feng", "Christopher A. Metzler", "Jia-Bin Huang"],
    "authorsString": "Hadi Alzayer, Kevin Zhang, Brandon Feng, Christopher A. Metzler, Jia-Bin Huang",
    "conference": "CVPR",
    "year": 2024,
    "pages": "4864-4873",
    "paperUrl": "https://openaccess.thecvf.com/content/CVPR2024/html/Alzayer_Seeing_the_World_through_Your_Eyes_CVPR_2024_paper.html",
    "pdfUrl": "https://openaccess.thecvf.com/content/CVPR2024/papers/Alzayer_Seeing_the_World_through_Your_Eyes_CVPR_2024_paper.pdf",
    "suppUrl": null,
    "arxivUrl": "http://arxiv.org/abs/2306.09348",
    "bibtex": "@InProceedings{Alzayer_2024_CVPR,...}",
    "abstract": "The reflections in the eyes contain information about the environment around the person, including the appearance of the illumination and objects in the room..."
}

💡 Tips and tricks

Fast survey mode: Skip includeAbstract (default false) to scrape a full conference in under 60 seconds — the listing page has all metadata except abstract text.
Abstract-enriched NLP: Enable includeAbstract: true when you need full text for topic modeling, semantic search, or LLM analysis. Expect ~10× longer run time for large batches.
Testing on a subset: Set maxResults: 50 to get the first 50 papers in listing order (not a random sample). This is a quick way to test your downstream pipeline before committing to a full 2,000+ paper run.
Multi-year longitudinal studies: Use years: [2019, 2020, 2021, 2022, 2023, 2024] with conferences: ["CVPR"] to get six years of papers in one run.
Use arXiv links for full text: Many papers have arxivUrl populated, giving you the preprint even without IEEE Xplore access.
Import BibTeX directly: The bibtex field is a complete, valid BibTeX entry — paste directly into your .bib file or any reference manager.
Deduplication key: Use paperUrl as your unique identifier when combining results across multiple runs.
ICCV is biennial: ICCV only runs in odd years (2021, 2023, 2025). Specifying an even year returns 0 results; the actor skips it with a log message.
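
The deduplication tip can be sketched as a small merge step. `merge_unique` is a hypothetical helper; it assumes each item carries the `paperUrl` field described in the data fields table and keeps the first occurrence of each paper.

```python
from typing import Dict, Iterable, List


def merge_unique(*runs: Iterable[dict]) -> List[dict]:
    """Combine items from several runs, keeping the first item per paperUrl."""
    by_url: Dict[str, dict] = {}
    for items in runs:
        for item in items:
            by_url.setdefault(item["paperUrl"], item)
    return list(by_url.values())


run_a = [{"paperUrl": "https://example.org/a", "title": "A"}]
run_b = [
    {"paperUrl": "https://example.org/a", "title": "A"},
    {"paperUrl": "https://example.org/b", "title": "B"},
]
print(len(merge_unique(run_a, run_b)))  # 2
```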

🔗 Integrations

📊 Google Sheets research tracker

Run the actor with conferences: ["CVPR"], years: [2024], and maxResults: 500
Open Google Sheets → Extensions → Apify (requires the Apify Sheets add-on)
Import the dataset and map columns: title, authorsString, conference, year, arxivUrl, pdfUrl
Result: a live spreadsheet of all selected papers, sortable and filterable

📚 Zotero / Mendeley BibTeX bulk import

Run with conferences: ["CVPR", "WACV"], years: [2024], includeAbstract: false
Download the JSON dataset
Extract BibTeX fields into a .bib file using Python: python3 -c "import json; [print(p['bibtex']) for p in json.load(open('papers.json')) if p['bibtex']]" > cvf2024.bib
Import cvf2024.bib into Zotero or Mendeley — all papers import with full structured metadata
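
For larger exports, the one-liner in the steps above may be easier to maintain as a small function. This is a sketch under the same assumptions (a JSON array of dataset items with a `bibtex` field); `export_bibtex` is a hypothetical name.

```python
from pathlib import Path
from typing import Iterable, Mapping


def export_bibtex(items: Iterable[Mapping], out_path: str) -> int:
    """Write every non-null `bibtex` field to out_path; return the entry count."""
    entries = [p["bibtex"].strip() for p in items if p.get("bibtex")]
    # Separate entries with a blank line so the file stays readable.
    Path(out_path).write_text("\n\n".join(entries) + "\n", encoding="utf-8")
    return len(entries)


# Example usage with a downloaded dataset file:
#   import json
#   papers = json.loads(Path("papers.json").read_text(encoding="utf-8"))
#   print(export_bibtex(papers, "cvf2024.bib"), "entries written")
```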

🤖 LLM trend analysis with Claude or ChatGPT

Run with includeAbstract: true on a focused batch (e.g., 100 papers from CVPR 2024)
Feed the JSON to an LLM with the prompt: "Analyze these CVPR 2024 paper abstracts and identify the 10 dominant research themes with examples"
Automate this pipeline using Apify's Claude or OpenAI integrations

🔍 Semantic search with Pinecone / Weaviate

Scrape all ICCV 2023 papers with includeAbstract: true
Embed abstracts using the OpenAI Embeddings API (or run the Apify OpenAI Embeddings actor)
Push vectors to Pinecone with paper metadata — instantly queryable: "find papers similar to NeRF"

📈 Publication trend dashboard in Tableau / Power BI

Schedule a monthly Apify run scraping the latest conference year
Connect the Apify dataset via the REST API to your BI tool
Build trend charts: papers per year, top authors, arXiv adoption rate, keywords in titles
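
Before wiring up a BI tool, the same trend figures can be computed directly from the dataset items. The functions below are an illustrative sketch with hypothetical names; they rely only on the `conference`, `year`, `authors`, and `arxivUrl` fields.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def papers_per_year(items: Sequence[dict]) -> Dict[Tuple[str, int], int]:
    """Count papers per (conference, year) pair."""
    return dict(Counter((p["conference"], p["year"]) for p in items))


def top_authors(items: Sequence[dict], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n authors with the most papers, as (name, count) pairs."""
    return Counter(a for p in items for a in p["authors"]).most_common(n)


def arxiv_adoption_rate(items: Sequence[dict]) -> float:
    """Share of papers that have an arXiv preprint URL (0.0 for no items)."""
    if not items:
        return 0.0
    return sum(1 for p in items if p.get("arxivUrl")) / len(items)
```

Note that author counts here match on exact name strings, so the same person listed under two spellings is counted separately.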

🔌 API usage

You can trigger this actor programmatically using the Apify API.

Python

from apify_client import ApifyClient

# Authenticate with your personal API token.
client = ApifyClient("YOUR_APIFY_TOKEN")

# Start the actor and block until the run finishes.
run = client.actor("automation-lab/cvf-papers-scraper").call(
    run_input={
        "conferences": ["CVPR"],
        "years": [2024],
        "maxResults": 100,
        "includeAbstract": False,
    }
)

# Stream the extracted papers from the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["title"], "-", item["authorsString"])

cURL

curl -X POST \
  "https://api.apify.com/v2/acts/automation-lab~cvf-papers-scraper/runs" \
  -H "Authorization: Bearer YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "conferences": ["CVPR"],
    "years": [2024],
    "maxResults": 100
  }'

Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

const run = await client.actor('automation-lab/cvf-papers-scraper').call({
    conferences: ['CVPR'],
    years: [2024],
    maxResults: 100,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Extracted ${items.length} papers`);

🤖 MCP (Model Context Protocol)

Use this actor as an MCP tool inside Claude, Cursor, VS Code, or any MCP-compatible AI assistant to fetch CVF papers on demand from natural language prompts.

Claude Code (terminal)

claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/cvf-papers-scraper"

Claude Desktop / Cursor / VS Code

Add to your claude_desktop_config.json (or equivalent MCP config):

{
    "mcpServers": {
        "apify": {
            "type": "http",
            "url": "https://mcp.apify.com?tools=automation-lab/cvf-papers-scraper",
            "headers": {
                "Authorization": "Bearer YOUR_APIFY_TOKEN"
            }
        }
    }
}

Example prompts for your AI assistant

Once connected, you can ask:

"Get the 50 most recent CVPR 2024 papers about 3D reconstruction"
"Scrape all ICCV 2023 papers and summarize the dominant research themes"
"Fetch WACV 2024 papers with abstracts — I want to find work on medical imaging"
"Pull all CVPR papers from 2022 to 2024 and export their BibTeX citations for my literature review"

⚖️ Legality and ethical use

openaccess.thecvf.com provides papers under open access as part of the Computer Vision Foundation's mission to advance research. The data extracted by this actor is:

Publicly available — all papers are freely accessible without login or authentication
Non-commercial research data — paper metadata (titles, authors, abstracts) is factual bibliographic information, not copyrighted content
Explicitly intended for distribution — the CVF open access repository exists specifically to make this research accessible

Usage guidance:

Keep maxRequestRetries at or near its default of 3 so that failed requests do not hammer the server with excessive retries
CVF's open access terms permit non-commercial research use of paper metadata
The full PDF content of papers is copyrighted by authors/IEEE — downloading PDFs at scale may require separate permissions
This actor extracts metadata only by default; always check the CVF terms of service and your organization's policies before commercial use

We do not encourage scraping beyond your legitimate research needs.

❓ FAQ

Q: Does this require a proxy? No. openaccess.thecvf.com has no anti-bot measures. The actor uses plain HTTP requests — no residential proxy needed.

Q: How many papers are available total? Approximately 100,000 CVPR papers (2013–2025), 40,000 ICCV papers, and 15,000 WACV papers — around 155,000 total across all supported conferences.

Q: Can I scrape ECCV (European Conference on Computer Vision)? ECCV is not hosted on openaccess.thecvf.com — it's organized separately. This actor covers only CVF-sponsored conferences (CVPR, ICCV, WACV).

Q: Why does ICCV only have odd years? ICCV is a biennial conference held in odd-numbered years. CVPR and WACV are annual; WACV papers are available on CVF open access from 2020 onward.

Q: What happens if I specify a year that isn't available? The actor logs a warning and skips that conference/year combination. Other valid combinations continue normally.

Q: Is abstract fetching significantly more expensive? The PPE price per paper is the same whether or not you fetch abstracts. But with includeAbstract: true, the run takes longer because of extra HTTP requests — roughly 10× longer for large batches.

Q: Can I get papers from a specific research topic or keyword? CVF open access doesn't have a filtering API — papers are listed alphabetically. Use maxResults to cap output, then filter locally by title or abstract content.
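
Local filtering by keyword can be as simple as a case-insensitive substring match over title and abstract. The sketch below uses a hypothetical `filter_by_keywords` helper; `abstract` is only populated when the run used includeAbstract: true, so without it only titles are searched.

```python
from typing import Iterable, List, Sequence


def filter_by_keywords(items: Iterable[dict], keywords: Sequence[str]) -> List[dict]:
    """Keep papers whose title or abstract contains any keyword (case-insensitive)."""
    needles = [k.lower() for k in keywords]

    def matches(paper: dict) -> bool:
        # Null fields are treated as empty text.
        text = f"{paper.get('title') or ''} {paper.get('abstract') or ''}".lower()
        return any(needle in text for needle in needles)

    return [p for p in items if matches(p)]


papers = [
    {"title": "DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery", "abstract": None},
    {"title": "Seeing the World through Your Eyes", "abstract": None},
]
print([p["title"] for p in filter_by_keywords(papers, ["diffusion"])])
```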

Q: The actor seems to have duplicate papers — why? This shouldn't happen in normal operation. Each paper appears once per listing page. If you scrape the same conference/year in multiple runs, you'll get duplicates across datasets — deduplicate by paperUrl.

Related scrapers:

ACL Anthology Scraper: extract NLP and computational linguistics papers from aclanthology.org (ACL, EMNLP, NAACL, EACL, COLING, and 50+ workshops)
ArXiv Scraper: search and download paper metadata from arXiv.org across all subject areas; ideal for tracking preprints that correspond to CVF papers
Semantic Scholar Scraper: retrieve citation counts, author profiles, and paper abstracts from Semantic Scholar's AI-powered research database

🐛 Issues and feedback

Found a bug or have a feature request? Open an issue on GitHub or contact us through the Apify support channel. We actively maintain this actor and welcome pull requests.

Common issues:

Missing papers for a specific year: Verify the year is in the supported range for that conference
Slow run with abstracts: Abstract mode fetches one extra page per paper — expected behavior
arXiv URL missing: Not all CVF papers have arXiv preprints; those fields will be null
