CVF Papers Scraper

Pricing

Pay per event

Scrape research papers from openaccess.thecvf.com (CVPR, ICCV, WACV)

Rating: 0.0 (0 ratings)

Developer: Stas Persiianenko

Maintained by Community

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 7 days ago


Extract research papers, authors, PDFs, BibTeX citations, and abstracts from the Computer Vision Foundation Open Access repository, covering CVPR, ICCV, and WACV.

📖 What does it do?

The CVF Papers Scraper extracts structured metadata from openaccess.thecvf.com, the official open-access repository of the Computer Vision Foundation. It covers papers from the three major CVF-sponsored conferences:

  • CVPR (Conference on Computer Vision and Pattern Recognition): annual, the largest CV conference; covered from 2013
  • ICCV (International Conference on Computer Vision): biennial (odd years); covered from 2013
  • WACV (Winter Conference on Applications of Computer Vision): annual; covered from 2020

What you get for each paper:

  • Full paper title
  • Authors list (as both an array and a formatted string)
  • Conference name and year
  • Direct PDF download URL
  • Supplemental materials URL (when available)
  • arXiv preprint URL (when available)
  • Complete BibTeX citation
  • Page numbers from the proceedings
  • Full abstract text (optional; requires one extra request per paper)

All data is extracted directly from the static HTML: no JavaScript rendering, no login, no authentication.


👥 Who is it for?

🎓 Computer vision researchers and PhD students

Building a literature review on a CV topic? Tracking what got accepted at CVPR 2024? Use this actor to bulk-download paper metadata instead of clicking through thousands of entries manually. Filter by conference and year to get exactly the batch you need.

📊 Scientometricians and bibliometrics analysts

Studying publication trends in computer vision, measuring author collaboration networks, or tracking how research topics evolve across CVPR, ICCV, and WACV? Start with structured, machine-readable paper data covering 10+ years of the field's most prestigious venues.

🤖 AI and ML practitioners building datasets

Creating fine-tuning corpora from CV abstracts, building citation graphs, or constructing benchmarks from paper lists: this actor delivers clean JSON at scale with no manual effort.

🏢 Research teams and enterprise R&D labs

Monitoring what competitors or academic collaborators are publishing, building internal research intelligence dashboards, or feeding paper data into RAG (retrieval-augmented generation) systems for literature QA.

📚 Academic librarians and information professionals

Maintaining curated databases of CV research, populating institutional repositories, or building subject guides, all with properly formatted BibTeX citations and direct PDF links.

🛠️ Developers building research tools

Creating paper recommendation engines, topic clustering tools, author disambiguation systems, or academic search interfaces: this actor gives you the raw structured data to build on.


🚀 Why use it?

  • Fast and cheap: openaccess.thecvf.com is fully server-side rendered with no anti-bot measures. One HTTP request per conference-year fetches the full paper listing. No browser needed, no proxy required.
  • Complete coverage: CVPR 2013–2025, ICCV 2013–2025 (odd years), and WACV 2020–2026. Early editions have a few hundred papers each, recent ones several thousand (CVPR 2024 alone accepted roughly 2,700).
  • BibTeX included: every paper's complete citation is available on the listing page, ready to paste into your reference manager
  • arXiv cross-linking: where available, the arXiv preprint URL is extracted so you can also reach the preprint version
  • Abstract fetching: enable includeAbstract to get the full abstract text from each paper's detail page
  • Structured output: clean JSON with typed fields (year as an integer, authors as an array), ready for downstream processing

📊 Data fields extracted

| Field | Type | Description |
|---|---|---|
| title | string | Full paper title |
| authors | string[] | Author names as an array |
| authorsString | string | Authors joined as a single comma-separated string |
| conference | string | Conference code (CVPR, ICCV, or WACV) |
| year | number | Conference year (e.g., 2024) |
| pages | string or null | Page range in proceedings (e.g., "4864-4873") |
| paperUrl | string | URL to the paper detail page on CVF open access |
| pdfUrl | string or null | Direct URL to the PDF file |
| suppUrl | string or null | URL to supplemental materials (PDF or ZIP) |
| arxivUrl | string or null | arXiv preprint URL (when listed) |
| bibtex | string or null | Complete BibTeX citation string |
| abstract | string or null | Full abstract text (only when includeAbstract: true) |
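If you load exported results into Python, a typed description of one item can catch schema drift early. The sketch below is an assumption-based mirror of the field table; the class name CvfPaper and the helper load_papers are hypothetical, not part of the actor. It treats title, authors, authorsString, conference, year, and paperUrl as always present and the remaining fields as nullable.

```python
import json
from typing import List, Optional, TypedDict


class CvfPaper(TypedDict):
    """One dataset item (field names from the table; the class itself is hypothetical)."""
    title: str
    authors: List[str]
    authorsString: str
    conference: str          # "CVPR", "ICCV", or "WACV"
    year: int
    pages: Optional[str]
    paperUrl: str
    pdfUrl: Optional[str]
    suppUrl: Optional[str]
    arxivUrl: Optional[str]
    bibtex: Optional[str]
    abstract: Optional[str]  # None unless includeAbstract was true


# Fields the table documents as non-nullable.
REQUIRED_FIELDS = ("title", "authors", "authorsString", "conference", "year", "paperUrl")


def load_papers(raw_json: str) -> List[CvfPaper]:
    """Parse an exported JSON array and check that non-nullable fields are present."""
    items = json.loads(raw_json)
    for item in items:
        missing = [f for f in REQUIRED_FIELDS if item.get(f) is None]
        if missing:
            raise ValueError(f"item {item.get('paperUrl')!r} is missing {missing}")
        if not isinstance(item["year"], int):
            raise TypeError("year should be an integer")
    return items
```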

💰 Pricing

This scraper uses Pay-Per-Event (PPE) pricing: you pay a small one-time fee per run plus a fee for each paper actually extracted, not for compute time.

| What you pay for | FREE | BRONZE | SILVER | GOLD | PLATINUM | DIAMOND |
|---|---|---|---|---|---|---|
| Run started (one-time) | $0.005 | $0.005 | $0.005 | $0.005 | $0.005 | $0.005 |
| Each paper extracted | $0.00115 | $0.00100 | $0.00078 | $0.00060 | $0.00040 | $0.00028 |

Cost examples (at BRONZE: $0.005 per run + $0.001 per paper):

  • 100 papers (quick test or small workshop): ~$0.11
  • 1,000 papers (single conference track): ~$1.01
  • 5,000 papers (roughly two recent CVPR main conferences): ~$5.01
  • 15,000 papers (several conference-years combined): ~$15.01

With abstract fetching (includeAbstract: true), each paper requires one additional HTTP request, but the PPE price stays the same; only run time and network overhead increase (roughly 10× for large batches).

Free plan: Apify's free tier includes $5 of monthly credit, enough for ~4,300 papers per month at FREE-tier pricing ($0.00115 per paper), with no credit card required.
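The cost examples follow from one formula: the start fee per run plus the per-paper price times the number of papers. A minimal sketch with prices copied from the pricing table; estimate_cost and papers_for_budget are illustrative names, and rounding to cents is left to the caller.

```python
# Prices in USD, copied from the pricing table.
START_FEE = 0.005  # charged once per run
PRICE_PER_PAPER = {
    "FREE": 0.00115,
    "BRONZE": 0.00100,
    "SILVER": 0.00078,
    "GOLD": 0.00060,
    "PLATINUM": 0.00040,
    "DIAMOND": 0.00028,
}


def estimate_cost(papers: int, tier: str = "BRONZE", runs: int = 1) -> float:
    """Estimated charge in USD for `papers` results spread over `runs` runs."""
    return runs * START_FEE + papers * PRICE_PER_PAPER[tier]


def papers_for_budget(budget: float, tier: str = "FREE") -> int:
    """Papers a single run can extract within `budget` USD after the start fee."""
    return max(0, int((budget - START_FEE) / PRICE_PER_PAPER[tier]))
```

For example, estimate_cost(1_000) gives 1.005 (listed above as ~$1.01), and papers_for_budget(5.0) gives 4343 at FREE-tier prices, which is where the ~4,300 figure comes from.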


🛠️ How to use it

Step 1: Choose conferences and years

Configure which conferences and years to scrape. You can mix and match:

{
"conferences": ["CVPR", "ICCV"],
"years": [2023, 2024]
}

This requests CVPR 2023, CVPR 2024, ICCV 2023, and ICCV 2024. Because ICCV is held only in odd years, ICCV 2024 is skipped with a log message, so 3 batches are actually scraped.

Step 2: Set a result limit

Use maxResults to control cost and run time. Start with a small number (e.g., 50) to test:

{
"conferences": ["WACV"],
"years": [2024],
"maxResults": 50
}

Set to a large value (e.g., 999999) for unlimited results.

Step 3: Optionally fetch abstracts

Enable includeAbstract to retrieve the full abstract from each paper's detail page:

{
"conferences": ["CVPR"],
"years": [2024],
"includeAbstract": true,
"maxResults": 100
}

Note: Abstract fetching is ~10× slower because it requires one extra HTTP request per paper.

Step 4: Run and download

Click Start and wait for completion. Download results as JSON, CSV, or Excel from the Dataset tab.


📥 Input parameters

conferences: Which conferences to scrape

Array of conference codes. Options: CVPR, ICCV, WACV. Default: ["CVPR"].

{ "conferences": ["CVPR", "ICCV", "WACV"] }

Pass "all" as a string shorthand to scrape all three conferences for the selected years.

Available years per conference:

  • CVPR: 2013–2025 (annual)
  • ICCV: 2013, 2015, 2017, 2019, 2021, 2023, 2025 (biennial, odd years only)
  • WACV: 2020–2026 (annual)

If you specify a year that isn't available for a conference (e.g., ICCV 2024), it will be skipped with a log message.
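The skip-unavailable-years behavior can be pictured as expanding conferences × years into batches and dropping combinations outside the documented ranges. This is a hypothetical illustration, not the actor's real code: plan_batches and AVAILABLE_YEARS are made-up names, the year ranges come from the list above, and the "all" shorthand is handled as described for the conferences parameter.

```python
# Year availability as documented in this README (baked into the sketch as an assumption).
AVAILABLE_YEARS = {
    "CVPR": set(range(2013, 2026)),     # 2013–2025, annual
    "ICCV": set(range(2013, 2026, 2)),  # 2013, 2015, ..., 2025 (odd years only)
    "WACV": set(range(2020, 2027)),     # 2020–2026, annual
}


def plan_batches(conferences, years):
    """Return the (conference, year) pairs to scrape, logging skipped combinations."""
    if conferences == "all":
        conferences = ["CVPR", "ICCV", "WACV"]
    batches = []
    for conf in conferences:
        for year in years:
            if year in AVAILABLE_YEARS.get(conf, set()):
                batches.append((conf, year))
            else:
                print(f"Skipping {conf} {year}: not available on CVF open access")
    return batches
```

With the Step 1 input, plan_batches(["CVPR", "ICCV"], [2023, 2024]) yields three batches and logs that ICCV 2024 was skipped.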

years: Which years to scrape

Array of integer years. Default: [2024].

{ "years": [2022, 2023, 2024] }

maxResults: Limit output size

Maximum number of papers to extract across all selected conferences and years. Default: 500.

{ "maxResults": 1000 }

Set to a large value (e.g., 999999) for unlimited. Papers are extracted in listing order (top to bottom on the CVF page).

includeAbstract: Fetch abstract text

When true, the actor fetches each paper's detail page to extract the full abstract. Default: false.

{ "includeAbstract": true }

This makes the run slower (one extra HTTP request per paper) but enables abstract-level text search, text analysis, and semantic similarity use cases.

maxRequestRetries: Retry count

Number of retry attempts for failed HTTP requests. Default: 3. Range: 1–10.

{ "maxRequestRetries": 5 }
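When you generate inputs programmatically, it can help to apply the documented defaults and ranges before starting a run. A sketch under the assumption that unspecified fields fall back to the defaults listed above; normalize_input is a hypothetical helper, and the check that maxResults is positive is our own assumption rather than a documented rule.

```python
# Defaults and allowed values as documented in the input parameters section.
DEFAULTS = {
    "conferences": ["CVPR"],
    "years": [2024],
    "maxResults": 500,
    "includeAbstract": False,
    "maxRequestRetries": 3,
}
VALID_CONFERENCES = {"CVPR", "ICCV", "WACV"}


def normalize_input(user_input: dict) -> dict:
    """Merge user input over the defaults and check the documented ranges."""
    cfg = {**DEFAULTS, **user_input}
    if cfg["conferences"] == "all":
        cfg["conferences"] = sorted(VALID_CONFERENCES)
    unknown = set(cfg["conferences"]) - VALID_CONFERENCES
    if unknown:
        raise ValueError(f"unknown conference codes: {sorted(unknown)}")
    if not all(isinstance(y, int) for y in cfg["years"]):
        raise ValueError("years must be integers")
    if not 1 <= cfg["maxRequestRetries"] <= 10:
        raise ValueError("maxRequestRetries must be between 1 and 10")
    if cfg["maxResults"] < 1:  # assumption: not a documented rule
        raise ValueError("maxResults must be positive")
    return cfg
```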

📤 Output examples

Paper without abstract

{
"title": "DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery",
"authors": ["Yixuan Zhu", "Ao Li", "Yansong Tang", "Wenliang Zhao", "Jie Zhou", "Jiwen Lu"],
"authorsString": "Yixuan Zhu, Ao Li, Yansong Tang, Wenliang Zhao, Jie Zhou, Jiwen Lu",
"conference": "CVPR",
"year": 2024,
"pages": "1101-1110",
"paperUrl": "https://openaccess.thecvf.com/content/CVPR2024/html/Zhu_DPMesh_Exploiting_Diffusion_Prior_for_Occluded_Human_Mesh_Recovery_CVPR_2024_paper.html",
"pdfUrl": "https://openaccess.thecvf.com/content/CVPR2024/papers/Zhu_DPMesh_Exploiting_Diffusion_Prior_for_Occluded_Human_Mesh_Recovery_CVPR_2024_paper.pdf",
"suppUrl": "https://openaccess.thecvf.com/content/CVPR2024/supplemental/Zhu_DPMesh_Exploiting_Diffusion_CVPR_2024_supplemental.zip",
"arxivUrl": "http://arxiv.org/abs/2404.01424",
"bibtex": "@InProceedings{Zhu_2024_CVPR,\n author = {Zhu, Yixuan and Li, Ao and ...},\n title = {DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery},\n booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},\n month = {June},\n year = {2024},\n pages = {1101-1110}\n}",
"abstract": null
}

Paper with abstract

{
"title": "Seeing the World through Your Eyes",
"authors": ["Hadi Alzayer", "Kevin Zhang", "Brandon Feng", "Christopher A. Metzler", "Jia-Bin Huang"],
"authorsString": "Hadi Alzayer, Kevin Zhang, Brandon Feng, Christopher A. Metzler, Jia-Bin Huang",
"conference": "CVPR",
"year": 2024,
"pages": "4864-4873",
"paperUrl": "https://openaccess.thecvf.com/content/CVPR2024/html/Alzayer_Seeing_the_World_through_Your_Eyes_CVPR_2024_paper.html",
"pdfUrl": "https://openaccess.thecvf.com/content/CVPR2024/papers/Alzayer_Seeing_the_World_through_Your_Eyes_CVPR_2024_paper.pdf",
"suppUrl": null,
"arxivUrl": "http://arxiv.org/abs/2306.09348",
"bibtex": "@InProceedings{Alzayer_2024_CVPR,...}",
"abstract": "The reflections in the eyes contain information about the environment around the person, including the appearance of the illumination and objects in the room..."
}

💡 Tips and tricks

  • Fast survey mode: Skip includeAbstract (default false) to scrape a full conference in under 60 seconds; the listing page has all metadata except abstract text.
  • Abstract-enriched NLP: Enable includeAbstract: true when you need abstract text for topic modeling, semantic search, or LLM analysis. Expect ~10× longer run time for large batches.
  • Quick sample of a conference: Set maxResults: 50 to get the first 50 papers in listing order (not a random sample), which is enough to test your downstream pipeline before committing to a full 2,000+ paper run.
  • Multi-year longitudinal studies: Use years: [2019, 2020, 2021, 2022, 2023, 2024] with conferences: ["CVPR"] to get six years of papers in one run.
  • Use arXiv links for full text: Many papers have arxivUrl populated, giving you the preprint even without IEEE Xplore access.
  • Import BibTeX directly: The bibtex field is a complete, valid BibTeX entry; paste it directly into your .bib file or any reference manager.
  • Deduplication key: Use paperUrl as your unique identifier when combining results across multiple runs.
  • ICCV is biennial: ICCV only runs in odd years (2021, 2023, 2025). Specifying an even year returns 0 results; the actor skips it with a log message.
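Combining several runs and deduplicating by paperUrl takes only a few lines. A minimal sketch, assuming each run was exported as a JSON array; merge_runs is an illustrative name. It keeps the first copy of each paper and relies on Python dicts preserving insertion order.

```python
import json
from pathlib import Path


def merge_runs(paths):
    """Combine exported JSON datasets, keeping the first copy of each paperUrl."""
    seen = {}
    for path in paths:
        for paper in json.loads(Path(path).read_text(encoding="utf-8")):
            # setdefault ignores later duplicates of an already-seen paperUrl
            seen.setdefault(paper["paperUrl"], paper)
    return list(seen.values())
```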

🔗 Integrations

📊 Google Sheets research tracker

  1. Run the actor with conferences: ["CVPR"], years: [2024], and maxResults: 500
  2. Open Google Sheets → Extensions → Apify (requires the Apify Sheets add-on)
  3. Import the dataset and map columns: title, authorsString, conference, year, arxivUrl, pdfUrl
  4. Result: a spreadsheet of the selected papers, sortable and filterable

📚 Zotero / Mendeley BibTeX bulk import

  1. Run with conferences: ["CVPR", "ICCV"], years: [2024], includeAbstract: false
  2. Download the JSON dataset
  3. Extract BibTeX fields into a .bib file using Python: python3 -c "import json; [print(p['bibtex']) for p in json.load(open('papers.json')) if p['bibtex']]" > cvf2024.bib
  4. Import cvf2024.bib into Zotero or Mendeley; every paper with a BibTeX entry imports with its structured metadata
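The one-liner in step 3 works, but a small script is easier to extend. A sketch assuming the dataset was downloaded as a JSON array; export_bibtex is an illustrative name. As in the one-liner, items whose bibtex is null are skipped.

```python
import json
from pathlib import Path


def export_bibtex(dataset_path: str, bib_path: str) -> int:
    """Write every non-null bibtex field from an exported dataset to a .bib file."""
    papers = json.loads(Path(dataset_path).read_text(encoding="utf-8"))
    entries = [p["bibtex"] for p in papers if p.get("bibtex")]
    # A blank line between entries keeps the .bib file readable.
    Path(bib_path).write_text("\n\n".join(entries) + "\n", encoding="utf-8")
    return len(entries)
```

Calling export_bibtex("papers.json", "cvf2024.bib") writes the same entries as step 3 and returns how many were written.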

🤖 LLM trend analysis with Claude or ChatGPT

  1. Run with includeAbstract: true on a focused batch (e.g., 100 papers from CVPR 2024)
  2. Feed the JSON to an LLM with the prompt: "Analyze these CVPR 2024 paper abstracts and identify the 10 dominant research themes with examples"
  3. Automate this pipeline using Apify's Claude or OpenAI integrations
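Step 2 can be scripted by packing titles and abstracts under the analysis instruction. This sketch only builds the prompt text; which LLM client you send it to is up to you. build_theme_prompt is a hypothetical helper, and the per-abstract character cap and max_papers limit are assumptions meant to keep the prompt within a model's context window.

```python
def build_theme_prompt(papers, venue="CVPR 2024", max_papers=100, max_chars=1500):
    """Return one prompt string: the instruction followed by title/abstract blocks."""
    instruction = (
        f"Analyze these {venue} paper abstracts and identify "
        "the 10 dominant research themes with examples"
    )
    blocks = []
    for paper in papers:
        abstract = paper.get("abstract")
        if not abstract:
            continue  # items scraped without includeAbstract have abstract = None
        blocks.append(f"Title: {paper['title']}\nAbstract: {abstract[:max_chars]}")
        if len(blocks) >= max_papers:
            break
    return instruction + "\n\n" + "\n\n".join(blocks)
```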

🔍 Semantic search with Pinecone / Weaviate

  1. Scrape all ICCV 2023 papers with includeAbstract: true
  2. Embed abstracts using the OpenAI Embeddings API (or run the Apify OpenAI Embeddings actor)
  3. Push vectors to Pinecone with paper metadata; they are then instantly queryable with prompts such as "find papers similar to NeRF"

📈 Publication trend dashboard in Tableau / Power BI

  1. Schedule a monthly Apify run scraping the latest conference year
  2. Connect the Apify dataset via the REST API to your BI tool
  3. Build trend charts: papers per year, top authors, arXiv adoption rate, keywords in titles
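Step 3's charts start from a few aggregates. A minimal sketch over exported items (trend_summary is a hypothetical helper). Here "arXiv adoption" is approximated as the share of papers whose arxivUrl is populated, which only counts preprints that CVF lists.

```python
from collections import Counter


def trend_summary(papers):
    """Papers per (conference, year), arXiv-link share per year, and top-10 authors."""
    per_venue_year = Counter((p["conference"], p["year"]) for p in papers)
    per_year = Counter(p["year"] for p in papers)
    with_arxiv = Counter(p["year"] for p in papers if p.get("arxivUrl"))
    # A Counter returns 0 for missing keys, so years without arXiv links get 0.0.
    arxiv_rate = {year: with_arxiv[year] / total for year, total in per_year.items()}
    top_authors = Counter(a for p in papers for a in p["authors"]).most_common(10)
    return per_venue_year, arxiv_rate, top_authors
```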

🔌 API usage

You can trigger this actor programmatically using the Apify API.

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

# Start the actor and wait for the run to finish
run = client.actor("automation-lab/cvf-papers-scraper").call(
    run_input={
        "conferences": ["CVPR"],
        "years": [2024],
        "maxResults": 100,
        "includeAbstract": False,
    }
)

# Iterate over the items stored in the run's default dataset
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["title"], "-", item["authorsString"])

cURL

curl -X POST \
"https://api.apify.com/v2/acts/automation-lab~cvf-papers-scraper/runs" \
-H "Authorization: Bearer YOUR_APIFY_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"conferences": ["CVPR"],
"years": [2024],
"maxResults": 100
}'
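If you would rather not install a client library, the cURL call translates directly to Python's standard library. This sketch only builds the request, and nothing is sent until you pass it to urllib.request.urlopen. The endpoint, headers, and body are the ones from the cURL example; build_run_request is a hypothetical helper name. Like the cURL call, this starts the run asynchronously, so results still have to be read from the run's default dataset once it finishes.

```python
import json
import urllib.request

# In API paths the actor ID uses "~" instead of "/" between username and actor name.
ACTOR_ID = "automation-lab~cvf-papers-scraper"


def build_run_request(token: str, run_input: dict) -> urllib.request.Request:
    """Build the POST from the cURL example without sending it."""
    return urllib.request.Request(
        url=f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs",
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```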

Node.js

import { ApifyClient } from 'apify-client';
const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });
const run = await client.actor('automation-lab/cvf-papers-scraper').call({
conferences: ['CVPR'],
years: [2024],
maxResults: 100,
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Extracted ${items.length} papers`);

🤖 MCP (Model Context Protocol)

Use this actor as an MCP tool inside Claude, Cursor, VS Code, or any MCP-compatible AI assistant to fetch CVF papers on demand from natural language prompts.

Claude Code (terminal)

claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/cvf-papers-scraper"

Claude Desktop / Cursor / VS Code

Add to your claude_desktop_config.json (or equivalent MCP config):

{
"mcpServers": {
"apify": {
"type": "http",
"url": "https://mcp.apify.com?tools=automation-lab/cvf-papers-scraper",
"headers": {
"Authorization": "Bearer YOUR_APIFY_TOKEN"
}
}
}
}

Example prompts for your AI assistant

Once connected, you can ask:

  • "Find CVPR 2024 papers about 3D reconstruction"
  • "Scrape all ICCV 2023 papers and summarize the dominant research themes"
  • "Fetch WACV 2024 papers with abstracts; I want to find work on medical imaging"
  • "Pull all CVPR papers from 2022 to 2024 and export their BibTeX citations for my literature review"

⚖️ Legality and ethical use

openaccess.thecvf.com provides papers under open access as part of the Computer Vision Foundation's mission to advance research. The data extracted by this actor is:

  • Publicly available: all papers are freely accessible without login or authentication
  • Largely factual bibliographic data: titles, authors, and page numbers are factual bibliographic information; abstracts are authored text and may be covered by the same copyright as the papers
  • Explicitly intended for distribution: the CVF open access repository exists specifically to make this research accessible

Usage guidance:

  • Keep maxRequestRetries modest (the default is 3) to avoid hammering the server with excessive retries
  • CVF's open access terms permit non-commercial research use of paper metadata
  • The full PDF content of papers is copyrighted by authors/IEEE โ€” downloading PDFs at scale may require separate permissions
  • This actor extracts metadata only by default; always check the CVF terms of service and your organization's policies before commercial use

We do not encourage scraping beyond your legitimate research needs.


❓ FAQ

Q: Does this require a proxy? No. openaccess.thecvf.com has no anti-bot measures. The actor uses plain HTTP requests, so no residential proxy is needed.

Q: How many papers are available total? It varies by conference and year: early editions have a few hundred papers each, while recent ones have several thousand (CVPR 2024 alone accepted roughly 2,700). Across CVPR, ICCV, and WACV the total runs into the tens of thousands.

Q: Can I scrape ECCV (European Conference on Computer Vision)? Not with this actor. ECCV is organized separately from the CVF-sponsored conferences, and this actor covers only CVPR, ICCV, and WACV.

Q: Why does ICCV only have odd years? ICCV is a biennial conference held in odd-numbered years. CVPR and WACV are annual; WACV proceedings are available on CVF Open Access from 2020 onward.

Q: What happens if I specify a year that isn't available? The actor logs a warning and skips that conference/year combination. Other valid combinations continue normally.

Q: Is abstract fetching significantly more expensive? The PPE price per paper is the same whether or not you fetch abstracts. But with includeAbstract: true, the run takes longer because of the extra HTTP requests, roughly 10× longer for large batches.

Q: Can I get papers from a specific research topic or keyword? CVF open access doesn't have a search or filtering API, and maxResults simply takes papers in listing order. Scrape the conference-years you need, then filter locally by title or abstract content.
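Local filtering after a run might look like the following sketch: a case-insensitive match of any keyword against the title and, when present, the abstract. filter_by_keywords is an illustrative helper, and plain substring matching is a deliberately simple stand-in for real topic classification.

```python
import re


def filter_by_keywords(papers, keywords, fields=("title", "abstract")):
    """Keep papers whose chosen text fields mention any keyword, ignoring case."""
    if not keywords:
        return list(papers)  # no keywords: nothing to filter on
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [
        paper for paper in papers
        # abstract is None unless includeAbstract was enabled, so skip empty fields
        if any(paper.get(field) and pattern.search(paper[field]) for field in fields)
    ]
```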

Q: The actor seems to have duplicate papers. Why? This shouldn't happen in normal operation. Each paper appears once per listing page. If you scrape the same conference/year in multiple runs, you'll get duplicates across datasets; deduplicate by paperUrl.


Related scrapers:

  • ACL Anthology Scraper: Extract NLP and computational linguistics papers from aclanthology.org (ACL, EMNLP, NAACL, EACL, COLING, and 50+ workshops)
  • ArXiv Scraper: Search and download paper metadata from arXiv.org across all subject areas; ideal for tracking preprints that correspond to CVF papers
  • Semantic Scholar Scraper: Retrieve citation counts, author profiles, and paper abstracts from Semantic Scholar's AI-powered research database

🐛 Issues and feedback

Found a bug or have a feature request? Open an issue on GitHub or contact us through the Apify support channel. We actively maintain this actor and welcome pull requests.

Common issues:

  • Missing papers for a specific year: Verify the year is in the supported range for that conference
  • Slow run with abstracts: Abstract mode fetches one extra page per paper, which is expected behavior
  • arXiv URL missing: Not all CVF papers have arXiv preprints; for those papers arxivUrl is null