# CVF Papers Scraper (`automation-lab/cvf-papers-scraper`) Actor

Scrape research papers from openaccess.thecvf.com (CVPR, ICCV, WACV)

- **URL**: https://apify.com/automation-lab/cvf-papers-scraper.md
- **Developed by:** [Stas Persiianenko](https://apify.com/automation-lab) (community)
- **Categories:** AI
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per event

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the price drops as your subscription plan tier rises.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## CVF Papers Scraper

> Extract research papers, authors, PDFs, BibTeX citations, and abstracts from the Computer Vision Foundation Open Access repository — covering CVPR, ICCV, and WACV.

### 📖 What does it do?

The **CVF Papers Scraper** extracts structured metadata from [openaccess.thecvf.com](https://openaccess.thecvf.com), the official open-access repository of the Computer Vision Foundation. It covers papers from the three major CVF-sponsored conferences:

- **CVPR** (Conference on Computer Vision and Pattern Recognition) — annually since 2013, the largest CV conference
- **ICCV** (International Conference on Computer Vision) — biennially (odd years) since 2013
- **WACV** (Winter Conference on Applications of Computer Vision) — annually since 2020

**What you get for each paper:**
- Full paper title
- Authors list (as both an array and a formatted string)
- Conference name and year
- Direct PDF download URL
- Supplemental materials URL (when available)
- arXiv preprint URL (when available)
- Complete BibTeX citation
- Page numbers from the proceedings
- Full abstract text (optional — requires one extra request per paper)

All data is extracted directly from the static HTML — no JavaScript rendering required, no login, no authentication.

---

### 👥 Who is it for?

#### 🎓 Computer vision researchers and PhD students
Building a literature review on a CV topic? Tracking what got accepted at CVPR 2024? Use this actor to bulk-download paper metadata instead of clicking through thousands of entries manually. Filter by conference and year to get exactly the batch you need.

#### 📊 Scientometricians and bibliometrics analysts
Studying publication trends in computer vision, measuring author collaboration networks, tracking how research topics evolve across CVPR/ICCV/WACV — start with structured, machine-readable paper data covering 10+ years of the field's most prestigious venues.

#### 🤖 AI and ML practitioners building datasets
Creating fine-tuning corpora from CV abstracts, building citation graphs, or constructing benchmarks from paper lists — this actor delivers clean JSON at scale with no manual effort.

#### 🏢 Research teams and enterprise R&D labs
Monitoring what competitors or academic collaborators are publishing, building internal research intelligence dashboards, or feeding paper data into RAG (retrieval-augmented generation) systems for literature QA.

#### 📚 Academic librarians and information professionals
Maintaining curated databases of CV research, populating institutional repositories, or building subject guides — all with properly formatted BibTeX citations and direct PDF links.

#### 🛠️ Developers building research tools
Creating paper recommendation engines, topic clustering tools, author disambiguation systems, or academic search interfaces — this actor gives you the raw structured data to build on.

---

### 🚀 Why use it?

- **Fast and cheap** — openaccess.thecvf.com is fully server-side rendered with no anti-bot measures. One HTTP request per conference fetches all paper listings. No browser needed, no proxy required.
- **Complete coverage** — CVPR 2013–2025 (~100,000 papers total), ICCV 2013–2025 (~40,000 papers), WACV 2020–2026 (~15,000 papers)
- **BibTeX included** — Every paper's complete citation is available on the listing page, ready to paste into your reference manager
- **arXiv cross-linking** — Where available, the arXiv preprint URL is extracted so you can access unrestricted versions
- **Abstract fetching** — Enable `includeAbstract` to get full abstract text from each paper's detail page
- **Structured output** — Clean JSON with typed fields (year as integer, authors as array), ready for downstream processing

---

### 📊 Data fields extracted

| Field | Type | Description |
|-------|------|-------------|
| `title` | string | Full paper title |
| `authors` | string[] | Author names as an array |
| `authorsString` | string | Authors joined as a single comma-separated string |
| `conference` | string | Conference code (`CVPR`, `ICCV`, or `WACV`) |
| `year` | number | Conference year (e.g., `2024`) |
| `pages` | string \| null | Page range in proceedings (e.g., `"4864-4873"`) |
| `paperUrl` | string | URL to the paper detail page on CVF open access |
| `pdfUrl` | string \| null | Direct URL to the PDF file |
| `suppUrl` | string \| null | URL to supplemental materials (PDF or ZIP) |
| `arxivUrl` | string \| null | arXiv preprint URL (when listed) |
| `bibtex` | string \| null | Complete BibTeX citation string |
| `abstract` | string \| null | Full abstract text (only when `includeAbstract: true`) |
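
For typed downstream code, the fields in the table above can be mirrored in a small record type. This is a minimal sketch and not part of the Actor: the `Paper` class and the `paper_from_item` helper are hypothetical names, and the sketch assumes dataset items carry exactly the fields listed in the table.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class Paper:
    """Hypothetical record type mirroring the documented output fields."""
    title: str
    authors: list[str]
    authorsString: str
    conference: str
    year: int
    paperUrl: str
    pages: Optional[str] = None
    pdfUrl: Optional[str] = None
    suppUrl: Optional[str] = None
    arxivUrl: Optional[str] = None
    bibtex: Optional[str] = None
    abstract: Optional[str] = None


def paper_from_item(item: dict[str, Any]) -> Paper:
    """Build a Paper from one dataset item, ignoring keys not in the table."""
    known = {f.name for f in fields(Paper)}
    return Paper(**{key: value for key, value in item.items() if key in known})
```

Nullable fields default to `None`, so a record scraped without `includeAbstract` still loads cleanly.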

---

### 💰 Pricing

This scraper uses **Pay-Per-Event (PPE) pricing** — you pay only for papers actually extracted, not for compute time.

| What you pay for | FREE | BRONZE | SILVER | GOLD | PLATINUM | DIAMOND |
|-----------------|------|--------|--------|------|----------|---------|
| Run started (one-time) | $0.005 | $0.005 | $0.005 | $0.005 | $0.005 | $0.005 |
| Each paper extracted | $0.00115 | $0.00100 | $0.00078 | $0.00060 | $0.00040 | $0.00028 |

**Cost examples (at BRONZE $0.001/paper):**
- 100 papers (quick test or small workshop): ~$0.10
- 1,000 papers (single conference track): ~$1.01
- 5,000 papers: ~$5.01
- 15,000 papers: ~$15.01

**With abstract fetching** (`includeAbstract: true`): each paper requires one additional HTTP request, but the PPE price stays the same — only time and network overhead increase slightly.

**Free plan:** Apify's free tier includes $5 of monthly credit, enough for ~4,300 papers per month at FREE-tier pricing ($0.00115/paper), with no credit card required.
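
The arithmetic behind these cost examples can be sketched in a few lines of Python. The prices are copied from the pricing table above and may change; `estimate_cost` and `papers_within_budget` are hypothetical helper names, not part of the Actor or the Apify client.

```python
# Prices in USD, copied from the pricing table above (an assumption that may go stale).
RUN_START_FEE = 0.005
PRICE_PER_PAPER = {
    "FREE": 0.00115,
    "BRONZE": 0.00100,
    "SILVER": 0.00078,
    "GOLD": 0.00060,
    "PLATINUM": 0.00040,
    "DIAMOND": 0.00028,
}


def estimate_cost(papers: int, tier: str = "BRONZE") -> float:
    """Estimated cost in USD of one run that extracts `papers` papers."""
    if papers < 0:
        raise ValueError("papers must be non-negative")
    return round(RUN_START_FEE + papers * PRICE_PER_PAPER[tier], 4)


def papers_within_budget(budget: float, tier: str = "FREE") -> int:
    """Largest number of papers a single run can extract for `budget` USD."""
    return max(0, int((budget - RUN_START_FEE) / PRICE_PER_PAPER[tier]))
```

For instance, `estimate_cost(1000, "BRONZE")` comes to about $1.01, matching the example above. Splitting the same papers across several runs adds one run-start fee per run.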

---

### 🛠️ How to use it

#### Step 1: Choose conferences and years

Configure which conferences and years to scrape. You can mix and match:

```json
{
    "conferences": ["CVPR", "ICCV"],
    "years": [2023, 2024]
}
````

This requests CVPR 2023, CVPR 2024, and ICCV 2023 (3 batches). ICCV is biennial and has no 2024 edition, so the ICCV 2024 combination is skipped.

#### Step 2: Set a result limit

Use `maxResults` to control cost and run time. Start with a small number (e.g., 50) to test:

```json
{
    "conferences": ["WACV"],
    "years": [2024],
    "maxResults": 50
}
```

Set to a large value (e.g., `999999`) for unlimited results.

#### Step 3: Optionally fetch abstracts

Enable `includeAbstract` to retrieve the full abstract from each paper's detail page:

```json
{
    "conferences": ["CVPR"],
    "years": [2024],
    "includeAbstract": true,
    "maxResults": 100
}
```

**Note:** Abstract fetching is ~10× slower because it requires one additional HTTP request per paper.

#### Step 4: Run and download

Click **Start** and wait for completion. Download results as JSON, CSV, or Excel from the **Dataset** tab.

***

### 📥 Input parameters

#### `conferences` — Which conferences to scrape

Array of conference codes. Options: `CVPR`, `ICCV`, `WACV`. Default: `["CVPR"]`.

```json
{ "conferences": ["CVPR", "ICCV", "WACV"] }
```

Pass `"all"` as a string shorthand to scrape all three conferences for the selected years.

**Available years per conference:**

- CVPR: 2013–2025 (annual)
- ICCV: 2013, 2015, 2017, 2019, 2021, 2023, 2025 (biennial, odd years only)
- WACV: 2020–2026 (annual)

If you specify a year that isn't available for a conference (e.g., ICCV 2024), it will be skipped with a log message.
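
To preview which conference/year combinations a given input would actually cover, the availability list above can be checked locally before starting a run. This is only a sketch: `AVAILABLE_YEARS` and `plan_batches` are hypothetical names, and the year ranges are copied from this README rather than queried from the site.

```python
from itertools import product

# Availability as documented above; treat it as an assumption that may go stale.
AVAILABLE_YEARS = {
    "CVPR": set(range(2013, 2026)),     # 2013-2025, annual
    "ICCV": set(range(2013, 2026, 2)),  # 2013-2025, odd years only
    "WACV": set(range(2020, 2027)),     # 2020-2026, annual
}


def plan_batches(conferences, years):
    """Split requested (conference, year) pairs into available and skipped."""
    valid, skipped = [], []
    for conf, year in product(conferences, years):
        target = valid if year in AVAILABLE_YEARS.get(conf, set()) else skipped
        target.append((conf, year))
    return valid, skipped
```

Calling `plan_batches(["CVPR", "ICCV"], [2023, 2024])` puts `("ICCV", 2024)` in the skipped list and the other three pairs in the valid list.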

#### `years` — Which years to scrape

Array of integer years. Default: `[2024]`.

```json
{ "years": [2022, 2023, 2024] }
```

#### `maxResults` — Limit output size

Maximum number of papers to extract across all selected conferences and years. Default: `500`.

```json
{ "maxResults": 1000 }
```

Set to a large value (e.g., `999999`) for unlimited. Papers are extracted in listing order (top to bottom on the CVF page).

#### `includeAbstract` — Fetch abstract text

When `true`, the actor fetches each paper's detail page to extract the full abstract. Default: `false`.

```json
{ "includeAbstract": true }
```

This makes the run slower (one extra HTTP request per paper) but enables full-text search, text analysis, and semantic similarity use cases.

#### `maxRequestRetries` — Retry count

Number of retry attempts for failed HTTP requests. Default: `3`. Range: 1–10.

```json
{ "maxRequestRetries": 5 }
```
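
The documented defaults and ranges can be checked client-side before a run is started. The following is a sketch under the assumption that the constraints are exactly those stated in this README and in the input schema (`maxResults` at least 1, `maxRequestRetries` from 1 to 10); `validate_input` is a hypothetical helper, and the Actor may validate input differently.

```python
ALLOWED_CONFERENCES = {"CVPR", "ICCV", "WACV"}


def validate_input(run_input: dict) -> list[str]:
    """Return a list of problems found in an input object (empty if none)."""
    problems = []

    conferences = run_input.get("conferences", ["CVPR"])
    if conferences != "all":  # "all" is the documented string shorthand
        unknown = sorted(set(conferences) - ALLOWED_CONFERENCES)
        if unknown:
            problems.append(f"unknown conferences: {unknown}")

    years = run_input.get("years", [2024])
    if not all(isinstance(year, int) for year in years):
        problems.append("years must be integers")

    max_results = run_input.get("maxResults", 500)
    if not isinstance(max_results, int) or max_results < 1:
        problems.append("maxResults must be an integer >= 1")

    retries = run_input.get("maxRequestRetries", 3)
    if not isinstance(retries, int) or not 1 <= retries <= 10:
        problems.append("maxRequestRetries must be an integer from 1 to 10")

    if not isinstance(run_input.get("includeAbstract", False), bool):
        problems.append("includeAbstract must be true or false")

    return problems
```

An empty list means the input matches the documented constraints; anything else can be reported to the user before any paid event is triggered.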

***

### 📤 Output examples

#### Paper without abstract

```json
{
    "title": "DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery",
    "authors": ["Yixuan Zhu", "Ao Li", "Yansong Tang", "Wenliang Zhao", "Jie Zhou", "Jiwen Lu"],
    "authorsString": "Yixuan Zhu, Ao Li, Yansong Tang, Wenliang Zhao, Jie Zhou, Jiwen Lu",
    "conference": "CVPR",
    "year": 2024,
    "pages": "1101-1110",
    "paperUrl": "https://openaccess.thecvf.com/content/CVPR2024/html/Zhu_DPMesh_Exploiting_Diffusion_Prior_for_Occluded_Human_Mesh_Recovery_CVPR_2024_paper.html",
    "pdfUrl": "https://openaccess.thecvf.com/content/CVPR2024/papers/Zhu_DPMesh_Exploiting_Diffusion_Prior_for_Occluded_Human_Mesh_Recovery_CVPR_2024_paper.pdf",
    "suppUrl": "https://openaccess.thecvf.com/content/CVPR2024/supplemental/Zhu_DPMesh_Exploiting_Diffusion_CVPR_2024_supplemental.zip",
    "arxivUrl": "http://arxiv.org/abs/2404.01424",
    "bibtex": "@InProceedings{Zhu_2024_CVPR,\n    author    = {Zhu, Yixuan and Li, Ao and ...},\n    title     = {DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery},\n    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},\n    month     = {June},\n    year      = {2024},\n    pages     = {1101-1110}\n}",
    "abstract": null
}
```

#### Paper with abstract

```json
{
    "title": "Seeing the World through Your Eyes",
    "authors": ["Hadi Alzayer", "Kevin Zhang", "Brandon Feng", "Christopher A. Metzler", "Jia-Bin Huang"],
    "authorsString": "Hadi Alzayer, Kevin Zhang, Brandon Feng, Christopher A. Metzler, Jia-Bin Huang",
    "conference": "CVPR",
    "year": 2024,
    "pages": "4864-4873",
    "paperUrl": "https://openaccess.thecvf.com/content/CVPR2024/html/Alzayer_Seeing_the_World_through_Your_Eyes_CVPR_2024_paper.html",
    "pdfUrl": "https://openaccess.thecvf.com/content/CVPR2024/papers/Alzayer_Seeing_the_World_through_Your_Eyes_CVPR_2024_paper.pdf",
    "suppUrl": null,
    "arxivUrl": "http://arxiv.org/abs/2306.09348",
    "bibtex": "@InProceedings{Alzayer_2024_CVPR,...}",
    "abstract": "The reflections in the eyes contain information about the environment around the person, including the appearance of the illumination and objects in the room..."
}
```

***

### 💡 Tips and tricks

- **Fast survey mode**: Skip `includeAbstract` (default `false`) to scrape a full conference in under 60 seconds — the listing page has all metadata except abstract text.
- **Abstract-enriched NLP**: Enable `includeAbstract: true` when you need full text for topic modeling, semantic search, or LLM analysis. Expect ~10× longer run time for large batches.
- **Sampling a conference**: Set `maxResults: 50` to get the first 50 papers in listing order (not a random sample). This is a quick way to test your downstream pipeline before committing to a full 2,000+ paper run.
- **Multi-year longitudinal studies**: Use `years: [2019, 2020, 2021, 2022, 2023, 2024]` with `conferences: ["CVPR"]` to get six years of papers in one run.
- **Use arXiv links for full text**: Many papers have `arxivUrl` populated, giving you the preprint even without IEEE Xplore access.
- **Import BibTeX directly**: The `bibtex` field is a complete, valid BibTeX entry — paste directly into your `.bib` file or any reference manager.
- **Deduplication key**: Use `paperUrl` as your unique identifier when combining results across multiple runs.
- **ICCV is biennial**: ICCV only runs in odd years (2021, 2023, 2025). Specifying an even year returns 0 results; the actor skips it with a log message.
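
The deduplication tip above can be sketched as follows. `dedupe_by_paper_url` is a hypothetical helper that assumes every item has a `paperUrl`, as the data fields table states.

```python
def dedupe_by_paper_url(*runs):
    """Merge item lists from several runs, keeping the first item per paperUrl."""
    seen = {}
    for items in runs:
        for item in items:
            # setdefault keeps the earliest record and ignores later duplicates
            seen.setdefault(item["paperUrl"], item)
    return list(seen.values())
```

Because dictionaries preserve insertion order, the merged list keeps papers in the order they were first seen.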

***

### 🔗 Integrations

#### 📊 Google Sheets research tracker

1. Run the actor with `conferences: ["CVPR"]`, `years: [2024]`, and `maxResults: 500`
2. Open Google Sheets → **Extensions → Apify** (requires the Apify Sheets add-on)
3. Import the dataset and map columns: `title`, `authorsString`, `conference`, `year`, `arxivUrl`, `pdfUrl`
4. Result: a live spreadsheet of all selected papers, sortable and filterable

#### 📚 Zotero / Mendeley BibTeX bulk import

1. Run with `conferences: ["CVPR", "WACV"]`, `years: [2024]`, `includeAbstract: false` (ICCV has no 2024 edition, so it would be skipped)
2. Download the JSON dataset
3. Extract BibTeX fields into a `.bib` file using Python: `python3 -c "import json; [print(p['bibtex']) for p in json.load(open('papers.json')) if p['bibtex']]" > cvf2024.bib`
4. Import `cvf2024.bib` into Zotero or Mendeley — all papers import with full structured metadata
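
For anything beyond a one-off export, the one-liner in step 3 can be expanded into a small script. This is a sketch assuming the dataset was downloaded as a JSON array; `build_bib` and `export_bibtex` are hypothetical names, and the file paths are placeholders.

```python
import json
from pathlib import Path


def build_bib(items) -> str:
    """Join all non-empty `bibtex` fields into the text of a .bib file."""
    entries = [item["bibtex"].strip() for item in items if item.get("bibtex")]
    return "\n\n".join(entries) + "\n"


def export_bibtex(json_path: str, bib_path: str) -> int:
    """Read a downloaded dataset and write a .bib file; return the entry count."""
    items = json.loads(Path(json_path).read_text(encoding="utf-8"))
    Path(bib_path).write_text(build_bib(items), encoding="utf-8")
    return sum(1 for item in items if item.get("bibtex"))
```

Calling `export_bibtex("papers.json", "cvf2024.bib")` writes the same file as the one-liner, with blank lines between entries, and skips papers whose `bibtex` field is `null`.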

#### 🤖 LLM trend analysis with Claude or ChatGPT

1. Run with `includeAbstract: true` on a focused batch (e.g., 100 papers from CVPR 2024)
2. Feed the JSON to an LLM with the prompt: *"Analyze these CVPR 2024 paper abstracts and identify the 10 dominant research themes with examples"*
3. Automate this pipeline using Apify's Claude or OpenAI integrations

#### 🔍 Semantic search with Pinecone / Weaviate

1. Scrape all ICCV 2023 papers with `includeAbstract: true`
2. Embed abstracts using the OpenAI Embeddings API (or run the Apify OpenAI Embeddings actor)
3. Push vectors to Pinecone with paper metadata — instantly queryable: *"find papers similar to NeRF"*

#### 📈 Publication trend dashboard in Tableau / Power BI

1. Schedule a monthly Apify run scraping the latest conference year
2. Connect the Apify dataset via the REST API to your BI tool
3. Build trend charts: papers per year, top authors, arXiv adoption rate, keywords in titles
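
Before wiring up a BI tool, the same trend figures can be computed directly from the dataset items. This is a sketch using only the documented `year`, `authors`, and `arxivUrl` fields; `summarize` and its output keys are hypothetical names.

```python
from collections import Counter


def summarize(items, top_n=5):
    """Compute simple trend statistics from a list of paper records."""
    per_year = Counter(item["year"] for item in items)
    author_counts = Counter(name for item in items for name in item["authors"])
    with_arxiv = sum(1 for item in items if item.get("arxivUrl"))
    return {
        "papersPerYear": dict(sorted(per_year.items())),
        "topAuthors": author_counts.most_common(top_n),
        # Share of papers that list an arXiv preprint (0.0 for an empty input)
        "arxivRate": with_arxiv / len(items) if items else 0.0,
    }
```

Author names are counted as plain strings, so two authors who share a name are merged; author disambiguation is outside the scope of this sketch.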

***

### 🔌 API usage

You can trigger this actor programmatically using the Apify API.

#### Python

```python
import apify_client

client = apify_client.ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("automation-lab/cvf-papers-scraper").call(
    run_input={
        "conferences": ["CVPR"],
        "years": [2024],
        "maxResults": 100,
        "includeAbstract": False
    }
)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["title"], "-", item["authorsString"])
```

#### cURL

```bash
curl -X POST \
  "https://api.apify.com/v2/acts/automation-lab~cvf-papers-scraper/runs" \
  -H "Authorization: Bearer YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "conferences": ["CVPR"],
    "years": [2024],
    "maxResults": 100
  }'
```
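
The cURL call above starts a run and returns run metadata rather than the papers themselves. For small jobs, the `run-sync-get-dataset-items` endpoint listed in the OpenAPI specification returns the dataset items in the same response. Below is a standard-library sketch of that call; `build_sync_request` and `run_and_fetch` are hypothetical helper names, the 300-second timeout is an arbitrary assumption, and long runs are better handled with the asynchronous endpoint or the official client.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "automation-lab~cvf-papers-scraper"


def build_sync_request(token: str, run_input: dict) -> urllib.request.Request:
    """Prepare a POST to the run-sync-get-dataset-items endpoint (not sent yet)."""
    return urllib.request.Request(
        url=f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items",
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_and_fetch(token: str, run_input: dict, timeout: float = 300.0) -> list:
    """Send the request and return the dataset items as a list of dicts."""
    request = build_sync_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the URL, headers, and body easy to inspect or unit-test without touching the network.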

#### Node.js

```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

const run = await client.actor('automation-lab/cvf-papers-scraper').call({
    conferences: ['CVPR'],
    years: [2024],
    maxResults: 100,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Extracted ${items.length} papers`);
```

***

### 🤖 MCP (Model Context Protocol)

Use this actor as an **MCP tool** inside Claude, Cursor, VS Code, or any MCP-compatible AI assistant to fetch CVF papers on demand from natural language prompts.

#### Claude Code (terminal)

```bash
claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/cvf-papers-scraper"
```

#### Claude Desktop / Cursor / VS Code

Add to your `claude_desktop_config.json` (or equivalent MCP config):

```json
{
    "mcpServers": {
        "apify": {
            "type": "http",
            "url": "https://mcp.apify.com?tools=automation-lab/cvf-papers-scraper",
            "headers": {
                "Authorization": "Bearer YOUR_APIFY_TOKEN"
            }
        }
    }
}
```

#### Example prompts for your AI assistant

Once connected, you can ask:

- *"Get the 50 most recent CVPR 2024 papers about 3D reconstruction"*
- *"Scrape all ICCV 2023 papers and summarize the dominant research themes"*
- *"Fetch WACV 2024 papers with abstracts — I want to find work on medical imaging"*
- *"Pull all CVPR papers from 2022 to 2024 and export their BibTeX citations for my literature review"*

***

### ⚖️ Legality and ethical use

**openaccess.thecvf.com** provides papers under open access as part of the Computer Vision Foundation's mission to advance research. The data extracted by this actor is:

- **Publicly available** — all papers are freely accessible without login or authentication
- **Non-commercial research data** — paper metadata (titles, authors, abstracts) is factual bibliographic information, not copyrighted content
- **Explicitly intended for distribution** — the CVF open access repository exists specifically to make this research accessible

**Usage guidance:**

- Keep `maxRequestRetries` low (the default is `3`) to avoid hammering the server with excessive retries
- CVF's open access terms permit non-commercial research use of paper metadata
- The full PDF content of papers is copyrighted by authors/IEEE — downloading PDFs at scale may require separate permissions
- This actor extracts metadata only by default; always check the CVF terms of service and your organization's policies before commercial use

We do not encourage scraping beyond your legitimate research needs.

***

### ❓ FAQ

**Q: Does this require a proxy?**
No. openaccess.thecvf.com has no anti-bot measures. The actor uses plain HTTP requests — no residential proxy needed.

**Q: How many papers are available total?**
Approximately 100,000 CVPR papers (2013–2025), 40,000 ICCV papers, and 15,000 WACV papers — around 155,000 total across all supported conferences.

**Q: Can I scrape ECCV (European Conference on Computer Vision)?**
ECCV is not hosted on openaccess.thecvf.com — it's organized separately. This actor covers only CVF-sponsored conferences (CVPR, ICCV, WACV).

**Q: Why does ICCV only have odd years?**
ICCV is a biennial conference held in odd-numbered years. CVPR is annual, and WACV proceedings on CVF open access are available annually from 2020 onward.

**Q: What happens if I specify a year that isn't available?**
The actor logs a warning and skips that conference/year combination. Other valid combinations continue normally.

**Q: Is abstract fetching significantly more expensive?**
The PPE price per paper is the same whether or not you fetch abstracts. But with `includeAbstract: true`, the run takes longer because of extra HTTP requests — roughly 10× longer for large batches.

**Q: Can I get papers from a specific research topic or keyword?**
CVF open access doesn't have a filtering API — papers are listed alphabetically. Use `maxResults` to cap output, then filter locally by title or abstract content.
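
Local filtering of this kind can be sketched as below. `filter_by_keywords` is a hypothetical helper that does plain case-insensitive substring matching on `title` and, when present, `abstract`.

```python
def filter_by_keywords(items, keywords):
    """Keep papers whose title or abstract mentions any keyword (case-insensitive)."""
    needles = [keyword.lower() for keyword in keywords]
    matches = []
    for item in items:
        # abstract is null unless includeAbstract was enabled, so fall back to ""
        text = f"{item.get('title') or ''} {item.get('abstract') or ''}".lower()
        if any(needle in text for needle in needles):
            matches.append(item)
    return matches
```

Substring matching is deliberately simple: a keyword such as `"nerf"` also matches longer words that contain it, so stricter matching may be needed for short terms.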

**Q: The actor seems to have duplicate papers — why?**
This shouldn't happen in normal operation. Each paper appears once per listing page. If you scrape the same conference/year in multiple runs, you'll get duplicates across datasets — deduplicate by `paperUrl`.

***

### 🔗 Related actors

- **[ACL Anthology Scraper](https://apify.com/automation-lab/acl-anthology-scraper)** — Extract NLP and computational linguistics papers from aclanthology.org (ACL, EMNLP, NAACL, EACL, COLING, and 50+ workshops)
- **[ArXiv Scraper](https://apify.com/automation-lab/arxiv-scraper)** — Search and download paper metadata from arXiv.org across all subject areas — ideal for tracking preprints that correspond to CVF papers
- **[Semantic Scholar Scraper](https://apify.com/automation-lab/semantic-scholar-scraper)** — Retrieve citation counts, author profiles, and paper abstracts from Semantic Scholar's AI-powered research database

***

### 🐛 Issues and feedback

Found a bug or have a feature request? Open an issue on GitHub or contact us through the Apify support channel. We actively maintain this actor and welcome pull requests.

Common issues:

- **Missing papers for a specific year**: Verify the year is in the supported range for that conference
- **Slow run with abstracts**: Abstract mode fetches one extra page per paper — expected behavior
- **arXiv URL missing**: Not all CVF papers have arXiv preprints; those fields will be `null`

# Actor input Schema

## `conferences` (type: `array`):

Which conferences to scrape. Options: CVPR, ICCV, WACV. Leave empty to default to CVPR.

## `years` (type: `array`):

Which years to scrape. CVPR: 2013–2025. ICCV: 2013–2025 (odd years). WACV: 2020–2026.

## `maxResults` (type: `integer`):

Maximum number of papers to return across all selected conferences and years.

## `includeAbstract` (type: `boolean`):

Fetch the full abstract for each paper (requires one extra HTTP request per paper, so runs are slower; the per-paper price is unchanged).

## `maxRequestRetries` (type: `integer`):

Number of retry attempts for failed HTTP requests.

## Actor input object example

```json
{
  "conferences": [
    "CVPR"
  ],
  "years": [
    2024
  ],
  "maxResults": 20,
  "includeAbstract": false,
  "maxRequestRetries": 3
}
```

# Actor output Schema

## `overview` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "conferences": [
        "CVPR"
    ],
    "years": [
        2024
    ],
    "maxResults": 20,
    "includeAbstract": false,
    "maxRequestRetries": 3
};

// Run the Actor and wait for it to finish
const run = await client.actor("automation-lab/cvf-papers-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "conferences": ["CVPR"],
    "years": [2024],
    "maxResults": 20,
    "includeAbstract": False,
    "maxRequestRetries": 3,
}

# Run the Actor and wait for it to finish
run = client.actor("automation-lab/cvf-papers-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "conferences": [
    "CVPR"
  ],
  "years": [
    2024
  ],
  "maxResults": 20,
  "includeAbstract": false,
  "maxRequestRetries": 3
}' |
apify call automation-lab/cvf-papers-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=automation-lab/cvf-papers-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "CVF Papers Scraper",
        "description": "Scrape research papers from openaccess.thecvf.com (CVPR, ICCV, WACV)",
        "version": "0.1",
        "x-build-id": "dLVMsTJgCHQFK7tnL"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/automation-lab~cvf-papers-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-automation-lab-cvf-papers-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/automation-lab~cvf-papers-scraper/runs": {
            "post": {
                "operationId": "runs-sync-automation-lab-cvf-papers-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/automation-lab~cvf-papers-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-automation-lab-cvf-papers-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "conferences": {
                        "title": "📚 Conferences",
                        "type": "array",
                        "description": "Which conferences to scrape. Options: CVPR, ICCV, WACV. Leave empty to default to CVPR.",
                        "default": [
                            "CVPR"
                        ],
                        "items": {
                            "type": "string"
                        }
                    },
                    "years": {
                        "title": "📅 Years",
                        "type": "array",
                        "description": "Which years to scrape. CVPR: 2013–2025. ICCV: 2013–2025 (odd years). WACV: 2020–2026.",
                        "default": [
                            2024
                        ]
                    },
                    "maxResults": {
                        "title": "Max results",
                        "minimum": 1,
                        "type": "integer",
                        "description": "Maximum number of papers to return across all selected conferences and years.",
                        "default": 500
                    },
                    "includeAbstract": {
                        "title": "Include abstract",
                        "type": "boolean",
                        "description": "Fetch the full abstract for each paper (requires one extra HTTP request per paper — slower and more expensive).",
                        "default": false
                    },
                    "maxRequestRetries": {
                        "title": "Max request retries",
                        "minimum": 1,
                        "maximum": 10,
                        "type": "integer",
                        "description": "Number of retry attempts for failed HTTP requests.",
                        "default": 3
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
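
The `inputSchema` above can be checked on the client side before a run is started. The following is a minimal sketch, not part of the Actor itself. The helper name `build_input` and the `SUPPORTED_YEARS` table are hypothetical. The year ranges are copied from the `years` description, and the rule that every requested year must be covered by at least one selected conference is an assumption, since the schema does not say how the Actor treats unsupported combinations.

```python
from typing import Any, Iterable, Optional

# Year ranges copied from the "years" description in the input schema.
SUPPORTED_YEARS = {
    "CVPR": set(range(2013, 2026)),     # 2013-2025
    "ICCV": set(range(2013, 2026, 2)),  # 2013-2025, odd years only
    "WACV": set(range(2020, 2027)),     # 2020-2026
}


def build_input(
    conferences: Optional[Iterable[str]] = None,
    years: Optional[Iterable[int]] = None,
    max_results: int = 500,
    include_abstract: bool = False,
    max_request_retries: int = 3,
) -> dict[str, Any]:
    """Return an input object shaped like `inputSchema` (hypothetical helper)."""
    # The schema says an empty conference list defaults to CVPR.
    conferences = list(conferences) if conferences else ["CVPR"]
    # Schema default for years is [2024].
    years = list(years) if years is not None else [2024]

    unknown = [c for c in conferences if c not in SUPPORTED_YEARS]
    if unknown:
        raise ValueError(f"Unknown conferences: {unknown}")

    # Assumption: each year must be valid for at least one selected conference.
    for year in years:
        if not any(year in SUPPORTED_YEARS[c] for c in conferences):
            raise ValueError(f"Year {year} is not listed for {conferences}")

    # "minimum": 1 for maxResults.
    if max_results < 1:
        raise ValueError("maxResults must be at least 1")

    # "minimum": 1 and "maximum": 10 for maxRequestRetries.
    if not 1 <= max_request_retries <= 10:
        raise ValueError("maxRequestRetries must be between 1 and 10")

    return {
        "conferences": conferences,
        "years": years,
        "maxResults": max_results,
        "includeAbstract": include_abstract,
        "maxRequestRetries": max_request_retries,
    }


if __name__ == "__main__":
    print(build_input(["CVPR", "WACV"], [2024], max_results=50))
```

The returned dictionary should be usable as the run input, for example as the `run_input` argument of `ActorClient.call()` in the Python `apify-client` package. Validating locally is only a convenience: the Actor remains the authority on which inputs it accepts.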
