VLDB Scraper
Pricing: Pay per usage
Under maintenance
Developer: Iyadh Khan
VLDB Papers Scraper - Academic Research Data Extractor
An Apify Actor that scrapes academic papers from PVLDB, the proceedings of VLDB (Very Large Data Bases), one of the top-tier venues in database research. It is suited to literature reviews, citation analysis, research trend tracking, and building academic datasets.
What it does
For each paper across the specified VLDB volumes, the scraper extracts:
- Volume - PVLDB volume number
- Year - Corresponding publication year(s)
- Title - Paper title
- Authors - Author names
- Abstract - Paper abstract
- PDF Link - Direct link to the PDF
- Artifact Available - Whether the paper has associated artifacts
- URL - Link to the paper page
Input
| Field | Type | Description | Default |
|---|---|---|---|
| volumes | string[] | PVLDB volumes to scrape, chosen from a dropdown (volumes 1–20, 2008–present) | ["17", "18", "19"] |
| max_workers | integer | Number of parallel Chrome instances (1–4). Use 1–2 with 4 GB of memory, 3–4 with 8 GB or more. | 2 |
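If you prepare the input programmatically, you can mirror the rules from the input table before starting a run. The following is a minimal sketch. `normalize_input` is a hypothetical helper, not part of the Actor. Clamping an out-of-range `max_workers` to 1–4 instead of rejecting it is a choice made here, not documented Actor behavior.

```python
from typing import Any

# Defaults and limits taken from the input table.
DEFAULT_VOLUMES = ["17", "18", "19"]
DEFAULT_MAX_WORKERS = 2
VALID_VOLUMES = {str(v) for v in range(1, 21)}  # volumes 1-20


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Apply defaults and range checks to an Actor input dict (hypothetical helper)."""
    volumes = raw.get("volumes") or DEFAULT_VOLUMES
    # Accept ints or strings; drop duplicates while keeping the original order.
    seen: list[str] = []
    for v in volumes:
        s = str(v).strip()
        if s not in VALID_VOLUMES:
            raise ValueError(f"unknown PVLDB volume: {v!r}")
        if s not in seen:
            seen.append(s)
    workers = int(raw.get("max_workers", DEFAULT_MAX_WORKERS))
    # Clamp to the documented 1-4 range instead of failing.
    workers = max(1, min(4, workers))
    return {"volumes": seen, "max_workers": workers}


print(normalize_input({"volumes": [18, "17", "18"], "max_workers": 8}))
# -> {'volumes': ['18', '17'], 'max_workers': 4}
```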
Available volumes
| Volume | Year(s) |
|---|---|
| 1 | 2008 |
| 2 | 2009 |
| 3 | 2010 |
| 4 | 2010–2011 |
| 5 | 2011–2012 |
| 6 | 2012–2013 |
| 7 | 2013–2014 |
| 8 | 2014–2015 |
| 9 | 2015–2016 |
| 10 | 2016–2017 |
| 11 | 2017–2018 |
| 12 | 2018–2019 |
| 13 | 2019–2020 |
| 14 | 2020–2021 |
| 15 | 2021–2022 |
| 16 | 2022–2023 |
| 17 | 2023–2024 |
| 18 | 2024–2025 |
| 19 | 2025–2026 |
| 20 | 2026–2027 |
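The table follows a regular pattern. Volumes 1–3 each cover one calendar year, and every later volume spans two years starting at 2006 + volume. This pattern is inferred from the table above, not from VLDB documentation, and has only been checked for volumes 1–20. The sketch below derives the `year` label from the volume number, which can help when checking or filling that field.

```python
def pvldb_years(volume: int) -> str:
    """Return the year label for a PVLDB volume, following the table above."""
    if not 1 <= volume <= 20:
        raise ValueError(f"volume outside the documented range: {volume}")
    if volume <= 3:
        # Volumes 1-3 each map to a single year (2008, 2009, 2010).
        return str(2007 + volume)
    # From volume 4 on, each volume spans two years, e.g. 4 -> "2010-2011".
    start = 2006 + volume
    return f"{start}\u2013{start + 1}"  # en dash, as in the dataset's "year" field


print(pvldb_years(4), pvldb_years(18))
# -> 2010–2011 2024–2025
```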
Example input
{"volumes": ["17", "18", "19"], "max_workers": 2}
Output
Results are stored in the default dataset. Each item has this structure:
{"volume": 18, "year": "2024–2025", "title": "Example Paper Title", "authors": "Author One, Author Two", "abstract": "This paper presents...", "pdfLink": "https://www.vldb.org/pvldb/vol18/example.pdf", "artifactAvailable": true, "url": "https://www.vldb.org/pvldb/vol18/paper/example"}
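For downstream analysis, you can load each dataset item into a typed record. This is a minimal sketch that uses the field names from the example item above. `Paper` and `parse_item` are hypothetical names. The code assumes that author names never contain commas, because the Actor stores `authors` as a single comma-separated string.

```python
import json
from dataclasses import dataclass


@dataclass
class Paper:
    """One dataset item, with fields taken from the example output."""
    volume: int
    year: str
    title: str
    authors: list[str]
    abstract: str
    pdf_link: str
    artifact_available: bool
    url: str


def parse_item(item: dict) -> Paper:
    # Assumption: names never contain commas, so splitting on "," is safe.
    authors = [a.strip() for a in (item.get("authors") or "").split(",") if a.strip()]
    return Paper(
        volume=int(item["volume"]),
        year=item.get("year") or "",
        title=(item.get("title") or "").strip(),
        authors=authors,
        abstract=item.get("abstract") or "",
        pdf_link=item.get("pdfLink") or "",
        artifact_available=bool(item.get("artifactAvailable", False)),
        url=item.get("url") or "",
    )


raw = (
    '{"volume": 18, "year": "2024\u20132025", "title": "Example Paper Title", '
    '"authors": "Author One, Author Two", "abstract": "This paper presents...", '
    '"pdfLink": "https://www.vldb.org/pvldb/vol18/example.pdf", "artifactAvailable": true, '
    '"url": "https://www.vldb.org/pvldb/vol18/paper/example"}'
)
paper = parse_item(json.loads(raw))
print(paper.authors)
# -> ['Author One', 'Author Two']
```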
How it works
- Visits each selected VLDB volume index page and collects all paper links
- Splits papers across parallel Chrome workers (set by max_workers, 1–4)
- Each worker scrapes its assigned papers, using Selenium to load each page and BeautifulSoup to parse it
- Pushes each paper's data to the Apify dataset
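The sketch below illustrates how the split-and-scrape steps above could be organized. The README does not say how the Actor divides the papers, so the round-robin split and the thread pool are assumptions made here. `scrape_paper` is a hypothetical placeholder for the Selenium and BeautifulSoup step, and the URLs are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor


def split_round_robin(links: list[str], n_workers: int) -> list[list[str]]:
    """Deal links into buckets: link i goes to bucket i % n (assumed strategy)."""
    n = max(1, min(n_workers, len(links)))  # never more buckets than links
    buckets: list[list[str]] = [[] for _ in range(n)]
    for i, link in enumerate(links):
        buckets[i % n].append(link)
    return buckets


def scrape_paper(link: str) -> dict:
    # Hypothetical placeholder: the real step loads the page in Chrome
    # and parses title, authors, abstract, etc.
    return {"url": link}


def worker(links: list[str]) -> list[dict]:
    # In the Actor each worker owns one Chrome instance; here it only loops.
    return [scrape_paper(link) for link in links]


def scrape_all(links: list[str], max_workers: int) -> list[dict]:
    buckets = split_round_robin(links, max_workers)
    results: list[dict] = []
    with ThreadPoolExecutor(max_workers=len(buckets)) as pool:
        for batch in pool.map(worker, buckets):  # one batch per worker, in bucket order
            results.extend(batch)
    return results


print(split_round_robin(["p0", "p1", "p2", "p3", "p4"], 2))
# -> [['p0', 'p2', 'p4'], ['p1', 'p3']]
```

With this scheme, results are grouped by worker rather than kept in index-page order. If order matters, sort the records afterwards, for example by `url`.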
Getting started
On Apify platform
- Go to the Actor's page on the Apify platform
- Select the volumes you want to scrape and set the number of parallel workers (max_workers)
- Click Start and wait for the run to finish
- Download the results from the Dataset tab (JSON, CSV, Excel, etc.)
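For research trend tracking, a downloaded JSON export can be summarized per volume. A JSON export of an Apify dataset is one JSON array of items. This is a minimal sketch. `summarize` and `load_export` are hypothetical helpers, and any file name you pass in is your own choice.

```python
import json
from collections import Counter
from pathlib import Path


def load_export(path: str) -> list[dict]:
    # A JSON export of the dataset is a single array of paper items.
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize(items: list[dict]) -> dict[int, dict[str, int]]:
    """Count papers, and papers marked artifactAvailable, for each volume."""
    papers: Counter = Counter()
    with_artifacts: Counter = Counter()
    for item in items:
        vol = int(item["volume"])
        papers[vol] += 1
        if item.get("artifactAvailable"):
            with_artifacts[vol] += 1
    return {v: {"papers": papers[v], "artifacts": with_artifacts[v]} for v in sorted(papers)}


items = [
    {"volume": 18, "artifactAvailable": True},
    {"volume": 18, "artifactAvailable": False},
    {"volume": 17, "artifactAvailable": True},
]
print(summarize(items))
# -> {17: {'papers': 1, 'artifacts': 1}, 18: {'papers': 2, 'artifacts': 1}}
```

With a real export, call `summarize(load_export("your-export.json"))`.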
Local development
Requires Chrome or Chromium, a matching ChromeDriver, and the Apify CLI installed locally. Then run:
$ apify run
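Locally, `apify run` reads the Actor input from the project's local key-value store. Under the Apify CLI's default storage layout this is `storage/key_value_stores/default/INPUT.json`. Check this path against your CLI version. The sketch below writes that file from Python. `local_input_path` and `write_local_input` are hypothetical helpers.

```python
import json
from pathlib import Path


def local_input_path(project_dir: str = ".") -> Path:
    """Where the Apify CLI looks for local Actor input (assumed default layout)."""
    return Path(project_dir) / "storage" / "key_value_stores" / "default" / "INPUT.json"


def write_local_input(run_input: dict, project_dir: str = ".") -> Path:
    """Write run_input as INPUT.json so that a following `apify run` can read it."""
    target = local_input_path(project_dir)
    target.parent.mkdir(parents=True, exist_ok=True)  # create the storage folders if missing
    target.write_text(json.dumps(run_input, indent=2), encoding="utf-8")
    return target
```

For example, call `write_local_input({"volumes": ["18"], "max_workers": 1})` from the project directory. Then start the run with `apify run`. A single worker keeps memory use low on a development machine.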