VLDB Scraper

Under maintenance

Pricing

Pay per usage

Scrape academic papers from VLDB (Very Large Data Bases) proceedings, one of the top-tier venues in database research. Ideal for literature reviews, citation analysis, research trend tracking, and building academic datasets.

Developer

Iyadh Khan

Maintained by Community

VLDB Papers Scraper - Academic Research Data Extractor

An Apify Actor that scrapes academic papers from VLDB (Very Large Data Bases) proceedings — one of the top-tier venues in database research. Ideal for literature reviews, citation analysis, research trend tracking, and building academic datasets.

What it does

For each paper across the specified VLDB volumes, the scraper extracts:

  • Volume - PVLDB volume number
  • Year - Corresponding publication year(s)
  • Title - Paper title
  • Authors - Author names
  • Abstract - Paper abstract
  • PDF Link - Direct link to the PDF
  • Artifact Available - Whether the paper has associated artifacts
  • URL - Link to the paper page

Input

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| volumes | string[] | Select PVLDB volumes from a dropdown (Volumes 1–20, 2008–present) | ["17", "18", "19"] |
| max_workers | integer | Parallel Chrome instances (1–4). Use 1–2 with 4 GB of memory, 3–4 with 8 GB or more. | 2 |
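
The defaults and ranges in the table can be checked before a run is started. The following is a minimal sketch, assuming a hypothetical helper named `normalize_input` that is not part of the Actor itself; it only encodes the limits stated above.

```python
from typing import Any

DEFAULT_VOLUMES = ["17", "18", "19"]
DEFAULT_MAX_WORKERS = 2
VALID_VOLUMES = {str(n) for n in range(1, 21)}  # Volumes 1-20, per the table


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: apply the documented defaults and range checks."""
    volumes = raw.get("volumes") or DEFAULT_VOLUMES
    unknown = [v for v in volumes if str(v) not in VALID_VOLUMES]
    if unknown:
        raise ValueError(f"Unknown PVLDB volumes: {unknown}")

    max_workers = int(raw.get("max_workers", DEFAULT_MAX_WORKERS))
    if not 1 <= max_workers <= 4:
        raise ValueError("max_workers must be between 1 and 4")

    return {"volumes": [str(v) for v in volumes], "max_workers": max_workers}
```

Calling `normalize_input({})` returns the default input, and an out-of-range `max_workers` raises `ValueError` instead of being silently accepted.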

Available volumes

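| Volume | Year(s) |
| --- | --- |
| 1 | 2008 |
| 2 | 2009 |
| 3 | 2010 |
| 4 | 2010–2011 |
| 5 | 2011–2012 |
| 6 | 2012–2013 |
| 7 | 2013–2014 |
| 8 | 2014–2015 |
| 9 | 2015–2016 |
| 10 | 2016–2017 |
| 11 | 2017–2018 |
| 12 | 2018–2019 |
| 13 | 2019–2020 |
| 14 | 2020–2021 |
| 15 | 2021–2022 |
| 16 | 2022–2023 |
| 17 | 2023–2024 |
| 18 | 2024–2025 |
| 19 | 2025–2026 |
| 20 | 2026–2027 |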
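
The volume-to-year mapping follows a regular pattern, so it can be computed rather than looked up. This is a sketch inferred from the table only; the function name `volume_years` is hypothetical, and the Actor may well store the mapping differently.

```python
def volume_years(volume: int) -> str:
    """Return the publication year(s) for a PVLDB volume, per the table."""
    if not 1 <= volume <= 20:
        raise ValueError("volume must be between 1 and 20")
    if volume <= 3:
        # Volumes 1-3 each map to a single year: 2008, 2009, 2010.
        return str(2007 + volume)
    # From Volume 4 on, each volume spans two years (Volume 4 is 2010-2011).
    start = 2006 + volume
    return f"{start}–{start + 1}"
```

The returned string uses the same en dash as the `year` field in the output, so the two can be compared directly.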

Example input

{
  "volumes": ["17", "18", "19"],
  "max_workers": 2
}

Output

Results are stored in the default dataset. Each item has this structure:

{
  "volume": 18,
  "year": "2024–2025",
  "title": "Example Paper Title",
  "authors": "Author One, Author Two",
  "abstract": "This paper presents...",
  "pdfLink": "https://www.vldb.org/pvldb/vol18/example.pdf",
  "artifactAvailable": true,
  "url": "https://www.vldb.org/pvldb/vol18/paper/example"
}
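
Once downloaded, items with this structure are easy to post-process. The sketch below uses two hypothetical helpers, `split_authors` and `artifact_counts`. It assumes author names are separated by commas and contain none themselves, which may not hold for every paper.

```python
from collections import Counter
from typing import Any


def split_authors(item: dict[str, Any]) -> list[str]:
    """Split the comma-separated "authors" string into individual names."""
    return [name.strip() for name in item.get("authors", "").split(",") if name.strip()]


def artifact_counts(items: list[dict[str, Any]]) -> Counter:
    """Count, per volume, the papers whose artifactAvailable flag is true."""
    return Counter(item["volume"] for item in items if item.get("artifactAvailable"))


# Example item copied from the output structure above (abridged).
example = {
    "volume": 18,
    "year": "2024–2025",
    "title": "Example Paper Title",
    "authors": "Author One, Author Two",
    "artifactAvailable": True,
}

authors = split_authors(example)
counts = artifact_counts([example])
```

For the example item, `authors` holds the two names as separate strings and `counts[18]` is 1.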

How it works

  1. Visits each selected VLDB volume index page and collects all paper links
  2. Splits papers across parallel Chrome workers (configurable, 1–4)
  3. Each worker scrapes its assigned papers using Selenium + BeautifulSoup
  4. Pushes each paper's data to the Apify dataset
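
Steps 2 and 3 can be illustrated with a small sketch. How the Actor actually divides the work is not documented here, so the round-robin split and the thread pool are assumptions. `scrape_one` is a stand-in for the Selenium + BeautifulSoup code that handles a single paper page.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")


def split_round_robin(links: list[str], workers: int) -> list[list[str]]:
    """Distribute paper links across workers, one link at a time."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return [links[i::workers] for i in range(workers)]


def scrape_all(links: list[str], workers: int, scrape_one: Callable[[str], T]) -> list[T]:
    """Run scrape_one over every link, one chunk of links per worker thread."""
    chunks = split_round_robin(links, workers)

    def run_chunk(chunk: list[str]) -> list[T]:
        # In a real worker, one browser instance would be reused for the chunk.
        return [scrape_one(url) for url in chunk]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_worker = list(pool.map(run_chunk, chunks))

    # Results come back grouped by worker, not in the original link order.
    return [item for chunk_result in per_worker for item in chunk_result]
```

A round-robin split keeps the chunks within one link of each other in size, which matters when the worker count is small, as with the 1–4 range here.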

Getting started

On Apify platform

  1. Go to the Actor's page on the Apify platform
  2. Select the volumes you want to scrape and set concurrency
  3. Click Start and wait for the run to finish
  4. Download the results from the Dataset tab (JSON, CSV, Excel, etc.)
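
The same steps can also be scripted. The sketch below assumes the `apify-client` Python package. The actor ID and API token are placeholders that must be filled in, and `run_scraper` is a hypothetical wrapper, not something the Actor provides.

```python
from typing import Any, Iterator


def run_scraper(
    client: Any, actor_id: str, volumes: list[str], max_workers: int = 2
) -> Iterator[dict]:
    """Start a run, wait for it to finish, and yield the dataset items.

    `client` is expected to behave like an apify_client.ApifyClient instance.
    """
    run_input = {"volumes": volumes, "max_workers": max_workers}
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("The Actor run did not return a result")
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()


# Usage (requires `pip install apify-client` and an Apify API token):
#   from apify_client import ApifyClient
#   client = ApifyClient("<YOUR_APIFY_TOKEN>")
#   for paper in run_scraper(client, "<ACTOR_ID>", ["17", "18", "19"]):
#       print(paper["title"])
```

Passing the client in as an argument keeps the function testable without network access.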

Local development

Requires Chrome/Chromium and Chromedriver installed locally.

$ apify run