Pricing

from $0.30 / 1,000 results

Common Crawl Index Scraper

Scrape Common Crawl index metadata. Extract domain coverage, crawl dates, page counts, and segment info.

Rating

0.0

(0)

Developer

Donny Nguyen

Last modified

4 days ago

What does Common Crawl Index Scraper do?

Common Crawl Index Scraper extracts metadata from the Common Crawl index: domain coverage, crawl dates, page counts, and segment info. It runs on the Apify platform and delivers structured data in JSON, CSV, or Excel format, ready for analysis, integration, or automation workflows. Pagination, retries, and proxy rotation are handled automatically, so you can focus on using the data.
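The listing does not say how the actor implements its retries. As a rough illustration of the general technique only, here is a minimal retry-with-exponential-backoff helper using the Python standard library. The name `fetch_with_retries` and its parameters are hypothetical and are not part of the actor or the Apify API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retries(
    fetch: Callable[[str], T],
    url: str,
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(url), retrying with exponential backoff on network errors.

    Hypothetical helper, not the actor's actual implementation. Waits
    base_delay, 2 * base_delay, 4 * base_delay, ... between attempts and
    re-raises the last error once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except OSError:  # urllib's URLError/HTTPError and timeouts subclass OSError
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

For example, `fetch_with_retries(lambda u: urllib.request.urlopen(u, timeout=30).read(), "https://index.commoncrawl.org/collinfo.json")` would fetch the default input URL, waiting 1 s, 2 s, then 4 s between failed attempts. Passing `sleep` in as a parameter keeps the helper testable without real delays.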

Why use Common Crawl Index Scraper?

No coding required — configure inputs in a simple web UI and click Start
Export anywhere — download results as JSON, CSV, or Excel, or connect via API
Scheduled runs — set up recurring scrapes to keep your data fresh (hourly, daily, weekly)
Scalable — process hundreds or thousands of items with automatic proxy rotation and retry logic
Integrations — connect to Google Sheets, Slack, Zapier, Make, webhooks, and more through the Apify platform

How to use Common Crawl Index Scraper

Navigate to the Common Crawl Index Scraper page on Apify Store and click Try for free
Configure your input parameters (see Input Configuration below)
Click Start and wait for the run to complete
View results in the Output tab — use the formatted table or switch to raw JSON
Download your data as JSON, CSV, or Excel, or access it via the Apify API
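Datasets can also be downloaded programmatically. The Apify API exposes dataset items at `https://api.apify.com/v2/datasets/{datasetId}/items`, with a `format` query parameter that accepts values such as `json`, `csv`, and `xlsx`. The sketch below only builds such a download URL; the helper name, the format mapping, and the example dataset ID are illustrative assumptions, and authenticating the request with your API token is left out.

```python
from urllib.parse import quote, urlencode

API_BASE = "https://api.apify.com/v2"

# Maps the export options named above (JSON, CSV, Excel) to API format values.
EXPORT_FORMATS = {"json": "json", "csv": "csv", "excel": "xlsx"}


def dataset_export_url(dataset_id: str, fmt: str = "json") -> str:
    """Build a URL for downloading a dataset's items in the given format (hypothetical helper)."""
    try:
        api_format = EXPORT_FORMATS[fmt.lower()]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None
    query = urlencode({"format": api_format})
    return f"{API_BASE}/datasets/{quote(dataset_id, safe='')}/items?{query}"
```

`dataset_export_url("abc123", "excel")` returns `https://api.apify.com/v2/datasets/abc123/items?format=xlsx`; `abc123` is a placeholder for a real dataset ID from your run.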

Input configuration

Field	Type	Description	Default
URLs	`array`	Common Crawl index pages to scrape	["https://index.commoncrawl.org/collinfo.json"]
Max Results	`integer`	Maximum number of results	100
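The table lists display labels rather than the actual JSON input keys. The sketch below assumes hypothetical keys `urls` and `maxResults`, fills in the defaults from the table, and performs basic type checks before the input is sent to a run; check the actor's input schema on Apify Store for the real field names.

```python
from __future__ import annotations

import json

# Hypothetical input keys: the table shows display labels ("URLs",
# "Max Results"), not the actor's real JSON field names.
DEFAULT_INPUT = {
    "urls": ["https://index.commoncrawl.org/collinfo.json"],
    "maxResults": 100,
}


def build_input(urls: list[str] | None = None, max_results: int | None = None) -> dict:
    """Merge overrides into the table's defaults and check basic types."""
    run_input = {"urls": list(DEFAULT_INPUT["urls"]), "maxResults": DEFAULT_INPUT["maxResults"]}
    if urls is not None:
        run_input["urls"] = list(urls)
    if max_results is not None:
        run_input["maxResults"] = max_results
    if not run_input["urls"] or not all(isinstance(u, str) for u in run_input["urls"]):
        raise ValueError("urls must be a non-empty list of strings")
    n = run_input["maxResults"]
    if isinstance(n, bool) or not isinstance(n, int) or n < 1:
        raise ValueError("maxResults must be a positive integer")
    return run_input
```

`json.dumps(build_input(max_results=500))` produces `{"urls": ["https://index.commoncrawl.org/collinfo.json"], "maxResults": 500}`.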

Output data

The actor stores results in a dataset. Each item in the dataset represents one extracted record with structured fields. You can preview the data in the Output tab's formatted table view.

Key output fields include `name`, `timegate`, `scrapedAt`, `id`, and `cdxApi` (shown in the table view as Name, Timegate, Scraped At, ID, and Cdx API).

Example output:

{
  "name": "Example Name",
  "timegate": "Example Timegate",
  "scrapedAt": "2026-01-15T10:30:00Z",
  "id": "Example ID",
  "cdxApi": "Example Cdx API"
}
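Common Crawl publishes its list of crawl indexes at the default input URL, `https://index.commoncrawl.org/collinfo.json`, as a JSON array whose entries carry keys such as `id`, `name`, `timegate`, and `cdx-api`. The sketch below shows one way records of the shape above could be derived from that file. The helpers `fetch_collinfo` and `to_records` and the key-to-field mapping are assumptions for illustration, not the actor's documented implementation.

```python
from __future__ import annotations

import json
import urllib.request
from datetime import datetime, timezone

# Default input URL from the input table: Common Crawl's list of crawl indexes.
COLLINFO_URL = "https://index.commoncrawl.org/collinfo.json"


def fetch_collinfo(url: str = COLLINFO_URL) -> list[dict]:
    """Download and parse collinfo.json (performs a network request)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def to_records(collinfo: list[dict], max_results: int = 100, scraped_at: str | None = None) -> list[dict]:
    """Map collinfo.json entries onto the output fields (assumed mapping)."""
    if scraped_at is None:
        # UTC timestamp such as 2026-01-15T10:30:00Z
        scraped_at = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return [
        {
            "name": entry.get("name"),
            "timegate": entry.get("timegate"),
            "scrapedAt": scraped_at,
            "id": entry.get("id"),
            "cdxApi": entry.get("cdx-api"),  # collinfo.json uses a hyphenated key
        }
        for entry in collinfo[:max_results]
    ]
```

Calling `to_records(fetch_collinfo(), max_results=100)` would return up to 100 records from the live index list. Reading keys with `dict.get` leaves absent fields as `None` instead of failing, in case the set of keys in `collinfo.json` changes over time.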

Each run also produces an execution log with detailed information about pages processed, items extracted, and any errors encountered.

Cost of usage

Common Crawl Index Scraper uses Pay-Per-Event pricing (Utility tier). Each successfully extracted result costs approximately $0.0003 ($0.30 per 1,000 results).

On the free Apify plan ($5/month in platform credit), that works out to approximately 16,666 results per month ($5 ÷ $0.0003 per result), assuming the whole credit goes to result charges.

Example: Extracting 1,000 results would cost approximately $0.30.
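The figures above follow from simple arithmetic: $0.30 per 1,000 results is $0.0003 per result, and $5 ÷ $0.0003 ≈ 16,666. A minimal Python sketch of that calculation, using `Decimal` to avoid floating-point rounding; it assumes the flat $0.30 per 1,000 rate applies and ignores any other platform charges, which the pricing text does not itemize.

```python
from decimal import Decimal

# Rate from the pricing above; "from $0.30" suggests it may vary, so treat it as an assumption.
PRICE_PER_1000 = Decimal("0.30")
PRICE_PER_RESULT = PRICE_PER_1000 / 1000  # Decimal("0.0003")


def cost_for(results: int) -> Decimal:
    """Estimated charge in USD for a number of extracted results."""
    return PRICE_PER_RESULT * results


def results_for_budget(budget: Decimal) -> int:
    """Whole results a USD budget covers, ignoring any other platform charges."""
    return int(budget // PRICE_PER_RESULT)


print(cost_for(1000))                     # 0.3000
print(results_for_budget(Decimal("5")))  # 16666
```

With floats, `0.30 / 1000` is only approximately 0.0003; `Decimal` keeps the per-result rate exact.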

Tips and advanced usage

Proxy configuration: This actor uses lightweight HTTP requests for fast, efficient scraping. For sites with rate limiting, the actor automatically rotates proxies.
Large datasets: For runs with thousands of results, increase the memory allocation in Run Options to speed up processing. The actor automatically manages request queues and pagination.
Scheduled runs: Use Apify Schedules to run this actor on a recurring basis. Combined with integrations (webhooks, Google Sheets, Slack), you can build automated data pipelines that keep your datasets up to date.
API access: Every dataset is accessible via the Apify API. Use the REST API or official Python/JavaScript clients to integrate results directly into your applications.
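For large datasets it can be convenient to read items in pages. The Apify API's dataset items endpoint accepts `offset` and `limit` query parameters for this. The sketch below shows that pattern with the standard library. The helper names are hypothetical, the dataset ID and token are placeholders you supply, and it assumes the API accepts the token as a `Bearer` value in the `Authorization` header. The official Python and JavaScript clients provide their own helpers for listing dataset items.

```python
import json
import urllib.request
from typing import Callable, Iterator
from urllib.parse import quote, urlencode

API_BASE = "https://api.apify.com/v2"


def items_page_url(dataset_id: str, offset: int, limit: int) -> str:
    """URL for one page of a dataset's items, as JSON."""
    query = urlencode({"format": "json", "offset": offset, "limit": limit})
    return f"{API_BASE}/datasets/{quote(dataset_id, safe='')}/items?{query}"


def make_http_get_page(dataset_id: str, token: str) -> Callable[[int, int], list]:
    """Return a page fetcher that calls the Apify API (assumes Bearer auth)."""
    def get_page(offset: int, limit: int) -> list:
        req = urllib.request.Request(
            items_page_url(dataset_id, offset, limit),
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(req, timeout=60) as resp:
            return json.load(resp)
    return get_page


def iterate_items(get_page: Callable[[int, int], list], page_size: int = 1000) -> Iterator[dict]:
    """Yield items page by page, stopping after a short or empty page."""
    if page_size < 1:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = get_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            return
        offset += page_size
```

A typical loop would be `for item in iterate_items(make_http_get_page("<DATASET_ID>", "<API_TOKEN>")): ...`. Injecting `get_page` keeps the pagination logic separate from the HTTP call, so it can be tested with an in-memory list.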

Useful Links

Related Actors:

Air Quality Index Scraper

consummate_mandala/air-quality-index-scraper

Donny Nguyen

Uv Index Scraper

consummate_mandala/uv-index-scraper

Donny Nguyen

Index Crypto Fear Greed Index — Prices, Volume & Market Data

tropical_quince/crypto-fear-greed-index

Index crypto fear greed index data at scale with this powerful Apify actor. Extracts prices, volume & market data with automatic pagination and proxy rotation. Perfect for market research, competitive intelligence, and data-driven decision making.

Donny Nguyen

XML Sitemap Checker

coder_luffy/xml-sitemap-checker

Verify if your website has a properly configured XML sitemap. Checks robots.txt and common paths, validates accessibility, XML structure, content type, and URL count — ensuring search engines can easily crawl and index your site.

Luffy

Competitor Analyzer

salman_bareesh/competitor-analyzer

Extract and analyze the top organic and paid competitors for any domain. This actor provides comprehensive competitor data including common keyword terms and competitive ranking scores.

Salman Bareesh

Consumer Price Index (Inflation) Actor

harvest/consumer-price-index-inflation-actor

Consumer Price Index API / Inflation API, delivering current and historical CPI data for various consumer goods.

Harvest Data

Keywords Extractor

lukaskrivka/keywords-extractor

Use our free website keyword extractor to crawl any website and extract keyword counts on each page.

Lukáš Křivka

Crawl Documentation Site — Data, Details & Metadata

tropical_quince/documentation-site-crawler

Crawl documentation site data at scale with this powerful Apify actor. Extracts data, details & metadata with automatic pagination and proxy rotation. Perfect for market research, competitive intelligence, and data-driven decision making.

Donny Nguyen

TIOBE Index Scraper

rmkarymshakov/tiobe-index-scraper

Scrapes TIOBE Index for top 20 programming languages, extracting current and previous year's positions, names, ratings, and rating changes. Stores structured data in a dataset for easy analysis. Ideal for developers, researchers, and analysts tracking language popularity trends over time.

Rakhman Karymshakov

Crawl Wordpress Blog — Headlines, Content & Dates

tropical_quince/wordpress-blog-crawler

Crawl wordpress blog data at scale with this powerful Apify actor. Extracts headlines, content & dates with automatic pagination and proxy rotation. Perfect for market research, competitive intelligence, and data-driven decision making.

Donny Nguyen