worldometers-population-scraper


Pricing: from $40.00 / 1,000 results

Rating: 0.0 (0)

Developer: Miraç Birben


Last modified: a month ago

Worldometers Population Scraper (Python, Crawlee, Playwright)

This Apify Actor collects live population and related demographic data (births, deaths, population growth, etc.) from the Worldometers.info website. It uses Crawlee's PlaywrightCrawler to render the page in a real browser, so it can read dynamically loaded content and counters that are updated by JavaScript.

Features

Live Counter Data: Extracts current world population, births today and this year, deaths today and this year, and population growth from Worldometers.info.
Playwright Automation: Renders JavaScript-driven pages in a browser, so dynamically loaded data is available for extraction.
Intelligent Data Extraction: Uses robust XPath and CSS selectors to identify counter values based on their labels.
Data Cleaning and Transformation: Converts the extracted counter text (for example, comma-separated digits) into integers.
Error Tolerance: Continues processing even if counters load with a delay.
Proxy Support: Supports Apify Proxy configuration to prevent IP blocking.
Optimized Crawling: Improves performance by blocking unnecessary resources (images, media, fonts, stylesheets).
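The Error Tolerance behaviour above can be pictured as polling a counter until it shows a value, instead of failing on the first empty read. The following is a minimal sketch under assumptions, not the Actor's actual code: `wait_for_counter`, `timeout` and `interval` are hypothetical names, and `read_text` stands in for something like a Playwright locator's `inner_text` method.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def wait_for_counter(
    read_text: Callable[[], Awaitable[str]],
    timeout: float = 10.0,
    interval: float = 0.5,
) -> Optional[str]:
    """Hypothetical helper: poll read_text() until it returns non-blank text.

    Returns the stripped text, or None once the timeout expires, so the caller
    can store a null field and keep processing the remaining counters.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        text = (await read_text()).strip()
        if text:
            return text
        if loop.time() >= deadline:
            return None
        # The counter is still empty (e.g. not yet filled in by JavaScript); retry shortly.
        await asyncio.sleep(interval)
```

Inside a request handler it might be called as `await wait_for_counter(page.locator(selector).inner_text)`, where `selector` is whatever the Actor uses for that counter.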

Quick Start

Follow these steps to run this Actor on the Apify platform or develop it locally.

Install Dependencies

Before running the project locally, you need to install the Python dependencies:

pip install -r requirements.txt

Run the Actor

Run Locally:

After cloning the project, start the Actor locally by running the following command in the project's root directory (this requires the Apify CLI):

apify run

Deploy to Apify Console:

To upload the Actor to the Apify Console, follow these steps:

Log in to your Apify account:

apify login

Push the Actor to the Apify platform:

apify push

Input Configuration

The Actor expects a JSON object with the following structure:

{
  "start_urls": [
    {
      "url": "https://www.worldometers.info/world-population/"
    }
  ],
  "maxRequestsPerCrawl": 10,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

start_urls (array of objects): A list of URLs to start scraping from. Each object must have a url property.

maxRequestsPerCrawl (integer, default: 10): The maximum number of pages the crawler will visit. This value can usually stay low, because the Worldometers data is on a single main page.

proxyConfiguration (object, default: {"useApifyProxy": true}): Proxy settings. Set "useApifyProxy": true to use Apify Proxy.
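How these fields and their documented defaults fit together can be sketched as a small normalization step. This is an illustration under assumptions, not the Actor's code: `resolve_input` and the snake_case keys it returns are hypothetical, and raising an error for an empty `start_urls` is a choice made for the sketch, since no default start URL is documented.

```python
from typing import Any, Optional


def resolve_input(actor_input: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical helper: validate start_urls and apply the documented defaults."""
    data = actor_input or {}
    # Keep only entries that are objects with a non-empty "url" property.
    start_urls = [
        entry["url"]
        for entry in data.get("start_urls") or []
        if isinstance(entry, dict) and entry.get("url")
    ]
    if not start_urls:
        raise ValueError("start_urls must contain at least one object with a url property")
    return {
        "start_urls": start_urls,
        # Documented default: at most 10 pages.
        "max_requests_per_crawl": int(data.get("maxRequestsPerCrawl", 10)),
        # Documented default: use Apify Proxy.
        "proxy_configuration": data.get("proxyConfiguration", {"useApifyProxy": True}),
    }
```

On the platform, the dictionary passed in would typically come from `await Actor.get_input()` in the Apify SDK for Python.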

Output Data

The Actor pushes JSON objects with the following structure to the Apify Dataset:

{
  "Current World Population": 8123456789,
  "Births today": 123456,
  "Deaths today": 56789,
  "Population Growth today": 66667,
  "Births this year": 12345678,
  "Deaths this year": 5678901,
  "Population Growth this year": 6666777,
  "source_url": "https://www.worldometers.info/world-population/",
  "scraped_at": "now"
}

All population and counter fields (Current World Population, Births today, etc.) are integers, or null if the value could not be found.

source_url: The URL of the page from which the data was extracted.

scraped_at: The timestamp when the data was scraped.
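The int-or-null rule and the two metadata fields can be sketched as a record builder. This is a minimal sketch, assuming the raw counter texts have already been collected into a dict keyed by the field names from the example above; `parse_counter` and `build_record` are hypothetical names, and writing `scraped_at` as an ISO 8601 UTC timestamp is an assumption of the sketch.

```python
from datetime import datetime, timezone
from typing import Any, Mapping, Optional

# Field names taken from the output example above.
COUNTER_FIELDS = (
    "Current World Population",
    "Births today",
    "Deaths today",
    "Population Growth today",
    "Births this year",
    "Deaths this year",
    "Population Growth this year",
)


def parse_counter(text: Optional[str]) -> Optional[int]:
    """Turn counter text such as '8,123,456,789' into an int, or None if it is not a number."""
    if text is None:
        return None
    # Drop all whitespace (including non-breaking spaces) and thousands separators.
    cleaned = "".join(text.split()).replace(",", "")
    try:
        return int(cleaned)
    except ValueError:
        return None


def build_record(raw: Mapping[str, Optional[str]], source_url: str) -> dict[str, Any]:
    """Hypothetical helper: one dataset item with every counter field present."""
    record: dict[str, Any] = {field: parse_counter(raw.get(field)) for field in COUNTER_FIELDS}
    record["source_url"] = source_url
    record["scraped_at"] = datetime.now(timezone.utc).isoformat()
    return record
```

A record built this way could then be stored with `await Actor.push_data(record)`.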

Project Structure
.actor/
├── actor.json           # Actor configuration: name, version, environment variables, etc.
├── dataset_schema.json  # Schema for the output dataset
├── input_schema.json    # Schema for the Actor's input parameters and Console form
└── output_schema.json   # Schema for the Actor's overall output
src/
├── __init__.py          # Initialization file for the Python package
├── __main__.py          # Main entry point for the Actor (used by apify run/call)
├── main.py              # Contains the crawler logic and data extraction processes
└── py.typed             # Marks the package as typed (PEP 561) for type checkers such as mypy
storage/                 # Local storage (mirrors Apify Cloud during development)
├── datasets/            # Output items (as JSON objects)
├── key_value_stores/    # INPUT configuration and other files
└── request_queues/      # Pending crawl requests
AGENTS.md                # Notes for agents and developers regarding maintenance and logic
Dockerfile               # Definition of the Docker container image in which the Actor will run
README.md                # General information and usage guide for this project
requirements.txt         # List of Python dependencies

Maintenance and Development

Selector Updates: If the Worldometers.info website structure changes, the Playwright selectors within src/main.py (page.wait_for_selector, page.locator(xpath)) may need to be updated.
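Label-based XPath selectors are easiest to update when they are generated from the label text rather than hard-coded. The sketch below assumes a page structure in which each label is followed in document order by a `span` whose class contains `rts-counter`; that structure, the `counter_xpath` and `xpath_literal` helpers, and the `counter_class` parameter are all hypothetical and should be checked against the live page.

```python
def xpath_literal(text: str) -> str:
    """Quote text as an XPath 1.0 string literal, handling embedded quotes."""
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # Both quote types present: split on single quotes and rejoin with concat().
    parts = text.split("'")
    return "concat(" + ", \"'\", ".join(f"'{part}'" for part in parts) + ")"


def counter_xpath(label: str, counter_class: str = "rts-counter") -> str:
    """Hypothetical selector: first counter span that follows the element showing `label`."""
    return (
        f"//*[normalize-space(text())={xpath_literal(label)}]"
        f"/following::span[contains(@class, {xpath_literal(counter_class)})][1]"
    )
```

Because Playwright treats selectors that start with `//` as XPath, the result could be used directly, e.g. `await page.locator(counter_xpath("Births today")).inner_text()`. If the site changes, only the label text or `counter_class` would need adjusting.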

Performance: The resource blocking logic (page.route) is deliberately tuned. Keep these optimizations unless blocking a stylesheet or font prevents the counters from becoming visible.
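Resource blocking with `page.route` generally works by inspecting each intercepted request's resource type and either aborting or continuing it. A minimal sketch, assuming the blocked types are exactly those listed under Optimized Crawling (images, media, fonts, stylesheets); `BLOCKED_RESOURCE_TYPES`, `should_block` and `block_heavy_resources` are hypothetical names, not the Actor's code.

```python
from typing import Any

# Playwright resource types matching the list under "Optimized Crawling".
BLOCKED_RESOURCE_TYPES = frozenset({"image", "media", "font", "stylesheet"})


def should_block(resource_type: str) -> bool:
    return resource_type in BLOCKED_RESOURCE_TYPES


async def block_heavy_resources(route: Any) -> None:
    """Route handler: abort blocked resource types and let everything else through."""
    if should_block(route.request.resource_type):
        await route.abort()
    else:
        await route.continue_()
```

Registered with `await page.route("**/*", block_heavy_resources)` before navigation, the handler intercepts every request the page makes. If a counter stops rendering, removing `"stylesheet"` or `"font"` from the set is a simple first experiment.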

Debugging: During development, you can set headless=False when creating the crawler in src/main.py to watch the browser window while following the Actor.log.info messages.

Additional Resources

Apify SDK for Python documentation

Crawlee for Python documentation

Playwright Python documentation

Apify Platform documentation

Apify Developer Community (Discord)
