worldometers-population-scraper

Pricing: from $40.00 / 1,000 results
Rating: 5.0 (1)
Developer: Miraç Birben (Maintained by Community)
Actor stats: 1 bookmarked · 2 total users · 1 monthly active user · last modified 16 hours ago

# Worldometers Population Scraper (Python, Crawlee, Playwright)

This Apify Actor is designed to collect live population and related demographic data (births, deaths, population growth, etc.) from the Worldometers.info website. It reliably scrapes dynamically loaded content and JavaScript-updated counters using PlaywrightCrawler.

## Features

- **Live Counter Data:** Extracts the current world population, births today and this year, deaths today and this year, and population growth from Worldometers.info.
- **Playwright Automation:** Seamlessly handles dynamically loaded data and JavaScript-rendered pages.
- **Intelligent Data Extraction:** Uses robust XPath and CSS selectors to identify counter values based on their labels.
- **Data Cleaning and Transformation:** Converts extracted counter strings into readable integers.
- **Error Tolerance:** Continues processing even if counters load with a delay.
- **Proxy Support:** Supports Apify Proxy configuration to prevent IP blocking.
- **Optimized Crawling:** Improves performance by blocking unnecessary resources (images, media, fonts, stylesheets).
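The label-based extraction can be sketched as a small helper that derives an XPath from a counter's visible label. This is an illustrative sketch only: the function name `counter_xpath` and the sibling axis it uses are assumptions about the page structure, not the selectors actually used in `src/main.py`.

```python
def counter_xpath(label: str) -> str:
    """Build an XPath for the counter element paired with a text label.

    Locating counters by their visible label keeps selectors usable even
    when the surrounding markup shifts. The sibling axis below is a guess
    at the page structure, not a verified selector.
    """
    return (
        f'//span[contains(normalize-space(.), "{label}")]'
        '/preceding-sibling::span[contains(@class, "counter")]'
    )


print(counter_xpath("Births today"))
```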

## Quick Start

Follow these steps to run this Actor on the Apify platform or develop it locally.

### Install Dependencies

Before running the project locally, you need to install the Python dependencies:

```bash
pip install -r requirements.txt
```
### Run the Actor

**Run locally:**

After cloning the project, you can start the Actor locally by running the following command in the root directory:

```bash
apify run
```

**Deploy to the Apify Console:**

To upload the Actor to the Apify Console, follow these steps:

1. Log in to your Apify account:

   ```bash
   apify login
   ```

2. Push the Actor to the Apify platform:

   ```bash
   apify push
   ```
## Input Configuration

The Actor expects a JSON object with the following structure:

```json
{
  "start_urls": [
    { "url": "https://www.worldometers.info/world-population/" }
  ],
  "maxRequestsPerCrawl": 10,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```
- `start_urls` (array of objects): A list of URLs to start scraping from. Each object must have a `url` property.
- `maxRequestsPerCrawl` (integer, default: `10`): The maximum number of pages the crawler will visit. This value can stay low, since the Worldometers data usually lives on a single main page.
- `proxyConfiguration` (object, default: `{"useApifyProxy": true}`): Proxy settings. Set `"useApifyProxy": true` to route requests through Apify Proxy.
## Output Data

The Actor pushes JSON objects with the following structure to the Apify Dataset:

```json
{
  "Current World Population": 8123456789,
  "Births today": 123456,
  "Deaths today": 56789,
  "Population Growth today": 66667,
  "Births this year": 12345678,
  "Deaths this year": 5678901,
  "Population Growth this year": 6666777,
  "source_url": "https://www.worldometers.info/world-population/",
  "scraped_at": "now"
}
```

- All population and counter fields (`Current World Population`, `Births today`, etc.) are of type `int`, or `null` if the data is not found.
- `source_url`: The URL of the page from which the data was extracted.
- `scraped_at`: The timestamp when the data was scraped.
## Project Structure

```text
.actor/
├── actor.json            # Actor configuration: name, version, environment variables, etc.
├── dataset_schema.json   # Schema for the output dataset
├── input_schema.json     # Schema for the Actor's input parameters and Console form
└── output_schema.json    # Schema for the Actor's overall output
src/
├── __init__.py           # Initialization file for the Python package
├── __main__.py           # Main entry point for the Actor (used by apify run/call)
├── main.py               # Contains the crawler logic and data extraction
└── py.typed              # Marks the package as typed for tools like MyPy
storage/                  # Local storage (mirrors Apify Cloud during development)
├── datasets/             # Output items (as JSON objects)
├── key_value_stores/     # INPUT configuration and other files
└── request_queues/       # Pending crawl requests
AGENTS.md                 # Notes for agents and developers on maintenance and logic
Dockerfile                # Definition of the Docker image in which the Actor runs
README.md                 # General information and usage guide for this project
requirements.txt          # List of Python dependencies
```
## Maintenance and Development

- **Selector updates:** If the Worldometers.info site structure changes, the Playwright selectors in `src/main.py` (`page.wait_for_selector`, `page.locator(xpath)`) may need to be updated.
- **Performance:** The resource-blocking logic (`page.route`) is deliberately configured. Preserve these optimizations unless a blocked stylesheet or font prevents the counters from rendering.
- **Debugging:** During development, set `headless=False` to watch the browser visually and monitor the `Actor.log.info` messages.
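The resource-blocking decision can be sketched as a simple predicate. The set of blocked types below mirrors the list in the Features section; the exact configuration in `src/main.py` may differ.

```python
# Resource types the crawler can skip without affecting the counters.
BLOCKED_RESOURCE_TYPES = {"image", "media", "font", "stylesheet"}


def should_block(resource_type: str) -> bool:
    """Decide whether a request of this resource type should be aborted."""
    return resource_type in BLOCKED_RESOURCE_TYPES


print(should_block("image"), should_block("document"))
```

In Playwright, a predicate like this would typically be wired into a `page.route("**/*", handler)` callback that aborts the route when `should_block(request.resource_type)` is true and continues it otherwise.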
## Additional Resources

- Apify SDK for Python documentation
- Crawlee for Python documentation
- Playwright Python documentation
- Apify Platform documentation
- Apify Developer Community (Discord)