worldometers-population-scraper
Pricing
from $40.00 / 1,000 results
Rating: 5.0 (1)
Developer: Miraç Birben (Maintained by Community)
Actor stats: 1 bookmarked, 2 total users, 1 monthly active user
Last modified: 16 hours ago
Worldometers Population Scraper (Python, Crawlee, Playwright)
This Apify Actor is designed to collect live population and related demographic data (births, deaths, population growth, etc.) from the Worldometers.info website. It reliably scrapes dynamically loaded content and JavaScript-updated counters using PlaywrightCrawler.
Features
- Live Counter Data: Extracts current world population, births today and this year, deaths today and this year, and population growth from Worldometers.info.
- Playwright Automation: Seamlessly handles dynamically loaded data and JavaScript-rendered pages.
- Intelligent Data Extraction: Uses robust XPath and CSS selectors to identify counter values based on their labels.
- Data Cleaning and Transformation: Strips formatting from the extracted counter text and converts each value to an integer.
- Error Tolerance: Continues processing even if counters load with a delay.
- Proxy Support: Supports Apify Proxy configuration to prevent IP blocking.
- Optimized Crawling: Improves performance by blocking unnecessary resources (images, media, fonts, stylesheets).
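The resource blocking described in the last feature can be sketched roughly as follows. This is a minimal illustration, not the Actor's actual code: the names `BLOCKED_RESOURCE_TYPES`, `should_block`, and `block_unneeded_resources` are hypothetical, and the handler only assumes an object that behaves like a Playwright `Route` (exposing `request.resource_type`, `abort()`, and `continue_()`).

```python
# Resource types named in the feature list (images, media, fonts, stylesheets),
# written as Playwright's request.resource_type strings.
BLOCKED_RESOURCE_TYPES = {"image", "media", "font", "stylesheet"}


def should_block(resource_type: str) -> bool:
    """Return True for resource types the counters do not depend on."""
    return resource_type in BLOCKED_RESOURCE_TYPES


async def block_unneeded_resources(route) -> None:
    """Hypothetical route handler: abort blocked requests, pass the rest through.

    `route` is assumed to behave like a Playwright Route object.
    """
    if should_block(route.request.resource_type):
        await route.abort()
    else:
        await route.continue_()
```

With Playwright, a handler like this would typically be registered with `await page.route("**/*", block_unneeded_resources)` before navigating to the page.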
Quick Start
Follow these steps to run this Actor on the Apify platform or develop it locally.
Install Dependencies
Before running the project locally, you need to install the Python dependencies:
```bash
pip install -r requirements.txt
```
Run the Actor
Run Locally
After cloning the project, you can start the Actor locally by running the following command in the root directory:
```bash
apify run
```
Deploy to Apify Console
To upload the Actor to the Apify Console, follow these steps:
Log in to your Apify account:
```bash
apify login
```
Push the Actor to the Apify platform:
```bash
apify push
```
Input Configuration
The Actor expects a JSON object with the following structure:
```json
{
  "start_urls": [
    {"url": "https://www.worldometers.info/world-population/"}
  ],
  "maxRequestsPerCrawl": 10,
  "proxyConfiguration": {"useApifyProxy": true}
}
```
- start_urls (array of objects): A list of URLs to start scraping from. Each object must have a url property.
- maxRequestsPerCrawl (integer, default: 10): The maximum number of pages the crawler will visit. This value does not need to be high, since the Worldometers data is usually on a single main page.
- proxyConfiguration (object, default: {"useApifyProxy": true}): Proxy settings. Set "useApifyProxy": true to use Apify Proxy.
Output Data
The Actor pushes JSON objects with the following structure to the Apify Dataset:
```json
{
  "Current World Population": 8123456789,
  "Births today": 123456,
  "Deaths today": 56789,
  "Population Growth today": 66667,
  "Births this year": 12345678,
  "Deaths this year": 5678901,
  "Population Growth this year": 6666777,
  "source_url": "https://www.worldometers.info/world-population/",
  "scraped_at": "now"
}
```
- All population and counter fields (Current World Population, Births today, etc.) are of type int, or null if the data is not found.
- source_url: The URL of the page from which the data was extracted.
- scraped_at: The timestamp when the data was scraped.
Project Structure
```text
.actor/
├── actor.json            # Actor configuration: name, version, environment variables, etc.
├── dataset_schema.json   # Schema for the output dataset
├── input_schema.json     # Schema for the Actor's input parameters and Console form
└── output_schema.json    # Schema for the Actor's overall output
src/
├── __init__.py           # Initialization file for the Python package
├── __main__.py           # Main entry point for the Actor (used by apify run/call)
├── main.py               # Contains the crawler logic and data extraction processes
└── py.typed              # Marker file indicating the package ships type hints (for tools like mypy)
storage/                  # Local storage (mirrors Apify Cloud during development)
├── datasets/             # Output items (as JSON objects)
├── key_value_stores/     # INPUT configuration and other files
└── request_queues/       # Pending crawl requests
AGENTS.md                 # Notes for agents and developers regarding maintenance and logic
Dockerfile                # Definition of the Docker container image in which the Actor will run
README.md                 # General information and usage guide for this project
requirements.txt          # List of Python dependencies
```
Maintenance and Development
- Selector Updates: If the Worldometers.info website structure changes, the Playwright selectors within src/main.py (page.wait_for_selector, page.locator(xpath)) may need to be updated.
- Performance: The resource blocking logic (page.route) is carefully configured. It is recommended to preserve these optimizations unless a critical style or font prevents counters from being visible.
- Debugging: During development, you can set headless=False to open the browser visually and monitor Actor.log.info messages.
Additional Resources
- Apify SDK for Python documentation
- Crawlee for Python documentation
- Playwright Python documentation
- Apify Platform documentation
- Apify Developer Community (Discord)
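To make the Output Data format described earlier more concrete, here is a minimal sketch of how scraped counter text could be turned into a dataset record. It is an illustration under stated assumptions, not the code in src/main.py: `COUNTER_LABELS`, `parse_counter`, and `build_record` are hypothetical names, counters are assumed to be non-negative digit strings with separators such as commas, and an ISO 8601 UTC timestamp is assumed for scraped_at, since the README does not specify its format.

```python
from __future__ import annotations

import re
from datetime import datetime, timezone

# Counter field names taken from the Output Data example.
COUNTER_LABELS = [
    "Current World Population",
    "Births today",
    "Deaths today",
    "Population Growth today",
    "Births this year",
    "Deaths this year",
    "Population Growth this year",
]


def parse_counter(text: str | None) -> int | None:
    """Convert counter text such as '8,123,456,789' to an int, or None if no digits."""
    if not text:
        return None
    digits = re.sub(r"\D", "", text)  # drop commas, spaces, and other non-digits
    return int(digits) if digits else None


def build_record(raw: dict[str, str | None], source_url: str) -> dict:
    """Build one output item; counters missing from `raw` become None (JSON null)."""
    record: dict = {label: parse_counter(raw.get(label)) for label in COUNTER_LABELS}
    record["source_url"] = source_url
    # Assumption: ISO 8601 in UTC; the actual scraped_at format is not documented.
    record["scraped_at"] = datetime.now(timezone.utc).isoformat()
    return record
```

A record built this way keeps every counter key present even when a value could not be read, which matches the "int or null" description of the counter fields.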