A template example built with Selenium and a headless Chrome browser to scrape a website and save the results to storage. The URL of the web page is passed in via input, which is defined by the input schema. The template uses the Selenium WebDriver to load and process the page. Enqueued URLs are stored in the default request queue. The data are then stored in the default dataset where you can easily access them.

Included features

Apify SDK for Python - a toolkit for building Apify Actors and scrapers in Python
Input schema - define and easily validate a schema for your Actor's input
Request queue - queues into which you can put the URLs you want to scrape
Dataset - store structured data where each object stored has the same attributes
Selenium - a browser automation library
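
As a rough illustration of what validating the Actor's input involves, the sketch below normalizes a raw input dictionary in plain Python. It relies on the start_urls and max_depth keys described under "How it works". The helper name parse_actor_input, the accepted entry shapes, and the fallback depth of 1 are assumptions made for this example and are not part of the template or the Apify SDK.

```python
from typing import Any, Optional


def parse_actor_input(actor_input: Optional[dict]) -> tuple[list[str], int]:
    """Return (start URLs, max depth) from a raw input dictionary."""
    actor_input = actor_input or {}

    urls: list[str] = []
    for entry in actor_input.get("start_urls", []):
        # Assumption: entries are either plain strings or {"url": "..."} objects.
        url: Any = entry.get("url") if isinstance(entry, dict) else entry
        # Keep only absolute HTTP(S) URLs; anything else is silently skipped.
        if isinstance(url, str) and url.startswith(("http://", "https://")):
            urls.append(url)

    # Assumption: a missing max_depth falls back to 1 (illustrative default).
    max_depth = actor_input.get("max_depth", 1)
    if isinstance(max_depth, bool) or not isinstance(max_depth, int) or max_depth < 0:
        raise ValueError("max_depth must be a non-negative integer")

    return urls, max_depth


urls, depth = parse_actor_input(
    {"start_urls": [{"url": "https://apify.com"}, "ftp://example.com"], "max_depth": 2}
)
print(urls, depth)
```

In a real Actor, the input schema performs this kind of check before the code runs, so a helper like this would mostly serve as a second line of defense during local development.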

How it works

This code is a Python script that uses Selenium to scrape web pages and extract data from them. Here's a brief overview of how it works:

The script reads the input data from the Actor instance, which is expected to contain a start_urls key with a list of URLs to scrape and a max_depth key with the maximum depth of nested links to follow.
The script enqueues the starting URLs in the default request queue and sets their depth to 1.
The script processes the requests in the queue one by one, loading each URL in headless Chrome through the Selenium WebDriver.
If the depth of the current request is less than the maximum depth, the script looks for nested links in the page and enqueues their targets in the request queue with an incremented depth.
The script extracts the desired data from the page (in this case, titles of each page) and pushes them to the default dataset using the push_data method of the Actor instance.
The script catches any exceptions that occur during the web scraping process and logs an error message using the Actor.log.exception method.
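
The steps above can be sketched without any external dependencies. The example below is a simplified model, not the template's actual code: an in-memory deque stands in for the default request queue, a list stands in for the default dataset, and a hypothetical load_page callable stands in for the Selenium WebDriver. The names crawl, load_page, SITE, and fake_load_page are all assumptions introduced for illustration.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urljoin


def crawl(
    start_urls: Iterable[str],
    max_depth: int,
    load_page: Callable[[str], tuple[str, list[str]]],
) -> list[dict]:
    """Depth-limited crawl; load_page(url) returns (title, hrefs on the page)."""
    queue = deque((url, 1) for url in start_urls)  # starting URLs get depth 1
    seen = {url for url, _ in queue}  # assumption: skip URLs already enqueued
    dataset: list[dict] = []  # stand-in for the default dataset

    while queue:
        url, depth = queue.popleft()
        try:
            title, hrefs = load_page(url)
        except Exception as exc:  # the template logs via Actor.log.exception
            print(f"Cannot extract data from {url}: {exc!r}")
            continue

        # Follow nested links only while below the maximum depth.
        if depth < max_depth:
            for href in hrefs:
                target = urljoin(url, href)  # resolve relative links
                if target.startswith(("http://", "https://")) and target not in seen:
                    seen.add(target)
                    queue.append((target, depth + 1))

        dataset.append({"url": url, "title": title})

    return dataset


# Hypothetical four-page site used instead of a real browser.
SITE = {
    "https://example.com/": ("Home", ["/a", "/b"]),
    "https://example.com/a": ("Page A", ["/c"]),
    "https://example.com/b": ("Page B", ["/"]),
    "https://example.com/c": ("Page C", []),
}


def fake_load_page(url: str) -> tuple[str, list[str]]:
    return SITE[url]


results = crawl(["https://example.com/"], max_depth=2, load_page=fake_load_page)
print([item["title"] for item in results])  # ['Home', 'Page A', 'Page B']
```

With max_depth set to 2, the links found on the start page are visited but their own links are not, which is why Page C never appears in the results. In the template, the same roles are played by the Actor's request queue, the Selenium WebDriver, and the push_data method.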

Resources

Selenium controlled Chrome example
Selenium Grid: what it is and how to set it up
Web scraping with Selenium and Python
Cypress vs. Selenium for web testing
Python tutorials in Academy
Video guide on getting scraped data using Apify API
A short guide on how to build web scrapers using code templates

Getting started

For complete information see this article. In short, you will:

Build the Actor
Run the Actor

Pull the Actor for local development

If you would like to develop locally, you can pull the existing Actor from Apify Console using the Apify CLI:

Install apify-cli

Using Homebrew

$ brew install apify-cli

Using NPM

$ npm -g install apify-cli

Pull the Actor by its unique <ActorId>, which is one of the following:
- unique name of the Actor to pull (e.g. "apify/hello-world")
- or ID of the Actor to pull (e.g. "E2jjCZBezvAZnX8Rb")
You can find both by clicking on the Actor title at the top of the page, which opens a modal showing the Actor's unique name and its ID.

The following command copies the Actor into the current directory on your local machine:
```
$ apify pull <ActorId>
```

Documentation reference

To learn more about Apify and Actors, take a look at the following resources:
