Legacy PhantomJS Crawler

Developed by

Apify

A replacement for the legacy Apify Crawler product with a backward-compatible interface. The actor uses the PhantomJS headless browser to recursively crawl websites and extract data from them using a piece of front-end JavaScript code.
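To make the idea of recursive crawling concrete, here is a minimal sketch of a breadth-first crawl loop. It is not the actor's implementation: the `crawl` function, its `get_links`, `page_function`, and `should_follow` parameters, and the in-memory `SITE` dictionary are all hypothetical stand-ins for page loading, the page function, and link filtering.

```python
from collections import deque


def crawl(start_url, get_links, page_function, should_follow):
    """Breadth-first crawl: visit each reachable page once and collect results."""
    queue = deque([start_url])
    seen = {start_url}
    results = []
    while queue:
        url = queue.popleft()
        # Extract data from the current page.
        results.append(page_function(url))
        # Enqueue links that have not been seen and pass the filter.
        for link in get_links(url):
            if link not in seen and should_follow(link):
                seen.add(link)
                queue.append(link)
    return results


# Hypothetical in-memory "site" standing in for real pages and their links.
SITE = {
    "https://www.example.com/": ["https://www.example.com/a", "https://other.example.org/"],
    "https://www.example.com/a": ["https://www.example.com/", "https://www.example.com/b"],
    "https://www.example.com/b": [],
}

results = crawl(
    "https://www.example.com/",
    get_links=lambda url: SITE.get(url, []),
    page_function=lambda url: {"url": url},
    should_follow=lambda url: url.startswith("https://www.example.com/"),
)
print([r["url"] for r in results])
```

The `seen` set is what keeps the crawl from looping when pages link back to each other.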

5.0 (6)

Pricing

Pay per usage

Total users

1.6K

Monthly users

Runs succeeded

>99%

Last modified

a year ago

Developer tools

Open source

You can access the Legacy PhantomJS Crawler programmatically from your own applications through the Apify API. To use the API, you'll need an Apify account and your API token, which you can find under Integrations settings in Apify Console. The example below uses the Python client.

from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startUrls": [{
        "key": "START",
        "value": "https://www.example.com/",
    }],
    "crawlPurls": [{
        "key": "MY_LABEL",
        "value": "https://www.example.com/[.*]",
    }],
    "clickableElementsSelector": "a:not([rel=nofollow])",
    "pageFunction": """function pageFunction(context) {
    // called on every page the crawler visits, use it to extract data from it
    var $ = context.jQuery;
    var result = {
        title: $('title').text(),
        myValue: $('TODO').text()
    };
    return result;
}
""",
    "interceptRequest": """function interceptRequest(context, newRequest) {
    // called whenever the crawler finds a link to a new page,
    // use it to override default behavior
    return newRequest;
}
""",
}

# Run the Actor and wait for it to finish
run = client.actor("apify/legacy-phantomjs-crawler").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start
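The `crawlPurls` value in the input above is a pseudo-URL: as far as Apify's convention goes, text inside square brackets is a regular expression and everything outside is matched literally. The sketch below illustrates that idea under a simplifying assumption (no nested or escaped brackets). The `purl_to_regex` helper is hypothetical and is not part of the actor or the client library.

```python
import re


def purl_to_regex(purl: str) -> re.Pattern:
    """Convert a pseudo-URL into a compiled regex (simplified sketch).

    Text inside [...] is treated as a raw regular expression; everything
    else is matched literally. Nested brackets are not handled.
    """
    parts = []
    pos = 0
    while pos < len(purl):
        start = purl.find("[", pos)
        if start == -1:
            # No more brackets: the rest of the string is literal text.
            parts.append(re.escape(purl[pos:]))
            break
        end = purl.find("]", start)
        if end == -1:
            raise ValueError("unclosed '[' in pseudo-URL")
        parts.append(re.escape(purl[pos:start]))  # literal text before the bracket
        parts.append("(?:" + purl[start + 1:end] + ")")  # regex inside the bracket
        pos = end + 1
    return re.compile("^" + "".join(parts) + "$")


pattern = purl_to_regex("https://www.example.com/[.*]")
print(bool(pattern.match("https://www.example.com/about")))
print(bool(pattern.match("https://other.example.org/")))
```

With this reading, `https://www.example.com/[.*]` accepts any URL under `https://www.example.com/` and rejects links to other hosts.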

Legacy PhantomJS Crawler - Crawl websites, extract data API in Python

The Apify API client for Python is the official library for using the Legacy PhantomJS Crawler API from Python. It provides convenience functions and automatic retries on errors.

Install the apify-client

$ pip install apify-client

Other API clients include:

Legacy PhantomJS Crawler API in JavaScript

Legacy PhantomJS Crawler API through CLI

Legacy PhantomJS Crawler OpenAPI definition

Legacy PhantomJS Crawler API
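For the plain HTTP route, a request can also be assembled with the Python standard library alone. The sketch below builds, but does not send, a POST request that would start a run. It assumes the Apify API v2 "run Actor" endpoint (`/v2/acts/{actorId}/runs`, with the Actor addressed as `username~actor-name`) and Bearer-token authentication, neither of which is shown on this page. `build_run_request` is a hypothetical helper name.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"  # assumed base URL of the Apify API


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that starts an Actor run."""
    # The HTTP API addresses Actors as "username~actor-name" rather than "username/actor-name".
    url = f"{API_BASE}/acts/{actor_id.replace('/', '~')}/runs"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_run_request(
    "apify/legacy-phantomjs-crawler",
    "<YOUR_API_TOKEN>",
    {"startUrls": [{"key": "START", "value": "https://www.example.com/"}]},
)
print(request.get_method(), request.full_url)
# To actually start the run: urllib.request.urlopen(request)
```

Unlike the client library, this bare request has no retry handling, so transient errors would need to be handled by the caller.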

Playwright Scraper

apify/playwright-scraper

Crawls websites with a headless Chromium, Chrome, or Firefox browser and the Playwright library using provided server-side Node.js code. Supports both recursive crawling and lists of URLs. Supports logging in to websites.

Apify

3.6

JSDOM Scraper

apify/jsdom-scraper

Parses the HTML using the JSDOM library, providing the same DOM API as browsers do (e.g. `window`). It is able to process client-side JavaScript without using a real browser. Performance-wise, it stands somewhere between the Cheerio Scraper and the browser scrapers.

Apify

4.3

Vanilla JS Scraper

mstephen190/vanilla-js-scraper

Scrape the web using familiar JavaScript methods! Crawls websites using raw HTTP requests, parses the HTML with the JSDOM package, and extracts data from the pages using Node.js code. Supports both recursive crawling and lists of URLs. This actor is a non-jQuery alternative to Cheerio Scraper.

Matthias Stephens

471

Puppeteer Scraper

apify/puppeteer-scraper

Crawls websites with headless Chrome and the Puppeteer library using provided server-side Node.js code. This crawler is an alternative to apify/web-scraper that gives you finer control over the process. Supports both recursive crawling and lists of URLs. Supports logging in to websites.

Apify

8.4K

5.0

Cheerio Scraper

apify/cheerio-scraper

Crawls websites using raw HTTP requests, parses the HTML with the Cheerio library, and extracts data from the pages using Node.js code. Supports both recursive crawling and lists of URLs. This actor is a high-performance alternative to apify/web-scraper for websites that do not require JavaScript.

Apify

4.7

Send Legacy PhantomJS Crawler Results

drobnikj/send-crawler-results

This actor downloads results from a Legacy PhantomJS Crawler task and sends them by email as attachments. It is designed to be run from a finish webhook.

Jakub Drobník

Example Process Crawl Results

apify/example-process-crawl-results

Iterates through all results from a crawler run and counts them. It needs to be called from the crawler's finish webhook, which you set up by adding its URL to the finish webhook of your crawler. Use this actor as a starting point to develop custom post-processing of data from the crawler.

Apify

4.5

Stealth Scraper

lolio9/stealth-scraper

A stealthy, headless browser-based scraper that mimics human behavior to avoid detection. Automatically saves every visited HTML page and downloadable file, incrementally archiving progress. Perfect for large websites, internal networks, or compliance-sensitive environments.

Marcus

Web Scraper

apify/web-scraper

Crawls arbitrary websites using a web browser and extracts structured data from web pages using a provided JavaScript function. The Actor supports both recursive crawling and lists of URLs, and automatically manages concurrency for maximum performance.

Apify

90K

4.4

Apify Run Queue

lexis-solutions/apify-run-queue

Apify utility that queues runs until memory is available. A workaround for memory-exceeded errors on Apify. This actor retries starting actor runs after a delay. Open source and free!