Legacy PhantomJS Crawler

apify/legacy-phantomjs-crawler

A replacement for the legacy Apify Crawler product with a backward-compatible interface. The Actor uses the PhantomJS headless browser to recursively crawl websites and extract data from them using front-end JavaScript code that you supply.

The code example below shows how to run the Actor and get its results. To run the code, you need an Apify account. Replace <YOUR_API_TOKEN> in the code with your API token, which you can find under Settings > Integrations in Apify Console.

Node.js

import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "key": "START",
            "value": "https://www.example.com/"
        }
    ],
    "crawlPurls": [
        {
            "key": "MY_LABEL",
            "value": "https://www.example.com/[.*]"
        }
    ],
    "clickableElementsSelector": "a:not([rel=nofollow])",
    "pageFunction": function pageFunction(context) {
        // called on every page the crawler visits, use it to extract data from it
        var $ = context.jQuery;
        var result = {
            title: $('title').text(),
            myValue: $('TODO').text()
        };
        return result;
    },
    "interceptRequest": function interceptRequest(context, newRequest) {
        // called whenever the crawler finds a link to a new page,
        // use it to override default behavior
        return newRequest;
    }
};

(async () => {
    // Run the Actor and wait for it to finish
    const run = await client.actor("apify/legacy-phantomjs-crawler").call(input);

    // Fetch and print Actor results from the run's dataset (if any)
    console.log('Results from dataset');
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    items.forEach((item) => {
        console.dir(item);
    });
})();
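
The crawlPurls value in the example above, https://www.example.com/[.*], is a pseudo-URL. The sketch below illustrates one way to picture how such a pattern selects links, assuming that text inside square brackets is treated as a regular expression and everything outside is matched literally. The helpers purlToRegExp and escapeLiteral are hypothetical names written for this illustration; they are not part of apify-client or the Actor, and the sketch does not handle nested brackets.

```javascript
// Hypothetical helper, for illustration only: escapes regex metacharacters
// so that the literal parts of a pseudo-URL match exactly as written.
function escapeLiteral(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: converts a pseudo-URL into a RegExp.
// Assumption: text inside [...] is a regular expression, the rest is literal.
// Nested brackets are not supported in this simplified sketch.
function purlToRegExp(purl) {
    let pattern = '^';
    let i = 0;
    while (i < purl.length) {
        const open = purl.indexOf('[', i);
        const close = open === -1 ? -1 : purl.indexOf(']', open);
        if (open === -1 || close === -1) {
            // No further bracketed section: the remainder is literal text
            pattern += escapeLiteral(purl.slice(i));
            break;
        }
        pattern += escapeLiteral(purl.slice(i, open));
        pattern += '(' + purl.slice(open + 1, close) + ')';
        i = close + 1;
    }
    // Assumption: matching is case-insensitive
    return new RegExp(pattern + '$', 'i');
}

const matcher = purlToRegExp('https://www.example.com/[.*]');
console.log(matcher.test('https://www.example.com/products/1'));
console.log(matcher.test('https://www.other.com/products/1'));
```

Under these assumptions, the pattern accepts any URL beneath https://www.example.com/ and rejects URLs on other hosts, which is why the example input keeps the crawl on a single site.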
Developer
Maintained by Apify
Actor metrics
  • 124 monthly users
  • 99.9% runs succeeded
  • Created in Mar 2019
  • Modified 6 months ago