Phantom.js Scraper

barry8schneider/legacy-phantomjs-crawler

PhantomJS is 6 to 10 times faster than Puppeteer per Compute Unit. The actor sends an email when the task is complete, and the input screen has been improved. Note: PhantomJS is no longer being developed and might be detected and blocked by websites.

Author: Barry Schneider
  • Modified
  • Users: 38
  • Runs: 363

To run the code examples, you need to have an Apify account. Replace <YOUR_API_TOKEN> in the code with your API token. For a more detailed explanation, please read about running actors via the API in Apify Docs.
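The full example that follows uses the literal placeholder <YOUR_API_TOKEN>. If you would rather not paste the token into source code, one option is to read it from an environment variable. The sketch below is only an illustration: the helper name resolveToken and the variable name APIFY_TOKEN are assumptions made for this example, not something the actor or this page defines.

```javascript
// Hypothetical helper (not part of apify-client): read the API token
// from an environment-like object. APIFY_TOKEN is an assumed variable name.
function resolveToken(env = process.env) {
    const token = env.APIFY_TOKEN;
    // Reject a missing token and the unreplaced placeholder from the example.
    if (!token || token === '<YOUR_API_TOKEN>') {
        throw new Error('Set APIFY_TOKEN to your Apify API token before running.');
    }
    return token;
}

// Possible usage with the client from the main example:
// const client = new ApifyClient({ token: resolveToken() });
```

Failing early with a clear message tends to be easier to debug than an authorization error returned by the API after the request has been sent.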

const { ApifyClient } = require('apify-client');

// Initialize the ApifyClient with API token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare actor input
const input = {
    "startUrls": [
        {
            "key": "START",
            "value": "https://apify.com/"
        }
    ],
    "crawlPurls": [
        {
            "key": "MY_LABEL",
            "value": "https://www.example.com/[.*]"
        }
    ],
    "clickableElementsSelector": "a:not([rel=nofollow])",
    "proxyConfiguration": {
        "useApifyProxy": true
    },
    "pageFunction": function pageFunction(context) {
        // called on every page the crawler visits, use it to extract data from it
        var $ = context.jQuery;
        var result = {
            title: $('title').text(),
            myValue: $('TODO').text()
        };
        return result;
    },
    "interceptRequest": function interceptRequest(context, newRequest) {
        // called whenever the crawler finds a link to a new page,
        // use it to override default behavior
        return newRequest;
    }
};

(async () => {
    // Run the actor and wait for it to finish
    const run = await client.actor("barry8schneider/legacy-phantomjs-crawler").call(input);

    // Fetch and print actor results from the run's dataset (if any)
    console.log('Results from dataset');
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    items.forEach((item) => {
        console.dir(item);
    });
})();
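
The crawlPurls entry in the input above is a pseudo-URL: text inside square brackets is treated as a regular expression, and everything outside the brackets is matched literally. To make that concrete, here is a rough sketch of how such a pattern could be turned into a RegExp. The helper purlToRegExp is hypothetical and written only for illustration; it is not the actor's implementation, and it assumes the bracketed expression contains no nested square brackets.

```javascript
// Hypothetical helper: convert a pseudo-URL such as
// "https://www.example.com/[.*]" into a RegExp.
// Assumption: no nested "[" or "]" inside a bracketed expression.
function purlToRegExp(purl) {
    let pattern = '';
    let i = 0;
    while (i < purl.length) {
        if (purl[i] === '[') {
            const close = purl.indexOf(']', i);
            if (close === -1) {
                throw new Error('Unclosed "[" in pseudo-URL: ' + purl);
            }
            // Bracketed part: used as a regular expression fragment.
            pattern += '(' + purl.slice(i + 1, close) + ')';
            i = close + 1;
        } else {
            // Literal part: escape characters that are special in a RegExp.
            pattern += purl[i].replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
            i += 1;
        }
    }
    // Anchor the pattern so the whole URL has to match.
    return new RegExp('^' + pattern + '$');
}

const matcher = purlToRegExp('https://www.example.com/[.*]');
console.log(matcher.test('https://www.example.com/products/1'));
console.log(matcher.test('https://apify.com/'));
```

Under these assumptions the first URL matches and the second does not, which mirrors the intent of the MY_LABEL entry: only links under https://www.example.com/ qualify for that label.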