Legacy PhantomJS Crawler

No credit card required

Legacy PhantomJS Crawler

Legacy PhantomJS Crawler

apify/legacy-phantomjs-crawler

No credit card required

Replacement for the legacy Apify Crawler product with a backward-compatible interface. The actor uses PhantomJS headless browser to recursively crawl websites and extract data from them using a piece of front-end JavaScript code.

The code examples below show how to run the Actor and get its results. To run the code, you need to have an Apify account. Replace <YOUR_API_TOKEN> in the code with your API token, which you can find under Settings > Integrations in Apify Console. Learn mode

# Set API token
API_TOKEN=<YOUR_API_TOKEN>

# Prepare Actor input
cat > input.json <<'EOF'
{
  "startUrls": [
    {
      "key": "START",
      "value": "https://www.example.com/"
    }
  ],
  "crawlPurls": [
    {
      "key": "MY_LABEL",
      "value": "https://www.example.com/[.*]"
    }
  ],
  "clickableElementsSelector": "a:not([rel=nofollow])",
  "pageFunction": "function pageFunction(context) {\n    // called on every page the crawler visits, use it to extract data from it\n    var $ = context.jQuery;\n    var result = {\n        title: $('title').text(),\n        myValue: $('TODO').text()\n    };\n    return result;\n}\n",
  "interceptRequest": "function interceptRequest(context, newRequest) {\n    // called whenever the crawler finds a link to a new page,\n    // use it to override default behavior\n    return newRequest;\n}\n"
}
EOF

# Run the Actor
curl "https://api.apify.com/v2/acts/apify~legacy-phantomjs-crawler/runs?token=$API_TOKEN" \
  -X POST \
  -d @input.json \
  -H 'Content-Type: application/json'
Developer
Maintained by Apify
Actor stats
  • 1.4k users
  • 15M runs
  • Modified about 2 months ago

You might also like these Actors