Legacy PhantomJS Crawler

apify/legacy-phantomjs-crawler

A replacement for the legacy Apify Crawler product with a backward-compatible interface. The Actor uses the PhantomJS headless browser to recursively crawl websites and extract data from them using a piece of front-end JavaScript code.

The code example below shows how to run the Actor through the Apify API. To run the code, you need an Apify account. Replace <YOUR_API_TOKEN> in the code with your API token, which you can find under Settings > Integrations in Apify Console.

# Set API token
API_TOKEN=<YOUR_API_TOKEN>

# Prepare Actor input
cat > input.json <<'EOF'
{
  "startUrls": [
    {
      "key": "START",
      "value": "https://www.example.com/"
    }
  ],
  "crawlPurls": [
    {
      "key": "MY_LABEL",
      "value": "https://www.example.com/[.*]"
    }
  ],
  "clickableElementsSelector": "a:not([rel=nofollow])",
  "pageFunction": "function pageFunction(context) {\n    // called on every page the crawler visits, use it to extract data from it\n    var $ = context.jQuery;\n    var result = {\n        title: $('title').text(),\n        myValue: $('TODO').text()\n    };\n    return result;\n}\n",
  "interceptRequest": "function interceptRequest(context, newRequest) {\n    // called whenever the crawler finds a link to a new page,\n    // use it to override default behavior\n    return newRequest;\n}\n"
}
EOF

# Run the Actor using an HTTP API
# See the full API reference at https://docs.apify.com/api/v2
curl "https://api.apify.com/v2/acts/apify~legacy-phantomjs-crawler/runs?token=$API_TOKEN" \
  -X POST \
  -d @input.json \
  -H 'Content-Type: application/json'
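
In the input above, pageFunction and interceptRequest are passed as escaped, single-line JSON strings, which are awkward to edit by hand. The following is a minimal sketch, not an official Apify client, of one way to write both functions as ordinary JavaScript and serialize them into the same input shape. The helper names buildInput and buildRunUrl are hypothetical and introduced only for illustration, and the sketch assumes the Actor accepts the function source produced by Function.prototype.toString(), which has the same form as the strings shown above. The placeholder myValue field from the original input is left out.

```javascript
// Written in ES5 style because the function source runs inside PhantomJS.
function pageFunction(context) {
    // Called on every page the crawler visits; returns the extracted data.
    var $ = context.jQuery;
    return {
        title: $('title').text()
    };
}

function interceptRequest(context, newRequest) {
    // Called whenever the crawler finds a link to a new page.
    // Returning newRequest unchanged keeps the default behavior.
    return newRequest;
}

// Hypothetical helper: assembles the same fields as input.json above,
// serializing the two functions to source text instead of hand-escaping them.
function buildInput() {
    return {
        startUrls: [{ key: 'START', value: 'https://www.example.com/' }],
        crawlPurls: [{ key: 'MY_LABEL', value: 'https://www.example.com/[.*]' }],
        clickableElementsSelector: 'a:not([rel=nofollow])',
        pageFunction: pageFunction.toString(),
        interceptRequest: interceptRequest.toString()
    };
}

// Hypothetical helper: builds the run URL used by the curl command above.
function buildRunUrl(apiToken) {
    return 'https://api.apify.com/v2/acts/apify~legacy-phantomjs-crawler/runs?token=' +
        encodeURIComponent(apiToken);
}

// JSON.stringify takes care of escaping newlines and quotes in the function source.
var inputJson = JSON.stringify(buildInput(), null, 2);
console.log(inputJson);
```

The printed JSON can be saved as input.json and sent with the same curl command. Keeping the functions as real code also makes it possible to call pageFunction locally with a stubbed context.jQuery before starting a run.
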
Developer
Maintained by Apify
Actor metrics
  • 119 monthly users
  • 19 stars
  • 100.0% runs succeeded
  • Created in Mar 2019
  • Modified about 1 month ago