
Legacy PhantomJS Crawler
apify/legacy-phantomjs-crawler
Replacement for the legacy Apify Crawler product with a backward-compatible interface. The actor uses PhantomJS headless browser to recursively crawl websites and extract data from them using a piece of front-end JavaScript code.
The code examples below show how to run the Actor and get its results. To run the code, you need to have an Apify account. Replace <YOUR_API_TOKEN> in the code with your API token, which you can find under Settings > Integrations in Apify Console. Learn mode
# Set API token
API_TOKEN=<YOUR_API_TOKEN>
# Prepare Actor input
cat > input.json <<'EOF'
{
"startUrls": [
{
"key": "START",
"value": "https://www.example.com/"
}
],
"crawlPurls": [
{
"key": "MY_LABEL",
"value": "https://www.example.com/[.*]"
}
],
"clickableElementsSelector": "a:not([rel=nofollow])",
"pageFunction": "function pageFunction(context) {\n // called on every page the crawler visits, use it to extract data from it\n var $ = context.jQuery;\n var result = {\n title: $('title').text(),\n myValue: $('TODO').text()\n };\n return result;\n}\n",
"interceptRequest": "function interceptRequest(context, newRequest) {\n // called whenever the crawler finds a link to a new page,\n // use it to override default behavior\n return newRequest;\n}\n"
}
EOF
# Run the Actor
curl "https://api.apify.com/v2/acts/apify~legacy-phantomjs-crawler/runs?token=$API_TOKEN" \
-X POST \
-d @input.json \
-H 'Content-Type: application/json'
Developer
Maintained by Apify
Actor stats
- 1.4k users
- 15M runs
- Modified about 2 months ago
Categories