Legacy PhantomJS Crawler

apify/legacy-phantomjs-crawler

Replacement for the legacy Apify Crawler product with a backward-compatible interface. The Actor uses the PhantomJS headless browser to recursively crawl websites and extract data from them with a piece of front-end JavaScript code.
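In the legacy Crawler, the pages a recursive crawl visits are selected with pseudo-URLs (the `crawlPurls` input): URLs in which any part enclosed in square brackets is treated as a regular expression and the rest is matched literally. The sketch below approximates that matching in plain Python so you can check a pattern offline before starting a run. It assumes the whole URL must match and that brackets may nest inside the regex part. `purl_to_regex` and `matches_purl` are hypothetical helpers, not part of the Actor or of `apify_client`, and the Actor's own matcher may differ in details such as case sensitivity.

```python
import re


def purl_to_regex(purl: str) -> "re.Pattern[str]":
    """Compile a pseudo-URL such as "https://www.example.com/[.*]" to a regex.

    Assumptions (hypothetical helper): text inside [...] is a regular
    expression, brackets may nest inside it (e.g. "[[a-z]+]"), everything
    else is matched literally, and the whole URL must match.
    """
    parts = []      # compiled regex fragments, in order
    literal = ""    # pending literal text outside brackets
    regex = ""      # pending regex text inside brackets
    depth = 0       # current bracket nesting level
    for ch in purl:
        if depth == 0:
            if ch == "[":
                # Start of a regex part: flush the literal text escaped.
                parts.append(re.escape(literal))
                literal = ""
                depth = 1
            else:
                literal += ch
            continue
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                # Closing bracket of the regex part: emit it as a group.
                parts.append("(?:" + regex + ")")
                regex = ""
                continue
        regex += ch
    if depth != 0:
        raise ValueError(f"Unbalanced '[' in pseudo-URL: {purl!r}")
    parts.append(re.escape(literal))
    return re.compile("".join(parts))


def matches_purl(url: str, purl: str) -> bool:
    """Return True if the whole URL matches the pseudo-URL."""
    return purl_to_regex(purl).fullmatch(url) is not None
```

For instance, `matches_purl("https://www.example.com/about", "https://www.example.com/[.*]")` returns `True`, while a URL on another host does not match because the literal part (including its dots) is escaped.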

The code example below shows how to run the Actor and get its results. To run it, you need an Apify account. Replace <YOUR_API_TOKEN> in the code with your API token, which you can find under Settings > Integrations in Apify Console.
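Instead of pasting the token into the source code, you can read it from an environment variable. A minimal sketch, assuming you export the token as `APIFY_TOKEN`; both that variable name and the `get_apify_token` helper are choices made for this example.

```python
import os


def get_apify_token(env_var: str = "APIFY_TOKEN") -> str:
    """Read the Apify API token from an environment variable (hypothetical helper)."""
    token = os.environ.get(env_var, "").strip()
    # Refuse an unset variable or the unreplaced placeholder from the docs.
    if not token or token == "<YOUR_API_TOKEN>":
        raise RuntimeError(f"Set the {env_var} environment variable to your Apify API token")
    return token
```

You would then create the client with `ApifyClient(get_apify_token())` instead of passing the literal placeholder.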

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startUrls": [{
        "key": "START",
        "value": "https://www.example.com/",
    }],
    "crawlPurls": [{
        "key": "MY_LABEL",
        "value": "https://www.example.com/[.*]",
    }],
    "clickableElementsSelector": "a:not([rel=nofollow])",
    "pageFunction": """function pageFunction(context) {
    // Called on every page the crawler visits; use it to extract data from the page.
    var $ = context.jQuery;
    var result = {
        title: $('title').text(),
        // Replace 'TODO' with a CSS selector for the value you want to extract.
        myValue: $('TODO').text()
    };
    return result;
}
""",
    "interceptRequest": """function interceptRequest(context, newRequest) {
    // Called whenever the crawler finds a link to a new page;
    // use it to override the default behavior.
    return newRequest;
}
""",
}

# Run the Actor and wait for it to finish
run = client.actor("apify/legacy-phantomjs-crawler").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start
```
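The run's dataset can also be read without the client library, through the Apify API's dataset items endpoint (`GET https://api.apify.com/v2/datasets/{datasetId}/items`). The sketch below uses only the Python standard library. `dataset_items_url` and `fetch_dataset_items` are hypothetical helper names, and the `format`/`clean` query parameters and the Bearer-token `Authorization` header should be verified against the Apify API reference before you rely on them.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(dataset_id: str, fmt: str = "json", clean: bool = True) -> str:
    """Build the dataset items URL (hypothetical helper; parameters per the API docs)."""
    query = urllib.parse.urlencode({"format": fmt, "clean": "true" if clean else "false"})
    return f"{API_BASE}/datasets/{urllib.parse.quote(dataset_id, safe='')}/items?{query}"


def fetch_dataset_items(dataset_id: str, token: str) -> list:
    """Download all items of a dataset as parsed JSON (performs a network request)."""
    request = urllib.request.Request(
        dataset_items_url(dataset_id),
        headers={"Authorization": f"Bearer {token}"},  # token sent as a header, not in the URL
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

Passing `run["defaultDatasetId"]` from the example above to `fetch_dataset_items` would return the same items that `iterate_items()` yields, assuming the dataset fits comfortably in memory.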

Developer: Maintained by Apify

Actor metrics:
  • 119 monthly users
  • 19 stars
  • 100.0% runs succeeded
  • Created in Mar 2019
  • Modified about 1 month ago