
Legacy PhantomJS Crawler
Replacement for the legacy Apify Crawler product with a backward-compatible interface. The actor uses the PhantomJS headless browser to recursively crawl websites and extract data from them using a piece of front-end JavaScript code.
You can access the Legacy PhantomJS Crawler programmatically from your own applications by using the Apify API. To use the Apify API, you'll need an Apify account and your API token, which you can find in the Integrations settings in Apify Console.
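For example, you can start a crawl and retrieve its results in a single request by calling the run-sync-get-dataset-items endpoint defined below. The following TypeScript sketch (Node.js 18+ with the built-in fetch) assumes your token is stored in an APIFY_TOKEN environment variable; the example.com URLs, the page function body, and the page limit are placeholders, not defaults of the actor:

// Minimal sketch: run the crawler synchronously and fetch its dataset items.
// APIFY_TOKEN, the example.com URLs, and the page function are placeholder assumptions.
const input = {
  startUrls: [{ key: "START", value: "https://www.example.com/" }],
  crawlPurls: [{ key: "PAGES", value: "https://www.example.com/[.*]" }],
  clickableElementsSelector: "a[href]",
  // The page function must be ES5.1 JavaScript and is passed as a string.
  pageFunction: `function pageFunction(context) {
    var $ = context.jQuery;
    return { url: context.request.url, title: $("title").text() };
  }`,
  maxCrawledPages: 20,
};

const endpoint =
  "https://api.apify.com/v2/acts/apify~legacy-phantomjs-crawler" +
  `/run-sync-get-dataset-items?token=${process.env.APIFY_TOKEN}`;

const response = await fetch(endpoint, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(input),
});

// The endpoint waits for the run to finish and returns the dataset items as JSON.
const items = await response.json();
console.log(items);

The complete OpenAPI 3.0 definition of the actor's endpoints and input schema follows: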
{
  "openapi": "3.0.1",
  "info": {
    "version": "0.0",
    "x-build-id": "Mqc7kLzYAQiXg4012"
  },
  "servers": [
    {
      "url": "https://api.apify.com/v2"
    }
  ],
  "paths": {
    "/acts/apify~legacy-phantomjs-crawler/run-sync-get-dataset-items": {
      "post": {
        "operationId": "run-sync-get-dataset-items-apify-legacy-phantomjs-crawler",
        "x-openai-isConsequential": false,
        "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
        "tags": [
          "Run Actor"
        ],
        "requestBody": {
          "required": true,
          "content": {
            "application/json": {
              "schema": {
                "$ref": "#/components/schemas/inputSchema"
              }
            }
          }
        },
        "parameters": [
          {
            "name": "token",
            "in": "query",
            "required": true,
            "schema": {
              "type": "string"
            },
            "description": "Enter your Apify token here"
          }
        ],
        "responses": {
          "200": {
            "description": "OK"
          }
        }
      }
    },
    "/acts/apify~legacy-phantomjs-crawler/runs": {
      "post": {
        "operationId": "runs-sync-apify-legacy-phantomjs-crawler",
        "x-openai-isConsequential": false,
        "summary": "Executes an Actor and returns information about the initiated run in response.",
        "tags": [
          "Run Actor"
        ],
        "requestBody": {
          "required": true,
          "content": {
            "application/json": {
              "schema": {
                "$ref": "#/components/schemas/inputSchema"
              }
            }
          }
        },
        "parameters": [
          {
            "name": "token",
            "in": "query",
            "required": true,
            "schema": {
              "type": "string"
            },
            "description": "Enter your Apify token here"
          }
        ],
        "responses": {
          "200": {
            "description": "OK",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/runsResponseSchema"
                }
              }
            }
          }
        }
      }
    },
    "/acts/apify~legacy-phantomjs-crawler/run-sync": {
      "post": {
        "operationId": "run-sync-apify-legacy-phantomjs-crawler",
        "x-openai-isConsequential": false,
        "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
        "tags": [
          "Run Actor"
        ],
        "requestBody": {
          "required": true,
          "content": {
            "application/json": {
              "schema": {
                "$ref": "#/components/schemas/inputSchema"
              }
            }
          }
        },
        "parameters": [
          {
            "name": "token",
            "in": "query",
            "required": true,
            "schema": {
              "type": "string"
            },
            "description": "Enter your Apify token here"
          }
        ],
        "responses": {
          "200": {
            "description": "OK"
          }
        }
      }
    }
  },
  "components": {
    "schemas": {
      "inputSchema": {
        "type": "object",
        "required": [
          "startUrls"
        ],
        "properties": {
          "startUrls": {
            "title": "Start URLs",
            "minItems": 1,
            "type": "array",
            "description": "List of URLs that will be loaded by the crawler on start. For a POST request, append [POST] to the URL, e.g. <code>http://www.example.com/[POST]</code>",
            "items": {
              "type": "object",
              "required": [
                "key",
                "value"
              ],
              "properties": {
                "key": {
                  "type": "string",
                  "title": "Key"
                },
                "value": {
                  "type": "string",
                  "title": "Value"
                }
              }
            }
          },
          "crawlPurls": {
            "title": "Pseudo-URLs",
            "type": "array",
            "description": "Specifies URLs of pages to crawl. Put regular expressions in [ ] brackets, e.g. <code>http://www.example.com/[.*]</code>",
            "items": {
              "type": "object",
              "required": [
                "key",
                "value"
              ],
              "properties": {
                "key": {
                  "type": "string",
                  "title": "Key"
                },
                "value": {
                  "type": "string",
                  "title": "Value"
                }
              }
            }
          },
          "clickableElementsSelector": {
            "title": "Clickable elements",
            "type": "string",
            "description": "CSS selector used to find links to other web pages. Leave empty to ignore all links.<br><br>For example: <code>a[href]</code>"
          },
          "pageFunction": {
            "title": "Page function",
            "type": "string",
            "description": "JavaScript function that is executed on every crawled page; use it to extract data. Note that only ES5.1 syntax is supported."
          },
          "interceptRequest": {
            "title": "Intercept request function",
            "type": "string",
            "description": "JavaScript function called whenever the crawler finds a link or form leading to a new web page. Note that only ES5.1 syntax is supported."
          },
          "considerUrlFragment": {
            "title": "URL #fragments identify unique pages",
            "type": "boolean",
            "description": "Indicates that the URL fragment identifier (i.e. <code>http://example.com/page#<b>this-guy-here</b></code>) should be considered when matching a URL against a Pseudo-URL or when checking whether a page has already been visited. Typically, URL fragments are used as internal page anchors and therefore they should be ignored because they don't represent separate pages. However, many AJAX-based websites nowadays use URL fragments to represent page parameters; in such cases, this option should be enabled.",
            "default": false
          },
          "loadImages": {
            "title": "Download HTML images",
            "type": "boolean",
            "description": "Indicates whether the crawler should load HTML images, both those included using the <code><img></code> tag as well as those included in CSS styles. Disable this feature after you have fine-tuned your crawler in order to increase crawling performance and reduce your bandwidth costs.",
            "default": true
          },
          "loadCss": {
            "title": "Download CSS files",
            "type": "boolean",
            "description": "Indicates whether the crawler should load CSS stylesheet files. Disable this feature after you have fine-tuned your crawler in order to increase crawling performance and reduce your bandwidth costs.",
            "default": true
          },
          "injectJQuery": {
            "title": "Inject jQuery",
            "type": "boolean",
            "description": "Indicates that the <a href='http://jquery.com' target='_blank' rel='noopener'>jQuery</a> library should be injected into each page before <b>Page function</b> is invoked. Note that the jQuery object will not be registered into global namespace in order to avoid conflicts with libraries used by the web page. It can only be accessed through <code>context.jQuery</code>.",
            "default": true
          },
          "injectUnderscoreJs": {
            "title": "Inject Underscore.js",
            "type": "boolean",
            "description": "Indicates that the <a href='http://underscorejs.org' target='_blank' rel='noopener'>Underscore.js</a> library should be injected into each page before <b>Page function</b> is invoked. Note that the Underscore object will not be registered into global namespace in order to avoid conflicts with libraries used by the web page. It can only be accessed through <code>context.underscoreJs</code>.",
            "default": false
          },
          "ignoreRobotsTxt": {
            "title": "Ignore robots exclusion standards",
            "type": "boolean",
            "description": "Indicates that the crawler should ignore <code>robots.txt</code>, <code><meta name='robots'></code> tags and <code>X-Robots-Tag</code> HTTP headers. Use this feature at your own risk!",
            "default": false
          },
          "skipLoadingFrames": {
            "title": "Don't load frames and IFRAMEs",
            "type": "boolean",
            "description": "Indicates that child frames included using FRAME or IFRAME tags will not be loaded by the crawler. This might improve crawling performance. As a side-effect, JavaScript redirects issued by the page before it was completely loaded will not be performed, which might be useful in certain situations.",
            "default": false
          },
          "verboseLog": {
            "title": "Verbose log",
            "type": "boolean",
            "description": "If enabled, the log will also contain DEBUG messages. Note that this setting will dramatically slow down the crawler as well as your web browser and increase the log size.",
            "default": false
          },
          "disableWebSecurity": {
            "title": "Disable web security",
            "type": "boolean",
            "description": "If checked, the virtual browser will allow cross-domain XHRs and untrusted SSL certificates, so that your crawler can access content from any domain. Only activate this feature if you know what you're doing!",
            "default": false
          },
          "rotateUserAgents": {
            "title": "Rotate User-Agent headers",
            "type": "boolean",
            "description": "If checked, the crawler automatically rotates the <code>User-Agent</code> HTTP header for each new IP address, from a pre-defined list. This setting overwrites <code>User-Agent</code> set in <b>Custom HTTP headers</b>.",
            "default": false
          },
          "maxCrawledPages": {
            "title": "Max pages per crawl",
            "minimum": 1,
            "maximum": 999999999,
            "type": "integer",
            "description": "Maximum number of pages that the crawler will open. The crawl will stop when this limit is reached. Always set this value in order to prevent infinite loops in misconfigured crawlers. Note that in cases of parallel crawling, the actual number of pages visited might be slightly higher than this value."
          },
          "maxOutputPages": {
            "title": "Max result records",
            "minimum": 1,
            "maximum": 999999999,
            "type": "integer",
            "description": "Maximum number of pages the crawler can output to JSON. The crawl will stop when this limit is reached. This value is useful when you only need a limited number of results."
          },
          "maxCrawlDepth": {
            "title": "Max crawling depth",
            "minimum": 1,
            "maximum": 999999999,
            "type": "integer",
            "description": "Defines how many links away from the start URLs the crawler will descend. This value is a safeguard against infinite crawling depths on misconfigured crawlers. Note that pages added using <code>enqueuePage()</code> in <b>Page function</b> are not subject to the maximum depth constraint."
          },
          "timeout": {
            "title": "Execution timeout",
            "minimum": 1,
            "maximum": 1814400,
            "type": "integer",
            "description": "This field has been deprecated and its value is ignored. To set the execution timeout, use the actor run timeout option instead.",
            "default": 604800
          },
          "resourceTimeout": {
            "title": "Resource timeout",
            "minimum": 100,
            "maximum": 1000000,
            "type": "integer",
            "description": "Timeout for network resources loaded by the crawler, in milliseconds.",
            "default": 30000
          },
          "pageLoadTimeout": {
            "title": "Page load timeout",
            "minimum": 100,
            "maximum": 1000000,
            "type": "integer",
            "description": "Timeout for web page load, in milliseconds. If the web page does not load in this time frame, it is considered to have failed and will be retried, similarly as with other page load errors.",
            "default": 60000
          },
          "pageFunctionTimeout": {
            "title": "Page function timeout",
            "minimum": 0,
            "maximum": 3600000,
            "type": "integer",
            "description": "Timeout for the asynchronous part of the <b>Page function</b>, in milliseconds. Note that this value is only applied if your page function runs code in the background, i.e. when it invokes <code>context.willFinishLater()</code>. The page function itself always runs to completion regardless of the timeout.",
            "default": 600000
          },
          "maxInfiniteScrollHeight": {
            "title": "Infinite scroll height",
            "minimum": 0,
            "maximum": 1000000,
            "type": "integer",
            "description": "Defines the maximum client height in pixels to which the browser window is scrolled in order to fetch dynamic AJAX-based content from the web server. By default, the crawler doesn't scroll and uses a fixed browser window size. Note that you might need to enable <b>Download HTML images</b> to make infinite scroll work, because otherwise the crawler wouldn't know that some resources are still being loaded and will stop infinite scrolling prematurely."
          },
          "randomWaitBetweenRequests": {
            "title": "Delay between requests",
            "minimum": 1000,
            "maximum": 1000000,
            "type": "integer",
            "description": "This option forces the crawler to ensure a minimum time interval between opening two web pages, in order to prevent it from overloading the target server. The actual minimum time is a random value drawn from a Gaussian distribution with a mean specified by your setting (in milliseconds) and a standard deviation corresponding to 25% of the mean. The minimum value is 1000 milliseconds; the crawler never issues requests in shorter intervals than 1000 milliseconds.",
            "default": 1000
          },
          "maxCrawledPagesPerSlave": {
            "title": "Max pages per IP address",
            "minimum": 1,
            "maximum": 100,
            "type": "integer",
            "description": "Maximum number of pages that a single crawling process will open before it is restarted with a new proxy server setting. This option can help avoid the blocking of the crawler by the target server and also ensures that the crawling processes don't grow too large, as they are killed periodically.",
            "default": 50
          },
          "maxParallelRequests": {
            "title": "Max parallel processes",
            "minimum": 1,
            "maximum": 100,
            "type": "integer",
            "description": "The maximum number of parallel processes that will perform the crawl. The actual number might be lower if the actor runs without enough memory. Note that each parallel process uses a different proxy (if enabled).",
            "default": 50
          },
          "maxPageRetryCount": {
            "title": "Max page retries",
            "minimum": 0,
            "maximum": 10,
            "type": "integer",
            "description": "The maximum number of times the crawler will retry to open a web page on load error. Note that on page function errors, the pages are not retried.",
            "default": 3
          },
          "customHttpHeaders": {
            "title": "Custom HTTP headers",
            "type": "array",
            "description": "Custom HTTP headers set by the crawler to all requests. It is an array of objects, where each object has the <code>key</code> and <code>value</code> properties.",
            "items": {
              "type": "object",
              "required": [
                "key",
                "value"
              ],
              "properties": {
                "key": {
                  "type": "string",
                  "title": "Key"
                },
                "value": {
                  "type": "string",
                  "title": "Value"
                }
              }
            }
          },
          "proxyConfiguration": {
            "title": "Proxy configuration",
            "type": "object",
            "description": "Specifies the type of proxy servers that will be used by the crawler in order to hide its origin."
          },
          "proxyType": {
            "title": "Proxy type (legacy)",
            "type": "string",
            "description": "Specifies the type of proxy servers that will be used by the crawler.<br><br>This is a legacy option only kept for backwards compatibility - use <b>proxyConfiguration</b> instead!"
          },
          "customProxies": {
            "title": "Custom proxies (legacy)",
            "type": "string",
            "description": "Specifies Apify Proxy groups to be used when <b>proxyType</b> is <code>CUSTOM</code>. Each proxy should be specified in the <code>scheme://user:password@host:port</code> format; multiple proxies should be separated by a space or new line. <br><br>This is a legacy option only kept for backwards compatibility - use <b>proxyConfiguration</b> instead!"
          },
          "customData": {
            "title": "Custom data",
            "description": "A custom JSON object that is passed to <b>Page function</b> and intercept request function as <code>context.customData</code>. This setting is mainly useful if you're invoking the crawler using the API, so that you can pass some arbitrary parameters to your code."
          },
          "finishWebhookUrl": {
            "title": "Finish webhook URL",
            "pattern": "^(?:(?:[hH][tT][tT][pP][sS]?):\\/\\/)(?:\\S+(?::\\S*)?@)?(?:(?!10(?:\\.\\d{1,3}){3})(?!127(?:\\.\\d{1,3}){3})(?!169\\.254(?:\\.\\d{1,3}){2})(?!192\\.168(?:\\.\\d{1,3}){2})(?!172\\.(?:1[6-9]|2\\d|3[0-1])(?:\\.\\d{1,3}){2})(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}(?:\\.(?:[1-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))|(?:(?:[a-zA-Z\\u00a1-\\uffff0-9]+-?)*[a-zA-Z\\u00a1-\\uffff0-9]+)(?:\\.(?:[a-zA-Z\\u00a1-\\uffff0-9]+-?)*[a-zA-Z\\u00a1-\\uffff0-9]+)*(?:\\.(?:[a-zA-Z\\u00a1-\\uffff]{2,})))(?::\\d{2,5})?(?:\\/[^\\s]*)?$",
            "maxLength": 1000,
            "type": "string",
            "description": "An HTTP endpoint that receives a POST request right after the run of this actor finishes. The POST payload is a JSON object with the following properties: <code>actorId</code>, <code>runId</code>, <code>taskId</code>, <code>datasetId</code> and <code>data</code>.<br><br>For more information about finish webhooks, please see the actor README."
          },
          "finishWebhookData": {
            "title": "Finish webhook data",
            "maxLength": 10000,
            "type": "string",
            "description": "Custom string that is sent in the POST payload to <b>Finish webhook URL</b>, as the <code>data</code> property. <br><br>For more information about finish webhooks, please see the actor README."
          },
          "cookiesPersistence": {
            "title": "Cookies persistence",
            "enum": [
              "PER_PROCESS",
              "PER_CRAWLER_RUN",
              "OVER_CRAWLER_RUNS"
            ],
            "type": "string",
            "description": "Indicates how cookies collected by the crawler are persisted. This is useful if you need to maintain a login.<br><br>For more information about cookies, please see the actor README.",
            "default": "PER_PROCESS"
          },
          "cookies": {
            "title": "Initial cookies",
            "type": "array",
            "description": "JSON array with cookies that the crawler starts with. This is useful for reusing a login from an external web browser. Note that if the <b>Cookies persistence</b> setting is <b>Over all crawler runs</b>, this field in the actor task configuration will be overwritten with new cookies from the crawler whenever it successfully finishes.<br><br>For more information about cookies, please see the actor README."
          }
        }
      },
      "runsResponseSchema": {
        "type": "object",
        "properties": {
          "data": {
            "type": "object",
            "properties": {
              "id": {
                "type": "string"
              },
              "actId": {
                "type": "string"
              },
              "userId": {
                "type": "string"
              },
              "startedAt": {
                "type": "string",
                "format": "date-time",
                "example": "2025-01-08T00:00:00.000Z"
              },
              "finishedAt": {
                "type": "string",
                "format": "date-time",
                "example": "2025-01-08T00:00:00.000Z"
              },
              "status": {
                "type": "string",
                "example": "READY"
              },
              "meta": {
                "type": "object",
                "properties": {
                  "origin": {
                    "type": "string",
                    "example": "API"
                  },
                  "userAgent": {
                    "type": "string"
                  }
                }
              },
              "stats": {
                "type": "object",
                "properties": {
                  "inputBodyLen": {
                    "type": "integer",
                    "example": 2000
                  },
                  "rebootCount": {
                    "type": "integer",
                    "example": 0
                  },
                  "restartCount": {
                    "type": "integer",
                    "example": 0
                  },
                  "resurrectCount": {
                    "type": "integer",
                    "example": 0
                  },
                  "computeUnits": {
                    "type": "integer",
                    "example": 0
                  }
                }
              },
              "options": {
                "type": "object",
                "properties": {
                  "build": {
                    "type": "string",
                    "example": "latest"
                  },
                  "timeoutSecs": {
                    "type": "integer",
                    "example": 300
                  },
                  "memoryMbytes": {
                    "type": "integer",
                    "example": 1024
                  },
                  "diskMbytes": {
                    "type": "integer",
                    "example": 2048
                  }
                }
              },
              "buildId": {
                "type": "string"
              },
              "defaultKeyValueStoreId": {
                "type": "string"
              },
              "defaultDatasetId": {
                "type": "string"
              },
              "defaultRequestQueueId": {
                "type": "string"
              },
              "buildNumber": {
                "type": "string",
                "example": "1.0.0"
              },
              "containerUrl": {
                "type": "string"
              },
              "usage": {
                "type": "object",
                "properties": {
                  "ACTOR_COMPUTE_UNITS": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATASET_READS": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATASET_WRITES": {
                    "type": "integer",
                    "example": 0
                  },
                  "KEY_VALUE_STORE_READS": {
                    "type": "integer",
                    "example": 0
                  },
                  "KEY_VALUE_STORE_WRITES": {
                    "type": "integer",
                    "example": 1
                  },
                  "KEY_VALUE_STORE_LISTS": {
                    "type": "integer",
                    "example": 0
                  },
                  "REQUEST_QUEUE_READS": {
                    "type": "integer",
                    "example": 0
                  },
                  "REQUEST_QUEUE_WRITES": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATA_TRANSFER_INTERNAL_GBYTES": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATA_TRANSFER_EXTERNAL_GBYTES": {
                    "type": "integer",
                    "example": 0
                  },
                  "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                    "type": "integer",
                    "example": 0
                  },
                  "PROXY_SERPS": {
                    "type": "integer",
                    "example": 0
                  }
                }
              },
              "usageTotalUsd": {
                "type": "number",
                "example": 0.00005
              },
              "usageUsd": {
                "type": "object",
                "properties": {
                  "ACTOR_COMPUTE_UNITS": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATASET_READS": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATASET_WRITES": {
                    "type": "integer",
                    "example": 0
                  },
                  "KEY_VALUE_STORE_READS": {
                    "type": "integer",
                    "example": 0
                  },
                  "KEY_VALUE_STORE_WRITES": {
                    "type": "number",
                    "example": 0.00005
                  },
                  "KEY_VALUE_STORE_LISTS": {
                    "type": "integer",
                    "example": 0
                  },
                  "REQUEST_QUEUE_READS": {
                    "type": "integer",
                    "example": 0
                  },
                  "REQUEST_QUEUE_WRITES": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATA_TRANSFER_INTERNAL_GBYTES": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATA_TRANSFER_EXTERNAL_GBYTES": {
                    "type": "integer",
                    "example": 0
                  },
                  "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                    "type": "integer",
                    "example": 0
                  },
                  "PROXY_SERPS": {
                    "type": "integer",
                    "example": 0
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
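If you prefer not to block while the crawl runs, the /acts/apify~legacy-phantomjs-crawler/runs endpoint defined above starts the actor and returns immediately with a run object that follows runsResponseSchema. A minimal TypeScript sketch, under the same assumptions as the earlier example (APIFY_TOKEN environment variable, placeholder example.com start URL):

// Start an asynchronous run; the response body follows runsResponseSchema.
const runsEndpoint =
  "https://api.apify.com/v2/acts/apify~legacy-phantomjs-crawler/runs" +
  `?token=${process.env.APIFY_TOKEN}`;

const runResponse = await fetch(runsEndpoint, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    startUrls: [{ key: "START", value: "https://www.example.com/" }],
  }),
});

const { data } = await runResponse.json();
// id, status and defaultDatasetId are among the fields defined in runsResponseSchema.
console.log(`Run ${data.id} is ${data.status}; results will be stored in dataset ${data.defaultDatasetId}`);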
Legacy PhantomJS Crawler OpenAPI definition
OpenAPI is a standard for designing and describing RESTful APIs, allowing developers to define API structure, endpoints, and data formats in a machine-readable way. It simplifies API development, integration, and documentation.
OpenAPI is particularly effective with AI agents and GPTs because it standardizes how these systems interact with APIs, enabling reliable integrations and efficient communication.
By defining machine-readable API specifications, OpenAPI allows AI models like GPTs to understand and use varied data sources, improving accuracy. This accelerates development, reduces errors, and provides context-aware responses, making OpenAPI a core component for AI applications.
You can download the OpenAPI definitions for Legacy PhantomJS Crawler from the options below:
If you’d like to learn more about how OpenAPI powers GPTs, read our blog post.
Actor Metrics
94 monthly users
22 bookmarks
>99% runs succeeded
Created in Mar 2019
Modified 8 months ago