Deprecated

Pricing

Pay per usage

Phantom.js Scraper

PhantomJS is 6 to 10 times faster than Puppeteer per Compute Unit. The Actor sends an email when the task is complete, and the input screen has been improved. Note: PhantomJS is no longer being developed and might be detected and blocked by websites.

Rating

0.0

(0)

Developer

Barry Schneider

Actor stats

Last modified

5 years ago

Categories

Developer tools

Open source

Start URLs

startUrls

Required

List of URLs that will be loaded by the crawler on start. For a POST request, append [POST] to the URL, e.g. http://www.example.com/[POST]

Type:array

Min. items:1
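
The [POST] suffix convention can be illustrated with a small helper. This is only a sketch: parseStartUrl is a hypothetical name, not part of the Actor, and treating unmarked URLs as GET requests is an assumption.

```javascript
// Hypothetical helper: splits a start URL into the address and the HTTP method,
// following the "[POST]" suffix convention described above.
function parseStartUrl(rawUrl) {
  var marker = '[POST]';
  var trimmed = rawUrl.trim();
  var hasMarker = trimmed.slice(-marker.length) === marker;
  return {
    url: hasMarker ? trimmed.slice(0, -marker.length) : trimmed,
    // Assumption: URLs without the marker are plain GET requests.
    method: hasMarker ? 'POST' : 'GET'
  };
}

var parsed = parseStartUrl('http://www.example.com/[POST]');
console.log(parsed.method + ' ' + parsed.url);
```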

Pseudo-URLs

crawlPurls

Optional

Specifies URLs of pages to crawl. Put regular expressions in [ ] brackets, e.g. http://www.example.com/[.*]

Type:array | null
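
To see how a Pseudo-URL relates to an ordinary regular expression, the sketch below converts one into a RegExp: text outside [ ] is matched literally, text inside is used as a regular expression. The function names are hypothetical, and matching against the whole URL is an assumption about the crawler's behavior.

```javascript
// Escapes characters that have a special meaning in regular expressions.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: converts a Pseudo-URL into a RegExp.
// Nested brackets such as [[a-z]+] are handled by tracking the bracket depth.
function purlToRegExp(purl) {
  var source = '';
  var literal = '';
  var regexPart = '';
  var depth = 0;
  for (var i = 0; i < purl.length; i++) {
    var ch = purl.charAt(i);
    if (ch === '[') {
      if (depth === 0) {
        source += escapeRegExp(literal);
        literal = '';
      } else {
        regexPart += ch;
      }
      depth++;
    } else if (ch === ']' && depth > 0) {
      depth--;
      if (depth === 0) {
        source += '(' + regexPart + ')';
        regexPart = '';
      } else {
        regexPart += ch;
      }
    } else if (depth > 0) {
      regexPart += ch;
    } else {
      literal += ch;
    }
  }
  source += escapeRegExp(literal);
  // Assumption: the whole URL has to match, not just a part of it.
  return new RegExp('^' + source + '$');
}

var matcher = purlToRegExp('http://www.example.com/[.*]');
console.log(matcher.test('http://www.example.com/products/1'));
```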

Clickable elements

clickableElementsSelector

Optional

CSS selector used to find links to other web pages. Leave empty to ignore all links.

For example: a[href]

Type:string | null

Proxy configuration

proxyConfiguration

Optional

Specifies the type of proxy servers that will be used by the crawler in order to hide its origin.

Type:object

Default:

{
  "useApifyProxy": true
}

Inject jQuery

injectJQuery

Optional

Indicates that the jQuery library should be injected into each page before the Page function is invoked. Note that the jQuery object will not be registered in the global namespace, in order to avoid conflicts with libraries used by the web page. It can only be accessed through context.jQuery.

Type:boolean | null

Default:true

Inject Underscore.js

injectUnderscoreJs

Optional

Indicates that the Underscore.js library should be injected into each page before the Page function is invoked. Note that the Underscore object will not be registered in the global namespace, in order to avoid conflicts with libraries used by the web page. It can only be accessed through context.underscoreJs.

Type:boolean | null

Default:false

Send email notification to account owner when task ends

sendEmailNotification

Optional

Sends an email notification to the account owner's email address when the task succeeds, fails, times out, or is aborted.

Type:boolean | null

Default:true

Send download link for Simplified

databaseSimpleEmail

Optional

Sends an email to the account owner's email address with a link to download the dataset as an Excel file in the simplified format.

Type:boolean | null

Default:true

Name of Dataset

datasetName

Optional

Name to use for the dataset. It must not contain spaces or special characters.

Type:string | null

Max. length:50

Custom data

customData

Optional

A custom JSON object that is passed to Page function and intercept request function as context.customData. This setting is mainly useful if you're invoking the crawler using the API, so that you can pass some arbitrary parameters to your code.
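
As a rough illustration, a Page function might read its parameters from context.customData as sketched below. The category key is hypothetical, and the context object at the bottom is a stand-in used only to try the function outside the crawler.

```javascript
// Page function (ES5.1 syntax): reads run-specific parameters from
// context.customData instead of hard-coding them.
function pageFunction(context) {
  // customData may be missing when nothing was passed in the input.
  var settings = context.customData || {};
  return {
    // "category" is a hypothetical key chosen for this example.
    category: settings.category || 'uncategorized'
  };
}

// Stand-in context for a local trial run.
var result = pageFunction({ customData: { category: 'books' } });
console.log(result.category);
```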

Page function

pageFunction

Optional

JavaScript function that is executed on every crawled page. Use it to extract data. Note that only ES5.1 syntax is supported.

Type:string | null
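
A minimal Page function could look like the sketch below. It assumes that Inject jQuery is enabled and that the returned object becomes the result for the page. The fakeJQuery stand-in is hypothetical and exists only so the function can be tried outside the crawler.

```javascript
// Page function (ES5.1 syntax): extracts the page title and the main heading.
function pageFunction(context) {
  // jQuery is not registered globally, so take it from the context.
  var $ = context.jQuery;
  return {
    title: $('title').text(),
    headline: $('h1').text()
  };
}

// Hypothetical stand-in for context.jQuery, supporting only .text().
function fakeJQuery(selector) {
  var pageText = { title: 'Example Domain', h1: 'Welcome' };
  return {
    text: function () {
      return pageText[selector] || '';
    }
  };
}

var record = pageFunction({ jQuery: fakeJQuery });
console.log(JSON.stringify(record));
```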

Intercept request function

interceptRequest

Optional

JavaScript function called whenever the crawler finds a link or form leading to a new web page. Note that only ES5.1 syntax is supported.

Type:string | null
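
The sketch below shows one way such a function might filter requests. The (context, newRequest) signature, the newRequest.url property, and the convention that returning null skips the request are assumptions here; check them against the actor README before relying on them.

```javascript
// Intercept request function (ES5.1 syntax): skips links to PDF files.
// Assumptions: the function receives (context, newRequest), the request has
// a "url" property, and returning null means "do not enqueue this request".
function interceptRequest(context, newRequest) {
  if (/\.pdf$/i.test(newRequest.url)) {
    return null;
  }
  return newRequest;
}

var kept = interceptRequest({}, { url: 'http://www.example.com/page' });
var skipped = interceptRequest({}, { url: 'http://www.example.com/report.pdf' });
console.log(kept !== null, skipped === null);
```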

Verbose log

verboseLog

Optional

If enabled, the log will also contain DEBUG messages. Note that this setting will dramatically slow down the crawler as well as your web browser and increase the log size.

Type:boolean | null

Default:false

Download HTML images

loadImages

Optional

Indicates whether the crawler should load HTML images, both those included using the <img> tag as well as those included in CSS styles. Disable this feature after you have fine-tuned your crawler in order to increase crawling performance and reduce your bandwidth costs.

Type:boolean | null

Default:false

Download CSS files

loadCss

Optional

Indicates whether the crawler should load CSS stylesheet files. Disable this feature after you have fine-tuned your crawler in order to increase crawling performance and reduce your bandwidth costs.

Type:boolean | null

Default:false

Ignore robots exclusion standards

ignoreRobotsTxt

Optional

Indicates that the crawler should ignore robots.txt, <meta name='robots'> tags and X-Robots-Tag HTTP headers. Use this feature at your own risk!

Type:boolean | null

Default:false

Don't load frames and IFRAMEs

skipLoadingFrames

Optional

Indicates that child frames included using FRAME or IFRAME tags will not be loaded by the crawler. This might improve crawling performance. As a side-effect, JavaScript redirects issued by the page before it was completely loaded will not be performed, which might be useful in certain situations.

Type:boolean | null

Default:false

URL #fragments identify unique pages

considerUrlFragment

Optional

Indicates that the URL fragment identifier (i.e. http://example.com/page#this-guy-here) should be considered when matching a URL against a Pseudo-URL or when checking whether a page has already been visited. Typically, URL fragments are used as internal page anchors and should therefore be ignored, because they don't represent separate pages. However, many AJAX-based websites nowadays use URL fragments to represent page parameters; in such cases, this option should be enabled.

Type:boolean | null

Default:false
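
The effect of this option on duplicate detection can be sketched as follows. The helper name is hypothetical and the crawler's real normalization may do more than this; the sketch only shows how the fragment is dropped or kept.

```javascript
// Hypothetical helper: returns the string used to decide whether a page
// has already been visited.
function uniqueKeyForUrl(url, considerUrlFragment) {
  if (considerUrlFragment) {
    return url;
  }
  var hashIndex = url.indexOf('#');
  return hashIndex === -1 ? url : url.slice(0, hashIndex);
}

// With the option disabled, both URLs count as the same page.
console.log(uniqueKeyForUrl('http://example.com/page#a', false) ===
  uniqueKeyForUrl('http://example.com/page#b', false));
```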

Disable web security

disableWebSecurity

Optional

If checked, the virtual browser will allow cross-domain XHRs and untrusted SSL certificates, so that your crawler can access content from any domain. Only activate this feature if you know what you're doing!

Type:boolean | null

Default:false

Rotate User-Agent headers

rotateUserAgents

Optional

If checked, the crawler automatically rotates the User-Agent HTTP header for each new IP address, using a pre-defined list. This setting overrides any User-Agent set in Custom HTTP headers.

Type:boolean | null

Default:false

Max pages per crawl

maxCrawledPages

Optional

Maximum number of pages that the crawler will open. The crawl will stop when this limit is reached. Always set this value in order to prevent infinite loops in misconfigured crawlers. Note that in cases of parallel crawling, the actual number of pages visited might be slightly higher than this value.

Type:integer | null

Minimum:1

Maximum:999999999

Max result records

maxOutputPages

Optional

Maximum number of pages the crawler can output to JSON. The crawl will stop when this limit is reached. This value is useful when you only need a limited number of results.

Type:integer | null

Minimum:1

Maximum:999999999

Max crawling depth

maxCrawlDepth

Optional

Defines how many links away from the start URLs the crawler will descend. This value is a safeguard against infinite crawling depths on misconfigured crawlers. Note that pages added using enqueuePage() in Page function are not subject to the maximum depth constraint.

Type:integer | null

Minimum:1

Maximum:999999999

Execution timeout

timeout

Optional

This field has been deprecated and its value is ignored. To set the execution timeout, use the actor run timeout option instead.

Type:integer | null

Minimum:1

Maximum:1814400

Default:604800

Resource timeout

resourceTimeout

Optional

Timeout for network resources loaded by the crawler, in seconds.

Type:integer | null

Minimum:1

Maximum:3600

Default:60

Page load timeout

pageLoadTimeout

Optional

Timeout for web page load, in seconds. If the web page does not load in this time frame, it is considered to have failed and will be retried, as with other page load errors.

Type:integer | null

Minimum:1

Maximum:3600

Default:60

Page function timeout

pageFunctionTimeout

Optional

Timeout for the asynchronous part of the Page function, in seconds. Note that this value is only applied if your page function runs code in the background, i.e. when it invokes context.willFinishLater(). The page function itself always runs to completion regardless of the timeout.

Type:integer | null

Minimum:1

Maximum:3600

Default:60
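
An asynchronous Page function might be structured as in the sketch below. context.willFinishLater() comes from the description above; context.finish() is assumed here to be the call that delivers the result, and the stub context only records calls for a local trial run.

```javascript
// Page function (ES5.1 syntax) that produces its result in the background.
function pageFunction(context) {
  // Tell the crawler that the result will arrive asynchronously.
  context.willFinishLater();
  setTimeout(function () {
    // Assumption: context.finish(result) hands the result to the crawler.
    context.finish({ status: 'loaded-after-delay' });
  }, 10);
}

// Stub context that records which methods were called.
var calls = [];
var stubContext = {
  willFinishLater: function () {
    calls.push('willFinishLater');
  },
  finish: function (result) {
    calls.push('finish');
    console.log(JSON.stringify(result));
  }
};

pageFunction(stubContext);
```

If the background work in such a function takes longer than the Page function timeout, the page would presumably be treated as failed, so the timeout should be sized to the slowest expected response.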

Infinite scroll height

maxInfiniteScrollHeight

Optional

Defines the maximum client height in pixels to which the browser window is scrolled in order to fetch dynamic AJAX-based content from the web server. By default, the crawler doesn't scroll and uses a fixed browser window size. Note that you might need to enable Download HTML images to make infinite scroll work, because otherwise the crawler wouldn't know that some resources are still being loaded and will stop infinite scrolling prematurely.

Type:integer | null

Minimum:0

Maximum:1000000

Delay between requests

randomWaitBetweenRequests

Optional

This option forces the crawler to ensure a minimum time interval between opening two web pages, in order to prevent it from overloading the target server. The actual minimum time is a random value drawn from a Gaussian distribution with a mean specified by your setting (in seconds) and a standard deviation corresponding to 25% of the mean. The minimum value is 1 second; the crawler never issues requests in intervals shorter than 1000 milliseconds.

Type:integer | null

Minimum:1

Maximum:3600

Default:1
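
The delay calculation described above can be approximated as follows. This is a sketch, not the crawler's actual code: it uses the Box-Muller transform to draw from a Gaussian distribution, and the function name and the injectable random source are assumptions made for illustration.

```javascript
// Hypothetical helper: draws a delay in milliseconds from a Gaussian
// distribution with the given mean (in seconds) and a standard deviation
// of 25% of the mean, never going below 1000 ms.
function gaussianDelayMillis(meanSeconds, random) {
  random = random || Math.random;
  // Box-Muller transform; 1 - random() avoids taking the logarithm of zero.
  var u1 = 1 - random();
  var u2 = random();
  var standardNormal = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
  var meanMillis = meanSeconds * 1000;
  var delay = meanMillis + standardNormal * 0.25 * meanMillis;
  return Math.max(1000, Math.round(delay));
}

console.log(gaussianDelayMillis(10));
```

With a mean of 10 seconds, most delays fall roughly between 5 and 15 seconds (two standard deviations on either side of the mean).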

Max pages per IP address

maxCrawledPagesPerSlave

Optional

Maximum number of pages that a single crawling process will open before it is restarted with a new proxy server setting. This option can help avoid the blocking of the crawler by the target server and also ensures that the crawling processes don't grow too large, as they are killed periodically.

Type:integer | null

Minimum:1

Maximum:100

Default:50

Max parallel processes

maxParallelRequests

Optional

The maximum number of parallel processes that will perform the crawl. The actual number might be lower if the actor runs without enough memory. Note that each parallel process uses a different proxy (if enabled).

Type:integer | null

Minimum:1

Maximum:100

Default:50

Max page retries

maxPageRetryCount

Optional

The maximum number of times the crawler will retry opening a web page after a load error. Note that pages are not retried on page function errors.

Type:integer | null

Minimum:0

Maximum:10

Default:3

Custom HTTP headers

customHttpHeaders

Optional

Custom HTTP headers that the crawler adds to all requests. The value is an array of objects, where each object has the key and value properties.

Type:array | null
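
To illustrate the array-of-objects shape, the sketch below converts an ordinary header map into entries with key and value properties. The helper name and the sample header values are hypothetical.

```javascript
// Hypothetical helper: turns { name: value } pairs into the
// [{ key, value }] shape described above.
function toCustomHttpHeaders(headerMap) {
  return Object.keys(headerMap).map(function (name) {
    return { key: name, value: headerMap[name] };
  });
}

var customHttpHeaders = toCustomHttpHeaders({
  'Accept-Language': 'en-US',
  'X-Example-Token': 'placeholder'
});
console.log(JSON.stringify(customHttpHeaders));
```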

Proxy type (legacy)

proxyType

Optional

Specifies the type of proxy servers that will be used by the crawler.

This is a legacy option kept only for backwards compatibility; use proxyConfiguration instead!

Type:string | null

Proxy groups (legacy)

proxyGroups

Optional

Specifies Apify Proxy groups to be used when proxyType is SELECTED_PROXY_GROUPS.

This is a legacy option only kept for backwards compatibility - use proxyConfiguration instead!

Type:array | null

Default:

[]

Custom proxies (legacy)

customProxies

Optional

Specifies the custom proxy servers to be used when proxyType is CUSTOM. Each proxy should be specified in the scheme://user:password@host:port format; multiple proxies should be separated by a space or new line.

This is a legacy option only kept for backwards compatibility - use proxyConfiguration instead!

Type:string | null
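
The scheme://user:password@host:port format can be checked before a run with a helper like the one sketched below. The function name, the sample hosts, and the choice to throw on a malformed entry are all assumptions made for illustration.

```javascript
// Hypothetical helper: splits the customProxies string on whitespace and
// breaks each entry into its parts.
function parseCustomProxies(text) {
  var pattern = /^([a-z][a-z0-9+.-]*):\/\/(?:([^:@]+):([^@]*)@)?([^:@\/]+):(\d+)$/i;
  return text.split(/\s+/).filter(function (entry) {
    return entry.length > 0;
  }).map(function (entry) {
    var match = pattern.exec(entry);
    if (!match) {
      throw new Error('Invalid proxy: ' + entry);
    }
    return {
      scheme: match[1],
      user: match[2] || null,
      password: match[3] || null,
      host: match[4],
      port: Number(match[5])
    };
  });
}

var proxies = parseCustomProxies(
  'http://bob:secret@proxy1.example.com:8000\nhttp://alice:pw@proxy2.example.com:8001'
);
console.log(proxies.length + ' proxies, first host: ' + proxies[0].host);
```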

Finish webhook URL

finishWebhookUrl

Optional

An HTTP endpoint that receives a POST request right after the run of this actor finishes. The POST payload is a JSON object with the following properties: actorId, runId, taskId, datasetId and data.

For more information about finish webhooks, please see the actor README.

Type:string | null

Max. length:1000
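
On the receiving side, the payload could be read as in the sketch below. The property names come from the description above; the helper name and all sample values are made up, and the HTTP server that would receive the request is left out.

```javascript
// Hypothetical helper: parses the body of a finish webhook request and
// picks out the documented properties.
function readFinishWebhookPayload(body) {
  var payload = JSON.parse(body);
  return {
    actorId: payload.actorId,
    runId: payload.runId,
    taskId: payload.taskId,
    datasetId: payload.datasetId,
    data: payload.data
  };
}

// Sample body with placeholder values, as the endpoint might receive it.
var sampleBody = JSON.stringify({
  actorId: 'actor-123',
  runId: 'run-456',
  taskId: 'task-789',
  datasetId: 'dataset-abc',
  data: 'nightly-crawl'
});

var received = readFinishWebhookPayload(sampleBody);
console.log('Run ' + received.runId + ' stored results in dataset ' + received.datasetId);
```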

Finish webhook data

finishWebhookData

Optional

Custom string that is sent in the POST payload to Finish webhook URL, as the data property.

For more information about finish webhooks, please see the actor README.

Type:string | null

Max. length:10000

Cookies persistence

cookiesPersistence

Optional

Indicates how cookies collected by the crawler are persisted. This is useful if you need to maintain a login.

For more information about cookies, please see the actor README.

Type:string | null

Default:PER_PROCESS

Options:

PER_PROCESS

PER_CRAWLER_RUN

OVER_CRAWLER_RUNS

Initial cookies

Optional

JSON array with cookies that the crawler starts with. This is useful for reusing a login from an external web browser. Note that if the Cookies persistence setting is Over all crawler runs, this field in the actor task configuration will be overwritten with new cookies from the crawler whenever it successfully finishes.

For more information about cookies, please see the actor README.

Type:array | null

Legacy PhantomJS Crawler

apify/legacy-phantomjs-crawler

Replacement for the legacy Apify Crawler product with a backward-compatible interface. The Actor uses PhantomJS headless browser to recursively crawl websites and extract data from them using a piece of front-end JavaScript code.

Apify

875

5.0

(6)

Send Legacy PhantomJS Crawler Results

drobnikj/send-crawler-results

This actor downloads results from a Legacy PhantomJS Crawler task and sends them by email as attachments. It is designed to run from a finish webhook.

Jakub Drobník

Modern Web Crawler — Adaptive + Stealth + Analytics

brilliant_gum/phantom-reborn-crawler

Modern replacement for the Legacy PhantomJS Crawler. Auto HTTP/Browser detection, basic anti-bot stealth, built-in analytics, data quality scoring, captcha solver integration. Modern Chrome + Cheerio engine — no PhantomJS, no abandoned tech. Proxies included.

Yuliia Kulakova

Forward dataset as POST data

anchor/forward-dataset-webhook

This actor forwards the results of an Actor to an endpoint, so that you don't have to fetch the results manually. It downloads the dataset and attaches it to the body of a POST request that you specify. It acts as a new webhook. Simplify your Actor process!

Anchor

5.0

(4)

Firestore Import

drobnikj/firestore-import

Imports dataset items to Firestore DB.

Jakub Drobník

PHANTOM LEADS - Google Maps Business Scraper

renaissant_overload/phantom-leads-google-maps-business-scraper

Scrape Google Maps business leads at scale: name, address, phone, website, email, social media, GPS coordinates, rating & reviews. Export CSV/JSON for cold email, CRM enrichment & sales prospecting. Multi-query, all countries & languages. Built for sales teams, agencies & B2B lead gen.

NABIL

France BODACC Insolvency & Legal Announcements Scraper

scrapers_lat/france-bodacc-scraper

Extract French official legal announcements: company registrations, sales and transfers, insolvency (procedure collective) filings and account deposits. Get company name, SIREN, court, dates and parsed announcement details. Export to JSON, CSV or Excel.

Scrapers Lat

LinkedIn Profile Search Scraper No Cookies ✅ Find all people 📧

harvestapi/linkedin-profile-search

Search for LinkedIn profiles with filters and extract detailed profile information, including work experience, education history, location and more. No cookies or account required.

HarvestAPI

31K

4.8

(80)

Google Keyword Suggestions Scraper

powerai/google-keywords-suggest-scraper

Get Google keyword suggestions and insights including search volume, competition level, and bid estimates for any keyword.

PowerAI

200

RSS Feed Scraper — Atom, Podcast & Multi-Feed

devilscrapes/rss-feed-scraper

Parse and convert any RSS or Atom feed to a clean dataset — title, link, author, published date, summary, full HTML content, tags, GUID — export to JSON or CSV. A drop-in RSS feed parser for RSS 2.0, Atom 1.0, and the content:encoded / dc:creator extensions.

DevilScrapes

Instagram Influencer Engagement Scraper

phantom_coder/instagram-engagement-scraper

Extract engagement metrics from public Instagram profiles. Get feed engagement rate, Reels views, and follower stats. Covers up to 72 posts and 36 Reels. No login needed. From $0.02 per result.