Keywords Extractor

lukaskrivka/keywords-extractor
Use our free website keyword extractor to crawl any website and extract keyword counts on each page.

Start URLs

startUrls (array, required)

A static list of URLs to scrape. To be able to add new URLs on the fly, enable the Use request queue option.

For details, see Start URLs in README.
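As a sketch of the expected shape, Apify actors commonly accept start URLs as a list of objects with a `url` key; the exact schema is defined by the actor, so treat this as an illustrative assumption:

```python
# Hypothetical startUrls value, assuming the common Apify
# request-list format where each entry is an object with a "url" key.
start_urls = [
    {"url": "https://example.com"},
    {"url": "https://example.com/blog"},
]
```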

Use Browser

useBrowser (boolean, optional)

If on, the actor will use a regular browser for scraping.

Default value of this property is false

Keywords

keywords (array, required)

A list of keywords to search for and count on every page.

Case sensitive

caseSensitive (boolean, optional)

If on, keywords only match text with the exact same upper and lower case.

Default value of this property is false
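The per-page counting described above can be sketched as follows; `count_keywords` and the sample text are illustrative, not the actor's actual implementation:

```python
def count_keywords(text, keywords, case_sensitive=False):
    """Count occurrences of each keyword in the page text.

    Mirrors the described behavior: with case_sensitive off,
    'Apify' and 'apify' both match the keyword 'apify'.
    """
    haystack = text if case_sensitive else text.lower()
    counts = {}
    for kw in keywords:
        needle = kw if case_sensitive else kw.lower()
        counts[kw] = haystack.count(needle)
    return counts

page_text = "Apify makes scraping easy. Scraping with apify scales."
print(count_keywords(page_text, ["apify", "scraping"]))
# With case_sensitive=True, only exact-case occurrences are counted
print(count_keywords(page_text, ["apify"], case_sensitive=True))
```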

Scan scripts

scanScripts (boolean, optional)

If on, it will also count keywords appearing inside scripts.

Default value of this property is false

Link selector

linkSelector (string, optional)

A CSS selector saying which links on the page (<a> elements with href attribute) shall be followed and added to the request queue. This setting only applies if Use request queue is enabled. To filter the links added to the queue, use the Pseudo-URLs setting.

If Link selector is empty, the page links are ignored.

For details, see Link selector in README.

Pseudo-URLs

pseudoUrls (array, optional)

Specifies what kind of URLs found by Link selector should be added to the request queue. A pseudo-URL is a URL with regular expressions enclosed in [] brackets, e.g. http://www.example.com/[.*]. This setting only applies if the Use request queue option is enabled.

If Pseudo-URLs are omitted, the actor enqueues all links matched by the Link selector.

For details, see Pseudo-URLs in README.

Default value of this property is []
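A pseudo-URL such as http://www.example.com/[.*] can be expanded into a regular expression by treating the bracketed parts as raw regex and escaping everything else. The following is a minimal sketch of that idea, not the actor's actual parser:

```python
import re

def pseudo_url_to_regex(pseudo_url):
    """Convert a pseudo-URL into a compiled regex.

    Text inside [...] is taken as raw regex; everything else is
    escaped literally. Minimal sketch; does not handle nested or
    escaped brackets.
    """
    parts = re.split(r"\[(.*?)\]", pseudo_url)
    pattern = ""
    for i, part in enumerate(parts):
        # re.split alternates literal text (even indices) and
        # bracketed regex fragments (odd indices)
        pattern += part if i % 2 else re.escape(part)
    return re.compile("^" + pattern + "$")

matcher = pseudo_url_to_regex("http://www.example.com/[.*]")
print(bool(matcher.match("http://www.example.com/about")))
print(bool(matcher.match("http://other.com/about")))
```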

Max depth

maxDepth (integer, optional)

How many links deep from the Start URLs to crawl. Start URLs have depth 0.

Default value of this property is 5

Proxy configuration

proxyConfiguration (object, optional)

Specifies proxy servers that will be used by the scraper in order to hide its origin.

For details, see Proxy configuration in README.

Default value of this property is {}

Max pages per run

maxPagesPerCrawl (integer, optional)

The maximum number of pages that the scraper will load. The scraper will stop when this limit is reached. It's always a good idea to set this limit in order to prevent excess platform usage for misconfigured scrapers. Note that the actual number of pages loaded might be slightly higher than this value.

If set to 0, there is no limit.

Default value of this property is 100

Max concurrency

maxConcurrency (integer, optional)

Specifies the maximum number of pages that can be processed by the scraper in parallel. The scraper automatically increases and decreases concurrency based on available system resources. This option enables you to set an upper limit, for example to reduce the load on a target website.

Default value of this property is 50

Retire Instance After Request Count

retireInstanceAfterRequestCount (integer, optional)

The number of requests after which the browser instance is retired and rotated. Pick a higher value for lower resource consumption, or a lower value to rotate through (and test) more proxies.

Default value of this property is 50

Use Chrome

useChrome (boolean, optional)

Only applies when a browser (Puppeteer) is used. Be aware that Chrome is not guaranteed to work with Puppeteer.

Default value of this property is false

Wait for

waitFor (string, optional)

Only applies when a browser (Puppeteer) is used. The actor will wait on each page before extracting. You can provide either a number of milliseconds or a CSS selector to wait for.
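Putting the options together, a complete run input might look like the following Python dict. The proxy settings, the pseudo-URL `purl` key, and the apify-client call are illustrative assumptions; the exact schema is defined by the actor:

```python
# Hypothetical full input for lukaskrivka/keywords-extractor,
# assembled from the fields documented above.
run_input = {
    "startUrls": [{"url": "https://example.com"}],
    "keywords": ["privacy", "cookies"],
    "caseSensitive": False,
    "scanScripts": False,
    "linkSelector": "a[href]",
    "pseudoUrls": [{"purl": "https://example.com/[.*]"}],  # assumed "purl" key
    "maxDepth": 2,
    "maxPagesPerCrawl": 100,
    "maxConcurrency": 10,
    "proxyConfiguration": {"useApifyProxy": True},  # assumed Apify Proxy format
    "waitFor": "2000",  # milliseconds; a CSS selector also works
}

# Running it would look roughly like this (requires an API token):
# from apify_client import ApifyClient
# client = ApifyClient("<APIFY_TOKEN>")
# run = client.actor("lukaskrivka/keywords-extractor").call(run_input=run_input)
```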

Maintained by Community

Actor metrics
  • 23 monthly users
  • 100.0% runs succeeded
  • Created in Mar 2020
  • Modified about 3 years ago