Camoufox Scraper

josef.prochazka/camoufox-scraper
A simple actor that uses Playwright with Camoufox to test whether a specific website's blocking mechanisms can be bypassed by using Camoufox.

Start URLs

startUrls (array, required)

A static list of URLs to scrape.

Max crawling depth

maxCrawlingDepth (integer, optional)

Specifies how many links away from the Start URLs the scraper will descend. Note that pages added using context.request_queue in the Page function are not subject to the maximum depth constraint.

Default value of this property is 1

Max requests per crawl

maxRequestsPerCrawl (integer, optional)

The crawler will stop after processing this number of requests.

Default value of this property is 1

Request timeout

requestTimeout (integer, optional)

The maximum duration (in seconds) for a request to complete before timing out. The timeout value is passed to the httpx.AsyncClient object.

Default value of this property is 30
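
Internally the timeout is simply forwarded to httpx.AsyncClient. A minimal sketch of how such a value is typically used (the fetch helper below is illustrative, not the actor's actual code):

    import httpx

    # requestTimeout from the actor input, in seconds; 30 is the documented default.
    request_timeout = 30

    async def fetch(url: str) -> str:
        # The timeout value is passed to httpx.AsyncClient, as described above.
        async with httpx.AsyncClient(timeout=request_timeout) as client:
            response = await client.get(url)
            response.raise_for_status()
            return response.text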

Link selector

linkSelector (string, optional)

A CSS selector specifying which links on the page (<a> elements with an href attribute) should be followed and added to the request queue. To filter the links added to the queue, use the Link patterns field.

If the Link selector is empty, the page links are ignored. Of course, you can work with the page links and the request queue in the Page function as well.

Link patterns

linkPatterns (array, optional)

Link patterns (regular expressions) to match links on the page that you want to enqueue. Combine them with the Link selector to tell the scraper where to find links. Omitting the link patterns will cause the scraper to enqueue all links matched by the Link selector.
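
For example, the two fields could be combined like this in the actor input (the selector and pattern are illustrative values, not defaults):

    run_input = {
        # Follow every <a> element that has an href attribute...
        "linkSelector": "a[href]",
        # ...but enqueue only URLs matching this regular expression.
        "linkPatterns": [r"https://example\.com/products/.+"],
    }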

Page function

pageFunction (string, required)

A Python function that is executed for every page. Use it to scrape data from the page, perform actions, or add new URLs to the request queue. The page function has its own naming scope, and you can import any installed modules. Typically you will want to obtain the data from the context.soup object and return it. The identifier page_function can't be changed. For more information about the context object passed to page_function, see github.com/apify/actor-beautifulsoup-scraper#context. Asynchronous functions are supported.
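
A minimal sketch of such a page function, assuming context.soup is a BeautifulSoup object as described in the context documentation linked above (the available attributes come from that documentation, not from this actor's source):

    async def page_function(context):
        # Scrape the page title from the parsed HTML (context.soup is a BeautifulSoup object).
        title_tag = context.soup.title
        title = title_tag.get_text(strip=True) if title_tag else None

        # Returned data is typically stored as a record in the run's dataset.
        return {"title": title}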

Proxy configuration

proxyConfiguration (object, required)

Specifies proxy servers that will be used by the scraper in order to hide its origin.

Default value of this property is {"useApifyProxy":true}
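
Putting the inputs together, a run could be started from the Apify Python client roughly as follows. The apify-client calls are standard, but the sample values and the shape of the startUrls entries ({"url": ...} objects, the common Apify convention) are assumptions, not taken from this actor's schema:

    from apify_client import ApifyClient

    client = ApifyClient("<YOUR_APIFY_TOKEN>")

    run_input = {
        # Required: where the crawl starts (assuming {"url": ...} objects).
        "startUrls": [{"url": "https://example.com"}],
        "maxCrawlingDepth": 1,
        "maxRequestsPerCrawl": 1,
        "requestTimeout": 30,
        "linkSelector": "a[href]",
        "linkPatterns": [r"https://example\.com/.+"],
        # Required: the Python page function, passed as a string.
        "pageFunction": (
            "async def page_function(context):\n"
            "    title = context.soup.title.string if context.soup.title else None\n"
            "    return {'title': title}\n"
        ),
        # Required: defaults to the Apify Proxy.
        "proxyConfiguration": {"useApifyProxy": True},
    }

    run = client.actor("josef.prochazka/camoufox-scraper").call(run_input=run_input)

    # Iterate over the scraped records in the run's default dataset.
    for item in client.dataset(run["defaultDatasetId"]).iterate_items():
        print(item)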

Developer
Maintained by Community

Actor Metrics

  • 1 monthly user

  • No stars yet

  • >99% runs succeeded

  • Created in Feb 2025

  • Modified a day ago