import { readFile } from 'node:fs/promises';
import { dirname } from 'node:path';
import { fileURLToPath, URL } from 'node:url';

import type {
    AutoscaledPool,
    EnqueueLinksOptions,
    PlaywrightCrawlerOptions,
    PlaywrightCrawlingContext,
    PlaywrightLaunchContext,
    ProxyConfiguration,
    Request,
} from '@crawlee/playwright';
import {
    Dataset,
    KeyValueStore,
    log,
    PlaywrightCrawler,
    RequestList,
    RequestQueueV2,
} from '@crawlee/playwright';
import type { Awaitable, Dictionary } from '@crawlee/utils';
import { sleep } from '@crawlee/utils';
import type { ApifyEnv } from 'apify';
import { Actor } from 'apify';
import { launchOptions } from 'camoufox-js';
import { getInjectableScript } from 'idcac-playwright';
import type { Response } from 'playwright';
import { firefox } from 'playwright';

import type {
    CrawlerSetupOptions,
    RequestMetadata,
} from '@apify/scraper-tools';
import {
    browserTools,
    constants as scraperToolsConstants,
    createContext,
    tools,
} from '@apify/scraper-tools';

import type { Input } from './consts.js';
import { ProxyRotation } from './consts.js';

const SESSION_STORE_NAME = 'APIFY-PLAYWRIGHT-SCRAPER-SESSION-STORE';
const REQUEST_QUEUE_INIT_FLAG_KEY = 'REQUEST_QUEUE_INITIALIZED';

const { META_KEY, DEVTOOLS_TIMEOUT_SECS, SESSION_MAX_USAGE_COUNTS } =
    scraperToolsConstants;
const SCHEMA = JSON.parse(
    await readFile(new URL('../../INPUT_SCHEMA.json', import.meta.url), 'utf8'),
);

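/**
 * Holds the configuration and state shared by the whole crawl: the parsed input,
 * the opened storages, and the evaluated page function and navigation hooks.
 * A single instance is created per run and used to build the PlaywrightCrawler.
 */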
export class CrawlerSetup implements CrawlerSetupOptions {
    name = 'Playwright Scraper';
    rawInput: string;
    env: ApifyEnv;

    // In-memory store shared by all page function invocations within this run.
    globalStore = new Map();
    requestQueue: RequestQueueV2;
    keyValueStore: KeyValueStore;
    customData: unknown;
    input: Input;
    maxSessionUsageCount: number;
    evaledPageFunction: (...args: unknown[]) => unknown;
    evaledPreNavigationHooks: ((...args: unknown[]) => Awaitable<void>)[];
    evaledPostNavigationHooks: ((...args: unknown[]) => Awaitable<void>)[];
    devtools: boolean;
    datasetName?: string;
    keyValueStoreName?: string;
    requestQueueName?: string;

    crawler!: PlaywrightCrawler;
    dataset!: Dataset;
    pagesOutputted!: number;
    private initPromise: Promise<void>;

    constructor(input: Input) {
        // Set up logging.
        if (input.debugLog) log.setLevel(log.LEVELS.DEBUG);

        // Keep a copy of the raw input. It is exposed to the page function context.
        this.rawInput = JSON.stringify(input);

        // Attempt to load the page function from disk if it is not defined in the input.
        tools.maybeLoadPageFunctionFromDisk(
            input,
            dirname(fileURLToPath(import.meta.url)),
        );

        // Validate the input against the schema when not running on the Apify platform,
        // which performs the validation itself.
        if (!Actor.isAtHome()) tools.checkInputOrThrow(input, SCHEMA);

        this.input = input;
        this.env = Actor.getEnv();

        // Validations
        this.input.pseudoUrls.forEach((purl) => {
            if (!tools.isPlainObject(purl)) {
                throw new Error(
                    'The pseudoUrls Array must only contain Objects.',
                );
            }
            if (purl.userData && !tools.isPlainObject(purl.userData)) {
                throw new Error(
                    'The userData property of a pseudoUrl must be an Object.',
                );
            }
        });

        this.input.initialCookies.forEach((cookie) => {
            if (!tools.isPlainObject(cookie)) {
                throw new Error(
                    'The initialCookies Array must only contain Objects.',
                );
            }
        });

        if (
            !/^(domcontentloaded|load|networkidle)$/.test(this.input.waitUntil)
        ) {
            throw new Error(
                'Navigation wait until event must be valid. See tooltip.',
            );
        }

        // How many times a session may be reused depends on the selected proxy rotation.
        this.maxSessionUsageCount =
            SESSION_MAX_USAGE_COUNTS[this.input.proxyRotation];

        // Evaluate the user-provided page function and navigation hooks.
        this.evaledPageFunction = tools.evalFunctionOrThrow(
            this.input.pageFunction,
        );

        if (this.input.preNavigationHooks) {
            this.evaledPreNavigationHooks = tools.evalFunctionArrayOrThrow(
                this.input.preNavigationHooks,
                'preNavigationHooks',
            );
        } else {
            this.evaledPreNavigationHooks = [];
        }

        if (this.input.postNavigationHooks) {
            this.evaledPostNavigationHooks = tools.evalFunctionArrayOrThrow(
                this.input.postNavigationHooks,
                'postNavigationHooks',
            );
        } else {
            this.evaledPostNavigationHooks = [];
        }

        // Use extended timeouts when the page function contains a debugger statement,
        // so there is enough time for debugging in DevTools.
        this.devtools = this.input.pageFunction.includes('debugger;');

        // Named storages
        this.datasetName = this.input.datasetName;
        this.keyValueStoreName = this.input.keyValueStoreName;
        this.requestQueueName = this.input.requestQueueName;

        // Initialized asynchronously in _initializeAsync().
        this.crawler = null!;
        this.requestQueue = null!;
        this.dataset = null!;
        this.keyValueStore = null!;
        this.initPromise = this._initializeAsync();
    }

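    /**
     * Opens the named (or default) key-value store, request queue and dataset,
     * seeds the request queue from the start URLs on the first run, and restores
     * the count of pages already outputted to the dataset.
     */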
    private async _initializeAsync() {
        // Start URLs
        const startUrls = this.input.startUrls.map((req) => {
            req.useExtendedUniqueKey = true;
            req.keepUrlFragment = this.input.keepUrlFragments;
            return req;
        });

        // KeyValueStore
        this.keyValueStore = await KeyValueStore.open(this.keyValueStoreName);

        // RequestQueue
        this.requestQueue = await RequestQueueV2.open(this.requestQueueName);

        // Seed the queue from the start URLs only once per run.
        if (
            !(await this.keyValueStore.recordExists(
                REQUEST_QUEUE_INIT_FLAG_KEY,
            ))
        ) {
            const requests: Request[] = [];
            for await (const request of await RequestList.open(
                null,
                startUrls,
            )) {
                if (
                    this.input.maxResultsPerCrawl > 0 &&
                    requests.length >= 1.5 * this.input.maxResultsPerCrawl
                ) {
                    break;
                }
                requests.push(request);
            }

            const { waitForAllRequestsToBeAdded } =
                await this.requestQueue.addRequestsBatched(requests);

            // Mark the queue as initialized once all requests have been added,
            // so that a resurrected run does not enqueue the start URLs again.
            void waitForAllRequestsToBeAdded.then(async () => {
                await this.keyValueStore.setValue(
                    REQUEST_QUEUE_INIT_FLAG_KEY,
                    '1',
                );
            });
        }

        // Dataset
        this.dataset = await Dataset.open(this.datasetName);
        const info = await this.dataset.getInfo();
        this.pagesOutputted = info?.itemCount ?? 0;
    }

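    /**
     * Waits for the async initialization to finish and builds a PlaywrightCrawler
     * (Firefox launched with Camoufox options) from the validated input.
     */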
    async createCrawler() {
        await this.initPromise;

        const options: PlaywrightCrawlerOptions = {
            requestHandler: this._requestHandler.bind(this),
            requestQueue: this.requestQueue,
            requestHandlerTimeoutSecs: this.devtools
                ? DEVTOOLS_TIMEOUT_SECS
                : this.input.pageFunctionTimeoutSecs,
            preNavigationHooks: [],
            postNavigationHooks: [],
            failedRequestHandler: this._failedRequestHandler.bind(this),
            respectRobotsTxtFile: this.input.respectRobotsTxtFile,
            maxConcurrency: this.input.maxConcurrency,
            maxRequestRetries: this.input.maxRequestRetries,
            maxRequestsPerCrawl:
                this.input.maxPagesPerCrawl === 0
                    ? undefined
                    : this.input.maxPagesPerCrawl,
            proxyConfiguration: (await Actor.createProxyConfiguration(
                this.input.proxyConfiguration,
            )) as any as ProxyConfiguration,
            launchContext: {
                launcher: firefox,
                // Camoufox builds anti-detection Firefox launch options from the input.
                launchOptions: await launchOptions({
                    ...this.input,
                    humanize: this.input.humanize
                        ? Number(this.input.humanize)
                        : 0,
                }),
            } as PlaywrightLaunchContext,
            useSessionPool: true,
            persistCookiesPerSession: true,
            sessionPoolOptions: {
                persistStateKeyValueStoreId: this.input.sessionPoolName
                    ? SESSION_STORE_NAME
                    : undefined,
                persistStateKey: this.input.sessionPoolName,
                sessionOptions: {
                    maxUsageCount: this.maxSessionUsageCount,
                },
            },
            experiments: {
                requestLocking: true,
            },
        };

        this._createNavigationHooks(options);

        // A single session means the same proxy is reused until it fails.
        if (this.input.proxyRotation === ProxyRotation.UntilFailure) {
            options.sessionPoolOptions!.maxPoolSize = 1;
        }

        this.crawler = new PlaywrightCrawler(options);

        return this.crawler;
    }

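    /**
     * Registers the built-in pre-navigation hook (browser console dumping, initial
     * cookies, navigation timeout and waitUntil) and appends the user-provided
     * pre- and post-navigation hooks with an enhanced context.
     */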
    private _createNavigationHooks(options: PlaywrightCrawlerOptions) {
        options.preNavigationHooks!.push(
            async ({ request, page, session }, gotoOptions) => {
                // Attach the browser console to the run log if requested.
                if (this.input.browserLog) browserTools.dumpConsole(page);

                // Add initial cookies, if any.
                if (
                    this.input.initialCookies &&
                    this.input.initialCookies.length
                ) {
                    const cookiesToSet = session
                        ? tools.getMissingCookiesFromSession(
                              session,
                              this.input.initialCookies,
                              request.url,
                          )
                        : this.input.initialCookies;

                    if (cookiesToSet?.length) {
                        // Remember the new cookies in the session and set them in the browser context.
                        session?.setCookies(cookiesToSet, request.url);
                        await page.context().addCookies(cookiesToSet);
                    }
                }

                if (gotoOptions) {
                    gotoOptions.timeout =
                        (this.devtools
                            ? DEVTOOLS_TIMEOUT_SECS
                            : this.input.pageLoadTimeoutSecs) * 1000;
                    gotoOptions.waitUntil = this.input.waitUntil;
                }
            },
        );

        options.preNavigationHooks!.push(
            ...this._runHookWithEnhancedContext(this.evaledPreNavigationHooks),
        );
        options.postNavigationHooks!.push(
            ...this._runHookWithEnhancedContext(this.evaledPostNavigationHooks),
        );
    }

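    /**
     * Wraps user-provided navigation hooks so that they receive the crawling context
     * extended with the Actor object and the custom data from the input.
     */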
    private _runHookWithEnhancedContext(
        hooks: ((...args: unknown[]) => Awaitable<void>)[],
    ) {
        return hooks.map((hook) => (ctx: Dictionary, ...args: unknown[]) => {
            const { customData } = this.input;
            return hook({ ...ctx, Apify: Actor, Actor, customData }, ...args);
        });
    }

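    /**
     * Called when a request exhausts all retries. Logs the last error message and
     * pushes an error record for the request to the dataset.
     */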
    private async _failedRequestHandler({
        request,
    }: PlaywrightCrawlingContext) {
        const lastError =
            request.errorMessages[request.errorMessages.length - 1];
        const errorMessage = lastError ? lastError.split('\n')[0] : 'no error';
        log.error(
            `Request ${request.url} failed and will not be retried anymore. Marking as failed.\nLast Error Message: ${errorMessage}`,
        );
        return this._handleResult(request, undefined, undefined, true);
    }

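    /**
     * First initializes the state that is exposed to the user via the page function
     * context, then invokes the user-provided page function with that context, and
     * finally enqueues links and saves the returned value to the dataset.
     */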
    private async _requestHandler(crawlingContext: PlaywrightCrawlingContext) {
        const { request, response, crawler } = crawlingContext;

        // Make sure that internal metadata (crawling depth, parent request)
        // is present on every request.
        tools.ensureMetaData(request);

        // Abort the crawl if the maximum number of results was reached.
        const aborted = await this._handleMaxResultsPerCrawl(
            crawler.autoscaledPool,
        );
        if (aborted) return;

        const pageFunctionArguments: Dictionary = {};

        // Copy the crawling context properties, including getters, into the
        // arguments passed to the page function.
        Object.defineProperties(
            pageFunctionArguments,
            Object.getOwnPropertyDescriptors(crawlingContext),
        );

        // Expose only the serializable parts of the Playwright response.
        pageFunctionArguments.response = {
            status: response && response.status(),
            headers: response && response.headers(),
        };

        // Set up the context that the page function receives.
        const contextOptions = {
            crawlerSetup: {
                rawInput: this.rawInput,
                env: this.env,
                globalStore: this.globalStore,
                requestQueue: this.requestQueue,
                keyValueStore: this.keyValueStore,
                customData: this.input.customData,
            },
            pageFunctionArguments,
        };
        const { context, state } = createContext(contextOptions);

        // Attempt to close cookie consent modals using the
        // "I don't care about cookies" script.
        if (this.input.closeCookieModals) {
            await sleep(500);
            await crawlingContext.page.evaluate(getInjectableScript());
            await sleep(2000);
        }

        if (this.input.maxScrollHeightPixels > 0) {
            await crawlingContext.infiniteScroll({
                maxScrollHeight: this.input.maxScrollHeightPixels,
            });
        }

        // Execute the user-provided page function.
        const pageFunctionResult = await this.evaledPageFunction(context);

        // Enqueue more links unless the user called skipLinks() in the page function.
        if (!state.skipLinks) await this._handleLinks(crawlingContext);

        // Save the page function's result to the dataset.
        await this._handleResult(
            request,
            response,
            pageFunctionResult as Dictionary,
        );
    }

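    /**
     * Aborts the autoscaled pool once the number of outputted pages reaches the
     * maxResultsPerCrawl limit. Returns true when the crawl is being aborted.
     */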
    private async _handleMaxResultsPerCrawl(autoscaledPool?: AutoscaledPool) {
        if (
            !this.input.maxResultsPerCrawl ||
            this.pagesOutputted < this.input.maxResultsPerCrawl
        )
            return false;
        if (!autoscaledPool) return false;
        log.info(
            `User set limit of ${this.input.maxResultsPerCrawl} results was reached. Finishing the crawl.`,
        );
        await autoscaledPool.abort();
        return true;
    }

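    /**
     * Enqueues links found on the page using the configured link selector, globs and
     * pseudo URLs, unless the maximum crawling depth has been reached. The crawling
     * depth and parent request id are stored in the userData of each new request.
     */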
    private async _handleLinks({
        request,
        enqueueLinks,
    }: PlaywrightCrawlingContext) {
        if (!this.requestQueue) return;
        const currentDepth = (request.userData![META_KEY] as RequestMetadata)
            .depth;
        const hasReachedMaxDepth =
            this.input.maxCrawlingDepth &&
            currentDepth >= this.input.maxCrawlingDepth;
        if (hasReachedMaxDepth) {
            log.debug(
                `Request ${request.url} reached the maximum crawling depth of ${currentDepth}.`,
            );
            return;
        }

        const enqueueOptions: EnqueueLinksOptions = {
            globs: this.input.globs,
            pseudoUrls: this.input.pseudoUrls,
            exclude: this.input.excludes,
            transformRequestFunction: (requestOptions) => {
                requestOptions.userData ??= {};
                requestOptions.userData[META_KEY] = {
                    parentRequestId: request.id || request.uniqueKey,
                    depth: currentDepth + 1,
                };

                requestOptions.useExtendedUniqueKey = true;
                requestOptions.keepUrlFragment = this.input.keepUrlFragments;
                return requestOptions;
            },
        };

        if (this.input.linkSelector) {
            await enqueueLinks({
                ...enqueueOptions,
                selector: this.input.linkSelector,
            });
        }
    }

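    /**
     * Pushes the page function result (or an error record) for the given request to
     * the dataset and increments the counter of outputted pages.
     */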
    private async _handleResult(
        request: Request,
        response?: Response,
        pageFunctionResult?: Dictionary,
        isError?: boolean,
    ) {
        const payload = tools.createDatasetPayload(
            request,
            response,
            pageFunctionResult,
            isError,
        );
        await this.dataset.pushData(payload);
        this.pagesOutputted++;
    }
}