Web Scraper
Crawls arbitrary websites using the Chrome browser and extracts structured data from web pages using a provided JavaScript function. The Actor supports both recursive crawling and lists of URLs, and automatically manages concurrency for maximum performance.
competent_path
received 401 status code
I tried this with the following input:
{
  "breakpointLocation": "NONE",
  "browserLog": false,
  "closeCookieModals": false,
  "debugLog": false,
  "downloadCss": false,
  "downloadMedia": false,
  "excludes": [
    {
      "glob": "/**/*.{png,jpg,jpeg,pdf}"
    }
  ],
  "globs": [
    {
      "glob": ""
    }
  ],
  "headless": false,
  "ignoreCorsAndCsp": true,
  "ignoreSslErrors": true,
  "injectJQuery": true,
  "keepUrlFragments": false,
  "pageFunction": "async function pageFunction(context) {\n const $ = context.jQuery;\n return {html: $('html').first().html()};\n}",
  "postNavigationHooks": "// We need to return array of (possibly async) functions here.\n// The functions accept a single argument: the \"crawlingContext\" object.\n[\n async (crawlingContext) => {\n // ...\n },\n]",
  "preNavigationHooks": "// We need to return array of (possibly async) functions here.\n// The functions accept two arguments: the \"crawlingContext\" object\n// and \"gotoOptions\".\n[\n async (crawlingContext, gotoOptions) => {\n // ...\n },\n]\n",
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ]
  },
  "runMode": "PRODUCTION",
  "startUrls": [
    {
      "url": "https://www.wsj.com/livecoverage/stock-market-today-dow-sp500-nasdaq-live-08-07-2024/card/robinhood-reports-record-quarterly-revenue-and-profit-tIlQ0DnKKwNWFeqoRcA2",
      "method": "GET"
    }
  ],
  "useChrome": true,
  "waitUntil": [
    "networkidle2"
  ]
}
PuppeteerCrawler: Reclaiming failed request back to the list or queue. Request blocked - received 401 status code. 2025-02-18T22:39:55.764Z {"id":"9nWDjDToDXvA6Ny","url":"https://www.wsj.com/livecoverage/stock-market-today-dow-sp500-nasdaq-live-08-07-2024/card/robinhood-reports-record-quarterly-revenue-and-profit-tIlQ0DnKKwNWFeqoRcA2","retryCount":1}
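For readers unfamiliar with this log message, the following is a rough sketch of the kind of check that could produce it. It is a simplified illustration, not the crawler's actual source. It assumes that 401, 403, and 429 are the status codes treated as "blocked" (this is believed to match the Crawlee session pool default, but should be verified against the documentation). The helper name `classifyResponse` and the `maxRetries` value of 3 are hypothetical.

```javascript
// Simplified illustration only; NOT the crawler's real implementation.
// ASSUMPTION: these status codes are treated as a sign of blocking.
const BLOCKED_STATUS_CODES = [401, 403, 429];

// Hypothetical helper: decide what happens to a request after its response.
// - 'process': the page is handed to the page function
// - 'reclaim': the request goes back to the queue for another attempt
// - 'fail':    the retry budget is used up and the request is marked failed
function classifyResponse(statusCode, retryCount, maxRetries = 3) {
    if (!BLOCKED_STATUS_CODES.includes(statusCode)) {
        return 'process';
    }
    return retryCount < maxRetries ? 'reclaim' : 'fail';
}

// The log line above reports a 401 with "retryCount":1.
console.log(classifyResponse(401, 1));
```

Under these assumptions, the 401 in the log is classified as a block rather than as a page to scrape, so the page function never runs for that URL and the request is retried until the retry limit is reached.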
Actor Metrics
3.3k monthly users
456 bookmarks
>99% runs succeeded
4.8 days response time
Created in Mar 2019
Modified a month ago