Website Content Crawler
apify/website-content-crawler
Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗LangChain, LlamaIndex, and the wider LLM ecosystem.


I get lots of: ERROR AdaptiveCrawler: Request failed and reached maximum retries. Error: Attribute selector didn't terminate

Closed

upright_yoga opened this issue
2 months ago

For almost any webpage I get this output. File downloads seem to work fine.


upright_yoga

2 months ago

Tried with more standard search terms

2024-07-16T20:27:25.659Z INFO AdaptiveCrawler: Running browser request handler for https://brainee.hnonline.sk/notsorry/news/veda/ako-na-to/27194611-cestovanie-v-case-je-podla-fyzika-skutocne-mozne-treba-vsak-zohladnit-tuto-jednu-vec
2024-07-16T20:27:28.531Z INFO AdaptiveCrawler:Statistics: AdaptiveCrawler request statistics: {"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":null,"requestsFinishedPerMinute":0,"requestsFailedPerMinute":0,"requestTotalDurationMillis":0,"requestsTotal":0,"crawlerRuntimeMillis":421325,"retryHistogram":[]}
2024-07-16T20:27:28.534Z INFO HttpCrawler:Statistics: HttpCrawler request statistics: {"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":null,"requestsFinishedPerMinute":0,"requestsFailedPerMinute":0,"requestTotalDurationMillis":0,"requestsTotal":0,"crawlerRuntimeMillis":420164,"retryHistogram":[]}
2024-07-16T20:27:28.820Z INFO AdaptiveCrawler:AutoscaledPool: state {"currentConcurrency":1,"desiredConcurrency":1,"systemStatus":{"isSystemIdle":false,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":0},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.6,"actualRatio":0.139},"cpuInfo":{"isOverloaded":true,"limitRatio":0.4,"actualRatio":0.751},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":0}}}
2024-07-16T20:27:28.823Z INFO HttpCrawler:AutoscaledPool: state {"currentConcurrency":0,"desiredConcurrency":1,"systemStatus":{"isSystemIdle":false,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":0},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.7,"actualRatio":0.075},"cpuInfo":{"isOverloaded":true,"limitRatio":0.4,"actualRatio":0.751},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":0}}}
2024-07-16T20:27:55.667Z WARN AdaptiveCrawler: Reclaiming failed request back to the list or queue. Attribute selector didn't terminate
2024-07-16T20:27:55.669Z at processHtml (file:///home/myuser/dist/html-processing.js:38:15) {"id":"M7Xgd2FcwswWEd3","url":"https://brainee.hnonline.sk/notsorry/news/veda/ako-na-to/27194611-cestovanie-v-case-je-podla-fyzika-skutocne-mozne-treba-vsak-zohladnit-tuto-jednu-vec","retryCount":2}
2024-07-16T20:27:55.857Z INFO AdaptiveCrawler: Running browser request handler for https://hashtag.zoznam.sk/5-sposobov-ako-cestovat-casom/
2024-07-16T20:28:02.248Z INFO AdaptiveCrawler: Running browser request handler for https://icjk.sk/233/Rodicia-Jana-Kuciaka-pat-rokov-po-vrazde-Janko-svoju-pracu-robil-poctivo-a-nikomu-vedome-neublizoval-to-by-malo-byt-kredo-aj-pre-ostatnych-novinarov
2024-07-16T20:28:28.541Z INFO AdaptiveCrawler:Statistics: AdaptiveCrawler request statistics: {"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":null,"requestsFinishedPerMinute":0,"requestsFailedPerMinute":0,"requestTotalDurationMillis":0,"requestsTotal":0,"crawlerRuntimeMillis":481335,"retryHistogram":[]}
2024-07-16T20:28:28.543Z INFO HttpCrawler:Statistics: HttpCrawler request statistics: {"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":null,"requestsFinishedPerMinute":0,"requestsFailedPerMinute":0,"requestTotalDurationMillis":0,"requestsTotal":0,"crawlerRuntimeMillis":480174,"retryHistogram":[]}
2024-07-16T20:28:28.863Z INFO AdaptiveCrawler:AutoscaledPool: state {"currentConcurrency":2,"desiredConcurrency":1,"systemStatus":{"isSystemIdle":false,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":0},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.6,"actualRatio":0.033},"cpuInfo":{"isOverloaded":true,"limitRatio":0.4,"actualRatio":0.805},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":0}}}
2024-07-16T20:28:28.865Z INFO HttpCrawler:AutoscaledPool: state {"currentConcurrency":0,"desiredConcurrency":1,"systemStatus":{"isSystemIdle":false,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":0},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.7,"actualRatio":0},"cpuInfo":{"isOverloaded":true,"limitRatio":0.4,"actualRatio":0.805},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":0}}}
2024-07-16T20:28:42.362Z WARN AdaptiveCrawler: Reclaiming failed request back to the list or queue. Attribute selector didn't terminate
2024-07-16T20:28:42.365Z at processHtml (file:///home/myuser/dist/html-processing.js:38:15) {"id":"dSrYeJr4X4NfY8n","url":"https://hashtag.zoznam.sk/5-sposobov-ako-cestovat-casom/","retryCount":3}
2024-07-16T20:28:48.854Z WARN AdaptiveCrawler: Reclaiming failed request back to the list or queue. Attribute selector didn't terminate
2024-07-16T20:28:48.856Z at processHtml (file:///home/myuser/dist/html-processing.js:38:15) {"id":"YGBj5kvwgkRMfT8","url":"https://icjk.sk/233/Rodicia-Jana-Kuciaka-pat-rokov-po-vrazde-Janko-svoju-pracu-robil-poctivo-a-nikomu-vedome-neublizoval-to-by-malo-byt-kredo-aj-pre-ostatnych-novinarov","retryCount":3}
2024-07-16T20:28:49.186Z INFO AdaptiveCrawler: Running browser request handler for https://avmania.zive.cz/nejlepsi-filmy-o-cestovani-casem


upright_yoga

2 months ago

Edit: I found the issue. There was a small error in the JSON I was sending. I'll leave this here in case it helps someone else.

jindrich.bar

Hello, and thank you for your interest in this Actor!

We are glad you found the issue. The error "Attribute selector didn't terminate" most likely points to a malformed CSS attribute selector. This usually comes from incorrect use of quotation marks: inside a JSON string, double quotes must be escaped with \, otherwise they collide with the JSON string delimiters.
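To illustrate the quoting problem, here is a minimal Python sketch. It is not taken from the thread: the selector value and the example.com URL are made up, and the field names `startUrls` and `removeElementsCssSelector` are assumed to be the Actor's input fields, so check them against the input schema. The idea is to let a JSON serializer do the escaping instead of writing the JSON by hand.

```python
import json

# Hypothetical selector containing double quotes (an attribute selector).
selector = 'nav, footer, [role="banner"]'

# Assumed input field names; verify against the Actor's input schema.
actor_input = {
    "startUrls": [{"url": "https://example.com"}],
    "removeElementsCssSelector": selector,
}

# json.dumps escapes the inner double quotes as \" for us.
payload = json.dumps(actor_input)

# Round trip: the selector survives serialization unchanged.
restored = json.loads(payload)["removeElementsCssSelector"]

# Hand-written JSON with unescaped inner quotes is rejected by the parser.
broken = '{"removeElementsCssSelector": "[role="banner"]"}'
try:
    json.loads(broken)
    broken_is_valid = True
except json.JSONDecodeError:
    broken_is_valid = False

# Alternative: single quotes are valid in CSS and need no JSON escaping.
single_quoted = json.dumps({"removeElementsCssSelector": "[role='banner']"})
```

If the JSON has to be written by hand, using single quotes inside the selector (as in `single_quoted`) avoids the escaping question altogether.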

I'll close this issue now, but feel free to ask additional questions, if you have any. Cheers!
