Website Content Crawler

apify/website-content-crawler

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

Failed!

Closed

malachite_malachite opened this issue
2 months ago

2024-10-09T17:23:43.858Z ACTOR: Pulling Docker image of build WFRlsqlrc8RY1YN5I from repository.
2024-10-09T17:23:44.000Z ACTOR: Creating Docker container.
2024-10-09T17:23:44.100Z ACTOR: Starting Docker container.
2024-10-09T17:23:44.972Z Starting X virtual framebuffer using: Xvfb :99 -ac -screen 0 1920x1080x24+32 -nolisten tcp
2024-10-09T17:23:44.987Z Executing main command
2024-10-09T17:23:47.080Z INFO System info {"apifyVersion":"3.2.5","apifyClientVersion":"2.9.5","crawleeVersion":"3.11.5","osType":"Linux","nodeVersion":"v22.9.0"}
2024-10-09T17:23:47.379Z INFO Discovering possible sitemap files from the start URLs...
2024-10-09T17:23:56.653Z INFO AdaptiveCrawler: Starting the crawler.
2024-10-09T17:23:56.862Z INFO AdaptiveCrawler: Running browser request handler for https://www.apollo.io/features
2024-10-09T17:23:56.873Z INFO AdaptiveCrawler: Running browser request handler for https://www.apollo.io/customers
2024-10-09T17:24:00.459Z WARN AdaptiveCrawler: Reclaiming failed request back to the list or queue. browserContext.addCookies: cookies[0].sameSite: expected one of (Strict|Lax|None)
2024-10-09T17:24:00.461Z     at crawlerOptions.preNavigationHooks (/home/myuser/dist/input.js:147:47) {"id":"AVizYriGowLrYJl","url":"https://www.apollo.io/features","retryCount":1}
2024-10-09T17:24:01.022Z WARN AdaptiveCrawler: Reclaiming failed request back to the list or queue. browserContext.addCookies: cookies[0].sameSite: expected one of (Strict|Lax|None)
2024-10-09T17:24:01.025Z     at crawlerOptions.preNavigationHooks (/home/myuser/dist/input.js:147:47) {"id":"X25j5DxYkOPeQet","url":"https://www.apollo.io/customers","retryCount":1}
2024-10-09T17:24:01.331Z INFO AdaptiveCrawler: Running browser request handler for https://www.apollo.io/
2024-10-09T17:24:04.548Z ... [trimmed]
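
The warnings in the log say that the first cookie passed to browserContext.addCookies has a sameSite value other than Strict, Lax, or None. The thread does not show the cookies themselves, so the following is only a minimal sketch, assuming the cookies were supplied in the Actor input as a JSON list exported from a browser tool that writes sameSite in a different form (for example lowercase, or aliases such as "no_restriction"). The function name normalize_cookies and the alias table are illustrative, not part of the Actor or of Crawlee.

```python
import json

# Canonical spellings accepted for sameSite, per the error message in the log.
ALLOWED = {"strict": "Strict", "lax": "Lax", "none": "None"}

# Assumption: aliases that some cookie-export tools emit; adjust to your export.
ALIASES = {"no_restriction": "None"}


def normalize_cookies(cookies):
    """Return a new cookie list with sameSite normalized, or dropped if unknown."""
    fixed = []
    for cookie in cookies:
        cookie = dict(cookie)  # copy so the caller's data is not modified
        raw = cookie.get("sameSite")
        key = str(raw).strip().lower() if raw is not None else ""
        if key in ALLOWED:
            cookie["sameSite"] = ALLOWED[key]
        elif key in ALIASES:
            cookie["sameSite"] = ALIASES[key]
        else:
            # Unknown or missing value: omit the field rather than guess.
            cookie.pop("sameSite", None)
        fixed.append(cookie)
    return fixed


if __name__ == "__main__":
    # Hypothetical exported cookies, for illustration only.
    exported = [
        {"name": "session", "value": "abc", "domain": ".example.com", "sameSite": "lax"},
        {"name": "prefs", "value": "1", "domain": ".example.com", "sameSite": "unspecified"},
    ]
    print(json.dumps(normalize_cookies(exported), indent=2))
```

The cleaned list can then be pasted back into the cookie field of the Actor input. Dropping an unrecognized sameSite value instead of guessing one leaves the browser to apply its default.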

malachite_malachite

2 months ago

Can you please help me? Can your Actor capture the hidden mailbox inside the page? If it is possible, please teach me how! I spent 3 hours and still can't figure out how to use it.

Developer
Maintained by Apify

Actor Metrics

  • 3.8k monthly users

  • 721 stars

  • >99% runs succeeded

  • 2.2 days response time

  • Created in Mar 2023

  • Modified 2 days ago