
Website Content Crawler
Pricing
Pay per usage

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
Rating: 3.6 (39)
Total users: 54K
Monthly users: 8K
Runs succeeded: >99%
Issues response: 7.6 days
Last modified: 3 days ago
Crawler cannot get all the text content
Closed
I tried to use the Apify Python client to crawl the URL "https://roundrockisd.org/graduation/top-10/whs-2023-top-10/", but I only get the information for 8 students, and the text is cut off.
I checked other issues that mention similar problems. All of them suggest setting the "htmlTransformer" parameter to "none", but that does not work for me.
Here is the end of the scraping results:
"Jyotsna Arunkumar – No. 8What campuses in Round Rock ISD did you attend?Canyon Vista Middle School, and Westwood High School.What’s the best memory you have of your time in Round Rock ISD? Why?The FBLA State trip to Galveston my senior year. I loved spending time with friends – walking an hour after we missed our bus by 30 seconds, sharing a tub of melted ice cream, seeing jellyfish at the aquarium, and watching movies into the night stand out among many memorable moments.Who has been your most influential teacher in Round Rock ISD? Why?So many teachers have had a positive influence on me. Thank you Mrs. Key for making IB such a welcoming and positive experience. To Mrs. Howi"
My parameters are:
run_input = {
    "startUrls": [{"url": url}],
    "useSitemaps": False,
    "crawlerType": "playwright:firefox",
    "includeUrlGlobs": [],
    "excludeUrlGlobs": [],
    "ignoreCanonicalUrl": False,
    "maxCrawlDepth": 0,
    "maxCrawlPages": 1,
    "initialConcurrency": 0,
    "maxConcurrency": 200,
    "initialCookies": [],
    ... [trimmed]
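For reference, here is a minimal sketch of the same request made with the Apify Python client (the `apify_client` package), with "htmlTransformer" set to "none" as the other issues suggest. It assumes the Actor ID `apify/website-content-crawler`, an API token in a hypothetical `APIFY_TOKEN` environment variable, and that each dataset item stores the extracted page text under a `text` key. Only a subset of the original parameters is reproduced, and `build_run_input` and `crawl_text` are hypothetical helper names.

```python
import os


def build_run_input(url: str) -> dict:
    """Build a single-page crawl input (subset of the parameters in the question)."""
    return {
        "startUrls": [{"url": url}],
        "useSitemaps": False,
        "crawlerType": "playwright:firefox",
        "maxCrawlDepth": 0,  # do not follow links from the start page
        "maxCrawlPages": 1,  # crawl only the start URL
        "htmlTransformer": "none",  # setting suggested in the thread
    }


def crawl_text(url: str, token: str) -> list[str]:
    """Run the Actor and return the text of each dataset item.

    Assumption: items expose the extracted text under the "text" key.
    """
    # Imported lazily so build_run_input works without the client installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor("apify/website-content-crawler").call(
        run_input=build_run_input(url)
    )
    items = client.dataset(run["defaultDatasetId"]).iterate_items()
    return [item.get("text", "") for item in items]


if __name__ == "__main__":
    token = os.environ.get("APIFY_TOKEN")  # hypothetical variable name
    if token:
        url = "https://roundrockisd.org/graduation/top-10/whs-2023-top-10/"
        for text in crawl_text(url, token):
            # Print the length and the tail to see where the text really ends.
            print(len(text), text[-200:])
```

Printing the length and the last characters of each item shows whether the stored text stops mid-sentence or whether it only looks shortened when displayed elsewhere in the script.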
Hello and thank you for your interest in this Actor!
This is curious: I tried running the Actor with htmlTransformer: "none" and got the data for all the students. Check out my example run; perhaps there is an issue in the rest of your code?
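To compare a run against the reply's result, one option is to check the extracted text for each "No. N" rank marker, the format visible in the output quoted in the question. This is a minimal sketch that assumes every entry on the page is labeled this way and that 10 entries are expected (the page is a top-10 list). `missing_ranks` and `looks_truncated` are hypothetical helpers, and the end-of-text check is only a rough heuristic.

```python
import re


def missing_ranks(text: str, expected: int = 10) -> list[int]:
    """Return ranks 1..expected that have no "No. <n>" marker in the text."""
    # Matches "No. 8" even when glued to the next word, as in "No. 8What".
    present = {int(n) for n in re.findall(r"No\.\s*(\d+)", text)}
    return [n for n in range(1, expected + 1) if n not in present]


def looks_truncated(text: str) -> bool:
    """Heuristic: treat text that does not end in sentence punctuation as cut off."""
    stripped = text.rstrip()
    return not stripped or stripped[-1] not in ".!?\"'"


if __name__ == "__main__":
    # Hypothetical shortened sample in the shape of the quoted output.
    sample = "Student A - No. 1 Text. Student B - No. 8What campuses? To Mrs. Howi"
    print(missing_ranks(sample))
    print(looks_truncated(sample))
```

For the sample above, ranks 1 and 8 are found, so the first call reports the other eight as missing, and the text is flagged as truncated because it ends mid-word.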
I'll close this issue now, but feel free to ask additional questions if you have any. We'd be more than happy to see the whole script; maybe there are some issues in the Apify Python Client we don't know about yet! :)
Cheers!