Website Content Crawler
No credit card required
Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
I wanted to scrape only a few texts from the page https://listings.icarlton.com/en/property/apartment-for-rent-in-hidd-140.html. I added their CSS classes in "Keep HTML elements (CSS selector)", but in the output I don't see these values; instead, I see all the content scraped from the page.
Also, is there any way to write an instruction so that the output is structured the way I set it up? For example, I would name column A in Excel "property name", and cell A2 would contain the value from the class "wtp_text_center_mob tw-text-4xl tw-font-bold tw-text-left tw-mt-0".
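A note on the selector in the question: the quoted `class` attribute holds five separate classes separated by spaces. In a CSS selector, classes that sit on the same element are chained with dots (`.a.b`), while a space means "descendant of", so if that string was pasted into the selector field as-is, it would not select the element carrying those classes. The minimal Python sketch below builds the chained form. The function name `class_attr_to_selector` is hypothetical and not part of any Apify tool, and it deliberately accepts only simple class names (letters, digits, `_` and `-`, not starting with a digit), which covers the classes in the question but not names like `md:flex` that would need CSS escaping.

```python
import re

# Conservative subset of CSS identifiers: optional leading "-", then a letter or "_",
# then letters, digits, "_" or "-". Anything else would need CSS escaping.
_SIMPLE_CLASS = re.compile(r"-?[_a-zA-Z][_a-zA-Z0-9-]*")


def class_attr_to_selector(class_attr: str, tag: str = "") -> str:
    """Turn a space-separated class attribute into a compound CSS selector.

    "a b c" -> ".a.b.c" (an element that carries all three classes),
    optionally prefixed with a tag name, e.g. "h1.a.b.c".
    """
    classes = class_attr.split()
    if not classes:
        raise ValueError("class attribute is empty")
    for name in classes:
        if not _SIMPLE_CLASS.fullmatch(name):
            raise ValueError(f"class {name!r} needs CSS escaping; not handled here")
    return tag + "".join("." + name for name in classes)


if __name__ == "__main__":
    print(class_attr_to_selector(
        "wtp_text_center_mob tw-text-4xl tw-font-bold tw-text-left tw-mt-0"
    ))
```

Any subset of the classes is also a valid selector, but a shorter selector may match more elements on the page.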
Hi,
Thank you for using the Website Content Crawler.
I’ve looked into your runs, and if I understand correctly, you need to scrape structured information from the listing.
The Website Content Crawler is primarily designed to retrieve all content from a website. For extracting structured data, a better tool is the Web Scraper.
I’ve put together a basic Web Scraper setup that saves the data into an Apify dataset. Please check this run: you can copy the input from the run, paste it into your Web Scraper, and it should work.
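On the Excel part of the question: items saved to an Apify dataset can be exported as CSV or Excel, and each field of an item becomes a column. Naming a field `property name` in the scraper's output therefore gives a column with that header (cell A1) and one row per scraped page (A2, A3, ...). The Python sketch below imitates that mapping offline with the standard `csv` module; the `records_to_csv` helper and the sample record are hypothetical and only illustrate the shape of the output.

```python
import csv
import io


def records_to_csv(records: list[dict], columns: list[str]) -> str:
    """Write records as CSV text: the header row holds the column names
    (cells A1, B1, ...), and each record fills one row below it (A2, B2, ...)."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops fields that are not listed in `columns`;
    # restval="" leaves a cell empty when a record lacks a field.
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical record, shaped like one dataset item per scraped property page.
    records = [
        {
            "property name": "Example apartment",
            "url": "https://listings.icarlton.com/en/property/apartment-for-rent-in-hidd-140.html",
        },
    ]
    print(records_to_csv(records, ["property name", "url"]))
```

Opening the resulting CSV in Excel places "property name" in A1 and the first value in A2.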
The scraper uses glob patterns so that only property pages are scraped, and it contains a few selectors to extract the required information.
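The two ideas in the reply, glob patterns that decide which URLs are processed and selectors that pick values out of each page, can be sketched offline in Python with only the standard library. This is not how the Web Scraper is implemented: `fnmatch` only approximates glob syntax in general (its `*` also matches `/`, for example), the pattern `PROPERTY_GLOB` is a hypothetical one for this site's property pages, and the class matching assumes well-formed HTML in which every non-void element is closed. The names `is_property_page`, `ClassTextExtractor` and `extract_by_classes` are illustrative only.

```python
from fnmatch import fnmatchcase
from html.parser import HTMLParser

# Hypothetical glob for property detail pages on this site (an assumption).
PROPERTY_GLOB = "https://listings.icarlton.com/en/property/*.html"

# Elements that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def is_property_page(url: str, pattern: str = PROPERTY_GLOB) -> bool:
    """Approximate glob filter: fnmatch's "*" matches any characters, including "/"."""
    return fnmatchcase(url, pattern)


class ClassTextExtractor(HTMLParser):
    """Collects the text of every element whose class attribute contains all
    required classes (the elements a compound selector like .a.b would match)."""

    def __init__(self, required_classes):
        super().__init__()
        self.required = set(required_classes)
        self.depth = 0      # > 0 while inside a matching element
        self.buffer = []    # text pieces of the current match
        self.results = []   # whitespace-normalized text of each match

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            # Nested element inside a match: only track the nesting level.
            self.depth += 1
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self.required <= classes:
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append(" ".join("".join(self.buffer).split()))

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def extract_by_classes(html: str, required_classes) -> list[str]:
    parser = ClassTextExtractor(required_classes)
    parser.feed(html)
    parser.close()
    return parser.results


if __name__ == "__main__":
    url = "https://listings.icarlton.com/en/property/apartment-for-rent-in-hidd-140.html"
    # Made-up HTML snippet standing in for a downloaded page.
    sample = ('<h1 class="wtp_text_center_mob tw-text-4xl tw-font-bold">'
              'Example <b>apartment</b></h1><p class="tw-text-4xl">Other</p>')
    if is_property_page(url):
        names = extract_by_classes(sample, ["wtp_text_center_mob", "tw-text-4xl"])
        print({"property name": names})
```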
I hope this helps! I’ll go ahead and close this issue for now, but please feel free to ask any additional questions.
Thank you very much, Jiri.
Actor Metrics
3.9k monthly users
718 stars
>99% runs succeeded
2.2 days response time
Created in Mar 2023
Modified 15 hours ago