Website Content Crawler

apify/website-content-crawler

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

Scraping only a few elements on the page and saving them in separate fields

Closed

igocza opened this issue
7 days ago

I wanted to scrape only a few pieces of text from the page https://listings.icarlton.com/en/property/apartment-for-rent-in-hidd-140.html. I added their CSS classes in "Keep HTML elements (CSS selector)", but the output does not contain only these values; it contains all the content scraped from the page. Also, is there any way to structure the output so that I can set the name of column A in Excel, e.g. "property name", and cell A2 will then hold the value from the element with class "wtp_text_center_mob tw-text-4xl tw-font-bold tw-text-left tw-mt-0"?

jiri.spilka

Hi,

Thank you for using the Website Content Crawler.

I’ve looked into your runs, and if I understand correctly, you need to scrape structured information from the listing.

The Website Content Crawler is primarily designed to retrieve all content from a website. For extracting structured data, a better tool is the Web Scraper.

I’ve put together a basic Web Scraper configuration that saves the data into an Apify dataset. Please check this run: you can copy the input from the run and paste it into your Web Scraper, and it should work.

The scraper includes glob patterns to scrape only property data and contains a few selectors to extract the required information.
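For readers without access to that run, here is a rough sketch of what such an input could look like. It is not the input from the run: the glob pattern, the `classListToSelector` helper, and the choice of output field are assumptions made for illustration, and the field names (`startUrls`, `linkSelector`, `globs`, `injectJQuery`, `pageFunction`) should be checked against the current Web Scraper input schema. The sketch also shows how a space-separated class attribute, like the one quoted in the question, becomes a single compound CSS selector.

```typescript
// Hypothetical helper: turn the value of a class="..." attribute into a
// compound CSS selector (".a.b.c"). It does not escape special characters,
// so class names such as "md:w-1/2" would need extra handling.
function classListToSelector(classAttr: string): string {
  return classAttr
    .trim()
    .split(/\s+/)
    .filter((name) => name.length > 0)
    .map((name) => `.${name}`)
    .join("");
}

// Class list quoted in the question above.
const propertyNameSelector = classListToSelector(
  "wtp_text_center_mob tw-text-4xl tw-font-bold tw-text-left tw-mt-0",
);

// Web Scraper takes the page function as a string of JavaScript that runs
// in the browser. Each key of the returned object becomes a dataset field.
const pageFunction = `async function pageFunction(context) {
    const $ = context.jQuery;
    return {
        url: context.request.url,
        "property name": $(${JSON.stringify(propertyNameSelector)}).first().text().trim(),
    };
}`;

// Assumed input shape; the glob below is a guess based on the listing URL.
const webScraperInput = {
  startUrls: [
    { url: "https://listings.icarlton.com/en/property/apartment-for-rent-in-hidd-140.html" },
  ],
  linkSelector: "a[href]",
  globs: [{ glob: "https://listings.icarlton.com/en/property/*.html" }],
  injectJQuery: true,
  pageFunction,
};

console.log(JSON.stringify(webScraperInput, null, 2));
```

With an input along these lines, each scraped page yields one dataset item, and a key such as "property name" should appear as a column header when the dataset is exported in a tabular format.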

I hope this helps! I’ll go ahead and close this issue for now, but please feel free to ask any additional questions.

igocza

6 days ago

Thank you very much, Jiri.

Developer
Maintained by Apify

Actor Metrics

  • 3.9k monthly users

  • 718 stars

  • >99% runs succeeded

  • 2.2 days response time

  • Created in Mar 2023

  • Modified 15 hours ago