
Website Content Crawler

apify/website-content-crawler


Automatically crawl and extract text content from websites with documentation, knowledge bases, help centers, or blogs. This Actor is designed to provide data to feed, fine-tune, or train large language models such as ChatGPT or LLaMA.


How to scrape all FAQs on a page that requires each FAQ to be clicked separately?

Closed

ollieiq opened this issue

I am trying to scrape the FAQs (both the questions and answers) on a page like this: https://www.philips-hue.com/en-us/support/product/philips-hue-system/100005 I can grab all the questions, but for the answers, only the last FAQ answer on the page is grabbed, and that was with the .collapse__header selector set in the Expand Clickable elements field. Please advise on what I can do to remedy this.

Jindřich Bär (jindrich.bar)

Hello and thank you for your interest in this Actor!

Unfortunately, JS navigation and dropdowns are one of the weak spots of Website Content Crawler. There is no standard for how these are implemented, so different pages handle them differently. As a result, the Actor has fairly limited support for interacting with on-page elements.

The good news is that it's pretty simple to download the data from this page using Cheerio Crawler: before the JavaScript is executed on the page, the questions and answers are stored in the :info attribute of the <faq> element. You can simply parse them out and store them in a Dataset. Check out my example run here - feel free to copy the input and experiment with it. And definitely let me know if you have any other questions (regarding this website or others).
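A minimal sketch of the parsing step described above, written with Python's standard library rather than Cheerio so that it runs without any dependencies. It assumes the :info attribute holds a JSON string. The sample markup and the "items", "question" and "answer" keys are hypothetical placeholders, so check the real page source for the actual structure.

```python
import json
from html.parser import HTMLParser


class FaqInfoParser(HTMLParser):
    """Collect the raw value of the :info attribute from every <faq> tag."""

    def __init__(self):
        super().__init__()
        self.payloads = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names and unescapes
        # entities such as &quot; inside attribute values.
        if tag != "faq":
            return
        for name, value in attrs:
            if name == ":info" and value:
                self.payloads.append(value)


def extract_faqs(html):
    """Return the decoded :info payload of each <faq> element in the HTML."""
    parser = FaqInfoParser()
    parser.feed(html)
    parser.close()
    # Assumption: every payload is valid JSON.
    return [json.loads(payload) for payload in parser.payloads]


# Hypothetical markup, standing in for the HTML served before any JS runs.
SAMPLE_HTML = (
    "<html><body>"
    "<faq :info='{\"items\": [{\"question\": \"How do I reset the bridge?\", "
    "\"answer\": \"Hold the button on the back.\"}]}'></faq>"
    "</body></html>"
)

faqs = extract_faqs(SAMPLE_HTML)
for faq in faqs:
    for item in faq["items"]:
        print(item["question"], "->", item["answer"])
```

In a real run, the HTML would come from a plain HTTP request to the page (no browser needed), and each decoded item would be pushed to a Dataset instead of printed.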

I'll close this issue now (but as I said, still feel free to ping me :)) Cheers!

tm_oiq

Hello and thank you for your quick and helpful response to my inquiry! My ultimate goal is to scrape all the data in the Support section of the Philips Hue portal and the similar portals of other IoT device manufacturers. Is there an Actor you recommend that would do that for me? https://www.philips-hue.com/en-us/support/faq is the URL I am starting from.

I would like to grab all the FAQs and articles (both text and PDF) that are linked from the main support page URL, and from all pages linked off of that (some of the images link to setup guides, etc.). I'm just getting started with scraping, so any direction you can point me in is greatly appreciated!


Developer

Apify

Actor metrics

2k monthly users
99.9% runs succeeded
2.9 days response time
Created in Mar 2023
Modified 3 days ago

Categories

Developer tools

Business
