Extended GPT Scraper
Extract data from any website and feed it into GPT via the OpenAI API. Use ChatGPT to proofread content, analyze sentiment, summarize reviews, extract contact details, and much more.
I tried a variety of configurations using the default link selector, but I haven't been able to get the crawler to go past the first page provided by the start URL.
"linkSelector": "a[href]",
Globs: tried setting to [] and omitting the property entirely
Crawling depth: tried 0 (unlimited) and an explicit value of 2
No content selector is used.
Any tips for getting this portion to work? Otherwise it works great!
Figured this out: I think you need to provide at least one glob for it to work.
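For anyone who wants to guard against this, here is a minimal sketch of an input builder that refuses an empty glob list. It only reflects the observation above, not documented Actor behavior. The field names startUrls, linkSelector and globs come from this thread; the function name build_input, the example.com URL, the example glob, and the exact shape of startUrls are assumptions for illustration.

```python
import json


def build_input(start_url: str, globs: list[str]) -> dict:
    """Build a crawler input dict, refusing an empty glob list.

    Hypothetical helper: the "at least one glob" rule is taken from the
    report in this thread, not from the Actor's documentation.
    """
    if not globs:
        raise ValueError("provide at least one glob so links can be enqueued")
    return {
        "startUrls": [start_url],   # shape assumed; the Actor may expect objects
        "linkSelector": "a[href]",  # default selector quoted in this thread
        "globs": globs,
    }


if __name__ == "__main__":
    # example.com and the glob below are placeholders, not values from the thread
    actor_input = build_input("https://example.com/", ["https://example.com/**"])
    print(json.dumps(actor_input, indent=2))
```

The check fails early on the client side, so an empty globs list never reaches a run that would silently stop at the first page.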
Hi, I am also facing the same issue.
I want to scrape only URLs that contain certain strings. For example, take https://news.ycombinator.com as the start URL and "ask" as the string; the scraper should then scrape the page https://news.ycombinator.com/ask.
I tried the configurations below, but the crawler didn't go past the first page provided by the start URL.
- startUrls: https://news.ycombinator.com/, "globs": [], "linkSelector": "a[href*=ask]"
- startUrls: https://news.ycombinator.com/, "globs": ["*ask*"], "linkSelector": "a[href]"
- startUrls: https://news.ycombinator.com/, "globs": ["*ask*"], "linkSelector": "a[href*=ask]"
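One possible explanation, which this thread does not confirm, is that the globs are matched against the full URL with path-style rules, where a single * does not cross a "/". Under that assumption, "*ask*" could never match https://news.ycombinator.com/ask. The sketch below is a simplified model of such matching for checking patterns locally; glob_to_regex and matches are hypothetical helpers, not the Actor's actual matcher, and the alternative globs are untested suggestions.

```python
import re


def glob_to_regex(glob: str) -> re.Pattern:
    """Translate a simplified glob into a regex.

    Assumed semantics (not confirmed for the Actor):
    '**' matches any characters including '/', '*' matches any
    characters except '/', everything else is literal.
    """
    parts = []
    i = 0
    while i < len(glob):
        if glob.startswith("**", i):
            parts.append(".*")
            i += 2
        elif glob[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(glob[i]))
            i += 1
    return re.compile("".join(parts), re.IGNORECASE)


def matches(glob: str, url: str) -> bool:
    """Return True when the whole URL matches the glob."""
    return glob_to_regex(glob).fullmatch(url) is not None


if __name__ == "__main__":
    url = "https://news.ycombinator.com/ask"
    for pattern in ["*ask*", "**/*ask*", "https://news.ycombinator.com/ask*"]:
        print(pattern, matches(pattern, url))
```

If the Actor does behave this way, a glob such as "https://news.ycombinator.com/ask*" or "**/*ask*" combined with "linkSelector": "a[href]" would be worth trying.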
- 77 monthly users
- 44 stars
- 99.6% runs succeeded
- 3.4 days response time
- Created in Jun 2023
- Modified 7 days ago