
Extended GPT Scraper
Extract data from any website and feed it into GPT via the OpenAI API. Use ChatGPT to proofread content, analyze sentiment, summarize reviews, extract contact details, and much more.
Rating: 4.6 (4)
Pricing: Pay per usage
Total users: 1.5K
Monthly users: 33
Runs succeeded: 99%
Last modified: 6 months ago
Not crawling other pages besides start URLs
Closed
I tried a variety of configurations using the default link selector, but I haven't been able to get the crawler to go past the first page provided by the start URL.
- Link selector: "linkSelector": "a[href]"
- Globs: tried setting to [] and also omitting the property entirely
- Max crawling depth: tried 0 (unlimited) and an explicit value of 2
- No content selector is used.
Any tips for getting this portion to work? Otherwise it works great!
convincing_bush
Figured this out, I think you need to provide at least one glob for it to work.
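Following that tip, here is a minimal input sketch with one glob included. This is an illustrative guess, not the actor's documented schema: the field names (startUrls, linkSelector, globs, maxCrawlingDepth) and the object shape of each glob entry are assumed from typical Apify crawler inputs and may differ in this actor.

```json
{
  "startUrls": [{ "url": "https://example.com/" }],
  "linkSelector": "a[href]",
  "globs": [{ "glob": "https://example.com/**" }],
  "maxCrawlingDepth": 2
}
```

The point is simply that "globs" is non-empty; with an empty list, the crawler appears to enqueue no links at all.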
shiraklein-justt
Hi, I am facing the same issue.
I want to scrape only the URLs that contain certain strings. For example, take https://news.ycombinator.com as the start URL and define the string as "ask"; the scraper should then scrape the page https://news.ycombinator.com/ask.
I tried the configurations below, but the crawler didn't go past the first page provided by the start URL.
- "startUrls": "https://news.ycombinator.com/", "globs": [], "linkSelector": "a[href*=ask]"
- "startUrls": "https://news.ycombinator.com/", "globs": ["*ask*"], "linkSelector": "a[href]"
- "startUrls": "https://news.ycombinator.com/", "globs": ["*ask*"], "linkSelector": "a[href*=ask]"
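Conceptually, the crawler keeps only those extracted links whose full URL matches at least one glob. A rough sketch of that filtering step in Python, using the standard-library fnmatch as a stand-in (the actor's actual glob engine is not specified here and may treat `*` more strictly, e.g. not matching across `/`, so a fully spelled-out pattern such as "https://news.ycombinator.com/ask*" is a safer bet than "*ask*"):

```python
from fnmatch import fnmatch

# Hypothetical links as the crawler might extract them from the start page.
links = [
    "https://news.ycombinator.com/ask",
    "https://news.ycombinator.com/newest",
    "https://news.ycombinator.com/item?id=1",
]

# An explicit glob that spells out the host and path prefix.
globs = ["https://news.ycombinator.com/ask*"]

def keep(url, globs):
    """Keep a URL only if it matches at least one glob pattern."""
    return any(fnmatch(url, g) for g in globs)

kept = [u for u in links if keep(u, globs)]
print(kept)  # ['https://news.ycombinator.com/ask']
```

If the actor's glob matcher behaves like this sketch, the second and third configurations above should work; if they don't, the explicit host-plus-path glob is the variant to try first.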