Web Scraper
Crawls arbitrary websites using the Chrome browser and extracts data from pages using provided JavaScript code. The actor supports both recursive crawling and lists of URLs, and it automatically manages concurrency for maximum performance. This is Apify's basic tool for web crawling and scraping.
How do I tell which output belongs to which URL when doing bulk crawling? I want to be able to map the results of the crawl back to the source URL. How do I do that?
Hello and thank you for your interest in this Actor!
I'm assuming you've found the solution because you've already closed this issue. If that was a mistake (or you're still looking for the "official" answer to your question), here you go:
The page function can return a JS object with multiple fields. The current page's URL is stored in context.request.url and can be accessed from there. The following snippet stores the page URL and its content into one dataset record, so you can map the content to its URL.
async function pageFunction(context) {
    const $ = context.jQuery;
    const content = $('body').first().text();

    return {
        url: context.request.url,
        content,
    };
}
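Once every record carries its url field, matching results back to your source list can be done after the run. Below is a minimal sketch of that step, assuming you have already downloaded the dataset items as a plain array. The helper name indexByUrl, the example.com URLs, and the sample records are made up for illustration and are not part of the actor.

```javascript
// Hypothetical helper (not part of the actor): groups dataset records by the
// `url` field returned from the page function, so each source URL can be
// looked up directly.
function indexByUrl(records) {
    const byUrl = new Map();
    for (const record of records) {
        const bucket = byUrl.get(record.url) ?? [];
        bucket.push(record);
        byUrl.set(record.url, bucket);
    }
    return byUrl;
}

// Made-up sample data in the shape produced by the page function above.
const sourceUrls = [
    'https://example.com/a',
    'https://example.com/b',
    'https://example.com/c',
];
const records = [
    { url: 'https://example.com/b', content: 'Page B' },
    { url: 'https://example.com/a', content: 'Page A' },
];

const byUrl = indexByUrl(records);

// One entry per source URL, in the original order.
const report = sourceUrls.map((url) => ({
    url,
    results: byUrl.get(url) ?? [], // empty when the URL produced no record
}));

console.log(JSON.stringify(report, null, 2));
```

Note that this lookup is an exact string comparison, so a source URL that differs from the stored one (for example by a trailing slash) would not match. If that is a concern, normalizing both sides the same way before indexing is one option.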
Does this answer your question? Let us know if you have any more questions for us!
- 3.7k monthly users
- 98.8% runs succeeded
- 3.6 days response time
- Created in Mar 2019
- Modified about 1 month ago