Web Scraper

apify/web-scraper

Crawls arbitrary websites using the Chrome browser and extracts data from pages using provided JavaScript code. The Actor supports both recursive crawling and lists of URLs, and automatically manages concurrency for maximum performance. This is Apify's basic tool for web crawling and scraping.

How do I associate which output is for which URL when doing bulk crawling?

Closed

burgundy_zebra opened this issue
2 months ago

How do I associate which output is for which URL when doing bulk crawling? I want to be able to map the results of the crawl to the source URL. How do I do that?

Hello and thank you for your interest in this Actor!

I'm assuming you've found the solution because you've already closed this issue. If that was a mistake (or you're still looking for the "official" answer to your question), here you go:

The page function can return a JavaScript object with multiple fields. The URL of the current page is available as context.request.url. The following snippet stores the page URL and the page's text content in a single dataset record, so you can map each result back to its URL.

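async function pageFunction(context) {
    // jQuery is available on the context when the "Inject jQuery" option is enabled.
    const $ = context.jQuery;
    // Extract the text content of the page body.
    const content = $('body').first().text();

    // Return the URL together with the content so each record identifies its source page.
    return {
        url: context.request.url,
        content,
    };
}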
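
Once every record carries its URL, mapping results back to the source URLs can be done after the run. The following is only a minimal sketch of that step, assuming you have already downloaded the dataset items as an array of { url, content } objects shaped like the return value of the page function above. The helper name indexByUrl and the sample records are hypothetical and not part of the Actor.

```javascript
// Hypothetical post-processing helper, not part of the Actor itself.
// Assumes each record has the { url, content } shape returned by the page function.
function indexByUrl(records) {
    const byUrl = new Map();
    for (const record of records) {
        // The same URL may appear more than once in the results,
        // so keep a list of records per URL instead of overwriting.
        const list = byUrl.get(record.url) || [];
        list.push(record);
        byUrl.set(record.url, list);
    }
    return byUrl;
}

// Sample records standing in for downloaded dataset items.
const sample = [
    { url: 'https://example.com/a', content: 'Page A' },
    { url: 'https://example.com/b', content: 'Page B' },
];

const results = indexByUrl(sample);
console.log(results.get('https://example.com/a')[0].content); // prints "Page A"
```

Keeping a list per URL rather than a single value avoids silently dropping records if a URL shows up more than once in the output.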

Does this answer your question? Let us know if you have any more questions for us!

Developer
Maintained by Apify
Actor metrics
  • 3.7k monthly users
  • 98.8% runs succeeded
  • 3.6 days response time
  • Created in Mar 2019
  • Modified about 1 month ago