Web Scraper

Pricing

Pay per usage

Maintained by Apify

Crawls arbitrary websites using a web browser and extracts structured data from web pages using a provided JavaScript function. The Actor supports both recursive crawling and lists of URLs, and automatically manages concurrency for maximum performance.


How do I associate which output is for which URL when doing bulk crawling?

Closed

burgundy_zebra opened this issue a year ago

How do I associate which output is for which URL when doing bulk crawling? I want to be able to map the results of the crawl to the source URL. How do I do that?

jindrich.bar replied:

Hello and thank you for your interest in this Actor!

I'm assuming you've found the solution because you've already closed this issue. If that was a mistake (or you're still looking for the "official" answer to your question), here you go:

The page function can return a JS object with multiple fields. The current page's URL is stored in context.request.url and can be accessed from there. The following snippet stores the page URL and its content into one dataset record, so you can map the content to its URL.

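    async function pageFunction(context) {
        // jQuery handle for the loaded page (assumes jQuery injection is enabled in the Actor input).
        const $ = context.jQuery;
        // Text content of the page body.
        const content = $('body').first().text();
        // Each returned object becomes one dataset record; including the URL
        // ties the extracted content to the page it came from.
        return {
            url: context.request.url,
            content,
        };
    }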
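Once the run has finished and you have the dataset items on your side, one way to do the mapping is to index the records by their url field. The following is only a minimal sketch: indexByUrl is a hypothetical helper (not part of the Actor or any Apify library), the sample records are made up, and it assumes every record has the { url, content } shape returned by the page function above.

```javascript
// Hypothetical helper: groups dataset records by the `url` field that the
// page function stores in each record.
function indexByUrl(items) {
    const byUrl = new Map();
    for (const item of items) {
        // The same URL can show up more than once (for example, duplicates
        // in the input list), so keep every record for it.
        const records = byUrl.get(item.url) ?? [];
        records.push(item);
        byUrl.set(item.url, records);
    }
    return byUrl;
}

// Made-up sample records in the { url, content } shape.
const sampleItems = [
    { url: 'https://example.com/a', content: 'Page A' },
    { url: 'https://example.com/b', content: 'Page B' },
];

const byUrl = indexByUrl(sampleItems);
console.log(byUrl.get('https://example.com/a')[0].content); // prints "Page A"
```

Keeping an array per URL instead of a single record avoids silently overwriting results if a URL appears in the list more than once.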

Does this answer your question? Let us know if you have any more questions for us!