
Web Scraper
Pricing
Pay per usage

Web Scraper
Crawls arbitrary websites using a web browser and extracts structured data from web pages using a provided JavaScript function. The Actor supports both recursive crawling and lists of URLs, and automatically manages concurrency for maximum performance.
4.5 (22)
697
Total users: 82.5k
Monthly users: 3.9k
Runs succeeded: >99%
Issue response: 32 days
Last modified: 19 days ago
Tracking Webscraper
Closed
Hi there. I am currently using your web scraping Actor on my site to scrape websites. Some of these scraping jobs get large and can take a few minutes, so I want to develop a tracking system. Is there any way I can update, say, a counter in the context local to where I am initializing and running the Actor? Thanks
earnest_lawnmower
Here is my code. I want to call some logic whenever a URL is scraped:

input.startUrls = urls;
const run = await client
    .actor(properties.apifyCredentials.actorId)
    .call(input);
// Run x logic when a url is scraped, not when all urls are scraped at end of job
const { items } = await client.dataset(run.defaultDatasetId).listItems();
Hello and thank you for your interest in this Actor!
Just to make sure I understand your question - this is more about Apify and less about this specific Actor, right?
If you want to run some code whenever an Actor produces a result, you basically have two options:
- The "easy" option: polling the Dataset API:
- here, we retrieve the run ID (and the default dataset ID) and repeatedly ask the server how many results are in the dataset.
- this is very simple, but may not be "granular" enough for some use cases: if the Actor stores tens of results per second, code that polls once per second will only see the count jump by tens at a time (this might actually be a good thing, saving you some computation power).
// Run the Actor and get the run ID immediately.
// `waitSecs: 0` makes the call resolve immediately (and returns a reference to a "running" run).
const { id: runId, defaultDatasetId } = await client.actor("yourActorId").call(input, { waitSecs: 0 });

const interval = setInterval(async () => {
    const runInfo = await client.run(runId).get();
    if (runInfo?.status !== 'RUNNING') {
        clearInterval(interval);
    }
    const dataset = await client.dataset(defaultDatasetId).get();
    console.log(`The dataset currently contains ${dataset?.itemCount} items.`);
}, 1000);
- The "precise" option: make your Actor call your own API
- With this option, you implement some "notification" system in your Ac... [trimmed]
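Returning to the polling option: if the goal is to run some logic once per scraped URL rather than just read a count, the same loop could fetch only the items it has not seen yet. The following is a minimal sketch, not an official recipe. It assumes the `client.run(...).get()` and `client.dataset(...).listItems()` calls already used in this thread, plus the `offset` option of `listItems`. The names `trackRun`, `onItem` and `pollMs` are hypothetical and not part of the Apify client.

```javascript
// Statuses after which an Apify run can no longer produce new items.
const TERMINAL_STATUSES = ['SUCCEEDED', 'FAILED', 'ABORTED', 'TIMED-OUT'];

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

/**
 * Polls a run's dataset and calls `onItem(item, index)` once per new item.
 * Hypothetical helper for illustration; resolves with the number of items seen.
 */
async function trackRun(client, runId, datasetId, onItem, pollMs = 1000) {
    let offset = 0; // number of items already handed to `onItem`
    for (;;) {
        // Read the status first, so a finished run still gets one last fetch.
        const run = await client.run(runId).get();
        const finished = !run || TERMINAL_STATUSES.includes(run.status);

        // Ask only for the items we have not processed yet.
        const { items } = await client.dataset(datasetId).listItems({ offset });
        for (const item of items) {
            offset += 1;
            await onItem(item, offset);
        }

        // Stop once the run has ended and nothing new came back.
        if (finished && items.length === 0) return offset;
        await sleep(pollMs);
    }
}

// Usage (assumes an existing ApifyClient instance named `client`):
// const { id, defaultDatasetId } = await client.actor('yourActorId').call(input, { waitSecs: 0 });
// let scraped = 0;
// await trackRun(client, id, defaultDatasetId, () => { scraped += 1; });
```

Checking for a terminal status instead of "not RUNNING" is a deliberate choice in this sketch: a run that is still in the `READY` state on the first poll does not end the loop early.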
Closing due to inactivity.