Sitemap To Request Queue

pocesar/sitemap-to-request-queue

Download sitemap XMLs and put them in a RequestQueue

Sitemap to RequestQueue

Downloads sitemap.xml files and appends the URLs they list to a RequestQueue of your choice.
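
To make the idea concrete, here is a minimal sketch of turning sitemap XML into request objects. It is not the actor's own code: `extractLocs` and `toRequests` are hypothetical helpers, the XML is an inline sample rather than a download, and the regex-based parsing assumes plain `<loc>` entries (no CDATA, no entity decoding).

```javascript
// Hypothetical helper (not from the actor): collect the text of every <loc> element.
const extractLocs = (xml) =>
  [...xml.matchAll(/<loc>\s*([^<\s]+)\s*<\/loc>/g)].map((match) => match[1]);

// Hypothetical helper: build request-like objects, copying userData onto each one,
// similar to how userData on a start URL is described as being passed through.
const toRequests = (xml, userData = {}) =>
  extractLocs(xml).map((url) => ({ url, userData: { ...userData } }));

// Inline sample standing in for a downloaded sitemap.
const sampleXml = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/detail/item/1</loc></url>
  <url><loc>http://example.com/about</loc></url>
</urlset>`;

const requests = toRequests(sampleXml, { label: 'DETAILS' });
console.log(requests);
```

A real sitemap may also be a sitemap index that points at other sitemaps, which this sketch does not follow.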

Example

const Apify = require('apify');

// this is your actor
Apify.main(async () => {
  const { proxyConfig } = await Apify.getInput();
  const requestQueue = await Apify.openRequestQueue();

  // persist the run info so the sitemap actor isn't called again every time there's a migration
  const run = (await Apify.getValue('SITEMAP-CALL')) || { runId: '', actorId: '' };

  if (!run || !run.runId) {
    // this might take a while!
    const runCall = await Apify.call('pocesar/sitemap-to-request-queue', {
      // required proxy configuration, like { useApifyProxy: true, apifyProxyGroups: ['SHADER'] }
      proxyConfig,
      // use this run's RequestQueue; it can also be a named one, or if you
      // leave it empty, the requests are placed on the remote run's RQ
      targetRQ: requestQueue.queueId,
      // required sitemaps
      startUrls: [{
        url: "http://example.com/sitemap1.xml",
        userData: {
          label: "DETAILS" // userData is passed through
        }
      }, {
        url: "http://example.com/sitemap2.xml",
      }],
      // Provide your own transform callback to filter or alter the request before adding it to the queue
      transform: ((request) => {
        // returning null skips the request
        if (!request.url.includes('detail')) {
          return null;
        }

        request.userData.label = request.url.includes('/item/') ? 'DETAILS' : 'CATEGORY';

        return request;
      }).toString()
    }, { waitSecs: 0 }); // return immediately instead of waiting for the run to finish

    run.runId = runCall.id;
    run.actorId = runCall.actId;

    await Apify.setValue('SITEMAP-CALL', run);
  }

  // wait for the sitemap run (new or resumed) to complete
  await Apify.utils.waitForRunToFinish(run);

  const crawler = new Apify.PuppeteerCrawler({
    requestQueue, // ready to use!
    //...
  });

  await crawler.run();
});
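
The `transform` option in the example is sent as a string (via `.toString()`) because actor input is JSON. The sketch below shows one way to check such a callback locally before passing it along. It makes several assumptions: rebuilding the function with `new Function` only stands in for whatever the actor does on its side, `applyTransform` is a hypothetical helper, and every request is assumed to carry a `userData` object.

```javascript
// Same filter/label logic as the transform callback in the example.
const transform = (request) => {
  if (!request.url.includes('detail')) {
    return null; // null means "skip this request"
  }
  request.userData.label = request.url.includes('/item/') ? 'DETAILS' : 'CATEGORY';
  return request;
};

// Actor input is JSON, so the function travels as source text.
const serialized = transform.toString();

// Assumption: the receiving side rebuilds a function from the source text.
// This is only a local stand-in, not the actor's documented mechanism.
const revived = new Function(`return (${serialized});`)();

// Hypothetical helper: apply the callback to copies of the requests and drop nulls.
const applyTransform = (fn, requests) =>
  requests
    .map((r) => fn({ ...r, userData: { ...(r.userData || {}) } }))
    .filter((r) => r !== null);

const sample = [
  { url: 'http://example.com/detail/item/1' },
  { url: 'http://example.com/detail/shoes', userData: { label: 'DETAILS' } },
  { url: 'http://example.com/about' },
];

const kept = applyTransform(revived, sample);
console.log(kept.map((r) => `${r.userData.label} ${r.url}`));
```

Because only the source text is transferred, the callback cannot rely on variables from the surrounding scope; anything it needs has to be defined inside the function body.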

License

Apache 2.0

Developer
Maintained by Community
Actor metrics
  • 4 monthly users
  • 100.0% runs succeeded
  • 0.0 days response time
  • Created in Sep 2020
  • Modified over 1 year ago