Sitemap To Request Queue

Pricing

Pay per usage

pocesar/sitemap-to-request-queue

Developed by

Paulo Cesar

Maintained by Community

Download sitemap XMLs and put them in a RequestQueue

0.0 (0)

0

Monthly users

6

Runs succeeded

>99%

Last modified

2 years ago

Sitemap to RequestQueue

Downloads sitemap.xml files and appends their entries to a RequestQueue of your choice.
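
For orientation, a sitemap is an XML document whose `<loc>` elements hold page URLs. The following minimal sketch shows how such entries could be turned into request-like objects. It is not the Actor's actual implementation: `sitemapToRequests` is a hypothetical helper, the sample XML is made up, and a simple regular expression stands in for a real XML parser (so XML entities such as `&amp;` are not decoded).

```javascript
// Hypothetical helper (not part of the Actor): extract <loc> URLs from a
// sitemap XML string and wrap each one in a request-like object, copying
// the userData of the start URL onto every entry.
function sitemapToRequests(xml, userData = {}) {
  const requests = [];
  const locPattern = /<loc>\s*([^<\s]+)\s*<\/loc>/g;
  let match;
  while ((match = locPattern.exec(xml)) !== null) {
    requests.push({ url: match[1], userData: { ...userData } });
  }
  return requests;
}

// Made-up sitemap content for illustration
const sampleXml = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/item/1</loc></url>
  <url><loc>http://example.com/category/a</loc></url>
</urlset>`;

const requests = sitemapToRequests(sampleXml, { label: 'DETAILS' });
console.log(requests);
```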

Example

const Apify = require('apify');

// this is your actor
Apify.main(async () => {
  const { proxyConfig } = await Apify.getInput();
  const requestQueue = await Apify.openRequestQueue();

  // persist the run info so the sitemap actor isn't called again every time there's a migration
  const run = (await Apify.getValue('SITEMAP-CALL')) || { runId: '', actorId: '' };

  if (!run || !run.runId) {
    // this might take a while!
    const runCall = await Apify.call('pocesar/sitemap-to-request-queue', {
      // required proxy configuration, like { useApifyProxy: true, apifyProxyGroups: ['SHADER'] }
      proxyConfig,
      // use this for this run's RequestQueue, but can be a named one, or if you
      // leave it empty, it will be placed on the remote run RQ
      targetRQ: requestQueue.queueId,
      // required sitemaps
      startUrls: [{
        url: "http://example.com/sitemap1.xml",
        userData: {
          label: "DETAILS" // userData will pass through
        }
      }, {
        url: "http://example.com/sitemap2.xml",
      }],
      // Provide your own transform callback to filter or alter the request before adding it to the queue
      transform: ((request) => {
        if (!request.url.includes('detail')) {
          return null;
        }

        request.userData.label = request.url.includes('/item/') ? 'DETAILS' : 'CATEGORY';

        return request;
      }).toString()
    }, { waitSecs: 0 }); // don't block here; the run is awaited below

    run.runId = runCall.id;
    run.actorId = runCall.actId;

    await Apify.setValue('SITEMAP-CALL', run);
  }

  // wait for the sitemap run (new or resumed after a migration) to finish
  await Apify.utils.waitForRunToFinish(run);

  const crawler = new Apify.PuppeteerCrawler({
    requestQueue, // ready to use!
    //...
  });

  await crawler.run();
});
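
To see what the `transform` callback in the example does, it can be exercised locally on plain objects. The sketch below is an illustration only. The sample requests are made up, the reading of `null` as "skip this request" is inferred from the example's comment about filtering, and reviving the serialized function with `new Function` is an assumption used to show that the `.toString()` output is self-contained, not a description of how the Actor evaluates it.

```javascript
// Same callback as in the example above.
const transform = (request) => {
  if (!request.url.includes('detail')) {
    return null; // presumably tells the Actor to skip this request
  }
  request.userData.label = request.url.includes('/item/') ? 'DETAILS' : 'CATEGORY';
  return request;
};

// The Actor input carries the callback as a string.
const serialized = transform.toString();

// Assumption: revive the string into a function for local testing only.
const revived = new Function(`return (${serialized});`)();

// Made-up sample requests
const samples = [
  { url: 'http://example.com/detail/item/1', userData: {} },
  { url: 'http://example.com/detail/list', userData: {} },
  { url: 'http://example.com/about', userData: {} },
];

// Apply the callback and keep only the requests it did not reject
const kept = samples.map((r) => revived(r)).filter((r) => r !== null);
console.log(kept.map((r) => `${r.url} -> ${r.userData.label}`));
```

Because only the function's source text is sent, the callback cannot rely on variables from the enclosing scope; everything it needs has to be inside its own body.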

License

Apache 2.0

Pricing

Pricing model

Pay per usage

This Actor is free to use; you pay only for the Apify platform usage it consumes.