Indeed Scraper

Pay $5.00 for 1,000 results

misceres/indeed-scraper

Scrape jobs posted on Indeed. Get detailed information about saved and sponsored jobs from this job portal. Search by location and get the position, location, and description of each job.

Blocked by HTML

Closed

marketable_queen opened this issue 2 months ago

Hi team, I'm having an issue scraping Indeed: I keep getting "Blocked by HTML" in the log, and no jobs are scraped.

Thank you in advance.

lukas.prusa

Hi, thanks for opening this issue!

Unfortunately, it seems that company job search URLs are currently being fully blocked by Indeed with a CAPTCHA. This has happened before for some specific combinations of URLs and Indeed domains, and there is basically nothing we can do about it.

We will investigate this and try to find a way around the blocking, but as I said, we might not be able to overcome it. If we can't get around it, we will just have to wait until Indeed removes the blocking. They are pretty much just experimenting with this; last time it happened, they removed it within a few days :)

I will keep you updated here, thanks!

marketable_queen

2 months ago

Thanks for getting back to me, Lukas, and for the information :)

noctury

2 months ago

Hello Lukáš. I have the same issue. Would it be possible to make the whole run fail in this case, so that I could at least get the last run that worked through the API (https://api.apify.com/v2/actor-runs?token= ...)? Currently I would have to check the details of every run.

lukas.prusa

Hi, thanks for the suggestion!

Just to make sure: you mean failing the scraper right after it starts if any of the start URLs are "company detail" URLs? That would make sense for us, as these URLs are currently blocked 100% of the time. We can remove the check once Indeed (hopefully) removes this crazy blocking.

I will keep you updated here, thanks!

noctury

2 months ago

Yes, I mean exactly what you wrote. I use this to get the latest successful run:

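const axios = require("axios");

// Hypothetical wrapper name; the original snippet showed only the function body.
// Returns the default dataset ID of the most recent SUCCEEDED run, or null.
const getLatestSuccessfulDatasetId = async () => {
  const result = await axios.get(
    `https://api.apify.com/v2/actor-runs?token=${process.env.APIFY_API_KEY}`
  );
  const jsonData = result.data;

  // Keep only runs that finished with status SUCCEEDED
  const succeededItems = jsonData.data.items.filter(
    (item) => item.status === "SUCCEEDED"
  );

  // Sort newest first by finish time
  succeededItems.sort(
    (a, b) => new Date(b.finishedAt).getTime() - new Date(a.finishedAt).getTime()
  );

  return succeededItems.length > 0 ? succeededItems[0].defaultDatasetId : null;
};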

Currently this is useless, because the run only fails internally and still ends as SUCCEEDED. If the whole run failed, I would at least get the last working one.
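If blocked runs do end as FAILED, the Apify API can also return the newest successful run of a single Actor directly, instead of listing all runs and sorting them client-side. The sketch below assumes the "last run" endpoint (`GET /v2/acts/{actorId}/runs/last`) with its `status` query parameter; the function name `getLastSucceededDatasetId` and the injectable `fetchFn` parameter are hypothetical, the latter added only so the sketch can be tested without network access. It relies on the global `fetch` available in Node.js 18+.

```javascript
// Sketch: fetch the default dataset ID of an Actor's most recent SUCCEEDED run.
// Assumes the Apify API "last run" endpoint with a `status` filter.
// The function name and the injectable fetchFn are hypothetical.
async function getLastSucceededDatasetId(actorId, token, fetchFn = fetch) {
  // "misceres/indeed-scraper" is written as "misceres~indeed-scraper" in API paths.
  const id = encodeURIComponent(actorId.replace("/", "~"));
  const url =
    `https://api.apify.com/v2/acts/${id}/runs/last` +
    `?status=SUCCEEDED&token=${encodeURIComponent(token)}`;

  const response = await fetchFn(url);
  // A 404 means no run matched the filter; treat any non-2xx response as "no run".
  if (!response.ok) return null;

  const body = await response.json();
  return body && body.data ? body.data.defaultDatasetId : null;
}
```

For example, `getLastSucceededDatasetId("misceres/indeed-scraper", process.env.APIFY_API_KEY)` resolves to a dataset ID string, or to `null` when no matching run is found.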

lukas.prusa

Hi, thanks for your patience. This issue accidentally got stuck in our review process, and we forgot to update the scraper after finishing the fix...

The Actor will now fail if there are any company detail URLs in the input :) Thanks and happy scraping!
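A fail-fast check like the one described here could look roughly like the following. This is a minimal sketch, not the Actor's actual code: it assumes the input carries a `startUrls` array of `{ url }` objects or plain strings, and that Indeed company detail pages live under a `/cmp/` path (for example `https://www.indeed.com/cmp/<company>/jobs`). The helper names `findCompanyDetailUrls` and `validateInput` are hypothetical.

```javascript
// Sketch of failing fast on company detail URLs. Assumptions (not the Actor's code):
// - input.startUrls is an array of { url } objects or plain URL strings,
// - Indeed company detail pages use a path starting with /cmp/.
function findCompanyDetailUrls(startUrls = []) {
  return startUrls
    .map((entry) => (typeof entry === "string" ? entry : entry.url))
    .filter((url) => {
      try {
        const { hostname, pathname } = new URL(url);
        // Matches indeed.com, www.indeed.com and country hosts like uk.indeed.com
        const isIndeed = /(^|\.)indeed\.[a-z.]+$/i.test(hostname);
        return isIndeed && pathname.startsWith("/cmp/");
      } catch {
        return false; // Not a valid absolute URL; leave it to other validation.
      }
    });
}

function validateInput(input) {
  const blocked = findCompanyDetailUrls(input.startUrls);
  if (blocked.length > 0) {
    // An uncaught error makes the process exit with a non-zero code,
    // which Apify reports as a FAILED run.
    throw new Error(
      `Company detail URLs are currently blocked by Indeed: ${blocked.join(", ")}`
    );
  }
}
```

Running the check before any request is sent means a blocked input fails within seconds, instead of producing a run that reports success with no results.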

Developer
Maintained by Apify

Actor Metrics

  • 794 monthly users

  • 155 stars

  • >99% runs succeeded

  • 4 days response time

  • Created in Mar 2023

  • Modified 39 minutes ago