
Advanced Glassdoor Scraper

This Actor is deprecated

This Actor is unavailable because the developer has decided to deprecate it.

epctex/advanced-glassdoor-scraper

The most advanced Glassdoor Scraper that you would ever need. Extract millions of companies, salaries, interviews, jobs, and reviews from Glassdoor. You can specify search terms, filters, list pages, and more! Extremely fast, with no limits. Super easy to use!

still getting blocked

Closed

gold_ingratitude opened this issue 5 months ago

Hi again

Example run IDs: f3vVtgHiqGKbM3RHW, xBfcbGdalOFUDKDPH, UrPFSVRvWjT0evyp0

With or without a proxy, most URLs are still getting blocked. I just retried one of the blocked requests on my local machine with curl, both with and without a proxy, this time including the headers I mentioned yesterday, and it worked.

Example curl command (works):

curl -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36' \
  -H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3' \
  -H 'accept-language: de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7' \
  https://www.glassdoor.co.uk/Overview/Working-at-Medium-EI_IE784883.11,17.htm
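For anyone who wants to reproduce the working curl request from a script, here is a minimal sketch using Python's standard-library `urllib.request`. It is only an illustration of sending the same three headers, not how the Actor itself makes requests; the names `BROWSER_HEADERS`, `URL`, `build_request`, and `fetch` are hypothetical. The `accept` value is written with the full `*/*` wildcard that Chrome sends.

```python
import urllib.request

# The three headers from the curl command above (hypothetical constant name).
BROWSER_HEADERS = {
    "user-agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
    ),
    "accept": (
        "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,"
        "image/apng,*/*;q=0.8,application/signed-exchange;v=b3"
    ),
    "accept-language": "de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7",
}

URL = "https://www.glassdoor.co.uk/Overview/Working-at-Medium-EI_IE784883.11,17.htm"


def build_request(url: str, headers: dict[str, str]) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying the given headers."""
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch(url: str, timeout: float = 30.0) -> tuple[int, str]:
    """Send the request with BROWSER_HEADERS and return (status, body).

    Requires network access. urlopen raises urllib.error.HTTPError for
    4xx/5xx responses, so a blocked request surfaces as an exception.
    """
    req = build_request(url, BROWSER_HEADERS)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")
```

Calling `fetch(URL)` needs network access. Note that `urllib` normalizes header names with `str.capitalize()` (so `user-agent` is stored as `User-agent`); this is harmless because HTTP header names are case-insensitive.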

That said, things have improved slightly since yesterday: in the runs above, a minority of URLs now return results, so your changes have helped somewhat.

Could you please add support for custom request headers in the input?

epctex (epctex)

5 months ago

Hey there,

We just deployed a new version with header support. The new input field is called httpHeaders, takes an object, and is located in the "Advanced Options" section. You can use the following as an example:

{
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
    "accept-language": "de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7"
}
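If you build the Actor input programmatically rather than in the console, a small helper can check the httpHeaders object before submitting it. This is a sketch under an assumption the thread does not confirm: that httpHeaders is a flat object mapping header names to string values, as in the example. The helper name `build_run_input` and the lowercasing of header names are illustrative choices, and the Actor's other input fields are deliberately left out.

```python
import json

# Mirrors the httpHeaders example in the reply above.
EXAMPLE_HEADERS = {
    "user-agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
    ),
    "accept": (
        "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,"
        "image/apng,*/*;q=0.8,application/signed-exchange;v=b3"
    ),
    "accept-language": "de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7",
}


def build_run_input(headers: dict) -> dict:
    """Return an input fragment {"httpHeaders": {...}} after basic validation.

    Assumption: httpHeaders is a flat string-to-string object. Header names
    are lowercased (HTTP header names are case-insensitive) so that
    duplicates such as "Accept" and "accept" are caught.
    """
    clean: dict[str, str] = {}
    for name, value in headers.items():
        if not isinstance(name, str) or not isinstance(value, str):
            raise TypeError(f"header {name!r} must map a string to a string")
        key = name.strip().lower()
        if key in clean:
            raise ValueError(f"duplicate header (case-insensitive): {name!r}")
        clean[key] = value.strip()
    return {"httpHeaders": clean}


# JSON text suitable for pasting into an input editor.
run_input_json = json.dumps(build_run_input(EXAMPLE_HEADERS), indent=2)
```

Lowercasing is a design choice, not a requirement stated by the developer; it only makes duplicate detection reliable and matches the lowercase names used in the example.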

Best
