
LinkedIn Jobs Scraper | Remove Duplicate Jobs
Pricing
$5.00/month + usage

The LinkedIn jobs scraper allows you to collect jobs in two ways: by providing one or more start URLs, or by entering multiple keywords or search queries. You can use either method individually or combine both.
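Based on the JSON input pasted later in this thread, a keyword-based input might look like the following sketch. The field names (`startUrls`, `keyword`, `publishedAt`, `saveOnlyUniqueItems`, `proxy`) are taken from that pasted input; the keyword values here are purely illustrative.

```json
{
  "startUrls": [],
  "keyword": ["data engineer", "business intelligence"],
  "publishedAt": "",
  "saveOnlyUniqueItems": true,
  "proxy": { "useApifyProxy": true }
}
```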
5.0 (1)
Monthly users: 9
Runs succeeded: 99% (45 runs)
Response time: 5.3 hours
Last modified: 9 days ago
Failure at 1:10 am
Open
.398Z at async wrap (/usr/src/app/node_modules/@apify/timeout/cjs/index.cjs:54:21)
{"id":"4R71fEUgs2LzQlO","url":"https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search?currentJobId=4200641516&distance=25&f_TPR=3000&f_WT=2&geoId=103644278&keywords=BI+OR+SQL+OR+AI&origin=JOB_SEARCH_PAGE_SEARCH_BUTTON&refresh=true&sortBy=DD&start=9430","method":"GET","uniqueKey":"https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search?currentJobId=4200641516&distance=25&f_TPR=3000&f_WT=2&geoId=103644278&keywords=BI+OR+SQL+OR+AI&origin=JOB_SEARCH_PAGE_SEARCH_BUTTON&refresh=true&sortBy=DD&start=9430"}
2025-04-07T05:10:21.544Z ERROR CheerioCrawler: Request failed and reached maximum retries. Error: Resource https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search?currentJobId=4200641516&distance=25&f_TPR=3000&f_WT=2&geoId=103644278&keywords=BI+OR+SQL+OR+AI&origin=JOB_SEARCH_PAGE_SEARCH_BUTTON&refresh=true&sortBy=DD&start=9400 served Content-Type application/octet-stream, but only text/html, text/xml, application/xhtml+xml, application/xml, application/json are allowed. Skipping resource.
2025-04-07T05:10:21.546Z WARNING: Logging is too fast, some lines were skipped.

Yes, occasionally requests may be blocked by LinkedIn's bot-protection algorithms. To minimize this, I recommend using a residential proxy.
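In the actor's JSON input, switching to Apify's residential proxy pool typically looks like the fragment below. This assumes the standard Apify proxy input schema, where `apifyProxyGroups` selects the proxy pool; it is a sketch of the setting being recommended, not the actor's documented configuration.

```json
"proxy": {
  "useApifyProxy": true,
  "apifyProxyGroups": ["RESIDENTIAL"]
}
```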
dolnikov
OK, I tried using a residential proxy. Again, experiencing problems: 1945 failed requests in the past 10 seconds. The scraper errors look like:
MY-LOG: Scraping Manager, Data Science and Engineering - Advertising (Forecasting)
2025-04-07T15:13:42.179Z WARNING: Logging is too fast, some lines were skipped.
2025-04-07T15:13:42.565Z ERROR CheerioCrawler: Request failed and reached maximum retries.
After it finished, it had collected only a small fraction of the job ads matching my filter. Is there any way to fix this? Thanks.
dolnikov
Additionally: "Finished! Total 2061 requests: 112 succeeded, 1949 failed."

What input did you give to the actor?
dolnikov
Most likely it was something like this: https://www.linkedin.com/jobs/search/?f_TPR=r3330&f_WT=2%2C3&keywords="project manager" OR "power BI" OR "PowerBI" OR "BI" OR "Business Intelligence" OR "Data Analytics" OR "qlik" OR "qlikview" OR "qliksense"%20 OR "qlik sense"&origin=JOB_SEARCH_PAGE_JOB_FILTER&refresh=true&sortBy=DD

No, please give me the JSON input.
dolnikov
{
  "startUrls": [
    {
      "url": "https://www.linkedin.com/jobs/search/?distance=25&f_TPR=39000&f_WT=2&geoId=103644278&keywords=Manager%20AND%20(Analytics%20OR%20BI%20OR%20SQL%20OR%20AI%20OR%20Intelligence%20OR%20Qlik%20OR%20Tableau%20OR%20Synapse%20OR%20Databricks)&origin=JOB_SEARCH_PAGE_SEARCH_BUTTON&refresh=true&sortBy=DD"
    }
  ],
  "keyword": [],
  "publishedAt": "",
  "saveOnlyUniqueItems": true,
  "proxy": { "useApifyProxy": true },
  "maxItems": 2000
}

I just ran the scraper with your input, and it worked fine on my end. I also noticed that your input uses a data center proxy instead of a residential proxy, which could be causing the heavy blocking from LinkedIn. I recommend giving it another try on your side. I also suggest removing the "max number of items to scrape" parameter for now, as it isn't functioning as expected; I'll work on resolving this in a future update.
dolnikov
Removed maxItems, and it works without issues now, thanks. Is there a way to get notified, or some fix log I can check, to know when the maxItems parameter is fixed?
dolnikov
2 more runs, looks fine

Nice!
Currently, the Max Items parameter approximately controls the number of requests the web scraper makes, which effectively caps how many records can be retrieved. It doesn't guarantee that you'll get exactly that number of items, but it does ensure you won't exceed it.
For example, if you set a limit of 2,000 jobs, the scraper will never return more than 2,000. In practice, you'll likely end up with fewer than the set limit.
I'll try to update the Max Items behaviour in the coming days so that it sets a hard limit on the number of records stored, rather than the number of requests made. However, I can't give an exact date for when this change will be completed.
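The difference between the two behaviours can be sketched in plain Node.js. The function names and the per-page job counts below are hypothetical, not the actor's real internals; the sketch only illustrates why a request cap yields an approximate item count while a record cap yields an exact one.

```javascript
// Hypothetical page results: each entry is the number of job ads one request returns.
// Real pages may yield many jobs, a few, or none (e.g. when blocked or empty).
const jobsPerPage = [25, 25, 0, 10, 25, 25];

// Request-based cap (current behaviour, approximately): stop after N requests.
// The item total only loosely tracks the cap, since pages vary in size.
function scrapeWithRequestCap(pages, maxRequests) {
  let items = 0;
  for (const jobs of pages.slice(0, maxRequests)) items += jobs;
  return items;
}

// Record-based cap (planned behaviour): stop once exactly N items are stored.
function scrapeWithRecordCap(pages, maxItems) {
  const stored = [];
  for (const jobs of pages) {
    for (let i = 0; i < jobs && stored.length < maxItems; i++) stored.push(i);
    if (stored.length >= maxItems) break; // hard limit reached
  }
  return stored.length;
}

console.log(scrapeWithRequestCap(jobsPerPage, 3)); // 50 (3 requests, item count incidental)
console.log(scrapeWithRecordCap(jobsPerPage, 60)); // 60 (exactly the limit)
```

With a request cap, asking for "2,000 items" really means "roughly 2,000 requests' worth of items", which is why runs can finish well under the limit; a record cap stops at the limit itself.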
I'll send a message in this discussion to ping you of any updates.
Pricing
Pricing model
Rental
To use this Actor, you have to pay a monthly rental fee to the developer. The rent is subtracted from your prepaid usage every month after the free trial period. You also pay for the Apify platform usage.
Free trial
3 days
Price
$5.00