Linkedin Company Insights Scraper

Pricing

$30.00/month + usage

Developed by

SASWAVE

Maintained by Community

LinkedIn company insights scraper. Extract data for analysis and buying-intent signals. Scrapes data about: employee count with history and median tenure, employee distribution and headcount growth by function, profiles of new hires, profiles of notable company alumni with exit date and new role, and total job openings.

0.0 (0)

4

Total users

40

Monthly users

12

Runs succeeded

65%

Issue response

22 hours

Last modified

18 hours ago

Seem to be getting more failures

Closed

setlateral opened this issue
2 months ago

e.g.:

2025-03-24T23:56:32.793Z ACTOR: Pulling Docker image of build W5W3w9wbKirz9mkHi from registry.
2025-03-24T23:56:32.902Z ACTOR: Creating Docker container.
2025-03-24T23:56:33.026Z ACTOR: Starting Docker container.
2025-03-24T23:56:35.026Z [apify] INFO Initializing Actor...
2025-03-24T23:56:35.029Z [apify] INFO System info ({"apify_sdk_version": "2.4.0", "apify_client_version": "1.9.2", "crawlee_version": "0.6.5", "python_version": "3.12.9", "os": "linux"})
2025-03-24T23:56:52.180Z [apify] ERROR Actor failed with an exception
2025-03-24T23:56:52.182Z Traceback (most recent call last):
2025-03-24T23:56:52.184Z File "/usr/src/app/src/main.py", line 428, in main
2025-03-24T23:56:52.186Z await parse(request.url, cookies, proxies, Actor, payload_input)
2025-03-24T23:56:52.188Z File "/usr/src/app/src/main.py", line 358, in parse
2025-03-24T23:56:52.190Z await get_insights(actor, fsd_company, obj, cookies, proxies)
2025-03-24T23:56:52.191Z File "/usr/src/app/src/main.py", line 137, in get_insights
2025-03-24T23:56:52.193Z for elem in datajson['data']['elements']:
2025-03-24T23:56:52.195Z ~~~~~~~~~~~~~~~~^^^^^^^^^^^^
2025-03-24T23:56:52.197Z KeyError: 'elements'
2025-03-24T23:56:52.198Z [apify] INFO Exiting Actor ({"exit_code": 91})

Also, how can one tell whether something is wrong with the LinkedIn cookie being passed, e.g. that the cookie has expired?
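
The traceback above fails on `datajson['data']['elements']`, which means the response was parsed as JSON but did not contain the expected key. The thread does not show what the response held instead, so the following is only a minimal sketch of a defensive lookup, assuming the payload is a plain dictionary. The names `extract_elements` and `InsightsUnavailableError` are hypothetical and not taken from the actor's source. Reporting the keys that were actually returned would make it easier to distinguish an expired cookie from a company that simply has no insights.

```python
class InsightsUnavailableError(Exception):
    """Raised when a response carries no insights payload (hypothetical name)."""


def extract_elements(datajson, company):
    """Return datajson['data']['elements'], or raise a descriptive error.

    `company` is only used to make the error message easier to trace.
    """
    data = datajson.get("data") if isinstance(datajson, dict) else None
    elements = data.get("elements") if isinstance(data, dict) else None
    if elements is None:
        # Report what was actually returned instead of a bare KeyError.
        if isinstance(datajson, dict):
            seen = sorted(datajson)
        else:
            seen = type(datajson).__name__
        raise InsightsUnavailableError(
            f"No 'data.elements' in the response for {company}; "
            f"top-level keys: {seen}. Possible causes include an expired "
            "cookie, a rate limit, or a company without insights."
        )
    return elements
```

A caller could catch `InsightsUnavailableError` per company and continue with the rest of the batch instead of letting the whole run exit.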

setlateral

2 months ago

Strange, it is working again now. I changed nothing, not even the cookie. Perhaps it was having a problem with specific companies on LinkedIn; I'm not sure.

setlateral

2 months ago

If it helps, I got another, deeper exception, unrelated to the first:

2025-03-25T00:15:02.160Z Profile Hires:
2025-03-25T00:15:07.544Z Failed to prolong lock for cached request VxRmDf5b99ydSwR, either lost the lock or the request was already handled
2025-03-25T00:15:07.545Z Traceback (most recent call last):
2025-03-25T00:15:07.546Z File "/usr/local/lib/python3.12/site-packages/crawlee/storages/_request_queue.py", line 668, in _prolong_request_lock
2025-03-25T00:15:07.546Z res = await self._resource_client.prolong_request_lock(
2025-03-25T00:15:07.547Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-25T00:15:07.547Z File "/usr/local/lib/python3.12/site-packages/apify/apify_storage_client/_request_queue_client.py", line 124, in prolong_request_lock
2025-03-25T00:15:07.548Z await self._client.prolong_request_lock(
2025-03-25T00:15:07.548Z File "/usr/local/lib/python3.12/site-packages/apify_client/_logging.py", line 61, in async_wrapper
2025-03-25T00:15:07.549Z return await fun(resource_client, *args, **kwargs)
2025-03-25T00:15:07.549Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-25T00:15:07.550Z File "/usr/local/lib/python3.12/site-packages/apify_client/clients/resource_clients/request_queue.py", line 610, in prolong_request_lock
2025-03-25T00:15:07.550Z response = await self.http_client.call(
2025-03-25T00:15:07.551Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-25T00:15:07.551Z File "/usr/local/lib/python3.12/site-packages/apify_client/_http_client.py", line 286, in call
2025-03-25T00:15:07.552Z return await retry_with_exp_backoff_async(
2025-03-25T00:15:07.553Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-25T00:15:07.554Z File "/usr/local/lib/python3.12/site-packages/apify_client/_utils.py", line 99, in retry_with_exp_backoff_async
2025-03-25T00:15:07.555Z return await async_func(stop_retrying, attempt)
2025-03-25T00:15:07.556Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-25T00:15:07.556Z File "/usr/local/lib/python3.12/site-packages/apify_client/_http_client.py", line 284, in _make_request
2025-03-25T00:15:07.557Z raise ApifyApiError(response, attempt)
2025-03-25T00:15:07.558Z apify_client._errors.ApifyApiError: Cannot prolong request lock which was not locked with same client or was already handled.
2025-03-25T00:15:07.561Z [apify] ERROR Actor failed with an exception
2025-03-25T00:15:07.562Z Traceback (most recent call last):
2025-03-25T00:15:07.562Z File "/usr/src/app/src/main.py", line 428, in main
2025-03-25T00:15:07.565Z await parse(request.url, cookies, proxies, Actor, payload_input)
2025-03-25T00:15:07.566Z ^^^^^^^^^^^
2025-03-25T00:15:07.567Z AttributeError: 'NoneType' object has no attribute 'url'
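
The final `AttributeError` above indicates that `request` was `None` when `request.url` was read. One plausible reading, given the lost-lock message just before it, is that the queue had nothing to hand out at that moment and the result was used without a check. The sketch below shows a guarded loop under that assumption. `QueueLike`, `drain` and `handle` are hypothetical names, and the three queue methods are a stand-in interface modelled on a request queue, not the actor's actual code.

```python
import asyncio
from typing import Any, Awaitable, Callable, Protocol


class QueueLike(Protocol):
    """Stand-in interface (assumption): the methods this sketch relies on."""

    async def fetch_next_request(self) -> Any: ...
    async def mark_request_as_handled(self, request: Any) -> Any: ...
    async def is_finished(self) -> bool: ...


async def drain(
    queue: QueueLike,
    handle: Callable[[str], Awaitable[None]],
    idle_delay: float = 1.0,
) -> int:
    """Process requests until the queue reports it is finished.

    Returns the number of requests handled.
    """
    handled = 0
    while True:
        request = await queue.fetch_next_request()
        if request is None:
            # Nothing available right now: stop if done, otherwise wait.
            if await queue.is_finished():
                return handled
            await asyncio.sleep(idle_delay)
            continue
        await handle(request.url)
        await queue.mark_request_as_handled(request)
        handled += 1
```

The point of the design is only that `None` is treated as a normal return value and checked before any attribute access.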

setlateral

2 months ago

Another type of exception:

2025-03-25T00:39:22.899Z ACTOR: Pulling Docker image of build W5W3w9wbKirz9mkHi from registry.
2025-03-25T00:39:23.009Z ACTOR: Creating Docker container.
2025-03-25T00:39:23.076Z ACTOR: Starting Docker container.
2025-03-25T00:39:24.923Z [apify] INFO Initializing Actor...
2025-03-25T00:39:24.929Z [apify] INFO System info ({"apify_sdk_version": "2.4.0", "apify_client_version": "1.9.2", "crawlee_version": "0.6.5", "python_version": "3.12.9", "os": "linux"})
2025-03-25T00:39:39.518Z https://www.linkedin.com/company/19129362
2025-03-25T00:39:42.602Z Profile Hires:
2025-03-25T00:39:54.855Z [apify] ERROR Actor failed with an exception
2025-03-25T00:39:54.857Z urllib3.exceptions.SSLError: [SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1010)
2025-03-25T00:39:54.859Z
2025-03-25T00:39:54.861Z The above exception was the direct cause of the following exception:
2025-03-25T00:39:54.862Z
2025-03-25T00:39:54.864Z Traceback (most recent call last):
2025-03-25T00:39:54.866Z File "/usr/local/lib/python3.12/site-packages/requests/adapters.py", line 667, in send
2025-03-25T00:39:54.868Z resp = conn.urlopen(
2025-03-25T00:39:54.869Z ^^^^^^^^^^^^^
2025-03-25T00:39:54.871Z File "/usr/local/lib/python3.12/site-packages/urllib3/connectionpool.py", line 841, in urlopen
2025-03-25T00:39:54.873Z retries = retries.increment(
2025-03-25T00:39:54.875Z ^^^^^^^^^^^^^^^^^^
2025-03-25T00:39:54.876Z File "/usr/local/lib/python3.12/site-packages/urllib3/util/retry.py", line 519, in increment
2025-03-25T00:39:54.878Z raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
2025-03-25T00:39:54.880Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-25T00:39:54.882Z urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.linkedin.com', port=443): Max retries exceeded with url: /voyager/api/voyagerPremiumDashCompanyInsightsCard?q=company&company=urn%3Ali%3Afsd_company%3A19175337&decorationId=com.linkedin.voyager.dash.premium.companyinsights.CompanyInsightsCardCollection-24 (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1010)')))
2025-03-25T00:39:54.884Z
2025-03-25T00:39:54.885Z During handling of the above exception, another exception occurred:
2025-03-25T00:39:54.887Z
2025-03-25T00:39:54.889Z Traceback (most recent call last):
2025-03-25T00:39:54.890Z File "/usr/src/app/src/main.py", line 428, in main
2025-03-25T00:39:54.892Z await parse(request.url, cookies, proxies, Actor, payload_input)
2025-03-25T00:39:54.893Z File "/usr/src/app/src/main.py", line 358, in parse
2025-03-25T00:39:54.895Z await get_insights(actor, fsd_company, obj, cookies, proxies)
2025-03-25T00:39:54.896Z File "/usr/src/app/src/main.py", line 131, in get_insights
2025-03-25T00:39:54.898Z res = requests.get(url, headers=headers, cookies=cookies, proxies=proxies)
2025-03-25T00:39:54.900Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-25T00:39:54.901Z File "/usr/local/lib/python3.12/site-packages/requests/api.py", line 73, in get
2025-03-25T00:39:54.903Z return request("get", url, params=params, **kwargs)
2025-03-25T00:39:54.904Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-25T00:39:54.906Z File "/usr/local/lib/python3.12/site-packages/requests/api.py", line 59, in request
2025-03-25T00:39:54.908Z return session.request(method=method, url=url, **kwargs)
2025-03-25T00:39:54.909Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-25T00:39:54.911Z File "/usr/local/lib/python3.12/site-packages/requests/sessions.py", line 589, in request
2025-03-25T00:39:54.913Z resp = self.send(prep, **send_kwargs)
2025-03-25T00:39:54.914Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-25T00:39:54.916Z File "/usr/local/lib/python3.12/site-packages/requests/sessions.py", line 703, in send
2025-03-25T00:39:54.917Z r = adapter.send(request, **kwargs)
2025-03-25T00:39:54.919Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-25T00:39:54.921Z File "/usr/local/lib/python3.12/site-packages/requests/adapters.py", line 698, in send
2025-03-25T00:39:54.922Z raise SSLError(e, request=request)
2025-03-25T00:39:54.924Z requests.exceptions.SSLError: HTTPSConnectionPool(host='www.linkedin.com', port=443): Max retries exceeded with url: /voyager/api/voyagerPremiumDashCompanyInsightsCard?q=company&company=urn%3Ali%3Afsd_company%3A19175337&decorationId=com.linkedin.voyager.dash.premium.companyinsights.CompanyInsightsCardCollection-24 (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1010)')))
2025-03-25T00:39:54.926Z [apify] INFO Exiting Actor ({"exit_code": 91})
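
The log above ends the whole run on a single `SSLError` raised from one `requests.get` call. Transient connection failures of this kind are often handled by retrying with exponential backoff. The sketch below is a generic, standard-library illustration of that technique, not the actor's implementation; `call_with_retries` and its parameters are hypothetical names.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt; jitter avoids a fixed request rhythm.
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay)
            sleep(delay)
    raise ValueError("attempts must be at least 1")
```

In a scraper using `requests`, the wrapped callable could be something like `lambda: requests.get(url, headers=headers, cookies=cookies, proxies=proxies, timeout=30)` with `retry_on=(requests.exceptions.ConnectionError,)`, which also covers `requests.exceptions.SSLError`. Whether retrying is appropriate here depends on the proxy setup, which the thread does not describe.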

SASWAVE (saswave)

2 months ago

We are looking into these issues; most of them seem to be proxy-related.

We will check with the support team.

setlateral

2 months ago

thanks

setlateral

2 months ago

What is a safe number of companies to send in a batch, considering that I am using my own cookie and the processing required? And what are the memory and other requirements for a batch run? I have read the Apify docs, but the batch and cookie aspects add a few unknowns.

SASWAVE (saswave)

2 months ago

Great feedback. We have added an item to our to-do list to update the information for all our actors so that it answers those questions.

In your case, for this actor, you can use the minimum memory setting of 1024 MB (the lower the memory setting, the lower the cost of running on Apify infrastructure), because we don't implement high-scale parallel requests. We use a cookie session, so we want to be slow.

Your cookie session may become invalid at some point during the process, so the smaller the batch, the less hassle there is to deal with when a run stops in the middle.
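
As an illustration of the small-batch advice, here is a minimal sketch that splits a list of company URLs into fixed-size chunks, each of which could then be submitted as its own run. The helper name `batches` and the default size of 10 are arbitrary placeholders, and the actor's input format is not shown in this thread, so how each chunk is passed to the actor is left out.

```python
from typing import Iterator, List, Sequence


def batches(items: Sequence[str], size: int = 10) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Placeholder URLs for illustration only.
company_urls = [f"https://www.linkedin.com/company/{i}" for i in range(23)]
for number, chunk in enumerate(batches(company_urls, size=10), start=1):
    # Each chunk would be one run; a failed run then costs at most `size` companies.
    print(f"batch {number}: {len(chunk)} companies")
```

With small chunks, an expired cookie partway through only invalidates the current batch, and the remaining batches can be resubmitted once the cookie is refreshed.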

setlateral

2 months ago

OK, thanks. I am doing only 10 companies at a time. I have upgraded my Apify subscription to Scale and am trying again. I have shared all runs for this Actor with you (I think).

setlateral

2 months ago

I am still getting very inconsistent results. Mostly I get failures, which waste processing time and credit on my Apify plan. I am also getting this error again:

2025-03-26T05:43:35.446Z ACTOR: Pulling Docker image of build I8brh8IqYMadNLSym from registry.
2025-03-26T05:43:35.542Z ACTOR: Creating Docker container.
2025-03-26T05:43:35.586Z ACTOR: Starting Docker container.
2025-03-26T05:43:37.130Z [apify] INFO Initializing Actor...
2025-03-26T05:43:37.133Z [apify] INFO System info ({"apify_sdk_version": "2.4.0", "apify_client_version": "1.9.2", "crawlee_version": "0.6.5", "python_version": "3.12.9", "os": "linux"})
2025-03-26T05:43:47.723Z [apify] ERROR Actor failed with an exception
2025-03-26T05:43:47.724Z Traceback (most recent call last):
2025-03-26T05:43:47.724Z File "/usr/src/app/src/main.py", line 432, in main
2025-03-26T05:43:47.725Z await parse(request.url, cookies, proxies, Actor, payload_input)
2025-03-26T05:43:47.725Z File "/usr/src/app/src/main.py", line 362, in parse
2025-03-26T05:43:47.725Z await get_insights(actor, fsd_company, obj, cookies, proxies)
2025-03-26T05:43:47.726Z File "/usr/src/app/src/main.py", line 141, in get_insights
2025-03-26T05:43:47.726Z for elem in datajson['data']['elements']:
2025-03-26T05:43:47.727Z ~~~~~~~~~~~~~~~~^^^^^^^^^^^^
2025-03-26T05:43:47.728Z KeyError: 'elements'
2025-03-26T05:43:47.728Z [apify] INFO Exiting Actor ({"exit_code": 91})

I have given you access to all 779 of my runs of this actor. I hope this helps you understand the problem and either fix it or help me set up my runs better.

My success rate is around 10% for the LinkedIn companies I have tried so far.

SASWAVE (saswave)

2 months ago

We updated the actor.

Thank you for sharing your runs; it helps us error-proof our solution.

At first we only tested on big companies, which always have LinkedIn insights. Most of your companies have no insights or very little information.

We also updated the error message to ask you explicitly to update your cookie session.

setlateral

2 months ago

Awesome, I will give it another run. Thanks for the quick turnaround.

setlateral

2 months ago

I will try again tomorrow; it seems I have hit a limit on LinkedIn.

SASWAVE (saswave)

2 months ago

Good to know. Do you see that on your last run?

We will try to replicate the error and return an explicit error, as we now do when the cookie session is no longer valid.

setlateral

2 months ago

I am running again now that my daily company viewing limit on LinkedIn has reset (I tested interactively that browsing works).

We're getting better success rates, but there are still intermittent errors. If you are able to view the exceptions in today's runs, perhaps you can find more ways to make the actor more robust?

SASWAVE (saswave)

2 months ago

We updated the actor and fixed 2 errors.

It would also help if, when you reach the daily limit, you could run one last time with updated, valid cookies that we can reuse to test ourselves and implement an explicit error for the daily limit being reached.

Thank you for your help.

setlateral

2 months ago

Great, thanks. It is running again now; I will report back after a few batches.

setlateral

2 months ago

Looking better. I will inspect the results more closely soon.