Pricing

$29.00/month + usage

🔥 Linkedin Companies & Profiles Bulk Scraper

LinkedIn scraper for companies and profiles. Get comprehensive profiles of individuals and companies based on your keywords and filters. Unleash the power of data! 🌐🔍

Rating

2.6

(48)

Developer

Bebity

Actor stats

Bookmarked: 352

Total users: 15K

Monthly active users: (not shown)

Issues response: 22 days

Last modified: 8 days ago

[Unreleased]

[10.14.1] — 2026-06-23

Changed

Bumped the pinned LinkedIn client version (x-li-application-version) 0.2.5982 → 0.2.6110 (current live value, captured via the camoux browser; the old one was ~128 builds stale). Both served data in testing — this is opportunistic maintenance before the old version rots.

[10.14.0] — 2026-06-23

Fixed

Redirect-loop soft-blocks are detected again (v10.13 regression). When a leased account is logged-out / soft-blocked, LinkedIn bounces its profile request (/flagship-web/in/{vanity} 302 → itself / login). v10.13's classifier could not see into the resulting opaque Too many redirects message, so it fell to UNKNOWN → retire and the dead account stayed in the pool and got re-leased indefinitely — fewer "bans" but a collapsing success rate. The crawler's HTTP client now uses followRedirects: false, so a 3xx is surfaced with its Location; a LinkedIn-internal redirect (or a Too many redirects loop) is classified AUTH_BLOCK and takes the account out of rotation. A redirect off linkedin.com stays transient (retire). Fail-fast: no more 10×-redirect + 8×-retry waste per dead account.
Account health is only credited on a real 2xx. Previously any status < 400 (including a 3xx bounce) marked the account healthy.
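
The redirect handling and the 2xx-only health rule above could be sketched roughly as follows. This is a minimal illustration, not the actor's code: the function names, the `RedirectVerdict` type, the host parsing and the treatment of a missing Location header are all assumptions. Only the AUTH_BLOCK / retire outcomes come from the entries above.

```typescript
// Hypothetical sketch: names and host parsing are assumptions, not the actor's code.
type RedirectVerdict = "AUTH_BLOCK" | "RETIRE";

// Extract the host from an absolute URL; returns null for a relative Location.
function hostOf(location: string): string | null {
  const m = /^https?:\/\/([^\/?#]+)/i.exec(location);
  return m ? m[1].toLowerCase() : null;
}

function isLinkedInHost(host: string): boolean {
  return host === "linkedin.com" || host.endsWith(".linkedin.com");
}

// With redirect following disabled, a 3xx arrives with its Location header intact.
function classifyRedirect(location: string | undefined): RedirectVerdict {
  // Assumption: without a Location header we cannot tell, so treat it as transient.
  if (!location) return "RETIRE";
  const host = hostOf(location);
  // Relative or LinkedIn-internal target: the account was bounced (login / self-redirect).
  if (host === null || isLinkedInHost(host)) return "AUTH_BLOCK";
  // Redirect off linkedin.com: stays transient.
  return "RETIRE";
}

// Health is credited only on a real 2xx, never on a 3xx bounce.
function creditsHealth(status: number): boolean {
  return status >= 200 && status < 300;
}
```

Under these assumptions, a bounce to a login page on linkedin.com yields AUTH_BLOCK on the first response, instead of after a chain of followed redirects.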

Added

Strike-count-then-kill account policy: an auth-shaped failure (auth block / challenge / redirect-loop) retires + rotates the session on the first STRATEGY_KILL_AFTER_STRIKES − 1 strikes and only flags the account LOGIN_REQUIRED on the strike-out, so a one-off bounce never burns a recoverable account. New optional env knob STRATEGY_KILL_AFTER_STRIKES (default 2). A later success on the account clears its strikes.
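
A strike-count-then-kill policy of this kind might look like the sketch below. The class and method names are hypothetical. Only the default of 2 and the "retire first, flag on the strike-out, clear on success" behavior are taken from the entry above.

```typescript
// Hypothetical sketch of a strike-count-then-kill policy; names are assumptions.
type StrikeAction = "RETIRE_SESSION" | "FLAG_LOGIN_REQUIRED";

class StrikeTracker {
  private readonly strikes = new Map<string, number>();
  private readonly killAfter: number;

  // killAfter mirrors STRATEGY_KILL_AFTER_STRIKES (default 2 per the changelog).
  constructor(killAfter: number = 2) {
    this.killAfter = killAfter;
  }

  // Called on an auth-shaped failure (auth block / challenge / redirect loop).
  onAuthFailure(accountId: string): StrikeAction {
    const count = (this.strikes.get(accountId) ?? 0) + 1;
    this.strikes.set(accountId, count);
    // Below the threshold only the session is retired; the account survives.
    return count >= this.killAfter ? "FLAG_LOGIN_REQUIRED" : "RETIRE_SESSION";
  }

  // A later success clears the account's strikes.
  onSuccess(accountId: string): void {
    this.strikes.delete(accountId);
  }
}
```

With the default of 2, the first auth-shaped failure only rotates the session, and the second consecutive one flags the account.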

[10.13.0] — 2026-06-15

Fixed

Accounts are no longer falsely retired on infrastructure errors. A new failure classifier distinguishes genuine authentication blocks / challenges from proxy failures, connection resets, HTTP 429 rate-limits, timeouts and empty responses: only the former take an account out of rotation, while transient/infra errors rotate the session and keep the account. This removes the main cause of account churn (previously a crude substring match flagged accounts on proxy/timeout noise).

Changed

Bumped the pinned LinkedIn client version (x-li-application-version) from the ~200-builds-stale 0.2.5782 to the current 0.2.5982, and centralized it into a single constant for easier future bumps.

Added

Account-flag telemetry now reports a normalized reason (AUTH_BLOCK / CHALLENGE / LEASE_DEAD) to the account provider, and usage telemetry carries an optional STRATEGY_TAG for cohort measurement.
Optional env knobs STRATEGY_MAX_CONCURRENCY and STRATEGY_PACING_MS to tune crawler aggressiveness without code changes.

[10.12.0] — 2026-06-08

Added

Migration-resilient resume: the v2 engine now persists a lightweight per-run progress checkpoint (delivered profiles + per-keyword counts) to the run's key-value store, so after an Apify host migration it resumes only the not-yet-delivered targets instead of restarting from zero. Eliminates the duplicate results previously emitted on migration. The request queue stays in-memory — no scraping internals are persisted.
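
The resume logic can be illustrated with a small sketch. The checkpoint shape and function names here are assumptions made for illustration. The entry above only states that delivered profiles and per-keyword counts are persisted.

```typescript
// Hypothetical checkpoint shape; field names are assumptions.
interface ProgressCheckpoint {
  deliveredTargets: string[];
  perKeywordCounts: Record<string, number>;
}

// After a migration, keep only the targets that were not delivered yet.
function remainingTargets(targets: string[], checkpoint: ProgressCheckpoint | null): string[] {
  if (checkpoint === null) return targets.slice();
  const delivered = new Set(checkpoint.deliveredTargets);
  return targets.filter((t) => !delivered.has(t));
}

// Remaining per-keyword budget, given how many results were already delivered for it.
function remainingBudget(keyword: string, limit: number, checkpoint: ProgressCheckpoint | null): number {
  const counts: Record<string, number> = checkpoint ? checkpoint.perKeywordCounts : {};
  const used = Object.prototype.hasOwnProperty.call(counts, keyword) ? counts[keyword] : 0;
  return Math.max(0, limit - used);
}
```

Because only delivered targets and counters are stored, a resumed run can skip finished work without persisting any request-queue state.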

[10.11.4] — 2026-06-08

Changed

Tuned crawler concurrency (request concurrency + session pool size) for better resource management under load.

Fixed

Leased accounts missing a valid li_at session cookie are now detected during cookie extraction and flagged, instead of being used and failing mid-run. Added offline tests for the cookie validation/extraction logic.

[10.11.3] — 2026-06-08

Changed

Daily result limit temporarily lowered from 150,000 to 10,000 while the rebuilt v2 engine is being stabilized (a notice is now shown in the run log). Will be raised over time.
Startup summary now describes the proxy source as account-bound proxies only.

[10.11.2] — 2026-06-05

Added

status + reason on every output record (profiles and companies). Each item now carries status (FOUND / NOT_FOUND) and reason (DOES_NOT_EXIST, UNREACHABLE, or null when found), defaulted at emit time. Non-existent or unreachable handles are now catalogued as NOT_FOUND records instead of vanishing silently: a LinkedIn 404 → DOES_NOT_EXIST, a fetch that exhausts retries → UNREACHABLE, and a company whose Voyager response resolves no entity → DOES_NOT_EXIST. Both fields are documented in the output schema and shown as the first column in the Profiles/Companies table views.
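
The mapping from fetch outcome to status / reason described above can be written down as a small table-like function. The `Outcome` type and its variant names are hypothetical. The status and reason values are the ones listed in the entry.

```typescript
type Status = "FOUND" | "NOT_FOUND";
type Reason = "DOES_NOT_EXIST" | "UNREACHABLE" | null;

// Hypothetical outcome type; the variant names are assumptions for illustration.
type Outcome =
  | { kind: "found" }
  | { kind: "http404" }
  | { kind: "retriesExhausted" }
  | { kind: "noEntityResolved" };

function toStatus(outcome: Outcome): { status: Status; reason: Reason } {
  switch (outcome.kind) {
    case "found":
      return { status: "FOUND", reason: null };
    case "http404":
    case "noEntityResolved":
      // A LinkedIn 404, or a company response that resolves no entity.
      return { status: "NOT_FOUND", reason: "DOES_NOT_EXIST" };
    case "retriesExhausted":
      // The fetch ran out of retries without a definitive answer.
      return { status: "NOT_FOUND", reason: "UNREACHABLE" };
  }
}
```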

Fixed

Profiles were occasionally returned with their identity block missing (no firstName/lastName/headline/location/profileId, but experience/education/skills present). The cause: a leased account/proxy sometimes returns a garbled/partial profile shell that is not a 404 and is long enough to look real, but lacks the top-card identity block. The shell handler now detects this (no name and no nonIterableProfileId), retires that session and retries the shell on a fresh account (up to 3 rotations). The detection is deliberately conservative: it fires only when both the name and the profileId are missing, so a real profile that merely lacks a headline is never rotated away. If all rotations still return a partial shell, the actor keeps the partial data rather than dropping the profile. Added offline tests for the isPartialShell predicate (incl. real captured shells, which are never flagged).
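
A conservative predicate in the spirit of the one described above might look like this. It is a sketch under assumptions, not the actor's isPartialShell: the `ProfileShell` shape is hypothetical, and only the rule "no name and no nonIterableProfileId" comes from the entry.

```typescript
// Hypothetical shell shape; only the fields relevant to the check are modelled.
interface ProfileShell {
  firstName?: string | null;
  lastName?: string | null;
  headline?: string | null;
  nonIterableProfileId?: string | null;
}

function isBlank(value: string | null | undefined): boolean {
  return value == null || value.trim() === "";
}

// Flags a shell only when BOTH the name and the profile id are missing,
// so a real profile that merely lacks a headline is never flagged.
function looksLikePartialShell(shell: ProfileShell): boolean {
  const noName = isBlank(shell.firstName) && isBlank(shell.lastName);
  return noName && isBlank(shell.nonIterableProfileId);
}
```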

Fixed

Profile summary was duplicated. LinkedIn renders the About text twice (a truncated collapsed preview + the full expanded text); the parser joined both, doubling the summary verbatim. It now de-duplicates — dropping any paragraph that is a duplicate of, or a truncated prefix of, a longer one. Confirmed on a 15-profile live run: 0 duplication (e.g. a 537-char doubled summary is now the correct 268 chars).
Profile firstName/lastName missing on some profiles, with the name leaking into headline. Some profile shells carry no firstName/lastName JSON fields at all — the name only exists as the top-card's first rendered line. The top-card extractor now isolates that name line (so it no longer leaks into headline) and recovers the name from it when the JSON fields are absent. Live name coverage went 6/15 → 10/15 (every profile that returns a shell now gets its name). Added offline regression tests on real captured shells (incl. a zero-name-field shell).
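
The summary de-duplication rule (drop a paragraph that duplicates, or is a truncated prefix of, a longer one) can be sketched as below. How the collapsed preview marks truncation is an assumption here: the sketch strips a trailing "…" or "..." before comparing, and the function names are hypothetical.

```typescript
// Assumption: truncated previews end in "\u2026" or "..."; the real marker may differ.
function stripEllipsis(text: string): string {
  return text.replace(/(\u2026|\.{3})\s*$/, "").trim();
}

function dedupeParagraphs(paragraphs: string[]): string[] {
  const cleaned = paragraphs.map((p) => p.trim()).filter((p) => p.length > 0);
  return cleaned.filter((p, i) => {
    const stem = stripEllipsis(p);
    const redundant = cleaned.some((other, j) => {
      if (i === j) return false;
      // Exact duplicate: keep only the first occurrence.
      if (other === p) return j < i;
      // Truncated prefix of a strictly longer paragraph: drop the shorter one.
      const otherStem = stripEllipsis(other);
      return otherStem.length > stem.length && otherStem.startsWith(stem);
    });
    return !redundant;
  });
}
```

A doubled About section (preview plus full text) collapses to the full text only, while distinct paragraphs are left untouched.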

[10.11.0] — 2026-06-05

Added

Output schema (.actor/dataset_schema.json + .actor/output_schema.json, wired via actor.json storages.dataset / output). Declares every field the actor can emit (profiles + companies, union, all nullable, nested objects) and gives the Console two dedicated table views — Profiles and Companies. Built from real run outputs, not just the TS interfaces.
README fully rewritten following the Apify Academy best-practice structure: v10 "rebuilt from scratch" header, documented input (all fields + example) and output (maximal profile + company JSON examples covering every emittable field/section), data tables, how-to, pricing, FAQ/legality, and the new banner.
urn (member URN) is now documented in the output schema and README — it was already emitted but undocumented.

Changed

Output field names harmonized between profiles and companies (breaking for the v2 output shape):
- followerCount (company) → followersCount (now shared with profiles).
- backgroundImageUrl (profile) → coverImageUrl (now shared with companies).
- industryName (profile) → industry (now shared with companies).
- Distinct-by-design fields kept as-is: profile profilePictureUrl vs company logoUrl, profile summary vs company description.

[10.10.1] — 2026-06-05

Fixed

get-companies returned the wrong company. The Voyager details parser picked the first company in the response's included[] (which also carries affiliated/similar companies) whenever the universalName match missed — and the match was case-sensitive, so a slug like Bebity (≠ stored bebity) silently fell through to an unrelated company (it returned Blanche.Agency for Bebity). It now selects the company the query actually resolved to, via the response's primary *elements URN (case-independent), with a case-insensitive universalName fallback.
A company NAME (not a URL) is now searched, not looked up as a slug. Under the legacy isUrl:true flag, a bare token in get-companies (e.g. Bebity) was treated as a direct universalName GraphQL lookup, which resolves to the wrong company. A markerless entry now routes to a proper company search (companies vertical filter); only an explicit /company/{slug} URL stays a direct lookup. Profile routing is unchanged.
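
The company selection fix can be illustrated with the sketch below. The entity shape and the URN values in the usage are made up for illustration and do not reflect the real Voyager payload. What the sketch preserves from the entry is the order of preference: primary URN first, then a case-insensitive universalName match, and never a blind fallback to the first included company.

```typescript
// Hypothetical, simplified entity shape; the real Voyager response is richer.
interface CompanyEntity {
  entityUrn: string;
  universalName: string;
  name: string;
}

function pickCompany(
  included: CompanyEntity[],
  primaryUrn: string | null,
  slug: string,
): CompanyEntity | null {
  if (primaryUrn !== null) {
    const wantedUrn = primaryUrn.toLowerCase();
    const byUrn = included.find((c) => c.entityUrn.toLowerCase() === wantedUrn);
    if (byUrn) return byUrn;
  }
  const wantedSlug = slug.toLowerCase();
  // Never fall back to included[0]: it may be an affiliated or similar company.
  return included.find((c) => c.universalName.toLowerCase() === wantedSlug) ?? null;
}
```

With this rule, a slug typed as Bebity still matches a stored bebity, and a miss returns null rather than an unrelated company.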

[10.10.0] — 2026-06-05

Added

Experience role descriptions — each experience item now carries a description field (the role blurb / "…see more" text). It is detected structurally via the SDUI maxLineCountExpression marker, so it is never confused with the location.
input routing private log at startup — shows, per run, how each input entry was classified (searchCount / directCount plus the actual searchTerms / directSlugs, capped to 25). Makes "why was my name treated as a URL?" obvious from the logs.

Fixed

Profile sections cross-contaminated each other (intermittent, ~1 run in 4). When LinkedIn returned the Part1 card with every section inlined into the root flight row, the section scope matched that one row for all sections, so experience / education / certifications / volunteer each received the full concatenated blob and mislabelled it. Sections are now scoped by the node's observabilityIdentifier subtree, which is robust to both the inlined and split layouts.
location on experiences contained the role description note (and the note was lost when a real location existed). Notes now go to the new description field; location only ever holds a real location.
Company "media / CTA" taglines parsed as fake roles ("Learn more here", promo lines) — they broke multi-role grouping (sub-roles lost their company). Dropped via the *-media-button tracking marker. The same fix removes uploaded certification attachments ("…certificate.pdf") that were emitted as bogus certifications.
Multi-role groups with a standalone employment-type line ("Permanent", "Full-time") lost the company on each sub-role and put the type into location. The type is now read as employmentType (the secondary line before the date; location is the one after).
Blank "+N skills" affordance nodes split experiences — a whitespace-only node was treated as a job title, swallowing the next real role. Whitespace-only text is now ignored everywhere.
Education "Activities and societies" became a bogus school entry (or was silently dropped when long). It is now captured as the entry's activities field.
Honors & Projects emitted noise entries — the "Associated with {company/school}" association line and the "Other contributors" avatar affordance were parsed as separate entries (and the association line stole the real entry's description). Both are now skipped.
Skills empty-state placeholder ("Nothing to see for now" / "Skills that X adds will appear here.") was emitted as two skills on profiles with no skills. Now filtered.
A typed name submitted with the legacy isUrl:true flag was treated as a direct profile slug instead of a search (e.g. searching "Hugo Picquet" fetched a non-existent /in/Hugo picquet). Under isUrl:true, only entries that can actually be a slug (URL marker or a space-free token) are routed to a direct lookup; anything else is a search.
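
The profile routing rule from the last entry can be summarized in a few lines. This is a hedged sketch: the function name and the exact URL-marker test are assumptions, while the rule itself (a URL marker, or a space-free token under isUrl:true, is a direct lookup, and anything else is a search) comes from the entry.

```typescript
type Route = "direct" | "search";

// Hypothetical router for profile entries; the marker detection is an assumption.
function routeProfileEntry(entry: string, legacyIsUrl: boolean): Route {
  const value = entry.trim();
  const hasUrlMarker = /^https?:\/\//i.test(value) || value.includes("/in/");
  if (hasUrlMarker) return "direct";
  // Under legacy isUrl:true, a space-free token can still be a slug.
  if (legacyIsUrl && value.length > 0 && !/\s/.test(value)) return "direct";
  // Anything else, such as a typed full name, is a search.
  return "search";
}
```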

[10.9.0] — 2026-06-04

Fixed

v1-shaped inputs were silently ignored, returning the schema defaults (keywords:["web dev"], limit:1) instead of the submitted targets. Root cause: the schema exposed its fields under v2__-prefixed names, so Apify injected the v2__* defaults for every field a v1-shaped input (action/keywords/limit) didn't provide. Those injected defaults flipped normalizeInput's isV2 discriminator to true (v2-wins precedence), so the engine read the injected v2__search:["web dev"] / v2__limit:1 defaults and discarded the real keywords/limit. Net effect: a run of 4 profile URLs with limit:4 searched "web dev" capped at 1 and emitted a single record.

Changed

Input fields renamed back to the shared v1/v2 names (v2__action→action, v2__search→keywords, v2__limit→limit, v2__location→location, v2__profileFields→profileFields, v2__proxyConfiguration→proxyConfiguration). There is now a single set of field names, so normalizeInput and the startup-summary config read them directly — the v2__ vs v1 discriminator (and the v2-wins precedence that caused the bug) is gone. Legacy isUrl:true is still honored as a direct-lookup hint; isName remains a no-op. Behavior is otherwise unchanged.

[10.8.0] — 2026-06-03

Changed

Simplified v2 input. Targeting collapses from four concepts (action + isUrl + isName + the implicit meaning of keywords) to two: v2__action (Profiles / Companies) and v2__search (one bulk list of search terms, URLs, or /in/·/company/ slug-paths). URLs are auto-detected; a bare term is a search; a /in/{slug} or /company/{slug} marker forces a direct lookup (returns one result, ignores the limit). Removed the confusing isUrl/isName toggles and the enrichWith* fields from the v2 UI. The location field becomes v2__location (same autocomplete behavior).

Added

normalizeInput backward-compat layer (src-v2/dtos/normalize-input.ts). All v2 fields are prefixed v2__; the presence of any v2__ key marks an input as v2. Legacy v1-shaped inputs (action/keywords/isUrl/isName) still work unchanged — they ride through the schema's additionalProperties: true and are mapped to the canonical model (isName is dropped, being a no-op in v2). Raw LinkedIn IDs are rejected with a clear message instead of issuing a dead request. The schema requires nothing (required: []) so v1-shaped inputs still pass platform validation. Covered by offline tests in src-v2/__tests__/run.ts.

[10.7.0] — 2026-06-03

Added

location filter now works (get-profiles). The previously-inert "🌍 Location" input is wired into people-search as the geoUrn facet. Each entry is resolved autocomplete-style: free text (e.g. Paris, United States, Greater London) is matched against LinkedIn's geo typeahead and the top match is used; a raw geo id (106383538), a urn:li:geo:… URN, or a URL carrying geoUrn=… is used directly (no lookup). Multiple entries widen the filter (matches ANY). Resolution runs once at startup via one leased account and is baked into every search request (geoUrn=["…"] + origin=FACETED_SEARCH), including pagination. Unmatched entries are skipped with a warning (the run continues; if none resolve it runs unfiltered) — the matched place names are surfaced on the public log. New src-v2/linkedin/geo.ts (URL builder + typeahead parser + raw-id detection) and src-v2/runtime/geo-resolver.ts, both with offline tests over a real captured typeahead fixture. Geo typeahead transport is the Voyager GraphQL voyagerSearchDashReusableTypeahead (same rot caveat as the company queryId; see docs/RE §1e). Profiles only — company search is unaffected.
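
The raw-id detection described above (numeric id, urn:li:geo URN, or a URL carrying geoUrn=) might be sketched as follows. The type and function names are hypothetical, and the way the id is pulled out of the geoUrn parameter (first run of digits) is an assumption.

```typescript
// Hypothetical result type: either a geo id usable directly, or free text for the typeahead.
type GeoEntry = { kind: "id"; geoId: string } | { kind: "typeahead"; query: string };

function classifyGeoEntry(entry: string): GeoEntry {
  const value = entry.trim();
  // Raw numeric geo id, e.g. 106383538.
  if (/^\d+$/.test(value)) return { kind: "id", geoId: value };
  // urn:li:geo:<id> URN.
  const urn = /^urn:li:geo:(\d+)$/i.exec(value);
  if (urn) return { kind: "id", geoId: urn[1] };
  // URL carrying geoUrn=...; assumption: the id is the first run of digits in the value.
  const param = /[?&]geoUrn=([^&#]*)/i.exec(value);
  if (param) {
    const digits = /\d+/.exec(decodeURIComponent(param[1]));
    if (digits) return { kind: "id", geoId: digits[0] };
  }
  // Free text such as "Paris" goes through the typeahead lookup.
  return { kind: "typeahead", query: value };
}
```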

[10.6.0] — 2026-06-03

Fixed

limit ("🔢 Limit of result per query") no longer caps directly-provided URLs, and is now per-query. It was implemented as a single global counter applied to every enqueued item — including profile/company URLs passed in directly. With the default limit=1, a run of N URLs (e.g. the big-one preset: 493 profile URLs) returned only 1 item. Now: direct URLs bypass the per-query cap (all N are scraped, still subject to the daily quota), and in search mode limit is a per-keyword budget — limit=5 over 2 search terms yields up to 10 rows, matching the input-schema description. Gate logic extracted to src-v2/runtime/limit-gate.ts with offline tests; it composes with the daily-quota cap from 10.5.0 (per-query limit AND global remaining-quota both apply).
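
The gate semantics above (direct URLs bypass the per-query cap, search results draw from a per-keyword budget, and the daily quota applies to everything) can be sketched as below. The class and field names are hypothetical and this is not the code in limit-gate.ts.

```typescript
// Hypothetical candidate shape; names are assumptions for illustration.
interface Candidate {
  direct: boolean;   // true for a directly provided URL
  keyword?: string;  // the search term that produced this candidate
}

class LimitGate {
  private readonly perKeyword = new Map<string, number>();
  private accepted = 0;
  private readonly limit: number;
  private readonly remainingQuota: number;

  constructor(limit: number, remainingQuota: number) {
    this.limit = limit;
    this.remainingQuota = remainingQuota;
  }

  allow(candidate: Candidate): boolean {
    // The global daily quota applies to every item, direct or searched.
    if (this.accepted >= this.remainingQuota) return false;
    if (!candidate.direct) {
      // Search mode: limit is a per-keyword budget.
      const key = candidate.keyword ?? "";
      const used = this.perKeyword.get(key) ?? 0;
      if (used >= this.limit) return false;
      this.perKeyword.set(key, used + 1);
    }
    this.accepted += 1;
    return true;
  }
}
```

Under this sketch, limit=5 over two search terms admits up to 10 items, and a list of direct URLs is capped only by the remaining quota.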

Added

profileFields multi-select (get-profiles) — choose which profile sections to scrape; each option maps to one LinkedIn request (About → above card; Experience/Education/Certifications/Volunteer → part1 card; Languages → part4 card; Skills/Honors/Projects/Organizations → their pagers). Fewer selections = fewer requests = faster, cheaper, lower ban risk. Empty/absent = scrape everything (backward compatible). Base identity (name, headline, location, picture, counts) always returned via the mandatory shell. Selection logic in src-v2/runtime/profile-fields.ts with offline tests.
Company output enriched (get-companies): CompanyData now also exposes phone ({ number, extension }), callToAction ({ text, type, url }), verified (page verification), active, pageType (COMPANY/SCHOOL/SHOWCASE), jobSearchUrl, and hashtags (associated #tags, leading # stripped). Field shapes validated against the decompiled Voyager Company / CallToAction / PhoneNumber / PageVerification models; covered by offline tests over the microsoft/stripe fixtures.

[10.5.0] — 2026-06-02

Added

Per-user daily result quota in v2 (src-v2/runtime/rate-limit.ts), ported from v1's Redis approach. Counts delivered dataset items (profiles and companies, one shared quota) per Apify userId over a 24h sliding window, in Redis. Config: REDIS_URL + DAILY_LIMIT (default 150000). Same key scheme as v1 (rate_limit:{userId}:count / :window_start), so a shared Redis makes v1 and v2 draw from the same budget.
The run's effective cap is min(input.limit, remaining); the request gate stops enqueuing at that cap, so the daily limit is never exceeded. Each emitted item increments the Redis counter.
Graceful degradation: no REDIS_URL (or a connection error) → the limiter is a no-op and the run proceeds unthrottled.
Optional RATE_LIMIT_PREFIX env to namespace the Redis keys (e.g. claudetest:), isolating from existing data on a shared Redis. Default "" keeps the v1-compatible rate_limit:{userId} scheme.
Client-facing quota line on the public log channel (Daily quota: X/Y used today — Z remaining); all other rate-limit internals (Redis connection, detailed counters) stay on the private/debug channel.

Changed

Unlike v1 (which Actor.fail()s), hitting the daily cap now lets the run succeed: allowed results are delivered and the run finishes with a warning + an Apify status message (Daily limit … Resets in ~N min). Already-at-cap runs exit immediately without leasing accounts.

[10.4.0] — 2026-06-02

Added

action=get-companies in v2 — company search and company details (previously threw). Company search reuses the unified Crawlee queue + session affinity as a POST RSC stream (mirror of people-search, page-key d_flagship3_search_srp_companies); regex-extracts /company/{universalName} slugs. Company details are fetched from the Voyager GraphQL API (/voyager/api/graphql, voyagerOrganizationDashCompanies), which returns normalized JSON (application/vnd.linkedin.normalized+json+2.1) — not SDUI/RSC. One GET per company → one dataset item (no cards/pagers).
CompanyData output: name, universalName, companyId, description, tagline, websiteUrl, industry, employeeCount + range, foundedYear, specialities, headquarter, all locations, followerCount, logoUrl, coverImageUrl, linkedinUrl.
Offline tests over real captured fixtures (company-microsoft.voyager.json, company-stripe.voyager.json, company-search-consulting.rsc.txt): parser fields, slug extraction, and the Voyager header contract.

Docs

docs/RE/LINKEDIN_API_REVERSE_ENGINEERING.md §7 — full company reverse-engineering (search + details endpoints, headers, field mapping), captured live via CamouxAI.
docs/DECISIONS.md D12 — company details use Voyager GraphQL JSON (distinct transport/parser from profiles); the queryId is a versioned hash that rots on client bumps.
Design spec: docs/superpowers/specs/2026-06-02-company-search-details-design.md.

[10.3.1] — 2026-06-02

Changed

Request queue is now memory-only in public mode, instead of persisted-then-dropped. v10.3.0 wrote every queued request (search/shell/card/pager) to storage/request_queues/ during the run and deleted them in a finally block — so they were still visible mid-run and the cleanup was best-effort. The request queue is now backed by an in-memory storage client (@crawlee/memory-storage, persistStorage: false), so the engine's internals are never written to the run's storage at all. Scoped to the request queue only: the dataset output keeps the default storage client and persists as before. General-debug mode still uses the default persistent queue for the Apify crawler UI.

[10.3.0] — 2026-06-02

Added

General-debug mode (src-v2/): activate via the GENERAL_DEBUG=1 env var or a hidden input field. When on, all Crawlee classes (session pool, statistics) run with persistence + stats enabled for the Apify crawler UI; when off, none of it is exposed.
Double logger (logs.priv / logs.pub / logs.all): a friendly, safe public log stream by default; switches to a detailed private stream under general-debug. All levels (debug/info/warning/error, plus warn alias).
Startup summary: every run opens with the scraper version, log mode, general-debug state, action/limit/keyword counts, and a warning for any missing required env vars.

Changed

All src-v2 log call sites migrated onto the double logger; account leasing / usage telemetry / RSC internals now log only on the private channel.
Public mode now mutes Crawlee's own class loggers (crawler, session pool, request queue, statistics) by raising the shared root @apify/log level to ERROR; this also hides Crawlee's routine retry warnings (Reclaiming failed request …, which expose the crawler name + target URL + retry count). Our funny public logs ride a child logger kept at INFO, so the user-facing narrative still prints; a genuine fatal crawler error still surfaces. Private mode raises the root to DEBUG so everything shows.

Fixed

Per-second "Statistics" log spam in public mode. The previous suppression trick set statisticsOptions.logIntervalSecs to ~41 years; logIntervalSecs * 1000 overflowed setInterval's 32-bit signed range and silently clamped to ~1 ms, flooding the log dozens of times per second (and the line leaked engine internals). Suppression is now done by log level; the interval stays a sane value (60s public / 30s private).
Request queue left in storage in public mode. Session-pool and Statistics state were already gated out of the Key-Value store, but the request queue (every queued search/shell/card/pager request) is a separate store and still lingered in storage/request_queues/. main.ts now drops the request queue at shutdown when general-debug is off — the dataset output (a separate store) is untouched, and private mode keeps the queue for the Apify crawler UI.
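
The statistics-interval overflow noted in this list comes down to simple arithmetic: timer delays are held in a 32-bit signed integer, so anything above 2^31 - 1 ms (about 24.8 days) does not fit, and Node.js documents that such a delay is set to 1 ms. The changelog's actual fix was suppression by log level. The guard below is only an illustrative alternative, with hypothetical names.

```typescript
// Largest delay a 32-bit signed timer can hold: 2^31 - 1 ms (about 24.8 days).
const MAX_TIMER_DELAY_MS = 2 ** 31 - 1;

// Convert seconds to a timer-safe millisecond delay, clamping instead of overflowing.
function toSafeDelayMs(seconds: number): number {
  const ms = seconds * 1000;
  if (!Number.isFinite(ms) || ms < 1) {
    throw new RangeError("delay must be a positive, finite number of seconds");
  }
  return Math.min(ms, MAX_TIMER_DELAY_MS);
}

// Roughly 41 years expressed in seconds, as in the old suppression trick.
const fortyOneYearsSecs = 41 * 365 * 24 * 60 * 60; // 1,292,976,000 s
// Multiplied by 1000 this is about 1.29e12 ms, far above the 2,147,483,647 ms ceiling.
const wouldOverflow = fortyOneYearsSecs * 1000 > MAX_TIMER_DELAY_MS;
```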

[10.2.1] — 2026-06-02

Fixed

Leased accounts are now released at shutdown (AccountRegistry.releaseAll() in main.ts's finally) — they were leased but never returned to Milkbox, holding leases open server-side.
Idempotent profile-shell seed: a re-queued shell no longer resets a live accumulator entry (which would dedupe its cards by uniqueKey and strand the profile so it never emits).
choosePagers filters out any unresolved pager (defensive under the project's loose TypeScript).

Added

postNavigationHook marks the account good (registry.markGood) on a <400 response, so account-health telemetry sees successes, not only errors/bans.

Docs

New docs/SESSION-AFFINITY.md (system + how to use/reuse the brick + gotchas); CLAUDE.md updated to the unified-queue flow (the old two-phase description was stale).

[10.2.0] — 2026-06-02

Added

Unified single-queue v2 pipeline with session affinity. Every LinkedIn call (search → profile shell → cards → pagers) is now a queued Crawlee Request instead of Phase-1 direct fetch + Phase-2 inline ctx.sendRequest. A profile's cards/pagers are pinned to the account that handled its shell and run concurrently on that one account/IP; each new profile takes a fresh session. Output unchanged: one dataset item per profile.
Generic crawlee-session-affinity brick (src-v2/crawlee/session-affinity/): soft request→session affinity for Crawlee 3.16 (pin via userData, graceful fallback when a session dies, forefront batch). SessionPool is fed by an AccountRegistry that adapts the Milkbox provider. Offline-tested (pnpm run test:affinity).
Two-wave profile accumulator (src-v2/linkedin/profile-accumulator.ts): per-profile completion barrier (cards → pagers) that emits one aggregated item; best-effort (a part that exhausts retries still advances the barrier). Offline-tested (pnpm run test:queue).

Changed

Search retries now ride Crawlee (maxRequestRetries + session retire) instead of the manual 8-account loop. Component/pager POSTs are navigated by the crawler (validated: impit POST returns RSC).
Profile handler split into profile-shell / component / pager handlers + a part-extractors registry; main.ts rebuilt on the affinity brick (maxConcurrency: 50, maxPoolSize: 10, maxRequestRetries: 8).

Fixed

Account ids are sanitized without hyphens so they are valid Apify proxy session tokens (/^[\w._~]+$/); Milkbox UUID ids no longer break proxy resolution.
Per-(profile, part) uniqueKey on card/pager requests prevents Crawlee from deduping one profile's parts against another's (their URLs are identical; the vanity lives in the POST body).
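
Both fixes above are small string transformations and can be sketched as follows. The function names and the key format are assumptions. The token pattern is the one quoted in the first entry.

```typescript
// Pattern quoted in the changelog for valid proxy session tokens.
const PROXY_SESSION_TOKEN = /^[\w._~]+$/;

// Strip every character outside the allowed set (notably the hyphens of a UUID).
function toProxySessionId(accountId: string): string {
  const cleaned = accountId.replace(/[^\w._~]/g, "");
  if (!PROXY_SESSION_TOKEN.test(cleaned)) {
    throw new Error("account id has no characters usable in a session token");
  }
  return cleaned;
}

// Hypothetical key format: card/pager URLs are identical across profiles,
// so the key must carry the profile (vanity) and the part explicitly.
function partUniqueKey(vanity: string, part: string): string {
  return `${vanity}::${part}`;
}
```

Without a per-profile key, two profiles requesting the same part would share one URL and the second request would be dropped as a duplicate.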

[10.1.0] — 2026-06-02

Added

proxyConfiguration input (standard Apify proxy editor) — supports both Apify Proxy (groups/country) and custom proxy URLs. Search now routes through a proxy (linkedin/http.ts).
File-based proxy list (linkedin/proxies.ts): loads bird-proxies.txt (override via PROXY_LIST_FILE) and builds the fallback ProxyConfiguration from it — used when out of Apify Proxy credits. Takes precedence over the input proxy; the account's bound proxy still wins.
Generic lazy-section discovery (component-request.ts extractLazySectionRefs): reads com.linkedin.sdui.profile.card.ref{profileId}{Section} refs from card responses instead of hardcoding (generalizes the Skills-only special case).
Lazy sections via the detail pager (Lot D): section content is loaded with POST /flagship-web/rsc-action/actions/pagination?sduiid=…pagers.profile.details.{slug} (a proto.sdui.actions.requests.PaginationRequest, empty states, keyed on the viewee profileId) — the broken ref{profileId}Skills call (500) is removed. New SKILLS_PAGER + ENTITY_PAGERS (certifications/courses/publications/honors/projects/organizations/languages, recipes confirmed live) + buildPaginationBody/buildPaginationUrl/paginationPageKey (component-request.ts), extractPagerBuckets/extractSkillsFromPager (rsc-flight.ts), extractCertificationsFromPager (rsc-parser.ts).
- skills wired + verified live (6–57 clean skills).
- certifications wired (prefers the complete pager over the card inline) + verified live ({name, issuer, date}).
- honors / projects / organizations / languages wired + verified live via a generic entity grouper (groupPagerEntities): new ProfileData fields honors ({title, description}), projects ({name, dateRange, description}), organizations ({name, role, dateRange, description}); languages now come from the pager too.
- courses/publications: pager recipes ready in ENTITY_PAGERS, parsers still to add. volunteer/patents: pagerId TBD.
Offline test harness (pnpm run test:v2, src-v2/__tests__/): assertions over real captured RSC fixtures (request body shape, ref discovery, section parsing, top-card name/headline/location).
New profile fields profilePictureUrl + backgroundImageUrl (extractTopCardImages): owner's images scoped to the topCard (avoids recommendation/company avatars), largest rendition built from the SDUI image model rootUrl + suffixUrl (≈800×800 photo, 350×1400 cover).
followersCount / connectionsCount now extracted from the topCard rendered text (extractTopCardCounts) — these are not numeric JSON fields in the RSC. Handles both single-node ("255 followers") and split-node ("229" + "connections") forms, and "500+". Verified live: followers on creator profiles, connections on normal profiles.
Per-account browser fingerprint: user-agent, accept-language, and x-li-track on all LinkedIn requests are now driven by browser_session.fingerprint from Milkbox — UA, languages, timezone, and screen match the leased account's browser profile.
LINKEDIN_HTTP3 env toggle: enables HTTP/3 on impit for Phase-1 search requests. Forced off whenever a proxy is in use — impit cannot combine HTTP/3 with a proxy.
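
The follower / connection count extraction described in this list works on rendered text rather than numeric JSON fields. A minimal sketch, assuming the top-card text arrives as an ordered array of strings, that counts are written in English, and that "500+" is reported as 500 (all assumptions, as is the function name):

```typescript
interface TopCardCounts {
  followersCount: number | null;
  connectionsCount: number | null;
}

function parseTopCardCounts(textNodes: string[]): TopCardCounts {
  // Joining the nodes turns the split form ("229", "connections") into "229 connections".
  const text = textNodes.map((t) => t.trim()).filter((t) => t.length > 0).join(" ");
  const counts: TopCardCounts = { followersCount: null, connectionsCount: null };
  const pattern = /(\d[\d,]*)\+?\s+(followers|connections)\b/gi;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    const value = parseInt(match[1].replace(/,/g, ""), 10);
    const label = match[2].toLowerCase();
    // Keep the first occurrence of each label.
    if (label === "followers" && counts.followersCount === null) counts.followersCount = value;
    if (label === "connections" && counts.connectionsCount === null) counts.connectionsCount = value;
  }
  return counts;
}
```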

Changed

pnpm start now runs v2 (src-v2/main.ts) as the classic Apify actor entry (also used by apify run). v1 stays reachable via pnpm run start:v1. Added build:v2.
Minimal real-web component body (buildComponentBody): drops the guessed replaceableSectionArgs/vieweeProfileId/shouldSetupReplaceableComponent/profileComponentState. Component cards no longer require nonIterableProfileId, so a missed shell extraction no longer zeroes the profile (it's now only a health signal, not a gate).
x-li-application-version bumped 0.2.5529 → 0.2.5782 (the old value was stale).
Phase-1 search now uses impit (ImpitHttpClient, Firefox fingerprint) instead of axios, matching Phase-2's TLS fingerprint.

Fixed

Top-card parsing (extractTopCardFields): headline/name/location were mis-assigned (name or company used as headline; comma-bearing headline classified as location; pronouns "She/Her" used as headline; comma-less locations dropped). Now parsed by stable document order — verified live on satyanadella / williamhgates / hanaelliott / ghislaindurand.
firstName/lastName now fall back to findStringFieldInStream when the tree walk misses the profile shape on large (activity-inlined) shells.
Top-card name with a nickname (e.g. "Zsófia Réka (Sophie) Tóth"): the exact full-name match failed and the headline became the name; now falls back to a starts-with-first + contains-last match to skip the name line.
Phase-2 requests (profile GET + component POSTs) now egress through the leased account's residential proxy via a crawler-level ProxyConfiguration(newUrlFunction) resolving each session's bound proxy. The previous per-request request.proxyUrl was a no-op that leaked the datacenter IP.

Notes

Proxy precedence: a leased account's bound proxy wins (keeps the cookie ↔ IP pairing); the file/input proxy is the fallback for search and accounts without a bound proxy.
Known gaps: skills, certifications, honors, projects, organizations, languages now work via the section pager (Lot D). Remaining: courses, publications (recipe ready in ENTITY_PAGERS, parser to add) and volunteer/patents (pagerId still to capture). (The ref component and the detail-page GET are dead ends — 500 / "Something went wrong".) industryName is absent from the RSC (likely voyager/api JSON only); followersCount/connectionsCount are now read from the topCard text. languages name/proficiency pairing is fragile when a language has no proficiency level. See docs/DATA-SOURCING.md §6 and docs/DECISIONS.md D8.
bird-proxies.txt is committed with credentials in cleartext; it should be moved to a secret / KV store.

[10.0.0] — 2026-06-02

Added

v2 scraping engine (src-v2/) talking directly to LinkedIn's SDUI / RSC API — no dependency on @bebity/linkedin-scraper.
Two-phase profile pipeline: keyword/URL search → vanity names, then per-profile RSC fetch + component POST calls (summary, experience, education, certifications, volunteer, languages, skills).
Milkbox-based account/cookie/proxy leasing behind an AccountProvider abstraction, with batched usage telemetry and auto-ban (LOGIN_REQUIRED) on auth errors.
Documentation under docs/ (ARCHITECTURE.md, DATA-SOURCING.md, RE/LINKEDIN_API_REVERSE_ENGINEERING.md).

Changed

Version set to 10.0.0 to mark the v2 rewrite (previously tracked the @bebity/linkedin-scraper package version, 7.x).

Notes

Input/output and published actor identity stay compatible with v1.
v2 currently supports profiles only (get-profiles); get-companies still throws.
enrichWithCompany / enrichWithContact are accepted in the input schema (v1 compat) but not yet implemented in v2.
v1 engine (src/, wrapping @bebity/linkedin-scraper) remains the deployed/shipping path.
