Pricing

from $2.40 / 1,000 candidate-results

Resume / Candidate Profile Scraper

Extract structured candidate data from public resume, portfolio, GitHub, and profile URLs into flat, CSV-ready rows with skills, visible contacts, profile links, and a completeness score — no login, cookies, or residential proxy.

Developer

Delowar Munna

Last modified

2 months ago

✨ Why this scraper

Public-only & safe — no logged-in LinkedIn / Indeed Resume / Naukri / Seek private databases, no cookies, no credentials. Login-required pages are skipped, never bypassed.
Mixed inputs — public HTML profiles, personal sites, portfolios, public GitHub profiles, and directly public PDF/text resumes, all into one stable schema.
32 flat fields — identity, visible contacts, profile links, source tracking, detected skills, completeness score. No nested objects; drops straight into Sheets/Excel/CRMs.
Transparent completeness score — rule-based (no AI), explained below.
Pay-Per-Event — one flat candidate-result event per saved unique candidate. Failed, skipped, duplicate, and filtered rows are never charged.

🚀 Quick start — sample inputs

Example 1 — mixed public URLs with skill detection

{
    "startUrls": [
        { "url": "https://github.com/addyosmani" },
        { "url": "https://github.com/sindresorhus" },
        { "url": "https://kentcdodds.com" }
    ],
    "sourceType": "auto",
    "maxResults": 100,
    "includePdfText": true,
    "skillKeywords": ["TypeScript", "React", "AWS", "Node.js"],
    "deduplicate": true,
    "proxyConfiguration": { "useApifyProxy": true }
}

Example 2 — filtered shortlist + custom residential proxy via your own provider

{
    "startUrls": [
        { "url": "https://github.com/yyx990803" },
        { "url": "https://github.com/antfu" },
        { "url": "https://github.com/getify" },
        { "url": "https://feross.org" }
    ],
    "maxResults": 250,
    "requiredKeywords": ["typescript"],
    "minCompletenessScore": 50,
    "deduplicate": true,
    "proxyConfiguration": {
        "useApifyProxy": false,
        "proxyUrls": ["http://user:pass@proxy.iproyal.com:12321"]
    }
}

Tip: a public resume URL like { "url": "https://example.com/jane-doe-resume.pdf" } also works — directly public PDF resumes are parsed with includePdfText: true and fill the education_summary / experience_summary / certifications_text fields that profile pages usually leave empty.

Provide at least one valid public HTTP/HTTPS URL in startUrls. Unsupported protocols (file:, ftp:, mailto:, tel:) are rejected, and duplicate URLs are removed before crawling.
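The startUrls rules can be pictured as a small validation step that runs before crawling. This TypeScript sketch illustrates the rules under stated assumptions and is not the actor's code. `prepareStartUrls` is a hypothetical name. Duplicates are detected by comparing WHATWG-normalized `URL.href` values, which may differ from the actor's own normalization.

```typescript
// Shape of one entry in the actor's `startUrls` input (see the examples above).
interface StartUrl {
  url: string;
}

// Hypothetical helper: keeps only valid http(s) URLs and drops duplicates.
// Assumption: two URLs are duplicates when their normalized `href` matches.
function prepareStartUrls(startUrls: StartUrl[]): string[] {
  const seen = new Set<string>();
  const accepted: string[] = [];
  for (const { url } of startUrls) {
    let parsed: URL;
    try {
      parsed = new URL(url.trim());
    } catch {
      continue; // not a parseable absolute URL
    }
    // Rejects file:, ftp:, mailto:, tel:, and every other non-HTTP scheme.
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") continue;
    if (seen.has(parsed.href)) continue;
    seen.add(parsed.href);
    accepted.push(parsed.href);
  }
  if (accepted.length === 0) {
    throw new Error("Provide at least one valid public HTTP/HTTPS URL in startUrls.");
  }
  return accepted;
}

const urls = prepareStartUrls([
  { url: "https://github.com/addyosmani" },
  { url: "https://github.com/addyosmani" }, // duplicate, dropped
  { url: "mailto:jane@example.com" }, // unsupported protocol, dropped
  { url: "ftp://example.com/resume.pdf" }, // unsupported protocol, dropped
  { url: "https://kentcdodds.com" },
]);
console.log(urls); // the GitHub profile (once) and https://kentcdodds.com/
```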

The actor blocks Apify Residential proxy; if you need residential routing, supply your own provider via proxyConfiguration.proxyUrls as shown. See 🚦 Proxy policy below.

📦 Output

The dataset has one view: Candidates — a 32-column flat table.

Output fields (32)

candidate_name, headline, current_title, current_company, location_text, email, phone, website_url, linkedin_url, github_url, portfolio_url, source_url, canonical_url, source_domain, source_type, resume_file_type, skills_detected, skill_count, matched_keywords, experience_years_text, education_summary, experience_summary, certifications_text, languages_text, public_contact_available, profile_completeness_score, profile_quality_label, reason_tags, page_title, page_text_snippet, input_index, scraped_at.

Scalar fields fall back to null, comma-joined lists to "", counts/scores to 0, and booleans to false when a value isn't visibly present.

Sample records — Candidates

Real output rows (public GitHub / personal-site profiles). Fields populate from what's publicly visible — resume-section fields (education_summary, experience_summary, certifications_text) are blank on profile pages and fill in from public resume PDFs.

A public GitHub profile (github_profile):

{
    "candidate_name": "Addy Osmani",
    "headline": "Director at Google working on Gemini and Google Cloud",
    "current_title": "Director",
    "current_company": "Google",
    "location_text": "Sunnyvale, California",
    "email": null,
    "phone": null,
    "website_url": "https://www.addyosmani.com/",
    "linkedin_url": "https://www.linkedin.com/in/addyosmani",
    "github_url": "https://github.com/addyosmani",
    "portfolio_url": null,
    "source_url": "https://github.com/addyosmani",
    "canonical_url": "https://github.com/addyosmani",
    "source_domain": "github.com",
    "source_type": "github_profile",
    "resume_file_type": "html",
    "skills_detected": "javascript, html, css, react, vue, angular, google cloud",
    "skill_count": 7,
    "matched_keywords": "react",
    "experience_years_text": null,
    "education_summary": null,
    "experience_summary": null,
    "certifications_text": null,
    "languages_text": null,
    "public_contact_available": false,
    "profile_completeness_score": 70,
    "profile_quality_label": "high",
    "reason_tags": "has_linkedin,has_github,has_skills,keyword_match",
    "page_title": "addyosmani (Addy Osmani) · GitHub",
    "page_text_snippet": null,
    "input_index": 4,
    "scraped_at": "2026-06-07T12:33:34.659Z"
}

A personal-site profile (public_profile) with a visible contact:

{
    "candidate_name": "Lee Robinson",
    "headline": "VP of Developer Experience",
    "current_title": "VP of Developer Experience",
    "current_company": "Cursor",
    "location_text": null,
    "email": "lee@leerob.com",
    "phone": null,
    "website_url": "https://leerob.com/",
    "linkedin_url": "https://www.linkedin.com/in/leeerob",
    "github_url": "https://github.com/leerob",
    "portfolio_url": null,
    "source_url": "https://leerob.com/",
    "canonical_url": "https://leerob.com/",
    "source_domain": "leerob.com",
    "source_type": "public_profile",
    "resume_file_type": "html",
    "skills_detected": "",
    "skill_count": 0,
    "matched_keywords": "",
    "experience_years_text": "15 years",
    "education_summary": null,
    "experience_summary": null,
    "certifications_text": null,
    "languages_text": null,
    "public_contact_available": true,
    "profile_completeness_score": 65,
    "profile_quality_label": "medium",
    "reason_tags": "has_public_email,has_linkedin,has_github,public_profile",
    "page_title": "Lee Robinson",
    "page_text_snippet": null,
    "input_index": 19,
    "scraped_at": "2026-06-07T12:33:41.035Z"
}

🎯 Profile-completeness score

Transparent rule-based score (0–100) computed from extracted fields — no AI, no external enrichment.

Signal	Points
`candidate_name` present	+15
`headline` or `current_title` present	+15
`current_company` present	+10
`location_text` present	+10
at least one public contact (`email` or `phone`)	+15
any profile link (`linkedin_url` / `github_url` / `portfolio_url` / `website_url`)	+10
`skill_count >= 3`	+10
`experience_summary` present	+10
`education_summary` or `certifications_text` present	+5

Score is capped at 100.

Labels: high (70–100) · medium (40–69) · low (0–39).
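The point table and label bands translate almost line for line into code. Here is a minimal TypeScript sketch. `completenessScore` and `qualityLabel` are hypothetical names, and treating whitespace-only strings as missing is an assumption. Applied to the Addy Osmani sample record, it gives the published score of 70.

```typescript
// Only the output fields the score reads (names match the dataset columns).
interface ScoreInput {
  candidate_name: string | null;
  headline: string | null;
  current_title: string | null;
  current_company: string | null;
  location_text: string | null;
  email: string | null;
  phone: string | null;
  website_url: string | null;
  linkedin_url: string | null;
  github_url: string | null;
  portfolio_url: string | null;
  skill_count: number;
  experience_summary: string | null;
  education_summary: string | null;
  certifications_text: string | null;
}

// Assumption: null and whitespace-only strings count as "not present".
const present = (v: string | null): boolean => v !== null && v.trim() !== "";

function completenessScore(r: ScoreInput): number {
  let score = 0;
  if (present(r.candidate_name)) score += 15;
  if (present(r.headline) || present(r.current_title)) score += 15;
  if (present(r.current_company)) score += 10;
  if (present(r.location_text)) score += 10;
  if (present(r.email) || present(r.phone)) score += 15;
  if ([r.linkedin_url, r.github_url, r.portfolio_url, r.website_url].some(present)) score += 10;
  if (r.skill_count >= 3) score += 10;
  if (present(r.experience_summary)) score += 10;
  if (present(r.education_summary) || present(r.certifications_text)) score += 5;
  return Math.min(score, 100); // the signals already sum to at most 100
}

function qualityLabel(score: number): "high" | "medium" | "low" {
  if (score >= 70) return "high";
  if (score >= 40) return "medium";
  return "low";
}

// Fields copied from the Addy Osmani sample record.
const addy: ScoreInput = {
  candidate_name: "Addy Osmani",
  headline: "Director at Google working on Gemini and Google Cloud",
  current_title: "Director",
  current_company: "Google",
  location_text: "Sunnyvale, California",
  email: null,
  phone: null,
  website_url: "https://www.addyosmani.com/",
  linkedin_url: "https://www.linkedin.com/in/addyosmani",
  github_url: "https://github.com/addyosmani",
  portfolio_url: null,
  skill_count: 7,
  experience_summary: null,
  education_summary: null,
  certifications_text: null,
};
const addyScore = completenessScore(addy);
console.log(addyScore, qualityLabel(addyScore)); // 70 high
```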

reason_tags is a comma-separated list explaining the row — e.g. has_public_email, has_public_phone, has_linkedin, has_github, has_portfolio, has_skills, has_experience, has_education, resume_pdf, public_profile, low_information, plus keyword_match / location_match when your filters matched.

⚙️ Filters

Filter	Effect
`requiredKeywords`	Keep only rows whose visible text or detected skills contain at least one of the keywords. Rows with no visible text or skills to match against fail this filter.
`locationIncludes`	Keep only rows whose `location_text` contains one of the values. When this filter is set, rows with no `location_text` fail it.
`minCompletenessScore`	Keep only rows scoring at or above the threshold (0–100).
`deduplicate`	Drop duplicates by email, canonical/profile URL, or name + source; the richer duplicate is kept.

Filters are applied after extraction and before any dataset push or event charge. Filtered-out rows are counted in filtered_out and never charged.
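The filter and deduplication rules can be sketched as two small functions. This is an illustration under assumptions, not the actor's implementation:

- `page_text` is a hypothetical in-memory field standing in for the visible text, which the actor does not store.
- Keyword and location matching is assumed to be case-insensitive substring matching.
- "name + source" is assumed to mean name plus `source_domain`.
- "Richer" is assumed to mean the higher `profile_completeness_score`.
- Filtering before deduplication is an assumed order.

```typescript
interface FilterRow {
  candidate_name: string | null;
  email: string | null;
  canonical_url: string | null;
  source_domain: string;
  location_text: string | null;
  skills_detected: string;
  page_text: string; // hypothetical: visible text held in memory during the run
  profile_completeness_score: number;
}

interface FilterOptions {
  requiredKeywords?: string[];
  locationIncludes?: string[];
  minCompletenessScore?: number;
}

const lc = (s: string | null): string => (s ?? "").toLowerCase();

function passesFilters(row: FilterRow, opts: FilterOptions): boolean {
  const required = opts.requiredKeywords ?? [];
  if (required.length > 0) {
    // At least one keyword must appear in the visible text or detected skills.
    const haystack = `${lc(row.page_text)} ${lc(row.skills_detected)}`;
    if (!required.some((k) => haystack.includes(k.toLowerCase()))) return false;
  }
  const locations = opts.locationIncludes ?? [];
  if (locations.length > 0) {
    const loc = lc(row.location_text);
    // A missing location fails whenever this filter is set.
    if (loc === "" || !locations.some((l) => loc.includes(l.toLowerCase()))) return false;
  }
  return row.profile_completeness_score >= (opts.minCompletenessScore ?? 0);
}

// Identity keys: email, canonical URL, and name + source domain.
function dedupKeys(row: FilterRow): string[] {
  const keys: string[] = [];
  if (row.email) keys.push(`email:${row.email.toLowerCase()}`);
  if (row.canonical_url) keys.push(`url:${row.canonical_url}`);
  if (row.candidate_name) keys.push(`name:${row.candidate_name.toLowerCase()}|${row.source_domain}`);
  return keys;
}

// Two rows are duplicates if they share any key; the higher-scoring row is kept.
function deduplicate(rows: FilterRow[]): FilterRow[] {
  const kept: FilterRow[] = [];
  for (const row of rows) {
    const keys = new Set(dedupKeys(row));
    const i = kept.findIndex((k) => dedupKeys(k).some((key) => keys.has(key)));
    if (i === -1) kept.push(row);
    else if (row.profile_completeness_score > kept[i].profile_completeness_score) kept[i] = row;
  }
  return kept;
}

function selectRows(rows: FilterRow[], opts: FilterOptions): FilterRow[] {
  return deduplicate(rows.filter((r) => passesFilters(r, opts)));
}
```

Only rows returned by a step like `selectRows` would reach the dataset push and the charge.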

💰 Pricing

Pay-Per-Event. One flat event per saved row (final per-event price is configured on the Apify console):

Event	Charged when
`candidate-result`	Once per unique candidate row that passed all filters and was successfully written to the dataset.

So your bill is simply results_saved × price_per_event. The actor honors your per-run spending limit, which Apify signals with eventChargeLimitReached. Before crawling, it caps the number of results it will collect at what the limit can pay for. During charging, it stops cleanly as soon as the limit is reached.
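The billing formula and the up-front cap are plain arithmetic. The sketch below assumes the headline rate of $2.40 per 1,000 results, which is $0.0024 per candidate-result event. It works in integer micro-dollars to avoid floating-point rounding. The helper names are hypothetical, and the real per-event price is whatever the Apify Console shows for this actor.

```typescript
// Money in integer micro-dollars (1 USD = 1,000,000) to avoid float rounding.
const MICROS_PER_USD = 1_000_000;

// Assumption: $2.40 per 1,000 events, i.e. $0.0024 = 2,400 micro-dollars each.
const PRICE_PER_EVENT_MICROS = 2_400;

// Bill = results_saved x price_per_event.
function billMicros(resultsSaved: number, priceMicros = PRICE_PER_EVENT_MICROS): number {
  return resultsSaved * priceMicros;
}

// Up-front cap: never plan to collect more results than the limit can pay for.
function maxAffordableResults(
  limitUsd: number,
  maxResults: number,
  priceMicros = PRICE_PER_EVENT_MICROS,
): number {
  const limitMicros = Math.round(limitUsd * MICROS_PER_USD);
  return Math.min(maxResults, Math.floor(limitMicros / priceMicros));
}

console.log(maxAffordableResults(1, 1000)); // 416 results fit in a $1.00 limit
console.log(billMicros(19) / MICROS_PER_USD); // 0.0456 USD for 19 saved rows
```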

Not charged:

Failed inputs and blocked/transient errors.
Pages skipped because they require login / cookies / private access.
Duplicates (by email, canonical/profile URL, name + source).
Rows filtered out by requiredKeywords / locationIncludes / minCompletenessScore.
Pure low-information / error rows (no useful candidate signal).

🚦 Proxy policy

Use Apify Datacenter proxy or no proxy for normal runs — both work for public resume/profile pages at this actor's conservative concurrency.

Apify Residential proxy is not supported. The actor fails at startup if proxyConfiguration.apifyProxyGroups includes RESIDENTIAL. Reason: in pay-per-event actors, residential bandwidth (billed per GB) is charged to the developer, not the run user, so a single bandwidth-heavy run could cost more than the per-result event revenue it earns.
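The startup check amounts to a single guard on the input. Here is a minimal TypeScript sketch. It assumes the `proxyConfiguration` shape used in the input examples, and `assertProxyAllowed` is a hypothetical name, not the actor's function.

```typescript
// Subset of the proxyConfiguration input used in the examples above.
interface ProxyConfiguration {
  useApifyProxy?: boolean;
  apifyProxyGroups?: string[];
  proxyUrls?: string[];
}

// Hypothetical guard: throws when the RESIDENTIAL Apify Proxy group is requested.
// No proxy, Apify Datacenter proxy, and custom proxyUrls all pass.
function assertProxyAllowed(config: ProxyConfiguration | undefined): void {
  if (!config) return; // no proxy configured at all
  const groups = config.apifyProxyGroups ?? [];
  if (groups.includes("RESIDENTIAL")) {
    throw new Error(
      "Apify Residential proxy is not supported. Use Apify Datacenter proxy, no proxy, or your own provider via proxyUrls.",
    );
  }
}

try {
  assertProxyAllowed({ useApifyProxy: true, apifyProxyGroups: ["RESIDENTIAL"] });
} catch (err) {
  console.log((err as Error).message);
}
```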

If you genuinely need residential routing, supply your own residential provider via the proxy editor's Custom proxy URLs field. That traffic goes through your provider, not Apify, so the restriction does not apply:

http://user:pass@proxy.iproyal.com:12321
http://user:pass@proxy.brightdata.com:22225
http://user:pass@proxy.oxylabs.io:7777

📊 Run summary

After each run, a RUN_SUMMARY entry is written to the key-value store:

{
    "inputs_total": 20,
    "successful_inputs": 20,
    "failed_inputs": 0,
    "skipped_private_or_login_required": 0,
    "raw_results_found": 20,
    "results_saved": 19,
    "duplicates_removed": 1,
    "filtered_out": 0,
    "charged_events": 19,
    "charge_failures": 0,
    "blocked_requests": 0,
    "retry_count": 0,
    "pdfs_processed": 0,
    "pdfs_skipped": 0,
    "html_pages_processed": 20,
    "runtime_seconds": 12,
    "scraped_at": "2026-06-07T12:33:45.708Z"
}

charged_events equals the number of successfully saved unique candidate rows.
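This invariant is easy to check automatically after a run, for example in a pipeline that reads the RUN_SUMMARY record from the run's key-value store. The TypeScript sketch below only parses the JSON; how you fetch it is up to you. `checkRunSummary` is a hypothetical name. The extra warnings for charge_failures and failed_inputs are illustrative choices, not part of the actor.

```typescript
// Subset of the RUN_SUMMARY fields shown above.
interface RunSummary {
  inputs_total: number;
  successful_inputs: number;
  failed_inputs: number;
  results_saved: number;
  duplicates_removed: number;
  filtered_out: number;
  charged_events: number;
  charge_failures: number;
  runtime_seconds: number;
}

// Returns human-readable warnings; an empty array means the summary looks consistent.
function checkRunSummary(json: string): string[] {
  const s = JSON.parse(json) as RunSummary;
  const warnings: string[] = [];
  if (s.charged_events !== s.results_saved) {
    warnings.push(`charged_events (${s.charged_events}) != results_saved (${s.results_saved})`);
  }
  if (s.charge_failures > 0) warnings.push(`${s.charge_failures} charge failure(s)`);
  if (s.failed_inputs > 0) warnings.push(`${s.failed_inputs} of ${s.inputs_total} inputs failed`);
  return warnings;
}

// Values taken from the RUN_SUMMARY example above.
const sample = JSON.stringify({
  inputs_total: 20,
  successful_inputs: 20,
  failed_inputs: 0,
  results_saved: 19,
  duplicates_removed: 1,
  filtered_out: 0,
  charged_events: 19,
  charge_failures: 0,
  runtime_seconds: 12,
});
console.log(checkRunSummary(sample)); // []
```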

🚧 Limitations (V1)

Public data only: no login, cookies, sessions, or member-only content. Pages behind an auth/login wall, paywall, or captcha are skipped (counted in skipped_private_or_login_required), never bypassed.
HTTP-first: HTML + directly public PDF/text resumes. No browser automation in V1 (planned as a future opt-in), no media/image downloads, and no crawling beyond the URLs you provide.
Visible-only contacts: email / phone are extracted only when publicly visible (mailto/tel links, structured data, or visible text). No enrichment, verification, or append.
No AI: skills come from a static dictionary plus your skillKeywords; the completeness score is rule-based.
PDF caps: PDFs over 10 MB are skipped; extracted text is truncated for memory safety. Only structured fields are stored — not full document text.
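Dictionary-based skill detection without AI can be sketched as case-insensitive whole-term matching. `SKILL_DICTIONARY` below is a small hypothetical stand-in, because the actor's real dictionary is not published here. Output order following dictionary order is an assumption. The lowercase, comma-and-space joined format follows the sample records.

```typescript
// Hypothetical mini-dictionary; the actor's real static dictionary is larger.
const SKILL_DICTIONARY = [
  "javascript", "typescript", "react", "vue", "angular",
  "node.js", "aws", "google cloud", "html", "css",
];

// Escapes regex metacharacters so entries like "node.js" match literally.
const escapeRegExp = (s: string): string => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

function detectSkills(text: string, skillKeywords: string[] = []) {
  // Dictionary first, then user skillKeywords, de-duplicated case-insensitively.
  const terms = Array.from(
    new Set([...SKILL_DICTIONARY, ...skillKeywords.map((k) => k.toLowerCase())]),
  );
  const found = terms.filter((term) =>
    // Non-alphanumeric boundaries stop "java" from matching inside "javascript".
    new RegExp(`(^|[^a-z0-9])${escapeRegExp(term)}($|[^a-z0-9])`, "i").test(text),
  );
  return { skills_detected: found.join(", "), skill_count: found.length };
}

const text = "Building with React, TypeScript and Node.js on AWS. JavaScript since 2010.";
console.log(detectSkills(text, ["Rust"]));
// skill_count 5: "javascript, typescript, react, node.js, aws"
```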

❓ FAQ

Do I need any account, cookie, or API key? No. The actor only fetches public URLs over HTTP. No usernames, passwords, cookies, authorization headers, session tokens, or paid people-data vendor keys are accepted.

Which URLs work best? Public personal sites / "about" pages, public portfolios, public GitHub profiles, and directly public PDF/text resumes. Private resume databases and logged-in LinkedIn/Indeed pages are out of scope.

Why are some fields empty? Fields populate only when the value is visibly present on the page or in the PDF text. Missing scalars are null, missing lists are "", missing counts/scores are 0, and missing booleans are false.

How is profile_completeness_score computed? A transparent rule-based sum (see above) — no AI. Use it with minCompletenessScore to keep only richer profiles.

Can I export to CSV? Yes — every field is flat (no nested objects). Use Apify's CSV / Excel export, or the dataset API with format=csv.
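Because every row is flat, each field maps to exactly one CSV column with no flattening step. Apify's built-in export already handles this. The sketch below only shows why flat rows convert directly. The `toCsv` helper and its RFC 4180-style quoting are assumptions, not the export's implementation.

```typescript
type Cell = string | number | boolean | null;

// Quote a value only when it contains a comma, quote, or line break,
// doubling any embedded quotes; null becomes an empty cell.
function csvCell(v: Cell): string {
  if (v === null) return "";
  const s = String(v);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// One header line, then one line per row, in the given column order.
function toCsv(rows: Record<string, Cell>[], columns: string[]): string {
  const lines = [columns.join(",")];
  for (const row of rows) lines.push(columns.map((c) => csvCell(row[c] ?? null)).join(","));
  return lines.join("\r\n");
}

const csv = toCsv(
  [{ candidate_name: "Addy Osmani", skills_detected: "javascript, html, css", skill_count: 3, email: null }],
  ["candidate_name", "skills_detected", "skill_count", "email"],
);
console.log(csv);
```

Comma-joined list fields such as skills_detected end up quoted, so they stay in a single column.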

🛠️ Technical notes

Stack: Node.js 22 · Apify SDK 3 · Crawlee HttpCrawler · Cheerio (HTML) · unpdf (public PDF text). No browser.
Concurrency: min=1, max=10 (conservative; tune after real runs).
Memory: 1 GB min · 2 GB default · 4 GB max.
Proxy: Apify Proxy enabled by default; custom proxy URLs accepted; Apify Residential rejected at startup.
Reliability: session rotation, realistic headers, and retry/backoff on transient 429/5xx. Auth walls and 401/403 are skipped without retry.
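The retry rules in the reliability note can be pictured as a small status classifier. Crawlee's crawlers ship their own retry handling (for example the `maxRequestRetries` option), so treat this as an illustration of the policy rather than the actor's code. The following are all assumptions:

- the attempt limit
- the base delay
- the jitter
- treating other statuses such as 404 as failures

```typescript
type Decision =
  | { action: "ok" }
  | { action: "skip" } // auth wall: counted as skipped, never retried
  | { action: "retry"; delayMs: number }
  | { action: "fail" };

// Hypothetical classifier for one HTTP response; `attempt` starts at 0.
function classifyResponse(status: number, attempt: number, maxRetries = 3, baseDelayMs = 1_000): Decision {
  if (status >= 200 && status < 300) return { action: "ok" };
  // 401/403: skipped without retry.
  if (status === 401 || status === 403) return { action: "skip" };
  // 429/5xx: transient, so retry with exponential backoff plus random jitter.
  if (status === 429 || status >= 500) {
    if (attempt >= maxRetries) return { action: "fail" };
    const delayMs = baseDelayMs * 2 ** attempt + Math.floor(Math.random() * baseDelayMs);
    return { action: "retry", delayMs };
  }
  // Assumption: anything else (e.g. 404) is a failed input.
  return { action: "fail" };
}

for (const status of [200, 403, 429, 503, 404]) {
  console.log(status, classifyResponse(status, 0).action);
}
```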

HeadHunter Russia Resume CV Scraper 🇷🇺🔎 hh.ru - Cheap 🧑‍💼

scrapestorm/headhunter-russia-resume-cv-scraper-hh-ru---cheap

🔍 Easily Collect HeadHunter Resume 🇷🇺🧑‍💼 Extract resume search results from HeadHunter (hh.ru) for any search URL, including resume links, job titles, candidate age & more 📊 Perfect for recruiter sourcing, talent pipeline building, labor market research & resume visibility tracking 🚀✨

Storm_Scraper

Indeed Resume Scraper

lexis-solutions/resume-indeed-com-scraper

Indeed Resume scraper for recruiters: scrape Indeed resumes, extract candidate name, location, title, skills, experience and education, build hiring lists, enrich ATS/CRM, and automate candidate sourcing from resumes.indeed.com

Lexis Solutions

Resume / CV Parser (Claude → Structured JSON)

gochujang/resume-parser

Pass a PDF resume URL (or text). Returns structured JSON: name, email, phone, location, current title, skills, education, experience (with highlights), languages, links. Powered by Claude with strict schema. BYO Anthropic API key. $0.02 per resume.

Hojun Lee

LinkedIn Public Profile Extractor - No Login, No Cookies

whoareyouanas/linkedin-profile-actor

Extract publicly visible LinkedIn profile details from profile URLs or usernames using a lightweight HTTP-first actor.

Anas Nadeem

Hh Resume Search Scraper

soft_alexist/hh-resume-search-scraper

Scrape resume profiles from hh.ru's search engine with precision. Collect candidate names, skills, salary expectations, job search status, and 20+ professional attributes in structured JSON — perfect for recruiters, HR teams, and talent acquisition platforms.

Soft Alexist

Ai Resume Scorer

vivid_astronaut/ai-resume-scorer

Fabio Suizu

LinkedIn Candidate Finder - Recruiter Sourcing, No Cookies

thirdwatch/linkedin-candidate-finder-scraper

Find LinkedIn profiles that match recruiter requirements (role, skills, location, experience, companies). Returns candidate name, headline and profile URL.

Thirdwatch

LinkedIn Profile Scraper - No Login, No Cookies

logiover/linkedin-profile-scraper

No-login LinkedIn profile scraper & API alternative. Export public people data to CSV/JSON - bulk profile data extraction, no cookies.

Logiover

Job Description Keyword Analyzer – Match Resume Score

scrapepilot/job-description-keyword-analyzer-match-resume-score

Paste your resume, add job URLs or text – get match score, top required keywords, matched & missing keywords. Tailor your CV for each application. Bulk processing, checkpoint/resume. $7.99/month unlimited.

Scrape Pilot

instagram profile scraper pro

qaseemiqbal/instagram-profile-scraper-pro

Extract clean public Instagram profile data from usernames, profile URLs, or profile IDs. Get followers, bio, links, public contact clues, business signals, and profile status in a ready-to-download dataset.

Muhammad Qaseem Iqbal