Advanced Linkedin Jobs Scraper With Ai

Pricing

$20.00/month + usage

Rating

0.0

(0)

Developer

charith wijesundara

Maintained by Community

Actor stats

0

Bookmarked

2

Total users

1

Monthly active users

6 days ago

Last modified

Advanced LinkedIn Jobs Scraper with Agentic AI

An intelligent, high-performance LinkedIn job scraper powered by a LangGraph multi-agent system, LlamaIndex for semantic search, Crawlee + Playwright for robust web scraping, and multi-LLM support including OpenAI (GPT-4o) and Google Gemini. Beyond basic job scraping, this Actor provides AI-powered job matching, market insights, and personalized recommendations.

Note: Versatile use cases. Designed for both professional work (HR, recruitment, market research) and personal use (job hunting, career planning). Whether you're building a talent pipeline or looking for your next dream role, this tool has you covered.

🌟 Features

🤖 Agentic AI System

  • Multi-LLM Support: Built-in support for OpenAI and Google Gemini models
  • Multi-agent orchestration using LangGraph
  • Intelligent workflow that scrapes, indexes, analyzes, and matches jobs
  • Semantic understanding of job requirements and user profiles

🎯 Smart Job Matching

  • Profile-based matching with relevance scoring (0-1)
  • Skill analysis with percentage match calculations
  • Location, salary, and experience matching
  • Explainable AI - get reasons why each job matches your profile

📊 Market Insights

  • Top skills in demand with frequency analysis
  • Salary trends and averages
  • Location distribution across jobs
  • Experience level and employment type breakdowns
  • Remote work availability statistics

🛡️ Robust Scraping

  • Anti-bot protection with residential proxies
  • Intelligent pagination and dynamic content handling
  • Error recovery with retry logic
  • Rate limiting to avoid blocks
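
The retry and rate-limiting behavior listed above can be illustrated with a small helper. This is a minimal sketch of exponential backoff with jitter, not the Actor's actual code; the function name `retry_with_backoff` and its default values are assumptions chosen for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until it succeeds, waiting longer after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff (1s, 2s, 4s, ...) plus up to 25% random jitter
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.25))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the delays can be observed in tests without actually waiting.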

🚀 Quick Start

Basic Usage

{
  "jobTitle": "Python Developer",
  "locations": ["New York, NY", "Remote"],
  "maxJobs": 50,
  "userSkills": ["Python", "FastAPI", "PostgreSQL", "Docker"],
  "userExperience": 3,
  "enableJobMatching": true,
  "enableMarketAnalysis": true
}

Advanced Configuration (Google Gemini)

{
  "jobTitle": "Machine Learning Engineer",
  "locations": ["San Francisco, CA", "Remote"],
  "maxJobs": 100,
  "datePosted": "week",
  "experienceLevel": ["Mid-Senior level", "Director"],
  "remoteFilter": "Remote",
  "userSkills": ["Python", "TensorFlow", "PyTorch", "MLOps"],
  "userExperience": 5,
  "salaryMin": 150000,
  "salaryMax": 250000,
  "preferredJobTypes": ["Full-time"],
  "mustHaveSkills": ["Python", "Machine Learning"],
  "remotePreference": "Remote",
  "llmProvider": "google",
  "modelName": "gemini-1.5-pro",
  "googleApiKey": "YOUR_GOOGLE_API_KEY",
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
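
Either input can also be submitted from code. The sketch below assumes the `apify-client` Python package is installed and uses the Basic Usage input; `ACTOR_ID` is a hypothetical placeholder, since this page does not state the Actor's ID, and `fetch_matches` is an illustrative name.

```python
from typing import Any

# Hypothetical placeholder: replace with the Actor's real ID from its Apify Store page.
ACTOR_ID = "<username>/<actor-name>"

RUN_INPUT: dict[str, Any] = {
    "jobTitle": "Python Developer",
    "locations": ["New York, NY", "Remote"],
    "maxJobs": 50,
    "userSkills": ["Python", "FastAPI", "PostgreSQL", "Docker"],
    "userExperience": 3,
    "enableJobMatching": True,
    "enableMarketAnalysis": True,
}


def fetch_matches(token: str, actor_id: str = ACTOR_ID) -> list[dict[str, Any]]:
    """Run the Actor, wait for it to finish, and return the dataset items."""
    # Imported here so this module loads even when apify-client is not installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=RUN_INPUT)
    if run is None:
        raise RuntimeError("Actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Calling `fetch_matches("<your Apify API token>")` starts a billable run, so it is worth testing with a small `maxJobs` first.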

📥 Input Parameters

Job Search Criteria

| Parameter | Type | Description | Required |
| --- | --- | --- | --- |
| jobTitle | string | Job title or keywords to search | ✅ |
| locations | array | Job locations (e.g., "New York, NY", "Remote") | ❌ |
| maxJobs | integer | Maximum jobs to scrape (1-500) | ❌ (default: 100) |
| datePosted | enum | Filter by date: any, day, week, month | ❌ (default: week) |
| experienceLevel | array | Filter by seniority level | ❌ |
| remoteFilter | enum | Remote, On-site, Hybrid, or empty | ❌ |
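
The constraints in this table can be checked before a run is started. The following is a sketch under the assumption that the table is complete and accurate; `normalize_search_input` is a hypothetical helper, not part of the Actor.

```python
from typing import Any

DATE_POSTED = {"any", "day", "week", "month"}
REMOTE_FILTER = {"Remote", "On-site", "Hybrid", ""}


def normalize_search_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Apply the documented defaults and reject out-of-range values."""
    job_title = str(raw.get("jobTitle", "")).strip()
    if not job_title:
        raise ValueError("jobTitle is required")

    max_jobs = raw.get("maxJobs", 100)  # documented default
    is_int = isinstance(max_jobs, int) and not isinstance(max_jobs, bool)
    if not is_int or not 1 <= max_jobs <= 500:
        raise ValueError("maxJobs must be an integer between 1 and 500")

    date_posted = raw.get("datePosted", "week")  # documented default
    if date_posted not in DATE_POSTED:
        raise ValueError(f"datePosted must be one of {sorted(DATE_POSTED)}")

    remote_filter = raw.get("remoteFilter", "")
    if remote_filter not in REMOTE_FILTER:
        raise ValueError("remoteFilter must be Remote, On-site, Hybrid, or empty")

    return {
        "jobTitle": job_title,
        "locations": list(raw.get("locations", [])),
        "maxJobs": max_jobs,
        "datePosted": date_posted,
        "experienceLevel": list(raw.get("experienceLevel", [])),
        "remoteFilter": remote_filter,
    }
```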

User Profile (for Matching)

| Parameter | Type | Description |
| --- | --- | --- |
| userSkills | array | Your technical skills |
| userExperience | integer | Years of experience |
| preferredLocations | array | Preferred job locations |
| salaryMin | integer | Minimum desired salary (USD yearly) |
| salaryMax | integer | Maximum desired salary (USD yearly) |
| preferredJobTypes | array | Preferred employment types |
| mustHaveSkills | array | Non-negotiable required skills |
| remotePreference | enum | Remote work preference |

Configuration

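| Parameter | Type | Description |
| --- | --- | --- |
| enableJobMatching | boolean | Enable AI job matching (default: true) |
| enableMarketAnalysis | boolean | Generate market insights (default: true) |
| llmProvider | string | LLM provider; the advanced example above sets it to google to use Gemini |
| modelName | enum | Model to use. OpenAI: gpt-4o, gpt-4o-mini, o1, o3-mini. The advanced example above uses gemini-1.5-pro with Google Gemini |
| googleApiKey | string | Google API key, as passed in the advanced example above |
| proxyConfiguration | object | Proxy settings (recommended: residential) |
| debug | boolean | Enable debug logging |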

📤 Output

Dataset: Job Matches

Each job includes:

  • Basic Info: Title, company, location, URL
  • Match Score: Relevance score (0-1) and reasons
  • Details: Skills, salary, employment type, seniority level
  • Metadata: Posted date, number of applicants, remote friendliness

Example:

{
  "title": "Senior Python Developer",
  "company": "TechCorp Inc.",
  "location": "New York, NY",
  "relevance_score": 0.87,
  "match_reasons": ["Strong skill match", "Preferred location", "Salary matches expectations"],
  "skills_required": ["Python", "FastAPI", "PostgreSQL", "Docker", "AWS"],
  "salary_range": {
    "min": 120000,
    "max": 160000,
    "currency": "USD",
    "period": "yearly"
  },
  "employment_type": "Full-time",
  "seniority_level": "Mid-Senior level",
  "posted_date": "2 days ago",
  "num_applicants": 47,
  "url": "https://www.linkedin.com/jobs/view/123456789"
}
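
Once the dataset items are downloaded, a shortlist is easy to build from `relevance_score`. This is a minimal sketch assuming items shaped like the example above; the helper name `top_matches` and the 0.7 threshold are illustrative choices, not values the Actor defines.

```python
from typing import Any


def top_matches(
    jobs: list[dict[str, Any]],
    min_score: float = 0.7,
    limit: int = 10,
) -> list[dict[str, Any]]:
    """Keep jobs at or above min_score, best match first."""
    # Items without a relevance_score are treated as 0.0 and filtered out.
    scored = [job for job in jobs if job.get("relevance_score", 0.0) >= min_score]
    scored.sort(key=lambda job: job["relevance_score"], reverse=True)
    return scored[:limit]


if __name__ == "__main__":
    sample = [
        {"title": "Senior Python Developer", "relevance_score": 0.87},
        {"title": "Java Developer", "relevance_score": 0.41},
        {"title": "Backend Engineer", "relevance_score": 0.79},
    ]
    for job in top_matches(sample):
        print(f'{job["relevance_score"]:.2f}  {job["title"]}')
```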

Key-Value Store: Market Insights

{
  "total_jobs_analyzed": 100,
  "top_skills": [
    ["Python", 78],
    ["AWS", 56],
    ["Docker", 45]
  ],
  "avg_salary_range": {
    "min": 125000,
    "max": 175000,
    "currency": "USD"
  },
  "remote_jobs_percentage": 65.5,
  "avg_applicants": 42.3
}

🏗️ Architecture

Technology Stack

  • Agent Framework: LangGraph for multi-agent coordination
  • Semantic Search: LlamaIndex with OpenAI embeddings
  • Web Scraping: Crawlee + Playwright for robust scraping
  • LLM: OpenAI GPT-4o/GPT-4o-mini or Google Gemini
  • Platform: Apify Actors (serverless)

Workflow

1. User Input → Parse criteria and profile
2. Scraper Agent → Scrape LinkedIn jobs with Crawlee
3. Indexer → Create vector index with LlamaIndex
4. Analysis Agent → Generate market insights
5. Matching Agent → Match jobs to profile with AI
6. Output → Ranked jobs + insights
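
The data flow of this workflow can be pictured as stages that each read and extend a shared state. The sketch below uses plain Python rather than LangGraph, and every stage body is a stand-in stub (hypothetical names and data), so it shows only the ordering of the steps, not the Actor's implementation.

```python
from typing import Any, Callable

State = dict[str, Any]
Stage = Callable[[State], State]


def scrape(state: State) -> State:
    # Stub: the real step scrapes LinkedIn with Crawlee + Playwright.
    jobs = [
        {"title": "Python Developer", "skills": ["Python", "Docker"]},
        {"title": "Java Developer", "skills": ["Java"]},
    ]
    return {**state, "jobs": jobs}


def index(state: State) -> State:
    # Stub: a dict keyed by title stands in for the LlamaIndex vector index.
    return {**state, "index": {job["title"]: job for job in state["jobs"]}}


def analyze(state: State) -> State:
    return {**state, "insights": {"total_jobs_analyzed": len(state["jobs"])}}


def match(state: State) -> State:
    # Stub: keep jobs sharing at least one skill with the user profile.
    wanted = set(state["input"].get("userSkills", []))
    matches = [job for job in state["jobs"] if wanted & set(job["skills"])]
    return {**state, "matches": matches}


PIPELINE: list[Stage] = [scrape, index, analyze, match]


def run_pipeline(user_input: dict[str, Any]) -> State:
    """Run the stages in workflow order, threading the state through each."""
    state: State = {"input": user_input}
    for stage in PIPELINE:
        state = stage(state)
    return state
```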

⚙️ How It Works

1. Job Scraping

  • Crawlee navigates LinkedIn with Playwright
  • Handles pagination and dynamic content
  • Extracts comprehensive job details
  • Uses residential proxies to avoid blocking

2. Intelligent Indexing

  • LlamaIndex creates semantic embeddings
  • Jobs indexed for fast similarity search
  • Enables natural language queries
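
To make the idea of similarity search concrete, here is a toy version. It replaces learned embeddings with simple word counts and ranks job descriptions by cosine similarity to a query; the real Actor is described as using LlamaIndex with OpenAI embeddings, so treat this only as an illustration of the ranking step. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter[str]:
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9+#]+", text.lower()))


def cosine(a: Counter[str], b: Counter[str]) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, jobs: list[str], top_k: int = 2) -> list[tuple[float, str]]:
    """Return the top_k job descriptions most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(((cosine(query_vec, embed(job)), job) for job in jobs), reverse=True)
    return ranked[:top_k]
```

Unlike real embeddings, word counts only match exact tokens, so "ML" and "machine learning" would not be considered similar here.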

3. Profile Matching

  • Multi-factor scoring algorithm:
    • Skill match (40% weight)
    • Location match (20% weight)
    • Experience match (15% weight)
    • Salary match (15% weight)
    • Employment type (10% weight)
  • Explainable results with match reasons
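
The weights above combine into a single score as a weighted sum. The sketch below assumes each factor has already been reduced to a value between 0 and 1; how the Actor computes the individual factors is not documented here, so `skill_match` is just one plausible definition and both function names are hypothetical.

```python
from typing import Iterable

# Weights taken from the list above; they sum to 1.0.
WEIGHTS = {
    "skills": 0.40,
    "location": 0.20,
    "experience": 0.15,
    "salary": 0.15,
    "employment_type": 0.10,
}


def skill_match(user_skills: Iterable[str], required: Iterable[str]) -> float:
    """Fraction of required skills the user has (case-insensitive)."""
    required_set = {skill.lower() for skill in required}
    if not required_set:
        return 1.0  # assumption: a job with no listed skills counts as a full match
    have = {skill.lower() for skill in user_skills}
    return len(required_set & have) / len(required_set)


def relevance_score(factors: dict[str, float]) -> float:
    """Weighted sum of per-factor scores; missing factors count as 0."""
    total = sum(weight * factors.get(name, 0.0) for name, weight in WEIGHTS.items())
    return round(total, 2)


if __name__ == "__main__":
    skills = skill_match(
        ["Python", "FastAPI", "PostgreSQL", "Docker"],
        ["Python", "FastAPI", "PostgreSQL", "Docker", "AWS"],
    )
    # 4 of 5 required skills match; every other factor is a perfect match.
    print(relevance_score({
        "skills": skills,
        "location": 1.0,
        "experience": 1.0,
        "salary": 1.0,
        "employment_type": 1.0,
    }))  # prints 0.92
```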

4. Market Analysis

  • Aggregates data across all scraped jobs
  • Identifies trending skills and technologies
  • Calculates salary benchmarks
  • Analyzes remote work availability
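
The aggregation can be sketched with the standard library. The example assumes job records shaped like the dataset output, plus a hypothetical boolean `is_remote` field (the output lists "remote friendliness" but does not name the field); `market_insights` is an illustrative helper, not the Actor's code.

```python
from collections import Counter
from typing import Any


def market_insights(jobs: list[dict[str, Any]], top_n: int = 3) -> dict[str, Any]:
    """Aggregate skill frequency, salary averages, and remote share."""
    total = len(jobs)
    skill_counts = Counter(
        skill for job in jobs for skill in job.get("skills_required", [])
    )
    # Only jobs that disclose a salary contribute to the averages.
    salaries = [job["salary_range"] for job in jobs if job.get("salary_range")]
    remote = sum(1 for job in jobs if job.get("is_remote"))  # hypothetical field

    insights: dict[str, Any] = {
        "total_jobs_analyzed": total,
        "top_skills": [[skill, count] for skill, count in skill_counts.most_common(top_n)],
        "remote_jobs_percentage": round(100 * remote / total, 1) if total else 0.0,
    }
    if salaries:
        insights["avg_salary_range"] = {
            "min": round(sum(s["min"] for s in salaries) / len(salaries)),
            "max": round(sum(s["max"] for s in salaries) / len(salaries)),
        }
    return insights
```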

🛠️ Local Development

Prerequisites

  • Python 3.14+
  • Apify CLI: npm install -g apify-cli
  • OpenAI API key

Setup

# Clone or navigate to directory
cd langraph-linkedin-jobs-scraper
# Set environment variables
export OPENAI_API_KEY="your-openai-api-key"
export APIFY_PROXY_PASSWORD="your-proxy-password" # Optional
# Install dependencies
pip install -r requirements.txt
# Install Playwright browsers
playwright install chromium
# Run locally
apify run

Input Format

Create storage/key_value_stores/default/INPUT.json:

{
  "jobTitle": "Data Scientist",
  "locations": ["Remote"],
  "maxJobs": 20,
  "userSkills": ["Python", "Machine Learning"],
  "debug": true
}
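
If you switch between several test inputs, generating the file from a script can be more convenient than editing it by hand. This is a small sketch that writes to the path given above, relative to the project directory; `write_local_input` is a hypothetical helper name.

```python
import json
from pathlib import Path
from typing import Any

# Path from the "Input Format" section, relative to the project root.
INPUT_PATH = Path("storage/key_value_stores/default/INPUT.json")


def write_local_input(run_input: dict[str, Any], path: Path = INPUT_PATH) -> Path:
    """Write run_input as pretty-printed JSON, creating parent folders if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(run_input, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    write_local_input({
        "jobTitle": "Data Scientist",
        "locations": ["Remote"],
        "maxJobs": 20,
        "userSkills": ["Python", "Machine Learning"],
        "debug": True,
    })
```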

🚨 Important Notes

LinkedIn Blocking

LinkedIn actively blocks automated scrapers. To minimize blocking:

  • ✅ Use residential proxies (configured by default)
  • ✅ Keep maxJobs reasonable (<200)
  • ✅ Don't run too frequently
  • ✅ Respect rate limits

Privacy & Ethics

  • ❌ Don't scrape personal data without permission
  • ❌ Don't use for spam or unauthorized recruiting
  • ✅ Respect LinkedIn's Terms of Service
  • ✅ Use for personal job searching and market research

API Costs

  • OpenAI API usage for embeddings and LLM
  • Apify platform usage and proxy costs
  • Expect ~$0.10-0.50 per run depending on maxJobs

🐛 Troubleshooting

"No jobs found"

  • Check if LinkedIn is blocking your IP
  • Try enabling residential proxies
  • Verify job title and location are correct

"LinkedIn blocking/captcha"

  • Use residential proxies (set in proxyConfiguration)
  • Reduce maxJobs parameter
  • Increase delays (modify scraper.py)

"OpenAI API errors"

  • Verify OPENAI_API_KEY is set correctly
  • Check API quota and billing
  • Try switching to gpt-4o-mini for lower cost

"Scraping timeout"

  • Reduce maxJobs parameter
  • Check internet connection
  • LinkedIn might be experiencing issues

🀝 Support

For issues, questions, or feedback:

  • Open an issue on Apify
  • Contact via Apify platform
  • Check Apify documentation

Made with ❀️ using LangGraph, LlamaIndex, and Crawlee