Advanced LinkedIn Jobs Scraper with AI

Pricing: $20.00/month + usage
Developer: charith wijesundara
Last modified: 6 days ago
# Advanced LinkedIn Jobs Scraper with Agentic AI

An intelligent, high-performance LinkedIn job scraper powered by a LangGraph multi-agent system, LlamaIndex for semantic search, Crawlee + Playwright for robust web scraping, and multi-LLM support including OpenAI (GPT-4o) and Google Gemini. This Actor goes beyond basic job scraping to provide AI-powered job matching, market insights, and personalized recommendations.
> [!NOTE]
> Versatile Use Cases: Designed for both professional (HR, recruitment, market research) and personal (job hunting, career planning) use. Whether you're building a talent pipeline or looking for your next dream role, this tool has you covered.
## Features

### Agentic AI System
- Multi-LLM Support: Built-in support for OpenAI and Google Gemini models
- Multi-agent orchestration using LangGraph
- Intelligent workflow that scrapes, indexes, analyzes, and matches jobs
- Semantic understanding of job requirements and user profiles
### Smart Job Matching
- Profile-based matching with relevance scoring (0-1)
- Skill analysis with percentage match calculations
- Location, salary, and experience matching
- Explainable AI - get reasons why each job matches your profile
### Market Insights
- Top skills in demand with frequency analysis
- Salary trends and averages
- Location distribution across jobs
- Experience level and employment type breakdowns
- Remote work availability statistics
### Robust Scraping
- Anti-bot protection with residential proxies
- Intelligent pagination and dynamic content handling
- Error recovery with retry logic
- Rate limiting to avoid blocks
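The retry-with-backoff pattern mentioned above can be sketched in a few lines. This is an illustrative helper, not the Actor's actual code; `fetch_with_retry` and its parameters are hypothetical names. It retries a flaky request with exponential backoff plus jitter, which also doubles as crude rate limiting.

```python
import random
import time


def fetch_with_retry(fetch, url, max_retries=3, base_delay=2.0):
    """Retry a flaky fetch with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            # Backoff grows 2s, 4s, 8s, ...; jitter avoids synchronized retries
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
```

In a real Crawlee setup much of this is handled by the framework's built-in retry and session management; the sketch only shows the underlying idea.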
## Quick Start

### Basic Usage
```json
{
  "jobTitle": "Python Developer",
  "locations": ["New York, NY", "Remote"],
  "maxJobs": 50,
  "userSkills": ["Python", "FastAPI", "PostgreSQL", "Docker"],
  "userExperience": 3,
  "enableJobMatching": true,
  "enableMarketAnalysis": true
}
```
### Advanced Configuration (Google Gemini)
```json
{
  "jobTitle": "Machine Learning Engineer",
  "locations": ["San Francisco, CA", "Remote"],
  "maxJobs": 100,
  "datePosted": "week",
  "experienceLevel": ["Mid-Senior level", "Director"],
  "remoteFilter": "Remote",
  "userSkills": ["Python", "TensorFlow", "PyTorch", "MLOps"],
  "userExperience": 5,
  "salaryMin": 150000,
  "salaryMax": 250000,
  "preferredJobTypes": ["Full-time"],
  "mustHaveSkills": ["Python", "Machine Learning"],
  "remotePreference": "Remote",
  "llmProvider": "google",
  "modelName": "gemini-1.5-pro",
  "googleApiKey": "YOUR_GOOGLE_API_KEY",
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
```
## Input Parameters

### Job Search Criteria
| Parameter | Type | Description | Required |
|---|---|---|---|
| `jobTitle` | string | Job title or keywords to search | Yes |
| `locations` | array | Job locations (e.g., "New York, NY", "Remote") | Yes |
| `maxJobs` | integer | Maximum jobs to scrape (1-500) | No (default: 100) |
| `datePosted` | enum | Filter by date: `any`, `day`, `week`, `month` | No (default: `week`) |
| `experienceLevel` | array | Filter by seniority level | No |
| `remoteFilter` | enum | `Remote`, `On-site`, `Hybrid`, or empty | No |
### User Profile (for Matching)

| Parameter | Type | Description |
|---|---|---|
| `userSkills` | array | Your technical skills |
| `userExperience` | integer | Years of experience |
| `preferredLocations` | array | Preferred job locations |
| `salaryMin` | integer | Minimum desired salary (USD, yearly) |
| `salaryMax` | integer | Maximum desired salary (USD, yearly) |
| `preferredJobTypes` | array | Preferred employment types |
| `mustHaveSkills` | array | Non-negotiable required skills |
| `remotePreference` | enum | Remote work preference |
### Configuration

| Parameter | Type | Description |
|---|---|---|
| `enableJobMatching` | boolean | Enable AI job matching (default: true) |
| `enableMarketAnalysis` | boolean | Generate market insights (default: true) |
| `llmProvider` | enum | LLM provider, `openai` or `google` (see the Gemini example above) |
| `modelName` | enum | Model to use, e.g. `gpt-4o`, `gpt-4o-mini`, `o1`, `o3-mini`, or `gemini-1.5-pro` |
| `googleApiKey` | string | Google API key, required when `llmProvider` is `google` |
| `proxyConfiguration` | object | Proxy settings (residential recommended) |
| `debug` | boolean | Enable debug logging |
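To make the constraints in the tables above concrete, here is an illustrative input check in plain Python. `validate_input` is a hypothetical helper, not part of the Actor; the Apify platform normally enforces these rules via the input schema.

```python
def validate_input(actor_input: dict) -> list[str]:
    """Collect human-readable validation errors for a run input (illustrative)."""
    errors = []
    if not actor_input.get("jobTitle"):
        errors.append("jobTitle is required")
    if not actor_input.get("locations"):
        errors.append("locations is required")
    # maxJobs defaults to 100 and must stay within 1-500
    max_jobs = actor_input.get("maxJobs", 100)
    if not 1 <= max_jobs <= 500:
        errors.append("maxJobs must be between 1 and 500")
    # datePosted defaults to "week" and accepts a small enum
    if actor_input.get("datePosted", "week") not in {"any", "day", "week", "month"}:
        errors.append("datePosted must be one of: any, day, week, month")
    return errors
```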
## Output

### Dataset: Job Matches
Each job includes:
- Basic Info: Title, company, location, URL
- Match Score: Relevance score (0-1) and reasons
- Details: Skills, salary, employment type, seniority level
- Metadata: Posted date, number of applicants, remote friendliness
Example:

```json
{
  "title": "Senior Python Developer",
  "company": "TechCorp Inc.",
  "location": "New York, NY",
  "relevance_score": 0.87,
  "match_reasons": ["Strong skill match", "Preferred location", "Salary matches expectations"],
  "skills_required": ["Python", "FastAPI", "PostgreSQL", "Docker", "AWS"],
  "salary_range": {"min": 120000, "max": 160000, "currency": "USD", "period": "yearly"},
  "employment_type": "Full-time",
  "seniority_level": "Mid-Senior level",
  "posted_date": "2 days ago",
  "num_applicants": 47,
  "url": "https://www.linkedin.com/jobs/view/123456789"
}
```
### Key-Value Store: Market Insights

```json
{
  "total_jobs_analyzed": 100,
  "top_skills": [["Python", 78], ["AWS", 56], ["Docker", 45]],
  "avg_salary_range": {"min": 125000, "max": 175000, "currency": "USD"},
  "remote_jobs_percentage": 65.5,
  "avg_applicants": 42.3
}
```
## Architecture

### Technology Stack
- Agent Framework: LangGraph for multi-agent coordination
- Semantic Search: LlamaIndex with OpenAI embeddings
- Web Scraping: Crawlee + Playwright for robust scraping
- LLM: OpenAI GPT-4o / GPT-4o-mini or Google Gemini
- Platform: Apify Actors (serverless)
### Workflow

1. User Input → Parse criteria and profile
2. Scraper Agent → Scrape LinkedIn jobs with Crawlee
3. Indexer → Create vector index with LlamaIndex
4. Analysis Agent → Generate market insights
5. Matching Agent → Match jobs to profile with AI
6. Output → Ranked jobs + insights
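The data flow between these stages can be sketched as a plain function pipeline. The real Actor wires its agents together with a LangGraph state graph; this hedged sketch only shows how the outputs of one stage feed the next, with every callable a stand-in.

```python
def run_pipeline(user_input, scrape, index, analyze, match):
    """Chain the workflow stages; each callable stands in for one agent node."""
    jobs = scrape(user_input)               # Scraper Agent: raw job records
    job_index = index(jobs)                 # Indexer: vector index over jobs
    insights = analyze(jobs)                # Analysis Agent: market insights
    ranked = match(job_index, user_input)   # Matching Agent: ranked matches
    return {"jobs": ranked, "insights": insights}
```

In LangGraph terms, each callable would become a node and the sequential calls would become edges between them.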
## How It Works

### 1. Job Scraping
- Crawlee navigates LinkedIn with Playwright
- Handles pagination and dynamic content
- Extracts comprehensive job details
- Uses residential proxies to avoid blocking
### 2. Intelligent Indexing
- LlamaIndex creates semantic embeddings
- Jobs indexed for fast similarity search
- Enables natural language queries
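The similarity search underneath a vector index reduces to cosine similarity between embeddings. The Actor uses LlamaIndex with OpenAI embeddings; the toy 3-dimensional vectors below are made up purely to illustrate the mechanic.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings" standing in for real 1536-dimensional OpenAI vectors
job_vectors = {
    "python backend": [0.9, 0.1, 0.0],
    "frontend react": [0.1, 0.9, 0.0],
}
query = [0.8, 0.2, 0.1]  # embedding of e.g. "Python API developer"
best = max(job_vectors, key=lambda title: cosine(query, job_vectors[title]))
# best → "python backend": the closest job in embedding space
```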
### 3. Profile Matching

- Multi-factor scoring algorithm:
  - Skill match (40% weight)
  - Location match (20% weight)
  - Experience match (15% weight)
  - Salary match (15% weight)
  - Employment type (10% weight)
- Explainable results with match reasons
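The weighting above amounts to a weighted sum of per-factor scores. This is an illustrative sketch, not the Actor's exact code; it assumes each factor has already been scored in the 0-1 range by the matching agent.

```python
def match_score(skill, location, experience, salary, employment):
    """Combine per-factor scores (each in [0, 1]) using the documented weights."""
    weights = {
        "skill": 0.40,
        "location": 0.20,
        "experience": 0.15,
        "salary": 0.15,
        "employment": 0.10,
    }
    factors = {
        "skill": skill,
        "location": location,
        "experience": experience,
        "salary": salary,
        "employment": employment,
    }
    # Weighted sum, rounded to two decimals like the relevance_score in the output
    return round(sum(weights[k] * factors[k] for k in weights), 2)
```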
### 4. Market Analysis
- Aggregates data across all scraped jobs
- Identifies trending skills and technologies
- Calculates salary benchmarks
- Analyzes remote work availability
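The aggregation step can be sketched with the standard library. This is illustrative only: field names follow the example outputs above (`skills_required`, `salary_range`), while the boolean `remote` key is an assumption made for the sketch.

```python
from collections import Counter


def market_insights(jobs: list[dict]) -> dict:
    """Aggregate scraped jobs into market stats (illustrative sketch)."""
    # Frequency of each skill across all job postings
    skills = Counter(s for job in jobs for s in job.get("skills_required", []))
    remote = sum(1 for job in jobs if job.get("remote"))
    min_salaries = [j["salary_range"]["min"] for j in jobs if j.get("salary_range")]
    return {
        "total_jobs_analyzed": len(jobs),
        "top_skills": skills.most_common(3),
        "remote_jobs_percentage": round(100 * remote / len(jobs), 1),
        "avg_min_salary": sum(min_salaries) / len(min_salaries) if min_salaries else None,
    }
```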
## Local Development

### Prerequisites
- Python 3.14+
- Apify CLI: `npm install -g apify-cli`
- OpenAI API key
### Setup

```bash
# Clone or navigate to the project directory
cd langraph-linkedin-jobs-scraper

# Set environment variables
export OPENAI_API_KEY="your-openai-api-key"
export APIFY_PROXY_PASSWORD="your-proxy-password"  # Optional

# Install dependencies
pip install -r requirements.txt

# Install Playwright browsers
playwright install chromium

# Run locally
apify run
```
### Input Format

Create `storage/key_value_stores/default/INPUT.json`:

```json
{
  "jobTitle": "Data Scientist",
  "locations": ["Remote"],
  "maxJobs": 20,
  "userSkills": ["Python", "Machine Learning"],
  "debug": true
}
```
## Important Notes

### LinkedIn Blocking
LinkedIn actively blocks automated scrapers. To minimize blocking:
- Use residential proxies (configured by default)
- Keep `maxJobs` reasonable (under 200)
- Don't run too frequently
- Respect rate limits
### Privacy & Ethics

- Do not scrape personal data without permission
- Respect LinkedIn's Terms of Service
- Don't use for spam or unauthorized recruiting
- Use for personal job searching and market research
### API Costs

- OpenAI API usage for embeddings and LLM calls
- Apify platform usage and proxy costs
- Expect roughly $0.10-0.50 per run, depending on `maxJobs`
## Troubleshooting

### "No jobs found"
- Check if LinkedIn is blocking your IP
- Try enabling residential proxies
- Verify job title and location are correct
### "LinkedIn blocking/captcha"

- Use residential proxies (set in `proxyConfiguration`)
- Reduce the `maxJobs` parameter
- Increase delays (modify `scraper.py`)
### "OpenAI API errors"

- Verify `OPENAI_API_KEY` is set correctly
- Check API quota and billing
- Try switching to `gpt-4o-mini` for lower cost
### "Scraping timeout"

- Reduce the `maxJobs` parameter
- Check your internet connection
- LinkedIn might be experiencing issues
## Support
For issues, questions, or feedback:
- Open an issue on Apify
- Contact via Apify platform
- Check Apify documentation
Made with ❤️ using LangGraph, LlamaIndex, and Crawlee