Cmg Linkedin Search Actor
Pricing: Pay per usage
Automates Search On LinkedIn
CMG LinkedIn Search Actor Documentation
Project Overview
The CMG LinkedIn Search Actor is a web scraping actor built on the Apify platform. It automates searching LinkedIn Sales Navigator and extracting profile data. It uses Playwright browser automation to navigate LinkedIn, run filtered searches, and extract detailed profile information.
Architecture
Core Components
- Main Application (src/main.py)
  - Entry point for the Apify Actor
  - Handles input configuration, proxy setup, and orchestration of the scraping process
  - Delegates the scraping work to the BrowserAgentLinkedIn service
- BrowserAgentLinkedIn
  - Manages browser sessions with proper configurations
  - Handles authentication with LinkedIn using stored cookies or credentials
  - Orchestrates the search and scraping workflow
  - Uses AI models (OpenAI and Google Gemini) for data processing
  - Interfaces with Firebase for data persistence
- LinkedinScraper
  - Handles the specific LinkedIn interactions
  - Performs login, navigation, and data extraction
  - Extracts profile URLs from search results
  - Manages cookie persistence
- ApifyActorManager
  - Coordinates parallel scraping of profiles using Apify actors
  - Manages concurrent scraping jobs
- FirebaseManager
  - Provides the data persistence layer
  - Stores and retrieves user credentials, cookies, and scraping results
  - Caches URLs and profiles to avoid duplicate work
  - Tracks job progress and status
- Additional Services
  - TelegramBotApiHandler: likely used for notifications
  - PlaywrightLinkedInScraper: alternative scraping implementation
Data Flow
- The actor receives input parameters including search queries, maximum profiles to scrape, and job ID
- It sets up a browser session with the appropriate configuration and a proxy
- It authenticates with LinkedIn using stored cookies, or logs in when no valid cookies are available
- It navigates to the search URL with filters and extracts profile URLs
- It coordinates concurrent scraping of detailed profile information
- Results are stored in Firebase and returned as the actor output
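The data flow above can be condensed into one coroutine. This is a minimal sketch under stated assumptions: run_search_job, extract_urls, and scrape_profile are hypothetical names rather than the actor's real functions, browser setup and authentication are assumed to have already run, and in-memory collections stand in for the Firebase URL cache and result store.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Set


async def run_search_job(
    search_url: str,
    max_profiles: int,
    extract_urls: Callable[[str], Awaitable[List[str]]],
    scrape_profile: Callable[[str], Awaitable[dict]],
    seen_urls: Set[str],
    results: Dict[str, dict],
) -> List[dict]:
    """Hypothetical orchestration of the search-and-scrape steps.

    seen_urls and results stand in for the Firebase URL cache and result store.
    """
    # Extract profile URLs from the filtered search, dropping in-page duplicates.
    found = list(dict.fromkeys(await extract_urls(search_url)))
    # Skip URLs handled by earlier jobs and cap the batch at max_profiles.
    batch = [url for url in found if url not in seen_urls][:max_profiles]
    # Scrape the detailed profiles concurrently.
    profiles = await asyncio.gather(*(scrape_profile(url) for url in batch))
    # Persist each result and mark its URL as processed.
    for url, profile in zip(batch, profiles):
        results[url] = profile
        seen_urls.add(url)
    return list(profiles)


# Usage with stub coroutines in place of the real browser services.
async def fake_extract(url: str) -> List[str]:
    base = "https://www.linkedin.com/in/"
    return [base + "a", base + "b", base + "b", base + "c"]


async def fake_scrape(url: str) -> dict:
    return {"url": url, "name": url.rsplit("/", 1)[-1]}


seen = {"https://www.linkedin.com/in/a"}
store: Dict[str, dict] = {}
out = asyncio.run(
    run_search_job("https://www.linkedin.com/sales/search/people", 25,
                   fake_extract, fake_scrape, seen, store)
)
print([p["name"] for p in out])  # ['b', 'c']
```

Passing the extraction and scraping steps in as callables keeps the ordering of the flow visible while leaving the browser-specific work to the services described under Architecture.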
Features
Authentication Management
- Securely stores and retrieves LinkedIn credentials from Firebase
- Maintains and updates cookies for persistent sessions
- Handles login flow when cookies are invalid or expired
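The cookie-or-login decision can be sketched as a local check on stored cookies. This assumes the cookies are kept in Playwright's cookie format and that "li_at" is LinkedIn's session cookie; both are assumptions, and the function names are hypothetical.

```python
import time
from typing import Dict, List, Optional

# Assumption: "li_at" is the cookie LinkedIn uses for an authenticated session.
SESSION_COOKIE = "li_at"


def has_valid_session(cookies: List[Dict], now: Optional[float] = None) -> bool:
    """Return True if an unexpired session cookie is present.

    Cookies use Playwright's format: "expires" is a Unix timestamp in
    seconds, and -1 marks a session cookie without an explicit expiry.
    """
    now = time.time() if now is None else now
    for cookie in cookies:
        if cookie.get("name") != SESSION_COOKIE:
            continue
        expires = cookie.get("expires", -1)
        if expires == -1 or expires > now:
            return True
    return False


def choose_auth_step(cookies: List[Dict], now: Optional[float] = None) -> str:
    """Reuse stored cookies when they look valid, otherwise fall back to login."""
    return "reuse_cookies" if has_valid_session(cookies, now) else "login"


stored = [{"name": "li_at", "value": "placeholder", "expires": 2_000_000_000}]
print(choose_auth_step(stored, now=1_700_000_000))  # reuse_cookies
print(choose_auth_step(stored, now=2_100_000_000))  # login
print(choose_auth_step([], now=1_700_000_000))      # login
```

A local check cannot detect a session that LinkedIn has revoked server-side, so a real implementation would likely also load a signed-in page and fall back to login if it is redirected.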
Search Capabilities
- Supports complex LinkedIn Sales Navigator search filters
- Handles location filtering
- Extracts profile URLs from search results pages
- Avoids duplicate scraping by tracking previously processed URLs
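Duplicate tracking only works if different spellings of the same profile URL compare equal. Below is a minimal sketch, assuming query strings, fragments, and trailing slashes are the only variations. Public /in/ URLs are used for illustration; Sales Navigator lead URLs may need different rules. The processed set stands in for the Firebase URL cache, and both function names are hypothetical.

```python
from typing import List, Set
from urllib.parse import urlsplit, urlunsplit


def normalize_profile_url(url: str) -> str:
    """Drop query string, fragment, and trailing slash so variants compare equal."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def filter_new_urls(found: List[str], processed: Set[str]) -> List[str]:
    """Return normalized URLs not yet processed, in first-seen order.

    processed is not modified here; the caller records URLs once they are scraped.
    """
    batch_seen: Set[str] = set()
    new_urls: List[str] = []
    for url in found:
        key = normalize_profile_url(url)
        if key in processed or key in batch_seen:
            continue
        batch_seen.add(key)
        new_urls.append(key)
    return new_urls


processed = {"https://www.linkedin.com/in/alice"}
found = [
    "https://www.linkedin.com/in/alice/?trk=search",
    "https://www.linkedin.com/in/bob",
    "https://www.linkedin.com/in/bob/#about",
]
print(filter_new_urls(found, processed))  # ['https://www.linkedin.com/in/bob']
```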
Browser Automation
- Uses Playwright for browser automation
- Configures browser with appropriate settings to avoid detection
- Loads a custom Chrome extension (extension.crx)
- Utilizes residential proxies for IP rotation
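The browser settings above could be assembled as keyword arguments for Playwright's launch_persistent_context(). This is a sketch, not the actor's code. build_launch_options is a hypothetical helper. It assumes extension.crx has been unpacked into a directory, because Chromium's --load-extension flag takes an unpacked extension. The proxy URL and its groups-RESIDENTIAL username are illustrative placeholders.

```python
from typing import Optional
from urllib.parse import urlsplit


def build_launch_options(extension_dir: str, proxy_url: Optional[str] = None) -> dict:
    """Build kwargs for Playwright's chromium.launch_persistent_context()."""
    options = {
        # Headed mode is the safest default when loading extensions.
        "headless": False,
        "args": [
            f"--disable-extensions-except={extension_dir}",
            f"--load-extension={extension_dir}",
        ],
    }
    if proxy_url:
        parts = urlsplit(proxy_url)
        host = parts.hostname or ""
        if parts.port:
            host = f"{host}:{parts.port}"
        # Playwright takes the proxy credentials separately from the server address.
        options["proxy"] = {
            "server": f"{parts.scheme}://{host}",
            "username": parts.username or "",
            "password": parts.password or "",
        }
    return options


opts = build_launch_options(
    "/tmp/extension",
    "http://groups-RESIDENTIAL:secret@proxy.apify.com:8000",
)
print(opts["proxy"]["server"])    # http://proxy.apify.com:8000
print(opts["proxy"]["username"])  # groups-RESIDENTIAL
```

In async Playwright code, the result would be passed as `await playwright.chromium.launch_persistent_context(user_data_dir, **opts)`.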
Data Processing
- Integrates AI models (OpenAI and Google Gemini) for data processing
- Structures extracted profile data into standardized formats
- Performs data validation and cleanup
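The validation and cleanup step might look like the following when an AI model returns profile data as JSON. This is a hypothetical sketch. The document does not specify the output schema, so REQUIRED_FIELDS and the field names are assumptions, and clean_profile is an illustrative name.

```python
import json

# Hypothetical schema: the fields every stored profile must have.
REQUIRED_FIELDS = ("name", "profile_url")


def clean_profile(raw: str) -> dict:
    """Parse model output as JSON, tidy strings, drop empty fields, check required keys."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    cleaned = {}
    for key, value in data.items():
        if isinstance(value, str):
            value = " ".join(value.split())  # trim and collapse internal whitespace
        if value in ("", None, [], {}):
            continue  # drop empty fields
        cleaned[key] = value
    missing = [field for field in REQUIRED_FIELDS if field not in cleaned]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return cleaned


raw = ('{"name": "  Jane   Doe ", "profile_url": "https://www.linkedin.com/in/jane", '
       '"headline": "", "skills": []}')
print(clean_profile(raw))
# {'name': 'Jane Doe', 'profile_url': 'https://www.linkedin.com/in/jane'}
```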
Scalability
- Scrapes profiles in parallel
- Limits the number of concurrent scraping jobs
- Caches results to avoid redundant work
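Bounded concurrency is commonly done with an asyncio.Semaphore. The sketch below assumes each profile is scraped by a local coroutine. The document says ApifyActorManager uses Apify actors for this, so the worker here is a stand-in, and gather_bounded is a hypothetical name.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def gather_bounded(
    urls: List[str],
    worker: Callable[[str], Awaitable[T]],
    max_concurrency: int = 5,
) -> List[T]:
    """Run worker(url) for every URL, with at most max_concurrency running at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(url: str) -> T:
        async with semaphore:
            return await worker(url)

    # gather() returns results in the same order as its inputs.
    return await asyncio.gather(*(run_one(u) for u in urls))


# Demo: record the peak number of simultaneously running workers.
active = 0
peak = 0


async def fake_scrape(url: str) -> str:
    global active, peak
    active += 1
    peak = max(peak, active)
    await asyncio.sleep(0.01)
    active -= 1
    return url.upper()


results = asyncio.run(gather_bounded([f"u{i}" for i in range(10)], fake_scrape, max_concurrency=3))
print(results[:3], peak)  # ['U0', 'U1', 'U2'] 3
```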
Error Handling
- Implements robust error detection and recovery
- Logs errors for debugging
- Saves error details to Firebase for monitoring
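One common recovery pattern is retrying with exponential backoff while recording each failure. The document does not describe the actor's actual retry policy. This sketch is a hypothetical helper in which an in-memory list stands in for the Firebase error records. It is synchronous for brevity; async code would await asyncio.sleep instead.

```python
import logging
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger("linkedin_actor")


def with_retries(
    task: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    error_log: Optional[List[dict]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call task(), retrying failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception as exc:
            logger.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if error_log is not None:
                # Stand-in for saving error details to Firebase.
                error_log.append({"attempt": attempt, "error": repr(exc)})
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")


calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("page did not load")
    return "ok"


errors: List[dict] = []
delays: List[float] = []
print(with_retries(flaky, error_log=errors, sleep=delays.append))  # ok
print(len(errors), delays)  # 2 [1.0, 2.0]
```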
Configuration
The actor supports several configuration options:
- Search Parameters
  - search_queries: LinkedIn search parameters
  - url_with_filters: Pre-configured LinkedIn search URL with filters
  - max_profiles: Maximum number of profiles to scrape (default: 25)
- Authentication
  - User credentials stored in Firebase
  - Session cookies for persistent login
- Proxy Configuration
  - Uses residential proxies through Apify Proxy
  - Configurable proxy settings
- AI Models
  - Configurable OpenAI and Google Gemini models for data processing
Technical Implementation
Dependencies
- Apify SDK for Python
- Playwright for browser automation
- Firebase for data persistence
- LangChain with OpenAI and Google Generative AI integrations
- Custom browser automation utilities
Data Storage
- Primary storage in Firebase
- Structured data export to Apify Dataset
- Local JSON file caching
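Local JSON caching can be made crash-safe by writing to a temporary file and renaming it into place. This is a sketch, assuming a dict keyed by profile URL. The file name profile_cache.json and both helper names are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def load_cache(path: Path) -> dict:
    """Return the cached data, or an empty dict if no cache file exists yet."""
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def save_cache(path: Path, data: dict) -> None:
    """Write to a temp file, then atomically replace, so a crash never leaves half-written JSON."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(data, fh, ensure_ascii=False, indent=2)
    os.replace(tmp, path)


with tempfile.TemporaryDirectory() as workdir:
    cache_file = Path(workdir) / "profile_cache.json"
    cache = load_cache(cache_file)
    cache["https://www.linkedin.com/in/jane"] = {"name": "Jane Doe"}
    save_cache(cache_file, cache)
    reloaded = load_cache(cache_file)
    print(reloaded)
```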
Security Features
- Encrypted credential storage
- Proxy usage for IP rotation
- Browser fingerprint randomization
- User agent rotation
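The document does not say how user agents or fingerprints are rotated. One simple approach is picking per-session values that are passed to Playwright's new_context() as user_agent and viewport. The strings and sizes below are illustrative only, and random_context_options is a hypothetical name.

```python
import random
from typing import List, Tuple

# Illustrative values only; a real pool would be larger and kept up to date.
USER_AGENTS: List[str] = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
]
VIEWPORTS: List[Tuple[int, int]] = [(1366, 768), (1440, 900), (1920, 1080)]


def random_context_options(rng: random.Random) -> dict:
    """Pick a user agent and viewport for one browser session."""
    width, height = rng.choice(VIEWPORTS)
    return {
        "user_agent": rng.choice(USER_AGENTS),
        "viewport": {"width": width, "height": height},
    }


options = random_context_options(random.Random(42))
print(options["viewport"])
```

A user agent that claims a different browser version from the one actually running can itself be a detection signal, so the pool should match the installed browser.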
Usage
The actor is designed to run on the Apify platform but can also be run locally for development and testing. The main entry point is the main.py file, which handles the actor lifecycle and orchestrates the scraping process.
Key input parameters:
- search_queries: LinkedIn search parameters
- max_profiles: Maximum number of profiles to scrape
- job_id: Unique identifier for the scraping job
- url_with_filters: Pre-configured LinkedIn search URL with filters
- user_id: User identifier for authentication
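A sketch of validating these inputs and applying the documented max_profiles default of 25. Which fields are required is not documented. The sketch assumes job_id and user_id are mandatory and that at least one of search_queries or url_with_filters is given. ActorInput and parse_input are hypothetical names. On the platform, the raw dict would typically come from the Apify SDK's `await Actor.get_input()`.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ActorInput:
    job_id: str
    user_id: str
    search_queries: Any = None
    url_with_filters: Optional[str] = None
    max_profiles: int = 25  # documented default


def parse_input(raw: dict) -> ActorInput:
    """Validate the raw input dict and apply defaults (required fields are assumed)."""
    for key in ("job_id", "user_id"):
        if not raw.get(key):
            raise ValueError(f"missing required input: {key}")
    if not raw.get("search_queries") and not raw.get("url_with_filters"):
        raise ValueError("provide search_queries or url_with_filters")
    max_profiles = int(raw.get("max_profiles", 25))
    if max_profiles < 1:
        raise ValueError("max_profiles must be positive")
    return ActorInput(
        job_id=raw["job_id"],
        user_id=raw["user_id"],
        search_queries=raw.get("search_queries"),
        url_with_filters=raw.get("url_with_filters"),
        max_profiles=max_profiles,
    )


cfg = parse_input({
    "job_id": "job-123",
    "user_id": "user-1",
    "url_with_filters": "https://www.linkedin.com/sales/search/people",
})
print(cfg.max_profiles)  # 25
```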
Development and Testing
The codebase includes:
- Test scripts for local testing
- Configuration for development environments
- Debugging utilities
- Error logging and monitoring
Limitations and Considerations
- LinkedIn's terms of service restrictions
- Rate limiting considerations
- IP blocking prevention measures
- Authentication challenges
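Rate limiting can be approached by enforcing a minimum, optionally jittered, gap between page actions. The document gives no actual limits, so the intervals here are hypothetical. The clock and sleep functions are injectable so the behavior can be checked without waiting. In async code, asyncio.sleep would replace time.sleep.

```python
import random
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Enforce a minimum, randomly jittered gap between consecutive actions."""

    def __init__(self, min_interval: float, jitter: float = 0.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep,
                 rng: Optional[random.Random] = None) -> None:
        self.min_interval = min_interval
        self.jitter = jitter
        self.clock = clock
        self.sleep = sleep
        self.rng = rng or random.Random()
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Block until the next action is allowed; return the seconds slept."""
        slept = 0.0
        if self._last is not None:
            gap = self.min_interval + self.rng.uniform(0, self.jitter)
            remaining = self._last + gap - self.clock()
            if remaining > 0:
                self.sleep(remaining)
                slept = remaining
        self._last = self.clock()
        return slept


class FakeClock:
    """Test clock whose sleep() simply advances time."""

    def __init__(self) -> None:
        self.t = 0.0

    def now(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.t += seconds


fc = FakeClock()
limiter = MinIntervalLimiter(2.0, clock=fc.now, sleep=fc.sleep)
waits = [limiter.wait(), limiter.wait()]
fc.t += 5.0  # a long pause between actions needs no extra wait
waits.append(limiter.wait())
print(waits)  # [0.0, 2.0, 0.0]
```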
This documentation gives an overview of the CMG LinkedIn Search Actor's architecture, features, and technical implementation for automated LinkedIn profile data extraction and processing.
Pricing
Pricing model: Pay per usage
This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage.