Cmg Linkedin Search Actor
Under maintenance

Pricing: Pay per usage

obedient_stars/cmg-linkedin-search-actor

Developed by Synthwave Solutions

Maintained by Community

Automates Search On LinkedIn

0.0 (0)


0

Monthly users

1

Last modified: 15 hours ago

CMG LinkedIn Search Actor Documentation

Project Overview

The CMG LinkedIn Search Actor is a web-scraping Actor built on the Apify platform that automates searching LinkedIn Sales Navigator and extracting profile data. It uses Playwright browser automation to navigate LinkedIn, run searches with filters, and extract detailed profile information.

Architecture

Core Components

  1. Main Application (src/main.py)

    • Entry point for the Apify Actor
    • Handles input configuration, proxy setup, and orchestration of the scraping process
    • Delegates the scraping work to the BrowserAgentLinkedIn service
  2. BrowserAgentLinkedIn

    • Manages browser sessions and their configuration
    • Handles authentication with LinkedIn using stored cookies or credentials
    • Orchestrates the search and scraping workflow
    • Uses AI models (OpenAI and Google Gemini) for data processing
    • Interfaces with Firebase for data persistence
  3. LinkedinScraper

    • Handles the specific LinkedIn interactions
    • Performs login, navigation, and data extraction
    • Extracts profile URLs from search results
    • Manages cookie persistence
  4. ApifyActorManager

    • Coordinates parallel scraping of profiles using Apify actors
    • Manages concurrent scraping jobs
  5. FirebaseManager

    • Provides data persistence layer
    • Stores and retrieves user credentials, cookies, and scraping results
    • Caches URLs and profiles to avoid duplicate work
    • Tracks job progress and status
  6. Additional Services

    • TelegramBotApiHandler: Likely used for notifications
    • PlaywrightLinkedInScraper: Alternative scraping implementation

Data Flow

  1. The actor receives input parameters, including search queries, the maximum number of profiles to scrape, and a job ID
  2. It sets up a browser session with the required configuration and a proxy
  3. It authenticates with LinkedIn using stored cookies or performs login
  4. It navigates to the search URL with filters and extracts profile URLs
  5. It coordinates concurrent scraping of detailed profile information
  6. Results are stored in Firebase and returned as the actor output
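The data flow above can be sketched as one async function. This is a minimal sketch under stated assumptions, not the Actor's code: run_job and its injected callables (authenticate, search, scrape_profile, store) are hypothetical stand-ins for BrowserAgentLinkedIn, LinkedinScraper, ApifyActorManager, and FirebaseManager, whose real signatures are not documented here. Steps 1 and 2 (reading input, launching the browser) are assumed to happen before it is called.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical step signatures; the Actor's real services may differ.
SearchFn = Callable[[str, int], Awaitable[list[str]]]
ScrapeFn = Callable[[str], Awaitable[dict]]
StoreFn = Callable[[str, list[dict]], Awaitable[None]]


async def run_job(
    job_id: str,
    search_url: str,
    max_profiles: int,
    authenticate: Callable[[], Awaitable[None]],
    search: SearchFn,
    scrape_profile: ScrapeFn,
    store: StoreFn,
) -> list[dict]:
    """Mirror data-flow steps 3 to 6: authenticate, search, scrape, store."""
    await authenticate()  # step 3: reuse cookies or log in
    # step 4: collect profile URLs from the filtered search, capped at max_profiles
    urls = (await search(search_url, max_profiles))[:max_profiles]
    # step 5: scrape profile details concurrently (results keep URL order)
    profiles = list(await asyncio.gather(*(scrape_profile(u) for u in urls)))
    await store(job_id, profiles)  # step 6: persist results for this job
    return profiles
```

Injecting each step as a callable keeps the orchestration testable with fakes, without a browser or a LinkedIn account.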

Features

Authentication Management

  • Securely stores and retrieves LinkedIn credentials from Firebase
  • Maintains and updates cookies for persistent sessions
  • Handles login flow when cookies are invalid or expired
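The cookie-first authentication described above can be illustrated as follows. This is a hedged sketch: SessionStore is a hypothetical in-memory stand-in for the credential and cookie records kept in Firebase, and cookies_valid and login are hypothetical callables representing the real session check and LinkedIn login flow.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SessionStore:
    """Hypothetical stand-in for the per-user cookie/credential records in Firebase."""
    cookies: dict[str, list[dict]] = field(default_factory=dict)
    credentials: dict[str, tuple[str, str]] = field(default_factory=dict)


def authenticate(
    user_id: str,
    store: SessionStore,
    cookies_valid: Callable[[list[dict]], bool],
    login: Callable[[str, str], list[dict]],
) -> list[dict]:
    """Reuse stored cookies while they work; otherwise log in and persist new ones."""
    cookies = store.cookies.get(user_id)
    if cookies and cookies_valid(cookies):
        return cookies  # existing session is still accepted
    if user_id not in store.credentials:
        raise LookupError(f"no stored credentials for user {user_id!r}")
    username, password = store.credentials[user_id]
    fresh = login(username, password)  # fall back to the full login flow
    store.cookies[user_id] = fresh  # save cookies so the next run can skip login
    return fresh
```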

Search Capabilities

  • Supports complex LinkedIn Sales Navigator search filters
  • Handles location filtering
  • Extracts profile URLs from search results pages
  • Avoids duplicate scraping by tracking previously processed URLs
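Duplicate avoidance depends on comparing URLs in a canonical form. The sketch below assumes that dropping the query string, fragment, and trailing slash is enough to identify a profile; that rule and the helper names are illustrative assumptions, and the seen set stands in for the URL cache the Actor keeps in Firebase.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_profile_url(url: str) -> str:
    """Drop query string, fragment, and trailing slash so equivalent links compare equal."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def new_profile_urls(found: list[str], seen: set[str], limit: int) -> list[str]:
    """Return up to `limit` URLs not processed before, updating `seen` in place."""
    fresh: list[str] = []
    for raw in found:
        url = normalize_profile_url(raw)
        if url in seen:
            continue  # already scraped in this or an earlier run
        seen.add(url)
        fresh.append(url)
        if len(fresh) >= limit:
            break
    return fresh
```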

Browser Automation

  • Uses Playwright for browser automation
  • Configures the browser to reduce the likelihood of automation being detected
  • Loads a custom Chrome extension (extension.crx)
  • Utilizes residential proxies for IP rotation
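Two browser-setup details can be shown without launching a browser. Playwright expects proxy credentials as separate fields rather than embedded in the URL, and Chromium loads extensions through command-line flags that point to an unpacked extension directory (a .crx archive would need extracting first). The helper names below are hypothetical; how the Actor actually wires these together is not documented.

```python
from urllib.parse import unquote, urlsplit


def playwright_proxy(proxy_url: str) -> dict[str, str]:
    """Split a proxy URL with embedded credentials into Playwright's proxy settings."""
    parts = urlsplit(proxy_url)
    host = parts.hostname or ""
    server = f"{parts.scheme}://{host}" + (f":{parts.port}" if parts.port else "")
    settings = {"server": server}
    if parts.username:
        settings["username"] = unquote(parts.username)
    if parts.password:
        settings["password"] = unquote(parts.password)
    return settings


def extension_args(extension_dir: str) -> list[str]:
    """Chromium flags that load one unpacked extension directory and no others."""
    return [
        f"--disable-extensions-except={extension_dir}",
        f"--load-extension={extension_dir}",
    ]
```

In an Apify Actor, the proxy URL would typically come from `await (await Actor.create_proxy_configuration(groups=["RESIDENTIAL"])).new_url()`, and both results could be passed to Playwright's `launch_persistent_context(user_data_dir, proxy=..., args=...)`, since extensions require a persistent context.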

Data Processing

  • Integrates AI models (OpenAI and Google Gemini) for data processing
  • Structures extracted profile data into standardized formats
  • Performs data validation and cleanup
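Validation and cleanup of model output might look like the sketch below. The Profile fields are a hypothetical schema, since the Actor's standardized format is not documented; raw would be the dict parsed from an OpenAI or Gemini response.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Profile:
    """Hypothetical standardized record; the Actor's real schema is not documented."""
    profile_url: str
    full_name: str
    headline: Optional[str] = None
    location: Optional[str] = None


def _clean(value: object) -> Optional[str]:
    """Collapse runs of whitespace; map empty or non-string values to None."""
    if not isinstance(value, str):
        return None
    text = re.sub(r"\s+", " ", value).strip()
    return text or None


def to_profile(raw: dict) -> Profile:
    """Validate and tidy one raw profile dict, e.g. JSON returned by an LLM."""
    url = _clean(raw.get("profile_url"))
    name = _clean(raw.get("full_name"))
    if not url or not url.startswith("https://"):
        raise ValueError(f"invalid profile_url: {raw.get('profile_url')!r}")
    if not name:
        raise ValueError("full_name is required")
    return Profile(url, name, _clean(raw.get("headline")), _clean(raw.get("location")))
```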

Scalability

  • Implements parallel processing of profile scraping
  • Manages the number of concurrent scraping jobs to improve throughput
  • Caches results to avoid redundant work
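Bounded parallelism is commonly implemented with a semaphore. This generic asyncio sketch caps the number of in-flight scrapes; scrape_all and max_concurrency are hypothetical names, and it does not model the Actor's reported approach of delegating work to other Apify actors.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def scrape_all(
    urls: list[str],
    scrape: Callable[[str], Awaitable[T]],
    max_concurrency: int = 5,
) -> list[T]:
    """Scrape every URL, with at most `max_concurrency` calls running at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str) -> T:
        async with semaphore:  # waits while max_concurrency calls are active
            return await scrape(url)

    # gather returns results in the same order as the input URLs
    return list(await asyncio.gather(*(bounded(u) for u in urls)))
```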

Error Handling

  • Detects errors and attempts recovery where possible
  • Logs errors for debugging
  • Saves error details to Firebase for monitoring
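One common recovery strategy is retrying with exponential backoff while recording each failure. This is an illustrative sketch, not the Actor's documented behavior: with_retries and its parameters are hypothetical, and on_error marks where an error record could be written to Firebase.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(
    action: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 1.0,
    on_error: Callable[[int, Exception], None] = lambda attempt, exc: None,
) -> T:
    """Run `action`, retrying with exponential backoff plus jitter on failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await action()
        except Exception as exc:  # broad on purpose: every failure is recorded
            on_error(attempt, exc)  # hypothetical hook, e.g. save details to Firebase
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("unreachable")
```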

Configuration

The actor supports several configuration options:

  1. Search Parameters

    • search_queries: LinkedIn search parameters
    • url_with_filters: Pre-configured LinkedIn search URL with filters
    • max_profiles: Maximum number of profiles to scrape (default: 25)
  2. Authentication

    • User credentials stored in Firebase
    • Session cookies for persistent login
  3. Proxy Configuration

    • Uses residential proxies through Apify Proxy
    • Configurable proxy settings
  4. AI Models

    • Configurable OpenAI and Google Gemini models for data processing

Technical Implementation

Dependencies

  • Apify SDK for Python
  • Playwright for browser automation
  • Firebase for data persistence
  • LangChain with OpenAI and Google Generative AI integrations
  • Custom browser automation utilities

Data Storage

  • Primary storage in Firebase
  • Structured data export to Apify Dataset
  • Local JSON file caching
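Local JSON caching can be as simple as a keyed dictionary mirrored to disk. JsonCache is a hypothetical minimal version; writing to a temporary file and then replacing the original keeps the cache from being left half-written if the process stops mid-save.

```python
import json
from pathlib import Path
from typing import Any


class JsonCache:
    """Minimal local JSON cache keyed by string (e.g. a profile URL)."""

    def __init__(self, path: Path) -> None:
        self.path = Path(path)
        self.data: dict[str, Any] = (
            json.loads(self.path.read_text(encoding="utf-8")) if self.path.exists() else {}
        )

    def get(self, key: str) -> Any:
        return self.data.get(key)

    def set(self, key: str, value: Any) -> None:
        self.data[key] = value
        # write to a sibling temp file, then swap it into place
        tmp = self.path.with_name(self.path.name + ".tmp")
        tmp.write_text(json.dumps(self.data, ensure_ascii=False, indent=2), encoding="utf-8")
        tmp.replace(self.path)
```

For the Apify Dataset export, each structured record would instead be sent with `await Actor.push_data(record)` from the Apify SDK for Python.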

Security Features

  • Encrypted credential storage
  • Proxy usage for IP rotation
  • Browser fingerprint randomization
  • User agent rotation
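User-agent rotation and basic fingerprint variation can be done by picking per-context options at random. The pools below are example values only, since the Actor's actual lists are not documented; the returned keys match Playwright's `new_context(user_agent=..., viewport=...)` parameters.

```python
import random
from typing import Optional

# Example values only; the Actor's real user-agent and viewport pools are not documented.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
]
VIEWPORTS = [(1366, 768), (1440, 900), (1920, 1080)]


def random_context_options(rng: Optional[random.Random] = None) -> dict:
    """Pick a user agent and viewport for one Playwright browser context."""
    rng = rng or random.Random()
    width, height = rng.choice(VIEWPORTS)
    return {
        "user_agent": rng.choice(USER_AGENTS),
        "viewport": {"width": width, "height": height},
    }
```

Usage would be along the lines of `context = await browser.new_context(**random_context_options())`.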

Usage

The actor is designed to run on the Apify platform but can also be executed locally for development and testing. The entry point is src/main.py, which handles the actor lifecycle and orchestrates the scraping process.

Key input parameters:

  • search_queries: LinkedIn search parameters
  • max_profiles: Maximum number of profiles to scrape
  • job_id: Unique identifier for the scraping job
  • url_with_filters: Pre-configured LinkedIn search URL with filters
  • user_id: User identifier for authentication
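Input handling for these parameters could be validated up front. This sketch applies the documented default of 25 for max_profiles; treating job_id and user_id as required, and requiring either search_queries or url_with_filters, are assumptions, and ActorInput and parse_input are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ActorInput:
    job_id: str
    user_id: str
    search_queries: Any = None  # shape not documented; passed through unchanged
    url_with_filters: Optional[str] = None
    max_profiles: int = 25  # documented default


def parse_input(raw: Optional[dict]) -> ActorInput:
    """Validate the input dict; the required-field rules here are assumptions."""
    raw = raw or {}
    missing = [key for key in ("job_id", "user_id") if not raw.get(key)]
    if missing:
        raise ValueError(f"missing required input: {', '.join(missing)}")
    if not raw.get("search_queries") and not raw.get("url_with_filters"):
        raise ValueError("provide search_queries or url_with_filters")
    max_profiles = int(raw.get("max_profiles", 25))
    if max_profiles < 1:
        raise ValueError("max_profiles must be positive")
    return ActorInput(
        job_id=raw["job_id"],
        user_id=raw["user_id"],
        search_queries=raw.get("search_queries"),
        url_with_filters=raw.get("url_with_filters"),
        max_profiles=max_profiles,
    )
```

Inside main.py, the raw dict would come from `await Actor.get_input()` in the Apify SDK for Python.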

Development and Testing

The codebase includes:

  • Test scripts for local testing
  • Configuration for development environments
  • Debugging utilities
  • Error logging and monitoring

Limitations and Considerations

  • LinkedIn's terms of service restrictions
  • Rate limiting considerations
  • IP blocking prevention measures
  • Authentication challenges

This documentation gives an overview of the CMG LinkedIn Search Actor's architecture, features, and technical implementation for automated LinkedIn profile data extraction and processing.

Pricing

Pricing model: Pay per usage

This Actor is charged by platform usage: the Actor itself is free to use, and you pay only for the Apify platform usage it incurs.