Cmg Werknl Search Service
Under maintenance

obedient_stars/cmg-werknl-search-service

Developed by

Synthwave Solutions

Maintained by Community

Automates searching on werk.nl

0.0 (0)

Pricing

Pay per usage

0

Monthly users

1

Last modified

19 hours ago

CMG Werk.nl Search Service Documentation

1. Overview

The CMG Werk.nl Search Service is a specialized web scraping application designed to automate the process of searching for and extracting candidate profiles from Werk.nl, the Dutch government's employment website. The service uses browser automation to navigate the Werk.nl platform, perform searches based on specific criteria, extract candidate profile data, and store this information in a Firebase database.

2. System Architecture

The application follows a modular architecture with several key components:

2.1. Core Components

  1. Main Module (src/main.py): Entry point for the application that orchestrates the workflow.
  2. SeleniumAgentWerkNL: Core browser automation component that handles navigation, authentication, searching, and data extraction.
  3. FirebaseManager: Manages all interactions with the Firebase database for data storage and retrieval.
  4. TelegramAuthHandler: Handles two-factor authentication through Telegram bot integration.

2.2. Data Flow

  1. The process begins with input parameters including search queries, maximum profiles to extract, and user credentials.
  2. The application authenticates with Werk.nl using stored cookies or login credentials.
  3. If 2FA is required, the Telegram bot integration facilitates code entry.
  4. Once authenticated, the application performs searches based on the provided criteria.
  5. Candidate profiles matching the search criteria are extracted and enriched with additional data.
  6. The extracted profiles are stored in Firebase for later use.

3. Detailed Component Documentation

3.1. SeleniumAgentWerkNL

This is the primary component responsible for browser automation and interaction with the Werk.nl website.

Key Features:

  • Browser Initialization: Sets up a Selenium/undetected-chromedriver instance with proxy support and anti-detection features.
  • Authentication: Handles login to Werk.nl with support for 2FA via Telegram.
  • Cookie Management: Stores and retrieves browser cookies to maintain authentication sessions.
  • Profile Search: Executes searches based on specified criteria.
  • Data Extraction: Scrapes candidate profile data from search results.
  • Profile Enrichment: Gathers detailed information about candidates.

Implementation Details:

  • Uses undetected-chromedriver to bypass bot detection.
  • Implements realistic human-like interactions (random delays, mouse movements).
  • Handles various authentication flows including 2FA.
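The random delays mentioned above can be sketched as follows. This is a minimal illustration, not the service's actual code: `human_delay`, `type_like_human`, and the default timing ranges are hypothetical names and values chosen for the example.

```python
import random
import time
from typing import Callable


def human_delay(
    min_seconds: float = 0.8,
    max_seconds: float = 2.5,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Pause for a random interval and return the number of seconds waited."""
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("expected 0 <= min_seconds <= max_seconds")
    seconds = random.uniform(min_seconds, max_seconds)
    sleep(seconds)
    return seconds


def type_like_human(
    send_key: Callable[[str], None],
    text: str,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Send text one character at a time with a short random pause per key."""
    for char in text:
        send_key(char)
        human_delay(0.05, 0.2, sleep=sleep)
```

With Selenium, a form field's `send_keys` method could be passed as `send_key`. The `sleep` parameter is injectable so the timing logic can be exercised without real waiting.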

3.2. FirebaseManager

Responsible for all database operations with the Firebase backend.

Key Features:

  • User Management: Stores and retrieves user data and session information.
  • Job Management: Retrieves job details and search queries.
  • Candidate Storage: Saves extracted candidate profiles.
  • Cookie Management: Securely stores and retrieves authentication cookies with encryption.

Implementation Details:

  • Uses Firebase Admin SDK for server-side operations.
  • Implements encryption for sensitive data like cookies.
  • Provides batch operations for efficient data storage.
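A batch write can be sketched as splitting the profiles into fixed-size chunks and committing each chunk in one call. The sketch assumes Cloud Firestore as the backend and uses 500 as a conservative, assumed batch size; `chunked`, `save_candidates`, and the `commit_batch` callable are hypothetical names, not the FirebaseManager API.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Iterator, List

# Assumed, conservative number of writes per batch.
MAX_BATCH_SIZE = 500

Profile = Dict[str, Any]


def chunked(items: Iterable[Profile], size: int = MAX_BATCH_SIZE) -> Iterator[List[Profile]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def save_candidates(
    profiles: Iterable[Profile],
    commit_batch: Callable[[List[Profile]], None],
    size: int = MAX_BATCH_SIZE,
) -> int:
    """Write profiles chunk by chunk and return how many were written."""
    written = 0
    for chunk in chunked(profiles, size):
        commit_batch(chunk)  # hypothetical: persists one chunk in a single call
        written += len(chunk)
    return written
```

With the Firestore client from the Firebase Admin SDK, `commit_batch` could create a write batch with `db.batch()`, call `batch.set(ref, data)` for each profile, and finish with `batch.commit()`.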

3.3. TelegramAuthHandler

Manages the two-factor authentication process using a Telegram bot.

Key Features:

  • 2FA Request: Sends requests for 2FA codes to users through Telegram.
  • Code Retrieval: Receives authentication codes from users via a streaming connection.
  • Notification: Sends status updates and messages to users.

Implementation Details:

  • Uses async HTTP client (httpx) for API communication.
  • Implements retry mechanisms with exponential backoff for reliability.
  • Provides streaming connection for real-time code receipt.
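Retrying with exponential backoff can be sketched as below. This is an illustrative helper under stated assumptions, not the handler's real implementation: `retry_with_backoff`, its defaults, and the choice of retryable exceptions are hypothetical.

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    operation: Callable[[], Awaitable[T]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[Exception], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Await `operation`, retrying on the given exceptions with doubling delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (base, 2x, 4x, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # A little random jitter keeps concurrent clients from retrying in lockstep.
            await asyncio.sleep(delay + random.uniform(0, delay / 10))
    raise AssertionError("unreachable")
```

When the wrapped call uses httpx, `retry_on` could be set to `(httpx.TransportError,)` so that only network-level failures are retried.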

3.4. Data Models

The application uses Pydantic models for structured data handling:

Key Models:

  • ProfileData: Represents a candidate's complete profile information.
  • Experience: Describes a candidate's work experience.
  • Education: Represents educational background information.
  • CVProfile: Captures the core CV details for a candidate.
  • CandidateMatch: Stores matching scores between candidates and job opportunities.
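The shape of these models might look roughly like the sketch below. The class names come from the list above, but every field is a hypothetical placeholder, and the sketch uses standard-library dataclasses as a dependency-free stand-in; in the service itself these would be Pydantic `BaseModel` subclasses.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Experience:
    title: str
    employer: Optional[str] = None
    start_year: Optional[int] = None
    end_year: Optional[int] = None  # None is taken to mean "current position"


@dataclass
class Education:
    degree: str
    institution: Optional[str] = None


@dataclass
class ProfileData:
    profile_id: str
    city: Optional[str] = None
    # default_factory gives each profile its own list instead of a shared one
    experiences: List[Experience] = field(default_factory=list)
    education: List[Education] = field(default_factory=list)


@dataclass
class CandidateMatch:
    job_id: str
    profile_id: str
    score: float  # assumed to be normalized to the range 0..1

    def __post_init__(self) -> None:
        if not 0.0 <= self.score <= 1.0:
            raise ValueError("score must be between 0 and 1")
```

Pydantic would additionally validate and coerce field types on construction, which dataclasses do not do.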

4. Authentication Flow

The system uses a multi-step authentication process:

  1. Cookie-based Authentication:

    • The system first attempts to authenticate using stored cookies retrieved from Firebase.
    • Cookies are encrypted in storage and decrypted for use.
  2. Credential-based Login:

    • If cookies are invalid or expired, the system falls back to username/password authentication.
  3. Two-Factor Authentication:

    • When 2FA is required, the system:
      a. Sends a notification to the user via Telegram.
      b. Requests a 2FA code from the Telegram bot API.
      c. Waits for the user to provide the code through Telegram.
      d. Enters the received code on the Werk.nl login form.
      e. Notifies the user upon successful authentication.
  4. Session Persistence:

    • After successful authentication, cookies are captured and stored in Firebase for future use.
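The fallback order described above can be sketched as a single function. Everything here is hypothetical: the `session` object and its four methods, the `"ok"` / `"needs_2fa"` result strings, and the `request_2fa_code` and `save_cookies` callables stand in for the real browser agent, Telegram handler, and Firebase manager.

```python
from typing import Any, Callable, Dict, List, Optional

Cookies = List[Dict[str, Any]]


def authenticate(
    session: Any,
    stored_cookies: Optional[Cookies],
    username: str,
    password: str,
    request_2fa_code: Callable[[], str],
    save_cookies: Callable[[Cookies], None],
) -> str:
    """Log in and return the path that succeeded.

    Returns "cookies", "credentials", or "credentials+2fa".
    """
    # 1. Try the stored session first; nothing needs to be saved if it works.
    if stored_cookies and session.login_with_cookies(stored_cookies):
        return "cookies"

    # 2. Fall back to username and password.
    result = session.login_with_credentials(username, password)
    path = "credentials"

    # 3. Complete two-factor authentication when the site asks for it.
    if result == "needs_2fa":
        code = request_2fa_code()  # e.g. waits for the user's reply in Telegram
        if not session.submit_2fa_code(code):
            raise PermissionError("2FA code was rejected")
        path = "credentials+2fa"
    elif result != "ok":
        raise PermissionError("login with credentials failed")

    # 4. Persist the fresh session so the next run can skip the login.
    save_cookies(session.export_cookies())
    return path
```

Returning the path taken makes it easy to log how often the cookie shortcut succeeds versus a full login.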

5. Search and Extraction Process

  1. Query Preparation:

    • The system retrieves search queries from Firebase or from direct input.
    • Queries are associated with specific job IDs and include search parameters like keywords, locations, etc.
  2. Search Execution:

    • The SeleniumAgentWerkNL navigates to the search interface on Werk.nl.
    • Search parameters are entered according to the provided queries.
    • Results are paginated and processed systematically.
  3. Profile Extraction:

    • Basic profile information is extracted from search results.
    • For each profile of interest, detailed information is gathered through profile page visits.
    • Data is structured according to the defined Pydantic models.
  4. Profile Filtering and Ranking:

    • Extracted profiles are filtered based on relevance criteria.
    • Profiles are ranked according to match quality for the specific job.
    • A maximum number of profiles (as specified in the input) are selected.
  5. Data Storage:

    • Selected profiles are stored in Firebase.
    • Association with the original job ID is maintained.
    • Additional metadata, such as extraction time and match scores, is recorded.
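Steps 4 and 5 can be sketched as a filter, a sort, and a cut-off, followed by building the record to store. The field names (`id`, `score`), the `min_score` threshold, and both function names are assumptions made for the example; the source does not specify how match scores are computed.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional

Profile = Dict[str, Any]


def select_top_profiles(
    scored_profiles: List[Profile],
    max_profiles: int,
    min_score: float = 0.5,
) -> List[Profile]:
    """Drop profiles below `min_score`, then keep the best `max_profiles`."""
    relevant = [p for p in scored_profiles if p["score"] >= min_score]
    ranked = sorted(relevant, key=lambda p: p["score"], reverse=True)
    return ranked[: max(0, max_profiles)]


def to_storage_record(
    job_id: str,
    profile: Profile,
    extracted_at: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Attach the job ID, match score, and extraction time to a profile."""
    timestamp = extracted_at or datetime.now(timezone.utc)
    return {
        "job_id": job_id,
        "profile_id": profile["id"],
        "match_score": profile["score"],
        "extracted_at": timestamp.isoformat(),
    }
```

Filtering before sorting keeps the ranking step from spending work on profiles that would be discarded anyway.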

6. Integration Points

6.1. Firebase Integration

  • Used for persistent storage of user data, job information, and candidate profiles.
  • Serves as the central data repository for the application ecosystem.

6.2. Telegram Bot Integration

  • Facilitates the two-factor authentication process.
  • Provides real-time communication with users for authentication requirements.

6.3. Apify Platform Integration

  • The service is designed to run as an Apify Actor.
  • Leverages Apify's proxy infrastructure for reliable connections from Dutch IP addresses.
  • Uses Apify's logging and storage capabilities.

7. Error Handling and Resilience

  • Retry Mechanisms: Critical operations implement retry logic with exponential backoff.
  • Exception Handling: Comprehensive error catching and logging for debugging.
  • Status Tracking: Operation status is recorded in Firebase for monitoring.
  • Proxy Management: Dynamic proxy rotation for avoiding IP blocks.

8. Security Considerations

  • Credential Protection: User credentials are not hardcoded but supplied via environment variables.
  • Cookie Encryption: Authentication cookies are encrypted before storage.
  • Access Control: Firebase security rules control access to sensitive data.
  • 2FA Support: Two-factor authentication provides an additional security layer.

9. Deployment

The service is designed to be deployed as an Apify Actor, which provides:

  • Scheduled runs
  • Webhook integration
  • Input parameter configuration
  • Results storage and retrieval

The deployment requires configuration of:

  • Firebase credentials
  • Telegram bot API credentials
  • Werk.nl login credentials
  • Proxy settings
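One way to supply this configuration is through environment variables, failing fast when something is missing. The variable names, the `Settings` class, and `load_settings` below are hypothetical; the actual names the service expects are not given in this document.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical variable names, one per required setting.
REQUIRED_VARS = (
    "FIREBASE_CREDENTIALS_JSON",
    "TELEGRAM_BOT_TOKEN",
    "WERKNL_USERNAME",
    "WERKNL_PASSWORD",
)


@dataclass(frozen=True)
class Settings:
    firebase_credentials_json: str
    telegram_bot_token: str
    werknl_username: str
    werknl_password: str
    proxy_url: Optional[str] = None  # optional: assumed to fall back to a default proxy


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from `env`, reporting every missing variable at once."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return Settings(
        firebase_credentials_json=env["FIREBASE_CREDENTIALS_JSON"],
        telegram_bot_token=env["TELEGRAM_BOT_TOKEN"],
        werknl_username=env["WERKNL_USERNAME"],
        werknl_password=env["WERKNL_PASSWORD"],
        proxy_url=env.get("PROXY_URL") or None,
    )
```

Listing all missing variables in one error saves a round of redeployments compared with failing on the first one.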

10. Limitations and Considerations

  • Rate Limiting: The service includes delays to avoid triggering rate limits on Werk.nl.
  • Session Management: Browser sessions need to be managed carefully to maintain authentication.
  • Site Changes: As with any scraper, changes to the Werk.nl website structure may require updates.
  • Legal Considerations: Usage should comply with Werk.nl's terms of service and applicable regulations.

11. Future Enhancements

Potential areas for enhancement include:

  • Improved profile matching algorithms
  • More sophisticated filtering options
  • Extended data extraction capabilities
  • Additional authentication methods
  • Performance optimizations for large-scale scraping

This documentation provides a comprehensive overview of the CMG Werk.nl Search Service, its architecture, functionality, and implementation details.

Pricing

Pricing model

Pay per usage

This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage.