
✨Mass Linkedin Profile Scraper with Email 📧 (No Cookies)
Scrape Linkedin profiles and get full information of the lead.
4.7 (39)
Pricing
$10.00 / 1,000 results
Total users: 8.6K
Monthly users: 4K
Runs succeeded: >99%
Issues response: 11 hours
Last modified: 2 days ago
Comprehensive Issues in LinkedIn Lead Processing Workflow: LinkedIn URL Normalization and Deduplication Failures, Apify Scraper Duplicates & Limits, and Intermittent Email Validation Errors (n8n v1.97.1)
Closed
Bug Description
I'm encountering a series of interconnected and persistent issues with my n8n workflow designed for LinkedIn lead processing, specifically concerning LinkedIn URL normalization, deduplication, Apify scraping, and email validation. These problems have been present since I started building this complex workflow.
My overall workflow aims to:
1. Get leads from Google Sheets.
2. Normalize LinkedIn profile URLs for consistent identification.
3. Deduplicate leads based on normalized LinkedIn URLs and email addresses.
4. Scrape LinkedIn profile data using Apify.
5. Validate email addresses using EmailGuard.io.
6. Generate personalized outreach materials and save results to Outlook.

Here's a detailed breakdown of the problems I've faced, chronologically where possible:
Phase 1: Initial Setup and Deduplication Challenges
Problem 1: Difficulty with Accurate Deduplication (Initial State)
Initial Goal: My primary goal from the beginning was to prevent processing duplicate leads. Leads often come from Google Sheets, where a single person might appear multiple times with slightly different data, or multiple times with the same core LinkedIn URL but varied parameters (e.g., ?trk=).

Challenge: I needed a reliable way to identify unique individuals, and the LinkedIn profile URL seemed the most robust identifier. However, direct comparison of raw LinkedIn URLs was failing due to these variations.

Problem 2: "normalizedLinkedinUrl" Field Missing in "Remove Duplicates" Node (First Major Hurdle)
Introduction of Normalization: To address the deduplication challenge, I implemented a Code node (named Normalize LinkedIn URL) early in my workflow, directly after fetching leads from Google Sheets. Its purpose was to clean up LinkedIn URLs by removing the ? and # components, producing a consistent, normalized URL (e.g., https://www.linkedin.com/in/pamelajgoodwin/) stored in a new field called normalizedLinkedinUrl. (A simplified sketch of this logic follows below.)

First Error: When I then tried to use a Remove Duplicates node, configured to compare on this new normalizedLinkedinUrl field, it consistently failed with the error: "normalizedLinkedinUrl" field is missing from some input items.

Debugging Attempts: I inspected the output of my Normalize LinkedIn URL node, and for most items the normalizedLinkedinUrl field seemed to be correctly generated and present. I tried adding an IF node (IF normalized linkedin Exists) before Remove Duplicates to filter out items without this field, but the error persisted, suggesting that either the IF condition wasn't catching all cases or the data flow was more complex than expected. I was confused about why Remove Duplicates was still complaining if the IF node was supposed to ensure the field existed.
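For concreteness, the node's logic is roughly the following (a simplified sketch, minus my real error handling; the input field name personLinkedin matches what my Apify request body references):

```javascript
// n8n Code node ("Run Once for All Items"): Normalize LinkedIn URL sketch.
const items = $input.all();
for (const item of items) {
  const raw = (item.json.personLinkedin || '').trim();
  try {
    const url = new URL(raw);
    // Keep only the path; drop the ?query and #fragment parts and any
    // trailing slash, so ".../in/pamelajgoodwin?trk=..." and
    // ".../in/pamelajgoodwin/" collapse to the same key.
    const path = url.pathname.replace(/\/+$/, '').toLowerCase();
    item.json.normalizedLinkedinUrl = `https://www.linkedin.com${path}/`;
  } catch (err) {
    // Unparseable or empty URL: set an empty string rather than omitting
    // the field, so Remove Duplicates never sees a missing field.
    item.json.normalizedLinkedinUrl = '';
  }
}
return items;
```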
Problem 3: Data Stream Mismatch and Field Nesting Issues (Root Cause of Deduplication Failure)

Discovery: Through detailed debugging, I realized the core issue was that my workflow had branched: one path was normalizing LinkedIn URLs, and another path (my email validation branch, involving EmailGuard and mails.so Outlook) was processing emails.

Key Insight: The normalizedLinkedinUrl was being added at the top level of item.json in one branch, while the email validation data (especially email after processing by EmailGuard) was often nested under item.json.data.

Consequence: When the paths reconverged, the Remove Duplicates node (and the IF normalized linkedin Exists node before it) received items where either normalizedLinkedinUrl was missing or email was nested incorrectly, producing the "field missing" error.

Solution Implemented: I added a Set node (prepare LinkedIn Data) after Normalize LinkedIn URL to explicitly ensure normalizedLinkedinUrl and email sit at the root level of item.json. I added another Set node (prepare Email Data) after the True branch of my IF email = deliverable1 node (from the email validation path); it brings the email field to the top level ($json.email) and also explicitly sets normalizedLinkedinUrl to an empty string ("") for items coming only through the email path, guaranteeing the field exists for all items. Crucially, I then inserted a Merge node (in Append mode) to combine the outputs of prepare LinkedIn Data and prepare Email Data, so that all items reaching downstream nodes (like IF normalized linkedin Exists and Remove Duplicates) consistently have both normalizedLinkedinUrl and email at the top level. I updated the IF normalized linkedin Exists condition to {{ $json.normalizedLinkedinUrl }} is Exists, and the Remove Duplicates comparison fields to normalizedLinkedinUrl,email (removing any data. prefixes).

Current Status: This structural fix significantly improved the deduplication process and resolved the "missing field" errors.
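The same shape guarantee could also be expressed as a single Code node on each branch instead of the Set nodes; a minimal sketch, assuming the EmailGuard branch nests its payload under item.json.data as described above:

```javascript
// n8n Code node: give both branches an identical item shape before the
// Merge (Append) node by hoisting nested fields to the top of item.json.
const items = $input.all();
for (const item of items) {
  const j = item.json;
  // EmailGuard responses often nest their payload under j.data.
  const email = j.email ?? j.data?.email ?? '';
  // Items from the email-only path never had this field; default it so
  // Remove Duplicates always sees it present.
  const normalizedLinkedinUrl = j.normalizedLinkedinUrl ?? '';
  item.json = { ...j, email, normalizedLinkedinUrl };
}
return items;
```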
Phase 2: Apify Scraper Problems
Problem 4: Apify Scraper Returns Duplicate Leads
Observation: Despite implementing the LinkedIn URL normalization and subsequent deduplication, I noticed that the apify - person LinkedIn Scrape node (my HTTP Request node calling the Apify LinkedIn Profile Scraper actor) was still consistently returning duplicate scraped data. For instance, with 10 unique LinkedIn URLs as input, the output would include 6 identical scraped profiles, even though they originated from distinct input URLs. My Apify console shows multiple successful runs that each returned "1 result" in the dataset, but these results often lead to duplicates in my workflow.

Details: I confirmed that the Loop Over Items node was correctly passing unique personLinkedin URLs to the Apify scraper on each iteration. The Apify node's JSON body is configured to send {"profileUrls": ["{{ $json.personLinkedin }}"]}, i.e., a single URL per request. I explicitly verified that the "Batching" setting on the Apify HTTP Request node was OFF. I tried adding Wait nodes (e.g., a 22 sec wait! node after Apify) to mitigate potential rate limits, but the duplicates persisted. When checking the Apify Console, I could see unique run IDs for each n8n execution, but the datasets retrieved from those runs contained the same duplicate data. (A defensive deduplication sketch follows below.)
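As a stopgap until the root cause is found, a Code node placed after the scraping loop could collapse the duplicates. A minimal sketch, assuming the scraped items expose publicIdentifier (the later add groupKey1 node expects that field), with the input URL as a fallback key:

```javascript
// n8n Code node (after the scraping loop): drop scraped profiles that
// have already been seen in this execution.
const seen = new Set();
const unique = [];
for (const item of $input.all()) {
  const key = item.json.publicIdentifier || item.json.personLinkedin || '';
  if (key && seen.has(key)) continue; // duplicate profile: skip it
  seen.add(key);
  unique.push(item);
}
return unique;
```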
Problem 5: Apify Scraper Exhibiting "Limits" Errors with Webhook Triggers

Context: In earlier iterations of my workflow (specifically "Workflow A" and "Workflow B"), which were triggered by webhooks, the Apify scraper would frequently return "limits"-related errors.

Observation: I have encountered "Payment required - perhaps check your payment details?" and "Problem in node 'apify - person LinkedIn Scrape' Payment required - perhaps check your payment details?" errors. This suggests the workflow hits Apify's API rate or usage limits more aggressively when triggered externally via webhooks than during manual execution.

Interplay with other problems: Strangely, when I simplified the workflow (by temporarily removing the entire "normalized LinkedIn path" branch), the Apify "limits" errors seemed to subside, but the simplification then caused the EmailGuard problem (Problem 6) to surface or become more prominent. This implies a complex, perhaps resource-intensive interaction between the different parts of my workflow. (A retry sketch for ruling out transient rate limiting follows below.)
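To rule out transient rate limiting as the trigger, one option is a retry wrapper around the Apify call. A rough sketch (ACTOR_ID and APIFY_TOKEN are placeholders; whether the Code node exposes global fetch depends on the n8n version). It retries on 429 only, because a 402 "Payment required" is a plan limit and retrying cannot help there:

```javascript
// n8n Code node sketch: call the Apify run-sync endpoint with backoff.
const endpoint =
  'https://api.apify.com/v2/acts/ACTOR_ID/run-sync-get-dataset-items?token=APIFY_TOKEN';

async function scrapeProfile(profileUrl, maxAttempts = 3) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch(endpoint, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ profileUrls: [profileUrl] }),
    });
    if (res.status === 429) {
      // Rate limited: exponential backoff, then retry.
      await new Promise((r) => setTimeout(r, 2 ** attempt * 5000));
      continue;
    }
    if (res.status === 402) throw new Error('Apify plan limit reached (402)');
    if (!res.ok) throw new Error(`Apify returned HTTP ${res.status}`);
    return res.json(); // the dataset items for this run
  }
  throw new Error('Still rate-limited after retries');
}

const profiles = await scrapeProfile($input.first().json.personLinkedin);
return profiles.map((p) => ({ json: p }));
```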
Phase 3: EmailGuard Validation Problems
Problem 6: Intermittent "Email field is required" Error on EmailGuard Node (Prominent in Simplified Workflow)
Context: This error became particularly noticeable and problematic when I simplified the workflow (removing the LinkedIn normalization path) to troubleshoot the Apify limits issue.

Observation: The EmailGuard1 node (an HTTP Request node) would intermittently fail for specific items (like itemIndex: 2) with the error: "The email field is required.".

Debugging Efforts: I verified that the input to EmailGuard1 for the failing item clearly showed a valid email field with a populated string value. I confirmed that the node's JSON body correctly references the email dynamically: {"email": "{{ $json.email }}"}. (I had initially hardcoded this to {"email": ""} by accident; that was identified and corrected, but the intermittent error persists even after the fix.) The error claims the field is "required" while the input data shows it present, which is highly perplexing and suggests an underlying issue in how n8n serializes the request, or in how the EmailGuard API interprets it for certain specific email values. (A pre-flight check sketch follows below.)
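One way to narrow this down is a small Code node directly before EmailGuard1 that trims the value (a stray space or newline can make an API treat a field as empty) and logs the exact body about to be sent; a sketch:

```javascript
// n8n Code node placed directly before EmailGuard1: sanity-check the
// email payload and log it, to compare what n8n believes it is sending
// against the "email field is required" response.
const items = $input.all();
items.forEach((item, index) => {
  const email = String(item.json.email ?? '').trim();
  if (!email) {
    // Fail loudly here instead of letting the API reject it opaquely.
    throw new Error(`Item ${index}: email is empty after trimming`);
  }
  item.json.email = email;
  console.log(`item ${index} payload:`, JSON.stringify({ email }));
});
return items;
```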
Phase 4: Other Observed Errors
Problem 7: "Cannot read properties of undefined (reading 'publicIdentifier')" Error in the add groupKey1 Node

Observation: I've also encountered an error in an add groupKey1 node, stating "Cannot read properties of undefined (reading 'publicIdentifier')". This seems to indicate a data flow or data structure issue where the publicIdentifier field is expected but not present or defined at that point in the workflow. This specific error is visible in one of my comprehensive workflow screenshots, suggesting it occurs at a later stage. (A guarded sketch of the assignment follows below.)
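A guarded version of the assignment would avoid the hard crash and flag the bad items instead; a sketch that checks both a top-level and a nested location, since the exact nesting of publicIdentifier in the scraped data is the open question:

```javascript
// Guarded sketch of the 'add groupKey1' assignment. Optional chaining
// yields undefined instead of throwing when the parent object is absent.
const items = $input.all();
for (const item of items) {
  const id = item.json.publicIdentifier ?? item.json.data?.publicIdentifier;
  if (!id) {
    item.json.groupKey = '';
    item.json.groupKeyError = 'publicIdentifier missing at this node';
    continue;
  }
  item.json.groupKey = String(id).toLowerCase();
}
return items;
```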
Current Status and Impact
These combined issues are severely hindering the reliability and efficiency of my lead processing workflow. The deduplication struggles, the Apify duplicates and limits, the intermittent EmailGuard failures, and the other data-related errors mean I cannot trust the integrity or completeness of the processed leads.
Problems 1, 2, 3, and 6: These problems are related to your workflow itself. You will need to make changes in your workflow to solve them.
Problem 4: Please share the run IDs where the actor returned duplicate data.
Problem 5: You need a paid Apify plan for higher limits.
Problem 7: Please share the run ID where publicIdentifier was missing from the response.
voguish_graph
lol did you use gpt to make this bug desc?