
Linkedin Url Scrape
Deprecated
Scrape Unlimited LinkedIn Profile URLs
Rating: 5.0 (3)
Pricing: Pay per usage
Total users: 1.3k
Monthly users: 8
Runs succeeded: >99%
Last modified: a year ago
.actor/Dockerfile
# First, specify the base Docker image.
# You can see the Docker images from Apify at https://hub.docker.com/r/apify/.
# You can also use any other image from Docker Hub.
FROM apify/actor-python:3.11

# Second, copy just requirements.txt into the Actor image,
# since it should be the only file that affects the dependency install in the next step,
# in order to speed up the build
COPY requirements.txt ./

# Install the packages specified in requirements.txt,
# Print the installed Python version, pip version
# and all installed packages with their versions for debugging
RUN echo "Python version:" \
 && python --version \
 && echo "Pip version:" \
 && pip --version \
 && echo "Installing dependencies:" \
 && pip install -r requirements.txt \
 && echo "All installed Python packages:" \
 && pip freeze

# Next, copy the remaining files and directories with the source code.
# Since we do this after installing the dependencies, quick build will be really fast
# for most source file changes.
COPY . ./

# Use compileall to ensure the runnability of the Actor Python code.
RUN python3 -m compileall -q .

# Specify how to launch the source code of your Actor.
# By default, the "python3 -m src" command is run
CMD ["python3", "-m", "src"]
.actor/actor.json
{ "actorSpecification": 1, "name": "my-actor-15", "title": "Scrape single page in Python", "description": "Scrape data from single page with provided URL.", "version": "0.0", "meta": { "templateId": "python-start" }, "input": "./input_schema.json", "dockerfile": "./Dockerfile"}
.actor/input_schema.json
{ "title": "Scrape LinkedIn profiles based on keywords", "type": "object", "schemaVersion": 1, "properties": { "keywords": { "title": "Search Keywords", "type": "array", "description": "Enter the keywords to search for LinkedIn profiles, e.g., job titles, industries, locations.", "editor": "stringList", "items": { "type": "string" }, "prefill": ["chief product officer", "united states", "insurance"] }, "numPages": { "title": "Number of Pages", "type": "integer", "description": "The number of pages to scrape (each page corresponds to a set of search results).", "editor": "number", "minimum": 1, "default": 1 } }, "required": ["keywords", "numPages"]}
src/__main__.py
1"""2This module serves as the entry point for executing the Apify Actor. It handles the configuration of logging3settings. The `main()` coroutine is then executed using `asyncio.run()`.4
5Feel free to modify this file to suit your specific needs.6"""7
8import asyncio9import logging10
11from apify.log import ActorLogFormatter12
13from .main import main14
15# Configure loggers16handler = logging.StreamHandler()17handler.setFormatter(ActorLogFormatter())18
19apify_client_logger = logging.getLogger('apify_client')20apify_client_logger.setLevel(logging.INFO)21apify_client_logger.addHandler(handler)22
23apify_logger = logging.getLogger('apify')24apify_logger.setLevel(logging.DEBUG)25apify_logger.addHandler(handler)26
27# Execute the Actor main coroutine28asyncio.run(main())
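Note that the Dockerfile's CMD ("python3 -m src") runs this module: it configures logging and then executes the main() coroutine from src/main.py. For local development, the same entry point can be launched with python -m src from the project root, assuming an input record is available in the local key-value store (storage/key_value_stores/default/INPUT.json, the file the .gitignore below deliberately keeps), for example after preparing it with the Apify CLI.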
src/main.py
import asyncio
from bs4 import BeautifulSoup
import requests
from apify import Actor
import re

async def main() -> None:
    async with Actor() as actor:
        actor_input = await actor.get_input() or {}
        keywords = actor_input.get('keywords', ["chief product officer", "united states", "insurance"])
        num_pages = actor_input.get('numPages', 1)  # Get the number of pages from input

        base_url = 'https://www.google.com/search?q=site%3Alinkedin.com%2Fin%2F+'
        formatted_keywords = '+'.join(f'(%22{keyword.replace(" ", "+")}%22)' for keyword in keywords)

        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

        linkedin_urls = []

        # Loop through pages based on `num_pages`, incrementing `start` by 10 for each page
        for page_start in range(0, num_pages * 10, 10):
            url = f"{base_url}{formatted_keywords}&start={page_start}"
            print(url)
            response = requests.get(url, headers=headers)
            soup = BeautifulSoup(response.text, 'html.parser')
            links = soup.find_all('a', href=True)

            # Extract LinkedIn URLs
            for link in links:
                match = re.search(r'(https?://www\.linkedin\.com/in/[^&]+)', link['href'])
                if match:
                    linkedin_url = match.group(1)
                    if linkedin_url not in linkedin_urls:  # Avoid duplicates
                        linkedin_urls.append(linkedin_url)

        # Output the LinkedIn URLs
        for url in linkedin_urls:
            await actor.push_data({"LinkedIn URL": url})

        actor.log.info(f"Found and saved {len(linkedin_urls)} LinkedIn URLs based on the keywords across {num_pages} pages.")

if __name__ == '__main__':
    asyncio.run(main())
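As a standalone illustration of the logic above (not part of the actor), the sketch below builds the Google search URL for the schema's prefill keywords and applies the same regular expression to a made-up result href; jane-doe-12345 is a hypothetical profile slug.

import re

# Keywords mirror the input schema's prefill values.
keywords = ["chief product officer", "united states", "insurance"]
base_url = 'https://www.google.com/search?q=site%3Alinkedin.com%2Fin%2F+'

# Same formatting as src/main.py: each keyword is wrapped in URL-encoded quotes (%22) and spaces become '+'.
formatted_keywords = '+'.join(f'(%22{keyword.replace(" ", "+")}%22)' for keyword in keywords)
print(f"{base_url}{formatted_keywords}&start=0")
# https://www.google.com/search?q=site%3Alinkedin.com%2Fin%2F+(%22chief+product+officer%22)+(%22united+states%22)+(%22insurance%22)&start=0

# Same regex as src/main.py, applied to a hypothetical Google result href.
sample_href = '/url?q=https://www.linkedin.com/in/jane-doe-12345&sa=U'
match = re.search(r'(https?://www\.linkedin\.com/in/[^&]+)', sample_href)
if match:
    print(match.group(1))  # https://www.linkedin.com/in/jane-doe-12345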
.dockerignore
# configurations
.idea

# crawlee and apify storage folders
apify_storage
crawlee_storage
storage

# installed files
.venv

# git folder
.git
.editorconfig
root = true
[*]
indent_style = space
indent_size = 4
charset = utf-8
trim_trailing_whitespace = true
insert_final_newline = true
end_of_line = lf
.gitignore
# This file tells Git which files shouldn't be added to source control
.idea
.DS_Store

apify_storage
storage/*
!storage/key_value_stores
storage/key_value_stores/*
!storage/key_value_stores/default
storage/key_value_stores/default/*
!storage/key_value_stores/default/INPUT.json

.venv/
.env/
__pypackages__
dist/
build/
*.egg-info/
*.egg

__pycache__

.mypy_cache
.dmypy.json
dmypy.json
.pytest_cache
.ruff_cache

.scrapy
*.log
requirements.txt
# Feel free to add your Python dependencies below. For formatting guidelines, see:
# https://pip.pypa.io/en/latest/reference/requirements-file-format/

apify ~= 1.6.0
beautifulsoup4 ~= 4.12.2
httpx ~= 0.25.2
types-beautifulsoup4 ~= 4.12.0.7
requests ~= 2.28.1