Web Navigator API

Deprecated

Navigate and scrape websites using LLMs via API

Pricing: Pay per usage

Rating: 0.0 (0)

Developer: carvedai

Last modified: 2 years ago

.actor/Dockerfile

# First, specify the base Docker image.
# You can see the Docker images from Apify at https://hub.docker.com/r/apify/.
# You can also use any other image from Docker Hub.
FROM apify/actor-python:3.12

# Second, copy just requirements.txt into the Actor image,
# since it should be the only file that affects the dependency install in the next step,
# in order to speed up the build
COPY requirements.txt ./

# Install the packages specified in requirements.txt,
# Print the installed Python version, pip version
# and all installed packages with their versions for debugging
RUN echo "Python version:" \
 && python --version \
 && echo "Pip version:" \
 && pip --version \
 && echo "Installing dependencies:" \
 && pip install -r requirements.txt \
 && echo "All installed Python packages:" \
 && pip freeze

# Next, copy the remaining files and directories with the source code.
# Since we do this after installing the dependencies, quick build will be really fast
# for most source file changes.
COPY . ./

# Use compileall to ensure the runnability of the Actor Python code.
RUN python3 -m compileall -q .

# Specify how to launch the source code of your Actor.
# By default, the "python3 -m src" command is run
CMD ["python3", "-m", "src"]

.actor/actor.json

{
    "actorSpecification": 1,
    "name": "web-navigator-api",
    "title": "Web Navigator API",
    "description": "Navigate and scrape websites using LLMs",
    "version": "0.0",
    "buildTag": "latest",
    "meta": {
        "templateId": "python-start"
    },
    "input": "./input_schema.json",
    "dockerfile": "./Dockerfile"
}

.actor/input_schema.json

{
    "title": "Navigate and scrape website",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "url": {
            "title": "URL of the page",
            "type": "string",
            "description": "The URL of the website you want to get the data from.",
            "editor": "textfield",
            "prefill": "https://www.dancingdogmassage.com"
        },
        "objective": {
            "title": "Objective",
            "type": "string",
            "description": "The objective of the navigator.",
            "editor": "textfield",
            "prefill": "Get contact info"
        },
        "disableAdblocker": {
            "title": "Disable Ad Blocker",
            "type": "boolean",
            "description": "Whether to disable adblocker.",
            "prefill": true
        }
    },
    "required": ["url", "objective"]
}
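
The schema above can be mirrored in a few lines of Python when preparing a run input by hand. This is a minimal sketch, not part of the Actor: the helper names `build_input` and `missing_fields` are hypothetical, and only the property names (`url`, `objective`, `disableAdblocker`) and the `required` list come from the schema.

```python
from typing import List, Optional

# Keys listed under "required" in input_schema.json.
REQUIRED_FIELDS = ("url", "objective")


def build_input(url: str, objective: str, disable_adblocker: Optional[bool] = None) -> dict:
    """Assemble an input dict using the property names from input_schema.json."""
    run_input = {"url": url, "objective": objective}
    # disableAdblocker is optional in the schema, so include it only when given.
    if disable_adblocker is not None:
        run_input["disableAdblocker"] = disable_adblocker
    return run_input


def missing_fields(run_input: dict) -> List[str]:
    """Return the required keys that are absent or empty (hypothetical helper)."""
    return [key for key in REQUIRED_FIELDS if not run_input.get(key)]


if __name__ == "__main__":
    example = build_input("https://www.dancingdogmassage.com", "Get contact info", True)
    print(example, missing_fields(example))
```

Checking the required keys before starting a run gives an early, local error instead of a failed run.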

src/__main__.py

import asyncio

from .main import main

# Execute the Actor entry point.
asyncio.run(main())

src/main.py

"""This module defines the main entry point for the Apify Actor.

Feel free to modify this file to suit your specific needs.

To build Apify Actors, utilize the Apify SDK toolkit, read more at the official documentation:
https://docs.apify.com/sdk/python
"""

import requests

# Apify SDK - A toolkit for building Apify Actors. Read more at:
# https://docs.apify.com/sdk/python
from apify import Actor

NAVIGATOR_URL = "https://tnyuzb7e2nxcbdrmu2r2x3ljjm0twemm.lambda-url.us-west-2.on.aws/"


async def main() -> None:
    """Main entry point for the Apify Actor.

    This coroutine is executed using `asyncio.run()`, so it must remain an asynchronous function for proper execution.
    Asynchronous execution is required for communication with Apify platform, and it also enhances performance in
    the field of web scraping significantly.
    """
    async with Actor:
        # Look up the current user; the result is not used further in this file.
        user = await Actor.apify_client.user("me").get()
        # Read the Actor input (url, objective, disableAdblocker).
        body = await Actor.get_input()
        # Forward the input to the navigator endpoint. requests is synchronous, so this
        # call blocks the event loop; timeout=None waits indefinitely for a response.
        response = requests.post(NAVIGATOR_URL, json=body, timeout=None)
        response_body = response.json()
        # Store the JSON response in the Actor's default dataset.
        await Actor.push_data(response_body)

.dockerignore

.git
.mise.toml
.nvim.lua
storage

# The rest is copied from https://github.com/github/gitignore/blob/main/Python.gitignore

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
#   For a library or package, you might want to ignore these files since the code is
#   intended to run in multiple environments; otherwise, check them in:
.python-version

# pdm
#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
#   in version control.
#   https://pdm.fming.dev/latest/usage/project/#working-with-version-control
.pdm.toml
.pdm-python
.pdm-build/

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
#  and can be added to the global gitignore or merged into this file.  For a more nuclear
#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/

.gitignore

.mise.toml
.nvim.lua
storage

# The rest is copied from https://github.com/github/gitignore/blob/main/Python.gitignore

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
#   For a library or package, you might want to ignore these files since the code is
#   intended to run in multiple environments; otherwise, check them in:
.python-version

# pdm
#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
#   in version control.
#   https://pdm.fming.dev/latest/usage/project/#working-with-version-control
.pdm.toml
.pdm-python
.pdm-build/

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
#  and can be added to the global gitignore or merged into this file.  For a more nuclear
#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/

requirements.txt

# Feel free to add your Python dependencies below. For formatting guidelines, see:
# https://pip.pypa.io/en/latest/reference/requirements-file-format/

apify == 2.0.0
requests
