Pricing

$100.00/month + usage

Go to Apify Store

CrunchBase Scrapper

Try Actor

Developed by

Brian Richoster

CrunchBase Scraper is an Apify Actor that automatically retrieves company data from Crunchbase in real time. It extracts detailed information such as company profiles, funding rounds, executive leadership, and industry classifications.

0.0 (0)

Pricing

$100.00/month + usage

Last modified

a month ago

Agents

Automation

Open source

.actor/actor.json

{
    "actorSpecification": 1,
    "name": "my-actor",
    "title": "Scrape single page in Python",
    "description": "Scrape data from single page with provided URL.",
    "version": "0.0",
    "buildTag": "latest",
    "meta": {
        "templateId": "python-start"
    },
    "input": "./input_schema.json",
    "dockerfile": "../Dockerfile"
}

.actor/input_schema.json

{
    "title": "Scrape data from a web page",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "url": {
            "title": "URL of the page",
            "type": "string",
            "description": "The URL of website you want to get the data from.",
            "editor": "textfield",
            "prefill": "https://www.apify.com/"
        }
    },
    "required": ["url"]
}

.dockerignore

.git
.mise.toml
.nvim.lua
storage

# The rest is copied from https://github.com/github/gitignore/blob/main/Python.gitignore

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
#   For a library or package, you might want to ignore these files since the code is
#   intended to run in multiple environments; otherwise, check them in:
.python-version

# pdm
#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
#   in version control.
#   https://pdm.fming.dev/latest/usage/project/#working-with-version-control
.pdm.toml
.pdm-python
.pdm-build/

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
#  and can be added to the global gitignore or merged into this file.  For a more nuclear
#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/

# Visual Studio Code
#  Ignores the folder created by VS Code when changing workspace settings, doing debugger
#  configuration, etc. Can be commented out to share Workspace Settings within a team
.vscode

# Zed editor
#  Ignores the folder created when setting Project Settings in the Zed editor. Can be commented out
#  to share Project Settings within a team
.zed

.gitignore

.mise.toml
.nvim.lua
storage

# The rest is copied from https://github.com/github/gitignore/blob/main/Python.gitignore

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
#   For a library or package, you might want to ignore these files since the code is
#   intended to run in multiple environments; otherwise, check them in:
.python-version

# pdm
#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
#   in version control.
#   https://pdm.fming.dev/latest/usage/project/#working-with-version-control
.pdm.toml
.pdm-python
.pdm-build/

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
#  and can be added to the global gitignore or merged into this file.  For a more nuclear
#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/

# Visual Studio Code
#  Ignores the folder created by VS Code when changing workspace settings, doing debugger
#  configuration, etc. Can be commented out to share Workspace Settings within a team
.vscode

# Zed editor
#  Ignores the folder created when setting Project Settings in the Zed editor. Can be commented out
#  to share Project Settings within a team
.zed

Dockerfile

# First, specify the base Docker image.
# You can see the Docker images from Apify at https://hub.docker.com/r/apify/.
# You can also use any other image from Docker Hub.
FROM apify/actor-python:3.13

# Second, copy just requirements.txt into the Actor image,
# since it should be the only file that affects the dependency install in the next step,
# in order to speed up the build
COPY requirements.txt ./

# Install the packages specified in requirements.txt,
# Print the installed Python version, pip version
# and all installed packages with their versions for debugging
RUN echo "Python version:" \
 && python --version \
 && echo "Pip version:" \
 && pip --version \
 && echo "Installing dependencies:" \
 && pip install -r requirements.txt \
 && echo "All installed Python packages:" \
 && pip freeze

# Next, copy the remaining files and directories with the source code.
# Since we do this after installing the dependencies, quick build will be really fast
# for most source file changes.
COPY . ./

# Use compileall to ensure the runnability of the Actor Python code.
RUN python3 -m compileall -q src/

# Create and run as a non-root user.
RUN useradd --create-home apify && \
    chown -R apify:apify ./
USER apify

# Specify how to launch the source code of your Actor.
# By default, the "python3 -m src" command is run
CMD ["python3", "-m", "src"]

requirements.txt

1# Feel free to add your Python dependencies below. For formatting guidelines, see:
2# https://pip.pypa.io/en/latest/reference/requirements-file-format/
3
4apify < 3.0
5beautifulsoup4[lxml]
6httpx
7types-beautifulsoup4

src/init.py

src/main.py

1import asyncio
2
3from .main import main
4
5# Execute the Actor entry point.
6asyncio.run(main())

src/main.py

1"""Module defines the main entry point for the Apify Actor.
2
3Feel free to modify this file to suit your specific needs.
4
5To build Apify Actors, utilize the Apify SDK toolkit, read more at the official documentation:
6https://docs.apify.com/sdk/python
7"""
8
9from __future__ import annotations
10
11# Beautiful Soup - A library for pulling data out of HTML and XML files. Read more at:
12# https://www.crummy.com/software/BeautifulSoup/bs4/doc
13# Apify SDK - A toolkit for building Apify Actors. Read more at:
14# https://docs.apify.com/sdk/python
15from apify import Actor
16from bs4 import BeautifulSoup
17
18# HTTPX - A library for making asynchronous HTTP requests in Python. Read more at:
19# https://www.python-httpx.org/
20from httpx import AsyncClient
21
22
23async def main() -> None:
24    """Define a main entry point for the Apify Actor.
25
26    This coroutine is executed using `asyncio.run()`, so it must remain an asynchronous function for proper execution.
27    Asynchronous execution is required for communication with Apify platform, and it also enhances performance in
28    the field of web scraping significantly.
29    """
30    async with Actor:
31        # Retrieve the input object for the Actor. The structure of input is defined in input_schema.json.
32        actor_input = await Actor.get_input() or {'url': 'https://apify.com/'}
33        url = actor_input.get('url')
34        if not url:
35            raise ValueError('Missing "url" attribute in input!')
36
37        # Create an asynchronous HTTPX client for making HTTP requests.
38        async with AsyncClient() as client:
39            # Fetch the HTML content of the page, following redirects if necessary.
40            Actor.log.info(f'Sending a request to {url}')
41            response = await client.get(url, follow_redirects=True)
42
43        # Parse the HTML content using Beautiful Soup and lxml parser.
44        soup = BeautifulSoup(response.content, 'lxml')
45
46        # Extract all headings from the page (tag name and text).
47        headings = []
48        for heading in soup.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6']):
49            heading_object = {'level': heading.name, 'text': heading.text}
50            Actor.log.info(f'Extracted heading: {heading_object}')
51            headings.append(heading_object)
52
53        # Save the extracted headings to the dataset, which is a table-like storage.
54        await Actor.push_data(headings)

src/py.typed

Crunchbase Scraper: Reliable & Easy

scrapefull/crunchbase-scraper

Our Crunchbase data extractor reliably extracts comprehensive organization data from Crunchbase pages. Get clean, organized Crunchbase data to identify insights and seamlessly integrate into your workflow.

Scrapefull

220

Crunchbase Email Scraper

scraper-mind/crunchbase-email-scraper

CrunchBase Email Scraper – Extract CrunchBase emails quickly & accurately! 🔍 Search by keywords, location & custom domains to find targeted contacts. Supports CrunchBase listings ✅ Proxy support for seamless scraping. 📊 Download in JSON, CSV, Excel. Ideal for lead gen & research!

Scraper Mind

Crunchbase Any Search Results Scraper

saswave/crunchbase-search-results

Scrape crunchbase and Download ANY Crunchbase search results in a json file (companies, funding, acquisition, peoples ...). Only PRO plan needed

SASWAVE

481

5.0

Ultimate Crunchbase Search Scraper

curious_coder/crunchbase-scraper

Scrape Crunchbase companies, people, investors, acquisitions, etc from Crunchbase search results. Has 99% success rate

Curious Coder

2.4K

4.6

Crunchbase Companies Scraper

pratikdani/crunchbase-companies-scraper

The Crunchbase Companies Overview Actor is a powerful tool that extracts comprehensive company information from Crunchbase URLs. It provides detailed insights about companies including their basic information, financial data, social presence, web traffic statistics, and technological stack.

Pratik Dani

367

4.8

Crunchbase Session Cookies

saswave/crunchbase-session-cookies

With login password, get your session cookies. Before each run, we test previous session cookies generated from last run. If still valid, they are returned, else we generate new cookies

SASWAVE

5.0

Company Funding Activities

pratikdani/company-funding-activities

Get complete funding information of a company

Pratik Dani

💰Company Funding Details Scraper

tech_gear/company-funding-details

✅Extract detailed company funding information.✅Access funding events, revenue, total funding, and investors.

Tech Gear

🚀Company Details Scraper

tech_gear/company-details-scraper

✅Extract detailed company data.✅Access organizational structure, technologies, employees, funding events, and more.✅Get comprehensive insights all in one place.

Tech Gear

5.0

🔍 Company Research Intelligence Tool

easyapi/company-research-intelligence-tool

🔍 Transform any company domain into a comprehensive business intelligence report. Get detailed company profiles, funding data, competitor analysis, and decision-maker information - all in one powerful tool. Perfect for sales teams, investors, and market researchers.