Pricing

Pay per usage

Haldirams-Apify-Client

**Haldirams-Apify-Client** is a web scraper that extracts product titles, links, and prices from the Sweets category page of the Haldiram’s website (https://www.haldirams.com/sweets-73.html). It fetches the page asynchronously with **HTTPX** and parses it with **BeautifulSoup**, producing structured product data for analysis, price tracking, or automation. 🚀

Developer

Sai Karnati

Last modified

a year ago

.actor/Dockerfile

# First, specify the base Docker image.
# You can see the Docker images from Apify at https://hub.docker.com/r/apify/.
# You can also use any other image from Docker Hub.
FROM apify/actor-python:3.12

# Second, copy just requirements.txt into the Actor image,
# since it should be the only file that affects the dependency install in the next step,
# in order to speed up the build
COPY requirements.txt ./

# Print the Python and pip versions, install the packages specified in requirements.txt,
# and then list all installed packages with their versions for debugging.
RUN echo "Python version:" \
 && python --version \
 && echo "Pip version:" \
 && pip --version \
 && echo "Installing dependencies:" \
 && pip install -r requirements.txt \
 && echo "All installed Python packages:" \
 && pip freeze

# Next, copy the remaining files and directories with the source code.
# Because this happens after the dependencies are installed, rebuilds after most
# source-file changes reuse the cached dependency layer and are fast.
COPY . ./

# Byte-compile all Python sources with compileall; the build fails here if any file has a syntax error.
RUN python3 -m compileall -q .

# Specify how to launch the source code of your Actor.
# "python3 -m src" runs the src package as a module, which executes src/__main__.py.
CMD ["python3", "-m", "src"]

.actor/actor.json

{
    "actorSpecification": 1,
    "name": "my-actor",
    "title": "Scrape single page in Python",
    "description": "Scrape data from single page with provided URL.",
    "version": "0.0",
    "buildTag": "latest",
    "meta": {
        "templateId": "python-start"
    },
    "input": "./input_schema.json",
    "dockerfile": "./Dockerfile"
}
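`actor.json` points to the input schema and the Dockerfile by path. A broken path only surfaces when the Actor is built, so a quick local check can help. This is a minimal sketch. It assumes that string values of `"input"` and `"dockerfile"` are resolved relative to the `.actor` directory, as the `./` prefixes above suggest. `missing_actor_files` is a hypothetical helper, not an Apify tool.

```python
import json
from pathlib import Path


def missing_actor_files(actor_dir: Path) -> list[str]:
    """Return the paths referenced by actor.json that do not exist.

    Assumes "input" and "dockerfile" string values are relative to actor_dir
    (the .actor directory). Non-string values are skipped.
    """
    config = json.loads((actor_dir / "actor.json").read_text(encoding="utf-8"))
    missing = []
    for key in ("input", "dockerfile"):
        value = config.get(key)
        # "input" may also hold an inline schema object; only strings are treated as paths.
        if isinstance(value, str) and not (actor_dir / value).exists():
            missing.append(value)
    return missing
```

Run it as `missing_actor_files(Path(".actor"))` from the project root. An empty list means every referenced file is present.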

.actor/input_schema.json

{
    "title": "Scrape data from a web page",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        
    },
    "required": []
}

.actor/output_schema.json

{
  "actorSpecification": 1,
  "title": "Haldirams Product Data",
  "type": "object",
  "properties": {
    "items": {
      "title": "Extracted Product List",
      "type": "array",
      "items": {
        "type": "object",
        "title": "Product",
        "properties": {
          "title": {
            "type": "string",
            "description": "Name of the product"
          },
          "Link": {
            "type": "string",
            "format": "uri",
            "description": "Product URL"
          },
          "price": {
            "type": "string",
            "description": "Price of the product"
          },
          "views": {
            "type": "integer",
            "description": "Number of times the product was viewed"
          },
          "collections": {
            "type": "array",
            "items": {
              "type": "string"
            },
            "description": "Collections or categories the product belongs to"
          }
        },
        "required": ["title", "Link", "price"]
      }
    }
  },
  "required": ["items"]
}
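Each dataset item should carry the fields the schema marks as required, with the declared JSON types. Below is a minimal sketch for checking scraped items against a product schema's `required` and `properties` before pushing them. It handles only a few JSON Schema type names and is not a full JSON Schema validator. `schema_errors` and the trimmed example schema in the usage are hypothetical.

```python
# Map the JSON Schema type names used here to the Python types produced by json.loads.
JSON_TYPES = {"string": str, "integer": int, "array": list, "object": dict}


def schema_errors(item: dict, schema: dict) -> list[str]:
    """List problems with one scraped item against a product schema.

    Checks that every 'required' field is present and that present fields
    match their declared 'type'. Note: bool passes as integer, since bool
    is a subclass of int in Python.
    """
    errors = [f"missing required field: {name}"
              for name in schema.get("required", []) if name not in item]
    for name, spec in schema.get("properties", {}).items():
        expected = JSON_TYPES.get(spec.get("type"))
        if name in item and expected is not None and not isinstance(item[name], expected):
            errors.append(f"{name}: expected {spec['type']}")
    return errors
```

Example: with a trimmed schema `{"required": ["title", "Link", "price", "views"], "properties": {"views": {"type": "integer"}}}`, an item holding only `title`, `Link`, and `price` yields `["missing required field: views"]`. That is exactly the kind of mismatch that appears when a schema lists fields the scraper never extracts.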

src/__main__.py

import asyncio

from .main import main

# Execute the Actor entry point.
asyncio.run(main())

src/main.py

"""This module defines the main entry point for the Apify Actor.

Feel free to modify this file to suit your specific needs.

To build Apify Actors, utilize the Apify SDK toolkit, read more at the official documentation:
https://docs.apify.com/sdk/python
"""

# Beautiful Soup - A library for pulling data out of HTML and XML files. Read more at:
# https://www.crummy.com/software/BeautifulSoup/bs4/doc
from bs4 import BeautifulSoup

# HTTPX - A library for making asynchronous HTTP requests in Python. Read more at:
# https://www.python-httpx.org/
from httpx import AsyncClient

# Apify SDK - A toolkit for building Apify Actors. Read more at:
# https://docs.apify.com/sdk/python
from apify import Actor

# Full class-attribute strings of the product container and the title link on the page.
# A class_ value containing spaces matches the exact attribute string, so any change to
# the site's class list or its order breaks the match.
PRODUCT_CLASS = 'product-info flex flex-col flex-grow sm:flex-grow-0 px-4 lg:px-6'
TITLE_CLASS = 'product-item-link line-clamp-2 text-black min-h-[42px] md:min-h-[50px]'


async def main() -> None:
    """Main entry point for the Apify Actor.

    This coroutine is executed using `asyncio.run()`, so it must remain an asynchronous function for proper execution.
    Asynchronous execution is required for communication with the Apify platform, and it also enhances performance in
    the field of web scraping significantly.
    """
    async with Actor:
        # Retrieve the input object for the Actor. The structure of input is defined in input_schema.json,
        # which currently declares no properties, so the input is only logged.
        actor_input = await Actor.get_input() or {}
        Actor.log.info(f'Received input: {actor_input}')

        url = 'https://www.haldirams.com/sweets-73.html'
        first_n_results = 100  # Maximum number of products to extract.

        # Create an asynchronous HTTPX client for making HTTP requests.
        async with AsyncClient() as client:
            # Fetch the HTML content of the page, following redirects if necessary.
            Actor.log.info(f'Sending a request to {url}')
            response = await client.get(url, follow_redirects=True)
            response.raise_for_status()

        # Parse the HTML content using the Beautiful Soup HTML parser.
        soup = BeautifulSoup(response.text, 'html.parser')
        elements = soup.find_all('div', class_=PRODUCT_CLASS)[:first_n_results]
        Actor.log.info(f'Found {len(elements)} products')

        extracted_data = []
        for element in elements:
            title = element.find('a', class_=TITLE_CLASS)
            price = element.find('span', class_='price')
            extracted_data.append({
                'title': title.get_text(strip=True) if title else 'N/A',
                'Link': title.get('href') if title else 'N/A',
                # Check the price <span> itself (not the parent element) so a missing price
                # yields 'N/A' instead of raising AttributeError.
                'price': price.get_text(strip=True) if price else 'N/A',
            })

        # Save the extracted products to the dataset, which is a table-like storage.
        await Actor.push_data(extracted_data)
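The extraction loop above only runs against the live page. To try the same title, link, and price logic offline, here is a minimal sketch. It uses the standard library's `html.parser.HTMLParser` as a dependency-free stand-in for BeautifulSoup, which is a swapped-in technique and not what the Actor uses. It matches single class names (`product-info`, `product-item-link`, `price`) instead of the full class strings in `main.py`. The sample HTML is invented for illustration and was not taken from haldirams.com. `ProductParser` and `parse_products` are hypothetical names.

```python
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Collect title, link and price from <div class="product-info ..."> blocks.

    Like element.find() in main.py, only the first matching link and price
    inside each product are used. A nested tag with the same name as the
    captured one (e.g. <span> inside the price <span>) ends the capture early.
    """

    def __init__(self) -> None:
        super().__init__()
        self.products: list[dict[str, str]] = []
        self._depth = 0                      # <div> nesting inside the current product; 0 = outside
        self._field: str | None = None       # field whose text is being captured
        self._field_tag: str | None = None   # tag whose end stops the capture
        self._seen: set[str] = set()
        self._current: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if self._depth == 0:
            if tag == "div" and "product-info" in classes:
                self._depth = 1
                self._seen = set()
                self._current = {"title": "N/A", "Link": "N/A", "price": "N/A"}
            return
        if tag == "div":
            self._depth += 1
        elif tag == "a" and "product-item-link" in classes and "title" not in self._seen:
            self._current["Link"] = attr_map.get("href") or "N/A"
            self._start_capture("title", tag)
        elif tag == "span" and "price" in classes and "price" not in self._seen:
            self._start_capture("price", tag)

    def _start_capture(self, field: str, tag: str) -> None:
        self._seen.add(field)
        self._current[field] = ""
        self._field, self._field_tag = field, tag

    def handle_data(self, data):
        if self._field is not None:
            self._current[self._field] += data

    def handle_endtag(self, tag):
        if self._depth == 0:
            return
        if tag == self._field_tag:
            self._field = self._field_tag = None
        elif tag == "div":
            self._depth -= 1
            if self._depth == 0:
                # Product finished: strip whitespace, keep 'N/A' for empty values.
                self.products.append({k: v.strip() or "N/A" for k, v in self._current.items()})
                self._field = self._field_tag = None


def parse_products(html: str, limit: int = 100) -> list[dict[str, str]]:
    """Parse up to `limit` products, mirroring first_n_results in main.py."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return parser.products[:limit]


SAMPLE_HTML = """
<div class="product-info flex flex-col">
  <a class="product-item-link line-clamp-2" href="https://example.com/kaju-katli"> Kaju Katli </a>
  <div class="price-box"><span class="price">₹ 300</span></div>
</div>
<div class="product-info"><a class="product-item-link" href="/rasgulla">Rasgulla</a></div>
"""
products = parse_products(SAMPLE_HTML)
```

With the sample above, the first product gets all three fields. The second has no price `<span>` and gets `'N/A'`, which matches the corrected fallback in `main.py`.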

.dockerignore

.git
.mise.toml
.nvim.lua
storage

# The rest is copied from https://github.com/github/gitignore/blob/main/Python.gitignore

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
#   For a library or package, you might want to ignore these files since the code is
#   intended to run in multiple environments; otherwise, check them in:
.python-version

# pdm
#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
#   in version control.
#   https://pdm.fming.dev/latest/usage/project/#working-with-version-control
.pdm.toml
.pdm-python
.pdm-build/

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
#  and can be added to the global gitignore or merged into this file.  For a more nuclear
#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/

.gitignore

.mise.toml
.nvim.lua
storage

# The rest is copied from https://github.com/github/gitignore/blob/main/Python.gitignore

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
#   For a library or package, you might want to ignore these files since the code is
#   intended to run in multiple environments; otherwise, check them in:
.python-version

# pdm
#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
#   in version control.
#   https://pdm.fming.dev/latest/usage/project/#working-with-version-control
.pdm.toml
.pdm-python
.pdm-build/

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
#  and can be added to the global gitignore or merged into this file.  For a more nuclear
#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/

requirements.txt

# Feel free to add your Python dependencies below. For formatting guidelines, see:
# https://pip.pypa.io/en/latest/reference/requirements-file-format/

apify < 3.0
beautifulsoup4[lxml]
httpx
types-beautifulsoup4

Crawlee & BeautifulSoup Automation Template

ellustar/my-actor-30

The Crawlee & BeautifulSoup Automation Template is a ready-to-use Python actor for web scraping and automation. Built with Crawlee for crawling and BeautifulSoup for parsing, it helps extract, process, and organize data efficiently—perfect for developers and automation projects.

Ellustar

Website Scraper

quarterly_lettuce/website-scraper

Fast web scraper that extracts page titles and URLs from any website. Uses Cheerio for lightning-fast HTML parsing. Perfect for SEO audits, site mapping, and content discovery. Handles pagination and follows links automatically.

Abhishek Kumar Giri

All In One Social Media Email Scraper

direct_houseboat/all-in-one-social-media-email-scraper

Extract emails from social media profiles with this powerful Apify Actor. Supports Facebook, Instagram, LinkedIn & more. Fast, automated scraping using Python, HTTPX & BeautifulSoup. Ideal for lead generation, email marketing, and data collection.

Bikram Gautam

168

5.0

Python Crawlee & BeautifulSoup Actor Template

ellustar/my-actor-29

Python Crawlee & BeautifulSoup Actor Template: A versatile web automation and scraping actor template designed for Python developers. Harness Crawlee for scalable crawling and BeautifulSoup for precise HTML parsing, enabling efficient data extraction, automation, and web interaction workflows.

Ellustar

Apify

doshikevin361/apify

Kevin Doshi

Amazon Products Crawler (Fast & Simple)

amit123/amazon-products-crawler

Fast and reliable Amazon product scraper that extracts detailed data like title, reviews, price, rating, images, description and features from product URLs. Uses Apify's residential proxies for speed and accuracy. Perfect for price tracking, catalog enrichment, or competitive research.

Amit

5.0

Python Scraper Template

ellustar/my-actor-33

A lightweight Python scraper template using BeautifulSoup and the Apify SDK. Includes request queue handling, HTML parsing, data storage, and a clean structure for fast, customizable web scraping. Perfect for product data, articles, and general extraction.

Ellustar

Universal Website Scraper (Python)

fortuitous_inch/my-actor

Scrape structured data from any website URL using Python and BeautifulSoup. Extract titles, links, and page content for research and automation.

Amol Pandgale

BeautifulSoup Scraper

apify/beautifulsoup-scraper

Crawls websites using raw HTTP requests. It parses the HTML with the BeautifulSoup library and extracts data from the pages using Python code. Supports both recursive crawling and lists of URLs. This Actor is a Python alternative to Cheerio Scraper.

Apify

5.0

TikTok User Profile Scraper

direct_houseboat/tiktok-user-profile-scraper

Scrape TikTok user profile data instantly with this fast, cloud-based Apify Actor. Extract follower count, following count, email (from signature), external URL & more using Python, HTTPX & BeautifulSoup. Perfect for influencer research, lead generation, and social media analysis.

Bikram Gautam

433

5.0