Food Panda Scraper | All In One | $5 / 1k

Pricing

$4.99 / 1,000 results

Scrape FoodPanda restaurants and menus across Malaysia, Taiwan, Singapore, and more. Grab names, brands, ratings, reviews, prices, delivery fees, promos, and more in clean JSON/CSV. Ideal for competitor research, market mapping, promo tracking, and price monitoring.


Python HTTP Scraper Template (curl_cffi + BeautifulSoup)

A clean and efficient Apify Actor template for building browserless HTTP scrapers in Python.

It uses:

  • curl_cffi for high-success HTTP requests with browser impersonation
  • BeautifulSoup for HTML parsing
  • Apify SDK for Actor lifecycle, datasets, and proxy management

Overview

This template is designed for straightforward scraping tasks where full browser automation is not required.

It:

  • performs a warm-up request to the target’s origin to establish a session
  • reuses cookies and connections via a shared curl_cffi.requests.AsyncSession
  • supports Apify Proxy via the standard proxyConfiguration input field
  • extracts listings using multiple extraction strategies (JSON-LD + generic DOM selectors)
  • supports simple pagination (rel="next", “Next” buttons, ?page=N pattern)
  • saves structured data into the Apify Dataset
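
A condensed sketch of that flow, assuming the input fields described below (the selector list, field names, and inline logic are illustrative, not the template's exact src/main.py):

    import asyncio
    from urllib.parse import urlparse

    from apify import Actor
    from bs4 import BeautifulSoup
    from curl_cffi.requests import AsyncSession

    async def main() -> None:
        async with Actor:
            actor_input = await Actor.get_input() or {}
            start_urls = actor_input.get("startUrls") or []
            limit = actor_input.get("limit", 1000)
            if not start_urls:
                return
            saved = 0

            async with AsyncSession(impersonate="chrome120", timeout=30) as session:
                # Warm-up request: prime cookies on the target's origin.
                parts = urlparse(start_urls[0])
                await session.get(f"{parts.scheme}://{parts.netloc}/")

                for url in start_urls:
                    response = await session.get(url)
                    soup = BeautifulSoup(response.text, "html.parser")
                    for card in soup.select("article, .listing, .result"):
                        if saved >= limit:
                            return
                        await Actor.push_data({"url": url, "text": card.get_text(strip=True)})
                        saved += 1

    if __name__ == "__main__":
        asyncio.run(main())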

Stack

  • Python (async, runs on apify/actor-python:3.13)
  • Apify SDK (Python) – Actor lifecycle, input, dataset, proxy config
  • curl_cffi – fast HTTP client with browser TLS fingerprint impersonation
  • BeautifulSoup – HTML parsing and selector-based extraction

Features

  • Apify SDK – run scalable Actors on the Apify platform
  • curl_cffi AsyncSession – shared session with cookies + connections reused
  • Browser impersonation – realistic TLS fingerprints via impersonate="chrome120"
  • Proxy support – uses Apify proxy or custom proxies via proxyConfiguration
  • Warm-up request – hits the origin once to prime cookies and session state
  • Multi-strategy extraction:
    • JSON/JSON-LD <script> blocks (structured data)
    • generic CSS selectors for listings (.listing, .result, li.product, article, …)
  • Pagination support – rel="next", “Next/›/»” links, and ?page=N increment (sketched after this list)
  • Monetization hooks – emits outputrecord charge events per pushed item (if available)
  • Structured output – saves extracted data to the default Apify Dataset
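
The pagination behavior sketched after this list might be implemented along these lines (find_next_url is an illustrative name, not a documented helper of the template):

    from urllib.parse import parse_qs, urlencode, urljoin, urlparse, urlunparse

    from bs4 import BeautifulSoup

    def find_next_url(page_url: str, soup: BeautifulSoup) -> str | None:
        # 1. <a rel="next"> (rel is multi-valued, so match it as a word).
        link = soup.select_one('a[rel~="next"]')
        if link and link.get("href"):
            return urljoin(page_url, link["href"])

        # 2. Links whose visible text is "Next", "›", or "»".
        for a in soup.find_all("a", href=True):
            if a.get_text(strip=True) in ("Next", "›", "»"):
                return urljoin(page_url, a["href"])

        # 3. Fall back to incrementing an existing ?page=N parameter.
        parts = urlparse(page_url)
        query = parse_qs(parts.query)
        if "page" in query:
            query["page"] = [str(int(query["page"][0]) + 1)]
            return urlunparse(parts._replace(query=urlencode(query, doseq=True)))
        return None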

How It Works

  1. Input

    • Reads the Actor input via Actor.get_input():
      • startUrls – list of URLs to crawl
      • queries – free-text queries turned into search URLs via build_search_url
      • limit – max records to save
      • proxyConfiguration – standard Apify proxy editor object
  2. Seeding

    • Normalizes URLs (removes fragments, sorts query params).
    • Deduplicates seeds across startUrls and query-based search URLs.
  3. Proxy & Session

    • Creates ProxyConfiguration via Actor.create_proxy_configuration(actor_proxy_input=proxyConfiguration).
    • Generates proxy URLs with fresh session IDs to avoid reusing blocked connections.
    • Creates a curl_cffi.requests.AsyncSession with:
      • Chrome-like headers
      • impersonation (chrome120)
      • proxies
      • timeout and redirect limits
  4. Warm-up

    • Derives the origin from the first seed (e.g. https://www.example.com/).
    • Sends a warm-up request to prime cookies and the proxy session.
  5. Crawling & Extraction (an extraction sketch follows this list)

    • For each seed, paginates using:
      • <a rel="next">
      • links containing “Next”, “›”, “»”
      • incremented ?page=N query param
    • For each page:
      • downloads HTML via curl_cffi
      • parses with BeautifulSoup
      • runs extractors in order:
        1. JSON/JSON-LD script blocks
        2. Generic listing selectors
      • cleans records to a safe public subset of keys
  6. Saving

    • Pushes extracted objects to the default Apify Dataset via Actor.push_data.
    • Emits a monetization outputrecord event for each item (if monetization is enabled).
    • Stops automatically once limit is reached.
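
The JSON/JSON-LD strategy in step 5 can be as small as this sketch (extract_json_ld is an illustrative name, not the template's actual function):

    import json

    from bs4 import BeautifulSoup

    def extract_json_ld(soup: BeautifulSoup) -> list[dict]:
        records = []
        for tag in soup.find_all("script", type="application/ld+json"):
            try:
                data = json.loads(tag.string or "")
            except json.JSONDecodeError:
                continue  # skip malformed blocks instead of failing the page
            # A block may hold one object or a list of them.
            for item in data if isinstance(data, list) else [data]:
                if isinstance(item, dict):
                    records.append(item)
        return records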

Actor Input

startUrls

  • Type: array<string>
  • List of URLs to crawl directly. Each one is normalized and deduplicated.
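
The normalization might look like this sketch (normalize_url is illustrative; the template's actual helper may differ):

    from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

    def normalize_url(url: str) -> str:
        # Drop the #fragment and sort query parameters so equivalent
        # URLs compare equal during deduplication.
        parts = urlparse(url)
        query = urlencode(sorted(parse_qsl(parts.query)))
        return urlunparse(parts._replace(query=query, fragment=""))

    # dict.fromkeys keeps order while removing duplicates:
    seeds = list(dict.fromkeys(
        normalize_url(u)
        for u in ["https://example.com/a?b=2&a=1#top", "https://example.com/a?a=1&b=2"]
    ))
    # -> ['https://example.com/a?a=1&b=2']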

queries

  • Type: array<string>
  • Free-text queries; each is converted into a search URL using build_search_url().

You should customize build_search_url() in src/main.py per target site.

limit

  • Type: integer
  • Maximum number of records to save overall.
  • Default in code: 1000 (the .actor/input_schema.json can override this).

proxyConfiguration

  • Type: object
  • Standard Apify proxy editor object (useApifyProxy, apifyProxyGroups, etc.).
  • Passed directly to Actor.create_proxy_configuration(actor_proxy_input=...).
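
Combined with the session setup from step 3, this object might be consumed roughly as follows (a sketch that must run inside an Actor context; open_session is an illustrative name):

    import random
    import string

    from apify import Actor
    from curl_cffi.requests import AsyncSession

    async def open_session(proxy_input: dict | None) -> AsyncSession:
        proxy_configuration = await Actor.create_proxy_configuration(
            actor_proxy_input=proxy_input,
        )
        proxies = None
        if proxy_configuration:
            # A fresh session ID yields a new outgoing IP, so a blocked
            # connection is not reused on the next attempt.
            session_id = "".join(random.choices(string.ascii_letters, k=10))
            proxy_url = await proxy_configuration.new_url(session_id=session_id)
            proxies = {"http": proxy_url, "https": proxy_url}
        return AsyncSession(impersonate="chrome120", proxies=proxies, timeout=30)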

Adapting the Template for a Specific Target

  1. Set the search base URL

    In src/main.py, change:

    base = "https://www.example.com/search"
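
    Then adjust build_search_url() to match the target's query format. A sketch, assuming the site takes a single q parameter (verify this against a real search URL):

        from urllib.parse import quote_plus

        def build_search_url(query: str) -> str:
            # "q" is an assumed parameter name; change it to whatever
            # the target site actually uses in its search URLs.
            base = "https://www.example.com/search"
            return f"{base}?q={quote_plus(query)}"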