Gelbe Seiten (German Yellow Pages) Scraper

Pricing

$15.00/month + usage

Developed by

Azquaier

Maintained by Community

Scrape German business listings from Gelbe Seiten with flexible detail levels. This Apify Actor supports fast, basic, and deep search modes, rate limiting, proxy rotation, and index control. Ideal for lead gen, SEO, and market research. Outputs structured data to Apify datasets.

Total users: 4
Monthly users: 4
Runs succeeded: >99%
Last modified: 11 days ago

Gelbe Seiten (German Yellow Pages) Scraper

[Version: 0.4.0]

A Python-based Apify Actor designed to scrape business listings from Gelbe Seiten (www.gelbeseiten.de). It utilizes the site's internal API for efficient listing retrieval and offers three distinct modes for varying levels of detail extraction. Features include rate limiting, proxy configuration, and flexible index ranges for controlling pagination or resuming interrupted runs.

💡 Features

  • Targeted Search: Specify the service or business type (search_what) and the geographic area (search_where). Use "bundesweit" for nationwide searches.
  • Three Search Modes (search_mode):
    • fast_search: Quickly extracts summary information directly from search result pages without visiting detail pages. Includes name, address snippet, phone, rating, primary branch, and encoded contact links (email/website). Ideal for rapid list building.
    • basic_search: Visits each business profile page once to fetch essential details like full address, email, website, description, industry, phone number, and social media links. Does not include summary data like ratings from the search results page.
    • deep_search: Combines the summary data from fast_search with a comprehensive detail page visit to extract all available fields, including opening hours, services, detailed company information, training opportunities, payment methods, social media links, fax number, Google Maps link, FAQs, etc. This is the most thorough mode.
  • Index Control: Use start_index and end_index (1-based) to define a specific range of listings to process, useful for resuming runs or targeted scraping.
  • Unlimited Mode: Set max_businesses to 0 to scrape all available results matching the search criteria.
  • Rate Limiting: Configurable requests_per_second throttle applied to API calls (all modes) and detail page requests (basic_search and deep_search) to manage load and reduce the risk of blocking.
  • Proxy Support: Leverages Apify's built-in proxy integration (proxyConfiguration) for reliable IP rotation during scraping, especially crucial for detail page visits.
  • Structured Output: Data is saved to the Apify dataset. Each record includes a scraped_at UTC timestamp and its index (overall position in the search results).

📥 Input Parameters

Configure the actor's behavior using these fields in the Apify Console "Input" tab or via API:

| Field | Type | Description | Default | Required |
| --- | --- | --- | --- | --- |
| search_what | String | The business type, profession, or service to search for (e.g., "Restaurant", "Arzt", "Hotel", "Kreditvermittlung"). | "hotels" | Yes |
| search_where | String | The geographic location (e.g., city name like "Berlin", region, or "bundesweit" for nationwide). | "bundesweit" | Yes |
| search_mode | String | Extraction detail level: fast_search (summary only), basic_search (essential details from profile page), deep_search (summary + all profile details). | "basic_search" | No |
| max_businesses | Integer | Maximum number of listings to save. Set 0 for unlimited (scrapes all found results). | 64 | No |
| start_index | Integer | 1-based index of the first listing to save. Useful for resuming runs or skipping initial results. | 1 | No |
| end_index | Integer | 1-based index of the last listing to save (inclusive). Set 0 to ignore this limit and rely solely on max_businesses. | 0 | No |
| requests_per_second | Integer | Max requests per second. Applies to API calls (all modes) and detail page fetches (basic/deep modes). Lower values (e.g., 2-5) are safer, higher values (e.g., 10+) are faster. | 12 | No |
| proxyConfiguration | Object | Apify proxy settings (Automatic recommended) or custom proxy configuration. | {} | No |
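
The interplay of start_index, end_index, and max_businesses can be sketched as follows. This is an illustration only, not the Actor's own code: the helper name `select_listings` is hypothetical, and it assumes that processing stops at whichever of end_index or max_businesses is reached first and that max_businesses counts saved listings.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def select_listings(
    listings: Iterable[T],
    start_index: int = 1,
    end_index: int = 0,
    max_businesses: int = 64,
) -> Iterator[Tuple[int, T]]:
    """Yield (index, listing) pairs that fall inside the configured range.

    Indices are 1-based. end_index=0 disables the upper bound and
    max_businesses=0 disables the count limit (assumed semantics).
    """
    saved = 0
    for index, listing in enumerate(listings, start=1):
        if index < start_index:
            continue  # skip results before the requested start
        if end_index and index > end_index:
            break  # past the inclusive upper bound
        if max_businesses and saved >= max_businesses:
            break  # count limit reached
        saved += 1
        yield index, listing


if __name__ == "__main__":
    results = [f"listing-{n}" for n in range(1, 11)]
    for index, item in select_listings(results, start_index=3, end_index=5, max_businesses=0):
        print(index, item)
```

Under these assumptions, resuming an interrupted run means setting start_index to one past the last index seen in the dataset.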

🔹 Example Input

{
  "search_what": "Hotels",
  "search_where": "Hamburg",
  "search_mode": "deep_search",
  "max_businesses": 200,
  "start_index": 51,
  "requests_per_second": 8,
  "proxyConfiguration": { "useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"] }
}

📤 Output Data Structure

Each record in the dataset is a JSON object. The exact fields depend on the selected search_mode.

🔹 fast_search Example Output

{
  "index": 1,
  "url": "https://www.gelbeseiten.de/gsbiz/abc123xyz",
  "name": "SchnellTest GmbH",
  "bewertung": 4.5,
  "bewertungen": 8,
  "besteBranche": "Testdienste",
  "telefonnummer": "040 1234567",
  "emaillink": "info@schnelltest.de", // Decoded from Base64
  "base64_emaillink": "aW5mb0BzY2huZWxsdGVzdC5kZQ==", // Raw Base64
  "webseitelink": "https://schnelltest.de", // Decoded from Base64
  "base64_webseitelink": "aHR0cHM6Ly9zY2huZWxsdGVzdC5kZQ==", // Raw Base64
  "adresse_from_search": "Teststraße 1, 20095 Hamburg Neustadt", // Address snippet from search results
  "scraped_at": "2025-04-30T09:30:00.123Z"
}
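
The relationship between the raw base64_* fields and their decoded counterparts can be checked with Python's standard library. The sketch below is not taken from the Actor; the function name `decode_contact_link` is hypothetical, and returning None for malformed input is an assumption made for the example.

```python
import base64
import binascii
from typing import Optional


def decode_contact_link(encoded: Optional[str]) -> Optional[str]:
    """Decode a Base64 contact value; return None if it is missing or malformed."""
    if not encoded:
        return None
    try:
        # validate=True rejects characters outside the Base64 alphabet
        return base64.b64decode(encoded, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None


if __name__ == "__main__":
    print(decode_contact_link("aW5mb0BzY2huZWxsdGVzdC5kZQ=="))
    print(decode_contact_link("aHR0cHM6Ly9zY2huZWxsdGVzdC5kZQ=="))
```

Run against the example record, the two calls print the same values as the emaillink and webseitelink fields.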

🔹 basic_search Example Output

{
  "index": 51, // Example if start_index was 51
  "url": "https://www.gelbeseiten.de/gsbiz/abc123xyz",
  "name": "Hotel Hanseatic", // Extracted from detail page
  "email": "info@hotel-hanseatic.de", // Extracted from detail page
  "website": "http://www.hotel-hanseatic.de", // Extracted from detail page
  "beschreibung": "Gemütliches Hotel im Herzen von St. Georg.", // Extracted from detail page
  "branche": "Hotels", // Extracted from detail page
  "social_media": { // Extracted from detail page
    "facebook": "https://facebook.com/hotelhanseatic"
  },
  "address": "Steindamm 50, 20099 Hamburg St. Georg", // Address from detail page
  "telefonnummer": "040 9876543", // Extracted from detail page
  "scraped_at": "2025-04-30T09:35:00.789Z"
}

Note: Fetches only from the detail page, does not include search result summary data.

🔹 deep_search Example Output

Combines fast_search summary data with all available detail page data.

{
  // --- Fields from fast_search (search results page) ---
  "index": 201,
  "url": "https://www.gelbeseiten.de/gsbiz/abc123xyz",
  "name": "Muster Restaurant",
  "bewertung": 4.8,
  "bewertungen": 55,
  "besteBranche": "Restaurants",
  "telefonnummer": "040 1122334",
  "emaillink": "reservierung@muster-restaurant.de",
  "base64_emaillink": "cmVzZXJ2aWVydW5nQG11c3Rlci1yZXN0YXVyYW50LmRl",
  "webseitelink": "https://www.muster-restaurant.de",
  "base64_webseitelink": "aHR0cHM6Ly93d3cubXVzdGVyLXJlc3RhdXJhbnQuZGU=",
  "adresse_from_search": "Musterweg 10, 20457 Hamburg Altstadt",
  "scraped_at": "2025-04-30T09:40:00.456Z",
  // --- Additional fields from deep_search (detail page) ---
  "email": "reservierung@muster-restaurant.de",
  "website": "https://www.muster-restaurant.de", // The same as webseitelink
  "beschreibung": "Moderne deutsche Küche mit saisonalen Zutaten.",
  "oeffnungszeiten": { // Example structure
    "Mo.": "Ruhetag",
    "Di.-Sa.": "18:00 - 23:00",
    "So.": "12:00 - 15:00"
  },
  "branche": "Restaurant; Deutsche Küche", // Can be more detailed than besteBranche
  "leistungsumfang": "Abendessen, Mittagstisch (So), Terrasse",
  "services": ["Restaurant", "Deutsche Küche", "Terrasse"],
  "unternehmensinformationen": { // Example structure
    "gründungsjahr": ["2010"],
    "parkplätze": ["vorhanden"]
  },
  "ausbildung": null, // Or text/list if available
  "zahlungsmittel": ["EC-Karte", "Kreditkarte", "Bar"],
  "social_media": {
    "instagram": "https://instagram.com/musterrestaurant"
  },
  "google_maps_url": "

Each record includes additional fields depending on the search_mode selected. See below for a full field reference:

| Key | Available in | Description |
| --- | --- | --- |
| name | fast/basic/deep | Company name from either the listing or the detail page. |
| adresse_from_search | fast/deep | Address snippet from listing page. |
| address | basic/deep | Street address from detail page. |
| telefonnummer | fast/basic/deep | Phone number from either the listing or the detail page. |
| bewertung | fast/deep | Average rating (numeric). |
| bewertungen | fast/deep | Number of reviews. |
| besteBranche | fast/deep | Primary branch/industry from listing. |
| branche | basic/deep | Industry from detail page. |
| email | basic/deep | Email provided on the detail page. |
| emaillink | fast/deep | Decoded link to the email on the detail page. |
| base64_emaillink | fast/deep | Raw Base64 email data attribute. |
| webseitelink | fast/deep | Decoded website link. |
| base64_webseitelink | fast/deep | Raw Base64 website data attribute. |
| beschreibung | basic/deep | Business description on the detail page. |
| oeffnungszeiten | deep | Opening hours by day. |
| leistungsumfang | deep | Scope of services. |
| services | deep | List of services offered. |
| unternehmensinformationen | deep | Additional company information. |
| ausbildung | deep | Education or training information. |
| zahlungsmittel | deep | Accepted payment methods. |
| social_media | basic/deep | Social media links. |
| google_maps_url | deep | Google Maps search URL. |
| faq | deep | List of FAQ Q&A pairs (if any). |

Note: Any field may be null if not present on the page.
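
Because the available keys differ by mode and any of them may be null, downstream code usually needs a tolerant accessor. The sketch below is one possible approach, not part of the Actor: the function name `contact_summary` and the output keys are hypothetical, and preferring the detail-page value over the listing value is an assumption.

```python
from typing import Any, Dict, Optional


def contact_summary(record: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Collapse mode-specific keys into one flat contact record.

    Detail-page fields (basic/deep) win over listing fields (fast/deep);
    missing or null fields come out as None.
    """
    return {
        "name": record.get("name"),
        "email": record.get("email") or record.get("emaillink"),
        "website": record.get("website") or record.get("webseitelink"),
        "address": record.get("address") or record.get("adresse_from_search"),
        "phone": record.get("telefonnummer"),
    }


if __name__ == "__main__":
    fast_record = {
        "name": "SchnellTest GmbH",
        "emaillink": "info@schnelltest.de",
        "webseitelink": None,
        "adresse_from_search": "Teststraße 1, 20095 Hamburg Neustadt",
        "telefonnummer": "040 1234567",
    }
    print(contact_summary(fast_record))
```

The same function then works unchanged on datasets produced by any of the three modes.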

⚙️ Usage

  1. Configure inputs in the "Input" tab (set search_what, search_where, etc.).
  2. Choose a proxy mode. Automatic Apify Proxy is recommended for reliability.
  3. Click Start.
  4. Monitor progress in the Log tab.
  5. Access results under Storage → Dataset.
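
The same steps can also be driven from code. The following is a minimal sketch that assumes the `apify-client` Python package is installed and an API token is available in the APIFY_TOKEN environment variable. ACTOR_ID is a placeholder, since the Actor's store identifier is not stated here, and the helper names `build_run_input` and `run_actor` are hypothetical.

```python
import os
from typing import Any, Dict, List

ACTOR_ID = "<username>/<actor-name>"  # placeholder: replace with the Actor's real ID
SEARCH_MODES = {"fast_search", "basic_search", "deep_search"}


def build_run_input(
    search_what: str,
    search_where: str = "bundesweit",
    search_mode: str = "basic_search",
    max_businesses: int = 64,
    requests_per_second: int = 12,
) -> Dict[str, Any]:
    """Assemble the Actor input, using the documented defaults."""
    if search_mode not in SEARCH_MODES:
        raise ValueError(f"search_mode must be one of {sorted(SEARCH_MODES)}")
    return {
        "search_what": search_what,
        "search_where": search_where,
        "search_mode": search_mode,
        "max_businesses": max_businesses,
        "requests_per_second": requests_per_second,
        "proxyConfiguration": {"useApifyProxy": True},
    }


def run_actor(run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Start a run, wait for it to finish, and return the dataset items."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    items = run_actor(build_run_input("Hotels", "Hamburg", search_mode="fast_search"))
    print(f"Fetched {len(items)} listings")
```

Keeping input construction separate from the network call makes the input easy to validate before a run is started.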

🎯 Use Cases

  • Lead generation and contact harvesting.
  • Market research and competitor analysis.
  • Local SEO and business directory creation.
  • Data enrichment pipelines on Apify.

💲 Pricing

  • Monthly actor rental: $15.
  • 1000 listings, fast_search: ≈ $0.01, ~1.20 min run.
  • 1000 listings, basic_search: ≈ $0.04, ~11.00 min run.
  • 1000 listings, deep_search: ≈ $0.05, ~12.00 min run.
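
For budgeting, the per-1000 figures above can be scaled to other run sizes. The sketch below assumes that cost and run time grow linearly with the number of listings and that the run times are decimal minutes; neither assumption is confirmed here, and the function name `estimate_usage` is hypothetical.

```python
from typing import Tuple

MONTHLY_RENTAL_USD = 15.0
USAGE_USD_PER_1000 = {"fast_search": 0.01, "basic_search": 0.04, "deep_search": 0.05}
MINUTES_PER_1000 = {"fast_search": 1.2, "basic_search": 11.0, "deep_search": 12.0}


def estimate_usage(listings: int, search_mode: str) -> Tuple[float, float]:
    """Return (usage cost in USD, run time in minutes), assuming linear scaling."""
    factor = listings / 1000
    cost = round(USAGE_USD_PER_1000[search_mode] * factor, 4)
    minutes = round(MINUTES_PER_1000[search_mode] * factor, 2)
    return cost, minutes


if __name__ == "__main__":
    cost, minutes = estimate_usage(5000, "deep_search")
    print(f"usage ≈ ${cost}, run time ≈ {minutes} min")
    print(f"first month incl. rental ≈ ${MONTHLY_RENTAL_USD + cost}")
```

At these rates the monthly rental dominates the bill for all but very large runs.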

🔗 Integrations

  • Scheduler: automate daily/weekly runs.
  • Webhooks: trigger downstream workflows on completion.
  • API: programmatic control via Apify API.
  • Composer: chain with other Actors (e.g., cleaning, enrichment).

🧰 Technical Notes

  • Async HTTP via httpx, HTML parsing via BeautifulSoup4 + lxml.
  • Custom RateLimiter for throttling API & detail requests with randomized delays to reduce detectability.
  • The scraper deliberately bypasses robots.txt directives to ensure complete data retrieval. Use responsibly.
  • Gelbe Seiten Base64-encodes the email and website links in its listings. The Actor outputs both the raw value (base64_emaillink, base64_webseitelink) and the decoded value (emaillink, webseitelink).
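
A throttle with randomized delays, as mentioned above, might look roughly like the following. This is a sketch of the general technique, not the Actor's actual RateLimiter: the `jitter` parameter, the `wait` method, and the `demo` coroutine are assumptions made for illustration.

```python
import asyncio
import random
import time


class RateLimiter:
    """Spaces out request starts to at most `requests_per_second`, plus random jitter."""

    def __init__(self, requests_per_second: float, jitter: float = 0.2) -> None:
        if requests_per_second <= 0:
            raise ValueError("requests_per_second must be positive")
        self._interval = 1.0 / requests_per_second
        self._jitter = jitter  # max extra delay, as a fraction of the interval
        self._next_slot = 0.0  # earliest monotonic time the next request may start

    async def wait(self) -> None:
        """Reserve the next free slot, then sleep until it arrives."""
        now = time.monotonic()
        start_at = max(now, self._next_slot)
        start_at += random.uniform(0.0, self._jitter * self._interval)
        # No await happens before this assignment, so the reservation is
        # atomic with respect to other coroutines on the same event loop.
        self._next_slot = start_at + self._interval
        await asyncio.sleep(start_at - now)


async def demo(requests: int = 5, requests_per_second: float = 50) -> float:
    """Fire `requests` concurrent fake requests and return the elapsed seconds."""
    limiter = RateLimiter(requests_per_second)
    started = time.monotonic()

    async def fake_request() -> None:
        await limiter.wait()
        # a real HTTP call (e.g. via httpx) would go here

    await asyncio.gather(*(fake_request() for _ in range(requests)))
    return time.monotonic() - started


if __name__ == "__main__":
    print(f"elapsed: {asyncio.run(demo()):.3f}s")
```

Reserving a slot before sleeping keeps concurrent tasks from bunching up, while the jitter avoids a perfectly regular request pattern.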

🛠️ Maintainer