
Gelbe Seiten (German Yellow Pages) Scraper
Pricing
$15.00/month + usage

Scrape German business listings from Gelbe Seiten with flexible detail levels. This Apify Actor supports fast, basic, and deep search modes, rate limiting, proxy rotation, and index control. Ideal for lead gen, SEO, and market research. Outputs structured data to Apify datasets.
[Version: 0.4.0]
A Python-based Apify Actor designed to scrape business listings from Gelbe Seiten (www.gelbeseiten.de). It utilizes the site's internal API for efficient listing retrieval and offers three distinct modes for varying levels of detail extraction. Features include rate limiting, proxy configuration, and flexible index ranges for controlling pagination or resuming interrupted runs.
💡 Features
- Targeted Search: Specify the service or business type (`search_what`) and the geographic area (`search_where`). Use `"bundesweit"` for nationwide searches.
- Three Search Modes (`search_mode`):
  - `fast_search`: Quickly extracts summary information directly from search result pages without visiting detail pages. Includes name, address snippet, phone, rating, primary branch, and encoded contact links (email/website). Ideal for rapid list building.
  - `basic_search`: Visits each business profile page once to fetch essential details like full address, email, website, description, industry, phone number, and social media links. Does not include summary data like ratings from the search results page.
  - `deep_search`: Combines the summary data from `fast_search` with a comprehensive detail page visit to extract all available fields, including opening hours, services, detailed company information, training opportunities, payment methods, social media links, fax number, Google Maps link, FAQs, etc. This is the most thorough mode.
- Index Control: Use `start_index` and `end_index` (1-based) to define a specific range of listings to process, useful for resuming runs or targeted scraping.
- Unlimited Mode: Set `max_businesses` to `0` to scrape all available results matching the search criteria.
- Rate Limiting: Configurable `requests_per_second` throttle applied to API calls (all modes) and detail page requests (`basic_search` and `deep_search`) to manage load and reduce the risk of blocking.
- Proxy Support: Leverages Apify's built-in proxy integration (`proxyConfiguration`) for reliable IP rotation during scraping, especially crucial for detail page visits.
- Structured Output: Data is saved to the Apify dataset. Each record includes a `scraped_at` UTC timestamp and its `index` (overall position in the search results).
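
The interplay of `start_index`, `end_index`, and `max_businesses` can be sketched as follows. This is a minimal illustration, not the Actor's actual code: the helper name `select_indices` is hypothetical, and it assumes that both limits apply together, with whichever is reached first ending the run.

```python
def select_indices(total_results, start_index=1, end_index=0, max_businesses=64):
    """Return the 1-based result positions that would be saved.

    Assumption (not confirmed by the README): end_index and max_businesses
    are both honoured, and the stricter of the two wins. A value of 0
    disables the respective limit.
    """
    if start_index < 1:
        raise ValueError("start_index is 1-based and must be >= 1")

    last = total_results
    if end_index > 0:
        # Inclusive upper bound on the position in the search results.
        last = min(last, end_index)
    if max_businesses > 0:
        # Cap on the number of saved listings, counted from start_index.
        last = min(last, start_index + max_businesses - 1)

    # An empty range results when start_index lies beyond the last position.
    return range(start_index, last + 1)
```

Under these assumptions, `select_indices(1000, start_index=51, max_businesses=200)` covers positions 51 through 250, which is how a run interrupted after 50 listings could be resumed.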
📥 Input Parameters
Configure the actor's behavior using these fields in the Apify Console "Input" tab or via API:
Field | Type | Description | Default | Required |
---|---|---|---|---|
search_what | String | The business type, profession, or service to search for (e.g., "Restaurant", "Arzt", "Hotel", "Kreditvermittlung"). | "hotels" | Yes |
search_where | String | The geographic location (e.g., city name like "Berlin", region, or "bundesweit" for nationwide). | "bundesweit" | Yes |
search_mode | String | Extraction detail level: fast_search (summary only), basic_search (essential details from profile page), deep_search (summary + all profile details). | "basic_search" | No |
max_businesses | Integer | Maximum number of listings to save. Set 0 for unlimited (scrapes all found results). | 64 | No |
start_index | Integer | 1-based index of the first listing to save. Useful for resuming runs or skipping initial results. | 1 | No |
end_index | Integer | 1-based index of the last listing to save (inclusive). Set 0 to ignore this limit and rely solely on max_businesses. | 0 | No |
requests_per_second | Integer | Max requests per second. Applies to API calls (all modes) and detail page fetches (basic/deep modes). Lower values (e.g., 2-5) are safer, higher values (e.g., 10+) faster. | 12 | No |
proxyConfiguration | Object | Apify proxy settings (Automatic recommended) or custom proxy configuration. | {} | No |
🔹 Example Input
    {
      "search_what": "Hotels",
      "search_where": "Hamburg",
      "search_mode": "deep_search",
      "max_businesses": 200,
      "start_index": 51,
      "requests_per_second": 8,
      "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": ["RESIDENTIAL"]
      }
    }
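
The same input can be assembled and submitted programmatically. The sketch below uses the `apify-client` Python package; `build_run_input` and `run_actor` are hypothetical helper names, the defaults are copied from the parameter table above, and the Actor ID must be taken from the Actor's page in the Apify Console, since it is not stated in this README.

```python
from typing import Any, Dict, Iterator, Optional

VALID_MODES = ("fast_search", "basic_search", "deep_search")


def build_run_input(
    search_what: str,
    search_where: str,
    search_mode: str = "basic_search",
    max_businesses: int = 64,
    start_index: int = 1,
    end_index: int = 0,
    requests_per_second: int = 12,
    proxy_configuration: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build an input object using the documented field names and defaults."""
    if search_mode not in VALID_MODES:
        raise ValueError(f"search_mode must be one of {VALID_MODES}")
    return {
        "search_what": search_what,
        "search_where": search_where,
        "search_mode": search_mode,
        "max_businesses": max_businesses,
        "start_index": start_index,
        "end_index": end_index,
        "requests_per_second": requests_per_second,
        "proxyConfiguration": proxy_configuration or {},
    }


def run_actor(token: str, actor_id: str, run_input: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Start a run, wait for it to finish, and yield the dataset records."""
    # Imported lazily so the input builder works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("The Actor run did not return any run details")
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()
```

A call such as `run_actor(token, actor_id, build_run_input("Hotels", "Hamburg", search_mode="deep_search"))` would then iterate over the scraped records once the run completes.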
📤 Output Data Structure
Each record in the dataset is a JSON object. The exact fields depend on the selected `search_mode`.
🔹 `fast_search` Example Output
    {
      "index": 1,
      "url": "https://www.gelbeseiten.de/gsbiz/abc123xyz",
      "name": "SchnellTest GmbH",
      "bewertung": 4.5,
      "bewertungen": 8,
      "besteBranche": "Testdienste",
      "telefonnummer": "040 1234567",
      "emaillink": "info@schnelltest.de", // Decoded from Base64
      "base64_emaillink": "aW5mb0BzY2huZWxsdGVzdC5kZQ==", // Raw Base64
      "webseitelink": "https://schnelltest.de", // Decoded from Base64
      "base64_webseitelink": "aHR0cHM6Ly9zY2huZWxsdGVzdC5kZQ==", // Raw Base64
      "adresse_from_search": "Teststraße 1, 20095 Hamburg Neustadt", // Address snippet from search results
      "scraped_at": "2025-04-30T09:30:00.123Z"
    }
🔹 `basic_search` Example Output
    {
      "index": 51, // Example if start_index was 51
      "url": "https://www.gelbeseiten.de/gsbiz/abc123xyz",
      "name": "Hotel Hanseatic", // Extracted from detail page
      "email": "info@hotel-hanseatic.de", // Extracted from detail page
      "website": "http://www.hotel-hanseatic.de", // Extracted from detail page
      "beschreibung": "Gemütliches Hotel im Herzen von St. Georg.", // Extracted from detail page
      "branche": "Hotels", // Extracted from detail page
      "social_media": { // Extracted from detail page
        "facebook": "https://facebook.com/hotelhanseatic"
      },
      "address": "Steindamm 50, 20099 Hamburg St. Georg", // Address from detail page
      "telefonnummer": "040 9876543", // Extracted from detail page
      "scraped_at": "2025-04-30T09:35:00.789Z"
    }
Note: Fetches only from the detail page, does not include search result summary data.
🔹 `deep_search` Example Output
Combines `fast_search` summary data with all available detail page data.
    {
      // --- Fields from fast_search (search results page) ---
      "index": 201,
      "url": "https://www.gelbeseiten.de/gsbiz/abc123xyz",
      "name": "Muster Restaurant",
      "bewertung": 4.8,
      "bewertungen": 55,
      "besteBranche": "Restaurants",
      "telefonnummer": "040 1122334",
      "emaillink": "reservierung@muster-restaurant.de",
      "base64_emaillink": "cmVzZXJ2aWVydW5nQG11c3Rlci1yZXN0YXVyYW50LmRl",
      "webseitelink": "https://www.muster-restaurant.de",
      "base64_webseitelink": "aHR0cHM6Ly93d3cubXVzdGVyLXJlc3RhdXJhbnQuZGU=",
      "adresse_from_search": "Musterweg 10, 20457 Hamburg Altstadt",
      "scraped_at": "2025-04-30T09:40:00.456Z",
      // --- Additional fields from deep_search (detail page) ---
      "email": "reservierung@muster-restaurant.de",
      "website": "https://www.muster-restaurant.de", // The same as webseitelink
      "beschreibung": "Moderne deutsche Küche mit saisonalen Zutaten.",
      "oeffnungszeiten": { // Example structure
        "Mo.": "Ruhetag",
        "Di.-Sa.": "18:00 - 23:00",
        "So.": "12:00 - 15:00"
      },
      "branche": "Restaurant; Deutsche Küche", // Can be more detailed than besteBranche
      "leistungsumfang": "Abendessen, Mittagstisch (So), Terrasse",
      "services": ["Restaurant", "Deutsche Küche", "Terrasse"],
      "unternehmensinformationen": { // Example structure
        "gründungsjahr": ["2010"],
        "parkplätze": ["vorhanden"]
      },
      "ausbildung": null, // Or text/list if available
      "zahlungsmittel": ["EC-Karte", "Kreditkarte", "Bar"],
      "social_media": {
        "instagram": "https://instagram.com/musterrestaurant"
      },
      "google_maps_url": "
Each record includes additional fields depending on the `search_mode` selected. See below for a full field reference:
Key | Available in | Description |
---|---|---|
name | fast/basic/deep | Company name from either the listing or detail page. |
adresse_from_search | fast/deep | Address snippet from listing page. |
address | basic/deep | Street address from detail page. |
telefonnummer | fast/basic/deep | Phone number from either the listing or detail page. |
bewertung | fast/deep | Average rating (numeric). |
bewertungen | fast/deep | Number of reviews. |
besteBranche | fast/deep | Primary branch/industry from listing. |
branche | basic/deep | Industry from detail page. |
email | basic/deep | Email provided on the detail page. |
emaillink | fast/deep | Email address decoded from the Base64 attribute on the listing. |
base64_emaillink | fast/deep | Raw Base64 email data attribute. |
webseitelink | fast/deep | Decoded website link. |
base64_webseitelink | fast/deep | Raw Base64 website data attribute. |
beschreibung | basic/deep | Business description on the detail page. |
oeffnungszeiten | deep | Opening hours by day. |
leistungsumfang | deep | Scope of services. |
services | deep | List of services offered. |
unternehmensinformationen | deep | Additional company information. |
ausbildung | deep | Education or training information. |
zahlungsmittel | deep | Accepted payment methods. |
social_media | basic/deep | Social media links. |
google_maps_url | deep | Google Maps search URL. |
faq | deep | List of FAQ Q&A pairs (if any). |
Note: Any field may be `null` if not present on the page.
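
Because the raw `base64_emaillink` and `base64_webseitelink` values are kept alongside their decoded counterparts, the decoding can be reproduced or re-checked downstream. The following is a minimal sketch using only the standard library; `decode_link` is a hypothetical helper, and it assumes the raw values are standard Base64 of UTF-8 text, as the example records above suggest.

```python
import base64
import binascii
from typing import Optional


def decode_link(encoded: Optional[str]) -> Optional[str]:
    """Decode a raw Base64 attribute, returning None for missing or invalid input."""
    if not encoded:
        # Fields may be null when the listing has no email or website.
        return None
    try:
        return base64.b64decode(encoded, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        # Treat malformed data as "no value" rather than failing the pipeline.
        return None


record = {
    "emaillink": "info@schnelltest.de",
    "base64_emaillink": "aW5mb0BzY2huZWxsdGVzdC5kZQ==",
}
# Cross-check the decoded field against the raw attribute.
is_consistent = decode_link(record["base64_emaillink"]) == record["emaillink"]
```

Returning `None` on failure mirrors the note above that any field may be `null`.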
⚙️ Usage
- Configure inputs in the "Input" tab (set `search_what`, `search_where`, etc.).
- Choose a proxy mode. Automatic Apify Proxy is recommended for reliability.
- Click Start.
- Monitor progress in the Log tab.
- Access results under Storage → Dataset.
🎯 Use Cases
- Lead generation and contact harvesting.
- Market research and competitor analysis.
- Local SEO and business directory creation.
- Data enrichment pipelines on Apify.
💲 Pricing
- Monthly actor rental: $15.
- 1000 listings, `fast_search`: ≈ $0.01, ~1.20 min run.
- 1000 listings, `basic_search`: ≈ $0.04, ~11.00 min run.
- 1000 listings, `deep_search`: ≈ $0.05, ~12.00 min run.
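
For budgeting, the figures above can be turned into a rough monthly estimate. This sketch assumes usage cost scales linearly with the number of listings, which the README does not state; the per-1000 rates are the approximate values listed above, and `estimate_monthly_cost` is a hypothetical helper.

```python
MONTHLY_RENTAL_USD = 15.00

# Approximate usage cost in USD per 1000 listings, taken from the list above.
USAGE_USD_PER_1000 = {
    "fast_search": 0.01,
    "basic_search": 0.04,
    "deep_search": 0.05,
}


def estimate_monthly_cost(runs):
    """Estimate rental plus usage for an iterable of (listings, search_mode) pairs.

    Assumes linear scaling from the per-1000 figures; real costs depend on
    proxy usage and platform pricing and may differ.
    """
    usage = sum(
        listings / 1000 * USAGE_USD_PER_1000[mode] for listings, mode in runs
    )
    return MONTHLY_RENTAL_USD + usage


# Example: one 10,000-listing deep run and one 5,000-listing basic run.
total = estimate_monthly_cost([(10_000, "deep_search"), (5_000, "basic_search")])
```

At these rates the fixed rental dominates: the example adds roughly $0.70 of usage to the $15 monthly fee.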
🔗 Integrations
- Scheduler: automate daily/weekly runs.
- Webhooks: trigger downstream workflows on completion.
- API: programmatic control via Apify API.
- Composer: chain with other Actors (e.g., cleaning, enrichment).
🧰 Technical Notes
- Async HTTP via `httpx`, HTML parsing via `BeautifulSoup4` + `lxml`.
- Custom `RateLimiter` for throttling API and detail requests, with randomized delays to reduce detectability.
- The scraper deliberately bypasses `robots.txt` directives to ensure complete data retrieval. Use responsibly.
- Gelbe Seiten Base64-encodes the email and website links in its listings. The Actor outputs both the raw and the decoded value for each.
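
To illustrate the idea of throttling with randomized delays, here is a minimal asyncio sketch. It is not the Actor's own `RateLimiter`, whose implementation is not shown in this README; the class name `SketchRateLimiter`, the `jitter` parameter, and the slot-reservation approach are all assumptions made for illustration.

```python
import asyncio
import random
import time


class SketchRateLimiter:
    """Spaces requests at least 1/requests_per_second apart, plus random jitter."""

    def __init__(self, requests_per_second: float, jitter: float = 0.25):
        if requests_per_second <= 0:
            raise ValueError("requests_per_second must be positive")
        self._interval = 1.0 / requests_per_second
        self._jitter = jitter  # extra spacing, as a fraction of the interval
        self._next_slot = 0.0  # earliest monotonic time for the next request

    def reserve(self, now: float) -> float:
        """Book the next free slot and return how many seconds to wait for it."""
        start = max(now, self._next_slot)
        # Jitter only ever lengthens the gap, so the configured rate is a ceiling.
        spacing = self._interval * (1.0 + random.uniform(0.0, self._jitter))
        self._next_slot = start + spacing
        return start - now

    async def wait(self) -> None:
        # reserve() contains no await, so concurrent tasks cannot interleave in it.
        delay = self.reserve(time.monotonic())
        if delay > 0:
            await asyncio.sleep(delay)


async def fetch_all(urls, limiter, fetch):
    """Call the supplied async fetch(url) for each URL, throttled by the limiter."""
    async def one(url):
        await limiter.wait()
        return await fetch(url)

    return await asyncio.gather(*(one(u) for u in urls))
```

Reserving slots up front keeps concurrent tasks ordered without a lock, and the jitter avoids a perfectly regular request cadence.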
🛠️ Maintainer
- Author: Azquaier
- Contact: 📧 mail@azquaier.xyz
- Website: 🌍 azquaier.xyz