Gelbe Seiten Scraper
3 days trial then $10.00/month - No credit card required now
Gather leads and information from one of Germany's most comprehensive business directories, Gelbe Seiten. Download your data as HTML table, JSON, CSV, XML, Excel, RSS, or JSONL.
This scraper is a specialized information-gathering and lead-generation tool for the German market.
It is designed to extract all information from Gelbe Seiten listings, including emails, phone numbers, addresses, reviews, Booking.com information, and more.
Make use of one of the most comprehensive business directories in Germany and gather all the information you need in one go.
Why choose this scraper?
- Easy to use: Just enter your search query and let the scraper do the rest.
- No page limit: This scraper can handle any number of pages, and will automatically stop when it reaches the end of the results.
- Deep extraction: This scraper can extract all information from a listing, including reviews, photos, and more.
- Wide range of output formats: Export your data as CSV, Excel, JSON, XML, and more.
- Committed to quality: We're constantly improving our scrapers to ensure you get the best data possible.
- Technical support and continuous improvements: We're always here to help with any issues you might have. If the scraper encounters information it cannot yet handle, it logs a warning and continues scraping the rest of the data; just open an issue with the log output and we'll get to work on it.
- Fast: Our scrapers are designed to be fast, so you can get the data you need quickly. Even with detailed per-listing extraction, the scraper takes only ~45 s to scrape 1,500 listings, which is over 30 listings per second!
- Reliable: The scraper is resistant to malformed data and can automatically recover from most errors. If the scraper is stopped and restarted (or if Apify migrates it to a different server), it continues where it left off. Thanks to its built-in deduplication engine, it never scrapes the same listing twice, which reduces the post-processing work you might have to do.
- Strong typing: The output follows a strict, typed schema, so the data you get is clean and consistent.
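The deduplication and resume behavior described under Reliable isn't documented in detail, so the following is only a minimal sketch of the general idea, not the actor's actual code. It assumes listings are keyed by the id field from the output schema and that the set of already-seen ids can be stored as JSON between runs; the SeenListings class and its methods are hypothetical.

```typescript
// Hypothetical sketch of id-based deduplication with resumable state.
// It is NOT the actor's actual implementation, only an illustration.
interface ListingRef {
  id: string;
}

class SeenListings {
  private readonly seen: Set<string>;

  constructor(savedIds: string[] = []) {
    this.seen = new Set(savedIds);
  }

  // True the first time an id is offered, false for every repeat.
  markIfNew(id: string): boolean {
    if (this.seen.has(id)) return false;
    this.seen.add(id);
    return true;
  }

  // Keep only listings whose id has not been seen in this or a restored run.
  filterNew<T extends ListingRef>(batch: T[]): T[] {
    return batch.filter((item) => this.markIfNew(item.id));
  }

  // Persist state (e.g. to a key-value store) before a restart or migration.
  serialize(): string {
    return JSON.stringify(Array.from(this.seen));
  }

  // Rebuild the seen-set from a previously serialized snapshot.
  static restore(snapshot: string): SeenListings {
    return new SeenListings(JSON.parse(snapshot) as string[]);
  }
}
```

Keeping the state as a plain list of ids keeps the snapshot small and easy to write to whatever storage survives a restart. Duplicates are dropped both within a batch and across a restored run.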
Input
The scraper has only one required input, the query parameter. This is the search query you would enter on the Gelbe Seiten website. The other inputs are:
- location: (optional) The location to search in. If not provided, the scraper searches all of Germany.
- sort: (optional) The sort order of the results, either relevance or bewertung (by rating). If not provided, the scraper falls back to Gelbe Seiten's default sort order, relevance.
- maxPages: (optional) The maximum number of pages to scrape. If not provided, the scraper scrapes all pages. (There are on average ~10 listings per page; the last page may have fewer.)
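Taken together, an input is a small object with a single required field. The following is a hedged TypeScript sketch that mirrors the parameters described above and makes their defaults explicit. The type and function names are hypothetical, the trimming and validation rules are assumptions rather than documented behavior, and the listing estimate only uses the ~10-per-page average, so it is approximate.

```typescript
// Hypothetical types mirroring the documented input parameters.
type SortOrder = "relevance" | "bewertung";

interface ScraperInput {
  query: string;       // required: the search term you'd type on gelbeseiten.de
  location?: string;   // optional: omitted means all of Germany
  sort?: SortOrder;    // optional: omitted means Gelbe Seiten's default, "relevance"
  maxPages?: number;   // optional: omitted means every results page
}

// The same input with the documented defaults made explicit.
interface ResolvedInput {
  query: string;
  location: string | null; // null = all of Germany
  sort: SortOrder;
  maxPages: number | null; // null = no page limit
}

function resolveInput(input: ScraperInput): ResolvedInput {
  // Assumption: surrounding whitespace is not meaningful, so trim it.
  const query = input.query.trim();
  if (query === "") {
    throw new Error("query is required");
  }
  // Assumption: treat a non-positive or fractional page limit as a client-side error.
  if (input.maxPages !== undefined && (!Number.isInteger(input.maxPages) || input.maxPages < 1)) {
    throw new Error("maxPages must be a positive integer");
  }
  return {
    query,
    location: input.location ?? null,
    sort: input.sort ?? "relevance",
    maxPages: input.maxPages ?? null,
  };
}

// Rough estimate of listings for a page budget, using the ~10-per-page average.
function estimateListings(pages: number, avgPerPage = 10): number {
  return pages * avgPerPage;
}
```

For example, resolveInput({ query: "Zahnarzt", location: "Berlin", maxPages: 5 }) keeps the given values and fills in sort: "relevance", and estimateListings(5) suggests roughly 50 listings.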
Output format
The scraper returns a dataset with one item per listing. Each item follows the schema below, written with the zod validation library:
```typescript
import { z } from 'zod';

// Hotels and similar listings carry a lot of additional information, provided via the embedded Booking.com widget.
// This part of the output is a parsed and cleaned version of that widget.
const BookingComInfoSchema = z.object({
  images: z.array(z.string()),
  score: z.number(),
  description: z.string(),
  outdoorArea: z.array(z.string()).optional(),
  distancing: z.array(z.string()).optional(),
  foodAndDrink: z.array(z.string()).optional(),
  safety: z.array(z.string()).optional(),
  cleaningAndDisinfection: z.array(z.string()).optional(),
  services: z.array(z.string()).optional(),
  internet: z.array(z.string()).optional(),
  general: z.array(z.string()).optional(),
  parking: z.array(z.string()).optional(),
  rooms: z.array(z.string()).optional(),
  paymentMethods: z.array(z.string()).optional(),
  openingHours: z.array(z.string()).optional(),
  receptionService: z.array(z.string()).optional(),
  foodSafety: z.array(z.string()).optional(),
  safetyFacilities: z.array(z.string()).optional(),
  poolAndWellness: z.array(z.string()).optional(),
  access: z.array(z.string()).optional(),
  activities: z.array(z.string()).optional(),
  publicTransport: z.array(z.string()).optional(),
  entertainmentAndFamilyOffers: z.array(z.string()).optional(),
  membershipAndServiceLanguages: z.array(z.string()).optional(),
  kitchen: z.array(z.string()).optional(),
  businessFacilities: z.array(z.string()).optional(),
  cleaningServices: z.array(z.string()).optional(),
  generalFacilities: z.array(z.string()).optional(),
  miscellaneous: z.array(z.string()).optional(),
  shops: z.array(z.string()).optional(),
  ski: z.array(z.string()).optional(),
});

// This part of the output is taken directly from gelbeseiten.de and passed through unchanged.
const ReviewSchema = z.object({
  text: z.string(),
  erstellungsDatumIso: z.string(),
  erstellungsDatumFormatiert: z.string(),
  bearbeitungsDatum: z.string().nullable(),
  bewertungBeiAnbieter: z.number().nullable(),
  geraetetypBewertungsabgabe: z.string().nullable(),
  bewertungsbogenTextUrl: z.string().nullable(),
  bewertungsKriteriumListe: z.array(
    z.object({
      kriterium: z.string(),
      bewertung: z.number(),
      text: z.string().nullable(),
    }),
  ).nullable(),
  bewertungsId: z.string().nullable(),
  bewertungsportal: z.string().nullable(),
  likeListe: z.array(z.object({
    benutzer: z.object({
      name: z.string().nullable(),
      nutzerprofilUrl: z.string().nullable(),
      nutzerbildUrl: z.string().nullable(),
    }),
  })).nullable(),
  partnerBewertungstextUrl: z.string().nullable(),
  bewertungNormiert: z.number().min(0).max(5),
  anzahlLikes: z.string().nullable(),
  anzahlKommentare: z.number(),
  reaktionListe: z.array(
    z.object({
      text: z.string(),
      erstellungsDatumFormatiert: z.string(),
      reaktionsTyp: z.string(),
      anzahlLikes: z.string().nullable(),
      erstellungsDatum: z.string(),
      benutzer: z.object({
        name: z.string().nullable(),
        nutzerprofilUrl: z.string().nullable(),
        nutzerbildUrl: z.string().nullable(),
      }),
      spamUrl: z.string().nullable(),
    }),
  ),
  verifikationListe: z.array(z.string()),
  bewertungTextAnzahl: z.number(),
  erstellungsDatum: z.string(),
  bewertungsbogenSterneUrl: z.string().nullable(),
  produkt: z.object({
    partnerName: z.string().nullable(),
    name: z.string().nullable(),
    information: z.string().nullable(),
  }),
  benutzer: z.object({
    name: z.string().nullable(),
    nutzerprofilUrl: z.string().nullable(),
    nutzerbildUrl: z.string().nullable(),
  }),
  titel: z.string().nullable(),
  partnerName: z.string().nullable(),
  spamUrl: z.string().nullable(),
});

const ExtraInfoSchema = z.object({
  brands: z.array(z.string()).optional(),
  memberships: z.array(z.string()).optional(),
  languages: z.array(z.string()).optional(),
  accessibility: z.array(z.string()).optional(),
});

const OutputSchema = z.object({
  // Search results
  id: z.string(),
  memberId: z.string().optional(),
  name: z.string(),
  logoURL: z.string().optional(),
  bestIndustry: z.string(),
  googleMapsAddress: z.string().optional(),
  address: z.string(),
  phone: z.string(),
  website: z.string().optional(),
  shortDescription: z.string().optional(),
  highlightLevel: z.number().optional(),
  partnerLevel: z.string().optional(),
  rating: z.number().optional(),
  ratingCount: z.number().optional(),

  // Detail page
  email: z.string().optional(),
  openingHours: z.array(z.object({
    day: z.string(),
    hours: z.array(
      z.object({
        closed: z.boolean(),
        from: z.string().optional(),
        to: z.string().optional(),
      }),
    ),
  })).optional(),
  additionalPhoneNumbers: z.array(z.object({
    title: z.string(),
    number: z.string(),
  })).optional(),
  menu: z.string().optional(),
  reviews: z.array(ReviewSchema).optional(),
  description: z.string().optional(),
  acceptedPaymentMethods: z.array(z.string()).optional(),
  images: z.array(z.object({
    src: z.string(),
    caption: z.string(),
  })).optional(),
  socialAccounts: z.array(z.object({
    url: z.string(),
    // Each enum value listed once ('unknown' previously appeared twice)
    type: z.enum([
      'unknown', 'facebook', 'twitter', 'instagram',
      'linkedin', 'youtube', 'google', 'google_maps', 'pinterest',
      'tiktok', 'snapchat', 'whatsapp', 'telegram',
      'xing',
    ]),
  })).optional(),
  brochure: z.string().optional(),
  openPositions: z.array(z.record(z.string())).optional(),
  faq: z.array(z.object({
    question: z.string(),
    answer: z.string(),
  })).optional(),
  industries: z.array(z.string()).optional(),
  services: z.array(z.string()).optional(),
  extraInfo: ExtraInfoSchema.optional(),
  bookingInfo: BookingComInfoSchema.optional(),
  // The ids of any other listings that are related to this one
  relatedIds: z.array(z.string()).optional(),
});
```
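For lead generation you often need only a handful of these fields. A minimal sketch, assuming the dataset has been exported as JSON and parsed into an array: the Listing type copies a subset of OutputSchema as a plain TypeScript interface (so the example needs no zod dependency), and Lead and toLeads are hypothetical names.

```typescript
// A subset of OutputSchema written as a plain TypeScript type (no zod needed).
interface Listing {
  id: string;
  name: string;
  bestIndustry: string;
  address: string;
  phone: string;
  website?: string;
  email?: string;
  rating?: number;
  ratingCount?: number;
}

// Hypothetical flat record, e.g. for a CRM import or mailing list.
interface Lead {
  name: string;
  industry: string;
  address: string;
  phone: string;
  email: string;
  website: string; // empty string when the listing has no website
}

// Keep listings with a non-blank email. If minRating > 0, listings
// without a rating are dropped as well.
function toLeads(listings: Listing[], minRating = 0): Lead[] {
  const leads: Lead[] = [];
  for (const l of listings) {
    const email = l.email?.trim();
    if (!email) continue;
    if (minRating > 0 && (l.rating === undefined || l.rating < minRating)) continue;
    leads.push({
      name: l.name,
      industry: l.bestIndustry,
      address: l.address,
      phone: l.phone,
      email,
      website: l.website ?? "",
    });
  }
  return leads;
}
```

Because email is optional in the schema, filtering on it first is what turns the raw dataset into a contactable lead list; the rating threshold is an optional extra filter.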
The schema is enforced, so every item in the dataset conforms to OutputSchema.
If the data changes, for example when additional properties are added, the schema will be updated accordingly and you'll be notified of the changes.
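Since properties may be added to the schema over time, downstream code can stay robust by checking only the fields it relies on and ignoring the rest. The guard below is a consumer-side sketch, not part of the scraper: it checks the fields OutputSchema declares as required strings (id, name, bestIndustry, address, phone), and isCoreListing and partitionItems are hypothetical names.

```typescript
// Fields that OutputSchema declares as required strings.
const REQUIRED_STRING_FIELDS = ["id", "name", "bestIndustry", "address", "phone"] as const;

type RequiredField = (typeof REQUIRED_STRING_FIELDS)[number];
type CoreListing = Record<RequiredField, string>;

// Type guard: true if `value` is an object carrying all required string fields.
// Extra properties (e.g. ones added in a later schema version) are ignored.
function isCoreListing(value: unknown): value is CoreListing {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return REQUIRED_STRING_FIELDS.every((key) => typeof record[key] === "string");
}

// Split parsed dataset items into usable listings and rejects for inspection.
function partitionItems(items: unknown[]): { valid: CoreListing[]; invalid: unknown[] } {
  const valid: CoreListing[] = [];
  const invalid: unknown[] = [];
  for (const item of items) {
    if (isCoreListing(item)) valid.push(item);
    else invalid.push(item);
  }
  return { valid, invalid };
}
```

Checking a minimal core rather than every field means a newly added property never breaks the consumer, while a removed or renamed core field surfaces immediately in the invalid bucket.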
Target Audience
This scraper is designed for anyone who needs to gather information from Gelbe Seiten listings, including:
- Marketing/Sales teams: Use the scraper to gather leads for your sales team or to find potential customers for your marketing campaigns.
- Business owners: Use the scraper to gather information about your competitors or to find potential partners.
- Researchers: Use the scraper to gather data for your research projects or to find information for your academic papers.
- Journalists: Use the scraper to gather information for your articles or to find potential sources for your stories.
- Data analysts: Use the scraper to gather data for your analysis projects or to find information for your reports.
- Anyone else who needs to gather information from Gelbe Seiten listings: Use the scraper to gather information for any other purpose you might have.
Actor Metrics
- 10 monthly users
- 5 stars
- >99% of runs succeeded
- 3.3 days response time
- Created in Aug 2024
- Modified 2 months ago