Gelbe Seiten Scraper

Gather leads and information from one of Germany's most comprehensive business directories, Gelbe Seiten. Download your data as HTML table, JSON, CSV, XML, Excel, RSS, or JSONL.

This scraper is a specialized information-gathering and lead-generation tool for the German market.
It's designed to extract all available information from Gelbe Seiten listings, including emails, phone numbers, addresses, reviews, Booking.com details, and more.
Make use of one of the most comprehensive business directories in Germany and gather all the information you need in one go.

Why choose this scraper?

  • Easy to use: Just enter your search query and let the scraper do the rest.
  • No page limit: This scraper can handle any number of pages, and will automatically stop when it reaches the end of the results.
  • Deep Extraction: This scraper can extract all information from a listing, including reviews, photos, and more.
  • Wide range of output formats: Export your data as CSV, Excel, JSON, XML, and more.
  • Committed to quality: We're constantly improving our scrapers to ensure you get the best data possible.
  • Technical support and continuous improvements: We're always here to help with any issues you might have. If the scraper encounters information it cannot yet handle, it logs a warning but continues scraping the rest of the data; just open an issue with the log output and we'll get to work on it.
  • Fast: Our scrapers are designed to be fast, so you can get the data you need quickly and easily. Even with detailed per-listing extraction, the scraper takes only ~45 s for 1,500 listings; that's over 30 listings per second!
  • Reliable: The scraper is resilient to malformed data and can automatically recover from most errors. Even when it is stopped and restarted (or if Apify migrates it to a different server), it continues where it left off. And thanks to its built-in deduplication engine, it never scrapes the same listing twice, reducing any post-processing work you might have to do.
  • Strong typing: We provide a strong typing system for the output, so you can be sure that the data you get is clean and consistent.

Input

The scraper has only a single required input, the query parameter: the search query you would enter on the Gelbe Seiten website. The remaining inputs are optional (an example input follows the list):

  • location: (optional) The location to search in. If not provided, the scraper searches all of Germany.
  • sort: (optional) The sort order of the results, either relevance or bewertung (rating). If not provided, the scraper falls back to Gelbe Seiten's default sort order, relevance.
  • maxPages: (optional) The maximum number of pages to scrape. If not provided, the scraper scrapes all pages. (Note that there are on average ~10 listings per page, and the last page may contain fewer.)
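
For example, an input that searches for dentists in Berlin, sorted by rating and limited to the first five result pages, could look like this (the query and location values are only illustrative):

const input = {
    query: 'Zahnarzt',    // required: the search term, as you would type it on gelbeseiten.de
    location: 'Berlin',   // optional: omit to search all of Germany
    sort: 'bewertung',    // optional: 'relevance' (default) or 'bewertung'
    maxPages: 5,          // optional: omit to scrape all result pages
};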

Output format

The scraper returns a dataset with one item per listing; each item is structured as follows:

import { z } from 'zod';

// Hotels etc. have a lot of additional information that is provided via the embedded Booking.com widget.
// This part of the output is a parsed and cleaned version of that widget.
const BookingComInfoSchema = z.object({
    images: z.array(z.string()),
    score: z.number(),
    description: z.string(),
    outdoorArea: z.array(z.string()).optional(),
    distancing: z.array(z.string()).optional(),
    foodAndDrink: z.array(z.string()).optional(),
    safety: z.array(z.string()).optional(),
    cleaningAndDisinfection: z.array(z.string()).optional(),
    services: z.array(z.string()).optional(),
    internet: z.array(z.string()).optional(),
    general: z.array(z.string()).optional(),
    parking: z.array(z.string()).optional(),
    rooms: z.array(z.string()).optional(),
    paymentMethods: z.array(z.string()).optional(),
    openingHours: z.array(z.string()).optional(),
    receptionService: z.array(z.string()).optional(),
    foodSafety: z.array(z.string()).optional(),
    safetyFacilities: z.array(z.string()).optional(),
    poolAndWellness: z.array(z.string()).optional(),
    access: z.array(z.string()).optional(),
    activities: z.array(z.string()).optional(),
    publicTransport: z.array(z.string()).optional(),
    entertainmentAndFamilyOffers: z.array(z.string()).optional(),
    membershipAndServiceLanguages: z.array(z.string()).optional(),
    kitchen: z.array(z.string()).optional(),
    businessFacilities: z.array(z.string()).optional(),
    cleaningServices: z.array(z.string()).optional(),
    generalFacilities: z.array(z.string()).optional(),
    miscellaneous: z.array(z.string()).optional(),
    shops: z.array(z.string()).optional(),
    ski: z.array(z.string()).optional(),
});

// Reviews are taken directly from gelbeseiten.de and passed through unchanged (field names are German).
const ReviewSchema = z.object({
    text: z.string(),
    erstellungsDatumIso: z.string(),
    erstellungsDatumFormatiert: z.string(),
    bearbeitungsDatum: z.string().nullable(),
    bewertungBeiAnbieter: z.number().nullable(),
    geraetetypBewertungsabgabe: z.string().nullable(),
    bewertungsbogenTextUrl: z.string().nullable(),
    bewertungsKriteriumListe: z.array(
        z.object({
            kriterium: z.string(),
            bewertung: z.number(),
            text: z.string().nullable(),
        }),
    ).nullable(),
    bewertungsId: z.string().nullable(),
    bewertungsportal: z.string().nullable(),
    likeListe: z.array(z.object({
        benutzer: z.object({
            name: z.string().nullable(),
            nutzerprofilUrl: z.string().nullable(),
            nutzerbildUrl: z.string().nullable(),
        }),
    })).nullable(),
    partnerBewertungstextUrl: z.string().nullable(),
    bewertungNormiert: z.number().min(0).max(5),
    anzahlLikes: z.string().nullable(),
    anzahlKommentare: z.number(),
    reaktionListe: z.array(
        z.object({
            text: z.string(),
            erstellungsDatumFormatiert: z.string(),
            reaktionsTyp: z.string(),
            anzahlLikes: z.string().nullable(),
            erstellungsDatum: z.string(),
            benutzer: z.object({
                name: z.string().nullable(),
                nutzerprofilUrl: z.string().nullable(),
                nutzerbildUrl: z.string().nullable(),
            }),
            spamUrl: z.string().nullable(),
        }),
    ),
    verifikationListe: z.array(z.string()),
    bewertungTextAnzahl: z.number(),
    erstellungsDatum: z.string(),
    bewertungsbogenSterneUrl: z.string().nullable(),
    produkt: z.object({
        partnerName: z.string().nullable(),
        name: z.string().nullable(),
        information: z.string().nullable(),
    }),
    benutzer: z.object({
        name: z.string().nullable(),
        nutzerprofilUrl: z.string().nullable(),
        nutzerbildUrl: z.string().nullable(),
    }),
    titel: z.string().nullable(),
    partnerName: z.string().nullable(),
    spamUrl: z.string().nullable(),
});

const ExtraInfoSchema = z.object({
    brands: z.array(z.string()).optional(),
    memberships: z.array(z.string()).optional(),
    languages: z.array(z.string()).optional(),
    accessibility: z.array(z.string()).optional(),
});

const OutputSchema = z.object({
    // Fields taken from the search results
    id: z.string(),
    memberId: z.string().optional(),
    name: z.string(),
    logoURL: z.string().optional(),
    bestIndustry: z.string(),
    googleMapsAddress: z.string().optional(),
    address: z.string(),
    phone: z.string(),
    website: z.string().optional(),
    shortDescription: z.string().optional(),
    highlightLevel: z.number().optional(),
    partnerLevel: z.string().optional(),
    rating: z.number().optional(),
    ratingCount: z.number().optional(),

    // Fields taken from the listing's detail page
    email: z.string().optional(),
    openingHours: z.array(z.object({
        day: z.string(),
        hours: z.array(
            z.object({
                closed: z.boolean(),
                from: z.string().optional(),
                to: z.string().optional(),
            }),
        ),
    })).optional(),
    additionalPhoneNumbers: z.array(z.object({
        title: z.string(),
        number: z.string(),
    })).optional(),
    menu: z.string().optional(),
    reviews: z.array(ReviewSchema).optional(),
    description: z.string().optional(),
    acceptedPaymentMethods: z.array(z.string()).optional(),
    images: z.array(z.object({
        src: z.string(),
        caption: z.string(),
    })).optional(),
    socialAccounts: z.array(z.object({
        url: z.string(),
        type: z.enum([
            'unknown', 'facebook', 'twitter', 'instagram',
            'linkedin', 'youtube', 'google', 'pinterest',
            'tiktok', 'snapchat', 'whatsapp', 'telegram',
            'xing',
        ]),
    })).optional(),
    brochure: z.string().optional(),
    openPositions: z.array(z.record(z.string())).optional(),
    faq: z.array(z.object({
        question: z.string(),
        answer: z.string(),
    })).optional(),
    industries: z.array(z.string()).optional(),
    services: z.array(z.string()).optional(),
    extraInfo: ExtraInfoSchema.optional(),
    bookingInfo: BookingComInfoSchema.optional(),
    // The ids of any other listings that are related to this one
    relatedIds: z.array(z.string()).optional(),
});

Note that the schema is enforced, so you can be sure that the data you get is clean and consistent.
If the data ever changes, e.g. if additional properties are added, the schema will be updated accordingly and you'll be notified of the changes.
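
If you consume the results programmatically, you can reuse the schema above to validate each dataset item on your side. The following is a minimal sketch using the apify-client and zod packages together with the OutputSchema defined above; the APIFY_TOKEN environment variable and the input values are assumptions you'll want to adapt to your setup:

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Run the actor and wait for it to finish (input as in the example above).
const run = await client.actor('plowdata/gelbe-seiten').call({
    query: 'Zahnarzt',
    location: 'Berlin',
});

// Download the dataset and validate every item against OutputSchema from above.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
const listings = items.map((item) => OutputSchema.parse(item));
console.log(`Validated ${listings.length} listings`);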

Target Audience

This scraper is designed for anyone who needs to gather information from Gelbe Seiten listings, including:

  • Marketing/Sales teams: Use the scraper to gather leads for your sales team or to find potential customers for your marketing campaigns.
  • Business owners: Use the scraper to gather information about your competitors or to find potential partners.
  • Researchers: Use the scraper to gather data for your research projects or to find information for your academic papers.
  • Journalists: Use the scraper to gather information for your articles or to find potential sources for your stories.
  • Data analysts: Use the scraper to gather data for your analysis projects or to find information for your reports.
  • Anyone else: Use the scraper to gather information from Gelbe Seiten listings for whatever other purpose you might have.