
PagineGialle Extractor
Pricing
$60.00/month + usage

Extract precise Italian business data from paginegialle.it. This Actor uses an optimized API to fetch Multisearch JSON, capturing names, addresses, phones, websites, emails, social links, ratings, and categories. Pagination, rate limiting, and deduplication by unique ID help keep the results complete and free of duplicate listings.
PagineGialle Extractor Pro
A powerful and robust Apify actor for extracting detailed business data from the Italian Yellow Pages website, PagineGialle.it. It is designed to bypass anti-scraping measures, automatically handle full pagination, and provide structured data for business analysis, marketing, or lead generation.
Key Features
- Multi-Category & Multi-City Scraping: Launch complex searches for multiple business categories across several cities in a single run.
- Smart Automatic Pagination: The actor automatically detects the total number of result pages for each search and scrapes all of them, ensuring comprehensive data collection.
- Residential Proxy Support: Seamlessly integrates with Apify's residential proxies (`useProxy`) to avoid blocks and bypass anti-scraping measures, which is essential for getting complete results from PagineGialle.
- Results Filtering: Refine your data collection by choosing to keep only businesses that have an email address or a phone number listed.
- High Concurrency: Configure the number of parallel requests (`maxConcurrency`) to optimize scraping speed according to your needs.
- Rich, Structured Data: Extracts a wide range of data fields, including contact information, GPS coordinates, social media links, user ratings, and more.
- Stable Environment: Built on a stable version of Python (3.12) to ensure reliability and compatibility.
How to Use
- Go to the Actor on the Apify platform.
- Click the "Start" button.
- Configure your search in the "Input" tab.
  - Add the desired categories (e.g., `ristoranti`, `hotel`).
  - Add the desired cities (e.g., `roma`, `milano`).
- Enable Proxies: For complete results, it is highly recommended to leave the `useProxy` option checked (`true`). Without it, the website will likely only return the first page of results.
- Run the Actor and wait for the run to finish.
- Retrieve your data from the dataset's "Output" tab.
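The same steps can also be scripted instead of clicked through. The following is a minimal sketch, assuming the official `apify-client` Python package is installed and an API token is available in the `APIFY_TOKEN` environment variable. The actor ID `your-username/paginegialle-extractor-pro` is a placeholder (this page does not state the real one), and `fetch_results` and `main` are hypothetical helper names.

```python
import os


def fetch_results(client, actor_id, run_input):
    """Start the actor, wait for the run to finish, and return its dataset items."""
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    # Each finished run stores its results in a default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


def main():
    # Imported here so fetch_results can be exercised without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    items = fetch_results(
        client,
        "your-username/paginegialle-extractor-pro",  # placeholder actor ID (assumption)
        {
            "categories": ["ristoranti", "hotel"],
            "cities": ["roma", "milano"],
            "useProxy": True,
        },
    )
    print(f"Fetched {len(items)} businesses")
```

Calling `main()` would start one run and print the number of records returned. Passing the client in as an argument keeps `fetch_results` easy to test with a stand-in object.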
Input Configuration
The actor accepts the following parameters:
Parameter | Type | Description | Default Value |
---|---|---|---|
categories | Array | A list of business categories to search for. | ["ristoranti"] |
cities | Array | A list of cities in which to perform the searches. | ["roma"] |
useProxy | Boolean | Recommended. Uses Apify's residential proxies to avoid being blocked. | true |
maxConcurrency | Integer | The number of simultaneous requests to speed up scraping. | 5 |
filterByEmail | Boolean | If checked, only results containing an email address will be kept. | false |
filterByPhone | Boolean | If checked, only results containing a phone number will be kept. | false |
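To make the table concrete, here is a small sketch that builds an input object from the documented defaults and checks the basic types before a run. The helper names `build_input` and `search_pairs` are hypothetical, and the pairing of every category with every city is an assumption about how the actor combines the two lists; the README only says that multiple categories can be searched across several cities.

```python
from itertools import product

# Defaults copied from the parameter table above.
DEFAULT_INPUT = {
    "categories": ["ristoranti"],
    "cities": ["roma"],
    "useProxy": True,
    "maxConcurrency": 5,
    "filterByEmail": False,
    "filterByPhone": False,
}


def build_input(**overrides):
    """Merge overrides into the defaults and validate the documented types."""
    unknown = set(overrides) - set(DEFAULT_INPUT)
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    run_input = {**DEFAULT_INPUT, **overrides}
    for key in ("categories", "cities"):
        values = run_input[key]
        if not isinstance(values, list) or not values:
            raise ValueError(f"{key} must be a non-empty list")
        if not all(isinstance(v, str) and v.strip() for v in values):
            raise ValueError(f"{key} must contain non-empty strings")
    for key in ("useProxy", "filterByEmail", "filterByPhone"):
        if not isinstance(run_input[key], bool):
            raise ValueError(f"{key} must be a boolean")
    concurrency = run_input["maxConcurrency"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(concurrency, bool) or not isinstance(concurrency, int) or concurrency < 1:
        raise ValueError("maxConcurrency must be a positive integer")
    return run_input


def search_pairs(run_input):
    """List every (category, city) combination (assumed search expansion)."""
    return list(product(run_input["categories"], run_input["cities"]))
```

Under that assumption, two categories and two cities would expand to four searches, which is worth keeping in mind when estimating run time and usage cost.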
Output Schema
Each record in the output dataset will represent a single business and will have the following structure:
Field | Type | Description |
---|---|---|
businessName | String | The common name of the business. |
address | String | The full street address of the business. |
phoneNumber | String | The primary contact phone number. |
website | String | The URL of the business's official website. |
email | String | The contact email address. |
rating | Number | The average user rating (out of 5). |
reviews_count | Integer | The total number of user reviews. |
whatsapp | String | The WhatsApp business number, if available. |
facebook | String | The URL of the official Facebook page. |
instagram | String | The URL of the official Instagram profile. |
twitter | String | The URL of the official Twitter profile. |
latitude | String | The geographic latitude. |
longitude | String | The geographic longitude. |
zip_code | String | The postal code. |
city | String | The city where the business is located. |
province | String | The province (e.g., RM for Rome). |
opening_hours | String | A JSON string representing the opening hours schedule. |
description | String | A short description of the business activity. |
category | String | The category term used for the search. |
scraped_city | String | The city term used for the search. |
unique_id | String | A unique identifier for the business listing. |
timestamp | String | The UTC timestamp of when the data was scraped. |
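Because `latitude`, `longitude`, and `opening_hours` are delivered as strings, downstream code usually needs a small conversion step. The sketch below shows one way to do it, assuming each record is a plain dictionary with the field names from the table. The helpers `normalize` and `dedupe` are hypothetical, and the internal layout of the `opening_hours` JSON is not documented here, so it is parsed but not interpreted.

```python
import json


def normalize(record):
    """Return a copy with coordinates as floats and opening_hours parsed from JSON."""
    out = dict(record)
    for key in ("latitude", "longitude"):
        try:
            out[key] = float(record[key])
        except (KeyError, TypeError, ValueError):
            out[key] = None  # missing or unparsable coordinate
    raw = record.get("opening_hours")
    try:
        out["opening_hours"] = json.loads(raw) if raw else None
    except (TypeError, ValueError):
        out["opening_hours"] = None  # not valid JSON
    return out


def dedupe(records):
    """Keep the first record for each unique_id; records without an ID are all kept."""
    seen = set()
    unique = []
    for record in records:
        uid = record.get("unique_id")
        if uid is not None and uid in seen:
            continue
        if uid is not None:
            seen.add(uid)
        unique.append(record)
    return unique
```

Deduplicating on `unique_id` can be useful when results from several runs are merged, since the same business may appear under more than one category or city search.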
License
This project is licensed under the MIT License.