PagineGialle Extractor

Pricing: $60.00/month + usage

Developed by DATA2B

Maintained by Community

Extract precise Italian business data from paginegialle.it. This Actor fetches PagineGialle's Multisearch JSON through an optimized API and captures business names, addresses, phone numbers, websites, emails, social links, ratings, and categories. Pagination, rate limiting, and deduplication by unique ID keep the results reliable.
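Deduplication by unique ID can be pictured with a minimal Python sketch. It assumes each scraped record is a dict carrying the `unique_id` field from the output schema and keeps only the first record seen for each ID. The function name `dedupe_by_unique_id` is hypothetical, and this is an illustration, not the Actor's actual code.

```python
from typing import Any, Dict, Iterable, Iterator, Set


def dedupe_by_unique_id(records: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield records in order, skipping any whose unique_id was already seen.

    Hypothetical helper for illustration; not taken from the Actor's source.
    """
    seen: Set[str] = set()
    for record in records:
        uid = record.get("unique_id")
        if uid is None:
            # Without an ID there is nothing to compare, so keep the record (assumption).
            yield record
            continue
        if uid in seen:
            continue  # duplicate listing, e.g. returned by two overlapping searches
        seen.add(uid)
        yield record


rows = [
    {"unique_id": "a1", "businessName": "Trattoria Roma"},
    {"unique_id": "b2", "businessName": "Hotel Milano"},
    {"unique_id": "a1", "businessName": "Trattoria Roma"},
]
print([r["unique_id"] for r in dedupe_by_unique_id(rows)])  # ['a1', 'b2']
```

Using a generator keeps memory proportional to the number of distinct IDs rather than the number of records.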

Rating: 5.0 (1 review)

Total users: 4
Monthly users: 4
Runs succeeded: >99%
Last modified: 8 days ago

PagineGialle Extractor Pro

A powerful and robust Apify actor for extracting detailed business data from the Italian Yellow Pages website, PagineGialle.it. It is designed to bypass anti-scraping measures, automatically handle full pagination, and provide structured data for business analysis, marketing, or lead generation.

Key Features

  • Multi-Category & Multi-City Scraping: Launch complex searches for multiple business categories across several cities in a single run.
  • Smart Automatic Pagination: The actor automatically detects the total number of result pages for each search and scrapes all of them, ensuring comprehensive data collection.
  • Residential Proxy Support: Seamlessly integrates with Apify's residential proxies (useProxy) to avoid blocks and bypass anti-scraping measures, which is essential for getting complete results from PagineGialle.
  • Results Filtering: Keep only businesses that list an email address (filterByEmail), a phone number (filterByPhone), or both.
  • High Concurrency: Configure the number of parallel requests (maxConcurrency) to optimize scraping speed according to your needs.
  • Rich, Structured Data: Extracts a wide range of data fields, including contact information, GPS coordinates, social media links, user ratings, and more.
  • Stable Environment: Built on a stable version of Python (3.12) to ensure reliability and compatibility.
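The multi-category/multi-city search, automatic pagination, and maxConcurrency features above can be sketched in Python with `asyncio`. This is an illustration under stated assumptions. `fetch_page` is a hypothetical coroutine that returns a dict with the total result count (`"total"`) and that page's listings (`"items"`). `PAGE_SIZE = 20` is a placeholder, because the real page size and response format are not documented here.

```python
import asyncio
import itertools
import math
from typing import Any, Awaitable, Callable, Dict, List, Tuple

# Placeholder page size; the real value used by PagineGialle is not documented here.
PAGE_SIZE = 20

# Hypothetical signature: fetch_page(category, city, page) -> {"total": int, "items": [...]}
FetchPage = Callable[[str, str, int], Awaitable[Dict[str, Any]]]


def build_searches(categories: List[str], cities: List[str]) -> List[Tuple[str, str]]:
    """One search per (category, city) pair."""
    return list(itertools.product(categories, cities))


def page_count(total_results: int, page_size: int = PAGE_SIZE) -> int:
    """Number of result pages needed to cover total_results (at least 1)."""
    return max(1, math.ceil(total_results / page_size))


async def scrape_all(
    categories: List[str],
    cities: List[str],
    fetch_page: FetchPage,
    max_concurrency: int = 5,
) -> List[Any]:
    # One semaphore caps in-flight requests across every search (cf. maxConcurrency).
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(category: str, city: str, page: int) -> Dict[str, Any]:
        async with semaphore:
            return await fetch_page(category, city, page)

    async def scrape_search(category: str, city: str) -> List[Any]:
        # Page 1 reveals the total result count, which determines the remaining pages.
        first = await bounded_fetch(category, city, 1)
        pages = range(2, page_count(first["total"]) + 1)
        rest = await asyncio.gather(*(bounded_fetch(category, city, p) for p in pages))
        items: List[Any] = []
        for page in (first, *rest):
            items.extend(page["items"])
        return items

    per_search = await asyncio.gather(
        *(scrape_search(cat, city) for cat, city in build_searches(categories, cities))
    )
    return [item for items in per_search for item in items]
```

An `asyncio.Semaphore` is one common way to enforce a concurrency limit, since the permit is held only while a request is in flight; the Actor itself may use a different mechanism.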

How to Use

  1. Go to the Actor on the Apify platform.
  2. Click the "Start" button.
  3. Configure your search in the "Input" tab.
    • Add the desired categories (e.g., ristoranti, hotel).
    • Add the desired cities (e.g., roma, milano).
  4. Enable Proxies: For complete results, it is highly recommended to leave the useProxy option checked (true). Without it, the website will likely only return the first page of results.
  5. Run the Actor and wait for the run to finish.
  6. Retrieve your data from the dataset's "Output" tab.
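The same run can also be started from code. The following is a minimal sketch using the official `apify-client` Python package. It assumes an `APIFY_TOKEN` environment variable. The Actor ID is not stated in this README, so it must be copied from the Actor's page. `run_extractor` and `RUN_INPUT` are hypothetical names.

```python
import os
from typing import Any, Dict, Iterator, Optional

# Input mirroring the "How to Use" steps; keys follow the Input Configuration table.
RUN_INPUT: Dict[str, Any] = {
    "categories": ["ristoranti", "hotel"],
    "cities": ["roma", "milano"],
    "useProxy": True,  # recommended: without proxies likely only page 1 is returned
}


def run_extractor(
    actor_id: str, run_input: Dict[str, Any], token: Optional[str] = None
) -> Iterator[Dict[str, Any]]:
    """Start the Actor, wait for the run to finish, and iterate over its dataset.

    Hypothetical helper; actor_id must be the real ID from the Actor's page.
    """
    # Imported lazily so this module loads even where apify-client is not installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token or os.environ["APIFY_TOKEN"])
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("The Actor run did not return run details")
    # Each run writes its results to a default dataset.
    return client.dataset(run["defaultDatasetId"]).iterate_items()
```

For example, `for item in run_extractor("<ACTOR_ID>", RUN_INPUT): print(item["businessName"])`, with `<ACTOR_ID>` replaced by the real ID.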

Input Configuration

The actor accepts the following parameters:

| Parameter | Type | Description | Default Value |
| --- | --- | --- | --- |
| categories | Array | A list of business categories to search for. | ["ristoranti"] |
| cities | Array | A list of cities in which to perform the searches. | ["roma"] |
| useProxy | Boolean | Recommended. Uses Apify's residential proxies to avoid being blocked. | true |
| maxConcurrency | Integer | The number of simultaneous requests to speed up scraping. | 5 |
| filterByEmail | Boolean | If checked, only results containing an email address will be kept. | false |
| filterByPhone | Boolean | If checked, only results containing a phone number will be kept. | false |
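For readers preparing input programmatically, the following Python sketch fills in the defaults from the table above and shows one plausible reading of the two filter flags. The helper names are hypothetical. Treating both filters as required together when both are enabled is an assumption, because the README does not say how they combine.

```python
import copy
from typing import Any, Dict

# Defaults as listed in the Input Configuration table.
DEFAULTS: Dict[str, Any] = {
    "categories": ["ristoranti"],
    "cities": ["roma"],
    "useProxy": True,
    "maxConcurrency": 5,
    "filterByEmail": False,
    "filterByPhone": False,
}


def resolve_input(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Merge user input over the defaults and sanity-check types (hypothetical helper)."""
    cfg = copy.deepcopy(DEFAULTS)  # deep copy so callers cannot mutate the defaults
    cfg.update(raw)
    for key in ("categories", "cities"):
        value = cfg[key]
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise TypeError(f"{key} must be a list of strings")
    limit = cfg["maxConcurrency"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int) or limit < 1:
        raise ValueError("maxConcurrency must be a positive integer")
    return cfg


def passes_filters(record: Dict[str, Any], cfg: Dict[str, Any]) -> bool:
    """Apply filterByEmail / filterByPhone; both must hold when both are on (assumption)."""
    if cfg["filterByEmail"] and not record.get("email"):
        return False
    if cfg["filterByPhone"] and not record.get("phoneNumber"):
        return False
    return True
```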

Output Schema

Each record in the output dataset will represent a single business and will have the following structure:

| Field | Type | Description |
| --- | --- | --- |
| businessName | String | The common name of the business. |
| address | String | The full street address of the business. |
| phoneNumber | String | The primary contact phone number. |
| website | String | The URL of the business's official website. |
| email | String | The contact email address. |
| rating | Number | The average user rating (out of 5). |
| reviews_count | Integer | The total number of user reviews. |
| whatsapp | String | The WhatsApp business number, if available. |
| facebook | String | The URL of the official Facebook page. |
| instagram | String | The URL of the official Instagram profile. |
| twitter | String | The URL of the official Twitter profile. |
| latitude | String | The geographic latitude. |
| longitude | String | The geographic longitude. |
| zip_code | String | The postal code. |
| city | String | The city where the business is located. |
| province | String | The province (e.g., RM for Rome). |
| opening_hours | String | A JSON string representing the opening hours schedule. |
| description | String | A short description of the business activity. |
| category | String | The category term used for the search. |
| scraped_city | String | The city term used for the search. |
| unique_id | String | A unique identifier for the business listing. |
| timestamp | String | The UTC timestamp of when the data was scraped. |
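Some output fields arrive as strings that usually need converting before analysis: latitude and longitude are typed as String, and opening_hours is a JSON string. A minimal Python sketch of that post-processing follows. The `Business` dataclass and `parse_record` are hypothetical names. Because the exact structure of opening_hours is not documented, it is decoded into a generic value rather than a fixed type, and the sample value in the usage line is invented for illustration.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Business:
    business_name: str
    city: Optional[str]
    latitude: Optional[float]
    longitude: Optional[float]
    rating: Optional[float]
    opening_hours: Any  # decoded JSON; its structure is not documented


def _to_float(value: Any) -> Optional[float]:
    """Convert numeric strings such as "41.9028" to float; return None if impossible."""
    if value in (None, ""):
        return None
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def parse_record(item: Dict[str, Any]) -> Business:
    """Turn one dataset item into typed values (hypothetical helper)."""
    raw_hours = item.get("opening_hours")
    try:
        hours = json.loads(raw_hours) if raw_hours else None
    except (TypeError, ValueError):  # json.JSONDecodeError subclasses ValueError
        hours = None
    return Business(
        business_name=item.get("businessName", ""),
        city=item.get("city"),
        latitude=_to_float(item.get("latitude")),
        longitude=_to_float(item.get("longitude")),
        rating=_to_float(item.get("rating")),
        opening_hours=hours,
    )


# Usage with an invented sample item:
sample = parse_record({"businessName": "Trattoria", "latitude": "41.9028", "longitude": "12.4964"})
```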

License

This project is licensed under the MIT License.