General Purpose Web Scraping and Metadata Extraction

1 day trial then $50.00/month - No credit card required now

moving_beacon-owner1/my-actor-10

This project uses the Apify platform to scrape data from web pages, collect metadata, and store the results in an Apify dataset. It includes helpers for generating date ranges, encoding identifiers, and handling large result sets, with the aim of extracting structured data efficiently for analysis.

Airbnb Data Scraper using Apify

This project is an Apify actor that scrapes Airbnb property listings, collecting availability, pricing, and other details over a given date range. Its behavior is driven by input parameters (listing URLs, dates, and guest counts), and it stores the extracted data in an Apify dataset or a CSV file.


Features

  • Dynamic Date Range: Automatically generates check-in and check-out dates for the specified number of days.
  • Recursive JSON Parsing: Extracts all paths and values from the JSON responses for comprehensive data collection.
  • Data Storage: Pushes the extracted data to the Apify dataset or saves it locally as a CSV file.
  • Configurable Inputs: Accepts input parameters such as listing URLs, stay duration, and number of guests.
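A minimal Python sketch of the Dynamic Date Range feature, assuming one check-in per consecutive day starting at checkInDate, with each check-out falling Stay_Days later and numberOfDays check-ins in total. The actor's own stepping may differ, and generate_date_ranges is a hypothetical name.

```python
from datetime import date, timedelta


def generate_date_ranges(
    check_in_date: str, stay_days: int, number_of_days: int
) -> list[tuple[str, str]]:
    """Return (check_in, check_out) ISO date pairs, one per consecutive start day.

    Assumption: check-ins advance one day at a time and every stay lasts
    stay_days days; this is an illustration, not the actor's confirmed logic.
    """
    start = date.fromisoformat(check_in_date)
    ranges = []
    for offset in range(number_of_days):
        check_in = start + timedelta(days=offset)
        check_out = check_in + timedelta(days=stay_days)
        ranges.append((check_in.isoformat(), check_out.isoformat()))
    return ranges
```

With checkInDate "2024-11-21", Stay_Days 1 and numberOfDays 3, this yields the pairs 2024-11-21/2024-11-22, 2024-11-22/2024-11-23 and 2024-11-23/2024-11-24. Using date arithmetic rather than string manipulation means month and year boundaries are handled automatically.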

Input Schema

The script accepts the following inputs via Apify:

| Parameter | Description | Example value |
|---|---|---|
| startUrls | List of Airbnb listing URLs to scrape. | [{ "url": "https://www.airbnb.com/rooms/12345" }] |
| checkInDate | First check-in date to scrape (YYYY-MM-DD). | "2024-11-21" |
| Stay_Days | Length of each stay, in days. | 1 |
| numberOfDays | Total number of days to scrape data for. | 60 |
| adults | Number of adults in the booking. | 2 |
| children | Number of children in the booking. | 0 |
| pets | Whether pets are included in the booking. | 0 |
petsIndicates if pets are included in the booking.0

How It Works

  1. Dynamic Date Generator:

    • Generates check-in and check-out dates based on the input checkInDate, Stay_Days, and numberOfDays.
  2. Request Construction:

    • Encodes the Airbnb room ID in Base64 format.
    • Constructs GraphQL API requests with dynamically populated variables.
  3. Data Collection:

    • Sends GET requests to Airbnb's API for each listing and date range.
    • Extracts data paths and values using recursive JSON parsing.
  4. Data Storage:

    • Pushes the extracted data to the Apify dataset for further use.
    • Optionally saves data locally as a CSV file.
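Step 2 could look roughly like the Python sketch below. The "<prefix>:<room ID>" string that gets Base64-encoded (StayListing:12345 here) and the GraphQL variable names are assumptions, not taken from the actor. encode_listing_id and build_query_string are hypothetical helpers, and the endpoint URL and any persisted-query parameters are deliberately left out. Sending the variables as a JSON string in the query string is the usual way to pass GraphQL variables over GET.

```python
import base64
import json
from urllib.parse import urlencode


def encode_listing_id(room_id: str, prefix: str = "StayListing") -> str:
    """Base64-encode '<prefix>:<room_id>'. The prefix value is an assumption."""
    raw = f"{prefix}:{room_id}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def build_query_string(
    room_id: str, check_in: str, check_out: str, adults: int, children: int, pets: int
) -> str:
    """Return a GET query string carrying JSON-encoded GraphQL variables."""
    # Hypothetical variable names; the real operation may expect different ones.
    variables = {
        "id": encode_listing_id(room_id),
        "checkIn": check_in,
        "checkOut": check_out,
        "adults": adults,
        "children": children,
        "pets": pets,
    }
    # Compact JSON keeps the URL short; urlencode percent-escapes it.
    return urlencode({"variables": json.dumps(variables, separators=(",", ":"))})
```

The resulting string would be appended after "?" to the API endpoint, once per listing and date range.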

Output

The script outputs a dataset with the following fields:

| Field | Description |
|---|---|
| Check-In Date | The generated check-in date. |
| Check-Out Date | The corresponding check-out date. |
| Path | JSON path of the extracted data. |
| Value | Value at the extracted JSON path. |
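One way to turn a parsed API response into rows with these four fields, sketched in Python. A recursive walk records each leaf's path and value (the notation, dotted keys plus [index] for list items, is assumed here), and csv.DictWriter produces the optional CSV. iter_json_paths, build_rows and rows_to_csv are hypothetical names. In an Apify Python actor the same rows could instead be passed to await Actor.push_data(rows).

```python
import csv
import io
from typing import Any, Iterator

FIELDNAMES = ["Check-In Date", "Check-Out Date", "Path", "Value"]


def iter_json_paths(node: Any, path: str = "") -> Iterator[tuple[str, Any]]:
    """Recursively yield (path, value) for every leaf of a parsed JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_json_paths(value, f"{path}.{key}" if path else str(key))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_json_paths(value, f"{path}[{index}]")
    else:
        # Leaf: str, int, float, bool or None. Empty dicts/lists yield nothing.
        yield path, node


def build_rows(response: Any, check_in: str, check_out: str) -> list[dict[str, Any]]:
    """Attach the date range to every (path, value) pair of one response."""
    return [
        {"Check-In Date": check_in, "Check-Out Date": check_out, "Path": p, "Value": v}
        for p, v in iter_json_paths(response)
    ]


def rows_to_csv(rows: list[dict[str, Any]]) -> str:
    """Serialize rows to CSV text; the csv module writes None as an empty cell."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

This long, one-row-per-leaf layout keeps every value without knowing the response schema in advance, at the cost of many rows per request.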

Example Input

{
  "startUrls": [
    { "url": "https://www.airbnb.com/rooms/12345" },
    { "url": "https://www.airbnb.com/rooms/67890" }
  ],
  "checkInDate": "2024-11-21",
  "Stay_Days": 1,
  "numberOfDays": 10,
  "adults": "2",
  "children": "0",
  "pets": "0"
}

Logs

The script logs progress and errors to the console, including:

  • Current URL and date range being processed.
  • Any errors encountered during requests or data parsing.
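The logging pattern described above is sketched here using Python's standard logging module. Each URL and date range is logged before its request. A failure is logged with its traceback and then skipped so the rest of the run continues, which is an assumption about the actor's error policy. The fetch argument stands in for a hypothetical request-and-parse function, and the real actor may log through Apify's own logger instead.

```python
import logging
from typing import Any, Callable

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("airbnb-scraper")


def scrape_all(
    urls: list[str],
    date_ranges: list[tuple[str, str]],
    fetch: Callable[[str, str, str], Any],
) -> list[Any]:
    """Call fetch for every URL and date range, logging progress and failures."""
    results = []
    for url in urls:
        for check_in, check_out in date_ranges:
            log.info("Processing %s (%s -> %s)", url, check_in, check_out)
            try:
                results.append(fetch(url, check_in, check_out))
            except Exception:
                # Record the traceback, then continue with the next date range.
                log.exception("Failed on %s (%s -> %s)", url, check_in, check_out)
    return results
```

Passing lazy %s arguments rather than pre-formatted strings lets the logging module skip formatting when a level is disabled.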

Developer
Maintained by Community

Actor Metrics

  • Created in Nov 2024