General Purpose Web Scraping and Metadata Extraction
1 day trial then $50.00/month - No credit card required now
This project uses the Apify platform to scrape data from web pages, collect metadata, and store results in an Apify dataset. It features functions for managing date ranges, encoding identifiers, and handling large datasets, aiming to efficiently extract and store structured data for analysis.
Airbnb Data Scraper using Apify
This project is an Apify actor designed to scrape data from Airbnb property listings, including availability, pricing, and other details, over a given date range. The actor uses dynamic parameters for flexibility and stores the extracted data in Apify's dataset or a CSV file.
Features
- Dynamic Date Range: Automatically generates check-in and check-out dates for the specified number of days.
- Recursive JSON Parsing: Extracts all paths and values from the JSON responses for comprehensive data collection.
- Data Storage: Pushes the extracted data to the Apify dataset or saves it locally as a CSV.
- Configurable Inputs: Accepts various input parameters like URLs, stay duration, number of guests, and more.
Input Schema
The script accepts the following inputs via Apify:
Parameter | Description | Example Value |
---|---|---|
startUrls | List of Airbnb listing URLs to scrape. | [{ "url": "https://www.airbnb.com/rooms/12345" }] |
checkInDate | First check-in date to scrape, in YYYY-MM-DD format. | "2024-11-21" |
Stay_Days | Duration of each stay in days. | 1 |
numberOfDays | Total number of days to scrape data for. | 60 |
adults | Number of adults for the booking. | 2 |
children | Number of children for the booking. | 0 |
pets | Indicates if pets are included in the booking. | 0 |
How It Works
- Dynamic Date Generator:
  - Generates check-in and check-out dates based on the input `checkInDate`, `Stay_Days`, and `numberOfDays`.
- Request Construction:
  - Encodes the Airbnb room ID in Base64 format.
  - Constructs GraphQL API requests with dynamically populated variables.
- Data Collection:
  - Sends GET requests to Airbnb's API for each listing and date range.
  - Extracts data paths and values using recursive JSON parsing.
- Data Storage:
  - Pushes the extracted data to the Apify dataset for further use.
  - Optionally saves data locally as a CSV file.
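The date generation and request construction steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actor's actual code: the function names are hypothetical, the window is assumed to advance one day at a time, the `StayListing:` prefix inside the Base64 identifier is an assumption, and the endpoint and variable names in the usage lines are placeholders.

```python
import base64
import json
from datetime import date, timedelta
from urllib.parse import urlencode, urlparse


def generate_date_ranges(check_in_date, stay_days, number_of_days):
    """Return (check_in, check_out) ISO date pairs, one pair per day.

    Assumption: the check-in date moves forward one day per iteration and
    each stay lasts `stay_days` nights.
    """
    start = date.fromisoformat(check_in_date)
    ranges = []
    for offset in range(number_of_days):
        check_in = start + timedelta(days=offset)
        check_out = check_in + timedelta(days=stay_days)
        ranges.append((check_in.isoformat(), check_out.isoformat()))
    return ranges


def room_id_from_url(url):
    """Take the last path segment of a listing URL as the room ID."""
    return urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]


def encode_room_id(room_id, prefix="StayListing:"):
    """Base64-encode the room ID. The prefix is an assumption."""
    raw = f"{prefix}{room_id}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def build_request_url(endpoint, variables):
    """Attach JSON-encoded GraphQL variables to a GET URL."""
    payload = json.dumps(variables, separators=(",", ":"))
    return f"{endpoint}?{urlencode({'variables': payload})}"


# Hypothetical usage: the endpoint and variable names are placeholders.
room_id = room_id_from_url("https://www.airbnb.com/rooms/12345")
for check_in, check_out in generate_date_ranges("2024-11-21", 1, 3):
    url = build_request_url(
        "https://example.com/api/graphql",
        {"id": encode_room_id(room_id), "checkIn": check_in, "checkOut": check_out},
    )
    print(check_in, check_out, url)
```

Keeping the date logic, ID encoding, and URL building in separate functions makes each step easy to test on its own before any network request is sent.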
Output
The script outputs a dataset with the following fields:
Field | Description |
---|---|
Check-In Date | The generated check-in date. |
Check-Out Date | The corresponding check-out date. |
Path | JSON path of the extracted data. |
Value | Value at the extracted JSON path. |
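As a minimal sketch of how rows with these four fields could be produced, the example below flattens a nested JSON object into path/value pairs and serializes the rows as CSV. The function names, the dot-and-bracket path notation, and the sample response are all assumptions made for illustration; the actor's real path format and Airbnb's real response shape may differ.

```python
import csv
import io

FIELDS = ["Check-In Date", "Check-Out Date", "Path", "Value"]


def flatten_json(node, path=""):
    """Recursively yield (path, value) for every leaf of a JSON-like object.

    Empty dicts and lists contain no leaves, so they yield nothing.
    """
    if isinstance(node, dict):
        for key, child in node.items():
            yield from flatten_json(child, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from flatten_json(child, f"{path}[{index}]")
    else:
        yield path, node


def build_rows(check_in, check_out, response):
    """Create one output row per leaf value in the response."""
    return [
        {"Check-In Date": check_in, "Check-Out Date": check_out, "Path": p, "Value": v}
        for p, v in flatten_json(response)
    ]


def rows_to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample, not a real Airbnb response.
sample = {"data": {"price": {"total": 120}, "tags": ["a", "b"]}}
rows = build_rows("2024-11-21", "2024-11-22", sample)
print(rows_to_csv(rows))
```

If the actor is written with the Apify Python SDK, the same list of rows could be passed to `Actor.push_data` instead of being written to CSV.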
Example Input
    {
      "startUrls": [
        { "url": "https://www.airbnb.com/rooms/12345" },
        { "url": "https://www.airbnb.com/rooms/67890" }
      ],
      "checkInDate": "2024-11-21",
      "Stay_Days": 1,
      "numberOfDays": 10,
      "adults": "2",
      "children": "0",
      "pets": "0"
    }
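The example input passes `adults`, `children`, and `pets` as strings, while the input schema table shows them as numbers. One way to handle both forms is to normalize the raw input before use. The sketch below is an assumption about how that could be done; `ScraperInput` and `parse_input` are hypothetical names, and treating every field as required is a design choice of this example, not a documented rule of the actor.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ScraperInput:
    start_urls: list
    check_in_date: date
    stay_days: int
    number_of_days: int
    adults: int
    children: int
    pets: int


def parse_input(raw):
    """Validate raw actor input and coerce numeric fields to int."""
    urls = [item["url"] for item in raw.get("startUrls", [])]
    if not urls:
        raise ValueError("startUrls must contain at least one URL")
    return ScraperInput(
        start_urls=urls,
        check_in_date=date.fromisoformat(raw["checkInDate"]),
        stay_days=int(raw["Stay_Days"]),
        number_of_days=int(raw["numberOfDays"]),
        adults=int(raw["adults"]),      # accepts 2 or "2"
        children=int(raw["children"]),
        pets=int(raw["pets"]),
    )


example = {
    "startUrls": [
        {"url": "https://www.airbnb.com/rooms/12345"},
        {"url": "https://www.airbnb.com/rooms/67890"},
    ],
    "checkInDate": "2024-11-21",
    "Stay_Days": 1,
    "numberOfDays": 10,
    "adults": "2",
    "children": "0",
    "pets": "0",
}
print(parse_input(example))
```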
Logs
The script logs progress and errors to the console, including:
- Current URL and date range being processed.
- Any errors encountered during requests or data parsing.
Actor Metrics
1 monthly user
>99% runs succeeded
Created in Nov 2024
Modified 11 days ago