Crexi Real Estate Scraper
Pricing
$10.00 / 1,000 results
Scrapes commercial real estate listings from Crexi.com including property details, pricing, location, images, and investment metrics.
Last modified
20 hours ago
This Apify actor scrapes publicly available commercial real estate data from Crexi.com, automating the extraction of key property listings and market details. The scraper outputs structured data for analysis, reporting, or integration with other systems.
Features
- Automated Scraping: Navigate through Crexi's property listings and extract relevant details
- Pagination Handling: Automatically process multiple pages to ensure comprehensive data collection
- Structured Output: Export scraped data in JSON format for easy analysis
- Configurable Extraction: Easily customize the fields to extract based on your specific needs
- Rate Limiting & Header Customization: Prevent overloading the server by adjusting request intervals and headers
- HTML Debugging: Saves HTML content for selector analysis during development
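The pagination behaviour listed above can be pictured as a loop that requests one page after another until a page comes back empty or the maxProperties cap is reached. The following is a minimal sketch under that assumption, not the actor's actual code: `collect_listings` and the `fetch_page` callable are hypothetical names, and the real scraper drives a browser rather than a plain function.

```python
from typing import Callable, Dict, List

Listing = Dict[str, str]


def collect_listings(
    fetch_page: Callable[[int], List[Listing]],
    max_properties: int = 50,
) -> List[Listing]:
    """Walk pages 1, 2, 3, ... until a page is empty or the cap is hit.

    fetch_page is a hypothetical callable that returns the listings found
    on the given page number (an empty list once the results run out).
    """
    collected: List[Listing] = []
    page = 1
    while len(collected) < max_properties:
        listings = fetch_page(page)
        if not listings:
            break  # no more results to paginate through
        remaining = max_properties - len(collected)
        collected.extend(listings[:remaining])  # never exceed the cap
        page += 1
    return collected
```

Passing the page fetcher in as a callable keeps the loop easy to exercise with canned data before pointing it at a live site.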
Input Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| maxProperties | Integer | 50 | Maximum number of properties to scrape |
| scrapeDetails | Boolean | true | Whether to scrape detailed property pages |
| propertyTypes | Array | [] | List of property types to filter by |
| locations | Array | [] | List of locations to filter by |
| minPrice | Integer | null | Minimum price filter |
| maxPrice | Integer | null | Maximum price filter |
| rateLimitDelay | Integer | 2 | Delay between requests in seconds (rate limiting) |
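To make the defaults concrete, here is a small sketch of how a partial input could be merged over the documented defaults. The default values come from the table; the function name `resolve_input` and the sanity checks are assumptions for illustration, and the actor may validate its input differently.

```python
import copy
from typing import Any, Dict

# Defaults copied from the parameter table.
DEFAULT_INPUT: Dict[str, Any] = {
    "maxProperties": 50,
    "scrapeDetails": True,
    "propertyTypes": [],
    "locations": [],
    "minPrice": None,
    "maxPrice": None,
    "rateLimitDelay": 2,
}


def resolve_input(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Merge user input over the defaults and run basic sanity checks.

    The checks below are hypothetical, not the actor's documented behaviour.
    """
    unknown = set(raw) - set(DEFAULT_INPUT)
    if unknown:
        raise ValueError(f"Unknown input parameters: {sorted(unknown)}")
    resolved = copy.deepcopy(DEFAULT_INPUT)  # avoid sharing the default lists
    resolved.update(raw)
    if resolved["maxProperties"] < 1:
        raise ValueError("maxProperties must be at least 1")
    low, high = resolved["minPrice"], resolved["maxPrice"]
    if low is not None and high is not None and low > high:
        raise ValueError("minPrice must not exceed maxPrice")
    return resolved
```

For example, `resolve_input({"maxProperties": 25})` keeps scrapeDetails at true and rateLimitDelay at 2.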
Data Extracted
For each property listing available on Crexi, the scraper extracts:
Basic Information
- property_id: Unique identifier for the property
- name: The title or name of the property
- property_type: Classification (e.g., Office, Retail, Industrial, Multifamily, etc.)
- property_url: Direct link to the detailed property page
Location
- address: Full street address
- city: City name
- state: State abbreviation (e.g., CA, NY)
- zip_code: ZIP code
Financial Information
- price: Sale price or asking price
- lease_rate: Rental rates or lease information
- investment_metrics: Object containing:
  - cap_rate: Capitalization rate
  - noi: Net Operating Income
  - cash_on_cash: Cash-on-cash return
Physical Details
- square_footage: Total area or leasable space
- lot_size: Land area or lot size
- specifications: Object containing:
  - year_built: Year the building was constructed
  - units: Number of units (for multifamily properties)
  - parking: Parking spaces or parking information
  - building_class: Building classification (Class A, B, C)
  - zoning: Zoning information
Description & Features
- description: Summary description from listing page
- full_description: Detailed description from property detail page
- highlights: Array of property highlights or key features
- features: Array of property features
- amenities: Array of building amenities
Media & Documents
- image_url: Primary image URL from listing
- images: Array of all property images with URLs and alt text
- documents: Array of documents/brochures with URLs and names
Status & Metadata
- availability: Property availability status
- listing_date: Date the property was listed
- scraped_at: Timestamp when data was scraped
- source: Source website (crexi.com)
Detailed Information (if scrapeDetails=true)
- agent_info: Object containing:
  - name: Listing agent name
  - company: Brokerage company
  - phone: Contact phone number
  - email: Contact email
- similar_properties: Array of similar property listings
- metadata: Additional metadata including structured data
Output Data
Each property record is a JSON object containing all the fields listed above. Example:
{
  "property_id": "12345",
  "name": "Downtown Office Building",
  "property_type": "Office",
  "address": "123 Main Street",
  "city": "San Francisco",
  "state": "CA",
  "zip_code": "94102",
  "price": "$5,500,000",
  "square_footage": "15000",
  "property_url": "https://www.crexi.com/properties/12345",
  "scraped_at": "2025-10-30T12:00:00.000Z",
  "source": "crexi.com"
}
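In the example record, price and square_footage arrive as strings, so downstream analysis usually needs a conversion step. The sketch below shows one possible approach; `parse_number` is a hypothetical helper, and it only handles the formats seen in the example above. Abbreviated values such as "$5.5M" would need extra handling.

```python
import re
from typing import Optional


def parse_number(text: Optional[str]) -> Optional[float]:
    """Pull the first numeric value out of a string such as "$5,500,000".

    Returns None when the text is missing or contains no digits.
    """
    if not text:
        return None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return float(match.group(0).replace(",", ""))


# Values taken from the example record.
record = {"price": "$5,500,000", "square_footage": "15000"}
price = parse_number(record.get("price"))
area = parse_number(record.get("square_footage"))

# Derived metric; left as None when either input is missing or zero.
price_per_sq_ft = price / area if price and area else None
```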
Usage Examples
Basic Usage
{
  "maxProperties": 25,
  "scrapeDetails": true
}
Filtered by Property Type and Location
{
  "maxProperties": 100,
  "scrapeDetails": true,
  "propertyTypes": ["Office", "Retail"],
  "locations": ["San Francisco", "New York"]
}
Quick Scraping (No Details)
{
  "maxProperties": 200,
  "scrapeDetails": false
}
With Price Range and Rate Limiting
{
  "maxProperties": 50,
  "scrapeDetails": true,
  "minPrice": 1000000,
  "maxPrice": 10000000,
  "rateLimitDelay": 3
}
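One way to read the filter parameters is as a predicate applied to each record. The sketch below is an interpretation, not the actor's documented behaviour: it assumes an empty array means "no filter", a null price bound is open-ended, matching is case-insensitive, locations are compared against the city, and records without a known price are kept. `passes_filters` is a hypothetical name.

```python
from typing import Any, Dict, Optional


def passes_filters(
    property_type: str,
    city: str,
    price: Optional[float],
    actor_input: Dict[str, Any],
) -> bool:
    """Return True when a record satisfies the (assumed) filter semantics."""
    types = [t.lower() for t in actor_input.get("propertyTypes") or []]
    locations = [loc.lower() for loc in actor_input.get("locations") or []]
    if types and property_type.lower() not in types:
        return False
    if locations and city.lower() not in locations:
        return False
    min_price = actor_input.get("minPrice")
    max_price = actor_input.get("maxPrice")
    if price is not None:  # assumption: unknown prices are not filtered out
        if min_price is not None and price < min_price:
            return False
        if max_price is not None and price > max_price:
            return False
    return True
```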
Development Features
HTML Debugging
During development, the scraper saves HTML content to the key-value store for selector analysis:
- crexi_initial_page_html: Contains the HTML content of the initial search page
- crexi_page_1_html, crexi_page_2_html, etc.: HTML content for each paginated page
- debug_crexi_html: Contains HTML when standard selectors fail to find listings
This allows you to analyze the page structure and refine selectors without making repeated requests.
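As an illustration of offline selector analysis, the sketch below tallies CSS class names in a saved HTML string using only the standard library. Classes that repeat once per listing are good selector candidates. It assumes the HTML has already been downloaded from the key-value store into a string; `ClassCounter` and `most_common_classes` are hypothetical names, not part of the actor.

```python
from collections import Counter
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class ClassCounter(HTMLParser):
    """Count how often each CSS class appears in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.counts: Counter = Counter()

    def handle_starttag(
        self, tag: str, attrs: List[Tuple[str, Optional[str]]]
    ) -> None:
        for name, value in attrs:
            if name == "class" and value:
                # A class attribute may hold several space-separated names.
                self.counts.update(value.split())


def most_common_classes(html: str, top: int = 10) -> List[Tuple[str, int]]:
    """Return the `top` most frequent class names with their counts."""
    parser = ClassCounter()
    parser.feed(html)
    parser.close()
    return parser.counts.most_common(top)
```

Running `most_common_classes` on the content saved under crexi_initial_page_html would give a quick view of which classes recur often enough to mark individual listings.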
Error Handling
- Comprehensive error handling with detailed logging
- Graceful handling of missing elements
- Continues processing even if individual properties fail
- Validates and cleans data before pushing to output
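The points above describe a common pattern: wrap each property in its own try/except so one failure does not stop the run, then clean the record before it is stored. A minimal sketch of that pattern follows. The names `scrape_all`, `scrape_one`, and `clean_record` are hypothetical, and the rule that a record must have a property_url is an assumption made for the example.

```python
import logging
from typing import Any, Callable, Dict, Iterable, List, Optional

logger = logging.getLogger("crexi_scraper_sketch")


def clean_record(record: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Strip whitespace, drop empty values, and require a property_url."""
    cleaned: Dict[str, Any] = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
        if value not in (None, "", [], {}):
            cleaned[key] = value
    # Assumption: a record without a URL is not worth keeping.
    return cleaned if cleaned.get("property_url") else None


def scrape_all(
    urls: Iterable[str],
    scrape_one: Callable[[str], Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Scrape every URL, logging failures and continuing with the rest."""
    results: List[Dict[str, Any]] = []
    for url in urls:
        try:
            record = clean_record(scrape_one(url))
        except Exception:
            logger.exception("Failed to scrape %s; continuing", url)
            continue
        if record is None:
            logger.warning("Discarded invalid record from %s", url)
            continue
        results.append(record)
    return results
```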
Browser Automation
- Uses Playwright for reliable browser automation
- Handles dynamic content loading
- Implements proper delays and waits
- Anti-detection measures to avoid bot detection
Rate Limiting
- Configurable delay between requests (rateLimitDelay parameter)
- Default 2-second delay to be respectful to the server
- Separate delays for listing pages and detail pages
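A delay policy like the one described can be sketched as a small limiter that enforces a minimum gap since the previous request, with one gap for listing pages and another for detail pages. This is an illustrative sketch only: the class name `RateLimiter`, the "listing"/"detail" labels, and the way the two delays are set are assumptions, since the source does not say how the separate delays are derived from rateLimitDelay.

```python
import time
from typing import Callable, Optional


class RateLimiter:
    """Enforce a minimum gap between consecutive requests."""

    def __init__(
        self,
        listing_delay: float = 2.0,
        detail_delay: float = 2.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._delays = {"listing": listing_delay, "detail": detail_delay}
        self._clock = clock
        self._sleep = sleep
        self._last_request: Optional[float] = None

    def wait(self, kind: str) -> float:
        """Block until the delay for `kind` has elapsed; return seconds slept."""
        delay = self._delays[kind]
        pause = 0.0
        if self._last_request is not None:
            elapsed = self._clock() - self._last_request
            pause = max(0.0, delay - elapsed)
        if pause > 0:
            self._sleep(pause)
        self._last_request = self._clock()
        return pause
```

Calling `limiter.wait("listing")` before each listing-page request and `limiter.wait("detail")` before each detail-page request keeps the spacing in one place. The injectable clock and sleep functions exist so the logic can be checked without real waiting.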
Installation
- Install dependencies:
$ pip install -r requirements.txt
- Install Playwright browsers:
$ playwright install chromium
- Run the scraper:
$ python -m src
Docker Usage
docker build -t crexi-scraper .
docker run -e APIFY_TOKEN=your_token crexi-scraper
Apify Platform Usage
- Create a new actor on the Apify platform
- Upload all files from this directory
- Configure input parameters in the actor's input schema
- Run the actor and retrieve results from the dataset
Notes
- The scraper respects rate limits and implements delays between requests
- HTML content is saved for debugging purposes during development
- The scraper handles various property listing layouts and structures
- All URLs are properly resolved and normalized
- Data is validated and cleaned before being pushed to the output
- The scraper will continue even if some properties fail to load
- For production use, consider increasing rateLimitDelay to 3-5 seconds
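The note on URL resolution can be illustrated with the standard library. The sketch below turns a relative link into an absolute one, drops the fragment, and lowercases the scheme and host. The base URL is taken from the example output; `normalize_url` is a hypothetical helper, and the exact normalization rules the scraper applies are not documented here.

```python
from urllib.parse import urldefrag, urljoin, urlsplit, urlunsplit

# Base taken from the property_url in the example output.
BASE_URL = "https://www.crexi.com"


def normalize_url(href: str, base: str = BASE_URL) -> str:
    """Resolve href against base, drop the fragment, lowercase scheme and host."""
    absolute, _fragment = urldefrag(urljoin(base, href.strip()))
    parts = urlsplit(absolute)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, "")
    )
```

With this helper, both "/properties/12345" and "https://WWW.Crexi.com/properties/12345#photos" resolve to the same canonical string, which helps when deduplicating listings.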
Limitations
- Requires active internet connection
- May be affected by website structure changes
- Some data fields may not be available for all properties
- Respects robots.txt and terms of service
Support
For issues, questions, or feature requests, please contact the development team or create an issue in the repository.
License
This scraper is provided as-is for educational and research purposes. Ensure you comply with Crexi's terms of service when using this tool.
