Nextdoor Recommendations Scraper
A web scraper built with Playwright and Crawlee that extracts recommendation posts from Nextdoor pages.
Features
- Extracts recommendation posts with author information, dates, content, and reactions
- Handles nested replies within recommendation threads
- Extracts reaction counts and types (likes, comments, shares)
- Robust error handling and retry logic
- Structured data output in JSON format
Installation
- Make sure you have Node.js installed
- Install dependencies:
$ npm install
Usage
Running the Scraper
- Update the input URL in storage/key_value_stores/default/INPUT.json:
{
  "startUrls": ["https://nextdoor.com/pages/your-target-page/"],
  "maxRequestsPerCrawl": 50,
  "cutOffDate": "2024-01-01T00:00:00.000Z"
}
Optional Parameters:
- cutOffDate: ISO date string. Only posts newer than this date will be scraped. If not specified, all posts will be included.
- Run the scraper:
$ npm start
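The cutOffDate behavior can be sketched as a simple date comparison. A minimal sketch, assuming post dates are ISO-parseable; the helper name `isNewerThanCutoff` is illustrative, not the actor's actual function:

```typescript
// Returns true when a post should be kept under the optional cut-off.
// An absent cutOffDate means every post is included (matches the docs above).
function isNewerThanCutoff(postDate: string, cutOffDate?: string): boolean {
  if (!cutOffDate) return true; // no cut-off configured: keep all posts
  return new Date(postDate).getTime() > new Date(cutOffDate).getTime();
}

// A post from June 2024 passes a January 2024 cut-off:
isNewerThanCutoff("2024-06-15T12:00:00.000Z", "2024-01-01T00:00:00.000Z"); // true
```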
Note: Some Nextdoor pages may require address selection or authentication. The scraper will handle common cases but may encounter access restrictions for certain pages.
Output Structure
The scraper outputs structured data with the following format:
{
  "url": "https://nextdoor.com/pages/example/",
  "title": "Page Title",
  "summary": "Page summary if available",
  "posts": [
    {
      "author": {
        "name": "S. H.",
        "initials": "S",
        "avatar": "https://example.com/avatar.jpg"
      },
      "date": "20 Nov",
      "content": "Post content text",
      "reactions": {
        "total": 16,
        "types": ["like", "comment", "share"]
      },
      "replies": [
        {
          "author": {
            "name": "D. W.",
            "initials": "D"
          },
          "date": "20 Nov",
          "content": "Reply content",
          "reactions": {
            "total": 1,
            "types": ["like"]
          }
        }
      ]
    }
  ],
  "scraped_at": "2024-01-15T10:30:00.000Z"
}
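The same shape can be expressed as TypeScript types. These interfaces are an illustrative sketch derived from the example JSON; the optionality of fields like `summary` and `avatar` is an assumption, not confirmed by the actor's source:

```typescript
// Types mirroring the example output above; names follow the JSON keys.
interface Author {
  name: string;
  initials: string;
  avatar?: string; // present in the example post, absent in the reply
}

interface Reactions {
  total: number;
  types: string[]; // e.g. "like", "comment", "share"
}

interface Post {
  author: Author;
  date: string; // as displayed on the page, e.g. "20 Nov"
  content: string;
  reactions: Reactions;
  replies?: Post[]; // nested replies reuse the same shape
}

interface ScrapeResult {
  url: string;
  title: string;
  summary?: string;
  posts: Post[];
  scraped_at: string; // ISO timestamp of the run
}
```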
Data Storage
Scraped data is stored in the storage/datasets/default/ directory as JSON files.
Configuration
- maxRequestsPerCrawl: Maximum number of pages to scrape per run
- startUrls: List of Nextdoor URLs to scrape
- cutOffDate: Optional ISO date string. Only posts newer than this date will be scraped
- Timeouts: Configurable in main.ts for different waiting conditions
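A minimal sketch of how these inputs could be validated with defaults applied. The field names match INPUT.json; the default of 50 for maxRequestsPerCrawl is taken from the example input and is an assumption, as is the `parseInput` helper itself:

```typescript
interface ScraperInput {
  startUrls: string[];
  maxRequestsPerCrawl: number;
  cutOffDate?: string;
}

// Validates a raw input object and fills in assumed defaults.
function parseInput(raw: Partial<ScraperInput> | null): ScraperInput {
  const input = raw ?? {};
  if (!input.startUrls || input.startUrls.length === 0) {
    throw new Error("startUrls is required and must contain at least one URL");
  }
  if (input.cutOffDate && Number.isNaN(Date.parse(input.cutOffDate))) {
    throw new Error(`cutOffDate is not a valid ISO date: ${input.cutOffDate}`);
  }
  return {
    startUrls: input.startUrls,
    maxRequestsPerCrawl: input.maxRequestsPerCrawl ?? 50, // assumed default
    cutOffDate: input.cutOffDate,
  };
}
```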
Notes
- The scraper handles common Nextdoor page elements like cookie banners and address selection prompts
- It uses proper browser headers and user agents to avoid blocking
- Failed requests are logged and stored separately for debugging
- The scraper waits for dynamic content to load before extraction
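The retry-and-log pattern described above can be sketched generically. Crawlee provides its own built-in request retries, so this helper is purely illustrative of the idea, not the actor's implementation:

```typescript
// Runs fn up to maxRetries times; failed attempts are logged, and the
// last error is rethrown so it can be recorded separately for debugging.
function withRetries<T>(fn: () => T, maxRetries = 3): T {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return fn();
    } catch (err) {
      lastError = err;
      console.warn(`Attempt ${attempt}/${maxRetries} failed`);
    }
  }
  throw lastError; // all attempts exhausted
}
```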
Troubleshooting
- Address Selection Required: Some Nextdoor pages require address selection. The scraper will detect this and log a message.
- Rate Limiting: If you encounter rate limits, try reducing maxRequestsPerCrawl or adding delays.
- Missing Data: The scraper uses fallback values for missing fields to ensure consistent output.
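The missing-data fallback idea can be sketched as a normalizer that guarantees a consistent shape. The fallback values ("Unknown", "?", empty string) and the `normalizeAuthor` name are illustrative assumptions:

```typescript
interface RawAuthor {
  name?: string;
  initials?: string;
  avatar?: string;
}

// Fills missing author fields with defaults so every record downstream
// has the same shape, regardless of what the page exposed.
function normalizeAuthor(raw: RawAuthor): { name: string; initials: string; avatar: string } {
  return {
    name: raw.name ?? "Unknown",
    initials: raw.initials ?? "?",
    avatar: raw.avatar ?? "",
  };
}
```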
Technical Details
- Built with Playwright for browser automation
- Uses Crawlee for crawling infrastructure
- TypeScript for type safety
- Apify SDK for data management
- Robust CSS selector-based extraction