Nextdoor Recommendations Scraper

A web scraper built with Playwright and Crawlee that extracts recommendation posts from Nextdoor pages.

Features

  • Extracts recommendation posts with author information, dates, content, and reactions
  • Handles nested replies within recommendation threads
  • Extracts reaction counts and types (likes, comments, shares)
  • Robust error handling and retry logic
  • Structured data output in JSON format

Installation

  1. Make sure you have Node.js installed
  2. Install dependencies:
$ npm install

Usage

Running the Scraper

  1. Update the input URL in storage/key_value_stores/default/INPUT.json:
{
  "startUrls": [
    "https://nextdoor.com/pages/your-target-page/"
  ],
  "maxRequestsPerCrawl": 50,
  "cutOffDate": "2024-01-01T00:00:00.000Z"
}

Optional Parameters:

  • cutOffDate: ISO date string. Only posts newer than this date are scraped; if omitted, all posts are included (see the filtering sketch after these steps).
  2. Run the scraper:
$ npm start
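
For reference, the cutoff is just a date comparison applied before a post is saved. A minimal sketch of that check, assuming the post date has already been parsed to an ISO string (the field names here are illustrative, not necessarily those used in main.ts):

// Hypothetical helper: decide whether a post passes the cutOffDate filter.
// Assumes `postDate` is an ISO 8601 string; `cutOffDate` comes from INPUT.json.
function isAfterCutoff(postDate: string, cutOffDate?: string): boolean {
  if (!cutOffDate) return true; // no cutoff configured: keep every post
  return new Date(postDate).getTime() >= new Date(cutOffDate).getTime();
}

// isAfterCutoff("2024-03-01T12:00:00.000Z", "2024-01-01T00:00:00.000Z") -> true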

Note: Some Nextdoor pages may require address selection or authentication. The scraper will handle common cases but may encounter access restrictions for certain pages.

Output Structure

The scraper outputs structured data with the following format:

{
  "url": "https://nextdoor.com/pages/example/",
  "title": "Page Title",
  "summary": "Page summary if available",
  "posts": [
    {
      "author": {
        "name": "S. H.",
        "initials": "S",
        "avatar": "https://example.com/avatar.jpg"
      },
      "date": "20 Nov",
      "content": "Post content text",
      "reactions": {
        "total": 16,
        "types": ["like", "comment", "share"]
      },
      "replies": [
        {
          "author": {
            "name": "D. W.",
            "initials": "D"
          },
          "date": "20 Nov",
          "content": "Reply content",
          "reactions": {
            "total": 1,
            "types": ["like"]
          }
        }
      ]
    }
  ],
  "scraped_at": "2024-01-15T10:30:00.000Z"
}
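
If you consume the output from TypeScript, the example above maps to interfaces like these (a sketch derived from the sample output, not types shipped with the scraper):

interface Author {
  name: string;
  initials: string;
  avatar?: string; // not present on every reply in the sample
}

interface Reactions {
  total: number;
  types: string[]; // e.g. ["like", "comment", "share"]
}

interface Post {
  author: Author;
  date: string; // as displayed on the page, e.g. "20 Nov"
  content: string;
  reactions: Reactions;
  replies?: Post[]; // top-level posts carry nested replies
}

interface ScrapedPage {
  url: string;
  title: string;
  summary: string;
  posts: Post[];
  scraped_at: string; // ISO timestamp of the run
}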

Data Storage

Scraped data is stored in the storage/datasets/default/ directory as JSON files.
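
A quick way to load those files for post-processing (a plain Node sketch, run as an ES module; Crawlee names the item files sequentially, e.g. 000000001.json):

import { readdir, readFile } from "node:fs/promises";
import { join } from "node:path";

const dir = "storage/datasets/default";

// Collect every dataset item written by the crawler into one array.
const files = (await readdir(dir)).filter((f) => f.endsWith(".json"));
const items = await Promise.all(
  files.map(async (f) => JSON.parse(await readFile(join(dir, f), "utf8"))),
);
console.log(`Loaded ${items.length} scraped pages`);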

Configuration

  • maxRequestsPerCrawl: Maximum number of pages to scrape per run
  • startUrls: List of Nextdoor URLs to scrape
  • cutOffDate: Optional ISO date string. Only posts newer than this date will be scraped
  • Timeouts: Configurable in main.ts for different waiting conditions

Notes

  • The scraper handles common Nextdoor page elements like cookie banners and address selection prompts
  • It uses proper browser headers and user agents to avoid blocking
  • Failed requests are logged and stored separately for debugging
  • The scraper waits for dynamic content to load before extraction
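
The "waits for dynamic content" point usually reduces to explicit Playwright waits before the DOM is queried. A sketch of the pattern (the selector is a placeholder, not Nextdoor's real markup or the one used in main.ts):

import type { Page } from "playwright";

// Wait for the post feed to render before extracting anything.
async function waitForFeed(page: Page): Promise<void> {
  await page.waitForSelector('[data-testid="feed"]', { timeout: 30_000 });
  // Let client-side rendering settle once the container appears.
  await page.waitForLoadState("networkidle");
}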

Troubleshooting

  1. Address Selection Required: Some Nextdoor pages require address selection. The scraper will detect this and log a message.
  2. Rate Limiting: If you encounter rate limits, try reducing maxRequestsPerCrawl or adding delays.
  3. Missing Data: The scraper uses fallback values for missing fields to ensure consistent output.
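
For the rate-limiting case, the relevant knobs are standard Crawlee crawler options. A sketch of a more conservative configuration (the surrounding code in main.ts will differ):

import { PlaywrightCrawler } from "crawlee";

const crawler = new PlaywrightCrawler({
  maxRequestsPerCrawl: 20, // scrape fewer pages per run
  maxConcurrency: 1, // one page at a time
  sameDomainDelaySecs: 5, // pause between requests to the same domain
  async requestHandler({ page }) {
    // ... extraction logic ...
  },
});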

Technical Details

  • Built with Playwright for browser automation
  • Uses Crawlee for crawling infrastructure
  • TypeScript for type safety
  • Apify SDK for data management
  • Robust CSS selector-based extraction
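
Putting the stack together, main.ts has roughly this shape (a condensed sketch under the assumptions above; the real request handler contains the selector-based extraction):

import { Actor } from "apify";
import { PlaywrightCrawler } from "crawlee";

await Actor.init();

const input = await Actor.getInput<{
  startUrls: string[];
  maxRequestsPerCrawl?: number;
}>();

const crawler = new PlaywrightCrawler({
  maxRequestsPerCrawl: input?.maxRequestsPerCrawl ?? 50,
  async requestHandler({ page, request, log }) {
    log.info(`Scraping ${request.url}`);
    // ... wait for content, extract posts via CSS selectors ...
    await Actor.pushData({
      url: request.url,
      scraped_at: new Date().toISOString(),
    });
  },
});

await crawler.run(input?.startUrls ?? []);
await Actor.exit();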