
Patch.com News Scraper
A robust web scraper built with Apify SDK and Playwright to extract news articles from patch.com. This actor is designed to crawl patch.com and extract comprehensive article data including titles, authors, publish dates, content, and images.
Features
- Comprehensive Data Extraction: Extracts article titles, authors, publish dates, content, and images
- Robust Error Handling: Continues scraping even if individual pages fail
- Proxy Support: Built-in proxy configuration for reliable scraping
- Cloud Deployment Ready: Configured for Apify cloud platform
- Flexible Input Configuration: Supports custom start URLs
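For orientation, here is a minimal sketch of how a crawler like this is typically structured with the Apify SDK and Crawlee's PlaywrightCrawler. It is not this Actor's actual source code; the selectors, link globs, and handler logic are illustrative placeholders.

// main.js - illustrative structure only, not the Actor's real implementation
import { Actor } from 'apify';
import { PlaywrightCrawler } from 'crawlee';

await Actor.init();

// Read the Actor input; fall back to the documented default start URL.
const { startUrls = [{ url: 'https://patch.com/new-york/across-ny' }] } =
    (await Actor.getInput()) ?? {};

const crawler = new PlaywrightCrawler({
    proxyConfiguration: await Actor.createProxyConfiguration(),
    async requestHandler({ request, page, enqueueLinks }) {
        // Placeholder extraction - the real selectors depend on Patch.com's markup.
        const title = (await page.locator('h1').first().textContent())?.trim() ?? '';
        await Actor.pushData({
            url: request.url,
            title,
            isArticle: Boolean(title),
            scrapedAt: new Date().toISOString(),
        });
        // Follow further Patch.com links found on listing pages.
        await enqueueLinks({ globs: ['https://patch.com/**'] });
    },
});

await crawler.run(startUrls.map((item) => item.url));
await Actor.exit();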
⚠️ Important Notes
- Respect Patch.com's Terms of Service - Use this Actor responsibly and in accordance with Patch.com's policies
- Rate Limiting - The Actor includes built-in delays to avoid overwhelming Patch.com's servers
- Proxy Usage - For large-scale scraping, always use residential proxies (see the sketch after this list)
- Data Usage - Ensure you have permission to use scraped data for your intended purpose
- Public Articles Only - The Actor can only scrape publicly accessible Patch.com articles
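To illustrate the proxy note above, this is roughly how residential proxies are enabled in an Apify Actor's code. The 'RESIDENTIAL' group is Apify's standard residential proxy group; this is a sketch, not necessarily this Actor's exact configuration.

import { Actor } from 'apify';

await Actor.init();

// Route crawler traffic through Apify residential proxies.
const proxyConfiguration = await Actor.createProxyConfiguration({
    groups: ['RESIDENTIAL'],
});

// Pass proxyConfiguration to the crawler (see the structural sketch above),
// or grab a single proxy URL for ad-hoc requests:
const proxyUrl = await proxyConfiguration?.newUrl();
console.log('Requests will be routed via:', proxyUrl);

await Actor.exit();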
Extracted Data Fields
- url: The source URL of the article
- title: Article headline
- author: Article author name
- publishDate: Publication date (ISO format when available)
- content: Article content (truncated to 2000 characters)
- imageUrl: Featured image URL
- isArticle: Boolean indicating if the page is a news article
- scrapedAt: Timestamp when the article was scraped
Input Configuration
The actor accepts the following input parameters:
{"startUrls": [{ "url": "https://patch.com/new-york/across-ny" }]}
Input Parameters
- startUrls (array, optional): Array of objects, each with a url property, to start crawling from. Default: [{"url": "https://patch.com/new-york/across-ny"}]
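A minimal sketch of running the Actor with this input from your own code via the Apify JavaScript client; the Actor ID and API token below are placeholders you would replace with your own:

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: '<YOUR_APIFY_TOKEN>' });

// Start the Actor run with a custom start URL and wait for it to finish.
const run = await client.actor('<username>/patch-usa-news-scraper').call({
    startUrls: [{ url: 'https://patch.com/new-york/across-ny' }],
});

// Fetch the scraped records from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Scraped ${items.length} pages`);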
Output Schema
The actor outputs data in the following JSON format:
{"url": "https://patch.com/new-york/across-ny/article-slug","title": "Article Title","author": "Author Name","publishDate": "2025-07-14T10:30:00.000Z","content": "Article content text (truncated to 2000 characters)...","imageUrl": "https://patch.com/img/cdn20/.../image.jpg","isArticle": true,"scrapedAt": "2025-07-14T17:46:49.097Z"}
Output Fields
- url (string): The source URL of the article
- title (string): Article headline/title
- author (string): Article author name (may be empty if not found)
- publishDate (string): Publication date in ISO format (may be empty if not found)
- content (string): Article content text, truncated to 2000 characters
- imageUrl (string): Featured image URL (may be empty if not found)
- isArticle (boolean): Indicates if the page is a valid news article
- scrapedAt (string): Timestamp when the article was scraped (ISO format)
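As a small, self-contained example of working with these fields, the sketch below filters dataset items down to confirmed articles and sorts them newest first. The two sample records are illustrative, not real scraped data.

// Sample items shaped like the output schema above (illustrative values only).
const items = [
    {
        url: 'https://patch.com/new-york/across-ny/sample-article-1',
        title: 'First Sample Headline',
        author: '',
        publishDate: '2025-07-13T09:00:00.000Z',
        isArticle: true,
    },
    {
        url: 'https://patch.com/new-york/across-ny/sample-article-2',
        title: 'Second Sample Headline',
        author: 'Jane Doe',
        publishDate: '2025-07-14T10:30:00.000Z',
        isArticle: true,
    },
];

// Keep confirmed articles that have a publish date, newest first.
const articles = items
    .filter((item) => item.isArticle && item.publishDate)
    .sort((a, b) => new Date(b.publishDate) - new Date(a.publishDate));

for (const article of articles) {
    console.log(`${article.publishDate} | ${article.title} | ${article.author || 'unknown author'}`);
}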
Usage
Local Development
- Install dependencies: npm install
- Run locally: npm start
- Format code: npm run format
- Lint code: npm run lint / npm run lint:fix
Apify Cloud Deployment
- Push to Apify: npm run push
- Run on Apify Cloud: npm run agent:run
- Check logs: npm run agent:log
- Pull latest changes: npm run pull
Development Workflow
- Local Testing: Test changes locally with npm start
- Code Quality: Run npm run lint and npm run format before committing
- Cloud Testing: Push changes with npm run push and test on Apify
- Monitor Logs: Use npm run agent:log to check for errors
- Iterate: Fix issues and repeat the cycle
Troubleshooting
Common Issues
- Rate Limiting: If you encounter rate limiting, ensure the proxy is properly configured
- Page Load Failures: The scraper waits for network idle state, but some pages may still fail
- Data Extraction Issues: Check the page structure if data extraction is incomplete
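If extraction looks incomplete, a quick way to check the page structure is a one-off Playwright script against a single article page. The URL and selectors below are placeholders, since Patch.com's markup can change.

import { chromium } from 'playwright';

const browser = await chromium.launch();
const page = await browser.newPage();

// Placeholder article URL - substitute a page that failed to extract.
await page.goto('https://patch.com/new-york/across-ny/<article-slug>', {
    waitUntil: 'networkidle',
});

// Print what common candidate selectors currently resolve to.
console.log('h1:', await page.locator('h1').first().textContent());
console.log('og:image:', await page.locator('meta[property="og:image"]').first().getAttribute('content'));

await browser.close();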
Debugging
- Check logs with npm run agent:log
- Run locally with npm start for detailed console output
- Review the extracted dataset in the Apify Console
License
ISC License
Author
It's not you it's me