Patch Usa News Scraper

Under maintenance

Pricing

$99.00 / 1,000 results

Maintained by Community

A web scraper that crawls patch.com and extracts article data: titles, authors, publish dates, content, and images.

5.0 (1)

  • Total users: 1
  • Monthly users: 1
  • Runs succeeded: >99%
  • Last modified: 3 days ago

Patch.com News Scraper

A web scraper built with the Apify SDK and Playwright. This Actor crawls patch.com and extracts article data: titles, authors, publish dates, content, and images.

Features

  • Comprehensive Data Extraction: Extracts article titles, authors, publish dates, content, and images
  • Robust Error Handling: Continues scraping even if individual pages fail
  • Proxy Support: Built-in proxy configuration for reliable scraping
  • Cloud Deployment Ready: Configured for Apify cloud platform
  • Flexible Input Configuration: Supports custom start URLs
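
The error-handling behavior listed above (one failed page does not stop the run) can be pictured with a minimal sketch. This is not the Actor's actual code: `scrapeAll` and `scrapePage` are hypothetical names, and the per-page scraper is passed in as a function so the example runs without Playwright.

```javascript
// Hypothetical sketch: process URLs one by one and keep going when a page fails.
// `scrapePage` stands in for whatever per-page logic the Actor really uses.
async function scrapeAll(urls, scrapePage) {
  const results = [];
  const failures = [];
  for (const url of urls) {
    try {
      results.push(await scrapePage(url));
    } catch (err) {
      // Record the failure and continue with the next URL.
      failures.push({ url, error: err instanceof Error ? err.message : String(err) });
    }
  }
  return { results, failures };
}
```

Returning the failures alongside the results makes it possible to log or retry the failed URLs after the run instead of losing them.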

⚠️ Important Notes

  1. Respect Patch.com's Terms of Service - Use this Actor responsibly and in accordance with Patch.com's policies
  2. Rate Limiting - The Actor includes built-in delays to avoid overwhelming Patch.com's servers
  3. Proxy Usage - For large-scale scraping, always use residential proxies
  4. Data Usage - Ensure you have permission to use scraped data for your intended purpose
  5. Public Articles Only - The Actor can only scrape publicly accessible Patch.com articles
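
The notes above mention built-in delays but do not document their length or mechanism. As an illustration only, a fixed pause between requests could look like the sketch below; `sleep`, `forEachWithDelay`, and the delay value are all assumptions, not the Actor's real implementation.

```javascript
// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: call `handler` for each item, pausing `delayMs`
// between consecutive calls (no pause before the first one).
async function forEachWithDelay(items, delayMs, handler) {
  for (let i = 0; i < items.length; i += 1) {
    if (i > 0) await sleep(delayMs);
    await handler(items[i]);
  }
}
```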

Extracted Data Fields

  • url: The source URL of the article
  • title: Article headline
  • author: Article author name
  • publishDate: Publication date (ISO format when available)
  • content: Article content (truncated to 2000 characters)
  • imageUrl: Featured image URL
  • isArticle: Boolean indicating if the page is a news article
  • scrapedAt: Timestamp when the article was scraped

Input Configuration

The Actor accepts the following input parameters:

    {
      "startUrls": [
        { "url": "https://patch.com/new-york/across-ny" }
      ]
    }

Input Parameters

  • startUrls (array, optional): Array of objects with a url property to start crawling from. Default: [{"url": "https://patch.com/new-york/across-ny"}]
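
Because startUrls is optional, the input has to be resolved against the documented default before crawling. The sketch below shows one way to do that. `resolveStartUrls` is a hypothetical helper, and the check that every URL belongs to patch.com is an assumption added for illustration; the Actor may accept or reject input differently.

```javascript
// Default taken from the parameter description above.
const DEFAULT_START_URLS = [{ url: 'https://patch.com/new-york/across-ny' }];

// Hypothetical helper: return validated start URLs, or the default when none are given.
function resolveStartUrls(input) {
  const startUrls = input && Array.isArray(input.startUrls) ? input.startUrls : [];
  if (startUrls.length === 0) return DEFAULT_START_URLS;

  return startUrls.map((entry) => {
    // new URL() throws a TypeError on malformed URLs.
    const parsed = new URL(entry.url);
    // Assumption: only patch.com and its subdomains are valid targets.
    const host = parsed.hostname;
    if (host !== 'patch.com' && !host.endsWith('.patch.com')) {
      throw new Error(`Not a patch.com URL: ${entry.url}`);
    }
    return { url: parsed.href };
  });
}
```

Calling `resolveStartUrls({})` returns the default list, while an entry pointing at another domain raises an error before any page is requested.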

Output Schema

The Actor outputs data in the following JSON format:

    {
      "url": "https://patch.com/new-york/across-ny/article-slug",
      "title": "Article Title",
      "author": "Author Name",
      "publishDate": "2025-07-14T10:30:00.000Z",
      "content": "Article content text (truncated to 2000 characters)...",
      "imageUrl": "https://patch.com/img/cdn20/.../image.jpg",
      "isArticle": true,
      "scrapedAt": "2025-07-14T17:46:49.097Z"
    }

Output Fields

  • url (string): The source URL of the article
  • title (string): Article headline/title
  • author (string): Article author name (may be empty if not found)
  • publishDate (string): Publication date in ISO format (may be empty if not found)
  • content (string): Article content text, truncated to 2000 characters
  • imageUrl (string): Featured image URL (may be empty if not found)
  • isArticle (boolean): Indicates if the page is a valid news article
  • scrapedAt (string): Timestamp when the article was scraped (ISO format)
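
To make the field rules above concrete, here is a minimal sketch of how raw scraped values could be normalized into one output record. `buildRecord` and `toIsoOrEmpty` are hypothetical names. The 2000-character limit and the empty-string fallbacks come from the field list; the `isArticle` rule (title and content both present) and returning an empty string for an unparseable date are assumptions.

```javascript
const MAX_CONTENT_LENGTH = 2000; // truncation limit stated in the field list

// Convert a date-like value to an ISO string, or '' if it cannot be parsed.
function toIsoOrEmpty(value) {
  const date = new Date(value);
  return Number.isNaN(date.getTime()) ? '' : date.toISOString();
}

// Hypothetical helper: normalize raw scraped values into the output shape.
function buildRecord(raw, now = new Date()) {
  const title = (raw.title || '').trim();
  const content = (raw.content || '').trim();
  return {
    url: raw.url,
    title,
    author: (raw.author || '').trim(),
    publishDate: raw.publishDate ? toIsoOrEmpty(raw.publishDate) : '',
    content: content.slice(0, MAX_CONTENT_LENGTH),
    imageUrl: raw.imageUrl || '',
    // Assumption: a page counts as an article when it has a title and body text.
    isArticle: Boolean(title && content),
    scrapedAt: now.toISOString(),
  };
}
```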

Usage

Local Development

  1. Install dependencies:

    npm install

  2. Run locally:

    npm start

  3. Format code:

    npm run format

  4. Lint code:

    npm run lint
    npm run lint:fix

Apify Cloud Deployment

  1. Push to Apify:

    npm run push

  2. Run on Apify Cloud:

    npm run agent:run

  3. Check logs:

    npm run agent:log

  4. Pull latest changes:

    npm run pull

Development Workflow

  1. Local Testing: Test changes locally with npm start
  2. Code Quality: Run npm run lint and npm run format before committing
  3. Cloud Testing: Push changes with npm run push and test on Apify
  4. Monitor Logs: Use npm run agent:log to check for errors
  5. Iterate: Fix issues and repeat the cycle

Troubleshooting

Common Issues

  1. Rate Limiting: If requests are being rate limited, check that the proxy is configured correctly
  2. Page Load Failures: The scraper waits for the network-idle state before extracting data, but some pages may still fail to load
  3. Data Extraction Issues: If extracted data is incomplete, check whether the page structure has changed
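
When diagnosing incomplete extraction, it helps to count how often each field comes back empty across a dataset export. The sketch below does that for an array of output items. `summarizeMissingFields` is a hypothetical helper, not part of the Actor, and the default field list is taken from the output fields documented above.

```javascript
// Fields that may legitimately be empty or may signal a broken selector.
const CHECKED_FIELDS = ['title', 'author', 'publishDate', 'content', 'imageUrl'];

// Hypothetical helper: count, per field, how many items have an empty or missing value.
function summarizeMissingFields(items, fields = CHECKED_FIELDS) {
  const counts = Object.fromEntries(fields.map((field) => [field, 0]));
  for (const item of items) {
    for (const field of fields) {
      if (!item[field]) counts[field] += 1;
    }
  }
  return counts;
}
```

A field that is empty in nearly every item points to a changed page structure rather than to individual articles that lack that data.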

Debugging

  • Check logs with npm run agent:log
  • Run locally with npm start for detailed console output
  • Review the extracted dataset in the Apify Console

License

ISC License

Author

It's not you it's me