Google News Scraper

Pricing

$10.00/month + usage

Extract full Google News articles with text, images & metadata. 95%+ success rate, multi-region support, smart content extraction with automatic fallbacks. Production-ready & cost-optimized.

Rating

0.0

(0)

Developer

Yevhenii Molodtsov

Maintained by Community

Last modified

2 months ago

🚀 Features

Core Functionality

🔍 Flexible Search: Search by keywords, regions, languages, and date ranges
📰 Full Text Extraction: Real article content from Google News RSS feeds with HTML descriptions
🌍 Multi-Region Support: Search across different countries and languages
🤖 Smart Google News Handling: Automatic detection and processing of Google News URLs
📊 Rich Metadata: Titles, sources, dates, images, tags, and complete article information
⚡ High Success Rate: 95%+ success rate with intelligent fallback mechanisms

Advanced Capabilities

🔗 Google News URL Resolution: Intelligent handling of Google News redirect URLs
🌐 Automatic Browser Mode: Automatically enables browser mode for Google News articles
🛡️ Consent Page Handling: Smart detection and handling of consent pages
🔄 Robust Error Handling: Comprehensive error recovery and retry mechanisms
📊 Real-time Monitoring: Performance metrics and health monitoring
🎯 RSS Feed Integration: Uses Google News RSS feeds for reliable data extraction

Quality & Reliability

✅ Comprehensive Testing: Unit, integration, and performance tests
🔧 Error Recovery: Automatic recovery from network and parsing errors
📈 Performance Optimization: Memory management and concurrent processing
🏥 Health Monitoring: Real-time system health and error tracking
🧹 Data Validation: Input validation and output quality assurance

🎉 Latest Updates (v2.0.0)

Major architecture optimization! The scraper has been completely streamlined for better performance and maintainability:

✅ Unified Architecture: Consolidated content extractors, proxy managers, and error handlers
✅ Cost Optimized: Smart resource usage with environment-aware configuration
✅ Simplified Codebase: Removed duplicate code and unnecessary complexity
✅ Enhanced Performance: Faster startup and improved resource efficiency
✅ Production Ready: Streamlined for production deployment with minimal overhead

Example output:

{
  "title": "Tesla awards Musk $29 billion in shares with prior pay package in limbo - CNBC",
  "text": "Rich HTML content with article links and descriptions...",
  "source": "CNBC",
  "publishedAt": "2025-08-05T14:08:57.000Z",
  "tags": ["Tesla"],
  "extractionSuccess": true
}

📋 Quick Start

Using Apify Console

Visit: Apify Console
Search: "Google News Scraper"
Configure: Set your search parameters
Run: Start the actor and monitor progress

Using Apify CLI

# Install Apify CLI
npm install -g apify-cli

# Run the actor
apify call google-news-scraper --input '{
  "query": "Tesla",
  "region": "US",
  "language": "en-US",
  "maxItems": 3
}'

Using Apify API

import { ApifyClient } from 'apify-client';

// Authenticate with your Apify API token
const client = new ApifyClient({
    token: 'YOUR_API_TOKEN'
});

const run = await client.actor('google-news-scraper').call({
    query: 'climate change',
    region: 'US',
    maxItems: 50
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
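
Once `items` is available, a small post-processing step can separate successful extractions from failed ones. The sketch below assumes each item carries the `source` and `extractionSuccess` fields shown in the example output; `summarizeBySource` is a hypothetical helper, not part of the actor or of `apify-client`.

```javascript
// Hypothetical helper: counts successfully extracted articles per source.
// Assumes items look like the example output (fields `source`, `extractionSuccess`).
function summarizeBySource(items) {
    const counts = {};
    for (const item of items) {
        if (!item.extractionSuccess) continue; // skip failed extractions
        const source = item.source || 'unknown';
        counts[source] = (counts[source] || 0) + 1;
    }
    return counts;
}

// Sample data shaped like the example output, for illustration only
const sampleItems = [
    { title: 'A', source: 'CNBC', extractionSuccess: true },
    { title: 'B', source: 'CNBC', extractionSuccess: true },
    { title: 'C', source: 'Reuters', extractionSuccess: false },
];
console.log(summarizeBySource(sampleItems)); // { CNBC: 2 }
```

In a real run, pass the `items` array returned by `listItems()` instead of `sampleItems`.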

⚙️ Configuration

Input Parameters

Parameter	Type	Required	Default	Description
`query`	string	✅	"technology"	Search query for Google News
`region`	string	❌	"US"	Region code (US, GB, DE, FR, etc.)
`language`	string	❌	"en-US"	Language code (en-US, de-DE, fr-FR, etc.)
`maxItems`	number	❌	3	Maximum articles to scrape (0 = unlimited, ~100-200 max from RSS)
`dateFrom`	string	❌	-	Start date for articles (YYYY-MM-DD format)
`dateTo`	string	❌	-	End date for articles (YYYY-MM-DD format)
`browserProxyGroups`	array	❌	["RESIDENTIAL", "country-US"]	Proxy groups for browser-based resolution
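
To catch malformed input before starting a run, the parameters above can be checked client-side. This is a minimal sketch, assuming the defaults and formats listed in the table; `normalizeInput` is a hypothetical helper and the actor's own validation may differ.

```javascript
// Defaults taken from the Input Parameters table.
const DEFAULTS = {
    query: 'technology',
    region: 'US',
    language: 'en-US',
    maxItems: 3,
};

const DATE_RE = /^\d{4}-\d{2}-\d{2}$/;

// Hypothetical helper: applies defaults and rejects malformed values.
function normalizeInput(input = {}) {
    const merged = { ...DEFAULTS, ...input };
    if (typeof merged.query !== 'string' || merged.query.trim() === '') {
        throw new Error('query must be a non-empty string');
    }
    if (!Number.isInteger(merged.maxItems) || merged.maxItems < 0) {
        throw new Error('maxItems must be a non-negative integer (0 = unlimited)');
    }
    for (const key of ['dateFrom', 'dateTo']) {
        if (merged[key] !== undefined && !DATE_RE.test(merged[key])) {
            throw new Error(`${key} must use YYYY-MM-DD format`);
        }
    }
    // Dates in YYYY-MM-DD form compare correctly as plain strings.
    if (merged.dateFrom && merged.dateTo && merged.dateFrom > merged.dateTo) {
        throw new Error('dateFrom must not be later than dateTo');
    }
    return merged;
}

console.log(normalizeInput({ query: 'Tesla', dateFrom: '2025-08-01' }));
```

The returned object can be passed directly as the actor input.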

Content Extraction

The scraper uses an intelligent multi-strategy approach:

HTTP-first resolution: Tries efficient HTTP methods before browser automation
Automatic browser fallback: Uses Playwright for JavaScript-heavy sites when needed
Multi-strategy extraction: Readability, schema.org, custom selectors, and heuristics
Quality validation: Articles must have 300+ characters and at least one valid image
Consent handling: Automatic detection and bypass of consent pages
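
The fallback order and the quality rule above can be sketched as follows. This is an illustration under stated assumptions, not the actor's code: `extractWithFallback` and the stub strategies are hypothetical, and the image check only tests that an image URL is present, not that it is valid.

```javascript
// Quality rule from the list above: 300+ characters and at least one image.
function passesQualityCheck(article) {
    const textOk = typeof article.text === 'string' && article.text.length >= 300;
    const imagesOk = Array.isArray(article.images) && article.images.length > 0;
    return textOk && imagesOk;
}

// Hypothetical orchestration: try strategies in order and return the first
// result that passes validation. Each strategy is { name, extract(html) }.
function extractWithFallback(html, strategies) {
    for (const { name, extract } of strategies) {
        try {
            const article = extract(html);
            if (article && passesQualityCheck(article)) {
                return { ...article, extractionSuccess: true, extractionMethod: name };
            }
        } catch (err) {
            // A failing strategy is not fatal; fall through to the next one.
        }
    }
    return { extractionSuccess: false, extractionMethod: null };
}

// Stub strategies for illustration only; real extractors would parse the HTML.
const strategies = [
    { name: 'readability', extract: () => ({ text: 'too short', images: [] }) },
    { name: 'schema.org', extract: () => { throw new Error('no structured data found'); } },
    { name: 'heuristics', extract: (html) => ({ text: html, images: ['https://example.com/a.jpg'] }) },
];

const result = extractWithFallback('x'.repeat(400), strategies);
console.log(result.extractionMethod); // heuristics
```

Ordering the strategies from cheapest to most expensive keeps the common case fast while still recovering difficult pages.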

Regional Support

Region	Code	Language	Example Query
United States	US	en-US	Technology news
United Kingdom	GB	en-GB	Brexit updates
Germany	DE	de-DE	Klimawandel
France	FR	fr-FR	Intelligence artificielle
Japan	JP	ja-JP	人工知能
Australia	AU	en-AU	Bushfire news
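
For orientation, the region and language codes in the table map onto the query parameters of a Google News RSS search URL roughly as shown below. This is a sketch, assuming the publicly observable `hl`/`gl`/`ceid` URL shape; `buildRssUrl` is a hypothetical helper, and how the actor builds its feed URLs internally is not documented here.

```javascript
// Hypothetical helper: builds a Google News RSS search URL.
// Assumption: ceid is "<region>:<primary language subtag>", which fits the
// regions in the table above but may not hold for every locale.
function buildRssUrl({ query, region = 'US', language = 'en-US' }) {
    const url = new URL('https://news.google.com/rss/search');
    url.searchParams.set('q', query);
    url.searchParams.set('hl', language);
    url.searchParams.set('gl', region);
    url.searchParams.set('ceid', `${region}:${language.split('-')[0]}`);
    return url.toString();
}

console.log(buildRssUrl({ query: 'Klimawandel', region: 'DE', language: 'de-DE' }));
```

`URLSearchParams` handles percent-encoding, so non-ASCII queries such as 人工知能 can be passed as-is.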

📊 Output Format

Article Structure

{
  "title": "Revolutionary AI Breakthrough in Healthcare",
  "url": "https://example.com/ai-healthcare-breakthrough",
  "text": "Full article content with comprehensive details...",
  "description": "Scientists develop AI system that can diagnose diseases...",
  "author": "Dr. Jane Smith",
  "publishedDate": "2024-01-15T14:30:00Z",
  "source": "TechNews Daily",
  "sourceUrl": "https://technews.com",
  "images": [
    "https://example.com/images/ai-healthcare.jpg",
    "https://example.com/images/doctor-ai.png"
  ],
  "extractionSuccess": true,
  "extractionMethod": "unfluff",
  "metadata": {
    "wordCount": 1250,
    "readingTime": "5 min",
    "language": "en",
    "contentQuality": 0.95
  },
  "scrapedAt": "2024-01-15T15:00:00Z"
}

Metadata Fields

Field	Type	Description
`wordCount`	number	Number of words in article text
`readingTime`	string	Estimated reading time
`language`	string	Detected content language
`contentQuality`	number	Quality score (0-1)
`extractionMethod`	string	Method used for extraction
`processingTime`	number	Time taken to process (ms)
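
As an illustration of how `wordCount` and `readingTime` relate, the sketch below derives both from the article text. It assumes a reading speed of 250 words per minute, which reproduces the "5 min" shown for 1250 words in the example; the actor's actual formula is not documented here, and `computeTextMetadata` is a hypothetical name.

```javascript
// Assumed reading speed; 250 wpm matches "5 min" for 1250 words.
const WORDS_PER_MINUTE = 250;

function computeTextMetadata(text) {
    const trimmed = text.trim();
    // Count whitespace-separated tokens; empty text has zero words.
    const wordCount = trimmed === '' ? 0 : trimmed.split(/\s+/).length;
    // Round up and report at least one minute.
    const minutes = Math.max(1, Math.ceil(wordCount / WORDS_PER_MINUTE));
    return { wordCount, readingTime: `${minutes} min` };
}

const sample = Array(1250).fill('word').join(' ');
console.log(computeTextMetadata(sample)); // { wordCount: 1250, readingTime: '5 min' }
```

Whitespace splitting undercounts languages written without spaces, such as Japanese, so a real implementation would need a locale-aware segmenter for those regions.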

🔧 Development

Local Development Setup

# Clone the repository
git clone https://github.com/your-username/google-news-scraper
cd google-news-scraper

# Install dependencies
npm install

# Set up development environment
npm run dev:setup

# Start development mode
npm run dev

Testing

# Run all tests
npm test

# Run development tests
npm run dev:test

# Run test scenarios
npm run dev:scenarios

# Check environment health
npm run dev:health

Monitoring

# Real-time monitoring
npm run monitor

# View logs
npm run logs

# Health check
npm run dev:health

For detailed development information, see DEV_README.md.

📚 Documentation

docs/API.md: Detailed API documentation
docs/CONFIGURATION.md: Complete configuration options
docs/DEVELOPER.md: Technical documentation
docs/TROUBLESHOOTING.md: Common issues and solutions
docs/EXAMPLES.md: Practical usage examples

🔍 Use Cases

News Monitoring

// Monitor breaking news
{
  "query": "breaking news",
  "region": "US",
  "maxItems": 10
}

Market Research

// Track industry trends
{
  "query": "artificial intelligence startup funding",
  "region": "US",
  "maxItems": 50
}

Content Analysis

// Analyze sentiment and topics
{
  "query": "climate change policy",
  "region": "GB",
  "language": "en-GB",
  "maxItems": 100
}

⚡ Performance

Benchmarks

Processing Speed: ~50 articles per minute
Memory Usage: <512MB for 1000 articles
Success Rate: >95% with retry logic
Concurrent Requests: Up to 10 simultaneous
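
A cap of 10 simultaneous requests can be enforced with a small worker-pool pattern. The sketch below is a generic illustration, not the actor's implementation; `mapWithConcurrency` and `fakeFetch` are hypothetical names.

```javascript
// Hypothetical helper: maps `worker` over `items` with at most `limit` calls in flight.
async function mapWithConcurrency(items, limit, worker) {
    const results = new Array(items.length);
    let next = 0;
    async function runLane() {
        while (next < items.length) {
            const index = next++; // safe: runs synchronously between awaits
            results[index] = await worker(items[index], index);
        }
    }
    // Start up to `limit` lanes; each pulls the next item when it becomes free.
    const lanes = Array.from({ length: Math.min(limit, items.length) }, runLane);
    await Promise.all(lanes);
    return results;
}

// Demo: track how many workers overlap while "processing" 25 fake articles.
let active = 0;
let peak = 0;
const fakeFetch = async (n) => {
    active++;
    peak = Math.max(peak, active);
    await new Promise((resolve) => setTimeout(resolve, 5));
    active--;
    return n * 2;
};

const done = mapWithConcurrency([...Array(25).keys()], 10, fakeFetch)
    .then((out) => { console.log(out.length, peak); return out; });
```

Results keep the order of the input array even though individual requests may finish out of order.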

Optimization Tips

Use appropriate maxItems: Don't request more than needed
Enable proxy rotation: For high-volume scraping
Set reasonable delays: Respect rate limits
Monitor performance: Use built-in monitoring tools

🛡️ Error Handling

Automatic Recovery

Network Errors: Exponential backoff retry
Rate Limiting: Automatic delay adjustment
Consent Pages: Automatic bypass strategies
Content Extraction: Multiple fallback methods
Circuit Breakers: Prevent cascade failures
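
Exponential backoff, the strategy listed for network errors, can be sketched as follows. The base delay, cap, and retry count are assumed values for illustration, and `withRetry` is a hypothetical wrapper rather than the actor's own code.

```javascript
// Delay for a given attempt (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelay(attempt, baseMs = 500, maxMs = 30000) {
    return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Hypothetical retry wrapper. `sleep` is injectable so tests can skip waiting.
async function withRetry(task, { retries = 3, baseMs = 500, sleep } = {}) {
    const wait = sleep || ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
    for (let attempt = 0; ; attempt++) {
        try {
            return await task(attempt);
        } catch (err) {
            if (attempt >= retries) throw err; // out of attempts, surface the error
            await wait(backoffDelay(attempt, baseMs));
        }
    }
}

console.log([0, 1, 2, 3].map((a) => backoffDelay(a))); // [ 500, 1000, 2000, 4000 ]
```

Production retry logic commonly adds random jitter to each delay so that many clients do not retry in lockstep.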

Error Types

Retryable: Network timeouts, rate limits, temporary failures
Non-retryable: Invalid inputs, authentication errors
Recoverable: Partial content extraction, image validation failures
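
The three categories above could be assigned by a classifier along these lines. This is a sketch: the HTTP status codes, the Node.js error codes, and the `partialContent` flag are assumptions chosen for illustration, and `classifyError` is a hypothetical name.

```javascript
// Assumed mappings; adjust to the errors your HTTP client actually raises.
const RETRYABLE_STATUS = new Set([408, 429, 500, 502, 503, 504]);
const RETRYABLE_CODES = new Set(['ETIMEDOUT', 'ECONNRESET', 'EAI_AGAIN']);

function classifyError(err) {
    // Partial results (e.g. text extracted, image validation failed) can still be kept.
    if (err.partialContent) return 'recoverable';
    if (RETRYABLE_STATUS.has(err.statusCode) || RETRYABLE_CODES.has(err.code)) {
        return 'retryable';
    }
    // Everything else: invalid input, authentication errors, and similar.
    return 'non-retryable';
}

console.log(classifyError({ statusCode: 429 }));      // retryable
console.log(classifyError({ statusCode: 401 }));      // non-retryable
console.log(classifyError({ partialContent: true })); // recoverable
```

Treating unknown errors as non-retryable is the conservative default, since it avoids repeating requests that cannot succeed.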

📈 Monitoring & Analytics

Built-in Metrics

Request success/failure rates
Response times and performance
Memory usage and optimization
Error classification and trends
Content extraction quality

Health Monitoring

Real-time system health
Circuit breaker status
Resource utilization
Error rate thresholds
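
To make "circuit breaker status" concrete, here is a minimal sketch of a breaker with three states. It opens on consecutive failures rather than on a true error rate, which is a simplification; the threshold and cooldown values are assumptions, and the class is illustrative rather than the actor's implementation.

```javascript
// Minimal circuit breaker: opens after `threshold` consecutive failures and
// allows a trial call once `cooldownMs` has elapsed. `now` is injectable for tests.
class CircuitBreaker {
    constructor({ threshold = 5, cooldownMs = 60000, now = Date.now } = {}) {
        this.threshold = threshold;
        this.cooldownMs = cooldownMs;
        this.now = now;
        this.failures = 0;
        this.openedAt = null;
    }

    get state() {
        if (this.openedAt === null) return 'closed';
        return this.now() - this.openedAt >= this.cooldownMs ? 'half-open' : 'open';
    }

    canRequest() {
        return this.state !== 'open';
    }

    recordSuccess() {
        this.failures = 0;
        this.openedAt = null;
    }

    recordFailure() {
        this.failures++;
        if (this.failures >= this.threshold) this.openedAt = this.now();
    }
}

// Demo with a fake clock so no real waiting is needed.
let clock = 0;
const breaker = new CircuitBreaker({ threshold: 2, cooldownMs: 1000, now: () => clock });
breaker.recordFailure();
breaker.recordFailure();
console.log(breaker.state); // open
clock = 1000;
console.log(breaker.state); // half-open
breaker.recordSuccess();
console.log(breaker.state); // closed
```

Callers check `canRequest()` before each request and report the outcome with `recordSuccess()` or `recordFailure()`.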

🤝 Contributing

We welcome contributions! Please see our CONTRIBUTING.md for details.

Development Workflow

Fork the repository
Create a feature branch
Make your changes
Add tests
Submit a pull request

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

🆘 Support

Issues: GitHub Issues
Discussions: GitHub Discussions
Email: support@example.com

🏆 Acknowledgments

Built with Apify SDK
Content extraction powered by Unfluff
XML parsing by fast-xml-parser
Web scraping with Crawlee
