Smart News Scraper

Pricing: $5.00 / 1,000 results

Developed by AppliPlus

Maintained by Community

Smart News Scraper (Apify Actor) – Scrape Google News by domains or keywords. Extract titles, summaries, URLs, dates & sources. Filter by time, remove duplicates, handle errors, and scale with Apify. Ideal for brand monitoring, competitor analysis, research, & trend tracking.

Last modified: 21 days ago

📰 GNews API - Domain News Scraper

A powerful Flask API that scrapes news articles about any company from Google News across multiple topics. Perfect for monitoring company mentions, competitive intelligence, and market research.

🚀 Features

  • Multi-topic Search: Automatically searches for news, hiring, funding, layoffs, events, tech stack updates, and leadership changes
  • Google News Integration: Scrapes from Google News for comprehensive coverage
  • API Key Authentication: Secure access with environment-based API keys
  • Clean JSON Response: Structured data with titles, sources, timestamps, and URLs
  • Deployment Ready: Includes Procfile for easy Heroku/Railway/Render deployment
  • Modern Python: Built with uv package manager, Flask, and async Playwright

📋 Quick Start

Prerequisites

  • Python 3.12+
  • uv package manager

Installation

  1. Clone the repository
git clone <your-repo-url>
cd GNews
  2. Install dependencies
uv sync
  3. Install Playwright browsers
uv run playwright install
  4. Set up environment variables
# Create .env file
echo "API_KEY=your_secret_key_here" > .env

🏃‍♂️ Running Locally

# Start the Flask API
uv run python app.py
# Or specify a port
PORT=5001 uv run python app.py

The API will be available at http://localhost:5000

🔌 API Usage

Authentication

All scraping requests require an API key passed as the api_key query parameter; only the root welcome endpoint is open without one.
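
The key check itself is small. Below is a minimal sketch of how it might be implemented in Flask; the decorator name and exact wiring are illustrative assumptions, not the project's actual code (only the error message matches the documented response).

import os
from functools import wraps
from flask import request, jsonify

def require_api_key(view):
    # Hypothetical decorator: compare the api_key query parameter
    # against the API_KEY environment variable.
    @wraps(view)
    def wrapper(*args, **kwargs):
        if request.args.get("api_key") != os.environ.get("API_KEY"):
            return jsonify({"success": False, "error": "Invalid or missing API key"}), 401
        return view(*args, **kwargs)
    return wrapper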

Endpoint

GET /search/<domain>?api_key=YOUR_API_KEY

Examples

# Search for OpenAI news
curl "http://localhost:5000/search/openai.com?api_key=1212"
# Search for Microsoft news
curl "http://localhost:5000/search/microsoft.com?api_key=1212"
# Welcome message
curl "http://localhost:5000/"

Response Format

Success Response:

{
  "success": true,
  "search_timestamp": "2025-09-05T20:05:22.244388",
  "total_results": 98,
  "news": [
    {
      "title": "OpenAI announces new features",
      "source": "TechCrunch",
      "published_at": "2025-09-05T12:30:00",
      "topic": "news",
      "url": "https://news.google.com/read/..."
    },
    {
      "title": "OpenAI raises $100M in Series C",
      "source": "Bloomberg",
      "published_at": "2025-09-04T15:30:00",
      "topic": "funding",
      "url": "https://news.google.com/read/..."
    }
  ]
}

Error Response:

{
  "success": false,
  "error": "Invalid or missing API key"
}
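
When consuming the success format, it is often handy to split the flat news list back out by topic. A short sketch, assuming the response has been saved to a local response.json file (the filename is arbitrary):

import json
from collections import defaultdict

# Load a saved API response (e.g. redirected from curl into response.json).
with open("response.json") as f:
    data = json.load(f)

# Group articles by the "topic" field returned by the API.
by_topic = defaultdict(list)
for article in data.get("news", []):
    by_topic[article["topic"]].append(article)

for topic, articles in sorted(by_topic.items()):
    print(f"{topic}: {len(articles)} article(s)")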

📊 Search Topics

The API automatically searches across these topics (a sketch of how the corresponding queries might be composed follows the list):

  • News - General company news and announcements
  • Hiring - Job postings and hiring announcements
  • Funding - Investment rounds and financial news
  • Layoffs - Workforce reduction announcements
  • Events - Company events and conferences
  • Tech Stack - Technology and infrastructure updates
  • Leadership Changes - Executive appointments and departures
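
Each topic presumably translates into its own Google News query for the target domain. The real query strings live in news_scraper.py and are not reproduced here; the mapping below is only an illustrative assumption of how they could be composed.

# Hypothetical topic-to-query mapping; the actual keywords are defined in news_scraper.py.
TOPIC_QUERIES = {
    "news": "{domain}",
    "hiring": "{domain} hiring OR jobs",
    "funding": "{domain} funding OR investment",
    "layoffs": "{domain} layoffs",
    "events": "{domain} event OR conference",
    "tech stack": "{domain} technology OR infrastructure",
    "leadership changes": "{domain} CEO OR executive appointment",
}

def build_queries(domain: str) -> dict[str, str]:
    """Return one Google News search string per topic for a company domain."""
    return {topic: template.format(domain=domain) for topic, template in TOPIC_QUERIES.items()}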

🚀 Deployment

Heroku

# Login and create app
heroku login
heroku create your-app-name
# Set environment variables
heroku config:set API_KEY=your_secret_key_here
# Deploy
git push heroku main

Railway

  1. Connect your GitHub repository to Railway
  2. Set API_KEY environment variable in Railway dashboard
  3. Deploy automatically on git push

Render

  1. Connect your GitHub repository to Render
  2. Set API_KEY environment variable in Render dashboard
  3. Deploy automatically on git push

🛠️ Development

Project Structure

.
├── app.py            # Flask API server
├── news_scraper.py   # Core scraping logic
├── Procfile          # Deployment configuration
├── pyproject.toml    # Dependencies (uv)
├── uv.lock           # Lock file
├── runtime.txt       # Python version
├── .env              # Environment variables (local)
└── README.md         # This file
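
To make the layout concrete, here is a stripped-down sketch of how app.py might tie the pieces together. Everything beyond the file names app.py and news_scraper.py (the scrape coroutine, the exact response assembly) is an assumption; the real file also handles authentication and error cases.

import asyncio
import os
from datetime import datetime
from flask import Flask, jsonify

import news_scraper  # assumed to expose an async scrape(domain) coroutine

app = Flask(__name__)

@app.route("/search/<domain>")
def search(domain):
    # Run the async Playwright scraper from the synchronous Flask handler.
    articles = asyncio.run(news_scraper.scrape(domain))
    return jsonify({
        "success": True,
        "search_timestamp": datetime.now().isoformat(),
        "total_results": len(articles),
        "news": articles,
    })

if __name__ == "__main__":
    app.run(port=int(os.environ.get("PORT", "5000")))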

Environment Variables

  • API_KEY - Secret key for API authentication (required)
  • PORT - Server port (default: 5000)
  • FLASK_DEBUG - Debug mode (default: False)
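
A minimal sketch of how these variables are typically read at startup; the defaults mirror the list above, though the actual parsing in app.py may differ.

import os

API_KEY = os.environ["API_KEY"]                               # required; raises KeyError if unset
PORT = int(os.environ.get("PORT", "5000"))                    # default: 5000
FLASK_DEBUG = os.environ.get("FLASK_DEBUG", "False").lower() in ("1", "true")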

Local Development

# Install dependencies
uv sync
# Install browsers
uv run playwright install
# Start development server
uv run python app.py

🔒 Security

  • API key authentication required for all scraping endpoints
  • Environment variables for sensitive data
  • CORS enabled for cross-origin requests
  • Input validation and error handling
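
The README does not show how CORS and input validation are wired up. Assuming the common flask-cors package and a simple pattern check on the <domain> path parameter, the setup might look roughly like this:

import re
from flask import Flask
from flask_cors import CORS

app = Flask(__name__)
CORS(app)  # allow cross-origin requests, as noted above

# Loose sanity check applied to the <domain> path parameter before scraping; purely illustrative.
DOMAIN_RE = re.compile(r"^[a-z0-9-]+(\.[a-z0-9-]+)+$", re.IGNORECASE)

def is_valid_domain(domain: str) -> bool:
    return bool(DOMAIN_RE.match(domain))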

📝 CLI Usage (Optional)

You can also use the scraper directly from the command line:

# Basic usage
uv run python news_scraper.py openai.com
# Save to file
uv run python news_scraper.py openai.com --output results.json
# Run with visible browser
uv run python news_scraper.py openai.com --head
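
The flags above suggest a small argparse front end inside news_scraper.py. The sketch below covers only the options documented here (the positional domain, --output, and --head); the placeholder result and everything else are assumptions.

import argparse
import json

def parse_args():
    parser = argparse.ArgumentParser(description="Scrape Google News for a company domain.")
    parser.add_argument("domain", help="Company domain, e.g. openai.com")
    parser.add_argument("--output", help="Write results to this JSON file instead of stdout")
    parser.add_argument("--head", action="store_true", help="Run the browser with a visible window")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    results = {"domain": args.domain, "news": []}  # placeholder; the real scraping logic fills this in
    if args.output:
        with open(args.output, "w") as f:
            json.dump(results, f, indent=2)
    else:
        print(json.dumps(results, indent=2))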

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

📄 License

This project is licensed under the MIT License.

⚠️ Disclaimer

This tool is for research and monitoring purposes only. Please respect robots.txt files and rate limits. The authors are not responsible for how this tool is used.