Smart News Scraper
Pricing: $5.00 / 1,000 results
Smart News Scraper (Apify Actor) – Scrape Google News by domains or keywords. Extract titles, summaries, URLs, dates & sources. Filter by time, remove duplicates, handle errors, and scale with Apify. Ideal for brand monitoring, competitor analysis, research, & trend tracking.
📰 GNews API - Domain News Scraper
A powerful Flask API that scrapes news articles about any company from Google News across multiple topics. Perfect for monitoring company mentions, competitive intelligence, and market research.
🚀 Features
- Multi-topic Search: Automatically searches for news, hiring, funding, layoffs, events, tech stack updates, and leadership changes
- Google News Integration: Scrapes from Google News for comprehensive coverage
- API Key Authentication: Secure access with environment-based API keys
- Clean JSON Response: Structured data with titles, sources, timestamps, and URLs
- Deployment Ready: Includes Procfile for easy Heroku/Railway/Render deployment
- Modern Python: Built with uv package manager, Flask, and async Playwright
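The features above rely on a headless-browser fetch of Google News. As a rough, hypothetical sketch (the actual news_scraper.py may be structured differently, and the URL pattern and function name here are assumptions for illustration), an async Playwright fetch of a Google News search page could look like this:

```python
# Hypothetical sketch only: the real news_scraper.py may structure this differently.
import asyncio
from urllib.parse import quote_plus

from playwright.async_api import async_playwright


async def fetch_google_news_html(query: str) -> str:
    """Load the Google News search results page for `query` and return its HTML."""
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        # Google News search URL pattern (assumed for illustration).
        await page.goto(
            f"https://news.google.com/search?q={quote_plus(query)}",
            wait_until="domcontentloaded",
        )
        html = await page.content()
        await browser.close()
        return html


if __name__ == "__main__":
    html = asyncio.run(fetch_google_news_html("openai.com"))
    print(f"Fetched {len(html)} bytes of HTML")
```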
📋 Quick Start
Prerequisites
- Python 3.12+
- uv package manager
Installation
- Clone the repository
git clone <your-repo-url>
cd GNews
- Install dependencies
uv sync
- Install Playwright browsers
uv run playwright install
- Set up environment variables
# Create .env file
echo "API_KEY=your_secret_key_here" > .env
🏃‍♂️ Running Locally
# Start the Flask API
uv run python app.py

# Or specify a port
PORT=5001 uv run python app.py
The API will be available at http://localhost:5000
🔌 API Usage
Authentication
All requests to the search endpoint require an API key, passed as the api_key query parameter.
Endpoint
GET /search/<domain>?api_key=YOUR_API_KEY
Examples
# Search for OpenAI news
curl "http://localhost:5000/search/openai.com?api_key=1212"

# Search for Microsoft news
curl "http://localhost:5000/search/microsoft.com?api_key=1212"

# Welcome message
curl "http://localhost:5000/"
Response Format
Success Response:
{"success": true,"search_timestamp": "2025-09-05T20:05:22.244388","total_results": 98,"news": [{"title": "OpenAI announces new features","source": "TechCrunch","published_at": "2025-09-05T12:30:00","topic": "news","url": "https://news.google.com/read/..."},{"title": "OpenAI raises $100M in Series C","source": "Bloomberg","published_at": "2025-09-04T15:30:00","topic": "funding","url": "https://news.google.com/read/..."}]}
Error Response:
{"success": false,"error": "Invalid or missing API key"}
📊 Search Topics
The API automatically searches across these topics:
- News - General company news and announcements
- Hiring - Job postings and hiring announcements
- Funding - Investment rounds and financial news
- Layoffs - Workforce reduction announcements
- Events - Company events and conferences
- Tech Stack - Technology and infrastructure updates
- Leadership Changes - Executive appointments and departures
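The exact Google News queries used for each topic are not documented here; the mapping below is an illustrative assumption of how per-domain, per-topic search strings might be built, not the scraper's actual implementation.

```python
# Illustrative only: the real topic-to-query mapping may differ.
TOPIC_QUERIES = {
    "news": "{domain}",
    "hiring": "{domain} hiring",
    "funding": "{domain} funding",
    "layoffs": "{domain} layoffs",
    "events": "{domain} event OR conference",
    "tech_stack": "{domain} technology stack",
    "leadership_changes": "{domain} CEO OR CTO appointment",
}


def build_queries(domain: str) -> dict[str, str]:
    """Return one Google News search string per topic for the given domain."""
    return {topic: template.format(domain=domain) for topic, template in TOPIC_QUERIES.items()}


if __name__ == "__main__":
    for topic, query in build_queries("openai.com").items():
        print(f"{topic}: {query}")
```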
🚀 Deployment
Heroku
# Login and create app
heroku login
heroku create your-app-name

# Set environment variables
heroku config:set API_KEY=your_secret_key_here

# Deploy
git push heroku main
Railway
- Connect your GitHub repository to Railway
- Set the API_KEY environment variable in the Railway dashboard
- Deploy automatically on git push
Render
- Connect your GitHub repository to Render
- Set the API_KEY environment variable in the Render dashboard
- Deploy automatically on git push
🛠️ Development
Project Structure
.
├── app.py             # Flask API server
├── news_scraper.py    # Core scraping logic
├── Procfile           # Deployment configuration
├── pyproject.toml     # Dependencies (uv)
├── uv.lock            # Lock file
├── runtime.txt        # Python version
├── .env               # Environment variables (local)
└── README.md          # This file
Environment Variables
- API_KEY - Secret key for API authentication (required)
- PORT - Server port (default: 5000)
- FLASK_DEBUG - Debug mode (default: False)
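A rough sketch of how these variables might be read at startup, assuming python-dotenv is used for local .env loading (the project's actual app.py may differ):

```python
# Sketch only: assumes python-dotenv is available for local .env loading.
import os

from dotenv import load_dotenv

load_dotenv()  # loads .env locally; has no effect if the file is absent in production

API_KEY = os.environ["API_KEY"]                                          # required
PORT = int(os.environ.get("PORT", "5000"))                               # default: 5000
FLASK_DEBUG = os.environ.get("FLASK_DEBUG", "False").lower() == "true"   # default: False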
Local Development
# Install dependencies
uv sync

# Install browsers
uv run playwright install

# Start development server
uv run python app.py
🔒 Security
- API key authentication required for all scraping endpoints
- Environment variables for sensitive data
- CORS enabled for cross-origin requests
- Input validation and error handling
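The API-key check could be implemented as a small Flask decorator along these lines; this is a sketch consistent with the documented error response, not necessarily the code in app.py.

```python
# Sketch of query-parameter API-key authentication in Flask.
import os
from functools import wraps

from flask import jsonify, request


def require_api_key(view):
    @wraps(view)
    def wrapper(*args, **kwargs):
        # Compare the ?api_key=... query parameter against the configured secret.
        if request.args.get("api_key") != os.environ.get("API_KEY"):
            return jsonify({"success": False, "error": "Invalid or missing API key"}), 401
        return view(*args, **kwargs)
    return wrapper
```

A decorator like this would then sit under the `@app.route("/search/<domain>")` registration so every search request is checked before scraping starts.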
📝 CLI Usage (Optional)
You can also use the scraper directly from the command line:
# Basic usage
uv run python news_scraper.py openai.com

# Save to file
uv run python news_scraper.py openai.com --output results.json

# Run with visible browser
uv run python news_scraper.py openai.com --head
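The flags above suggest an argparse entry point roughly like the following; the option names match the examples, while run_scraper is a placeholder stub, not the module's real scraping function.

```python
# Hypothetical CLI wiring for news_scraper.py; the actual module may differ.
import argparse
import json


def run_scraper(domain: str, headless: bool = True) -> dict:
    # Stand-in for the module's actual scraping logic.
    return {"domain": domain, "headless": headless, "news": []}


def main() -> None:
    parser = argparse.ArgumentParser(description="Scrape Google News for a company domain")
    parser.add_argument("domain", help="Company domain to search, e.g. openai.com")
    parser.add_argument("--output", help="Write results to this JSON file instead of printing")
    parser.add_argument("--head", action="store_true", help="Run the browser with a visible window")
    args = parser.parse_args()

    results = run_scraper(args.domain, headless=not args.head)

    if args.output:
        with open(args.output, "w", encoding="utf-8") as f:
            json.dump(results, f, indent=2)
    else:
        print(json.dumps(results, indent=2))


if __name__ == "__main__":
    main()
```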
🤝 Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
📄 License
This project is licensed under the MIT License.
⚠️ Disclaimer
This tool is for research and monitoring purposes only. Please respect robots.txt files and rate limits. The authors are not responsible for how this tool is used.