
News Website Crawler & Article Extractor
2 hours trial then $20.00/month - No credit card required now

Scrape all articles from any news website. Extract full text, metadata, keywords, and summaries. Ideal for content analysis, research, and news aggregation.
Actor Metrics
20 monthly users
No reviews yet
2 bookmarks
>99% runs succeeded
13 days response time
Created in Feb 2025
Modified 8 days ago
News Source Crawler 📰🚀 (Apify Actor)
Crawl an entire news website and extract clean, structured data from all its articles. Get article text, metadata, keywords, summaries, and more – perfect for content analysis, market research, news aggregation, and SEO monitoring. No coding required!
Pricing 💰
- $20/month for unlimited usage
- Includes all features and Apify platform benefits
- No additional costs or hidden fees
Features ✨
- Full Website Crawl: 🌐 Scrapes articles from a specified news source URL
- Comprehensive Article Extraction: 📰 Get full article text, publication date, author(s), and source URL
- SEO & Content Analysis: 🔍 Extract keywords, meta descriptions, and automatically generated summaries
- Multimedia Extraction: 🖼️ Get links to the main image, all images, and embedded videos
- Language Support: 🌐 Specify the article language
- Limit Articles: 🔢 Set a maximum number of articles to scrape (optional)
- Proxy Support: ⚙️ Integrates with Apify Proxy for reliable scraping or use your custom proxy
- Analysis-Ready Data (JSON): 💾 Structured data output, perfect for analysis and integration
- Error Handling: ✅ Articles that fail to scrape are reported as dataset items with scrapeSuccess set to false and a scrapeErrorMessage
Why Use This News Source Crawler? 🤔
This Actor is designed to efficiently extract data from entire news websites. It crawls all linked articles from a starting URL, making it ideal for:
- Large-Scale Data Collection: Quickly gather data from an entire news source
- Comprehensive Analysis: Analyze the content, trends, and SEO strategies of a website
- Automated News Feeds: Build custom news feeds with structured data
- Time Savings: Automate the process of collecting articles from a specific source
Data Output 📦
The Actor pushes data to the dataset as it scrapes, providing results in real-time. Each item represents a single article (or an error) and contains the following fields:
- articleURL: The URL of the scraped article
- sourceURL: The base URL of the news source
- articleLanguage: The language of the article (e.g., "en", "es")
- articleTitle: The title of the article
- articleAuthors: A comma-separated list of the article's authors
- articlePublishDate: The publication date (ISO 8601 format), if available
- articleText: The full text content of the article
- articleTopImage: The URL of the main image
- articleAllImages: A comma-separated list of URLs for all images
- articleVideos: A comma-separated list of URLs for embedded videos
- articleKeywords: A comma-separated list of extracted keywords
- articleSummary: A concise summary of the article
- scrapedAt: The timestamp of when the article was scraped (ISO 8601)
- scrapeSuccess: true if scraped successfully, false otherwise
- articleMetaDescription: The meta description of the article
- articleMetaKeywords: A comma-separated list of the meta keywords
- scrapeErrorMessage: An error message if scrapeSuccess is false
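Because each dataset item is either an article or an error, a consumer will usually want to separate the two before doing any analysis. The following is a minimal sketch of that step, assuming the items have already been loaded as Python dictionaries. The sample items and the error text are illustrative, not real Actor output.

```python
def split_by_success(items):
    """Separate successfully scraped articles from error items."""
    succeeded, failed = [], []
    for item in items:
        # Treat anything other than an explicit True as a failure.
        if item.get("scrapeSuccess") is True:
            succeeded.append(item)
        else:
            failed.append(item)
    return succeeded, failed


# Illustrative items only; field names follow the list above.
sample_items = [
    {"articleURL": "https://www.example.com/news/article1", "scrapeSuccess": True},
    {
        "articleURL": "https://www.example.com/news/article2",
        "scrapeSuccess": False,
        "scrapeErrorMessage": "Request timed out",  # made-up message
    },
]

ok, errors = split_by_success(sample_items)
for item in errors:
    print(f"{item['articleURL']}: {item.get('scrapeErrorMessage', 'unknown error')}")
```

Logging the failed URLs together with their scrapeErrorMessage makes it easier to decide whether a re-run or a proxy change is worth trying.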
Example Output
[
  {
    "articleURL": "https://www.example.com/news/article1",
    "sourceURL": "https://www.example.com",
    "articleLanguage": "en",
    "articleTitle": "Example News Article",
    "articleAuthors": "John Doe, Jane Smith",
    "articlePublishDate": "2024-07-27T10:00:00Z",
    "articleText": "This is the full text of the example article...",
    "articleTopImage": "https://www.example.com/images/article1.jpg",
    "articleAllImages": "https://www.example.com/images/article1.jpg,https://www.example.com/images/article2.png",
    "articleVideos": "",
    "articleKeywords": "news, example, article",
    "articleSummary": "A brief summary of the example article.",
    "scrapedAt": "2024-07-27T12:34:56Z",
    "scrapeSuccess": true,
    "articleMetaDescription": "Meta description of the example news article.",
    "articleMetaKeywords": "example, article, news"
  }
]
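Several fields arrive as comma-separated strings and the dates as ISO 8601 text, so downstream code may want to convert them into lists and datetime objects first. Below is a rough sketch of such a normalization step. The helper names (split_csv, parse_iso, normalize_item) are hypothetical and not part of the Actor, and the comma split assumes that individual values such as image URLs do not themselves contain commas.

```python
from datetime import datetime

# Fields documented as comma-separated strings.
LIST_FIELDS = (
    "articleAuthors",
    "articleAllImages",
    "articleVideos",
    "articleKeywords",
    "articleMetaKeywords",
)
DATE_FIELDS = ("articlePublishDate", "scrapedAt")


def split_csv(value):
    """Turn 'a, b,c' into ['a', 'b', 'c']; empty or missing values give []."""
    if not value:
        return []
    return [part.strip() for part in value.split(",") if part.strip()]


def parse_iso(value):
    """Parse an ISO 8601 timestamp, or return None if it is missing."""
    if not value:
        return None
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def normalize_item(item):
    """Return a copy of the item with list and date fields converted."""
    normalized = dict(item)
    for field in LIST_FIELDS:
        normalized[field] = split_csv(item.get(field))
    for field in DATE_FIELDS:
        normalized[field] = parse_iso(item.get(field))
    return normalized


example = {
    "articleAuthors": "John Doe, Jane Smith",
    "articleAllImages": "https://www.example.com/images/article1.jpg,https://www.example.com/images/article2.png",
    "articleVideos": "",
    "articleKeywords": "news, example, article",
    "articlePublishDate": "2024-07-27T10:00:00Z",
    "scrapedAt": "2024-07-27T12:34:56Z",
}
article = normalize_item(example)
print(article["articleAuthors"])
print(article["articlePublishDate"].isoformat())
```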
Use Cases 💡
Content Marketing & SEO 📢
- Competitor Analysis: Track all content published by competitors
- Content Audits: Analyze an entire website's content strategy
- Keyword Research: Identify trending topics across a whole site
- Backlink Monitoring: Find sites linking to a news source
- Brand Monitoring: Track how a news source covers your brand
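For keyword research, one simple approach is to count how many articles mention each extracted keyword. The sketch below assumes the dataset items are available as dictionaries and that articleKeywords is the comma-separated string described in the data output section. The sample items are made up for illustration.

```python
from collections import Counter


def keyword_counts(items):
    """Count, per keyword, the number of successful articles that list it."""
    counts = Counter()
    for item in items:
        if not item.get("scrapeSuccess"):
            continue  # skip error items
        raw = item.get("articleKeywords") or ""
        # A set ensures each keyword counts at most once per article.
        keywords = {kw.strip().lower() for kw in raw.split(",") if kw.strip()}
        counts.update(keywords)
    return counts


# Illustrative items only.
items = [
    {"scrapeSuccess": True, "articleKeywords": "news, example, article"},
    {"scrapeSuccess": True, "articleKeywords": "News, economy"},
    {"scrapeSuccess": False, "scrapeErrorMessage": "made-up error"},
]

counts = keyword_counts(items)
print(counts.most_common(1))
```

Lowercasing before counting merges variants such as "News" and "news". Whether that is desirable depends on the analysis.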
Market Research & Business Intelligence 📊
- News Aggregation: Build comprehensive news feeds from specific sources
- Trend Analysis: Identify emerging trends within a news domain
- Sentiment Analysis: Analyze the tone and sentiment of articles from a source
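A news feed can be approximated by sorting the successful items by publication date and keeping a few fields per entry. The sketch below is one possible way to do that. The feed entry keys (title, url, published, summary) are hypothetical choices, and timestamps without a time zone are assumed to be UTC, which the source does not specify.

```python
from datetime import datetime, timezone


def publish_time(item):
    """Return the publish date as an aware datetime, or None if unusable."""
    raw = item.get("articlePublishDate")
    if not raw:
        return None
    try:
        parsed = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    except ValueError:
        return None
    if parsed.tzinfo is None:
        # Assumption: treat naive timestamps as UTC so values stay comparable.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def build_feed(items, limit=10):
    """Newest-first list of feed entries built from successful, dated items."""
    dated = []
    for item in items:
        when = publish_time(item)
        if item.get("scrapeSuccess") and when is not None:
            dated.append((when, item))
    dated.sort(key=lambda pair: pair[0], reverse=True)
    return [
        {
            "title": item.get("articleTitle"),
            "url": item.get("articleURL"),
            "published": when.isoformat(),
            "summary": item.get("articleSummary"),
        }
        for when, item in dated[:limit]
    ]


# Illustrative items only.
feed_items = [
    {"scrapeSuccess": True, "articleTitle": "Older", "articlePublishDate": "2024-07-26T08:00:00Z"},
    {"scrapeSuccess": True, "articleTitle": "Newer", "articlePublishDate": "2024-07-27T10:00:00Z"},
    {"scrapeSuccess": True, "articleTitle": "Undated"},
    {"scrapeSuccess": False, "scrapeErrorMessage": "made-up error"},
]
feed = build_feed(feed_items)
print([entry["title"] for entry in feed])
```

Items without a usable articlePublishDate are dropped here. A feed that must include them could fall back to scrapedAt instead.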
Academic Research 🎓
- Data Collection: Gather large datasets of articles for research
- Text Analysis: Analyze the content of entire news websites
- Niche Collections: Gather articles from news sources that cover a specific niche
Other Applications 🌐
- Machine Learning: Train models with large sets of scraped articles
- Content Curation: Easily find and collect relevant articles
Getting Started 🚀
- Find the "News Source Crawler" in the Apify Store
- Configure the input:
  - url: (Required) The URL of the news website to crawl
  - language: (Optional) The expected language (default: "en")
  - maxArticles: (Optional) The maximum number of articles to scrape
  - proxyConfiguration: (Optional) Select an Apify Proxy configuration or provide custom proxies
- Run the Actor
- Access results in JSON, CSV, Excel, or other formats, directly from the dataset as the Actor runs
- Optional: Schedule the Actor, set up webhooks, or integrate with other Actors
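The same steps can also be driven from code instead of the input form. The sketch below builds the input from the four parameters listed above and shows how a run might be started with the apify-client Python package. Several parts are assumptions: the Actor ID is not stated here and is left as a placeholder, the proxyConfiguration shape follows the common Apify convention of {"useApifyProxy": true} and may differ for this Actor, and the URL check is an illustrative validation rather than a documented requirement.

```python
def build_run_input(url, language="en", max_articles=None, use_apify_proxy=False):
    """Assemble the Actor input from the documented parameters."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must start with http:// or https://")
    run_input = {"url": url, "language": language}
    if max_articles is not None:
        if max_articles < 1:
            raise ValueError("max_articles must be at least 1")
        run_input["maxArticles"] = max_articles
    if use_apify_proxy:
        # Assumed shape, based on the usual Apify proxy input convention.
        run_input["proxyConfiguration"] = {"useApifyProxy": True}
    return run_input


def run_crawler(token, actor_id, run_input):
    """Start a run, wait for it to finish, and return its dataset items.

    Requires `pip install apify-client`. actor_id is a placeholder that
    must be replaced with the ID shown on the Actor's store page.
    """
    from apify_client import ApifyClient  # imported lazily; optional dependency

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


run_input = build_run_input("https://www.example.com", max_articles=50)
print(run_input)
# items = run_crawler("<YOUR_APIFY_TOKEN>", "<ACTOR_ID>", run_input)
```

Setting maxArticles on a first run keeps the crawl small while the input is being checked.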
Key Benefits 🏆
Data Quality
- ✅ Reliable & Accurate: Provides high-quality extracted data
- ✅ Clean Data: Extracts only the relevant information
- ✅ Structured Format: Easy to use and integrate
Platform Advantages (Apify)
- ✅ Scalable & Serverless: Handles large crawls without infrastructure management
- ✅ Cost-Effective: Flat $20/month for unlimited usage, with no hidden fees
- ✅ Full Apify Integration: Connects seamlessly with other Apify tools
- ✅ User-Friendly: No coding required – simple input form
- ✅ Real-time Results: Data is pushed to the dataset as it's scraped
- ✅ Automated Updates: The Actor is maintained and updated
- ✅ Isolated Runs: Each run is in a fresh, isolated container
Start crawling news sources today! ➡️