![News Website Crawler & Article Extractor avatar](https://images.apifyusercontent.com/ng02i3SZuxE8yQlU1KfbL62re9NAK5ZAwwQMVBICfW0/rs:fill:250:250/cb:1/aHR0cHM6Ly9hcGlmeS1pbWFnZS11cGxvYWRzLXByb2QuczMudXMtZWFzdC0xLmFtYXpvbmF3cy5jb20vU3NzZTVHWnlkalFqeHB1a3ctYWN0b3ItY0ZaTkZDMmRCQnR1eXNsckUtZm92ZjUwQktjcS1uZXdzLXN2Z3JlcG8tY29tLnBuZw.webp)
News Website Crawler & Article Extractor
2 hours trial then $20.00/month - No credit card required now
This Actor may be unreliable while under maintenance.
Scrape all articles from any news website. Extract full text, metadata, keywords, and summaries. Ideal for content analysis, research, and news aggregation.
News Source Crawler 📰🚀 (Apify Actor)
Crawl an entire news website and extract clean, structured data from all its articles. Get article text, metadata, keywords, summaries, and more – perfect for content analysis, market research, news aggregation, and SEO monitoring. No coding required!
Pricing 💰
- $35/month for unlimited usage
- Includes all features and Apify platform benefits
- No additional costs or hidden fees
Features ✨
- Full Website Crawl: 🌐 Scrapes articles from a specified news source URL
- Comprehensive Article Extraction: 📰 Get full article text, publication date, author(s), and source URL
- SEO & Content Analysis: 🔍 Extract keywords, meta descriptions, and automatically generated summaries
- Multimedia Extraction: 🖼️ Get links to the main image, all images, and embedded videos
- Language Support: 🌐 Specify the article language
- Limit Articles: 🔢 Set a maximum number of articles to scrape (optional)
- Proxy Support: ⚙️ Integrates with Apify Proxy for reliable scraping or use your custom proxy
- Analysis-Ready Data (JSON): 💾 Structured data output, perfect for analysis and integration
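- Error Handling: ✅ Articles that fail to scrape are still reported as dataset items, with scrapeSuccess set to false and a scrapeErrorMessage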
Why Use This News Source Crawler? 🤔
This Actor is designed to efficiently extract data from entire news websites. It crawls all linked articles from a starting URL, making it ideal for:
- Large-Scale Data Collection: Quickly gather data from an entire news source
- Comprehensive Analysis: Analyze the content, trends, and SEO strategies of a website
- Automated News Feeds: Build custom news feeds with structured data
- Time Savings: Automate the process of collecting articles from a specific source
Data Output 📦
The Actor pushes each item to the dataset as it scrapes, so results are available while the run is still in progress. Each item represents a single article (or an error) and contains the following fields:
- `articleURL`: The URL of the scraped article
- `sourceURL`: The base URL of the news source
- `articleLanguage`: The language of the article (e.g., "en", "es")
- `articleTitle`: The title of the article
- `articleAuthors`: A comma-separated list of the article's authors
- `articlePublishDate`: The publication date (ISO 8601 format), if available
- `articleText`: The full text content of the article
- `articleTopImage`: The URL of the main image
- `articleAllImages`: A comma-separated list of URLs for all images
- `articleVideos`: A comma-separated list of URLs for embedded videos
- `articleKeywords`: A comma-separated list of extracted keywords
- `articleSummary`: A concise summary of the article
- `scrapedAt`: The timestamp of when the article was scraped (ISO 8601)
- `scrapeSuccess`: `true` if scraped successfully, `false` otherwise
- `articleMetaDescription`: The meta description of the article
- `articleMetaKeywords`: A comma-separated list of the meta keywords
- `scrapeErrorMessage`: An error message if `scrapeSuccess` is `false`
Example Output
```json
[
  {
    "articleURL": "https://www.example.com/news/article1",
    "sourceURL": "https://www.example.com",
    "articleLanguage": "en",
    "articleTitle": "Example News Article",
    "articleAuthors": "John Doe, Jane Smith",
    "articlePublishDate": "2024-07-27T10:00:00Z",
    "articleText": "This is the full text of the example article...",
    "articleTopImage": "https://www.example.com/images/article1.jpg",
    "articleAllImages": "https://www.example.com/images/article1.jpg,https://www.example.com/images/article2.png",
    "articleVideos": "",
    "articleKeywords": "news, example, article",
    "articleSummary": "A brief summary of the example article.",
    "scrapedAt": "2024-07-27T12:34:56Z",
    "scrapeSuccess": true,
    "articleMetaDescription": "Meta description of the example news article.",
    "articleMetaKeywords": "example, article, news"
  }
]
```
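Because several fields arrive as comma-separated strings and the timestamps as ISO 8601 text, downstream code will usually want to normalize each item first. The following is a minimal sketch using only the Python standard library. The helper names (`split_csv`, `parse_iso`, `normalize_item`, `split_by_status`) are illustrative and not part of the Actor, and the naive comma split assumes that URLs and author names contain no commas themselves.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, List, Optional, Tuple

# Fields the field list above describes as comma-separated strings.
LIST_FIELDS = (
    "articleAuthors",
    "articleAllImages",
    "articleVideos",
    "articleKeywords",
    "articleMetaKeywords",
)
# Fields described as ISO 8601 timestamps.
DATE_FIELDS = ("articlePublishDate", "scrapedAt")


def split_csv(value: Optional[str]) -> List[str]:
    """Split a comma-separated string, trimming spaces and dropping blanks."""
    if not value:
        return []
    return [part.strip() for part in value.split(",") if part.strip()]


def parse_iso(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp; a trailing 'Z' is rewritten as +00:00."""
    if not value:
        return None
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def normalize_item(item: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the item with list and date fields converted."""
    out = dict(item)
    for field in LIST_FIELDS:
        out[field] = split_csv(item.get(field))
    for field in DATE_FIELDS:
        out[field] = parse_iso(item.get(field))
    return out


def split_by_status(
    items: Iterable[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Separate successfully scraped articles from error items."""
    ok: List[Dict[str, Any]] = []
    failed: List[Dict[str, Any]] = []
    for item in items:
        (ok if item.get("scrapeSuccess") else failed).append(item)
    return ok, failed


example = {
    "articleAuthors": "John Doe, Jane Smith",
    "articleVideos": "",
    "articlePublishDate": "2024-07-27T10:00:00Z",
    "scrapeSuccess": True,
}
print(normalize_item(example)["articleAuthors"])  # ['John Doe', 'Jane Smith']
```

Fields that are missing from an item come back as an empty list or `None`, so error items (which may carry little more than `scrapeErrorMessage`) pass through without raising.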
Use Cases 💡
Content Marketing & SEO 📢
- Competitor Analysis: Track all content published by competitors
- Content Audits: Analyze an entire website's content strategy
- Keyword Research: Identify trending topics across a whole site
- Backlink Monitoring: Find sites linking to a news source
- Brand Monitoring: Track how a news source covers your brand
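As one illustration of the keyword research use case, the `articleKeywords` field can be tallied across a crawl to see which terms recur. This is a rough sketch rather than a feature of the Actor: `top_keywords` is a hypothetical helper, and it assumes keywords are comma-separated as described in the Data Output section.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple


def top_keywords(
    items: Iterable[Dict[str, Any]], n: int = 10
) -> List[Tuple[str, int]]:
    """Count in how many successfully scraped articles each keyword appears."""
    counts: Counter = Counter()
    for item in items:
        if not item.get("scrapeSuccess"):
            continue  # skip error items
        raw = item.get("articleKeywords") or ""
        # A set counts each keyword at most once per article;
        # lowercasing merges "News" and "news".
        keywords = {k.strip().lower() for k in raw.split(",") if k.strip()}
        counts.update(keywords)
    return counts.most_common(n)


items = [
    {"scrapeSuccess": True, "articleKeywords": "news, example"},
    {"scrapeSuccess": True, "articleKeywords": "News, politics"},
    {"scrapeSuccess": False, "articleKeywords": "news"},
]
print(top_keywords(items, n=1))  # [('news', 2)]
```

Counting per article rather than per occurrence keeps one keyword-heavy article from dominating the ranking.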
Market Research & Business Intelligence 📊
- News Aggregation: Build comprehensive news feeds from specific sources
- Trend Analysis: Identify emerging trends within a news domain
- Sentiment Analysis: Analyze the tone and sentiment of articles from a source
Academic Research 🎓
- Data Collection: Gather large datasets of articles for research
- Text Analysis: Analyze the content of entire news websites
- Targeted Collection: Gather articles from sources that cover a specific niche
Other Applications 🌐
- Machine Learning: Train models with large sets of scraped articles
- Content Curation: Easily find and collect relevant articles
Getting Started 🚀
- Find the "News Source Crawler" in the Apify Store
- Configure the input:
  - `url`: (Required) The URL of the news website to crawl
  - `language`: (Optional) The expected language (default: "en")
  - `maxArticles`: (Optional) The maximum number of articles to scrape
  - `proxyConfiguration`: (Optional) Select an Apify Proxy configuration or provide custom proxies
- Run the Actor
- Access results in JSON, CSV, Excel, or other formats, directly from the dataset as the Actor runs
- Optional: Schedule the Actor, set up webhooks, or integrate with other Actors
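The same steps can also be driven from code instead of the input form. The sketch below builds the input from the four documented fields and runs the Actor through the `apify-client` Python package. Several things here are assumptions: `ACTOR_ID` is a placeholder because this page does not state the Actor's ID, `build_input` and `run_crawler` are illustrative names, and the `{"useApifyProxy": True}` shape follows the usual Apify proxy input convention rather than anything confirmed for this Actor.

```python
from typing import Any, Dict, Iterator, Optional

# Placeholder: the real Actor ID is not given on this page.
ACTOR_ID = "<username>/<actor-name>"


def build_input(
    url: str,
    language: str = "en",
    max_articles: Optional[int] = None,
    use_apify_proxy: bool = False,
) -> Dict[str, Any]:
    """Build the Actor input from the fields documented above."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must be an absolute http(s) URL")
    run_input: Dict[str, Any] = {"url": url, "language": language}
    if max_articles is not None:
        if max_articles < 1:
            raise ValueError("max_articles must be a positive integer")
        run_input["maxArticles"] = max_articles
    if use_apify_proxy:
        # Assumed shape: the common Apify proxy configuration object.
        run_input["proxyConfiguration"] = {"useApifyProxy": True}
    return run_input


def run_crawler(token: str, run_input: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Start a run, wait for it to finish, and yield its dataset items."""
    # Imported lazily so build_input works without apify-client installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the Actor run did not return run details")
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()


# Usage (needs a real Actor ID and an Apify API token):
#   for item in run_crawler(token, build_input("https://www.example.com", max_articles=25)):
#       print(item.get("articleTitle"))
```

Note that `call` waits for the run to finish before any items are read, so this sketch does not take advantage of the dataset filling up while the Actor is still running.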
Key Benefits 🏆
Data Quality
- ✅ Reliable & Accurate: Provides high-quality extracted data
- ✅ Clean Data: Extracts only the relevant information
- ✅ Structured Format: Easy to use and integrate
Platform Advantages (Apify)
- ✅ Scalable & Serverless: Handles large crawls without infrastructure management
- ✅ Cost-Effective: Pay only for what you use
- ✅ Full Apify Integration: Connects seamlessly with other Apify tools
- ✅ User-Friendly: No coding required – simple input form
- ✅ Real-time Results: Data is pushed to the dataset as it's scraped
- ✅ Automated Updates: The Actor is maintained and updated
- ✅ Isolated Runs: Each run is in a fresh, isolated container
Start crawling news sources today! ➡️
Actor Metrics
2 monthly users
No stars yet
>99% runs succeeded
Created in Feb 2025
Modified 3 days ago