In Depth News Scraper
3-day trial, then $5.00/month - no credit card required now
Extract full-length articles from top news sources to streamline collecting the latest updates on any subject. Its key feature is retrieving complete content, not just headlines. Customise your output, from concise summaries to complete articles, to suit your news gathering process.
In-Depth News Scraper
The In-Depth News Scraper is an Apify actor designed to revolutionise how you gather and process news data. It stands apart from conventional scrapers by delivering complete article content rather than just headlines, enabling comprehensive analysis across diverse news categories.
Key Advantages
- Thorough content extraction, not just headlines
- Support for major news categories and outlets
- Flexible search and filtering capabilities
- Structured, analysis-ready output
Features
- Category-Based Filtering: Focus your news gathering by targeting specific categories such as World, Business, or Technology.
- Complete Article Extraction: Access full article content directly, surpassing the limitations of basic news aggregators.
- Customisable Content Length: Control output size by specifying word count or retrieving complete articles.
- Intelligent Filtering: Exclude irrelevant content using customisable keyword filters.
- Time-Range Selection: Gather current news or research historical content with flexible time frame options.
- Structured Data Output: Receive consistently formatted data including titles, URLs, dates, and sources.
- Optional Image Support: Choose whether to include article images based on your requirements.
Input Parameters
The actor accepts the following configuration options:
| Parameter | Type | Description |
|---|---|---|
| newsCategory | String | Required: Category filter (e.g., "World", "Technology") |
| additionalKeywords | String | Optional: Refine search within selected category |
| numberOfItems | Number | Number of articles to retrieve (default: 10, max: 100) |
| filterBadKeywords | Array | Optional: Keywords to exclude from results |
| contentLength | String | Content extraction mode: "Full" or "Summary" (default: "Full") |
| timeRange | String | Time period for article selection (e.g., "Past week") |
| retrieveImage | Boolean | Include image URLs in output (default: false) |
Example configuration:
{
  "newsCategory": "Technology",
  "additionalKeywords": "artificial intelligence",
  "numberOfItems": 20,
  "filterBadKeywords": ["sponsored", "advertisement"],
  "contentLength": "Full",
  "timeRange": "Past week",
  "retrieveImage": false
}
Supported Categories
The actor provides coverage across these primary news categories:
- World
- Business
- Technology
- Entertainment
- Health
- Science
- Sports
- Politics
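Before starting a run, it can help to check a configuration locally against the documented parameters. The following is a minimal sketch, not the actor's own validation: the helper name `build_input` is hypothetical, the defaults and limits are copied from the parameter table, and treating the category list above as exhaustive is an assumption.

```python
from typing import Any

# Categories, defaults, and limits are copied from this README. Treating the
# category list as exhaustive is an assumption; the actor may accept others.
SUPPORTED_CATEGORIES = {
    "World", "Business", "Technology", "Entertainment",
    "Health", "Science", "Sports", "Politics",
}
DEFAULTS = {"numberOfItems": 10, "contentLength": "Full", "retrieveImage": False}


def build_input(**options: Any) -> dict[str, Any]:
    """Merge user options with the documented defaults and check documented limits."""
    run_input = {**DEFAULTS, **options}

    # newsCategory is the only parameter documented as required.
    if run_input.get("newsCategory") not in SUPPORTED_CATEGORIES:
        raise ValueError(f"newsCategory must be one of {sorted(SUPPORTED_CATEGORIES)}")

    count = run_input["numberOfItems"]
    if type(count) is not int or not 1 <= count <= 100:
        raise ValueError("numberOfItems must be an integer between 1 and 100")

    if run_input["contentLength"] not in ("Full", "Summary"):
        raise ValueError('contentLength must be "Full" or "Summary"')

    return run_input
```

For example, `build_input(newsCategory="Technology", numberOfItems=20)` returns a dictionary with `contentLength` and `retrieveImage` filled in from the defaults, while `numberOfItems=500` raises a `ValueError` before any run is started.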
Output Structure
Each article in the dataset contains the following fields:
{
  "title": "Article headline",
  "link": "Article URL",
  "pubDate": "2025-02-05T10:00:00.000Z",
  "source": "Publishing outlet name",
  "summary": "Brief article overview",
  "content": "Full article text (length based on contentLength parameter)",
  "imageUrl": "Main image URL (if retrieveImage is true)"
}
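For downstream analysis, each dataset item can be loaded into a typed record. This is a minimal sketch: the `Article` class and `parse_article` helper are hypothetical names, and it assumes that `imageUrl` is simply absent when `retrieveImage` is false and that `pubDate` always follows the ISO 8601 form shown above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional


@dataclass
class Article:
    title: str
    link: str
    pub_date: datetime
    source: str
    summary: str
    content: str
    image_url: Optional[str] = None


def parse_article(item: dict[str, Any]) -> Article:
    """Convert one dataset item (field names as documented above) into an Article."""
    # Rewriting the trailing "Z" keeps fromisoformat working on Python < 3.11.
    published = datetime.fromisoformat(item["pubDate"].replace("Z", "+00:00"))
    return Article(
        title=item["title"],
        link=item["link"],
        pub_date=published,
        source=item["source"],
        summary=item.get("summary", ""),
        content=item.get("content", ""),
        image_url=item.get("imageUrl"),  # assumed missing when retrieveImage is false
    )
```

Parsing `pubDate` into a timezone-aware `datetime` makes it straightforward to sort or filter articles by publication time.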
Implementation Guide
- Choose your target news category
- Add any specific keywords to refine results
- Set additional parameters as needed
- Execute the actor
- Access your structured dataset
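The steps above can also be scripted. The sketch below assumes the official `apify-client` Python package, whose `ApifyClient.actor(...).call(run_input=...)` and `dataset(...).iterate_items()` methods start a run and read its results. The actor ID is a placeholder because this page does not state it, and `run_scraper` is a hypothetical helper name.

```python
from typing import Any, Iterator

# Placeholder: this README does not state the actor's ID.
ACTOR_ID = "<username>/<actor-name>"


def run_scraper(
    client: Any, actor_id: str, run_input: dict[str, Any]
) -> Iterator[dict[str, Any]]:
    """Start one actor run, wait for it to finish, and yield its dataset items."""
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("The actor run did not return a result")
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()


# Usage (requires `pip install apify-client` and an Apify API token):
#   from apify_client import ApifyClient
#   client = ApifyClient("<YOUR_API_TOKEN>")
#   for item in run_scraper(client, ACTOR_ID, {"newsCategory": "Technology"}):
#       print(item["title"], item["link"])
```

Passing the client in as an argument keeps the helper free of credentials and makes it easy to exercise with a stub client in tests.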
Performance Considerations
Performance varies based on several factors:
- Processing Duration: Typically 5-10 seconds per article for full extraction
- Volume Handling: Efficiently processes up to 100 articles per run
- Request Management: Sequential processing with appropriate intervals
For optimal results:
- Keep numberOfItems at 50 or fewer per run for faster completion
- Use precise keywords to target relevant content
- Consider limiting content length (for example, setting contentLength to "Summary") unless full text is required
- Disable image retrieval when not essential
Note: Network conditions and source website responsiveness may affect performance.
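As a rough planning aid, the 5-10 seconds per article figure can be turned into a run-time estimate. This is back-of-the-envelope arithmetic only: the helper name `estimate_run_seconds` is hypothetical, and the estimate assumes strictly sequential processing and ignores retries and the intervals between requests.

```python
def estimate_run_seconds(
    number_of_items: int,
    per_article: tuple[float, float] = (5.0, 10.0),  # range quoted in this README
) -> tuple[float, float]:
    """Return (best case, worst case) run time in seconds for sequential extraction."""
    if number_of_items < 0:
        raise ValueError("number_of_items must not be negative")
    low, high = per_article
    return number_of_items * low, number_of_items * high


low, high = estimate_run_seconds(50)
print(f"50 articles: roughly {low / 60:.1f} to {high / 60:.1f} minutes")
# prints "50 articles: roughly 4.2 to 8.3 minutes"
```

At the 100-article maximum the same arithmetic gives roughly 8 to 17 minutes, which is why smaller batches finish noticeably sooner.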
Error Handling and Troubleshooting
The actor implements comprehensive error handling:
- Connection Issues: Automatic retry (up to 3 attempts) for failed connections
- Rate Management: Dynamic delays between requests to prevent rate limiting
- Content Fallback: Defaults to article summary if full content extraction fails
- Input Validation: Clear error messages for invalid configurations
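The retry and fallback behaviour described above can be illustrated with a short sketch. It is not the actor's actual code: `extract_with_fallback` is a hypothetical helper, the linearly growing delay is an assumed stand-in for the "dynamic delays", and only `ConnectionError` is treated as retryable.

```python
import time
from typing import Callable


def extract_with_fallback(
    fetch_full_text: Callable[[], str],
    summary: str,
    max_attempts: int = 3,  # "up to 3 attempts", as stated above
    base_delay: float = 1.0,  # assumed value; the README gives no figure
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try full extraction up to max_attempts times, then fall back to the summary."""
    for attempt in range(1, max_attempts + 1):
        try:
            text = fetch_full_text()
            if text:
                return text
            break  # an empty result is not a connection problem, so stop retrying
        except ConnectionError:
            if attempt < max_attempts:
                sleep(base_delay * attempt)  # wait longer after each failure
    return summary
```

The `sleep` parameter is injectable so the logic can be exercised without real waiting, for example by passing `sleep=lambda seconds: None`.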
Troubleshooting Common Issues
- Timeout Errors: Consider reducing batch size or increasing time between requests
- Missing Content: Check if the source website requires authentication
- Rate Limiting: The actor will automatically pause and retry; no action needed
- Error Logs: Available in the actor's run details for debugging
For detailed error information, consult the actor's run log in the Apify Console.
Technical Support
For implementation assistance or to report issues:
- Check the actor's run log for specific error messages
- Review the troubleshooting section above
- Contact support with the actor run ID for detailed investigation
The actor continuously logs its progress and any errors encountered, facilitating quick problem resolution.
Actor Metrics
2 monthly users
2 bookmarks
>99% runs succeeded
Created in Feb 2025
Modified 16 days ago