NewsBot3000 Scraper & Aggregator
Pricing: $0.01 / 1,000 results
Access real-time news from NPR, AP, CSM, and CNN. Articles are AI-summarized with importance scores, keywords, and categories. Supports filtering by date, source, and category, plus semantic search to find similar stories. Ideal for news monitoring and media research.
Developer: Leo
NewsBot3000 Data Access Actor
This Apify actor provides read access to news articles, topic clusters, extracted keywords, and AI-generated daily summaries stored in the NewsBot3000 MongoDB database.
Features
- Get Stories: Fetch news articles with filtering by date, source, category, keyword, and importance
- Get Topics: Retrieve topic clusters with their associated stories
- Get Keywords: Access extracted keywords with analysis and Wikipedia data
- Get Daily Summaries: Retrieve AI-generated daily news summaries
- Search Stories: Full-text search across headlines and summaries
- Similar Stories: Find semantically similar articles using vector search
- Keyword-based Retrieval: Get stories associated with specific keywords
Input Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `action` | string | `get_stories` | The action to perform (see Available Actions below) |
| `startDate` | string | `null` | Start date filter (ISO format: YYYY-MM-DD) |
| `endDate` | string | `null` | End date filter (ISO format: YYYY-MM-DD) |
| `source` | string | `null` | Filter by news source (NPR, Associated Press, CSM, CNN) |
| `category` | string | `null` | Filter by category (e.g., "World", "Politics") |
| `keyword` | string | `null` | Search/filter keyword |
| `minImportance` | integer | `null` | Minimum importance score (1-10) |
| `maxImportance` | integer | `null` | Maximum importance score (1-10) |
| `limit` | integer | `100` | Maximum results to return (1-1000) |
| `skip` | integer | `0` | Number of results to skip (pagination) |
| `sortBy` | string | `updated` | Sort field (`updated`, `importance`, `headline`) |
| `sortOrder` | string | `desc` | Sort order (`asc`, `desc`) |
| `storyId` | string | `null` | Story ID for single-story operations |
| `topicId` | string | `null` | Topic ID for single-topic operations |
| `includeEmbeddings` | boolean | `false` | Include vector embeddings in the response |
| `similarityThreshold` | float | `0.8` | Minimum similarity for similar stories (0.0-1.0) |
| `mongoUri` | string | `null` | MongoDB connection URI (or use the `MONGO_URI` env var) |
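Before starting a run, it can help to check an input object against the ranges in the table above. The following is a minimal client-side sketch, not part of the actor itself. `validate_input` is a hypothetical helper, and the rule that the ID- and keyword-based actions require `storyId`, `topicId`, or `keyword` is an assumption inferred from the examples below.

```python
from datetime import date

# Allowed values and numeric ranges, taken from the Input Parameters table.
ACTIONS = {
    "get_stories", "get_story_by_id", "get_topics", "get_topic_by_id",
    "get_keywords", "get_daily_summaries", "search_stories",
    "get_stories_by_keyword", "get_similar_stories",
}
INT_RANGES = {"minImportance": (1, 10), "maxImportance": (1, 10), "limit": (1, 1000)}
CHOICES = {"sortBy": {"updated", "importance", "headline"}, "sortOrder": {"asc", "desc"}}
# Assumption: these actions need this field to do anything useful.
REQUIRED = {
    "get_story_by_id": "storyId",
    "get_similar_stories": "storyId",
    "get_topic_by_id": "topicId",
    "search_stories": "keyword",
    "get_stories_by_keyword": "keyword",
}


def validate_input(run_input: dict) -> list:
    """Return a list of problems found in run_input; empty means it looks valid."""
    errors = []
    action = run_input.get("action", "get_stories")
    if action not in ACTIONS:
        errors.append(f"unknown action: {action!r}")
    needed = REQUIRED.get(action)
    if needed and not run_input.get(needed):
        errors.append(f"{action} needs {needed}")

    # Dates are expected as YYYY-MM-DD strings; reject anything unparseable.
    dates = {}
    for key in ("startDate", "endDate"):
        value = run_input.get(key)
        if value is None:
            continue
        try:
            dates[key] = date.fromisoformat(value)
        except (TypeError, ValueError):
            errors.append(f"{key} is not an ISO date: {value!r}")
    if len(dates) == 2 and dates["startDate"] > dates["endDate"]:
        errors.append("startDate is after endDate")

    # Integer parameters must fall inside their documented ranges.
    for key, (low, high) in INT_RANGES.items():
        value = run_input.get(key)
        if value is not None and not (isinstance(value, int) and low <= value <= high):
            errors.append(f"{key} must be an integer from {low} to {high}")
    low_imp, high_imp = run_input.get("minImportance"), run_input.get("maxImportance")
    if isinstance(low_imp, int) and isinstance(high_imp, int) and low_imp > high_imp:
        errors.append("minImportance is greater than maxImportance")

    skip = run_input.get("skip", 0)
    if not (isinstance(skip, int) and skip >= 0):
        errors.append("skip must be a non-negative integer")
    threshold = run_input.get("similarityThreshold", 0.8)
    if not (isinstance(threshold, (int, float)) and 0.0 <= threshold <= 1.0):
        errors.append("similarityThreshold must be between 0.0 and 1.0")

    # Enumerated string parameters.
    for key, allowed in CHOICES.items():
        value = run_input.get(key)
        if value is not None and value not in allowed:
            errors.append(f"{key} must be one of {sorted(allowed)}")
    return errors
```

For example, `validate_input({"action": "get_story_by_id"})` reports the missing `storyId` without spending a run.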
Available Actions
get_stories
Fetch news articles with various filters.
```json
{
  "action": "get_stories",
  "startDate": "2024-01-01",
  "endDate": "2024-01-31",
  "source": "NPR",
  "category": "Politics",
  "limit": 50
}
```
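A single run returns at most 1,000 results (the `limit` ceiling). To page through a larger result set, one approach is to start several runs that share the same filters and advance `skip`. The sketch below assumes the usual offset semantics (`skip` drops that many matches, then `limit` caps what remains); `page_inputs` is a hypothetical helper.

```python
from typing import Iterator


def page_inputs(base: dict, total: int, page_size: int = 100) -> Iterator[dict]:
    """Yield one run input per page, advancing skip by page_size.

    page_size is clamped to 1..1000, the documented range for limit.
    """
    page_size = max(1, min(page_size, 1000))
    for skip in range(0, total, page_size):
        # Copy the shared filters, then set this page's window.
        yield {**base, "skip": skip, "limit": min(page_size, total - skip)}
```

Because the default sort is `updated` descending, stories ingested between runs can shift the offsets and produce duplicates or gaps; pinning `endDate` or deduplicating on `id` reduces that risk.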
get_story_by_id
Fetch a single story by its MongoDB ObjectId (a 24-character hexadecimal string).
```json
{
  "action": "get_story_by_id",
  "storyId": "65abc123def456789012345"
}
```
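The actions that take `storyId` or `topicId` expect a MongoDB ObjectId, which is 12 bytes written as 24 hexadecimal characters. A quick local check, sketched below with a hypothetical `is_object_id` helper, can catch a malformed ID before a run is started. The shortened IDs in the examples on this page would fail it, so substitute real IDs taken from earlier output. If `pymongo` is installed, `bson.ObjectId.is_valid` performs a similar check.

```python
import re

# An ObjectId is 12 bytes, conventionally written as 24 hex characters.
OBJECT_ID_RE = re.compile(r"[0-9a-fA-F]{24}")


def is_object_id(value: str) -> bool:
    """Return True if value looks like a MongoDB ObjectId hex string."""
    return bool(OBJECT_ID_RE.fullmatch(value))
```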
get_topics
Fetch topic clusters with their associated stories.
```json
{
  "action": "get_topics",
  "startDate": "2024-01-01",
  "limit": 20
}
```
get_topic_by_id
Fetch a single topic with all its stories.
```json
{
  "action": "get_topic_by_id",
  "topicId": "65abc123def456789012345"
}
```
get_keywords
Fetch extracted keywords with analysis data.
```json
{
  "action": "get_keywords",
  "keyword": "Ukraine",
  "limit": 100
}
```
get_daily_summaries
Fetch AI-generated daily news summaries.
```json
{
  "action": "get_daily_summaries",
  "startDate": "2024-01-01",
  "endDate": "2024-01-31"
}
```
search_stories
Full-text search across headlines and summaries.
```json
{
  "action": "search_stories",
  "keyword": "climate change",
  "limit": 50
}
```
get_stories_by_keyword
Get stories associated with a specific keyword (uses vector search if available).
```json
{
  "action": "get_stories_by_keyword",
  "keyword": "artificial intelligence",
  "startDate": "2024-01-01",
  "endDate": "2024-01-31",
  "limit": 30
}
```
get_similar_stories
Find semantically similar stories using vector embeddings.
```json
{
  "action": "get_similar_stories",
  "storyId": "65abc123def456789012345",
  "similarityThreshold": 0.85,
  "limit": 10
}
```
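This README does not say which similarity metric backs `similarityThreshold`. Cosine similarity is a common choice for text embeddings, so the sketch below assumes it in order to illustrate how a threshold and a limit interact: candidates below the threshold are dropped, and the rest are ranked and truncated. The same logic could be applied client-side to vectors returned with `includeEmbeddings: true`, but the embedding field name in the output is not documented here, so the hypothetical `similar_stories` function takes a plain mapping of story ID to vector (which should exclude the target story itself).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_stories(target, candidates, threshold=0.8, limit=10):
    """Return (story_id, score) pairs with score >= threshold, highest first.

    candidates maps story IDs to embedding vectors (assumed cosine-comparable).
    """
    scored = ((sid, cosine_similarity(target, vec)) for sid, vec in candidates.items())
    kept = [pair for pair in scored if pair[1] >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:limit]
```

Raising the threshold from 0.8 to 0.85, as in the example above, can only shrink the candidate set; `limit` then caps whatever survives.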
Output Structure
Story Object
```json
{
  "id": "65abc123def456789012345",
  "headline": "Article Headline",
  "link": "https://example.com/article",
  "source": "NPR",
  "updated": "2024-01-15T10:30:00",
  "summary": {
    "title": "Summary Title",
    "summary": "AI-generated summary text...",
    "time": "2024-01-15T10:00:00",
    "importance": 8,
    "keywords": ["keyword1", "keyword2"],
    "category": "World/Europe",
    "categories": ["World", "Europe"],
    "language": "en"
  },
  "topic_id": "65def456abc789012345678"
}
```
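When consuming the dataset from Python, it can be convenient to flatten each Story Object into a typed record. The sketch below is hypothetical (`Story` and `parse_story` are not part of the actor) and treats the `summary` block and `topic_id` as possibly missing, which is an assumption. The sample timestamps carry no UTC offset, so `datetime.fromisoformat` yields naive datetimes; on Python versions before 3.11 it would also reject a trailing `Z` if one ever appears.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Story:
    """Flattened view of one Story Object."""
    id: str
    headline: str
    link: str
    source: str
    updated: datetime
    importance: Optional[int] = None
    category: Optional[str] = None
    keywords: List[str] = field(default_factory=list)
    topic_id: Optional[str] = None


def parse_story(item: dict) -> Story:
    """Build a Story from one output item; summary fields are treated as optional."""
    summary = item.get("summary") or {}
    return Story(
        id=item["id"],
        headline=item["headline"],
        link=item["link"],
        source=item["source"],
        # No UTC offset in the sample, so this is a naive datetime.
        updated=datetime.fromisoformat(item["updated"]),
        importance=summary.get("importance"),
        category=summary.get("category"),
        keywords=list(summary.get("keywords", [])),
        topic_id=item.get("topic_id"),
    )
```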
Topic Object
```jsonc
{
  "id": "65def456abc789012345678",
  "updated": "2024-01-15T12:00:00",
  "source": "multiple",
  "short_name": "Ukraine Peace Talks",
  "summary": {
    "title": "Topic Title",
    "summary": "Topic summary...",
    "importance": 9,
    "keywords": ["Ukraine", "peace", "negotiations"],
    "category": "World"
  },
  "story_ids": ["65abc...", "65bcd..."],
  "stories": [/* full story objects */]
}
```
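Each Story Object carries a `topic_id`, and each Topic Object lists its `story_ids`, so the output of `get_stories` can be regrouped by topic on the client without a separate `get_topics` call. The helper below is hypothetical; it assumes some stories may have no `topic_id`, since the README does not say whether every story is clustered.

```python
from collections import defaultdict


def group_by_topic(stories: list) -> dict:
    """Map each topic_id to the IDs of its stories, in input order.

    Stories without a topic_id are collected under the key None.
    """
    groups = defaultdict(list)
    for story in stories:
        groups[story.get("topic_id")].append(story["id"])
    return dict(groups)
```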
Keyword Object
```json
{
  "id": "65ghi789jkl012345678901",
  "keyword": "Ukraine",
  "analyzed": "2024-01-15T08:00:00",
  "analysis": {
    "proper_noun": true,
    "obscure": false,
    "is_person": false,
    "is_place": true,
    "is_thing": false,
    "is_abstract": false,
    "is_organization": false
  },
  "wikipedia": {
    "summary": "Ukraine is a country in Eastern Europe...",
    "url": "https://en.wikipedia.org/wiki/Ukraine",
    "image_url": "https://upload.wikimedia.org/..."
  }
}
```
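The `is_*` flags in `analysis` can be turned into readable labels, for instance to separate places from people when building a keyword index. The README does not say whether the flags are mutually exclusive, so the hypothetical `entity_types` helper below returns a list rather than a single label.

```python
# Entity-type flags from the Keyword Object's "analysis" field.
ENTITY_FLAGS = ("is_person", "is_place", "is_organization", "is_thing", "is_abstract")


def entity_types(keyword_obj: dict) -> list:
    """Return labels such as "place" for every entity flag that is true."""
    analysis = keyword_obj.get("analysis") or {}
    # Strip the "is_" prefix to turn a flag name into a label.
    return [flag[len("is_"):] for flag in ENTITY_FLAGS if analysis.get(flag)]
```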
Daily Summary Object
```json
{
  "id": "65jkl012mno345678901234",
  "date": "2024-01-15T00:00:00",
  "title": "Daily News Summary",
  "overall_summary": "<p>Today's top stories include...</p>",
  "top_keywords": ["Ukraine", "Economy", "Climate"],
  "key_story_titles": ["Story 1", "Story 2"],
  "sentiment": "Mixed"
}
```
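As the sample shows, `overall_summary` contains HTML markup. For plain-text uses such as email digests or search indexing, the tags can be stripped with the standard library's `html.parser`, which also decodes entities like `&amp;`. The sketch below is a hypothetical helper; it inserts a space between text chunks so adjacent paragraphs do not run together, at the cost of occasionally splitting a word that spans an inline tag.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text content and discard tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def summary_text(daily_summary: dict) -> str:
    """Return overall_summary as plain text with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(daily_summary.get("overall_summary", ""))
    parser.close()  # flush any buffered trailing text
    return " ".join(" ".join(parser.parts).split())
```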
Environment Variables
| Variable | Description |
|---|---|
| `MONGO_URI` | MongoDB connection string (alternative to the `mongoUri` input parameter) |
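When running the actor locally, the connection string can come from either place. The README does not state which wins when both are set; the hypothetical sketch below assumes the `mongoUri` input takes precedence over the `MONGO_URI` environment variable, and fails early with a clear message when neither is present.

```python
import os
from typing import Mapping, Optional


def resolve_mongo_uri(run_input: dict, env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the MongoDB URI from the input, falling back to MONGO_URI.

    Assumption: the input parameter wins when both are set.
    """
    env = os.environ if env is None else env
    uri = run_input.get("mongoUri") or env.get("MONGO_URI")
    if not uri:
        raise ValueError("set the mongoUri input or the MONGO_URI environment variable")
    return uri
```

Passing `env` explicitly keeps the function easy to test; in normal use it reads `os.environ`.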
Local Development
```bash
# Install dependencies
pip install -r requirements.txt

# Run locally with Apify CLI
apify run

# Or run directly
python main.py
```
Deployment
```bash
# Log in to Apify
apify login

# Push to Apify platform
apify push
```
Data Sources
This actor accesses data from the NewsBot3000 platform, which aggregates news from:
- NPR (text.npr.org)
- Associated Press (apnews.com)
- Christian Science Monitor (csmonitor.com)
- CNN Lite (lite.cnn.com)
License
MIT License