NewsBot3000 Scraper & Aggregator
Pricing
$0.01 / 1,000 results
Access real-time news from NPR, AP, CSM, and CNN. Articles are AI-summarized with importance scores, keywords, and categories. Supports filtering by date, source, and category, plus semantic search to find similar stories. Ideal for news monitoring and media research.
Developer
Leo
NewsBot3000 Data Access Actor
This Apify actor provides access to news articles, topic clusters, keywords, and daily summaries stored in the NewsBot3000 MongoDB database.
Features
- Get Stories: Fetch news articles with filtering by date, source, category, keyword, and importance
- Get Topics: Retrieve topic clusters with their associated stories
- Get Keywords: Access extracted keywords with analysis and Wikipedia data
- Get Daily Summaries: Retrieve AI-generated daily news summaries
- Search Stories: Full-text search across headlines and summaries
- Similar Stories: Find semantically similar articles using vector search
- Keyword-based Retrieval: Get stories associated with specific keywords
Input Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| action | string | get_stories | The action to perform (see Actions below) |
| startDate | string | null | Start date filter (ISO format: YYYY-MM-DD) |
| endDate | string | null | End date filter (ISO format: YYYY-MM-DD) |
| source | string | null | Filter by news source (NPR, Associated Press, CSM, CNN) |
| category | string | null | Filter by category (e.g., "World", "Politics") |
| keyword | string | null | Search/filter keyword |
| minImportance | integer | null | Minimum importance score (1-10) |
| maxImportance | integer | null | Maximum importance score (1-10) |
| limit | integer | 100 | Maximum results to return (1-1000) |
| skip | integer | 0 | Number of results to skip (pagination) |
| sortBy | string | updated | Sort field (updated, importance, headline) |
| sortOrder | string | desc | Sort order (asc, desc) |
| storyId | string | null | Story ID for single-story operations |
| topicId | string | null | Topic ID for single-topic operations |
| includeEmbeddings | boolean | false | Include vector embeddings in response |
| similarityThreshold | float | 0.8 | Min similarity for similar stories (0.0-1.0) |
| mongoUri | string | null | MongoDB connection URI (or use MONGO_URI env var) |
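The ranges in the table can be checked on the client side before a run is started. The following is a minimal sketch, not part of the actor: `build_input` is a hypothetical helper, and the actor itself may validate differently or not at all.

```python
from datetime import date

# Allowed values as listed in the parameter table.
ALLOWED_SORT_FIELDS = {"updated", "importance", "headline"}
ALLOWED_SORT_ORDERS = {"asc", "desc"}


def build_input(action="get_stories", **params):
    """Return an actor input dict, raising ValueError on out-of-range values.

    Hypothetical client-side helper; parameters set to None are omitted so
    the actor's documented defaults apply.
    """
    run_input = {"action": action}
    for name, value in params.items():
        if value is None:
            continue
        if name in ("startDate", "endDate"):
            date.fromisoformat(value)  # raises ValueError if not a valid ISO date
        elif name in ("minImportance", "maxImportance") and not 1 <= value <= 10:
            raise ValueError(f"{name} must be between 1 and 10")
        elif name == "limit" and not 1 <= value <= 1000:
            raise ValueError("limit must be between 1 and 1000")
        elif name == "skip" and value < 0:
            raise ValueError("skip must be 0 or greater")
        elif name == "sortBy" and value not in ALLOWED_SORT_FIELDS:
            raise ValueError(f"sortBy must be one of {sorted(ALLOWED_SORT_FIELDS)}")
        elif name == "sortOrder" and value not in ALLOWED_SORT_ORDERS:
            raise ValueError(f"sortOrder must be one of {sorted(ALLOWED_SORT_ORDERS)}")
        elif name == "similarityThreshold" and not 0.0 <= value <= 1.0:
            raise ValueError("similarityThreshold must be between 0.0 and 1.0")
        run_input[name] = value
    return run_input
```

For example, `build_input("get_stories", source="NPR", limit=50)` returns a dict that can be passed as the run input, while `build_input(limit=5000)` raises `ValueError`.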
Available Actions
get_stories
Fetch news articles with various filters.
{"action": "get_stories","startDate": "2024-01-01","endDate": "2024-01-31","source": "NPR","category": "Politics","limit": 50}
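Because `limit` is capped at 1000, larger result sets need to be paged with `skip`. A rough sketch of that loop follows; `fetch_page` is a hypothetical callable standing in for whatever runs the actor with a given input and returns its results as a list.

```python
def iter_stories(fetch_page, base_input, page_size=100):
    """Yield every result by advancing `skip` until a short page comes back.

    `fetch_page` is an assumed callable: it takes an input dict and returns
    a list of result objects for that page.
    """
    skip = 0
    while True:
        page_input = {**base_input, "limit": page_size, "skip": skip}
        page = fetch_page(page_input)
        yield from page
        if len(page) < page_size:
            break  # a short (or empty) page means there is nothing left
        skip += page_size
```

Stopping on a short page avoids needing a total count, at the cost of one extra request when the result count is an exact multiple of `page_size`.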
get_story_by_id
Fetch a single story by its MongoDB ObjectId.
{"action": "get_story_by_id","storyId": "65abc123def456789012345"}
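A MongoDB ObjectId is written as a 24-character hexadecimal string, so a malformed `storyId` can be rejected before the actor is run. The check below is a small sketch with a hypothetical helper name. Note that the IDs in this README are illustrative placeholders and would not all pass it.

```python
import re

# An ObjectId's string form is exactly 24 hexadecimal characters.
_OBJECT_ID_RE = re.compile(r"[0-9a-fA-F]{24}")


def is_object_id(value):
    """Return True if `value` looks like a MongoDB ObjectId hex string."""
    return isinstance(value, str) and _OBJECT_ID_RE.fullmatch(value) is not None
```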
get_topics
Fetch topic clusters with their associated stories.
{"action": "get_topics","startDate": "2024-01-01","limit": 20}
get_topic_by_id
Fetch a single topic with all its stories.
{"action": "get_topic_by_id","topicId": "65abc123def456789012345"}
get_keywords
Fetch extracted keywords with analysis data.
{"action": "get_keywords","keyword": "Ukraine","limit": 100}
get_daily_summaries
Fetch AI-generated daily news summaries.
{"action": "get_daily_summaries","startDate": "2024-01-01","endDate": "2024-01-31"}
search_stories
Full-text search across headlines and summaries.
{"action": "search_stories","keyword": "climate change","limit": 50}
get_stories_by_keyword
Get stories associated with a specific keyword (uses vector search if available).
{"action": "get_stories_by_keyword","keyword": "artificial intelligence","startDate": "2024-01-01","endDate": "2024-01-31","limit": 30}
get_similar_stories
Find semantically similar stories using vector embeddings.
{"action": "get_similar_stories","storyId": "65abc123def456789012345","similarityThreshold": 0.85,"limit": 10}
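To make the role of `similarityThreshold` concrete, here is a sketch of threshold filtering over embeddings. It assumes cosine similarity, which this README does not confirm; the actor's vector search may use a different metric or index. The function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_similar(query, candidates, threshold=0.8, limit=10):
    """Return up to `limit` (story_id, score) pairs scoring at least `threshold`.

    `candidates` maps story IDs to embedding vectors; results are ordered
    from most to least similar. The metric is an assumption, see above.
    """
    scored = [(cosine_similarity(query, vec), story_id)
              for story_id, vec in candidates.items()]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(reverse=True)
    return [(story_id, score) for score, story_id in kept[:limit]]
```

Raising the threshold trades recall for precision: fewer stories come back, but they are closer to the reference story.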
Output Structure
Story Object
    {
      "id": "65abc123def456789012345",
      "headline": "Article Headline",
      "link": "https://example.com/article",
      "source": "NPR",
      "updated": "2024-01-15T10:30:00",
      "summary": {
        "title": "Summary Title",
        "summary": "AI-generated summary text...",
        "time": "2024-01-15T10:00:00",
        "importance": 8,
        "keywords": ["keyword1", "keyword2"],
        "category": "World/Europe",
        "categories": ["World", "Europe"],
        "language": "en"
      },
      "topic_id": "65def456abc789012345678"
    }
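A consumer will usually want typed fields rather than raw dictionaries. The sketch below parses the Story object shown above into a dataclass. The `Story` class and `parse_story` are hypothetical names, and falling back from `categories` to splitting `category` on `/` is an assumption based only on the sample values.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Story:
    id: str
    headline: str
    source: str
    updated: datetime
    importance: Optional[int]
    categories: List[str]


def parse_story(raw):
    """Convert a raw story dict into a Story, tolerating a missing summary."""
    summary = raw.get("summary") or {}
    category = summary.get("category") or ""
    # Prefer the explicit list; otherwise split "World/Europe" style paths.
    categories = summary.get("categories") or [p for p in category.split("/") if p]
    return Story(
        id=raw["id"],
        headline=raw["headline"],
        source=raw["source"],
        updated=datetime.fromisoformat(raw["updated"]),
        importance=summary.get("importance"),
        categories=categories,
    )
```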
Topic Object
    {
      "id": "65def456abc789012345678",
      "updated": "2024-01-15T12:00:00",
      "source": "multiple",
      "short_name": "Ukraine Peace Talks",
      "summary": {
        "title": "Topic Title",
        "summary": "Topic summary...",
        "importance": 9,
        "keywords": ["Ukraine", "peace", "negotiations"],
        "category": "World"
      },
      "story_ids": ["65abc...", "65bcd..."],
      "stories": [/* full story objects */]
    }
Keyword Object
    {
      "id": "65ghi789jkl012345678901",
      "keyword": "Ukraine",
      "analyzed": "2024-01-15T08:00:00",
      "analysis": {
        "proper_noun": true,
        "obscure": false,
        "is_person": false,
        "is_place": true,
        "is_thing": false,
        "is_abstract": false,
        "is_organization": false
      },
      "wikipedia": {
        "summary": "Ukraine is a country in Eastern Europe...",
        "url": "https://en.wikipedia.org/wiki/Ukraine",
        "image_url": "https://upload.wikimedia.org/..."
      }
    }
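The boolean flags under `analysis` can be collapsed into a list of entity kinds for display or filtering. This is a small sketch with hypothetical names; it assumes only the flag names shown in the sample above.

```python
# Maps the analysis flags from the Keyword object to short labels.
ENTITY_FLAGS = {
    "is_person": "person",
    "is_place": "place",
    "is_thing": "thing",
    "is_abstract": "abstract",
    "is_organization": "organization",
}


def keyword_kinds(keyword_obj):
    """Return the labels of every entity flag set to true on a keyword."""
    analysis = keyword_obj.get("analysis") or {}
    return [label for flag, label in ENTITY_FLAGS.items() if analysis.get(flag)]
```

For the sample keyword above, only `is_place` is true, so the result would be a single-item list containing `"place"`.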
Daily Summary Object
    {
      "id": "65jkl012mno345678901234",
      "date": "2024-01-15T00:00:00",
      "title": "Daily News Summary",
      "overall_summary": "<p>Today's top stories include...</p>",
      "top_keywords": ["Ukraine", "Economy", "Climate"],
      "key_story_titles": ["Story 1", "Story 2"],
      "sentiment": "Mixed"
    }
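In the sample above, `overall_summary` contains HTML markup. Where plain text is needed (for example in a notification or a log line), the tags can be stripped with the standard library. The following is a sketch with hypothetical names, and it assumes the field holds simple markup such as `<p>` elements.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def summary_text(daily_summary):
    """Return `overall_summary` as plain text with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(daily_summary.get("overall_summary") or "")
    parser.close()
    return " ".join(" ".join(parser.parts).split())
```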
Environment Variables
| Variable | Description |
|---|---|
| MONGO_URI | MongoDB connection string (alternative to input parameter) |
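The connection string can therefore come from either place. One plausible way to resolve it is sketched below; the precedence (input parameter first, then the environment) is an assumption, since this README does not say which one wins, and `resolve_mongo_uri` is a hypothetical name.

```python
import os


def resolve_mongo_uri(actor_input, environ=os.environ):
    """Pick the MongoDB URI from the input, falling back to MONGO_URI.

    Assumed precedence: the `mongoUri` input overrides the environment.
    """
    uri = actor_input.get("mongoUri") or environ.get("MONGO_URI")
    if not uri:
        raise ValueError(
            "Set the mongoUri input or the MONGO_URI environment variable"
        )
    return uri
```

Keeping the URI in the environment rather than in the run input avoids storing credentials alongside the run's input record.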
Local Development
    # Install dependencies
    pip install -r requirements.txt

    # Run locally with Apify CLI
    apify run

    # Or run directly
    python main.py
Deployment
    # Login to Apify
    apify login

    # Push to Apify platform
    apify push
Data Sources
This actor accesses data from the NewsBot3000 platform, which aggregates news from:
- NPR (text.npr.org)
- Associated Press (apnews.com)
- Christian Science Monitor (csmonitor.com)
- CNN Lite (lite.cnn.com)
License
MIT License