NewsBot3000 Scraper & Aggregator

Pricing

$0.01 / 1,000 results

Access real-time news from NPR, AP, CSM, and CNN. Articles are AI-summarized with importance scores, keywords, and categories. Supports filtering by date, source, and category, plus semantic search to find similar stories. Ideal for news monitoring and media research.

Developer

Leo

Maintained by Community

NewsBot3000 Data Access Actor

This Apify actor reads news articles, topic clusters, keywords, and daily summaries from the NewsBot3000 MongoDB database and returns them as structured results.

Features

  • Get Stories: Fetch news articles with filtering by date, source, category, keyword, and importance
  • Get Topics: Retrieve topic clusters with their associated stories
  • Get Keywords: Access extracted keywords with analysis and Wikipedia data
  • Get Daily Summaries: Retrieve AI-generated daily news summaries
  • Search Stories: Full-text search across headlines and summaries
  • Similar Stories: Find semantically similar articles using vector search
  • Keyword-based Retrieval: Get stories associated with specific keywords

Input Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| action | string | get_stories | The action to perform (see Actions below) |
| startDate | string | null | Start date filter (ISO format: YYYY-MM-DD) |
| endDate | string | null | End date filter (ISO format: YYYY-MM-DD) |
| source | string | null | Filter by news source (NPR, Associated Press, CSM, CNN) |
| category | string | null | Filter by category (e.g., "World", "Politics") |
| keyword | string | null | Search/filter keyword |
| minImportance | integer | null | Minimum importance score (1-10) |
| maxImportance | integer | null | Maximum importance score (1-10) |
| limit | integer | 100 | Maximum results to return (1-1000) |
| skip | integer | 0 | Number of results to skip (pagination) |
| sortBy | string | updated | Sort field (updated, importance, headline) |
| sortOrder | string | desc | Sort order (asc, desc) |
| storyId | string | null | Story ID for single-story operations |
| topicId | string | null | Topic ID for single-topic operations |
| includeEmbeddings | boolean | false | Include vector embeddings in response |
| similarityThreshold | float | 0.8 | Min similarity for similar stories (0.0-1.0) |
| mongoUri | string | null | MongoDB connection URI (or use MONGO_URI env var) |
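The ranges and defaults in the parameter table can be checked before a run is started. The following is a minimal sketch of such a check, not the actor's own validation code: `validate_input` is a hypothetical helper, and the rule that `skip` must not be negative is an assumption the table does not state.

```python
from datetime import date

# Allowed values copied from the parameter table; the helper itself is hypothetical.
SORT_FIELDS = {"updated", "importance", "headline"}
SORT_ORDERS = {"asc", "desc"}

DEFAULTS = {
    "action": "get_stories",
    "limit": 100,
    "skip": 0,
    "sortBy": "updated",
    "sortOrder": "desc",
    "includeEmbeddings": False,
    "similarityThreshold": 0.8,
}


def validate_input(actor_input: dict) -> dict:
    """Return a copy of actor_input with defaults applied, or raise ValueError."""
    merged = {**DEFAULTS, **actor_input}

    if not 1 <= merged["limit"] <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    if merged["skip"] < 0:  # assumption: negative offsets are rejected
        raise ValueError("skip must not be negative")
    if merged["sortBy"] not in SORT_FIELDS:
        raise ValueError("sortBy must be one of updated, importance, headline")
    if merged["sortOrder"] not in SORT_ORDERS:
        raise ValueError("sortOrder must be asc or desc")
    if not 0.0 <= merged["similarityThreshold"] <= 1.0:
        raise ValueError("similarityThreshold must be between 0.0 and 1.0")

    for key in ("minImportance", "maxImportance"):
        value = merged.get(key)
        if value is not None and not 1 <= value <= 10:
            raise ValueError(f"{key} must be between 1 and 10")

    for key in ("startDate", "endDate"):
        value = merged.get(key)
        if value is not None:
            date.fromisoformat(value)  # raises ValueError on malformed dates

    return merged
```

Calling `validate_input({"limit": 50})` returns the input with the documented defaults filled in, so the result can be passed on as the run input unchanged.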

Available Actions

get_stories

Fetch news articles with various filters.

{
  "action": "get_stories",
  "startDate": "2024-01-01",
  "endDate": "2024-01-31",
  "source": "NPR",
  "category": "Politics",
  "limit": 50
}
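Because `limit` is capped at 1000, larger pulls have to step through results with `skip`. The sketch below shows one way to do that under several assumptions: `fetch_all_stories` is a hypothetical helper, `client` is expected to behave like `ApifyClient` from the `apify-client` package (`client.actor(...).call(run_input=...)` and `client.dataset(...).iterate_items()`), and the actor is assumed to write one dataset item per story, which this README does not state.

```python
def fetch_all_stories(client, actor_id, base_input, page_size=100, max_pages=10):
    """Collect stories page by page using the actor's limit/skip parameters.

    client   -- object with the ApifyClient interface (assumption)
    actor_id -- the actor's ID on the Apify platform (placeholder, not given here)
    """
    stories = []
    for page in range(max_pages):
        run_input = {
            **base_input,
            "action": "get_stories",
            "limit": page_size,
            "skip": page * page_size,
        }
        # Each page is a separate actor run.
        run = client.actor(actor_id).call(run_input=run_input)
        items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
        stories.extend(items)
        if len(items) < page_size:  # a short page means nothing is left
            break
    return stories
```

Since every page starts a new run, a larger `page_size` (up to 1000) keeps the number of runs down.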

get_story_by_id

Fetch a single story by its MongoDB ObjectId.

{
  "action": "get_story_by_id",
  "storyId": "65abc123def456789012345"
}
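A MongoDB ObjectId is 12 bytes, normally written as 24 hexadecimal characters, so a malformed `storyId` can be caught before starting a run. The check below is a small sketch; `looks_like_object_id` is a hypothetical helper name. Note that the sample IDs in this README are illustrative placeholders and are shorter than a real ObjectId.

```python
import re

# 24 hexadecimal characters: the usual string form of a MongoDB ObjectId.
_OBJECT_ID = re.compile(r"[0-9a-fA-F]{24}")


def looks_like_object_id(value: str) -> bool:
    """Return True if value has the shape of an ObjectId hex string."""
    return _OBJECT_ID.fullmatch(value) is not None
```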

get_topics

Fetch topic clusters with their associated stories.

{
  "action": "get_topics",
  "startDate": "2024-01-01",
  "limit": 20
}

get_topic_by_id

Fetch a single topic with all its stories.

{
  "action": "get_topic_by_id",
  "topicId": "65abc123def456789012345"
}

get_keywords

Fetch extracted keywords with analysis data.

{
  "action": "get_keywords",
  "keyword": "Ukraine",
  "limit": 100
}

get_daily_summaries

Fetch AI-generated daily news summaries.

{
  "action": "get_daily_summaries",
  "startDate": "2024-01-01",
  "endDate": "2024-01-31"
}

search_stories

Full-text search across headlines and summaries.

{
  "action": "search_stories",
  "keyword": "climate change",
  "limit": 50
}

get_stories_by_keyword

Get stories associated with a specific keyword (uses vector search if available).

{
  "action": "get_stories_by_keyword",
  "keyword": "artificial intelligence",
  "startDate": "2024-01-01",
  "endDate": "2024-01-31",
  "limit": 30
}

get_similar_stories

Find semantically similar stories using vector embeddings.

{
  "action": "get_similar_stories",
  "storyId": "65abc123def456789012345",
  "similarityThreshold": 0.85,
  "limit": 10
}
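To make `similarityThreshold` concrete, the sketch below ranks candidate embeddings against a target and keeps those at or above the threshold. It uses cosine similarity, a common choice for embedding search; this README does not say which metric the actor uses, so treat that as an assumption. The function names and the toy two-dimensional vectors are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_stories(target, candidates, threshold=0.8, limit=10):
    """Return IDs of candidates scoring >= threshold, most similar first.

    candidates maps a story ID to its embedding vector.
    """
    scored = [(cosine_similarity(target, emb), story_id)
              for story_id, emb in candidates.items()]
    kept = sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)
    return [story_id for _, story_id in kept[:limit]]
```

Raising the threshold toward 1.0 returns fewer, closer matches; lowering it widens the result set.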

Output Structure

Story Object

{
  "id": "65abc123def456789012345",
  "headline": "Article Headline",
  "link": "https://example.com/article",
  "source": "NPR",
  "updated": "2024-01-15T10:30:00",
  "summary": {
    "title": "Summary Title",
    "summary": "AI-generated summary text...",
    "time": "2024-01-15T10:00:00",
    "importance": 8,
    "keywords": ["keyword1", "keyword2"],
    "category": "World/Europe",
    "categories": ["World", "Europe"],
    "language": "en"
  },
  "topic_id": "65def456abc789012345678"
}
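A consumer usually needs only a few fields from each story. The following sketch flattens a story object of the shape shown above into a small record; `flatten_story` is a hypothetical helper, and it assumes `updated` is always an ISO 8601 string as in the example.

```python
from datetime import datetime


def flatten_story(story: dict) -> dict:
    """Pick the commonly used fields out of a story object."""
    summary = story.get("summary") or {}
    categories = summary.get("categories") or []
    return {
        "id": story["id"],
        "headline": story["headline"],
        "source": story.get("source"),
        "updated": datetime.fromisoformat(story["updated"]),
        "importance": summary.get("importance"),
        "top_category": categories[0] if categories else None,
    }
```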

Topic Object

{
  "id": "65def456abc789012345678",
  "updated": "2024-01-15T12:00:00",
  "source": "multiple",
  "short_name": "Ukraine Peace Talks",
  "summary": {
    "title": "Topic Title",
    "summary": "Topic summary...",
    "importance": 9,
    "keywords": ["Ukraine", "peace", "negotiations"],
    "category": "World"
  },
  "story_ids": ["65abc...", "65bcd..."],
  "stories": [/* full story objects */]
}

Keyword Object

{
  "id": "65ghi789jkl012345678901",
  "keyword": "Ukraine",
  "analyzed": "2024-01-15T08:00:00",
  "analysis": {
    "proper_noun": true,
    "obscure": false,
    "is_person": false,
    "is_place": true,
    "is_thing": false,
    "is_abstract": false,
    "is_organization": false
  },
  "wikipedia": {
    "summary": "Ukraine is a country in Eastern Europe...",
    "url": "https://en.wikipedia.org/wiki/Ukraine",
    "image_url": "https://upload.wikimedia.org/..."
  }
}
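The `is_*` flags in `analysis` can be collapsed into a list of entity types, which is often easier to filter on. This is a small sketch over the object shape shown above; `entity_types` is a hypothetical helper name.

```python
def entity_types(keyword_obj: dict) -> list:
    """Return the sorted entity types whose is_* flag is true."""
    analysis = keyword_obj.get("analysis") or {}
    return sorted(
        name[len("is_"):]            # "is_place" -> "place"
        for name, flag in analysis.items()
        if name.startswith("is_") and flag
    )
```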

Daily Summary Object

{
  "id": "65jkl012mno345678901234",
  "date": "2024-01-15T00:00:00",
  "title": "Daily News Summary",
  "overall_summary": "<p>Today's top stories include...</p>",
  "top_keywords": ["Ukraine", "Economy", "Climate"],
  "key_story_titles": ["Story 1", "Story 2"],
  "sentiment": "Mixed"
}
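In the example above, `overall_summary` carries HTML markup. When plain text is needed (for a notification or a log line, say), the tags can be stripped with the standard library. This is a minimal sketch; `summary_to_text` is a hypothetical helper, and it assumes the field holds a simple HTML fragment like the one shown.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def summary_to_text(html_fragment: str) -> str:
    """Return the fragment's text with tags removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html_fragment)
    parser.close()
    return " ".join(" ".join(parser.parts).split())
```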

Environment Variables

| Variable | Description |
|---|---|
| MONGO_URI | MongoDB connection string (alternative to input parameter) |
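The connection string can come from either the `mongoUri` input or the `MONGO_URI` environment variable. The sketch below resolves it with the input taking precedence; that ordering is an assumption, since this README only says the two are alternatives, and `resolve_mongo_uri` is a hypothetical helper rather than the actor's own code.

```python
import os


def resolve_mongo_uri(actor_input: dict, environ=os.environ) -> str:
    """Return the MongoDB URI from the input, falling back to MONGO_URI."""
    uri = actor_input.get("mongoUri") or environ.get("MONGO_URI")
    if not uri:
        raise ValueError(
            "Set mongoUri in the input or the MONGO_URI environment variable"
        )
    return uri
```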

Local Development

# Install dependencies
pip install -r requirements.txt
# Run locally with Apify CLI
apify run
# Or run directly
python main.py

Deployment

# Login to Apify
apify login
# Push to Apify platform
apify push

Data Sources

This actor accesses data from the NewsBot3000 platform, which aggregates news from:

  • NPR (text.npr.org)
  • Associated Press (apnews.com)
  • Christian Science Monitor (csmonitor.com)
  • CNN Lite (lite.cnn.com)

License

MIT License