NewsBot3000 Scraper & Aggregator

Pricing

$0.01 / 1,000 results

Access real-time news from NPR, AP, CSM, and CNN. Articles are AI-summarized with importance scores, keywords, and categories. Supports filtering by date, source, and category, plus semantic search to find similar stories. Ideal for news monitoring and media research.

Developer: Leo (maintained by Community)

NewsBot3000 Data Access Actor

This Apify actor provides access to news articles, topic clusters, keywords, and daily summaries stored in the NewsBot3000 MongoDB database.

Features

  • Get Stories: Fetch news articles with filtering by date, source, category, keyword, and importance
  • Get Topics: Retrieve topic clusters with their associated stories
  • Get Keywords: Access extracted keywords with analysis and Wikipedia data
  • Get Daily Summaries: Retrieve AI-generated daily news summaries
  • Search Stories: Full-text search across headlines and summaries
  • Similar Stories: Find semantically similar articles using vector search
  • Keyword-based Retrieval: Get stories associated with specific keywords

Input Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| action | string | get_stories | The action to perform (see Actions below) |
| startDate | string | null | Start date filter (ISO format: YYYY-MM-DD) |
| endDate | string | null | End date filter (ISO format: YYYY-MM-DD) |
| source | string | null | Filter by news source (NPR, Associated Press, CSM, CNN) |
| category | string | null | Filter by category (e.g., "World", "Politics") |
| keyword | string | null | Search/filter keyword |
| minImportance | integer | null | Minimum importance score (1-10) |
| maxImportance | integer | null | Maximum importance score (1-10) |
| limit | integer | 100 | Maximum results to return (1-1000) |
| skip | integer | 0 | Number of results to skip (pagination) |
| sortBy | string | updated | Sort field (updated, importance, headline) |
| sortOrder | string | desc | Sort order (asc, desc) |
| storyId | string | null | Story ID for single-story operations |
| topicId | string | null | Topic ID for single-topic operations |
| includeEmbeddings | boolean | false | Include vector embeddings in response |
| similarityThreshold | float | 0.8 | Min similarity for similar stories (0.0-1.0) |
| mongoUri | string | null | MongoDB connection URI (or use MONGO_URI env var) |
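The documented ranges can be checked on the client side before a run is started. The following is a minimal sketch, not the actor's own validation code: `build_run_input` is a hypothetical helper, and the checks simply mirror the ranges and allowed values listed in the table above.

```python
from datetime import date
from typing import Any

ALLOWED_SORT_FIELDS = {"updated", "importance", "headline"}
ALLOWED_SORT_ORDERS = {"asc", "desc"}


def build_run_input(action: str = "get_stories", **params: Any) -> dict[str, Any]:
    """Hypothetical helper: validate parameters and drop the ones left unset."""
    run_input = {"action": action}
    run_input.update({k: v for k, v in params.items() if v is not None})

    # Date filters must parse as ISO calendar dates.
    for key in ("startDate", "endDate"):
        if key in run_input:
            date.fromisoformat(run_input[key])  # raises ValueError if invalid

    # Importance scores are documented as 1-10.
    for key in ("minImportance", "maxImportance"):
        if key in run_input and not 1 <= run_input[key] <= 10:
            raise ValueError(f"{key} must be between 1 and 10")

    if not 1 <= run_input.get("limit", 100) <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    if run_input.get("skip", 0) < 0:
        raise ValueError("skip must not be negative")
    if run_input.get("sortBy", "updated") not in ALLOWED_SORT_FIELDS:
        raise ValueError("sortBy must be one of updated, importance, headline")
    if run_input.get("sortOrder", "desc") not in ALLOWED_SORT_ORDERS:
        raise ValueError("sortOrder must be asc or desc")
    if not 0.0 <= run_input.get("similarityThreshold", 0.8) <= 1.0:
        raise ValueError("similarityThreshold must be between 0.0 and 1.0")
    return run_input
```

For example, `build_run_input("get_stories", source="NPR", category=None, limit=50)` returns a dictionary containing only `action`, `source`, and `limit`.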

Available Actions

get_stories

Fetch news articles with various filters.

{
  "action": "get_stories",
  "startDate": "2024-01-01",
  "endDate": "2024-01-31",
  "source": "NPR",
  "category": "Politics",
  "limit": 50
}
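An input like the one above could be submitted from Python and paged with `skip` and `limit`. This is a sketch under several assumptions: `client` is expected to behave like `ApifyClient` from the `apify-client` package (`actor(...).call(run_input=...)` and `dataset(...).iterate_items()`), the actor is assumed to write one dataset item per result, and `fetch_page` and `iter_stories` are hypothetical helper names. The actor ID is left to the caller because it is not stated here.

```python
from typing import Any, Iterator


def fetch_page(client: Any, actor_id: str, run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Run the actor once and return the items of the run's default dataset."""
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


def iter_stories(
    client: Any,
    actor_id: str,
    filters: dict[str, Any],
    page_size: int = 100,
) -> Iterator[dict[str, Any]]:
    """Yield get_stories results page by page until a short page arrives."""
    skip = 0
    while True:
        run_input = {**filters, "action": "get_stories", "limit": page_size, "skip": skip}
        page = fetch_page(client, actor_id, run_input)
        yield from page
        if len(page) < page_size:  # fewer items than requested: last page
            break
        skip += page_size
```

With `apify-client` installed, `client` would be created as `ApifyClient("<your Apify token>")`. Each page starts a separate actor run, so a larger `page_size` (up to the documented maximum of 1000) keeps the number of runs down.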

get_story_by_id

Fetch a single story by its MongoDB ObjectId. The storyId below is a placeholder; a real ObjectId is a 24-character hexadecimal string.

{
  "action": "get_story_by_id",
  "storyId": "65abc123def456789012345"
}

get_topics

Fetch topic clusters with their associated stories.

{
  "action": "get_topics",
  "startDate": "2024-01-01",
  "limit": 20
}

get_topic_by_id

Fetch a single topic with all its stories.

{
  "action": "get_topic_by_id",
  "topicId": "65abc123def456789012345"
}

get_keywords

Fetch extracted keywords with analysis data.

{
  "action": "get_keywords",
  "keyword": "Ukraine",
  "limit": 100
}

get_daily_summaries

Fetch AI-generated daily news summaries.

{
  "action": "get_daily_summaries",
  "startDate": "2024-01-01",
  "endDate": "2024-01-31"
}

search_stories

Full-text search across headlines and summaries.

{
  "action": "search_stories",
  "keyword": "climate change",
  "limit": 50
}

get_stories_by_keyword

Get stories associated with a specific keyword (uses vector search if available).

{
  "action": "get_stories_by_keyword",
  "keyword": "artificial intelligence",
  "startDate": "2024-01-01",
  "endDate": "2024-01-31",
  "limit": 30
}

get_similar_stories

Find semantically similar stories using vector embeddings.

{
  "action": "get_similar_stories",
  "storyId": "65abc123def456789012345",
  "similarityThreshold": 0.85,
  "limit": 10
}
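To illustrate what `similarityThreshold` and `limit` control, here is a small in-memory sketch of threshold-based similarity filtering. It assumes cosine similarity as the metric, which this README does not confirm. The actor's real vector search runs on the database side and may differ. `cosine_similarity` and `similar_items` are hypothetical names.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_items(
    query: Sequence[float],
    candidates: dict[str, Sequence[float]],
    threshold: float = 0.8,
    limit: int = 10,
) -> list[tuple[str, float]]:
    """Return (id, score) pairs scoring at least `threshold`, best first."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in candidates.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:limit]
```

Raising the threshold (for example from 0.8 to 0.85, as in the input above) narrows the result set to closer matches; `limit` then caps how many of those are returned.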

Output Structure

Story Object

{
  "id": "65abc123def456789012345",
  "headline": "Article Headline",
  "link": "https://example.com/article",
  "source": "NPR",
  "updated": "2024-01-15T10:30:00",
  "summary": {
    "title": "Summary Title",
    "summary": "AI-generated summary text...",
    "time": "2024-01-15T10:00:00",
    "importance": 8,
    "keywords": ["keyword1", "keyword2"],
    "category": "World/Europe",
    "categories": ["World", "Europe"],
    "language": "en"
  },
  "topic_id": "65def456abc789012345678"
}
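On the consuming side, a story object like the one above can be mapped onto a typed structure. The sketch below is one possible approach: the `Story` dataclass and `parse_story` are hypothetical, and treating `summary`, `categories`, and `topic_id` as optional is an assumption rather than documented behavior.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Optional


@dataclass
class Story:
    id: str
    headline: str
    link: str
    source: str
    updated: datetime
    importance: Optional[int] = None
    keywords: list[str] = field(default_factory=list)
    categories: list[str] = field(default_factory=list)
    topic_id: Optional[str] = None


def parse_story(raw: dict[str, Any]) -> Story:
    """Convert a raw story dictionary into a Story instance."""
    summary = raw.get("summary") or {}
    # Prefer the explicit list; otherwise split a path such as "World/Europe".
    categories = summary.get("categories") or [
        part for part in (summary.get("category") or "").split("/") if part
    ]
    return Story(
        id=raw["id"],
        headline=raw["headline"],
        link=raw["link"],
        source=raw["source"],
        updated=datetime.fromisoformat(raw["updated"]),
        importance=summary.get("importance"),
        keywords=list(summary.get("keywords") or []),
        categories=list(categories),
        topic_id=raw.get("topic_id"),
    )
```

Parsing `updated` into a `datetime` makes client-side sorting and date comparisons straightforward once the results have been downloaded.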

Topic Object

{
  "id": "65def456abc789012345678",
  "updated": "2024-01-15T12:00:00",
  "source": "multiple",
  "short_name": "Ukraine Peace Talks",
  "summary": {
    "title": "Topic Title",
    "summary": "Topic summary...",
    "importance": 9,
    "keywords": ["Ukraine", "peace", "negotiations"],
    "category": "World"
  },
  "story_ids": ["65abc...", "65bcd..."],
  "stories": [/* full story objects */]
}

Keyword Object

{
  "id": "65ghi789jkl012345678901",
  "keyword": "Ukraine",
  "analyzed": "2024-01-15T08:00:00",
  "analysis": {
    "proper_noun": true,
    "obscure": false,
    "is_person": false,
    "is_place": true,
    "is_thing": false,
    "is_abstract": false,
    "is_organization": false
  },
  "wikipedia": {
    "summary": "Ukraine is a country in Eastern Europe...",
    "url": "https://en.wikipedia.org/wiki/Ukraine",
    "image_url": "https://upload.wikimedia.org/..."
  }
}
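The boolean flags under `analysis` can be collapsed into readable labels when displaying or grouping keywords. The helper below is a hypothetical sketch; the label names are chosen for illustration and are not part of the actor's output.

```python
from typing import Any

# Maps analysis flags from the keyword object to illustrative labels.
KIND_FLAGS = {
    "is_person": "person",
    "is_place": "place",
    "is_thing": "thing",
    "is_abstract": "abstract",
    "is_organization": "organization",
}


def keyword_kinds(keyword_obj: dict[str, Any]) -> list[str]:
    """Return the labels whose flags are true in the keyword's analysis."""
    analysis = keyword_obj.get("analysis") or {}
    return [label for flag, label in KIND_FLAGS.items() if analysis.get(flag)]
```

For the sample object above, `keyword_kinds` would return a list containing only `"place"`.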

Daily Summary Object

{
  "id": "65jkl012mno345678901234",
  "date": "2024-01-15T00:00:00",
  "title": "Daily News Summary",
  "overall_summary": "<p>Today's top stories include...</p>",
  "top_keywords": ["Ukraine", "Economy", "Climate"],
  "key_story_titles": ["Story 1", "Story 2"],
  "sentiment": "Mixed"
}
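Because `overall_summary` contains HTML markup in the sample above, plain-text consumers may want to strip the tags first. The following sketch uses only the standard library's `html.parser`; `summary_text` is a hypothetical helper, and it assumes the markup is simple enough that dropping tags and collapsing whitespace is acceptable.

```python
from html.parser import HTMLParser
from typing import Any


class _TextExtractor(HTMLParser):
    """Collects text nodes and ignores all tags."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True decodes entities such as &amp;
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def summary_text(daily_summary: dict[str, Any]) -> str:
    """Return overall_summary as plain text with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(daily_summary.get("overall_summary") or "")
    parser.close()
    return " ".join(" ".join(parser.parts).split())
```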

Environment Variables

| Variable | Description |
|---|---|
| MONGO_URI | MongoDB connection string (alternative to input parameter) |
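The connection string can therefore come from either the `mongoUri` input or the `MONGO_URI` environment variable. A minimal sketch of that lookup follows; `resolve_mongo_uri` is a hypothetical helper, and giving the input parameter precedence over the environment variable is an assumption, since the order is not stated here.

```python
import os
from typing import Any, Mapping, Optional


def resolve_mongo_uri(
    actor_input: Mapping[str, Any],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the MongoDB URI from the input, falling back to MONGO_URI."""
    env = os.environ if env is None else env
    # Assumed precedence: explicit input first, then the environment.
    uri = actor_input.get("mongoUri") or env.get("MONGO_URI")
    if not uri:
        raise ValueError("Provide mongoUri in the input or set MONGO_URI")
    return uri
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.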

Local Development

# Install dependencies
pip install -r requirements.txt
# Run locally with Apify CLI
apify run
# Or run directly
python main.py

Deployment

# Login to Apify
apify login
# Push to Apify platform
apify push

Data Sources

This actor accesses data from the NewsBot3000 platform, which aggregates news from:

  • NPR (text.npr.org)
  • Associated Press (apnews.com)
  • Christian Science Monitor (csmonitor.com)
  • CNN Lite (lite.cnn.com)

License

MIT License