PH News API - Multi-Source RSS Aggregator
Pricing
from $5.00 / 1,000 scraped articles
Aggregate Philippine news from PhilStar, Rappler, SunStar, and Daily Tribune via RSS. Returns full article text, excerpts, categories, author info, and metadata. Supports keyword filtering and per-source limits.
Developer
Joey Del Rosario
Maintained by Community
Actor stats
Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 6 days ago
PH News API — Full-Text Article Scraper
Get complete Philippine news articles, not just headlines. This scraper fetches RSS feeds from Philippine publications (four currently working, with more attempted), then extracts the full body text from every article page, with ads, navigation, and sidebars removed.
Output is clean JSON with title, URL, full body text, publication date, author, categories, and more. Ready for AI pipelines, NLP, databases, or dashboards.
What you get
- Full article body text — Not just RSS summaries. Each article URL is visited and the main content extracted via trafilatura.
- 4 working publications: PhilStar (5 sections), Rappler, SunStar, and Daily Tribune. BusinessWorld is currently blocked (403) but kept for auto-recovery, and 5 additional sources are attempted but may be blocked.
- Up to 100 articles per run by default across all sources (configurable from 1 to 1,000 via maxItems)
- Keyword filtering — Only return articles matching your topic
- ISO 8601 dates — UTC-normalized, query-ready
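The keyword filter can be pictured with a small sketch. The actor's exact matching rules are not documented here, so this assumes a case-insensitive substring match over the title and summary fields. The helper name `matches_keyword` and the second sample article are hypothetical.

```python
def matches_keyword(article, keyword):
    """Return True if keyword appears in the article's title or summary.

    Assumption: case-insensitive substring match; an empty keyword matches all.
    """
    if not keyword:
        return True
    needle = keyword.casefold()
    haystack = f'{article.get("title", "")} {article.get("summary", "")}'.casefold()
    return needle in haystack


articles = [
    {
        "title": "Senate approves 2026 national budget on final reading",
        "summary": "The Senate has approved the 2026 national budget on final reading...",
    },
    {
        # Made-up article, for illustration only.
        "title": "Hypothetical sports headline",
        "summary": "A made-up summary about basketball.",
    },
]

budget_only = [a for a in articles if matches_keyword(a, "Budget")]
print(len(budget_only))
```

The same helper can be applied to a downloaded dataset if you prefer to scrape without a keyword and filter afterwards.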
Output sample
{
  "title": "Senate approves 2026 national budget on final reading",
  "source": "PhilStar",
  "section": "headlines",
  "url": "https://www.philstar.com/headlines/2026/...",
  "published_date": "2026-06-13T09:04:00+00:00",
  "author": "Kristine Daguno-Bersamina",
  "body": "MANILA, Philippines — The Senate on Friday approved on third and final reading the proposed P6.352-trillion national budget for 2026...",
  "summary": "The Senate has approved the 2026 national budget on final reading...",
  "categories": ["Senate", "national budget", "2026"],
  "scraped_at": "2026-06-13T12:00:00+00:00"
}
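Because the dates are ISO 8601 with an explicit UTC offset, a record can be loaded with the Python standard library alone. The sketch below uses a shortened copy of the sample record. The variable names are illustrative and not part of the actor.

```python
import json
from datetime import datetime

# Shortened copy of the sample record; only the fields used below are kept.
raw = """{
  "title": "Senate approves 2026 national budget on final reading",
  "source": "PhilStar",
  "published_date": "2026-06-13T09:04:00+00:00",
  "scraped_at": "2026-06-13T12:00:00+00:00",
  "categories": ["Senate", "national budget", "2026"]
}"""

record = json.loads(raw)

# fromisoformat understands the "+00:00" offset and returns aware datetimes.
published = datetime.fromisoformat(record["published_date"])
scraped = datetime.fromisoformat(record["scraped_at"])

# Aware datetimes can be subtracted directly.
lag_minutes = (scraped - published).total_seconds() / 60
is_utc = published.utcoffset().total_seconds() == 0

print(f"{record['source']}: scraped {lag_minutes:.0f} minutes after publication")
```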
How it works
- Fetch RSS/Atom feeds from each publication
- Normalize to common schema (title, date, author, categories, image)
- For every article URL, fetch the page and extract clean body text
- Sort by date (newest first) and push to dataset
Full-text extraction is powered by trafilatura, a Python library that extracts the main content from news articles while removing boilerplate.
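The feed-handling steps above can be sketched with the standard library. This is a minimal illustration, not the actor's actual code: the inline feed, the `normalize_feed` helper, and the example.com URLs are made up. The full-text step is left out because it needs network access; with trafilatura it would typically be a call to `trafilatura.fetch_url(url)` followed by `trafilatura.extract(...)` on the downloaded page.

```python
import xml.etree.ElementTree as ET
from datetime import timezone
from email.utils import parsedate_to_datetime

# Tiny made-up RSS 2.0 feed standing in for a real publication feed.
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>Older story</title><link>https://example.com/a</link>
<pubDate>Fri, 12 Jun 2026 17:04:00 +0800</pubDate></item>
<item><title>Newer story</title><link>https://example.com/b</link>
<pubDate>Sat, 13 Jun 2026 09:04:00 +0000</pubDate></item>
</channel></rss>"""


def normalize_feed(xml_text, source):
    """Map RSS <item> elements onto a small common schema with UTC dates."""
    root = ET.fromstring(xml_text)
    articles = []
    for item in root.iter("item"):
        # RSS uses RFC 822 style dates; convert them to UTC ISO 8601 strings.
        published = parsedate_to_datetime(item.findtext("pubDate"))
        articles.append({
            "title": item.findtext("title", default="").strip(),
            "url": item.findtext("link", default="").strip(),
            "source": source,
            "published_date": published.astimezone(timezone.utc).isoformat(),
        })
    return articles


articles = normalize_feed(SAMPLE_FEED, "ExampleSource")

# All dates share one UTC format, so sorting the strings sorts chronologically.
articles.sort(key=lambda a: a["published_date"], reverse=True)

for article in articles:
    print(article["published_date"], article["title"])
```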
Pricing
$5.00 per 1,000 articles ($0.005 each). A typical run of 100 articles costs $0.50. Full-text extraction is included at no extra charge.
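To budget a run in advance, the arithmetic above can be wrapped in a small helper. The function name `estimate_cost` is hypothetical, and the sketch assumes the flat per-article rate quoted here with no other platform fees.

```python
from decimal import Decimal

# Rate taken from the pricing line above: $5.00 per 1,000 articles.
PRICE_PER_1000 = Decimal("5.00")


def estimate_cost(article_count):
    """Estimated cost in USD for a run returning article_count articles."""
    return PRICE_PER_1000 * article_count / 1000


typical_run = estimate_cost(100)    # the default maxItems
largest_run = estimate_cost(1000)   # the maximum allowed maxItems

print(f"100 articles: ${typical_run:.2f}")
print(f"1,000 articles: ${largest_run:.2f}")
```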
Sources
| Publication | Status | Feeds |
|---|---|---|
| PhilStar | Working | headlines, nation, opinion, business, world |
| Rappler | Working | all articles |
| SunStar | Working | all articles |
| Daily Tribune | Working | all articles |
| BusinessWorld | Blocked (403) | kept for auto-recovery |
| Manila Bulletin | Attempted | RSS feed may be blocked |
| Manila Times | Attempted | RSS feed may be blocked |
| Inquirer | Attempted | RSS feed may be blocked |
| ABS-CBN News | Attempted | RSS feed may be blocked |
| GMA News | Attempted | RSS feed may be blocked |
Input parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| sources | string | all | Comma-separated: all, philstar, rappler, sunstar, tribune, businessworld, manila-bulletin, manila-times, inquirer, abs-cbn, gma-news |
| keyword | string | empty | Filter articles by keyword in title/summary |
| extractFullText | boolean | true | Enable/disable full-text body extraction |
| maxItems | integer | 100 | Maximum articles total (1-1000) |
| maxPerSource | integer | 20 | Maximum articles per individual RSS feed (1-200) |
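As an illustration of how these parameters fit together, the sketch below builds a run input and checks it against the ranges in the table before anything is sent. The `build_input` helper is hypothetical; only the parameter names, defaults, source identifiers, and limits come from the table above.

```python
VALID_SOURCES = {
    "all", "philstar", "rappler", "sunstar", "tribune", "businessworld",
    "manila-bulletin", "manila-times", "inquirer", "abs-cbn", "gma-news",
}


def build_input(sources="all", keyword="", extract_full_text=True,
                max_items=100, max_per_source=20):
    """Build a run-input dict, validating values against the documented ranges."""
    names = [s.strip() for s in sources.split(",") if s.strip()]
    if not names:
        raise ValueError("sources must name at least one source")
    unknown = [s for s in names if s not in VALID_SOURCES]
    if unknown:
        raise ValueError(f"unknown sources: {unknown}")
    if not 1 <= max_items <= 1000:
        raise ValueError("maxItems must be between 1 and 1000")
    if not 1 <= max_per_source <= 200:
        raise ValueError("maxPerSource must be between 1 and 200")
    return {
        "sources": ",".join(names),
        "keyword": keyword,
        "extractFullText": extract_full_text,
        "maxItems": max_items,
        "maxPerSource": max_per_source,
    }


run_input = build_input(sources="philstar, rappler", keyword="budget", max_items=50)
print(run_input)
```

Validating locally catches out-of-range values before a run is started.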