PH News API - Multi-Source RSS Aggregator

Pricing

from $5.00 per 1,000 articles scraped

Aggregate Philippine news from PhilStar, BusinessWorld, and Rappler via RSS. Full article text, excerpts, categories, author info, and metadata. Supports keyword filtering and per-source limits.

Developer

Joey Del Rosario

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 6 days ago

PH News API — Full-Text Article Scraper

Get complete Philippine news articles, not just headlines. This scraper fetches RSS feeds from 5+ Philippine publications, then extracts the full body text from every article page — ads, navigation, and sidebars removed.

Output is clean JSON with title, URL, full body text, publication date, author, categories, and more. Ready for AI pipelines, NLP, databases, or dashboards.

What you get

  • Full article body text — Not just RSS summaries. Each article URL is visited and the main content extracted via trafilatura.
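  • 5 configured publications: PhilStar (5 sections), Rappler, SunStar, and Daily Tribune are working. BusinessWorld currently returns 403 and stays in the list so it recovers automatically if access returns. 5 additional sources are attempted, but their RSS feeds may be blocked.
  • 100 articles per run by default across all sources, configurable up to 1,000 via maxItems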
  • Keyword filtering — Only return articles matching your topic
  • ISO 8601 dates — UTC-normalized, query-ready
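
As an illustration of the keyword filtering bullet, here is a minimal sketch of how a title/summary match could work. The function name `matches_keyword` and the case-insensitive substring rule are assumptions for illustration; the Actor's actual matching logic is not documented here.

```python
def matches_keyword(article: dict, keyword: str) -> bool:
    """Return True if keyword occurs in the title or summary (assumed case-insensitive)."""
    if not keyword:
        return True  # an empty keyword means "no filtering"
    needle = keyword.casefold()
    title = article.get("title") or ""
    summary = article.get("summary") or ""
    return needle in f"{title} {summary}".casefold()


# Hypothetical records shaped like the Actor's output.
articles = [
    {"title": "Senate approves 2026 national budget on final reading", "summary": "..."},
    {"title": "Typhoon update", "summary": "Heavy rain expected in Luzon"},
]
budget_only = [a for a in articles if matches_keyword(a, "budget")]
```

The same function can be applied to a downloaded dataset if you prefer to scrape once and filter locally.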

Output sample

{
  "title": "Senate approves 2026 national budget on final reading",
  "source": "PhilStar",
  "section": "headlines",
  "url": "https://www.philstar.com/headlines/2026/...",
  "published_date": "2026-06-13T09:04:00+00:00",
  "author": "Kristine Daguno-Bersamina",
  "body": "MANILA, Philippines — The Senate on Friday approved on third and final reading the proposed P6.352-trillion national budget for 2026...",
  "summary": "The Senate has approved the 2026 national budget on final reading...",
  "categories": ["Senate", "national budget", "2026"],
  "scraped_at": "2026-06-13T12:00:00+00:00"
}
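
Because the dates are ISO 8601 strings with an explicit UTC offset, they can be parsed with the Python standard library alone. The sketch below loads a trimmed copy of the sample record (only some fields are reproduced) and computes how long after publication the article was scraped. The variable names are illustrative.

```python
import json
from datetime import datetime

# Trimmed copy of the sample record above; field names match the output sample.
record_json = """
{
  "title": "Senate approves 2026 national budget on final reading",
  "source": "PhilStar",
  "published_date": "2026-06-13T09:04:00+00:00",
  "categories": ["Senate", "national budget", "2026"],
  "scraped_at": "2026-06-13T12:00:00+00:00"
}
"""

record = json.loads(record_json)

# fromisoformat() understands the "+00:00" offset and returns timezone-aware datetimes.
published = datetime.fromisoformat(record["published_date"])
scraped = datetime.fromisoformat(record["scraped_at"])

# Time between publication and scraping, as a timedelta.
lag = scraped - published
print(f"{record['source']}: scraped {lag} after publication")
```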

How it works

  1. Fetch RSS/Atom feeds from each publication
  2. Normalize to common schema (title, date, author, categories, image)
  3. For every article URL, fetch the page and extract clean body text
  4. Sort by date (newest first) and push to dataset
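
Steps 1, 2, and 4 above can be sketched with the standard library only. This is not the Actor's source code: the inline feed, the `normalize_feed` helper, and the reduced schema are assumptions made so the example runs offline. Step 3 (fetching each page) is omitted here.

```python
import xml.etree.ElementTree as ET
from datetime import timezone
from email.utils import parsedate_to_datetime

# Tiny inline RSS 2.0 feed standing in for a real publication feed (hypothetical data).
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Example feed</title>
  <item>
    <title>Older story</title>
    <link>https://example.com/a</link>
    <pubDate>Fri, 12 Jun 2026 08:00:00 +0800</pubDate>
    <category>Nation</category>
  </item>
  <item>
    <title>Newer story</title>
    <link>https://example.com/b</link>
    <pubDate>Sat, 13 Jun 2026 17:04:00 +0800</pubDate>
  </item>
</channel></rss>"""


def normalize_feed(xml_text: str, source: str) -> list:
    """Map RSS <item> elements onto a common schema with UTC ISO 8601 dates."""
    root = ET.fromstring(xml_text)
    articles = []
    for item in root.iter("item"):
        raw_date = item.findtext("pubDate")
        if not raw_date:
            continue  # skip items without a date in this sketch
        published = parsedate_to_datetime(raw_date).astimezone(timezone.utc)
        articles.append({
            "title": (item.findtext("title") or "").strip(),
            "source": source,
            "url": (item.findtext("link") or "").strip(),
            "published_date": published.isoformat(),
            "categories": [c.text for c in item.findall("category") if c.text],
        })
    return articles


articles = normalize_feed(SAMPLE_FEED, "Example")
# All dates share the same UTC format, so string order equals chronological order.
articles.sort(key=lambda a: a["published_date"], reverse=True)
```

Normalizing to UTC before sorting matters because feeds from different publications may report local offsets such as +0800.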

Full-text extraction is powered by trafilatura, a Python library that extracts the main content from news articles while removing boilerplate.
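
For readers unfamiliar with trafilatura, a minimal sketch of the fetch-and-extract step might look like the following. It uses the library's `fetch_url` and `extract` functions with default settings; the wrapper name `extract_body` is hypothetical, and the options the Actor actually passes are not documented here.

```python
from typing import Optional


def extract_body(url: str) -> Optional[str]:
    """Download one article page and return its main text, or None on failure."""
    # Imported lazily so the module loads even where trafilatura is not installed.
    import trafilatura  # third-party: pip install trafilatura

    downloaded = trafilatura.fetch_url(url)  # returns None if the download fails
    if downloaded is None:
        return None
    # extract() strips boilerplate and returns plain text, or None if nothing is found.
    return trafilatura.extract(downloaded)
```

Calling `extract_body` on an article URL from the feed would yield the text that ends up in the `body` field, assuming the page is reachable.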

Pricing

$5.00 per 1,000 articles ($0.005 each). A typical run of 100 articles costs $0.50. Full-text extraction is included at no extra charge.
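
The arithmetic above can be checked with a small helper. This sketch assumes the flat $5.00 per 1,000 rate stated here; the name `estimate_cost` is illustrative, and actual billing may differ by plan.

```python
from decimal import Decimal

# Rate taken from the pricing paragraph above.
PRICE_PER_1000 = Decimal("5.00")


def estimate_cost(article_count: int) -> Decimal:
    """Estimated charge in USD for a given number of scraped articles."""
    cost = PRICE_PER_1000 * article_count / 1000
    return cost.quantize(Decimal("0.001"))  # keep tenths of a cent


print(estimate_cost(100))   # a typical run of 100 articles
print(estimate_cost(1000))  # the maxItems upper bound
```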

Sources

| Publication | Status | Feeds |
| --- | --- | --- |
| PhilStar | Working | headlines, nation, opinion, business, world |
| Rappler | Working | all articles |
| SunStar | Working | all articles |
| Daily Tribune | Working | all articles |
| BusinessWorld | Blocked (403) | kept for auto-recovery |
| Manila Bulletin | Attempted | RSS feed may be blocked |
| Manila Times | Attempted | RSS feed may be blocked |
| Inquirer | Attempted | RSS feed may be blocked |
| ABS-CBN News | Attempted | RSS feed may be blocked |
| GMA News | Attempted | RSS feed may be blocked |

Input parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| sources | string | all | Comma-separated: all, philstar, rappler, sunstar, tribune, businessworld, manila-bulletin, manila-times, inquirer, abs-cbn, gma-news |
| keyword | string | empty | Filter articles by keyword in title/summary |
| extractFullText | boolean | true | Enable/disable full-text body extraction |
| maxItems | integer | 100 | Maximum articles total (1-1000) |
| maxPerSource | integer | 20 | Maximum articles per individual RSS feed (1-200) |
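
To show how these parameters fit together, here is a hedged sketch that builds an input object and checks the documented ranges before a run. The helper `build_run_input` is hypothetical; only the parameter names, defaults, and ranges come from the list above.

```python
def build_run_input(sources=("all",), keyword="", extract_full_text=True,
                    max_items=100, max_per_source=20) -> dict:
    """Assemble an Actor input dict, validating the documented ranges."""
    if not 1 <= max_items <= 1000:
        raise ValueError("maxItems must be between 1 and 1000")
    if not 1 <= max_per_source <= 200:
        raise ValueError("maxPerSource must be between 1 and 200")
    return {
        "sources": ",".join(sources),  # the Actor expects a comma-separated string
        "keyword": keyword,
        "extractFullText": extract_full_text,
        "maxItems": max_items,
        "maxPerSource": max_per_source,
    }


# Example: budget coverage from two working sources, capped at 50 articles.
run_input = build_run_input(sources=["philstar", "rappler"],
                            keyword="budget", max_items=50)
```

A dict like this is what you would typically pass as the run input when starting the Actor through the Apify console or an API client; the Actor ID is not stated in this section, so it is left out.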