# PH News API - Multi-Source RSS Aggregator (`nekohaii/philippine-news-scraper`) Actor

Aggregate Philippine news from PhilStar, BusinessWorld, and Rappler via RSS. Full article text, excerpts, categories, author info, and metadata. Supports keyword filtering and per-source limits.

- **URL**: https://apify.com/nekohaii/philippine-news-scraper.md
- **Developed by:** [Joey Del Rosario](https://apify.com/nekohaii) (community)
- **Categories:** News
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

From $5.00 / 1,000 articles scraped

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## PH News API — Full-Text Article Scraper

**Get complete Philippine news articles, not just headlines.** This scraper fetches RSS feeds from 5+ Philippine publications, then **extracts the full body text** from every article page — ads, navigation, and sidebars removed.

Output is clean JSON with title, URL, full body text, publication date, author, categories, and more. Ready for AI pipelines, NLP, databases, or dashboards.

### What you get

- **Full article body text** — Not just RSS summaries. Each article URL is visited and the main content extracted via trafilatura.
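- **5 core publications**: four currently working (PhilStar with 5 section feeds, Rappler, SunStar, Daily Tribune) plus BusinessWorld, which is currently blocked (HTTP 403) and kept for auto-recovery. Another 5 sources are attempted, but their RSS feeds may be blocked.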
- **Up to 1,000 articles per run** across all sources (default cap: 100)
- **Keyword filtering** — Only return articles matching your topic
- **ISO 8601 dates** — UTC-normalized, query-ready
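
As an illustration of the UTC-normalized dates mentioned above, here is a minimal sketch of how an RSS `pubDate` string could be converted to ISO 8601 in UTC using only the standard library. The helper name `to_iso_utc` is hypothetical, and treating offset-less dates as UTC is an assumption; the Actor's actual implementation is not shown in this README.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def to_iso_utc(rss_date: str) -> str:
    """Convert an RFC 822 style date (as used by RSS pubDate) to ISO 8601 in UTC."""
    parsed = parsedate_to_datetime(rss_date)
    if parsed.tzinfo is None:
        # Assumption: a date without a usable offset is treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).isoformat()


# Manila time (UTC+8) is shifted back eight hours.
print(to_iso_utc("Sat, 13 Jun 2026 17:04:00 +0800"))  # 2026-06-13T09:04:00+00:00
```

Because every timestamp ends up with the same `+00:00` offset, the resulting strings can be compared or sorted directly in most databases and query tools.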

### Output sample

```json
{
  "title": "Senate approves 2026 national budget on final reading",
  "source": "PhilStar",
  "section": "headlines",
  "url": "https://www.philstar.com/headlines/2026/...",
  "published_date": "2026-06-13T09:04:00+00:00",
  "author": "Kristine Daguno-Bersamina",
  "body": "MANILA, Philippines — The Senate on Friday approved on third and final reading the proposed P6.352-trillion national budget for 2026...",
  "summary": "The Senate has approved the 2026 national budget on final reading...",
  "categories": ["Senate", "national budget", "2026"],
  "scraped_at": "2026-06-13T12:00:00+00:00"
}
````
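
If you consume the dataset from Python, a typed wrapper can make the record shape above easier to work with. The sketch below is one possible approach; the `Article` class and `parse_article` helper are hypothetical names, and the sample item simply mirrors the fields from the output sample (including its elided URL and body).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List


@dataclass
class Article:
    title: str
    source: str
    url: str
    published_date: datetime
    body: str
    categories: List[str]


def parse_article(item: Dict[str, Any]) -> Article:
    """Turn one dataset item into an Article, parsing the ISO 8601 date."""
    return Article(
        title=item["title"],
        source=item["source"],
        url=item["url"],
        published_date=datetime.fromisoformat(item["published_date"]),
        # body may be missing or empty, e.g. when full-text extraction is off.
        body=item.get("body") or "",
        categories=list(item.get("categories") or []),
    )


SAMPLE_ITEM = {
    "title": "Senate approves 2026 national budget on final reading",
    "source": "PhilStar",
    "url": "https://www.philstar.com/headlines/2026/...",
    "published_date": "2026-06-13T09:04:00+00:00",
    "body": "MANILA, Philippines ...",
    "categories": ["Senate", "national budget", "2026"],
}

article = parse_article(SAMPLE_ITEM)
print(article.source, article.published_date.isoformat())
```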

### How it works

1. Fetch RSS/Atom feeds from each publication
2. Normalize to common schema (title, date, author, categories, image)
3. For every article URL, fetch the page and extract clean body text
4. Sort by date (newest first) and push to dataset

Full-text extraction is powered by [trafilatura](https://trafilatura.readthedocs.io/), a Python library that extracts the main content from news articles while removing boilerplate.
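
The four steps can be sketched offline with the standard library. This is an illustration under stated assumptions, not the Actor's source: the feed is a hypothetical two-item RSS string standing in for step 1, `extract_body` is a stub where the Actor would use trafilatura, each `pubDate` is assumed to carry a UTC offset, and all function names are invented for the example.

```python
import xml.etree.ElementTree as ET
from datetime import timezone
from email.utils import parsedate_to_datetime

# Hypothetical two-item feed standing in for step 1 (fetching a real RSS feed).
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Example feed</title>
  <item>
    <title>Older story</title>
    <link>https://example.com/older</link>
    <pubDate>Fri, 12 Jun 2026 08:00:00 +0800</pubDate>
    <category>nation</category>
  </item>
  <item>
    <title>Newer story</title>
    <link>https://example.com/newer</link>
    <pubDate>Sat, 13 Jun 2026 17:04:00 +0800</pubDate>
  </item>
</channel></rss>"""


def extract_body(url):
    """Placeholder for step 3; returns an empty string instead of fetching the page."""
    return ""


def normalize_feed(xml_text, source, body_extractor=extract_body):
    """Steps 2 and 3: map each <item> to a common schema and attach body text."""
    articles = []
    for item in ET.fromstring(xml_text).iter("item"):
        published = parsedate_to_datetime(item.findtext("pubDate"))
        url = item.findtext("link", default="")
        articles.append({
            "title": item.findtext("title", default="").strip(),
            "source": source,
            "url": url,
            "published_date": published.astimezone(timezone.utc).isoformat(),
            "categories": [c.text for c in item.findall("category") if c.text],
            "body": body_extractor(url),
        })
    return articles


def newest_first(articles):
    """Step 4: sort by publication date, newest first (UTC ISO strings sort correctly)."""
    return sorted(articles, key=lambda a: a["published_date"], reverse=True)


articles = newest_first(normalize_feed(SAMPLE_FEED, "Example"))
for article in articles:
    print(article["published_date"], article["title"])
```

Passing the extractor as a parameter keeps the normalization logic testable without network access.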

### Pricing

$5.00 per 1,000 articles ($0.005 each). A typical run of 100 articles costs $0.50. Full-text extraction is included at no extra charge.
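
The arithmetic above can be wrapped in a small helper for budgeting. This sketch assumes the flat $5.00 per 1,000 articles rate quoted here; `estimate_cost` is a hypothetical name, and the actual charge is determined by the Apify platform.

```python
from decimal import Decimal

# Rate quoted in this README; treat it as an assumption that may change.
PRICE_PER_1000_ARTICLES = Decimal("5.00")


def estimate_cost(article_count: int) -> Decimal:
    """Estimated charge in USD for a run that returns article_count articles."""
    return PRICE_PER_1000_ARTICLES * article_count / 1000


print(estimate_cost(100))
print(estimate_cost(1000))
```

`Decimal` is used instead of `float` so that per-article prices such as $0.005 are represented exactly.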

### Sources

| Publication | Status | Feeds |
|-------------|--------|-------|
| PhilStar | Working | headlines, nation, opinion, business, world |
| Rappler | Working | all articles |
| SunStar | Working | all articles |
| Daily Tribune | Working | all articles |
| BusinessWorld | Blocked (403) | kept for auto-recovery |
| Manila Bulletin | Attempted | RSS feed may be blocked |
| Manila Times | Attempted | RSS feed may be blocked |
| Inquirer | Attempted | RSS feed may be blocked |
| ABS-CBN News | Attempted | RSS feed may be blocked |
| GMA News | Attempted | RSS feed may be blocked |

### Input parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| sources | string | `all` | Comma-separated: `all` (confirmed working sources), `all-including-blocked`, `philstar`, `rappler`, `sunstar`, `tribune`, `businessworld`, `manila-bulletin`, `manila-times`, `inquirer`, `abs-cbn`, `gma-news` |
| keyword | string | empty | Filter articles by keyword in title/summary |
| extractFullText | boolean | true | Enable/disable full-text body extraction |
| maxItems | integer | 100 | Maximum articles total (1-1000) |
| maxPerSource | integer | 20 | Maximum articles per individual RSS feed (1-200) |
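
The ranges in the table can be checked client-side before a run is started. The following is a minimal sketch; `build_input` is a hypothetical helper, not part of the Actor or the Apify client, and it only enforces the defaults and limits listed above.

```python
def build_input(sources="all", keyword="", extract_full_text=True,
                max_items=100, max_per_source=20):
    """Build an input object for the Actor, rejecting out-of-range limits early."""
    if not 1 <= max_items <= 1000:
        raise ValueError("maxItems must be between 1 and 1000")
    if not 1 <= max_per_source <= 200:
        raise ValueError("maxPerSource must be between 1 and 200")
    return {
        "sources": sources,
        "keyword": keyword,
        "extractFullText": bool(extract_full_text),
        "maxItems": max_items,
        "maxPerSource": max_per_source,
    }


print(build_input(sources="philstar,rappler", keyword="budget", max_items=50))
```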

# Actor input Schema

## `sources` (type: `string`):

Comma-separated list of sources. Options: all (confirmed working), all-including-blocked, philstar, rappler, sunstar, tribune, businessworld, manila-bulletin, manila-times, inquirer, abs-cbn, gma-news. Default: all
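
To make the meaning of `all` and `all-including-blocked` concrete, here is one way such a value could be expanded into individual source slugs. This is an assumption-based sketch: the grouping follows the Sources table in the README (four working sources, six blocked or attempted), and `expand_sources` is a hypothetical helper, not the Actor's own parser.

```python
# Grouping taken from the Sources table; the Actor's internal lists may differ.
WORKING = ["philstar", "rappler", "sunstar", "tribune"]
BLOCKED_OR_ATTEMPTED = ["businessworld", "manila-bulletin", "manila-times",
                        "inquirer", "abs-cbn", "gma-news"]


def expand_sources(value):
    """Expand a comma-separated sources string into a de-duplicated list of slugs."""
    requested = [part.strip().lower() for part in value.split(",") if part.strip()]
    if not requested:
        requested = ["all"]  # mirror the documented default
    expanded = []
    for name in requested:
        if name == "all":
            group = WORKING
        elif name == "all-including-blocked":
            group = WORKING + BLOCKED_OR_ATTEMPTED
        elif name in WORKING or name in BLOCKED_OR_ATTEMPTED:
            group = [name]
        else:
            raise ValueError(f"Unknown source: {name}")
        for slug in group:
            if slug not in expanded:
                expanded.append(slug)
    return expanded


print(expand_sources("philstar, Rappler"))
print(expand_sources("all"))
```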

## `keyword` (type: `string`):

Only return articles containing this keyword in title or summary (leave empty for all)
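
A keyword filter over title and summary might look like the sketch below. Case-insensitive substring matching is an assumption here, since the schema does not say how the Actor compares text, and `matches_keyword` is a hypothetical name.

```python
def matches_keyword(article, keyword):
    """Return True if keyword is empty or appears in the article's title or summary."""
    if not keyword:
        return True  # an empty keyword keeps every article
    needle = keyword.casefold()
    title = article.get("title") or ""
    summary = article.get("summary") or ""
    return needle in f"{title} {summary}".casefold()


sample = {
    "title": "Senate approves 2026 national budget on final reading",
    "summary": "The Senate has approved the 2026 national budget on final reading...",
}
print(matches_keyword(sample, "BUDGET"))
print(matches_keyword(sample, "typhoon"))
```

The same function can be applied to dataset items after a run if you want to filter on several keywords without starting a new run for each.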

## `extractFullText` (type: `boolean`):

If enabled, scrapes each article page to extract the full body text. Adds ~1-3s per article but gives you complete articles ready for AI processing.
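
The quoted overhead of roughly 1 to 3 seconds per article gives a quick way to size run timeouts. The sketch below assumes articles are fetched one after another; the README does not state whether the Actor parallelizes requests, so treat the result as a rough upper-bound estimate. `extraction_overhead_seconds` is a hypothetical helper.

```python
def extraction_overhead_seconds(article_count, low=1.0, high=3.0):
    """Return (min, max) extra seconds for full-text extraction, assuming sequential fetches."""
    return article_count * low, article_count * high


low, high = extraction_overhead_seconds(100)
print(f"100 articles: about {low:.0f} to {high:.0f} extra seconds")
```

For the default `maxItems` of 100, that is on the order of a few minutes; disabling `extractFullText` removes this overhead at the cost of body text.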

## `maxItems` (type: `integer`):

Maximum number of news articles across all sources

## `maxPerSource` (type: `integer`):

Maximum articles per individual RSS feed
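
Because `maxPerSource` applies per RSS feed while `maxItems` applies to the whole run, the two limits interact. The sketch below shows one plausible order of operations (cap each feed, sort newest first, then cap the total); the Actor's actual order is not documented here, and `apply_limits` and the feed keys are hypothetical.

```python
def apply_limits(articles_by_feed, max_items=100, max_per_source=20):
    """Cap each feed, merge, sort newest first, then cap the overall total."""
    merged = []
    for feed_name, articles in articles_by_feed.items():
        merged.extend(articles[:max_per_source])
    # published_date values are UTC ISO 8601 strings, so string order is date order.
    merged.sort(key=lambda a: a["published_date"], reverse=True)
    return merged[:max_items]


feeds = {
    "philstar/headlines": [{"published_date": f"2026-06-13T{h:02d}:00:00+00:00"} for h in range(5)],
    "rappler": [{"published_date": f"2026-06-13T{h:02d}:00:00+00:00"} for h in range(5, 10)],
}
result = apply_limits(feeds, max_items=3, max_per_source=2)
print([a["published_date"] for a in result])
```

Note that PhilStar is listed with five section feeds, so with the default `maxPerSource` of 20 it alone could supply up to 100 candidates before the `maxItems` cap is applied.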

## Actor input object example

```json
{
  "sources": "all",
  "keyword": "",
  "extractFullText": true,
  "maxItems": 100,
  "maxPerSource": 20
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "sources": "all",
    "keyword": ""
};

// Run the Actor and wait for it to finish
const run = await client.actor("nekohaii/philippine-news-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "sources": "all",
    "keyword": "",
}

# Run the Actor and wait for it to finish
run = client.actor("nekohaii/philippine-news-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "sources": "all",
  "keyword": ""
}' |
apify call nekohaii/philippine-news-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=nekohaii/philippine-news-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "PH News API - Multi-Source RSS Aggregator",
        "description": "Aggregate Philippine news from PhilStar, BusinessWorld, and Rappler via RSS. Full article text, excerpts, categories, author info, and metadata. Supports keyword filtering and per-source limits.",
        "version": "3.0",
        "x-build-id": "wjb73WNjlHRJj7lzD"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/nekohaii~philippine-news-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-nekohaii-philippine-news-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/nekohaii~philippine-news-scraper/runs": {
            "post": {
                "operationId": "runs-sync-nekohaii-philippine-news-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/nekohaii~philippine-news-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-nekohaii-philippine-news-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "sources": {
                        "title": "News Sources",
                        "type": "string",
                        "description": "Comma-separated list of sources. Options: all (confirmed working), all-including-blocked, philstar, rappler, sunstar, tribune, businessworld, manila-bulletin, manila-times, inquirer, abs-cbn, gma-news. Default: all",
                        "default": "all"
                    },
                    "keyword": {
                        "title": "Keyword filter",
                        "type": "string",
                        "description": "Only return articles containing this keyword in title or summary (leave empty for all)",
                        "default": ""
                    },
                    "extractFullText": {
                        "title": "Extract full article text",
                        "type": "boolean",
                        "description": "If enabled, scrapes each article page to extract the full body text. Adds ~1-3s per article but gives you complete articles ready for AI processing.",
                        "default": true
                    },
                    "maxItems": {
                        "title": "Max articles (total)",
                        "minimum": 1,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Maximum number of news articles across all sources",
                        "default": 100
                    },
                    "maxPerSource": {
                        "title": "Max articles per source",
                        "minimum": 1,
                        "maximum": 200,
                        "type": "integer",
                        "description": "Maximum articles per individual RSS feed",
                        "default": 20
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
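
For projects that cannot use the official clients, the `run-sync-get-dataset-items` path from the specification above can be called with the standard library alone. This is a minimal sketch: `build_run_sync_request` is a hypothetical helper that only constructs the request, and the network call is left commented out so the example runs without a token.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "nekohaii~philippine-news-scraper"


def build_run_sync_request(token, actor_input):
    """Build a POST request that runs the Actor and returns its dataset items."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_sync_request("<YOUR_API_TOKEN>", {"sources": "all", "keyword": ""})
print(request.get_method(), request.full_url.split("?")[0])

# To actually run the Actor (requires a valid token and network access):
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)
```

The synchronous endpoint holds the connection open until the run finishes, so for long runs with full-text extraction the asynchronous `/runs` path may be the safer choice.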
