# Bloomberg News Extractor (`kawsar/bloomberg-news-extractor`) Actor

Bloomberg news scraper that pulls headlines, body text, authors, and tags from Bloomberg article pages, so your data pipelines get financial news without the copy-paste.

- **URL**: https://apify.com/kawsar/bloomberg-news-extractor.md
- **Developed by:** [Kawsar](https://apify.com/kawsar) (community)
- **Categories:** News, Automation, Developer tools
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor itself is free to use; you only pay for Apify platform usage, which gets cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anywhere from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Bloomberg News Extractor

Extract structured article data from Bloomberg.com. Paste one or more Bloomberg article URLs and the actor returns a clean dataset with headlines, authors, publish dates, full article body text, image URLs, content tags, categories, reading time, and more.

Body text is extracted for every article by default, with no extra configuration needed. Subscriber-only articles may return truncated body text; see the notes on premium articles below.

---

### What it extracts

Every article record contains the following fields:

| Field | Type | Description |
|-------|------|-------------|
| `url` | string | Canonical Bloomberg article URL |
| `articleId` | string | Bloomberg internal SUID identifier |
| `headline` | string | Main article headline |
| `seoHeadline` | string | SEO-optimised version of the headline |
| `byline` | string | Author name as it appears on the article |
| `authorName` | string | First credited author full name |
| `authorTwitter` | string | Author Twitter handle (without @) |
| `publishedAt` | string | ISO 8601 UTC publish timestamp |
| `updatedAt` | string | ISO 8601 UTC last-update timestamp |
| `articleSummary` | string | Article lede or summary paragraph |
| `bodyText` | string | Full plain-text article body |
| `imageUrl` | string | Main article image URL |
| `imageCaption` | string | Caption for the main image |
| `imageCredit` | string | Photographer or agency credit |
| `section` | string | Bloomberg section: markets, technology, politics, etc. |
| `categories` | string | Comma-separated list of section categories |
| `tags` | string | Comma-separated list of content tag names |
| `isPremium` | boolean | True if the article requires a Bloomberg subscription |
| `readingTimeMinutes` | number | Estimated reading time in minutes |
| `slug` | string | URL date-slug (e.g. `2026-05-23/article-title`) |
| `scrapedAt` | string | ISO 8601 UTC timestamp of when the record was collected |
| `error` | string | Error message if the article failed to scrape, null on success |
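Because `tags` and `categories` arrive as comma-separated strings and the timestamps as ISO 8601 text, downstream code usually converts them once after loading. The sketch below is a minimal Python illustration, assuming each record is a plain `dict` keyed by the field names in the table above; `normalize_record` and its helpers are hypothetical names, not part of the Actor or of `apify-client`.

```python
from __future__ import annotations

from datetime import datetime
from typing import Any


def split_csv_field(value: str | None) -> list[str]:
    """Turn a comma-separated field such as `tags` into a list of trimmed names."""
    if not value:
        return []
    return [part.strip() for part in value.split(",") if part.strip()]


def parse_timestamp(value: str | None) -> datetime | None:
    """Parse an ISO 8601 UTC string like '2026-05-23T01:30:56.367Z'."""
    if not value:
        return None
    # fromisoformat() only accepts a trailing 'Z' from Python 3.11 on,
    # so swap it for an explicit UTC offset to support older versions.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def normalize_record(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a dataset record with list and datetime fields (hypothetical helper)."""
    out = dict(record)  # leave the original record untouched
    out["tags"] = split_csv_field(record.get("tags"))
    out["categories"] = split_csv_field(record.get("categories"))
    for key in ("publishedAt", "updatedAt", "scrapedAt"):
        out[key] = parse_timestamp(record.get(key))
    return out
```

Applied to the example output record further down, `tags` becomes `['Retail', 'Government', 'Taxes', 'Energy', 'India']` and `publishedAt` becomes a timezone-aware `datetime`. Failed records (all fields `null`) come out with empty lists and `None` timestamps instead of raising.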

---

### How to use it

#### 1. Open the actor on Apify

Go to the actor page and click **Try for free** to open the input editor.

#### 2. Add Bloomberg article URLs

Paste one or more full Bloomberg article URLs into the **Start URLs** field. Use the full article URL with the `/news/articles/` path:

```
https://www.bloomberg.com/news/articles/2026-05-22/abrego-garcia-wins-dismissal-of-us-human-smuggling-case
https://www.bloomberg.com/news/articles/2026-05-23/india-raises-diesel-gasoline-prices-for-third-time-in-eight-days
```

Query string parameters like `?srnd=phx-markets` are stripped automatically before scraping.

#### 3. Set your limits

- **Max articles**: cap on total articles processed per run (default: 50, range: 1–1000)
- **Request timeout**: per-request timeout in seconds (default: 30, range: 5–120)

#### 4. Run and download

Click **Start**. The actor processes each URL and pushes results to the dataset. Download as JSON, CSV, Excel, or XML from the **Storage** tab when the run finishes.

---

### Input reference

```json
{
    "startUrls": [
        "https://www.bloomberg.com/news/articles/2026-05-22/abrego-garcia-wins-dismissal-of-us-human-smuggling-case",
        "https://www.bloomberg.com/news/articles/2026-05-23/india-raises-diesel-gasoline-prices-for-third-time-in-eight-days"
    ],
    "maxArticles": 50,
    "requestTimeoutSecs": 30
}
```

| Field | Required | Default | Description |
|-------|----------|---------|-------------|
| `startUrls` | Yes | — | List of Bloomberg article URLs (`/news/articles/...` paths) |
| `maxArticles` | No | 50 | Maximum articles to process per run (1–1000) |
| `requestTimeoutSecs` | No | 30 | Per-request timeout in seconds (5–120) |
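Before starting a run from code, it can help to check the input against the limits in this table so that a bad value fails locally rather than on the platform. A minimal sketch, assuming the defaults and ranges listed above; `build_input` is a hypothetical helper, not part of the Actor or of `apify-client`.

```python
from __future__ import annotations

from typing import Any


def build_input(
    start_urls: list[str],
    max_articles: int = 50,
    request_timeout_secs: int = 30,
) -> dict[str, Any]:
    """Build an Actor input dict, enforcing the documented ranges (hypothetical helper)."""
    if not start_urls:
        raise ValueError("startUrls is required and must not be empty")
    if not 1 <= max_articles <= 1000:
        raise ValueError("maxArticles must be between 1 and 1000")
    if not 5 <= request_timeout_secs <= 120:
        raise ValueError("requestTimeoutSecs must be between 5 and 120")
    return {
        "startUrls": list(start_urls),
        "maxArticles": max_articles,
        "requestTimeoutSecs": request_timeout_secs,
    }
```

The returned dict has the same shape as the input reference JSON and can be passed as `run_input` to `client.actor("kawsar/bloomberg-news-extractor").call(...)` in the Python example further down.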

#### URL format

Use the full article URL. The path must contain `/news/articles/`:

```
https://www.bloomberg.com/news/articles/YYYY-MM-DD/article-slug
```

Any query parameters (`?srnd=...`, `?utm_source=...`) are removed automatically.
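The Actor does this clean-up itself. If you want to pre-validate or de-duplicate a URL list before a run, an equivalent local check could look like the sketch below. It assumes that dropping the query string and fragment is all the normalization needed and that any `bloomberg.com` host is acceptable; `normalize_article_url` and `dedupe_urls` are hypothetical helpers.

```python
from __future__ import annotations

from urllib.parse import urlsplit, urlunsplit


def normalize_article_url(url: str) -> str:
    """Strip query string and fragment; reject URLs that are not Bloomberg article pages."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host != "bloomberg.com" and not host.endswith(".bloomberg.com"):
        raise ValueError(f"not a bloomberg.com URL: {url}")
    if "/news/articles/" not in parts.path:
        raise ValueError(f"path must contain /news/articles/: {url}")
    # Keep scheme, host and path; drop the query (?srnd=..., ?utm_source=...) and fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def dedupe_urls(urls: list[str]) -> list[str]:
    """Normalize every URL and drop duplicates while preserving the original order."""
    seen: set[str] = set()
    result: list[str] = []
    for url in urls:
        clean = normalize_article_url(url)
        if clean not in seen:
            seen.add(clean)
            result.append(clean)
    return result
```

De-duplicating first may keep the same article, reached through different tracking parameters, from using up more than one slot of the `maxArticles` budget.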

***

### Example output record

```json
{
    "url": "https://www.bloomberg.com/news/articles/2026-05-23/india-raises-diesel-gasoline-prices-for-third-time-in-eight-days",
    "articleId": "TFGSTUKK3NYD00",
    "headline": "India Raises Diesel, Gasoline Prices for Third Time in Eight Days",
    "seoHeadline": "India Raises Diesel, Gasoline Prices for Third Time in Eight Days",
    "byline": "Rakesh Sharma",
    "authorName": "Rakesh Sharma",
    "authorTwitter": "journorakesh",
    "publishedAt": "2026-05-23T01:30:56.367Z",
    "updatedAt": "2026-05-23T03:42:27.089Z",
    "articleSummary": "India's state-run refiners raised retail prices again of diesel and gasoline on Saturday to help processors cut losses on discounted sales and to control a spike in demand.",
    "bodyText": "India's state-run refiners raised retail prices again of diesel and gasoline on Saturday...",
    "imageUrl": "https://assets.bwbx.io/images/users/iqjWHBFdfxIU/itJ0yPa0NDcg/v0/-1x-1.webp",
    "imageCaption": "A fuel station in New Delhi.",
    "imageCredit": "Photographer: Anindito Mukherjee/Bloomberg",
    "section": "markets",
    "categories": "markets",
    "tags": "Retail, Government, Taxes, Energy, India",
    "isPremium": false,
    "readingTimeMinutes": 2.5,
    "slug": "2026-05-23/india-raises-diesel-gasoline-prices-for-third-time-in-eight-days",
    "scrapedAt": "2026-05-23T05:12:00.000Z",
    "error": null
}
```

***

### Notes on premium articles

The `isPremium` field is `true` for subscriber-only articles. Metadata fields — headline, author, publish date, summary, image URL, tags — are always collected regardless of subscription status. Full body text on paywalled articles may be truncated; the `isPremium` flag lets you identify and filter these records downstream.
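A minimal sketch of that downstream filtering, assuming the records have already been loaded as Python dicts (for example via `iterate_items()` in the Python example further down). `split_by_premium` is a hypothetical helper; it also skips failed records, whose `isPremium` is `null`.

```python
from __future__ import annotations

from typing import Any, Iterable


def split_by_premium(
    records: Iterable[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Separate free articles from subscriber-only ones using the isPremium flag."""
    free: list[dict[str, Any]] = []
    premium: list[dict[str, Any]] = []
    for record in records:
        if record.get("error"):
            continue  # failed records carry no usable article data
        if record.get("isPremium") is True:
            premium.append(record)
        else:
            free.append(record)
    return free, premium
```

A typical split is to run body-text analysis (sentiment, topic modeling) only on `free`, while still using the metadata of `premium` records for counts and trends.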

***

### Output formats

The dataset can be downloaded from Apify in several formats:

| Format | Best for |
|--------|----------|
| JSON | Database ingestion, APIs, Python/Node scripts |
| CSV | Excel, Google Sheets, pandas DataFrames |
| JSONL | Streaming pipelines, BigQuery, S3 |
| XML | Legacy system integrations |
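In the JSONL export each line is one JSON record, so a file can be processed as a stream without loading the whole dataset into memory. A minimal sketch using only the standard library; `iter_jsonl` is a hypothetical helper and `dataset.jsonl` in the usage line is a placeholder file name.

```python
from __future__ import annotations

import json
from typing import Any, Iterator, TextIO


def iter_jsonl(stream: TextIO) -> Iterator[dict[str, Any]]:
    """Yield one record per non-empty line of a JSONL export."""
    for line_no, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"invalid JSON on line {line_no}") from exc
```

Usage: `with open("dataset.jsonl", encoding="utf-8") as f:` followed by `for record in iter_jsonl(f): print(record["headline"])`.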

***

### Use cases

**Financial research** — bulk-scrape Bloomberg articles on a specific market sector and run sentiment analysis or topic modeling across the corpus.

**News monitoring** — paste a fresh set of article URLs daily and track how Bloomberg covers specific companies, geopolitical events, or industries over time.

**Competitive intelligence** — collect article metadata at scale and filter by `tags`, `section`, or `authorName` to understand Bloomberg's editorial focus on a topic.

**Data journalism** — pull authorship and publication patterns across hundreds of articles for investigative or academic research.

**News aggregation pipelines** — feed clean structured Bloomberg data into internal dashboards, Slack alerts, or downstream NLP systems.
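For the competitive-intelligence and data-journalism cases, a first pass is often just counting articles per `section`, `authorName`, or tag. A minimal sketch with `collections.Counter`, assuming records are dicts in the output format above with comma-separated `tags`; `count_coverage` is an illustrative name, not an Actor feature.

```python
from __future__ import annotations

from collections import Counter
from typing import Any, Iterable


def count_coverage(records: Iterable[dict[str, Any]]) -> dict[str, Counter]:
    """Count successful articles per section, per author and per tag."""
    by_section: Counter = Counter()
    by_author: Counter = Counter()
    by_tag: Counter = Counter()
    for record in records:
        if record.get("error"):
            continue  # failed records have null metadata
        by_section[record.get("section") or "unknown"] += 1
        by_author[record.get("authorName") or "unknown"] += 1
        for tag in (record.get("tags") or "").split(","):
            if tag.strip():
                by_tag[tag.strip()] += 1
    return {"section": by_section, "author": by_author, "tag": by_tag}
```

`count_coverage(records)["tag"].most_common(10)` then lists the ten most frequent tags across the corpus.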

***

### How to get Bloomberg article URLs

Bloomberg article URLs follow this pattern:

```
https://www.bloomberg.com/news/articles/YYYY-MM-DD/article-slug
```

Ways to collect them:

- Browse any Bloomberg section (Markets, Technology, Politics, etc.) and copy article links from the page
- Use Bloomberg's own search at bloomberg.com/search to find articles by keyword, then copy the URLs
- Monitor Bloomberg's RSS feeds or Twitter/X account for article links
- Use another actor or script to collect article URLs from Bloomberg section pages and pass them as input here
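When links come from an RSS feed, a saved page, or another scraper's output, a regular expression built from the pattern above can pull out candidate article URLs. A minimal sketch; the regex is an assumption based only on the `YYYY-MM-DD/article-slug` shape shown here (lowercase, digits and hyphens in the slug), so unusual URLs may need a looser expression.

```python
from __future__ import annotations

import re

# Matches https://www.bloomberg.com/news/articles/YYYY-MM-DD/slug; stops before any query string.
ARTICLE_URL_RE = re.compile(
    r"https://(?:www\.)?bloomberg\.com/news/articles/\d{4}-\d{2}-\d{2}/[A-Za-z0-9-]+"
)


def extract_article_urls(text: str) -> list[str]:
    """Return unique article URLs found in text, in order of first appearance."""
    seen: dict[str, None] = {}  # dict keeps insertion order, so it doubles as an ordered set
    for match in ARTICLE_URL_RE.finditer(text):
        seen.setdefault(match.group(0), None)
    return list(seen)
```

The result can go straight into `startUrls`; section and search pages such as `https://www.bloomberg.com/markets` are ignored because they do not match the article path.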

***

### Performance tips

- Increase `requestTimeoutSecs` to 60 if you see timeout errors on slow article pages.
- Use `maxArticles` to cap scope during test runs before processing a large batch.
- For batches over 200 articles, consider splitting into multiple runs of 100–200 each.
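Splitting a long URL list into fixed-size batches, one run per batch, can be sketched as follows. The default batch size of 150 is an assumption picked from the 100–200 range suggested above, and `batched_urls` and `batch_inputs` are hypothetical helpers.

```python
from __future__ import annotations

from typing import Any, Iterator, Sequence


def batched_urls(urls: Sequence[str], batch_size: int = 150) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size URLs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(urls), batch_size):
        yield list(urls[start:start + batch_size])


def batch_inputs(
    urls: Sequence[str], batch_size: int = 150, request_timeout_secs: int = 30
) -> list[dict[str, Any]]:
    """Build one Actor input per batch, sizing maxArticles to the batch."""
    return [
        {
            "startUrls": batch,
            "maxArticles": len(batch),
            "requestTimeoutSecs": request_timeout_secs,
        }
        for batch in batched_urls(urls, batch_size)
    ]
```

Each entry in the returned list can be passed as `run_input` to a separate `call()`, as in the Python example further down. Setting `maxArticles` to the batch length keeps the default cap of 50 from silently truncating a larger batch.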

***

### Scheduling

Use Apify's built-in **Schedules** feature to run this actor on a recurring basis:

1. Go to **Schedules** in your Apify account
2. Click **Create new schedule**
3. Select this actor and configure your article URL list
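4. Choose a cron expression, e.g. `0 8 * * *` for daily at 8am UTC
5. Each run writes its results to its own default dataset, which you can open from the run's detail page or fetch through the API using the run's `defaultDatasetId`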

This works well for monitoring a fixed list of Bloomberg articles for updates — the `updatedAt` field tells you when Bloomberg last edited each piece.
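To find which articles changed between two scheduled runs, one option is to compare `updatedAt` per `url` across the two runs' records. A minimal sketch, assuming you have kept the previous run's records somewhere (file, database, or a named dataset is up to you); `find_new_or_updated` is a hypothetical helper.

```python
from __future__ import annotations

from datetime import datetime
from typing import Any, Iterable


def _parse(ts: str | None) -> datetime | None:
    # Replace the trailing 'Z' so fromisoformat() also works before Python 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00")) if ts else None


def find_new_or_updated(
    previous: Iterable[dict[str, Any]],
    current: Iterable[dict[str, Any]],
) -> list[str]:
    """Return URLs that are new, or whose updatedAt is later than in the previous run."""
    last_seen = {
        r["url"]: _parse(r.get("updatedAt")) for r in previous if not r.get("error")
    }
    changed: list[str] = []
    for record in current:
        if record.get("error"):
            continue  # failed records have no url or timestamps to compare
        new_ts = _parse(record.get("updatedAt"))
        old_ts = last_seen.get(record["url"])
        if new_ts is not None and (old_ts is None or new_ts > old_ts):
            changed.append(record["url"])
    return changed
```

Comparing parsed `datetime` values rather than raw strings avoids surprises if the timestamp precision ever differs between runs.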

***

### Error handling

Each article is processed independently. If one URL fails (network error, page not found, parse failure), the actor logs the error and continues to the next URL. Failed records appear in the dataset with `error` set to a message string and all other fields set to `null`. The run does not stop on individual article failures.
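Because failures are stored as ordinary dataset items, a post-processing step can separate them and build a retry list. Since a failed record may have `url` set to `null`, the sketch below works the other way round: it retries every input URL that has no successful record. It assumes a successful record's canonical `url` equals the input URL with its query string removed; `partition_results` is a hypothetical helper.

```python
from __future__ import annotations

from typing import Any, Iterable
from urllib.parse import urlsplit, urlunsplit


def _strip_query(url: str) -> str:
    """Drop query string and fragment so input and canonical URLs compare equal."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def partition_results(
    input_urls: Iterable[str],
    records: Iterable[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[str]]:
    """Return (successful records, input URLs that still need a retry)."""
    ok = [r for r in records if not r.get("error")]
    ok_urls = {_strip_query(r["url"]) for r in ok if r.get("url")}
    retry = [u for u in input_urls if _strip_query(u) not in ok_urls]
    return ok, retry
```

URLs skipped because of the `maxArticles` cap also land in the retry list, which is usually what you want when feeding it into a follow-up run.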

# Actor input Schema

## `startUrls` (type: `array`):

Bloomberg article URLs (https://www.bloomberg.com/news/articles/...). Paste one or more full article URLs to scrape.

## `maxArticles` (type: `integer`):

Maximum number of articles to extract per run. Applies across all start URLs combined.

## `requestTimeoutSecs` (type: `integer`):

Per-request timeout in seconds.

## Actor input object example

```json
{
  "startUrls": [
    "https://www.bloomberg.com/news/articles/2026-05-22/abrego-garcia-wins-dismissal-of-us-human-smuggling-case",
    "https://www.bloomberg.com/news/articles/2026-05-23/india-raises-diesel-gasoline-prices-for-third-time-in-eight-days"
  ],
  "maxArticles": 50,
  "requestTimeoutSecs": 30
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        "https://www.bloomberg.com/news/articles/2026-05-22/abrego-garcia-wins-dismissal-of-us-human-smuggling-case",
        "https://www.bloomberg.com/news/articles/2026-05-23/india-raises-diesel-gasoline-prices-for-third-time-in-eight-days"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("kawsar/bloomberg-news-extractor").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startUrls": [
        "https://www.bloomberg.com/news/articles/2026-05-22/abrego-garcia-wins-dismissal-of-us-human-smuggling-case",
        "https://www.bloomberg.com/news/articles/2026-05-23/india-raises-diesel-gasoline-prices-for-third-time-in-eight-days",
    ]
}

# Run the Actor and wait for it to finish
run = client.actor("kawsar/bloomberg-news-extractor").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [
    "https://www.bloomberg.com/news/articles/2026-05-22/abrego-garcia-wins-dismissal-of-us-human-smuggling-case",
    "https://www.bloomberg.com/news/articles/2026-05-23/india-raises-diesel-gasoline-prices-for-third-time-in-eight-days"
  ]
}' |
apify call kawsar/bloomberg-news-extractor --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=kawsar/bloomberg-news-extractor",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Bloomberg News Extractor",
        "description": "Bloomberg news scraper that pulls headlines, body text, authors, and tags from article and section pages, so your data pipelines get financial news without the copy-paste.",
        "version": "0.0",
        "x-build-id": "p7b83Tgvhdcd03K1W"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/kawsar~bloomberg-news-extractor/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-kawsar-bloomberg-news-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/kawsar~bloomberg-news-extractor/runs": {
            "post": {
                "operationId": "runs-sync-kawsar-bloomberg-news-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/kawsar~bloomberg-news-extractor/run-sync": {
            "post": {
                "operationId": "run-sync-kawsar-bloomberg-news-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "startUrls"
                ],
                "properties": {
                    "startUrls": {
                        "title": "Start URLs",
                        "type": "array",
                        "description": "Bloomberg article URLs (https://www.bloomberg.com/news/articles/...). Paste one or more full article URLs to scrape.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxArticles": {
                        "title": "Max articles",
                        "minimum": 1,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Maximum number of articles to extract per run. Applies across all start URLs combined.",
                        "default": 50
                    },
                    "requestTimeoutSecs": {
                        "title": "Request timeout (seconds)",
                        "minimum": 5,
                        "maximum": 120,
                        "type": "integer",
                        "description": "Per-request timeout in seconds.",
                        "default": 30
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```