Bloomberg News Extractor
Pricing: Pay per usage
Bloomberg news scraper that pulls headlines, body text, authors, and tags from Bloomberg article pages, so your data pipelines get financial news without the copy-paste.
Developer: Kawsar
Maintained by Community
Extract structured article data from Bloomberg.com. Paste one or more Bloomberg article URLs and the actor returns a clean dataset with headlines, authors, publish dates, full article body text, image URLs, content tags, categories, reading time, and more.
Every article is fully scraped, and body text is included by default with no extra configuration. On subscriber-only articles the body may be truncated; see Notes on premium articles.
What it extracts
Every article record contains the following fields:
| Field | Type | Description |
|---|---|---|
| url | string | Canonical Bloomberg article URL |
| articleId | string | Bloomberg internal SUID identifier |
| headline | string | Main article headline |
| seoHeadline | string | SEO-optimised version of the headline |
| byline | string | Author name as it appears on the article |
| authorName | string | Full name of the first credited author |
| authorTwitter | string | Author Twitter handle (without @) |
| publishedAt | string | ISO 8601 UTC publish timestamp |
| updatedAt | string | ISO 8601 UTC last-update timestamp |
| articleSummary | string | Article lede or summary paragraph |
| bodyText | string | Full plain-text article body |
| imageUrl | string | Main article image URL |
| imageCaption | string | Caption for the main image |
| imageCredit | string | Photographer or agency credit |
| section | string | Bloomberg section, e.g. markets, technology, politics |
| categories | string | Comma-separated list of section categories |
| tags | string | Comma-separated list of content tag names |
| isPremium | boolean | True if the article requires a Bloomberg subscription |
| readingTimeMinutes | number | Estimated reading time in minutes |
| slug | string | URL date-slug (e.g. 2026-05-23/article-title) |
| scrapedAt | string | ISO 8601 UTC timestamp of when the record was collected |
| error | string | Error message if the article failed to scrape, null on success |
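Because tags and categories arrive as comma-separated strings and the timestamps as ISO 8601 strings, downstream code usually parses them once per record. The Python sketch below does that for a single dataset record. The helper names are hypothetical, and the sketch assumes no individual tag or category name contains a comma.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional


def split_list_field(value: Optional[str]) -> List[str]:
    """Split a comma-separated field (tags, categories) into a list of names.

    Null or empty values, as on failed records, become an empty list.
    Assumes no single tag or category name contains a comma.
    """
    if not value:
        return []
    return [item.strip() for item in value.split(",") if item.strip()]


def parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 UTC string such as "2026-05-23T01:30:56.367Z".

    The trailing "Z" is rewritten to "+00:00" because datetime.fromisoformat
    only accepts the "Z" suffix from Python 3.11 onward.
    """
    if not value:
        return None
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def normalize_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a dataset record with list and timestamp fields parsed."""
    out = dict(record)
    for field in ("tags", "categories"):
        out[field] = split_list_field(record.get(field))
    for field in ("publishedAt", "updatedAt", "scrapedAt"):
        out[field] = parse_timestamp(record.get(field))
    return out
```

For the example record further down, tags would become ["Retail", "Government", "Taxes", "Energy", "India"] and publishedAt a timezone-aware UTC datetime.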
How to use it
1. Open the actor on Apify
Go to the actor page and click Try for free to open the input editor.
2. Add Bloomberg article URLs
Paste one or more Bloomberg article URLs into the Start URLs field. Each must be a full article URL containing the /news/articles/ path:
https://www.bloomberg.com/news/articles/2026-05-22/abrego-garcia-wins-dismissal-of-us-human-smuggling-case
https://www.bloomberg.com/news/articles/2026-05-23/india-raises-diesel-gasoline-prices-for-third-time-in-eight-days
Query string parameters like ?srnd=phx-markets are stripped automatically before scraping.
3. Set your limits
- Max articles: cap on total articles processed per run (default: 50, range: 1–1000)
- Request timeout: per-request timeout in seconds (default: 30, range: 5–120)
4. Run and download
Click Start. The actor processes each URL and pushes results to the dataset. Download as JSON, CSV, Excel, or XML from the Storage tab when the run finishes.
Input reference
{
  "startUrls": [
    "https://www.bloomberg.com/news/articles/2026-05-22/abrego-garcia-wins-dismissal-of-us-human-smuggling-case",
    "https://www.bloomberg.com/news/articles/2026-05-23/india-raises-diesel-gasoline-prices-for-third-time-in-eight-days"
  ],
  "maxArticles": 50,
  "requestTimeoutSecs": 30
}
| Field | Required | Default | Description |
|---|---|---|---|
| startUrls | Yes | none | List of Bloomberg article URLs (/news/articles/... paths) |
| maxArticles | No | 50 | Maximum articles to process per run (1–1000) |
| requestTimeoutSecs | No | 30 | Per-request timeout in seconds (5–120) |
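If you generate the run input from a script rather than the input editor, checking the documented defaults and ranges locally catches mistakes before a run starts. A minimal Python sketch, assuming the input shape shown in the JSON example (startUrls as plain strings); build_actor_input is a hypothetical helper, not part of the actor.

```python
import json
from typing import Any, Dict, List, Tuple

# Allowed ranges as listed in the input reference table.
MAX_ARTICLES_RANGE: Tuple[int, int] = (1, 1000)
REQUEST_TIMEOUT_RANGE: Tuple[int, int] = (5, 120)


def _check_range(name: str, value: int, bounds: Tuple[int, int]) -> None:
    """Raise ValueError if value falls outside the inclusive bounds."""
    low, high = bounds
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")


def build_actor_input(start_urls: List[str], max_articles: int = 50,
                      request_timeout_secs: int = 30) -> Dict[str, Any]:
    """Assemble the run input using the documented defaults and ranges."""
    if not start_urls:
        raise ValueError("startUrls is required")
    _check_range("maxArticles", max_articles, MAX_ARTICLES_RANGE)
    _check_range("requestTimeoutSecs", request_timeout_secs, REQUEST_TIMEOUT_RANGE)
    return {
        "startUrls": list(start_urls),
        "maxArticles": max_articles,
        "requestTimeoutSecs": request_timeout_secs,
    }


if __name__ == "__main__":
    run_input = build_actor_input(
        ["https://www.bloomberg.com/news/articles/2026-05-23/india-raises-diesel-gasoline-prices-for-third-time-in-eight-days"],
        request_timeout_secs=60,
    )
    print(json.dumps(run_input, indent=2))
```

The printed JSON can be pasted into the input editor's JSON view or sent as the run input through the Apify API.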
URL format
Use the full article URL. The path must contain /news/articles/:
https://www.bloomberg.com/news/articles/YYYY-MM-DD/article-slug
Any query parameters (?srnd=..., ?utm_source=...) are removed automatically.
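The actor strips query parameters itself, but doing the same on your side lets you spot duplicates (the same article shared with different tracking parameters) before you submit a list. A minimal Python sketch of that client-side clean-up; it is not the actor's own code, the function names are hypothetical, and accepting only www.bloomberg.com and bloomberg.com as hosts is an assumption.

```python
from typing import Iterable, List
from urllib.parse import urlsplit, urlunsplit

ARTICLE_PATH = "/news/articles/"
# Assumption: only these hostnames are treated as Bloomberg article hosts.
BLOOMBERG_HOSTS = {"www.bloomberg.com", "bloomberg.com"}


def normalize_article_url(url: str) -> str:
    """Drop the query string and fragment; reject URLs that are not articles."""
    parts = urlsplit(url.strip())
    if parts.hostname not in BLOOMBERG_HOSTS:
        raise ValueError(f"not a Bloomberg URL: {url}")
    if ARTICLE_PATH not in parts.path:
        raise ValueError(f"path must contain {ARTICLE_PATH}: {url}")
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def normalize_all(urls: Iterable[str]) -> List[str]:
    """Normalize every URL and drop duplicates, keeping first-seen order."""
    return list(dict.fromkeys(normalize_article_url(u) for u in urls))
```

For example, the same article URL with ?srnd=phx-markets and with ?utm_source=... collapses to a single entry.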
Example output record
{
  "url": "https://www.bloomberg.com/news/articles/2026-05-23/india-raises-diesel-gasoline-prices-for-third-time-in-eight-days",
  "articleId": "TFGSTUKK3NYD00",
  "headline": "India Raises Diesel, Gasoline Prices for Third Time in Eight Days",
  "seoHeadline": "India Raises Diesel, Gasoline Prices for Third Time in Eight Days",
  "byline": "Rakesh Sharma",
  "authorName": "Rakesh Sharma",
  "authorTwitter": "journorakesh",
  "publishedAt": "2026-05-23T01:30:56.367Z",
  "updatedAt": "2026-05-23T03:42:27.089Z",
  "articleSummary": "India's state-run refiners raised retail prices again of diesel and gasoline on Saturday to help processors cut losses on discounted sales and to control a spike in demand.",
  "bodyText": "India's state-run refiners raised retail prices again of diesel and gasoline on Saturday...",
  "imageUrl": "https://assets.bwbx.io/images/users/iqjWHBFdfxIU/itJ0yPa0NDcg/v0/-1x-1.webp",
  "imageCaption": "A fuel station in New Delhi.",
  "imageCredit": "Photographer: Anindito Mukherjee/Bloomberg",
  "section": "markets",
  "categories": "markets",
  "tags": "Retail, Government, Taxes, Energy, India",
  "isPremium": false,
  "readingTimeMinutes": 2.5,
  "slug": "2026-05-23/india-raises-diesel-gasoline-prices-for-third-time-in-eight-days",
  "scrapedAt": "2026-05-23T05:12:00.000Z",
  "error": null
}
Notes on premium articles
The isPremium field is true for subscriber-only articles. Metadata fields — headline, author, publish date, summary, image URL, tags — are always collected regardless of subscription status. Full body text on paywalled articles may be truncated; the isPremium flag lets you identify and filter these records downstream.
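A minimal Python sketch of that downstream filter, assuming the records have been loaded as a list of dicts (the function name is hypothetical). Free articles can go into full-text analysis, while premium ones are kept separately so their metadata is still usable; failed records are skipped because their isPremium field is null.

```python
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def split_by_premium(records: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Return (free, premium) lists, ignoring records that failed to scrape."""
    free: List[Record] = []
    premium: List[Record] = []
    for rec in records:
        if rec.get("error"):
            continue  # failed records carry null metadata, including isPremium
        (premium if rec.get("isPremium") else free).append(rec)
    return free, premium
```

Running sentiment or topic models only on the free list avoids skewing results with truncated bodies.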
Output formats
The dataset can be downloaded from Apify in several formats:
| Format | Best for |
|---|---|
| JSON | Database ingestion, APIs, Python/Node scripts |
| CSV | Excel, Google Sheets, pandas DataFrames |
| JSONL | Streaming pipelines, BigQuery, S3 |
| XML | Legacy system integrations |
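Outside the Storage tab, the same exports are available from the Apify API, which serves a dataset's items at /v2/datasets/{datasetId}/items with a format query parameter. The Python sketch below only builds that URL; the dataset ID is a placeholder for the one shown on your run, and the helper name is hypothetical. Send your API token in an Authorization: Bearer header rather than embedding it in the URL.

```python
from urllib.parse import quote, urlencode

# Formats named in this README; "xlsx" is the format value for Excel output.
EXPORT_FORMATS = {"json", "jsonl", "csv", "xml", "xlsx"}


def dataset_items_url(dataset_id: str, fmt: str = "json") -> str:
    """Build the Apify API URL that returns a dataset's items in one format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported format {fmt!r}; choose one of {sorted(EXPORT_FORMATS)}")
    return (
        "https://api.apify.com/v2/datasets/"
        f"{quote(dataset_id, safe='')}/items?{urlencode({'format': fmt})}"
    )
```

A pandas user could then pass the CSV URL (with the auth header) to their HTTP client of choice and load the response into a DataFrame.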
Use cases
Financial research — bulk-scrape Bloomberg articles on a specific market sector and run sentiment analysis or topic modeling across the corpus.
News monitoring — paste a fresh set of article URLs daily and track how Bloomberg covers specific companies, geopolitical events, or industries over time.
Competitive intelligence — collect article metadata at scale and filter by tags, section, or authorName to understand Bloomberg's editorial focus on a topic.
Data journalism — pull authorship and publication patterns across hundreds of articles for investigative or academic research.
News aggregation pipelines — feed clean structured Bloomberg data into internal dashboards, Slack alerts, or downstream NLP systems.
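For the editorial-focus and authorship analyses above, a first pass is often just counting. A minimal Python sketch, assuming the records are loaded as a list of dicts; top_values is a hypothetical helper for single-valued fields such as section or authorName. Tags and categories are comma-separated strings, so they would need splitting into individual names before counting.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple


def top_values(records: Iterable[Dict[str, Any]], field: str,
               n: int = 5) -> List[Tuple[str, int]]:
    """Count non-null values of a single-valued field and return the top n."""
    counts = Counter(rec[field] for rec in records if rec.get(field))
    return counts.most_common(n)
```

Calling top_values(records, "authorName", 10) gives the ten most frequent authors in the corpus.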
How to get Bloomberg article URLs
Bloomberg article URLs follow this pattern:
https://www.bloomberg.com/news/articles/YYYY-MM-DD/article-slug
Ways to collect them:
- Browse any Bloomberg section (Markets, Technology, Politics, etc.) and copy article links from the page
- Use Bloomberg's own search at bloomberg.com/search to find articles by keyword, then copy the URLs
- Monitor Bloomberg's RSS feeds or Twitter/X account for article links
- Use another actor or script to collect article URLs from Bloomberg section pages and pass them as input here
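When links come from copied page source, an RSS feed, or a social feed, a regular expression can pull out just the article URLs. A minimal Python sketch; the pattern assumes the YYYY-MM-DD/article-slug layout shown above with slugs made of letters, digits, and hyphens, and it only finds absolute URLs, not relative /news/articles/... links. The function name is hypothetical.

```python
import re
from typing import List

# Absolute Bloomberg article URLs; the match stops before any "?query".
ARTICLE_URL_RE = re.compile(
    r"https?://(?:www\.)?bloomberg\.com/news/articles/\d{4}-\d{2}-\d{2}/[A-Za-z0-9-]+"
)


def extract_article_urls(text: str) -> List[str]:
    """Return unique article URLs found in text, in first-seen order."""
    return list(dict.fromkeys(m.group(0) for m in ARTICLE_URL_RE.finditer(text)))
```

Section and search pages also link to videos and non-article pages; those are skipped because their paths do not contain /news/articles/.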
Performance tips
- Increase requestTimeoutSecs to 60 if you see timeout errors on slow article pages.
- Use maxArticles to cap scope during test runs before processing a large batch.
- For batches over 200 articles, consider splitting into multiple runs of 100–200 each.
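Splitting a long URL list into runs is a simple chunking step. Note that maxArticles defaults to 50, so each batch's input should raise it to the batch size or the run will stop early. A minimal Python sketch (the function name and the batch size of 150 are illustrative choices within the suggested 100–200 range):

```python
from typing import Any, Dict, Iterator, Sequence


def batched_inputs(urls: Sequence[str], batch_size: int = 150,
                   request_timeout_secs: int = 30) -> Iterator[Dict[str, Any]]:
    """Yield one run input per batch, with maxArticles sized to the batch."""
    if not 1 <= batch_size <= 1000:  # maxArticles accepts 1 to 1000
        raise ValueError("batch_size must be between 1 and 1000")
    for start in range(0, len(urls), batch_size):
        batch = list(urls[start:start + batch_size])
        yield {
            "startUrls": batch,
            "maxArticles": len(batch),
            "requestTimeoutSecs": request_timeout_secs,
        }
```

A list of 350 URLs becomes three run inputs of 150, 150, and 50 articles.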
Scheduling
Use Apify's built-in Schedules feature to run this actor on a recurring basis:
- Go to Schedules in your Apify account
- Click Create new schedule
- Select this actor and configure your article URL list
- Choose a cron expression, e.g. 0 8 * * * for daily at 8am UTC
- Results accumulate in the dataset automatically with each run
This works well for monitoring a fixed list of Bloomberg articles for updates — the updatedAt field tells you when Bloomberg last edited each piece.
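To spot edited articles, compare the records from two runs. A minimal Python sketch, assuming both runs' records are loaded as lists of dicts; changed_since_last_run is a hypothetical helper. It reports URLs whose updatedAt value differs, and skips failed records and articles absent from the earlier run.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def changed_since_last_run(previous: Iterable[Record],
                           current: Iterable[Record]) -> List[str]:
    """Return URLs whose updatedAt differs from the previous run's value."""
    last_seen = {
        r["url"]: r.get("updatedAt")
        for r in previous
        if r.get("url") and not r.get("error")
    }
    changed: List[str] = []
    for rec in current:
        url = rec.get("url")
        if not url or rec.get("error"):
            continue  # failed records have no usable url or updatedAt
        if url in last_seen and rec.get("updatedAt") != last_seen[url]:
            changed.append(url)
    return changed
```

The returned URLs are the ones worth re-reading or re-indexing after each scheduled run.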
Error handling
Each article is processed independently. If one URL fails (network error, page not found, parse failure), the actor logs the error and continues to the next URL. Failed records appear in the dataset with error set to a message string and all other fields set to null. The run does not stop on individual article failures.
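Because a failed record has every field except error set to null, it cannot say which URL failed. One workaround is to compare the input list against the successful records. A minimal Python sketch, with hypothetical helper names; it assumes each successful record's canonical url matches the input URL once the query string is removed, which may not hold if Bloomberg redirects to a different path. URLs skipped because of the maxArticles cap also land in the retry list, which is usually what you want.

```python
from typing import Any, Dict, Iterable, List, Tuple
from urllib.parse import urlsplit, urlunsplit

Record = Dict[str, Any]


def _strip_query(url: str) -> str:
    """Remove query string and fragment so input and output URLs compare equal."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def triage(input_urls: Iterable[str],
           records: Iterable[Record]) -> Tuple[List[Record], List[str], List[str]]:
    """Split results into successes, error messages, and input URLs to retry."""
    ok: List[Record] = []
    errors: List[str] = []
    for rec in records:
        if rec.get("error"):
            errors.append(rec["error"])
        else:
            ok.append(rec)
    done = {_strip_query(r["url"]) for r in ok if r.get("url")}
    retry = [u for u in input_urls if _strip_query(u) not in done]
    return ok, errors, retry
```

The retry list can be fed straight back in as startUrls for a follow-up run.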