# United Carriers AI News Scraper (`bravolad/united-carriers-ai-news-scraper`) Actor

- **URL**: https://apify.com/bravolad/united-carriers-ai-news-scraper.md
- **Developed by:** [Mathieux Barlow-Ladias](https://apify.com/bravolad) (community)
- **Categories:** AI, News, Agents
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage, which gets cheaper the higher subscription plan you have.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
"Actor" is always written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## United Carriers AI News Scraper

Scrapes supply chain, logistics, and freight news articles from configurable publications, filters them by keyword and geographic region, and ranks them by popularity. The output is ready for United Carriers' ChatGPT summarisation workflow.

### What does this Actor do?

This Actor visits a configurable list of news publication homepages, discovers recent articles, extracts the full body text, filters for supply chain / logistics relevance, and returns the top N most popular articles per geographic region (Americas, Asia, Europe, Middle East, Africa, Oceania, Global). Output is structured JSON ready to paste into the United Carriers ChatGPT framework.
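
The "top N per region" selection described above can be approximated as follows. This is a minimal sketch, not the Actor's actual source: the function name `top_per_region` is hypothetical, and it assumes each article is a dict carrying the `region` and `popularityScore` fields shown in the Output section.

```python
from collections import defaultdict


def top_per_region(articles, max_per_region=3):
    """Return the highest-scoring articles for each region (illustrative only)."""
    by_region = defaultdict(list)
    for article in articles:
        by_region[article["region"]].append(article)

    ranked = {}
    for region, items in by_region.items():
        # Sort descending by score, then keep the first N entries.
        items.sort(key=lambda a: a["popularityScore"], reverse=True)
        ranked[region] = items[:max_per_region]
    return ranked


# Made-up sample data, for demonstration only.
sample = [
    {"region": "Asia", "title": "A", "popularityScore": 87},
    {"region": "Asia", "title": "B", "popularityScore": 120},
    {"region": "Europe", "title": "C", "popularityScore": 40},
]
result = top_per_region(sample, max_per_region=1)
print([a["title"] for a in result["Asia"]])
```

Here `max_per_region` plays the role of the `maxArticlesPerRegion` input.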

### Why use this Actor?

Replaces a manual daily workflow: staff no longer need to hunt through publications, copy-paste articles, or filter by region by hand. The Actor runs on a schedule and produces a clean, ranked article list in minutes.

### How to use

1. Deploy the Actor with `apify push` (or run from Apify Console)
2. In the Input tab, confirm `sourceUrls` matches United Carriers' preferred publications
3. Adjust `maxAgeHours` (default: 72) and `maxArticlesPerRegion` (default: 3) as needed
4. Run the Actor
5. Open the Output tab - each row is an article with `region`, `title`, `bodyText`, and `url`
6. Copy `title` + `bodyText` for each article into the ChatGPT framework
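
Steps 5 and 6 can also be scripted once the dataset items have been downloaded. The sketch below is one possible way to do it, under stated assumptions: `format_for_chatgpt` is a hypothetical helper, and the `## Region` heading layout is an illustrative choice rather than a format required by the ChatGPT framework.

```python
def format_for_chatgpt(items):
    """Group articles by region and join title + bodyText into one text blob."""
    sections = {}
    for item in items:
        # Each article becomes "title, blank line, body".
        sections.setdefault(item["region"], []).append(
            f"{item['title']}\n\n{item['bodyText']}"
        )

    blocks = []
    for region, articles in sections.items():
        # Articles within a region are separated by a horizontal rule.
        blocks.append(f"## {region}\n\n" + "\n\n---\n\n".join(articles))
    return "\n\n".join(blocks)


# Made-up sample item, for demonstration only.
items = [
    {"region": "Asia", "title": "Port congestion", "bodyText": "Body text..."},
]
print(format_for_chatgpt(items))
```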

### Input

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `sourceUrls` | string[] | 8 publications | Homepage or section URLs to scrape |
| `keywords` | string[] | 14 supply chain terms | Articles must match at least one keyword |
| `maxArticlesPerRegion` | integer | 3 | Top articles returned per region |
| `maxAgeHours` | integer | 72 | Only include articles from the last N hours |
| `minContentLength` | integer | 500 | Min character count - filters stubs / paywalled content |
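
When building the input programmatically, it can help to check values before starting a run. The following is a minimal sketch: `build_run_input` is a hypothetical helper, and the numeric bounds are the ones listed in the OpenAPI specification at the end of this document (`maxArticlesPerRegion` 1 to 20, `maxAgeHours` 1 to 720, `minContentLength` at least 100).

```python
def build_run_input(source_urls=None, keywords=None, max_articles_per_region=3,
                    max_age_hours=72, min_content_length=500):
    """Build an input dict, rejecting values outside the documented bounds."""
    if not 1 <= max_articles_per_region <= 20:
        raise ValueError("maxArticlesPerRegion must be between 1 and 20")
    if not 1 <= max_age_hours <= 720:
        raise ValueError("maxAgeHours must be between 1 and 720")
    if min_content_length < 100:
        raise ValueError("minContentLength must be at least 100")

    run_input = {
        "maxArticlesPerRegion": max_articles_per_region,
        "maxAgeHours": max_age_hours,
        "minContentLength": min_content_length,
    }
    # Omitted lists are left out so the Actor's own defaults can apply.
    if source_urls is not None:
        run_input["sourceUrls"] = list(source_urls)
    if keywords is not None:
        run_input["keywords"] = list(keywords)
    return run_input


print(build_run_input(max_age_hours=24, keywords=["tariff", "customs"]))
```

The resulting dict can be passed as `run_input` to the Python client call shown in the API section.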

### Output

Each dataset item:

```json
{
  "region": "Asia",
  "title": "Port congestion delays shipments across Southeast Asia",
  "url": "https://www.supplychaindive.com/news/...",
  "publishedAt": "2026-06-01T14:00:00Z",
  "source": "supplychaindive.com",
  "bodyText": "Full article body text ready for ChatGPT...",
  "popularityScore": 87,
  "popularitySignal": "recency",
  "matchedKeywords": ["port congestion", "shipping"],
  "matchedRegion": "Asia"
}
````

### Data fields

| Field | Type | Description |
|-------|------|-------------|
| `region` | string | Geographic region (Americas / Asia / Europe / Middle East / Africa / Oceania / Global) |
| `title` | string | Article headline |
| `url` | string | Original article URL |
| `publishedAt` | string | ISO 8601 publish date |
| `source` | string | Publication hostname |
| `bodyText` | string | Full article body text - paste directly into ChatGPT framework |
| `popularityScore` | number | Recency-weighted score (0-400). Higher = more popular / recent |
| `popularitySignal` | string | What drove the score: `recency` or `recency+shareCount` |
| `matchedKeywords` | string[] | Which keywords triggered inclusion |
| `matchedRegion` | string | Region this row represents |
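
For downstream processing, the fields above can be mapped onto a typed record. This is a sketch only: the `Article` class and its snake_case attribute names are hypothetical, and the sample item simply mirrors the Output example. The trailing `Z` in `publishedAt` is rewritten to `+00:00` so that parsing also works on Python versions before 3.11.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Article:
    region: str
    title: str
    url: str
    published_at: datetime
    source: str
    body_text: str
    popularity_score: float
    popularity_signal: str
    matched_keywords: tuple

    @classmethod
    def from_item(cls, item):
        """Convert one dataset item (a dict) into an Article."""
        return cls(
            region=item["region"],
            title=item["title"],
            url=item["url"],
            published_at=datetime.fromisoformat(
                item["publishedAt"].replace("Z", "+00:00")
            ),
            source=item["source"],
            body_text=item["bodyText"],
            popularity_score=item["popularityScore"],
            popularity_signal=item["popularitySignal"],
            matched_keywords=tuple(item["matchedKeywords"]),
        )


item = {
    "region": "Asia",
    "title": "Port congestion delays shipments across Southeast Asia",
    "url": "https://www.supplychaindive.com/news/...",
    "publishedAt": "2026-06-01T14:00:00Z",
    "source": "supplychaindive.com",
    "bodyText": "Full article body text ready for ChatGPT...",
    "popularityScore": 87,
    "popularitySignal": "recency",
    "matchedKeywords": ["port congestion", "shipping"],
    "matchedRegion": "Asia",
}
article = Article.from_item(item)
print(article.published_at.isoformat())
```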

### Pricing / cost estimation

This Actor uses CheerioCrawler (HTTP only, no browser). Estimated costs:

- \~8 sources × ~100 articles each = ~800 requests per run
- At default concurrency (20): completes in ~2-4 minutes
- Estimated: ~0.05-0.10 compute units per run
- On a $49/month plan, ~500+ runs per month are feasible
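
The arithmetic behind these estimates can be worked through as below. The sketch uses only the figures quoted above. It does not convert compute units to dollars, because the per-unit price depends on the subscription plan and is not stated here. The `monthly_compute_units` helper is hypothetical.

```python
SOURCES = 8
ARTICLES_PER_SOURCE = 100
CU_PER_RUN_LOW, CU_PER_RUN_HIGH = 0.05, 0.10  # estimated range per run

# 8 sources x 100 articles each = 800 requests per run
requests_per_run = SOURCES * ARTICLES_PER_SOURCE


def monthly_compute_units(runs_per_month):
    """Return the (low, high) compute-unit estimate for a number of runs."""
    return (runs_per_month * CU_PER_RUN_LOW, runs_per_month * CU_PER_RUN_HIGH)


print("requests per run:", requests_per_run)
print("daily schedule, 30 runs:", monthly_compute_units(30))
print("500 runs:", monthly_compute_units(500))
```

A daily schedule therefore stays in the range of a few compute units per month, well below the 500-run figure.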

### Tips

- **Paywalled sources** (e.g. Lloyd's List): the Actor will collect whatever is publicly accessible. Paywalled articles will fail `minContentLength` and be skipped - this is expected behaviour.
- **Increasing coverage**: add more `sourceUrls`, increase `maxAgeHours`, or lower `minContentLength`.
- **Reducing noise**: raise `minContentLength` to 1000+ to filter short summaries and stubs.
- **Scheduling**: configure a daily or twice-weekly schedule in Apify Console to automate the workflow entirely.
- **RSS feeds**: if a publication provides an RSS feed URL, add it to `sourceUrls` - the Actor handles them like any other listing page.

### FAQ, disclaimers, and support

**Is this legal?** This Actor only scrapes publicly available content. It does not log in, bypass paywalls, or scrape personal data. Always verify compliance with each publication's Terms of Service before production use.

**A source returns no articles.** The publication may use JavaScript rendering. Open an issue and we can evaluate adding PlaywrightCrawler support for that specific source.

**Articles are being skipped.** Check `maxAgeHours` (default 72h) and `minContentLength` (default 500 chars). Paywalled articles will always be skipped.
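
To reason about why a particular article might be skipped, the three filters can be modelled as below. This is an illustrative sketch and not the Actor's real implementation: `skip_reason` is a hypothetical function, and matching keywords against the title plus body is an assumption.

```python
from datetime import datetime, timedelta, timezone


def skip_reason(article, keywords, max_age_hours=72, min_content_length=500, now=None):
    """Return why an article would be filtered out, or None if it passes."""
    now = now or datetime.now(timezone.utc)
    published = datetime.fromisoformat(article["publishedAt"].replace("Z", "+00:00"))

    if now - published > timedelta(hours=max_age_hours):
        return "too old"
    if len(article["bodyText"]) < min_content_length:
        return "too short"

    # Case-insensitive substring match, as described for the `keywords` input.
    text = (article["title"] + " " + article["bodyText"]).lower()
    if not any(keyword.lower() in text for keyword in keywords):
        return "no keyword match"
    return None


# Made-up sample article, for demonstration only.
article = {
    "title": "Rates climb",
    "publishedAt": "2026-06-01T14:00:00Z",
    "bodyText": "Freight " * 100,
}
fixed_now = datetime(2026, 6, 2, 14, 0, tzinfo=timezone.utc)
print(skip_reason(article, ["freight"], now=fixed_now))
```
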

**Questions or changes**: contact Apify Professional Services.

# Actor input Schema

## `sourceUrls` (type: `array`):

List of news publication homepage or section URLs to scrape. Defaults to 8 pre-configured supply chain publications.

## `keywords` (type: `array`):

Articles must contain at least one of these keywords to be included. Case-insensitive.

## `maxArticlesPerRegion` (type: `integer`):

Maximum number of top articles to return per geographic region.

## `maxAgeHours` (type: `integer`):

Only include articles published within this many hours. Default 72 = last 3 days.

## `minContentLength` (type: `integer`):

Minimum article body length in characters. Filters out stubs and paywalled content.

## Actor input object example

```json
{
  "sourceUrls": [
    "https://www.supplychaindive.com",
    "https://www.freightwaves.com",
    "https://www.logisticsmgmt.com",
    "https://www.hellenicshippingnews.com",
    "https://www.reuters.com/business/transportation/",
    "https://www.tradeready.ca",
    "https://www.joc.com",
    "https://www.worldcargoalliance.com"
  ],
  "keywords": [
    "supply chain",
    "logistics",
    "freight",
    "shipping",
    "cargo",
    "container",
    "customs",
    "import",
    "export",
    "trade",
    "tariff",
    "freight forwarding",
    "disruption",
    "port congestion"
  ],
  "maxArticlesPerRegion": 3,
  "maxAgeHours": 72,
  "minContentLength": 500
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {};

// Run the Actor and wait for it to finish
const run = await client.actor("bravolad/united-carriers-ai-news-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {}

# Run the Actor and wait for it to finish
run = client.actor("bravolad/united-carriers-ai-news-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{}' |
apify call bravolad/united-carriers-ai-news-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=bravolad/united-carriers-ai-news-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "United Carriers AI News Scraper",
        "version": "0.1",
        "x-build-id": "ksURKcM6iFddAXCe6"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/bravolad~united-carriers-ai-news-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-bravolad-united-carriers-ai-news-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/bravolad~united-carriers-ai-news-scraper/runs": {
            "post": {
                "operationId": "runs-sync-bravolad-united-carriers-ai-news-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/bravolad~united-carriers-ai-news-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-bravolad-united-carriers-ai-news-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "sourceUrls": {
                        "title": "Source URLs",
                        "type": "array",
                        "description": "List of news publication homepage or section URLs to scrape. Defaults to 8 pre-configured supply chain publications.",
                        "default": [
                            "https://www.supplychaindive.com",
                            "https://www.freightwaves.com",
                            "https://www.logisticsmgmt.com",
                            "https://www.hellenicshippingnews.com",
                            "https://www.reuters.com/business/transportation/",
                            "https://www.tradeready.ca",
                            "https://www.joc.com",
                            "https://www.worldcargoalliance.com"
                        ],
                        "items": {
                            "type": "string"
                        }
                    },
                    "keywords": {
                        "title": "Keywords",
                        "type": "array",
                        "description": "Articles must contain at least one of these keywords to be included. Case-insensitive.",
                        "default": [
                            "supply chain",
                            "logistics",
                            "freight",
                            "shipping",
                            "cargo",
                            "container",
                            "customs",
                            "import",
                            "export",
                            "trade",
                            "tariff",
                            "freight forwarding",
                            "disruption",
                            "port congestion"
                        ],
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxArticlesPerRegion": {
                        "title": "Max articles per region",
                        "minimum": 1,
                        "maximum": 20,
                        "type": "integer",
                        "description": "Maximum number of top articles to return per geographic region.",
                        "default": 3
                    },
                    "maxAgeHours": {
                        "title": "Max article age (hours)",
                        "minimum": 1,
                        "maximum": 720,
                        "type": "integer",
                        "description": "Only include articles published within this many hours. Default 72 = last 3 days.",
                        "default": 72
                    },
                    "minContentLength": {
                        "title": "Min content length (characters)",
                        "minimum": 100,
                        "type": "integer",
                        "description": "Minimum article body length in characters. Filters out stubs and paywalled content.",
                        "default": 500
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
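
The `run-sync-get-dataset-items` endpoint from the specification above can also be called without a client library. The sketch below uses only the Python standard library. The helper names `build_request` and `run_and_fetch_items` are hypothetical, and it assumes the response body is a JSON array of dataset items, which the specification itself does not spell out.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/bravolad~united-carriers-ai-news-scraper/run-sync-get-dataset-items"


def build_request(token, run_input):
    """Build the POST request; the token goes in the query string per the spec."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_fetch_items(token, run_input, timeout=600):
    """Send the request and decode the response (assumed to be a JSON array)."""
    request = build_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not contact the network.
request = build_request("<YOUR_API_TOKEN>", {"maxAgeHours": 24})
print(request.get_method(), request.full_url)
```

Calling `run_and_fetch_items` blocks until the run finishes, so a generous `timeout` is advisable for larger source lists.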
