United Carriers AI News Scraper

Pricing

Pay per usage

Developer

Mathieux Barlow-Ladias

Maintained by Community

Actor stats

0

Bookmarked

2

Total users

1

Monthly active users

5 days ago

Last modified

Scrapes supply chain, logistics, and freight news articles from configurable publications, filters by keyword and geographic region, and ranks by popularity - ready for United Carriers' ChatGPT summarisation workflow.

What does this Actor do?

This Actor visits a configurable list of news publication homepages, discovers recent articles, extracts the full body text, filters for supply chain / logistics relevance, and returns the top N most popular articles per geographic region (Americas, Asia, Europe, Middle East, Africa, Oceania, Global). Output is structured JSON ready to paste into the United Carriers ChatGPT framework.
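
The filter-and-rank stages described above can be sketched in a few lines of Python. This is an illustration only, not the Actor's actual code: the function names are hypothetical, and case-insensitive substring matching is an assumption about how keywords are matched.

```python
from collections import defaultdict


def matched_keywords(text, keywords):
    """Return the keywords found in text (assumed: case-insensitive substring match)."""
    lowered = text.lower()
    return [kw for kw in keywords if kw.lower() in lowered]


def top_per_region(articles, keywords, max_per_region=3):
    """Keep articles matching at least one keyword, then take the top N per region."""
    by_region = defaultdict(list)
    for article in articles:
        hits = matched_keywords(article["title"] + " " + article["bodyText"], keywords)
        if hits:  # articles with no keyword match are dropped
            by_region[article["region"]].append({**article, "matchedKeywords": hits})
    # Rank each region by popularityScore, highest first, and cut to N items.
    return {
        region: sorted(items, key=lambda a: a["popularityScore"], reverse=True)[:max_per_region]
        for region, items in by_region.items()
    }
```

Grouping before ranking means a busy region cannot crowd out a quiet one: every region with at least one relevant article keeps its own top N.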

Why use this Actor?

Replaces a manual daily workflow: staff no longer need to hunt through publications, copy and paste articles, or sort them by region by hand. The Actor runs on a schedule and produces a clean, ranked article list in minutes.

How to use

  1. Deploy the Actor with apify push (or run from Apify Console)
  2. In the Input tab, confirm sourceUrls matches United Carriers' preferred publications
  3. Adjust maxAgeHours (default: 72) and maxArticlesPerRegion (default: 3) as needed
  4. Run the Actor
  5. Open the Output tab - each row is an article with region, title, bodyText, and url
  6. Copy title + bodyText for each article into the ChatGPT framework
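
If the dataset is exported as JSON, steps 5 and 6 can be scripted rather than copied by hand. The sketch below is one possible approach: `format_for_chatgpt` is a hypothetical helper, and the block layout (region label, title, body, `---` separator) is an assumption, not a format required by the ChatGPT framework.

```python
def format_for_chatgpt(items):
    """Render dataset items as plain-text blocks, grouped by region."""
    blocks = []
    for item in sorted(items, key=lambda i: i["region"]):
        blocks.append(
            f"Region: {item['region']}\n"
            f"Title: {item['title']}\n\n"
            f"{item['bodyText'].strip()}"
        )
    # A visible separator keeps article boundaries clear when pasted.
    return "\n\n---\n\n".join(blocks)
```

Load the exported file with `json.load` and pass the resulting list to the function; the returned string can be pasted in one go.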

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| sourceUrls | string[] | 8 publications | Homepage or section URLs to scrape |
| keywords | string[] | 23 supply chain terms | Articles must match at least one keyword |
| maxArticlesPerRegion | integer | 3 | Top articles returned per region |
| maxAgeHours | integer | 72 | Only include articles from the last N hours |
| minContentLength | integer | 500 | Min character count - filters stubs / paywalled content |
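
A run input can also be assembled and sanity-checked in code before it is pasted into the Input tab. The sketch below is hedged: `build_input` is a hypothetical helper, the positive-integer checks are assumptions rather than the Actor's own validation, and the URL and keywords in the example are placeholders borrowed from the sample output, since the default list of 8 publications and 23 keywords is not reproduced here.

```python
import json

# Numeric defaults as listed in the Input table.
DEFAULTS = {"maxArticlesPerRegion": 3, "maxAgeHours": 72, "minContentLength": 500}


def build_input(source_urls, keywords, **overrides):
    """Merge overrides into the documented defaults and validate the result."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown input fields: {sorted(unknown)}")
    if not source_urls or not keywords:
        raise ValueError("sourceUrls and keywords must each contain at least one entry")
    run_input = {
        "sourceUrls": list(source_urls),
        "keywords": list(keywords),
        **DEFAULTS,
        **overrides,
    }
    for name in DEFAULTS:
        value = run_input[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    return run_input


# Placeholder values for illustration only.
example = build_input(
    ["https://www.supplychaindive.com/"],
    ["port congestion", "shipping"],
    maxAgeHours=24,
)
print(json.dumps(example, indent=2))
```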

Output

Each dataset item:

{
  "region": "Asia",
  "title": "Port congestion delays shipments across Southeast Asia",
  "url": "https://www.supplychaindive.com/news/...",
  "publishedAt": "2026-06-01T14:00:00Z",
  "source": "supplychaindive.com",
  "bodyText": "Full article body text ready for ChatGPT...",
  "popularityScore": 87,
  "popularitySignal": "recency",
  "matchedKeywords": ["port congestion", "shipping"],
  "matchedRegion": "Asia"
}

Data fields

| Field | Type | Description |
| --- | --- | --- |
| region | string | Geographic region (Americas / Asia / Europe / Middle East / Africa / Oceania / Global) |
| title | string | Article headline |
| url | string | Original article URL |
| publishedAt | string | ISO 8601 publish date |
| source | string | Publication hostname |
| bodyText | string | Full article body text - paste directly into ChatGPT framework |
| popularityScore | number | Recency-weighted score (0-400). Higher = more popular / recent |
| popularitySignal | string | What drove the score: recency or recency+shareCount |
| matchedKeywords | string[] | Which keywords triggered inclusion |
| matchedRegion | string | Region this row represents |
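
For downstream scripts, the fields above map naturally onto a typed record. The following is a minimal sketch, assuming every field is present on every item; the `Article` class and `parse_article` function are hypothetical names, not part of the Actor.

```python
from dataclasses import dataclass
from datetime import datetime

REGIONS = {"Americas", "Asia", "Europe", "Middle East", "Africa", "Oceania", "Global"}


@dataclass(frozen=True)
class Article:
    region: str
    title: str
    url: str
    published_at: datetime
    source: str
    body_text: str
    popularity_score: float
    popularity_signal: str
    matched_keywords: tuple
    matched_region: str


def parse_article(item):
    """Convert one dataset item (a dict) into an Article, validating the region."""
    if item["region"] not in REGIONS:
        raise ValueError(f"Unexpected region: {item['region']!r}")
    # Older Python versions reject a trailing "Z" in fromisoformat, so normalise it.
    published = datetime.fromisoformat(item["publishedAt"].replace("Z", "+00:00"))
    return Article(
        region=item["region"],
        title=item["title"],
        url=item["url"],
        published_at=published,
        source=item["source"],
        body_text=item["bodyText"],
        popularity_score=item["popularityScore"],
        popularity_signal=item["popularitySignal"],
        matched_keywords=tuple(item["matchedKeywords"]),
        matched_region=item["matchedRegion"],
    )
```

Parsing `publishedAt` into a timezone-aware `datetime` makes age comparisons straightforward later on.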

Pricing / cost estimation

This Actor uses CheerioCrawler (HTTP only, no browser). Estimated costs:

  • ~8 sources × ~100 articles each = ~800 requests per run
  • At default concurrency (20): completes in ~2-4 minutes
  • Estimated: ~0.05-0.10 compute units per run
  • On a $49/month plan, 500 or more runs per month are feasible
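
The arithmetic behind these estimates can be reproduced for other schedules. This sketch uses only the figures quoted above (8 sources, about 100 articles each, 0.05-0.10 compute units per run); the function names are hypothetical, and no price per compute unit is assumed.

```python
def requests_per_run(sources=8, articles_per_source=100):
    """Approximate number of HTTP requests in one run."""
    return sources * articles_per_source


def monthly_compute_units(runs_per_month, cu_per_run=(0.05, 0.10)):
    """Return the (low, high) compute-unit range for a month of runs."""
    low, high = cu_per_run
    # Round to avoid floating-point noise in the reported figures.
    return round(runs_per_month * low, 2), round(runs_per_month * high, 2)
```

For example, a daily schedule (about 30 runs a month) works out to roughly 1.5-3.0 compute units, and 500 runs to roughly 25-50.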

Tips

  • Paywalled sources (e.g. Lloyd's List): the Actor will collect whatever is publicly accessible. Paywalled articles will fail minContentLength and be skipped - this is expected behaviour.
  • Increasing coverage: add more sourceUrls, increase maxAgeHours, or lower minContentLength.
  • Reducing noise: raise minContentLength to 1000+ to filter short summaries and stubs.
  • Scheduling: configure a daily or twice-weekly schedule in Apify Console to automate the workflow entirely.
  • RSS feeds: if a publication provides an RSS feed URL, add it to sourceUrls - the Actor handles them like any other listing page.

FAQ, disclaimers, and support

Is this legal? This Actor only scrapes publicly available content. It does not log in, bypass paywalls, or scrape personal data. Always verify compliance with each publication's Terms of Service before production use.

A source returns no articles. The publication may render its content with JavaScript, which CheerioCrawler (HTTP only, no browser) cannot execute. Open an issue and we can evaluate adding PlaywrightCrawler support for that specific source.

Articles are being skipped. Check maxAgeHours (default 72h) and minContentLength (default 500 chars). Paywalled articles will always be skipped.
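
To check why a particular article would be dropped, the two filters can be replayed locally. This is a hedged sketch: `skip_reasons` is a hypothetical helper, and the exact boundary behaviour (strictly older than `maxAgeHours`, strictly shorter than `minContentLength`) is an assumption about the Actor's logic.

```python
from datetime import datetime, timedelta, timezone


def skip_reasons(published_at, body_text, now, max_age_hours=72, min_content_length=500):
    """Return the reasons an article would be skipped (an empty list means it is kept)."""
    reasons = []
    if now - published_at > timedelta(hours=max_age_hours):
        reasons.append("older than maxAgeHours")
    if len(body_text) < min_content_length:
        reasons.append("shorter than minContentLength")
    return reasons


# Example: a short article published four days before the reference time.
now = datetime(2026, 6, 5, 14, 0, tzinfo=timezone.utc)
published = datetime(2026, 6, 1, 14, 0, tzinfo=timezone.utc)
print(skip_reasons(published, "Subscribe to read more.", now))
```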

Questions or changes: contact Apify Professional Services.