United Carriers AI News Scraper
Pricing
Pay per usage
Developer
Mathieux Barlow-Ladias
Maintained by Community
Scrapes supply chain, logistics, and freight news articles from configurable publications, filters by keyword and geographic region, and ranks by popularity - ready for United Carriers' ChatGPT summarisation workflow.
What does this Actor do?
This Actor visits a configurable list of news publication homepages, discovers recent articles, extracts the full body text, filters for supply chain / logistics relevance, and returns the top N most popular articles per geographic region (Americas, Asia, Europe, Middle East, Africa, Oceania, Global). Output is structured JSON ready to paste into the United Carriers ChatGPT framework.
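The final "top N per region" step can be pictured roughly as follows. This is a minimal sketch, not the Actor's actual source: the function name `top_articles_per_region` is hypothetical, and it assumes the articles have already passed the relevance filters and carry the `region` and `popularityScore` fields shown in the Output section.

```python
from collections import defaultdict


def top_articles_per_region(articles, max_per_region=3):
    """Group articles by region and keep the highest-scoring ones (illustrative only)."""
    by_region = defaultdict(list)
    for article in articles:
        by_region[article["region"]].append(article)

    ranked = {}
    for region, items in by_region.items():
        # Sort descending by popularityScore, then truncate to the top N.
        items.sort(key=lambda a: a["popularityScore"], reverse=True)
        ranked[region] = items[:max_per_region]
    return ranked


# Made-up sample rows, only to exercise the function.
sample = [
    {"region": "Asia", "title": "A", "popularityScore": 87},
    {"region": "Asia", "title": "B", "popularityScore": 120},
    {"region": "Europe", "title": "C", "popularityScore": 40},
]
result = top_articles_per_region(sample, max_per_region=1)
```

With `max_per_region=1`, each region in `result` holds only its single best-scoring article.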
Why use this Actor?
Replaces a manual daily workflow: staff no longer need to search publications, copy and paste articles, or sort them by region by hand. The Actor runs on a schedule and produces a clean, ranked article list in minutes.
How to use
- Deploy the Actor with `apify push` (or run from Apify Console)
- In the Input tab, confirm `sourceUrls` matches United Carriers' preferred publications
- Adjust `maxAgeHours` (default: 72) and `maxArticlesPerRegion` (default: 3) as needed
- Run the Actor
- Open the Output tab - each row is an article with `region`, `title`, `bodyText`, and `url`
- Copy `title` + `bodyText` for each article into the ChatGPT framework
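If the dataset is exported as JSON instead of copied row by row, the last step can be scripted. The sketch below is one possible approach, assuming the exported items are already loaded as a list of dictionaries; the helper name `format_for_chatgpt`, the `---` separator, and the sample item are illustrative choices, not part of the Actor.

```python
def format_for_chatgpt(items):
    """Join region, title, URL and bodyText of each item into one pasteable text block."""
    blocks = []
    for item in items:
        header = f"[{item['region']}] {item['title']}"
        blocks.append(f"{header}\n{item['url']}\n\n{item['bodyText']}")
    # A visible separator keeps article boundaries clear in the prompt.
    return "\n\n---\n\n".join(blocks)


# Hypothetical items with only the fields this helper reads.
items = [
    {
        "region": "Asia",
        "title": "Port congestion delays shipments across Southeast Asia",
        "url": "https://example.com/article-1",
        "bodyText": "Full article body text...",
    },
]
prompt_text = format_for_chatgpt(items)
```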
Input
| Field | Type | Default | Description |
|---|---|---|---|
| `sourceUrls` | string[] | 8 publications | Homepage or section URLs to scrape |
| `keywords` | string[] | 23 supply chain terms | Articles must match at least one keyword |
| `maxArticlesPerRegion` | integer | 3 | Top articles returned per region |
| `maxAgeHours` | integer | 72 | Only include articles from the last N hours |
| `minContentLength` | integer | 500 | Min character count - filters stubs / paywalled content |
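As an illustration of the input fields in the table above, the sketch below builds an input object and checks its types before a run. The URL and keyword values are placeholders (the real default lists are not reproduced here), and `validate_input` is a hypothetical helper, not something the Actor provides.

```python
import json

# Placeholder values: the Actor's real default sourceUrls and keywords are not shown here.
actor_input = {
    "sourceUrls": ["https://example.com/logistics-news"],
    "keywords": ["port congestion", "shipping"],
    "maxArticlesPerRegion": 3,
    "maxAgeHours": 72,
    "minContentLength": 500,
}


def validate_input(data):
    """Return a list of problems found in an input object (empty list means OK)."""
    problems = []
    for field in ("sourceUrls", "keywords"):
        value = data.get(field)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{field} must be a list of strings")
    for field in ("maxArticlesPerRegion", "maxAgeHours", "minContentLength"):
        value = data.get(field)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or value < 0:
            problems.append(f"{field} must be a non-negative integer")
    return problems


print(json.dumps(actor_input, indent=2))
```

The printed JSON can be pasted into the Input tab's JSON editor once `validate_input(actor_input)` returns an empty list.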
Output
Each dataset item:
    {
      "region": "Asia",
      "title": "Port congestion delays shipments across Southeast Asia",
      "url": "https://www.supplychaindive.com/news/...",
      "publishedAt": "2026-06-01T14:00:00Z",
      "source": "supplychaindive.com",
      "bodyText": "Full article body text ready for ChatGPT...",
      "popularityScore": 87,
      "popularitySignal": "recency",
      "matchedKeywords": ["port congestion", "shipping"],
      "matchedRegion": "Asia"
    }
Data fields
| Field | Type | Description |
|---|---|---|
| `region` | string | Geographic region (Americas / Asia / Europe / Middle East / Africa / Oceania / Global) |
| `title` | string | Article headline |
| `url` | string | Original article URL |
| `publishedAt` | string | ISO 8601 publish date |
| `source` | string | Publication hostname |
| `bodyText` | string | Full article body text - paste directly into ChatGPT framework |
| `popularityScore` | number | Recency-weighted score (0-400). Higher = more popular / recent |
| `popularitySignal` | string | What drove the score: recency or recency+shareCount |
| `matchedKeywords` | string[] | Which keywords triggered inclusion |
| `matchedRegion` | string | Region this row represents |
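For downstream scripts, the fields above map naturally onto a small typed record. The following is a sketch under the assumption that every item contains exactly these ten fields; the `Article` class and its `published_datetime` helper are illustrative names, not part of the Actor.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Article:
    region: str
    title: str
    url: str
    publishedAt: str
    source: str
    bodyText: str
    popularityScore: float
    popularitySignal: str
    matchedKeywords: List[str]
    matchedRegion: str

    def published_datetime(self) -> datetime:
        # Older Python versions reject a trailing "Z" in fromisoformat, so normalise it.
        return datetime.fromisoformat(self.publishedAt.replace("Z", "+00:00"))


# The sample item from the Output section, unchanged.
raw = """{"region": "Asia", "title": "Port congestion delays shipments across Southeast Asia",
"url": "https://www.supplychaindive.com/news/...", "publishedAt": "2026-06-01T14:00:00Z",
"source": "supplychaindive.com", "bodyText": "Full article body text ready for ChatGPT...",
"popularityScore": 87, "popularitySignal": "recency",
"matchedKeywords": ["port congestion", "shipping"], "matchedRegion": "Asia"}"""

article = Article(**json.loads(raw))
```

Constructing the record with `Article(**item)` raises a `TypeError` if an item has missing or unexpected fields, which makes schema drift visible early.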
Pricing / cost estimation
This Actor uses CheerioCrawler (HTTP only, no browser). Estimated costs:
- ~8 sources × ~100 articles each = ~800 requests per run
- At default concurrency (20): completes in ~2-4 minutes
- Estimated: ~0.05-0.10 compute units per run
- On a $49/month plan, ~500+ runs per month are feasible
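The arithmetic behind these estimates can be reproduced for other schedules. The sketch below only restates the figures above (8 sources, ~100 articles each, 0.05-0.10 compute units per run); `estimate_monthly_usage` is a hypothetical helper, and it deliberately does not convert compute units into currency, since the unit price depends on the plan.

```python
def estimate_monthly_usage(runs_per_month, sources=8, articles_per_source=100,
                           cu_per_run=(0.05, 0.10)):
    """Scale the per-run estimates to a month of scheduled runs (rough figures only)."""
    requests_per_run = sources * articles_per_source
    low, high = cu_per_run
    return {
        "requests_per_run": requests_per_run,
        "requests_per_month": requests_per_run * runs_per_month,
        "cu_low": round(low * runs_per_month, 2),
        "cu_high": round(high * runs_per_month, 2),
    }


# A once-a-day schedule, taken as 30 runs per month.
daily = estimate_monthly_usage(runs_per_month=30)
```

For a daily schedule this gives 800 requests per run and roughly 1.5 to 3.0 compute units per month.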
Tips
- Paywalled sources (e.g. Lloyd's List): the Actor will collect whatever is publicly accessible. Paywalled articles will fail `minContentLength` and be skipped - this is expected behaviour.
- Increasing coverage: add more `sourceUrls`, increase `maxAgeHours`, or lower `minContentLength`.
- Reducing noise: raise `minContentLength` to 1000+ to filter short summaries and stubs.
- Scheduling: configure a daily or twice-weekly schedule in Apify Console to automate the workflow entirely.
- RSS feeds: if a publication provides an RSS feed URL, add it to `sourceUrls` - the Actor handles them like any other listing page.
FAQ, disclaimers, and support
Is this legal? This Actor only scrapes publicly available content. It does not log in, bypass paywalls, or scrape personal data. Always verify compliance with each publication's Terms of Service before production use.
A source returns no articles. The publication may use JavaScript rendering. Open an issue and we can evaluate adding PlaywrightCrawler support for that specific source.
Articles are being skipped. Check maxAgeHours (default 72h) and minContentLength (default 500 chars). Paywalled articles will always be skipped.
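To check by hand why a particular article would be dropped, the two thresholds can be applied to its publish date and body text. This is a sketch of the checks as described here, not the Actor's code: `skip_reason` is a hypothetical helper, and the exact boundary behaviour (for example whether an article exactly 72 hours old is kept) is an assumption.

```python
from datetime import datetime, timedelta, timezone


def skip_reason(published_at, body_text, now, max_age_hours=72, min_content_length=500):
    """Return why an article would be dropped, or None if it passes both checks."""
    published = datetime.fromisoformat(published_at.replace("Z", "+00:00"))
    if now - published > timedelta(hours=max_age_hours):
        return "older than maxAgeHours"
    if len(body_text) < min_content_length:
        return "shorter than minContentLength"
    return None


# Fixed reference time so the example is reproducible.
now = datetime(2026, 6, 2, 14, 0, tzinfo=timezone.utc)
fresh_and_long = skip_reason("2026-06-01T14:00:00Z", "x" * 600, now)
too_old = skip_reason("2026-05-01T14:00:00Z", "x" * 600, now)
too_short = skip_reason("2026-06-01T14:00:00Z", "teaser only", now)
```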
Questions or changes: contact Apify Professional Services.