Guardian News Scraper

Pricing

from $2.00 / 1,000 results

Scrape full articles from The Guardian, including headline, body, authors, section, and tags. Supports `mode: latest` to fetch the newest stories from The Guardian's world RSS feed. HTTP-only.

Developer

Xtractoo

Maintained by Community

The Guardian Article Scraper

Extract full article text, author, publication date, section, and description from any theguardian.com article URL. The Guardian is one of the world's most-read English-language news sites with extensive international coverage across politics, culture, and science.

Why Use This Actor?

  • Academic research - Guardian long-form journalism is widely used in media studies and political research.
  • Content curation - aggregate Guardian articles by topic for newsletters or reading lists.
  • Sentiment and bias analysis - Guardian editorial stance makes it a reference in media bias research.
  • Open access - Guardian content is freely available globally with no paywall or geo-restriction.

How It Works

This actor uses only HTTP requests - no browser, no Selenium, no Playwright. Articles are extracted in seconds with RAM usage well under 256 MB.
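
As a rough illustration of what browser-free extraction can look like, the sketch below pulls a few fields out of raw HTML with Python's standard library only. It is not the actor's actual code: the choice of Open Graph and `article:published_time` meta tags, the `MetaExtractor` class, `extract_metadata`, and `SAMPLE_HTML` are all assumptions made for the example.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collects <meta property/name=...> values and the <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            if key and attrs.get("content") is not None:
                # Keep the first value seen for each key.
                self.meta.setdefault(key, attrs["content"])
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_metadata(html: str) -> dict:
    """Return a small metadata dict (hypothetical field mapping)."""
    parser = MetaExtractor()
    parser.feed(html)
    return {
        "title": parser.meta.get("og:title", parser.title.strip()),
        "description": parser.meta.get("og:description", ""),
        "image": parser.meta.get("og:image", ""),
        "published_date": parser.meta.get("article:published_time", ""),
    }


# Hand-written sample page, not real Guardian markup.
SAMPLE_HTML = """<html><head>
<title>Example headline | The Guardian</title>
<meta property="og:title" content="Example headline">
<meta property="og:description" content="Example standfirst">
<meta property="article:published_time" content="2026-04-13T09:00:00.000Z">
</head><body><p>Body text.</p></body></html>"""

print(extract_metadata(SAMPLE_HTML))
```

In a real run the HTML would come from a plain HTTP GET (for example `urllib.request.urlopen`), which is why no browser is needed for this kind of parsing.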

Input

{
  "url": "https://www.theguardian.com/world/2026/apr/13/example-article",
  "urls": [
    "https://www.theguardian.com/world/2026/apr/13/article-one",
    "https://www.theguardian.com/technology/2026/apr/12/article-two"
  ],
  "mode": "article",
  "limit": 10
}

Output

{
  "url": "https://www.theguardian.com/world/2026/may/15/mali-airstrikes-rebel-alliance-separatists",
  "source": "The Guardian",
  "title": "Mali’s forces target rebel alliance in junta’s fight to keep power",
  "description": "Army supported by Russian mercenaries launches airstrikes after offensive by coalition of Islamist extremists and Tuareg separatists",
  "content": "Mali’s armed forces, supported by Russian mercenaries, have launched airstrikes targeting a rebel alliance of Islamist extremists and Tuareg separatists as the ruling junta struggles to maintain its hold on power in the unstable west African country. Earlier this week warplanes targeted the key northern town of Kidal, which was lost when the rebels launched a surprise offensive across much of Mali in late April....",
  "image": "https://i.guim.co.uk/img/media/e6d26af1123d872554af9a427c5d33abf01bc499/650_22_3090_2473/master/3090.jpg?width=1200&height=630&quality=85&auto=format&fit=crop&precrop=40:21,offset-x50,offset-y0&overlay-align=bottom%2Cleft&overlay-width=100p&overlay-base64=L2ltZy9zdGF0aWMvb3ZlcmxheXMvdGctZGVmYXVsdC5wbmc&enable=upscale&s=46f9527d36a676fc922f988649bb5fe9",
  "language": "en",
  "word_count": 847,
  "published_date": "2026-05-15T14:57:35.000Z",
  "modified_date": "",
  "authors": [],
  "categories": "",
  "tags": ""
}
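
Downstream code will usually want real timestamps rather than strings. The sketch below assumes records shaped like the sample output, with ISO 8601 dates ending in `Z` and an empty string when a date is missing. The helper names `parse_iso` and `normalize_record` are hypothetical and not part of the actor.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_iso(value: str) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp such as 2026-05-15T14:57:35.000Z.

    Returns None for empty strings, which the sample output uses
    for missing dates.
    """
    if not value:
        return None
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def normalize_record(record: dict) -> dict:
    """Return a copy of the record with date fields converted to datetime."""
    out = dict(record)
    out["published_date"] = parse_iso(record.get("published_date", ""))
    out["modified_date"] = parse_iso(record.get("modified_date", ""))
    return out


sample = {
    "published_date": "2026-05-15T14:57:35.000Z",
    "modified_date": "",
    "word_count": 847,
}
print(normalize_record(sample))
```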

Fetch Latest News

Set mode to "latest" to fetch the newest article URLs and titles from The Guardian instead of extracting a single article.

Input:

{
  "mode": "latest",
  "limit": 10
}

Output - array of objects:

[
  {
    "url": "https://www.theguardian.com/world/2026/apr/20/madagascar-gen-z-protesters-fear-new-regime",
    "title": "Arrests fuel fears among Madagascar’s gen Z protesters that new regime no better than one they overthrew",
    "published_date": "Mon, 20 Apr 2026 04:00:02 GMT",
    "source": "The Guardian"
  }
  //...
]

Source: https://www.theguardian.com/world/rss (RSS feed)
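
To make the RSS-to-output mapping concrete, here is a minimal sketch that turns RSS 2.0 `<item>` elements into objects shaped like the latest-mode output. It assumes the standard `title`, `link`, and `pubDate` elements. `parse_latest` and `SAMPLE_RSS` are hypothetical names with hand-written data, and the actor's real implementation may differ.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hand-written RSS 2.0 sample, not a real feed response.
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>World news | The Guardian</title>
<item>
  <title>First headline</title>
  <link>https://www.theguardian.com/world/2026/apr/20/first</link>
  <pubDate>Mon, 20 Apr 2026 04:00:02 GMT</pubDate>
</item>
<item>
  <title>Second headline</title>
  <link>https://www.theguardian.com/world/2026/apr/19/second</link>
  <pubDate>Sun, 19 Apr 2026 18:30:00 GMT</pubDate>
</item>
</channel></rss>"""


def parse_latest(rss_xml: str, limit: int = 10) -> list:
    """Map the first `limit` RSS items to latest-mode style dicts."""
    root = ET.fromstring(rss_xml)
    results = []
    for item in root.findall("./channel/item")[:limit]:
        results.append({
            "url": item.findtext("link", default="").strip(),
            "title": item.findtext("title", default="").strip(),
            # Kept as the raw RFC 822 string, as in the sample output.
            "published_date": item.findtext("pubDate", default="").strip(),
            "source": "The Guardian",
        })
    return results


items = parse_latest(SAMPLE_RSS, limit=1)
print(items)
# The RFC 822 date can be converted to a timezone-aware datetime if needed.
print(parsedate_to_datetime(items[0]["published_date"]))
```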

Cron Schedule: Auto-Fetch Newest Articles

Combine mode: "latest" and mode: "article" to keep a fresh feed running on autopilot:

  1. Schedule a recurring run of this Actor with {"mode": "latest", "limit": 20} via Apify Schedules (UI ▸ Schedules ▸ Create new). A cron expression like */30 * * * * runs it every 30 minutes.
  2. Pass the dataset of the latest run to a second Actor run with mode: "article" and the new URLs as input. Apify integrations let you chain runs via the "Actor finished" webhook without any glue code.
  3. The article-mode run extracts the full body, image, authors, and metadata for each URL and appends to your master dataset.
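
If you do write the hand-off between the two runs yourself, it could look roughly like the sketch below: take the items from a latest-mode run, drop URLs already processed, and build the input for an article-mode run. `build_article_input` and the `seen_urls` set are hypothetical, and how you persist seen URLs between runs is left open.

```python
def build_article_input(latest_items: list, seen_urls: set) -> dict:
    """Build an article-mode input from latest-mode output items.

    URLs already in `seen_urls` are skipped, and duplicates within the
    batch are removed while preserving feed order.
    """
    new_urls = []
    batch_seen = set()
    for item in latest_items:
        url = item.get("url", "")
        if url and url not in seen_urls and url not in batch_seen:
            batch_seen.add(url)
            new_urls.append(url)
    return {"mode": "article", "urls": new_urls}


latest = [
    {"url": "https://www.theguardian.com/world/2026/apr/20/a", "title": "A"},
    {"url": "https://www.theguardian.com/world/2026/apr/20/b", "title": "B"},
    {"url": "https://www.theguardian.com/world/2026/apr/20/a", "title": "A"},
]
already_done = {"https://www.theguardian.com/world/2026/apr/20/b"}
print(build_article_input(latest, already_done))
```

The resulting dictionary has the same shape as the article-mode input shown in the Input section, so it can be passed as the input of the second run.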

Common cron expressions:

| Frequency | Cron |
|---|---|
| Every 15 minutes | */15 * * * * |
| Hourly | 0 * * * * |
| Every 6 hours | 0 */6 * * * |
| Daily at 06:00 UTC | 0 6 * * * |
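
To see what a step expression such as */15 means, the following sketch expands a single cron field into the values it matches. It deliberately handles only `*`, `*/N`, plain numbers, and comma lists (no ranges such as 1-5). `expand_field` is a hypothetical helper, not part of Apify or any cron library.

```python
def expand_field(field: str, low: int, high: int) -> list:
    """Expand one cron field into the sorted list of values it matches.

    Supports "*", "*/N", single numbers, and comma-separated lists.
    """
    values = set()
    for part in field.split(","):
        if part == "*":
            values.update(range(low, high + 1))
        elif part.startswith("*/"):
            step = int(part[2:])
            values.update(range(low, high + 1, step))
        else:
            values.add(int(part))
    return sorted(values)


# Minute field of "*/15 * * * *" (minutes run from 0 to 59).
print(expand_field("*/15", 0, 59))
# Hour field of "0 */6 * * *" (hours run from 0 to 23).
print(expand_field("*/6", 0, 23))
```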

Notes

  • The Guardian rarely paywalls content, so the full article text is usually returned.
  • For high-volume production use, consider registering for The Guardian's free Content API instead.

Other News Actors

Need a different news source? All actors in this collection:

| Actor | Source |
|---|---|
| aljazeera-scraper | Al Jazeera |
| apnews-scraper | AP News |
| bbc-scraper | BBC News |
| cnbc-scraper | CNBC |
| forbes-scraper | Forbes |
| fortune-scraper | Fortune |
| ft-scraper | Financial Times |
| guardian-scraper | The Guardian |
| msn-scraper | MSN News |
| nytimes-scraper | New York Times |
| reuters-scraper | Reuters |
| scmp-scraper | South China Morning Post |
| techcrunch-scraper | TechCrunch |
| upi-scraper | UPI |
| yahoo-finance-scraper | Yahoo Finance |
| smart-news-loader | Any URL - adaptive HTTP loader |
| bloomberg-scraper | Bloomberg |

All actors support mode: "latest" for fetching newest article URLs from each source.