Pricing

from $29.62 / 1,000 results

The Guardian Article Search & Archive Scraper

Search The Guardian's full article archive (2.6M+ articles since 1999). Filter by query, section, tag, contributor, date, or production office. Returns headline, byline, body, tags, contributors, and publication metadata.

Developer

ParseForge

Last modified

8 days ago

📰 The Guardian Article Search Scraper

🚀 Search 2.6 million Guardian articles in seconds. Headlines, bylines, full body text, tags, contributors, star ratings, and section metadata across the complete archive since 1999. No sign-up, no manual scraping.

The Guardian Article Search Scraper exports articles from The Guardian and returns more than 30 fields per record, including headline, byline, full body text and HTML, contributors, tags, section metadata, star ratings for reviews, and image gallery URLs. The Guardian archive is one of the most-cited English-language news corpora in academic research, NLP training, and media-trends analysis.

The catalogue covers 2.6 million-plus articles across 32 sections, including World, UK, US, Australia, Politics, Business, Technology, Science, Environment, Sport, Culture, and Opinion, with full archive coverage from 1999 onward. This Actor makes the corpus searchable as CSV, Excel, JSON, or XML in under a minute. Filtering by section, tag, contributor, date, language, production office, and minimum star rating runs server-side.

🎯 Target Audience: media-monitoring teams, NLP researchers, journalism students, data scientists, content strategists, OSINT analysts, librarians.
💡 Primary Use Cases: brand-mention tracking, sentiment and topic models, journalism research, media-bias studies, archival queries, training corpora for LLMs.

📋 What the Guardian Article Search Scraper does

Six powerful filters in a single run:

🔍 Free-text search. Operators include AND, OR, NOT, and quoted phrases.
📂 Section filter. Pick one of 32 sections or search every section.
🏷️ Tag filter. Combine multiple Guardian tags (e.g. environment/climate-change, football/premierleague).
📅 Date range. Restrict by fromDate and toDate.
🌍 Production office. Filter by UK, US, Australia, or international edition.
⭐ Minimum star rating. Pull only 4-star and above film, TV, music, or restaurant reviews.
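
The six filters above map naturally onto a single input object. The sketch below is a minimal illustration, not the Actor's documented schema: only `fromDate` and `toDate` are named in this description. Every other key (`query`, `section`, `tags`, `productionOffice`, `minStarRating`), the ISO date format, and the 0-5 rating scale are assumptions, so check the Actor's Input tab for the real names before relying on it.

```python
from datetime import date
from typing import Any, Optional


def quote_phrase(term: str) -> str:
    """Wrap multi-word terms in double quotes so they match as a phrase."""
    return f'"{term}"' if " " in term else term


def build_query(all_of: list[str], none_of: Optional[list[str]] = None) -> str:
    """Combine required terms with AND and excluded terms with NOT.

    Chaining the operators as "AND NOT" is an assumption about how the
    Actor parses the query string.
    """
    if not all_of:
        raise ValueError("at least one required term is needed")
    query = " AND ".join(quote_phrase(t) for t in all_of)
    for term in none_of or []:
        query += f" AND NOT {quote_phrase(term)}"
    return query


def build_input(
    query: str,
    section: Optional[str] = None,
    tags: Optional[list[str]] = None,
    from_date: Optional[date] = None,
    to_date: Optional[date] = None,
    production_office: Optional[str] = None,
    min_star_rating: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble a run input, dropping any filter that was not set."""
    if from_date and to_date and from_date > to_date:
        raise ValueError("from_date must not be later than to_date")
    if min_star_rating is not None and not 0 <= min_star_rating <= 5:
        raise ValueError("min_star_rating is assumed to be on a 0-5 scale")

    candidate = {
        "query": query,                          # hypothetical key
        "section": section,                      # hypothetical key
        "tags": tags,                            # hypothetical key
        "fromDate": from_date.isoformat() if from_date else None,
        "toDate": to_date.isoformat() if to_date else None,
        "productionOffice": production_office,   # hypothetical key
        "minStarRating": min_star_rating,        # hypothetical key
    }
    return {key: value for key, value in candidate.items() if value is not None}


if __name__ == "__main__":
    run_input = build_input(
        build_query(["climate change", "policy"], none_of=["opinion"]),
        tags=["environment/climate-change"],
        from_date=date(2020, 1, 1),
        to_date=date(2020, 12, 31),
    )
    print(run_input)
```

Dropping unset filters keeps the input small and leaves the Actor's own defaults in effect, which is usually safer than sending explicit nulls to a schema you have not verified.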

Each record includes the article ID, section, pillar, byline, contributors, full body text and HTML, image gallery URLs, word count, star rating (where applicable), and live-blog status.

💡 Why it matters: The Guardian is one of the most influential English-language newsrooms. Its archive is cited in NLP papers, media-bias studies, and journalism education. Building your own pipeline means parsing the article search response and reconstructing tag taxonomies. This Actor skips all of that.

📊 Data fields

Each record includes: blocks, bodyHtml, bodyText, byline, contributors, firstPublicationDate, headline, id, imageGallery, imageUrl, keywords, language, lastModified, liveBloggingNow, newspaperTags, pillarId, pillarName, productionOffice, publication, references, rights, sectionId, sectionName, series, snapshotTime, standfirst, starRating, tones, trailText, type, webPublicationDate, webTitle, webUrl, wordCount. These field names come straight from the actor's dataset schema, so what you see here is what lands in your dataset.
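
With these field names, downstream filtering takes only a few lines. As a rough sketch, the snippet below keeps reviews at or above a star threshold and counts articles per section. The sample records are invented for illustration, and the code assumes `starRating` is numeric (or a numeric string) on reviews and missing or null elsewhere, which this description does not confirm.

```python
from collections import Counter
from typing import Any

Record = dict[str, Any]


def top_reviews(records: list[Record], min_stars: float = 4) -> list[Record]:
    """Return records whose starRating is present and >= min_stars."""
    kept = []
    for record in records:
        rating = record.get("starRating")
        if rating is None:
            continue  # not a review, or rating not supplied
        try:
            stars = float(rating)
        except (TypeError, ValueError):
            continue  # skip values that are not numeric
        if stars >= min_stars:
            kept.append(record)
    return kept


def articles_per_section(records: list[Record]) -> Counter:
    """Count records by sectionName, grouping missing names as 'unknown'."""
    return Counter(record.get("sectionName") or "unknown" for record in records)


if __name__ == "__main__":
    # Made-up sample rows that reuse the documented field names.
    sample = [
        {"id": "film/a", "sectionName": "Film", "starRating": 5, "wordCount": 640},
        {"id": "film/b", "sectionName": "Film", "starRating": 2, "wordCount": 510},
        {"id": "world/c", "sectionName": "World news", "starRating": None},
    ]
    print([r["id"] for r in top_reviews(sample)])
    print(articles_per_section(sample))
```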

🚀 How to use

📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
🌐 Open the Actor. Go to the Guardian Article Search Scraper page on the Apify Store.
🎯 Set input. Enter a search query, optionally pick a section, tag, date range, and star rating.
🚀 Run it. Click Start and let the Actor pull matching articles.
📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded archive: 3-5 minutes. No coding required.
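
For readers who would rather script the run than click through the Console, the sketch below prepares a request to Apify's `run-sync-get-dataset-items` endpoint using only the Python standard library. It is an illustration under stated assumptions: `username~actor-name` is a placeholder for the real Actor ID shown on the Store page, the `query` input key is hypothetical, and the token must come from your own Apify account.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

API_BASE = "https://api.apify.com/v2"


def build_run_request(
    actor_id: str, token: str, run_input: dict[str, Any], fmt: str = "json"
) -> urllib.request.Request:
    """Prepare (but do not send) a POST that runs an Actor synchronously
    and returns its dataset items in the requested format."""
    query = urllib.parse.urlencode({"format": fmt})
    url = (
        f"{API_BASE}/acts/{urllib.parse.quote(actor_id)}"
        f"/run-sync-get-dataset-items?{query}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def fetch_items(request: urllib.request.Request, timeout: float = 300) -> list[dict[str, Any]]:
    """Send the prepared request and decode the JSON array of records."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholder Actor ID and hypothetical input key; replace both.
    request = build_run_request("username~actor-name", "YOUR_APIFY_TOKEN", {"query": "climate"})
    print(request.get_method(), request.full_url)
    # items = fetch_items(request)  # uncomment once the ID, token, and input are real
```

Separating request construction from sending makes it easy to inspect the URL and payload before spending any credit on a run.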

🔗 Recommended Actors

🚦 TfL London Live Status Scraper - Live London transport status and disruptions
🌍 Carbon Intensity UK Scraper - UK grid carbon intensity in gCO2/kWh
🇬🇧 Hansard UK Debates Scraper - Search the UK Parliament debate record
📰 BBC News Search Scraper - Search the BBC news archive
📊 Federal Reserve H.15 Rates Scraper - U.S. Treasury yield-curve history

💡 Pro Tip: browse the complete ParseForge collection for more news and reference-data scrapers.

⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by Guardian News & Media or any of its affiliates. All trademarks mentioned are the property of their respective owners. Only publicly available content is collected.

🆘 Need Help?

If you hit a bug, have questions about setup, or need a scraper we haven't built yet, open our contact form or write to parseforge@protonmail.com. We also take on paid custom data projects.

For faster answers, join our Discord. It's the best place to get support and suggest new actors.

The Guardian Article Extractor

rambunctious_fingerprint/guardian-news-scraper

Casey Marsh

Guardian Scraper

chimerical_quicklime/guardian-scraper

Scrape The Guardian articles via the open Content API: title, section, byline, publication date, trail text, thumbnail, and URL. Filter by query or section. Built for news monitoring and media datasets.

Khrystyna Skotte

Guardian News Scraper

xtracto/guardian-scraper

Scrape full The Guardian articles with headline, body, authors, section, and tags. Supports `mode: latest` to get newest news via Guardian world RSS. HTTP-only.

Farhan Febrian Nauval

Guardian Singapore Reviews Scraper

hello.datawizards/Guardian-Singapore-Scraper

The Guardian Singapore Reviews Scraper extracts real customer reviews, ratings, and product insights from Guardian Singapore product pages in structured JSON. Ideal for market research, brand analysis, and consumer sentiment tracking with fast, accurate, and proxy-supported scraping.

datawizards

News Archive Scraper

quarterly_jingo/news-archive-scraper

Petey Boy

Internet Archive Search — Wayback Machine Advanced Query Tool

maged120/archive-org-advanced-search

Search the Internet Archive (archive.org) with full advanced filter support — date range, media type, language, subject, and more. Returns metadata from archived web pages, books, audio, and video.

Maged

News Article Scraper for Feeding LLM

proscraper/newsarticlescraper

Scrape news articles metadata to feed into LLM models. Returns article body, published date, article title, author etc.

Owais Nazir

Internet Archive Metadata Scraper — Bulk archive.org Export

logiover/internet-archive-metadata-scraper

Bulk-export item metadata from the Internet Archive (archive.org) by full-text query, collection, media type, creator, subject and date range. Extract identifier, title, creator, date, downloads, format, subject and more. Millions of items. No API key, no login.

Logiover

Internet Archive Items Scraper - archive.org Search by Query

gio21/archive-org-items-scraper

Search Internet Archive (archive.org) items: books, movies, audio, software, images, web archives, data. Returns title, creator, date, description, downloads, identifier, URLs. Free, no key. For research, content discovery, digital preservation.