Sydney Morning Herald Article Scraper

Pricing

from $5.00 / 1,000 results

Extract article metadata and visible intro content from smh.com.au. Full articles require a Nine subscription. No browser needed - HTTP-only.

Rating

0.0 (0)

Developer

Xtractoo

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 2 days ago

Extract article metadata, author, and available text from any smh.com.au article URL. The Sydney Morning Herald is Australia's oldest continuously published newspaper, covering national politics, business, culture, and international affairs.

Why Use This Actor?

  • Australian news monitoring - track Australia's most prominent broadsheet.
  • Asia-Pacific perspective - strong coverage of Australia, New Zealand, and Pacific region news.
  • Metadata and headline extraction - author, title, date, and intro paragraphs always available even under the paywall.

How It Works

This actor uses only HTTP requests - no browser, no Selenium, no Playwright. Articles are extracted in seconds with RAM usage well under 256 MB.

Note: SMH uses a Piano paywall (Nine platform). Only the intro paragraphs (typically 2–4) are freely visible. Author, title, description, and publication date are always extracted from structured metadata.
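
The idea of reading fields from structured metadata over plain HTTP can be illustrated with a minimal sketch. It assumes the page exposes Open Graph style `<meta>` tags (`og:title`, `og:description`, `og:image`, `article:published_time`); the actor's real extraction logic is not documented here, and the sample HTML below is hypothetical, not real SMH markup.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect <meta property=...> / <meta name=...> values from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        key = attrs.get("property") or attrs.get("name")
        content = attrs.get("content")
        if key and content is not None:
            # Keep the first value seen for each key.
            self.meta.setdefault(key, content)


def extract_metadata(html: str) -> dict:
    """Map assumed Open Graph tags onto output-style field names."""
    parser = MetaTagParser()
    parser.feed(html)
    meta = parser.meta
    return {
        "title": meta.get("og:title"),
        "description": meta.get("og:description"),
        "image": meta.get("og:image"),
        "published_date": meta.get("article:published_time"),
    }


# Hypothetical sample page, not real SMH markup.
SAMPLE_HTML = """
<html><head>
<meta property="og:title" content="Example headline">
<meta property="og:description" content="Example summary.">
<meta property="article:published_time" content="2026-05-15T17:00:00Z">
</head><body><p>Intro paragraph.</p></body></html>
"""

print(extract_metadata(SAMPLE_HTML))
```

Because only the standard library is involved, a sketch like this runs without a browser, which is in the spirit of the HTTP-only approach described above.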

Input

{
  "url": "https://www.smh.com.au/national/nsw/principal-teachers-accused-of-assaulting-multiple-school-boys-20260421-p5zprg.html",
  "urls": [
    "https://www.smh.com.au/national/article-one.html",
    "https://www.smh.com.au/business/article-two.html"
  ],
  "mode": "article",
  "limit": 10
}

Output

{
  "url": "https://www.smh.com.au/goodfood/perth-eating-out/the-sandwich-and-90s-westraliana-rule-this-tiny-tuart-hill-bakery-20260515-p5zxln.html?ref=rss&utm_medium=rss&utm_source=rss_feed",
  "source": "Sydney Morning Herald",
  "title": "The sandwich and 90s Westraliana rule this tiny Tuart Hill bakery",
  "description": "Come for nostalgic house-baked treats: stay for the pitch-perfect homage to the suburban lunch bar and corner deli.",
  "content": "Strollio’s Luncheonette The life of a sourdough starter is a precarious one. One moment your owner is feeding you daily and (over)sharing stories of you online. The next you’re a mystery jar in the back of the fridge. Of all the calamities the sourdough community has faced, few have been as disastrous as the Great Perth Starter Dry-up of 2020: an extinction event triggered by the easing of COVID restrictions and the reopening of restaurants, cafés and bars almost exactly six years ago to the day....",
  "image": "https://static.ffx.io/images/$zoom_0.1175%2C$multiply_0.7025%2C$ratio_1.777778%2C$width_1059%2C$x_0%2C$y_141/t_crop_custom/q_86%2Cf_auto/28f25ee04bb674768533fe6bc1d74827bc176fe15004614999326c95ba877a0e",
  "language": "en_AU",
  "word_count": 664,
  "published_date": "2026-05-15T17:00:00Z",
  "modified_date": "2026-05-15T17:00:00Z",
  "authors": [
    "Max Veenhuyzen"
  ],
  "categories": "",
  "tags": ""
}
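
When post-processing records like the one above, it can help to strip the RSS tracking parameters from `url` and parse `published_date` into a timezone-aware value. The following is a minimal sketch using only the standard library; the helper names `canonical_url` and `parse_iso_utc` are hypothetical, and it assumes the query string carries tracking parameters only.

```python
from datetime import datetime
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Drop the query string and fragment (e.g. ?ref=rss&utm_source=...)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def parse_iso_utc(value: str) -> datetime:
    """Parse a timestamp such as 2026-05-15T17:00:00Z into an aware datetime."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


# Two fields copied from the sample output record.
record = {
    "url": "https://www.smh.com.au/goodfood/perth-eating-out/the-sandwich-and-90s-westraliana-rule-this-tiny-tuart-hill-bakery-20260515-p5zxln.html?ref=rss&utm_medium=rss&utm_source=rss_feed",
    "published_date": "2026-05-15T17:00:00Z",
}

clean = canonical_url(record["url"])
published = parse_iso_utc(record["published_date"])
print(clean)
print(published.isoformat())
```

Using the canonical URL as a key makes it easier to deduplicate the same article when it arrives once via the RSS feed and once via a direct link.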

Fetch Latest News

Set mode to "latest" to fetch the newest article URLs and titles from SMH instead of extracting a single article.

Input:

{
  "mode": "latest",
  "limit": 10
}

Output - array of objects:

[
  {
    "url": "https://www.smh.com.au/business/companies/warner-bros-shareholders-approve-us110-billion-paramount-mega-merger-20260424-p5zqnk.html",
    "title": "Warner Bros shareholders approve $US110 billion Paramount mega-merger",
    "published_date": "Fri, 24 Apr 2026 01:18:46 +1000",
    "source": "Sydney Morning Herald"
  }
]

Source: https://www.smh.com.au/rss/feed.xml (RSS feed)
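
In the samples shown, "latest" mode returns `published_date` in the RSS date format, while "article" mode returns an ISO 8601 UTC timestamp. If both are stored in one dataset, the RSS value can be normalized with the standard library. This is a small sketch; the function name `rss_date_to_iso_utc` is hypothetical.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def rss_date_to_iso_utc(value: str) -> str:
    """Convert an RSS (RFC 2822 style) date into an ISO 8601 UTC string."""
    dt = parsedate_to_datetime(value)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Date copied from the "latest" sample output.
print(rss_date_to_iso_utc("Fri, 24 Apr 2026 01:18:46 +1000"))
# → 2026-04-23T15:18:46Z
```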

Cron Schedule: Auto-Fetch Newest Articles

Combine mode: "latest" and mode: "article" to keep a fresh feed running on autopilot:

  1. Schedule a recurring run of this Actor with {"mode": "latest", "limit": 20} via Apify Schedules (UI ▸ Schedules ▸ Create new). A cron expression like */30 * * * * runs it every 30 minutes.
  2. Use the "Actor finished" webhook to pass the dataset of the latest run into another Actor run with mode: "article" and the new URLs as input. Apify integrations let you chain runs this way without any glue code.
  3. The article-mode run extracts the available body text (intro paragraphs only for paywalled articles), image, authors, and metadata for each URL and appends them to your master dataset.
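
If you do add a small piece of glue between the two runs, for example to skip URLs you have already processed, the hand-off in the steps above could look roughly like this. It is a sketch only: `build_article_input` and the `seen_urls` set are hypothetical, and the only fields taken from this page are `mode`, `urls`, and the `url` key of each "latest" item.

```python
def build_article_input(latest_items, seen_urls):
    """Turn "latest" results into an article-mode input, skipping known URLs.

    latest_items: list of dicts shaped like the "latest" output above.
    seen_urls: set of URLs already processed; updated in place.
    Returns None when there is nothing new to fetch.
    """
    new_urls = []
    for item in latest_items:
        url = item.get("url")
        if url and url not in seen_urls:
            seen_urls.add(url)
            new_urls.append(url)
    if not new_urls:
        return None
    return {"mode": "article", "urls": new_urls}


# Placeholder URLs reused from the Input example on this page.
latest = [
    {"url": "https://www.smh.com.au/national/article-one.html", "title": "One"},
    {"url": "https://www.smh.com.au/business/article-two.html", "title": "Two"},
]
seen = {"https://www.smh.com.au/national/article-one.html"}
print(build_article_input(latest, seen))
```

Returning None when no new URLs are found lets the caller skip starting an article-mode run that would produce no results.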

Common cron expressions:

| Frequency | Cron |
| --- | --- |
| Every 15 minutes | */15 * * * * |
| Hourly | 0 * * * * |
| Every 6 hours | 0 */6 * * * |
| Daily at 06:00 UTC | 0 6 * * * |
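
To see why these expressions mean what they do, it can help to expand a single cron field into the values it matches. The sketch below handles only the three forms used above (`*`, `*/n`, and a single number); it is an illustration, not a full cron parser, and `expand_cron_field` is a hypothetical helper.

```python
def expand_cron_field(field: str, low: int, high: int) -> list[int]:
    """Expand one cron field ('*', '*/n', or a single number) into its values."""
    if field == "*":
        return list(range(low, high + 1))
    if field.startswith("*/"):
        step = int(field[2:])
        return list(range(low, high + 1, step))
    return [int(field)]


# Minute field of "*/15 * * * *" (minutes range from 0 to 59).
print(expand_cron_field("*/15", 0, 59))  # → [0, 15, 30, 45]
# Hour field of "0 */6 * * *" (hours range from 0 to 23).
print(expand_cron_field("*/6", 0, 23))   # → [0, 6, 12, 18]
```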

Other News Actors

Need a different news source? All actors in this collection:

| Actor | Source |
| --- | --- |
| aljazeera-scraper | Al Jazeera |
| apnews-scraper | AP News |
| bbc-scraper | BBC News |
| bisnis-scraper | Bisnis Indonesia |
| cnbc-scraper | CNBC |
| dataindonesia-scraper | Data Indonesia |
| forbes-scraper | Forbes |
| fortune-scraper | Fortune |
| ft-scraper | Financial Times |
| guardian-scraper | The Guardian |
| investors-scraper | Investor's Business Daily |
| msn-scraper | MSN News |
| nytimes-scraper | New York Times |
| reuters-scraper | Reuters |
| scmp-scraper | South China Morning Post |
| smh-scraper | Sydney Morning Herald |
| straitstimes-scraper | The Straits Times |
| techcrunch-scraper | TechCrunch |
| thestar-scraper | The Star (Malaysia) |
| upi-scraper | UPI |
| yahoo-finance-scraper | Yahoo Finance |