
Aljazeera Scraper

Pricing: from $1.00 / 1,000 listing results


Scrapes news articles from Al Jazeera, extracting titles, excerpts, dates, authors, tags, geographic regions, full body text, image credits, and audio URLs from section pages or direct article URLs. Ideal for media monitoring, news aggregation, and content analysis.


Rating: 0.0 (0 ratings)

Developer: FalconScrape (maintained by Community)

Actor stats

  • Bookmarked: 1
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 3 days ago


Al Jazeera Scraper

Extract news articles from Al Jazeera with structured data. Get article metadata from section pages, or full enriched content including body text, tags, geographic regions, and related articles.

Features

  • Two output modes: lean listings (metadata only, no per-article requests) or enriched articles (full body text, tags, regions, related articles)
  • Section page scraping: Provide any Al Jazeera section URL to get 20-60 articles per page
  • Direct article URLs: Skip discovery and scrape specific articles directly
  • Rich structured data: Authors, tags, categories, geographic regions, audio URLs, image credits
  • No browser needed: plain HTTP requests parsed with Cheerio, which keeps runs fast and cost-efficient
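Al Jazeera pages embed their data as a server-rendered Apollo GraphQL cache, so a plain HTTP fetch plus a text parser is enough to read it. The Python sketch below illustrates the idea with the standard library only. It assumes the cache is assigned to a global named `window.__APOLLO_STATE__` and that article entries use `Post:`-prefixed keys; both names are hypothetical (the real page layout may differ), and the Actor itself uses Cheerio, not this code.

```python
import json
import re

# Hypothetical marker: assumes the page assigns its Apollo cache to a global
# named window.__APOLLO_STATE__ inside a <script> tag. The real variable name
# and layout on aljazeera.com may differ.
STATE_PATTERN = re.compile(
    r"window\.__APOLLO_STATE__\s*=\s*(\{.*?\})\s*;?\s*</script>", re.DOTALL
)


def extract_apollo_state(html: str) -> dict:
    """Return the embedded cache as a dict, or an empty dict if none is found."""
    match = STATE_PATTERN.search(html)
    if match is None:
        return {}
    return json.loads(match.group(1))


def articles_from_state(state: dict) -> list:
    """Collect cache entries whose keys look like articles.

    Apollo normalizes entries as "<typename>:<id>"; the "Post:" prefix used
    here is an assumption, not a documented key scheme for this site.
    """
    return [
        value
        for key, value in state.items()
        if key.startswith("Post:") and isinstance(value, dict)
    ]
```

Pulling the JSON straight out of the HTML avoids rendering the page, which is the same reason the Actor can skip a headless browser.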

How it works

  1. Provide section URLs (e.g. https://www.aljazeera.com/news/) and/or direct article URLs
  2. The scraper extracts article data from Al Jazeera's server-rendered Apollo GraphQL cache
  3. With enrichArticles enabled (the default), each article page is visited for full body text, tags, and regions
  4. With enrichArticles disabled, only metadata from the section page is returned (no extra requests)
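The lean/enriched branching in the steps above can be sketched as follows. This is an illustration, not the Actor's code: `fetch_article` is a hypothetical callback standing in for the one extra HTTP request per article page, and `max_articles` mirrors the `maxArticles` input.

```python
from typing import Callable, Iterable

# Hypothetical signature: fetch_article(url) performs one HTTP request for an
# article page and returns the enriched fields (bodyText, tags, where, ...).
FetchArticle = Callable[[str], dict]


def process_listings(
    listings: Iterable[dict],
    enrich_articles: bool,
    max_articles: int,
    fetch_article: FetchArticle,
) -> list:
    """Turn section-page listings into output items in either mode."""
    results = []
    for listing in listings:
        if len(results) >= max_articles:
            break  # maxArticles caps the total number of processed articles
        if enrich_articles:
            # Enriched mode: one extra request per article page.
            details = fetch_article(listing["url"])
            results.append({**listing, **details, "type": "article"})
        else:
            # Lean mode: reuse section-page metadata only, no extra requests.
            results.append({**listing, "type": "listing"})
    return results
```

Passing the fetcher in as a parameter keeps the branching testable without network access and makes the request count per mode explicit.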

Input

| Field | Type | Default | Description |
|---|---|---|---|
| sectionUrls | string[] | (none) | Al Jazeera section URLs to scrape (e.g. /middle-east/, /news/, /opinion/) |
| articleUrls | string[] | (none) | Specific article URLs to scrape directly |
| enrichArticles | boolean | true | Visit each article page for full body text, tags, and regions |
| maxArticles | integer | 100 | Maximum total articles to process |
| proxyConfiguration | object | No proxy | Optional proxy settings (datacenter proxies are sufficient) |
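A minimal sketch of building the input from the table above and starting a run with the official `apify-client` Python package. The actor ID is a placeholder (copy the real one from the Store page), the `APIFY_TOKEN` environment variable is an assumption, and `{"useApifyProxy": True}` is the common Apify proxy input shape, assumed rather than confirmed to be what this Actor expects.

```python
import os

# Placeholder: replace with the Actor ID shown on its Apify Store page.
ACTOR_ID = "<username>/<actor-name>"


def build_run_input(section_urls=None, article_urls=None, enrich=True,
                    max_articles=100, use_proxy=False) -> dict:
    """Map Python arguments onto the input fields listed in the table."""
    run_input = {
        "sectionUrls": list(section_urls or []),
        "articleUrls": list(article_urls or []),
        "enrichArticles": enrich,
        "maxArticles": max_articles,
    }
    if use_proxy:
        # Assumed shape: the usual Apify proxy settings object.
        run_input["proxyConfiguration"] = {"useApifyProxy": True}
    return run_input


def run_actor(run_input: dict) -> list:
    """Start a run, wait for it to finish, and return its dataset items."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

For example, `run_actor(build_run_input(section_urls=["https://www.aljazeera.com/news/"], enrich=False))` would request lean listings for one section.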

Sample output

Enriched article (enrichArticles: true)

```json
{
  "type": "article",
  "url": "https://www.aljazeera.com/news/2026/3/7/hezbollah-israeli-troops-clash-on-the-ground",
  "title": "Hezbollah, Israeli troops clash on the ground in eastern Lebanon's Bekaa",
  "excerpt": "Lebanese Health Ministry says at least 16 people killed, 35 wounded...",
  "subheading": "Lebanese Health Ministry says at least 16 people killed...",
  "date": "2026-03-07T07:35:45",
  "modifiedDate": "2026-03-07T07:51:36",
  "articleType": "post",
  "author": [
    {
      "name": "Al Jazeera Staff",
      "url": "https://www.aljazeera.com/author/al_jazeera_staff",
      "avatarUrl": "https://www.aljazeera.com/wp-content/uploads/2024/05/...",
      "jobTitle": ""
    }
  ],
  "source": ["Al Jazeera"],
  "tags": ["Israel attacks Lebanon"],
  "categories": ["News"],
  "where": ["Iran", "Israel", "Lebanon", "Middle East"],
  "shortUrl": "https://aje.news/cd9pbr",
  "featuredImageUrl": "https://www.aljazeera.com/wp-content/uploads/2026/03/...",
  "featuredImageCaption": "Lebanese civil defence inspect the destruction...",
  "featuredImageCredit": "AFP",
  "bodyHtml": "<p>Clashes have erupted as Israeli forces attempted...</p>",
  "bodyText": "Clashes have erupted as Israeli forces attempted...",
  "wordCount": 1247,
  "audioPlaybackUrl": "https://tts.aljazeera.net/aje/2026/03/...",
  "audioDuration": "237.505306",
  "relatedArticles": [
    {
      "title": "Why has Hezbollah joined Middle East war?",
      "url": "https://www.aljazeera.com/video/inside-story/2026/3/5/..."
    }
  ],
  "videoId": "",
  "isBreaking": false,
  "isLive": false,
  "isDeveloping": false
}
```
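Some enriched fields need light conversion before analysis: `date` has no UTC offset in the sample, and `audioDuration` is a string of seconds. A small sketch of helpers for the sample record; treating the timestamp as a naive datetime is an assumption, since the source timezone is not documented.

```python
from datetime import datetime
from typing import Optional


def published_at(item: dict) -> datetime:
    # The sample timestamps carry no UTC offset, so this returns a naive
    # datetime; the timezone they are expressed in is not documented.
    return datetime.fromisoformat(item["date"])


def audio_seconds(item: dict) -> Optional[float]:
    # audioDuration is a string in the sample output, e.g. "237.505306".
    raw = item.get("audioDuration")
    return float(raw) if raw else None


def mentions_region(item: dict, region: str) -> bool:
    # "where" lists geographic regions; lean listings do not include it.
    return region in item.get("where", [])
```

Using `.get` with defaults lets the same helpers run over mixed datasets without raising on lean listings.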

Lean listing (enrichArticles: false)

```json
{
  "type": "listing",
  "url": "https://www.aljazeera.com/news/2026/3/7/hezbollah-israeli-troops-clash-on-the-ground",
  "title": "Hezbollah, Israeli troops clash on the ground in eastern Lebanon's Bekaa",
  "excerpt": "Lebanese Health Ministry says at least 16 people killed...",
  "date": "2026-03-07T07:35:45",
  "articleType": "post",
  "author": ["Al Jazeera Staff"],
  "source": ["Al Jazeera"],
  "shortUrl": "https://aje.news/cd9pbr",
  "featuredImageUrl": "https://www.aljazeera.com/wp-content/uploads/2026/03/...",
  "audioPlaybackUrl": "https://tts.aljazeera.net/aje/2026/03/...",
  "discoverySource": "section",
  "sectionUrl": "https://www.aljazeera.com/middle-east/"
}
```
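The two sample records differ in the `author` field: enriched articles hold author objects, while lean listings hold plain name strings. A sketch of a helper that reads both shapes, keyed on the shapes shown in the samples above:

```python
def is_enriched(item: dict) -> bool:
    # The samples use "article" for enriched items and "listing" for lean ones.
    return item.get("type") == "article"


def author_names(item: dict) -> list:
    """Return author names from either output shape."""
    names = []
    for author in item.get("author", []):
        if isinstance(author, dict):
            # Enriched: {"name": ..., "url": ..., "avatarUrl": ..., "jobTitle": ...}
            names.append(author.get("name", ""))
        else:
            # Lean: a plain string such as "Al Jazeera Staff"
            names.append(str(author))
    return names
```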

Available sections

Use any of these as sectionUrls:

  • https://www.aljazeera.com/news/
  • https://www.aljazeera.com/middle-east/
  • https://www.aljazeera.com/opinion/
  • https://www.aljazeera.com/sport/
  • https://www.aljazeera.com/economy/
  • https://www.aljazeera.com/science-and-technology/
  • https://www.aljazeera.com/features/
  • https://www.aljazeera.com/africa/
  • https://www.aljazeera.com/explained/
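Section URLs like these go in `sectionUrls`, while individual stories go in `articleUrls`. If you start from a mixed list, a heuristic like the one below can sort them. It is inferred from the sample URLs only (article paths contain a `/YYYY/M/D/` segment, section paths do not) and assumes the English site host `www.aljazeera.com`; both are assumptions, not documented rules.

```python
import re
from urllib.parse import urlparse

# Heuristic from the sample URLs: article paths contain /YYYY/M/D/.
DATE_SEGMENT = re.compile(r"/\d{4}/\d{1,2}/\d{1,2}/")


def split_urls(urls: list) -> dict:
    """Split URLs into the sectionUrls and articleUrls input fields."""
    sections, articles = [], []
    for url in urls:
        parsed = urlparse(url)
        if parsed.netloc != "www.aljazeera.com":
            raise ValueError(f"not an Al Jazeera URL: {url}")
        if DATE_SEGMENT.search(parsed.path):
            articles.append(url)
        else:
            sections.append(url)
    return {"sectionUrls": sections, "articleUrls": articles}
```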

Pricing

This Actor uses a Pay Per Event pricing model.

| Event | Price per event | Per 1,000 | What you get |
|---|---|---|---|
| Lean listing | $0.001 | $1.00 | Title, excerpt, URL, date, author, type, image, short URL |
| Enriched article | $0.004 | $4.00 | Full body text, tags, categories, geographic regions, related articles, image credits, audio URL |
| Actor start | $0.00005 per GB | n/a | One-time charge per run based on memory allocation |

Cost examples

| Scenario | Requests | Est. cost |
|---|---|---|
| 1 section, lean only | 1 | ~$0.03 |
| 1 section, enriched | ~30 | ~$0.12 |
| 3 sections, enriched | ~83 | ~$0.32 |
| 10 direct articles, enriched | 10 | ~$0.04 |
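The cost examples follow from the per-event prices: each result is charged once, at the lean or enriched rate, plus a small start fee that scales with memory. A sketch of that arithmetic; the 1 GB memory default is an assumption (the run's actual memory setting determines the start fee), and roughly 30 articles per section is what the examples imply rather than a guarantee.

```python
LEAN_LISTING_USD = 0.001       # per lean listing
ENRICHED_ARTICLE_USD = 0.004   # per enriched article
START_USD_PER_GB = 0.00005     # one-time, per GB of run memory


def estimate_cost(articles: int, enriched: bool, memory_gb: float = 1.0) -> float:
    """Estimated USD for one run (memory_gb default is an assumption)."""
    per_item = ENRICHED_ARTICLE_USD if enriched else LEAN_LISTING_USD
    return articles * per_item + memory_gb * START_USD_PER_GB
```

For instance, 3 sections at about 30 articles each but capped near 80 articles gives `estimate_cost(80, True)`, about $0.32, matching the table; the start fee only matters at the fourth decimal place.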