Head News AI

Pricing

from $2.00 / 1,000 results

Scrape 500+ news & financial portals in 106 countries. 79 financial sources (Bloomberg, CNBC, CoinDesk). AI topic classification. Stock market, crypto, corporate intelligence.

Developer

Rodrigo Pacelli

Maintained by Community

Scrape headlines from 500+ news portals across 106 countries in seconds. Extract titles, links, descriptions, and images. Optionally classify each article into customizable news topics using AI.

Includes 79 financial & business portals — Bloomberg, Yahoo Finance, CNBC, Forbes, Seeking Alpha, CoinDesk, and more — ideal for stock market monitoring, crypto tracking, and corporate intelligence.

Why use this Actor?

  • 500+ pre-configured news portals with manually audited CSS selectors
  • 106 countries covered: USA, Brazil, UK, India, Germany, France, Japan, Australia, Canada, Mexico, and many more
  • 79 financial portals: Bloomberg, Reuters, CNBC, WSJ, Forbes, Seeking Alpha, CoinDesk, Nikkei, and more
  • Ultra-fast: Uses Cheerio + Axios (pure HTTP, no browser) — ~2-3 seconds per portal vs ~30s with Playwright
  • Parallel scraping: 5 URLs at a time for faster batch processing
  • Optional AI classification with customizable topics and model selection via Groq (free API key)
  • Smart cache: Articles already classified are cached between runs — no duplicate AI calls, saving time and tokens
  • Zero configuration: Works out-of-the-box with any supported portal
  • No URL formatting required: You can enter edition.cnn.com or https://edition.cnn.com — both work
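The last bullet says both edition.cnn.com and https://edition.cnn.com are accepted. A minimal Python sketch of that kind of normalization is shown here for illustration only; `normalize_portal_url` is a hypothetical helper, not the Actor's own code (the Actor runs on Node.js), and stripping a trailing slash is an assumption.

```python
from urllib.parse import urlsplit


def normalize_portal_url(raw: str) -> str:
    """Return an absolute URL for a portal entered with or without a scheme."""
    candidate = raw.strip()
    if "://" not in candidate:
        # Assumption: bare domains default to https
        candidate = "https://" + candidate
    if not urlsplit(candidate).netloc:
        raise ValueError(f"not a valid portal URL: {raw!r}")
    # Assumption: a trailing slash is dropped so equal URLs compare equal
    return candidate.rstrip("/")


print(normalize_portal_url("edition.cnn.com"))
print(normalize_portal_url("https://edition.cnn.com"))
```

Both calls print the same URL, which is what lets later steps treat the two spellings as one portal.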

How it works

  1. You provide a list of news portal URLs
  2. The Actor fetches each page via HTTP and parses it with Cheerio
  3. Pre-configured CSS selectors extract all headlines, links, descriptions, and images from the page
  4. (Optional) Each new article is classified into topics using Groq AI — previously classified articles are loaded from cache
  5. Results are saved to the Apify dataset as structured JSON
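Steps 1 and 2, together with the five-at-a-time batching mentioned earlier, can be pictured with a bounded worker pool. This is a Python sketch of the idea under stated assumptions, not the Actor's Node.js implementation: `scrape_all` and `fake_fetch` are hypothetical names, and `fake_fetch` returns canned HTML instead of making a real HTTP request.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

MAX_PARALLEL = 5  # the description states 5 URLs are scraped at a time


def scrape_all(urls: list[str], fetch: Callable[[str], str]) -> dict[str, str]:
    """Fetch every URL with at most MAX_PARALLEL requests in flight."""
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
        # pool.map preserves input order, so pages line up with urls
        pages = list(pool.map(fetch, urls))
    return dict(zip(urls, pages))


def fake_fetch(url: str) -> str:
    # Hypothetical stand-in for an HTTP GET; swap in a real client if needed
    return f"<html><title>{url}</title></html>"


pages = scrape_all(["https://edition.cnn.com", "https://www.bbc.com"], fake_fetch)
print(len(pages))
```

Passing the fetch function in as a parameter keeps the concurrency logic testable without network access.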

Important: This Actor scrapes only the exact URL you provide. It does not crawl or follow links to subpages. For example, if you pass https://edition.cnn.com, it will extract only the headlines visible on CNN's homepage — it will not navigate into https://edition.cnn.com/sport or any other section automatically. To scrape specific sections, add each section URL separately in the urls list.

Input parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| urls | string[] | required | List of news portal URLs to scrape. The https:// prefix is optional |
| includeImages | boolean | true | Extract article thumbnail images |
| classifyWithAI | boolean | false | Enable AI topic classification (requires Groq API key) |
| groqApiKey | string | (none) | Your Groq API key (free at console.groq.com) |
| groqModel | string | llama-3.3-70b-versatile | Groq AI model, selected from the dropdown |
| topics | string[] | (25 default topics) | Custom list of topics for classification. If empty, the default 25 topics are used |
| debug | boolean | false | Enable verbose logging |
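When the input is assembled programmatically, a small builder can catch the one dependency the table describes: classification needs a Groq key. The following Python sketch is illustrative only; `build_input` is a hypothetical helper, and the choice to omit the AI fields when classification is off is an assumption rather than a requirement of the Actor.

```python
import json


def build_input(urls, classify_with_ai=False, groq_api_key=None,
                groq_model="llama-3.3-70b-versatile", topics=None,
                include_images=True, debug=False):
    """Build the Actor input dict using the parameter names from the table."""
    if not urls:
        raise ValueError("urls is required and must not be empty")
    if classify_with_ai and not groq_api_key:
        raise ValueError("classifyWithAI requires groqApiKey")
    run_input = {
        "urls": list(urls),
        "includeImages": include_images,
        "classifyWithAI": classify_with_ai,
        "debug": debug,
    }
    if classify_with_ai:
        run_input["groqApiKey"] = groq_api_key
        run_input["groqModel"] = groq_model
        if topics:  # an empty list falls back to the default topics
            run_input["topics"] = list(topics)
    return run_input


print(json.dumps(build_input(["edition.cnn.com", "bbc.com"]), indent=2))
```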

Output format

Each article is saved as an individual item in the dataset:

{
  "title": "Breaking: Major policy change announced",
  "url": "https://example.com/article/123",
  "description": "Government officials revealed new regulations...",
  "urlToImage": "https://example.com/images/thumb.jpg",
  "source": "example.com",
  "publishedAt": "2025-01-15T10:30:00.000Z",
  "scrapedAt": "2025-01-15T10:35:12.000Z",
  "topics": ["politics", "economy"]
}

Article fields

| Field | Type | Description |
|---|---|---|
| title | string | Article headline |
| url | string | Full article link |
| description | string | Article summary/subtitle |
| urlToImage | string | Thumbnail image URL (if available) |
| source | string | Source domain |
| publishedAt | string | Publication timestamp (ISO 8601) |
| scrapedAt | string | Scraping timestamp (ISO 8601) |
| topics | string[] | AI-assigned topics (only if classification is enabled) |
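For downstream processing it can help to turn each dataset item into a typed record with real timestamps. The Python sketch below uses the sample item shown earlier; `Article`, `parse_iso`, and `article_from_item` are hypothetical names, and treating description, urlToImage, publishedAt, and topics as optional is an assumption based on the "if available" and "only if classification is enabled" notes.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


def parse_iso(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp; a trailing 'Z' is rewritten for older Pythons."""
    if not value:
        return None
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass
class Article:
    title: str
    url: str
    source: str
    scraped_at: Optional[datetime]
    published_at: Optional[datetime] = None
    description: Optional[str] = None
    image_url: Optional[str] = None
    topics: list = field(default_factory=list)


def article_from_item(item: dict) -> Article:
    return Article(
        title=item["title"],
        url=item["url"],
        source=item["source"],
        scraped_at=parse_iso(item.get("scrapedAt")),
        published_at=parse_iso(item.get("publishedAt")),
        description=item.get("description"),
        image_url=item.get("urlToImage"),
        topics=list(item.get("topics", [])),
    )


item = {
    "title": "Breaking: Major policy change announced",
    "url": "https://example.com/article/123",
    "source": "example.com",
    "publishedAt": "2025-01-15T10:30:00.000Z",
    "scrapedAt": "2025-01-15T10:35:12.000Z",
    "topics": ["politics", "economy"],
}
article = article_from_item(item)
print(article.source, article.published_at.isoformat())
```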

Usage examples

Basic: Extract headlines from the homepage of top news sites

{
  "urls": [
    "https://edition.cnn.com",
    "https://www.bbc.com",
    "https://www.reuters.com"
  ]
}

Scrape specific sections of a news site

The Actor only scrapes the exact URL you provide. To cover multiple sections, add each one separately:

{
  "urls": [
    "https://edition.cnn.com",
    "https://edition.cnn.com/sport",
    "https://edition.cnn.com/business",
    "https://edition.cnn.com/entertainment",
    "https://edition.cnn.com/health"
  ]
}

With AI topic classification (default 25 topics)

{
  "urls": [
    "https://edition.cnn.com",
    "https://www.nytimes.com",
    "https://www.theguardian.com"
  ],
  "classifyWithAI": true,
  "groqApiKey": "gsk_your_groq_api_key_here"
}

With custom topics and model

You can define your own topics and choose a different Groq model from the dropdown:

{
  "urls": [
    "https://edition.cnn.com",
    "https://www.bbc.com"
  ],
  "classifyWithAI": true,
  "groqApiKey": "gsk_your_groq_api_key_here",
  "groqModel": "qwen/qwen3-32b",
  "topics": ["finance", "geopolitics", "climate", "ai_tech", "healthcare", "conflict"]
}

Scrape Brazilian news portals

{
  "urls": [
    "https://www.globo.com",
    "https://www.uol.com.br",
    "https://www.folha.uol.com.br",
    "https://www.estadao.com.br"
  ]
}

Scrape with images disabled (faster)

{
  "urls": [
    "https://www.aljazeera.com",
    "https://www.dw.com"
  ],
  "includeImages": false
}

Smart deduplication and classification cache

The Actor has built-in deduplication at two levels to avoid wasting AI tokens:

1. URL deduplication (within a run)

If you scrape multiple URLs from the same portal (e.g., CNN homepage + CNN/sport), articles that appear on both pages are automatically deduplicated by URL. Each article is extracted only once, so the AI never classifies the same article twice in a single run.
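The within-run behaviour described above amounts to keeping the first article seen for each URL. A minimal Python sketch, assuming articles are dicts with a "url" key as in the output format; `dedupe_by_url` is a hypothetical name and not the Actor's own function.

```python
def dedupe_by_url(articles):
    """Keep the first occurrence of each article URL, preserving order."""
    seen = set()
    unique = []
    for article in articles:
        if article["url"] in seen:
            continue
        seen.add(article["url"])
        unique.append(article)
    return unique


homepage = [{"url": "https://edition.cnn.com/a", "title": "A"},
            {"url": "https://edition.cnn.com/b", "title": "B"}]
sport = [{"url": "https://edition.cnn.com/b", "title": "B"},
         {"url": "https://edition.cnn.com/c", "title": "C"}]
print([a["title"] for a in dedupe_by_url(homepage + sport)])
```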

2. Classification cache (between runs)

When AI classification is enabled, the Actor caches classified articles between runs using a persistent Key-Value Store. This means:

  • 1st run: Scrapes 77 articles from CNN, classifies all 77 with AI → ~58 seconds
  • 2nd run (10 minutes later): Scrapes CNN again, finds 77 articles but 74 are already in cache → classifies only the 3 new ones → ~3 seconds
  • 3rd run (1 hour later): Same portal, only brand-new headlines go to the AI

This saves time, money, and Groq API tokens — only truly new articles are sent to the AI. The cache is automatically cleaned after 7 days.

Example log from a cached run:

🤖 Classifying 3 new articles with AI (74 already cached)...
✅ Classification complete (3 new + 74 cached)
✅ Done! 77 articles in 3.1s
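The between-run cache can be modelled as a lookup table with a 7-day expiry. The Python sketch below uses an in-memory dict as a stand-in for the persistent Key-Value Store; `ClassificationCache` and `split_new_and_cached` are hypothetical names, and keying the cache by article URL is an assumption (the description does not say what the key is). The clock is injectable so the example runs instantly.

```python
import time

CACHE_TTL_SECONDS = 7 * 24 * 60 * 60  # entries are cleaned after 7 days


class ClassificationCache:
    """In-memory stand-in for the persistent store (assumed keyed by URL)."""

    def __init__(self, now=time.time):
        self._now = now
        self._entries = {}  # url -> (stored_at, topics)

    def get(self, url):
        entry = self._entries.get(url)
        if entry is None:
            return None
        stored_at, topics = entry
        if self._now() - stored_at > CACHE_TTL_SECONDS:
            del self._entries[url]  # expired, so treat as a miss
            return None
        return topics

    def put(self, url, topics):
        self._entries[url] = (self._now(), list(topics))


def split_new_and_cached(urls, cache):
    """Return (urls that still need the AI, {url: cached topics})."""
    new, cached = [], {}
    for url in urls:
        topics = cache.get(url)
        if topics is None:
            new.append(url)
        else:
            cached[url] = topics
    return new, cached


clock = [0.0]
cache = ClassificationCache(now=lambda: clock[0])
for i in range(74):
    cache.put(f"https://example.com/{i}", ["politics"])
second_run = [f"https://example.com/{i}" for i in range(77)]
new, cached = split_new_and_cached(second_run, cache)
print(len(new), len(cached))
```

With 74 of 77 URLs already stored, only the remaining 3 would be sent for classification, matching the second-run scenario above.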

Use cases

Stock market & financial intelligence

Monitor the 79 financial portals (Bloomberg, CNBC, WSJ, Seeking Alpha, Yahoo Finance, and others) for breaking market news. Use custom topics such as ["earnings", "ipo", "merger", "sec_filing", "downgrade", "upgrade"] to track events that move stock prices. Schedule recurring runs every 15 minutes to catch market-moving headlines early.

{
  "urls": [
    "bloomberg.com", "cnbc.com", "wsj.com", "seekingalpha.com",
    "benzinga.com", "marketwatch.com", "barrons.com", "finance.yahoo.com"
  ],
  "classifyWithAI": true,
  "groqApiKey": "gsk_your_key",
  "topics": ["earnings", "ipo", "merger", "acquisition", "sec_filing", "bankruptcy", "upgrade", "downgrade", "dividend"]
}
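Once a run with custom topics has finished, the dataset items can be narrowed to the events of interest. A small Python sketch, assuming items follow the output format with an optional "topics" list; `filter_by_topics` is a hypothetical helper and the sample items are invented for illustration.

```python
def filter_by_topics(items, wanted):
    """Return items whose topics overlap the wanted set."""
    wanted_set = set(wanted)
    return [item for item in items if wanted_set & set(item.get("topics", []))]


items = [
    {"title": "Company X beats estimates", "topics": ["earnings"]},
    {"title": "Company Y announces buyback", "topics": ["dividend"]},
    {"title": "Unclassified headline"},  # no topics when AI is disabled
]
for hit in filter_by_topics(items, ["earnings", "merger"]):
    print(hit["title"])
```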

Cryptocurrency & blockchain tracking

Track crypto news from CoinDesk, Cointelegraph, The Block, Decrypt, Blockworks, and Bitcoin Magazine. Classify by topic to filter signal from noise.

{
  "urls": [
    "coindesk.com", "cointelegraph.com", "theblock.co",
    "decrypt.co", "blockworks.co", "bitcoinmagazine.com", "cryptoslate.com"
  ],
  "classifyWithAI": true,
  "groqApiKey": "gsk_your_key",
  "topics": ["bitcoin", "ethereum", "defi", "regulation", "hack", "exchange", "nft", "stablecoin"]
}

Commodities & energy monitoring

Track oil, gas, and commodity news from specialized portals.

{
  "urls": [
    "oilprice.com", "rigzone.com", "bloomberg.com",
    "reuters.com", "ft.com", "tradingeconomics.com"
  ],
  "classifyWithAI": true,
  "groqApiKey": "gsk_your_key",
  "topics": ["oil", "gas", "gold", "commodities", "opec", "sanctions", "supply_chain"]
}

Media monitoring

Track headlines from specific portals on a schedule. Set up a recurring run to scrape your target portals every 15 minutes and get a structured feed of new headlines with AI-classified topics.

Corporate intelligence & M&A

Monitor press releases from Business Wire, PR Newswire, and GlobeNewsWire alongside financial news to catch M&A announcements, earnings surprises, and SEC filings before they become mainstream news.

Academic research

Collect and classify news coverage across regions for media studies, political science, or communication research. Compare how different countries cover the same global events.

Content curation

Build automated news digests or newsletters. Scrape headlines from 10+ portals, classify by topic, and feed the structured output into your content pipeline or CMS.

Competitive analysis

Track news mentions across portals in your industry. Combine with custom topics relevant to your market to identify trends, competitor mentions, and emerging narratives.

Multi-language news aggregation

With 106 countries covered, aggregate news in English, Portuguese, Spanish, French, German, Arabic, Japanese, and many more languages from a single Actor.

AI classification topics

When classifyWithAI is enabled, each article is classified against the 25 default topics (or your custom topics):

Hard news: politics, economy, crime, war_conflict, terrorism, natural_disaster, accident, health, science

Soft news: sports, entertainment, celebrity, technology, business, education, environment, religion, lifestyle, travel, food, real_estate, automotive, opinion, weather

Meta: breaking_news

Each article receives 1-3 of the most relevant topics. You can replace these with your own custom topics.
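If you post-process model output yourself, it can be useful to enforce the same constraints: only known topics, at most three per article. The Python sketch below is an illustration, not the Actor's logic; `sanitize_topics` is a hypothetical helper, and lower-casing and de-duplicating labels are assumptions.

```python
DEFAULT_TOPICS = [
    # Hard news
    "politics", "economy", "crime", "war_conflict", "terrorism",
    "natural_disaster", "accident", "health", "science",
    # Soft news
    "sports", "entertainment", "celebrity", "technology", "business",
    "education", "environment", "religion", "lifestyle", "travel", "food",
    "real_estate", "automotive", "opinion", "weather",
    # Meta
    "breaking_news",
]


def sanitize_topics(raw, allowed=DEFAULT_TOPICS, limit=3):
    """Keep known topics only, drop repeats, and cap the list at `limit`."""
    allowed_set = set(allowed)
    cleaned = []
    for topic in raw:
        key = topic.strip().lower()
        if key in allowed_set and key not in cleaned:
            cleaned.append(key)
    return cleaned[:limit]


print(len(DEFAULT_TOPICS))
print(sanitize_topics(["Politics", "economy", "gossip", "economy", "health", "sports"]))
```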

Available Groq models

Select from the dropdown — no need to type model names:

| Model | Best for |
|---|---|
| Llama 3.3 70B Versatile (default) | Best accuracy, recommended |
| Llama 4 Scout 17B | Good balance of speed and accuracy |
| Qwen 3 32B | Strong multilingual support |
| Llama 3.1 8B Instant | Fastest, lower accuracy |
| Kimi K2 Instruct | Alternative general-purpose |
| GPT-OSS 120B | Large model (may return empty responses) |
| GPT-OSS 20B | Smaller GPT-OSS variant |
| Allam 2 7B | Arabic-optimized |

Getting a Groq API key

  1. Go to console.groq.com
  2. Create a free account
  3. Generate an API key
  4. Paste it in the groqApiKey input field

The free tier allows 30 requests/minute, which is sufficient for most scraping runs. The Actor automatically rate-limits AI requests to stay within this limit.
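One common way to stay under a 30 requests/minute limit is a sliding-window limiter. The Python sketch below shows that technique under stated assumptions; it is not a description of how the Actor throttles internally. `RateLimiter` and `FakeClock` are hypothetical names, and the fake clock replaces real sleeping so the example finishes immediately.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most max_calls within any rolling window of `period` seconds."""

    def __init__(self, max_calls=30, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._stamps = deque()  # timestamps of recent calls, oldest first

    def acquire(self):
        now = self._clock()
        # Forget calls that have left the window
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_calls:
            # Wait until the oldest call falls out of the window
            self._sleep(self.period - (now - self._stamps[0]))
            now = self._clock()
            self._stamps.popleft()
        self._stamps.append(now)


class FakeClock:
    """Test double: sleeping just advances the recorded time."""

    def __init__(self):
        self.t = 0.0
        self.slept = []

    def now(self):
        return self.t

    def sleep(self, seconds):
        self.slept.append(seconds)
        self.t += seconds


fake = FakeClock()
limiter = RateLimiter(clock=fake.now, sleep=fake.sleep)
for _ in range(31):
    limiter.acquire()
print(fake.slept)
```

In the demo the first 30 calls pass immediately and the 31st waits for the window to roll over.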

Financial & business portals

79 specialized financial portals for market intelligence:

| Category | Portals |
|---|---|
| Wire Services | bloomberg.com, businesswire.com, prnewswire.com, globenewswire.com |
| USA Finance | finance.yahoo.com, barrons.com, seekingalpha.com, benzinga.com, fool.com, zacks.com, thestreet.com, fortune.com, forbes.com, businessinsider.com, investopedia.com, morningstar.com, tipranks.com, kiplinger.com, investors.com |
| Europe Finance | economist.com, handelsblatt.com, lesechos.fr, latribune.fr, ilsole24ore.com, cincodias.elpais.com, expansion.com, borsen.dk, di.se, kauppalehti.fi, cityam.com, fd.nl |
| Asia Finance | asia.nikkei.com, caixinglobal.com, economictimes.indiatimes.com, livemint.com, business-standard.com, moneycontrol.com, theedgemarkets.com, koreaherald.com, financeasia.com, afr.com |
| LatAm Finance | valor.globo.com, infomoney.com.br, exame.com, bloomberglinea.com, elfinanciero.com.mx, eleconomista.com.mx, df.cl, portafolio.co, mercopress.com |
| MENA & Africa | arabianbusiness.com, gulfbusiness.com, zawya.com, menafn.com, businessday.ng, businessdailyafrica.com, kenyanwallstreet.com |
| Crypto / Fintech | coindesk.com, theblock.co, cointelegraph.com, decrypt.co, cryptoslate.com, blockworks.co, bitcoinmagazine.com |
| Sector-Specific | techcrunch.com, oilprice.com, rigzone.com, globest.com, insurancejournal.com, americanbanker.com, fiercepharma.com, freightwaves.com |
| Exchanges & Data | nasdaq.com, nyse.com, investing.com, tradingview.com, tradingeconomics.com, fxstreet.com, stockanalysis.com |

Supported countries and portals

The Actor supports 500+ news portals across 106 countries:

| Country | Portals |
|---|---|
| Andorra | altaveu.com, diariandorra.ad, elperiodic.ad |
| Antigua and Barbuda | antiguaobserver.com, antiguanewsroom.com, antigua.news |
| Argentina | infobae.com, lanacion.com.ar, clarin.com, pagina12.com.ar, ambito.com |
| Australia | miragenews.com, smh.com.au, theaustralian.com.au, news.com.au, abc.net.au, heraldsun.com.au |
| Austria | orf.at, krone.at, derstandard.at, heute.at |
| Bahamas | tribune242.com, thenassauguardian.com |
| Bahrain | gdnonline.com, akhbar-alkhaleej.com, newsofbahrain.com |
| Bangladesh | prothomalo.com, thedailystar.net, bdnews24.com |
| Barbados | nationnews.com, barbadostoday.bb, cbc.bb |
| Belgium | hln.be, nieuwsblad.be, rtbf.be, vrt.be |
| Belize | breakingbelizenews.com, 7newsbelize.com, channel5belize.com |
| Botswana | mmegi.bw, dailynews.gov.bw, guardiansun.co.bw, sundaystandard.info |
| Brazil | globo.com, g1.globo.com, uol.com.br, r7.com, estadao.com.br, folha.uol.com.br |
| Brunei | borneobulletin.com.bn, brudirect.com, rtbnews.rtb.gov.bn |
| Canada | cbc.ca, ctvnews.ca, globalnews.ca, theglobeandmail.com |
| Cape Verde | asemana.cv, inforpress.cv, rtc.cv |
| Chile | lacuarta.com, emol.com, biobiochile.cl, latercera.com |
| Colombia | canalrcn.com, eltiempo.com, semana.com, elespectador.com |
| Costa Rica | teletica.com, crhoy.com, nacion.com |
| Croatia | index.hr, jutarnji.hr, 24sata.hr, vecernji.hr |
| Cyprus | cyprus-mail.com, kathimerini.com.cy, philenews.com |
| Czech Republic | novinky.cz, idnes.cz, aktualne.cz, blesk.cz, seznamzpravy.cz |
| Denmark | ekstrabladet.dk, bt.dk, dr.dk, politiken.dk, tv2.dk, berlingske.dk |
| Egypt | youm7.com, masrawy.com, ahram.org.eg |
| Estonia | delfi.ee, postimees.ee, err.ee |
| Fiji | fijivillage.com, fbcnews.com.fj, fijitimes.com.fj |
| Finland | is.fi, hs.fi, yle.fi, iltalehti.fi |
| France | lemonde.fr, lefigaro.fr, liberation.fr, 20minutes.fr, leparisien.fr, france24.com |
| Georgia | interpressnews.ge, ambebi.ge, rustavi2.ge, imedi.ge |
| Germany | bild.de, spiegel.de, zeit.de, faz.net, sueddeutsche.de |
| Greece | protothema.gr, kathimerini.gr, news247.gr, skai.gr |
| Grenada | nowgrenada.com, thenewtodaygrenada.com, gbn.gd |
| Guam | mbjguam.com, pacificislandtimes.com |
| Hong Kong | scmp.com |
| Hungary | telex.hu, index.hu, 24.hu, origo.hu |
| Iceland | mbl.is, ruv.is, visir.is |
| India | indiatimes.com, aajtak.in, news18.com, hindustantimes.com, ndtv.com |
| Indonesia | detik.com, kompas.com, tribunnews.com, kumparan.com |
| International | aljazeera.com, dw.com, reuters.com |
| Ireland | rte.ie, independent.ie, irishtimes.com, irishexaminer.com, thejournal.ie |
| Israel | ynet.co.il, walla.co.il, mako.co.il, maariv.co.il, haaretz.com, jpost.com, timesofisrael.com |
| Italy | repubblica.it, corriere.it, ansa.it, libero.it, fanpage.it |
| Jamaica | jamaica-gleaner.com, jamaicaobserver.com, rjrnewsonline.com |
| Japan | yomiuri.co.jp, asahi.com, mainichi.jp, nhk.or.jp |
| Kenya | nation.africa, tuko.co.ke, the-star.co.ke, standardmedia.co.ke |
| Kiribati | kiribatigovernmenttimes.com |
| Kuwait | kuwaittimes.com, arabtimesonline.com, timeskuwait.com |
| Latvia | delfi.lv, tvnet.lv, apollo.lv |
| Liechtenstein | vaterland.li, radio.li, 1fl.li |
| Lithuania | delfi.lt, 15min.lt, lrt.lt, lrytas.lt |
| Luxembourg | rtl.lu, wort.lu, lessentiel.lu |
| Malaysia | astroawani.com, malaysiakini.com, thestar.com.my, bharian.com.my, freemalaysiatoday.com, nst.com.my, malaymail.com |
| Maldives | mihaaru.com, en.sun.mv, edition.mv |
| Malta | timesofmalta.com, maltatoday.com.mt, maltadaily.mt |
| Marshall Islands | marshallislandsjournal.com, mh.usembassy.gov |
| Mauritius | lexpress.mu, defimedia.info, lemauricien.com |
| Mexico | eluniversal.com.mx, reforma.com, excelsior.com.mx |
| Micronesia | gov.fm, fsmned.fm, fsmis.fm |
| Monaco | monacomatin.mc, lobservateurdemonaco.com, monacolife.net |
| Montenegro | vijesti.me, dan.co.me, pobjeda.me |
| Namibia | namibian.com.na, namibiansun.com, neweralive.na, nbc.na |
| Nauru | ewnews.com |
| Netherlands | nu.nl, ad.nl, telegraaf.nl, nos.nl |
| New Zealand | stuff.co.nz, nzherald.co.nz, newshub.co.nz, rnz.co.nz, kanivatonga.co.nz |
| Nigeria | premiumtimesng.com, punchng.com, vanguardngr.com, thecable.ng |
| North Macedonia | mkd.mk, novamakedonija.com.mk, makfax.com.mk |
| Norway | vg.no, nrk.no, dagbladet.no, aftenposten.no |
| Oman | timesofoman.com, muscatdaily.com, omanobserver.om |
| Pakistan | dawn.com, tribune.com.pk, geo.tv, thenews.com.pk |
| Palau | islandtimes.org, tiabelaunews.com |
| Panama | tvn-2.com, telemetro.com, prensa.com, laestrella.com.pa |
| Peru | rpp.pe, elcomercio.pe, larepublica.pe, gestion.pe |
| Philippines | inquirer.net, gmanetwork.com, abs-cbn.com, rappler.com |
| Poland | onet.pl, wp.pl, interia.pl, gazeta.pl |
| Portugal | publico.pt, cmjornal.pt, expresso.pt, observador.pt |
| Qatar | thepeninsulaqatar.com, gulf-times.com, qatar-tribune.com |
| Romania | digi24.ro, libertatea.ro, stirileprotv.ro, adevarul.ro |
| Russia | tass.com, ria.ru, lenta.ru, interfax.ru, rt.com |
| Saint Lucia | stluciatimes.com, thevoiceslu.com, stluciastar.com |
| Saint Vincent | iwnsvg.com, searchlight.vc, stvincenttimes.com |
| Samoa | samoaobserver.ws, talamua.com, samoaglobalnews.com |
| San Marino | sanmarinortv.sm, libertas.sm, giornalesm.com |
| Saudi Arabia | arabnews.com, sabq.org, okaz.com.sa, saudigazette.com.sa |
| Serbia | blic.rs, kurir.rs, b92.net, novosti.rs |
| Seychelles | nation.sc, todayinseychelles.com, sbc.sc |
| Singapore | channelnewsasia.com, straitstimes.com, mothership.sg, todayonline.com |
| Slovenia | 24ur.com, rtvslo.si, delo.si |
| Solomon Islands | solomontimes.com, solomonstarnews.com, tavulinews.com.sb |
| South Africa | news24.com, iol.co.za, timeslive.co.za, dailymaverick.co.za, citizen.co.za |
| South Korea | news.nate.com, daum.net, news.naver.com, donga.com |
| Spain | elpais.com, elmundo.es, 20minutos.es, abc.es, lavanguardia.com |
| Suriname | starnieuws.com, dwtonline.com, dbsuriname.com |
| Sweden | aftonbladet.se, expressen.se, svt.se, dn.se |
| Switzerland | blick.ch, 20min.ch, srf.ch, nzz.ch, tagesanzeiger.ch |
| Taiwan | chinatimes.com, ettoday.net, ltn.com.tw, udn.com |
| Thailand | sanook.com, thairath.co.th, khaosod.co.th, matichon.co.th |
| Tonga | matangitonga.to, talanoaotonga.to |
| Trinidad and Tobago | trinidadexpress.com, guardian.co.tt, newsday.co.tt |
| Turkey | hurriyet.com.tr, sozcu.com.tr, milliyet.com.tr, sabah.com.tr, haberturk.com |
| UAE | emirates247.com, gulfnews.com, khaleejtimes.com, thenationalnews.com |
| UK | bbc.com, dailymail.co.uk, theguardian.com, independent.co.uk, ft.com |
| Ukraine | pravda.com.ua, liga.net, ukrinform.net |
| Uruguay | elpais.com.uy, elobservador.com.uy, ladiaria.com.uy, subrayado.com.uy |
| USA | nytimes.com, edition.cnn.com, news.yahoo.com, foxnews.com, nbcnews.com, usatoday.com, washingtonpost.com, apnews.com, wsj.com, msnbc.com, cnbc.com, huffpost.com, cbsnews.com, abcnews.go.com, newsweek.com, latimes.com, marketwatch.com, time.com, npr.org, politico.com, theatlantic.com, thehill.com, vox.com, chicagotribune.com, axios.com, bostonglobe.com, slate.com |
| Vanuatu | dailypost.vu, vbr.vu, buzzfm.vu |
| Vietnam | vnexpress.net, tuoitre.vn, thanhnien.vn, vietnamnet.vn |

Performance

| Metric | Value |
|---|---|
| Speed per portal | ~2-3 seconds |
| Memory usage | ~128-256 MB |
| Typical run (10 portals, no AI) | ~30 seconds |
| Typical run (10 portals, with AI) | ~5 minutes (1st run), ~30s (cached) |
| Articles per portal | 20-100+ depending on site |

Technology

  • Runtime: Node.js 20
  • Scraping: Cheerio + Axios (pure HTTP, no browser)
  • AI: Groq API (default model: Llama 3.3 70B Versatile)
  • Cache: Apify Named Key-Value Store (persists between runs, 7-day TTL)
  • Platform: Apify (serverless)

Limitations

  • Only works with the pre-configured portals listed above (sites not in the database return 0 articles)
  • Some paywalled sites may return limited results
  • AI classification requires a Groq API key (free tier available)
  • Rate-limited to respect source websites

Support

If you have questions or need help, open an issue on GitHub or contact us through the Apify platform.