Head News AI
Pricing
from $2.00 / 1,000 results
Scrape 500+ news & financial portals in 106 countries. 79 financial sources (Bloomberg, CNBC, CoinDesk). AI topic classification. Stock market, crypto, corporate intelligence.
Developer: Rodrigo Pacelli
Scrape headlines from 500+ news portals across 106 countries in seconds. Extract titles, links, descriptions, and images. Optionally classify each article into customizable news topics using AI.
Includes 79 financial & business portals — Bloomberg, Yahoo Finance, CNBC, Forbes, Seeking Alpha, CoinDesk, and more — ideal for stock market monitoring, crypto tracking, and corporate intelligence.
Why use this Actor?
- 500+ pre-configured news portals with manually audited CSS selectors
- 106 countries covered: USA, Brazil, UK, India, Germany, France, Japan, Australia, Canada, Mexico, and many more
- 79 financial portals: Bloomberg, Reuters, CNBC, WSJ, Forbes, Seeking Alpha, CoinDesk, Nikkei, and more
- Ultra-fast: Uses Cheerio + Axios (pure HTTP, no browser) — ~2-3 seconds per portal vs ~30s with Playwright
- Parallel scraping: 5 URLs at a time for faster batch processing
- Optional AI classification with customizable topics and model selection via Groq (free API key)
- Smart cache: Articles already classified are cached between runs — no duplicate AI calls, saving time and tokens
- Zero configuration: Works out-of-the-box with any supported portal
- No URL formatting required: You can enter `edition.cnn.com` or `https://edition.cnn.com`; both work
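As an illustration of the "no URL formatting required" behavior, here is a minimal Python sketch of how a bare domain could be normalized before fetching. The function name `normalize_portal_url` is hypothetical, and the Actor's actual normalization logic is not documented here; this only shows the idea.

```python
from urllib.parse import urlsplit


def normalize_portal_url(raw: str) -> str:
    """Return the URL unchanged if it has a scheme, else prepend https://."""
    candidate = raw.strip()
    if "://" not in candidate:
        candidate = "https://" + candidate
    # Reject inputs that still have no host after normalization.
    if not urlsplit(candidate).netloc:
        raise ValueError(f"not a valid portal URL: {raw!r}")
    return candidate


# Both spellings from the text resolve to the same URL.
print(normalize_portal_url("edition.cnn.com"))
print(normalize_portal_url("https://edition.cnn.com"))
```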
How it works
- You provide a list of news portal URLs
- The Actor fetches each page via HTTP and parses it with Cheerio
- Pre-configured CSS selectors extract all headlines, links, descriptions, and images from the page
- (Optional) Each new article is classified into topics using Groq AI — previously classified articles are loaded from cache
- Results are saved to the Apify dataset as structured JSON
Important: This Actor scrapes only the exact URL you provide. It does not crawl or follow links to subpages. For example, if you pass `https://edition.cnn.com`, it extracts only the headlines visible on CNN's homepage and does not navigate into `https://edition.cnn.com/sport` or any other section automatically. To scrape specific sections, add each section URL separately in the `urls` list.
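The fetch-and-parse steps above can be sketched roughly as follows. This is not the Actor's code: the Actor uses Cheerio with per-portal CSS selectors, while this sketch swaps in Python's standard `html.parser` and assumes a hypothetical `headline` class on anchor tags. It extracts headlines from a single page and, like the Actor, does not follow any links.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HeadlineParser(HTMLParser):
    """Collect title/url pairs from <a class="headline"> (hypothetical selector)."""

    def __init__(self, base_url: str, css_class: str = "headline"):
        super().__init__()
        self.base_url = base_url
        self.css_class = css_class
        self.headlines = []
        self._href = None  # href of the matching anchor currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if tag == "a" and self.css_class in classes and attr.get("href"):
            # Resolve relative links against the page URL.
            self._href = urljoin(self.base_url, attr["href"])
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = " ".join("".join(self._text).split())  # collapse whitespace
            if title:
                self.headlines.append({"title": title, "url": self._href})
            self._href = None


def extract_headlines(html: str, base_url: str) -> list[dict]:
    parser = HeadlineParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.headlines


SAMPLE = """
<div><a class="headline" href="/article/123">Major policy change announced</a>
<a href="/sport">Sport</a>
<a class="headline big" href="https://example.com/article/456"> Markets  rally </a></div>
"""
# The plain /sport link is ignored: only matching headline anchors are extracted.
print(extract_headlines(SAMPLE, "https://example.com"))
```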
Input parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `urls` | string[] | required | List of news portal URLs to scrape, with or without `https://` |
| `includeImages` | boolean | true | Extract article thumbnail images |
| `classifyWithAI` | boolean | false | Enable AI topic classification (requires a Groq API key) |
| `groqApiKey` | string | (none) | Your Groq API key (free at console.groq.com) |
| `groqModel` | string | llama-3.3-70b-versatile | Groq AI model, selected from the dropdown |
| `topics` | string[] | (25 default topics) | Custom list of topics for classification. If empty, the default 25 topics are used |
| `debug` | boolean | false | Enable verbose logging |
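If you assemble the input programmatically, a small helper can apply the defaults from the table and catch the most obvious mistakes before a run starts. This is a sketch: `build_input` is a hypothetical helper, and the validation rules (non-empty `urls`, key required when `classifyWithAI` is true) are assumptions drawn from the table rather than the Actor's documented behavior.

```python
import json

# Defaults copied from the input parameters table.
DEFAULTS = {
    "includeImages": True,
    "classifyWithAI": False,
    "groqModel": "llama-3.3-70b-versatile",
    "debug": False,
}
OPTIONAL_WITHOUT_DEFAULT = {"groqApiKey", "topics"}


def build_input(urls, **overrides):
    """Merge caller overrides onto the documented defaults and sanity-check them."""
    if not urls:
        raise ValueError("urls is required and must not be empty")
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_WITHOUT_DEFAULT
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    run_input = {"urls": list(urls), **DEFAULTS, **overrides}
    if run_input["classifyWithAI"] and not run_input.get("groqApiKey"):
        raise ValueError("classifyWithAI requires groqApiKey")
    return run_input


print(json.dumps(build_input(["edition.cnn.com"], includeImages=False), indent=2))
```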
Output format
Each article is saved as an individual item in the dataset:

    {
      "title": "Breaking: Major policy change announced",
      "url": "https://example.com/article/123",
      "description": "Government officials revealed new regulations...",
      "urlToImage": "https://example.com/images/thumb.jpg",
      "source": "example.com",
      "publishedAt": "2025-01-15T10:30:00.000Z",
      "scrapedAt": "2025-01-15T10:35:12.000Z",
      "topics": ["politics", "economy"]
    }

Article fields
| Field | Type | Description |
|---|---|---|
| `title` | string | Article headline |
| `url` | string | Full article link |
| `description` | string | Article summary/subtitle |
| `urlToImage` | string | Thumbnail image URL (if available) |
| `source` | string | Source domain |
| `publishedAt` | string | Publication timestamp (ISO 8601) |
| `scrapedAt` | string | Scraping timestamp (ISO 8601) |
| `topics` | string[] | AI-assigned topics (only if classification is enabled) |
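Once the dataset items are downloaded as JSON, filtering by topic is straightforward. The sketch below uses the example article from above plus one made-up item without `topics` (as would be the case when classification is disabled); the helper name `articles_with_topic` is hypothetical.

```python
import json

# First item mirrors the documented example; the second is invented for illustration.
ITEMS_JSON = """
[
  {"title": "Breaking: Major policy change announced",
   "url": "https://example.com/article/123",
   "source": "example.com",
   "topics": ["politics", "economy"]},
  {"title": "Local team wins final",
   "url": "https://example.com/article/124",
   "source": "example.com"}
]
"""


def articles_with_topic(items, topic):
    """Return items tagged with `topic`; items without a topics field never match."""
    return [item for item in items if topic in item.get("topics", [])]


items = json.loads(ITEMS_JSON)
for article in articles_with_topic(items, "economy"):
    print(article["title"], article["url"])
```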
Usage examples
Basic: Extract headlines from the homepage of top news sites

    {
      "urls": [
        "https://edition.cnn.com",
        "https://www.bbc.com",
        "https://www.reuters.com"
      ]
    }

Scrape specific sections of a news site
The Actor only scrapes the exact URL you provide. To cover multiple sections, add each one separately:

    {
      "urls": [
        "https://edition.cnn.com",
        "https://edition.cnn.com/sport",
        "https://edition.cnn.com/business",
        "https://edition.cnn.com/entertainment",
        "https://edition.cnn.com/health"
      ]
    }

With AI topic classification (default 25 topics)

    {
      "urls": [
        "https://edition.cnn.com",
        "https://www.nytimes.com",
        "https://www.theguardian.com"
      ],
      "classifyWithAI": true,
      "groqApiKey": "gsk_your_groq_api_key_here"
    }

With custom topics and model
You can define your own topics and choose a different Groq model from the dropdown:

    {
      "urls": [
        "https://edition.cnn.com",
        "https://www.bbc.com"
      ],
      "classifyWithAI": true,
      "groqApiKey": "gsk_your_groq_api_key_here",
      "groqModel": "qwen/qwen3-32b",
      "topics": ["finance", "geopolitics", "climate", "ai_tech", "healthcare", "conflict"]
    }

Scrape Brazilian news portals

    {
      "urls": [
        "https://www.globo.com",
        "https://www.uol.com.br",
        "https://www.folha.uol.com.br",
        "https://www.estadao.com.br"
      ]
    }

Scrape with images disabled (faster)

    {
      "urls": [
        "https://www.aljazeera.com",
        "https://www.dw.com"
      ],
      "includeImages": false
    }

Smart deduplication and classification cache
The Actor has built-in deduplication at two levels to avoid wasting AI tokens:
1. URL deduplication (within a run)
If you scrape multiple URLs from the same portal (e.g., CNN homepage + CNN/sport), articles that appear on both pages are automatically deduplicated by URL. Each article is extracted only once, so the AI never classifies the same article twice in a single run.
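A minimal sketch of within-run URL deduplication, assuming articles are compared by exact URL string and the first occurrence wins. Whether the Actor also normalizes URLs (trailing slashes, query parameters) before comparing is not stated, so treat that as an open assumption.

```python
def dedupe_by_url(articles):
    """Keep the first article seen for each URL, preserving input order."""
    seen = set()
    unique = []
    for article in articles:
        url = article["url"]
        if url not in seen:
            seen.add(url)
            unique.append(article)
    return unique


# The same story appears on the homepage and on the sport section.
homepage = [{"url": "https://edition.cnn.com/a", "title": "Story A"},
            {"url": "https://edition.cnn.com/b", "title": "Story B"}]
sport = [{"url": "https://edition.cnn.com/b", "title": "Story B"},
         {"url": "https://edition.cnn.com/c", "title": "Story C"}]

print([a["title"] for a in dedupe_by_url(homepage + sport)])
```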
2. Classification cache (between runs)
When AI classification is enabled, the Actor caches classified articles between runs using a persistent Key-Value Store. This means:
- 1st run: Scrapes 77 articles from CNN, classifies all 77 with AI → ~58 seconds
- 2nd run (10 minutes later): Scrapes CNN again, finds 77 articles but 74 are already in cache → classifies only the 3 new ones → ~3 seconds
- 3rd run (1 hour later): Same portal, only brand-new headlines go to the AI
This saves time, money, and Groq API tokens — only truly new articles are sent to the AI. The cache is automatically cleaned after 7 days.
Example log from a cached run:

    🤖 Classifying 3 new articles with AI (74 already cached)...
    ✅ Classification complete (3 new + 74 cached)
    ✅ Done! 77 articles in 3.1s

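The caching behavior described above can be modeled with a small in-memory sketch. It assumes the cache is keyed by article URL and stores a timestamp per entry; the Actor's real Key-Value Store layout is not documented here, and the names `split_by_cache` and `remember` are hypothetical. The demo reproduces the 77-article scenario from the example runs.

```python
import time

SEVEN_DAYS = 7 * 24 * 60 * 60  # TTL from the text, in seconds


def split_by_cache(articles, cache, now=None, ttl=SEVEN_DAYS):
    """Return (cached, new); cached articles get their stored topics attached."""
    now = time.time() if now is None else now
    # Evict entries older than the TTL before looking anything up.
    for url in [u for u, entry in cache.items() if now - entry["cachedAt"] > ttl]:
        del cache[url]
    cached, new = [], []
    for article in articles:
        entry = cache.get(article["url"])
        if entry is None:
            new.append(article)
        else:
            cached.append({**article, "topics": entry["topics"]})
    return cached, new


def remember(cache, article, topics, now=None):
    """Store freshly classified topics so later runs can skip the AI call."""
    cache[article["url"]] = {
        "topics": topics,
        "cachedAt": time.time() if now is None else now,
    }


cache = {}
first_run = [{"url": f"https://example.com/a/{i}"} for i in range(77)]
_, to_classify = split_by_cache(first_run, cache, now=0)
for article in to_classify:
    remember(cache, article, ["politics"], now=0)  # stand-in for the AI result

# Ten minutes later: 74 articles are unchanged, 3 are new.
second_run = first_run[3:] + [{"url": f"https://example.com/b/{i}"} for i in range(3)]
cached, to_classify = split_by_cache(second_run, cache, now=600)
print(len(to_classify), "new,", len(cached), "cached")
```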
Use cases
Stock market & financial intelligence
Monitor 79+ financial portals (Bloomberg, CNBC, WSJ, Seeking Alpha, Yahoo Finance) for breaking market news. Use custom topics like ["earnings", "ipo", "merger", "sec_filing", "downgrade", "upgrade"] to track events that move stock prices. Set up recurring runs every 15 minutes to catch market-moving headlines before they spread.

    {
      "urls": [
        "bloomberg.com", "cnbc.com", "wsj.com", "seekingalpha.com",
        "benzinga.com", "marketwatch.com", "barrons.com", "finance.yahoo.com"
      ],
      "classifyWithAI": true,
      "groqApiKey": "gsk_your_key",
      "topics": ["earnings", "ipo", "merger", "acquisition", "sec_filing", "bankruptcy", "upgrade", "downgrade", "dividend"]
    }

Cryptocurrency & blockchain tracking
Track crypto news from CoinDesk, Cointelegraph, The Block, Decrypt, Blockworks, and Bitcoin Magazine. Classify by topic to filter signal from noise.

    {
      "urls": [
        "coindesk.com", "cointelegraph.com", "theblock.co",
        "decrypt.co", "blockworks.co", "bitcoinmagazine.com", "cryptoslate.com"
      ],
      "classifyWithAI": true,
      "groqApiKey": "gsk_your_key",
      "topics": ["bitcoin", "ethereum", "defi", "regulation", "hack", "exchange", "nft", "stablecoin"]
    }

Commodities & energy monitoring
Track oil, gas, and commodity news from specialized portals.

    {
      "urls": [
        "oilprice.com", "rigzone.com", "bloomberg.com",
        "reuters.com", "ft.com", "tradingeconomics.com"
      ],
      "classifyWithAI": true,
      "groqApiKey": "gsk_your_key",
      "topics": ["oil", "gas", "gold", "commodities", "opec", "sanctions", "supply_chain"]
    }

Media monitoring
Track headlines from specific portals on a schedule. Set up a recurring run to scrape your target portals every 15 minutes and get a structured feed of new headlines with AI-classified topics.
Corporate intelligence & M&A
Monitor press releases from Business Wire, PR Newswire, and GlobeNewsWire alongside financial news to catch M&A announcements, earnings surprises, and SEC filings before they become mainstream news.
Academic research
Collect and classify news coverage across regions for media studies, political science, or communication research. Compare how different countries cover the same global events.
Content curation
Build automated news digests or newsletters. Scrape headlines from 10+ portals, classify by topic, and feed the structured output into your content pipeline or CMS.
Competitive analysis
Track news mentions across portals in your industry. Combine with custom topics relevant to your market to identify trends, competitor mentions, and emerging narratives.
Multi-language news aggregation
With 106 countries covered, aggregate news in English, Portuguese, Spanish, French, German, Arabic, Japanese, and many more languages from a single Actor.
AI classification topics
When `classifyWithAI` is enabled, each article is classified using the 25 default topics below (or your custom topics):
Hard news: politics, economy, crime, war_conflict, terrorism, natural_disaster, accident, health, science
Soft news: sports, entertainment, celebrity, technology, business, education, environment, religion, lifestyle, travel, food, real_estate, automotive, opinion, weather
Meta: breaking_news
Each article receives 1-3 of the most relevant topics. You can replace these with your own custom topics.
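If you post-process the output yourself, you may want to enforce the same constraints on the `topics` field: only known topics, at most three per article. The sketch below lists the 25 default topics from above; the `clean_topics` helper and its lowercasing rule are assumptions for illustration, not a description of how the Actor validates model responses.

```python
# The 25 default topics listed above.
DEFAULT_TOPICS = [
    # Hard news
    "politics", "economy", "crime", "war_conflict", "terrorism",
    "natural_disaster", "accident", "health", "science",
    # Soft news
    "sports", "entertainment", "celebrity", "technology", "business",
    "education", "environment", "religion", "lifestyle", "travel", "food",
    "real_estate", "automotive", "opinion", "weather",
    # Meta
    "breaking_news",
]


def clean_topics(raw, allowed=DEFAULT_TOPICS, limit=3):
    """Drop unknown or duplicate topics and keep at most `limit`, in order."""
    allowed_set = set(allowed)
    result = []
    for topic in raw:
        topic = topic.strip().lower()
        if topic in allowed_set and topic not in result:
            result.append(topic)
    return result[:limit]


print(clean_topics(["Politics", "economy", "gossip", "economy", "crime", "health"]))
```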
Available Groq models
Select from the dropdown — no need to type model names:
| Model | Best for |
|---|---|
| Llama 3.3 70B Versatile (default) | Best accuracy, recommended |
| Llama 4 Scout 17B | Good balance of speed and accuracy |
| Qwen 3 32B | Strong multilingual support |
| Llama 3.1 8B Instant | Fastest, lower accuracy |
| Kimi K2 Instruct | Alternative general-purpose |
| GPT-OSS 120B | Large model (may return empty responses) |
| GPT-OSS 20B | Smaller GPT-OSS variant |
| Allam 2 7B | Arabic-optimized |
Getting a Groq API key
- Go to console.groq.com
- Create a free account
- Generate an API key
- Paste it in the `groqApiKey` input field
The free tier allows 30 requests/minute, which is sufficient for most scraping runs. The Actor automatically rate-limits AI requests to stay within this limit.
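To show what staying under a 30 requests/minute limit can look like, here is a sliding-window limiter sketch. It is not the Actor's implementation (which is not shown in this README); the class name `MinuteRateLimiter` is hypothetical, and the clock and sleep functions are injectable only so the behavior can be checked without real waiting.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Allow at most `max_calls` calls in any rolling `period` seconds."""

    def __init__(self, max_calls=30, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of calls inside the current window

    def _drop_expired(self, now):
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self.clock()
        self._drop_expired(now)
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest call leaves the window.
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self._drop_expired(now)
        self.calls.append(now)


limiter = MinuteRateLimiter()
for _ in range(3):
    limiter.wait()  # well under the limit, so this returns immediately
    # a classification request would go here
```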
Financial & business portals
79 specialized financial portals for market intelligence:
| Category | Portals |
|---|---|
| Wire Services | bloomberg.com, businesswire.com, prnewswire.com, globenewswire.com |
| USA Finance | finance.yahoo.com, barrons.com, seekingalpha.com, benzinga.com, fool.com, zacks.com, thestreet.com, fortune.com, forbes.com, businessinsider.com, investopedia.com, morningstar.com, tipranks.com, kiplinger.com, investors.com |
| Europe Finance | economist.com, handelsblatt.com, lesechos.fr, latribune.fr, ilsole24ore.com, cincodias.elpais.com, expansion.com, borsen.dk, di.se, kauppalehti.fi, cityam.com, fd.nl |
| Asia Finance | asia.nikkei.com, caixinglobal.com, economictimes.indiatimes.com, livemint.com, business-standard.com, moneycontrol.com, theedgemarkets.com, koreaherald.com, financeasia.com, afr.com |
| LatAm Finance | valor.globo.com, infomoney.com.br, exame.com, bloomberglinea.com, elfinanciero.com.mx, eleconomista.com.mx, df.cl, portafolio.co, mercopress.com |
| MENA & Africa | arabianbusiness.com, gulfbusiness.com, zawya.com, menafn.com, businessday.ng, businessdailyafrica.com, kenyanwallstreet.com |
| Crypto / Fintech | coindesk.com, theblock.co, cointelegraph.com, decrypt.co, cryptoslate.com, blockworks.co, bitcoinmagazine.com |
| Sector-Specific | techcrunch.com, oilprice.com, rigzone.com, globest.com, insurancejournal.com, americanbanker.com, fiercepharma.com, freightwaves.com |
| Exchanges & Data | nasdaq.com, nyse.com, investing.com, tradingview.com, tradingeconomics.com, fxstreet.com, stockanalysis.com |
Supported countries and portals
The Actor supports 500+ news portals across 106 countries:
| Country | Portals |
|---|---|
| Andorra | altaveu.com, diariandorra.ad, elperiodic.ad |
| Antigua and Barbuda | antiguaobserver.com, antiguanewsroom.com, antigua.news |
| Argentina | infobae.com, lanacion.com.ar, clarin.com, pagina12.com.ar, ambito.com |
| Australia | miragenews.com, smh.com.au, theaustralian.com.au, news.com.au, abc.net.au, heraldsun.com.au |
| Austria | orf.at, krone.at, derstandard.at, heute.at |
| Bahamas | tribune242.com, thenassauguardian.com |
| Bahrain | gdnonline.com, akhbar-alkhaleej.com, newsofbahrain.com |
| Bangladesh | prothomalo.com, thedailystar.net, bdnews24.com |
| Barbados | nationnews.com, barbadostoday.bb, cbc.bb |
| Belgium | hln.be, nieuwsblad.be, rtbf.be, vrt.be |
| Belize | breakingbelizenews.com, 7newsbelize.com, channel5belize.com |
| Botswana | mmegi.bw, dailynews.gov.bw, guardiansun.co.bw, sundaystandard.info |
| Brazil | globo.com, g1.globo.com, uol.com.br, r7.com, estadao.com.br, folha.uol.com.br |
| Brunei | borneobulletin.com.bn, brudirect.com, rtbnews.rtb.gov.bn |
| Canada | cbc.ca, ctvnews.ca, globalnews.ca, theglobeandmail.com |
| Cape Verde | asemana.cv, inforpress.cv, rtc.cv |
| Chile | lacuarta.com, emol.com, biobiochile.cl, latercera.com |
| Colombia | canalrcn.com, eltiempo.com, semana.com, elespectador.com |
| Costa Rica | teletica.com, crhoy.com, nacion.com |
| Croatia | index.hr, jutarnji.hr, 24sata.hr, vecernji.hr |
| Cyprus | cyprus-mail.com, kathimerini.com.cy, philenews.com |
| Czech Republic | novinky.cz, idnes.cz, aktualne.cz, blesk.cz, seznamzpravy.cz |
| Denmark | ekstrabladet.dk, bt.dk, dr.dk, politiken.dk, tv2.dk, berlingske.dk |
| Egypt | youm7.com, masrawy.com, ahram.org.eg |
| Estonia | delfi.ee, postimees.ee, err.ee |
| Fiji | fijivillage.com, fbcnews.com.fj, fijitimes.com.fj |
| Finland | is.fi, hs.fi, yle.fi, iltalehti.fi |
| France | lemonde.fr, lefigaro.fr, liberation.fr, 20minutes.fr, leparisien.fr, france24.com |
| Georgia | interpressnews.ge, ambebi.ge, rustavi2.ge, imedi.ge |
| Germany | bild.de, spiegel.de, zeit.de, faz.net, sueddeutsche.de |
| Greece | protothema.gr, kathimerini.gr, news247.gr, skai.gr |
| Grenada | nowgrenada.com, thenewtodaygrenada.com, gbn.gd |
| Guam | mbjguam.com, pacificislandtimes.com |
| Hong Kong | scmp.com |
| Hungary | telex.hu, index.hu, 24.hu, origo.hu |
| Iceland | mbl.is, ruv.is, visir.is |
| India | indiatimes.com, aajtak.in, news18.com, hindustantimes.com, ndtv.com |
| Indonesia | detik.com, kompas.com, tribunnews.com, kumparan.com |
| International | aljazeera.com, dw.com, reuters.com |
| Ireland | rte.ie, independent.ie, irishtimes.com, irishexaminer.com, thejournal.ie |
| Israel | ynet.co.il, walla.co.il, mako.co.il, maariv.co.il, haaretz.com, jpost.com, timesofisrael.com |
| Italy | repubblica.it, corriere.it, ansa.it, libero.it, fanpage.it |
| Jamaica | jamaica-gleaner.com, jamaicaobserver.com, rjrnewsonline.com |
| Japan | yomiuri.co.jp, asahi.com, mainichi.jp, nhk.or.jp |
| Kenya | nation.africa, tuko.co.ke, the-star.co.ke, standardmedia.co.ke |
| Kiribati | kiribatigovernmenttimes.com |
| Kuwait | kuwaittimes.com, arabtimesonline.com, timeskuwait.com |
| Latvia | delfi.lv, tvnet.lv, apollo.lv |
| Liechtenstein | vaterland.li, radio.li, 1fl.li |
| Lithuania | delfi.lt, 15min.lt, lrt.lt, lrytas.lt |
| Luxembourg | rtl.lu, wort.lu, lessentiel.lu |
| Malaysia | astroawani.com, malaysiakini.com, thestar.com.my, bharian.com.my, freemalaysiatoday.com, nst.com.my, malaymail.com |
| Maldives | mihaaru.com, en.sun.mv, edition.mv |
| Malta | timesofmalta.com, maltatoday.com.mt, maltadaily.mt |
| Marshall Islands | marshallislandsjournal.com, mh.usembassy.gov |
| Mauritius | lexpress.mu, defimedia.info, lemauricien.com |
| Mexico | eluniversal.com.mx, reforma.com, excelsior.com.mx |
| Micronesia | gov.fm, fsmned.fm, fsmis.fm |
| Monaco | monacomatin.mc, lobservateurdemonaco.com, monacolife.net |
| Montenegro | vijesti.me, dan.co.me, pobjeda.me |
| Namibia | namibian.com.na, namibiansun.com, neweralive.na, nbc.na |
| Nauru | ewnews.com |
| Netherlands | nu.nl, ad.nl, telegraaf.nl, nos.nl |
| New Zealand | stuff.co.nz, nzherald.co.nz, newshub.co.nz, rnz.co.nz, kanivatonga.co.nz |
| Nigeria | premiumtimesng.com, punchng.com, vanguardngr.com, thecable.ng |
| North Macedonia | mkd.mk, novamakedonija.com.mk, makfax.com.mk |
| Norway | vg.no, nrk.no, dagbladet.no, aftenposten.no |
| Oman | timesofoman.com, muscatdaily.com, omanobserver.om |
| Pakistan | dawn.com, tribune.com.pk, geo.tv, thenews.com.pk |
| Palau | islandtimes.org, tiabelaunews.com |
| Panama | tvn-2.com, telemetro.com, prensa.com, laestrella.com.pa |
| Peru | rpp.pe, elcomercio.pe, larepublica.pe, gestion.pe |
| Philippines | inquirer.net, gmanetwork.com, abs-cbn.com, rappler.com |
| Poland | onet.pl, wp.pl, interia.pl, gazeta.pl |
| Portugal | publico.pt, cmjornal.pt, expresso.pt, observador.pt |
| Qatar | thepeninsulaqatar.com, gulf-times.com, qatar-tribune.com |
| Romania | digi24.ro, libertatea.ro, stirileprotv.ro, adevarul.ro |
| Russia | tass.com, ria.ru, lenta.ru, interfax.ru, rt.com |
| Saint Lucia | stluciatimes.com, thevoiceslu.com, stluciastar.com |
| Saint Vincent | iwnsvg.com, searchlight.vc, stvincenttimes.com |
| Samoa | samoaobserver.ws, talamua.com, samoaglobalnews.com |
| San Marino | sanmarinortv.sm, libertas.sm, giornalesm.com |
| Saudi Arabia | arabnews.com, sabq.org, okaz.com.sa, saudigazette.com.sa |
| Serbia | blic.rs, kurir.rs, b92.net, novosti.rs |
| Seychelles | nation.sc, todayinseychelles.com, sbc.sc |
| Singapore | channelnewsasia.com, straitstimes.com, mothership.sg, todayonline.com |
| Slovenia | 24ur.com, rtvslo.si, delo.si |
| Solomon Islands | solomontimes.com, solomonstarnews.com, tavulinews.com.sb |
| South Africa | news24.com, iol.co.za, timeslive.co.za, dailymaverick.co.za, citizen.co.za |
| South Korea | news.nate.com, daum.net, news.naver.com, donga.com |
| Spain | elpais.com, elmundo.es, 20minutos.es, abc.es, lavanguardia.com |
| Suriname | starnieuws.com, dwtonline.com, dbsuriname.com |
| Sweden | aftonbladet.se, expressen.se, svt.se, dn.se |
| Switzerland | blick.ch, 20min.ch, srf.ch, nzz.ch, tagesanzeiger.ch |
| Taiwan | chinatimes.com, ettoday.net, ltn.com.tw, udn.com |
| Thailand | sanook.com, thairath.co.th, khaosod.co.th, matichon.co.th |
| Tonga | matangitonga.to, talanoaotonga.to |
| Trinidad and Tobago | trinidadexpress.com, guardian.co.tt, newsday.co.tt |
| Turkey | hurriyet.com.tr, sozcu.com.tr, milliyet.com.tr, sabah.com.tr, haberturk.com |
| UAE | emirates247.com, gulfnews.com, khaleejtimes.com, thenationalnews.com |
| UK | bbc.com, dailymail.co.uk, theguardian.com, independent.co.uk, ft.com |
| Ukraine | pravda.com.ua, liga.net, ukrinform.net |
| Uruguay | elpais.com.uy, elobservador.com.uy, ladiaria.com.uy, subrayado.com.uy |
| USA | nytimes.com, edition.cnn.com, news.yahoo.com, foxnews.com, nbcnews.com, usatoday.com, washingtonpost.com, apnews.com, wsj.com, msnbc.com, cnbc.com, huffpost.com, cbsnews.com, abcnews.go.com, newsweek.com, latimes.com, marketwatch.com, time.com, npr.org, politico.com, theatlantic.com, thehill.com, vox.com, chicagotribune.com, axios.com, bostonglobe.com, slate.com |
| Vanuatu | dailypost.vu, vbr.vu, buzzfm.vu |
| Vietnam | vnexpress.net, tuoitre.vn, thanhnien.vn, vietnamnet.vn |
Performance
| Metric | Value |
|---|---|
| Speed per portal | ~2-3 seconds |
| Memory usage | ~128-256 MB |
| Typical run (10 portals, no AI) | ~30 seconds |
| Typical run (10 portals, with AI) | ~5 minutes (1st run), ~30s (cached) |
| Articles per portal | 20-100+ depending on site |
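The "5 URLs at a time" parallelism mentioned under "Why use this Actor?" can be approximated with a bounded worker pool. This is a Python stand-in for the Actor's Node.js code, which is not shown here; `scrape_in_parallel` and `fake_fetch` are hypothetical names, and the fetch function is a stub so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_in_parallel(urls, fetch, concurrency=5):
    """Call fetch(url) for every URL with at most `concurrency` in flight.

    Results come back in the same order as `urls`.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(fetch, urls))


def fake_fetch(url):
    # Stub: a real implementation would download and parse the page here.
    return {"source": url, "articles": 0}


results = scrape_in_parallel([f"portal-{i}.example" for i in range(12)], fake_fetch)
print(len(results))
```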
Technology
- Runtime: Node.js 20
- Scraping: Cheerio + Axios (pure HTTP, no browser)
- AI: Groq API (default model: Llama 3.3 70B Versatile)
- Cache: Apify Named Key-Value Store (persists between runs, 7-day TTL)
- Platform: Apify (serverless)
Limitations
- Only works with the pre-configured portals listed above (sites not in the database will return 0 articles)
- Some paywalled sites may return limited results
- AI classification requires a Groq API key (free tier available)
- Rate-limited to respect source websites
Support
If you have questions or need help, open an issue on GitHub or contact us through the Apify platform.