Globo.com Scraper

Pricing

$7.00 / 1,000 results

A live data scraper for globo.com. For more advanced API usage, see http://rapidapi.com/matepapava123/api/globo-com-news-live-api

Developer: Mate Papava (maintained by Community)

Actor stats: 0 bookmarks, 2 total users, 1 monthly active user, last modified 7 days ago

An Apify Actor that scrapes content from Brazil's largest media network Globo.com, including G1 news, GE sports, Brasileirão top scorers, Cartola FC fantasy football, GShow entertainment, recipes, podcasts, business news, radio, newspaper, rural, auto, and celebrity content.

Features

  • Get G1 news articles by category (economia, mundo, politica, tecnologia, etc.)
  • Get GE sports articles by category (futebol, basquete, tenis, volei, atletismo)
  • Get Brasileirão Serie A top scorers
  • Get match schedule from GE Agenda
  • Get Cartola FC market status, player market, and round matches
  • Get GShow entertainment articles
  • Get G1 podcast episodes
  • Get Valor Econômico business news
  • Get CBN radio articles
  • Get Extra popular news
  • Get O Globo newspaper articles
  • Get Globo Rural agribusiness articles
  • Get Autoesporte automotive articles
  • Get Revista Quem celebrity articles
  • Get recipes by dish type or occasion from Receitas Globo
  • Get full recipe details with ingredients and instructions
  • Get full article content from any Globo domain

Actions

1. get_news

Get paginated G1 news articles.

Parameters:

  • news_category (optional): "all", "economia", "mundo", "politica", "tecnologia", "ciencia-e-saude", "educacao", "pop-arte", "natureza", "carros" (default: "all")
  • page (optional): Page number (default: 1)
  • limit (optional): Items per page (default: 20)

Example Input:

{
    "action": "get_news",
    "news_category": "tecnologia",
    "page": 1
}
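
As a rough sketch, the input above could be built and checked in Python before a run is started. The helper name `build_get_news_input` is hypothetical and not part of the Actor. The category set is copied from the parameter list above, and the rule that `page` and `limit` must be positive is an assumption.

```python
from typing import Any

# Category slugs copied from the get_news parameter list.
NEWS_CATEGORIES = {
    "all", "economia", "mundo", "politica", "tecnologia",
    "ciencia-e-saude", "educacao", "pop-arte", "natureza", "carros",
}


def build_get_news_input(
    news_category: str = "all", page: int = 1, limit: int = 20
) -> dict[str, Any]:
    """Build a get_news input dict (hypothetical helper, not part of the Actor)."""
    if news_category not in NEWS_CATEGORIES:
        raise ValueError(f"unknown news_category: {news_category!r}")
    # Assumption: the Actor expects positive integers for page and limit.
    if page < 1 or limit < 1:
        raise ValueError("page and limit must be positive integers")
    return {
        "action": "get_news",
        "news_category": news_category,
        "page": page,
        "limit": limit,
    }


run_input = build_get_news_input("tecnologia")
```

The resulting dict has the same shape as the example input and can be passed as the run input in whatever way you normally start the Actor.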

2. get_news_article

Get full content of a G1 news article.

Parameters:

  • url (required): Full .ghtml article URL from g1.globo.com

Example Input:

{
    "action": "get_news_article",
    "url": "https://g1.globo.com/tecnologia/noticia/2026/01/01/example.ghtml"
}

3. get_sports

Get paginated GE sports articles.

Parameters:

  • sports_category (optional): "all", "futebol", "basquete", "tenis", "volei", "atletismo" (default: "all")
  • page (optional): Page number (default: 1)
  • limit (optional): Items per page (default: 20)

Example Input:

{
    "action": "get_sports",
    "sports_category": "futebol"
}

4. get_sports_article

Get full content of a GE sports article.

Parameters:

  • url (required): Full .ghtml article URL from ge.globo.com

Example Input:

{
    "action": "get_sports_article",
    "url": "https://ge.globo.com/futebol/noticia/2026/01/01/example.ghtml"
}

5. get_brasileirao

Get Brasileirão Serie A top scorers.

Parameters: None

Example Input:

{
    "action": "get_brasileirao"
}

6. get_match_schedule

Get match schedule from GE Agenda.

Parameters: None

Example Input:

{
    "action": "get_match_schedule"
}

7. get_cartola_status

Get Cartola FC market status (current round, market state).

Parameters: None

Example Input:

{
    "action": "get_cartola_status"
}

8. get_cartola_market

Get Cartola FC full player market (athletes with prices, scores, stats).

Parameters: None

Example Input:

{
    "action": "get_cartola_market"
}

9. get_cartola_matches

Get Cartola FC round matches.

Parameters: None

Example Input:

{
    "action": "get_cartola_matches"
}

10. get_entertainment

Get paginated GShow entertainment articles.

Parameters:

  • page (optional): Page number (default: 1)
  • limit (optional): Items per page (default: 20)

Example Input:

{
    "action": "get_entertainment",
    "page": 1
}

11. get_entertainment_article

Get full content of a GShow article.

Parameters:

  • url (required): Full .ghtml article URL from gshow.globo.com

Example Input:

{
    "action": "get_entertainment_article",
    "url": "https://gshow.globo.com/novelas/noticia/example.ghtml"
}

12. get_podcasts

Get paginated G1 podcast episodes.

Parameters:

  • page (optional): Page number (default: 1)
  • limit (optional): Items per page (default: 20)

Example Input:

{
    "action": "get_podcasts",
    "page": 1
}

13. get_podcast_episode

Get full content of a podcast episode.

Parameters:

  • url (required): Full .ghtml podcast episode URL from g1.globo.com

Example Input:

{
    "action": "get_podcast_episode",
    "url": "https://g1.globo.com/podcast/noticia/example.ghtml"
}

14. get_business

Get paginated Valor Econômico business articles.

Parameters:

  • page (optional): Page number (default: 1)
  • limit (optional): Items per page (default: 20)

Example Input:

{
    "action": "get_business",
    "page": 1
}

15. get_business_article

Get full content of a Valor Econômico article.

Parameters:

  • url (required): Full .ghtml article URL from valor.globo.com

Example Input:

{
    "action": "get_business_article",
    "url": "https://valor.globo.com/financas/noticia/example.ghtml"
}

16. get_radio

Get paginated CBN radio articles.

Parameters:

  • page (optional): Page number (default: 1)
  • limit (optional): Items per page (default: 20)

Example Input:

{
    "action": "get_radio",
    "page": 1
}

17. get_radio_article

Get full content of a CBN radio article.

Parameters:

  • url (required): Full .ghtml article URL from cbn.globo.com

Example Input:

{
    "action": "get_radio_article",
    "url": "https://cbn.globo.com/noticia/example.ghtml"
}

18. get_popular

Get paginated Extra popular news articles.

Parameters:

  • page (optional): Page number (default: 1)
  • limit (optional): Items per page (default: 20)

Example Input:

{
    "action": "get_popular",
    "page": 1
}

19. get_popular_article

Get full content of an Extra article.

Parameters:

  • url (required): Full .ghtml article URL from extra.globo.com

Example Input:

{
    "action": "get_popular_article",
    "url": "https://extra.globo.com/noticia/example.ghtml"
}

20. get_newspaper

Get paginated O Globo newspaper articles.

Parameters:

  • page (optional): Page number (default: 1)
  • limit (optional): Items per page (default: 20)

Example Input:

{
    "action": "get_newspaper",
    "page": 1
}

21. get_newspaper_article

Get full content of an O Globo article.

Parameters:

  • url (required): Full .ghtml article URL from oglobo.globo.com

Example Input:

{
    "action": "get_newspaper_article",
    "url": "https://oglobo.globo.com/brasil/noticia/example.ghtml"
}

22. get_rural

Get paginated Globo Rural agribusiness articles.

Parameters:

  • page (optional): Page number (default: 1)
  • limit (optional): Items per page (default: 20)

Example Input:

{
    "action": "get_rural",
    "page": 1
}

23. get_rural_article

Get full content of a Globo Rural article.

Parameters:

  • url (required): Full .ghtml article URL from globorural.globo.com

Example Input:

{
    "action": "get_rural_article",
    "url": "https://globorural.globo.com/noticia/example.ghtml"
}

24. get_auto

Get paginated Autoesporte automotive articles.

Parameters:

  • page (optional): Page number (default: 1)
  • limit (optional): Items per page (default: 20)

Example Input:

{
    "action": "get_auto",
    "page": 1
}

25. get_auto_article

Get full content of an Autoesporte article.

Parameters:

  • url (required): Full .ghtml article URL from autoesporte.globo.com

Example Input:

{
    "action": "get_auto_article",
    "url": "https://autoesporte.globo.com/noticia/example.ghtml"
}

26. get_celebrities

Get paginated Revista Quem celebrity articles.

Parameters:

  • page (optional): Page number (default: 1)
  • limit (optional): Items per page (default: 20)

Example Input:

{
    "action": "get_celebrities",
    "page": 1
}

27. get_celebrities_article

Get full content of a Revista Quem article.

Parameters:

  • url (required): Full .ghtml article URL from revistaquem.globo.com

Example Input:

{
    "action": "get_celebrities_article",
    "url": "https://revistaquem.globo.com/noticia/example.ghtml"
}

28. get_recipes_by_category

Get recipes by dish type.

Parameters:

  • recipe_category (required): Dish type slug — "tortas-e-bolos", "massas", "saladas", "sopas-e-caldos", "carnes", "peixes-e-frutos-do-mar", "entradas", "doces-e-sobremesas", "bebidas", "lanches", "acompanhamentos"
  • page (optional): Page number (default: 1)

Example Input:

{
    "action": "get_recipes_by_category",
    "recipe_category": "carnes",
    "page": 1
}

29. get_recipes_by_occasion

Get recipes by occasion.

Parameters:

  • recipe_occasion (required): Occasion slug — "aniversario", "cafe-da-manha", "jantar", "almoco", "lanche-da-tarde", "festa-infantil", "pascoa", "natal", "reuniao-com-amigos", "reuniao-em-familia"
  • page (optional): Page number (default: 1)

Example Input:

{
    "action": "get_recipes_by_occasion",
    "recipe_occasion": "natal",
    "page": 1
}

30. get_recipe_detail

Get full recipe details with ingredients and instructions.

Parameters:

  • url (required): Full .ghtml recipe URL from receitas.globo.com

Example Input:

{
    "action": "get_recipe_detail",
    "url": "https://receitas.globo.com/receita/example.ghtml"
}

Output Format

All actions return data in a consistent format:

{
    "action": "action_name",
    "success": true,
    "data": { ... },
    "error": null,
    "timestamp": "2026-01-01T00:00:00.000Z"
}

On error:

{
    "action": "action_name",
    "success": false,
    "data": null,
    "error": "Error message",
    "timestamp": "2026-01-01T00:00:00.000Z"
}
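
Because success and error results share one envelope, a caller can handle both with a single small function. The sketch below is one possible way to do that. `ActionError` and `unwrap` are hypothetical names, and the code relies only on the five fields shown above.

```python
from typing import Any


class ActionError(RuntimeError):
    """Raised when a result item reports "success": false."""


def unwrap(result: dict[str, Any]) -> Any:
    """Return result["data"] on success, otherwise raise ActionError."""
    if result.get("success"):
        return result.get("data")
    action = result.get("action", "unknown action")
    message = result.get("error") or "no error message provided"
    raise ActionError(f"{action} failed: {message}")
```

Raising an exception on failure keeps downstream code from accidentally treating `"data": null` as an empty result.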

Typical Workflows

News Monitoring

  1. get_news with a category filter to browse headlines
  2. get_news_article with the article URL to get full content
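
The two steps above might be chained as in the following sketch. `run_action` is a hypothetical callable that submits one input dict to the Actor and returns the result envelope described under Output Format. The shape of `data` is not documented here, so the `"articles"` list and per-item `"url"` key are assumptions to adjust against real output.

```python
from typing import Any, Callable

# Hypothetical: takes an Actor input dict, returns one result envelope.
Runner = Callable[[dict[str, Any]], dict[str, Any]]


def fetch_full_articles(
    run_action: Runner, category: str, max_articles: int = 3
) -> list[Any]:
    """Browse one G1 category, then fetch full content for the first few items."""
    listing = run_action(
        {"action": "get_news", "news_category": category, "page": 1}
    )
    if not listing.get("success"):
        raise RuntimeError(f"get_news failed: {listing.get('error')}")

    # Assumption: "data" holds a list under "articles", each item with a "url".
    items = (listing.get("data") or {}).get("articles", [])

    articles = []
    for item in items[:max_articles]:
        detail = run_action({"action": "get_news_article", "url": item["url"]})
        if detail.get("success"):
            articles.append(detail["data"])  # failed articles are skipped
    return articles
```

Passing the runner in as a parameter keeps the workflow independent of how the Actor is invoked and makes it easy to exercise with a stub.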

Brazilian Football

  1. get_brasileirao for top scorer leaderboard
  2. get_match_schedule for upcoming fixtures
  3. get_cartola_status to check if Cartola market is open
  4. get_cartola_market for player prices and stats

Recipe Discovery

  1. get_recipes_by_category or get_recipes_by_occasion to browse recipes
  2. get_recipe_detail with the recipe URL for full ingredients and instructions
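
Since the two browse actions take different parameter names, a small dispatcher can pick the right one from a slug. This is only an illustrative sketch: `build_recipe_query` is a hypothetical helper, and the slug sets are copied from the parameter lists of actions 28 and 29.

```python
from typing import Any

# Slugs copied from get_recipes_by_category.
DISH_TYPES = {
    "tortas-e-bolos", "massas", "saladas", "sopas-e-caldos", "carnes",
    "peixes-e-frutos-do-mar", "entradas", "doces-e-sobremesas", "bebidas",
    "lanches", "acompanhamentos",
}

# Slugs copied from get_recipes_by_occasion.
OCCASIONS = {
    "aniversario", "cafe-da-manha", "jantar", "almoco", "lanche-da-tarde",
    "festa-infantil", "pascoa", "natal", "reuniao-com-amigos",
    "reuniao-em-familia",
}


def build_recipe_query(slug: str, page: int = 1) -> dict[str, Any]:
    """Return the browse input matching a dish-type or occasion slug."""
    if slug in DISH_TYPES:
        return {
            "action": "get_recipes_by_category",
            "recipe_category": slug,
            "page": page,
        }
    if slug in OCCASIONS:
        return {
            "action": "get_recipes_by_occasion",
            "recipe_occasion": slug,
            "page": page,
        }
    raise ValueError(f"unknown recipe slug: {slug!r}")
```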

Multi-Source Coverage

  1. Use feed actions (get_news, get_sports, get_business, etc.) to aggregate headlines from across all Globo brands
  2. Use article actions to get full content for any item
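
When headlines come from several brands, the matching article action can be chosen from the URL's host. The mapping below is assembled from the domains listed in the Actions section. The function name `article_input_for` is hypothetical, and the rule that G1 paths starting with `/podcast/` go to `get_podcast_episode` is an assumption based on the example URL for that action.

```python
from urllib.parse import urlparse

# Host -> article action, taken from the url parameter of each action.
ARTICLE_ACTIONS = {
    "g1.globo.com": "get_news_article",
    "ge.globo.com": "get_sports_article",
    "gshow.globo.com": "get_entertainment_article",
    "valor.globo.com": "get_business_article",
    "cbn.globo.com": "get_radio_article",
    "extra.globo.com": "get_popular_article",
    "oglobo.globo.com": "get_newspaper_article",
    "globorural.globo.com": "get_rural_article",
    "autoesporte.globo.com": "get_auto_article",
    "revistaquem.globo.com": "get_celebrities_article",
    "receitas.globo.com": "get_recipe_detail",
}


def article_input_for(url: str) -> dict[str, str]:
    """Build the article-action input for a full .ghtml URL."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host not in ARTICLE_ACTIONS:
        raise ValueError(f"unsupported domain: {host!r}")
    if not parsed.path.endswith(".ghtml"):
        raise ValueError("expected a full .ghtml URL")
    action = ARTICLE_ACTIONS[host]
    # Assumption: G1 podcast episodes live under /podcast/.
    if host == "g1.globo.com" and parsed.path.startswith("/podcast/"):
        action = "get_podcast_episode"
    return {"action": action, "url": url}
```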

Notes

  • Uses curl_cffi with Chrome TLS fingerprint impersonation (required to bypass CDN bot protection)
  • RSS feeds are parsed with lxml for speed; HTML articles are parsed with BeautifulSoup
  • All article actions accept a full .ghtml URL from the corresponding Globo domain
  • Cartola FC endpoints call the official Cartola API directly (JSON)
  • Match schedule data is extracted from embedded JavaScript on the GE Agenda page
  • Recipe data is enriched with JSON-LD structured data when available
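
To illustrate the JSON-LD note, the sketch below pulls a schema.org `Recipe` object out of a page's `application/ld+json` script tags. It is not the Actor's code: it uses only the Python standard library instead of BeautifulSoup so that it stays self-contained, and the names `JsonLdCollector` and `find_recipe` are hypothetical.

```python
import json
from html.parser import HTMLParser
from typing import Any, Optional


class JsonLdCollector(HTMLParser):
    """Collect parsed JSON from <script type="application/ld+json"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.blocks: list[Any] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed block: skip it, i.e. "when available"


def find_recipe(html: str) -> Optional[dict]:
    """Return the first JSON-LD object whose @type is Recipe, else None."""
    collector = JsonLdCollector()
    collector.feed(html)
    for block in collector.blocks:
        for item in block if isinstance(block, list) else [block]:
            if not isinstance(item, dict):
                continue
            types = item.get("@type")
            if types == "Recipe" or (isinstance(types, list) and "Recipe" in types):
                return item
    return None
```

Returning `None` when no usable block is found lets a caller fall back to whatever was parsed from the HTML itself.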