Google News Scraper

7 days trial then $20.00/month - No credit card required now

lhotanova/google-news-scraper

Gets featured articles from Google News with title, link, source, publication date and image.

scraper returning wrong links

Open

dynamic_harvest opened this issue
5 months ago

The scraper is returning wrong links, for example "https://news.google.com/rss/articles/CBMib2h0dHBzOi8vd3d3LnBybmV3c3dpcmUuY29tL25ld3MtcmVsZWFzZXMvYW1lcmljYW4tY2VudHVyeS1sYXVuY2hlcy1jYWxpZm9ybmlhLW11bmljaXBhbC1ib25kLWV0Zi0zMDIyMDAyMzcuaHRtbNIBAA?oc=5&hl=en-IN&gl=IN&ceid=IN:en"

Instead of the article's own URL, the returned links start with "https://news.google.com/".

Clicking such a link redirects to the correct website, but this is breaking all my flows. It has been happening since last week; I was getting correct links before that.

dynamic_harvest

5 months ago

After

dynamic_harvest

5 months ago

Before

lhotanova

Hello, thank you for reporting this bug, and I'm sorry for the late reply. I'm currently reworking the Actor to address this issue as well as other bugs. I'll keep you updated on the progress here.

lhotanova

Hi again, a recent change in the Google News API caused this bug. The Actor was fixed today; see the example run: https://console.apify.com/view/runs/oqDhZjFezcTVZHdAi

The Actor first extracts the links in the https://news.google.com/ format from the API, then decodes them to the actual target links and opens the target pages to extract preview images. It has to be done this way because the Actor doesn't use a web browser that could resolve the redirects automatically; it uses plain HTTP requests only, to keep the expenses low. Google News has recently made the encoding of target URLs more difficult to deal with, so the Actor now uses a rather hacky way to decode the links. Hopefully the Google News API will be stable now and won't break the Actor's flow again.
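
To illustrate what "decoding" such a link can mean, here is a minimal Python sketch. It is not the Actor's code, and it only covers the older link format, where the article ID in the path is URL-safe base64 wrapping the target URL behind a short binary prefix (the link quoted in the original report is of this kind). The function names and the assumed byte prefix are illustrative assumptions; links in the newer, harder encoding are not handled and simply yield None.

```python
import base64
from typing import Optional, Tuple
from urllib.parse import urlparse

# Assumption: older article IDs decode to bytes that start with this prefix,
# followed by a varint length and then the target URL itself.
LEGACY_PREFIX = b"\x08\x13\x22"


def read_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Read a protobuf-style varint at pos; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def decode_legacy_google_news_url(link: str) -> Optional[str]:
    """Return the embedded target URL, or None if the link is not in the old format."""
    # The article ID is the last path segment; the query string is ignored.
    article_id = urlparse(link).path.rstrip("/").split("/")[-1]
    # Restore the base64 padding that the links omit.
    padded = article_id + "=" * (-len(article_id) % 4)
    try:
        raw = base64.urlsafe_b64decode(padded)
    except ValueError:
        return None
    if not raw.startswith(LEGACY_PREFIX):
        return None
    try:
        length, start = read_varint(raw, len(LEGACY_PREFIX))
    except IndexError:
        return None
    candidate = raw[start:start + length]
    if len(candidate) != length:
        return None
    if not candidate.startswith((b"http://", b"https://")):
        return None
    return candidate.decode("utf-8", errors="replace")
```

Called on the link from the original report, this returns https://www.prnewswire.com/news-releases/american-century-launches-california-municipal-bond-etf-302200237.html. A None result means the link needs some other way of resolving it, for example following the redirect in a browser.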

If you encounter any other issues, please report them using new issue threads 🙏

boothdev

a month ago

Hi, we seem to be getting a lot of links like that, which causes a good number of failures on every run.

boothdev

19 days ago

Any update on this? We are seeing quite a few links come up as https://news.google.com/_/DotsSplashUi/data/batchexecute?rpcids=Fbv4je

Developer
Maintained by Community

Actor Metrics

  • 93 monthly users

  • 16 stars

  • >99% runs succeeded

  • 40 days response time

  • Created in Oct 2022

  • Modified 3 months ago