Web Scraper

apify/web-scraper
Crawls arbitrary websites using the Chrome browser and extracts data from pages using provided JavaScript code. The Actor supports both recursive crawling and lists of URLs, and automatically manages concurrency for maximum performance. This is Apify's basic tool for web crawling and scraping.

rss API on Readybot.io: The feed could not be parsed or is empty

Closed

changchiyou opened this issue
a month ago

I tried to follow the tutorial How to turn any website into an RSS feed.

The data was scraped correctly from the target website and stored in the database, but the error message "The feed could not be parsed or is empty" popped up in Readybot.io's client when I used "https://api.apify.com/v2/actor-tasks/[TASK_ID]/runs/last/dataset/items?token=[YOUR_API_TOKEN]&format=rss". Readybot.io also noted that "The response body may have been truncated." and showed:

[{
  "#error": false,
  "#debug": {
    "requestId": "0hfqY6MHbN19ofn",
    "url": "https://wildrift.leagueoflegends.com/zh-tw/news/",
    "loadedUrl": "https://wildrift.leagueoflegends.com/zh-tw/news/",
    "method": "GET",
    "retryCount": 0,
    "errorMessages": [],
    "statusCode": 200
  }
},
{
  "url": "https://wildrift.leagueoflegends.com/zh-tw/news/game-updates/wild-rift-patch-notes-5-0c/",
  "title": "《激鬥峽谷》5.0c版本更新公告",
  "date": "Wed, 13 Mar 2024 00:00:00 GMT",
  "guid": "https://wildrift.leagueo

Have I missed something in setting up Web Scraper for RSS?

changchiyou

a month ago

I replaced "https://api.apify.com/v2/actor-tasks/%5BTASK_ID%5D/runs/last/dataset/items?token=%5BYOUR_API_TOKEN%5D&format=rss" with:

  1. https://api.apify.com/v2/actor-tasks/%5BTASK_ID%5D/runs/last/dataset/items?token=%5BYOUR_API_TOKEN%5D&format=rss&clean=true

    W3C - Feed Validation Service

  2. https://api.apify.com/v2/actor-tasks/%5BTASK_ID%5D/runs/last/dataset/items?token=%5BYOUR_API_TOKEN%5D&format=xml&clean=true

    W3C - Feed Validation Service

but it still didn't work. :(

Hello @changchiyou and thank you for your interest in this Actor!

The errors you're getting from ReadyBot.io are indeed weird - in the first one (with the "The response body may have been truncated" error message), it seems you're passing a JSON object there. Make sure you only use the URL with the format=rss query parameter whenever you want to get an RSS feed. Even though RSS feeds are technically XML documents, the .xml file you get from Apify when you pick format=xml is not valid RSS.
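As a rough way to see which kind of body a feed reader is actually receiving, the following Python sketch classifies a response as non-XML (for example JSON), XML that is not RSS, an RSS document with no items, or a usable RSS feed. The function name `classify_feed_body` and its labels are hypothetical and only meant for illustration; the check assumes the RSS 2.0 layout of `<item>` elements inside a `<channel>`.

```python
import xml.etree.ElementTree as ET


def classify_feed_body(body: str) -> str:
    """Return a rough label for a response body a feed reader would receive.

    Hypothetical helper for illustration; the labels are not part of any API.
    """
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # JSON (or any other non-XML text) ends up here.
        return "not-xml"
    if root.tag != "rss":
        # Well-formed XML, but the root element is not <rss>.
        return "xml-but-not-rss"
    # RSS 2.0 places <item> elements inside a <channel> element.
    items = root.findall("./channel/item")
    return "rss" if items else "empty-rss"
```

Feeding it the JSON snippet from the error message would yield `"not-xml"`, which matches the observation that a JSON object was being passed where a feed was expected.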

Now comes the strange part: I just tried making a ReadyBot.io bot with your dataset, and it worked just fine. Try clearing everything you have done so far and creating a new ReadyBot.io feed bot with the following RSS URL:

https://api.apify.com/v2/actor-tasks/changchiyou~wildrift-news-zh-tw/runs/last/dataset/items?token=this_is_an_example_token&format=rss

Make sure to replace the token query parameter in the URL (this_is_an_example_token) with your actual Integration token - you'll find that one here in Apify Console - Settings (left menu) - Integrations (tab) - Personal API tokens.
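To avoid pasting a URL that still contains placeholders such as `[TASK_ID]`, the feed URL can be assembled from its parts. This is a minimal Python sketch that follows the URL pattern shown in this thread; the helper name `build_feed_url` and the `API_BASE` constant are assumptions made for illustration, and the token value is the example one from above.

```python
from urllib.parse import quote, urlencode

# Base URL taken from the links in this thread.
API_BASE = "https://api.apify.com/v2"


def build_feed_url(task_id: str, token: str, clean: bool = False) -> str:
    """Build the 'last run' dataset URL that requests RSS output.

    Hypothetical helper; it only mirrors the URL pattern used in this thread.
    """
    params = {"token": token, "format": "rss"}
    if clean:
        # Optional flag used in one of the URLs tried earlier in the thread.
        params["clean"] = "true"
    # Keep '~' literal, since task names look like "username~task-name".
    path = f"/actor-tasks/{quote(task_id, safe='~')}/runs/last/dataset/items"
    return f"{API_BASE}{path}?{urlencode(params)}"


print(build_feed_url("changchiyou~wildrift-news-zh-tw", "this_is_an_example_token"))
```

Building the URL this way keeps the task name and token as separate inputs, so a leftover placeholder is easier to spot than in a hand-edited string.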

Hope this helps. Let us know if you run into any issues with this approach. Good luck! :)

changchiyou

a month ago

@jindrich.bar Thanks for your reply! But I have to apologize for forgetting to update this issue after I solved this problem.


I fixed the incorrect field names given in https://blog.apify.com/how-to-turn-any-website-into-an-rss-feed-a8f9f216e1b0/ :

  1. url -> link
  2. date -> pubDate
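The two renames above can be sketched as a small mapping. In practice the change would be made in the object returned by the scraper's page function; this Python snippet only illustrates the mapping on an already scraped item, and `RSS_FIELD_NAMES` and `to_rss_item` are hypothetical names.

```python
# Renames reported in this thread: url -> link, date -> pubDate.
RSS_FIELD_NAMES = {"url": "link", "date": "pubDate"}


def to_rss_item(item: dict) -> dict:
    """Return a copy of a scraped item with the two fields renamed.

    Keys that are not in RSS_FIELD_NAMES (title, guid, ...) are kept as-is.
    """
    return {RSS_FIELD_NAMES.get(key, key): value for key, value in item.items()}


scraped = {
    "url": "https://wildrift.leagueoflegends.com/zh-tw/news/game-updates/wild-rift-patch-notes-5-0c/",
    "title": "《激鬥峽谷》5.0c版本更新公告",
    "date": "Wed, 13 Mar 2024 00:00:00 GMT",
}
print(sorted(to_rss_item(scraped)))
```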

After that, the first replacement worked for me once I reran the task and obtained the new database (I haven't tried it without clean=true):

https://api.apify.com/v2/actor-tasks/%5BTASK_ID%5D/runs/last/dataset/items?token=%5BYOUR_API_TOKEN%5D&format=rss&clean=true

I believe I forgot to wait for a while after initially running the task, which is why I first got the "The feed could not be parsed or is empty" error (empty database).

Although I'm not sure which change was the key to fixing this bug (the fields or the URL parameter), the issue was already solved yesterday.
