
Web Scraper
Pricing
Pay per usage

Crawls arbitrary websites using a web browser and extracts structured data from web pages using a provided JavaScript function. The Actor supports both recursive crawling and lists of URLs, and automatically manages concurrency for maximum performance.
rss API on Readybot.io: The feed could not be parsed or is empty
Closed
I tried to follow the tutorial "How to turn any website into an RSS feed".
The data was scraped correctly from the target website and stored in the dataset, but Readybot.io's client shows the error message "The feed could not be parsed or is empty" for "https://api.apify.com/v2/actor-tasks/[TASK_ID]/runs/last/dataset/items?token=[YOUR_API_TOKEN]&format=rss". Readybot.io also noted that "The response body may have been truncated." and showed:
[{"#error": false,"#debug": {"requestId": "0hfqY6MHbN19ofn","url": "https://wildrift.leagueoflegends.com/zh-tw/news/","loadedUrl": "https://wildrift.leagueoflegends.com/zh-tw/news/","method": "GET","retryCount": 0,"errorMessages": [],"statusCode": 200}},{"url": "https://wildrift.leagueoflegends.com/zh-tw/news/game-updates/wild-rift-patch-notes-5-0c/","title": "《激鬥峽谷》5.0c版本更新公告","date": "Wed, 13 Mar 2024 00:00:00 GMT","guid": "https://wildrift.leagueo
Have I missed something in setting up Web Scraper for RSS?
changchiyou
I replaced "https://api.apify.com/v2/actor-tasks/%5BTASK_ID%5D/runs/last/dataset/items?token=%5BYOUR_API_TOKEN%5D&format=rss" with:
- https://api.apify.com/v2/actor-tasks/%5BTASK_ID%5D/runs/last/dataset/items?token=%5BYOUR_API_TOKEN%5D&format=rss&clean=true (W3C - Feed Validation Service)
- https://api.apify.com/v2/actor-tasks/%5BTASK_ID%5D/runs/last/dataset/items?token=%5BYOUR_API_TOKEN%5D&format=xml&clean=true (W3C - Feed Validation Service)
but it still didn't work. :(
Hello @changchiyou and thank you for your interest in this Actor!
The errors you're getting from ReadyBot.io are indeed weird - in the first one (with the "The response body may have been truncated" error message), it seems you're passing a JSON object there.
Make sure you only use the URL with the format=rss query parameter whenever you want to get an RSS feed. Even though RSS feeds are technically XML documents, the .xml file you get from Apify when you pick format=xml is not valid RSS.
Now comes the strange part: I just tried making a ReadyBot.io bot with your dataset - and it worked just fine. Try clearing everything you have done until now and create a new ReadyBot.io feed bot with the following RSS URL:
https://api.apify.com/v2/actor-tasks/changchiyou~wildrift-news-zh-tw/runs/last/dataset/items?token=this_is_an_example_token&format=rss
Make sure to replace the token query parameter in the URL (this_is_an_example_token) with your actual Integration token - you'll find that one here in Apify Console - Settings (left menu) - Integrations (tab) - Personal API tokens.
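For anyone assembling this URL in a script rather than by hand, here is a minimal sketch. Only the endpoint path and the token, format and clean query parameters come from this thread; the helper name build_rss_url and the API_BASE constant are made up for illustration.

```python
from urllib.parse import urlencode

# Base URL as it appears in the feed URLs quoted in this thread.
API_BASE = "https://api.apify.com/v2"


def build_rss_url(task_id: str, token: str, clean: bool = False) -> str:
    """Build the last-run dataset URL for a task, requesting RSS output.

    build_rss_url is a hypothetical helper, not part of any Apify client.
    """
    params = {"token": token, "format": "rss"}
    if clean:
        # clean=true is the extra parameter tried earlier in this thread.
        params["clean"] = "true"
    return f"{API_BASE}/actor-tasks/{task_id}/runs/last/dataset/items?{urlencode(params)}"


# Placeholder token from the reply above; substitute your own.
print(build_rss_url("changchiyou~wildrift-news-zh-tw", "this_is_an_example_token"))
```

Using urlencode keeps the token safely escaped if it ever contains characters that are not URL-safe.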
Hope this helps. Let us know if you run into any issues with this approach. Good luck! :)
changchiyou
@jindrich.bar Thanks for your reply! I apologize for forgetting to update this issue after I solved the problem.
I fixed the incorrect field names given in https://blog.apify.com/how-to-turn-any-website-into-an-rss-feed-a8f9f216e1b0/ :
url -> link
date -> pubDate
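The rename above can be illustrated with a short sketch. In the Actor itself the change would be made in the JavaScript page function that returns each item; the Python below only demonstrates the mapping, and the names FIELD_RENAMES and to_rss_fields are hypothetical. The sample item reuses values from the truncated response quoted earlier in this thread.

```python
# Mapping taken from this post: url -> link, date -> pubDate.
FIELD_RENAMES = {"url": "link", "date": "pubDate"}


def to_rss_fields(item: dict) -> dict:
    """Return a copy of item with keys renamed per FIELD_RENAMES.

    Keys that are not in the mapping (for example title) are kept as-is.
    """
    return {FIELD_RENAMES.get(key, key): value for key, value in item.items()}


item = {
    "url": "https://wildrift.leagueoflegends.com/zh-tw/news/game-updates/wild-rift-patch-notes-5-0c/",
    "title": "《激鬥峽谷》5.0c版本更新公告",
    "date": "Wed, 13 Mar 2024 00:00:00 GMT",
}

renamed = to_rss_fields(item)
print(sorted(renamed))  # field names after the rename
```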
After that, the first replacement URL worked for me once I reran the task and obtained a new dataset (I haven't tried it without clean=true):
https://api.apify.com/v2/actor-tasks/%5BTASK_ID%5D/runs/last/dataset/items?token=%5BYOUR_API_TOKEN%5D&format=rss&clean=true
I believe I didn't wait long enough after first running the task, which is why I initially got the "The feed could not be parsed or is empty" error (empty dataset).
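One way to rule out the empty-feed case before pointing a feed reader at the URL is to count the items in the returned XML. This is a minimal sketch that assumes the usual RSS 2.0 layout (rss > channel > item); the exact structure Apify returns is not shown in this thread, and the two sample feeds are made up for illustration.

```python
import xml.etree.ElementTree as ET


def count_rss_items(feed_xml: str) -> int:
    """Count <item> elements under <channel>, assuming an RSS 2.0 layout."""
    root = ET.fromstring(feed_xml)
    return len(root.findall("./channel/item"))


# Made-up sample feeds, used only to exercise the helper.
EMPTY_FEED = "<rss version='2.0'><channel><title>News</title></channel></rss>"
ONE_ITEM_FEED = (
    "<rss version='2.0'><channel><title>News</title>"
    "<item><title>Patch notes</title><link>https://example.com/a</link></item>"
    "</channel></rss>"
)

print(count_rss_items(EMPTY_FEED), count_rss_items(ONE_ITEM_FEED))
```

A count of zero right after starting a task would be consistent with the run not having produced any results yet.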
Although I'm not sure which change was the key fix (the field names or the URL parameters), the issue was resolved yesterday.