BookWyrm Book Reviews Scraper
Pricing
from $1.19 / 1,000 results
Scrape public BookWyrm reviews, ratings, book metadata, and reviewer details by book name, book URL, or profile handle from federated BookWyrm instances.
Developer: Inus Grobler (maintained by Community)
Scrape public BookWyrm book reviews, ratings, reviewer profiles, and book metadata from federated BookWyrm instances. This Apify Actor is built for book-focused review collection, ActivityPub book-review enrichment, public profile review monitoring, and multi-source book-review datasets.
BookWyrm is federated, so reviews are spread across independent servers such as bookwyrm.social, bookwyrm.world, and many smaller community instances. Add book URLs, profile handles, or book search queries for the instances you care about, and the Actor collects the public data that those pages and feeds expose. Download the results as JSON, CSV, Excel, XML, or HTML from the Apify dataset.
Why use this BookWyrm scraper?
- Collect public BookWyrm reviews from selected books, profile handles, or book searches
- Build a public book review dataset from federated BookWyrm instances
- Monitor public reviews from selected BookWyrm users
- Enrich book-review records with reviewer and book metadata
- Combine BookWyrm reviews with Goodreads, StoryGraph, Open Library, or other book data sources
- Export clean book and review rows to a dataset, database, spreadsheet, or downstream analytics workflow
What you can extract
- Review URL, title, rating, review text, original HTML, publication date, language, visibility, tags, and content warning
- Reviewer profile URL, handle, display name, avatar URL, ActivityPub ID, public profile bio, outbox URL, and federation metadata
- Book title, subtitle, authors, author aliases, author bio/ISNI links, ISBNs, cover image, BookWyrm book URL, work/edition URL, publisher, page count, subjects, languages, series, physical format, and visible identifiers
- Public comments and quotes linked to books when enabled through advanced input
- Optional raw ActivityPub and RSS fields for advanced workflows
Simple setup
Most runs use one of three inputs:
- Book URLs: scrape selected public BookWyrm book pages
- Profiles: add one federated handle, profile URL, or instance | handle per line
- Book search: add one book title, author, or ISBN per line
Example book URL:
https://bookwyrm.world/book/20954
Example profile:
sigvie@bookwyrm.world
mouse@bookwyrm.social
bookwyrm.world | sigvie
https://bookwyrm.world/user/sigvie
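The three accepted profile forms (federated handle, instance | handle, and profile URL) can all be reduced to an instance and a username. The following is a minimal sketch, not the Actor's own parser: `parse_profile` is a hypothetical helper, and the `/user/<name>` URL shape is assumed from the example profile URL.

```python
from urllib.parse import urlparse


def parse_profile(line: str) -> tuple[str, str]:
    """Return (instance_host, username) for one profile input line."""
    text = line.strip()
    if text.startswith(("http://", "https://")):
        # Profile URL form, e.g. https://bookwyrm.world/user/sigvie
        parsed = urlparse(text)
        segments = [s for s in parsed.path.split("/") if s]
        if len(segments) >= 2 and segments[0] == "user":
            return parsed.netloc, segments[1]
        raise ValueError(f"Not a profile URL: {line!r}")
    if "|" in text:
        # "instance | handle" form, e.g. bookwyrm.world | sigvie
        instance, handle = (part.strip() for part in text.split("|", 1))
        return instance, handle.lstrip("@").split("@", 1)[0]
    # Federated handle form, e.g. sigvie@bookwyrm.world
    username, sep, instance = text.lstrip("@").partition("@")
    if not sep or not username or not instance:
        raise ValueError(f"Unrecognized profile input: {line!r}")
    return instance, username
```

Normalizing early like this makes it easy to de-duplicate inputs that point at the same user in different forms.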
Example book search:
Min skyld Abid Raja
Sula Toni Morrison
9788202713461
Plain searches use a built-in BookWyrm search instance. To target a specific server, use instance | query, such as bookwyrm.social | Sula Toni Morrison.
Example input
```json
{
  "books": ["https://bookwyrm.world/book/20954"],
  "profiles": ["sigvie@bookwyrm.world", "mouse@bookwyrm.social"],
  "search": ["Min skyld Abid Raja", "Sula Toni Morrison"],
  "maxReviews": 100,
  "maxSearchResults": 10
}
```
Example output
Book rows are emitted as soon as book metadata is available:
```json
{
  "entityType": "book",
  "source": "bookwyrm",
  "sourceInstance": "https://bookwyrm.social",
  "activityPubId": "https://bookwyrm.social/book/2",
  "bookUrl": "https://bookwyrm.social/book/2/s/hamlet",
  "title": "Hamlet",
  "authors": [
    {
      "name": "William Shakespeare",
      "url": "https://bookwyrm.social/author/1/s/william-shakespeare",
      "activityPubId": "https://bookwyrm.social/author/1"
    }
  ],
  "isbn10": "0140714545",
  "isbn13": "9780140714548",
  "aggregateRating": 3.7,
  "reviewsCount": 272,
  "bookReviewDiscoveryStatus": "collected_from_html_public_page",
  "scrapedAt": "2026-05-23T10:21:23.443Z"
}
```
Every review is emitted as its own dataset row:
```json
{
  "entityType": "review",
  "source": "bookwyrm",
  "sourceInstance": "https://bookwyrm.world",
  "activityPubId": "https://bookwyrm.world/user/sigvie/review/10524",
  "reviewUrl": "https://bookwyrm.world/user/sigvie/review/10524",
  "reviewType": "Article",
  "title": "Review of \"Min skyld\" (5 stars)",
  "rating": 5,
  "ratingScale": 5,
  "ratingSource": "activitypub",
  "reviewText": "Fantastisk og rørende bok. Elsker skrivinga!",
  "publishedAt": "2022-12-06T00:00:00.000Z",
  "visibility": "public",
  "reviewer": {
    "activityPubId": "https://bookwyrm.world/user/sigvie",
    "profileUrl": "https://bookwyrm.world/user/sigvie",
    "handle": "sigvie@bookwyrm.world",
    "displayName": "Sigurd Vie"
  },
  "book": {
    "activityPubId": "https://bookwyrm.world/book/20954",
    "bookUrl": "https://bookwyrm.world/book/20954",
    "title": "Min skyld",
    "authors": ["Abid Qayyum Raja"],
    "isbn13": "9788202713461"
  },
  "bookDetails": {
    "title": "Min skyld",
    "publisher": "Cappelen Damm",
    "publishedDate": "2021-08-11",
    "pageCount": 240,
    "languages": ["Norsk (Bokmål)"],
    "authors": [
      {
        "name": "Abid Qayyum Raja",
        "aliases": ["Abid Raja", "Abid Q. Raja"],
        "isni": "0000000041008712",
        "wikipediaLink": "https://da.wikipedia.org/wiki/Abid_Raja"
      }
    ],
    "bookwyrm": {
      "workUrl": "https://bookwyrm.world/book/20953",
      "physicalFormat": "Hardcover"
    }
  },
  "reviewerProfile": {
    "handle": "sigvie@bookwyrm.world",
    "displayName": "Sigurd Vie",
    "outboxUrl": "https://bookwyrm.world/user/sigvie/outbox"
  },
  "bookwyrm": {
    "repliesUrl": "https://bookwyrm.world/user/sigvie/review/10524/replies",
    "repliesCount": 0
  },
  "discoveryMethod": "rss_reviews",
  "scrapedAt": "2026-05-23T10:21:23.443Z"
}
```
Output
The dataset contains one standalone row for each discovered book and one standalone row for each public review. Reviews are never nested under books, so large runs stay easy to stream into spreadsheets, databases, dashboards, or analytics pipelines.
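Because book rows and review rows are flat and separate, any join happens on the consumer side. The sketch below shows one way to regroup them; `group_reviews_by_book` is a hypothetical helper, and it assumes that a review's `book.activityPubId` matches the `activityPubId` of the corresponding book row, which may not hold across instances.

```python
from collections import defaultdict


def group_reviews_by_book(rows):
    """Split flat dataset rows into book rows and per-book review lists."""
    books = {}
    reviews = defaultdict(list)
    for row in rows:
        entity_type = row.get("entityType")
        if entity_type == "book":
            books[row["activityPubId"]] = row
        elif entity_type == "review":
            # Reviews carry a compact nested book summary with its own ID.
            book_id = (row.get("book") or {}).get("activityPubId")
            reviews[book_id].append(row)
    return books, dict(reviews)
```

Reviews whose book summary is missing end up under the `None` key, so they are kept rather than silently dropped.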
Book rows include the best available public metadata. Review rows include compact nested book and reviewer summaries, plus richer bookDetails and reviewerProfile fields when enrichment is available. Rows are streamed in batches as they are collected, which lowers memory pressure on large runs and lets you inspect partial results before the run finishes.
For faster large book runs, full reviewer profile enrichment is off by default. Review rows still include the reviewer information visible on the review page, such as reviewer name, profile URL, and handle when available.
How the Actor gets BookWyrm data
The Actor uses the safest available public source first:
- ActivityPub JSON for structured review, profile, book, and status data
- RSS feeds for reliable profile-level review, comment, quote, and activity discovery
- Public BookWyrm search pages when you provide book search queries
- Public HTML fallback for visible metadata when structured sources do not expose enough data
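The source order above can be pictured as a simple fallback chain. This is an illustrative sketch only, not the Actor's internal code: `fetch_first_available` and the fetcher callables are hypothetical, and each fetcher is assumed to return `None` when its source exposes nothing for the URL.

```python
from typing import Callable, Optional

# A fetcher takes a URL and returns parsed data, or None if nothing is exposed.
Fetcher = Callable[[str], Optional[dict]]


def fetch_first_available(url: str, sources: list[tuple[str, Fetcher]]):
    """Try each (name, fetcher) in order and return the first non-empty result."""
    for name, fetcher in sources:
        data = fetcher(url)
        if data:
            return name, data
    # No source exposed usable data for this URL.
    return None, None
```

Returning the source name alongside the data makes it possible to label each row with how it was discovered.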
It does not use browser automation by default. It does not log in, solve CAPTCHAs, bypass Cloudflare, bypass anti-bot pages, or access private/followers-only content.
For book URLs, the Actor follows public BookWyrm review pagination when it is visible on the book page. This is the cheapest way to collect hundreds or thousands of reviews for a book because it uses normal HTTP requests and Cheerio parsing, not a browser.
ActivityPub support
BookWyrm often exposes ActivityPub JSON when a URL is requested with ActivityPub headers or when .json is appended to a public entity URL. The Actor supports public actors, outboxes, collections, collection pages, Review objects, Article review objects, Create activities, comments, quotes, books, shelves, and lists where those objects are exposed.
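The two request styles mentioned above can be sketched with the Python standard library. The helper names are hypothetical; `application/activity+json` is the standard ActivityPub media type, but whether a given instance honors it for a given URL is not guaranteed. The request is only built here, not sent.

```python
from urllib.request import Request

ACTIVITYPUB_ACCEPT = "application/activity+json"


def activitypub_request(url: str) -> Request:
    """Build (but do not send) a GET request that asks for ActivityPub JSON."""
    return Request(url, headers={"Accept": ACTIVITYPUB_ACCEPT})


def json_variant(url: str) -> str:
    """Return the URL with .json appended, the alternative form of the request."""
    return url.rstrip("/") + ".json"
```

To actually fetch the object, the request can be passed to `urllib.request.urlopen` and the response body decoded with `json.load`.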
RSS support
When you provide profile handles, the Actor automatically checks public BookWyrm profile feeds where available:
- /rss-reviews for public reviews
- /rss for public activity
- /rss-quotes for public quotes
- /rss-comments for public comments
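As a rough illustration, the feed URLs for one profile could be built as follows. `profile_feed_urls` is a hypothetical helper, and it assumes the feed paths hang directly off the public profile URL (`https://<instance>/user/<username>`); individual instances may differ.

```python
RSS_SUFFIXES = {
    "reviews": "/rss-reviews",
    "activity": "/rss",
    "quotes": "/rss-quotes",
    "comments": "/rss-comments",
}


def profile_feed_urls(instance: str, username: str) -> dict[str, str]:
    """Map each feed kind to its assumed URL for one public profile."""
    base = f"https://{instance}/user/{username}"
    return {kind: base + suffix for kind, suffix in RSS_SUFFIXES.items()}
```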
RSS feeds are often the best way to collect profile-level reviews. Some RSS items include ratings in the title, such as (5 stars). If a rating is not available in RSS or enrichment sources, the Actor returns rating: null and labels the source clearly.
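Pulling a rating out of an RSS title can be sketched with a regular expression. This is a hedged example rather than the Actor's parser: only the `(5 stars)` form is confirmed by the text above, while the singular `(1 star)` and decimal forms are assumptions.

```python
import re
from typing import Optional

# Matches "(5 stars)", and by assumption "(1 star)" or "(4.5 stars)".
_STARS = re.compile(r"\((\d+(?:\.\d+)?) stars?\)")


def rating_from_title(title: str) -> Optional[float]:
    """Return the star rating embedded in an RSS title, or None if absent."""
    match = _STARS.search(title)
    return float(match.group(1)) if match else None
```

Returning `None` instead of a default value mirrors the behavior described above: a missing rating is reported as missing, not guessed.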
Important coverage limits
BookWyrm is not one centralized review database. A book page on one instance may not expose all reviews from all BookWyrm servers. Profile-level scraping is usually more complete because profile RSS feeds and ActivityPub outboxes are scoped to that user.
The Actor does not pretend to scrape every review for a book unless the public page or ActivityPub JSON actually exposes those review links. When a book page exposes paginated public reviews, the Actor follows those pages until maxReviews, the hidden safety page limit, or the end of pagination is reached. When book-level review discovery is incomplete, the book row labels that limitation with bookReviewDiscoveryStatus.
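The three stop conditions (maxReviews, a page limit, end of pagination) can be sketched as a single loop. This is an assumption-laden illustration: `collect_review_urls` and `fetch_page` are hypothetical, and the default `max_pages=50` is a placeholder because the Actor's hidden safety limit is not documented here.

```python
from typing import Callable, Optional

# fetch_page(url) returns (review URLs on that page, next page URL or None).
PageFetcher = Callable[[str], tuple[list[str], Optional[str]]]


def collect_review_urls(start_url: str, fetch_page: PageFetcher,
                        max_reviews: int, max_pages: int = 50) -> list[str]:
    """Follow pagination until max_reviews, max_pages, or the last page."""
    collected: list[str] = []
    seen: set[str] = set()
    url: Optional[str] = start_url
    pages = 0
    while url and pages < max_pages and len(collected) < max_reviews:
        review_urls, url = fetch_page(url)
        pages += 1
        for review_url in review_urls:
            if review_url not in seen:
                seen.add(review_url)
                collected.append(review_url)
            if len(collected) >= max_reviews:
                break
    return collected
```

Injecting `fetch_page` keeps the loop testable without network access and independent of how a page is fetched or parsed.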
Privacy and ethical use
This Actor is for public BookWyrm data only.
- No login is required or supported
- Private, followers-only, restricted, and login-only pages are skipped
- Cloudflare challenges, CAPTCHAs, and access controls are not bypassed
- robots.txt is respected where practical
- Defaults use public HTML fallback, low concurrency, and polite delays
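A robots.txt check of the kind listed above can be sketched with Python's standard `urllib.robotparser`. The helper name, the user agent string, and the sample rules in the usage note are all hypothetical; the Actor's own implementation is not shown here.

```python
from urllib.robotparser import RobotFileParser


def build_robots_checker(robots_txt: str, user_agent: str):
    """Return a function that reports whether a URL may be fetched."""
    parser = RobotFileParser()
    # Parse already-downloaded robots.txt text instead of fetching it here.
    parser.parse(robots_txt.splitlines())

    def allowed(url: str) -> bool:
        return parser.can_fetch(user_agent, url)

    return allowed
```

For example, with the rules `User-agent: *` and `Disallow: /settings/`, a book page would be allowed while anything under `/settings/` would be skipped.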
Use this Actor only for lawful, ethical collection of public data from instances you are allowed to access.
Troubleshooting
No reviews found
Add profiles as handles when possible. Book pages and book search results do not always expose review collections.
The instance returned 403, 404, or 410
The page may be private, deleted, restricted, unavailable through ActivityPub, or blocked by the instance. The Actor records the failed URL in run statistics and continues with other sources.
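The record-and-continue behavior can be illustrated as follows. This is a sketch under stated assumptions: `fetch_all` and the injected `fetch` callable are hypothetical, and the shape of the failure records is invented for the example rather than taken from the Actor's run statistics.

```python
def fetch_all(urls, fetch):
    """Fetch every URL; record failures instead of aborting the run.

    fetch(url) is assumed to return a (status_code, body) pair.
    """
    results = {}
    failed = []
    for url in urls:
        status, body = fetch(url)
        if status == 200:
            results[url] = body
        else:
            # 403, 404, 410 and other failures are noted, then skipped.
            failed.append({"url": url, "status": status})
    return results, failed
```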
RSS ratings are missing
Some BookWyrm RSS feeds include ratings in titles, and some do not. If the Actor cannot find a rating in RSS, ActivityPub, or public HTML, it returns null instead of guessing.
A book page did not return all reviews
That is expected on some federated instances. Add profile handles for reviewers you care about to get better profile-level coverage.
The instance rate-limited requests
Lower the maximum number of reviews or search results and keep runs targeted to the profiles and books you need. The Actor uses polite built-in request delays and respects robots.txt where practical.
Pricing suggestion
Recommended Apify Store model: pay per event. Do not use raw pay-per-result or automatic default dataset item charging as the primary pricing event, because the dataset intentionally contains both book metadata rows and review rows. Users should mainly pay for review rows.
- review-scraped: $0.00075 per review row ($0.75 per 1,000 reviews)
- book-enriched: $0.00010 per book row ($0.10 per 1,000 books)
- profile-enriched: $0.00025 per full profile enrichment when reviewer profile enrichment is enabled
- Keep Apify's synthetic apify-actor-start event at the default low price if using pay-per-event monetization
Stress testing showed that BookWyrm review density varies heavily by instance and book. Some searches return many books with few reviews, while popular books expose many more public review links. A review-first event price keeps small runs cheap while still covering the extra discovery work needed for low-density or zero-result books.
Suggested starting price: $0.75 per 1,000 public reviews, plus a small book metadata event. Revisit after real user runs and monitor cost per 1,000 reviews, zero-result search rates, average reviews per book, and profile enrichment usage.
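To see what the suggested event prices mean for a run, the arithmetic can be sketched as below. `estimate_cost` is a hypothetical helper that uses only the three suggested per-event prices; it leaves out the apify-actor-start event and any platform costs, so the result is an approximation.

```python
from decimal import Decimal

# Suggested per-event prices in USD, taken from the pricing suggestion.
EVENT_PRICES = {
    "review-scraped": Decimal("0.00075"),
    "book-enriched": Decimal("0.00010"),
    "profile-enriched": Decimal("0.00025"),
}


def estimate_cost(reviews: int, books: int, profiles: int = 0) -> Decimal:
    """Estimate the event charges for one run, excluding the start event."""
    counts = {
        "review-scraped": reviews,
        "book-enriched": books,
        "profile-enriched": profiles,
    }
    return sum((EVENT_PRICES[name] * n for name, n in counts.items()), Decimal("0"))
```

`Decimal` is used instead of floats so that small per-event prices add up without rounding drift.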
API usage
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")
actor = client.actor("thescrapelab/bookwyrm-book-reviews-scraper")

run_input = {
    "books": ["https://bookwyrm.world/book/20954"],
    "profiles": [
        "sigvie@bookwyrm.world",
        "mouse@bookwyrm.social",
    ],
    "search": [
        "Min skyld Abid Raja",
        "Sula Toni Morrison",
    ],
    "maxReviews": 100,
    "maxSearchResults": 10,
}

# Start the Actor and wait for the run to finish.
run = actor.call(run_input=run_input)

# Book rows and review rows share one dataset; entityType tells them apart.
dataset = client.dataset(run["defaultDatasetId"])
for item in dataset.list_items().items:
    if item["entityType"] == "book":
        print("BOOK", item.get("title"), item.get("reviewsCount"))
    elif item["entityType"] == "review":
        print("REVIEW", item.get("reviewUrl"), item.get("rating"), item.get("book", {}).get("title"))
```