BookWyrm Book Reviews Scraper
Pricing
from $1.19 / 1,000 results
Scrape public BookWyrm reviews, ratings, book metadata, and reviewer details by book name, book URL, or profile handle from federated BookWyrm instances.
Developer
Inus Grobler
Maintained by Community
At a glance: this Actor scrapes public BookWyrm reviews, ratings, profiles, and book metadata. Inputs are BookWyrm URLs, profile handles, and book search queries. Output consists of review and book rows with reviewer and ActivityPub-related fields. The main use case is building book-review datasets. Limitations, troubleshooting, and pricing notes are covered below.
Scrape public BookWyrm book reviews, ratings, reviewer profiles, and book metadata from federated BookWyrm instances. This Apify Actor is built for book-focused review collection, ActivityPub book-review enrichment, public profile review monitoring, and multi-source book-review datasets.
BookWyrm is federated, so reviews are spread across independent servers such as bookwyrm.social, bookwyrm.world, and many smaller community instances. Add book URLs, profile handles, or book search queries for the instances you care about, and the Actor collects the public data that those pages and feeds expose. Download the results as JSON, CSV, Excel, XML, or HTML from the Apify dataset.
Why use this BookWyrm scraper?
- Collect public BookWyrm reviews from selected books, profile handles, or book searches
- Build a public book review dataset from federated BookWyrm instances
- Monitor public reviews from selected BookWyrm users
- Enrich book-review records with reviewer and book metadata
- Combine BookWyrm reviews with Goodreads, StoryGraph, Open Library, or other book data sources
- Export clean book and review rows to a dataset, database, spreadsheet, or downstream analytics workflow
What you can extract
- Review URL, title, rating, review text, original HTML, publication date, language, visibility, tags, and content warning
- Reviewer profile URL, handle, display name, avatar URL, ActivityPub ID, public profile bio, outbox URL, and federation metadata
- Book title, subtitle, authors, author aliases, author bio/ISNI links, ISBNs, cover image, BookWyrm book URL, work/edition URL, publisher, page count, subjects, languages, series, physical format, and visible identifiers
- Public comments and quotes linked to books when enabled through advanced input
- Optional raw ActivityPub and RSS fields for advanced workflows
Simple setup
Most runs use one of four inputs:
- Any BookWyrm URLs: paste mixed public BookWyrm book, profile, review, shelf, list, RSS, or ActivityPub JSON URLs
- Book URLs: scrape selected public BookWyrm book pages
- Profiles: add one federated handle, profile URL, or instance | handle per line
- Book search: add one book title, author, or ISBN per line
Example book URL:
https://bookwyrm.world/book/20954
Example profile:
sigvie@bookwyrm.world
mouse@bookwyrm.social
bookwyrm.world | sigvie
https://bookwyrm.world/user/sigvie
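The accepted profile forms (federated handle, instance | handle, and profile URL) can be normalised to one (instance, handle) pair before a run. The sketch below is illustrative only: `ProfileRef` and `parse_profile_line` are hypothetical names, not part of the Actor, and the code assumes profile URLs follow the `/user/<handle>` pattern seen in the example.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class ProfileRef(NamedTuple):
    instance: str  # host name, e.g. "bookwyrm.world"
    handle: str    # local user name, e.g. "sigvie"


def parse_profile_line(line: str) -> Optional[ProfileRef]:
    """Hypothetical helper: normalise one profile input line, or return None."""
    text = line.strip()
    if not text:
        return None

    # Profile URL, assumed to look like https://<instance>/user/<handle>
    if text.startswith(("http://", "https://")):
        parsed = urlparse(text)
        parts = [p for p in parsed.path.split("/") if p]
        if parsed.netloc and len(parts) >= 2 and parts[0] == "user":
            return ProfileRef(parsed.netloc, parts[1])
        return None

    # "instance | handle"
    if "|" in text:
        instance, handle = (part.strip() for part in text.split("|", 1))
        if instance and handle:
            return ProfileRef(instance, handle)
        return None

    # Federated handle "handle@instance" (a leading "@" is tolerated)
    user, sep, instance = text.lstrip("@").partition("@")
    if sep and user and instance:
        return ProfileRef(instance, user)
    return None
```

Normalising first makes it easy to de-duplicate profiles that were entered in different forms.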
Example book search:
Min skyld Abid Raja
Sula Toni Morrison
9788202713461
Plain searches use a built-in BookWyrm search instance. To target a specific server, use instance | query, such as bookwyrm.social | Sula Toni Morrison.
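When search lines are generated from code, the instance | query form can be built with a small helper. This is a sketch under stated assumptions: `search_line` is a hypothetical name, and the only behaviour taken from this page is that a plain query uses the built-in search instance while `instance | query` targets one server.

```python
from typing import Optional


def search_line(query: str, instance: Optional[str] = None) -> str:
    """Hypothetical helper: format one entry for the "search" input list."""
    cleaned = " ".join(query.split())  # collapse stray whitespace
    if not cleaned:
        raise ValueError("search query must not be empty")
    if instance is None:
        # Plain query: the Actor picks its built-in search instance.
        return cleaned
    return f"{instance.strip()} | {cleaned}"


search_input = [
    search_line("Min skyld Abid Raja"),
    search_line("Sula Toni Morrison", instance="bookwyrm.social"),
    search_line("9788202713461"),
]
```

The resulting `search_input` list can be passed as the `search` field of the run input.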
Example input
{
  "startUrls": ["https://bookwyrm.world/book/20954"],
  "books": ["https://bookwyrm.world/book/15515/s/dune"],
  "profiles": ["sigvie@bookwyrm.world", "mouse@bookwyrm.social"],
  "search": ["Min skyld Abid Raja", "Sula Toni Morrison"],
  "maxReviews": 100,
  "maxSearchResults": 10
}
Example output
Book rows are emitted as soon as book metadata is available:
{
  "entityType": "book",
  "source": "bookwyrm",
  "sourceInstance": "https://bookwyrm.social",
  "activityPubId": "https://bookwyrm.social/book/2",
  "bookUrl": "https://bookwyrm.social/book/2/s/hamlet",
  "title": "Hamlet",
  "authors": [
    {
      "name": "William Shakespeare",
      "url": "https://bookwyrm.social/author/1/s/william-shakespeare",
      "activityPubId": "https://bookwyrm.social/author/1"
    }
  ],
  "isbn10": "0140714545",
  "isbn13": "9780140714548",
  "aggregateRating": 3.7,
  "reviewsCount": 272,
  "bookReviewDiscoveryStatus": "collected_from_html_public_page",
  "scrapedAt": "2026-05-23T10:21:23.443Z"
}
Every review is emitted as its own dataset row:
{
  "entityType": "review",
  "source": "bookwyrm",
  "sourceInstance": "https://bookwyrm.world",
  "activityPubId": "https://bookwyrm.world/user/sigvie/review/10524",
  "reviewUrl": "https://bookwyrm.world/user/sigvie/review/10524",
  "reviewType": "Article",
  "title": "Review of \"Min skyld\" (5 stars)",
  "rating": 5,
  "ratingScale": 5,
  "ratingSource": "activitypub",
  "reviewText": "Fantastisk og rørende bok. Elsker skrivinga!",
  "publishedAt": "2022-12-06T00:00:00.000Z",
  "visibility": "public",
  "reviewer": {
    "activityPubId": "https://bookwyrm.world/user/sigvie",
    "profileUrl": "https://bookwyrm.world/user/sigvie",
    "handle": "sigvie@bookwyrm.world",
    "displayName": "Sigurd Vie"
  },
  "book": {
    "activityPubId": "https://bookwyrm.world/book/20954",
    "bookUrl": "https://bookwyrm.world/book/20954",
    "title": "Min skyld",
    "authors": ["Abid Qayyum Raja"],
    "isbn13": "9788202713461"
  },
  "bookDetails": {
    "title": "Min skyld",
    "publisher": "Cappelen Damm",
    "publishedDate": "2021-08-11",
    "pageCount": 240,
    "languages": ["Norsk (Bokmål)"],
    "authors": [
      {
        "name": "Abid Qayyum Raja",
        "aliases": ["Abid Raja", "Abid Q. Raja"],
        "isni": "0000000041008712",
        "wikipediaLink": "https://da.wikipedia.org/wiki/Abid_Raja"
      }
    ],
    "bookwyrm": {
      "workUrl": "https://bookwyrm.world/book/20953",
      "physicalFormat": "Hardcover"
    }
  },
  "reviewerProfile": {
    "handle": "sigvie@bookwyrm.world",
    "displayName": "Sigurd Vie",
    "outboxUrl": "https://bookwyrm.world/user/sigvie/outbox"
  },
  "bookwyrm": {
    "repliesUrl": "https://bookwyrm.world/user/sigvie/review/10524/replies",
    "repliesCount": 0
  },
  "discoveryMethod": "rss_reviews",
  "scrapedAt": "2026-05-23T10:21:23.443Z"
}
Output
The dataset contains one standalone row for each discovered book and one standalone row for each public review. Reviews are never nested under books, so large runs stay easy to stream into spreadsheets, databases, dashboards, or analytics pipelines.
Book rows include the best available public metadata. Review rows include compact nested book and reviewer summaries, plus richer bookDetails and reviewerProfile fields when enrichment is available. Rows are streamed in small batches as they are collected, which lowers memory pressure on large runs and lets you inspect partial results before the run finishes.
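Because book and review rows are flat and independent, they can be regrouped downstream with a single pass. The following is a minimal sketch, not part of the Actor: `summarise_by_book` is a hypothetical helper, and it assumes that a review's nested `book.activityPubId` can be used as the grouping key and that it matches the `activityPubId` of the corresponding book row when one exists.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, Iterable, List


def summarise_by_book(rows: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: count reviews and average ratings per book."""
    titles: Dict[str, Any] = {}
    ratings: Dict[str, List[float]] = defaultdict(list)
    counts: Dict[str, int] = defaultdict(int)

    for row in rows:
        kind = row.get("entityType")
        if kind == "book":
            titles[row.get("activityPubId")] = row.get("title")
        elif kind == "review":
            book = row.get("book") or {}
            book_id = book.get("activityPubId")
            if not book_id:
                continue
            counts[book_id] += 1
            titles.setdefault(book_id, book.get("title"))
            # rating may be null when no source exposed one
            if row.get("rating") is not None:
                ratings[book_id].append(row["rating"])

    return {
        book_id: {
            "title": titles.get(book_id),
            "reviews": count,
            "meanRating": mean(ratings[book_id]) if ratings[book_id] else None,
        }
        for book_id, count in counts.items()
    }
```

Reviews with `rating: null` are counted but left out of the average, so a missing rating never pulls the mean down.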
For faster large book runs, full reviewer profile enrichment is off by default. Review rows still include the reviewer information visible on the review page, such as reviewer name, profile URL, and handle when available.
How the Actor gets BookWyrm data
The Actor uses the safest available public source first:
- ActivityPub JSON for structured review, profile, book, and status data
- RSS feeds for reliable profile-level review, comment, quote, and activity discovery
- Public BookWyrm search pages when you provide book search queries
- Public HTML fallback for visible metadata when structured sources do not expose enough data
It does not use browser automation by default. It does not log in, solve CAPTCHAs, bypass Cloudflare, bypass anti-bot pages, or access private/followers-only content.
For book URLs, the Actor follows public BookWyrm review pagination when it is visible on the book page. This is the cheapest way to collect visible reviews for a book because it uses normal HTTP requests and Cheerio parsing, not a browser.
ActivityPub support
BookWyrm often exposes ActivityPub JSON when a URL is requested with ActivityPub headers or when .json is appended to a public entity URL. The Actor supports public actors, outboxes, collections, collection pages, Review objects, Article review objects, Create activities, comments, quotes, books, shelves, and lists where those objects are exposed.
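The two access patterns described above can be tried by hand with the standard library. This is a rough sketch, not the Actor's code: the function names are hypothetical, and `application/activity+json` is assumed here as the ActivityPub `Accept` value, since this page does not name the exact header the Actor sends.

```python
import json
from typing import Any, Dict
from urllib.request import Request, urlopen

# Assumption: the common ActivityPub media type; the Actor's exact header is not documented here.
ACTIVITYPUB_ACCEPT = "application/activity+json"


def activitypub_request(url: str) -> Request:
    """Build a GET request that asks for the ActivityPub JSON representation."""
    return Request(url, headers={"Accept": ACTIVITYPUB_ACCEPT})


def json_variant(url: str) -> str:
    """Alternative pattern: append .json to a public entity URL."""
    return url.rstrip("/") + ".json"


def fetch_activitypub(url: str, timeout: float = 20.0) -> Dict[str, Any]:
    """Fetch and decode one public ActivityPub object (performs a network call)."""
    with urlopen(activitypub_request(url), timeout=timeout) as response:
        return json.load(response)
```

Not every instance or entity answers both patterns, so a caller would typically try the header form first and fall back to the `.json` form or to HTML.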
RSS support
When you provide profile handles, the Actor automatically checks public BookWyrm profile feeds where available:
- /rss-reviews for public reviews
- /rss for public activity
- /rss-quotes for public quotes
- /rss-comments for public comments
RSS feeds are often the best way to collect profile-level reviews. Some RSS items include ratings in the title, such as (5 stars). If a rating is not available in RSS or enrichment sources, the Actor returns rating: null and labels the source clearly.
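As an illustration of the feed handling described above, the sketch below derives the four feed URLs from a profile URL and pulls a star rating out of an RSS item title. It is a sketch under assumptions: the helper names are hypothetical, the feed paths are assumed to be appended to the profile URL, and the regular expression only covers titles shaped like the `(5 stars)` example, with decimals allowed as a guess.

```python
import re
from typing import Dict, Optional

FEED_SUFFIXES = {
    "reviews": "/rss-reviews",
    "activity": "/rss",
    "quotes": "/rss-quotes",
    "comments": "/rss-comments",
}

# Matches "(5 stars)", "(1 star)", and, as an assumption, "(4.5 stars)".
_STARS = re.compile(r"\((\d+(?:\.\d+)?)\s+stars?\)")


def profile_feed_urls(profile_url: str) -> Dict[str, str]:
    """Assumes feed paths hang directly off the profile URL."""
    base = profile_url.rstrip("/")
    return {name: base + suffix for name, suffix in FEED_SUFFIXES.items()}


def rating_from_title(title: str) -> Optional[float]:
    """Return the rating found in an RSS item title, or None instead of guessing."""
    match = _STARS.search(title)
    return float(match.group(1)) if match else None
```

Returning `None` when the title has no star marker mirrors the `rating: null` behaviour described above.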
Important coverage limits
BookWyrm is not one centralized review database. A book page on one instance may not expose all reviews from all BookWyrm servers. Profile-level scraping is usually more complete because profile RSS feeds and ActivityPub outboxes are scoped to that user.
The Actor does not pretend to scrape every review for a book unless the public page or ActivityPub JSON actually exposes those review links. When a book page exposes paginated public reviews, the Actor follows those pages until maxReviews, the hidden safety page limit, or the end of pagination is reached. When book-level review discovery is incomplete, the book row labels that limitation with bookReviewDiscoveryStatus.
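The three stop conditions for book-page pagination (maxReviews, a page safety limit, end of pagination) can be pictured as the loop below. This is a simplified model rather than the Actor's implementation: `fetch_page` is a hypothetical callable supplied by the caller, the `max_pages` default of 50 is invented because the real safety limit is hidden, and the stop-reason strings are illustrative and are not `bookReviewDiscoveryStatus` values.

```python
from typing import Callable, List, Optional, Tuple

# One fetched page: (review URLs found on it, next page URL or None)
Page = Tuple[List[str], Optional[str]]


def collect_review_urls(
    first_page_url: str,
    fetch_page: Callable[[str], Page],
    max_reviews: int,
    max_pages: int = 50,  # assumed value; the Actor's real limit is not published
) -> Tuple[List[str], str]:
    """Follow pagination until one of the three stop conditions is met."""
    if max_reviews <= 0:
        return [], "max_reviews"

    collected: List[str] = []
    seen = set()
    url: Optional[str] = first_page_url
    pages = 0

    while url is not None:
        if pages >= max_pages:
            return collected, "page_limit"
        review_urls, next_url = fetch_page(url)
        pages += 1
        for review_url in review_urls:
            if review_url in seen:
                continue  # the same review can appear on more than one page
            seen.add(review_url)
            collected.append(review_url)
            if len(collected) >= max_reviews:
                return collected, "max_reviews"
        url = next_url

    return collected, "end_of_pagination"
```

Returning the stop reason alongside the URLs makes it possible to tell a complete crawl from one that was cut short by a limit.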
Privacy and ethical use
This Actor is for public BookWyrm data only.
- No login is required or supported
- Private, followers-only, restricted, and login-only pages are skipped
- Cloudflare challenges, CAPTCHAs, and access controls are not bypassed
- robots.txt is respected where practical
- Defaults use public HTML fallback, low concurrency, and polite delays
Use this Actor only for lawful, ethical collection of public data from instances you are allowed to access.
Troubleshooting
No reviews found
Add profiles as handles when possible. Book pages and book search results do not always expose review collections.
The instance returned 403, 404, or 410
The page may be private, deleted, restricted, unavailable through ActivityPub, or blocked by the instance. The Actor records the failed URL in run statistics and continues with other sources.
RSS ratings are missing
Some BookWyrm RSS feeds include ratings in titles, and some do not. If the Actor cannot find a rating in RSS, ActivityPub, or public HTML, it returns null instead of guessing.
A book page did not return all reviews
That is expected on some federated instances. Add profile handles for reviewers you care about to get better profile-level coverage.
The instance rate-limited requests
Lower the maximum number of reviews or search results and keep runs targeted to the profiles and books you need. The Actor uses polite built-in request delays and respects robots.txt where practical.
Pricing
Recommended Apify Store model: pay per event. The simplest setup is one charged dataset result event for every emitted row. This keeps pricing predictable because the dataset contains both book metadata rows and review rows.
- apify-default-dataset-item: recommended starting price $0.00119 per dataset row ($1.19 per 1,000 book or review rows)
- Keep Apify's synthetic apify-actor-start event at the default low price if using pay-per-event monetization
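The per-row arithmetic above can be checked with a few lines of Python. This is a sketch only: `estimate_row_cost` is a hypothetical helper, it uses the recommended $0.00119 row price from this section, and it leaves out the apify-actor-start event because its price is not stated here.

```python
from decimal import Decimal

# Recommended starting price per dataset row, taken from the list above.
PRICE_PER_ROW = Decimal("0.00119")


def estimate_row_cost(book_rows: int, review_rows: int) -> Decimal:
    """Estimated charge in USD for emitted rows only (start event excluded)."""
    if book_rows < 0 or review_rows < 0:
        raise ValueError("row counts must not be negative")
    return PRICE_PER_ROW * (book_rows + review_rows)


# 10 book rows plus 100 review rows is 110 rows in total.
small_run = estimate_row_cost(book_rows=10, review_rows=100)
print(f"Estimated row cost: ${small_run}")
```

`Decimal` is used instead of floats so that small per-row prices add up without rounding drift.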
Stress testing showed that BookWyrm review density varies heavily by instance and book. Some searches return many low-review books, while popular books can return more visible public review links. A simple per-row price keeps small tests cheap while still covering discovery work for low-density or zero-result inputs.
For higher-volume customers, revisit pricing after real user runs and monitor cost per 1,000 rows, zero-result search rates, average reviews per book, and profile enrichment usage.
FAQ
Can this scrape BookWyrm reviews by book title?
Yes. Add book titles, authors, or ISBNs in Book search. To search one instance, use instance | query, such as bookwyrm.social | Dune Frank Herbert.
Can this scrape reviews from BookWyrm profiles?
Yes. Add federated handles like sigvie@bookwyrm.world or profile URLs. Profile RSS feeds are often the best public source for profile-level reviews.
Does it scrape every BookWyrm review for a book?
Only reviews exposed by the public book page, ActivityPub data, RSS feeds, or linked public pages are returned. BookWyrm is federated, so one instance may not show every review from every server.
Does it need a browser or login?
No. The Actor uses public HTTP, ActivityPub, RSS, and HTML pages. It does not log in or bypass private pages, CAPTCHAs, Cloudflare challenges, or followers-only content.
Why did I get book rows but few review rows?
Some books have little public review data on the selected instance. Try adding reviewer profile handles or targeted book URLs from the instance where the reviews are visible.
API usage
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")
actor = client.actor("thescrapelab/bookwyrm-book-reviews-scraper")

run_input = {
    "startUrls": [
        "https://bookwyrm.world/book/20954",
    ],
    "books": [
        "https://bookwyrm.world/book/15515/s/dune",
    ],
    "profiles": [
        "sigvie@bookwyrm.world",
        "mouse@bookwyrm.social",
    ],
    "search": [
        "Min skyld Abid Raja",
        "Sula Toni Morrison",
    ],
    "maxReviews": 100,
    "maxSearchResults": 10,
}

run = actor.call(run_input=run_input)
dataset = client.dataset(run["defaultDatasetId"])

for item in dataset.list_items().items:
    if item["entityType"] == "book":
        print("BOOK", item.get("title"), item.get("reviewsCount"))
    elif item["entityType"] == "review":
        print("REVIEW", item.get("reviewUrl"), item.get("rating"), item.get("book", {}).get("title"))