OnlineBookClub Book Reviews Scraper
Pricing
from $0.99 / 1,000 results
OnlineBookClub scraper for authorized public book reviews, ratings, reviewer names, metadata, genres, and purchase links.
Developer
Inus Grobler
Maintained by Community
This Actor extracts public OnlineBookClub book review data and book metadata from authorized pages. It is designed for users who have permission to collect data from OnlineBookClub.org.
Use it as an authorized OnlineBookClub scraper for public official review text, ratings, reviewer display names, genres, book metadata, and purchase links.
This Actor is for authorized use only. OnlineBookClub.org’s Terms prohibit scraping without express written permission. Use this Actor only if you have permission.
What This Actor Does
OnlineBookClub Book Reviews Scraper collects public official review listings, public official review text, and public book metadata from OnlineBookClub pages you are authorized to access. It is HTTP-based by default, uses polite request delays, and does not use browser automation unless a future debugging mode is explicitly added.
It does not log in, does not accept cookies, does not accept session tokens, does not create accounts, and does not bypass Cloudflare, captchas, bot challenges, rate limits, login walls, or access controls.
Who It Is For
Use this Actor if you have written permission to collect public OnlineBookClub review data for research, catalog enrichment, book discovery workflows, review monitoring, or internal analysis.
Do not use it for private pages, member-only discussions, login-only replies, contact harvesting, or account-based access.
Data Extracted
- Review listing details: book title, author, rating, reviewer display name, review URL, listing date text, genre, reply count, book URL, cover image URL, and public purchase links.
- Full public official review pages: review title, topic ID, reviewer display name, public profile URL when visible, post date, declaration text, book title, author, rating, review text, book URL, cover image URL, and public purchase links.
- Public book metadata: book ID, title, author, author URL, genre, release date, word count, language, average reviewer rating, official review link, official review rating, cover image URL, and public purchase links.
Replies that require login are not collected. If a page says login is required to view replies, the Actor skips replies and records a warning in run statistics.
Simple Setup
- Confirm you have permission to collect data from OnlineBookClub.org.
- Set I have permission to collect this data to true.
- Add review index URLs, official review URLs, or book page URLs in Start URLs.
- Choose the maximum number of reviews.
- Start the Actor and open the Dataset while it is running to see review records appear progressively.
The input form is intentionally simple. The Actor automatically detects URL types and uses built-in polite crawling defaults.
Input Examples
Scrape Latest Review Listings
{
  "confirmAuthorizedUse": true,
  "startUrls": ["https://onlinebookclub.org/reviews/"],
  "maxReviews": 100
}
Scrape Specific Review URLs
{
  "confirmAuthorizedUse": true,
  "startUrls": ["https://forums.onlinebookclub.org/viewtopic.php?f=63&t=745520"],
  "maxReviews": 25
}
Scrape Specific Book Pages
{
  "confirmAuthorizedUse": true,
  "startUrls": ["https://onlinebookclub.org/shelves/book.php?id=728007"],
  "maxReviews": 25
}
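The three input examples above use three different URL shapes. The Actor's own detection rules are not documented here, so the following is only a sketch of how such URLs could be told apart on the client side before starting a run. The function name, the returned labels, and the `www.` host variant are assumptions made for illustration.

```python
from urllib.parse import parse_qs, urlparse

# Hosts taken from the input examples; the "www." variant is an assumption.
MAIN_HOSTS = {"onlinebookclub.org", "www.onlinebookclub.org"}
FORUM_HOST = "forums.onlinebookclub.org"


def classify_start_url(url: str) -> str:
    """Return 'review_index', 'review', 'book', or 'unknown' (hypothetical labels)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    path = parsed.path.rstrip("/")
    query = parse_qs(parsed.query)

    # Official review pages are forum topics: viewtopic.php with a topic ID "t".
    if host == FORUM_HOST and path == "/viewtopic.php" and "t" in query:
        return "review"
    # Book pages live under /shelves/book.php with a book "id".
    if host in MAIN_HOSTS and path == "/shelves/book.php" and "id" in query:
        return "book"
    # The review index is the /reviews/ listing.
    if host in MAIN_HOSTS and path == "/reviews":
        return "review_index"
    return "unknown"
```

Checking Start URLs this way before a run can catch typos or unsupported pages early, but the Actor's own classification remains authoritative.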
Output
The Actor always outputs one dataset row per review item. Book metadata, cover images, public purchase links, and listing details are included in the same review row when available.
Example Output
{
  "entityType": "review",
  "source": "onlinebookclub",
  "reviewUrl": "https://forums.onlinebookclub.org/viewtopic.php?f=63&t=745520",
  "reviewTitle": "Review of Souls Run Wild",
  "bookTitle": "Souls Run Wild",
  "authorName": "David Payne",
  "genre": "Historical Fiction",
  "rating": 4,
  "ratingScale": 5,
  "normalizedRatingOutOf5": 4,
  "reviewerName": "Amanda Collier",
  "postedAt": "2026-05-13T22:08:00",
  "reviewText": "Full public review text...",
  "bookUrl": "https://onlinebookclub.org/shelves/book.php?id=728007",
  "scrapedAt": "2026-05-29T00:00:00.000Z"
}
Rating Scales
OnlineBookClub reviews may use different rating scales. Some older reviews use a 4-star scale, such as 3 out of 4 stars, while newer reviews may use a 5-star scale.
The Actor preserves:
- rating
- ratingScale
- ratingText
- normalizedRatingOutOf5
For example, 3 out of 4 stars is preserved as rating 3, scale 4, and normalized rating 3.75 out of 5.
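The normalization in that example is a linear rescale to a 5-point scale. A minimal sketch, assuming the Actor applies the same proportional formula without extra rounding (the function name is hypothetical):

```python
def normalize_rating_out_of_5(rating: float, rating_scale: int) -> float:
    """Rescale a rating from its original scale to a 5-point scale."""
    if rating_scale <= 0:
        raise ValueError("rating_scale must be positive")
    if not 0 <= rating <= rating_scale:
        raise ValueError("rating must lie between 0 and rating_scale")
    # Multiply before dividing so common cases (3 of 4, 4 of 5) stay exact.
    return rating * 5 / rating_scale
```

Keeping rating and ratingScale alongside the normalized value lets downstream analysis choose between the original and the rescaled number.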
Important Limitations
- Authorized use only.
- No login support.
- No cookies or session tokens.
- No private or member-only pages.
- No account creation.
- No bypassing Cloudflare, captchas, bot challenges, rate limits, or access controls.
- Replies requiring login are skipped.
- Purchase links are stored as public links found on the page and are not followed.
- Pages that require JavaScript are parsed only when public HTML content is present.
Recommended Run Settings
Based on fixture stress tests and the HTTP-only design:
{
  "memoryMbytes": 512,
  "timeoutSecs": 3600
}
Small run, up to 50 reviews:
- Memory: 512 MB
- Timeout: 15-30 minutes
Medium run, up to 250 reviews:
- Memory: 512 MB
- Timeout: 1-2 hours
- Use 1024 MB only if your run includes large book enrichment or unusually large pages.
Large authorized run:
- Keep browser automation disabled.
- Estimate timeout from maxReviews and page response speed.
- Increase timeout before increasing memory unless memory warnings appear.
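One rough way to turn maxReviews and page response speed into a timeout is sketched below. The Actor's actual request delay is not documented here, so `delay_secs`, `pages_per_review` (one review page plus one optional book metadata page), and `safety_factor` are all assumed values to adjust from your own runs.

```python
import math


def estimate_timeout_secs(
    max_reviews: int,
    avg_response_secs: float,
    delay_secs: float = 2.0,       # assumed polite delay between requests
    pages_per_review: int = 2,     # assumed: review page + optional book page
    safety_factor: float = 1.5,    # assumed headroom for slow pages and retries
) -> int:
    """Return a rough timeoutSecs value for a sequential, low-concurrency run."""
    total_pages = max_reviews * pages_per_review
    estimated_secs = total_pages * (avg_response_secs + delay_secs)
    return math.ceil(estimated_secs * safety_factor)
```

For example, 250 reviews at about one second per response would be estimated with `estimate_timeout_secs(250, 1.0)`; treat the result as a starting point, not a guarantee.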
Cost Control
- Lower maxReviews to reduce runtime and compute cost.
- Keep browser automation disabled.
- Use 512 MB memory unless testing shows higher memory is needed.
- Increase timeout only for larger authorized runs.
Troubleshooting
Actor refuses to run because confirmAuthorizedUse is false:
Set it to true only if you have permission to collect data from OnlineBookClub.org.
No reviews found:
Check that your URLs are public review index, review, or book pages and that robots.txt allows the requested paths.
Review page has login-required replies:
Replies are skipped. Public official review text can still be extracted when visible without login.
Book page missing average rating:
Some books do not show an average reviewer rating until published reviews exist. The field will be null.
Rating scale is 4 instead of 5:
This is expected for older reviews. The original scale is preserved and normalizedRatingOutOf5 is also provided.
Date has inferred year:
Some listing dates omit the year. The Actor infers the year using the run date and marks dateParseConfidence as inferred_year.
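The exact inference rule is not documented here. One plausible approach, sketched below, picks the most recent occurrence of the month and day that is not later than the run date. The input format ("May 13"), the function name, and the `listingDate` key are assumptions; `dateParseConfidence` and `inferred_year` are the names used above.

```python
from datetime import date, datetime


def infer_listing_date(date_text: str, run_date: date) -> dict:
    """Attach a year to a 'Mon DD' string, never producing a date after run_date."""
    # Look back a few years so that "Feb 29" can still resolve to a leap year.
    for year in range(run_date.year, run_date.year - 9, -1):
        try:
            candidate = datetime.strptime(f"{date_text} {year}", "%b %d %Y").date()
        except ValueError:
            continue  # e.g. "Feb 29" in a non-leap year
        if candidate <= run_date:
            return {
                "listingDate": candidate.isoformat(),  # hypothetical field name
                "dateParseConfidence": "inferred_year",
            }
    raise ValueError(f"Could not infer a year for {date_text!r}")
```

Under this rule, a listing dated "Dec 30" seen during an early-January run resolves to the previous year rather than a date in the future.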
Site returned 403 or 429:
The Actor does not aggressively retry forbidden pages. For 429 rate limits, it respects Retry-After when provided.
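For reference, a Retry-After header can carry either a number of seconds or an HTTP date. The sketch below shows one way a client could turn either form into a wait time; it illustrates the header format only and is not the Actor's internal code. The function name is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(
    header_value: Optional[str], now: Optional[datetime] = None
) -> Optional[float]:
    """Return seconds to wait, or None when the header is missing or unparseable."""
    if not header_value:
        return None
    value = header_value.strip()
    if value.isdigit():
        # Delay-seconds form, e.g. "120".
        return float(value)
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
        retry_at = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if retry_at.tzinfo is None:
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means no further waiting is required.
    return max(0.0, (retry_at - now).total_seconds())
```

When the header is absent, a caller has no server-provided wait time and should fall back to its own conservative delay instead of retrying immediately.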
Challenge or access-denied page detected:
The Actor stops or skips the page and records a warning. It will not attempt to bypass the restriction.
Some purchase links are redirects:
The Actor stores public href values as found and does not follow affiliate or redirect links.
Some pages require JavaScript but public content was still parseable:
The Actor may continue when public HTML content is present. It will not use JavaScript execution to bypass restrictions.
Dataset appears empty at first:
For full review runs, the first records appear after the first review page and optional book metadata page are processed.
Run timed out before finishing:
Reduce maxReviews or increase timeout for larger authorized runs.
Memory limit exceeded:
Use 512 MB for normal runs and 1024 MB for larger enriched runs.
Pricing Suggestion
Recommended pay-per-event pricing:
review-scraped: suggested launch price $0.00099 per successful review row ($0.99 per 1,000).
Charge only for successful review rows. Do not charge for failed requests, duplicate records, skipped restricted pages, blocked/challenge pages, or login-required replies. Keep Apify's default synthetic Actor start event enabled.
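As a quick check of the arithmetic, the sketch below estimates review-scraped charges at the suggested launch price. It covers only successful review rows and leaves out the Actor start event and any platform compute costs; the function and constant names are hypothetical.

```python
from decimal import Decimal

# Suggested launch price per successful review row, from the text above.
PRICE_PER_REVIEW_USD = Decimal("0.00099")


def estimate_review_charges(successful_reviews: int) -> Decimal:
    """Return the review-scraped charge in USD for a number of successful rows."""
    if successful_reviews < 0:
        raise ValueError("successful_reviews must be non-negative")
    # Decimal avoids binary floating-point drift on very small unit prices.
    return PRICE_PER_REVIEW_USD * successful_reviews
```

Because failed, duplicate, and skipped items are not charged, the row count passed in should be the number of rows actually written to the dataset.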
API Example
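from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient("YOUR_APIFY_TOKEN")

# Same fields as the input examples above.
run_input = {
    "confirmAuthorizedUse": True,
    "startUrls": ["https://onlinebookclub.org/reviews/"],
    "maxReviews": 25,
}

# Start the Actor and wait for the run to finish.
run = client.actor("TheScrapeLab/onlinebookclub-book-reviews-scraper").call(run_input=run_input)

# Print each review row from the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)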
FAQ
Can this Actor scrape private member data? No. It is designed for public authorized pages only.
Can I provide login credentials or cookies? No. Login, cookies, and session tokens are not supported.
Does it use browser automation? No browser automation is used by default. The Actor is optimized for low-cost HTTP extraction.
Does it bypass blocking? No. It stops or skips pages when challenges, access-denied pages, or login requirements are detected.
Can I collect forum replies? Not in this version. Support for public replies visible without login may be considered in a future version. Login-required replies are skipped.
Changelog
1.0.0
- Initial authorized-use-only Actor.
- Public review index, official review page, and book page extraction.
- Low-concurrency HTTP client.
- Rating/date parsing with preserved original scales.
- Streaming dataset output and RUN-STATS reporting.
Support
For support, contact the Actor maintainer through your Apify support or marketplace contact channel.