Scribd Document Search Scraper
Pricing
from $3.50 / 1,000 results
[💰 $3.5 / 1K] Search Scribd by keyword and export structured metadata for every matching document, book, audiobook, sheet music, or podcast — title, author, type, page count, ratings, views, language, categories, and links.
Developer
SolidCode
Maintained by Community
Search Scribd by keyword and pull clean, structured metadata for every matching document — title, author, page count, view counts, ratings, language, categories, and direct reader and download links. Run many keywords in a single pass and get one tidy row per document. Built for market researchers, competitive analysts, and content and lead-gen teams who need Scribd document data at scale without paging through search results by hand.
Why This Scraper?
- Multi-keyword batch in one run — pass a list of search terms and each keyword runs as its own search, so you cover an entire topic map in a single pass instead of one query at a time.
- Up to 10,000 results per keyword — lifts the typical 100-result search ceiling 100× so you can sweep a whole topic, not just the first page.
- 23 structured fields per document — id, title, author, type, description, page count, release date, views, reading time, ratings, language, categories, and direct links — every row consumption-ready, no raw markup.
- Derived 0–5 star rating — a clean star score computed from each document's community upvotes and downvotes, alongside the raw `upvoteCount`, `downvoteCount`, and `ratingCount`.
- Author profile URLs for outreach — every row carries the primary author's name and absolute profile link, plus a full `authors` array with each contributor's id, name, and profile URL.
- Engagement signals built in — real view counts (parsed from Scribd's "15K"/"1.2M" shorthand to plain integers) and estimated reading time let you rank documents by popularity, not just relevance.
- Direct reader and download links — every row includes the canonical reader URL and a ready-to-use download link when the document is downloadable, so you never have to reconstruct paths.
- Result-language preference across 21 languages — bias results toward English, Spanish, Portuguese, French, German, Arabic, Hindi, Japanese, and more so you collect documents in the language your audience reads.
- Query provenance on every row — each document carries the exact keyword that surfaced it, so a single mixed dataset stays attributable per search term.
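The view-count parsing and the derived star rating described in the list above can be sketched in a few lines of Python. This is an illustration only, not the scraper's actual code: the function names are hypothetical, the "B" suffix is an assumption beyond the "15K"/"1.2M" examples, and the rating formula (upvote share scaled to 5) is inferred from the sample row further down, where 90 upvotes and 10 downvotes yield 4.5.

```python
from decimal import Decimal
from typing import Optional

# Multipliers for shorthand suffixes. "B" is an assumption; the text only shows K and M.
_SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_views(text: str) -> int:
    """Convert a shorthand count such as "15K" or "1.2M" to a plain integer."""
    cleaned = text.strip().upper().replace(",", "")
    if cleaned and cleaned[-1] in _SUFFIXES:
        # Decimal avoids float rounding surprises for values like 1.2 * 1_000_000.
        return int(Decimal(cleaned[:-1]) * _SUFFIXES[cleaned[-1]])
    return int(Decimal(cleaned))


def derive_rating(upvotes: int, downvotes: int) -> Optional[float]:
    """Scale the upvote share to a 0-5 score; return None when nobody has voted."""
    total = upvotes + downvotes
    if total == 0:
        return None
    return round(5 * upvotes / total, 1)
```

With these assumptions, `parse_views("15K")` gives 15000 and `derive_rating(90, 10)` gives 4.5, matching the sample output row.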
Use Cases
Market & Content Research
- Map how much Scribd content exists around a topic, product, or industry
- Surface the most-viewed and highest-rated documents in a niche
- Track templates, whitepapers, and guides circulating in your space
- Build topic libraries spanning dozens of keywords in one run
Competitive Analysis
- See which authors and brands publish the most in your category
- Benchmark engagement (views, ratings) against competing documents
- Monitor new uploads tied to a brand or product name
- Compare document depth by page count across competitors
Lead Generation & Outreach
- Collect author profile URLs and names for creator outreach
- Identify prolific publishers in a target vertical
- Build prospect lists from documents matching buyer-intent keywords
- Prioritize outreach by author reach using view counts and ratings
Academic & Reference
- Gather reference document metadata across many search terms at once
- Filter your reading list by page count and reading time before opening anything
- Prefer results in a specific language for non-English literature reviews
- Catalog community ratings to triage which documents are worth reading
Content Curation
- Power recommendation feeds and resource roundups with fresh metadata
- Enrich an existing content database with views, ratings, and categories
- Curate by category labels Scribd files each document under
- Feed a newsletter or knowledge base with structured document records
Getting Started
Simple — one keyword

    {"queries": ["business plan template"]}

Several keywords, more results each

    {
      "queries": ["machine learning", "data science", "neural networks"],
      "maxResultsPerQuery": 250
    }

Advanced — language preference and a deep sweep

    {
      "queries": ["recetas de cocina", "plan de negocios"],
      "maxResultsPerQuery": 1000,
      "language": "4"
    }
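For programmatic runs, the inputs above can be assembled and submitted from Python. The sketch below is hedged: `build_run_input` and `run_actor` are hypothetical helper names, the actor ID must be copied from the actor's store page (it is not given here), and the run step assumes the official `apify-client` package is installed.

```python
import json
from typing import Any, Dict, List, Optional


def build_run_input(
    queries: List[str],
    max_results_per_query: int = 100,
    language: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble the actor input using the field names from the examples above."""
    if not queries:
        raise ValueError("provide at least one keyword")
    run_input: Dict[str, Any] = {
        "queries": queries,
        "maxResultsPerQuery": max_results_per_query,
    }
    if language is not None:
        # The select value (for example "4") is passed through exactly as given.
        run_input["language"] = language
    return run_input


def run_actor(token: str, actor_id: str, run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Start a run and collect its rows. Requires `pip install apify-client`."""
    from apify_client import ApifyClient  # lazy import so the builder works without it

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    payload = build_run_input(["recetas de cocina", "plan de negocios"], 1000, "4")
    print(json.dumps(payload, ensure_ascii=False, indent=2))
```

Keeping the input builder separate from the network call makes it easy to check the payload before spending any credits on a run.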
Input Reference
All fields are optional — run with just a keyword and sensible defaults handle the rest.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `queries` | string[] | `["business plan template"]` | One or more keywords to search on Scribd. Each keyword runs its own search — add several to cover a whole topic in one run. |
| `maxResultsPerQuery` | integer | 100 | How many documents to return per keyword. Set to 0 to fetch every available match. Results arrive in pages of 40, so the final page may slightly overshoot rather than cut off mid-page. |
| `language` | select | Any language | Prefer results written in a chosen language — English, Spanish, Portuguese, French, German, Italian, Dutch, Russian, Japanese, Korean, Chinese, Arabic, Hindi, Indonesian, Turkish, Polish, Danish, Romanian, Thai, Swedish, or Czech. Leave on "Any language" for no preference. Coverage depends on how much content Scribd has in that language for your keyword. |
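Because results arrive in pages of 40, the row count for a keyword can exceed `maxResultsPerQuery`. A rough upper bound can be computed as below. This is a sketch under the assumption that the scraper keeps whole pages until the limit is reached; the helper name is hypothetical, and the real count will be lower whenever Scribd has fewer matches.

```python
import math
from typing import Optional

PAGE_SIZE = 40  # page size stated in the maxResultsPerQuery description


def max_rows_for_limit(max_results_per_query: int, page_size: int = PAGE_SIZE) -> Optional[int]:
    """Upper bound on rows for one keyword; None means no cap (a limit of 0)."""
    if max_results_per_query < 0:
        raise ValueError("limit must be zero or positive")
    if max_results_per_query == 0:
        return None
    # Round the limit up to the next whole page.
    return math.ceil(max_results_per_query / page_size) * page_size
```

Under that assumption, the default limit of 100 could return up to 120 rows and a limit of 250 up to 280, which is worth factoring into per-result cost estimates.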
Output
Each matching document becomes one flat row. Here's a representative result:

    {
      "id": "238702049",
      "title": "Sample Business Plan Template",
      "author": "Jane Author",
      "authorUrl": "https://www.scribd.com/user/12345678/jane-author",
      "authors": [
        { "id": 12345678, "name": "Jane Author", "url": "https://www.scribd.com/user/12345678/jane-author" }
      ],
      "type": "document",
      "description": "A complete business plan template covering executive summary, market analysis, and financials...",
      "url": "https://www.scribd.com/document/238702049/Sample-Business-Plan-Template",
      "downloadUrl": "https://www.scribd.com/document_downloads/238702049",
      "imageUrl": "https://imgv2-1-f.scribdassets.com/img/document/238702049/original.jpg",
      "pageCount": 32,
      "releasedAt": "2018-04-12",
      "views": 15000,
      "consumptionTime": 24,
      "isUnlocked": true,
      "rating": 4.5,
      "upvoteCount": 90,
      "downvoteCount": 10,
      "ratingCount": 100,
      "language": "English",
      "languageIso": "en",
      "categories": ["Business", "Templates"],
      "query": "business plan template"
    }
Document Fields
| Field | Type | Description |
|---|---|---|
| `id` | string | Scribd document identifier |
| `title` | string | Document title |
| `type` | string | Document type label as classified by Scribd |
| `description` | string | Description or snippet |
| `pageCount` | integer | Number of pages (null for non-paged content) |
| `releasedAt` | string | Publication or upload date |
| `consumptionTime` | integer | Estimated reading time in minutes |
| `isUnlocked` | boolean | Whether the document is freely accessible |
| `categories` | string[] | Category labels Scribd files the document under |
| `query` | string | The search keyword that surfaced this row |
Author Fields
| Field | Type | Description |
|---|---|---|
| `author` | string | Primary author name (may be null) |
| `authorUrl` | string | Primary author profile URL |
| `authors` | object[] | All contributors, each with id, name, and profile url |
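For outreach lists, the `authors` array can be collapsed into one entry per contributor. The following is a minimal sketch, assuming rows have been loaded as Python dictionaries with the fields listed above; `unique_authors` and the `documents` counter are hypothetical names added for illustration.

```python
from typing import Any, Dict, Iterable, List


def unique_authors(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Collect each contributor once, keyed by id, in first-seen order."""
    seen: Dict[Any, Dict[str, Any]] = {}
    for row in rows:
        # `authors` may be missing or null, so fall back to an empty list.
        for person in row.get("authors") or []:
            key = person.get("id")
            if key is None:
                continue
            entry = seen.setdefault(
                key,
                {"id": key, "name": person.get("name"), "url": person.get("url"), "documents": 0},
            )
            # Count how many result rows list this contributor.
            entry["documents"] += 1
    return list(seen.values())
```

Sorting the returned list by `documents` is one simple way to surface prolific publishers within the collected keywords.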
Engagement & Ratings
| Field | Type | Description |
|---|---|---|
| `views` | integer | View count, parsed to a plain integer |
| `rating` | number | Derived 0–5 star rating from community votes |
| `upvoteCount` | integer | Number of upvotes |
| `downvoteCount` | integer | Number of downvotes |
| `ratingCount` | integer | Total ratings cast |
| `language` | string | Language name |
| `languageIso` | string | ISO language code |
Links & Media
| Field | Type | Description |
|---|---|---|
| `url` | string | Canonical Scribd reader URL |
| `downloadUrl` | string | Direct download link when available |
| `imageUrl` | string | Cover thumbnail image URL |
Tips for Best Results
- Use specific multi-word phrases to narrow large topics — a broad single word like "business" returns tens of thousands of loosely related documents, while "small business marketing plan" returns a focused, usable set.
- Batch related keywords in one run — the `query` field tags every row with its source keyword, so you can split one mixed dataset back out per term afterward.
- Start with a small `maxResultsPerQuery` (40–100) to confirm the results match your intent, then raise it once you're happy with the keywords.
- Set `maxResultsPerQuery` to 0 only when you genuinely want the full match set — it sweeps deep and is best paired with tight, specific phrases.
- Treat `language` as a preference, not a hard filter — for keywords with little Scribd content in a given language, results fall back to the most available language; pair a language with a keyword written in that language for the best hit rate.
- Rank by `views` and `rating` together — a high view count with a strong derived star score is the surest sign a document is both popular and well received.
- Use `pageCount` and `consumptionTime` to pre-screen depth before opening anything — filter out one-page stubs or zero in on long-form references in seconds.
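Several of the tips above translate directly into post-processing of the exported rows. The sketch below assumes the dataset has been loaded as a list of dictionaries; the function names and the two-page threshold are illustrative choices, and sorting by rating first and views second is only one reasonable way to combine the two signals.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def split_by_query(rows: Iterable[Dict[str, Any]]) -> Dict[Any, List[Dict[str, Any]]]:
    """Group a mixed dataset back into one list per source keyword."""
    groups: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        groups[row.get("query")].append(row)
    return dict(groups)


def rank_rows(rows: Iterable[Dict[str, Any]], min_pages: int = 2) -> List[Dict[str, Any]]:
    """Drop short stubs, then sort by rating and views, highest first.

    Rows with a null pageCount (non-paged content) are treated as 0 pages
    and are therefore dropped as well.
    """
    kept = [r for r in rows if (r.get("pageCount") or 0) >= min_pages]
    return sorted(
        kept,
        key=lambda r: (r.get("rating") or 0, r.get("views") or 0),
        reverse=True,
    )
```

A typical flow would call `split_by_query` once on the full export and then `rank_rows` on each group, so every keyword gets its own shortlist.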
Pricing
From $3.50 per 1,000 results (the Gold-tier rate; $4.20 per 1,000 without a discount), which undercuts comparable Scribd search scrapers while lifting the result ceiling 100×. There are no compute or time-based charges: you pay per result, plus a small fixed per-run start fee. Bronze, Silver, and Gold subscribers pay progressively less; the table below shows the per-result cost at each discount tier, excluding the start fee.
| Results | No discount | Bronze | Silver | Gold |
|---|---|---|---|---|
| 100 | $0.42 | $0.40 | $0.38 | $0.35 |
| 1,000 | $4.20 | $3.95 | $3.75 | $3.50 |
| 10,000 | $42.00 | $39.50 | $37.50 | $35.00 |
| 100,000 | $420.00 | $395.00 | $375.00 | $350.00 |
A "result" is any document row in the output dataset. The fixed per-run start fee and any platform usage (storage) are additional and depend on your Apify plan.
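The table rows follow from a simple per-1,000 rate, so a budget check can be scripted. This is a sketch with hypothetical names; the rates are copied from the 1,000-result row of the table, and the per-run start fee and platform usage are left out because their amounts are not stated here.

```python
from decimal import Decimal

# Price per 1,000 results at each tier, taken from the pricing table.
RATE_PER_1000 = {
    "none": Decimal("4.20"),
    "bronze": Decimal("3.95"),
    "silver": Decimal("3.75"),
    "gold": Decimal("3.50"),
}


def estimate_result_cost(results: int, tier: str = "none") -> Decimal:
    """Per-result charge only; start fee and platform usage are not included."""
    if results < 0:
        raise ValueError("results must be zero or positive")
    if tier not in RATE_PER_1000:
        raise ValueError(f"unknown tier: {tier}")
    # Decimal keeps the arithmetic exact; quantize rounds to whole cents.
    return (RATE_PER_1000[tier] * results / 1000).quantize(Decimal("0.01"))
```

Remember that rows are billed as delivered, so a keyword that overshoots its limit by part of a page adds slightly to the estimate.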
Integrations
Export data in JSON, CSV, Excel, XML, or RSS. Connect to 1,500+ apps via:
- Zapier / Make / n8n — Workflow automation
- Google Sheets — Direct spreadsheet export
- Slack / Email — Notifications on new results
- Webhooks — Trigger custom APIs on run completion
- Apify API — Full programmatic access
Legal & Ethical Use
This actor is designed for legitimate research, market analysis, content curation, and lead generation. Users are responsible for complying with applicable laws and Scribd's Terms of Service. Only collect publicly available document metadata, respect copyright and authors' rights, and do not use extracted data for spam, harassment, or any unlawful purpose.