Scribd Document Search Scraper

Search Scribd by keyword and export structured metadata for every matching document, book, audiobook, sheet music, or podcast: title, author, type, page count, ratings, views, language, categories, and links. From $3.50 per 1,000 results.

Developer: SolidCode (maintained by Community)

Search Scribd by keyword and pull clean, structured metadata for every matching document — title, author, page count, view counts, ratings, language, categories, and direct reader and download links. Run many keywords in a single pass and get one tidy row per document. Built for market researchers, competitive analysts, and content and lead-gen teams who need Scribd document data at scale without paging through search results by hand.

Why This Scraper?

  • Multi-keyword batch in one run — pass a list of search terms and each keyword runs as its own search, so you cover an entire topic map in a single pass instead of one query at a time.
  • Up to 10,000 results per keyword — lifts the typical 100-result search ceiling 100× so you can sweep a whole topic, not just the first page.
  • 23 structured fields per document — id, title, author, type, description, page count, release date, views, reading time, ratings, language, categories, and direct links — every row consumption-ready, no raw markup.
  • Derived 0–5 star rating — a clean star score computed from each document's community upvotes and downvotes, alongside the raw upvoteCount, downvoteCount, and ratingCount.
  • Author profile URLs for outreach — every row carries the primary author's name and absolute profile link, plus a full authors array with each contributor's id, name, and profile URL.
  • Engagement signals built in — real view counts (parsed from Scribd's "15K"/"1.2M" shorthand to plain integers) and estimated reading time let you rank documents by popularity, not just relevance.
  • Direct reader and download links — every row includes the canonical reader URL and a ready-to-use download link when the document is downloadable, so you never have to reconstruct paths.
  • Result-language preference across 21 languages — bias results toward English, Spanish, Portuguese, French, German, Arabic, Hindi, Japanese, and more so you collect documents in the language your audience reads.
  • Query provenance on every row — each document carries the exact keyword that surfaced it, so a single mixed dataset stays attributable per search term.
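
The view-count normalization mentioned in the list above can be illustrated with a short sketch. This is not the scraper's actual code: the function name `parse_view_count` and the handling of a "B" suffix and thousands separators are assumptions added for illustration.

```python
import re

# Multipliers for the shorthand suffixes; "K" and "M" come from the examples
# above, "B" is an assumed extension.
_SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_view_count(text):
    """Convert shorthand such as "15K" or "1.2M" to an int; None if unparseable."""
    match = re.fullmatch(r"\s*([\d.,]+)\s*([KMB]?)\s*", text, flags=re.IGNORECASE)
    if not match:
        return None
    number, suffix = match.groups()
    try:
        value = float(number.replace(",", ""))
    except ValueError:
        return None  # e.g. a string made only of dots or commas
    # An empty suffix falls back to a multiplier of 1.
    return int(round(value * _SUFFIXES.get(suffix.upper(), 1)))
```

With this sketch, `parse_view_count("15K")` gives 15000, matching the `views` value in the sample output row.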

Use Cases

Market & Content Research

  • Map how much Scribd content exists around a topic, product, or industry
  • Surface the most-viewed and highest-rated documents in a niche
  • Track templates, whitepapers, and guides circulating in your space
  • Build topic libraries spanning dozens of keywords in one run

Competitive Analysis

  • See which authors and brands publish the most in your category
  • Benchmark engagement (views, ratings) against competing documents
  • Monitor new uploads tied to a brand or product name
  • Compare document depth by page count across competitors

Lead Generation & Outreach

  • Collect author profile URLs and names for creator outreach
  • Identify prolific publishers in a target vertical
  • Build prospect lists from documents matching buyer-intent keywords
  • Prioritize outreach by author reach using view counts and ratings

Academic & Reference

  • Gather reference document metadata across many search terms at once
  • Filter your reading list by page count and reading time before opening anything
  • Prefer results in a specific language for non-English literature reviews
  • Catalog community ratings to triage which documents are worth reading

Content Curation

  • Power recommendation feeds and resource roundups with fresh metadata
  • Enrich an existing content database with views, ratings, and categories
  • Curate by category labels Scribd files each document under
  • Feed a newsletter or knowledge base with structured document records

Getting Started

Simple — one keyword

{
  "queries": ["business plan template"]
}

Several keywords, more results each

{
  "queries": ["machine learning", "data science", "neural networks"],
  "maxResultsPerQuery": 250
}

Advanced — language preference and a deep sweep

{
  "queries": ["recetas de cocina", "plan de negocios"],
  "maxResultsPerQuery": 1000,
  "language": "4"
}

Input Reference

All fields are optional — run with just a keyword and sensible defaults handle the rest.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| queries | string[] | ["business plan template"] | One or more keywords to search on Scribd. Each keyword runs its own search; add several to cover a whole topic in one run. |
| maxResultsPerQuery | integer | 100 | How many documents to return per keyword. Set to 0 to fetch every available match. Results arrive in pages of 40, so the final page may slightly overshoot rather than cut off mid-page. |
| language | select | Any language | Prefer results written in a chosen language: English, Spanish, Portuguese, French, German, Italian, Dutch, Russian, Japanese, Korean, Chinese, Arabic, Hindi, Indonesian, Turkish, Polish, Danish, Romanian, Thai, Swedish, or Czech. Leave on "Any language" for no preference. Coverage depends on how much content Scribd has in that language for your keyword. |
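
To make the page-size overshoot concrete, here is a rough estimate of how many rows a single keyword might return. It assumes the scraper keeps every row of the final page it fetches, which the description above implies but does not state outright. The function name `expected_rows` and the `available_matches` argument are hypothetical.

```python
import math

PAGE_SIZE = 40  # page size stated in the input reference


def expected_rows(max_results_per_query, available_matches):
    """Estimate rows returned for one keyword, assuming whole pages are kept."""
    if max_results_per_query == 0:
        # 0 means "fetch every available match".
        return available_matches
    pages = math.ceil(max_results_per_query / PAGE_SIZE)
    # Never more rows than Scribd actually has for the keyword.
    return min(pages * PAGE_SIZE, available_matches)
```

Under this assumption, the default of 100 needs three pages and could return up to 120 rows, while a value that is a multiple of 40 would not overshoot at all.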

Output

Each matching document becomes one flat row. Here's a representative result:

{
  "id": "238702049",
  "title": "Sample Business Plan Template",
  "author": "Jane Author",
  "authorUrl": "https://www.scribd.com/user/12345678/jane-author",
  "authors": [
    { "id": 12345678, "name": "Jane Author", "url": "https://www.scribd.com/user/12345678/jane-author" }
  ],
  "type": "document",
  "description": "A complete business plan template covering executive summary, market analysis, and financials...",
  "url": "https://www.scribd.com/document/238702049/Sample-Business-Plan-Template",
  "downloadUrl": "https://www.scribd.com/document_downloads/238702049",
  "imageUrl": "https://imgv2-1-f.scribdassets.com/img/document/238702049/original.jpg",
  "pageCount": 32,
  "releasedAt": "2018-04-12",
  "views": 15000,
  "consumptionTime": 24,
  "isUnlocked": true,
  "rating": 4.5,
  "upvoteCount": 90,
  "downvoteCount": 10,
  "ratingCount": 100,
  "language": "English",
  "languageIso": "en",
  "categories": ["Business", "Templates"],
  "query": "business plan template"
}

Document Fields

| Field | Type | Description |
| --- | --- | --- |
| id | string | Scribd document identifier |
| title | string | Document title |
| type | string | Document type label as classified by Scribd |
| description | string | Description or snippet |
| pageCount | integer | Number of pages (null for non-paged content) |
| releasedAt | string | Publication or upload date |
| consumptionTime | integer | Estimated reading time in minutes |
| isUnlocked | boolean | Whether the document is freely accessible |
| categories | string[] | Category labels Scribd files the document under |
| query | string | The search keyword that surfaced this row |

Author Fields

| Field | Type | Description |
| --- | --- | --- |
| author | string | Primary author name (may be null) |
| authorUrl | string | Primary author profile URL |
| authors | object[] | All contributors, each with id, name, and profile url |

Engagement & Ratings

| Field | Type | Description |
| --- | --- | --- |
| views | integer | View count, parsed to a plain integer |
| rating | number | Derived 0–5 star rating from community votes |
| upvoteCount | integer | Number of upvotes |
| downvoteCount | integer | Number of downvotes |
| ratingCount | integer | Total ratings cast |
| language | string | Language name |
| languageIso | string | ISO language code |

Links

| Field | Type | Description |
| --- | --- | --- |
| url | string | Canonical Scribd reader URL |
| downloadUrl | string | Direct download link when available |
| imageUrl | string | Cover thumbnail image URL |
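
The exact formula behind the derived `rating` field is not documented here. One simple mapping that reproduces the sample row in the Output section (90 upvotes, 10 downvotes, rating 4.5) is the upvote share scaled to five stars. The sketch below shows that mapping as an assumption, not as the scraper's confirmed method; the function name `star_rating` is hypothetical.

```python
def star_rating(upvotes, downvotes):
    """Scale the share of upvotes to a 0-5 score; None when nobody has voted."""
    total = upvotes + downvotes
    if total == 0:
        return None  # no votes, so no meaningful score
    return round(5 * upvotes / total, 1)
```

If you need a score with different behavior, for example one that discounts documents with very few votes, you can recompute it yourself from `upvoteCount` and `downvoteCount`, which are present on every row.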

Tips for Best Results

  • Use specific multi-word phrases to narrow large topics — a broad single word like "business" returns tens of thousands of loosely related documents, while "small business marketing plan" returns a focused, usable set.
  • Batch related keywords in one run — the query field tags every row with its source keyword, so you can split one mixed dataset back out per term afterward.
  • Start with a small maxResultsPerQuery (40–100) to confirm the results match your intent, then raise it once you're happy with the keywords.
  • Set maxResultsPerQuery to 0 only when you genuinely want the full match set — it sweeps deep and is best paired with tight, specific phrases.
  • Treat language as a preference, not a hard filter — for keywords with little Scribd content in a given language, results fall back to the most available language; pair a language with a keyword written in that language for the best hit rate.
  • Rank by views and rating together — a high view count with a strong derived star score is the surest sign a document is both popular and well received.
  • Use pageCount and consumptionTime to pre-screen depth before opening anything — filter out one-page stubs or zero in on long-form references in seconds.
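
Two of the tips above, splitting a mixed dataset by `query` and ranking by views and rating together, can be combined in a few lines of post-processing. The sketch below works on rows already exported from the dataset. The function name, the 4.0 rating threshold, and the sample rows are all made up for illustration.

```python
from collections import defaultdict


def top_documents_by_query(rows, min_rating=4.0, limit=3):
    """Group rows by source keyword, keep well-rated ones, sort by views descending."""
    grouped = defaultdict(list)
    for row in rows:
        rating = row.get("rating")
        if rating is not None and rating >= min_rating:
            grouped[row["query"]].append(row)
    return {
        query: sorted(docs, key=lambda d: d.get("views") or 0, reverse=True)[:limit]
        for query, docs in grouped.items()
    }


# Illustrative rows only; real rows carry all the fields listed under Output.
rows = [
    {"id": "1", "query": "machine learning", "views": 15000, "rating": 4.5},
    {"id": "2", "query": "machine learning", "views": 90000, "rating": 3.1},
    {"id": "3", "query": "data science", "views": 1200, "rating": 4.8},
    {"id": "4", "query": "machine learning", "views": 42000, "rating": 4.2},
]
top = top_documents_by_query(rows)
```

Here the most-viewed "machine learning" row is dropped because its rating falls below the threshold, which is the point of ranking on both signals rather than on views alone.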

Pricing

From $3.50 per 1,000 results at the Gold tier ($4.20 per 1,000 with no discount). The scraper undercuts comparable Scribd search scrapers while lifting the result ceiling 100×. There are no compute or time-based charges: you pay per result, plus a small fixed per-run start fee. Bronze, Silver, and Gold subscribers pay progressively less; the table below shows the per-result charge at each discount tier, excluding the start fee.

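| Results | No discount | Bronze | Silver | Gold |
| --- | --- | --- | --- | --- |
| 100 | $0.42 | $0.40 | $0.38 | $0.35 |
| 1,000 | $4.20 | $3.95 | $3.75 | $3.50 |
| 10,000 | $42.00 | $39.50 | $37.50 | $35.00 |
| 100,000 | $420.00 | $395.00 | $375.00 | $350.00 |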

A "result" is any document row in the output dataset. The fixed per-run start fee and any platform usage (storage) are additional and depend on your Apify plan.
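
For result counts not listed in the table, the per-result charge can be estimated from the per-1,000 rates in the 1,000-row line. The sketch below excludes the per-run start fee and platform usage, whose amounts are not given here. The tier keys and the function name `result_cost` are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-1,000-result rates taken from the 1,000-row line of the pricing table.
RATE_PER_1000 = {
    "none": Decimal("4.20"),
    "bronze": Decimal("3.95"),
    "silver": Decimal("3.75"),
    "gold": Decimal("3.50"),
}


def result_cost(results, tier="none"):
    """Per-result charge in USD for a run, rounded half-up to the cent."""
    cost = Decimal(results) * RATE_PER_1000[tier] / Decimal(1000)
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Decimal arithmetic with half-up rounding reproduces the table's 100-result figures, such as $0.40 for Bronze and $0.38 for Silver. Binary floats could round those two values the other way.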

Integrations

Export data in JSON, CSV, Excel, XML, or RSS. Connect to 1,500+ apps via:

  • Zapier / Make / n8n — Workflow automation
  • Google Sheets — Direct spreadsheet export
  • Slack / Email — Notifications on new results
  • Webhooks — Trigger custom APIs on run completion
  • Apify API — Full programmatic access
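
As a rough illustration of programmatic access, the sketch below builds (but does not send) a request to the Apify API's synchronous run endpoint using only the standard library. The actor ID and token are placeholders, since neither appears on this page; substitute the actor's real ID from the Apify Console. The helper name `build_run_request` is hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id, token, run_input):
    """Prepare a POST that runs an actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_run_request(
    actor_id="<username>~<actor-name>",  # placeholder, not the real actor ID
    token="<APIFY_TOKEN>",               # placeholder API token
    run_input={"queries": ["business plan template"], "maxResultsPerQuery": 100},
)
# To execute the run:
#   with urllib.request.urlopen(request) as response:
#       rows = json.load(response)
```

The synchronous endpoint suits short runs. For deep sweeps, starting the run asynchronously and reading the dataset afterward is likely the safer pattern.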

This actor is designed for legitimate research, market analysis, content curation, and lead generation. Users are responsible for complying with applicable laws and Scribd's Terms of Service. Only collect publicly available document metadata, respect copyright and authors' rights, and do not use extracted data for spam, harassment, or any unlawful purpose.