Wikiquote Scraper
Pricing
Pay per event
Extract quotes from any Wikiquote page — by person, work, or topic — via the Wikiquote MediaWiki API. Returns each quote with attribution, source work, year, and language, exported to JSON or CSV. Free, multilingual.
Developer
DevilScrapes
Maintained by Community
🎯 What this scrapes
Wikiquote is the world's largest community-edited quotes library — a sister project of Wikipedia with strict citation requirements. This Actor accepts a list of Wikiquote article titles (or full URLs) and writes one dataset row per quote, with full attribution metadata and — when the page supplies it — the source work and year.
Works across every Wikiquote language subdomain: pass language: "de" and get German-language quotes. Unsourced, disputed, and misattributed sections are labelled separately in the section field so you can filter on attribution quality rather than guessing.
🔥 Features
- 🛡️ Browser fingerprint rotation: `curl-cffi` replays real Chrome / Firefox / Safari TLS handshakes so the target sees a real browser, not a Python script.
- 🌐 Residential proxy rotation via Apify Proxy: fresh session and exit IP on every block signal.
- 🔁 Retries with exponential backoff on `408 / 429 / 5xx`: up to 5 attempts per page, `Retry-After` honoured.
- 🧱 Rate-limit-aware pacing: when the target pushes back, we slow down and wait rather than triggering a ban.
- 🧊 Clean, typed dataset rows: Pydantic-validated, ISO-8601 timestamps, stable IDs, export-ready as JSON / CSV / Excel from the Apify Console.
- 💰 Pay-Per-Event pricing: you pay a small one-off charge per run plus a per-result charge for each item that lands in your dataset (see Pricing).
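The retry behaviour listed above can be sketched roughly as follows. This is not the Actor's actual code: `is_retryable`, `backoff_delay`, `fetch_with_retries`, and the injected `fetch` callable are hypothetical names, and the 1-second base delay and 60-second cap are assumed values. The sketch only illustrates exponential backoff on 408 / 429 / 5xx, with a numeric `Retry-After` header taking priority, for up to 5 attempts.

```python
import time
from typing import Callable, Mapping, Optional, Tuple

MAX_ATTEMPTS = 5  # "up to 5 attempts per page"
Response = Tuple[int, Mapping[str, str], str]  # (status, headers, body)


def is_retryable(status: int) -> bool:
    """408, 429 and any 5xx status are treated as transient."""
    return status in (408, 429) or 500 <= status <= 599


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait after failed attempt number `attempt` (1-based).

    Only the delta-seconds form of Retry-After is handled here; the
    HTTP-date form falls through to the exponential schedule.
    """
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after.strip())
    return min(base * 2 ** (attempt - 1), cap)


def fetch_with_retries(fetch: Callable[[str], Response], url: str,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Call `fetch(url)` until it succeeds or MAX_ATTEMPTS is reached."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        status, headers, body = fetch(url)
        if not is_retryable(status):
            if status >= 400:
                raise RuntimeError(f"{url}: non-retryable status {status}")
            return body
        if attempt < MAX_ATTEMPTS:
            sleep(backoff_delay(attempt, headers.get("Retry-After")))
    raise RuntimeError(f"{url}: still failing after {MAX_ATTEMPTS} attempts")
```

Passing `fetch` and `sleep` in as arguments keeps the sketch independent of any particular HTTP client and makes the schedule easy to test without real waiting.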
💡 Use cases
- Daily-quote service — schedule a run for a curated list and push one quote per day to your app or newsletter.
- Citation enrichment — find a properly sourced quote when you have only the speaker's name.
- Multilingual analysis — pull quotes on the same topic across 5 or more language editions.
- Movie / book reference assembly — extract every quote from a film or novel's Wikiquote page for a study guide or quiz app.
- Attribution-real RAG corpus — small, clean, citation-grounded text for LLM retrieval demos where hallucinated attributions are unacceptable.
- Education / language-learning apps — real sourced quotes in the target language, with section labels for difficulty filtering.
⚙️ How to use it
- Click Try for free at the top of the Store page.
- Enter your list of Wikiquote article titles, using the exact name as shown on the Wikiquote page (e.g. `Albert Einstein`, `The Dark Knight`).
- Set your `language` code if you need a non-English edition.
- Click Start. Results stream into the run's dataset in real time.
- Export from Storage → Dataset as JSON, CSV, or Excel — or pull via the Apify dataset API.
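For the last step, pulling rows through the Apify dataset API might look like the sketch below. It assumes the public `GET /v2/datasets/{datasetId}/items` endpoint with a `format` query parameter and a bearer token; `dataset_items_url` and `fetch_items` are hypothetical helper names, and the dataset ID and token are values you copy from your own run.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(dataset_id: str, fmt: str = "json") -> str:
    """Build the items URL for a dataset; `fmt` can be e.g. "json" or "csv"."""
    query = urllib.parse.urlencode({"format": fmt})
    safe_id = urllib.parse.quote(dataset_id, safe="")
    return f"{API_BASE}/datasets/{safe_id}/items?{query}"


def fetch_items(dataset_id: str, token: str) -> list:
    """Download all rows of a dataset as a list of dicts (JSON format)."""
    request = urllib.request.Request(
        dataset_items_url(dataset_id, "json"),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `fetch_items("<your-dataset-id>", "<your-api-token>")` would return the same rows you see under Storage → Dataset.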
📥 Input
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `pages` | array | yes | `["Albert Einstein", "Oscar Wilde"]` | List of Wikiquote article titles or full page URLs. Use the exact title as shown on Wikiquote. |
| `language` | string | no | `"en"` | Wikiquote subdomain ISO code (`en`, `de`, `fr`, `es`, etc.). |
| `maxQuotesPerPage` | integer | no | `50` | Cap on quotes extracted per page. Some pages have hundreds; the default keeps cost predictable. |
| `proxyConfiguration` | object | no | `{"useApifyProxy": false}` | Wikiquote serves programmatic clients. Proxy is optional but available if needed. |
Example input
    {
      "pages": ["Albert Einstein", "Marcus Aurelius"],
      "language": "en",
      "maxQuotesPerPage": 50,
      "proxyConfiguration": {"useApifyProxy": false}
    }
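If you assemble the input programmatically, a small helper along these lines can catch mistakes before a run starts. `build_input` is a hypothetical name, and the checks (at least one non-empty page, a positive cap) are assumptions about sensible input rather than the Actor's documented validation rules.

```python
import json
from typing import List


def build_input(pages: List[str], language: str = "en",
                max_quotes_per_page: int = 50,
                use_apify_proxy: bool = False) -> dict:
    """Return an input object using the field names from the Input table."""
    titles = [page.strip() for page in pages if page.strip()]
    if not titles:
        raise ValueError("pages must contain at least one title or URL")
    if max_quotes_per_page < 1:
        # Assumption: a cap below 1 is never what the caller intended.
        raise ValueError("max_quotes_per_page must be a positive integer")
    return {
        "pages": titles,
        "language": language,
        "maxQuotesPerPage": max_quotes_per_page,
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }


if __name__ == "__main__":
    print(json.dumps(build_input(["Albert Einstein", "Marcus Aurelius"]), indent=2))
```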
📤 Output
Every row is one dataset item.
| Field | Type | Notes |
|---|---|---|
| `quote` | string | The quote text, plain string. |
| `attribution` | string | Who the quote is attributed to (the Wikiquote page title). |
| `source` | string or null | Source work or context when Wikiquote supplies it. |
| `year` | string or null | Year of the quote when detectable from the page. |
| `section` | string or null | Section heading the quote appeared under (e.g. "Sourced", "Disputed", "Misattributed"). |
| `page_url` | string | Full URL of the source Wikiquote page. |
| `language` | string | Wikiquote language code used for this page. |
| `scraped_at` | string | ISO-8601 timestamp when this row was recorded. |
Example output
    {
      "quote": "Everything should be made as simple as possible, but not simpler.",
      "attribution": "Albert Einstein",
      "source": "Reader's Digest, October 1977",
      "year": "1933",
      "section": "Sourced",
      "page_url": "https://en.wikiquote.org/wiki/Albert_Einstein",
      "language": "en",
      "scraped_at": "2026-06-01T09:00:00Z"
    }
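On the consuming side, a row like the one above can be loaded into a typed structure. The sketch below uses a plain dataclass; `QuoteRow` and `parse_row` are hypothetical names that mirror the Output table, not classes shipped with the Actor.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class QuoteRow:
    quote: str
    attribution: str
    source: Optional[str]
    year: Optional[str]      # kept as a string, as in the Output table
    section: Optional[str]
    page_url: str
    language: str
    scraped_at: datetime


def parse_row(item: dict) -> QuoteRow:
    """Convert one dataset item (a dict) into a QuoteRow."""
    # Older Python versions reject a trailing "Z" in fromisoformat,
    # so it is rewritten as an explicit UTC offset first.
    timestamp = item["scraped_at"].replace("Z", "+00:00")
    return QuoteRow(
        quote=item["quote"],
        attribution=item["attribution"],
        source=item.get("source"),
        year=item.get("year"),
        section=item.get("section"),
        page_url=item["page_url"],
        language=item["language"],
        scraped_at=datetime.fromisoformat(timestamp),
    )
```

Required fields are read with `item[...]` so a malformed row fails loudly, while the nullable fields use `.get()` and default to `None`.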
💰 Pricing
Pay-Per-Event — you pay only when these events fire:
| Event | USD | What it is |
|---|---|---|
actor-start | $0.005 | One-off warm-up charge per run |
result | $0.001 | Per dataset item pushed |
Example: one run that returns 1,000 results costs $0.005 + 1,000 × $0.001 = $1.005, or roughly $1.00. No subscription, no minimum, no card to start. Apify gives every new account $5 of free credit.
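To budget a batch of runs, the two event prices in the table can be combined in a few lines. `estimate_cost` is a hypothetical helper; the prices are copied from the table above and may change, so treat the result as an estimate only.

```python
from decimal import Decimal

ACTOR_START_USD = Decimal("0.005")  # charged once per run
RESULT_USD = Decimal("0.001")       # charged per dataset item pushed


def estimate_cost(results: int, runs: int = 1) -> Decimal:
    """Estimated USD cost for `results` rows spread over `runs` runs."""
    if results < 0 or runs < 0:
        raise ValueError("results and runs must be non-negative")
    return ACTOR_START_USD * runs + RESULT_USD * results


if __name__ == "__main__":
    print(estimate_cost(1000))  # one run, 1,000 rows
```

`Decimal` is used instead of floats so that small per-event prices add up without rounding noise.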
🚧 Limitations
- We parse the HTML rendered by Wikiquote's MediaWiki engine. Pages that use unusual templates may surface a quote without a source or year; the `section` field tells you whether it came from a verified, attributed, disputed, or misattributed section.
- Themed list pages (`List of quotes about X`) are supported, but the `attribution` field will be the page title rather than an individual speaker.
- `Category:` pages (index pages listing many articles) are not yet supported, so pass individual article titles. Category enumeration is on the roadmap.
- Wikiquote markup varies across language editions; rare edge-case pages may parse with reduced fidelity. We surface the `section` label so you can filter on quality.
❓ FAQ
Is this legal?
Yes. Wikiquote content is published under the CC BY-SA licence, which requires attribution and share-alike licensing of derivative works. Attribute the source whenever you reuse quotes, including in a commercial product.
Is there a Wikiquote API I can use directly instead?
Wikiquote exposes the standard MediaWiki API, but it returns whole pages as raw wikitext or rendered HTML rather than individual quotes. That markup is brittle and varies widely across language editions and page authors, so parsing it correctly requires handling dozens of template variants, nested sections, and unsourced markers. This Actor absorbs that complexity so you receive clean, structured JSON rows.
Can I use this as a free famous quotes API?
Yes — for attribution-real, citation-grounded quotes it's the best free option. Wikiquote is the only community-edited source that requires citations; random "famous quotes" APIs typically contain hallucinated or misattributed text. Export your results as JSON, host them behind a Cloudflare Worker, and you have a GET /random endpoint backed by real sources.
Why are some quotes missing a source?
Wikiquote contributors don't always supply a citation. The section field tells you which attribution tier the quote is in — filter to section: "Sourced" for citation-confirmed quotes only.
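Applied to exported rows, that filter could look like the sketch below. `only_sourced` is a hypothetical helper, and the literal label "Sourced" is taken from the English examples in this README; other language editions may use different section headings, so adjust the label accordingly.

```python
from typing import Iterable, List


def only_sourced(rows: Iterable[dict], label: str = "Sourced",
                 require_source: bool = False) -> List[dict]:
    """Keep rows whose `section` equals `label`.

    With require_source=True, rows that have an empty or null `source`
    field are dropped as well.
    """
    kept = []
    for row in rows:
        if row.get("section") != label:
            continue
        if require_source and not row.get("source"):
            continue
        kept.append(row)
    return kept
```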
What if a page is huge?
Use maxQuotesPerPage to cap output. Some pages have hundreds of quotes; the default of 50 keeps cost predictable. Remove the cap if you need the full page.
Do you support multilingual pages?
Yes. Set language to any ISO code with a Wikiquote subdomain — "de", "fr", "es", "pt", "it", "ru", and many more. Each language edition is a separate subdomain with its own article set.
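To make the subdomain scheme concrete, the sketch below maps a title and language code to a page URL in the same shape as the `page_url` field in the example output. `wikiquote_url` is a hypothetical helper, and the exact normalisation the Actor performs internally is not documented here, so this is only an approximation.

```python
from urllib.parse import quote


def wikiquote_url(title_or_url: str, language: str = "en") -> str:
    """Return the page URL for a title, or pass a full URL through unchanged."""
    value = title_or_url.strip()
    if value.startswith(("http://", "https://")):
        return value
    # MediaWiki page URLs use underscores in place of spaces.
    slug = value.replace(" ", "_")
    return f"https://{language}.wikiquote.org/wiki/{quote(slug, safe='/:')}"
```

For example, `wikiquote_url("Albert Einstein")` yields `https://en.wikiquote.org/wiki/Albert_Einstein`, and passing `language="de"` switches to the German subdomain.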
Do you support Category: pages?
Not yet — pass individual article titles. Category enumeration is on the roadmap.
💬 Your feedback
Spotted a bug, hit a weird parse edge case, or need a new field? Open an issue on the Actor's Issues tab in the Apify Console — we ship fixes weekly and we read every report.