Wiktionary Definitions Scraper
Pricing
from $9.00 / 1,000 result items
Fetch dictionary definitions from Wiktionary in 9 source languages. Returns part of speech, definitions, examples, and cross-language meanings per word. Plain-text and HTML output for one-shot or bulk word lists.
Developer
ParseForge
Maintained by Community

📖 Wiktionary Definitions Scraper
🚀 Export multilingual dictionary entries in seconds. Pull definitions, parts of speech, and example sentences from 6 million+ Wiktionary entries across nine language editions. No API key, no registration, no XML wrangling.
🕒 Last updated: 2026-05-22 · 📊 9 fields per record · 📖 6M+ entries · 🌐 9 language editions · 🔤 50+ parts of speech
The Wiktionary Definitions Scraper queries the community-maintained multilingual dictionary and returns 9 fields per record, including the headword, part of speech, every definition, illustrative examples, source language edition, and a link back to the canonical page. The underlying dataset is the largest open dictionary in the world, maintained by a global community of lexicographers, linguists, and language enthusiasts.
The catalog covers English, Spanish, French, German, Italian, Portuguese, Russian, Chinese, Japanese, and Korean editions, with cross-language meanings included whenever the source edition documents them. This Actor turns the dictionary into downloadable CSV, Excel, JSON, or XML in under five minutes. Bulk lookups, single-word checks, and cross-edition merges all run from the same input form.
| 🎯 Target Audience | 💡 Primary Use Cases |
|---|---|
| NLP engineers, lexicographers, linguists, language-learning apps, search teams, content moderators, dictionary builders | Lemma dictionaries, training data for tokenizers and word embeddings, definition lookups in chat UIs, vocabulary expansion for language apps |
📋 What the Wiktionary Definitions Scraper does
Three lookup workflows in a single run:
- 🔤 Single-word lookup. Resolve one headword across every documented language in the chosen edition.
- 📚 Bulk word list. Pass dozens or thousands of words and stream a record per entry.
- 🌐 Cross-language coverage. From the English edition, pull meanings for words in any documented language with English glosses.
Each record includes the headword, source language edition, entry language code, entry language name, part of speech, definition count, the full definitions array, illustrative examples, and the canonical Wiktionary page URL.
💡 Why it matters: dictionary data underpins autocomplete, spell-check, machine translation, and language-learning features. Building your own scraper means handling wiki markup, namespace conventions, and per-edition schema drift. This Actor skips all of that and refreshes on every run.
🎬 Full Demo
🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.
⚙️ Input
| Input | Type | Default | Behavior |
|---|---|---|---|
| `maxItems` | integer | `10` | Records to return. Free plan caps at 10, paid plan at 1,000,000. |
| `words` | array | `["hello","world","etymology","lexicon","polyglot"]` | Headwords to look up. One record per word and language pair found. |
| `language` | string | `"en"` | Wiktionary edition. English has the broadest cross-language coverage. |
Example: 25 English words with full definitions.
```json
{
  "maxItems": 25,
  "words": ["serendipity", "ephemeral", "ubiquitous", "obfuscate", "perspicacious"],
  "language": "en"
}
```
Example: French headwords from the French edition.
```json
{
  "maxItems": 50,
  "words": ["bonjour", "maison", "amitié", "liberté"],
  "language": "fr"
}
```
⚠️ Good to Know: the English Wiktionary edition documents the broadest cross-language coverage. If you query a word that only exists in one edition, switch the `language` field to that edition for the richest result.
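The input object can also be assembled in code before a run. The following is a minimal Python sketch, assuming only the three documented fields and their defaults. The helper name `build_run_input` and its cleanup rules (trimming whitespace, dropping duplicates) are illustrative and not part of the Actor.

```python
from typing import Iterable


def build_run_input(words: Iterable[str], language: str = "en", max_items: int = 10) -> dict:
    """Assemble an input object with the three documented fields.

    The trimming and de-duplication below are assumptions made for this
    sketch; the Actor itself may accept raw lists as they are.
    """
    if max_items < 1:
        raise ValueError("max_items must be a positive integer")

    cleaned = []
    seen = set()
    for word in words:
        word = word.strip()
        # Skip blanks and repeated headwords, keeping first-seen order.
        if word and word not in seen:
            seen.add(word)
            cleaned.append(word)

    if not cleaned:
        raise ValueError("at least one non-empty word is required")

    return {"maxItems": max_items, "words": cleaned, "language": language}
```

Calling `build_run_input(["bonjour", "maison"], language="fr", max_items=50)` produces an object in the same shape as the French example above.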
📊 Output
Each entry record contains the 9 entry fields listed below plus a `scrapedAt` timestamp. Download the dataset as CSV, Excel, JSON, or XML.
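For scripted downloads, the same formats are available through the Apify API's dataset items endpoint. This is a hedged sketch of building the export URL: the helper name `dataset_export_url` and the dataset ID are placeholders, and private datasets additionally need an API token, which is left out here.

```python
from urllib.parse import urlencode

# Formats named in this README, mapped to the API's `format` query values.
EXPORT_FORMATS = {"csv": "csv", "excel": "xlsx", "json": "json", "xml": "xml"}


def dataset_export_url(dataset_id: str, export_format: str = "json") -> str:
    """Return the dataset items URL for one of the four listed formats."""
    key = export_format.lower()
    if key not in EXPORT_FORMATS:
        raise ValueError(f"unsupported format: {export_format!r}")
    query = urlencode({"format": EXPORT_FORMATS[key]})
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}"
```

The dataset ID for a finished run is shown in the run's Dataset tab.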
🧾 Schema
| Field | Type | Example |
|---|---|---|
| 🔤 `word` | string | `"hello"` |
| 🌐 `sourceLanguage` | string | `"en"` |
| 🆔 `entryLanguageCode` | string | `"en"` |
| 🗣️ `entryLanguage` | string | `"English"` |
| 📚 `partOfSpeech` | string | `"Interjection"` |
| 🔢 `definitionCount` | number | `4` |
| 📖 `definitions` | string[] | `["A greeting (salutation) said when meeting someone or acknowledging someone's arrival..."]` |
| 💬 `examples` | string[] | `["Hello, everyone."]` |
| 🔗 `pageUrl` | string | `"https://en.wiktionary.org/wiki/hello"` |
| 🕒 `scrapedAt` | string (ISO 8601) | `"2026-05-22T10:00:00.000Z"` |
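Downstream code usually wants typed records rather than raw dictionaries. The sketch below maps the schema fields onto a Python dataclass and groups entries by headword, since one word can yield several records (one per word and language pair). The names `Entry`, `parse_entry`, and `group_by_word` are illustrative, and treating `definitions` and `examples` as optional is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Entry:
    word: str
    source_language: str
    entry_language_code: str
    entry_language: str
    part_of_speech: str
    definition_count: int
    definitions: Tuple[str, ...]
    examples: Tuple[str, ...]
    page_url: str
    scraped_at: str


def parse_entry(item: dict) -> Entry:
    """Convert one dataset item (camelCase keys) into an Entry."""
    return Entry(
        word=item["word"],
        source_language=item["sourceLanguage"],
        entry_language_code=item["entryLanguageCode"],
        entry_language=item["entryLanguage"],
        part_of_speech=item["partOfSpeech"],
        definition_count=int(item["definitionCount"]),
        # Assumption: the array fields may be absent, so default to empty.
        definitions=tuple(item.get("definitions", [])),
        examples=tuple(item.get("examples", [])),
        page_url=item["pageUrl"],
        scraped_at=item["scrapedAt"],
    )


def group_by_word(entries: Iterable[Entry]) -> Dict[str, List[Entry]]:
    """Collect all entries that share a headword, preserving input order."""
    grouped: Dict[str, List[Entry]] = defaultdict(list)
    for entry in entries:
        grouped[entry.word].append(entry)
    return dict(grouped)
```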
📦 Sample records
✨ Why choose this Actor
| Capability | |
|---|---|
| 📖 | Massive coverage. Six million+ entries spanning nine language editions and dozens of part-of-speech categories. |
| 🌐 | Cross-language ready. The English edition documents meanings for thousands of languages, all returned in one structured record. |
| 🧩 | Plain-text definitions. Wiki markup is stripped, so output is ready for tokenizers and embeddings. |
| ⚡ | Fast. 10 entries in under 5 seconds, 1,000 entries in under three minutes. |
| 💬 | Examples included. Every record exposes the illustrative sentences alongside the definitions. |
| 🔁 | Always fresh. Each run hits the live Wiktionary edition, so the dataset reflects current edits. |
| 🚫 | No authentication. Works against the public Wiktionary content. No login or API key needed. |
📊 Dictionary data is the foundation of every spell-checker, autocomplete, translation feature, and language-learning app in modern software.
📈 How it compares to alternatives
| Approach | Cost | Coverage | Refresh | Setup |
|---|---|---|---|---|
| ⭐ Wiktionary Definitions Scraper (this Actor) | $5 free credit, then pay-per-use | 6M+ entries, 9 editions | Live per run | ⚡ 2 min |
| Manual Wiktionary downloads | Free | Full dump, stale by weeks | Monthly | 🐢 Hours |
| In-house wiki-markup parser | Free + engineering | Full | Build it yourself | 🛠️ Weeks |
| Commercial dictionary APIs | $99+/month | Curated subset | Daily | ⏳ Hours |
Pick this Actor when you want broad multilingual coverage, fresh data, and zero markup parsing.
🚀 How to use
- 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
- 🌐 Open the Actor. Go to the Wiktionary Definitions Scraper page on the Apify Store.
- 🎯 Set input. Pick a language edition and paste a list of words. Set `maxItems`.
- 🚀 Run it. Click Start and let the Actor collect your data.
- 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.
⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.
💼 Business use cases
🔌 Automating Wiktionary Definitions Scraper
Control the scraper programmatically for scheduled runs and pipeline integrations:
- 🟢 Node.js. Install the `apify-client` NPM package.
- 🐍 Python. Use the `apify-client` PyPI package.
- 📚 See the Apify API documentation for full details.
The Apify Schedules feature lets you trigger this Actor on any cron interval. Daily or weekly refreshes keep dictionary content current automatically.
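A programmatic run with the Python `apify-client` package might look like the sketch below. The Actor ID is a placeholder to be copied from the Actor's Store page, the `APIFY_TOKEN` environment variable name is an assumption, and `run_scraper` is a hypothetical helper that takes the client as an argument so it can be exercised without network access.

```python
from typing import Any, List

# Placeholder: replace with the real ID shown on the Actor's Store page.
ACTOR_ID = "<username>/<actor-name>"


def run_scraper(client: Any, run_input: dict, actor_id: str = ACTOR_ID) -> List[dict]:
    """Start a run, wait for it to finish, and return its dataset items."""
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the Actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


def main() -> None:
    import os

    from apify_client import ApifyClient  # pip install apify-client

    # Assumption: the API token is stored in the APIFY_TOKEN variable.
    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run_input = {"maxItems": 25, "words": ["serendipity", "ephemeral"], "language": "en"}
    for item in run_scraper(client, run_input):
        print(item.get("word"), item.get("partOfSpeech"), item.get("definitionCount"))
```

Call `main()` from a script once the token and Actor ID are set.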
🌟 Beyond business use cases
Dictionary data powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.
❓ Frequently Asked Questions
🧩 How does it work?
Paste a list of words, pick a Wiktionary edition, click Start, and the Actor resolves each word against the chosen edition and emits a clean structured record per entry. No browser automation, no captchas, no setup.
📏 How accurate is the data?
Wiktionary entries are community-maintained, peer-reviewed, and cited across major NLP datasets and language-learning products. For mission-critical lexicography, treat it the same way you would treat any community reference.
🔁 How often is the dataset refreshed?
Wiktionary editors publish updates continuously. Every run of this Actor pulls live entries, so your dataset reflects the latest community edits as of run time.
🌐 Which languages are supported?
Nine source editions: English, Spanish, French, German, Italian, Portuguese, Russian, Chinese, Japanese, and Korean. The English edition documents meanings for thousands of additional languages via cross-language entries.
⏰ Can I schedule regular runs?
Yes. Use Apify Schedules to run this Actor on any cron interval (daily, weekly, monthly) and keep a downstream dictionary in sync.
⚖️ Is this data legal to use?
Wiktionary content is published under Creative Commons Attribution-ShareAlike. Attribution and share-alike requirements apply to redistributed entries. Review the license before integrating into a commercial product.
💼 Can I use this data commercially?
Yes, under the Creative Commons Attribution-ShareAlike terms. You are responsible for the attribution and share-alike requirements in your downstream product.
💳 Do I need a paid Apify plan to use this Actor?
No. The free Apify plan is enough for testing and small word lists (10 records per run). A paid plan lifts the limit and gives you access to scheduling, higher concurrency, and larger word lists.
🔁 What happens if a word does not exist in the chosen edition?
A diagnostic record is pushed with an error field explaining the miss. The run continues processing the rest of the word list.
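Because misses arrive in the same dataset as real entries, it helps to separate them before further processing. This is a minimal sketch, assuming the diagnostic record carries its message under a key literally named `error`; the exact key name and the helper `split_hits_and_misses` are assumptions, so check a real run's output first.

```python
from typing import Iterable, List, Tuple


def split_hits_and_misses(
    items: Iterable[dict], error_key: str = "error"
) -> Tuple[List[dict], List[dict]]:
    """Partition dataset items into entries and diagnostic records."""
    hits: List[dict] = []
    misses: List[dict] = []
    for item in items:
        # A non-empty value under error_key marks a diagnostic record.
        if item.get(error_key):
            misses.append(item)
        else:
            hits.append(item)
    return hits, misses
```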
🔤 Does it include etymology and pronunciation?
This Actor returns definitions, part of speech, and examples. For etymology and IPA pronunciation, reach out via the contact form below to request a companion etymology scraper.
🆘 What if I need help?
Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.
🔌 Integrate with any app
Wiktionary Definitions Scraper connects to any cloud service via Apify integrations:
- Make - Automate multi-step workflows
- Zapier - Connect with 5,000+ apps
- Slack - Get run notifications in your channels
- Airbyte - Pipe entries into your warehouse
- GitHub - Trigger runs from commits and releases
- Google Drive - Export datasets straight to Sheets
You can also use webhooks to trigger downstream actions when a run finishes. Push fresh dictionary entries into your model retraining loop, or alert your team in Slack.
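On the receiving side, a webhook handler mainly needs the dataset ID of the finished run. The sketch below assumes Apify's default webhook payload, with an `eventType` string and a `resource` object describing the run; verify that shape against the webhook documentation, especially if you use a custom payload template. The helper name `dataset_id_from_webhook` is illustrative.

```python
from typing import Optional


def dataset_id_from_webhook(payload: dict) -> Optional[str]:
    """Return the run's default dataset ID for a successful run, else None."""
    # Assumption: successful runs are reported as ACTOR.RUN.SUCCEEDED.
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")
```

The returned ID can then be used to fetch the new entries and feed them to a retraining job or a notification.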
🔗 Recommended Actors
- 🧬 Wikidata Lexemes Scraper - Structured multilingual lexeme records
- 📚 Wikipedia Scraper - Encyclopedic articles and references
- 📖 Open Library Scraper - Bibliographic metadata for millions of books
- 🗺️ Nominatim OSM Scraper - Geocode addresses via OpenStreetMap
- 📰 ArXiv Scraper - Scientific preprint metadata
💡 Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.
🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.
⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by Wiktionary, the Wikimedia Foundation, or any of its contributors. All trademarks mentioned are the property of their respective owners. Only publicly available open dictionary data is collected.