Wiktionary Definitions Scraper

Pricing

from $9.00 / 1,000 result items

Fetch dictionary definitions from Wiktionary in 9 source languages. Returns part of speech, definitions, examples, and cross-language meanings per word. Plain-text and HTML output for one-shot or bulk word lists.

Rating: 0.0 (0)

Developer: ParseForge (maintained by Community)

Actor stats: 0 bookmarked · 2 total users · 1 monthly active user

Last modified: 2 days ago

📖 Wiktionary Definitions Scraper

🚀 Export multilingual dictionary entries in seconds. Pull definitions, parts of speech, and example sentences from 6 million+ Wiktionary entries across nine language editions. No API key, no registration, no XML wrangling.

🕒 Last updated: 2026-05-22 · 📊 9 fields per record · 📖 6M+ entries · 🌐 9 language editions · 🔤 50+ parts of speech

The Wiktionary Definitions Scraper queries the community-maintained multilingual dictionary and returns 9 fields per record, including the headword, part of speech, every definition, illustrative examples, source language edition, and a link back to the canonical page. The underlying dataset is the largest open dictionary in the world, maintained by a global community of lexicographers, linguists, and language enthusiasts.

The catalog covers English, Spanish, French, German, Italian, Portuguese, Russian, Chinese, Japanese, and Korean editions, with cross-language meanings included whenever the source edition documents them. This Actor turns the dictionary into downloadable CSV, Excel, JSON, or XML in under five minutes. Bulk lookups, single-word checks, and cross-edition merges all run from the same input form.

| 🎯 Target Audience | 💡 Primary Use Cases |
| --- | --- |
| NLP engineers, lexicographers, linguists, language-learning apps, search teams, content moderators, dictionary builders | Lemma dictionaries, training data for tokenizers and word embeddings, definition lookups in chat UIs, vocabulary expansion for language apps |

📋 What the Wiktionary Definitions Scraper does

Three lookup workflows in a single run:

  • 🔤 Single-word lookup. Resolve one headword across every documented language in the chosen edition.
  • 📚 Bulk word list. Pass dozens or thousands of words and stream a record per entry.
  • 🌐 Cross-language coverage. From the English edition, pull meanings for words in any documented language with English glosses.

Each record includes the headword, source language edition, entry language code, entry language name, part of speech, definition count, the full definitions array, illustrative examples, and the canonical Wiktionary page URL.

💡 Why it matters: dictionary data underpins autocomplete, spell-check, machine translation, and language-learning features. Building your own scraper means handling wiki markup, namespace conventions, and per-edition schema drift. This Actor skips all of that and refreshes on every run.


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.


⚙️ Input

| Input | Type | Default | Behavior |
| --- | --- | --- | --- |
| maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000. |
| words | array | ["hello","world","etymology","lexicon","polyglot"] | Headwords to look up. One record per word and language pair found. |
| language | string | "en" | Wiktionary edition. English has the broadest cross-language coverage. |

Example: up to 25 records for five English words.

{
  "maxItems": 25,
  "words": ["serendipity", "ephemeral", "ubiquitous", "obfuscate", "perspicacious"],
  "language": "en"
}

Example: French headwords from the French edition.

{
  "maxItems": 50,
  "words": ["bonjour", "maison", "amitié", "liberté"],
  "language": "fr"
}

⚠️ Good to Know: the English Wiktionary edition has the broadest cross-language coverage. If a word exists in only one edition, set the language field to that edition for the richest result.
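
For scripted runs, the input above can be assembled and sanity-checked before it is submitted. The following is a minimal sketch, not part of the Actor itself: `build_run_input` is a hypothetical helper, and the set of edition codes is an assumption (only "en" and "fr" appear in this README; the others are guessed from standard Wiktionary subdomains).

```python
from typing import Any

# Assumed edition codes: "en" and "fr" appear in the examples above; the rest
# are guesses based on standard Wiktionary subdomains, not documented values.
ASSUMED_EDITIONS = {"en", "es", "fr", "de", "it", "pt", "ru", "zh", "ja", "ko"}


def build_run_input(words: list[str], language: str = "en", max_items: int = 10) -> dict[str, Any]:
    """Return a dict shaped like the Actor input, after basic sanity checks."""
    cleaned: list[str] = []
    for word in words:
        word = word.strip()
        if word and word not in cleaned:  # drop blanks and duplicates, keep order
            cleaned.append(word)
    if not cleaned:
        raise ValueError("words must contain at least one non-empty headword")
    if language not in ASSUMED_EDITIONS:
        raise ValueError(f"unrecognized edition code: {language!r}")
    if max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    return {"maxItems": max_items, "words": cleaned, "language": language}
```

Calling `build_run_input(["bonjour", " maison ", "bonjour"], language="fr", max_items=50)` yields the same shape as the French example, with the duplicate and the stray whitespace removed.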


📊 Output

Each entry record contains 9 entry fields plus a scrapedAt timestamp. Download the dataset as CSV, Excel, JSON, or XML.

🧾 Schema

| Field | Type | Example |
| --- | --- | --- |
| 🔤 word | string | "hello" |
| 🌐 sourceLanguage | string | "en" |
| 🆔 entryLanguageCode | string | "en" |
| 🗣️ entryLanguage | string | "English" |
| 📚 partOfSpeech | string | "Interjection" |
| 🔢 definitionCount | number | 4 |
| 📖 definitions | string[] | ["A greeting (salutation) said when meeting someone or acknowledging someone's arrival..."] |
| 💬 examples | string[] | ["Hello, everyone."] |
| 🔗 pageUrl | string | "https://en.wiktionary.org/wiki/hello" |
| 🕒 scrapedAt | ISO 8601 | "2026-05-22T10:00:00.000Z" |
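
When consuming the dataset in code, a light structural check against the schema can catch surprises early. This is a sketch under stated assumptions: the field names and types come from the schema table, but `check_record` is a hypothetical helper, and the rule that definitionCount equals the length of the definitions array is an assumption rather than a documented guarantee.

```python
from typing import Any

# Field names and types mirror the schema table; nothing here is an official API.
EXPECTED_TYPES: dict[str, type] = {
    "word": str,
    "sourceLanguage": str,
    "entryLanguageCode": str,
    "entryLanguage": str,
    "partOfSpeech": str,
    "definitionCount": int,
    "definitions": list,
    "examples": list,
    "pageUrl": str,
    "scrapedAt": str,
}


def check_record(record: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the record looks sane."""
    problems: list[str] = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    definitions = record.get("definitions")
    count = record.get("definitionCount")
    # Assumption: definitionCount is simply len(definitions).
    if isinstance(definitions, list) and isinstance(count, int) and count != len(definitions):
        problems.append("definitionCount does not match len(definitions)")
    return problems
```

Returning a list of problems instead of raising keeps a bulk import running when a single record is malformed.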


✨ Why choose this Actor

  • 📖 Massive coverage. Six million+ entries spanning nine language editions and dozens of part-of-speech categories.
  • 🌐 Cross-language ready. The English edition documents meanings for thousands of languages, all returned in one structured record.
  • 🧩 Plain-text definitions. Wiki markup is stripped, so output is ready for tokenizers and embeddings.
  • Fast. 10 entries in under 5 seconds, 1,000 entries in under three minutes.
  • 💬 Examples included. Every record exposes the illustrative sentences alongside the definitions.
  • 🔁 Always fresh. Each run hits the live Wiktionary edition, so the dataset reflects current edits.
  • 🚫 No authentication. Works against the public Wiktionary content. No login or API key needed.

📊 Dictionary data is the foundation of every spell-checker, autocomplete, translation feature, and language-learning app in modern software.


📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Setup |
| --- | --- | --- | --- | --- |
| ⭐ Wiktionary Definitions Scraper (this Actor) | $5 free credit, then pay-per-use | 6M+ entries, 9 editions | Live per run | ⚡ 2 min |
| Manual Wiktionary downloads | Free | Full dump, stale by weeks | Monthly | 🐢 Hours |
| In-house wiki-markup parser | Free + engineering | Full | Build it yourself | 🛠️ Weeks |
| Commercial dictionary APIs | $99+/month | Curated subset | Daily | ⏳ Hours |

Pick this Actor when you want broad multilingual coverage, fresh data, and zero markup parsing.


🚀 How to use

  1. 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. 🌐 Open the Actor. Go to the Wiktionary Definitions Scraper page on the Apify Store.
  3. 🎯 Set input. Pick a language edition and paste a list of words. Set maxItems.
  4. 🚀 Run it. Click Start and let the Actor collect your data.
  5. 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.
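
For readers who prefer to script the download step, Apify exposes dataset items over its public API with a `format` query parameter. The sketch below only builds the URL; `dataset_export_url` is a hypothetical helper, the dataset ID comes from your own run, and depending on account settings the request may also need an API token.

```python
from urllib.parse import quote, urlencode

# Formats named in this README, mapped to the API's `format` values.
EXPORT_FORMATS = {"CSV": "csv", "Excel": "xlsx", "JSON": "json", "XML": "xml"}


def dataset_export_url(dataset_id: str, export_format: str = "CSV") -> str:
    """Build the download URL for a run's dataset in one of the listed formats."""
    if export_format not in EXPORT_FORMATS:
        raise ValueError(f"unsupported format: {export_format!r}")
    query = urlencode({"format": EXPORT_FORMATS[export_format]})
    # quote() guards against stray characters in a pasted dataset ID.
    return f"https://api.apify.com/v2/datasets/{quote(dataset_id, safe='')}/items?{query}"
```

Pass the resulting URL to any HTTP client or paste it into a browser while logged in.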


💼 Business use cases

🤖 NLP & Search Teams

  • Lemma dictionaries for tokenizers
  • Training data for word embeddings
  • Spell-check and autocomplete backbones
  • Stopword and morphology references

📱 Language-Learning Apps

  • In-app definition popovers
  • Vocabulary builder content packs
  • Example sentences for flashcards
  • Multilingual lookup features

📚 Lexicographers & Linguists

  • Comparative entries across editions
  • Coverage gap analyses by language
  • Reference corpus for academic papers
  • Source dataset for derived dictionaries

💬 Content & Chat Platforms

  • Definition lookups inside chat UIs
  • Glossary widgets in publishing tools
  • Content moderation glossaries
  • Knowledge-base entity enrichment
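
Several of the use cases above (lemma dictionaries, definition popovers, glossary widgets) start by indexing records by headword. A minimal sketch, assuming records shaped like the output schema; `build_lemma_index` is a hypothetical helper, and keying on the word plus entry language code is a design choice made here so that homographs in different languages stay separate.

```python
from collections import defaultdict
from typing import Any, Iterable


def build_lemma_index(records: Iterable[dict[str, Any]]) -> dict[tuple[str, str], dict[str, list[str]]]:
    """Map (word, entryLanguageCode) -> partOfSpeech -> list of definitions."""
    index: defaultdict = defaultdict(lambda: defaultdict(list))
    for record in records:
        definitions = record.get("definitions") or []
        if not definitions:
            continue  # skip records without definitions, e.g. diagnostic records
        key = (record["word"], record["entryLanguageCode"])
        index[key][record["partOfSpeech"]].extend(definitions)
    # Convert nested defaultdicts to plain dicts so lookups of unknown keys fail loudly.
    return {key: dict(by_pos) for key, by_pos in index.items()}
```

A popover can then look up `index[("hello", "en")]` and render the definitions grouped by part of speech.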

🔌 Automating Wiktionary Definitions Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

  • 🟢 Node.js. Install the apify-client NPM package.
  • 🐍 Python. Use the apify-client PyPI package.
  • 📚 See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor on any cron interval. Daily or weekly refreshes keep dictionary content current automatically.
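
A programmatic run with the Python apify-client package might look like the sketch below. The Actor ID is a placeholder, not the real identifier (copy it from the Store page), and `fetch_definitions` and `main` are hypothetical names. The client is passed in as a parameter so the function can be exercised with a stub; `main` is defined but not called, and needs `pip install apify-client` plus an APIFY_TOKEN environment variable.

```python
import os
from typing import Any, Iterator

# Placeholder only: replace with the Actor ID shown on the Apify Store page.
ACTOR_ID = "<username>/<actor-name>"


def fetch_definitions(client: Any, run_input: dict[str, Any], actor_id: str = ACTOR_ID) -> Iterator[dict[str, Any]]:
    """Start a run, wait for it to finish, and yield each dataset item."""
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the Actor run did not return a result")
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()


def main() -> None:
    """Example entry point; requires the apify-client package and a valid token."""
    from apify_client import ApifyClient  # imported lazily so the module loads without it

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run_input = {"maxItems": 10, "words": ["hello", "world"], "language": "en"}
    for item in fetch_definitions(client, run_input):
        print(item.get("word"), item.get("partOfSpeech"))
```

Because `fetch_definitions` is a generator, the run only starts when the first item is requested.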


🌟 Beyond business use cases

Dictionary data powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

  • Lexicographic studies and theses
  • Corpus linguistics with cited datasets
  • Cross-edition comparative research
  • Open-data exercises on dictionary coverage

🎨 Personal and creative

  • Indie language-learning side projects
  • Vocabulary games and puzzle apps
  • Writer reference tools and content packs
  • Hobbyist multilingual dictionaries

🤝 Non-profit and civic

  • Endangered-language documentation aids
  • Community translation projects
  • Educational glossaries for schools and libraries
  • Civic literacy programs and public reference sites

🧪 Experimentation

  • Train word-sense disambiguation models
  • Prototype agents that paraphrase or define
  • Build glossary chrome extensions
  • Test definition lookup UX with real data


❓ Frequently Asked Questions

🧩 How does it work?

Paste a list of words, pick a Wiktionary edition, click Start, and the Actor resolves each word against the chosen edition and emits a clean structured record per entry. No browser automation, no captchas, no setup.

📏 How accurate is the data?

Wiktionary entries are community-maintained, peer-reviewed, and cited across major NLP datasets and language-learning products. For mission-critical lexicography, treat it the same way you would treat any community reference and verify key entries against a second source.

🔁 How often is the dataset refreshed?

Wiktionary editors publish updates continuously. Every run of this Actor pulls live entries, so your dataset reflects the latest community edits as of run time.

🌐 Which languages are supported?

The supported source editions are English, Spanish, French, German, Italian, Portuguese, Russian, Chinese, Japanese, and Korean. The English edition also documents meanings for thousands of additional languages via cross-language entries.

⏰ Can I schedule regular runs?

Yes. Use Apify Schedules to run this Actor on any cron interval (daily, weekly, monthly) and keep a downstream dictionary in sync.

💼 Can I use this data commercially?

Yes, under the Creative Commons Attribution-ShareAlike terms. Wiktionary content is published under that license, so attribution and share-alike requirements apply to redistributed entries, and you are responsible for meeting them in your downstream product. Review the license before integrating the data into a commercial product.

💳 Do I need a paid Apify plan to use this Actor?

No. The free Apify plan is enough for testing and small word lists (10 records per run). A paid plan lifts the limit and gives you access to scheduling, higher concurrency, and larger word lists.
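
To budget a paid run, the listing price of "from $9.00 / 1,000 result items" gives a rough lower bound. The arithmetic is sketched below; `estimate_cost` is a hypothetical helper, and the rate is only the advertised starting price, so actual billing tiers and what counts as a result item may differ.

```python
from decimal import ROUND_HALF_UP, Decimal

# The listing's "from" price per 1,000 result items; treat it as a lower bound.
PRICE_PER_1000_ITEMS = Decimal("9.00")


def estimate_cost(result_items: int, price_per_1000: Decimal = PRICE_PER_1000_ITEMS) -> Decimal:
    """Estimate the cost in dollars for a number of result items, rounded to cents."""
    if result_items < 0:
        raise ValueError("result_items must not be negative")
    cost = price_per_1000 * result_items / 1000
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

At that rate, 1,000 items come to $9.00 and 2,500 items to $22.50.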

🔁 What happens if a word does not exist in the chosen edition?

A diagnostic record is pushed with an error field explaining the miss. The run continues processing the rest of the word list.
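
Downstream code usually wants to separate those diagnostic records from real entries. A minimal sketch, assuming only what the answer above states, namely that a miss carries an error field; the rest of the diagnostic record's shape is not documented here, and `split_results` is a hypothetical helper.

```python
from typing import Any, Iterable


def split_results(items: Iterable[dict[str, Any]]) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Separate dataset items into (entries, misses) based on the error field."""
    entries: list[dict[str, Any]] = []
    misses: list[dict[str, Any]] = []
    for item in items:
        # A non-empty error field marks a diagnostic record.
        if item.get("error"):
            misses.append(item)
        else:
            entries.append(item)
    return entries, misses
```

The misses list can then drive a retry against a different language edition.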

🔤 Does it include etymology and pronunciation?

This Actor returns definitions, part of speech, and examples. For etymology and IPA pronunciation, reach out via the contact form below to request a companion etymology scraper.

🆘 What if I need help?

Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.


🔌 Integrate with any app

Wiktionary Definitions Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get run notifications in your channels
  • Airbyte - Pipe entries into your warehouse
  • GitHub - Trigger runs from commits and releases
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Push fresh dictionary entries into your model retraining loop, or alert your team in Slack.
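
On the receiving end, a webhook handler typically needs the dataset ID of the finished run. The sketch below assumes Apify's default webhook payload template, where the event type sits under eventType and the run object under resource; a custom payload template would change these keys, and `dataset_id_from_webhook` is a hypothetical helper.

```python
from typing import Any, Optional


def dataset_id_from_webhook(payload: dict[str, Any]) -> Optional[str]:
    """Return the run's default dataset ID for a successful run, else None."""
    # Assumption: default payload template with eventType and resource keys.
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")
```

Returning None for failed or aborted runs lets the caller skip the retraining or alerting step without raising.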


💡 Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.


🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by Wiktionary, the Wikimedia Foundation, or any of its contributors. All trademarks mentioned are the property of their respective owners. Only publicly available open dictionary data is collected.