Cambridge Dictionary Definition & IPA Scraper
Pricing
Pay per event
Scrapes Cambridge Dictionary entries with full learner metadata: headword, CEFR level (A1–C2), UK and US IPA pronunciation, audio URLs, part of speech, guideword, definitions, and example sentences. Ideal for vocabulary apps, language-learning curricula, and NLP datasets.
Developer
BowTiedRaccoon
Scrapes Cambridge Dictionary (dictionary.cambridge.org) entries and returns the structured learner-oriented metadata that vocabulary apps, NLP pipelines, and language-curriculum tools actually need: headword, CEFR level (A1–C2), UK and US IPA pronunciation strings, audio MP3 URLs, part of speech, guidewords, definitions, and example sentences.
What you get
Each output record represents one sense block for a headword (a word like "bank" has multiple senses — MONEY, GROUND, STORE, etc. — each becoming a separate row).
| Field | Description |
|---|---|
| headword | Dictionary headword (e.g. "hello") |
| part_of_speech | Part of speech (exclamation, noun, verb, …) |
| cefr_level | Cambridge CEFR tag: A1, A2, B1, B2, C1, or C2; empty if untagged |
| uk_ipa | British English IPA string (e.g. heˈləʊ) |
| us_ipa | American English IPA string (e.g. heˈloʊ) |
| uk_audio_url | Absolute URL to the UK pronunciation MP3 |
| us_audio_url | Absolute URL to the US pronunciation MP3 |
| guideword | Sense disambiguator (e.g. "MONEY" for bank's financial sense) |
| definitions | Pipe-separated list of definitions for this sense |
| example_sentences | Pipe-separated list of example sentences |
| url | Canonical source URL for the entry page |
| scrapedAt | ISO-8601 scrape timestamp |
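
Because definitions and example_sentences arrive as pipe-separated strings and cefr_level may be empty, downstream code usually needs a small normalization step. The following is a minimal sketch, not part of the actor: the `Sense` class and the `split_pipe` and `parse_record` helpers are hypothetical names, and it assumes the separator is a bare `|` that never appears inside a definition or example.

```python
from dataclasses import dataclass
from typing import List, Optional

CEFR_LEVELS = ("A1", "A2", "B1", "B2", "C1", "C2")


def split_pipe(value: Optional[str]) -> List[str]:
    """Split a pipe-separated field into a list, dropping empty parts."""
    return [part.strip() for part in (value or "").split("|") if part.strip()]


@dataclass
class Sense:
    headword: str
    part_of_speech: str
    cefr_level: Optional[str]  # None when the record carries no CEFR tag
    guideword: str
    definitions: List[str]
    example_sentences: List[str]


def parse_record(record: dict) -> Sense:
    """Convert one raw output record (a dict) into a typed Sense."""
    level = (record.get("cefr_level") or "").strip().upper()
    return Sense(
        headword=record.get("headword", ""),
        part_of_speech=record.get("part_of_speech", ""),
        cefr_level=level if level in CEFR_LEVELS else None,
        guideword=record.get("guideword", ""),
        definitions=split_pipe(record.get("definitions")),
        example_sentences=split_pipe(record.get("example_sentences")),
    )


# Placeholder record for illustration only; the text values are not real output.
sample = {
    "headword": "bank",
    "part_of_speech": "noun",
    "cefr_level": "",
    "guideword": "MONEY",
    "definitions": "definition one | definition two",
    "example_sentences": "example one|example two|",
}
sense = parse_record(sample)
print(sense.definitions, sense.cefr_level)
```

Mapping an empty cefr_level to `None` keeps "untagged" distinct from any real level when filtering or sorting.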
How to use
Look up specific words (fastest)
Supply a list of headwords and the actor fetches only those entry pages:
{"startWords": ["hello", "bank", "run", "beautiful"], "maxItems": 0}
Set maxItems: 0 for no limit, or a positive integer to cap output at that many senses.
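
If the input is generated programmatically, a small builder can keep it valid. This is a hedged sketch: `build_run_input` is a hypothetical helper, not something the actor ships, and it only encodes the rules stated here (startWords is a list, maxItems is a non-negative integer with 0 meaning no limit and a default of 10).

```python
import json
from typing import Iterable, Optional


def build_run_input(start_words: Optional[Iterable[str]] = None, max_items: int = 10) -> dict:
    """Build the actor input dict; an empty word list means a full A-Z crawl."""
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 0:
        raise ValueError("max_items must be a non-negative integer (0 = no limit)")
    # Trim whitespace, drop blanks, and remove exact duplicates while keeping order.
    cleaned = [word.strip() for word in (start_words or []) if word and word.strip()]
    return {"startWords": list(dict.fromkeys(cleaned)), "maxItems": max_items}


run_input = build_run_input(["hello", "bank", "run", "beautiful", "bank"], max_items=0)
print(json.dumps(run_input))
```

Duplicates are removed by exact match only, on the assumption that headwords may be case-sensitive.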
Crawl the full English dictionary (A–Z browse)
Leave startWords empty to crawl all ~140,000 English headwords via Cambridge's A-Z browse hierarchy:
{"startWords": [], "maxItems": 0}
A full crawl processes the browse hierarchy: root → letter pages (A–Z) → sub-group pages → individual entries. Set maxItems to limit output for testing.
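
The traversal order and the maxItems cap can be illustrated with a breadth-first walk. This is only a sketch of the idea, not the actor's actual implementation: `crawl`, `list_children`, and `extract_senses` are hypothetical names, and the demo uses a tiny in-memory stand-in for the site instead of real HTTP requests.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List

# The four levels of the browse hierarchy, in the order they are visited.
LEVELS = ("root", "letter", "subgroup", "entry")


def crawl(
    root_url: str,
    list_children: Callable[[str, str], Iterable[str]],
    extract_senses: Callable[[str], List[dict]],
    max_items: int = 0,
) -> Iterator[dict]:
    """Walk root -> letter -> subgroup -> entry and yield sense records.

    list_children(level, url) returns the URLs one level down and
    extract_senses(url) returns the records for one entry page; both are
    caller-supplied callbacks. max_items == 0 means no limit.
    """
    queue = deque([(0, root_url)])
    emitted = 0
    while queue:
        depth, url = queue.popleft()
        if LEVELS[depth] == "entry":
            for sense in extract_senses(url):
                yield sense
                emitted += 1
                if max_items and emitted >= max_items:
                    return  # cap reached: stop the whole crawl
        else:
            for child in list_children(LEVELS[depth], url):
                queue.append((depth + 1, child))


# In-memory stand-in for the site, for illustration only.
FAKE_SITE = {"root": ["a"], "a": ["a-1"], "a-1": ["able", "about"]}
FAKE_SENSES = {
    "able": [{"headword": "able"}, {"headword": "able"}],
    "about": [{"headword": "about"}],
}


def fake_children(level: str, url: str) -> List[str]:
    return FAKE_SITE.get(url, [])


def fake_senses(url: str) -> List[dict]:
    return FAKE_SENSES.get(url, [])


capped = list(crawl("root", fake_children, fake_senses, max_items=2))
full = list(crawl("root", fake_children, fake_senses, max_items=0))
print(len(capped), len(full))
```

Note that the cap counts sense records rather than entry pages, so a crawl can stop partway through a headword with many senses.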
Input parameters
| Parameter | Type | Description |
|---|---|---|
| startWords | array | Headwords to look up directly. Empty = full A-Z crawl. |
| maxItems | integer | Max sense records to output. 0 = no limit. Default: 10. |
Data source
All data is scraped from dictionary.cambridge.org (the Cambridge Advanced Learner's Dictionary sub-domain). This scraper does not cover bilingual or specialized Cambridge dictionaries. The site serves static HTML — no JavaScript rendering required, no proxy needed.
Notes
- CEFR tags (A1–C2) appear only on headwords Cambridge has officially tagged; less common words may have an empty cefr_level.
- UK and US audio URLs point directly to Cambridge's CDN MP3 files.
- Some entries appear in both the CALD4 (British) and CACD (American) dictionaries on the same page — the actor may produce two records for the same headword with differing US IPA strings.
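
To spot headwords affected by the last note, one option is to collect the distinct us_ipa strings per headword after the run. This is a sketch under assumptions: `us_ipa_variants` and `conflicting_headwords` are hypothetical helper names, grouping by headword alone is a simplification, and the IPA values in the demo are placeholders, not real pronunciations.

```python
from typing import Dict, Iterable, List


def us_ipa_variants(records: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each headword to its distinct, non-empty us_ipa strings, in first-seen order."""
    variants: Dict[str, List[str]] = {}
    for record in records:
        bucket = variants.setdefault(record.get("headword", ""), [])
        ipa = (record.get("us_ipa") or "").strip()
        if ipa and ipa not in bucket:
            bucket.append(ipa)
    return variants


def conflicting_headwords(records: Iterable[dict]) -> List[str]:
    """Return headwords that carry more than one distinct us_ipa string."""
    return sorted(word for word, ipas in us_ipa_variants(records).items() if len(ipas) > 1)


# Placeholder records for illustration; "ipa-a" and "ipa-b" are not real IPA.
records = [
    {"headword": "bank", "guideword": "MONEY", "us_ipa": "ipa-a"},
    {"headword": "bank", "guideword": "GROUND", "us_ipa": "ipa-a"},
    {"headword": "bank", "guideword": "MONEY", "us_ipa": "ipa-b"},
    {"headword": "hello", "guideword": "", "us_ipa": "ipa-a"},
]
print(conflicting_headwords(records))
```

This only flags candidates for review; it leaves every record in place rather than guessing which dictionary's transcription to keep.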
