Cambridge Dictionary Definition & IPA Scraper

Pricing

Pay per event

Scrapes Cambridge Dictionary entries with full learner metadata: headword, CEFR level (A1–C2), UK and US IPA pronunciation, audio URLs, part of speech, guideword, definitions, and example sentences. Ideal for vocabulary apps, language-learning curricula, and NLP datasets.

Developer

BowTiedRaccoon

Maintained by Community

Scrapes Cambridge Dictionary (dictionary.cambridge.org) entries and returns the structured learner-oriented metadata that vocabulary apps, NLP pipelines, and language-curriculum tools actually need: headword, CEFR level (A1–C2), UK and US IPA pronunciation strings, audio MP3 URLs, part of speech, guidewords, definitions, and example sentences.

What you get

Each output record represents one sense block for a headword (a word like "bank" has multiple senses — MONEY, GROUND, STORE, etc. — each becoming a separate row).

| Field | Description |
| --- | --- |
| headword | Dictionary headword (e.g. "hello") |
| part_of_speech | Part of speech (exclamation, noun, verb, …) |
| cefr_level | Cambridge CEFR tag: A1, A2, B1, B2, C1, or C2; empty if untagged |
| uk_ipa | British English IPA string (e.g. heˈləʊ) |
| us_ipa | American English IPA string (e.g. heˈloʊ) |
| uk_audio_url | Absolute URL to the UK pronunciation MP3 |
| us_audio_url | Absolute URL to the US pronunciation MP3 |
| guideword | Sense disambiguator (e.g. "MONEY" for bank's financial sense) |
| definitions | Pipe-separated list of definitions for this sense |
| example_sentences | Pipe-separated list of example sentences |
| url | Canonical source URL for the entry page |
| scrapedAt | ISO-8601 scrape timestamp |
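
Since definitions and example_sentences arrive as pipe-separated strings, downstream code will usually want them as lists. The following is a minimal sketch, not part of the actor itself: it assumes the separator is a bare `|` (optionally padded with spaces) and that an empty string means "no values". The helper names `split_pipe` and `normalize_record` are hypothetical, and the sample values are placeholders rather than real scraped output.

```python
from typing import Any


def split_pipe(value: str) -> list[str]:
    """Split a pipe-separated field into a list, dropping empty pieces."""
    return [part.strip() for part in value.split("|") if part.strip()]


def normalize_record(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of one sense record with list-valued fields expanded."""
    out = dict(record)
    out["definitions"] = split_pipe(record.get("definitions") or "")
    out["example_sentences"] = split_pipe(record.get("example_sentences") or "")
    # An empty cefr_level means the sense is untagged; map it to None.
    out["cefr_level"] = record.get("cefr_level") or None
    return out


# Placeholder record (illustrative values only, not real scraped output).
sample = {
    "headword": "bank",
    "guideword": "MONEY",
    "cefr_level": "",
    "definitions": "first definition | second definition",
    "example_sentences": "",
}

normalized = normalize_record(sample)
print(normalized["definitions"])
```

If a definition could itself contain a `|` character, this simple split would break it apart; the README does not say whether the actor escapes such cases.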

How to use

Look up specific words (fastest)

Supply a list of headwords and the actor fetches only those entry pages:

{
    "startWords": ["hello", "bank", "run", "beautiful"],
    "maxItems": 0
}

Set maxItems: 0 for no limit, or a positive integer to cap output at that many senses.

Crawl the full English dictionary (A–Z browse)

Leave startWords empty to crawl all ~140,000 English headwords via Cambridge's A–Z browse hierarchy:

{
    "startWords": [],
    "maxItems": 0
}

A full crawl processes the browse hierarchy: root → letter pages (A–Z) → sub-group pages → individual entries. Set maxItems to limit output for testing.
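
The root → letter → sub-group → entry traversal can be pictured as a breadth-first walk over a tree of pages. The sketch below is only an illustration of that shape, not the actor's actual code: `crawl_entries`, `get_children`, and the in-memory `HIERARCHY` are hypothetical stand-ins for real page fetching, and the cap here counts entries, whereas the actor's maxItems counts sense records.

```python
from collections import deque
from typing import Callable, Iterator


def crawl_entries(
    root: str,
    get_children: Callable[[str], list[str]],
    max_items: int = 0,
) -> Iterator[str]:
    """Yield entry pages under root, breadth-first.

    A page with no children is treated as an individual entry.
    max_items == 0 means no limit.
    """
    queue = deque([root])
    emitted = 0
    while queue:
        page = queue.popleft()
        children = get_children(page)
        if children:
            queue.extend(children)
            continue
        yield page
        emitted += 1
        if max_items and emitted >= max_items:
            return


# Hypothetical stand-in for the browse hierarchy (no network access).
HIERARCHY = {
    "root": ["letter-a", "letter-b"],
    "letter-a": ["group-a1"],
    "letter-b": ["group-b1"],
    "group-a1": ["entry-aardvark", "entry-abacus"],
    "group-b1": ["entry-bank"],
}


def lookup(page: str) -> list[str]:
    """Return the child pages of a page, or an empty list for an entry."""
    return HIERARCHY.get(page, [])


all_entries = list(crawl_entries("root", lookup))
capped = list(crawl_entries("root", lookup, max_items=2))
```

Stopping as soon as the cap is reached is what makes a small maxItems useful for test runs: the rest of the hierarchy is never visited.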

Input parameters

| Parameter | Type | Description |
| --- | --- | --- |
| startWords | array | Headwords to look up directly. Empty = full A–Z crawl. |
| maxItems | integer | Max sense records to output. 0 = no limit. Default: 10. |
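
When generating the input programmatically, it can help to validate it before starting a run. This is a minimal sketch under stated assumptions: `build_run_input` is a hypothetical helper, trimming whitespace and dropping blank words is a choice of the sketch rather than documented actor behavior, and the default of 10 is taken from the parameter table.

```python
import json
from typing import Any, Optional

DEFAULT_MAX_ITEMS = 10  # default listed in the parameter table


def build_run_input(
    start_words: Optional[list[str]] = None,
    max_items: int = DEFAULT_MAX_ITEMS,
) -> dict[str, Any]:
    """Build the actor input object from a word list and an item cap."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 0:
        raise ValueError("maxItems must be a non-negative integer (0 = no limit)")
    # Sketch-level choice: trim whitespace and skip blank words.
    words = [w.strip() for w in (start_words or []) if w.strip()]
    return {"startWords": words, "maxItems": max_items}


run_input = build_run_input(["hello", " bank "], max_items=0)
print(json.dumps(run_input, indent=4))
```

Calling `build_run_input()` with no arguments yields an empty startWords list, which per the table means a full crawl capped at the default 10 records.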

Data source

All data is scraped from dictionary.cambridge.org (the Cambridge Advanced Learner's Dictionary sub-domain). This scraper does not cover bilingual or specialized Cambridge dictionaries. The site serves static HTML — no JavaScript rendering required, no proxy needed.

Notes

  • CEFR tags (A1–C2) appear only on headwords Cambridge has officially tagged; less common words may have an empty cefr_level.
  • UK and US audio URLs point directly to Cambridge's CDN MP3 files.
  • Some entries appear in both the CALD4 (British) and CACD (American) dictionaries on the same page — the actor may produce two records for the same headword with differing US IPA strings.
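
Because one headword can yield several records with differing US IPA strings, consumers may want to collect the distinct variants per headword instead of treating them as conflicts. The sketch below shows one way to do that; `collect_us_ipa` is a hypothetical helper, and the IPA values in the sample are placeholders, not real transcriptions.

```python
from collections import defaultdict
from typing import Any


def collect_us_ipa(records: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Map each headword to its distinct, non-empty US IPA strings.

    Variants keep first-seen order. Headwords with no US IPA are omitted.
    """
    variants: dict[str, list[str]] = defaultdict(list)
    for rec in records:
        ipa = rec.get("us_ipa") or ""
        if ipa and ipa not in variants[rec["headword"]]:
            variants[rec["headword"]].append(ipa)
    return dict(variants)


# Placeholder records (illustrative values only).
records = [
    {"headword": "bank", "guideword": "MONEY", "us_ipa": "variant-a"},
    {"headword": "bank", "guideword": "GROUND", "us_ipa": "variant-a"},
    {"headword": "bank", "guideword": "MONEY", "us_ipa": "variant-b"},
    {"headword": "run", "guideword": "", "us_ipa": ""},
]

by_headword = collect_us_ipa(records)
```

A headword that maps to more than one variant is a hint that both dictionaries contributed records for it.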