I18n Audit
Under maintenance

Pricing

Pay per usage

Detects translation gaps and meaning/structural differences between multilingual pages.

  • Finds missing content and meaning drift in translated web pages
  • Compares multilingual pages to detect translation and structure gaps
  • Identifies incomplete or inconsistent page translations across languages

Rating

5.0

(1)

Developer

Lisa Akinfiieva

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 13 days ago

i18n-audit — Internationalization Audit Actor

Automatically detect language inconsistencies on your multilingual websites. This Actor crawls web pages and identifies content that doesn't match the expected language, helping you maintain quality across all language versions of your site.

Features

  • Detects dominant language and per-element language distribution
  • Computes a consistency rate (fraction of content in the expected language)
  • Lists suspected discrepancies with short text snippets for manual review
  • Suggests a translation for the detected discrepancies

Input

The Actor requires the following parameters:

  • start_urls — starting URLs to crawl and audit

Example input:

{
  "start_urls": [
    {
      "url": "https://www.postaonline.cz/en"
    }
  ]
}
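
The input object can also be built programmatically before a run is started. The following is a minimal sketch using only the Python standard library; the helper name build_run_input and its validation rules are illustrative assumptions, not part of the Actor.

```python
import json
from urllib.parse import urlparse


def build_run_input(urls):
    """Build the Actor input object from a list of URL strings.

    Hypothetical helper: the Actor only documents the start_urls field;
    the validation below is an extra safety check added for this sketch.
    """
    start_urls = []
    for url in urls:
        parsed = urlparse(url)
        # Reject anything that is not an absolute http(s) URL.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
        start_urls.append({"url": url})
    if not start_urls:
        raise ValueError("start_urls must contain at least one URL")
    return {"start_urls": start_urls}


if __name__ == "__main__":
    run_input = build_run_input(["https://www.postaonline.cz/en"])
    print(json.dumps(run_input, indent=2))
```

The resulting dictionary has the same shape as the example input and can be serialized to JSON for whichever client is used to start the run.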

Output

The Actor writes results to the default dataset. Only pages with language discrepancies are included in the output, sorted by discrepancy count (most broken pages first). Each item contains the following fields (see .actor/output_schema.json):

Language Detection:

  • url — audited page URL (final URL after any redirects)
  • title — page title
  • expected_language — automatically detected ISO 639-1 language code from URL or HTML
  • language_source — source of the expected language: "url" (from URL path or subdomain) or "html" (from <html lang> attribute)
  • url_language — language code extracted from the URL (e.g., "en" from /en/page or en.example.com), or null if not found
  • html_language — language code from the HTML lang attribute (e.g., "en" from <html lang="en">), or null if not found
  • html_url_mismatch — boolean indicating whether URL language and HTML lang attribute differ
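
One way the language detection fields above could be derived is sketched below. This is an illustration under stated assumptions, not the Actor's actual implementation: the function names are hypothetical, any two-letter first path segment or leading subdomain is treated as a language code, the URL takes precedence over the HTML lang attribute (which matches the example item, where both are present and language_source is "url"), and html_url_mismatch is false when either value is missing.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumption: a language code is any two ASCII letters (ISO 639-1 shape).
LANG_CODE = re.compile(r"^[a-z]{2}$")


def language_from_url(url):
    """Return a language code from /en/page or en.example.com, else None."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    if segments and LANG_CODE.match(segments[0].lower()):
        return segments[0].lower()
    labels = (parsed.hostname or "").split(".")
    if len(labels) > 2 and LANG_CODE.match(labels[0]):
        return labels[0]
    return None


class _HtmlLangParser(HTMLParser):
    """Records the lang attribute of the first <html> start tag."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.lang is None:
            self.lang = dict(attrs).get("lang")


def language_from_html(html):
    """Return the primary subtag of <html lang>, e.g. "en" for "en-US"."""
    parser = _HtmlLangParser()
    parser.feed(html)
    parser.close()
    if not parser.lang:
        return None
    return parser.lang.split("-")[0].lower()


def detect_expected_language(url, html):
    """Combine both sources into the documented output fields."""
    url_language = language_from_url(url)
    html_language = language_from_html(html)
    if url_language is not None:
        expected, source = url_language, "url"
    elif html_language is not None:
        expected, source = html_language, "html"
    else:
        expected, source = None, None
    return {
        "expected_language": expected,
        "language_source": source,
        "url_language": url_language,
        "html_language": html_language,
        # Assumption: a mismatch requires both values to be present.
        "html_url_mismatch": (
            url_language is not None
            and html_language is not None
            and url_language != html_language
        ),
    }
```

A two-letter heuristic will misread paths such as /tv/ or /us/ as languages, so a real implementation would likely check candidates against a list of known codes.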

Consistency Analysis:

  • is_consistent — boolean, whether the page meets the 90% consistency threshold
  • consistency_rate — number between 0.0 and 1.0 giving the fraction of content in the expected language
  • dominant_language — language detected as most frequent on the page
  • language_distribution — map of language code → count of detected text elements
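
The consistency fields can be recomputed from language_distribution, which is useful for checking an output item by hand. The sketch below assumes that consistency_rate is the share of text elements detected in the expected language, that "meets the threshold" means greater than or equal to 0.9, and that the rate is rounded to two decimals for display; the function name and the handling of an empty page are assumptions.

```python
def analyze_consistency(language_distribution, expected_language, threshold=0.9):
    """Derive the consistency fields from a language -> element count map."""
    total = sum(language_distribution.values())
    if total == 0:
        # Assumption: a page with no detectable text has nothing to contradict.
        return {
            "is_consistent": True,
            "consistency_rate": 1.0,
            "dominant_language": None,
        }
    rate = language_distribution.get(expected_language, 0) / total
    dominant = max(language_distribution, key=language_distribution.get)
    return {
        # The threshold is applied to the unrounded rate.
        "is_consistent": rate >= threshold,
        "consistency_rate": round(rate, 2),
        "dominant_language": dominant,
    }


if __name__ == "__main__":
    # 150 of 162 elements are English: 150 / 162 is roughly 0.926.
    print(analyze_consistency({"en": 150, "cs": 12}, "en"))
```

With 150 English and 12 Czech elements the rate is 150 / 162, which rounds to 0.93 and clears a 0.9 threshold.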

Discrepancy Reports:

  • discrepancies_count — number of verified language discrepancies (double-checked with both langdetect and lingua)
  • discrepancies — array of up to 10 verified discrepancies with fields:
    • tag — HTML tag name (e.g., "p", "h1", "li")
    • text — first 100 characters of the text
    • language — detected language code
    • full_length — total character length of the original text
    • translation — machine translation of the text to the expected language (optional)
    • translation_language — target language of the translation (optional)
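
The shape of a discrepancy report can be sketched as follows. Language detection itself is taken as given here (the Actor is described as verifying with langdetect and lingua; this sketch calls neither), and the function names and the (tag, text, language) tuple format are hypothetical. Following the example item, discrepancies_count is treated as the total number of mismatches while the array is capped at 10.

```python
MAX_REPORTED = 10      # "array of up to 10 verified discrepancies"
SNIPPET_LENGTH = 100   # "first 100 characters of the text"


def build_discrepancy(tag, text, language):
    """Build one discrepancy entry with a truncated snippet."""
    return {
        "tag": tag,
        "text": text[:SNIPPET_LENGTH],
        "language": language,
        "full_length": len(text),
    }


def build_report(elements, expected_language):
    """elements: iterable of (tag, text, detected_language) tuples."""
    mismatched = [
        build_discrepancy(tag, text, language)
        for tag, text, language in elements
        if language != expected_language
    ]
    return {
        # Count every mismatch, but list only the first MAX_REPORTED.
        "discrepancies_count": len(mismatched),
        "discrepancies": mismatched[:MAX_REPORTED],
    }


if __name__ == "__main__":
    sample = [("p", "Vstup do nové aplikace", "cs"), ("h1", "Online Posting", "en")]
    print(build_report(sample, "en"))
```

The optional translation and translation_language fields are omitted because they depend on an external translation service.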

Example output item:

{
  "url": "https://www.postaonline.cz/en/podani-online",
  "title": "Online Posting",
  "expected_language": "en",
  "language_source": "url",
  "url_language": "en",
  "html_language": "en",
  "html_url_mismatch": false,
  "is_consistent": true,
  "consistency_rate": 0.93,
  "dominant_language": "en",
  "language_distribution": {"en": 150, "cs": 12},
  "discrepancies_count": 12,
  "discrepancies": [
    {
      "tag": "p",
      "text": "Vstup do nové aplikace",
      "language": "cs",
      "full_length": 22,
      "translation": "Enter a new app",
      "translation_language": "en"
    },
    ...
  ]
}
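
Once the dataset items have been exported (for example as a JSON list), they can be summarized with a few lines of Python. The sketch below is illustrative only: the function names and the example.com sample items are hypothetical, and it relies solely on the fields documented above.

```python
def worst_pages(items, limit=5):
    """Return (url, discrepancies_count, consistency_rate) for the worst pages."""
    ranked = sorted(items, key=lambda item: item["discrepancies_count"], reverse=True)
    return [
        (item["url"], item["discrepancies_count"], item["consistency_rate"])
        for item in ranked[:limit]
    ]


def pages_with_markup_mismatch(items):
    """Return URLs whose URL language and <html lang> attribute disagree."""
    return [item["url"] for item in items if item["html_url_mismatch"]]


if __name__ == "__main__":
    # Hypothetical items carrying only the fields this sketch reads.
    items = [
        {"url": "https://example.com/en/a", "discrepancies_count": 3,
         "consistency_rate": 0.97, "html_url_mismatch": False},
        {"url": "https://example.com/en/b", "discrepancies_count": 12,
         "consistency_rate": 0.93, "html_url_mismatch": True},
    ]
    for url, count, rate in worst_pages(items):
        print(f"{count:>3}  {rate:.2f}  {url}")
    print(pages_with_markup_mismatch(items))
```

Sorting again on the client side is cheap and keeps the ranking stable if items from several runs are merged.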