I18n Audit
Pricing
Pay per usage
Detects translation gaps and meaning/structural differences between multilingual pages.
- Finds missing content and meaning drift in translated web pages
- Compares multilingual pages to detect translation and structure gaps
- Identifies incomplete or inconsistent page translations across languages
Developer: Lisa Akinfiieva
i18n-audit — Internationalization Audit Actor
Automatically detect language inconsistencies on your multilingual websites. This Actor crawls web pages and identifies content that doesn't match the expected language, helping you maintain quality across all language versions of your site.
Features
- Detects dominant language and per-element language distribution
- Computes a consistency rate (fraction of content in the expected language)
- Lists suspected discrepancies with short text snippets for manual review
- Suggests a translation for the detected discrepancies
Input
The Actor requires the following parameters:
- start_urls: starting URLs to crawl and audit
{"start_urls": [{"url": "https://www.postaonline.cz/en"}]}
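If the URLs come from a script rather than being typed by hand, the input can be assembled programmatically. The following is a minimal sketch, assuming a plain list of URL strings; the helper name build_input is hypothetical and not part of the Actor.

```python
import json


def build_input(urls):
    """Wrap plain URL strings in the start_urls shape shown above.

    build_input is a hypothetical helper, not part of the Actor.
    """
    seen = set()
    start_urls = []
    for url in urls:
        url = url.strip()
        # Skip blank entries and duplicates while keeping the original order.
        if url and url not in seen:
            seen.add(url)
            start_urls.append({"url": url})
    return {"start_urls": start_urls}


payload = build_input(["https://www.postaonline.cz/en"])
print(json.dumps(payload))
```

The printed JSON has the same shape as the example input above and can be supplied as the Actor input.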
Output
The Actor writes results to the default dataset. Only pages with language discrepancies are included in the output, sorted by discrepancy count (most broken pages first). Each item contains the following fields (see .actor/output_schema.json):
Language Detection:
- url: audited page URL (final URL after any redirects)
- title: page title
- expected_language: automatically detected ISO 639-1 language code from URL or HTML
- language_source: where the expected language came from, either "url" (from the URL path or subdomain) or "html" (from the <html lang> attribute)
- url_language: language code extracted from the URL (e.g., "en" from /en/page or en.example.com), or null if not found
- html_language: language code from the HTML lang attribute (e.g., "en" from <html lang="en">), or null if not found
- html_url_mismatch: boolean indicating whether the URL language and the HTML lang attribute differ
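The language detection fields can be illustrated with a short sketch. It is not the Actor's implementation: the function names are hypothetical, only bare two-letter codes are recognised, the URL is assumed to take priority over the HTML lang attribute, and a missing value on either side is assumed not to count as a mismatch.

```python
import re
from urllib.parse import urlparse

# Assumption: only bare two-letter codes count as language markers.
TWO_LETTER = re.compile(r"^[a-z]{2}$")


def url_language(url):
    """Return a code from the first path segment or leftmost subdomain, else None."""
    parsed = urlparse(url)
    segments = [s.lower() for s in parsed.path.split("/") if s]
    if segments and TWO_LETTER.match(segments[0]):
        return segments[0]
    # Require at least three labels so "example.com" is never read as a language.
    labels = (parsed.hostname or "").split(".")
    if len(labels) > 2 and TWO_LETTER.match(labels[0]):
        return labels[0]
    return None


def language_fields(url, html_lang):
    """Build the language detection fields for one page (hypothetical helper)."""
    from_url = url_language(url)
    # Reduce tags such as "en-GB" to their primary subtag.
    from_html = html_lang.split("-")[0].lower() if html_lang else None
    if from_url:
        expected, source = from_url, "url"  # assumed priority: URL first
    elif from_html:
        expected, source = from_html, "html"
    else:
        expected, source = None, None
    return {
        "expected_language": expected,
        "language_source": source,
        "url_language": from_url,
        "html_language": from_html,
        # Assumption: a mismatch needs both values to be present.
        "html_url_mismatch": bool(from_url and from_html and from_url != from_html),
    }


print(language_fields("https://en.example.com/page", "de"))
```

A two-letter first path segment that is not a language (for example a country or section name) would be misread by this heuristic, so a real check would likely validate the code against a list of known languages.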
Consistency Analysis:
- is_consistent: boolean, whether the page meets the 90% consistency threshold
- consistency_rate: number between 0.0 and 1.0, the fraction of content in the expected language
- dominant_language: language detected most frequently on the page
- language_distribution: map of language code → count of detected text elements
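How these fields relate to each other can be sketched as follows. This is an illustration under stated assumptions, not the Actor's code: the rate is taken as the expected-language element count divided by the total count, the 90% threshold is treated as inclusive, and a page with no detected elements is treated as consistent. The function name and sample numbers are made up.

```python
def consistency_fields(language_distribution, expected_language, threshold=0.9):
    """Derive the consistency fields from a language -> element count map."""
    total = sum(language_distribution.values())
    if total == 0:
        # Assumption: a page with no detected text elements counts as consistent.
        return {"is_consistent": True, "consistency_rate": 1.0, "dominant_language": None}
    rate = language_distribution.get(expected_language, 0) / total
    dominant = max(language_distribution, key=language_distribution.get)
    return {
        # Assumption: the threshold comparison is inclusive.
        "is_consistent": rate >= threshold,
        "consistency_rate": round(rate, 2),
        "dominant_language": dominant,
    }


# Made-up distribution: 80 of 100 text elements are in the expected language.
print(consistency_fields({"en": 80, "cs": 15, "de": 5}, "en"))
```

With the made-up distribution the rate is 0.8, which falls below the threshold, so the page would be reported as inconsistent even though English is still the dominant language.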
Discrepancy Reports:
- discrepancies_count: number of verified language discrepancies (double-checked with both langdetect and lingua)
- discrepancies: array of up to 10 verified discrepancies, each with the fields:
  - tag: HTML tag name (e.g., "p", "h1", "li")
  - text: first 100 characters of the text
  - language: detected language code
  - full_length: total character length of the original text
  - translation: machine translation of the text to the expected language (optional)
  - translation_language: target language of the translation (optional)
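The double-check and the truncation rules can be sketched as below. The sketch assumes that "verified" means both detectors agree on the same non-expected language, which the description does not spell out. The two detectors are passed in as plain callables so that wrappers around langdetect and lingua could be plugged in; the toy detector here is a stand-in for demonstration only, and all names are hypothetical.

```python
def verify_discrepancies(elements, expected, detect_a, detect_b, limit=10, snippet=100):
    """Keep elements that both detectors place in the same non-expected language.

    elements is a list of (tag, text) pairs; detect_a and detect_b take a
    string and return a language code or None.
    """
    verified = []
    for tag, text in elements:
        lang_a = detect_a(text)
        lang_b = detect_b(text)
        # Assumption: a discrepancy is verified only when both detectors agree.
        if lang_a is not None and lang_a == lang_b and lang_a != expected:
            verified.append({
                "tag": tag,
                "text": text[:snippet],      # first 100 characters by default
                "language": lang_a,
                "full_length": len(text),    # length before truncation
            })
    return {
        "discrepancies_count": len(verified),
        "discrepancies": verified[:limit],   # the count covers all, the list is capped
    }


def toy_detector(text):
    """Stand-in detector for the demo: flags a few Czech words, else English."""
    czech_words = {"vstup", "do", "nové", "aplikace"}
    return "cs" if czech_words & set(text.lower().split()) else "en"


elements = [("h1", "Online Posting"), ("p", "Vstup do nové aplikace")]
print(verify_discrepancies(elements, "en", toy_detector, toy_detector))
```

Passing the detectors as arguments keeps the agreement rule separate from any particular library, so either detector can be swapped without touching the filtering logic.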
Example output item:
{
  "url": "https://www.postaonline.cz/en/podani-online",
  "title": "Online Posting",
  "expected_language": "en",
  "language_source": "url",
  "url_language": "en",
  "html_language": "en",
  "html_url_mismatch": false,
  "is_consistent": false,
  "consistency_rate": 0.93,
  "dominant_language": "en",
  "language_distribution": {"en": 150, "cs": 12},
  "discrepancies_count": 12,
  "discrepancies": [
    {
      "tag": "p",
      "text": "Vstup do nové aplikace",
      "language": "cs",
      "full_length": 22,
      "translation": "Enter a new app",
      "translation_language": "en"
    },
    ...
  ]
}
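Once the dataset items have been downloaded, a short report can help with manual review. The following is a minimal sketch, assuming the items are already loaded as Python dictionaries with the fields described above; the summarize function is hypothetical, and the sample item is a trimmed copy of the example that keeps only the one discrepancy shown.

```python
from collections import Counter


def summarize(items):
    """Return one report line per page, most discrepancies first."""
    ordered = sorted(items, key=lambda item: item["discrepancies_count"], reverse=True)
    lines = []
    for item in ordered:
        # Count the listed discrepancies per detected language.
        by_language = Counter(d["language"] for d in item["discrepancies"])
        listed = ", ".join(f"{lang}: {n}" for lang, n in by_language.most_common())
        lines.append(
            f"{item['url']}: {item['discrepancies_count']} discrepancies, "
            f"{item['consistency_rate']:.0%} in {item['expected_language']} "
            f"(listed: {listed})"
        )
    return lines


# Trimmed copy of the example item; only the visible discrepancy is included.
sample = {
    "url": "https://www.postaonline.cz/en/podani-online",
    "expected_language": "en",
    "consistency_rate": 0.93,
    "discrepancies_count": 12,
    "discrepancies": [{"tag": "p", "text": "Vstup do nové aplikace", "language": "cs"}],
}

for line in summarize([sample]):
    print(line)
```

Because the discrepancies array is capped at 10 entries, the per-language counts cover only the listed snippets, while discrepancies_count reflects the full total.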

