Website Language Detector
Pricing: Pay per event

Developer: Stas Persiianenko (Maintained by Community)

Detect the language of web pages from HTML attributes and content analysis.

What does Website Language Detector do?

This actor detects the language of web pages by analyzing multiple signals: HTML lang attribute, Content-Language HTTP header, meta tags, Open Graph locale, and hreflang tags. It reports language codes, confidence levels, charset, text direction, and identifies language-related issues.

Process hundreds of URLs in a single run to audit language configuration across entire websites. The actor flags inconsistencies between different language signals, helping you identify and fix problems that can confuse search engines and screen readers.

Use cases

  • International SEO specialists -- verify that language signals (lang attribute, hreflang tags, Content-Language headers) are consistent across all pages
  • Content localization teams -- audit lang attributes across multilingual websites to ensure correct language tagging
  • SEO auditors -- find missing or conflicting hreflang tags that can cause search engines to show the wrong language version
  • Accessibility testers -- ensure the lang attribute is set on every page, which is required for screen readers to pronounce content correctly
  • Web developers -- quickly check language configuration after deploying new pages or updating CMS settings
  • Compliance teams -- verify that government or regulated websites meet language accessibility requirements across all pages
  • Migration teams -- ensure that language attributes and hreflang tags are preserved correctly during site migrations or CMS switches

Why use Website Language Detector?

  • Multi-signal analysis -- checks HTML lang, Content-Language header, meta tags, OG locale, and hreflang tags
  • Batch processing -- detect languages across hundreds of URLs in a single run
  • Confidence scoring -- each detection includes a confidence level based on how many signals agree
  • Issue detection -- flags missing lang attributes, conflicting signals, and other language-related problems
  • Structured output -- clean JSON with all language signals and hreflang data, ready for analysis
  • Pay-per-event pricing -- cost-effective at scale, just $0.001 per URL

Input parameters

Parameter  Type      Required  Default  Description
urls       string[]  Yes       --       List of URLs to detect language for

Example input

{
    "urls": [
        "https://www.google.com",
        "https://www.wikipedia.org",
        "https://www.lemonde.fr"
    ]
}

Output example

{
    "url": "https://www.lemonde.fr",
    "detectedLanguage": "fr (French)",
    "htmlLang": "fr",
    "contentLanguageHeader": null,
    "metaLanguage": null,
    "ogLocale": "fr_FR",
    "hreflangTags": [
        { "lang": "fr", "href": "https://www.lemonde.fr/" }
    ],
    "confidence": "medium",
    "charsetDetected": "UTF-8",
    "textDirection": null,
    "issues": [],
    "error": null,
    "checkedAt": "2026-03-01T12:00:00.000Z"
}
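The per-signal fields and the issues array make the output easy to post-process. A minimal sketch (field names taken from the output example above; the records themselves are illustrative) that flags pages where signals are missing or disagree:

```python
# Sketch: post-process detector output records. Field names follow the
# example output above; the sample records are illustrative.
def find_problem_pages(records):
    """Return (url, reason) pairs for missing or conflicting language signals."""
    problems = []
    for rec in records:
        signals = {
            rec.get("htmlLang"),
            rec.get("contentLanguageHeader"),
            rec.get("metaLanguage"),
        }
        signals.discard(None)  # ignore signals that were not declared
        if not rec.get("htmlLang"):
            problems.append((rec["url"], "missing lang attribute"))
        elif len(signals) > 1:
            problems.append((rec["url"], "conflicting signals"))
    return problems

records = [
    {"url": "https://a.example", "htmlLang": "fr",
     "contentLanguageHeader": None, "metaLanguage": None},
    {"url": "https://b.example", "htmlLang": None,
     "contentLanguageHeader": "en", "metaLanguage": None},
]
print(find_problem_pages(records))
# → [('https://b.example', 'missing lang attribute')]
```

Note that the actor already reports such problems in its issues array; this sketch only shows how you might re-derive or extend that analysis downstream.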

How much does it cost?

Event        Price   Description
Start        $0.035  One-time per run
URL checked  $0.001  Per URL checked

Example costs:

  • 10 URLs: $0.035 + 10 x $0.001 = $0.045
  • 100 URLs: $0.035 + 100 x $0.001 = $0.135
  • 1,000 URLs: $0.035 + 1,000 x $0.001 = $1.035
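The arithmetic above is simple enough to reproduce in a helper when budgeting larger audits:

```python
# Sketch: reproduce the pricing arithmetic above
# ($0.035 per run plus $0.001 per URL checked).
def estimate_cost(url_count: int) -> float:
    START_FEE = 0.035  # one-time per run
    PER_URL = 0.001    # per URL checked
    return START_FEE + url_count * PER_URL

for n in (10, 100, 1000):
    print(f"{n} URLs: ${estimate_cost(n):.3f}")
```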

Using the Apify API

Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_TOKEN' });
const run = await client.actor('automation-lab/website-language-detector').call({
    urls: ['https://www.lemonde.fr'],
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Python

from apify_client import ApifyClient

client = ApifyClient('YOUR_TOKEN')
run = client.actor('automation-lab/website-language-detector').call(run_input={
    'urls': ['https://www.lemonde.fr'],
})
items = client.dataset(run['defaultDatasetId']).list_items().items
for item in items:
    print(f'{item["url"]}: {item["detectedLanguage"]} (confidence: {item["confidence"]})')

Integrations

Website Language Detector integrates with your existing workflow through the Apify platform. Connect it to Make (formerly Integromat), Zapier, or n8n to automate language audits on a schedule. Export results to Google Sheets for team review, send alerts to Slack when language misconfigurations are detected, or use webhooks to trigger follow-up actions such as flagging pages for translation review.

Common integration patterns include:

  • International SEO dashboard -- schedule weekly runs and push results to Google Sheets to track language signal consistency across all localized pages
  • Translation workflow -- trigger language detection after new content is published and alert the localization team if lang attributes are missing
  • Multi-market monitoring -- feed all regional URLs into the actor and verify that hreflang tags correctly reference each alternate language version
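For the Google Sheets pattern, one lightweight approach is to flatten each dataset item into a CSV row and import the file into a sheet. A sketch using only the standard library (column selection and the sample row are assumptions, not part of the actor):

```python
import csv
import io

# Sketch: flatten detector results into CSV for a Google Sheets import.
# Field names follow the actor's output example; the sample row is illustrative.
def results_to_csv(items):
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=["url", "detectedLanguage", "confidence", "issues"],
        extrasaction="ignore",  # drop fields not needed in the sheet
    )
    writer.writeheader()
    for item in items:
        row = dict(item)
        row["issues"] = "; ".join(item.get("issues", []))  # flatten the list
        writer.writerow(row)
    return buf.getvalue()

items = [{"url": "https://www.lemonde.fr", "detectedLanguage": "fr (French)",
          "confidence": "medium", "issues": [], "htmlLang": "fr"}]
print(results_to_csv(items))
```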

Tips and best practices

  • Check all language signals -- a page may have a correct lang attribute but a conflicting Content-Language header, which can confuse search engines.
  • Audit hreflang tags -- feed all language versions of your pages to verify that hreflang tags point to the correct alternates.
  • Look at the issues array -- this field highlights specific problems like missing lang attributes or conflicting signals, saving you from manual analysis.
  • Run on your full sitemap -- language issues often hide on deep pages that were missed during initial setup; test all URLs, not just the homepage.
  • Combine with OG Meta Extractor -- for a complete international SEO audit, pair language detection with metadata extraction.
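To run on your full sitemap, as the tip above suggests, the urls input can be built from sitemap.xml before calling the actor. A sketch using the standard sitemap namespace (the sample XML is illustrative):

```python
import xml.etree.ElementTree as ET

# Sketch: extract <loc> URLs from a sitemap.xml to feed the actor's
# "urls" input. The sample sitemap below is illustrative.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(xml_text: str) -> list[str]:
    root = ET.fromstring(xml_text)
    return [loc.text for loc in root.findall(".//sm:loc", SITEMAP_NS)]

sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/fr/</loc></url>
</urlset>"""

run_input = {"urls": sitemap_urls(sample)}
print(run_input)
# → {'urls': ['https://example.com/', 'https://example.com/fr/']}
```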

FAQ

What language signals does it check? It checks the HTML lang attribute, Content-Language HTTP header, <meta> language tags, Open Graph og:locale, and <link rel="alternate" hreflang="..."> tags.

Does it detect the language from page content? No. It analyzes HTML attributes, HTTP headers, and meta tags rather than the actual text content. This approach is faster and aligns with how search engines determine page language.

Can it detect multiple languages on a single page? It reports the primary language declaration and lists all hreflang tags found on the page. If different signals declare different languages, the issues array will flag the conflict.

What does the confidence level mean? Confidence is based on how many language signals agree. If the HTML lang attribute, Content-Language header, and OG locale all point to the same language, confidence is high. If only one signal is present, or if signals conflict, confidence is lower.
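The agreement-based scoring described above can be illustrated with a simple count. This is a hypothetical heuristic for intuition only, not the actor's actual implementation:

```python
# Sketch: a hypothetical confidence heuristic based on how many declared
# signals agree on the primary language subtag. Not the actor's real code.
def confidence(signals):
    """signals: language codes from htmlLang, headers, OG locale, etc."""
    primaries = [s.split("-")[0].split("_")[0].lower()
                 for s in signals if s]  # "fr_FR" and "fr-CA" both -> "fr"
    if not primaries:
        return "none"
    if len(set(primaries)) > 1:
        return "low"  # signals conflict
    return "high" if len(primaries) >= 2 else "medium"

print(confidence(["fr", None, "fr_FR"]))  # → high (two agreeing signals)
print(confidence(["fr"]))                 # → medium (single signal)
print(confidence(["fr", "en"]))           # → low (conflict)
```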

How many URLs can I check in one run? There is no hard limit. The actor processes URLs concurrently, making it efficient for auditing hundreds or thousands of pages across multilingual websites.

What does the textDirection field indicate? The textDirection field reports whether the page specifies a text direction (LTR for left-to-right or RTL for right-to-left). This is important for languages like Arabic and Hebrew that use right-to-left scripts. A null value means no explicit direction was declared.

Does it support all language codes? It reports whatever language codes are declared in the page's HTML attributes, HTTP headers, and meta tags. These typically follow the BCP 47 standard (e.g., en, fr, de, zh-CN). The actor does not limit which language codes it can detect.