Unicode Text Inspector

Pricing

from $0.40 / 1,000 text inspections

Inspect pasted text for hidden Unicode characters, zero-width spaces, bidi controls, control characters, and homoglyphs. Get risk levels, issue evidence, category counts, cleaned text, and batch summaries.

Developer

Maxime Dupré

Maintained by Community

🔎 Unicode text inspector for hidden characters

Unicode Text Inspector checks pasted text for hidden Unicode characters, zero-width spaces, bidi controls, control characters, and homoglyphs. For each text it reports Unicode category counts, a risk level, and a cleaned version of the text. Paste one string or a batch of strings, then get one output item per submitted text.

Use it when you need to audit product titles, domains, email subjects, CRM fields, usernames, form submissions, code snippets, search keywords, or imported text before it enters another system. The Actor analyzes text locally. It does not fetch URLs, use cookies, require accounts, call an external Unicode API, or send your submitted text to a third-party service.

For a quick first run, keep the prefilled examples. They include a zero-width character, a Cyrillic homoglyph in a domain-like string, and a clean text sample so you can see suspicious and clean output in the same dataset.

✅ What this Actor checks

  • Zero-width and invisible format characters such as U+200B, U+200C, U+200D, and U+FEFF.
  • Bidirectional controls used in Trojan Source-style display-order attacks, including overrides, embeddings, isolates, and marks.
  • ASCII and C1 control characters such as null bytes, escape characters, tabs, line feeds, and delete.
  • Practical homoglyphs and confusables across common Cyrillic, Greek, fullwidth Latin, mathematical digit, and typography cases.
  • Unicode category composition, including letters, numbers, punctuation, symbols, marks, separators, controls, format characters, private use, and unassigned codepoints.
  • Deterministic risk levels from none to critical.
  • Mechanical cleaned text that removes flagged invisible, control, and bidi characters without rewriting user language.

The Actor keeps all checks enabled by default. There are no strictness sliders or per-check toggles because these checks are local, useful, and do not change the price per inspected text.
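To make the checks above concrete, here is a minimal sketch of how invisible, bidi, and control characters could be flagged and mechanically removed with Python's standard library. The codepoint sets, the function names `classify` and `inspect_text`, and the choice to treat every `Cc` character as removable are assumptions for illustration, not the Actor's actual implementation.

```python
import unicodedata
from typing import Optional

# Assumed, non-exhaustive codepoint sets (illustration only).
ZERO_WIDTH = {0x200B, 0x200C, 0x200D, 0xFEFF}
BIDI_CONTROLS = (
    {0x200E, 0x200F, 0x061C}        # marks: LRM, RLM, ALM
    | set(range(0x202A, 0x202F))    # embeddings, overrides, and PDF (U+202A..U+202E)
    | set(range(0x2066, 0x206A))    # isolates and PDI (U+2066..U+2069)
)


def classify(ch: str) -> Optional[str]:
    """Return an issue type for a flagged character, or None if it is fine."""
    cp = ord(ch)
    if cp in BIDI_CONTROLS:
        return "bidi_control"
    if cp in ZERO_WIDTH:
        return "invisible_format"
    if unicodedata.category(ch) == "Cc":  # C0 and C1 controls, including tab and LF
        return "control_character"
    return None


def inspect_text(text: str):
    """Return (issues, cleaned) where cleaned drops every flagged character."""
    issues = []
    kept = []
    for position, ch in enumerate(text):
        kind = classify(ch)
        if kind is None:
            kept.append(ch)
            continue
        issues.append({
            "type": kind,
            "position": position,
            "codePoint": f"U+{ord(ch):04X}",
            "unicodeName": unicodedata.name(ch, ""),  # control characters have no name
        })
    return issues, "".join(kept)


issues, cleaned = inspect_text("Hello\u200b World")
print(issues[0]["codePoint"], repr(cleaned))
```

Note that this sketch strips tabs and line feeds along with other control characters; whether the Actor's cleanedText does the same is not stated here, so check a real run before relying on that behavior.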

📊 What data you get

Each output item represents one submitted text string. Rows can include:

| Field | Description |
| --- | --- |
| inputIndex | Position of the text in your submitted list. |
| originalText | Exact text submitted for inspection. |
| textPreview | Short visible preview after removable hidden/control characters are stripped. |
| cleanedText | Full mechanically cleaned text when suspicious invisible, control, or bidi characters can be removed safely. |
| characterCount, codePointCount, codeUnitLength | Text length counts for Unicode-aware audits. |
| issueCount, suspiciousContent, riskLevel | Triage fields for filtering clean, low-risk, and high-risk text. |
| issues | Exact issue evidence with type, severity, position, codepoint, decimal value, Unicode name, category, context, description, and recommendation. |
| issueTypeCounts | Per-text counts for invisible/format, bidi, control, and homoglyph issues. |
| unicodeCategoryCounts | Unicode category counts for the inspected text. |
| batchSummary | Run-level totals repeated with each row for large batch triage. |
| analyzedAt | UTC timestamp when the text was inspected. |

The output is designed for JSON, CSV, Excel, API, webhook, scheduled audit, spreadsheet, search-index QA, moderation, and security-review workflows.
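The length and category fields can be approximated locally, which helps when sanity-checking a row. The sketch below assumes that codePointCount counts Unicode code points and that codeUnitLength counts UTF-16 code units; the helper names are hypothetical, and characterCount is left out because its exact definition is not given here.

```python
import unicodedata
from collections import Counter


def length_counts(text: str) -> dict:
    """Code point count and (assumed) UTF-16 code unit length of a string."""
    return {
        "codePointCount": len(text),                         # Python str is indexed by code point
        "codeUnitLength": len(text.encode("utf-16-le")) // 2,  # 2 bytes per UTF-16 code unit
    }


def category_counts(text: str) -> dict:
    """Count characters per Unicode general category (Lu, Ll, Cf, Zs, ...)."""
    return dict(Counter(unicodedata.category(ch) for ch in text))


sample = "Hello\u200b World"
print(length_counts(sample))
print(category_counts(sample))
```

For the sample string this yields the same category counts as the output example in this document (2 uppercase letters, 8 lowercase letters, 1 format character, 1 space separator). The two length counts differ only when the text contains characters outside the Basic Multilingual Plane, such as most emoji.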

🚀 How to run it

  1. Open the Actor input.
  2. Paste text strings into Texts to inspect. Use one string per line.
  3. Start the Actor.
  4. Open the dataset and filter by riskLevel, suspiciousContent, issueCount, or issueTypeCounts.

You can submit plain text strings from the Apify Console, API, or integrations. The Actor preserves input order with inputIndex, so you can map each output item back to your submitted batch.
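Once the dataset is downloaded as JSON, the triage step can be sketched as a small filter. The risk scale below is an assumption: the source only states that levels run from none to critical, so the intermediate level "high" and the ordering are guesses, and `at_least` is a hypothetical helper name.

```python
# Assumed ordering of risk levels, lowest to highest.
RISK_ORDER = ["none", "low", "medium", "high", "critical"]


def at_least(items: list, threshold: str) -> list:
    """Keep items whose riskLevel is at or above threshold, in input order."""
    floor = RISK_ORDER.index(threshold)
    hits = [item for item in items if RISK_ORDER.index(item["riskLevel"]) >= floor]
    return sorted(hits, key=lambda item: item["inputIndex"])


# Hand-written rows shaped like the Actor's output items (values are illustrative).
rows = [
    {"inputIndex": 3, "riskLevel": "none", "issueCount": 0},
    {"inputIndex": 2, "riskLevel": "medium", "issueCount": 1},
    {"inputIndex": 1, "riskLevel": "low", "issueCount": 1},
]

for row in at_least(rows, "low"):
    print(row["inputIndex"], row["riskLevel"])
```

Sorting by inputIndex keeps the filtered rows aligned with the order of the submitted batch, so each hit can be traced back to its source record.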

🧾 Input example

{
  "texts": [
    "Hello\u200b World",
    "pаypal.com",
    "Normal clean text"
  ]
}
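When building this input programmatically, it helps to serialize it in a way that makes hidden characters visible. The following sketch turns a one-string-per-line block into the texts array; `build_input` is a hypothetical helper, and only the `texts` key comes from the input example above.

```python
import json


def build_input(raw: str) -> dict:
    """Turn a newline-separated block into the Actor input shape."""
    # split("\n") is used instead of splitlines() on purpose: splitlines() also
    # breaks on characters such as U+2028 or U+000C, which are exactly the kind
    # of content an inspection should see intact.
    texts = [line for line in raw.split("\n") if line != ""]
    return {"texts": texts}


raw_block = "Hello\u200b World\np\u0430ypal.com\nNormal clean text"

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True),
# so the zero-width space and the Cyrillic letter show up as \uXXXX escapes.
payload = json.dumps(build_input(raw_block), indent=2)
print(payload)
```

The escaped form and the literal form are the same JSON value, so either can be submitted; the escaped form is simply easier to review by eye.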

📤 Output example

{
  "inputIndex": 1,
  "originalText": "Hello\u200b World",
  "textPreview": "Hello World",
  "cleanedText": "Hello World",
  "characterCount": 12,
  "codePointCount": 12,
  "codeUnitLength": 12,
  "issueCount": 1,
  "suspiciousContent": true,
  "riskLevel": "low",
  "issues": [
    {
      "type": "invisible_format",
      "severity": "low",
      "position": 5,
      "codeUnitIndex": 5,
      "character": "\u200b",
      "codePoint": "U+200B",
      "decimalCodePoint": 8203,
      "unicodeName": "ZERO WIDTH SPACE",
      "unicodeCategory": "Cf",
      "unicodeCategoryName": "Format character",
      "description": "Invisible or format character can affect matching, searching, copy-paste, or display.",
      "recommendation": "Remove when this text should be plain visible text.",
      "context": {
        "before": "Hello",
        "after": " World"
      }
    }
  ],
  "issueTypeCounts": {
    "invisible_format": 1,
    "bidi_control": 0,
    "control_character": 0,
    "homoglyph_confusable": 0
  },
  "unicodeCategoryCounts": {
    "Lu": 2,
    "Ll": 8,
    "Cf": 1,
    "Zs": 1
  },
  "batchSummary": {
    "totalTexts": 3,
    "suspiciousTexts": 2,
    "cleanTexts": 1,
    "totalIssues": 2,
    "highestRiskLevel": "medium",
    "issueTypeCounts": {
      "invisible_format": 1,
      "bidi_control": 0,
      "control_character": 0,
      "homoglyph_confusable": 1
    }
  },
  "analyzedAt": "2026-06-15T00:00:00.000Z"
}

🎯 Common use cases

  • Find hidden copy-paste characters in product titles, slugs, names, and search keywords.
  • Catch bidi controls before text enters source code, review queues, support tools, or documentation.
  • Detect homoglyphs in domain-like strings, usernames, brand terms, and moderation inputs.
  • Clean text before importing it into a CRM, database, spreadsheet, search index, or analytics pipeline.
  • Build a scheduled Unicode quality gate for user-generated text, scraped text, or submitted forms.
  • Export issue evidence for security review, data QA, or moderation workflows.

💳 Pricing

This Actor uses pay-per-event pricing. You are charged once per submitted text string that is inspected and saved as an output item.

The current event prices are:

  • FREE: $0.60 per 1,000 inspected texts
  • BRONZE: $0.55 per 1,000 inspected texts
  • SILVER: $0.45 per 1,000 inspected texts
  • GOLD: $0.40 per 1,000 inspected texts
  • PLATINUM: $0.30 per 1,000 inspected texts
  • DIAMOND: $0.20 per 1,000 inspected texts

Runs that stop before saving any inspected text items do not create text-inspection charges.
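As a quick budgeting aid, the per-text charge can be estimated from the tier prices listed above. This is a small sketch with a hypothetical helper name, `estimate_cost`; it covers only the text-inspection event and uses `Decimal` to avoid floating-point rounding in currency math.

```python
from decimal import Decimal

# Price in USD per 1,000 inspected texts, copied from the tier list above.
PRICE_PER_1000 = {
    "FREE": Decimal("0.60"),
    "BRONZE": Decimal("0.55"),
    "SILVER": Decimal("0.45"),
    "GOLD": Decimal("0.40"),
    "PLATINUM": Decimal("0.30"),
    "DIAMOND": Decimal("0.20"),
}


def estimate_cost(inspected_texts: int, tier: str) -> Decimal:
    """Estimated text-inspection charge in USD for one run."""
    return Decimal(inspected_texts) * PRICE_PER_1000[tier] / Decimal(1000)


for tier in ("FREE", "GOLD", "DIAMOND"):
    print(tier, estimate_cost(25_000, tier))
```

For example, a batch of 25,000 texts works out to $15.00 on the FREE tier and $5.00 on the DIAMOND tier.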

⚠️ Limits and notes

Unicode Text Inspector is deterministic. It does not use AI, infer malicious intent, score phishing risk, decide whether a brand is impersonated, rewrite language, or claim complete Unicode TR39 coverage across every script.

Homoglyph detection focuses on practical Latin-lookalike cases that are useful for text QA and security review. Cleaned text removes hidden, control, and bidi characters when that cleanup is mechanical. It does not replace homoglyphs with guessed intended characters.
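The idea of reporting Latin lookalikes without rewriting them can be sketched as follows. The lookalike table is a tiny assumed sample, not the Actor's table, and both the function name `find_homoglyphs` and the rule of flagging only text that also contains ASCII letters are illustrative design choices.

```python
import unicodedata

# Tiny assumed sample of non-Latin letters that resemble ASCII letters.
LATIN_LOOKALIKES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}


def find_homoglyphs(text: str) -> list:
    """Report lookalike characters that are mixed into text with ASCII letters."""
    has_ascii_letter = any(ch.isascii() and ch.isalpha() for ch in text)
    if not has_ascii_letter:
        return []  # e.g. fully Cyrillic text is not treated as suspicious
    return [
        {
            "position": position,
            "codePoint": f"U+{ord(ch):04X}",
            "unicodeName": unicodedata.name(ch),
            "looksLike": LATIN_LOOKALIKES[ch],
        }
        for position, ch in enumerate(text)
        if ch in LATIN_LOOKALIKES
    ]


for hit in find_homoglyphs("p\u0430ypal.com"):
    print(hit)
```

The function only reports evidence. It deliberately returns the lookalike as a hint rather than substituting it into the text, which mirrors the stated behavior that cleaned text does not replace homoglyphs with guessed characters.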

❓ FAQ

🌐 Does this Actor scrape websites?

No. It only inspects text strings that you provide. It does not fetch URLs, crawl pages, use a proxy, or call external APIs.

🔌 Can I use it from the Apify API?

Yes. Submit texts as an array of strings and read one output item per inspected text from the dataset.

🧹 Does cleaned text change what I wrote?

Cleaned text removes flagged invisible, control, and bidi characters when that can be done mechanically. It does not rewrite words, translate text, or replace homoglyphs with guessed characters.

✅ Why are there no detection toggles?

All detection checks are local and useful. Keeping them on gives a more complete audit without changing the price per inspected text.

📝 Changelog

  • 0.1: Initial release.

🆘 Support

For issues, questions, or feature requests, file a ticket and I'll fix or implement it in less than 24h 🫡

Made with ❤️ by Maxime Dupré