Unicode Text Inspector
Pricing
from $0.40 / 1,000 text inspections
Inspect pasted text for hidden Unicode characters, zero-width spaces, bidi controls, control characters, and homoglyphs. Get risk levels, issue evidence, category counts, cleaned text, and batch summaries.
Developer
Maxime Dupré
Maintained by Community
🔎 Unicode text inspector for hidden characters
Unicode Text Inspector checks pasted text for hidden Unicode characters, zero-width spaces, bidi controls, control characters, homoglyphs, Unicode category counts, risk levels, and cleaned text. Paste one string or a batch of strings, then get one output item per submitted text.
Use it when you need to audit product titles, domains, email subjects, CRM fields, usernames, form submissions, code snippets, search keywords, or imported text before it enters another system. The Actor analyzes text locally. It does not fetch URLs, use cookies, require accounts, call an external Unicode API, or send your submitted text to a third-party service.
For a quick first run, keep the prefilled examples. They include a zero-width character, a Cyrillic homoglyph in a domain-like string, and a clean text sample so you can see suspicious and clean output in the same dataset.
✅ What this Actor checks
- Zero-width and invisible format characters such as `U+200B`, `U+200C`, `U+200D`, and `U+FEFF`.
- Bidirectional controls used in Trojan Source-style display-order attacks, including overrides, embeddings, isolates, and marks.
- ASCII and C1 control characters such as null bytes, escape characters, tabs, line feeds, and delete.
- Practical homoglyphs and confusables across common Cyrillic, Greek, fullwidth Latin, mathematical digit, and typography cases.
- Unicode category composition, including letters, numbers, punctuation, symbols, marks, separators, controls, format characters, private use, and unassigned codepoints.
- Deterministic risk levels from `none` to `critical`.
- Mechanical cleaned text that removes flagged invisible, control, and bidi characters without rewriting user language.
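To make the first three checks concrete, here is a minimal Python sketch of how hidden characters could be flagged with the standard `unicodedata` module. It is not the Actor's implementation: the `ZERO_WIDTH` and `BIDI_CONTROLS` sets and the `find_issues` helper are assumptions made for illustration, and the real Actor likely covers more characters and reports more fields.

```python
import unicodedata

# Hypothetical lookup sets for this sketch; the Actor's real lists are not published here.
ZERO_WIDTH = {0x200B, 0x200C, 0x200D, 0xFEFF}
BIDI_CONTROLS = (
    {0x200E, 0x200F, 0x061C}       # marks: LRM, RLM, ALM
    | set(range(0x202A, 0x202F))   # embeddings and overrides: U+202A..U+202E
    | set(range(0x2066, 0x206A))   # isolates: U+2066..U+2069
)


def find_issues(text: str) -> list[dict]:
    """Return one record per flagged code point, in text order."""
    issues = []
    for position, char in enumerate(text):
        code_point = ord(char)
        category = unicodedata.category(char)
        if code_point in BIDI_CONTROLS:
            issue_type = "bidi_control"
        elif code_point in ZERO_WIDTH:
            issue_type = "invisible_format"
        elif category == "Cc":  # ASCII and C1 controls, including tab and line feed
            issue_type = "control_character"
        else:
            continue
        issues.append({
            "type": issue_type,
            "position": position,
            "codePoint": f"U+{code_point:04X}",
            # Control characters have no Unicode name, so supply a fallback.
            "unicodeName": unicodedata.name(char, "<unnamed>"),
            "unicodeCategory": category,
        })
    return issues


print(find_issues("Hello\u200b World"))
```

The issue type strings mirror the keys shown in the output example further down; the severity and risk-level rules are not reproduced because the README does not describe them.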
The Actor keeps all checks enabled by default. There are no strictness sliders or per-check toggles because these checks are local, useful, and do not change the price per inspected text.
📊 What data you get
Each output item represents one submitted text string. Rows can include:
| Field | Description |
|---|---|
| `inputIndex` | Position of the text in your submitted list. |
| `originalText` | Exact text submitted for inspection. |
| `textPreview` | Short visible preview after removable hidden/control characters are stripped. |
| `cleanedText` | Full mechanically cleaned text when suspicious invisible, control, or bidi characters can be removed safely. |
| `characterCount`, `codePointCount`, `codeUnitLength` | Text length counts for Unicode-aware audits. |
| `issueCount`, `suspiciousContent`, `riskLevel` | Triage fields for filtering clean, low-risk, and high-risk text. |
| `issues` | Exact issue evidence with type, severity, position, codepoint, decimal value, Unicode name, category, context, description, and recommendation. |
| `issueTypeCounts` | Per-text counts for invisible/format, bidi, control, and homoglyph issues. |
| `unicodeCategoryCounts` | Unicode category counts for the inspected text. |
| `batchSummary` | Run-level totals repeated with each row for large batch triage. |
| `analyzedAt` | UTC timestamp when the text was inspected. |
The output is designed for JSON, CSV, Excel, API, webhook, scheduled audit, spreadsheet, search-index QA, moderation, and security-review workflows.
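The length and category fields can be approximated locally, which helps when sanity-checking exported rows. The sketch below is a guess at their meaning, not the Actor's code: it assumes `codeUnitLength` counts UTF-16 code units and that `unicodeCategoryCounts` uses two-letter Unicode general categories, as the output example suggests. `characterCount` is left out because the README does not define it. The function name `length_and_category_counts` is made up for this example.

```python
from collections import Counter
import unicodedata


def length_and_category_counts(text: str) -> dict:
    """Approximate three of the documented fields for one text."""
    return {
        # A Python str is a sequence of code points.
        "codePointCount": len(text),
        # Assumption: code units are UTF-16 units, two bytes each.
        "codeUnitLength": len(text.encode("utf-16-le")) // 2,
        "unicodeCategoryCounts": dict(
            Counter(unicodedata.category(char) for char in text)
        ),
    }


print(length_and_category_counts("Hello\u200b World"))
```

For `"Hello\u200b World"` this gives 12 code points, 12 code units, and the categories `Lu: 2, Ll: 8, Cf: 1, Zs: 1`. A character outside the Basic Multilingual Plane, such as an emoji, counts as one code point but two UTF-16 code units, which is why the two length fields can differ.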
🚀 How to run it
- Open the Actor input.
- Paste text strings into Texts to inspect. Use one string per line.
- Start the Actor.
- Open the dataset and filter by `riskLevel`, `suspiciousContent`, `issueCount`, or `issueTypeCounts`.
You can submit plain text strings from the Apify Console, API, or integrations. The Actor preserves input order with inputIndex, so you can map each output item back to your submitted batch.
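Once the dataset is exported as JSON, the filtering step can be scripted. The following is a small sketch under stated assumptions: the README says only that risk levels run from `none` to `critical`, so the intermediate ordering in `RISK_ORDER` (in particular `high`) is a guess, and `triage` is a hypothetical helper name. The sample rows are hand-written and contain only the fields the sketch reads.

```python
# Assumed ordering: only the endpoints "none" and "critical" are documented;
# "low" and "medium" appear in the output example, "high" is a guess.
RISK_ORDER = ["none", "low", "medium", "high", "critical"]


def triage(items: list[dict], minimum: str = "medium") -> list[dict]:
    """Return items at or above `minimum` risk, in submitted order."""
    threshold = RISK_ORDER.index(minimum)
    flagged = [
        item for item in items
        if RISK_ORDER.index(item["riskLevel"]) >= threshold
    ]
    # inputIndex maps each row back to its position in the submitted batch.
    return sorted(flagged, key=lambda item: item["inputIndex"])


# Hand-written rows for illustration, not real Actor output.
rows = [
    {"inputIndex": 3, "riskLevel": "none", "issueCount": 0},
    {"inputIndex": 2, "riskLevel": "medium", "issueCount": 1},
    {"inputIndex": 1, "riskLevel": "low", "issueCount": 1},
]
print([row["inputIndex"] for row in triage(rows, "low")])  # [1, 2]
```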
🧾 Input example
{"texts": ["Hello\u200b World", "pаypal.com", "Normal clean text"]}
📤 Output example
{"inputIndex": 1,"originalText": "Hello World","textPreview": "Hello World","cleanedText": "Hello World","characterCount": 12,"codePointCount": 12,"codeUnitLength": 12,"issueCount": 1,"suspiciousContent": true,"riskLevel": "low","issues": [{"type": "invisible_format","severity": "low","position": 5,"codeUnitIndex": 5,"character": "","codePoint": "U+200B","decimalCodePoint": 8203,"unicodeName": "ZERO WIDTH SPACE","unicodeCategory": "Cf","unicodeCategoryName": "Format character","description": "Invisible or format character can affect matching, searching, copy-paste, or display.","recommendation": "Remove when this text should be plain visible text.","context": {"before": "Hello","after": " World"}}],"issueTypeCounts": {"invisible_format": 1,"bidi_control": 0,"control_character": 0,"homoglyph_confusable": 0},"unicodeCategoryCounts": {"Lu": 2,"Ll": 8,"Cf": 1,"Zs": 1},"batchSummary": {"totalTexts": 3,"suspiciousTexts": 2,"cleanTexts": 1,"totalIssues": 2,"highestRiskLevel": "medium","issueTypeCounts": {"invisible_format": 1,"bidi_control": 0,"control_character": 0,"homoglyph_confusable": 1}},"analyzedAt": "2026-06-15T00:00:00.000Z"}
🎯 Common use cases
- Find hidden copy-paste characters in product titles, slugs, names, and search keywords.
- Catch bidi controls before text enters source code, review queues, support tools, or documentation.
- Detect homoglyphs in domain-like strings, usernames, brand terms, and moderation inputs.
- Clean text before importing it into a CRM, database, spreadsheet, search index, or analytics pipeline.
- Build a scheduled Unicode quality gate for user-generated text, scraped text, or submitted forms.
- Export issue evidence for security review, data QA, or moderation workflows.
💳 Pricing
This Actor uses pay-per-event pricing. You are charged once per submitted text string that is inspected and saved as an output item.
The current event prices are:
- FREE: $0.60 per 1,000 inspected texts
- BRONZE: $0.55 per 1,000 inspected texts
- SILVER: $0.45 per 1,000 inspected texts
- GOLD: $0.40 per 1,000 inspected texts
- PLATINUM: $0.30 per 1,000 inspected texts
- DIAMOND: $0.20 per 1,000 inspected texts
Runs that stop before saving any inspected text items do not create text-inspection charges.
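As a worked example of the per-text pricing, the sketch below estimates the inspection charge for a batch. It uses only the tier prices listed above and assumes every submitted text is inspected and saved. Any platform usage costs outside the per-event price are not modeled, and `estimate_cost` is a hypothetical helper, not part of the Actor.

```python
from decimal import Decimal

# Prices in USD per 1,000 inspected texts, copied from the tier list above.
PRICE_PER_1000 = {
    "FREE": Decimal("0.60"),
    "BRONZE": Decimal("0.55"),
    "SILVER": Decimal("0.45"),
    "GOLD": Decimal("0.40"),
    "PLATINUM": Decimal("0.30"),
    "DIAMOND": Decimal("0.20"),
}


def estimate_cost(text_count: int, tier: str) -> Decimal:
    """Estimated inspection charge in USD for `text_count` saved items."""
    return PRICE_PER_1000[tier] * text_count / 1000


print(estimate_cost(25_000, "FREE"))     # 15.00
print(estimate_cost(25_000, "DIAMOND"))  # 5.00
```

`Decimal` is used instead of `float` so that small per-text prices do not pick up binary rounding noise.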
⚠️ Limits and notes
Unicode Text Inspector is deterministic. It does not use AI, infer malicious intent, score phishing risk, decide whether a brand is impersonated, rewrite language, or claim complete Unicode TR39 coverage across every script.
Homoglyph detection focuses on practical Latin-lookalike cases that are useful for text QA and security review. Cleaned text removes hidden, control, and bidi characters when that cleanup is mechanical. It does not replace homoglyphs with guessed intended characters.
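The split between mechanical cleanup and homoglyph reporting can be illustrated with a short sketch. It rests on simplifying assumptions: the `REMOVABLE` set is a hypothetical, narrow list of zero-width and bidi characters, and `lookalike_letters` approximates a letter's script from the first word of its Unicode name, which is far cruder than a real confusables table. Neither function is the Actor's actual logic.

```python
import unicodedata

# Hypothetical removal set for this sketch: zero-width characters plus bidi controls.
REMOVABLE = (
    {0x200B, 0x200C, 0x200D, 0xFEFF, 0x200E, 0x200F, 0x061C}
    | set(range(0x202A, 0x202F))   # U+202A..U+202E
    | set(range(0x2066, 0x206A))   # U+2066..U+2069
)


def clean_mechanically(text: str) -> str:
    """Drop removable hidden characters; every other character is kept as-is."""
    return "".join(char for char in text if ord(char) not in REMOVABLE)


def lookalike_letters(text: str) -> list[tuple[int, str, str]]:
    """Flag Cyrillic or Greek letters in text that also contains Latin letters."""
    letters = []
    for position, char in enumerate(text):
        if char.isalpha():
            # Crude script guess: first word of the Unicode character name.
            script = unicodedata.name(char, "UNKNOWN").split(" ")[0]
            letters.append((position, char, script))
    if not any(script == "LATIN" for _, _, script in letters):
        return []  # no Latin letters, so nothing is treated as a lookalike
    return [entry for entry in letters if entry[2] in {"CYRILLIC", "GREEK"}]


sample = "p\u0430ypal.com"  # second letter is CYRILLIC SMALL LETTER A
print(lookalike_letters(sample))             # [(1, 'а', 'CYRILLIC')]
print(clean_mechanically(sample) == sample)  # True: homoglyphs are reported, not replaced
```

Keeping cleanup and homoglyph reporting separate matches the behavior described above: removal is safe to automate, while choosing the "intended" Latin letter would require guessing.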
❓ FAQ
🌐 Does this Actor scrape websites?
No. It only inspects text strings that you provide. It does not fetch URLs, crawl pages, use a proxy, or call external APIs.
🔌 Can I use it from the Apify API?
Yes. Submit texts as an array of strings and read one output item per inspected text from the dataset.
🧹 Does cleaned text change what I wrote?
Cleaned text removes flagged invisible, control, and bidi characters when that can be done mechanically. It does not rewrite words, translate text, or replace homoglyphs with guessed characters.
✅ Why are there no detection toggles?
All detection checks are local and useful. Keeping them on gives a more complete audit without changing the price per inspected text.
📝 Changelog
- 0.1: Initial release.
🆘 Support
For issues, questions, or feature requests, file a ticket and I'll fix or implement it in less than 24h 🫡
🔗 Other actors
- Email MX Verifier - Check email syntax, MX records, disposable domains, and list-cleaning risk signals.
- SMTP Email Verifier - Verify email addresses with DNS, SMTP, catch-all, and deliverability evidence.
- Website Emails Scraper - Extract contact emails from public websites for CRM and outreach workflows.
- Font Detector - Audit website fonts, Google Fonts, Adobe Fonts, and CSS font evidence from public pages.
- Gmail Username Checker - Check Gmail username availability in bulk for launch and account-name planning.
Made with ❤️ by Maxime Dupré