# Unicode Text Inspector (`maximedupre/unicode-text-inspector`) Actor

Inspect pasted text for hidden Unicode characters, zero-width spaces, bidi controls, control characters, and homoglyphs. Get risk levels, issue evidence, category counts, cleaned text, and batch summaries.

- **URL**: https://apify.com/maximedupre/unicode-text-inspector.md
- **Developed by:** [Maxime Dupré](https://apify.com/maximedupre) (community)
- **Categories:** Developer tools
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

from $0.40 / 1,000 text inspections

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the price decreases on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
"Actor" is always written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

### 🔎 Unicode text inspector for hidden characters

Unicode Text Inspector checks pasted text for hidden Unicode characters, zero-width spaces, bidi controls, control characters, and homoglyphs, and reports Unicode category counts, risk levels, and cleaned text. Paste one string or a batch of strings, then get one output item per submitted text.

Use it when you need to audit product titles, domains, email subjects, CRM fields, usernames, form submissions, code snippets, search keywords, or imported text before it enters another system. The Actor analyzes text locally. It does not fetch URLs, use cookies, require accounts, call an external Unicode API, or send your submitted text to a third-party service.

For a quick first run, keep the prefilled examples. They include a zero-width character, a Cyrillic homoglyph in a domain-like string, and a clean text sample so you can see suspicious and clean output in the same dataset.

### ✅ What this Actor checks

- Zero-width and invisible format characters such as `U+200B`, `U+200C`, `U+200D`, and `U+FEFF`.
- Bidirectional controls used in Trojan Source-style display-order attacks, including overrides, embeddings, isolates, and marks.
- ASCII and C1 control characters such as null bytes, escape characters, tabs, line feeds, and delete.
- Practical homoglyphs and confusables across common Cyrillic, Greek, fullwidth Latin, mathematical digit, and typography cases.
- Unicode category composition, including letters, numbers, punctuation, symbols, marks, separators, controls, format characters, private use, and unassigned codepoints.
- Deterministic risk levels from `none` to `critical`.
- Mechanical cleaned text that removes flagged invisible, control, and bidi characters without rewriting user language.

The Actor keeps all checks enabled by default. There are no strictness sliders or per-check toggles because these checks are local, useful, and do not change the price per inspected text.
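
To make the first three checks concrete, here is a minimal local sketch that uses only Python's standard `unicodedata` module. It is not the Actor's implementation: the character sets, the `classify` and `find_issues` helpers, and the mapping from Unicode categories to issue types are assumptions chosen to mirror the field names shown in this README.

```python
import unicodedata
from typing import Optional

# Assumed character sets for illustration; the Actor's real lists may differ.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}
BIDI_CONTROLS = (
    {"\u200e", "\u200f", "\u061c"}               # directional marks
    | {chr(c) for c in range(0x202A, 0x202F)}    # embeddings and overrides
    | {chr(c) for c in range(0x2066, 0x206A)}    # isolates
)


def classify(ch: str) -> Optional[str]:
    """Return a hypothetical issue type for one character, or None if unflagged."""
    if ch in BIDI_CONTROLS:
        return "bidi_control"
    category = unicodedata.category(ch)
    if ch in ZERO_WIDTH or category == "Cf":
        return "invisible_format"
    if category == "Cc":
        return "control_character"
    return None


def find_issues(text: str) -> list:
    """Collect one evidence record per flagged character, in input order."""
    issues = []
    for position, ch in enumerate(text):
        issue_type = classify(ch)
        if issue_type is not None:
            issues.append({
                "type": issue_type,
                "position": position,
                "codePoint": f"U+{ord(ch):04X}",
                "unicodeName": unicodedata.name(ch, "<unnamed>"),
            })
    return issues


print(find_issues("Hello\u200b World"))
```

Bidi controls are checked before the general `Cf` test because they also belong to the format category; testing them first keeps the two issue types distinct.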

### 📊 What data you get

Each output item represents one submitted text string. Rows can include:

| Field | Description |
| --- | --- |
| `inputIndex` | Position of the text in your submitted list. |
| `originalText` | Exact text submitted for inspection. |
| `textPreview` | Short visible preview after removable hidden/control characters are stripped. |
| `cleanedText` | Full mechanically cleaned text when suspicious invisible, control, or bidi characters can be removed safely. |
| `characterCount`, `codePointCount`, `codeUnitLength` | Text length counts for Unicode-aware audits. |
| `issueCount`, `suspiciousContent`, `riskLevel` | Triage fields for filtering clean, low-risk, and high-risk text. |
| `issues` | Exact issue evidence with type, severity, position, codepoint, decimal value, Unicode name, category, context, description, and recommendation. |
| `issueTypeCounts` | Per-text counts for invisible/format, bidi, control, and homoglyph issues. |
| `unicodeCategoryCounts` | Unicode category counts for the inspected text. |
| `batchSummary` | Run-level totals repeated with each row for large batch triage. |
| `analyzedAt` | UTC timestamp when the text was inspected. |

The output is designed for JSON, CSV, Excel, API, webhook, scheduled audit, spreadsheet, search-index QA, moderation, and security-review workflows.
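
The length fields can disagree for text that contains characters outside the Basic Multilingual Plane. The sketch below illustrates why, assuming `codePointCount` counts Unicode code points and `codeUnitLength` counts UTF-16 code units. This README does not define either field precisely, and `characterCount` is left out because its definition is not stated. The `length_counts` helper is hypothetical.

```python
def length_counts(text: str) -> dict:
    """Count code points and UTF-16 code units for one string."""
    return {
        # A Python str is a sequence of code points, so len() counts them directly.
        "codePointCount": len(text),
        # Each UTF-16 code unit is two bytes; astral characters need two units.
        "codeUnitLength": len(text.encode("utf-16-le")) // 2,
    }


print(length_counts("Hello\u200b World"))  # both counts are 12
print(length_counts("\U0001F600"))         # 1 code point, 2 code units
```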

### 🚀 How to run it

1. Open the Actor input.
2. Paste text strings into **Texts to inspect**. Use one string per line.
3. Start the Actor.
4. Open the dataset and filter by `riskLevel`, `suspiciousContent`, `issueCount`, or `issueTypeCounts`.

You can submit plain text strings from the Apify Console, API, or integrations. The Actor preserves input order with `inputIndex`, so you can map each output item back to your submitted batch.
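
Once the dataset items are downloaded, the filtering in step 4 might look like the sketch below. The `triage` helper and the sample rows are hypothetical, and the full ordering of risk levels is an assumption: this README names `none` and `critical` as the endpoints and shows `low` and `medium` in examples, while `high` is inferred.

```python
# Assumed ordering of risk levels, lowest to highest.
RISK_ORDER = ["none", "low", "medium", "high", "critical"]


def triage(items: list, min_level: str = "medium") -> list:
    """Keep rows at or above min_level, ordered by their position in the input."""
    threshold = RISK_ORDER.index(min_level)
    flagged = [
        item for item in items
        if RISK_ORDER.index(item["riskLevel"]) >= threshold
    ]
    return sorted(flagged, key=lambda item: item["inputIndex"])


# Hypothetical rows containing only the fields used here.
rows = [
    {"inputIndex": 3, "riskLevel": "none", "issueCount": 0},
    {"inputIndex": 2, "riskLevel": "medium", "issueCount": 1},
    {"inputIndex": 1, "riskLevel": "low", "issueCount": 1},
]

for row in triage(rows, min_level="low"):
    print(row["inputIndex"], row["riskLevel"])
```

Sorting by `inputIndex` keeps the flagged rows aligned with the order of the submitted batch.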

### 🧾 Input example

```json
{
	"texts": [
		"Hello​ World",
		"pаypal.com",
		"Normal clean text"
	]
}
````

### 📤 Output example

```json
{
	"inputIndex": 1,
	"originalText": "Hello​ World",
	"textPreview": "Hello World",
	"cleanedText": "Hello World",
	"characterCount": 12,
	"codePointCount": 12,
	"codeUnitLength": 12,
	"issueCount": 1,
	"suspiciousContent": true,
	"riskLevel": "low",
	"issues": [
		{
			"type": "invisible_format",
			"severity": "low",
			"position": 5,
			"codeUnitIndex": 5,
			"character": "​",
			"codePoint": "U+200B",
			"decimalCodePoint": 8203,
			"unicodeName": "ZERO WIDTH SPACE",
			"unicodeCategory": "Cf",
			"unicodeCategoryName": "Format character",
			"description": "Invisible or format character can affect matching, searching, copy-paste, or display.",
			"recommendation": "Remove when this text should be plain visible text.",
			"context": {
				"before": "Hello",
				"after": " World"
			}
		}
	],
	"issueTypeCounts": {
		"invisible_format": 1,
		"bidi_control": 0,
		"control_character": 0,
		"homoglyph_confusable": 0
	},
	"unicodeCategoryCounts": {
		"Lu": 2,
		"Ll": 8,
		"Cf": 1,
		"Zs": 1
	},
	"batchSummary": {
		"totalTexts": 3,
		"suspiciousTexts": 2,
		"cleanTexts": 1,
		"totalIssues": 2,
		"highestRiskLevel": "medium",
		"issueTypeCounts": {
			"invisible_format": 1,
			"bidi_control": 0,
			"control_character": 0,
			"homoglyph_confusable": 1
		}
	},
	"analyzedAt": "2026-06-15T00:00:00.000Z"
}
```

### 🎯 Common use cases

- Find hidden copy-paste characters in product titles, slugs, names, and search keywords.
- Catch bidi controls before text enters source code, review queues, support tools, or documentation.
- Detect homoglyphs in domain-like strings, usernames, brand terms, and moderation inputs.
- Clean text before importing it into a CRM, database, spreadsheet, search index, or analytics pipeline.
- Build a scheduled Unicode quality gate for user-generated text, scraped text, or submitted forms.
- Export issue evidence for security review, data QA, or moderation workflows.

### 💳 Pricing

This Actor uses pay-per-event pricing. You are charged once per submitted text string that is inspected and saved as an output item.

The current event prices are:

- FREE: `$0.60` per 1,000 inspected texts
- BRONZE: `$0.55` per 1,000 inspected texts
- SILVER: `$0.45` per 1,000 inspected texts
- GOLD: `$0.40` per 1,000 inspected texts
- PLATINUM: `$0.30` per 1,000 inspected texts
- DIAMOND: `$0.20` per 1,000 inspected texts

Runs that stop before saving any inspected text items do not create text-inspection charges.
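
As a quick budgeting aid, the tier prices listed above can be turned into an estimate. This is a sketch only: the `estimate_cost` helper is hypothetical, and it assumes every submitted text is inspected and saved, so each one is charged once.

```python
# Prices in USD per 1,000 inspected texts, copied from the list above.
PRICE_PER_1000 = {
    "FREE": 0.60,
    "BRONZE": 0.55,
    "SILVER": 0.45,
    "GOLD": 0.40,
    "PLATINUM": 0.30,
    "DIAMOND": 0.20,
}


def estimate_cost(text_count: int, tier: str = "FREE") -> float:
    """Estimate the charge in USD for inspecting text_count strings on a plan tier."""
    return round(text_count * PRICE_PER_1000[tier] / 1000, 4)


print(estimate_cost(25_000, "FREE"))     # 25 batches of 1,000 at $0.60
print(estimate_cost(25_000, "DIAMOND"))  # 25 batches of 1,000 at $0.20
```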

### ⚠️ Limits and notes

Unicode Text Inspector is deterministic. It does not use AI, infer malicious intent, score phishing risk, decide whether a brand is impersonated, rewrite language, or claim complete Unicode TR39 coverage across every script.

Homoglyph detection focuses on practical Latin-lookalike cases that are useful for text QA and security review. Cleaned text removes hidden, control, and bidi characters when that cleanup is mechanical. It does not replace homoglyphs with guessed intended characters.
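
One simple way to approximate a Latin-lookalike check is to flag Cyrillic, Greek, or fullwidth letters that appear in a string which also contains Latin letters. The sketch below does that with Unicode character names. It is an illustrative heuristic rather than the Actor's confusables logic, and the `find_mixed_script_letters` helper and the prefix list are assumptions.

```python
import unicodedata

# Assumed set of scripts and forms treated as suspicious next to Latin letters.
SUSPECT_PREFIXES = ("CYRILLIC", "GREEK", "FULLWIDTH")


def find_mixed_script_letters(text: str) -> list:
    """Report suspect letters only when the text also contains Latin letters."""
    letters = [
        (position, ch, unicodedata.name(ch, ""))
        for position, ch in enumerate(text)
        if ch.isalpha()
    ]
    if not any(name.startswith("LATIN") for _, _, name in letters):
        return []  # purely non-Latin text is not a lookalike case
    return [
        {
            "position": position,
            "codePoint": f"U+{ord(ch):04X}",
            "unicodeName": name,
        }
        for position, ch, name in letters
        if name.startswith(SUSPECT_PREFIXES)
    ]


# "p\u0430ypal.com" uses CYRILLIC SMALL LETTER A in place of the Latin "a".
print(find_mixed_script_letters("p\u0430ypal.com"))
```

Like the Actor's cleaned text, this sketch only reports the lookalike and its position; it makes no guess about which character was intended.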

### ❓ FAQ

#### 🌐 Does this Actor scrape websites?

No. It only inspects text strings that you provide. It does not fetch URLs, crawl pages, use a proxy, or call external APIs.

#### 🔌 Can I use it from the Apify API?

Yes. Submit `texts` as an array of strings and read one output item per inspected text from the dataset.
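
For projects that do not use a client library, a direct REST call might look like the following sketch. The endpoint path and the `token` query parameter come from the OpenAPI specification in this document; the `build_request` and `inspect` helper names are hypothetical, and the response is assumed to be a JSON array of dataset items.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/maximedupre~unicode-text-inspector/run-sync-get-dataset-items"


def build_request(texts: list, token: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    query = urllib.parse.urlencode({"token": token})
    body = json.dumps({"texts": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}{ACTOR_PATH}?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def inspect(texts: list, token: str) -> list:
    """Run the Actor synchronously and return the parsed dataset items."""
    with urllib.request.urlopen(build_request(texts, token)) as response:
        return json.load(response)


# Usage (requires a real token and network access):
# rows = inspect(["Hello\u200b World", "Normal clean text"], "<YOUR_API_TOKEN>")
# for row in rows:
#     print(row["inputIndex"], row["riskLevel"])
```

Separating `build_request` from `inspect` makes it possible to check the URL and payload without starting a paid run.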

#### 🧹 Does cleaned text change what I wrote?

Cleaned text removes flagged invisible, control, and bidi characters when that can be done mechanically. It does not rewrite words, translate text, or replace homoglyphs with guessed characters.
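
A mechanical cleanup of this kind can be approximated locally by dropping characters in the Unicode format (`Cf`) and control (`Cc`) categories and leaving every other character untouched. The sketch below is an assumption about the general approach, not the Actor's exact rules: the `clean_text` helper is hypothetical, and keeping tabs and line breaks is a choice made for this example.

```python
import unicodedata

# Assumption for this sketch: ordinary whitespace controls are kept.
KEEP = {"\t", "\n", "\r"}


def clean_text(text: str) -> str:
    """Remove format and control characters without altering visible letters."""
    return "".join(
        ch for ch in text
        if ch in KEEP or unicodedata.category(ch) not in {"Cf", "Cc"}
    )


print(clean_text("Hello\u200b World"))             # zero-width space removed
print(clean_text("p\u0430ypal.com") == "p\u0430ypal.com")  # homoglyph left as-is
```

Removing every `Cf` character also strips zero-width joiners inside emoji sequences, so a production filter would likely need exceptions for text where those joiners are intentional.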

#### ✅ Why are there no detection toggles?

All detection checks are local and useful. Keeping them on gives a more complete audit without changing the price per inspected text.

### 📝 Changelog

- 0.1: Initial release.

### 🆘 Support

For issues, questions, or feature requests, [file a ticket](https://console.apify.com/actors/maximedupre~unicode-text-inspector/issues) and I'll fix or implement it in less than 24h 🫡

### 🔗 Other Actors

- [Email MX Verifier ↗](https://apify.com/maximedupre/email-mx-verifier) - Check email syntax, MX records, disposable domains, and list-cleaning risk signals.
- [SMTP Email Verifier ↗](https://apify.com/maximedupre/smtp-email-verifier) - Verify email addresses with DNS, SMTP, catch-all, and deliverability evidence.
- [Website Emails Scraper ↗](https://apify.com/maximedupre/website-emails-scraper) - Extract contact emails from public websites for CRM and outreach workflows.
- [Font Detector ↗](https://apify.com/maximedupre/font-detector) - Audit website fonts, Google Fonts, Adobe Fonts, and CSS font evidence from public pages.
- [Gmail Username Checker ↗](https://apify.com/maximedupre/gmail-username-checker) - Check Gmail username availability in bulk for launch and account-name planning.

**Made with ❤️ by Maxime Dupré**

# Actor input Schema

## `texts` (type: `array`):

Enter one text string per line. Each string gets its own Unicode inspection row.

## Actor input object example

```json
{
  "texts": [
    "Hello​ World",
    "pаypal.com",
    "Normal clean text"
  ]
}
```

# Actor output Schema

## `results` (type: `string`):

Open the dataset to view Unicode text inspection results.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "texts": [
        "Hello​ World",
        "pаypal.com",
        "Normal clean text"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("maximedupre/unicode-text-inspector").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "texts": [
        "Hello​ World",
        "pаypal.com",
        "Normal clean text",
    ] }

# Run the Actor and wait for it to finish
run = client.actor("maximedupre/unicode-text-inspector").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "texts": [
    "Hello​ World",
    "pаypal.com",
    "Normal clean text"
  ]
}' |
apify call maximedupre/unicode-text-inspector --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=maximedupre/unicode-text-inspector",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Unicode Text Inspector",
        "description": "Inspect pasted text for hidden Unicode characters, zero-width spaces, bidi controls, control characters, and homoglyphs. Get risk levels, issue evidence, category counts, cleaned text, and batch summaries.",
        "version": "0.1",
        "x-build-id": "NGUmQAo4LhMLBBvfL"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/maximedupre~unicode-text-inspector/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-maximedupre-unicode-text-inspector",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/maximedupre~unicode-text-inspector/runs": {
            "post": {
                "operationId": "runs-sync-maximedupre-unicode-text-inspector",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/maximedupre~unicode-text-inspector/run-sync": {
            "post": {
                "operationId": "run-sync-maximedupre-unicode-text-inspector",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "texts"
                ],
                "properties": {
                    "texts": {
                        "title": "Texts to inspect",
                        "minItems": 1,
                        "maxItems": 100000,
                        "type": "array",
                        "description": "Enter one text string per line. Each string gets its own Unicode inspection row.",
                        "items": {
                            "type": "string",
                            "minLength": 1
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
