Superclean URLs

Pricing

from $0.35 / 1,000 results

Clean messy URLs from lead exports. Remove 60+ tracking parameters (utm_*, fbclid, gclid), normalize format, extract domains, and optionally verify URLs are reachable. Perfect for cold email personalization and CRM data hygiene.

Developer

Superlative

Maintained by Community

Clean messy URLs from lead exports. Remove tracking parameters, normalize format, and extract domains.

What does Superclean URLs do?

Superclean URLs normalizes URLs from lead lists, CRM exports, and scraped web data using rule-based parsing (no AI/LLM required).

  • Removes tracking parameters — Strip UTM, fbclid, gclid, and 50+ other tracking params
  • Normalizes format — Consistent protocol, lowercase domains, clean paths
  • Extracts domains — Pull clean domain names from full URLs
  • Validates URLs — Identify and flag invalid URL formats
  • Fixes missing protocols — Adds https:// to bare domains
  • Instant API mode — Sub-second single-URL cleaning via Standby HTTP server

Use with AI Agents

Available as an MCP tool via the Apify MCP Server.

| Property | Value |
|---|---|
| Actor ID | superlativetech/superclean-urls |
| Standby URL | https://superlativetech--superclean-urls.apify.actor |
| Input | items (string[]) or item (string) |
| Output | {id, input, output, domain, protocol, path, query, hash, valid, confidence} per item |
| Pricing | $0.50 per 1,000 items |
| Idempotent | Yes: the same input always produces the same output |

Output schema

{ "id": 1, "input": "https://example.com/page?utm_source=google", "output": "https://example.com/page", "domain": "example.com", "valid": true, "confidence": 0.9 }

Pipeline composability

This Actor fits into data-cleaning pipelines:

  1. Scrape → 2. Clean (this actor) → 3. Enrich (DNS/WHOIS) → 4. Score (ICP Scorer)

Standby (instant API)

GET https://superlativetech--superclean-urls.apify.actor?token=YOUR_API_TOKEN&input=https://example.com%3Futm_source%3Dgoogle

What else can Superclean do?

If you're cleaning lead data, you might also need:

Why clean URLs?

Your lead data comes with messy, tracking-laden URLs:

Clean data means better:

  • Cold email personalization — Clean company URLs for email templates
  • Lead enrichment — Normalize URLs from scraped or imported lead lists
  • Data hygiene — Remove tracking params before storing in CRM
  • Domain extraction — Pull domains for company research or deduplication

How to use Superclean URLs

  1. Paste your URLs into the input field (one per line)
  2. Select your output style (Full or Domain)
  3. Click Start and download your cleaned results

Output styles

| Style | Best for | Example input | Example output |
|---|---|---|---|
| Full | Cleaned URLs | https://example.com/?utm_source=x | https://example.com |
| Domain | Domain extraction | https://www.example.com/about | example.com |

Full (default)

Complete cleaned URL with tracking removed, protocol normalized, and format standardized.

Domain

Just the registrable domain (e.g., example.com). Useful for deduplication or company matching.
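
Finding the registrable domain is harder than stripping `www.`, because suffixes such as `co.uk` span more than one label. The sketch below illustrates the idea using a tiny hard-coded suffix set. `registrable_domain` and `MULTI_LABEL_SUFFIXES` are hypothetical names. A real implementation would consult the full Public Suffix List (for example, through the `tldextract` package), and this sketch does not necessarily match how the Actor works.

```python
from urllib.parse import urlsplit

# Tiny illustrative sample of multi-label public suffixes (an assumption for
# this sketch). A real implementation needs the full Public Suffix List.
MULTI_LABEL_SUFFIXES = {"co.uk", "com.au", "co.jp"}


def registrable_domain(url):
    """Return the registrable domain of a URL, or "" if there is none."""
    if "://" not in url:
        url = "https://" + url  # let urlsplit find the host in bare domains
    host = (urlsplit(url).hostname or "").rstrip(".")
    labels = host.split(".")
    # Keep one extra label when the last two form a known multi-label suffix.
    n = 3 if ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES else 2
    if len(labels) < n:
        return ""  # bare suffix or single-label host: nothing registrable
    return ".".join(labels[-n:])
```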

Standby mode (instant API)

Standby mode keeps a warm container running so you get instant URL cleaning without cold-start delays. Instead of starting a full Actor run, you make a simple HTTP GET request and get results in milliseconds.

This is ideal for:

  • Clay enrichment steps — single-URL cleaning inline
  • Make / n8n HTTP modules — real-time URL normalization in workflows
  • MCP agents — AI tools that need instant URL cleaning

Standby URL

https://superlativetech--superclean-urls.apify.actor?token=YOUR_API_TOKEN

Or use a Bearer token in the Authorization header instead of the token query parameter.

Clean a URL

$ curl "https://superlativetech--superclean-urls.apify.actor?token=YOUR_API_TOKEN&input=https://example.com/%3Futm_source%3Dlinkedin%26fbclid%3Dabc123"

Extract domain only

$ curl "https://superlativetech--superclean-urls.apify.actor?token=YOUR_API_TOKEN&input=https://www.example.com/about&style=domain"

Query parameters

| Parameter | Required | Description |
|---|---|---|
| input | Yes | URL to clean |
| style | No | Output format: full (default) or domain |
| forceHttps | No | Convert http to https (default: true) |
| removeTracking | No | Remove tracking parameters (default: true) |
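
These parameters can also be assembled in code. The Python sketch below uses only the standard library to build, but not send, a Standby request. `build_standby_request` is a hypothetical helper, and the sketch assumes the server reads a disabled flag as the string `false`. Because `urlencode` percent-encodes the whole input URL, the cleaned URL's own `&` and `=` cannot be mistaken for Standby parameters.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Standby endpoint as documented in this README.
STANDBY_URL = "https://superlativetech--superclean-urls.apify.actor"


def build_standby_request(url_to_clean, token, style="full",
                          force_https=True, remove_tracking=True,
                          use_bearer=False):
    """Build (but do not send) a GET request for the Standby API."""
    params = {"input": url_to_clean, "style": style}
    # Only send the boolean flags when they differ from their documented
    # defaults (true). Assumption: the server accepts the string "false".
    if not force_https:
        params["forceHttps"] = "false"
    if not remove_tracking:
        params["removeTracking"] = "false"

    headers = {}
    if use_bearer:
        # Token travels in the Authorization header, not the query string.
        headers["Authorization"] = f"Bearer {token}"
    else:
        params["token"] = token

    # urlencode percent-encodes ":", "/", "?", "=" and "&" in the URL being
    # cleaned, so its own query string cannot bleed into the Standby query.
    return Request(f"{STANDBY_URL}?{urlencode(params)}",
                   headers=headers, method="GET")
```

Passing the request to `urllib.request.urlopen` and decoding the body with `json.load` should return a record in the response format documented for Standby mode.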

Response format

{
  "id": 1,
  "input": "https://example.com/?utm_source=linkedin",
  "output": "https://example.com",
  "domain": "example.com",
  "protocol": "https",
  "path": "",
  "query": "",
  "hash": "",
  "valid": true,
  "confidence": 0.9
}

Error responses

| Code | Cause |
|---|---|
| 400 | Missing input parameter or invalid style |
| 405 | Non-GET request |
| 500 | Unexpected server error |
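
A non-200 status from this table can be surfaced to callers as a typed error. This is a minimal sketch with hypothetical names (`StandbyError`, `parse_standby_response`). Because the error body format is not documented here, the sketch interprets only the status code. Treating only 500 as retryable is an assumption.

```python
import json

# Causes taken from the error table above.
STANDBY_ERRORS = {
    400: "missing input parameter or invalid style",
    405: "non-GET request",
    500: "unexpected server error",
}


class StandbyError(Exception):
    """A non-200 answer from the Standby API."""

    def __init__(self, status):
        self.status = status
        # Assumption: only server-side failures are worth retrying; 400 and
        # 405 point to a problem with the request itself.
        self.retryable = status >= 500
        cause = STANDBY_ERRORS.get(status, "unrecognized status")
        super().__init__(f"Standby API returned {status}: {cause}")


def parse_standby_response(status, body):
    """Return the decoded record for a 200 response, else raise StandbyError."""
    if status != 200:
        raise StandbyError(status)
    return json.loads(body)
```

With `urllib.request.urlopen`, non-2xx responses raise `urllib.error.HTTPError`, and its `.code` attribute can be passed in as `status`.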

What gets cleaned?

Tracking parameters removed

The Actor removes 60+ tracking parameters including:

  • UTM — utm_source, utm_medium, utm_campaign, utm_term, utm_content
  • Facebook — fbclid, fb_action_ids, fb_source
  • Google — gclid, gclsrc, dclid, gbraid, wbraid
  • Microsoft — msclkid
  • LinkedIn — li_fat_id, li_tc
  • Email marketing — mc_eid, mc_cid, _hsenc, _hsmi, mkt_tok
  • Analytics — _ga, _gl, ref, spm, clickid

Normalization applied

  • HTTP upgraded to HTTPS (configurable)
  • Domains lowercased
  • Trailing slashes removed
  • Empty query strings removed
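
The rules above can be approximated with Python's `urllib.parse`. This is a rough sketch, not the Actor's implementation. `clean_url`, `is_tracking`, and `HOSTNAME_RE` are hypothetical names. The tracking list covers only the parameters named above plus any `utm_*` name, the hostname check is simplified, and the sketch does not compute confidence scores.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Illustrative subset of the tracking parameters listed above (plus every
# utm_* name). The Actor's full list of 60+ names is not reproduced here.
TRACKING_PARAMS = {
    "fbclid", "fb_action_ids", "fb_source",
    "gclid", "gclsrc", "dclid", "gbraid", "wbraid",
    "msclkid", "li_fat_id", "li_tc",
    "mc_eid", "mc_cid", "_hsenc", "_hsmi", "mkt_tok",
    "_ga", "_gl", "ref", "spm", "clickid",
}
# Simplified hostname check: dot-separated labels of letters, digits, hyphens.
HOSTNAME_RE = re.compile(r"[a-z0-9-]+(\.[a-z0-9-]+)+")


def is_tracking(name):
    name = name.lower()
    return name.startswith("utm_") or name in TRACKING_PARAMS


def clean_url(raw, force_https=True, remove_tracking=True):
    """Return a normalized URL, or None if it cannot be parsed."""
    url = raw.strip()
    if "://" not in url:
        url = "https://" + url  # bare domain: add a protocol
    parts = urlsplit(url)  # urlsplit already lowercases the scheme
    if not HOSTNAME_RE.fullmatch(parts.hostname or ""):
        return None
    scheme = parts.scheme
    if force_https and scheme == "http":
        scheme = "https"
    query = parse_qsl(parts.query, keep_blank_values=True)
    if remove_tracking:
        query = [(k, v) for k, v in query if not is_tracking(k)]
    # The whole netloc (host, port, any user info) is lowercased, trailing
    # slashes are dropped, and urlencode([]) yields "" so no bare "?" remains.
    return urlunsplit((scheme, parts.netloc.lower(), parts.path.rstrip("/"),
                       urlencode(query), parts.fragment))
```

Re-encoding the remaining query parameters with `urlencode` can change their percent-encoding (for example, a space becomes `+`). A production cleaner might want to preserve the original encoding instead.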

How many URLs can you clean?

There's no limit. Process as many URLs as you need — from a handful to hundreds of thousands. The Actor scales automatically.

For best performance, batch your requests. Processing 1,000 URLs at once is more efficient than 10 separate runs of 100 URLs each.
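
One way to follow this advice is to merge URL lists collected from several sources into a single run input before calling the Actor. `build_batched_input` is a hypothetical helper. It drops blank entries and exact duplicates only, and it leaves variants such as `http://` and `https://` for the Actor to normalize.

```python
def build_batched_input(*url_lists, style="full"):
    """Merge several URL lists into one run input.

    Blank entries and exact duplicates (after trimming whitespace) are
    dropped; first-seen order is preserved.
    """
    seen = set()
    items = []
    for urls in url_lists:
        for url in urls:
            url = url.strip()
            if url and url not in seen:
                seen.add(url)
                items.append(url)
    return {"items": items, "style": style}
```

The returned dict can be passed as `run_input` to `client.actor('superlativetech/superclean-urls').call(...)`, as in the Python API example. Because pricing is per result, deduplicating first presumably also avoids paying twice for the same URL.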

How much will it cost you?

This Actor uses pay-per-result pricing. Because normalization is rule-based and makes no external API calls, it costs half as much as LLM-based Actors. At the base rate of $0.50 per 1,000 URLs:

| URLs | Cost |
|---|---|
| 1,000 | $0.50 |
| 10,000 | $5.00 |
| 100,000 | $50.00 |

Volume discounts apply automatically:

  • Bronze (100+ items): $0.00045/URL
  • Silver (1,000+ items): $0.0004/URL
  • Gold (10,000+ items): $0.00035/URL
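
The arithmetic behind the table and tiers is rate × count. The sketch below uses `Decimal` to avoid floating-point rounding in currency. The rates are copied from this section. `estimate_cost` is a hypothetical helper that takes the tier name as an argument rather than deciding which tier applies to you.

```python
from decimal import Decimal

# Per-URL rates copied from the pricing section above.
RATES = {
    "base": Decimal("0.0005"),     # $0.50 per 1,000 URLs
    "bronze": Decimal("0.00045"),
    "silver": Decimal("0.0004"),
    "gold": Decimal("0.00035"),
}


def estimate_cost(num_urls, tier="base"):
    """Cost in dollars for num_urls results at the given tier, to the cent."""
    return (RATES[tier] * num_urls).quantize(Decimal("0.01"))
```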

Input parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| items | array | | List of URLs to clean (one per line in the UI, or a JSON array) |
| item | string | | Single URL to clean; API shorthand for integration callers (Clay, Make, n8n). If both item and items are provided, item is prepended to the list |
| style | string | full | Output format: full (cleaned URL) or domain (domain only) |
| forceHttps | boolean | true | Convert http:// to https:// |
| removeTracking | boolean | true | Remove tracking parameters (utm_*, fbclid, etc.) |

Input example

{
  "items": [
    "https://www.example.com/?utm_source=linkedin&fbclid=abc123",
    "http://ACME.COM/about/",
    "example.com/contact",
    "not a valid url"
  ],
  "style": "full"
}

items also accepts objects, which is useful for API and MCP integrations:

{
  "items": [
    { "input": "https://www.example.com/?utm_source=linkedin&fbclid=abc123" },
    { "input": "http://ACME.COM/about/" }
  ],
  "style": "full"
}

For API and integration callers who want to clean a single value without wrapping it in an array, use the item shorthand:

{
  "item": "https://www.example.com/?utm_source=linkedin&fbclid=abc123",
  "style": "full"
}
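
If you assemble inputs client-side, the documented `item`/`items` rules can be mirrored as shown below. `collect_urls` is a hypothetical helper, not the Actor's code. It assumes that object entries carry the URL in an `input` field, as in the example above. It raises an error on an empty result, matching the Actor's behavior of stopping when given an empty list.

```python
def collect_urls(run_input):
    """Flatten `item` and `items` into a list of URL strings.

    `item` is prepended to `items`; entries of `items` may be strings or
    objects with an "input" field.
    """
    raw = list(run_input.get("items") or [])
    if run_input.get("item"):
        raw.insert(0, run_input["item"])
    urls = []
    for entry in raw:
        if isinstance(entry, dict):
            entry = entry.get("input", "")
        if not isinstance(entry, str):
            raise TypeError(f"unsupported item: {entry!r}")
        urls.append(entry)
    if not urls:
        raise ValueError("provide at least one URL in item or items")
    return urls
```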

During the Actor run

The Actor processes URLs quickly using rule-based parsing. You'll see progress updates as items are processed.

If you provide invalid input (e.g., an empty list), the Actor will stop immediately with an error message explaining what went wrong.

Results are available in real-time — you can start downloading cleaned URLs before the full run completes.

Output format

Results are saved to the default dataset. Each cleaned URL is a separate item.

You can export results as JSON, CSV, Excel, or other formats directly from Apify Console. Or access them programmatically via the API.

Output example

[
  {
    "id": 1,
    "input": "https://www.example.com/?utm_source=linkedin&fbclid=abc123",
    "output": "https://www.example.com",
    "domain": "example.com",
    "protocol": "https",
    "path": "",
    "query": "",
    "hash": "",
    "valid": true,
    "confidence": 0.9
  },
  {
    "id": 2,
    "input": "http://ACME.COM/about/",
    "output": "https://acme.com/about",
    "domain": "acme.com",
    "protocol": "https",
    "path": "/about",
    "query": "",
    "hash": "",
    "valid": true,
    "confidence": 0.9
  },
  {
    "id": 3,
    "input": "example.com/contact",
    "output": "https://example.com/contact",
    "domain": "example.com",
    "protocol": "https",
    "path": "/contact",
    "query": "",
    "hash": "",
    "valid": true,
    "confidence": 0.7
  },
  {
    "id": 4,
    "input": "not a valid url",
    "output": "not a valid url",
    "domain": "",
    "protocol": "",
    "path": "",
    "query": "",
    "hash": "",
    "valid": false,
    "confidence": 0
  }
]

| Field | Description |
|---|---|
| id | Row number (1-based, matches Apify's displayed row numbers) |
| input | Original URL before cleaning |
| output | Cleaned result (format depends on style) |
| domain | Extracted registrable domain (e.g., example.com) |
| protocol | Protocol (http or https) |
| path | URL path (e.g., /about/contact) |
| query | Query string without ? (e.g., foo=bar&baz=1) |
| hash | Fragment/anchor without # |
| valid | Whether the URL format is valid |
| confidence | Confidence score from 0 to 1 |

Confidence scores

  • 1.0 — Valid URL, no changes needed
  • 0.9 — Valid URL, tracking removed or normalized
  • 0.7 — URL fixed (protocol added)
  • 0.3 — Partially valid (domain extracted but issues remain)
  • 0.0 — Invalid URL (couldn't parse)
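
The `valid` and `confidence` fields make it straightforward to route results. This sketch uses hypothetical `triage` and `unique_domains` helpers: it keeps valid rows at or above a threshold and sends the rest to manual review. The 0.7 default is an assumption that accepts protocol-fixed URLs but rejects partially valid ones.

```python
def triage(records, min_confidence=0.7):
    """Split dataset items into (usable, needs_review)."""
    usable, needs_review = [], []
    for record in records:
        if record.get("valid") and record.get("confidence", 0) >= min_confidence:
            usable.append(record)
        else:
            needs_review.append(record)
    return usable, needs_review


def unique_domains(records):
    """Distinct non-empty domains in first-seen order, e.g. for deduplication."""
    return list(dict.fromkeys(r["domain"] for r in records if r.get("domain")))
```

Running these helpers on the output example above would keep rows 1 to 3, flag row 4, and reduce the kept rows to two distinct domains.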

Integrations

Superclean URLs works with any tool that can call Apify Actors:

  • Clay — Add as an enrichment step in your Clay tables
  • Make — Use the Apify module to run the Actor
  • Zapier — Trigger runs and retrieve results automatically
  • n8n — Self-hosted workflow automation

You can also use webhooks to trigger actions when a run completes — for example, send a Slack notification or automatically import results into your CRM.

Using Superclean URLs with the Apify API

The Apify API gives you programmatic access to run Actors, retrieve results, and manage datasets.

Node.js:

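// Run as an ES module (e.g., a .mjs file) so top-level await is allowed
import { ApifyClient } from 'apify-client';

// Authenticate with your Apify API token
const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start the Actor and wait for the run to finish
const run = await client.actor('superlativetech/superclean-urls').call({
    items: ['https://example.com/?utm_source=linkedin', 'http://ACME.COM/about/'],
    style: 'full',
});

// Fetch the cleaned results from the run's default dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);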

Python:

from apify_client import ApifyClient

# Authenticate with your Apify API token
client = ApifyClient('YOUR_API_TOKEN')

# Start the Actor and wait for the run to finish
run = client.actor('superlativetech/superclean-urls').call(run_input={
    'items': ['https://example.com/?utm_source=linkedin', 'http://ACME.COM/about/'],
    'style': 'full',
})

# Fetch the cleaned results from the run's default dataset
items = client.dataset(run['defaultDatasetId']).list_items().items
print(items)

Check out the Apify API reference for full details, or click the API tab above for more code examples.

Your feedback

We're always improving Superclean Actors. If you have feature requests, find a bug, or need help with a specific use case, please open an issue in the Actor's Issues tab.

When Apify asks to share your run data with us, we encourage you to opt in — it's the fastest way for us to spot edge cases and improve results. Sharing is completely optional (you can toggle it anytime under Account Settings → Privacy), and shared runs are automatically deleted by Apify based on your plan's data retention period. We only use shared data to debug issues and improve this Actor.

Leave a review

If Superclean URLs saves you time or improves your lead data, please leave a review. Your feedback helps other users discover the tool and helps us understand what's working well.


Built by Superlative