Pricing

from $0.01 / actor invocation

Regex Helper

Apply named regular expressions to a list of strings and extract structured matches. Handy for contact info extraction and other text-processing workflows.

Rating

0.0

(0)

Developer

R.L.

Last modified

a month ago

Why use Regex Helper?

Drop-in pipeline step — read input directly from any upstream Actor's dataset and post-process scraped text without writing a single line of glue code.
Reusable extraction step — define your patterns once and run them over thousands of strings.
No code required — paste strings and regexes into the input form, hit run, download results.
Structured output — every match comes with its value, position, and capture groups, grouped by the pattern name you chose.
Works out of the box — omit the patterns and Regex Helper falls back to a built-in contact-extraction set (email, phone, url, linkedin), so a chained run needs almost no configuration.

A typical use case is contact information extraction: chain it after a scraper, and get back a tidy record of emails, phones, and URLs for every page that was crawled.

How to use Regex Helper

Standalone

Open the Input tab.
Add the strings you want to process (one per line).
Add your named regular expressions — each needs a name and a regex (Python re syntax), with optional flags.
Click Start and download the results from the Output tab.

Pipeline mode — chain it after another Actor

Instead of pasting strings, you can have Regex Helper read its input straight from another Actor's output dataset:

Run any Actor that produces a dataset of text (a scraper, RAG Web Browser, an LLM Actor, etc.).
Start Regex Helper with inputDatasetId set to that run's dataset ID. When set, it overrides the inline strings list.
Set textField to the dataset field that holds the text to match against (defaults to markdown, the field RAG Web Browser produces).
Leave patterns empty to use the built-in contact-extraction set, or supply your own.

The cleanest way to wire this up is a webhook / integration on the upstream Actor: on run success, start Regex Helper and map the upstream dataset into inputDatasetId. For example, in an upstream Actor's Integrations → Webhook configuration, set the payload so that:

{
    "inputDatasetId": "{{resource.defaultDatasetId}}",
    "textField": "markdown"
}

Now every time the upstream run finishes, Regex Helper automatically extracts matches from its output — a native, two-step Apify pipeline with no code in between. You can chain it the same way via the API, Apify CLI, scheduled tasks, or any of the supported integrations.
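The same chain can also be driven from a script through the API. The sketch below assumes the `apify-client` Python package is installed. The function names, the Actor IDs passed in, and the upstream input are placeholders for illustration, not values taken from this page.

```python
from typing import Optional


def build_regex_helper_input(dataset_id: str, text_field: str = "markdown",
                             patterns: Optional[list] = None) -> dict:
    """Build a pipeline-mode input for Regex Helper from an upstream dataset ID."""
    run_input: dict = {"inputDatasetId": dataset_id, "textField": text_field}
    if patterns:
        # Leaving patterns out falls back to the built-in contact-extraction set.
        run_input["patterns"] = patterns
    return run_input


def chain_after(upstream_actor_id: str, upstream_input: dict,
                regex_helper_actor_id: str, token: str) -> list:
    """Run an upstream Actor, then run Regex Helper on its default dataset."""
    # Imported lazily so the input builder above works without the package.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    upstream_run = client.actor(upstream_actor_id).call(run_input=upstream_input)
    helper_input = build_regex_helper_input(upstream_run["defaultDatasetId"])
    helper_run = client.actor(regex_helper_actor_id).call(run_input=helper_input)
    # Each item is one Regex Helper output record.
    return list(client.dataset(helper_run["defaultDatasetId"]).iterate_items())
```

Only `build_regex_helper_input` runs offline; `chain_after` needs a valid API token and the real IDs of both Actors.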

Input

All fields are optional, but you must supply text to process one way or another — either inline strings or an inputDatasetId (pipeline mode).

Field	Type	Description
`strings`	array of strings	The free-form strings to process. Each one produces a single output record. Ignored when `inputDatasetId` is set.
`inputDatasetId`	string	Pipeline mode. ID of an upstream dataset to read input strings from. When set, each item's `textField` is matched against the patterns, overriding `strings`. Map `{{resource.defaultDatasetId}}` from an upstream run to chain Actors.
`textField`	string	When reading from `inputDatasetId`, the dataset item field whose value is matched. Defaults to `markdown` (the field produced by RAG Web Browser). Items without the field are skipped.
`patterns`	array of objects	The named regexes to apply. Each item is `{ "name", "regex", "flags" }`. If omitted, a default contact-extraction set (`email`, `phone`, `url`, `linkedin`) is used.
`firstMatchOnly`	boolean	If `true`, only the first match of each pattern per string is returned. Defaults to `false` (all matches).
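To make the `textField` behaviour concrete, here is a rough sketch of how text might be picked out of upstream dataset items. It is an illustration only, not the Actor's source: `iter_texts` is a hypothetical helper, and treating non-string values as missing is an assumption.

```python
from typing import Iterable, Iterator


def iter_texts(items: Iterable[dict], text_field: str = "markdown") -> Iterator[str]:
    """Yield the text to match from each dataset item, skipping items without the field."""
    for position, item in enumerate(items):
        value = item.get(text_field)
        if not isinstance(value, str):
            # Assumption: a missing or non-string field counts as "without the field".
            print(f"Skipping item {position}: no usable '{text_field}' field")
            continue
        yield value


items = [
    {"markdown": "Contact: a@example.com"},
    {"text": "this item uses a different field"},
    {"markdown": "Call 020 7946 0958"},
]
texts = list(iter_texts(items))  # the second item is skipped
```

If most items are skipped, `textField` probably does not match the field the upstream Actor writes.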

Each entry in patterns accepts:

name (string, required) — used as the key under matches in the output. Must be unique.
regex (string, required) — a Python re pattern.
flags (string, optional) — any combination of i (ignore case), m (multiline), s (dotall), x (verbose), a (ASCII).
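The flag letters correspond to constants in Python's `re` module. A minimal sketch of how a pattern entry could be compiled is shown below. `FLAG_MAP` and `compile_pattern` are illustrative names, and rejecting unknown letters is an assumption about how such input would be handled.

```python
import re

# The single-letter flags listed above, mapped to Python's re constants.
FLAG_MAP = {
    "i": re.IGNORECASE,
    "m": re.MULTILINE,
    "s": re.DOTALL,
    "x": re.VERBOSE,
    "a": re.ASCII,
}


def compile_pattern(entry: dict) -> re.Pattern:
    """Compile one {"name", "regex", "flags"} entry into a regex object."""
    combined = 0
    for letter in entry.get("flags", ""):
        if letter not in FLAG_MAP:
            # Assumption: an unknown flag letter is treated as an error.
            raise ValueError(f"Unknown flag {letter!r} in pattern {entry.get('name')!r}")
        combined |= FLAG_MAP[letter]
    return re.compile(entry["regex"], combined)


case_insensitive = compile_pattern({"name": "greeting", "regex": "hello", "flags": "i"})
case_sensitive = compile_pattern({"name": "greeting", "regex": "hello"})
```

Flags combine, so `"flags": "im"` would be both case-insensitive and multiline.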

Example input

{
    "strings": [
        "John Doe, john.doe@example.com, +1 (415) 555-0132, https://example.com",
        "Reach Jane at jane_smith@work.co.uk or call 020 7946 0958."
    ],
    "patterns": [
        { "name": "email", "regex": "[\\w.+-]+@[\\w-]+\\.[\\w.-]+", "flags": "i" },
        { "name": "phone", "regex": "\\+?\\d[\\d\\s().-]{7,}\\d" },
        { "name": "url", "regex": "https?://[^\\s]+" }
    ],
    "firstMatchOnly": false
}
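The doubled backslashes in the example above are JSON escaping, not part of the regex itself. If you build the input from Python, `json.dumps` adds them for you. The short sketch below uses the phone pattern from the example; the variable names are illustrative.

```python
import json
import re

# The phone regex as Python sees it: single backslashes in a raw string.
pattern = r"\+?\d[\d\s().-]{7,}\d"

# Serialising to JSON doubles every backslash, as in the example input.
encoded = json.dumps({"name": "phone", "regex": pattern})
print(encoded)

# Parsing the JSON restores the original pattern.
decoded = json.loads(encoded)["regex"]
match = re.search(decoded, "Reach Jane at jane_smith@work.co.uk or call 020 7946 0958.")
print(match.group(0))
```

Writing the pattern as a raw string and serialising it avoids hand-escaping mistakes.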

Example input (pipeline mode)

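Read text from an upstream run's dataset and apply the default contact-extraction patterns; no strings or patterns are needed. The {{resource.defaultDatasetId}} placeholder is meant for a webhook payload on the upstream Actor, as described under Pipeline mode. When starting Regex Helper by hand, put the actual dataset ID in its place.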

{
    "inputDatasetId": "{{resource.defaultDatasetId}}",
    "textField": "markdown"
}

Output

The Actor pushes one record per input string to the dataset. You can download the dataset in various formats such as JSON, HTML, CSV, or Excel.

{
    "index": 0,
    "input": "John Doe, john.doe@example.com, +1 (415) 555-0132, https://example.com",
    "matchCount": 3,
    "matches": {
        "email": [
            { "value": "john.doe@example.com", "start": 10, "end": 30, "groups": [], "namedGroups": {} }
        ],
        "phone": [
            { "value": "+1 (415) 555-0132", "start": 32, "end": 49, "groups": [], "namedGroups": {} }
        ],
        "url": [
            { "value": "https://example.com", "start": 51, "end": 70, "groups": [], "namedGroups": {} }
        ]
    }
}
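As a rough illustration of how a record like the one above can be produced with Python's `re` module, consider the sketch below. It is not the Actor's source code. `build_record` and `PATTERNS` are hypothetical names, and using `groups()` and `groupdict()` for the two group fields is an assumption.

```python
import re
from typing import Dict


def build_record(index: int, text: str, patterns: Dict[str, re.Pattern],
                 first_match_only: bool = False) -> dict:
    """Apply every named pattern to one string and collect the matches."""
    matches = {}
    match_count = 0
    for name, compiled in patterns.items():
        found = []
        for m in compiled.finditer(text):
            found.append({
                "value": m.group(0),              # full match
                "start": m.start(),               # character offsets in text
                "end": m.end(),
                "groups": list(m.groups()),       # assumption: all capture groups
                "namedGroups": m.groupdict(),     # assumption: named groups only
            })
            if first_match_only:
                break
        matches[name] = found
        match_count += len(found)
    return {"index": index, "input": text, "matchCount": match_count, "matches": matches}


# The three patterns from the example input.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+", re.IGNORECASE),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "url": re.compile(r"https?://[^\s]+"),
}
record = build_record(
    0, "John Doe, john.doe@example.com, +1 (415) 555-0132, https://example.com", PATTERNS
)
```

A string with no hits still yields a record, with an empty list under each pattern name and a `matchCount` of 0.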

Data table

Field	Description
`index`	Zero-based position of the string in the input list.
`input`	The original input string.
`matchCount`	Total number of matches found across all patterns for this string.
`matches`	Object keyed by pattern name; each value is a list of match objects.
`matches.<name>[].value`	The matched text (full match, group 0).
`matches.<name>[].start` / `end`	Character offsets of the match within the input string.
`matches.<name>[].groups`	Numbered capture groups, if the pattern defines any.
`matches.<name>[].namedGroups`	Named capture groups, e.g. `(?P<area>\d{3})`.
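The difference between numbered and named groups is easiest to see in plain Python. The phone pattern below is made up for this illustration. How the Actor fills `groups` and `namedGroups` is assumed to mirror `groups()` and `groupdict()`.

```python
import re

# One named group (area) followed by two unnamed, numbered groups.
phone = re.compile(r"\((?P<area>\d{3})\)\s*(\d{3})-(\d{4})")
m = phone.search("+1 (415) 555-0132")

value = m.group(0)             # the full match, group 0
start, end = m.start(), m.end()
numbered = list(m.groups())    # every capture group, in order
named = m.groupdict()          # only the groups that have a name

print(value, start, end, numbered, named)
```

In Python a named group also has a number, so `area` shows up in both `numbered` and `named` here.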

Cost estimation

This Actor does plain in-memory text processing — no proxies, no browsers — so it's cheap to run. Cost scales with the number and length of input strings and the complexity of your regexes. Most small-to-medium batches comfortably fit within the Apify free tier.

Tips

Use named capture groups ((?P<name>...)) to pull sub-parts of a match into namedGroups.
Enable firstMatchOnly when you only expect one hit per pattern (e.g. a single primary email) to keep output compact.
Remember to escape backslashes when writing regexes in JSON (\\d, \\w, ...).
Patterns are validated before processing — an invalid regex or a duplicate pattern name fails the run early with a clear message.
In pipeline mode, double-check that textField matches the field your upstream Actor actually writes (e.g. text, markdown, html); items lacking that field are skipped and logged.
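The early validation mentioned in the tips can be approximated locally before you start a run. The check below is a sketch under assumptions: `validate_patterns` is a hypothetical helper and the error messages are invented, so the Actor's own wording will differ.

```python
import re
from typing import List


def validate_patterns(patterns: List[dict]) -> None:
    """Raise ValueError on a missing field, a duplicate name, or an invalid regex."""
    seen = set()
    for position, entry in enumerate(patterns):
        name = entry.get("name")
        regex = entry.get("regex")
        if not name or not regex:
            raise ValueError(f"Pattern #{position} needs both 'name' and 'regex'")
        if name in seen:
            raise ValueError(f"Duplicate pattern name: {name!r}")
        seen.add(name)
        try:
            re.compile(regex)
        except re.error as exc:
            raise ValueError(f"Pattern {name!r} is not a valid regex: {exc}") from exc


# A valid pattern list passes silently.
validate_patterns([
    {"name": "email", "regex": r"[\w.+-]+@[\w-]+\.[\w.-]+"},
    {"name": "url", "regex": r"https?://[^\s]+"},
])
```

Running a check like this on your patterns first catches mistakes before a run is started.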

FAQ and support

Which regex dialect is used? Python's built-in re module.

What happens if a string has no matches? You still get a record for it, with empty match lists and matchCount: 0.

How do I chain this after my scraper? Use pipeline mode: set inputDatasetId to the scraper's output dataset (e.g. {{resource.defaultDatasetId}} in a webhook payload) and textField to the field holding the text. See Pipeline mode above.

Disclaimer: Make sure you have the right to process any text and personal data (such as emails or phone numbers) you pass through this Actor, in line with applicable laws and the source's terms of service.

Found a bug or have a feature request? Open an issue on the Actor's Issues tab.
