Regex Helper
Pricing
from $0.01 / actor invocation
Apply named regular expressions to a list of strings and extract structured matches. Handy for contact info extraction and other text-processing workflows.
Developer
R.L.
Maintained by Community
5 days ago
Last modified
Regex Helper applies a list of named regular expressions to a list of free-form strings and returns structured, ready-to-use matches. It's a small building block for plugging regex-based extraction — contact info, IDs, URLs, SKUs, hashtags, anything — into a larger data-processing workflow on the Apify platform.
Because it runs as an Actor, you get an HTTP API, scheduling, and integrations (Make, Zapier, n8n, Google Drive, ...) for free — so you can chain it after a scraper that produces raw text and before whatever consumes the cleaned-up fields.
Built to be chained. Regex Helper has a dedicated pipeline mode: point it at another Actor's output dataset and it reads the text straight from there — no glue code, no manual export/import. Combined with sensible defaults, this lets you drop it into a two-step Apify pipeline (for example RAG Web Browser → Regex Helper) where an upstream run finishes, fires a webhook, and Regex Helper extracts contacts from every page it scraped.
Why use Regex Helper?
- Drop-in pipeline step — read input directly from any upstream Actor's dataset and post-process scraped text without writing a single line of glue code.
- Reusable extraction step — define your patterns once and run them over thousands of strings.
- No code required — paste strings and regexes into the input form, hit run, download results.
- Structured output — every match comes with its value, position, and capture groups, grouped by the pattern name you chose.
- Works out of the box: omit the patterns and Regex Helper falls back to a built-in contact-extraction set (`email`, `phone`, `url`, `linkedin`), so a chained run needs almost no configuration.
A typical use case is contact information extraction: chain it after a scraper, and get back a tidy record of emails, phones, and URLs for every page that was crawled.
How to use Regex Helper
Standalone
- Open the Input tab.
- Add the strings you want to process (one per line).
- Add your named regular expressions. Each needs a `name` and a `regex` (Python `re` syntax), with optional `flags`.
- Click Start and download the results from the Output tab.
Pipeline mode — chain it after another Actor
Instead of pasting strings, you can have Regex Helper read its input straight from another Actor's output dataset:
- Run any Actor that produces a dataset of text (a scraper, RAG Web Browser, an LLM Actor, etc.).
- Start Regex Helper with `inputDatasetId` set to that run's dataset ID. When set, it overrides the inline `strings` list.
- Set `textField` to the dataset field that holds the text to match against (defaults to `markdown`, the field RAG Web Browser produces).
- Leave `patterns` empty to use the built-in contact-extraction set, or supply your own.
The cleanest way to wire this up is a webhook / integration on the upstream Actor: on run success, start Regex Helper and map the upstream dataset into inputDatasetId. For example, in an upstream Actor's Integrations → Webhook configuration, set the payload so that:
    {
      "inputDatasetId": "{{resource.defaultDatasetId}}",
      "textField": "markdown"
    }
Now every time the upstream run finishes, Regex Helper automatically extracts matches from its output — a native, two-step Apify pipeline with no code in between. You can chain it the same way via the API, Apify CLI, scheduled tasks, or any of the supported integrations.
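As a rough illustration of chaining via the API, the sketch below builds the HTTP request that would start a Regex Helper run against an upstream dataset. It uses only the standard library and the Apify API's run-Actor endpoint (`POST /v2/acts/{actorId}/runs`). The Actor ID `username~regex-helper`, the dataset ID, and the token are placeholders, not values taken from this page, and the function name `build_run_request` is made up for the example.

```python
import json
import urllib.request

# Hypothetical placeholder: replace with the real Actor ID of Regex Helper.
ACTOR_ID = "username~regex-helper"
API_BASE = "https://api.apify.com/v2"


def build_run_request(dataset_id: str, token: str,
                      text_field: str = "markdown") -> urllib.request.Request:
    """Build (but do not send) a POST request that starts a pipeline-mode run."""
    run_input = {"inputDatasetId": dataset_id, "textField": text_field}
    return urllib.request.Request(
        url=f"{API_BASE}/acts/{ACTOR_ID}/runs",
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder dataset ID and token, for illustration only.
    request = build_run_request("UPSTREAM_DATASET_ID", "YOUR_APIFY_TOKEN")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To actually start the run, send the request:
    # with urllib.request.urlopen(request) as response:
    #     print(json.load(response)["data"]["id"])
```

Building the request separately from sending it keeps the example runnable without network access or credentials. In a real pipeline, the official Apify client library would likely be a more convenient choice.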
Input
All fields are optional, but you must supply text to process one way or another — either inline strings or an inputDatasetId (pipeline mode).
| Field | Type | Description |
|---|---|---|
| `strings` | array of strings | The free-form strings to process. Each one produces a single output record. Ignored when `inputDatasetId` is set. |
| `inputDatasetId` | string | Pipeline mode. ID of an upstream dataset to read input strings from. When set, each item's `textField` is matched against the patterns, overriding `strings`. Map `{{resource.defaultDatasetId}}` from an upstream run to chain Actors. |
| `textField` | string | When reading from `inputDatasetId`, the dataset item field whose value is matched. Defaults to `markdown` (the field produced by RAG Web Browser). Items without the field are skipped. |
| `patterns` | array of objects | The named regexes to apply. Each item is `{ "name", "regex", "flags" }`. If omitted, a default contact-extraction set (`email`, `phone`, `url`, `linkedin`) is used. |
| `firstMatchOnly` | boolean | If `true`, only the first match of each pattern per string is returned. Defaults to `false` (all matches). |
Each entry in patterns accepts:
- `name` (string, required): used as the key under `matches` in the output. Must be unique.
- `regex` (string, required): a Python `re` pattern.
- `flags` (string, optional): any combination of `i` (ignore case), `m` (multiline), `s` (dotall), `x` (verbose), `a` (ASCII).
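The flag letters listed above correspond to constants in Python's `re` module. The following is a minimal sketch of how pattern entries might be validated and compiled. It is not the Actor's actual source: the names `FLAG_MAP` and `compile_patterns` are hypothetical, and the error messages are illustrative.

```python
import re

# Flag letters from the input schema, mapped to Python's re constants.
FLAG_MAP = {
    "i": re.IGNORECASE,
    "m": re.MULTILINE,
    "s": re.DOTALL,
    "x": re.VERBOSE,
    "a": re.ASCII,
}


def compile_patterns(patterns: list[dict]) -> dict[str, re.Pattern]:
    """Validate and compile pattern entries, failing early on bad input."""
    compiled: dict[str, re.Pattern] = {}
    for entry in patterns:
        name = entry["name"]
        if name in compiled:
            raise ValueError(f"Duplicate pattern name: {name!r}")
        flags = 0
        for letter in entry.get("flags") or "":
            if letter not in FLAG_MAP:
                raise ValueError(f"Unknown flag {letter!r} in pattern {name!r}")
            flags |= FLAG_MAP[letter]
        try:
            compiled[name] = re.compile(entry["regex"], flags)
        except re.error as exc:
            raise ValueError(f"Invalid regex for pattern {name!r}: {exc}") from exc
    return compiled


compiled = compile_patterns([
    {"name": "email", "regex": r"[\w.+-]+@[\w-]+\.[\w.-]+", "flags": "i"},
    {"name": "url", "regex": r"https?://[^\s]+"},
])
print(sorted(compiled))  # ['email', 'url']
```

Compiling everything before touching any input string means a typo in one pattern surfaces immediately rather than partway through a large batch.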
Example input
    {
      "strings": [
        "John Doe, john.doe@example.com, +1 (415) 555-0132, https://example.com",
        "Reach Jane at jane_smith@work.co.uk or call 020 7946 0958."
      ],
      "patterns": [
        { "name": "email", "regex": "[\\w.+-]+@[\\w-]+\\.[\\w.-]+", "flags": "i" },
        { "name": "phone", "regex": "\\+?\\d[\\d\\s().-]{7,}\\d" },
        { "name": "url", "regex": "https?://[^\\s]+" }
      ],
      "firstMatchOnly": false
    }
Example input (pipeline mode)
Read text from an upstream run's dataset and apply the default contact-extraction patterns — no strings or patterns needed:
    {
      "inputDatasetId": "{{resource.defaultDatasetId}}",
      "textField": "markdown"
    }
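To make the `textField` behavior concrete, here is a small sketch of how text might be pulled out of upstream dataset items, skipping any item that lacks the field. The function name `texts_from_items` and the sample items are invented for illustration. How the Actor treats non-string values is not documented here, so the sketch only handles the missing-field case.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("regex-helper-sketch")


def texts_from_items(items: list[dict], text_field: str = "markdown") -> list[str]:
    """Collect the text_field value of each item; skip and log items without it."""
    texts = []
    for position, item in enumerate(items):
        if text_field not in item:
            log.info("Skipping item %d: no %r field", position, text_field)
            continue
        texts.append(item[text_field])
    return texts


# Made-up dataset items standing in for an upstream Actor's output.
sample_items = [
    {"url": "https://example.com", "markdown": "Contact: john.doe@example.com"},
    {"url": "https://example.com/empty"},  # no "markdown" field, so it is skipped
    {"url": "https://example.com/about", "markdown": "Call 020 7946 0958"},
]
texts = texts_from_items(sample_items)
print(len(texts))  # 2
```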
Output
The Actor pushes one record per input string to the dataset. You can download the dataset in various formats such as JSON, HTML, CSV, or Excel.
    {
      "index": 0,
      "input": "John Doe, john.doe@example.com, +1 (415) 555-0132, https://example.com",
      "matchCount": 3,
      "matches": {
        "email": [
          { "value": "john.doe@example.com", "start": 10, "end": 30, "groups": [], "namedGroups": {} }
        ],
        "phone": [
          { "value": "+1 (415) 555-0132", "start": 32, "end": 49, "groups": [], "namedGroups": {} }
        ],
        "url": [
          { "value": "https://example.com", "start": 51, "end": 70, "groups": [], "namedGroups": {} }
        ]
      }
    }
Data table
| Field | Description |
|---|---|
| `index` | Zero-based position of the string in the input list. |
| `input` | The original input string. |
| `matchCount` | Total number of matches found across all patterns for this string. |
| `matches` | Object keyed by pattern name; each value is a list of match objects. |
| `matches.<name>[].value` | The matched text (full match, group 0). |
| `matches.<name>[].start` / `end` | Character offsets of the match within the input string. |
| `matches.<name>[].groups` | Numbered capture groups, if the pattern defines any. |
| `matches.<name>[].namedGroups` | Named capture groups, e.g. `(?P<area>\d{3})`. |
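The record shape described above can be approximated locally with Python's `re` module, which is handy for testing patterns before a run. This is a minimal sketch, not the Actor's actual source. The function name `build_record` is hypothetical, and it assumes `groups` corresponds to `Match.groups()` (which in Python also includes named groups) and `namedGroups` to `Match.groupdict()`.

```python
import re


def build_record(index: int, text: str, compiled: dict[str, re.Pattern],
                 first_match_only: bool = False) -> dict:
    """Apply every compiled pattern to one string and build an output-style record."""
    matches: dict[str, list[dict]] = {}
    for name, pattern in compiled.items():
        found = []
        for m in pattern.finditer(text):
            found.append({
                "value": m.group(0),          # full match (group 0)
                "start": m.start(),           # offset of the first matched character
                "end": m.end(),               # offset just past the last matched character
                "groups": list(m.groups()),   # assumption: positional view of all groups
                "namedGroups": m.groupdict(),
            })
            if first_match_only:
                break
        matches[name] = found
    return {
        "index": index,
        "input": text,
        "matchCount": sum(len(found) for found in matches.values()),
        "matches": matches,
    }


# Patterns and string taken from the example input in this document.
compiled = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+", re.IGNORECASE),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "url": re.compile(r"https?://[^\s]+"),
}
sample = "John Doe, john.doe@example.com, +1 (415) 555-0132, https://example.com"
record = build_record(0, sample, compiled)
print(record["matchCount"])  # 3
```

A string with no hits still yields a record, with an empty list under each pattern name and a `matchCount` of 0.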
Cost estimation
This Actor does plain in-memory text processing — no proxies, no browsers — so it's cheap to run. Cost scales with the number and length of input strings and the complexity of your regexes. Most small-to-medium batches comfortably fit within the Apify free tier.
Tips
- Use named capture groups (`(?P<name>...)`) to pull sub-parts of a match into `namedGroups`.
- Enable `firstMatchOnly` when you only expect one hit per pattern (e.g. a single primary email) to keep output compact.
- Remember to escape backslashes when writing regexes in JSON (`\\d`, `\\w`, ...).
- Patterns are validated before processing; an invalid regex or a duplicate pattern name fails the run early with a clear message.
- In pipeline mode, double-check that `textField` matches the field your upstream Actor actually writes (e.g. `text`, `markdown`, `html`); items lacking that field are skipped and logged.
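One way to sidestep manual backslash escaping is to write the regex as a Python raw string and let `json.dumps` produce the JSON input. The sketch below also shows what a named capture group yields with plain `re`. The pattern name `us_phone` and the sample string are invented for the example.

```python
import json
import re

# A raw string keeps the regex readable; no doubled backslashes are needed here.
pattern = {
    "name": "us_phone",  # hypothetical pattern name
    "regex": r"\((?P<area>\d{3})\)\s*(?P<number>\d{3}-\d{4})",
}
actor_input = {"strings": ["Call (415) 555-0132 today"], "patterns": [pattern]}

# json.dumps doubles each backslash, which is the form JSON requires.
serialized = json.dumps(actor_input, indent=2)
print(serialized)

# Locally, the named groups come back as a dict keyed by group name.
match = re.search(pattern["regex"], actor_input["strings"][0])
print(match.groupdict())  # {'area': '415', 'number': '555-0132'}
```

Round-tripping the serialized text through `json.loads` returns the original single-backslash regex, which is what the regex engine ultimately sees.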
FAQ and support
Which regex dialect is used? Python's built-in re module.
What happens if a string has no matches? You still get a record for it, with empty match lists and matchCount: 0.
How do I chain this after my scraper? Use pipeline mode: set inputDatasetId to the scraper's output dataset (e.g. {{resource.defaultDatasetId}} in a webhook payload) and textField to the field holding the text. See Pipeline mode above.
Disclaimer: Make sure you have the right to process any text and personal data (such as emails or phone numbers) you pass through this Actor, in line with applicable laws and the source's terms of service.
Found a bug or have a feature request? Open an issue on the Actor's Issues tab.