DataPulse URL Extractor

Deterministic, SSRF-guarded structured-data extraction from any public URL. Returns title, meta tags, headings, links and clean text with a code-computed summary. Optional AI enrichment.

Pricing

$1.00 / 1,000 results

Developer

Ahmed Moussa

Last modified

a month ago

What it does

For each input URL the actor returns a single dataset item with:

url — the requested URL
status — completed, blocked, failed, or empty
data — structured extraction: title, meta_tags, headings, links, text (and, when AI enrichment is enabled, an ai_extracted block)
meta — code-owned and deterministic: extracted_at (real server time), method, and a summary (row_count, numeric stats). The summary is computed in code from the extracted data and is never taken from model output.

Input

{
  "url": "https://example.com",
  "urls": ["https://www.iana.org/help/example-domains"],
  "schema_hint": "company info",
  "llm_api_key": "(optional — enables AI enrichment)",
  "llm_model": "deepseek/deepseek-chat"
}

url takes a single URL and urls takes a list; both must be public http(s) URLs. If neither llm_api_key nor the OPENROUTER_API_KEY secret is set, the actor returns a fully deterministic, code-only extraction.
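
The input rules above can be sketched in a few lines of Python. This is an illustration only, not the actor's own code: the helper name normalize_input, the merging of url and urls into one de-duplicated list, and the ai_enabled flag are all assumptions made for the example.

```python
import os


def normalize_input(actor_input, env=None):
    """Collect the URLs to process and decide whether AI enrichment is on.

    Hypothetical helper: the real actor may merge or validate input differently.
    """
    env = os.environ if env is None else env

    # Assumption: the single "url" and the "urls" list are processed together.
    candidates = []
    if actor_input.get("url"):
        candidates.append(actor_input["url"])
    candidates.extend(actor_input.get("urls") or [])

    urls, seen = [], set()
    for raw in candidates:
        url = raw.strip()
        # Only http(s) URLs are accepted; duplicates are dropped, order is kept.
        if url.lower().startswith(("http://", "https://")) and url not in seen:
            seen.add(url)
            urls.append(url)

    # An explicit llm_api_key wins; otherwise fall back to the secret.
    key = (actor_input.get("llm_api_key") or env.get("OPENROUTER_API_KEY") or "").strip()
    return {"urls": urls, "ai_enabled": bool(key)}
```

With no key in either place, ai_enabled is False, which corresponds to the code-only path described above.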

Output

One JSON item per URL pushed to the default dataset:

{
  "url": "https://example.com",
  "status": "completed",
  "data": {
    "title": "Example Domain",
    "meta_tags": { "description": "..." },
    "headings": [ { "level": "h1", "text": "Example Domain" } ],
    "links": [ { "href": "https://www.iana.org/domains/example", "text": "More information..." } ],
    "text": "Example Domain This domain is for use in..."
  },
  "meta": { "extracted_at": "2026-06-23T00:00:00+00:00", "method": "deterministic_code", "summary": { "row_count": 1 } }
}
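
A consumer of the dataset will usually branch on status before touching data. The sketch below assumes the items have already been loaded as a list of dicts shaped like the example above; the function name split_by_status and the choice to count a missing status as "failed" are assumptions for illustration.

```python
from collections import Counter


def split_by_status(items):
    """Count items per status and collect the text of completed ones."""
    # Assumption: an item without a status field is counted as "failed".
    counts = Counter(item.get("status", "failed") for item in items)

    # Only "completed" items are expected to carry a usable data block.
    texts = {
        item["url"]: item["data"]["text"]
        for item in items
        if item.get("status") == "completed" and item.get("data")
    }
    return dict(counts), texts
```

Blocked and failed URLs stay visible in the counts, so a pipeline can report them instead of silently dropping them.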

Use cases

Turn an arbitrary public page into clean, structured JSON for a pipeline or LLM prompt.
Pull title / meta / headings / links / body text for SEO, monitoring or content audits.
Lightweight "fetch + parse" step that is safe to run on untrusted URLs (SSRF-guarded).

How it works (deterministic, code-only)

Pure code: HTTP fetch through an SSRF-guarded client, then stdlib/regex HTML parsing to extract title, meta tags, headings, links and clean text. The meta.summary is computed in code from the parsed data. No headless browser, no AI on the default path.
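
To make the parsing step concrete, here is a minimal sketch built on Python's standard html.parser module. It is not the actor's implementation (which is described as stdlib/regex based); the class name PageParser, the decision to keep the title out of the body text, and the resolution of relative links against the page URL are assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"script", "style", "noscript"}


class PageParser(HTMLParser):
    def __init__(self, base_url):
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.title = ""
        self.meta_tags = {}
        self.headings = []
        self.links = []
        self.text_parts = []
        self._open = []        # stack of (tag, href, text parts) being captured
        self._skip_depth = 0   # > 0 while inside script/style/noscript

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "meta":
            name = attr.get("name") or attr.get("property")
            if name and attr.get("content") is not None:
                self.meta_tags[name] = attr["content"]
        elif tag == "title" or tag in HEADING_TAGS:
            self._open.append((tag, None, []))
        elif tag == "a" and attr.get("href"):
            self._open.append((tag, urljoin(self.base_url, attr["href"]), []))

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif self._open and self._open[-1][0] == tag:
            _, href, parts = self._open.pop()
            text = " ".join("".join(parts).split())
            if tag == "title":
                self.title = text
            elif tag == "a":
                self.links.append({"href": href, "text": text})
            else:
                self.headings.append({"level": tag, "text": text})

    def handle_data(self, data):
        if self._skip_depth:
            return
        for _, _, parts in self._open:
            parts.append(data)
        # Assumption: the <title> is reported separately, not as body text.
        if not any(tag == "title" for tag, _, _ in self._open):
            self.text_parts.append(data)


def extract(html, base_url):
    parser = PageParser(base_url)
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title,
        "meta_tags": parser.meta_tags,
        "headings": parser.headings,
        "links": parser.links,
        "text": " ".join(" ".join(parser.text_parts).split()),
    }
```

Because everything is derived from the server-delivered HTML, a page that builds its content in the browser yields little or nothing here, which matches the limitation noted for client-side rendered pages.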

Safety (always on)

SSRF guard — any URL that resolves to a private / loopback / link-local / reserved address is blocked (fail-closed); each redirect hop is re-validated before being followed.
Blocklist — login-walled / ToS-sensitive domains (LinkedIn, Facebook, Instagram, X, Glassdoor, Indeed, Zillow, Yelp, …) are refused.
Bounded fetch — 5 s connect / 10 s read timeout, 2 MB hard size cap, content-type allowlist, max 3 redirects. A request that exceeds any of these limits is aborted instead of hanging the run.
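
The SSRF guard and blocklist described above can be approximated with the standard ipaddress and socket modules. The following is a sketch under stated assumptions, not the actor's actual guard: check_url, is_public_ip, the injectable resolver and the two-entry BLOCKED_DOMAINS set are hypothetical names chosen for the example.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Illustrative subset only; the actor's real blocklist is longer.
BLOCKED_DOMAINS = {"linkedin.com", "facebook.com"}


def is_public_ip(addr):
    """Return True only for addresses that are safe to fetch from."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False  # fail closed on anything unparseable
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


def resolve(host):
    """Resolve a hostname to every address it maps to."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def check_url(url, resolver=resolve):
    """Return "allowed" or "blocked" for a single URL (or redirect hop)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if parts.scheme not in ("http", "https") or not host:
        return "blocked"
    if any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS):
        return "blocked"
    try:
        addresses = resolver(host)
    except OSError:
        return "blocked"  # resolution failure: fail closed
    # Every resolved address must be public, not just the first one.
    if not addresses or not all(is_public_ip(a) for a in addresses):
        return "blocked"
    return "allowed"
```

To re-validate redirects, the same check would be run on each Location target before it is followed, stopping after three hops. A production guard would also need to connect to the exact address it validated, since a hostname can resolve differently between the check and the fetch.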

Limitations (honest)

Pages that render entirely client-side (heavy JS, no server-side HTML) expose less content — there is no headless browser on the default path.
Login-walled / blocklisted domains return status: "blocked" by design.
AI enrichment requires your own OpenRouter key; the actor ships no built-in key.
