Text-to-JSON Structured Extractor

Pricing

from $10.00 / 1,000 results

A versatile Apify Actor that converts unstructured text and HTML into clean, structured JSON. It offers four extraction modes plus auto-detection, and supports URL fetching and batch processing.

Developer

Jamshaid Arif

Maintained by Community

🎯 What It Does

| Mode | Input | Output |
| --- | --- | --- |
| Resume | Plain-text resume/CV | Contact info, experience, education, skills, certifications |
| E-Commerce | Product page HTML | Product name, price, brand, SKU, rating, images, availability |
| Blog SEO | Blog/article HTML | SEO score (A–F), meta tags, headings, links, content stats, recommendations |
| Chat Log | Chat exports (WhatsApp, Slack, Discord, IRC) | Messages, participants, topics, shared links, statistics |
| Auto | Any of the above | Detects the best extractor automatically |

🚀 Quick Start

Minimal Input (uses defaults)

{
  "extractionMode": "auto",
  "inputType": "raw_text",
  "rawInput": "Jane Smith\njane@email.com\n\nExperience\nEngineer at Google..."
}

Fetch from URLs

{
  "extractionMode": "ecommerce",
  "inputType": "urls",
  "urls": [
    "https://example.com/products/widget-pro",
    "https://example.com/products/widget-lite"
  ],
  "outputFormat": "compact",
  "maxConcurrency": 10
}

Blog SEO Audit

{
  "extractionMode": "blog_seo",
  "inputType": "urls",
  "urls": ["https://myblog.com/latest-post"],
  "outputFormat": "full"
}

📥 Input Schema

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| extractionMode | enum | "auto" | resume, ecommerce, blog_seo, chat_log, or auto |
| inputType | enum | "raw_text" | raw_text, urls, or key_value_store |
| rawInput | string | (sample resume) | Direct text/HTML input |
| urls | string[] | [] | URLs to fetch content from |
| kvStoreKeys | string[] | [] | Keys to read from KV store |
| chatLogFormat | enum | "auto" | auto, whatsapp, slack, discord, irc, generic, simple |
| outputFormat | enum | "full" | full, compact, or flat |
| includeSourceText | boolean | false | Include original text in output |
| maxConcurrency | integer | 5 | Parallel URL fetches (1–20) |
| proxyConfiguration | object | Apify Proxy | Proxy settings for URL fetching |
| requestTimeoutSecs | integer | 30 | URL fetch timeout in seconds (5–120) |
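
When inputs are generated by a script, it can help to check them against the schema before starting a run. The sketch below is one way to do that in Python. The enum values, defaults, and ranges are copied from the table above; `build_input` is a hypothetical helper, not part of the Actor, and the rule that `urls` must be non-empty when `inputType` is `urls` is an assumption.

```python
from typing import Any

# Allowed values, ranges, and defaults copied from the input schema table.
ENUMS = {
    "extractionMode": {"resume", "ecommerce", "blog_seo", "chat_log", "auto"},
    "inputType": {"raw_text", "urls", "key_value_store"},
    "chatLogFormat": {"auto", "whatsapp", "slack", "discord", "irc", "generic", "simple"},
    "outputFormat": {"full", "compact", "flat"},
}
RANGES = {"maxConcurrency": (1, 20), "requestTimeoutSecs": (5, 120)}
DEFAULTS: dict[str, Any] = {
    "extractionMode": "auto",
    "inputType": "raw_text",
    "urls": [],
    "kvStoreKeys": [],
    "chatLogFormat": "auto",
    "outputFormat": "full",
    "includeSourceText": False,
    "maxConcurrency": 5,
    "requestTimeoutSecs": 30,
}


def build_input(**overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: merge overrides into the defaults, then validate."""
    run_input = {**DEFAULTS, **overrides}
    for field, allowed in ENUMS.items():
        if run_input[field] not in allowed:
            raise ValueError(f"{field} must be one of {sorted(allowed)}")
    for field, (low, high) in RANGES.items():
        if not low <= run_input[field] <= high:
            raise ValueError(f"{field} must be between {low} and {high}")
    # Assumption: URL mode is only meaningful with at least one URL.
    if run_input["inputType"] == "urls" and not run_input["urls"]:
        raise ValueError("inputType 'urls' needs at least one URL")
    return run_input


ecommerce_input = build_input(
    extractionMode="ecommerce",
    inputType="urls",
    urls=["https://example.com/products/widget-pro"],
    maxConcurrency=10,
)
```

The resulting dictionary can be serialized with `json.dump` into the input file used for a local run.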

📤 Output Format

Each dataset record looks like:

{
  "source": "raw_input",
  "extraction_mode": "resume",
  "output_format": "full",
  "success": true,
  "error": null,
  "data": { ... }
}
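
Because every record carries `success` and `error` alongside `data`, a consumer can separate usable results from failures in one pass. The following is a minimal sketch; `split_records` is a hypothetical helper, and the sample records (including the `"timeout"` error text) are made up for illustration.

```python
from typing import Any, Iterable


def split_records(
    records: Iterable[dict[str, Any]],
) -> tuple[list[Any], list[tuple[str, str]]]:
    """Return (extracted payloads, list of (source, error) pairs)."""
    extracted, failures = [], []
    for record in records:
        if record.get("success"):
            extracted.append(record["data"])
        else:
            failures.append((record.get("source", "unknown"), str(record.get("error"))))
    return extracted, failures


# Illustrative records shaped like the dataset record above.
sample_records = [
    {"source": "raw_input", "extraction_mode": "resume", "output_format": "compact",
     "success": True, "error": None, "data": {"name": "John Doe"}},
    {"source": "https://example.com/products/widget-pro", "extraction_mode": "ecommerce",
     "output_format": "compact", "success": False, "error": "timeout", "data": None},
]
extracted, failures = split_records(sample_records)
```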

Output Modes

  • full — All extracted fields, deeply nested
  • compact — Key fields only (great for dashboards)
  • flat — Single-level dict with underscore-separated keys (great for spreadsheets)
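
To illustrate what a single-level dict with underscore-separated keys looks like, here is a small flattening sketch. It is not the Actor's code: `flatten` is a hypothetical function, and the choice to index list items numerically (`skills_0`, `skills_1`) is an assumption about how lists might be handled.

```python
from typing import Any


def flatten(nested: dict[str, Any], parent_key: str = "", sep: str = "_") -> dict[str, Any]:
    """Collapse nested dicts and lists into one level, joining keys with `sep`."""
    flat: dict[str, Any] = {}
    for key, value in nested.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        elif isinstance(value, list):
            # Assumption: list items are addressed by their position.
            for index, item in enumerate(value):
                item_key = f"{full_key}{sep}{index}"
                if isinstance(item, dict):
                    flat.update(flatten(item, item_key, sep))
                else:
                    flat[item_key] = item
        else:
            flat[full_key] = value
    return flat


flat_record = flatten({
    "contact": {"email": "jane@email.com"},
    "skills": ["Python", "AWS"],
})
# flat_record == {"contact_email": "jane@email.com", "skills_0": "Python", "skills_1": "AWS"}
```

Each key in the flattened result maps directly to a spreadsheet column.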

🔍 Extraction Details

Resume Extractor

Detects and parses: name, email, phone, LinkedIn, GitHub, location, summary, work history with bullet points, education with GPA, categorized skills, projects, certifications, and languages.
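
Contact fields such as email, phone, LinkedIn, and GitHub are the kind of data that regular expressions can pick out of plain text. The sketch below shows the general idea only. The patterns are simplified assumptions (the phone pattern covers US-style numbers only), and treating the first non-empty line as the name is a guess, not the Actor's documented behavior.

```python
import re
from typing import Optional

# Simplified, illustrative patterns; real resumes need broader coverage.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")
LINKEDIN_RE = re.compile(r"linkedin\.com/in/[\w-]+", re.IGNORECASE)
GITHUB_RE = re.compile(r"github\.com/[\w-]+", re.IGNORECASE)


def extract_contact(text: str) -> dict[str, Optional[str]]:
    """Return the first match for each contact field, or None when absent."""

    def first(pattern: re.Pattern) -> Optional[str]:
        match = pattern.search(text)
        return match.group(0) if match else None

    # Assumption: the candidate's name is the first non-empty line.
    name = next((line.strip() for line in text.splitlines() if line.strip()), None)
    return {
        "name": name,
        "email": first(EMAIL_RE),
        "phone": first(PHONE_RE),
        "linkedin": first(LINKEDIN_RE),
        "github": first(GITHUB_RE),
    }


contact = extract_contact(
    "John Doe\njohndoe@email.com | (555) 123-4567\nlinkedin.com/in/johndoe\n\nExperience\n..."
)
```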

E-Commerce Extractor

Three-priority pipeline: (1) JSON-LD Schema.org, (2) Open Graph meta tags, (3) HTML class-based parsing. Extracts product name, description, price with currency, brand, SKU, availability, rating, review count, and images.
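
The priority order can be pictured as a chain of fallbacks: try JSON-LD first, then Open Graph, then class-based parsing. The sketch below uses only the Python standard library and is an illustration of that ordering, not the Actor's implementation. The class names `product-title` and `price`, the `product:price:*` meta properties, and the `source` key in the result are assumptions.

```python
import json
from html.parser import HTMLParser
from typing import Optional


class _ProductParser(HTMLParser):
    """Collects JSON-LD blocks, <meta property=...> tags, and class-tagged text."""

    def __init__(self) -> None:
        super().__init__()
        self.json_ld: list[str] = []
        self.meta: dict[str, Optional[str]] = {}
        self.by_class: dict[str, str] = {}
        self._in_ld = False
        self._ld_chunks: list[str] = []
        self._open_class: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_ld, self._ld_chunks = True, []
        elif tag == "meta" and attrs.get("property"):
            self.meta[attrs["property"]] = attrs.get("content")
        elif attrs.get("class"):
            self._open_class = attrs["class"].split()[0]

    def handle_data(self, data):
        if self._in_ld:
            self._ld_chunks.append(data)
        elif self._open_class and data.strip():
            # Keep the first text seen for each class name.
            self.by_class.setdefault(self._open_class, data.strip())
            self._open_class = None

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ld:
            self.json_ld.append("".join(self._ld_chunks))
            self._in_ld = False


def extract_product(html: str) -> Optional[dict]:
    parser = _ProductParser()
    parser.feed(html)
    # Priority 1: JSON-LD with a Schema.org Product.
    for block in parser.json_ld:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue
        for item in data if isinstance(data, list) else [data]:
            if isinstance(item, dict) and item.get("@type") == "Product":
                offer = item.get("offers") or {}
                if isinstance(offer, list):
                    offer = offer[0] if offer else {}
                return {"name": item.get("name"), "price": offer.get("price"),
                        "currency": offer.get("priceCurrency"), "source": "json_ld"}
    # Priority 2: Open Graph meta tags.
    if parser.meta.get("og:title"):
        return {"name": parser.meta["og:title"],
                "price": parser.meta.get("product:price:amount"),
                "currency": parser.meta.get("product:price:currency"),
                "source": "open_graph"}
    # Priority 3: class-based parsing (class names are hypothetical).
    if "product-title" in parser.by_class:
        return {"name": parser.by_class["product-title"],
                "price": parser.by_class.get("price"),
                "currency": None, "source": "html_classes"}
    return None
```

Returning as soon as a higher-priority source yields a product keeps structured data from being overridden by less reliable markup.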

Blog SEO Extractor

Produces a complete SEO audit with a score (0–100, grade A–F) based on 14 weighted checks. Analyzes title, meta description, Open Graph, Twitter Card, heading hierarchy, image alt text, internal/external links, structured data, content length, and more.
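
A weighted-check score of this kind can be computed as the share of total weight earned by passing checks, then mapped to a letter grade. The sketch below shows that arithmetic only. The check names, their weights, and the 90/80/70/60 grade cutoffs are all assumptions; the Actor's actual 14 checks and thresholds are not documented here.

```python
def score_checks(results: dict[str, bool], weights: dict[str, int]) -> tuple[int, str]:
    """Return (score out of 100, letter grade) for a set of pass/fail checks."""
    total = sum(weights.values())
    earned = sum(weight for name, weight in weights.items() if results.get(name))
    score = round(100 * earned / total)
    # Assumed grade boundaries.
    for cutoff, grade in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return score, grade
    return score, "F"


# Four hypothetical checks standing in for the full set.
weights = {"title_present": 10, "meta_description": 10, "single_h1": 5, "image_alt_text": 5}
results = {"title_present": True, "meta_description": True, "single_h1": False, "image_alt_text": True}
seo_score, seo_grade = score_checks(results, weights)
# 25 of 30 weight points earned -> score 83, grade "B"
```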

Chat Log Extractor

Auto-detects format from WhatsApp, Slack, Discord, IRC, and generic patterns. Builds participant profiles (message count, word average), extracts shared links with context, identifies topics via keyword frequency, and counts media messages.
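
Format detection and participant profiling can be sketched as: count how many lines each format's pattern matches, pick the best, then aggregate per sender. The example below covers only two formats and is an illustration under stated assumptions. The WhatsApp and IRC line patterns are simplified guesses at common export layouts, and `detect_format` and `participant_stats` are hypothetical names.

```python
import re
from collections import defaultdict

# Simplified line patterns (assumptions about typical export layouts).
PATTERNS = {
    "whatsapp": re.compile(
        r"^\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2}(?:\s?[AP]M)? - (?P<sender>[^:]+): (?P<text>.*)$"
    ),
    "irc": re.compile(r"^\[\d{2}:\d{2}(?::\d{2})?\] <(?P<sender>[^>]+)> (?P<text>.*)$"),
}


def detect_format(lines: list[str]) -> str:
    """Pick the format whose pattern matches the most lines, else 'generic'."""
    counts = {
        name: sum(1 for line in lines if pattern.match(line))
        for name, pattern in PATTERNS.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] else "generic"


def participant_stats(log: str) -> tuple[str, dict[str, dict[str, float]]]:
    """Return (detected format, per-sender message count and average word count)."""
    lines = [line for line in log.splitlines() if line.strip()]
    fmt = detect_format(lines)
    if fmt == "generic":
        return fmt, {}
    word_counts: dict[str, list[int]] = defaultdict(list)
    for line in lines:
        match = PATTERNS[fmt].match(line)
        if match:
            word_counts[match["sender"]].append(len(match["text"].split()))
    stats = {
        sender: {"message_count": len(counts), "avg_words": sum(counts) / len(counts)}
        for sender, counts in word_counts.items()
    }
    return fmt, stats


chat_format, chat_stats = participant_stats(
    "12/31/21, 10:15 PM - Alice: Happy new year everyone\n"
    "12/31/21, 10:16 PM - Bob: Thanks\n"
    "12/31/21, 10:17 PM - Alice: See you tomorrow"
)
```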


🧪 Running Locally

# Install dependencies
pip install -r requirements.txt
# Run with Apify CLI
apify run --input-file=INPUT.json

📋 Example Output (Compact Resume)

{
  "name": "John Doe",
  "email": "johndoe@email.com",
  "phone": "(555) 123-4567",
  "location": "New York, NY",
  "summary": "Full-stack developer with 5+ years...",
  "skills": ["Python", "JavaScript", "TypeScript", "React", "Django", "AWS"],
  "experience_count": 2,
  "education_count": 1,
  "certifications": ["AWS Solutions Architect Associate", "Certified Kubernetes Administrator"],
  "languages": ["English (Native)", "French (Conversational)"]
}