Text-to-JSON Structured Extractor
Pricing
from $10.00 / 1,000 results
A versatile Apify actor that converts unstructured text and HTML into clean, structured JSON. Supports four extraction modes with auto-detection, URL fetching, and batch processing.
Developer
Jamshaid Arif
🎯 What It Does
| Mode | Input | Output |
|---|---|---|
| Resume | Plain-text resume/CV | Contact info, experience, education, skills, certifications |
| E-Commerce | Product page HTML | Product name, price, brand, SKU, rating, images, availability |
| Blog SEO | Blog/article HTML | SEO score (A–F), meta tags, headings, links, content stats, recommendations |
| Chat Log | Chat exports (WhatsApp, Slack, Discord, IRC) | Messages, participants, topics, shared links, statistics |
| Auto | Any of the above | Detects the best extractor automatically |
🚀 Quick Start
Minimal Input (uses defaults)
```json
{
  "extractionMode": "auto",
  "inputType": "raw_text",
  "rawInput": "Jane Smith\njane@email.com\n\nExperience\nEngineer at Google..."
}
```
Fetch from URLs
```json
{
  "extractionMode": "ecommerce",
  "inputType": "urls",
  "urls": [
    "https://example.com/products/widget-pro",
    "https://example.com/products/widget-lite"
  ],
  "outputFormat": "compact",
  "maxConcurrency": 10
}
```
Blog SEO Audit
```json
{
  "extractionMode": "blog_seo",
  "inputType": "urls",
  "urls": ["https://myblog.com/latest-post"],
  "outputFormat": "full"
}
```
📥 Input Schema
| Field | Type | Default | Description |
|---|---|---|---|
| extractionMode | enum | "auto" | resume, ecommerce, blog_seo, chat_log, or auto |
| inputType | enum | "raw_text" | raw_text, urls, or key_value_store |
| rawInput | string | (sample resume) | Direct text/HTML input |
| urls | string[] | [] | URLs to fetch content from |
| kvStoreKeys | string[] | [] | Keys to read from KV store |
| chatLogFormat | enum | "auto" | auto, whatsapp, slack, discord, irc, generic, simple |
| outputFormat | enum | "full" | full, compact, or flat |
| includeSourceText | boolean | false | Include original text in output |
| maxConcurrency | integer | 5 | Parallel URL fetches (1–20) |
| proxyConfiguration | object | Apify Proxy | Proxy settings for URL fetching |
| requestTimeoutSecs | integer | 30 | URL fetch timeout (5–120) |
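If you assemble the input in a script, it can help to check it locally before starting a run. The sketch below is not part of the actor; `build_input` is a hypothetical helper that copies the defaults, enum values, and numeric ranges from the table above. The rule that `urls` must be non-empty when `inputType` is `urls` is an assumption.

```python
# Hypothetical client-side helper, not the actor's own validation code.
# Defaults, enum values, and ranges are copied from the Input Schema table.
DEFAULTS = {
    "extractionMode": "auto",
    "inputType": "raw_text",
    "chatLogFormat": "auto",
    "outputFormat": "full",
    "includeSourceText": False,
    "maxConcurrency": 5,
    "requestTimeoutSecs": 30,
}
ENUMS = {
    "extractionMode": {"resume", "ecommerce", "blog_seo", "chat_log", "auto"},
    "inputType": {"raw_text", "urls", "key_value_store"},
    "chatLogFormat": {"auto", "whatsapp", "slack", "discord", "irc", "generic", "simple"},
    "outputFormat": {"full", "compact", "flat"},
}
RANGES = {"maxConcurrency": (1, 20), "requestTimeoutSecs": (5, 120)}


def build_input(**overrides):
    """Merge overrides into the documented defaults and check them locally."""
    run_input = {**DEFAULTS, **overrides}
    for field, allowed in ENUMS.items():
        if run_input[field] not in allowed:
            raise ValueError(f"{field} must be one of {sorted(allowed)}")
    for field, (low, high) in RANGES.items():
        if not low <= run_input[field] <= high:
            raise ValueError(f"{field} must be between {low} and {high}")
    # Assumption: a URL run without any URLs is a mistake worth catching early.
    if run_input["inputType"] == "urls" and not run_input.get("urls"):
        raise ValueError("inputType 'urls' needs a non-empty urls list")
    return run_input
```

For example, `build_input(extractionMode="blog_seo", inputType="urls", urls=["https://myblog.com/latest-post"])` returns a dict ready to be saved as `INPUT.json`.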
📤 Output Format
Each dataset record looks like:
```json
{
  "source": "raw_input",
  "extraction_mode": "resume",
  "output_format": "full",
  "success": true,
  "error": null,
  "data": { ... }
}
```
Output Modes
- full — All extracted fields, deeply nested
- compact — Key fields only (great for dashboards)
- flat — Single-level dict with underscore-separated keys (great for spreadsheets)
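To illustrate what the flat mode implies, here is a minimal sketch of flattening a nested record into underscore-separated keys. It is an illustration, not the actor's implementation. Expanding lists into numeric suffixes (`skills_0`, `skills_1`) is an assumption; the actor may join or serialize lists differently.

```python
def flatten(value, prefix="", sep="_"):
    """Collapse nested dicts and lists into a single-level dict.

    Assumption: list items are expanded with their index as the key part.
    Empty dicts and lists produce no keys at all.
    """
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        # Scalar leaf: emit it under the key path built so far.
        return {prefix: value}

    flat = {}
    for key, child in items:
        child_prefix = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(flatten(child, child_prefix, sep))
    return flat


nested = {
    "contact": {"email": "jane@email.com", "links": {"github": "jane"}},
    "skills": ["Python", "Go"],
}
flat = flatten(nested)
# flat has the keys contact_email, contact_links_github, skills_0, skills_1
```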
🔍 Extraction Details
Resume Extractor
Detects and parses: name, email, phone, LinkedIn, GitHub, location, summary, work history with bullet points, education with GPA, categorized skills, projects, certifications, and languages.
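As a rough idea of how the contact fields can be found in plain text, the sketch below uses regular expressions. The patterns and the `extract_contact` helper are hypothetical and much simpler than a production parser. The phone pattern assumes North American numbers, and taking the first non-empty line as the name is also an assumption.

```python
import re

# Hypothetical patterns for illustration; not taken from the actor's source.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"(?:\+?\d{1,3}[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")
LINKEDIN_RE = re.compile(r"linkedin\.com/in/[\w-]+", re.IGNORECASE)
GITHUB_RE = re.compile(r"github\.com/[\w-]+", re.IGNORECASE)


def extract_contact(text):
    """Return the first match for each contact field, or None if absent."""

    def first(pattern):
        match = pattern.search(text)
        return match.group(0) if match else None

    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return {
        # Assumption: the candidate's name is the first non-empty line.
        "name": lines[0] if lines else None,
        "email": first(EMAIL_RE),
        "phone": first(PHONE_RE),
        "linkedin": first(LINKEDIN_RE),
        "github": first(GITHUB_RE),
    }


sample = (
    "Jane Smith\n"
    "jane@email.com\n"
    "(555) 123-4567\n"
    "linkedin.com/in/jane-smith\n"
    "github.com/janesmith\n"
)
contact = extract_contact(sample)
```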
E-Commerce Extractor
Three-priority pipeline: (1) JSON-LD Schema.org, (2) Open Graph meta tags, (3) HTML class-based parsing. Extracts product name, description, price with currency, brand, SKU, availability, rating, review count, and images.
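The priority order described above can be sketched with the standard library alone. This is an illustration of the fallback idea, not the actor's code. The `ProductSignals` and `extract_product` names, the `product-title` and `price` class names, and the choice of `og:title` as the Open Graph trigger are all assumptions.

```python
import json
from html.parser import HTMLParser


class ProductSignals(HTMLParser):
    """Collect JSON-LD payloads, Open Graph tags, and class-tagged text."""

    def __init__(self):
        super().__init__()
        self.json_ld = []      # parsed <script type="application/ld+json"> blocks
        self.open_graph = {}   # og:* and product:* meta properties
        self.by_class = {}     # first text seen right after a tag with that class
        self._in_json_ld = False
        self._buffer = []
        self._pending_classes = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_json_ld = True
            return
        prop = attrs.get("property") or ""
        if tag == "meta" and prop.startswith(("og:", "product:")):
            self.open_graph[prop] = attrs.get("content")
        self._pending_classes = (attrs.get("class") or "").split()

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)
        elif self._pending_classes and data.strip():
            for name in self._pending_classes:
                self.by_class.setdefault(name, data.strip())
            self._pending_classes = []

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is skipped so lower priorities can run
            self._buffer = []


def extract_product(html):
    signals = ProductSignals()
    signals.feed(html)
    signals.close()

    # Priority 1: a Schema.org Product described in JSON-LD.
    for block in signals.json_ld:
        for item in block if isinstance(block, list) else [block]:
            if isinstance(item, dict) and item.get("@type") == "Product":
                offers = item.get("offers") or {}
                if isinstance(offers, list):
                    offers = offers[0] if offers else {}
                return {"source": "json_ld", "name": item.get("name"),
                        "price": offers.get("price"),
                        "currency": offers.get("priceCurrency")}

    # Priority 2: Open Graph meta tags.
    og = signals.open_graph
    if og.get("og:title"):
        return {"source": "open_graph", "name": og["og:title"],
                "price": og.get("product:price:amount"),
                "currency": og.get("product:price:currency")}

    # Priority 3: class-based parsing (hypothetical class names).
    return {"source": "html_classes",
            "name": signals.by_class.get("product-title"),
            "price": signals.by_class.get("price"), "currency": None}
```

Each stage returns as soon as it finds a product, so a page with valid JSON-LD never reaches the less reliable class-based fallback.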
Blog SEO Extractor
Produces a complete SEO audit with a score (0–100, grade A–F) based on 14 weighted checks. Analyzes title, meta description, Open Graph, Twitter Card, heading hierarchy, image alt text, internal/external links, structured data, content length, and more.
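The weighted-check idea can be sketched as follows. This README does not list the actor's 14 checks, their weights, or its grade boundaries, so everything in the example is hypothetical: seven made-up checks, weights that sum to 100, and grade thresholds at 90/80/70/60.

```python
# Hypothetical checks, weights, and thresholds, for illustration only.
# Each entry is (name, weight, predicate over a dict of page facts).
CHECKS = [
    ("title_present", 15, lambda p: bool(p.get("title"))),
    ("title_length", 10, lambda p: 30 <= len(p.get("title") or "") <= 60),
    ("meta_description", 15, lambda p: 70 <= len(p.get("meta_description") or "") <= 160),
    ("single_h1", 15, lambda p: p.get("h1_count") == 1),
    ("images_have_alt", 15, lambda p: p.get("images_missing_alt", 0) == 0),
    ("content_length", 20, lambda p: p.get("word_count", 0) >= 300),
    ("has_internal_links", 10, lambda p: p.get("internal_links", 0) > 0),
]
GRADES = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]  # anything lower is "F"


def seo_score(page):
    """Score a page 0-100 as the weighted share of passed checks."""
    total = sum(weight for _, weight, _ in CHECKS)
    earned = 0
    failed = []
    for name, weight, check in CHECKS:
        if check(page):
            earned += weight
        else:
            failed.append(name)
    score = round(100 * earned / total)
    grade = next((g for threshold, g in GRADES if score >= threshold), "F")
    return {"score": score, "grade": grade, "failed_checks": failed}
```

The `failed_checks` list is the natural source for recommendations: each failed check maps to one suggested fix.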
Chat Log Extractor
Auto-detects format from WhatsApp, Slack, Discord, IRC, and generic patterns. Builds participant profiles (message count, word average), extracts shared links with context, identifies topics via keyword frequency, and counts media messages.
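A simplified sketch of format detection and participant profiling is shown below. The regular expressions are hypothetical approximations of WhatsApp, IRC, and generic `Name: message` lines, not the actor's patterns. Slack and Discord exports, topic extraction, and media counting are left out to keep the example short.

```python
import re
from collections import Counter

# Hypothetical line patterns; real exports vary by app version and locale.
PATTERNS = {
    "whatsapp": re.compile(
        r"^\[?\d{1,2}/\d{1,2}/\d{2,4},? \d{1,2}:\d{2}(?::\d{2})?(?:\s?[AP]M)?\]?"
        r"(?: -)? (?P<user>[^:]+): (?P<text>.*)$"
    ),
    "irc": re.compile(r"^\[?\d{1,2}:\d{2}(?::\d{2})?\]? <(?P<user>[^>]+)> (?P<text>.*)$"),
    "generic": re.compile(r"^(?P<user>[^:\[\]<>]{1,40}): (?P<text>.*)$"),
}
URL_RE = re.compile(r"https?://\S+")


def detect_format(lines):
    """Pick the pattern that matches the most lines, or None if none match."""
    hits = {name: sum(1 for line in lines if rx.match(line))
            for name, rx in PATTERNS.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] else None


def summarize(chat_text):
    lines = [line.strip() for line in chat_text.splitlines() if line.strip()]
    fmt = detect_format(lines)
    if fmt is None:
        return {"format": None, "participants": {}, "links": []}

    messages, words, links = Counter(), Counter(), []
    for line in lines:
        match = PATTERNS[fmt].match(line)
        if not match:
            continue  # continuation lines and system notices are skipped
        user = match["user"].strip()
        messages[user] += 1
        words[user] += len(match["text"].split())
        links.extend(URL_RE.findall(match["text"]))

    participants = {
        user: {"messages": messages[user],
               "avg_words": round(words[user] / messages[user], 1)}
        for user in messages
    }
    return {"format": fmt, "participants": participants, "links": links}


log = (
    "12/31/23, 9:15 PM - Alice: Hello there https://example.com\n"
    "12/31/23, 9:16 PM - Bob: Hi\n"
    "12/31/23, 9:17 PM - Alice: Happy new year\n"
)
summary = summarize(log)
```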
🧪 Running Locally
```shell
# Install dependencies
pip install -r requirements.txt

# Run with Apify CLI
apify run --input-file=INPUT.json
```
📋 Example Output (Compact Resume)
```json
{
  "name": "John Doe",
  "email": "johndoe@email.com",
  "phone": "(555) 123-4567",
  "location": "New York, NY",
  "summary": "Full-stack developer with 5+ years...",
  "skills": ["Python", "JavaScript", "TypeScript", "React", "Django", "AWS"],
  "experience_count": 2,
  "education_count": 1,
  "certifications": ["AWS Solutions Architect Associate", "Certified Kubernetes Administrator"],
  "languages": ["English (Native)", "French (Conversational)"]
}
```