HTML to JSON Smart Parser
Pricing: Pay per event · Rating: 5.0 (2) · Developer: ParseForge · 34 total users · 1 monthly active user · Last modified: 12 hours ago

🧩 HTML to JSON Smart Parser
🚀 Convert HTML into structured JSON in seconds. Bring your own OpenAI API key. URL fetch, paste HTML, or upload files. No bespoke parsers.
🕒 Last updated: 2026-05-09 · 🧠 BYO OpenAI key · 📥 URL / paste / file upload · 🔑 BYO model selection
Convert HTML into clean structured JSON without writing a parser per page. Provide one or more URLs, paste HTML directly, or upload HTML files, then specify (or auto-detect) which fields to extract. The actor sends the HTML to your OpenAI account using your API key, parses the response, and returns one structured record per input. Built for developers who want layout-agnostic HTML extraction without bespoke selector code.
You bring your own OpenAI API key, so all model usage is billed directly to your OpenAI account. Choose the model (gpt-4o, gpt-4o-mini, gpt-3.5-turbo, etc.) based on your accuracy and cost trade-offs.
| 👥 Built for | 🎯 Primary use cases |
|---|---|
| Developers | Skip writing CSS selectors and XPath queries |
| Data engineers | Build layout-agnostic data pipelines |
| AI ops | Convert HTML into structured prompts for LLM workflows |
| Researchers | Index HTML archives without bespoke parsers |
| Content ops | Migrate HTML content into structured DBs |
| Indie devs | Add HTML parsing to side projects without a parser |
📋 What the HTML to JSON Smart Parser does
- 🌐 Three input modes. URL fetch, paste raw HTML, or upload HTML file URLs.
- 🧠 AI-driven extraction. Sends HTML to OpenAI with your key for layout-agnostic parsing.
- 🎯 Field selection. Specify which fields to extract or let the AI auto-detect.
- 🤖 Model choice. gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo, or gpt-5 when available.
- ✏️ Custom prompts. Optional system prompt to bias the extraction.
- 🆔 Per-input metadata. Each record carries the source URL, prompt, and timestamp.
The actor processes inputs in the order you provide them. Records stream into the dataset as parsing completes.
💡 Why it matters: writing a parser per page type costs hours and breaks with every layout change. AI-driven extraction adapts to layout variation without code changes, so dev teams can ship structured-data features faster.
🎬 Full Demo
🚧 Coming soon: a 3-minute walkthrough showing URL input, custom field extraction, and how to feed the output into a downstream pipeline.
⚙️ Input
| Field | Type | Name | Description |
|---|---|---|---|
| `url` | array | URL (Fetch HTML) | URLs to fetch HTML from. The actor does a plain HTTP GET. |
| `htmlContent` | string | HTML Content (Paste) | Optional. Paste raw HTML directly. |
| `htmlFileUrl` | array | HTML File URL (Upload) | Optional. URLs to uploaded HTML files. |
| `openAIApiKey` | string | OpenAI API Key | Required. Your OpenAI API key, used for the model call. |
| `model` | enum | OpenAI Model | `gpt-4o-mini` (default), `gpt-4o`, `gpt-4-turbo`, `gpt-3.5-turbo`, `gpt-5`. |
Example 1. URL extraction with the default model.

```json
{
  "url": [
    { "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html" }
  ],
  "openAIApiKey": "sk-...",
  "model": "gpt-4o-mini"
}
```
Example 2. Paste HTML directly.

```json
{
  "htmlContent": "<html><body><h1>Title</h1><p>Body</p></body></html>",
  "openAIApiKey": "sk-...",
  "model": "gpt-4o"
}
```
⚠️ Good to Know: you must supply your own OpenAI API key. All model usage is billed to your OpenAI account.
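For programmatic callers, the input contract above can be assembled and validated in code before starting a run. A minimal sketch (field names come from the input table; the helper itself is illustrative, not part of the Actor, and the shape of `htmlFileUrl` entries is assumed to mirror the `url` field):

```python
def build_run_input(api_key, urls=None, html=None, file_urls=None, model="gpt-4o-mini"):
    """Assemble a run-input dict matching the Actor's input table."""
    if not api_key:
        raise ValueError("openAIApiKey is required")
    run_input = {"openAIApiKey": api_key, "model": model}
    if urls:
        # URL mode expects a list of {"url": ...} objects, as in Example 1.
        run_input["url"] = [{"url": u} for u in urls]
    if html:
        run_input["htmlContent"] = html
    if file_urls:
        # Assumed to use the same {"url": ...} shape as the url field.
        run_input["htmlFileUrl"] = [{"url": u} for u in file_urls]
    if not any(k in run_input for k in ("url", "htmlContent", "htmlFileUrl")):
        raise ValueError("Provide at least one input: urls, html, or file_urls")
    return run_input

example = build_run_input(
    "sk-...",
    urls=["https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html"],
)
```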
📊 Output
The dataset returns one structured record per input. Each record carries the source identifier, extracted JSON, the model used, and a timestamp. Consume the dataset as JSON, CSV, Excel, XML, or RSS via the Apify console or API.
🧾 Schema
| Field | Type | Example |
|---|---|---|
| 🌐 `sourceUrl` | string (URL) or null | `https://books.toscrape.com/.../1000/index.html` |
| 📦 `parsedData` | object | `{"title":"A Light in the Attic","price":51.77,"availability":"In stock"}` |
| 🤖 `model` | string | `gpt-4o-mini` |
| 🎯 `prompt` | string | Extract title, price, and availability |
| 📅 `timestamp` | ISO datetime | `2026-05-09T12:00:00.000Z` |
| ❗ `error` | string or null | `null` |
📦 Sample records
1. URL extraction (book product page)

```json
{
  "sourceUrl": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
  "parsedData": {
    "title": "A Light in the Attic",
    "price": 51.77,
    "availability": "In stock",
    "rating": "Three",
    "description": "It's hard to imagine a world without A Light in the Attic..."
  },
  "model": "gpt-4o-mini",
  "prompt": "Extract title, price, availability, rating, and description",
  "timestamp": "2026-05-09T12:00:00.000Z",
  "error": null
}
```
2. Pasted HTML (simple page)

```json
{
  "sourceUrl": null,
  "parsedData": {
    "title": "Welcome",
    "body": "Today we launched our new product..."
  },
  "model": "gpt-4o",
  "timestamp": "2026-05-09T12:00:00.000Z",
  "error": null
}
```
3. Failed parse (missing API key)

```json
{
  "sourceUrl": "https://example.com/page.html",
  "parsedData": null,
  "model": "gpt-4o-mini",
  "timestamp": "2026-05-09T12:00:00.000Z",
  "error": "Missing OpenAI API key"
}
```
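Because every record carries its own `error` field, downstream code can split successes from failures without aborting the whole batch. A minimal sketch against the schema above (the helper is illustrative, not part of the Actor):

```python
def partition_records(records):
    """Split dataset records into (parsed, failed) using the per-record error field."""
    parsed = [r for r in records if r.get("error") is None and r.get("parsedData")]
    failed = [r for r in records if r.get("error") is not None]
    return parsed, failed

records = [
    {"sourceUrl": "https://example.com/a.html",
     "parsedData": {"title": "A"}, "model": "gpt-4o-mini",
     "timestamp": "2026-05-09T12:00:00.000Z", "error": None},
    {"sourceUrl": "https://example.com/b.html",
     "parsedData": None, "model": "gpt-4o-mini",
     "timestamp": "2026-05-09T12:00:00.000Z", "error": "Missing OpenAI API key"},
]
parsed, failed = partition_records(records)
```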
✨ Why choose this Actor
| | Capability |
|---|---|
| 🎯 | Built for the job. Single-purpose HTML-to-JSON pipeline with sensible defaults. |
| 🧠 | BYO OpenAI key. All model usage billed directly to your OpenAI account. |
| ⚙️ | Model choice. Pick model based on accuracy and cost trade-offs. |
| 🔁 | Live processing. Every run processes your inputs end to end; input HTML is never cached. |
| 🌐 | No infra to manage. Apify handles compute, scaling, scheduling, and storage. |
| 🛡️ | Reliable. Per-input error reporting means one bad URL does not kill the whole run. |
| 🚫 | No code required. Configure in the UI, run from CLI, schedule via cron, or call from any language with the Apify SDK. |
📊 Production-grade HTML-to-JSON conversion without writing or maintaining custom parsers.
📈 How it compares to alternatives
| Approach | Cost | Coverage | Refresh | Quality | Setup |
|---|---|---|---|---|---|
| ⭐ HTML to JSON Smart Parser (this Actor) | $5 free credit + your OpenAI usage | Any HTML | Live per run | High, layout-agnostic | ⚡ 2 min |
| Hand-written parsers | Engineering hours | Per layout | Whenever you maintain it | High but brittle | 🐢 Days to weeks |
| Paid HTML-extraction SaaS | $$ monthly | Limited | Live | Variable | ⏳ Hours |
| Manual review | Hours per file | One at a time | Stale | Highest | 🕒 Variable |
Pick this Actor when you want flexible, layout-agnostic HTML parsing without owning the model integration.
🚀 How to use
- 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
- 🌐 Open the Actor. Go to the HTML to JSON Smart Parser page on the Apify Store.
- 🎯 Set inputs. Provide URLs, paste HTML, or upload files. Add your OpenAI API key.
- 🚀 Run it. Click Start and let the Actor parse each input.
- 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.
⏱️ Total time from signup to first parsed JSON: 3-5 minutes for a single URL.
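Steps 3–5 can also be driven without the UI: the Apify API exposes a `run-sync-get-dataset-items` endpoint that starts a run and returns the dataset in a single call. A sketch of building the request URL (the actor ID `parseforge~html-to-json-smart-parser` is a placeholder, not the confirmed ID; check the Actor page for the real one):

```python
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"

def sync_run_url(actor_id, token, fmt="json"):
    """Build the URL that runs the Actor and returns its dataset items in one call."""
    query = urlencode({"token": token, "format": fmt})
    # Actor IDs use ~ between the developer and actor name in the Apify API.
    return f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items?{query}"

url = sync_run_url("parseforge~html-to-json-smart-parser", "apify_api_TOKEN")
# POST the run input (the JSON shown in the Input section) to this URL
# with Content-Type: application/json to start a run and collect results.
```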
🌟 Beyond business use cases
Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.
🔌 Automating HTML to JSON Smart Parser
This Actor exposes a REST endpoint, so you can drive it from any language or workflow tool.
- Node.js - call it via the Apify JS SDK.
- Python - call it via the Apify Python SDK.
- REST - hit it directly through the Apify v2 API.
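For the Python route, the `apify-client` package wraps the same endpoints. A sketch assuming the placeholder actor ID `parseforge~html-to-json-smart-parser` (substitute the real ID from the Actor page):

```python
def parse_html(apify_token, run_input):
    """Run the Actor via the Apify Python client and return its dataset records.

    Requires `pip install apify-client`; the actor ID below is a placeholder.
    """
    from apify_client import ApifyClient  # lazy import so the sketch loads without the package

    client = ApifyClient(apify_token)
    # .call() starts the run and blocks until it finishes.
    run = client.actor("parseforge~html-to-json-smart-parser").call(run_input=run_input)
    # Records stream into the run's default dataset; collect them all.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```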
Schedules. Use Apify Scheduler to batch-parse a folder of HTML inputs. Combine with webhooks to trigger downstream workflows when parsing completes.
❓ Frequently Asked Questions
💳 Do I need a paid Apify plan to run this actor?
No, but you do need an OpenAI API key. You can start the actor on the free Apify plan (which includes $5 in monthly credit), but model calls are billed to your OpenAI account.
🚨 What happens if my run fails or returns no results?
Failed runs are not charged on Apify. If a single input fails, the actor records the error on that record only. If the OpenAI key is invalid or out of credits, the actor logs the error.
🧠 Why do I need to bring my own OpenAI key?
So your model usage is metered against your OpenAI account, with full control over rate limits and billing. We never see or store your key.
🤖 Which model should I pick?
gpt-4o-mini is the recommended default for cost. gpt-4o is more accurate for complex layouts. gpt-3.5-turbo is cheapest but less reliable on dense pages.
📥 Which input mode should I use?
URLs are simplest for public pages. Paste HTML when you have content not on the public web. Upload HTML files for bulk processing.
🧑‍💻 Can I call this actor from my own code?
Yes. Apify exposes every actor as a REST endpoint and ships first-class SDKs for Node.js and Python.
📤 How do I export the data?
Every Apify dataset can be downloaded in one click as CSV, JSON, JSONL, Excel, HTML, XML, or RSS.
📅 Can I schedule the actor to run automatically?
Yes. Use the Apify scheduler to parse new URLs on a cadence. Wire to webhooks for trigger-driven parsing.
🏪 Can I use the data commercially?
Yes. Parsed data is yours to use, subject to your rights to the source HTML.
💼 Which plan should I pick for production use?
Apify's Starter and Scale plans are designed for production workloads. OpenAI usage is billed separately to your OpenAI account.
🛠️ Can you add other LLM providers?
Open the contact form and tell us about your use case. We add features regularly when there is a clear use case behind the request.
⚖️ Is it legal to use this Actor?
Yes, provided you have rights to the source HTML. You are responsible for compliance with OpenAI's terms, source-site terms, and applicable copyright laws.
🔌 Integrate with any app
HTML to JSON Smart Parser connects to any cloud service via Apify integrations:
- Make - Automate multi-step workflows
- Zapier - Connect with 5,000+ apps
- Slack - Get run notifications in your channels
- Airbyte - Pipe results into your warehouse
- GitHub - Trigger runs from commits and releases
- Google Drive - Export datasets straight to Sheets
You can also use webhooks to trigger downstream actions when a run finishes.
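The webhook hookup above can be configured through Apify's webhooks API as well (POST `/v2/webhooks`). A sketch of the request payload; the actor ID and `requestUrl` are placeholders for your own values:

```python
import json

def build_webhook_payload(actor_id, request_url):
    """Payload for Apify's create-webhook endpoint: fire when a run of this Actor succeeds."""
    return {
        "eventTypes": ["ACTOR.RUN.SUCCEEDED"],
        "condition": {"actorId": actor_id},
        # Apify POSTs run details to this URL when the event fires.
        "requestUrl": request_url,
    }

payload = build_webhook_payload("parseforge~html-to-json-smart-parser",
                                "https://example.com/hooks/apify")
body = json.dumps(payload)
```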
🔗 Recommended Actors
- 📄 PDF to JSON Parser - Convert PDFs into structured JSON
- 📰 Article Extractor - Extract clean article text from any URL
- 🌐 Website Content Crawler - Crawl and extract clean content from any site
- 🔍 RAG Web Browser - Fetch clean text for AI retrieval pipelines
- 🎤 Audio Transcriber - Convert audio recordings to structured text
💡 Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.
🆘 Need Help? Open our contact form to request a new actor, propose a custom project, or report an issue.
⚠️ Disclaimer. This Actor is an independent tool. The actor processes only HTML you supply by URL, paste, or upload, and is intended for legitimate data-extraction workflows. Users are responsible for ensuring they hold the rights to the source content and for compliance with copyright, OpenAI's terms of service, and applicable law in their jurisdiction.


