Siemens Document PDF Parser — Specs JSON
Pricing
from $25.00 / 1,000 parsed pdf documents
Parse Siemens PDF documents (manuals, datasheets, certificates) into structured JSON: specification key-values, technical tables, limit values. Chain with Document Downloader or discover by MPN.
Developer: Andrej Kiva (maintained by Community)
Crawloop Siemens Automation Suite — Structured data extraction for Siemens SiePortal (Industry Mall), SIOS, and TED product datasheets. Built for procurement teams, system integrators, and BOM engineering workflows.
Suite hub: github.com/PLCSPS-DEV/siemens-sieportal-automation
| Discovery | Enrichment | SIOS documents | TED datasheets |
|---|---|---|---|
| Catalog Crawler | SiePortal Scraper | Document Downloader | TED Datasheet Downloader |
| Category Scraper | Lifecycle Tracker | Document PDF Parser | TED Datasheet Parser |
Disclaimer: This is an unofficial integration developed independently of Siemens AG. It is not affiliated with, sponsored by, or endorsed by Siemens AG or any of its subsidiaries.
Siemens, SiePortal, SIMATIC, and related names are trademarks of Siemens AG. Product data is read from publicly accessible Siemens web sources only; no proprietary databases are redistributed.
This Actor is provided for informational and research purposes only (e.g. procurement research, BOM audits, internal engineering workflows). You are solely responsible for ensuring your use complies with applicable laws, Siemens website terms of use, and your organization's policies.
No warranty is given as to accuracy, completeness, or continued availability of third-party data. Use at your own risk.
Parse Siemens PDF documents from SIOS and SiePortal — equipment manuals, certificates, brochures, and other attachments — into structured JSON. Extracts specification key-value pairs, technical tables, limit values, and optional wiring diagrams from the PDF file itself, not from the SiePortal web page.
Recommended workflow: Run the Document Downloader first to save PDFs to Key-Value Store, then pass keyValueStoreId and items to this Actor for parse-only extraction (no browser, lower cost).
For compact TED product datasheets from Industry Mall, use the TED Datasheet Parser instead.
When to use this Actor
Use the Document PDF Parser when you have Siemens SIOS PDFs (manuals, certificates, brochures) and need structured specifications, tables, and limit values in JSON.
Use the TED Datasheet Parser for official TED catalog datasheet PDFs. Use the SiePortal Scraper for web PDP specifications without PDF parsing.
Siemens Automation Pipeline
Phase 1: Discover MPNs      Phase 2: Screen & enrich       Phase 3: Documents & specs
─────────────────────────   ─────────────────────────      ─────────────────────────────
Catalog Crawler  ──┐
                   ├──► MPN list ──► Lifecycle Tracker ──► SiePortal Scraper
Category Scraper ──┘                                               │
                                                                   │
                                             ┌─────────────────────┴─────────────────────┐
                                             │                                           │
                                             ▼                                           ▼
                                Document Downloader (SIOS)                   TED Datasheet Downloader
                                certificates, manuals, CAD                   compact catalog PDFs
                                             │                                           │
                                             ▼                                           ▼
                                Document PDF Parser ◄── you are here         TED Datasheet Parser
                                specs from SIOS PDFs                         specs from TED PDFs
Key Features
- Any SIOS PDF — Manuals, certificates, brochures — whatever the Document Downloader saved.
- Specification extraction — Parameter/value tables and inline key-value lines from PDF content.
- Technical tables — Raw table rows with page numbers for downstream normalization.
- Limit values — Rows mentioning limits, minimum, and maximum ratings.
- Wiring diagrams — Optional image extraction from PDF pages to Key-Value Store.
- Chained workflow — Reads PDFs from the current run or a previous Document Downloader run.
- Parse-only mode — No browser when PDFs are already in Key-Value Store.
Input Parameters
| Parameter | Description | Default |
|---|---|---|
| items | Recommended. PDFs to parse, each with the keyValueStoreKey from Document Downloader output. | - |
| keyValueStoreKeys | Alternative: list of PDF keys only. | - |
| keyValueStoreId | Store ID from a previous Document Downloader run. | - |
| searchTerms | Discover mode: find one document PDF per MPN on SiePortal, download it, and parse it. Requires proxy. | - |
| locale | SiePortal locale for discover mode. | en-nl |
| extractDiagrams | Save large images from PDF pages to Key-Value Store. | false |
| maxConcurrency | Parallel PDF processing (1–8). | 3 |
| proxyConfiguration | Required for searchTerms discover mode on Apify Cloud. | - |
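The constraints in the table above can be checked before a run is started. The following is a minimal sketch, not part of the Actor: `validate_parser_input` is a hypothetical helper, and the rule that at least one of `items`, `keyValueStoreKeys`, or `searchTerms` must be present is an assumption inferred from the parameter descriptions.

```python
from typing import Any


def validate_parser_input(payload: dict[str, Any]) -> list[str]:
    """Return a list of problems found in the input; an empty list means none were found."""
    problems: list[str] = []

    # Assumption: a run needs at least one source of PDFs to work on.
    sources = ("items", "keyValueStoreKeys", "searchTerms")
    if not any(payload.get(name) for name in sources):
        problems.append("provide one of: items, keyValueStoreKeys, searchTerms")

    # Discover mode is documented as requiring a proxy.
    if payload.get("searchTerms") and not payload.get("proxyConfiguration"):
        problems.append("searchTerms requires proxyConfiguration")

    # maxConcurrency is documented as 1-8 with a default of 3.
    concurrency = payload.get("maxConcurrency", 3)
    is_int = isinstance(concurrency, int) and not isinstance(concurrency, bool)
    if not is_int or not 1 <= concurrency <= 8:
        problems.append("maxConcurrency must be an integer from 1 to 8")

    return problems
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue at once.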
Input Example (recommended — after Document Downloader)
{
  "keyValueStoreId": "YOUR_DOCUMENT_DOWNLOADER_STORE_ID",
  "items": [
    {
      "partNumber": "6ES7193-6BP00-0DA0",
      "keyValueStoreKey": "6ES7193-6BP00-0DA0_110003392_ET200SP_FM16US0053XSupp56.pdf",
      "fileName": "ET200SP_FM16US0053XSupp56.pdf"
    }
  ],
  "extractDiagrams": false,
  "maxConcurrency": 3
}
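When chaining from the Document Downloader, an input of this shape can be assembled programmatically. The sketch below is illustrative only: `build_parser_input` is a hypothetical helper, and it assumes each Document Downloader record exposes `partNumber`, `keyValueStoreKey`, and `fileName` under the same names used in the example above.

```python
from typing import Any


def build_parser_input(
    store_id: str,
    downloader_items: list[dict[str, Any]],
    extract_diagrams: bool = False,
    max_concurrency: int = 3,
) -> dict[str, Any]:
    """Build a parse-only input from Document Downloader records (assumed field names)."""
    items = []
    for record in downloader_items:
        key = record.get("keyValueStoreKey")
        if not key:
            continue  # no stored PDF for this record, so there is nothing to parse
        entry = {"keyValueStoreKey": key}
        # Copy the optional descriptive fields only when they are present.
        for field in ("partNumber", "fileName"):
            if record.get(field):
                entry[field] = record[field]
        items.append(entry)
    return {
        "keyValueStoreId": store_id,
        "items": items,
        "extractDiagrams": extract_diagrams,
        "maxConcurrency": max_concurrency,
    }
```

Records without a `keyValueStoreKey` are skipped rather than treated as errors, since a download step may legitimately find no PDF for some part numbers.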
Output Format
{
  "partNumber": "6ES7193-6BP00-0DA0",
  "fileName": "ET200SP_FM16US0053XSupp56.pdf",
  "keyValueStoreKey": "6ES7193-6BP00-0DA0_110003392_ET200SP_FM16US0053XSupp56.pdf",
  "status": "PARSED",
  "pageCount": 42,
  "tableCount": 8,
  "specificationCount": 35,
  "specifications": {
    "Article number": "6ES7193-6BP00-0DA0",
    "Net weight": "0.05 kg",
    "Supply voltage": "24 V DC"
  },
  "tables": [
    {
      "page": 12,
      "rowCount": 15,
      "rows": [
        ["Parameter", "Value", "Unit"],
        ["Supply voltage", "24", "V DC"]
      ]
    }
  ],
  "limits": [],
  "diagramCount": 0,
  "parsedAt": "2026-06-18T15:00:00+00:00"
}
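Because `tables` holds raw rows, downstream code usually has to normalize them. One possible approach is sketched below; `table_to_records` is a hypothetical helper, and treating the first row of each table as its header is an assumption that holds for the example above but may not hold for every PDF.

```python
from typing import Any


def table_to_records(table: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn one entry of `tables` into a list of dicts keyed by the header row."""
    rows = table.get("rows", [])
    if len(rows) < 2:
        return []  # a header with no data rows (or nothing at all)
    header, *body = rows
    records = []
    for row in body:
        # zip() stops at the shorter sequence, so ragged rows lose trailing cells.
        record: dict[str, Any] = dict(zip(header, row))
        record["page"] = table.get("page")  # keep the source page for traceability
        records.append(record)
    return records
```

Applied to the table in the output example, this would produce one record with `Parameter`, `Value`, `Unit`, and `page` keys.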
Status values
| status | Meaning |
|---|---|
| PARSED | Specifications and/or tables extracted from PDF |
| PARTIAL | Tables found but few/no key-value specs |
| METADATA_ONLY | PDF read but no structured tables detected |
| NO_DATASHEET | Discover mode: product found, no document link on PDP |
| NO_DOWNLOAD_LINK | SIOS page found but no PDF attachment |
| NOT_FOUND | Part number not found on SiePortal |
| FAILED | Parse or load error |
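A consumer of the dataset will typically branch on `status`. The grouping below is one hedged way to triage results; the bucket names and the decision to treat `PARTIAL` as usable are assumptions for illustration, not behavior defined by the Actor.

```python
from collections import defaultdict
from typing import Any

# Statuses taken from the table above; the grouping itself is an assumption.
USABLE = {"PARSED", "PARTIAL"}
NO_DATA = {"METADATA_ONLY", "NO_DATASHEET", "NO_DOWNLOAD_LINK", "NOT_FOUND"}


def triage(results: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Group result records into 'usable', 'no_data', and 'needs_review' buckets."""
    buckets: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for record in results:
        status = record.get("status")
        if status in USABLE:
            buckets["usable"].append(record)
        elif status in NO_DATA:
            buckets["no_data"].append(record)
        else:
            # FAILED, plus any missing or unrecognized status value.
            buckets["needs_review"].append(record)
    return dict(buckets)
```

Sending unknown statuses to `needs_review` keeps the triage safe if new status values appear later.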
Typical Workflow
Document Downloader   → PDF files in Key-Value Store
        │
        ▼
Document PDF Parser   → specifications, tables, limits (from PDF)
        │
        ▼
SiePortal Scraper     → web specs, lifecycle, related products (optional)
Actor Comparison
| Task | Document PDF Parser | Document Downloader | TED Datasheet Parser | SiePortal Scraper |
|---|---|---|---|---|
| Parse SIOS/manual PDFs | Yes | No | No | No |
| Parse TED catalog PDFs | No | No | Yes | No |
| Download PDFs | Discover mode only | Yes | No | Links only |
| Web PDP specifications | No | No | No | Yes |
Pricing
Billing is pay-per-event. Parse-only mode (PDFs already in Key-Value Store) costs significantly less than discover mode, which runs a browser.
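For rough budgeting, the listed starting price of $25.00 per 1,000 parsed PDF documents can be turned into a simple estimate. This is a sketch under stated assumptions: `estimate_parse_cost` is a hypothetical helper, it uses only the listed "from" price, and it ignores any additional events that discover mode or other platform usage may bill.

```python
from decimal import Decimal

# Listed starting price: $25.00 per 1,000 parsed PDF documents.
PRICE_PER_1000_PARSED = Decimal("25.00")


def estimate_parse_cost(document_count: int) -> Decimal:
    """Estimate the parse cost in USD at the listed starting price, rounded to cents."""
    if document_count < 0:
        raise ValueError("document_count must not be negative")
    cost = PRICE_PER_1000_PARSED * document_count / 1000
    return cost.quantize(Decimal("0.01"))


print(estimate_parse_cost(400))  # 10.00
```

`Decimal` is used instead of `float` so that currency amounts do not pick up binary rounding noise.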