Siemens Document PDF Parser — Specs JSON

Pricing

from $25.00 / 1,000 parsed pdf documents

Parse Siemens PDF documents (manuals, datasheets, certificates) into structured JSON: specification key-values, technical tables, limit values. Chain with Document Downloader or discover by MPN.

Developer: Andrej Kiva

Maintained by Community

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 13 hours ago

Crawloop Siemens Automation Suite — Structured data extraction for Siemens SiePortal (Industry Mall), SIOS, and TED product datasheets. Built for procurement teams, system integrators, and BOM engineering workflows.

Suite hub: github.com/PLCSPS-DEV/siemens-sieportal-automation

| Discovery | Enrichment | SIOS documents | TED datasheets |
| --- | --- | --- | --- |
| Catalog Crawler | SiePortal Scraper | Document Downloader | TED Datasheet Downloader |
| Category Scraper | Lifecycle Tracker | Document PDF Parser | TED Datasheet Parser |

Disclaimer: This is an unofficial integration developed independently of Siemens AG. It is not affiliated with, sponsored by, or endorsed by Siemens AG or any of its subsidiaries.

Siemens, SiePortal, SIMATIC, and related names are trademarks of Siemens AG. Product data is read from publicly accessible Siemens web sources only; no proprietary databases are redistributed.

This Actor is provided for informational and research purposes only (e.g. procurement research, BOM audits, internal engineering workflows). You are solely responsible for ensuring your use complies with applicable laws, Siemens website terms of use, and your organization's policies.

No warranty is given as to accuracy, completeness, or continued availability of third-party data. Use at your own risk.

Parse Siemens PDF documents from SIOS and SiePortal — equipment manuals, certificates, brochures, and other attachments — into structured JSON. Extracts specification key-value pairs, technical tables, limit values, and optional wiring diagrams from the PDF file itself, not from the SiePortal web page.

Recommended workflow: Run the Document Downloader first to save PDFs to Key-Value Store, then pass keyValueStoreId and items to this Actor for parse-only extraction (no browser, lower cost).

For compact TED product datasheets from Industry Mall, use the TED Datasheet Parser instead.

When to use this Actor

Use the Document PDF Parser when you have Siemens SIOS PDFs (manuals, certificates, brochures) and need structured specifications, tables, and limit values in JSON.

Use the TED Datasheet Parser for official TED catalog datasheet PDFs. Use the SiePortal Scraper for web PDP specifications without PDF parsing.

Siemens Automation Pipeline

Phase 1 — Discover MPNs    Phase 2 — Screen & enrich  Phase 3 — Documents & specs
─────────────────────────  ─────────────────────────  ─────────────────────────────
Catalog Crawler  ──┐
                   ├──► MPN list ──► Lifecycle Tracker ──► SiePortal Scraper
Category Scraper ──┘                                               │
                                      ┌────────────────────────────┴────────────────────────────┐
                                      │                                                         │
                                      ▼                                                         ▼
                         Document Downloader (SIOS)                                 TED Datasheet Downloader
                         certificates, manuals, CAD                                 compact catalog PDFs
                                      │                                                         │
                                      ▼                                                         ▼
                         Document PDF Parser ◄── you are here                       TED Datasheet Parser
                         specs from SIOS PDFs                                       specs from TED PDFs

Key Features

  • Any SIOS PDF — Manuals, certificates, brochures — whatever the Document Downloader saved.
  • Specification extraction — Parameter/value tables and inline key-value lines from PDF content.
  • Technical tables — Raw table rows with page numbers for downstream normalization.
  • Limit values — Rows mentioning limits, minimum, and maximum ratings.
  • Wiring diagrams — Optional image extraction from PDF pages to Key-Value Store.
  • Chained workflow — Reads PDFs from the current run or a previous Document Downloader run.
  • Parse-only mode — No browser when PDFs are already in Key-Value Store.

Input Parameters

| Parameter | Description | Default |
| --- | --- | --- |
| items | Recommended. PDFs to parse with keyValueStoreKey from Document Downloader output. | |
| keyValueStoreKeys | Alternative: list of PDF keys only. | |
| keyValueStoreId | Store ID from a previous Document Downloader run. | |
| searchTerms | Discover mode: find one document PDF per MPN on SiePortal, download, parse. Requires proxy. | |
| locale | SiePortal locale for discover mode. | en-nl |
| extractDiagrams | Save large images from PDF pages to Key-Value Store. | false |
| maxConcurrency | Parallel PDF processing (1–8). | 3 |
| proxyConfiguration | Required for searchTerms discover mode on Apify Cloud. | |

{
  "keyValueStoreId": "YOUR_DOCUMENT_DOWNLOADER_STORE_ID",
  "items": [
    {
      "partNumber": "6ES7193-6BP00-0DA0",
      "keyValueStoreKey": "6ES7193-6BP00-0DA0_110003392_ET200SP_FM16US0053XSupp56.pdf",
      "fileName": "ET200SP_FM16US0053XSupp56.pdf"
    }
  ],
  "extractDiagrams": false,
  "maxConcurrency": 3
}
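
The constraints in the parameter table can be checked on the client side before a run is started. The sketch below is a minimal pre-flight check that encodes only the rules stated above (the 1–8 range for maxConcurrency, the proxy requirement for searchTerms, and a keyValueStoreKey on each item). The function name `validate_parser_input` is hypothetical, and the Actor's own validation may be stricter or different.

```python
from typing import Any


def validate_parser_input(run_input: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a run input; an empty list means none found."""
    problems: list[str] = []

    has_items = bool(run_input.get("items"))
    has_keys = bool(run_input.get("keyValueStoreKeys"))
    has_search = bool(run_input.get("searchTerms"))

    # At least one source of PDFs is needed (assumption drawn from the table).
    if not (has_items or has_keys or has_search):
        problems.append("provide items, keyValueStoreKeys, or searchTerms")

    # Each item should carry the key under which the PDF was stored.
    for index, item in enumerate(run_input.get("items") or []):
        if not item.get("keyValueStoreKey"):
            problems.append(f"items[{index}] is missing keyValueStoreKey")

    # Discover mode is documented as requiring a proxy.
    if has_search and not run_input.get("proxyConfiguration"):
        problems.append("searchTerms requires proxyConfiguration")

    # maxConcurrency is documented as 1-8 with a default of 3.
    concurrency = run_input.get("maxConcurrency", 3)
    if (
        isinstance(concurrency, bool)
        or not isinstance(concurrency, int)
        or not 1 <= concurrency <= 8
    ):
        problems.append("maxConcurrency must be an integer from 1 to 8")

    return problems
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in one pass.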

Output Format

{
  "partNumber": "6ES7193-6BP00-0DA0",
  "fileName": "ET200SP_FM16US0053XSupp56.pdf",
  "keyValueStoreKey": "6ES7193-6BP00-0DA0_110003392_ET200SP_FM16US0053XSupp56.pdf",
  "status": "PARSED",
  "pageCount": 42,
  "tableCount": 8,
  "specificationCount": 35,
  "specifications": {
    "Article number": "6ES7193-6BP00-0DA0",
    "Net weight": "0.05 kg",
    "Supply voltage": "24 V DC"
  },
  "tables": [
    {
      "page": 12,
      "rowCount": 15,
      "rows": [["Parameter", "Value", "Unit"], ["Supply voltage", "24", "V DC"]]
    }
  ],
  "limits": [],
  "diagramCount": 0,
  "parsedAt": "2026-06-18T15:00:00+00:00"
}
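
Because `tables` holds raw rows, downstream code usually has to normalize them. The sketch below turns one table entry into a list of dictionaries. It assumes the first row is a header row, as in the sample above; real PDFs may contain tables without a header, so treat this as a starting point. The helper name `table_to_records` and the `_page` field are hypothetical.

```python
from typing import Any


def table_to_records(table: dict[str, Any]) -> list[dict[str, Any]]:
    """Convert one `tables` entry to dicts, assuming rows[0] is the header row."""
    rows = table.get("rows") or []
    if len(rows) < 2:
        # Nothing to convert: no rows, or a header with no data rows.
        return []

    header, *body = rows
    records: list[dict[str, Any]] = []
    for row in body:
        # Pad short rows so every header gets a value; zip drops extra cells.
        padded = list(row) + [""] * (len(header) - len(row))
        record: dict[str, Any] = dict(zip(header, padded))
        # Keep the source page so values can be traced back to the PDF.
        record["_page"] = table.get("page")
        records.append(record)
    return records
```

Applied to the sample table above, this yields one record with the keys `Parameter`, `Value`, `Unit`, and `_page`.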

Status values

| status | Meaning |
| --- | --- |
| PARSED | Specifications and/or tables extracted from PDF |
| PARTIAL | Tables found but few/no key-value specs |
| METADATA_ONLY | PDF read but no structured tables detected |
| NO_DATASHEET | Discover mode: product found, no document link on PDP |
| NO_DOWNLOAD_LINK | SIOS page found but no PDF attachment |
| NOT_FOUND | Part number not found on SiePortal |
| FAILED | Parse or load error |
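
When processing a batch, it can help to sort results by status before using them. The sketch below groups part numbers into buckets. The bucket names and the grouping itself are assumptions made for illustration, not something the Actor defines; adjust them to your own workflow.

```python
from collections import defaultdict
from typing import Any

# Hypothetical grouping of the documented status values.
STATUS_BUCKETS = {
    "PARSED": "usable",
    "PARTIAL": "usable",
    "METADATA_ONLY": "needs_review",
    "NO_DATASHEET": "no_document",
    "NO_DOWNLOAD_LINK": "no_document",
    "NOT_FOUND": "no_document",
    "FAILED": "retry",
}


def triage(records: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Group part numbers by bucket; undocumented statuses land in 'unknown'."""
    buckets: dict[str, list[Any]] = defaultdict(list)
    for record in records:
        bucket = STATUS_BUCKETS.get(record.get("status"), "unknown")
        buckets[bucket].append(record.get("partNumber"))
    return dict(buckets)
```

Keeping an explicit `unknown` bucket means a status added in a later Actor version is surfaced instead of silently dropped.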

Typical Workflow

Document Downloader → PDF files in Key-Value Store
Document PDF Parser → specifications, tables, limits (from PDF)
SiePortal Scraper → web specs, lifecycle, related products (optional)
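
The first two steps of the workflow above can be chained by turning Document Downloader output into a parse-only input. The sketch below assumes each downloader dataset entry exposes `keyValueStoreKey`, and optionally `partNumber` and `fileName`, mirroring the input example earlier. Those field names on the downloader side are an assumption and should be checked against its actual output. The function name `build_parser_input` is hypothetical.

```python
from typing import Any


def build_parser_input(
    store_id: str,
    downloader_items: list[dict[str, Any]],
    max_concurrency: int = 3,
) -> dict[str, Any]:
    """Build a parse-only run input from Document Downloader dataset entries."""
    items: list[dict[str, Any]] = []
    seen: set[str] = set()

    for entry in downloader_items:
        key = entry.get("keyValueStoreKey")
        # Skip entries without a stored file, non-PDF attachments, and duplicates.
        if not key or not key.lower().endswith(".pdf") or key in seen:
            continue
        seen.add(key)

        item: dict[str, Any] = {"keyValueStoreKey": key}
        # Copy optional descriptive fields only when they are present.
        for field in ("partNumber", "fileName"):
            if entry.get(field):
                item[field] = entry[field]
        items.append(item)

    return {
        "keyValueStoreId": store_id,
        "items": items,
        "extractDiagrams": False,
        "maxConcurrency": max_concurrency,
    }
```

The returned dictionary has the same shape as the input example and can be passed as the run input when starting this Actor.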

Actor Comparison

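| Task | Document PDF Parser | Document Downloader | TED Datasheet Parser | SiePortal Scraper |
| --- | --- | --- | --- | --- |
| Parse SIOS/manual PDFs | Yes | No | No | No |
| Parse TED catalog PDFs | No | No | Yes | No |
| Download PDFs | Discover mode only | Yes | No | Links only |
| Web PDP specifications | No | No | No | Yes |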

Pricing

Pay-per-event billing. Parse-only mode (PDFs already in Key-Value Store) costs significantly less than discover mode, which runs a browser.