Siemens Catalog Crawler — MPN Discovery
Pricing
from $0.20 / 1,000 catalog products exported
Search Siemens SiePortal live by keywords (6ES, SIMATIC, S7-1200) via OneSearch. Export unique MPNs, titles, category paths & PDP links. Deduplicated output with resume for large catalog builds.
Developer
Andrej Kiva
Maintained by Community
Crawloop Siemens Automation Suite — Structured data extraction for Siemens SiePortal (Industry Mall), SIOS, and TED product datasheets. Built for procurement teams, system integrators, and BOM engineering workflows.
Suite hub: github.com/PLCSPS-DEV/siemens-sieportal-automation
Product site: crawloop.com/siemens-automation
| Discovery | Enrichment | SIOS documents | TED datasheets |
|---|---|---|---|
| Catalog Crawler | SiePortal Scraper | Document Downloader | TED Datasheet Downloader |
| | Lifecycle Tracker | Document PDF Parser | TED Datasheet Parser |
Disclaimer: This is an unofficial integration developed independently of Siemens AG. It is not affiliated with, sponsored by, or endorsed by Siemens AG or any of its subsidiaries.
Siemens, SiePortal, SIMATIC, and related names are trademarks of Siemens AG. Product data is read from publicly accessible Siemens web sources only; no proprietary databases are redistributed.
This Actor is provided for informational and research purposes only (e.g. procurement research, BOM audits, internal engineering workflows). You are solely responsible for ensuring your use complies with applicable laws, Siemens website terms of use, and your organization's policies.
No warranty is given as to accuracy, completeness, or continued availability of third-party data. Use at your own risk.
Search the official Siemens SiePortal catalog by keyword and export live product part numbers (MPNs). Each result is fetched from Siemens OneSearch inside a stealth browser session — no static MPN lists, no stale exports.
Use this Actor to build MPN inventories for product families (6ES, SIMATIC, S7-1200), procurement databases, or as input for downstream enrichment with the SiePortal Scraper and Lifecycle Tracker.
When to use this Actor
Use the Catalog Crawler when you need to discover Siemens part numbers by search terms, or to crawl the full catalog index (~350k MPNs) via the authenticated OneSearch API.
For full PDP specifications on a known MPN list, use the SiePortal Scraper.
Siemens Automation Pipeline
```
Phase 1: Discover MPNs       Phase 2: Screen & enrich     Phase 3: Documents & specs
─────────────────────────    ─────────────────────────    ─────────────────────────────
Catalog Crawler  ◄── you are here
       │
       ▼
   MPN list ──► Lifecycle Tracker ──► SiePortal Scraper
                                              │
                                              │
                 ┌────────────────────────────┴────────────────────────────┐
                 │                                                         │
                 ▼                                                         ▼
    Document Downloader (SIOS)                                 TED Datasheet Downloader
    certificates, manuals, CAD                                 compact catalog PDFs
                 │                                                         │
                 ▼                                                         ▼
    Document PDF Parser                                        TED Datasheet Parser
    specs from SIOS PDFs                                       specs from TED PDFs
```
Key Features
- Keyword search: Provide product families, prefixes, or terms; the Actor queries Siemens OneSearch live.
- Multi-sort crawl: Each keyword is queried with multiple `sortingOptions` to maximize coverage; deduplication removes overlap.
- Unique output: Each MPN is emitted once (`deduplicateOutput`, default `true`).
- Full catalog mode: Optional crawl of the entire Siemens index with adaptive shard subdivision and resume checkpoints.
- Akamai bypass: CloakBrowser stealth fingerprinting and session bootstrap for authenticated OneSearch API access.
- Resume support: Skip completed search terms on interrupted runs.
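
To illustrate how multi-sort crawling and `deduplicateOutput` fit together, here is a minimal Python sketch of first-seen deduplication keyed on `partNumber`. It is not the Actor's code: the helper name `dedupe_by_part_number`, the case-insensitive key, the placeholder part numbers, and the `Name` sort order are all assumptions made for the example.

```python
from typing import Iterable, Iterator


def dedupe_by_part_number(records: Iterable[dict]) -> Iterator[dict]:
    """Yield only the first record seen for each partNumber."""
    seen: set[str] = set()
    for record in records:
        # Assumption: MPNs are compared case-insensitively, ignoring outer whitespace.
        key = record["partNumber"].strip().upper()
        if key in seen:
            continue
        seen.add(key)
        yield record


# Placeholder results for one keyword under two sort orders (not real catalog data).
by_relevance = [
    {"partNumber": "6ES-EXAMPLE-001", "sortingOption": "Relevance"},
    {"partNumber": "6ES-EXAMPLE-002", "sortingOption": "Relevance"},
]
by_name = [
    {"partNumber": "6ES-EXAMPLE-002", "sortingOption": "Name"},  # overlap
    {"partNumber": "6ES-EXAMPLE-003", "sortingOption": "Name"},
]

unique = list(dedupe_by_part_number(by_relevance + by_name))
print([r["partNumber"] for r in unique])
```

With this approach the overlapping MPN keeps the `sortingOption` of the sort order that surfaced it first, and each additional sort order only contributes the products the earlier ones missed.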
How It Works
- Bootstrap the SiePortal search page (Akamai WAF).
- Capture a Bearer JWT for `POST /api/onesearch/search?api-version=2.0`.
- For each `searchKeywords` term, query OneSearch and paginate results.
- Auto-subdivide large result sets into child shards.
- Push unique MPN records to the dataset with category breadcrumbs.
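
The README does not say how shard subdivision is implemented. One plausible approach, sketched below purely as an assumption, is prefix refinement: when a term matches more products than can be paged through, append one more character and recurse. `plan_shards`, `count_hits`, `max_hits`, and the toy catalog are hypothetical names and data, and the real OneSearch API may not behave like a prefix search.

```python
from typing import Callable

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def plan_shards(
    term: str,
    count_hits: Callable[[str], int],
    max_hits: int,
    max_depth: int = 3,
) -> list[str]:
    """Split a term into child terms until each one matches at most max_hits products."""
    hits = count_hits(term)
    if hits == 0:
        return []  # nothing to crawl for this shard
    if hits <= max_hits or max_depth == 0:
        return [term]  # small enough, or refinement budget exhausted
    shards: list[str] = []
    for ch in ALPHABET:
        shards.extend(plan_shards(term + ch, count_hits, max_hits, max_depth - 1))
    return shards


# Toy stand-in for the search backend: counts made-up entries by prefix.
toy_catalog = ["6ES71", "6ES72", "6ES73", "6EP1", "6EP2"]


def toy_count(prefix: str) -> int:
    return sum(item.startswith(prefix) for item in toy_catalog)


print(plan_shards("6E", toy_count, max_hits=2))
# prints ['6EP', '6ES71', '6ES72', '6ES73']
```

The `max_depth` cap keeps the recursion bounded if a term never drops below the limit; shards that hit the cap are returned as-is, so a real crawler would still need to deduplicate across them.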
Input Parameters
| Parameter | Description | Default |
|---|---|---|
| `searchKeywords` | Required for `keywords` mode: product families, prefixes, or terms. | `[]` |
| `sortingOptions` | OneSearch sort orders per keyword (multiple = more coverage). | Relevance + 3 more |
| `useMultipleSorts` | Crawl every keyword with all sorting options. | `true` |
| `discoveryMode` | `keywords`, `full_catalog`, or `catalog_tree`. | `keywords` |
| `locale` | SiePortal locale (`en-ww`, `en-nl`, …). | `en-ww` |
| `maxSearchShards` | Cap terms processed (`0` = unlimited; use for testing). | `0` |
| `concurrencyLimit` | Parallel OneSearch workers. | `3` |
| `deduplicateOutput` | Emit each MPN only once. | `true` |
| `resumeFromCheckpoint` | Resume interrupted runs. | `true` |
| `proxyConfiguration` | Apify proxy settings. Residential proxies recommended. | - |
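
If you assemble the input programmatically, a small helper can apply the defaults from the table and catch obvious mistakes before a run starts. The sketch below is one possible way to do that; `build_run_input` and its validation rules are assumptions, not part of the Actor. `sortingOptions` has no default here because the table only lists it as "Relevance + 3 more".

```python
import copy
from typing import Any

# Defaults copied from the parameter table.
DEFAULTS: dict[str, Any] = {
    "searchKeywords": [],
    "useMultipleSorts": True,
    "discoveryMode": "keywords",
    "locale": "en-ww",
    "maxSearchShards": 0,
    "concurrencyLimit": 3,
    "deduplicateOutput": True,
    "resumeFromCheckpoint": True,
}
# Parameters passed through unchanged because no concrete default is documented.
PASS_THROUGH = {"sortingOptions", "proxyConfiguration"}
VALID_MODES = {"keywords", "full_catalog", "catalog_tree"}


def build_run_input(**overrides: Any) -> dict[str, Any]:
    """Merge overrides into the documented defaults and run basic sanity checks."""
    unknown = set(overrides) - set(DEFAULTS) - PASS_THROUGH
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    run_input = {**copy.deepcopy(DEFAULTS), **overrides}
    mode = run_input["discoveryMode"]
    if mode not in VALID_MODES:
        raise ValueError(f"discoveryMode must be one of {sorted(VALID_MODES)}")
    if mode == "keywords" and not run_input["searchKeywords"]:
        raise ValueError("searchKeywords is required in keywords mode")
    if run_input["concurrencyLimit"] < 1 or run_input["maxSearchShards"] < 0:
        raise ValueError("concurrencyLimit must be >= 1 and maxSearchShards >= 0")
    return run_input


smoke_test = build_run_input(
    searchKeywords=["6ES"],
    maxSearchShards=3,
    concurrencyLimit=2,
    resumeFromCheckpoint=False,
)
print(smoke_test["discoveryMode"], smoke_test["locale"])
```

The resulting dictionary can be serialized with `json.dumps` and pasted into the Actor input editor, or passed to whichever client you use to start runs.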
Input Example — keyword search (recommended)
```json
{
  "searchKeywords": ["6ES", "SIMATIC", "S7-1200", "3RT", "contactor"],
  "discoveryMode": "keywords",
  "concurrencyLimit": 3,
  "deduplicateOutput": true,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
```
Input Example — smoke test
```json
{
  "searchKeywords": ["6ES"],
  "maxSearchShards": 3,
  "concurrencyLimit": 2,
  "resumeFromCheckpoint": false
}
```
Input Example — full catalog (~350k MPNs)
```json
{
  "discoveryMode": "full_catalog",
  "concurrencyLimit": 3,
  "resumeFromCheckpoint": true
}
```
Output Format
Each dataset record:
| Field | Description |
|---|---|
| `partNumber` | Siemens article number (MPN) |
| `title` | Short product title |
| `description` | Product description |
| `pdpUrl` | Product detail page URL |
| `categoryId` | Parent category node ID |
| `categoryPath` | Breadcrumb trail |
| `source` | Always `onesearch` (live Siemens API) |
| `searchShard` | Keyword/shard that found this product |
| `sortingOption` | Sort order that surfaced this product |
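
For downstream processing, the record shape above maps naturally onto a small typed structure. The following sketch assumes every field is a string (the table does not state types, and `categoryPath` in particular may be structured differently) and uses placeholder values rather than real dataset output. `CatalogRecord` and `mpns_by_shard` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogRecord:
    # Field names follow the output table; the str types are an assumption.
    partNumber: str
    title: str
    description: str
    pdpUrl: str
    categoryId: str
    categoryPath: str
    source: str
    searchShard: str
    sortingOption: str


def mpns_by_shard(items: list[dict]) -> dict[str, list[str]]:
    """Group part numbers by the keyword/shard that found them."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for item in items:
        record = CatalogRecord(**item)  # raises TypeError on missing or extra fields
        grouped[record.searchShard].append(record.partNumber)
    return dict(grouped)


# Placeholder item (not real catalog data).
sample = {
    "partNumber": "6ES-EXAMPLE-001",
    "title": "Example CPU",
    "description": "Placeholder description",
    "pdpUrl": "https://example.invalid/pdp/6ES-EXAMPLE-001",
    "categoryId": "0000",
    "categoryPath": "Automation > Controllers",
    "source": "onesearch",
    "searchShard": "6ES",
    "sortingOption": "Relevance",
}

print(mpns_by_shard([sample]))
```

The grouped part numbers are the kind of MPN list you would hand to the SiePortal Scraper or Lifecycle Tracker in the next pipeline phase.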
Typical Workflow
```
Keywords (6ES, SIMATIC, …)
        │
        ▼
Catalog Crawler   → deduplicated MPN list with PDP URLs
        │
        ▼
Lifecycle Tracker → screen for obsolete parts
        │
        ▼
SiePortal Scraper → full specs on selected MPNs
```
Actor Comparison
| Task | Catalog Crawler | SiePortal Scraper |
|---|---|---|
| Discover MPNs by keyword | Yes | No |
| Full PDP specifications | No | Yes |
| Full catalog index (~350k) | Yes | No |
Pricing
Pay-per-event billing. When deduplication is enabled, you are charged only for the unique MPNs pushed to the dataset.
| Event | Price |
|---|---|
| Actor start | $0.05 per run |
| Catalog product | $0.20 / 1,000 MPNs ($0.0002 per unique product) |
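
As a quick sanity check on the table, the two events can be combined into a rough cost estimate. This sketch covers only the listed Actor events; it assumes deduplication is enabled and ignores any proxy or platform usage billed separately. `estimate_cost` is a hypothetical helper.

```python
ACTOR_START_USD = 0.05    # per run, from the pricing table
PER_PRODUCT_USD = 0.0002  # per unique MPN ($0.20 / 1,000)


def estimate_cost(unique_mpns: int, runs: int = 1) -> float:
    """Estimate event charges in USD for a number of unique MPNs across runs."""
    return round(runs * ACTOR_START_USD + unique_mpns * PER_PRODUCT_USD, 4)


print(estimate_cost(1_000))    # 0.25
print(estimate_cost(350_000))  # 70.05
```

A full catalog crawl of about 350k MPNs therefore comes to roughly $70 in event charges, while a 1,000-product keyword run costs $0.25.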
Configure these events in the Apify Store Publication tab: `apify-actor-start` and `catalog-product`. Disable `apify-default-dataset-item` to avoid double billing.
Residential proxies are strongly recommended. `full_catalog` mode can take many hours; all output is scraped live from Siemens.
Learn more: Product page · Suite hub · GitHub docs
Also from Crawloop Industrial: Rockwell Automation Suite · GitHub docs