Siemens Catalog Crawler — MPN Discovery

Pricing

from $0.20 / 1,000 catalog products exported

Search Siemens SiePortal live by keywords (6ES, SIMATIC, S7-1200) via OneSearch. Export unique MPNs, titles, category paths & PDP links. Deduplicated output with resume for large catalog builds.

Developer

Andrej Kiva

Maintained by Community

Crawloop Siemens Automation Suite — Structured data extraction for Siemens SiePortal (Industry Mall), SIOS, and TED product datasheets. Built for procurement teams, system integrators, and BOM engineering workflows.

Suite hub: github.com/PLCSPS-DEV/siemens-sieportal-automation

Product site: crawloop.com/siemens-automation

| Discovery | Enrichment | SIOS documents | TED datasheets |
|---|---|---|---|
| Catalog Crawler | SiePortal Scraper | Document Downloader | TED Datasheet Downloader |
| | Lifecycle Tracker | Document PDF Parser | TED Datasheet Parser |

Disclaimer: This is an unofficial integration developed independently of Siemens AG. It is not affiliated with, sponsored by, or endorsed by Siemens AG or any of its subsidiaries.

Siemens, SiePortal, SIMATIC, and related names are trademarks of Siemens AG. Product data is read from publicly accessible Siemens web sources only; no proprietary databases are redistributed.

This Actor is provided for informational and research purposes only (e.g. procurement research, BOM audits, internal engineering workflows). You are solely responsible for ensuring your use complies with applicable laws, Siemens website terms of use, and your organization's policies.

No warranty is given as to accuracy, completeness, or continued availability of third-party data. Use at your own risk.

Search the official Siemens SiePortal catalog by keyword and export live product part numbers (MPNs). Each result is fetched from Siemens OneSearch inside a stealth browser session — no static MPN lists, no stale exports.

Use this Actor to build MPN inventories for product families (6ES, SIMATIC, S7-1200), procurement databases, or as input for downstream enrichment with the SiePortal Scraper and Lifecycle Tracker.

When to use this Actor

Use the Catalog Crawler when you need to discover Siemens part numbers by search terms or crawl the full catalog index (~350k+ MPNs) via the authenticated OneSearch API.

For full PDP specifications on a known MPN list, use the SiePortal Scraper.

Siemens Automation Pipeline

Phase 1 — Discover MPNs      Phase 2 — Screen & enrich      Phase 3 — Documents & specs
─────────────────────────    ─────────────────────────      ─────────────────────────────
Catalog Crawler ◄── you are here
MPN list ──► Lifecycle Tracker ──► SiePortal Scraper
    ┌────────────────────────────┴────────────────────────────┐
    │                                                         │
    ▼                                                         ▼
Document Downloader (SIOS)                            TED Datasheet Downloader
certificates, manuals, CAD                            compact catalog PDFs
    │                                                         │
    ▼                                                         ▼
Document PDF Parser                                   TED Datasheet Parser
specs from SIOS PDFs                                  specs from TED PDFs

Key Features

  • Keyword search — Provide product families, prefixes, or terms; the Actor queries Siemens OneSearch live.
  • Multi-sort crawl — Each keyword is queried with multiple sortingOptions to maximize coverage; deduplication removes overlap.
  • Unique output — Each MPN emitted once (deduplicateOutput, default true).
  • Full catalog mode — Optional crawl of the entire Siemens index with adaptive shard subdivision and resume checkpoints.
  • Akamai bypass — CloakBrowser stealth fingerprinting and session bootstrap for authenticated OneSearch API access.
  • Resume support — Skip completed search terms on interrupted runs.

How It Works

  1. Bootstrap a browser session on the SiePortal search page to clear the Akamai WAF.
  2. Capture Bearer JWT for POST /api/onesearch/search?api-version=2.0.
  3. For each searchKeywords term, query OneSearch and paginate results.
  4. Auto-subdivide large result sets into child shards.
  5. Push unique MPN records to the dataset with category breadcrumbs.
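Steps 3 and 5 above (paginate each keyword under several sort orders, then emit each MPN once) can be sketched as follows. This is a minimal illustration, not the Actor's code: the `fetch_page` callable, the fake part numbers, and the sort name `NameAsc` are assumptions made for the example, and the real OneSearch request and response shapes are not shown here.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical page fetcher: (keyword, sort, page_index) -> list of records.
# A real one would call OneSearch with the captured Bearer token; it is
# injected here because the API's request/response format is not documented.
FetchPage = Callable[[str, str, int], list]


def crawl_keywords(
    keywords: Iterable[str],
    sorting_options: Iterable[str],
    fetch_page: FetchPage,
) -> Iterator[dict]:
    """Yield each part number once across all keywords and sort orders."""
    seen = set()
    sorts = list(sorting_options)
    for keyword in keywords:
        for sort in sorts:
            page = 0
            while True:
                records = fetch_page(keyword, sort, page)
                if not records:  # an empty page ends pagination
                    break
                for record in records:
                    mpn = record["partNumber"]
                    if mpn in seen:  # already emitted under another sort/keyword
                        continue
                    seen.add(mpn)
                    yield {**record, "searchShard": keyword, "sortingOption": sort}
                page += 1


# Tiny in-memory stand-in for the live API, for illustration only.
FAKE_RESULTS = {
    ("6ES", "Relevance"): [[{"partNumber": "A-1"}, {"partNumber": "A-2"}]],
    ("6ES", "NameAsc"): [[{"partNumber": "A-2"}, {"partNumber": "A-3"}]],
}


def fake_fetch(keyword: str, sort: str, page: int) -> list:
    pages = FAKE_RESULTS.get((keyword, sort), [])
    return pages[page] if page < len(pages) else []


found = [r["partNumber"] for r in crawl_keywords(["6ES"], ["Relevance", "NameAsc"], fake_fetch)]
print(found)  # ['A-1', 'A-2', 'A-3']
```

The second sort order contributes only `A-3`, because `A-2` was already seen. This is the overlap that deduplication removes when multiple sorts are enabled.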

Input Parameters

| Parameter | Description | Default |
|---|---|---|
| searchKeywords | Required for keywords mode — product families, prefixes, or terms. | [] |
| sortingOptions | OneSearch sort orders per keyword (multiple = more coverage). | Relevance + 3 more |
| useMultipleSorts | Crawl every keyword with all sorting options. | true |
| discoveryMode | keywords, full_catalog, or catalog_tree. | keywords |
| locale | SiePortal locale (en-ww, en-nl, …). | en-ww |
| maxSearchShards | Cap terms processed (0 = unlimited; use for testing). | 0 |
| concurrencyLimit | Parallel OneSearch workers. | 3 |
| deduplicateOutput | Emit each MPN only once. | true |
| resumeFromCheckpoint | Resume interrupted runs. | true |
| proxyConfiguration | Apify proxy settings. Residential proxies recommended. | |

{
  "searchKeywords": ["6ES", "SIMATIC", "S7-1200", "3RT", "contactor"],
  "discoveryMode": "keywords",
  "concurrencyLimit": 3,
  "deduplicateOutput": true,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

Input Example — smoke test

{
  "searchKeywords": ["6ES"],
  "maxSearchShards": 3,
  "concurrencyLimit": 2,
  "resumeFromCheckpoint": false
}

Input Example — full catalog (~350k MPNs)

{
  "discoveryMode": "full_catalog",
  "concurrencyLimit": 3,
  "resumeFromCheckpoint": true
}
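
If you generate inputs programmatically, a small helper can catch the most likely mistake (keywords mode with no keywords) before a run is started. The sketch below uses the parameter names and defaults from the Input Parameters table; the helper `build_run_input` itself is hypothetical and is not part of the Actor.

```python
import json

VALID_MODES = {"keywords", "full_catalog", "catalog_tree"}


def build_run_input(search_keywords=None, discovery_mode="keywords", **overrides):
    """Assemble an input object using the documented parameter names."""
    if discovery_mode not in VALID_MODES:
        raise ValueError(f"unknown discoveryMode: {discovery_mode!r}")
    keywords = list(search_keywords or [])
    if discovery_mode == "keywords" and not keywords:
        raise ValueError("searchKeywords is required in keywords mode")
    run_input = {
        "discoveryMode": discovery_mode,
        "concurrencyLimit": 3,          # documented default
        "deduplicateOutput": True,      # documented default
        "resumeFromCheckpoint": True,   # documented default
    }
    if keywords:
        run_input["searchKeywords"] = keywords
    run_input.update(overrides)  # caller-supplied values win over defaults
    return run_input


# Reproduces the smoke-test example, with the defaults spelled out.
smoke_test = build_run_input(
    ["6ES"], maxSearchShards=3, concurrencyLimit=2, resumeFromCheckpoint=False
)
print(json.dumps(smoke_test, indent=2))
```

The resulting dictionary can be pasted into the Actor's JSON input or passed as the run input by whichever client you use.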

Output Format

Each dataset record:

| Field | Description |
|---|---|
| partNumber | Siemens article number (MPN) |
| title | Short product title |
| description | Product description |
| pdpUrl | Product detail page URL |
| categoryId | Parent category node ID |
| categoryPath | Breadcrumb trail |
| source | Always onesearch (live Siemens API) |
| searchShard | Keyword/shard that found this product |
| sortingOption | Sort order that surfaced this product |
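
Once the dataset is exported, the `searchShard` and `partNumber` fields are enough to check coverage per keyword and to build the MPN list for downstream Actors. The records below are invented for illustration and carry only a subset of the fields. The helper names are hypothetical.

```python
from collections import Counter

# Made-up records that follow the documented field names.
records = [
    {"partNumber": "A-1", "searchShard": "6ES", "sortingOption": "Relevance"},
    {"partNumber": "A-2", "searchShard": "6ES", "sortingOption": "Relevance"},
    {"partNumber": "B-1", "searchShard": "3RT", "sortingOption": "Relevance"},
]


def products_per_shard(items):
    """Count how many products each keyword/shard contributed."""
    return Counter(item["searchShard"] for item in items)


def mpn_list(items):
    """Return a sorted, duplicate-free list of part numbers."""
    return sorted({item["partNumber"] for item in items})


per_shard = products_per_shard(records)
mpns = mpn_list(records)
print(dict(per_shard))  # {'6ES': 2, '3RT': 1}
print(mpns)             # ['A-1', 'A-2', 'B-1']
```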

Typical Workflow

Keywords (6ES, SIMATIC, …)
Catalog Crawler → deduplicated MPN list with PDP URLs
Lifecycle Tracker → screen for obsolete parts
SiePortal Scraper → full specs on selected MPNs

Actor Comparison

| Task | Catalog Crawler | SiePortal Scraper |
|---|---|---|
| Discover MPNs by keyword | Yes | No |
| Full PDP specifications | No | Yes |
| Full catalog index (~350k) | Yes | No |

Pricing

Pay-per-event billing. Each run incurs the Actor start fee; beyond that, when deduplication is enabled, you are charged only for unique MPNs pushed to the dataset.

| Event | Price |
|---|---|
| Actor start | $0.05 per run |
| Catalog product | $0.20 / 1,000 MPNs ($0.0002 per unique product) |

Configure these events in the Apify Store Publication tab: apify-actor-start and catalog-product. Disable apify-default-dataset-item to avoid double billing.

Residential proxies are strongly recommended. full_catalog mode can take many hours; all output is scraped live from Siemens.
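
The two event prices listed under Pricing can be turned into a rough estimate before a large run. The sketch below counts only those two events; proxy traffic and any other charges are not modeled, and the function name is hypothetical.

```python
from decimal import Decimal

ACTOR_START_FEE = Decimal("0.05")      # per run
PRICE_PER_PRODUCT = Decimal("0.0002")  # $0.20 / 1,000 unique MPNs


def estimate_event_cost(unique_mpns: int, runs: int = 1) -> Decimal:
    """Estimate pay-per-event charges in USD, rounded to cents."""
    if unique_mpns < 0 or runs < 1:
        raise ValueError("unique_mpns must be >= 0 and runs must be >= 1")
    total = ACTOR_START_FEE * runs + PRICE_PER_PRODUCT * unique_mpns
    return total.quantize(Decimal("0.01"))


print(estimate_event_cost(1_000))    # 0.25
print(estimate_event_cost(350_000))  # 70.05
```

At the listed prices, a single full-catalog run of about 350k unique MPNs comes to roughly $70 in event charges.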


Learn more: Product page · Suite hub · GitHub docs

Also from Crawloop Industrial: Rockwell Automation Suite · GitHub docs