# Thomasnet Supplier Scraper (`alwaysprimedev/thomasnet-scraper`) Actor

Extract structured B2B supplier profiles from Thomasnet.com — companies, addresses, primary categories, contact phone, websites, brands, headings, and business metadata.

- **URL**: https://apify.com/alwaysprimedev/thomasnet-scraper.md
- **Developed by:** [Always Prime](https://apify.com/alwaysprimedev) (community)
- **Categories:** E-commerce, Social media, Lead generation
- **Stats:** 2 total users, 1 monthly user, 50.0% of runs succeeded, 1 bookmark
- **User rating**: No ratings yet

## Pricing

from $2.50 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
```

```powershell
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) and as [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## 🏭 Thomasnet Supplier Scraper

[![Apify](https://img.shields.io/badge/Apify-Actor-orange.svg)](https://apify.com)
[![Python](https://img.shields.io/badge/python-3.11-blue.svg)](https://www.python.org)
[![JSON / CSV / Excel](https://img.shields.io/badge/output-JSON%20%7C%20CSV%20%7C%20Excel-success.svg)](#)

> 🚀 **Pull structured B2B supplier records from Thomasnet — 25+ fields per company, ready for your CRM, sourcing pipeline, or model.**

Get a clean dataset of US industrial suppliers from any Thomasnet category in minutes: company name, address (with lat/lng), primary phone, website, brands, families, year founded, employee bracket, sales bracket, and the full classification tree. No selectors. No "Please enable JavaScript" pages. No maintenance.

### ✨ Why this scraper

- ⚡️ **Single-call rich records** — every supplier comes back with 25+ typed fields (lat/lng, year founded, sales bracket, employees bracket, full category tree, social links, key personnel). No separate detail-fetch step, no DOM scraping, no XPath drift.
- 🔎 **Keyword search** — give it a free-text term ("steel pipe", "cnc machining") and it auto-resolves to category headings. No need to hunt for heading IDs.
- 🚀 **Fast** — one network call per 100 suppliers, full record in one shot. A 250-record heading run finishes in seconds.
- 🛡 **Reliable** — retry, fingerprint rotation, structured rejection diagnostics, failure-budget guard that aborts a broken run before it burns your CU.
- 🔁 **Dedup-safe** — same supplier across multiple categories appears once.
- 📦 **JSON / CSV / Excel** — pipe straight into Sheets, Postgres, HubSpot, or your warehouse.

### 🚀 Quick start

1. Click **Try for free** above.
2. Enter a **search keyword** (e.g. `gates`, `cnc machining`) — *or* a specific heading ID if you have one.
3. Set `maxItems` and hit **Start** → wait → **Download** as JSON / CSV / Excel.

### 🛠 Input

| Field | What it does |
| --- | --- |
| `searchKeywords` | Free-text terms ("gates", "steel pipe") — auto-resolved to category heading IDs. |
| `headingIds` | Direct numeric heading IDs from any Thomasnet category URL — skip if using keywords. |
| `startUrls` | Specific supplier-profile URLs to scrape directly (skips category discovery). |
| `maxItems` | Hard cap on records (default 50, `0` = unlimited). Each heading currently returns up to ~290 unique suppliers — set `0` to keep them all. |
| `concurrency` | Number of headings crawled in parallel (default 5, max 25). |

#### 📋 Sample input

```json
{
  "searchKeywords": ["cnc machining"],
  "maxItems": 200
}
````
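The input schema enforces a few limits on these fields (`maxItems` of at least 0, `concurrency` between 1 and 25, `startUrls` items as objects with a `url` key). If you generate inputs programmatically, a hypothetical client-side helper like the one below can catch mistakes before a run starts. It is a sketch, not part of the Actor, and the rule that at least one of the three sources must be set is our own assumption:

```python
from typing import Optional, Sequence


def build_input(
    search_keywords: Optional[Sequence[str]] = None,
    heading_ids: Optional[Sequence] = None,
    start_urls: Optional[Sequence[str]] = None,
    max_items: int = 50,
    concurrency: int = 5,
) -> dict:
    """Assemble an Actor input dict, enforcing the limits from the input schema.

    Hypothetical helper: the "at least one source" rule is an assumption.
    """
    if not (search_keywords or heading_ids or start_urls):
        raise ValueError("Provide searchKeywords, headingIds, or startUrls")
    if max_items < 0:
        raise ValueError("maxItems must be >= 0 (0 means unlimited)")
    if not 1 <= concurrency <= 25:
        raise ValueError("concurrency must be between 1 and 25")

    run_input: dict = {"maxItems": max_items, "concurrency": concurrency}
    if search_keywords:
        run_input["searchKeywords"] = list(search_keywords)
    if heading_ids:
        # The schema types heading IDs as strings, so coerce numbers like 33470279.
        run_input["headingIds"] = [str(h) for h in heading_ids]
    if start_urls:
        # Each startUrls item is an object with a required "url" key.
        run_input["startUrls"] = [{"url": u} for u in start_urls]
    return run_input
```

For example, `build_input(search_keywords=["cnc machining"], max_items=200)` returns the sample input above plus the default `concurrency` of 5.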

### 📦 Sample output

```json
{
  "url": "https://www.thomasnet.com/company/long-fence-inc-358322/profile",
  "id": "358322",
  "name": "Long Fence, Inc.",
  "website": "http://www.longfence.com/commercial/",
  "phone": "(866) 966-4337",
  "logoUrl": "https://cdn.thomasnet.com/ccp/00358322/36527.png",
  "description": "Custom manufacturer and installation of fences, gates, decks...",
  "annualSales": "Under $1 Mil",
  "numberEmployees": "200-499",
  "yearFounded": "1945",
  "isClaimed": true,
  "isAdvertiser": true,
  "xometryVerified": false,
  "isMultiLocation": true,
  "tier": "VERIFIED",
  "type": "C",
  "address": {
    "address1": "8545 Edgeworth Dr.",
    "city": "Capitol Heights",
    "state": "MD",
    "stateName": "Maryland",
    "zip": "20743",
    "country": "USA",
    "latitude": 38.876765,
    "longitude": -76.881892
  },
  "primaryHeading": {
    "headingId": "33470279",
    "name": "Gates",
    "familyName": "Gates",
    "familyId": "162627"
  },
  "headings": [{ "headingId": "180307", "name": "Access Control Systems" }, "..."],
  "families": [{ "id": "161624", "name": "Fences" }, "..."],
  "brands": ["Alutech", "Cookson"],
  "otherActivities": ["D", "M", "S"],
  "businessDetails": {
    "Website": [{ "url": "http://www.longfence.com/commercial/", "text": "Homepage" }],
    "Primary Company Type": [{ "text": "Custom Manufacturer" }],
    "Annual Sales:": [{ "text": "Under $1 Mil" }],
    "No of Employees:": [{ "text": "200-499" }],
    "Year Founded:": [{ "text": "1945" }]
  },
  "scrapedAt": "2026-05-11T08:35:46Z"
}
```
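The dataset is already downloadable as CSV, but nested fields such as `address` and `primaryHeading` may not line up with the columns your CRM import expects. A minimal sketch that flattens one record into a single row, using field names from the sample output above; which columns to keep and what to call them is our own choice:

```python
import csv
import io


def flatten_supplier(record: dict) -> dict:
    """Flatten one supplier record into a flat dict suitable for CSV/CRM import."""
    address = record.get("address") or {}
    heading = record.get("primaryHeading") or {}
    return {
        "id": record.get("id"),
        "name": record.get("name"),
        "website": record.get("website"),
        "phone": record.get("phone"),
        "city": address.get("city"),
        "state": address.get("state"),
        "zip": address.get("zip"),
        "latitude": address.get("latitude"),
        "longitude": address.get("longitude"),
        "primaryHeading": heading.get("name"),
        "yearFounded": record.get("yearFounded"),
        "numberEmployees": record.get("numberEmployees"),
        # Lists are joined with "; " so they fit in a single cell.
        "brands": "; ".join(record.get("brands") or []),
    }


def to_csv(records: list) -> str:
    """Serialize flattened records to CSV text (empty string for no records)."""
    rows = [flatten_supplier(r) for r in records]
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)  # None values are written as empty cells
    return buffer.getvalue()
```

Missing fields come out as empty cells rather than raising, so records from profiles with sparse data still export cleanly.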

### 💼 Use cases

| Who | What for |
| --- | --- |
| 💰 **Sales & RevOps** | Build territory lists by category — feed straight into Salesforce or HubSpot. |
| 🤝 **Procurement** | Discover qualified suppliers for a specific industrial category, complete with phone, website, and certifications context. |
| 📊 **Market research** | Map an industry by employee bracket, sales bracket, year founded, and geographic distribution. |
| 🤖 **ML / data teams** | Training data for industrial taxonomy classifiers, NER on supplier descriptions, B2B knowledge graph projects. |
| 🏗 **PropTech / Logistics** | Cross-reference supplier addresses (with lat/lng) with real-estate or shipping datasets. |
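For the PropTech / logistics case, the `address.latitude` and `address.longitude` fields are enough to filter suppliers by distance. A minimal sketch using the haversine great-circle formula; it treats the Earth as a sphere with a 6371 km mean radius, which is our own approximation and not something the Actor provides:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def suppliers_within(records: list, lat: float, lon: float, radius_km: float) -> list:
    """Return records whose address coordinates lie within radius_km of (lat, lon)."""
    nearby = []
    for record in records:
        address = record.get("address") or {}
        s_lat, s_lon = address.get("latitude"), address.get("longitude")
        if s_lat is None or s_lon is None:
            continue  # skip suppliers without geocoded addresses
        if haversine_km(lat, lon, s_lat, s_lon) <= radius_km:
            nearby.append(record)
    return nearby
```

The spherical model is accurate to well under one percent, which is usually sufficient for shortlisting suppliers near a site or warehouse.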

### 💡 Tips & tricks

- **Use keywords first.** `searchKeywords: ["gates"]` is easier than hunting for heading IDs; each keyword is resolved to its matching category headings.
- **Set `maxItems: 0`** to walk the entire heading. Each heading currently returns at most about 290 unique suppliers.
- **Speed vs reliability**: stick to `concurrency: 5` for normal runs. Crank to 10–15 only when crawling many headings at once.
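To size a run before starting it, you can combine the roughly 290-suppliers-per-heading ceiling with `maxItems`. The sketch below gives a rough upper bound only. It assumes the $2.50 per 1,000 results starting price from the Pricing section, ignores any other billable events, and ignores cross-heading deduplication (which can only lower the count):

```python
PRICE_PER_1000_RESULTS_USD = 2.50  # "from" price in the Pricing section; other events may apply
MAX_SUPPLIERS_PER_HEADING = 290    # approximate per-heading ceiling stated in this README


def estimate_run(num_headings: int, max_items: int) -> tuple:
    """Return (max_records, max_cost_usd) as a rough upper bound for one run.

    max_items == 0 means unlimited, matching the Actor's input semantics.
    """
    ceiling = num_headings * MAX_SUPPLIERS_PER_HEADING
    records = ceiling if max_items == 0 else min(max_items, ceiling)
    cost = records * PRICE_PER_1000_RESULTS_USD / 1000
    return records, cost
```

For example, four headings with `maxItems: 0` give at most 1,160 records, or about $2.90 at the starting price.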

### ❓ FAQ

**Are emails included?**
No. Thomasnet does not publicly expose contact emails on supplier profiles. The Actor returns whatever Thomasnet itself shows: phone and website only.

**How fresh is the data?**
Live from Thomasnet on every run — no caching layer.

**Can I scrape one specific supplier?**
Yes. Put the profile URL in `startUrls` and leave the search/heading inputs empty.

### 📜 Licence & terms

The Actor returns publicly visible profile data. You are responsible for complying with Thomasnet's terms of service, your local data-protection law, and any sector-specific rules in your jurisdiction.

# Actor input Schema

## `searchKeywords` (type: `array`):

Free-text terms to look up on Thomasnet (e.g. 'gates', 'steel pipe', 'cnc machining'). Each keyword is resolved to one or more category headings server-side, then those headings are crawled in full. Use this when you know the product/service rather than the heading number.

## `headingIds` (type: `array`):

Thomasnet category heading IDs to crawl directly. The numeric string from any category URL — e.g. /suppliers/usa/gates-33470279 → '33470279'. Skip if you used Search keywords above.
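Pulling the ID out of a category URL can be automated. A sketch that assumes the heading ID is always the trailing run of digits in the last path segment, as in the `/suppliers/usa/gates-33470279` example; it accepts a full URL or just the path, and other URL shapes are not covered:

```python
import re
from urllib.parse import urlparse

# Assumption: the heading ID is the trailing "-<digits>" of the last path segment.
_HEADING_ID_RE = re.compile(r"-(\d+)$")


def heading_id_from_url(url: str) -> str:
    """Return the numeric heading ID (as a string) from a Thomasnet category URL or path."""
    path = urlparse(url).path.rstrip("/")  # drops query string and trailing slash
    last_segment = path.rsplit("/", 1)[-1]
    match = _HEADING_ID_RE.search(last_segment)
    if match is None:
        raise ValueError(f"No heading ID found in {url!r}")
    return match.group(1)
```

The ID is returned as a string because the `headingIds` schema expects an array of strings.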

## `startUrls` (type: `array`):

Optional list of direct supplier-profile URLs (https://www.thomasnet.com/company/<slug>/profile). Bypasses category discovery and scrapes only these profiles.
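The schema expects each `startUrls` entry as an object with a `url` key. A sketch that converts plain profile URLs into that shape and rejects anything that does not match the documented `https://www.thomasnet.com/company/<slug>/profile` pattern; the strictness of the regex is our own assumption:

```python
import re

# Assumption: profile URLs follow https://www.thomasnet.com/company/<slug>/profile exactly.
_PROFILE_RE = re.compile(r"^https://www\.thomasnet\.com/company/[^/]+/profile/?$")


def to_start_urls(urls: list) -> list:
    """Convert profile URL strings into startUrls objects, validating each one."""
    start_urls = []
    for url in urls:
        url = url.strip()  # tolerate stray whitespace from copy-paste
        if not _PROFILE_RE.match(url):
            raise ValueError(f"Not a Thomasnet supplier-profile URL: {url!r}")
        start_urls.append({"url": url})
    return start_urls
```

Failing fast on a category URL here avoids starting a run that bypasses discovery with inputs it cannot use.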

## `maxItems` (type: `integer`):

Hard cap on the number of supplier records pushed. Use 0 for unlimited. Each heading currently returns up to ~290 unique suppliers.

## `concurrency` (type: `integer`):

Number of headings crawled in parallel. Defaults to 5; raise carefully on bulk runs.

## Actor input object example

```json
{
  "searchKeywords": [
    "gates"
  ],
  "headingIds": [
    "33470279"
  ],
  "startUrls": [],
  "maxItems": 50,
  "concurrency": 5
}
```

# Actor output Schema

## `suppliers` (type: `string`):

No description

## `suppliersCsv` (type: `string`):

No description

## `suppliersXlsx` (type: `string`):

No description

## `consoleView` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "searchKeywords": [
        "gates"
    ],
    "startUrls": [],
    "maxItems": 50
};

// Run the Actor and wait for it to finish
const run = await client.actor("alwaysprimedev/thomasnet-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "searchKeywords": ["gates"],
    "startUrls": [],
    "maxItems": 50,
}

# Run the Actor and wait for it to finish
run = client.actor("alwaysprimedev/thomasnet-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
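The Actor already deduplicates suppliers within a single run. If you combine datasets from several runs (for example, one run per keyword), you can apply the same rule client-side. A minimal sketch, assuming the `id` field shown in the sample output is a stable supplier key across runs:

```python
from typing import Iterable


def merge_unique(*datasets: Iterable[dict]) -> list:
    """Merge supplier records from several runs, keeping the first record per id.

    Records without an id are kept as-is because they cannot be matched.
    """
    seen = set()
    merged = []
    for dataset in datasets:
        for record in dataset:
            supplier_id = record.get("id")
            if supplier_id is None:
                merged.append(record)
                continue
            if supplier_id in seen:
                continue  # already kept from an earlier dataset
            seen.add(supplier_id)
            merged.append(record)
    return merged
```

With the Python client above, each argument could be `client.dataset(dataset_id).iterate_items()` for one run's `defaultDatasetId`.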

## CLI example

```bash
echo '{
  "searchKeywords": [
    "gates"
  ],
  "startUrls": [],
  "maxItems": 50
}' |
apify call alwaysprimedev/thomasnet-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=alwaysprimedev/thomasnet-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Thomasnet Supplier Scraper",
        "description": "Extract structured B2B supplier profiles from Thomasnet.com — companies, addresses, primary categories, contact phone, websites, brands, headings, and business metadata.",
        "version": "0.1",
        "x-build-id": "5yJdSfbreWlwqJbED"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/alwaysprimedev~thomasnet-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-alwaysprimedev-thomasnet-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/alwaysprimedev~thomasnet-scraper/runs": {
            "post": {
                "operationId": "runs-sync-alwaysprimedev-thomasnet-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/alwaysprimedev~thomasnet-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-alwaysprimedev-thomasnet-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "searchKeywords": {
                        "title": "Search keywords",
                        "type": "array",
                        "description": "Free-text terms to look up on Thomasnet (e.g. 'gates', 'steel pipe', 'cnc machining'). Each keyword is resolved to one or more category headings server-side, then those headings are crawled in full. Use this when you know the product/service rather than the heading number.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "headingIds": {
                        "title": "Heading IDs (advanced)",
                        "type": "array",
                        "description": "Thomasnet category heading IDs to crawl directly. The numeric string from any category URL — e.g. /suppliers/usa/gates-33470279 → '33470279'. Skip if you used Search keywords above.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "startUrls": {
                        "title": "Start URLs",
                        "type": "array",
                        "description": "Optional list of direct supplier-profile URLs (https://www.thomasnet.com/company/<slug>/profile). Bypasses category discovery and scrapes only these profiles.",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "maxItems": {
                        "title": "Max items",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Hard cap on the number of supplier records pushed. Use 0 for unlimited. Each heading currently returns up to ~290 unique suppliers.",
                        "default": 50
                    },
                    "concurrency": {
                        "title": "Concurrency",
                        "minimum": 1,
                        "maximum": 25,
                        "type": "integer",
                        "description": "Number of headings crawled in parallel. Defaults to 5; raise carefully on bulk runs.",
                        "default": 5
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
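For projects without an Apify client library, the `run-sync-get-dataset-items` endpoint from the specification above can be called with Python's standard library. A minimal sketch: the token goes in the `token` query parameter as the spec describes, and the 300-second socket timeout is our own default, not an API limit:

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "alwaysprimedev~thomasnet-scraper"


def build_run_request(token: str, run_input: dict) -> urllib.request.Request:
    """Build a POST request that runs the Actor and returns its dataset items."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items?{query}"
    body = json.dumps(run_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_get_items(token: str, run_input: dict, timeout: float = 300) -> list:
    """Send the request and decode the JSON array of supplier records."""
    request = build_run_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage: `items = run_and_get_items("<YOUR_API_TOKEN>", {"searchKeywords": ["gates"], "maxItems": 50})`. The synchronous endpoint holds the connection open while the run executes, so for long runs the `/runs` endpoint followed by a separate dataset fetch may be more robust.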
