Hoogvliet Category Scraper
Pricing
from $5.00 / 1,000 category results
Scrape Hoogvliet's full category tree (hoogvliet.com): every category and subcategory with name, hierarchical URI, parent and level. Clean JSON/CSV, ideal as input for the Hoogvliet product scraper. Needs a Dutch (NL) proxy. Failed lookups are never billed.
Developer
Elena Vance
Maintained by Community
Hoogvliet Category Scraper — Map the Full Hoogvliet.com Category Tree
Discover Hoogvliet's entire category structure (hoogvliet.com) as clean, structured JSON or CSV: every category and subcategory with its name, the link you need to address it, its parent category, its depth in the tree, and a flag telling you whether it holds products — one tidy record per category node.
This is the map of the store, not the products on the shelves. The output is the ideal input for the Hoogvliet Product Scraper: pipe the category links straight in and let it pull the products. Use it on its own to analyze site structure, or as the first step of a two-stage pipeline. No login, no HTML wrangling — and you are never billed for failed requests.
Good to know: this Actor runs through a Dutch residential proxy — already pre-selected in the input, just keep it enabled.
Why this Actor
- The whole tree in one run. Every node Hoogvliet exposes — top-level departments down to the deepest subcategories — flattened into one record per category. No clicking through menus.
- The exact link you need to fetch products. Bare category ids simply 404. Each record carries the working `uri`, the single thing the product scraper needs to read a category's products.
- Know which categories actually hold products. Every record has a `hasOnlineProducts` flag, so you can feed the product scraper only the leaves that carry products instead of crawling the whole tree blindly.
- Reconstruct the hierarchy. `parentId` and `level` let you rebuild the tree exactly: breadcrumbs, navigation, a category picker, or a coverage report.
- The perfect companion to the product scraper. Run this once to get the map, then drive the Hoogvliet Product Scraper with the links, or schedule it to catch new categories as Hoogvliet adds them.
- You never pay for failures. Timeouts and other transient errors are reported in the run summary — not written to your dataset and not billed.
- Fast and browserless. No headless browser — so a full run finishes in seconds and compute stays minimal.
Problems this Actor solves
| If you are… | Your problem | How this Actor solves it |
|---|---|---|
| Running the Hoogvliet product scraper | You need the category links to scrape, and bare ids don't work | One run hands you every category's working link — paste them straight into the product scraper |
| Building a price/assortment pipeline | You want to scrape only product-bearing categories, not the whole tree | Filter on hasOnlineProducts and feed just the leaves downstream |
| A market researcher / analyst | Understanding how a competitor organizes its assortment is manual and partial | A complete, dated map of the category hierarchy — export to pandas, Sheets, or BI |
| Maintaining a category mapping | Hoogvliet adds, renames, and reshuffles categories over time | Schedule a run and diff the tree to catch structural changes automatically |
| An app / chatbot / agent developer | You need Hoogvliet's navigation structure without building a crawler | Pay per category record on demand; a stable, normalized schema you can rely on |
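The "schedule a run and diff the tree" workflow from the table can be sketched as follows. This is a minimal sketch, assuming two dataset exports already loaded as lists of records with the documented `id`, `name`, and `parentId` fields; the function name `diff_trees` and the sample records are illustrative and not part of the Actor.

```python
def diff_trees(old_records, new_records):
    """Compare two category snapshots, keyed by the documented `id` field."""
    old = {r["id"]: r for r in old_records}
    new = {r["id"]: r for r in new_records}
    shared = old.keys() & new.keys()
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "renamed": sorted(i for i in shared if old[i]["name"] != new[i]["name"]),
        "moved": sorted(i for i in shared if old[i]["parentId"] != new[i]["parentId"]),
    }


# Made-up sample snapshots, for illustration only.
yesterday = [
    {"id": "100", "name": "Zuivel", "parentId": None},
    {"id": "100495", "name": "Verse zuivel", "parentId": "100"},
]
today = [
    {"id": "100", "name": "Zuivel, eieren, boter", "parentId": None},
    {"id": "100495", "name": "Verse zuivel", "parentId": "100"},
    {"id": "100500", "name": "Kaas", "parentId": "100"},
]

changes = diff_trees(yesterday, today)
print(changes)
```

Keying on `id` rather than `name` means a renamed category shows up as a rename instead of a remove-plus-add pair.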
What data you get
Each category node becomes one dataset record:
| Field | Description |
|---|---|
| `id` / `externalId` | Hoogvliet's category id (the record's unique id) |
| `name` | Category display name (e.g. Zuivel, eieren, boter) |
| `uri` | The full category link, required to address the category; bare ids return 404 |
| `parentId` | The id of the parent category (null for top-level nodes); rebuild the tree from this |
| `level` | Depth in the tree (0 = top level, increasing downward) |
| `hasOnlineProducts` | true when the category carries products (the leaves to feed the product scraper) |
| `hasOnlineSubCategories` | true when the category has child categories |
| `categoryUrl` | The full URL for the category node |
| `source` | Constant tag identifying the producing Actor |
| `scrapedAt` | ISO 8601 timestamp of the run |
| `rawData` | Optional: the raw category fields, when Include raw category payload is on |
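Rebuilding the hierarchy from `parentId` might look like the sketch below. It assumes records shaped like the fields in the table; the helper names `index_tree` and `breadcrumb` and the two sample records are hypothetical, chosen only to illustrate the idea.

```python
from collections import defaultdict


def index_tree(records):
    """Return (records by id, child ids grouped by parentId)."""
    by_id = {r["id"]: r for r in records}
    children = defaultdict(list)
    for r in records:
        # Top-level nodes have parentId None, so they are grouped under None.
        children[r["parentId"]].append(r["id"])
    return by_id, children


def breadcrumb(record_id, by_id):
    """Walk parentId links upward and join the names from root to node."""
    names = []
    seen = set()  # guards against an unexpected parentId cycle
    current = by_id.get(record_id)
    while current is not None and current["id"] not in seen:
        seen.add(current["id"])
        names.append(current["name"])
        current = by_id.get(current["parentId"])
    return " > ".join(reversed(names))


# Made-up sample records, for illustration only.
records = [
    {"id": "100", "name": "Zuivel, eieren, boter", "parentId": None, "level": 0},
    {"id": "100495", "name": "Verse zuivel", "parentId": "100", "level": 1},
]

by_id, children = index_tree(records)
print(breadcrumb("100495", by_id))
print(children[None])
```

The `children` index gives you top-down navigation (start at `children[None]`), while `breadcrumb` gives the bottom-up path for a single node.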
Example output
    {
      "id": "100495",
      "externalId": "100495",
      "name": "Verse zuivel",
      "uri": "org-webshop-Site/-/categories/schappen/100/100495",
      "parentId": "100",
      "level": 1,
      "hasOnlineProducts": true,
      "hasOnlineSubCategories": false,
      "categoryUrl": "https://www.hoogvliet.com/...&/categories/schappen/100/100495",
      "source": "hoogvliet-category-scraper",
      "scrapedAt": "2026-06-18T08:30:00+00:00"
    }

In this record `hasOnlineProducts` is true, so it is a leaf: feed its `uri` to the product scraper.
Categories or subtrees that could not be fetched are not written to the dataset (and never billed). They are listed in the run's SUMMARY record in the key-value store:

    { "failures": [ { "input": "…", "source": "hoogvliet-category-scraper", "error": "…" } ] }

How to use it (60 seconds)
- Click Try for free / Start.
- Choose what to map:
  - The whole tree (default): leave Category URIs empty to emit every category in the entire tree.
  - A subtree: add one full category URI per line under Category URIs (e.g. `org-webshop-Site/-/categories/schappen/100`) to emit only those categories and everything beneath them.
- Keep the Dutch residential proxy enabled (required — see above).
- Click Save & Start. Download results as JSON, CSV, Excel, or via API from the Dataset tab; check Key-value store → SUMMARY for run totals and any failed requests.
- Next step: copy the `uri` values (filter on `hasOnlineProducts` if you only want product-bearing ones) and paste them into the Hoogvliet Product Scraper.
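The last step can be done in a few lines once the dataset is exported as JSON. The following is a sketch under assumptions: the records are already loaded into a list, and the helper name `product_bearing_uris` and the sample data are made up for illustration.

```python
def product_bearing_uris(records):
    """Keep the `uri` of every record whose hasOnlineProducts flag is true."""
    return [r["uri"] for r in records if r.get("hasOnlineProducts")]


# Made-up sample records, for illustration only.
records = [
    {
        "uri": "org-webshop-Site/-/categories/schappen/100",
        "hasOnlineProducts": False,
        "hasOnlineSubCategories": True,
    },
    {
        "uri": "org-webshop-Site/-/categories/schappen/100/100495",
        "hasOnlineProducts": True,
        "hasOnlineSubCategories": False,
    },
]

uris = product_bearing_uris(records)
# One URI per line, ready to paste into the product scraper's category input.
print("\n".join(uris))
```

Using `r.get(...)` rather than `r[...]` treats a record without the flag as not product-bearing instead of raising an error.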
Input reference
| Field | Type | Default | Description |
|---|---|---|---|
| Category URIs | list | (empty) | One full category URI per line to emit only those categories and everything beneath them. Empty = emit the entire tree. Bare category ids do not work — use the full URI. |
| Include raw category payload | boolean | false | Adds the raw category fields under rawData. Leave off unless you need them. |
| Proxy configuration | object | Apify Residential, NL | Required — keep the Dutch residential default unless you route NL traffic another way. |
| Max concurrency | integer | 5 | Parallel requests (1–20). Kept moderate to be respectful. |
| Delay between requests | integer | 0 | Politeness delay in seconds before each request (0–10). ~0.15s is recommended; 0 is fine at moderate concurrency. |
| Max items | integer | 0 | Stop after N category records (0 = unlimited / the whole tree). |
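If you start runs from code, it can help to check an input against the documented ranges before submitting it. The sketch below is illustrative only: the table lists display labels, not the underlying JSON keys, so `categoryUris`, `maxConcurrency`, `delaySeconds`, and `maxItems` are hypothetical key names, and `check_input` is not part of the Actor.

```python
def check_input(run_input):
    """Return a list of problems found in a run input (empty list = looks fine).

    Key names are hypothetical; check the Actor's input schema for the real ones.
    """
    problems = []

    for uri in run_input.get("categoryUris", []):
        # The docs state that bare category ids do not work; flag digits-only values.
        if uri.strip().isdigit():
            problems.append(f"bare category id, use the full URI: {uri}")

    concurrency = run_input.get("maxConcurrency", 5)
    if not 1 <= concurrency <= 20:
        problems.append(f"max concurrency must be 1-20, got {concurrency}")

    delay = run_input.get("delaySeconds", 0)
    if not 0 <= delay <= 10:
        problems.append(f"delay must be 0-10 seconds, got {delay}")

    if run_input.get("maxItems", 0) < 0:
        problems.append("max items must be 0 (unlimited) or a positive number")

    return problems


good = {"categoryUris": ["org-webshop-Site/-/categories/schappen/100"]}
bad = {"categoryUris": ["100495"], "maxConcurrency": 50}

print(check_input(good))
print(check_input(bad))
```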
Pricing — what a run costs
This Actor uses transparent pay-per-event pricing with a built-in volume discount: a small Actor-start fee, a fixed price per category record for the first results of a run, and a cheaper rate for every result beyond a high threshold. No subscription, no minimums, and failures are never charged. The exact per-result rate is shown on the Actor's Pricing tab.
- A category map is small and cheap. Hoogvliet's whole tree is a few hundred categories, so a full run is a tiny, predictable cost — and it is the input that unlocks the much larger product scrape.
- Failed requests are free. Timeouts and other transient errors are reported in the summary, never billed.
- Bigger runs cost less per item. The volume-discount tier resets per run (though most category runs stay well under it).
- Try it free: an Apify free account includes $5 of monthly platform credit — more than enough to map the tree before paying anything.
- Stay in control: set Max items and Apify's maximum charge per run; the Actor stops gracefully at your cap, keeping everything already scraped.
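The tiered arithmetic described above can be sketched as a small estimator. Only the base rate below comes from this page ($5.00 per 1,000 results, listed as a "from" price); the start fee, discounted rate, and threshold are placeholder values, so take the real figures from the Actor's Pricing tab.

```python
from decimal import Decimal


def estimate_run_cost(results, start_fee, base_rate, discount_rate, threshold):
    """Start fee + base rate up to the threshold + discounted rate beyond it."""
    base_results = min(results, threshold)
    discounted_results = max(results - threshold, 0)
    return start_fee + base_results * base_rate + discounted_results * discount_rate


# Placeholder values except BASE_RATE, which is $5.00 / 1,000 results.
START_FEE = Decimal("0.01")       # assumption
BASE_RATE = Decimal("0.005")      # $5.00 / 1,000
DISCOUNT_RATE = Decimal("0.004")  # assumption
THRESHOLD = 10_000                # assumption

small_run = estimate_run_cost(400, START_FEE, BASE_RATE, DISCOUNT_RATE, THRESHOLD)
large_run = estimate_run_cost(12_000, START_FEE, BASE_RATE, DISCOUNT_RATE, THRESHOLD)
print(f"400 results: ${small_run:.2f}")
print(f"12,000 results: ${large_run:.2f}")
```

`Decimal` is used instead of floats so that sub-cent rates add up exactly.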
Compared to the alternatives
| | This Actor | Build your own crawler | Click through the site by hand |
|---|---|---|---|
| Full category tree, normalized | Yes | You maintain it | Impractical |
| Working full links (not 404-ing ids) | Yes | You build and maintain it | – |
| `hasOnlineProducts` / `parentId` / `level` | Yes | You maintain it | – |
| Dutch residential proxy built in | Yes | You source proxies | – |
| Feeds the product scraper directly | Yes | DIY glue | Copy-paste |
| Never billed for failures | Yes | – | – |
| Export JSON / CSV / Excel / API | Yes, built-in | DIY | Copy-paste |
| Setup time | ~60 seconds | Days; breaks when the site changes | Hours |
Integrate the data
- Feed the Hoogvliet Product Scraper (the main use case). Run this Actor, take the `uri` field from each record (optionally filter on `hasOnlineProducts == true` to scrape only product-bearing categories), and pass those URIs as the category input to the Hoogvliet Product Scraper. That two-stage pipeline (map the tree, then scrape the products) is exactly what these two Actors are designed for.
- Exports: JSON, CSV, Excel, XML from the Dataset tab, or fetch programmatically:
  GET https://api.apify.com/v2/datasets/{datasetId}/items?format=json
- Run on a schedule: use Apify Schedules to refresh the category map periodically, and webhooks to push finished runs into your pipeline (or to trigger the product scraper automatically).
- From code: call the Actor with the Apify API or SDKs (Python / JavaScript) and read the dataset when the run finishes.
- Run summary: every run writes a `SUMMARY` record (key-value store) with totals, successes, failures, and billing counts, ideal for monitoring automated pipelines.
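For monitoring, a pipeline can inspect the `failures` list of the SUMMARY record. The sketch below assumes the record has already been downloaded as JSON and relies only on the documented `failures` entries (`input`, `source`, `error`). It also assumes `input` holds the category URI that failed; the sample values and the helper name `failure_report` are made up.

```python
import json
from collections import Counter


def failure_report(summary):
    """Count failures per error message and collect the inputs worth retrying."""
    failures = summary.get("failures", [])
    by_error = Counter(f.get("error", "unknown") for f in failures)
    retry_inputs = [f["input"] for f in failures if "input" in f]
    return by_error, retry_inputs


# Made-up SUMMARY content, for illustration only.
summary_json = """
{
  "failures": [
    {"input": "uri-a", "source": "hoogvliet-category-scraper", "error": "timeout"},
    {"input": "uri-b", "source": "hoogvliet-category-scraper", "error": "timeout"},
    {"input": "uri-c", "source": "hoogvliet-category-scraper", "error": "blocked"}
  ]
}
"""

by_error, retry_inputs = failure_report(json.loads(summary_json))
print(dict(by_error))
print(retry_inputs)
```

Under the assumption above, `retry_inputs` could be passed back as Category URIs in a follow-up run.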
FAQ
Is this the product scraper? No — this Actor maps the category tree (one record per category). To get the products themselves, use its companion, the Hoogvliet Product Scraper, and feed it the category URIs this Actor produces.
Do I need a Hoogvliet account or API key? No. There is no login.
Why is a Dutch proxy required?
The Actor ships with a Dutch residential proxy pre-selected; keep it on. Without
it, requests are blocked and the run returns no categories (the block is reported
in SUMMARY).
Why can't I just use a category id?
Bare category ids return 404. That is exactly why this Actor exists — it captures
the working uri for every category so you don't have to.
What does hasOnlineProducts mean?
It is true when the category actually holds products. Filter on it to feed the
product scraper only the leaves that have something to scrape.
Does the output include every category or only the ones with products?
Every node in the tree — including parent/department categories that hold only
subcategories. The hasOnlineProducts and hasOnlineSubCategories flags let you
filter to whatever you need.
How fresh is the data? Each run fetches the live tree, so you get exactly Hoogvliet's current category structure. Schedule the Actor to catch new or renamed categories.
What formats can I export? JSON, CSV, Excel, XML — from the Console or via the Apify API.
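As a rough sketch of the API route, the snippet below builds the dataset items URL shown under "Integrate the data" and reads the JSON with the standard library. The dataset id `abc123` is a placeholder. Passing an API token as a `token` query parameter is how the Apify API is generally called for non-public datasets, but treat that detail as an assumption to confirm against the API documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(dataset_id, fmt="json", token=None):
    """Build the dataset items URL; `token` is optional (assumed query parameter)."""
    params = {"format": fmt}
    if token:
        params["token"] = token
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode(params)}"


def fetch_items(dataset_id, token=None):
    """Download all records of a dataset as a list of dicts (performs a network call)."""
    with urlopen(dataset_items_url(dataset_id, token=token)) as response:
        return json.load(response)


# Placeholder dataset id; only the URL is built here, nothing is downloaded.
url = dataset_items_url("abc123")
print(url)
```

Swapping `fmt` for `csv` or `xlsx` follows the same URL pattern, although `fetch_items` as written only parses JSON.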
Related Actors
- Hoogvliet Product Scraper: its companion (run this one first). This Category Scraper is stage one of a two-stage pipeline: run it to map the tree, then feed the `uri` of each category (filter on `hasOnlineProducts == true` to scrape only product-bearing leaves) into the Hoogvliet Product Scraper to pull the actual products, prices, and promotions. Map the store, then scrape the shelves.
- The wider Dutch supermarket family. Product scrapers for every major Dutch chain (Albert Heijn, Lidl, Plus, Dirk van den Broek, and DekaMarkt), plus the Lidl Category Scraper, the other category-tree mapper (same map-then-scrape pairing as Hoogvliet's two Actors).
- Same clean schema and pay-per-event billing across every Dutch chain — mix and match for full market coverage, and you are never billed for failures on any of them.
Disclaimer
This Actor is intended for personal and research use. You are responsible for ensuring your use complies with Hoogvliet's terms and applicable law. Please scrape responsibly — keep concurrency moderate and delays reasonable. This project is not affiliated with, endorsed by, or sponsored by Hoogvliet.