README savings example now rendered for multi-event pricing:
- Changed: the "recurring monitoring savings" section in README now renders for actors with multi-event pricing. Numbers are computed against the primary event, with a disclosure note naming the secondary events that are billed separately
- No code or pricing change
URL-mode internals consolidated into a shared workspace module:
- Changed: URL-mode orchestration, NOT_FOUND cache, and status-summary helpers now live in a shared module (urlModeInput) reused across actors. No behavior change — same parse rules, same NOT_FOUND key convention (notfound_url_{cc}_{slug}), same setStatusMessage summary format
- Note: first-run state is unaffected. Runs with a prior stateKey continue to classify per-slug exactly as before
Incremental robustness:
- Fixed: a transient detail-page fetch failure (SSL handshake, proxy timeout) no longer flips a genuinely unchanged company to UPDATED. When the detail fetch fails and incremental mode has a prior hash for the company, the cached hash is preserved and the company is emitted as UNCHANGED (filtered from output). Next run retries the fetch
- No schema or behavior change for first-run / companies without cached state
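The fallback described in this entry can be sketched as follows. This is a minimal illustration under stated assumptions, not the actor's code: `classifyWithPriorHash`, `FetchResult`, and `Decision` are hypothetical names, and the sketch only covers companies that already have a cached hash.

```typescript
// Hypothetical names for illustration; the actor's real internals may differ.
type Classification = "UPDATED" | "UNCHANGED";

type FetchResult =
  | { ok: true; hash: string } // detail page fetched and hashed
  | { ok: false };             // transient failure (SSL handshake, proxy timeout)

interface Decision {
  classification: Classification;
  hashToStore: string; // hash persisted for the next run
}

// Only handles companies that already have a cached hash from a prior run.
function classifyWithPriorHash(prevHash: string, fetched: FetchResult): Decision {
  if (!fetched.ok) {
    // Fetch failed: keep the cached hash so the next run can retry and compare.
    return { classification: "UNCHANGED", hashToStore: prevHash };
  }
  if (fetched.hash === prevHash) {
    return { classification: "UNCHANGED", hashToStore: prevHash };
  }
  return { classification: "UPDATED", hashToStore: fetched.hash };
}
```

Because the cached hash is stored again unchanged after a failed fetch, a real change is still detected on the next successful run.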
URL mode UX polish:
- Added: 30-day NOT_FOUND cache for URL mode — repeated runs with dead URLs skip the fetch (KV key notfound_url_{cc}_{slug}), same pattern as company-name mode
- Changed: queryCompanyName on URL-mode output now preserves the exact input URL instead of the resolved company name, so downstream pipelines can join on the user's original input
- Changed: invalid URLs and empty-result runs set a human-readable status message via Actor.setStatusMessage instead of failing the run — the run stays SUCCEEDED with a summary like ✓ 2 companies scraped · ✗ 1 unrecognized URL skipped: https://...
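As a rough sketch of the NOT_FOUND cache, the key convention and the 30-day window might look like the code below. The key format and the TTL come from the notes above. The helper names and the idea of storing a millisecond timestamp next to the key are assumptions made for illustration.

```typescript
// 30-day TTL, as stated in the changelog.
const NOT_FOUND_TTL_MS = 30 * 24 * 60 * 60 * 1000;

// Builds the KV key following the notfound_url_{cc}_{slug} convention.
function notFoundKey(cc: string, slug: string): string {
  return `notfound_url_${cc}_${slug}`;
}

// Assumption: the cache entry records when the URL was found dead (epoch ms).
function isNotFoundCached(cachedAtMs: number, nowMs: number): boolean {
  return nowMs - cachedAtMs < NOT_FOUND_TTL_MS;
}
```

A run would skip the fetch when `isNotFoundCached` returns true and retry the URL once the entry is older than 30 days.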
Direct URL input mode:
- Added: companyUrls input — list of kununu company URLs (e.g. https://www.kununu.com/de/bosch-gruppe). Country code and slug are parsed from each URL, so URLs can mix DE/AT/CH in a single run
- Added: URL mode skips search and Jaccard matching — the detail page is fetched directly, which saves one request per company vs companyNames mode
- Added: extractBasicProfile — builds a KununuProfile (name, uuid, score, totalReviews, industry, location, isTopCompany) from the detail page Redux state, replacing the SERP response that URL mode lacks
- Changed: URL mode auto-enables includeDetails (the detail page is required to populate the output)
- Note: incremental mode works per-slug exactly as in companyNames/datasetId modes — first run = all NEW, subsequent runs filter UNCHANGED
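Parsing the country code and slug from a company URL could look roughly like this. The sketch assumes the `/{cc}/{slug}` path shape seen in the example URL and the three country codes named above. `parseCompanyUrl` and the exact regular expression are hypothetical, and the actor's real parse rules may accept more variants.

```typescript
interface ParsedCompanyUrl {
  cc: string;   // lowercased country code: de, at or ch
  slug: string; // company slug, e.g. "bosch-gruppe"
}

// Assumed URL shape: https://www.kununu.com/{cc}/{slug}[/...]
const COMPANY_URL_RE = /^https?:\/\/(?:www\.)?kununu\.com\/(de|at|ch)\/([^\/?#]+)/i;

// Returns null for URLs that do not match, so the caller can skip them
// and report them in the status message instead of failing the run.
function parseCompanyUrl(url: string): ParsedCompanyUrl | null {
  const match = COMPANY_URL_RE.exec(url.trim());
  if (!match) return null;
  const cc = match[1];
  const slug = match[2];
  if (cc === undefined || slug === undefined) return null;
  return { cc: cc.toLowerCase(), slug };
}
```

Since the country code travels with each parsed URL, a single input list can mix DE, AT, and CH entries.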
Reviews page 1 now prefetched alongside detail/salary/culture:
- Changed: fetchReviewsPage(..., page: 1) runs in the same Promise.all as detail/salary/culture when includeReviews: true. Saves ~2s per company on runs where the company is NEW or UPDATED. The prefetched page is discarded when the company is later classified UNCHANGED (rare in practice — aggregate hash already accounts for review activity)
- No behavior change on individual review emission, pagination, or incremental review tracking
Per-company detail fetches now run in parallel:
- Changed: includeDetails + includeFullSalary + includeCulture fetches are issued in parallel per company instead of sequentially. Expected ~2-3x speedup when all three flags are enabled
- No behavior change — same output fields, same error handling, same incremental semantics
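The change from sequential to parallel fetching can be illustrated with `Promise.all`. The fetcher parameters below are hypothetical stand-ins for the detail, salary, and culture requests, and the sketch leaves out error handling: `Promise.all` rejects as soon as any request rejects, so real code would need its own per-request handling to keep the existing behavior.

```typescript
// Hypothetical fetcher type standing in for the three per-company requests.
type Fetcher<T> = () => Promise<T>;

async function fetchEnrichment<D, S, C>(
  fetchDetail: Fetcher<D>,
  fetchSalary: Fetcher<S>,
  fetchCulture: Fetcher<C>,
): Promise<{ detail: D; salary: S; culture: C }> {
  // All three requests start immediately; total latency is roughly the
  // slowest request instead of the sum of all three.
  const [detail, salary, culture] = await Promise.all([
    fetchDetail(),
    fetchSalary(),
    fetchCulture(),
  ]);
  return { detail, salary, culture };
}
```

With three requests of similar duration, this is where a speedup in the 2-3x range would come from.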
Kulturkompass + review responses:
- Added: includeCulture input — fetch the /kultur page for the Kulturkompass (profile-vs-industry compass score, MODERN/TRADITIONAL binary classification, 4 culture dimensions, strength/weakness/most-voted factors, company culture statements, and culture-tagged review comments)
- Added: kulturKompass output field on company profiles (null when a company has too few culture submissions)
- Added: responses field on review items — individual company replies with timestamps, author, and response body (previously only responseCount was exposed)
- Changed: review content hash now includes response count + latest response timestamp, so a company reply emits the review as UPDATED in incremental mode
- Note: includeCulture is opt-in; adds one extra request per company. Requires includeDetails: true
- Note: review response data is always included when includeReviews: true — no new flag, no extra fetch
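To show why a company reply now changes a review's hash, here is a small sketch of the string that could be fed into the hash function. The field names (`text`, `timestamp`, `author`, `body`) and the helper `reviewHashInput` are assumptions, the sketch assumes ISO 8601 timestamps so that string sorting matches chronological order, and the hashing step itself is left out.

```typescript
// Hypothetical shapes; the actor's real review schema has more fields.
interface ReviewResponse {
  timestamp: string; // assumed ISO 8601, e.g. "2024-05-01T10:00:00Z"
  author: string;
  body: string;
}

interface ReviewForHash {
  uuid: string;
  text: string;
  responses: ReviewResponse[];
}

// Builds the canonical string that would be hashed for change detection.
function reviewHashInput(review: ReviewForHash): string {
  const timestamps = review.responses.map((r) => r.timestamp).sort();
  const latest = timestamps.length > 0 ? timestamps[timestamps.length - 1] : "";
  // Response count and latest response timestamp are part of the input,
  // so a new company reply changes the resulting hash.
  return JSON.stringify([review.text, review.responses.length, latest]);
}
```

Two snapshots of the same review produce different inputs as soon as a reply is added, which is what makes the review come out as UPDATED.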
Full salary enrichment:
- Added: includeFullSalary input — fetch the /gehalt page for all salary ranges (~20 job titles with min/max/median/average/entry counts) instead of the top 3 shown on the detail summary
- Note: opt-in flag; adds one extra request per company. Requires includeDetails: true
- Note: same SalaryRange schema as before — downstream consumers need no changes
Individual review scraping:
- Added: includeReviews input — scrape individual employee reviews per company (requires includeDetails: true)
- Added: maxReviewPages input — cap review pagination per company
- Added: reviewSort input — newest / oldest / relevance / best / worst
- Added: review dataset items with type: "review" discriminator, 13 factor ratings per review, pros/cons/suggestions, position/department, reactions, and full company reference
- Added: per-review incremental classification — each review tracked by uuid and content hash; unchanged reviews are filtered from emission
- Added: review-extracted PPE event at $0.001 per review. Primary company-profile event renamed from the generic dataset-item event so each row is charged exactly once
- Note: review scraping is skipped when a company is UNCHANGED in incremental mode — the aggregate hash covers review activity
- Note: in-session deduplication by review uuid, and early-exit when consecutive pages stop yielding fresh reviews, guard against wasted traffic
- Note: review pagination supports deep scraping (verified 90+ unique reviews at maxReviewPages: 10, capped per-company by totalReviews)
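The in-session deduplication and early exit mentioned in the notes above might be structured like this. It is a simplified, synchronous sketch: `collectReviews`, `getPage`, and the `maxStalePages` threshold of 2 are assumptions, since the changelog does not say how many consecutive stale pages trigger the exit.

```typescript
interface PagedReview {
  uuid: string;
}

// getPage is a hypothetical stand-in for the real (asynchronous) page fetch.
function collectReviews(
  getPage: (page: number) => PagedReview[],
  maxReviewPages: number,
  maxStalePages: number = 2, // assumed threshold, not stated in the changelog
): PagedReview[] {
  const seen = new Set<string>();
  const collected: PagedReview[] = [];
  let stalePages = 0;

  for (let page = 1; page <= maxReviewPages; page++) {
    let fresh = 0;
    for (const review of getPage(page)) {
      if (seen.has(review.uuid)) continue; // in-session dedup by uuid
      seen.add(review.uuid);
      collected.push(review);
      fresh++;
    }
    // Count consecutive pages without new reviews and stop early.
    stalePages = fresh === 0 ? stalePages + 1 : 0;
    if (stalePages >= maxStalePages) break;
  }
  return collected;
}
```

The early exit bounds wasted requests when later pages only repeat reviews that were already collected.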
Company profile enrichment — expanded output fields when includeDetails: true:
- Added: scoreBreakdown — 4 rating categories with 13 factors (salary, career, atmosphere, leadership, etc.)
- Added: benefits — list of benefit types with employee endorsement percentages
- Added: salaryRanges — top salary ranges per job title (min/max/median/average)
- Added: competitors — related companies with scores
- Added: topCompanyYears — years the company earned a Top Company badge
- Added: salarySatisfaction — aggregated salary satisfaction metrics
- Added: followerCount — number of kununu followers
- Added: recommendationTotalReviews, recommendationRecommended, recommendationNotRecommended — full recommendation breakdown
- Added: type discriminator field ("company") — enables mixed-item datasets
- Changed: Detail extraction is significantly more complete and accurate than v0.1
- Note: existing incremental users with a stateKey will see a one-time UPDATED wave on first v0.2 run as enrichment fields are included in the change hash
- Added: descriptionHtml, descriptionMarkdown output fields (triple-format descriptions for RAG/LLM pipelines)
- Added: contentHash output field (SHA-256 hash of content-identifying fields)
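For consumers of a mixed-item dataset, the type discriminator can be used as a TypeScript discriminated union. The interfaces below are deliberately reduced to a couple of fields and are assumptions, not the actor's full output schema. `splitItems` is a hypothetical helper.

```typescript
// Reduced, hypothetical shapes; real items carry many more fields.
interface CompanyItem {
  type: "company";
  name: string;
  followerCount?: number;
}

interface ReviewRow {
  type: "review";
  uuid: string;
}

type DatasetItem = CompanyItem | ReviewRow;

// Splits a mixed dataset by the discriminator; TypeScript narrows each branch.
function splitItems(items: DatasetItem[]): { companies: CompanyItem[]; reviews: ReviewRow[] } {
  const companies: CompanyItem[] = [];
  const reviews: ReviewRow[] = [];
  for (const item of items) {
    if (item.type === "company") {
      companies.push(item);
    } else {
      reviews.push(item);
    }
  }
  return { companies, reviews };
}
```

Checking `type` first lets downstream code handle company rows and review rows separately without guessing from the fields that happen to be present.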
Initial release.
- Three input modes: keyword search, company name list, dataset enrichment
- Jaccard similarity matching for company name lookup
- Optional detail page enrichment: recommendation rate and company website
- Incremental mode with KV store state tracking (NEW/UPDATED/UNCHANGED)
- NOT_FOUND caching with 30-day TTL
- Industry filter for keyword search mode
- Compact output mode for AI-agent workflows
- Deduplication support for dataset input via configurable field
- Parallel SERP fetching (5 concurrent pages) and detail fetching (3 concurrent)
- Proxy support for reliable access
- PPE pricing: $0.005/start + $0.001/result
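For readers unfamiliar with Jaccard similarity, a minimal token-based version is sketched below. The tokenization (lowercasing, splitting on anything other than ASCII letters, digits, and German umlauts) is an assumption. The changelog does not state how the actor tokenizes names or which threshold it applies.

```typescript
// Assumed tokenization: lowercase words made of letters, digits and umlauts.
function tokenize(name: string): Set<string> {
  const tokens = name.toLowerCase().split(/[^a-z0-9äöüß]+/).filter((t) => t.length > 0);
  return new Set(tokens);
}

// Jaccard similarity: |A ∩ B| / |A ∪ B| over the two token sets.
function jaccard(a: string, b: string): number {
  const setA = tokenize(a);
  const setB = tokenize(b);
  if (setA.size === 0 && setB.size === 0) return 0; // convention chosen here for empty input
  let intersection = 0;
  setA.forEach((token) => {
    if (setB.has(token)) intersection++;
  });
  const union = setA.size + setB.size - intersection;
  return intersection / union;
}

// "bosch" is the only shared token out of four distinct tokens.
console.log(jaccard("Bosch Gruppe", "Robert Bosch GmbH")); // 0.25
```

A lookup would typically score each search result against the requested name and keep the best candidate above some cutoff.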