GoWork Scraper (FR, DE & ES)

Pricing

from $2.00 / 1,000 results

Structured employer data (FR, DE, ES): one record per company with page and Open Graph metadata, Schema.org JSON-LD, flattened organization attributes, enriched firmographics and rating distribution, threaded reviews with replies, crawl provenance via parseMeta, and Cloudflare challenge detection.

Rating: 5.0 (1)

Developer: Muhamed Didovic

Maintained by Community

Actor stats

  • Bookmarked: 1
  • Total users: 3
  • Monthly active users: 2
  • Last modified: 19 days ago

Overview

Extract structured employer reviews and company profiles from GoWork.fr (France), GoWork.de (Germany), and GoWork ES (Spain). The actor loads HTML pages with a browser-like HTTP client (Crawlee Impit), parses Nuxt __NUXT_DATA__ when present (preferred for full review threads), and falls back to JSON-LD when needed. You get one dataset row per company with flattened header fields for CSV and a nested reviews array (each thread includes replies).

Use it to monitor employer reputation, export review text and ratings, or feed analytics with company metadata (contact, activity, star histogram, opening hours, trusted partners) plus traceable parseMeta for each crawl source (direct URL, search, homepage, listing hub).


Features

  • Multiple entry URLs (all routed to listing handlers or direct detail):

    • Company profile: https://gowork.fr/{slug}, https://gowork.de/{slug}, or https://es.gowork.com/{slug} (e.g. b-hive-mulhouse, herole-dresden).
    • Search: …/search?… on each host, with page=2 pagination (Nuxt total + page size).
    • Homepage: national roots such as https://gowork.fr/, https://gowork.de/, or https://es.gowork.com/ (recently rated feed), with ?page=2 pagination (sequential paginator links only; junk page=500 links are ignored).
    • Other listing hubs (paths containing e.g. /trouver, /recherche, …): discovers profile links from anchors.
  • Per-company detail pass:

    • One HTML request per company (until maxItems caps queued detail URLs globally).
    • Reviews from Nuxt company-reviews when available; otherwise JSON-LD Organization.review (subset, synthetic ids when missing).
    • Optional goworkOnlyRatedReviews: keep only thread roots with a 1–5 star rating (see Input).
  • Flattened export:

    • Org fields from JSON-LD (org_*), Nuxt company block (company*, business*, rating*, partners JSON, etc.) at the root of the row alongside reviews.
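The rated-only option described above can be pictured as a simple filter over thread roots. The following is a minimal sketch, not the actor's own code: the interfaces and the keepRatedRoots name are hypothetical, and only the field names (reviewId, ratingValue, replies) come from the sample output.

```typescript
// Hypothetical shapes; only the field names are taken from the sample output.
interface ReplySketch {
  replyId: string;
  authorName: string;
  content: string;
  date: string;
  authorKind?: string;
}

interface ThreadRootSketch {
  reviewId: string;
  reviewerName: string;
  content: string;
  ratingValue: number | null;
  replies: ReplySketch[];
}

// Keep only thread roots that carry a numeric 1-5 rating.
// Replies under a kept root are left untouched.
function keepRatedRoots(reviews: ThreadRootSketch[]): ThreadRootSketch[] {
  return reviews.filter(
    (root) =>
      typeof root.ratingValue === "number" &&
      root.ratingValue >= 1 &&
      root.ratingValue <= 5,
  );
}

const sampleThreads: ThreadRootSketch[] = [
  { reviewId: "a", reviewerName: "Charlotte", content: "text only", ratingValue: null, replies: [] },
  {
    reviewId: "b",
    reviewerName: "BS",
    content: "Entreprise jeune en croissance",
    ratingValue: 4,
    replies: [{ replyId: "r1", authorName: "Audrey", content: "question", date: "08-08-2023 11:41" }],
  },
];

const ratedOnly = keepRatedRoots(sampleThreads);
console.log(ratedOnly.map((root) => root.reviewId)); // [ 'b' ]
```

The same filter can be applied after the fact to a full export if a run was made with goworkOnlyRatedReviews set to false.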

How to Use

  1. Set Up: Create an Apify account and open this actor (or run it locally with apify run / npm run start:dev).
  2. Provide Input: Add one or more GoWork URLs under startUrls (and optional url1, url2, … on the same object for multiple starts).
  3. Configure: Set maxItems (cap on company detail pages queued), concurrency, retries, and proxy (often required if Cloudflare challenges appear).
  4. Run & Export: Download JSON / CSV from the dataset. If you see isCloudflareChallenge: true or an empty Nuxt payload, switch to a residential proxy or adjust the client settings.

Usage Limitations

Free / non-paying Apify users may be subject to platform limits on dataset items or charges. Paid users typically get higher limits; adjust maxItems to control how many company detail pages are fetched per run. GoWork may rate-limit or challenge datacenter IPs—proxy is recommended.


Input Configuration

Example input:

{
  "startUrls": [
    {
      "url": "https://gowork.fr/b-hive-mulhouse"
    },
    {
      "url": "https://gowork.fr/search?q=fra&city=Paris"
    },
    {
      "url": "https://gowork.fr/"
    },
    {
      "url": "https://gowork.de/herole-dresden"
    },
    {
      "url": "https://es.gowork.com/"
    }
  ],
  "maxItems": 100,
  "goworkOnlyRatedReviews": false,
  "maxConcurrency": 100,
  "minConcurrency": 1,
  "maxRequestRetries": 100,
  "proxy": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

Input Fields Explanation

  • startUrls (startUrls): Objects whose url, url1, url2, … fields are collected in order. Use GoWork company URLs on gowork.fr, gowork.de, or es.gowork.com, /search?…, homepage (with optional ?page=N), or supported listing-style paths.
  • maxItems (maxItems): Maximum number of company detail pages to queue across the run (shared counter for search / homepage / listings). Default 100 (or as in actor schema).
  • goworkOnlyRatedReviews (goworkOnlyRatedReviews): When true, each row’s reviews array includes only thread roots with a numeric 1–5 star rating; unrated text threads are dropped. Replies under kept roots stay. Default false.
  • maxConcurrency / minConcurrency / maxRequestRetries: Standard Crawlee / actor concurrency and retry behavior.
  • proxy (proxy): Apify proxy or custom proxyUrls for outbound requests.
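When the input is generated by a script rather than typed by hand, it can help to validate the start URLs first. The sketch below is an illustration under assumptions: buildGoworkInput and ActorInputSketch are hypothetical names, the host list is taken from this README, and stripping a leading www. is a guess about what the actor tolerates.

```typescript
// Hosts named in this README; anything else is rejected before the run starts.
const SUPPORTED_HOSTS = new Set(["gowork.fr", "gowork.de", "es.gowork.com"]);

// Hypothetical type covering a subset of the documented input fields.
interface ActorInputSketch {
  startUrls: { url: string }[];
  maxItems: number;
  goworkOnlyRatedReviews: boolean;
  proxy: { useApifyProxy: boolean; apifyProxyGroups?: string[] };
}

function buildGoworkInput(
  urls: string[],
  maxItems = 100,
  ratedOnly = false,
): ActorInputSketch {
  const startUrls = urls.map((raw) => {
    // new URL() throws on malformed input, which is the behaviour we want here.
    const host = new URL(raw).hostname.replace(/^www\./, ""); // www. stripping is an assumption
    if (!SUPPORTED_HOSTS.has(host)) {
      throw new Error(`Not a supported GoWork host: ${host}`);
    }
    return { url: raw };
  });
  return {
    startUrls,
    maxItems,
    goworkOnlyRatedReviews: ratedOnly,
    proxy: { useApifyProxy: true, apifyProxyGroups: ["RESIDENTIAL"] },
  };
}

const exampleInput = buildGoworkInput(
  ["https://gowork.fr/b-hive-mulhouse", "https://gowork.de/herole-dresden"],
  50,
);
console.log(JSON.stringify(exampleInput, null, 2));
```

The resulting object has the same shape as the example input above and can be pasted into the actor's JSON input editor.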

Output Structure

The dataset contains one primary row type for GoWork:

  • gowork_detail — one row per company profile scraped: page metadata, JSON-LD snapshot, flattened org + Nuxt company fields, and the reviews array.

Filter with source === 'gowork_detail' when consuming the dataset.
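As a minimal sketch of that filter on an exported JSON array, assuming the rows have already been loaded into memory (the helper names and the DatasetRowSketch type are hypothetical):

```typescript
// Hypothetical minimal row shape; real rows carry many more fields.
interface DatasetRowSketch {
  source: string;
  listingId?: string;
  reviews?: unknown[];
}

// Keep only GoWork company rows.
function goworkDetailRows(rows: DatasetRowSketch[]): DatasetRowSketch[] {
  return rows.filter((row) => row.source === "gowork_detail");
}

// Count captured review threads per company slug.
function threadCountsBySlug(rows: DatasetRowSketch[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const row of goworkDetailRows(rows)) {
    if (!row.listingId) continue;
    counts[row.listingId] = (row.reviews ?? []).length;
  }
  return counts;
}

const exportedRows: DatasetRowSketch[] = [
  { source: "gowork_detail", listingId: "b-hive-mulhouse", reviews: [{}, {}] },
  { source: "something_else" }, // made-up row type, only here to show the filter
];

console.log(threadCountsBySlug(exportedRows)); // { 'b-hive-mulhouse': 2 }
```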


Sample: gowork_detail (first object in data.json)

The JSON below is based on the first record of a real export. jsonLd, reviews, and long strings are shortened for the README; the on-disk file contains the full arrays. _readme_note is documentation-only and does not appear in live output.

{
  "source": "gowork_detail",
  "listingId": "b-hive-mulhouse",
  "slug": "b-hive-mulhouse",
  "url": "https://gowork.fr/b-hive-mulhouse",
  "statusCode": 200,
  "originalSearchUrl": "https://gowork.fr/b-hive-mulhouse",
  "parseMeta": {
    "mode": "direct_detail_url",
    "detailPageUrl": "https://gowork.fr/b-hive-mulhouse",
    "searchIndex": 1
  },
  "scrapedAt": "2026-04-02T08:22:42.226Z",
  "pageTitle": "Avis sur B HIVE Mulhouse - 21 avis - GoWork.fr",
  "metaDescription": "Opportunités de réseautage : Travailler chez B HIVE permet…",
  "ogTitle": "Avis sur B HIVE Mulhouse - 21 avis - GoWork.fr",
  "ogDescription": "Vérifiez ce que les gens disent de B HIVE sur https://gowork.fr/ | 21 avis",
  "ogImage": "https://gowork.fr/assets/images/sharing/thread/cover-fr.jpg",
  "ogUrl": "https://gowork.fr/b-hive-mulhouse",
  "canonicalUrl": "https://gowork.fr/b-hive-mulhouse",
  "h1": "Avis B HIVE",
  "htmlLang": "fr-FR",
  "reviewCountFromTitle": 21,
  "jsonLd": [
    {
      "@context": "http://schema.org/",
      "@type": "Organization",
      "name": "B HIVE",
      "aggregateRating": { "@type": "EmployerAggregateRating", "ratingValue": 4.8, "ratingCount": 4, "reviewCount": 4 },
      "review": [ { "@type": "Review", "author": { "@type": "Person", "name": "BS" }, "reviewBody": "…", "reviewRating": { "ratingValue": 4 } } ]
    }
  ],
  "isCloudflareChallenge": false,
  "org_name": "B HIVE",
  "org_telephone": "+33 3 67 35 04 36",
  "org_tax_id": "831826649",
  "org_description": "B-HIVE est une société d'ingénierie…",
  "org_founding_date": "20170906",
  "org_street_address": "74 rue Jean Monnet, 68200 MULHOUSE",
  "org_address_locality": "Mulhouse",
  "org_address_region": "Grand Est",
  "org_address_country": "FR",
  "org_rating_value": 4.8,
  "org_rating_count": 4,
  "org_review_count": 4,
  "org_review_blocks": 4,
  "pageGlobalRating": 4.8,
  "pageGlobalReviewCount": 21,
  "statisticsRuCount": 10,
  "statisticsRuRootCount": 4,
  "reviewsIncludeAllRuThreads": true,
  "goworkOnlyRatedReviewsApplied": false,
  "pageAggregateRatingCount": 4,
  "siteLocale": "fr",
  "companyEmail": "admin@bhiveunderfloor.co.uk",
  "companyWebsite": "https://www.b-hive.fr/",
  "companyPhone": "+33 3 67 35 04 36",
  "companyLinkedInUrl": "https://fr.linkedin.com/company/b-hive-engineering",
  "companyEmployeeCountLabel": "501-1000",
  "companyBusinessArea": "Industrie manufacturière",
  "companyActivityDescription": "ingénierie conseil opérationnel…",
  "businessTradeName": "Ingénierie et architecture",
  "businessTradeSlug": "ingenierie-et-architecture",
  "ratingHistogramScoredTotal": 4,
  "ratingStar1Count": 0,
  "ratingStar2Count": 0,
  "ratingStar3Count": 0,
  "ratingStar4Count": 1,
  "ratingStar5Count": 3,
  "ratingStar1Percent": 0,
  "ratingStar2Percent": 0,
  "ratingStar3Percent": 0,
  "ratingStar4Percent": 25,
  "ratingStar5Percent": 75,
  "companyCapital": "50000 EUR",
  "companyFoundedDate": "2017-09-06",
  "companyActivityShortLabel": "Ingénierie, études techniques",
  "companyOpeningHoursJson": "{\"lundi\":\"08:00–19:00\",…}",
  "companyTrustedPartnersJson": "[{\"name\":\"OPTIM 67\",\"profileUrl\":\"https://gowork.fr/…\",…}]",
  "reviews": [
    {
      "reviewId": "e270ac96-fed0-4eed-bf23-6681d41ec643",
      "reviewerName": "Charlotte",
      "reviewDate": "17-02-2026 13:11",
      "content": "Les avis sur B HIVE semblent très positifs…",
      "ratingValue": null,
      "languageCode": "fr",
      "authorKind": "SU",
      "replies": []
    },
    {
      "reviewId": "38c2e5bc-75ed-4a35-a526-899a98c12311",
      "reviewerName": "BS",
      "reviewDate": "07-08-2023 16:22",
      "content": "Entreprise jeune en croissance",
      "ratingValue": 4,
      "languageCode": "fr",
      "authorKind": "ANONYMOUS",
      "replies": [
        {
          "replyId": "511b3dfe-314f-4a82-a248-bc8713fd2257",
          "authorName": "Audrey",
          "content": "Pourriez vous me dire ce qui fait la particularité…",
          "date": "08-08-2023 11:41",
          "authorKind": "SU"
        }
      ]
    }
  ],
  "_readme_note": "Omitted here: remaining review threads, full jsonLd reviews, optional mainTextPreview when present."
}

Output fields (gowork_detail) — field-by-field

Row identity and request metadata

  • source — Always gowork_detail for GoWork company rows.
  • listingId — Company slug (same as URL path segment); stable key for joins.
  • slug — Duplicate of listingId for clarity in exports.
  • url — Final HTML URL fetched for this company.
  • statusCode — HTTP status of that response (200 when OK).
  • originalSearchUrl — Original URL that led to this detail (direct detail URL, search page, homepage, or listing page) from crawl userData.
  • parseMeta — How this company was discovered and extra context:
    • mode — e.g. direct_detail_url, from_search, from_homepage, from_listing_page.
    • detailPageUrl — Set for direct starts (mode: direct_detail_url).
    • listingPageUrl — Listing / search / homepage URL when enqueued from a hub.
    • searchPageUrl — Search results URL when mode is from_search.
    • homepageListUrl / homepagePage — Homepage URL and 1-based page index when mode is from_homepage.
    • searchIndex — 1-based index among startUrls when provided by the crawler (direct detail flow).
  • scrapedAt — ISO timestamp when the row was written.
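For consumers who want a human-readable provenance label, the parseMeta fields listed above can be turned into a short description. This is a sketch under assumptions: ParseMetaSketch and describeProvenance are hypothetical names, and the mode strings are the examples given in this README, which may not be exhaustive.

```typescript
// Field names follow the list above; every field except mode is treated as optional.
interface ParseMetaSketch {
  mode: string;
  detailPageUrl?: string;
  listingPageUrl?: string;
  searchPageUrl?: string;
  homepageListUrl?: string;
  homepagePage?: number;
  searchIndex?: number;
}

function describeProvenance(meta: ParseMetaSketch): string {
  switch (meta.mode) {
    case "direct_detail_url":
      return `direct start URL ${meta.detailPageUrl ?? "(unknown)"}`;
    case "from_search":
      return `search page ${meta.searchPageUrl ?? meta.listingPageUrl ?? "(unknown)"}`;
    case "from_homepage":
      return `homepage ${meta.homepageListUrl ?? "(unknown)"}, page ${meta.homepagePage ?? "?"}`;
    case "from_listing_page":
      return `listing hub ${meta.listingPageUrl ?? "(unknown)"}`;
    default:
      // Modes not listed in the README fall through here.
      return `unrecognised mode "${meta.mode}"`;
  }
}

const sampleMeta: ParseMetaSketch = {
  mode: "direct_detail_url",
  detailPageUrl: "https://gowork.fr/b-hive-mulhouse",
  searchIndex: 1,
};

console.log(describeProvenance(sampleMeta));
// direct start URL https://gowork.fr/b-hive-mulhouse
```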

Page-level HTML metadata

  • pageTitle — Contents of <title>.
  • metaDescription — meta[name=description] content when present.
  • ogTitle — Open Graph og:title.
  • ogDescription — Open Graph og:description.
  • ogImage — Open Graph og:image.
  • ogUrl — Open Graph og:url.
  • canonicalUrl — link[rel=canonical] href when present.
  • h1 — First <h1> text (best-effort selectors).
  • htmlLang — html[lang] attribute (e.g. fr-FR).
  • reviewCountFromTitle — Integer parsed from title / OG hints like “21 avis” when regex matches; else null / omitted.
  • mainTextPreview — When present, a long plain-text preview of the main content region (capped in the parser); omitted or empty if selectors find nothing useful.
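The reviewCountFromTitle idea can be sketched with a single regular expression. This is an illustration, not the actor's pattern: the word "avis" matches the French sample title, while the German and Spanish alternatives in the pattern are guesses.

```typescript
// Returns the integer before a review-count word, or null when nothing matches.
// "avis" comes from the sample title; "Bewertungen" and "opiniones" are assumptions.
function reviewCountFromTitleSketch(title: string): number | null {
  const match = /(\d+)\s+(?:avis|Bewertungen|opiniones)\b/i.exec(title);
  return match ? Number.parseInt(match[1] ?? "", 10) : null;
}

console.log(reviewCountFromTitleSketch("Avis sur B HIVE Mulhouse - 21 avis - GoWork.fr")); // 21
console.log(reviewCountFromTitleSketch("GoWork.fr")); // null
```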

Raw JSON-LD

  • jsonLd — Array of parsed JSON-LD objects from script[type=application/ld+json] blocks. Typically includes @type: Organization with aggregateRating, review, address, taxID, etc. Used for fallback reviews when Nuxt is missing.
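To show what collecting JSON-LD blocks involves, here is a dependency-free sketch that uses a regular expression. The actor is described as using a Cheerio-based crawler, so its real extraction most likely goes through a DOM selector; the function names below are hypothetical.

```typescript
// Collect every parseable application/ld+json block from an HTML string.
function extractJsonLdBlocks(html: string): unknown[] {
  const pattern =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const blocks: unknown[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    try {
      blocks.push(JSON.parse(match[1] ?? ""));
    } catch {
      // Skip malformed blocks instead of failing the whole page.
    }
  }
  return blocks;
}

// Pick the first block whose @type is Organization, if any.
function findOrganizationBlock(blocks: unknown[]): Record<string, unknown> | null {
  for (const block of blocks) {
    if (typeof block === "object" && block !== null) {
      const record = block as Record<string, unknown>;
      if (record["@type"] === "Organization") return record;
    }
  }
  return null;
}

const demoHtml =
  '<html><head>' +
  '<script type="application/ld+json">{"@type":"Organization","name":"B HIVE"}</script>' +
  '<script type="application/ld+json">{broken</script>' +
  '</head></html>';

const demoBlocks = extractJsonLdBlocks(demoHtml);
console.log(demoBlocks.length); // 1
console.log(findOrganizationBlock(demoBlocks)?.["name"]); // B HIVE
```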

Challenge flag

  • isCloudflareChallenge — true when the HTML looks like a Cloudflare interstitial (“Just a moment”); parsing quality may be poor, so use a proxy or a browser if this stays true.
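A rough version of such a check looks only at the page title. This sketch relies solely on the “Just a moment” hint mentioned above; the actor's actual heuristic is not documented here and may inspect other markers.

```typescript
// Heuristic: treat a page whose <title> contains "Just a moment" as a challenge page.
function looksLikeCloudflareChallenge(html: string): boolean {
  const titleMatch = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  const titleText = titleMatch ? titleMatch[1] ?? "" : "";
  return /just a moment/i.test(titleText);
}

console.log(looksLikeCloudflareChallenge("<html><head><title>Just a moment...</title></head></html>")); // true
console.log(looksLikeCloudflareChallenge("<html><head><title>Avis sur B HIVE Mulhouse</title></head></html>")); // false
```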

Organization (flattened from JSON-LD)

  • org_name — Organization name.
  • org_telephone — telephone.
  • org_tax_id — taxID (e.g. SIREN-style id when provided).
  • org_description — description.
  • org_founding_date — foundingDate as in schema (often YYYYMMDD).
  • org_street_address — address.streetAddress.
  • org_address_locality — address.addressLocality.
  • org_address_region — address.addressRegion.
  • org_address_country — address.addressCountry.
  • org_rating_value — aggregateRating.ratingValue (employer aggregate).
  • org_rating_count — aggregateRating.ratingCount when present.
  • org_review_count — aggregateRating.reviewCount when present.
  • org_review_blocks — Count of review entries embedded in JSON-LD (subset of site threads).

Nuxt / global stats (employer rating and review counters)

  • pageGlobalRating — Star rating from Nuxt company payload (matches header UI when present).
  • pageGlobalReviewCount — Broad review counter (statistics.reviewsCount), same idea as “N avis” in the title (e.g. 21); not always equal to length of reviews in HTML.
  • statisticsRuCount — Number of thread roots shipped in company-reviews for this SSR payload.
  • statisticsRuRootCount — Rated root count used for aggregates (often closer to JSON-LD review count).
  • reviewsIncludeAllRuThreads — true when reviews.length === statisticsRuCount (all SSR threads captured). null when goworkOnlyRatedReviews is true (counts not comparable).
  • goworkOnlyRatedReviewsApplied — true if the rated-roots-only input filter was active for this row.
  • pageAggregateRatingCount — Sample size from the company.rating aggregate (often aligns with the JSON-LD rating count, e.g. 4 in the sample).
  • siteLocale — Pinia / Nuxt locale (e.g. fr).
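The relationship between these counters can be checked on the consumer side. The sketch below restates the reviewsIncludeAllRuThreads rule from the list above; the type and function names are hypothetical, and returning null when statisticsRuCount is missing is an assumption.

```typescript
interface CounterRowSketch {
  reviews: unknown[];
  pageGlobalReviewCount: number | null;
  statisticsRuCount: number | null;
  goworkOnlyRatedReviewsApplied: boolean;
}

// true when every SSR thread root was captured, null when the counts are not comparable.
function allThreadsCaptured(row: CounterRowSketch): boolean | null {
  if (row.goworkOnlyRatedReviewsApplied) return null; // filtered rows are not comparable
  if (row.statisticsRuCount === null) return null; // assumption: no Nuxt counter, no verdict
  return row.reviews.length === row.statisticsRuCount;
}

const sampleCounters: CounterRowSketch = {
  reviews: Array.from({ length: 10 }, () => ({})),
  pageGlobalReviewCount: 21,
  statisticsRuCount: 10,
  goworkOnlyRatedReviewsApplied: false,
};

console.log(allThreadsCaptured(sampleCounters)); // true
```

In the sample row the title reports 21 reviews while the SSR payload ships 10 thread roots, so a true result here means "all shipped threads were captured", not "all 21 reviews are present".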

Company profile (flattened for CSV)

  • companyEmail — Contact email from Nuxt infoGraph when present.
  • companyWebsite — Website URL (web_page_found or web_page).
  • companyPhone — Phone.
  • companyLinkedInUrl — LinkedIn profile URL.
  • companyEmployeeCountLabel — Employee range label (e.g. 501-1000).
  • companyBusinessArea — Business area / sector label.
  • companyActivityDescription — Long activity text (truncated in parser for very long strings).
  • businessTradeName — Trade / category display name.
  • businessTradeSlug — Trade slug for URLs.
  • ratingHistogramScoredTotal — Sum of star bucket counts (matches “based on N ratings” when complete).
  • ratingStar1Count through ratingStar5Count — Counts per star level 1–5.
  • ratingStar1Percent through ratingStar5Percent — Percentages (0–100) derived from those counts.
  • companyCapital — Capital string (e.g. 50000 EUR).
  • companyFoundedDate — YYYY-MM-DD, converted from infoGraph.date when that value is YYYYMMDD.
  • companyActivityShortLabel — Short activity line (distinct from long description).
  • companyOpeningHoursJson — JSON string: map day → hours (e.g. 08:00–19:00, Fermé).
  • companyTrustedPartnersJson — JSON string: array of { name, profileUrl, city, logoUrl, companyId } for “trusted companies” when present.
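Three of the fields above involve small transformations that are easy to reproduce downstream: star percentages, the YYYYMMDD date conversion, and the JSON-string columns. The helpers below are a sketch with hypothetical names; in particular, rounding percentages to whole numbers is an assumption that happens to match the sample row.

```typescript
// Percentages per star bucket; an empty histogram yields all zeros.
function starPercents(counts: number[]): number[] {
  const total = counts.reduce((sum, count) => sum + count, 0);
  return counts.map((count) => (total === 0 ? 0 : Math.round((count / total) * 100)));
}

// "20170906" -> "2017-09-06"; anything else is returned as null.
function compactDateToIso(value: string): string | null {
  const match = /^(\d{4})(\d{2})(\d{2})$/.exec(value);
  if (!match) return null;
  const [, year, month, day] = match;
  return `${year}-${month}-${day}`;
}

// Parse a JSON-string column such as companyOpeningHoursJson, with a fallback.
function parseJsonColumn<T>(value: string | null | undefined, fallback: T): T {
  if (!value) return fallback;
  try {
    return JSON.parse(value) as T;
  } catch {
    return fallback;
  }
}

console.log(starPercents([0, 0, 0, 1, 3])); // [ 0, 0, 0, 25, 75 ]
console.log(compactDateToIso("20170906")); // 2017-09-06

const openingHours = parseJsonColumn<Record<string, string>>('{"lundi":"08:00–19:00"}', {});
console.log(openingHours["lundi"]); // 08:00–19:00
```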

Reviews (Nuxt threads, JSON-LD fallback)

  • reviews — Array of thread root objects. Each object:
    • reviewId — GoWork review UUID (or synthetic id in JSON-LD fallback).
    • reviewerName — Display name (may be "-" or empty for anonymous).
    • reviewDate — Date/time string as on site.
    • content — Main review / post body.
    • ratingValue — 1–5 stars when rated; null for text-only / question-style threads.
    • languageCode — Primary language code (from page / locale).
    • authorKind — Author type flag from Nuxt (e.g. ANONYMOUS, SU).
    • role — Optional role (e.g. candidate) when present.
    • replies — Array of first-level replies on this thread:
      • replyId — Reply UUID.
      • authorName — Reply author display name.
      • content — Reply body.
      • date — Reply date/time string.
      • authorKind — Reply author kind when present.
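For NLP or spreadsheet work it is often easier to have one row per post than nested threads. The sketch below flattens roots and replies and rewrites the site's DD-MM-YYYY HH:mm strings (the format seen in the sample) into a sortable form. All names are hypothetical, and no time zone is attached because the source does not state one.

```typescript
interface ReplyNode { replyId: string; authorName: string; content: string; date: string }
interface ThreadNode {
  reviewId: string;
  reviewerName: string;
  reviewDate: string;
  content: string;
  ratingValue: number | null;
  replies: ReplyNode[];
}
interface FlatPost {
  threadId: string;
  postId: string;
  kind: "root" | "reply";
  author: string;
  postedAt: string | null;
  text: string;
  rating: number | null;
}

// "17-02-2026 13:11" -> "2026-02-17T13:11" (no time zone implied).
function siteDateToSortable(siteDate: string): string | null {
  const match = /^(\d{2})-(\d{2})-(\d{4}) (\d{2}):(\d{2})$/.exec(siteDate.trim());
  if (!match) return null;
  const [, day, month, year, hour, minute] = match;
  return `${year}-${month}-${day}T${hour}:${minute}`;
}

function flattenThreads(threads: ThreadNode[]): FlatPost[] {
  const posts: FlatPost[] = [];
  for (const root of threads) {
    posts.push({
      threadId: root.reviewId, postId: root.reviewId, kind: "root",
      author: root.reviewerName, postedAt: siteDateToSortable(root.reviewDate),
      text: root.content, rating: root.ratingValue,
    });
    for (const reply of root.replies) {
      posts.push({
        threadId: root.reviewId, postId: reply.replyId, kind: "reply",
        author: reply.authorName, postedAt: siteDateToSortable(reply.date),
        text: reply.content, rating: null, // replies carry no rating in the sample
      });
    }
  }
  return posts;
}

const flatPosts = flattenThreads([
  { reviewId: "t1", reviewerName: "Charlotte", reviewDate: "17-02-2026 13:11", content: "…", ratingValue: null, replies: [] },
  {
    reviewId: "t2", reviewerName: "BS", reviewDate: "07-08-2023 16:22", content: "…", ratingValue: 4,
    replies: [{ replyId: "r1", authorName: "Audrey", content: "…", date: "08-08-2023 11:41" }],
  },
]);
console.log(flatPosts.length); // 3
console.log(flatPosts[0]?.postedAt); // 2026-02-17T13:11
```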

Benefits of the GoWork scraper

  • One row per company with reviews nested but company fields flat for spreadsheets.
  • Honest counters: distinguish title / global review count from SSR thread count and JSON-LD subset via pageGlobalReviewCount, statisticsRuCount, org_review_blocks.
  • Traceability: parseMeta records whether the row came from search, homepage, listing, or direct URL.
  • Optional rated-only export for clients who want star reviews only (goworkOnlyRatedReviews).

Why Choose This Actor?

Built for French, German, and Spanish employer review research on GoWork: company discovery from search, homepage, or hubs, then full profile + threads where Nuxt allows. Outputs are suitable for warehouses, BI, or CRM enrichment.

Use cases:

  • Track reviews and aggregates for a watchlist of employers.
  • Export Q&A-style threads and star reviews with replies for NLP or moderation.
  • Combine flat firmographics (contact, capital, hours, partners) with review content.

Technical Implementation

  1. URL routing (gowork-mapper.ts): Detects gowork.fr, gowork.de, and es.gowork.com hosts, detail slug paths, search, homepage, and listing hints; builds CheerioCrawler requests with userData (slug, goworkOnlyRatedReviews, maxItems, etc.).
  2. Listing handlers (routes.ts, GOWORK_LISTINGS): Collects company URLs from anchors, Nuxt search SERP, homepage index (recently rated + static company strips on page 1), paginates search (total + page size) and homepage (exact page+1 paginator links, cap 200).
  3. Detail handler (routes.ts, GOWORK_DETAIL): Parses HTML with parseGoworkDetailHtml (gowork-detail-parser.ts): Nuxt extractGoworkFromNuxtPayload, extractGoworkCompanyFlat, JSON-LD org flattening, optional rated-only filter; pushes one dataset row.
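The routing step can be illustrated with a small classifier. This is a sketch of the idea only, not the contents of gowork-mapper.ts: classifyGoworkUrl, searchPageCount, and the exact path rules are assumptions built from the URL shapes listed in this README.

```typescript
type RouteKind = "detail" | "search" | "homepage" | "listing" | "unsupported";

const ROUTED_HOSTS = ["gowork.fr", "gowork.de", "es.gowork.com"];

function classifyGoworkUrl(raw: string): RouteKind {
  let parsed: URL;
  try {
    parsed = new URL(raw);
  } catch {
    return "unsupported";
  }
  if (!ROUTED_HOSTS.includes(parsed.hostname)) return "unsupported";

  const path = parsed.pathname.replace(/\/+$/, ""); // drop trailing slashes
  if (path === "") return "homepage"; // national root, with or without ?page=N
  if (path === "/search") return "search";
  if (/\/(trouver|recherche)/.test(path)) return "listing"; // listing hints named in the README
  if (/^\/[^/]+$/.test(path)) return "detail"; // single-segment company slug
  return "unsupported";
}

// Number of search result pages implied by a total and a page size.
function searchPageCount(total: number, pageSize: number): number {
  if (total <= 0 || pageSize <= 0) return 0;
  return Math.ceil(total / pageSize);
}

console.log(classifyGoworkUrl("https://gowork.fr/b-hive-mulhouse")); // detail
console.log(classifyGoworkUrl("https://gowork.fr/search?q=fra&city=Paris")); // search
console.log(classifyGoworkUrl("https://es.gowork.com/?page=2")); // homepage
console.log(searchPageCount(45, 20)); // 3
```

Checking listing hints before the single-segment rule matters here, because a hub path such as /trouver would otherwise be mistaken for a company slug.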

Explore More Scrapers

If you found this actor useful, check out other scrapers at memo23's Apify profile.


Support


Additional Services