GoWork Scraper (FR, DE & ES)
Pricing
from $2.00 / 1,000 results
Structured employer data (FR, DE, ES): one record per company with page and Open Graph metadata, Schema.org JSON-LD, flattened organization attributes, enriched firmographics and rating distribution, threaded reviews with replies, crawl provenance via parseMeta, and Cloudflare challenge detection.
Developer
Muhamed Didovic
Overview
Extract structured employer reviews and company profiles from GoWork.fr (France), GoWork.de (Germany), and GoWork ES (Spain). The actor loads HTML pages with a browser-like HTTP client (Crawlee Impit), parses Nuxt __NUXT_DATA__ when present (preferred for full review threads), and falls back to JSON-LD when needed. You get one dataset row per company with flattened header fields for CSV and a nested reviews array (each thread includes replies).
Use it to monitor employer reputation, export review text and ratings, or feed analytics with company metadata (contact, activity, star histogram, opening hours, trusted partners) plus traceable parseMeta for each crawl source (direct URL, search, homepage, listing hub).
Features
- Multiple entry URLs (all routed to listing handlers or direct detail):
  - Company profile: https://gowork.fr/{slug}, https://gowork.de/{slug}, or https://es.gowork.com/{slug} (e.g. b-hive-mulhouse, herole-dresden).
  - Search: …/search?… on each host with page=2 pagination (Nuxt total + page size).
  - Homepage: national roots such as https://gowork.fr/, https://gowork.de/, or https://es.gowork.com/, which expose a recently rated feed with ?page=2 pagination (sequential paginator links only; junk page=500 links are ignored).
  - Other listing hubs (paths containing e.g. /trouver, /recherche, …): discovers profile links from anchors.
- Per-company detail pass:
  - One HTML request per company (until maxItems caps queued detail URLs globally).
  - Reviews from Nuxt company-reviews when available; otherwise JSON-LD Organization.review (subset, synthetic ids when missing).
  - Optional goworkOnlyRatedReviews: keep only thread roots with a 1–5 star rating (see Input).
- Flattened export:
  - Org fields from JSON-LD (org_*), Nuxt company block (company*, business*, rating*, partners JSON, etc.) at the root of the row alongside reviews.
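The "Nuxt first, JSON-LD fallback" rule for reviews can be pictured with a short sketch. It is an illustration only, not the actor's code: pickReviews, the type names, and the synthetic id format are all hypothetical, and treating an empty Nuxt list as "missing" is an assumption.

```typescript
// Hypothetical shapes, trimmed to the fields needed for the illustration.
interface ReviewThread {
  reviewId: string;
  reviewerName: string;
  content: string;
  ratingValue: number | null;
  replies: unknown[];
}
interface JsonLdReview {
  author?: { name?: string };
  reviewBody?: string;
  reviewRating?: { ratingValue?: number };
}

// Prefer Nuxt threads; otherwise map the JSON-LD subset and invent ids.
function pickReviews(
  slug: string,
  nuxtThreads: ReviewThread[] | undefined,
  jsonLdReviews: JsonLdReview[],
): ReviewThread[] {
  // Assumption: an empty Nuxt list is treated like a missing payload.
  if (nuxtThreads && nuxtThreads.length > 0) return nuxtThreads;
  return jsonLdReviews.map((review, index) => ({
    reviewId: `jsonld-${slug}-${index + 1}`, // synthetic id; this format is invented
    reviewerName: review.author?.name ?? "",
    content: review.reviewBody ?? "",
    ratingValue: review.reviewRating?.ratingValue ?? null,
    replies: [], // JSON-LD carries no reply threads
  }));
}

const fallback = pickReviews("b-hive-mulhouse", undefined, [
  { author: { name: "BS" }, reviewBody: "…", reviewRating: { ratingValue: 4 } },
]);
console.log(fallback[0]?.reviewId);
```

Rows built from the fallback path contain fewer threads and no replies, which is why the README calls JSON-LD a subset.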
How to Use
- Set Up: Apify account and this actor (or run locally with apify run / npm run start:dev).
- Provide Input: Add one or more GoWork URLs under startUrls (and optional url1, url2, … on the same object for multiple starts).
- Configure: Set maxItems (cap on company detail pages queued), concurrency, retries, and proxy (often required if Cloudflare challenges appear).
- Run & Export: Download JSON / CSV from the dataset. If you see isCloudflareChallenge: true or an empty Nuxt payload, use a residential proxy or adjust client settings.
Usage Limitations
Free / non-paying Apify users may be subject to platform limits on dataset items or charges. Paid users typically get higher limits; adjust maxItems to control how many company detail pages are fetched per run. GoWork may rate-limit or challenge datacenter IPs—proxy is recommended.
Input Configuration
Example input:
{
  "startUrls": [
    { "url": "https://gowork.fr/b-hive-mulhouse" },
    { "url": "https://gowork.fr/search?q=fra&city=Paris" },
    { "url": "https://gowork.fr/" },
    { "url": "https://gowork.de/herole-dresden" },
    { "url": "https://es.gowork.com/" }
  ],
  "maxItems": 100,
  "goworkOnlyRatedReviews": false,
  "maxConcurrency": 100,
  "minConcurrency": 1,
  "maxRequestRetries": 100,
  "proxy": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
Input Fields Explanation
- startUrls: Objects whose url, url1, url2, … fields are collected in order. Use GoWork company URLs on gowork.fr, gowork.de, or es.gowork.com, /search?…, homepage (with optional ?page=N), or supported listing-style paths.
- maxItems: Maximum number of company detail pages to queue across the run (shared counter for search / homepage / listings). Default 100 (or as in actor schema).
- goworkOnlyRatedReviews: When true, each row's reviews array includes only thread roots with a numeric 1–5 star rating; unrated text threads are dropped. Replies under kept roots stay. Default false.
- maxConcurrency / minConcurrency / maxRequestRetries: Standard Crawlee / actor concurrency and retry behavior.
- proxy: Apify proxy or custom proxyUrls for outbound requests.
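How url, url1, url2, … might be gathered from each startUrls object is sketched below. collectStartUrls is a hypothetical helper, and the ordering rule (plain url first, then numeric suffixes ascending) is an assumption about what "collected in order" means.

```typescript
type StartUrlEntry = Record<string, unknown>;

// Hypothetical helper: flattens url, url1, url2, ... from every entry into one list.
function collectStartUrls(entries: StartUrlEntry[]): string[] {
  const urls: string[] = [];
  for (const entry of entries) {
    const keyed: Array<[number, string]> = [];
    for (const [key, value] of Object.entries(entry)) {
      const match = /^url(\d*)$/.exec(key);
      if (!match || typeof value !== "string" || value.trim() === "") continue;
      // Plain "url" gets index 0 so it sorts before url1, url2, ...
      const index = match[1] ? Number(match[1]) : 0;
      keyed.push([index, value.trim()]);
    }
    keyed.sort((a, b) => a[0] - b[0]);
    for (const [, value] of keyed) urls.push(value);
  }
  return urls;
}

console.log(
  collectStartUrls([
    { url: "https://gowork.fr/b-hive-mulhouse", url1: "https://gowork.de/herole-dresden" },
    { url: "https://es.gowork.com/" },
  ]),
);
```

Keys that do not match the url / urlN pattern are ignored, so extra metadata on a start object does not produce requests in this sketch.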
Output Structure
The dataset contains one primary row type for GoWork:
- gowork_detail: one row per scraped company profile, with page metadata, JSON-LD snapshot, flattened org + Nuxt company fields, and the reviews array.
Filter with source === 'gowork_detail' when consuming the dataset.
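For consumers who want one line per review rather than one row per company, the filter above can be combined with a simple flattening step. This is a sketch under assumptions: the interfaces list only a few of the documented fields, and toReviewLines and ReviewLine are hypothetical names.

```typescript
// Partial, illustrative view of a dataset row.
interface GoworkReply { replyId: string }
interface GoworkReview {
  reviewId: string;
  ratingValue: number | null;
  replies?: GoworkReply[];
}
interface GoworkRow {
  source: string;
  slug?: string;
  org_name?: string;
  reviews?: GoworkReview[];
}
interface ReviewLine {
  slug: string;
  company: string;
  reviewId: string;
  rating: number | null;
  replyCount: number;
}

// Keep only gowork_detail rows and emit one line per thread root.
function toReviewLines(rows: GoworkRow[]): ReviewLine[] {
  const lines: ReviewLine[] = [];
  for (const row of rows) {
    if (row.source !== "gowork_detail") continue;
    for (const review of row.reviews ?? []) {
      lines.push({
        slug: row.slug ?? "",
        company: row.org_name ?? "",
        reviewId: review.reviewId,
        rating: review.ratingValue,
        replyCount: review.replies?.length ?? 0,
      });
    }
  }
  return lines;
}
```

The input would typically be the parsed JSON export of the dataset; companies without any reviews simply contribute no lines.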
Sample: gowork_detail (first object in data.json)
The JSON below is based on the first record of a real export. jsonLd, reviews, and long strings are shortened for the README; the on-disk file contains the full arrays. _readme_note is documentation-only and does not appear in live output.
{
  "source": "gowork_detail",
  "listingId": "b-hive-mulhouse",
  "slug": "b-hive-mulhouse",
  "url": "https://gowork.fr/b-hive-mulhouse",
  "statusCode": 200,
  "originalSearchUrl": "https://gowork.fr/b-hive-mulhouse",
  "parseMeta": {
    "mode": "direct_detail_url",
    "detailPageUrl": "https://gowork.fr/b-hive-mulhouse",
    "searchIndex": 1
  },
  "scrapedAt": "2026-04-02T08:22:42.226Z",
  "pageTitle": "Avis sur B HIVE Mulhouse - 21 avis - GoWork.fr",
  "metaDescription": "Opportunités de réseautage : Travailler chez B HIVE permet…",
  "ogTitle": "Avis sur B HIVE Mulhouse - 21 avis - GoWork.fr",
  "ogDescription": "Vérifiez ce que les gens disent de B HIVE sur https://gowork.fr/ | 21 avis",
  "ogImage": "https://gowork.fr/assets/images/sharing/thread/cover-fr.jpg",
  "ogUrl": "https://gowork.fr/b-hive-mulhouse",
  "canonicalUrl": "https://gowork.fr/b-hive-mulhouse",
  "h1": "Avis B HIVE",
  "htmlLang": "fr-FR",
  "reviewCountFromTitle": 21,
  "jsonLd": [
    {
      "@context": "http://schema.org/",
      "@type": "Organization",
      "name": "B HIVE",
      "aggregateRating": { "@type": "EmployerAggregateRating", "ratingValue": 4.8, "ratingCount": 4, "reviewCount": 4 },
      "review": [
        { "@type": "Review", "author": { "@type": "Person", "name": "BS" }, "reviewBody": "…", "reviewRating": { "ratingValue": 4 } }
      ]
    }
  ],
  "isCloudflareChallenge": false,
  "org_name": "B HIVE",
  "org_telephone": "+33 3 67 35 04 36",
  "org_tax_id": "831826649",
  "org_description": "B-HIVE est une société d'ingénierie…",
  "org_founding_date": "20170906",
  "org_street_address": "74 rue Jean Monnet, 68200 MULHOUSE",
  "org_address_locality": "Mulhouse",
  "org_address_region": "Grand Est",
  "org_address_country": "FR",
  "org_rating_value": 4.8,
  "org_rating_count": 4,
  "org_review_count": 4,
  "org_review_blocks": 4,
  "pageGlobalRating": 4.8,
  "pageGlobalReviewCount": 21,
  "statisticsRuCount": 10,
  "statisticsRuRootCount": 4,
  "reviewsIncludeAllRuThreads": true,
  "goworkOnlyRatedReviewsApplied": false,
  "pageAggregateRatingCount": 4,
  "siteLocale": "fr",
  "companyEmail": "admin@bhiveunderfloor.co.uk",
  "companyWebsite": "https://www.b-hive.fr/",
  "companyPhone": "+33 3 67 35 04 36",
  "companyLinkedInUrl": "https://fr.linkedin.com/company/b-hive-engineering",
  "companyEmployeeCountLabel": "501-1000",
  "companyBusinessArea": "Industrie manufacturière",
  "companyActivityDescription": "ingénierie conseil opérationnel…",
  "businessTradeName": "Ingénierie et architecture",
  "businessTradeSlug": "ingenierie-et-architecture",
  "ratingHistogramScoredTotal": 4,
  "ratingStar1Count": 0,
  "ratingStar2Count": 0,
  "ratingStar3Count": 0,
  "ratingStar4Count": 1,
  "ratingStar5Count": 3,
  "ratingStar1Percent": 0,
  "ratingStar2Percent": 0,
  "ratingStar3Percent": 0,
  "ratingStar4Percent": 25,
  "ratingStar5Percent": 75,
  "companyCapital": "50000 EUR",
  "companyFoundedDate": "2017-09-06",
  "companyActivityShortLabel": "Ingénierie, études techniques",
  "companyOpeningHoursJson": "{\"lundi\":\"08:00–19:00\",…}",
  "companyTrustedPartnersJson": "[{\"name\":\"OPTIM 67\",\"profileUrl\":\"https://gowork.fr/…\",…}]",
  "reviews": [
    {
      "reviewId": "e270ac96-fed0-4eed-bf23-6681d41ec643",
      "reviewerName": "Charlotte",
      "reviewDate": "17-02-2026 13:11",
      "content": "Les avis sur B HIVE semblent très positifs…",
      "ratingValue": null,
      "languageCode": "fr",
      "authorKind": "SU",
      "replies": []
    },
    {
      "reviewId": "38c2e5bc-75ed-4a35-a526-899a98c12311",
      "reviewerName": "BS",
      "reviewDate": "07-08-2023 16:22",
      "content": "Entreprise jeune en croissance",
      "ratingValue": 4,
      "languageCode": "fr",
      "authorKind": "ANONYMOUS",
      "replies": [
        {
          "replyId": "511b3dfe-314f-4a82-a248-bc8713fd2257",
          "authorName": "Audrey",
          "content": "Pourriez vous me dire ce qui fait la particularité…",
          "date": "08-08-2023 11:41",
          "authorKind": "SU"
        }
      ]
    }
  ],
  "_readme_note": "Omitted here: remaining review threads, full jsonLd reviews, optional mainTextPreview when present."
}
Output fields (gowork_detail) — field-by-field
Row identity and request metadata
- source: Always gowork_detail for GoWork company rows.
- listingId: Company slug (same as URL path segment); stable key for joins.
- slug: Duplicate of listingId for clarity in exports.
- url: Final HTML URL fetched for this company.
- statusCode: HTTP status of that response (200 when OK).
- originalSearchUrl: Original URL that led to this detail (direct detail URL, search page, homepage, or listing page) from crawl userData.
- parseMeta: How this company was discovered and extra context:
  - mode: e.g. direct_detail_url, from_search, from_homepage, from_listing_page.
  - detailPageUrl: Set for direct starts (mode: direct_detail_url).
  - listingPageUrl: Listing / search / homepage URL when enqueued from a hub.
  - searchPageUrl: Search results URL when mode is from_search.
  - homepageListUrl / homepagePage: Homepage URL and 1-based page index when mode is from_homepage.
  - searchIndex: 1-based index among startUrls when provided by the crawler (direct detail flow).
- scrapedAt: ISO timestamp when the row was written.
Page-level HTML metadata
- pageTitle: Contents of <title>.
- metaDescription: meta[name=description] content when present.
- ogTitle: Open Graph og:title.
- ogDescription: Open Graph og:description.
- ogImage: Open Graph og:image.
- ogUrl: Open Graph og:url.
- canonicalUrl: link[rel=canonical] href when present.
- h1: First <h1> text (best-effort selectors).
- htmlLang: html[lang] attribute (e.g. fr-FR).
- reviewCountFromTitle: Integer parsed from title / OG hints like "21 avis" when regex matches; else null / omitted.
- mainTextPreview: When present, a long plain-text preview of the main content region (capped in the parser); omitted or empty if selectors find nothing useful.
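As an illustration of how reviewCountFromTitle could be derived, the sketch below pulls the first "N avis"-style count out of a list of hint strings. The regex and parseReviewCount are hypothetical; only the word "avis" comes from this README, and the German and Spanish words are guesses added for illustration.

```typescript
// Only "avis" is documented; the other two words are assumptions.
const REVIEW_WORDS = ["avis", "Bewertungen", "opiniones"];

// Returns the first "<number> <review word>" count found in the hints, else null.
function parseReviewCount(...hints: Array<string | undefined>): number | null {
  const pattern = new RegExp(`(\\d+)\\s+(?:${REVIEW_WORDS.join("|")})\\b`, "i");
  for (const hint of hints) {
    if (!hint) continue;
    const match = pattern.exec(hint);
    if (match && match[1]) return Number(match[1]);
  }
  return null;
}

// Title and og:description taken from the sample row in this README.
console.log(
  parseReviewCount(
    "Avis sur B HIVE Mulhouse - 21 avis - GoWork.fr",
    "Vérifiez ce que les gens disent de B HIVE sur https://gowork.fr/ | 21 avis",
  ),
); // 21
```

Counts written with thousands separators would need a more careful pattern; this sketch does not handle them.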
Raw JSON-LD
- jsonLd: Array of parsed JSON-LD objects from script[type=application/ld+json] blocks. Typically includes @type: Organization with aggregateRating, review, address, taxID, etc. Used for fallback reviews when Nuxt is missing.
Challenge flag
- isCloudflareChallenge: true when the HTML looks like a Cloudflare interstitial ("Just a moment"); parsing quality may be poor, so use proxy / browser if this stays true.
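A minimal sketch of this kind of check is shown below. It only inspects the page title for the "Just a moment" text mentioned above; the actor's real heuristic is not documented here and may look at other markers, and looksLikeCloudflareChallenge is a hypothetical name.

```typescript
// Hypothetical check: flags pages whose <title> starts with "Just a moment".
function looksLikeCloudflareChallenge(html: string): boolean {
  const titleMatch = /<title[^>]*>([^<]*)<\/title>/i.exec(html);
  const title = (titleMatch?.[1] ?? "").trim().toLowerCase();
  return title.startsWith("just a moment");
}

const challenge = "<html><head><title>Just a moment...</title></head><body></body></html>";
const profile = "<html><head><title>Avis sur B HIVE Mulhouse - 21 avis - GoWork.fr</title></head></html>";
console.log(looksLikeCloudflareChallenge(challenge)); // true
console.log(looksLikeCloudflareChallenge(profile)); // false
```

A downstream pipeline could use the flag to skip such rows or to schedule a retry through a different proxy group.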
Organization (flattened from JSON-LD)
- org_name: Organization name.
- org_telephone: telephone.
- org_tax_id: taxID (e.g. SIREN-style id when provided).
- org_description: description.
- org_founding_date: foundingDate as in schema (often YYYYMMDD).
- org_street_address: address.streetAddress.
- org_address_locality: address.addressLocality.
- org_address_region: address.addressRegion.
- org_address_country: address.addressCountry.
- org_rating_value: aggregateRating.ratingValue (employer aggregate).
- org_rating_count: aggregateRating.ratingCount when present.
- org_review_count: aggregateRating.reviewCount when present.
- org_review_blocks: Count of review entries embedded in JSON-LD (subset of site threads).
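The mapping from a JSON-LD Organization block to the org_* columns can be sketched as follows. flattenOrganization and the interface are hypothetical, the sketch covers only part of the field list, and emitting null for absent values is an assumption (the actor may omit them instead).

```typescript
// Partial view of a Schema.org Organization block as it appears in jsonLd.
interface JsonLdOrganization {
  "@type"?: string;
  name?: string;
  telephone?: string;
  taxID?: string;
  foundingDate?: string;
  address?: {
    streetAddress?: string;
    addressLocality?: string;
    addressRegion?: string;
    addressCountry?: string;
  };
  aggregateRating?: { ratingValue?: number; ratingCount?: number; reviewCount?: number };
  review?: unknown[];
}

// Flattens the first Organization block into org_* keys.
function flattenOrganization(blocks: JsonLdOrganization[]): Record<string, string | number | null> {
  const org = blocks.find((block) => block["@type"] === "Organization");
  if (!org) return {};
  return {
    org_name: org.name ?? null,
    org_telephone: org.telephone ?? null,
    org_tax_id: org.taxID ?? null,
    org_founding_date: org.foundingDate ?? null,
    org_street_address: org.address?.streetAddress ?? null,
    org_address_locality: org.address?.addressLocality ?? null,
    org_address_region: org.address?.addressRegion ?? null,
    org_address_country: org.address?.addressCountry ?? null,
    org_rating_value: org.aggregateRating?.ratingValue ?? null,
    org_rating_count: org.aggregateRating?.ratingCount ?? null,
    org_review_count: org.aggregateRating?.reviewCount ?? null,
    org_review_blocks: Array.isArray(org.review) ? org.review.length : 0,
  };
}
```

Keeping these values as top-level scalars is what lets a CSV export carry the organization data without nested columns.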
Nuxt / global stats (employer rating and review counters)
- pageGlobalRating: Star rating from Nuxt company payload (matches header UI when present).
- pageGlobalReviewCount: Broad review counter (statistics.reviewsCount), same idea as "N avis" in the title (e.g. 21); not always equal to length of reviews in HTML.
- statisticsRuCount: Number of thread roots shipped in company-reviews for this SSR payload.
- statisticsRuRootCount: Rated root count used for aggregates (often closer to JSON-LD review count).
- reviewsIncludeAllRuThreads: true when reviews.length === statisticsRuCount (all SSR threads captured). null when goworkOnlyRatedReviews is true (counts not comparable).
- goworkOnlyRatedReviewsApplied: true if the input filter (only rated roots) was active for this row.
- pageAggregateRatingCount: Sample size from company.rating aggregate (often aligns with JSON-LD ~4).
- siteLocale: Pinia / Nuxt locale (e.g. fr).
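The reviewsIncludeAllRuThreads rule described above amounts to a three-state value. A minimal sketch follows; computeIncludesAllThreads is a hypothetical name, and only the rule itself comes from this README.

```typescript
// true  -> every SSR thread root was captured
// false -> some thread roots are missing from the row
// null  -> rated-only filter was active, so the counts are not comparable
function computeIncludesAllThreads(
  reviewsLength: number,
  statisticsRuCount: number,
  onlyRatedApplied: boolean,
): boolean | null {
  if (onlyRatedApplied) return null;
  return reviewsLength === statisticsRuCount;
}

console.log(computeIncludesAllThreads(10, 10, false)); // true
console.log(computeIncludesAllThreads(4, 10, false)); // false
console.log(computeIncludesAllThreads(4, 10, true)); // null
```

When consuming the dataset, a false value is a reasonable signal to re-crawl the company, whereas null only reflects the chosen input filter.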
Company profile (flattened for CSV)
- companyEmail: Contact email from Nuxt infoGraph when present.
- companyWebsite: Website URL (web_page_found or web_page).
- companyPhone: Phone.
- companyLinkedInUrl: LinkedIn profile URL.
- companyEmployeeCountLabel: Employee range label (e.g. 501-1000).
- companyBusinessArea: Business area / sector label.
- companyActivityDescription: Long activity text (truncated in parser for very long strings).
- businessTradeName: Trade / category display name.
- businessTradeSlug: Trade slug for URLs.
- ratingHistogramScoredTotal: Sum of star bucket counts (matches "based on N ratings" when complete).
- ratingStar1Count … ratingStar5Count: Counts per star level 1–5.
- ratingStar1Percent … ratingStar5Percent: Percentages (0–100) derived from those counts.
- companyCapital: Capital string (e.g. 50000 EUR).
- companyFoundedDate: YYYY-MM-DD from infoGraph.date when YYYYMMDD.
- companyActivityShortLabel: Short activity line (distinct from long description).
- companyOpeningHoursJson: JSON string: map day → hours (e.g. 08:00–19:00, Fermé).
- companyTrustedPartnersJson: JSON string: array of { name, profileUrl, city, logoUrl, companyId } for "trusted companies" when present.
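Two of the derived values above lend themselves to a worked example: the star percentages and the founded-date normalization. The helpers below are hypothetical; rounding with Math.round and returning null for dates that are not YYYYMMDD are assumptions, not documented behavior.

```typescript
type StarCounts = [number, number, number, number, number];

// Derives ratingHistogramScoredTotal and the five percentages from star counts.
function starPercents(counts: StarCounts): { total: number; percents: number[] } {
  const total = counts.reduce((sum, count) => sum + count, 0);
  const percents = counts.map((count) => (total === 0 ? 0 : Math.round((count / total) * 100)));
  return { total, percents };
}

// Converts YYYYMMDD to YYYY-MM-DD; anything else yields null (assumption).
function normalizeFoundedDate(raw: string): string | null {
  const match = /^(\d{4})(\d{2})(\d{2})$/.exec(raw);
  return match ? `${match[1]}-${match[2]}-${match[3]}` : null;
}

// Values from the sample row: one 4-star and three 5-star ratings.
console.log(starPercents([0, 0, 0, 1, 3])); // { total: 4, percents: [ 0, 0, 0, 25, 75 ] }
console.log(normalizeFoundedDate("20170906")); // 2017-09-06
```

Because each percentage is rounded independently, the five values may not always sum to exactly 100 in this sketch.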
Reviews (Nuxt threads, JSON-LD fallback)
- reviews: Array of thread root objects. Each object:
  - reviewId: GoWork review UUID (or synthetic id in JSON-LD fallback).
  - reviewerName: Display name (may be "-" or empty for anonymous).
  - reviewDate: Date/time string as on site.
  - content: Main review / post body.
  - ratingValue: 1–5 stars when rated; null for text-only / question-style threads.
  - languageCode: Primary language code (from page / locale).
  - authorKind: Author type flag from Nuxt (e.g. ANONYMOUS, SU).
  - role: Optional role (e.g. candidate) when present.
  - replies: Array of first-level replies on this thread:
    - replyId: Reply UUID.
    - authorName: Reply author display name.
    - content: Reply body.
    - date: Reply date/time string.
    - authorKind: Reply author kind when present.
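The effect of goworkOnlyRatedReviews on this array can be reproduced on exported data with a small filter. keepRatedThreads is a hypothetical helper that mirrors the documented rule (keep roots with a numeric 1–5 rating, leave their replies untouched); it is not the actor's code.

```typescript
// Keeps thread roots with a numeric 1-5 rating; replies ride along unchanged.
function keepRatedThreads<T extends { ratingValue: number | null }>(threads: T[]): T[] {
  return threads.filter(
    (thread) =>
      typeof thread.ratingValue === "number" && thread.ratingValue >= 1 && thread.ratingValue <= 5,
  );
}

// Trimmed versions of the two threads from the sample row.
const sampleThreads = [
  { reviewId: "e270ac96-fed0-4eed-bf23-6681d41ec643", ratingValue: null, replies: [] as string[] },
  {
    reviewId: "38c2e5bc-75ed-4a35-a526-899a98c12311",
    ratingValue: 4,
    replies: ["511b3dfe-314f-4a82-a248-bc8713fd2257"],
  },
];
console.log(keepRatedThreads(sampleThreads).map((thread) => thread.reviewId));
```

Applying the filter after export keeps the unrated question-style threads available in the raw dataset, which the input flag would drop at crawl time.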
Benefits of the GoWork scraper
- One row per company with reviews nested but company fields flat for spreadsheets.
- Honest counters: distinguish title / global review count from SSR thread count and JSON-LD subset via pageGlobalReviewCount, statisticsRuCount, org_review_blocks.
- Traceability: parseMeta records whether the row came from search, homepage, listing, or direct URL.
- Optional rated-only export for clients who want star reviews only (goworkOnlyRatedReviews).
Why Choose This Actor?
Built for French, German, and Spanish employer review research on GoWork: company discovery from search, homepage, or hubs, then full profile + threads where Nuxt allows. Outputs are suitable for warehouses, BI, or CRM enrichment.
Use cases:
- Track reviews and aggregates for a watchlist of employers.
- Export Q&A-style threads and star reviews with replies for NLP or moderation.
- Combine flat firmographics (contact, capital, hours, partners) with review content.
Technical Implementation
- URL routing (gowork-mapper.ts): Detects gowork.fr, gowork.de, and es.gowork.com hosts, detail slug paths, search, homepage, and listing hints; builds CheerioCrawler requests with userData (slug, goworkOnlyRatedReviews, maxItems, etc.).
- Listing handlers (routes.ts, GOWORK_LISTINGS): Collects company URLs from anchors, Nuxt search SERP, homepage index (recently rated + static company strips on page 1), paginates search (total + page size) and homepage (exact page+1 paginator links, cap 200).
- Detail handler (routes.ts, GOWORK_DETAIL): Parses HTML with parseGoworkDetailHtml (gowork-detail-parser.ts): Nuxt extractGoworkFromNuxtPayload, extractGoworkCompanyFlat, JSON-LD org flattening, optional rated-only filter; pushes one dataset row.
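The routing and homepage pagination rules can be approximated in a few lines. This sketch is not taken from gowork-mapper.ts or routes.ts: classifyGoworkUrl and nextHomepagePage are hypothetical names, the listing hints contain only the two paths named in this README, treating a single path segment as a company slug is an assumption, and so is reading "cap 200" as a maximum page number.

```typescript
type GoworkUrlKind = "detail" | "search" | "homepage" | "listing" | "unsupported";

const GOWORK_HOSTS = new Set(["gowork.fr", "gowork.de", "es.gowork.com"]);
// The README's list of hub paths is elided; only these two are known.
const LISTING_HINTS = ["/trouver", "/recherche"];

function classifyGoworkUrl(raw: string): GoworkUrlKind {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return "unsupported";
  }
  const host = url.hostname.replace(/^www\./, "");
  if (!GOWORK_HOSTS.has(host)) return "unsupported";
  const path = url.pathname.replace(/\/+$/, "");
  if (path === "") return "homepage";
  if (path === "/search") return "search";
  if (LISTING_HINTS.some((hint) => path.includes(hint))) return "listing";
  // Assumption: exactly one path segment means a company slug.
  return /^\/[^/]+$/.test(path) ? "detail" : "unsupported";
}

// Follow only the exact page+1 link, which skips junk links such as page=500.
function nextHomepagePage(currentPage: number, linkedPages: number[], cap = 200): number | null {
  const next = currentPage + 1;
  return next <= cap && linkedPages.includes(next) ? next : null;
}

console.log(classifyGoworkUrl("https://gowork.fr/search?q=fra&city=Paris")); // search
console.log(nextHomepagePage(1, [2, 500])); // 2
```

Following only the exact next page keeps the crawl sequential, so a stray link to a far-away page cannot inflate the request queue.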
Explore More Scrapers
If you found this actor useful, check out other scrapers at memo23's Apify profile.
Support
- For issues or feature requests, use the Issues section of this actor on Apify.
- For further assistance, contact the author:
  - Author's website: https://muhamed-didovic.github.io/
  - Email: muhamed.didovic@gmail.com
Additional Services
- Request customization or a full dataset: muhamed.didovic@gmail.com
- Need other platforms scraped? Contact muhamed.didovic@gmail.com
- For API services of this actor, reach out to muhamed.didovic@gmail.com
- Custom integrations and automation solutions available