Community First Yorkshire Jobs Scraper

Pricing

from $1.99 / 1,000 results

Scrape jobs and other portfolio content from communityfirstyorkshire.org.uk via WP-JSON portfolio CPT. Filter by taxonomy (default jobs ≈ 6 vacancies). Title, full HTML, location, apply email/URL, best-effort closing date + salary regex. JSON or CSV out.

Developer

Muhamed Didovic

Maintained by Community

Scrape jobs (and other portfolio content) from communityfirstyorkshire.org.uk. The Actor uses the public WP-JSON portfolio custom post type, filtered by the portfolio_entries taxonomy (the default jobs term returns ~6 live vacancies). Each row carries the title, full description HTML, location term, apply email/URL (extracted from the body), and a best-effort closing date and salary. Output is JSON or CSV. Pricing is per result, with no compute charge per run.

✨ Why use this scraper?

Community First Yorkshire (CFY) is the rural voluntary-sector hub for North Yorkshire, York, and the Yorkshire Dales. This scraper helps if you track who is hiring at rural Yorkshire charities, compare CVS activity across regions, or source paid roles outside the metro areas.

  • 🎯 Three starting points. The default Jobs taxonomy filter (set entityTerms: ["jobs"]), a direct /portfolio-item/<slug>/ URL, or any /wp-json/wp/v2/portfolio URL.
  • WP-JSON portfolio CPT as the data source. Each item is a full WordPress portfolio entry with content, taxonomy, and _embed-able media.
  • 🏷️ portfolio_entries taxonomy split. Term names are auto-split into categories (Jobs, Leadership, Networks, etc.) vs locations (North Yorkshire, York, Pateley Bridge, Homeworking).
  • 📧 Apply email/URL from body. Regex-extracted from the content HTML (first mailto: link → applyEmail; first outbound http href → externalApplyUrl).
  • 📅 Closing date + salary (best-effort). Heuristic regex against the body plain text ("Closing date: …", "£X – £Y per annum"). Each field falls back to null when nothing matches.
  • 🌐 Beyond jobs. Filter by volunteering, get-support, leadership, networks, advertise, podcast, membership to pull other portfolio content.
  • 📤 Clean exports. One row per item with full HTML description inline. JSON + CSV exported automatically.
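
The apply email/URL extraction described in the list above could look roughly like this. It is a minimal sketch, not the Actor's actual code: the function name extract_apply, both regexes, and the reading of "outbound" as "any host other than communityfirstyorkshire.org.uk" are assumptions made for illustration.

```python
import re
from urllib.parse import urlsplit

SITE = "communityfirstyorkshire.org.uk"

# Hypothetical patterns: first mailto: link and any absolute http(s) href.
MAILTO = re.compile(r'href=["\']mailto:([^"\'?]+)', re.I)
HREF = re.compile(r'href=["\'](https?://[^"\']+)', re.I)


def extract_apply(html: str) -> dict:
    """Return the first mailto address and the first off-site link, or None."""
    match = MAILTO.search(html)
    email = match.group(1).strip() if match else None

    external = None
    for url in HREF.findall(html):
        host = urlsplit(url).hostname or ""
        # Assumption: links back to the CFY site itself are not "outbound".
        if host != SITE and not host.endswith("." + SITE):
            external = url
            break

    return {"applyEmail": email, "externalApplyUrl": external}
```

Both values are None when the body contains no matching link, which mirrors the null fallback used for the other best-effort fields.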

🎯 Use cases

| Team | What they build |
| --- | --- |
| Rural CVS recruiters | Daily new-vacancy feeds for North Yorkshire / York charities |
| Sector publications | Auto-populate Yorkshire voluntary-sector jobs sections |
| Workforce strategy | Rural vs urban pay benchmarks across Yorkshire |
| Aggregators | Apply emails / URLs for redirect-and-track use cases |
| Podcast / content discovery | Pull podcast term for the CFY podcast catalogue |

📥 Supported inputs

| URL pattern | Behaviour |
| --- | --- |
| (empty + entityTerms: ["jobs"]) | Default: Jobs only (~6 vacancies) |
| https://www.communityfirstyorkshire.org.uk/portfolio-item/<slug>/ | Single portfolio item |
| https://www.communityfirstyorkshire.org.uk/wp-json/wp/v2/portfolio | All portfolio entries (43 items) |
| https://www.communityfirstyorkshire.org.uk/wp-json/wp/v2/portfolio?portfolio_entries=87 | Filter by term ID (pass-through) |

Not supported: browser listing pages (CFY has no public /jobs/ page — content is rendered into a masonry on the homepage); hosts outside communityfirstyorkshire.org.uk.
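
One way to tell the supported URL shapes apart is sketched below. The function name classify, the returned labels ("item", "api", "unsupported"), and the exact path checks are assumptions for illustration; the Actor's real classifier may differ.

```python
import re
from urllib.parse import urlsplit

HOSTS = {"communityfirstyorkshire.org.uk", "www.communityfirstyorkshire.org.uk"}
API_PATH = "/wp-json/wp/v2/portfolio"


def classify(url: str) -> tuple:
    """Map a start URL to (kind, detail); kind is a hypothetical label."""
    parts = urlsplit(url)
    if parts.hostname not in HOSTS:
        return ("unsupported", None)

    # /portfolio-item/<slug>/ -> single item, identified by its slug
    match = re.fullmatch(r"/portfolio-item/([^/]+)/?", parts.path)
    if match:
        return ("item", match.group(1))

    # WP-JSON collection URL: keep the query string as a pass-through filter
    if parts.path.rstrip("/") == API_PATH:
        return ("api", parts.query)

    return ("unsupported", None)
```

Anything on another host, or any other path on the CFY site (including the homepage masonry), ends up as "unsupported".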

🔄 How it works

  1. Resolve start URLs — either from explicit startUrls, or built from entityTerms (slug → numeric term ID via a known map).
  2. Classify + translate each URL into the canonical /wp-json/wp/v2/portfolio shape, optionally with ?portfolio_entries=<id>&_embed=1.
  3. Walk pagination via X-WP-TotalPages from the response header.
  4. Parse each portfolio item:
    • title, content HTML
    • portfolio_entries term names → split into categories vs locations
    • body regex → apply email, external URL, closing date, salary (best-effort)
  5. Push one normalised row per item to the dataset.
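
Step 3 (walking pagination via X-WP-TotalPages) can be sketched as below. This is an illustration under assumptions, not the Actor's implementation: the names fetch_page and iter_portfolio are hypothetical, and per_page=100 is chosen because it is the usual WordPress REST maximum. The fetch function is injectable so the loop can be exercised without network access.

```python
import json
import urllib.request
from typing import Callable, Iterator, List, Optional, Tuple
from urllib.parse import urlencode

BASE = "https://www.communityfirstyorkshire.org.uk/wp-json/wp/v2/portfolio"
Page = Tuple[List[dict], int]  # (items on this page, total number of pages)


def fetch_page(url: str) -> Page:
    """Fetch one page and read the page count from the X-WP-TotalPages header."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        total = int(resp.headers.get("X-WP-TotalPages", "1"))
        return json.load(resp), total


def iter_portfolio(term_id: Optional[int] = None, per_page: int = 100,
                   fetch: Callable[[str], Page] = fetch_page) -> Iterator[dict]:
    """Yield every portfolio item, requesting pages until the last one."""
    page, total = 1, 1
    while page <= total:
        params = {"per_page": per_page, "page": page, "_embed": 1}
        if term_id is not None:
            params["portfolio_entries"] = term_id
        items, total = fetch(BASE + "?" + urlencode(params))
        yield from items
        page += 1
```

For example, `list(iter_portfolio(term_id=87))` would request the term-filtered collection shown under Supported inputs, one page at a time.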

⚙️ Input parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| startUrls | array | [] | Direct portfolio-item / WP-JSON URLs. Empty = use entityTerms. |
| entityTerms | array | ["jobs"] | portfolio_entries taxonomy slugs to scrape. Allowed: jobs, volunteering, get-support, get-involved, leadership, networks, advertise, podcast, membership. |
| enrichTaxonomies | boolean | true | When true, embeds taxonomy term names + featured image via WP-JSON _embed. |
| postedWithinHours | integer | (none) | Only return rows posted in the last N hours (24 = last day, 72 = last 3 days). Empty/0 = all. Ideal for daily monitoring runs that only want fresh postings. |
| maxItems | integer | 1000 | Hard cap on rows pushed. |
| maxConcurrency / minConcurrency | integer | 5 / 1 | Parallel WP-JSON page-fetch limits. |
| maxRequestRetries | integer | 5 | Retries before a failed request is abandoned. |
| proxy | object | No proxy | The site has no anti-bot protection, so no proxy is needed. |
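
To make the postedWithinHours and maxItems semantics concrete, here is a small sketch of how the two limits could be applied to already-parsed rows. The function name apply_limits and its now parameter are hypothetical; only the field name postedDate and the "empty/0 = all" behaviour come from the table above.

```python
from datetime import datetime, timedelta, timezone


def parse_iso(value: str) -> datetime:
    """Parse an ISO timestamp such as 2026-05-15T09:26:35Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def apply_limits(rows, posted_within_hours=None, max_items=1000, now=None):
    """Drop rows older than the window, then cap the number of rows."""
    rows = list(rows)
    if posted_within_hours:  # None or 0 means "no age filter"
        now = now or datetime.now(timezone.utc)
        cutoff = now - timedelta(hours=posted_within_hours)
        rows = [r for r in rows if parse_iso(r["postedDate"]) >= cutoff]
    return rows[:max_items]
```

A daily monitoring run would then correspond to something like `apply_limits(rows, posted_within_hours=24)`.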

📊 Output overview

Each scraped item becomes one dataset row. The type field is "job" when the item is in the "Jobs" category, otherwise "post". The cpt field is always "portfolio".
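
The type rule and the categories/locations split can be illustrated as follows. This is a sketch under assumptions: the KNOWN_CATEGORIES set is guessed from the allowed entityTerms slugs (the real term display names may differ), every other term is treated as a location, and split_terms is a hypothetical name.

```python
# Assumed display names, derived from the allowed entityTerms slugs.
KNOWN_CATEGORIES = {
    "Jobs", "Volunteering", "Get Support", "Get Involved", "Leadership",
    "Networks", "Advertise", "Podcast", "Membership",
}


def split_terms(term_names):
    """Split portfolio_entries term names into categories and locations."""
    categories = [t for t in term_names if t in KNOWN_CATEGORIES]
    locations = [t for t in term_names if t not in KNOWN_CATEGORIES]
    return {
        "type": "job" if "Jobs" in categories else "post",
        "cpt": "portfolio",
        "categories": categories,
        "locations": locations,
        "location": locations[0] if locations else None,  # primary location
        "remote": "Homeworking" in term_names,
        "portfolioTerms": list(term_names),
    }
```

Called with `["Jobs", "North Yorkshire"]`, this yields the same type, categories, location, and remote values as the output sample in this README.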

📦 Output sample

{
  "type": "job",
  "cpt": "portfolio",
  "source": "communityfirstyorkshire.org.uk",
  "jobId": "24490",
  "slug": "north-yorkshire-adviser-to-unpaid-carers-veterans-carers-plus-yorkshire",
  "jobUrl": "https://www.communityfirstyorkshire.org.uk/portfolio-item/north-yorkshire-adviser-to-unpaid-carers-veterans-carers-plus-yorkshire/",
  "wpJsonUrl": "https://www.communityfirstyorkshire.org.uk/wp-json/wp/v2/portfolio/24490",
  "title": "North Yorkshire: Adviser to Unpaid Carers (Veterans), Carers Plus Yorkshire",
  "description": "<div>About the role…</div>",
  "descriptionText": "About the role…",
  "companyName": null,
  "companyWebsite": "https://www.carersplus.net/",
  "companyDomain": "carersplus.net",
  "location": "North Yorkshire",
  "locations": ["North Yorkshire"],
  "remote": false,
  "salary": {
    "currency": "GBP",
    "min": 24000,
    "max": 27000,
    "raw": "£24,000 - £27,000 per annum"
  },
  "salaryRaw": "£24,000 - £27,000 per annum",
  "categories": ["Jobs"],
  "employmentTypes": [],
  "contractType": null,
  "portfolioTerms": ["Jobs", "North Yorkshire"],
  "status": "publish",
  "postedDate": "2026-05-15T09:26:35Z",
  "closingDate": "Friday 30 May 2026",
  "modifiedDate": "2026-05-15T09:26:35Z",
  "applyType": "email",
  "applyUrl": "https://www.communityfirstyorkshire.org.uk/portfolio-item/north-yorkshire-adviser-to-unpaid-carers-veterans-carers-plus-yorkshire/",
  "applyEmail": "recruitment@carersplus.net",
  "externalApplyUrl": "https://www.carersplus.net/",
  "featuredImageUrl": null,
  "authorId": 1,
  "authorName": null,
  "scrapedAt": "2026-05-20T00:13:00.000Z"
}

🗂 Key output fields

| Group | Fields |
| --- | --- |
| Identifiers | type (job or post), cpt (always portfolio), source, jobId, slug, jobUrl, wpJsonUrl, scrapedAt |
| Content | title, description (HTML), descriptionText (plain) |
| Dates | postedDate (ISO), closingDate (raw text), modifiedDate (ISO) |
| Employer | companyName (null), companyWebsite (= externalApplyUrl), companyDomain |
| Location | location (primary, from portfolio_entries), locations[] (all), remote (true if 'Homeworking' tag present) |
| Compensation | salary.{currency, min, max, raw} (best-effort regex), salaryRaw |
| Taxonomies | categories[] (Jobs/Leadership/etc.), portfolioTerms[] (all term names) |
| Apply flow | applyType, applyUrl, applyEmail, externalApplyUrl |

❓ FAQ

Why is closing date sometimes null even when the body mentions a deadline? The regex looks for "Closing date:", "Deadline:", or "Apply by:" prefixes. If the body uses other phrasing (e.g. "Applications must arrive by…"), the field stays null. The full body HTML is always in description.
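
A prefix-based match of the kind described in this answer might look like the sketch below. The exact pattern and the name find_closing_date are assumptions; only the three prefixes and the null fallback come from the answer above.

```python
import re

# Matches "Closing date:", "Deadline:" or "Apply by:" and captures the text
# up to the next full stop or line break (an assumed stopping rule).
CLOSING = re.compile(r"(?:closing date|deadline|apply by)\s*:\s*([^\n.]+)", re.I)


def find_closing_date(text: str):
    """Return the raw closing-date text, or None when no prefix is found."""
    match = CLOSING.search(text)
    return match.group(1).strip() if match else None
```

A body that says "Applications must arrive by 30 May" has none of the three prefixes, so the function returns None, which is the behaviour described here.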

Why is salary parsing fragile? CFY items don't have a structured salary field, so the regex hunts for "£" patterns in the body text. Check salaryRaw to see what was matched; if the structured min/max look wrong, fall back to the raw string.
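
The sketch below shows why a "£" regex is fragile. It is an assumed pattern, not the Actor's own, and parse_salary is a hypothetical name. It handles "£X - £Y per annum" and single amounts, but drops pence and does not understand shorthand such as "£30k", so salaryRaw remains the reliable field.

```python
import re

# One "£" amount, with or without thousands separators; pence are ignored.
AMOUNT = r"£\s*(\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?"
SALARY = re.compile(
    AMOUNT + r"(?:\s*(?:-|–|to)\s*" + AMOUNT + r")?"
    r"(?:\s+per\s+(?:annum|year|hour))?",
    re.I,
)


def parse_salary(text: str):
    """Return a salary dict for the first '£' match, or None if there is none."""
    match = SALARY.search(text)
    if not match:
        return None
    low = int(match.group(1).replace(",", ""))
    high = int(match.group(2).replace(",", "")) if match.group(2) else low
    return {"currency": "GBP", "min": low, "max": high, "raw": match.group(0).strip()}
```

Because only the first "£" match is used, a body that mentions an unrelated amount before the salary would produce misleading min/max values.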

Can I scrape volunteering or other content too? Yes. Set entityTerms: ["volunteering"] (or any other allowed term slug). The same row shape applies; type becomes "post" for non-job categories.

Can I scrape private pages or applicant data? No. Only the public WP-JSON REST API.

How do I limit results? Set maxItems. With only ~6 jobs live, maxItems: 100 covers everything.

💬 Support

🛠 Additional services

  • Custom output shape, additional fields, or one-off datasets: muhamed.didovic@gmail.com
  • Similar scrapers for other CVS / volunteer hubs (Doing Good Leeds, VA Rotherham, VAS Sheffield, Barnsley CVS, BCVS, York CVS): drop an email.
  • For API access (no Apify fee, just usage): muhamed.didovic@gmail.com

🔎 Explore more scrapers

See other scrapers at memo23's Apify profile — covering job boards, real estate, social media, and more.


⚠️ Disclaimer

This Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by Community First Yorkshire (CFY), communityfirstyorkshire.org.uk, or any of their subsidiaries or affiliates. All trademarks mentioned are the property of their respective owners.

The scraper accesses only the publicly available WP-JSON REST endpoint and public detail pages on communityfirstyorkshire.org.uk — no authenticated endpoints, recruiter-only features, or content behind a login. Users are responsible for ensuring their use complies with communityfirstyorkshire.org.uk's Terms of Service, applicable data-protection law (GDPR, CCPA, etc.), and any contractual obligations of their own organisation.

