EU CORDIS API — Horizon Europe & H2020 Scraper

EU CORDIS API scraper for Horizon Europe and Horizon 2020 grant projects. Exports funding amounts, coordinator, participants, programme, dates, and objectives as structured JSON. Four input modes, 20 fields per row.

Developer

DevilScrapes

Maintained by Community

We do the dirty work so your dataset stays clean. 😈

$3.05 / 1,000 rows — pay only for results that land. No credit card required to try.

Export structured EU grant project records from the CORDIS database covering Horizon Europe (HORIZON) and Horizon 2020 (H2020) frameworks. Four input modes: direct project-ID lookup, free-text search, programme-code filter, and coordinator-country filter. Every row carries funding amounts, coordinator, participants, programme hierarchy, dates, status, and the project objective.

CORDIS (Community Research and Development Information Service) is the EU's primary source for results of EU-funded research projects since FP1. Horizon Europe (2021–2027, ~€95 billion budget) and Horizon 2020 (2014–2020, ~€80 billion, closed) together expose 57,000+ project records. We turn the awkward CORDIS web search into a clean JSON dataset for grant writers, research institutions, science-policy analysts, and EU innovation consultants. The only existing Apify Actor for CORDIS is marked DEPRECATED — there is zero active competition on the Store.

🎯 What this scrapes

One ResultRow per project. Every row carries the same 20 columns regardless of which input mode you used. Data is published by the European Commission under CC-BY 4.0 (EU Open Data policy).

| Field | Type | Description |
| --- | --- | --- |
| project_id | string | CORDIS numeric project ID |
| project_acronym | string \| null | Short acronym (e.g. Photo2Fuel) |
| project_title | string | Full project title |
| project_url | string | Public CORDIS project page URL |
| programme | string \| null | Primary programme code (e.g. HORIZON.2.5) |
| programme_display_name | string \| null | Programme title (e.g. Climate, Energy and Mobility) |
| call_id | string \| null | Master call identifier (e.g. HORIZON-CL5-2021-D2-01) |
| funding_scheme | string \| null | Funding scheme code (e.g. HORIZON-RIA, ERC-COG) |
| start_date | string \| null | Project start date (YYYY-MM-DD) |
| end_date | string \| null | Project end date (YYYY-MM-DD) |
| total_cost_eur | number \| null | Total project cost in EUR |
| eu_contribution_eur | number \| null | EU contribution (grant amount) in EUR |
| status | string \| null | SIGNED, CLOSED, or TERMINATED |
| coordinator_organization | string \| null | Legal name of the coordinator organisation |
| coordinator_country | string \| null | Coordinator country (ISO 3166-1 alpha-2) |
| participating_organizations | string[] | Legal names of all participating organisations |
| participating_countries | string[] | Deduped ISO 3166-1 alpha-2 codes of participants |
| objective_summary | string \| null | Project objective text (truncated unless includeFullAbstract=true) |
| keywords | string[] | Project keywords parsed from the comma-separated CORDIS string |
| scraped_at | string | ISO 8601 UTC datetime this row was written |

🔥 Features

  • Four input modes in one Actor — projectIds, searchQuery, programmeFilter, countryFilter. Pydantic XOR validator enforces exactly one mode before any network call.
  • Horizon Europe and Horizon 2020 both supported via the framework switch (HORIZON, H2020, or ANY).
  • Coordinator-country filter implemented as a paginated scan with in-process post-filtering: CORDIS's search API does not expose coordinator country as a query field, so the Actor scans result pages and emits only matching rows.
  • Single-hit anomaly handled — CORDIS returns hits.hit as a dict (not a list) when only one project matches; the parser's _ensure_list() normalises every nested array so the same code path handles both shapes.
  • Programme primary selection respects @attributes.uniqueProgrammePart=true — multi-programme projects expose the correct top-level programme code, not the first nested one.
  • Optional full-abstract mode: includeFullAbstract=true fetches the project detail HTML page, parses it with parsel, and replaces the truncated search-API objective with the full text; charged as result-row-detailed ($0.005) instead of result-row ($0.003).
  • Pydantic v2 input + output models — ResultRow.status is validated against the live enum (SIGNED, CLOSED, TERMINATED); country_filter normalised to upper-case at validation.
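The single-hit normalisation and the primary-programme selection described above can be sketched as follows. This is a minimal illustration, not the Actor's actual parser: the payload shapes, the `ensure_list` and `primary_programme` names, and the exact location of the `@attributes` key are assumptions made for the example.

```python
from typing import Any, List, Optional


def ensure_list(value: Any) -> List[Any]:
    """Normalise a field that may be missing, a single object, or a list."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def primary_programme(programmes: Any) -> Optional[dict]:
    """Prefer the entry flagged uniqueProgrammePart=true, else the first one."""
    entries = ensure_list(programmes)
    for entry in entries:
        attrs = entry.get("@attributes", {})
        if str(attrs.get("uniqueProgrammePart", "")).lower() == "true":
            return entry
    return entries[0] if entries else None


# Hypothetical payloads: one hit arrives as a dict, several hits as a list.
single = {"hits": {"hit": {"project": {"id": "101069357"}}}}
many = {"hits": {"hit": [{"project": {"id": "1"}}, {"project": {"id": "2"}}]}}

print(len(ensure_list(single["hits"]["hit"])))  # 1
print(len(ensure_list(many["hits"]["hit"])))    # 2

# Hypothetical programme list where the second entry is the flagged one.
programmes = [
    {"code": "HORIZON.2", "@attributes": {}},
    {"code": "HORIZON.2.5", "@attributes": {"uniqueProgrammePart": "true"}},
]
print(primary_programme(programmes)["code"])  # HORIZON.2.5
```

Wrapping every nested array access in one helper keeps the single-hit and multi-hit responses on the same code path.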

What we handle for you

CORDIS's public endpoints can rate-limit and shape traffic in ways that break naive scrapers. We absorb every failure mode before it touches your dataset:

  • Browser fingerprint rotation: curl-cffi replays real-browser TLS handshakes (Chrome 131 impersonation) so the target sees a genuine browser client, not Python. Profiles rotate across requests on any target that fingerprints clients.
  • Residential proxy rotation — Apify Proxy routes requests through fresh exit IPs on every block. A 4xx or 5xx response invalidates the current session; we request a new session_id and retry automatically.
  • Exponential backoff with Retry-After honoured: 429 and 503 responses trigger retries with a 2 s base delay, doubling up to 30 s, max 5 attempts. We never hammer the endpoint; partial successes surface with a clear status message rather than silently returning an empty dataset.
  • Rate-limit pacing — page scanning for country-filter mode is paced so bursts don't trigger throttling mid-run.
  • Clean typed rows — Pydantic-validated output, ISO-8601 timestamps, stable IDs. No raw dicts, no silent nulls.
  • Pay only for results that land — PPE pricing means no charge for rows that never emit. Only the small actor-start warm-up fee applies if a run returns zero rows.
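The backoff schedule quoted above (2 s base, doubling, 30 s ceiling) can be worked through in a few lines. This is a sketch under stated assumptions rather than the Actor's retry code: it reads "max 5 attempts" as five retries, handles only the numeric-seconds form of Retry-After, and lets a longer Retry-After override the computed delay. The function and constant names are hypothetical.

```python
from typing import Optional

BASE_DELAY_S = 2.0   # first retry waits 2 s
MAX_DELAY_S = 30.0   # doubling stops at 30 s
MAX_RETRIES = 5      # assumption: "max 5 attempts" read as five retries


def backoff_delay(retry_index: int, retry_after: Optional[str] = None) -> float:
    """Seconds to wait before retry number `retry_index` (0-based)."""
    delay = min(MAX_DELAY_S, BASE_DELAY_S * (2 ** retry_index))
    if retry_after is not None and retry_after.strip().isdigit():
        # Assumption: a numeric Retry-After (seconds) wins if it asks for a longer wait.
        delay = max(delay, float(retry_after.strip()))
    return delay


schedule = [backoff_delay(i) for i in range(MAX_RETRIES)]
print(schedule)  # [2.0, 4.0, 8.0, 16.0, 30.0]
```

The fifth delay would be 32 s without the ceiling, which is why the cap only shows up on the last retry.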

💡 Use cases

  • Grant writer competitive intelligence — pull all HORIZON.2.5 (Climate, Energy and Mobility) projects from the last 12 months to map who is winning which calls in your topic area.
  • Research-institution portfolio reporting — filter by countryFilter=DE (or your country) to enumerate every Horizon Europe project coordinated nationally, with funding totals.
  • Science-policy analysis — bulk-export H2020 vs HORIZON funding distributions across funding schemes (RIA / IA / CSA / ERC) for a programme retrospective.
  • EU innovation consulting — feed a list of projectIds (e.g. from a client's reference list) and return clean structured records for proposal due diligence.
  • University tech-transfer offices — track CLOSED projects in their field where commercial follow-on opportunities (IP licensing, spinout candidates) may have emerged.
  • Journalists & think-tanks: measure the country, programme, and consortium composition across an entire framework programme using the Horizon Europe project data returned per row.
  • B2B SaaS prospecting — use coordinator_organization + coordinator_country to build a list of EU research-project leaders as a prospecting signal for tools that sell into that audience.

⚙️ How to use it

  1. Open the Actor input form on the Apify Console.
  2. Pick exactly one input mode:
    • Project IDs — supply a list of projectIds (e.g. ["101069357"]) for direct lookup.
    • Search query — set searchQuery to a free-text query (e.g. "quantum computing").
    • Programme filter — set programmeFilter to a programme code (e.g. "HORIZON.2.5" or "H2020-EU.1.1.").
    • Country filter — set countryFilter to an ISO 3166-1 alpha-2 code (e.g. "DE", "ES", "NL").
  3. Pick a framework: HORIZON (default), H2020, or ANY. Ignored in projectIds mode.
  4. Set maxProjects (1–5000) to cap the dataset size in list modes. Ignored in projectIds mode.
  5. Toggle includeFullAbstract on if you need the full untruncated objective text — costs one extra request per row and switches PPE to result-row-detailed.
  6. Toggle useProxy on to route through Apify residential proxies. Recommended for large runs or if you observe 429 responses.
  7. Click Start. Results stream into the default dataset and are available as JSON, CSV, Excel, or XML.

Single project lookup

```json
{
  "projectIds": ["101069357"]
}
```

Free-text search, capped at 50 rows

```json
{
  "searchQuery": "quantum computing",
  "framework": "HORIZON",
  "maxProjects": 50
}
```

All German-coordinated HORIZON projects (post-filter)

```json
{
  "countryFilter": "DE",
  "framework": "HORIZON",
  "maxProjects": 500
}
```

📥 Input

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| projectIds | string[] | XOR | | Explicit CORDIS project IDs to fetch. |
| searchQuery | string | XOR | | Free-text search query (1–500 chars). |
| programmeFilter | string | XOR | | Programme code (e.g. HORIZON.2.5). |
| countryFilter | string | XOR | | ISO 3166-1 alpha-2 of coordinator country (e.g. DE). |
| framework | enum | no | HORIZON | HORIZON, H2020, or ANY. |
| maxProjects | integer | no | 100 | Max ResultRows emitted (1–5000). |
| includeFullAbstract | boolean | no | false | Deep-fetch full objective from detail page. |
| useProxy | boolean | no | false | Route via Apify Proxy residential IPs. |

Exactly one of projectIds, searchQuery, programmeFilter, or countryFilter must be provided. Passing zero or two+ raises a validation error before any network call.
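The exactly-one-mode rule can be illustrated in plain Python. The Actor itself is described as using a Pydantic validator; the dictionary-based check below is a stand-in that shows the same logic without that dependency, and `select_mode` is a hypothetical name.

```python
MODE_FIELDS = ("projectIds", "searchQuery", "programmeFilter", "countryFilter")


def select_mode(actor_input: dict) -> str:
    """Return the single input mode that was provided, or raise ValueError."""
    # Empty strings and empty lists count as "not provided" in this sketch.
    provided = [name for name in MODE_FIELDS if actor_input.get(name)]
    if len(provided) != 1:
        raise ValueError(
            f"Exactly one of {', '.join(MODE_FIELDS)} is required; got {provided or 'none'}"
        )
    return provided[0]


print(select_mode({"searchQuery": "quantum computing", "framework": "HORIZON"}))

try:
    select_mode({"searchQuery": "quantum computing", "countryFilter": "DE"})
except ValueError as exc:
    print("rejected:", exc)
```

Running the check before any request is made means a malformed input fails fast without triggering any network traffic.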

📤 Output

One row per project, pushed to the default dataset and available as JSON, CSV, Excel, or XML.

```json
{
  "project_id": "101069357",
  "project_acronym": "Photo2Fuel",
  "project_title": "Artificial PHOTOsynthesis to produce FUELs and chemicals",
  "project_url": "https://cordis.europa.eu/project/id/101069357/en",
  "programme": "HORIZON.2.5",
  "programme_display_name": "Climate, Energy and Mobility",
  "call_id": "HORIZON-CL5-2021-D2-01",
  "funding_scheme": "HORIZON-RIA",
  "start_date": "2022-09-01",
  "end_date": "2025-08-31",
  "total_cost_eur": 2493171.25,
  "eu_contribution_eur": 2493171.0,
  "status": "SIGNED",
  "coordinator_organization": "IDENER RESEARCH & DEVELOPMENT AIE",
  "coordinator_country": "ES",
  "participating_organizations": ["FUNDACION TECNALIA RESEARCH & INNOVATION", "UPPSALA UNIVERSITET"],
  "participating_countries": ["DE", "ES", "SE"],
  "objective_summary": "The Photo2Fuel project will develop a breakthrough technology that converts CO2 into useful fuels and chemicals...",
  "keywords": ["solar energy", "bacteria", "archaea", "solar fuels", "CO2 reduction"],
  "scraped_at": "2026-05-16T12:00:00.000000+00:00"
}
```
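Because several fields are nullable, downstream code should skip missing values rather than assume every row is complete. The sketch below totals eu_contribution_eur per coordinator_country over a list of exported rows; the sample rows are made up for illustration and `funding_by_country` is a hypothetical helper, not part of the Actor.

```python
from collections import defaultdict
from typing import Dict, Iterable


def funding_by_country(rows: Iterable[dict]) -> Dict[str, float]:
    """Sum EU contribution per coordinator country, skipping rows with nulls."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        country = row.get("coordinator_country")
        amount = row.get("eu_contribution_eur")
        if country is None or amount is None:
            continue  # nullable fields: leave incomplete rows out of the totals
        totals[country] += amount
    return dict(totals)


# Illustrative rows only; real rows carry all 20 fields.
sample_rows = [
    {"coordinator_country": "ES", "eu_contribution_eur": 2493171.0},
    {"coordinator_country": "DE", "eu_contribution_eur": 1000000.0},
    {"coordinator_country": "ES", "eu_contribution_eur": None},
    {"coordinator_country": "DE", "eu_contribution_eur": 500000.0},
]
print(funding_by_country(sample_rows))
```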

Export formats

  • JSON: full fidelity, all 20 fields (use format=jsonl for newline-delimited output)
  • CSV — flat, one row per project (array fields joined)
  • Excel: .xlsx via the Apify dataset converter
  • XML — structured per-item

All formats are available via the Apify dataset API: GET /datasets/{id}/items?format=csv&clean=true.
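As a rough illustration of the endpoint above, the helper below builds the export URL for a given dataset and format. It assumes the standard Apify API base `https://api.apify.com/v2`; the `dataset_items_url` name and the `abc123` dataset ID are placeholders, and a non-public dataset will additionally need your API token.

```python
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"  # assumed standard Apify API base URL


def dataset_items_url(dataset_id: str, fmt: str = "json", clean: bool = True) -> str:
    """Build the dataset-items export URL for the chosen format."""
    params = {"format": fmt, "clean": "true" if clean else "false"}
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode(params)}"


# "abc123" is a placeholder dataset ID.
print(dataset_items_url("abc123", fmt="csv"))
# https://api.apify.com/v2/datasets/abc123/items?format=csv&clean=true
```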

💰 Pricing

Pay-Per-Event (PPE) — you pay only for what you use:

| Event | Price (USD) | When |
| --- | --- | --- |
| actor-start | $0.05 | Once per run, at boot |
| result-row | $0.003 | Per project row written when includeFullAbstract=false |
| result-row-detailed | $0.005 | Per project row written when includeFullAbstract=true |

Example costs

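| Run | Rows | Mode | Cost |
| --- | --- | --- | --- |
| 1 project lookup | 1 | standard | $0.053 |
| 50 search results | 50 | standard | $0.20 |
| 500 programme rows | 500 | standard | $1.55 |
| 1,000 country-filtered rows | 1,000 | standard | $3.05 |
| 1,000 rows, full abstracts | 1,000 | detailed | $5.05 |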

At scale the per-row charge dominates: ~$3.05 per 1,000 rows in standard mode, ~$5.05 per 1,000 rows in detailed mode. Comparable NIH / NSF grant Actors on Apify run $1–3 per 1,000 records. EU grant records carry more value per row — larger deal sizes (median Horizon Europe grant ~€2–3M vs ~$500k for a typical NIH R01) and richer multi-organisation metadata with consortium members across multiple countries.
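The example costs follow directly from the event prices: one actor-start fee plus a per-row charge. A small estimator, using the prices listed in this section (the `estimate_cost` name is hypothetical), reproduces the figures:

```python
from decimal import Decimal

ACTOR_START = Decimal("0.05")
ROW_PRICE = {
    "standard": Decimal("0.003"),   # result-row
    "detailed": Decimal("0.005"),   # result-row-detailed
}


def estimate_cost(rows: int, detailed: bool = False) -> Decimal:
    """Estimated run cost in USD: one start fee plus a per-row charge."""
    mode = "detailed" if detailed else "standard"
    return ACTOR_START + ROW_PRICE[mode] * rows


for rows, detailed in [(1, False), (50, False), (500, False), (1000, False), (1000, True)]:
    print(rows, "detailed" if detailed else "standard", estimate_cost(rows, detailed))
```

Decimal is used instead of float so that sums such as 0.05 + 0.003 stay exact.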

🚧 Limitations

  • HORIZON and H2020 only. FP7 and earlier framework programmes are not indexed in the current CORDIS search index; this Actor returns zero rows for FP7-and-earlier queries.
  • No deliverables, publications, or result documents. Available on the project detail page but not in the search API; out of scope for this Actor.
  • No organisation-level enrichment. Legal address, VAT ID, org type category — out of scope; use a dedicated organisation enrichment Actor.
  • Country filter is post-filter, not server-side. CORDIS's search API does not expose coordinator country as a query field. Country-filter mode scans pages and keeps only rows where coordinator_country matches. To collect maxProjects=500 German projects, the Actor may scan 1,500+ projects total. The emitted row count is always ≤ maxProjects.
  • Status filter is not a user input. The search API's status field path does not reliably filter server-side — status is emitted on every output row so you can filter the dataset client-side after the run.
  • Page size is fixed at 50. CORDIS's num=100 parameter returns only 10 results (verified 2026-05-16) — the Actor uses num=50 everywhere as the effective practical maximum.
  • 7-day default storage retention on the Apify FREE tier. Export your dataset immediately after the run or upgrade for longer retention.
  • CORDIS data is CC-BY 4.0. Attribution to the European Commission's CORDIS service is required when republishing.

❓ FAQ

What is the EU CORDIS API and what does this Actor do with it?

CORDIS exposes a public JSON search endpoint at https://cordis.europa.eu/search/en?format=json. This Actor wraps that endpoint with four input modes (project ID lookup, free-text search, programme filter, coordinator-country filter), handles pagination, normalises the raw API response into clean typed rows, and streams results into an Apify dataset in JSON, CSV, Excel, or XML. No registration or API key is required.

Does this cover Horizon Europe and Horizon 2020?

Yes — both frameworks are supported via the framework input field (HORIZON, H2020, or ANY). The CORDIS search index covers Horizon Europe (2021–2027) and H2020 (2014–2020). FP7 and earlier frameworks are not indexed and will return zero results.

Why is the objective_summary cut off?

The CORDIS search API returns the objective text truncated to ~2,000 characters in search results. Set includeFullAbstract=true to make the Actor follow up with a per-project GET to the detail page (/project/id/{id}/en) and extract the untruncated objective via a CSS selector. This switches the PPE event from result-row ($0.003) to result-row-detailed ($0.005) per row.

Why is country-filter mode slow for large result sets?

CORDIS's search API does not expose coordinator country as a query parameter: a coordinator/country=DE query returns zero results (verified 2026-05-16). We implement country filtering by scanning every result page in the selected framework and retaining only matching rows. To collect maxProjects=500 German-coordinated projects, the Actor may scan 1,500+ projects. The emitted dataset row count is always ≤ maxProjects.
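The scan-and-filter loop can be sketched as a generator that walks result pages and stops once enough matching rows have been emitted. This is an illustration under assumptions, not the Actor's code: `fetch_page` stands in for whatever performs the real search request, `max_pages` is a hypothetical safety cap, and the fake data source below exists only to make the example runnable.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]
PAGE_SIZE = 50  # page size stated in the Limitations section


def scan_for_country(
    fetch_page: Callable[[int], List[Row]],
    country: str,
    max_projects: int,
    max_pages: int = 200,  # hypothetical safety cap, not a documented setting
) -> Iterator[Row]:
    """Yield rows whose coordinator_country matches, up to max_projects."""
    wanted = country.upper()
    emitted = 0
    for page in range(1, max_pages + 1):
        rows = fetch_page(page)
        if not rows:
            return  # no more results to scan
        for row in rows:
            if row.get("coordinator_country") == wanted:
                yield row
                emitted += 1
                if emitted >= max_projects:
                    return


def fake_fetch_page(page: int) -> List[Row]:
    """Stand-in data source: three pages cycling DE, ES, FR coordinators."""
    if page > 3:
        return []
    countries = ["DE", "ES", "FR"]
    start = (page - 1) * PAGE_SIZE
    return [
        {"project_id": str(start + k), "coordinator_country": countries[(start + k) % 3]}
        for k in range(PAGE_SIZE)
    ]


matches = list(scan_for_country(fake_fetch_page, "de", max_projects=5))
print(len(matches))  # 5
```

With one match in every three rows, collecting N rows means scanning roughly 3N, which mirrors the 500-to-1,500+ ratio quoted above.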

Can I filter by project status (SIGNED / CLOSED / TERMINATED)?

Not as a direct input — the CORDIS search API status field path does not produce reliable server-side filtered results. Status is emitted on every output row, so you can filter the dataset client-side after the run.
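Client-side status filtering takes only a few lines once the dataset is exported. A minimal sketch, assuming the rows have already been loaded as a list of dictionaries (the sample rows and the `filter_by_status` name are illustrative):

```python
from typing import Iterable, List


def filter_by_status(rows: Iterable[dict], statuses: Iterable[str]) -> List[dict]:
    """Keep only rows whose status is one of the requested values."""
    wanted = {status.upper() for status in statuses}
    return [row for row in rows if row.get("status") in wanted]


# Illustrative rows; status may be null on some records.
exported = [
    {"project_id": "1", "status": "SIGNED"},
    {"project_id": "2", "status": "CLOSED"},
    {"project_id": "3", "status": None},
    {"project_id": "4", "status": "TERMINATED"},
]
closed = filter_by_status(exported, ["closed"])
print([row["project_id"] for row in closed])  # ['2']
```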

What about alternative Horizon Europe grants scrapers?

As of 2026, the only other Apify Actor for CORDIS data is marked DEPRECATED. This Actor is the only actively maintained option on the Store — four input modes, 20 structured fields per row, and PPE pricing so you pay only for rows that land.

Are CORDIS results free to redistribute?

Yes, under CC-BY 4.0 (EU Open Data policy). Attribution to the European Commission's CORDIS service is required when republishing.

How much Horizon Europe project data can I pull per run?

The maxProjects input caps runs at 1–5,000 rows. For programme-filter and search-query modes, results are paginated at 50 per page, so a 5,000-row run requests ~100 API pages. Country-filter mode may scan significantly more pages to collect matching rows (see the Limitations section above).

💬 Your feedback

Found a bug, hit a rate limit, or need a new field on the output row? Open an issue on the Actor's Apify Store page or contact the Devil Scrapes team at apify.com/DevilScrapes. We ship updates within days of validated reports.

Part of the Devil Scrapes Research Intelligence Suite: arXiv Papers Scraper, PubMed Papers Scraper, Hugging Face Hub Scraper. All suite Actors share consistent PPE pricing and ISO 8601 UTC scraped_at timestamps for clean cross-source joins.