EU CORDIS Grants Scraper

Scrape EU CORDIS — the European Commission's R&D project database. Get grants, participants, funding amounts, topics, and project IDs as typed dataset rows. We handle pagination, retries, and rate-limit pacing across CORDIS's federated endpoints.

Developer: DevilScrapes (Maintained by Community)

We do the dirty work so your dataset stays clean. 😈

$3.05 / 1,000 rows — Export structured EU grant project records from the CORDIS database covering Horizon Europe (HORIZON) and Horizon 2020 (H2020) frameworks. Four input modes: direct project-ID lookup, free-text search, programme-code filter, and coordinator-country filter. Every row carries funding amounts, coordinator, participants, programme hierarchy, dates, status, and the project objective. Public CORDIS search API, no API key, no login, no browser automation.

CORDIS (Community Research and Development Information Service) is the EU's primary source for results of EU-funded research projects since FP1. The Horizon Europe (2021-2027, ~95 billion EUR budget) and Horizon 2020 (2014-2020, ~80 billion EUR closed) frameworks together expose 57,000+ project records. This Actor turns the awkward CORDIS web search into a clean JSON dataset for grant writers, research institutions, science-policy analysts, and EU innovation consultants. The only existing Apify Actor for CORDIS is marked DEPRECATED — there is zero active competition on the Store.

🎯 What this scrapes

One ResultRow per project. Every row carries the same 20 columns regardless of which input mode you used. Data is published by the European Commission under CC-BY 4.0 (EU Open Data policy).

Field                         Type            Description
project_id                    string          CORDIS numeric project ID
project_acronym               string | null   Short acronym (e.g. Photo2Fuel)
project_title                 string          Full project title
project_url                   string          Public CORDIS project page URL
programme                     string | null   Primary programme code (e.g. HORIZON.2.5)
programme_display_name        string | null   Programme title (e.g. Climate, Energy and Mobility)
call_id                       string | null   Master call identifier (e.g. HORIZON-CL5-2021-D2-01)
funding_scheme                string | null   Funding scheme code (e.g. HORIZON-RIA, ERC-COG)
start_date                    string | null   Project start date (YYYY-MM-DD)
end_date                      string | null   Project end date (YYYY-MM-DD)
total_cost_eur                number | null   Total project cost in EUR
eu_contribution_eur           number | null   EU contribution (grant amount) in EUR
status                        string | null   SIGNED, CLOSED, or TERMINATED
coordinator_organization      string | null   Legal name of the coordinator organisation
coordinator_country           string | null   Coordinator country (ISO 3166-1 alpha-2)
participating_organizations   string[]        Legal names of all participating organisations
participating_countries       string[]        Deduped ISO 3166-1 alpha-2 codes of participants
objective_summary             string | null   Project objective text (truncated unless includeFullAbstract=true)
keywords                      string[]        Project keywords parsed from the comma-separated CORDIS string
scraped_at                    string          ISO 8601 UTC datetime this row was written
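
The keywords and participating_countries columns are both derived lists. Below is a minimal sketch of how such normalisation could work. It assumes the raw keywords arrive as one comma-separated string and that country codes are deduplicated and sorted; the sample output happens to be sorted, but the Actor's real ordering is not documented here. The names parse_keywords and dedupe_countries are hypothetical, not the Actor's own.

```python
from typing import Iterable, List, Optional


def parse_keywords(raw: Optional[str]) -> List[str]:
    """Split a comma-separated keyword string, dropping empty entries."""
    if not raw:
        return []
    return [part.strip() for part in raw.split(",") if part.strip()]


def dedupe_countries(codes: Iterable[str]) -> List[str]:
    """Upper-case and deduplicate ISO 3166-1 alpha-2 codes.

    Sorting the result is an assumption made for this sketch.
    """
    return sorted({code.strip().upper() for code in codes if code and code.strip()})
```

Returning an empty list rather than null for missing input matches the string[] type shown in the table, which has no null variant.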

🔥 Features

  • Four input modes in one Actor — projectIds, searchQuery, programmeFilter, countryFilter. Pydantic XOR validator enforces exactly one mode before any network call.
  • Horizon Europe and Horizon 2020 both supported via the framework switch (HORIZON, H2020, or ANY).
  • Coordinator-country filter implemented as a paged scan with in-process post-filtering. CORDIS's search API does not expose coordinator country as a query field, so the Actor scans result pages and emits only matching rows.
  • Single-hit anomaly handled — CORDIS returns hits.hit as a dict (not a list) when only one project matches; the parser's _ensure_list() normalises every nested array so the same code path handles both shapes.
  • Programme primary selection respects @attributes.uniqueProgrammePart=true — multi-programme projects expose the correct top-level programme code, not the first nested one.
  • Optional full-abstract mode — includeFullAbstract=true deep-fetches the project detail HTML page via parsel and replaces the truncated search-API objective with the full text; charged as result-row-detailed ($0.005) instead of result-row ($0.003).
  • Exponential backoff with Retry-After honoured for 429 and 503 responses; max 5 attempts.
  • curl-cffi with Chrome 131 TLS impersonation (ADR-0002 house default) — robust against any future CORDIS rate-limit tightening even though the endpoint is unauthenticated today.
  • Apify Proxy support via the BUYPROXIES94952 group (opt-in via useProxy).
  • Pydantic v2 input + output models — ResultRow.status is validated against the live enum (SIGNED, CLOSED, TERMINATED); country_filter normalised to upper-case at validation.

💡 Use cases

  • Grant writer competitive intelligence — pull all HORIZON.2.5 (Climate, Energy and Mobility) projects from the last 12 months to map who is winning which calls in your topic area.
  • Research-institution portfolio reporting — filter by countryFilter=DE (or your country) to enumerate every Horizon Europe project coordinated nationally, with funding totals.
  • Science-policy analysis — bulk-export H2020 vs HORIZON funding distributions across funding schemes (RIA / IA / CSA / ERC) for a programme retrospective.
  • EU innovation consulting — feed a list of projectIds (e.g. from a client's reference list) and return clean structured records for proposal due diligence.
  • University tech-transfer offices — track CLOSED projects in their field where commercial follow-on opportunities (IP licensing, spinout candidates) may have emerged.
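  • Journalists & think-tanks: measure country representation among coordinators and participants across an entire framework programme. Gender and SME status are not among the output fields, so those analyses need a join against external organisation data.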
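
Several of these use cases come down to grouping exported rows and summing a funding column. A minimal sketch of that step, assuming the rows have already been loaded from a JSON export into a list of dicts that use the output field names; the function name eu_contribution_by_country is hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable


def eu_contribution_by_country(rows: Iterable[dict]) -> Dict[str, float]:
    """Sum eu_contribution_eur per coordinator_country, skipping rows with nulls."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        country = row.get("coordinator_country")
        amount = row.get("eu_contribution_eur")
        if country is None or amount is None:
            continue  # both fields are nullable in the output schema
        totals[country] += amount
    return dict(totals)
```

Rows with a null country or amount are skipped rather than counted as zero, so the totals reflect only projects with known funding.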

⚙️ How to use it

  1. Open the Actor input form on the Apify Console.
  2. Pick exactly one input mode:
    • Project IDs — supply a list of projectIds (e.g. ["101069357"]) for direct lookup.
    • Search query — set searchQuery to a free-text query (e.g. "quantum computing").
    • Programme filter — set programmeFilter to a programme code (e.g. "HORIZON.2.5" or "H2020-EU.1.1.").
    • Country filter — set countryFilter to an ISO 3166-1 alpha-2 code (e.g. "DE", "ES", "NL").
  3. Pick a framework: HORIZON (default), H2020, or ANY. Ignored in projectIds mode.
  4. Set maxProjects (1-5000) to cap the dataset size in list modes. Ignored in projectIds mode.
  5. Toggle includeFullAbstract on if you need the full untruncated objective text — costs one extra request per row and switches PPE to result-row-detailed.
  6. Toggle useProxy on if CORDIS starts returning 429. Default is off — CORDIS does not currently rate-limit datacenter IPs.
  7. Click Start. Results stream into the default dataset as JSON / CSV / Excel / XML.

Single project lookup

{
  "projectIds": ["101069357"]
}

Free-text search, capped at 50 rows

{
  "searchQuery": "quantum computing",
  "framework": "HORIZON",
  "maxProjects": 50
}

All German-coordinated HORIZON projects (post-filter)

{
  "countryFilter": "DE",
  "framework": "HORIZON",
  "maxProjects": 500
}

📥 Input

Field                 Type       Required   Default   Description
projectIds            string[]   XOR        -         Explicit CORDIS project IDs to fetch.
searchQuery           string     XOR        -         Free-text search query (1-500 chars).
programmeFilter       string     XOR        -         Programme code (e.g. HORIZON.2.5).
countryFilter         string     XOR        -         ISO 3166-1 alpha-2 of coordinator country (e.g. DE).
framework             enum       no         HORIZON   HORIZON, H2020, or ANY.
maxProjects           integer    no         100       Max ResultRows emitted (1-5000).
includeFullAbstract   boolean    no         false     Deep-fetch full objective from detail page.
useProxy              boolean    no         false     Route via Apify Proxy (BUYPROXIES94952).

Exactly one of projectIds, searchQuery, programmeFilter, or countryFilter must be provided. Passing none, or more than one, raises a validation error before any network call.
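
The exactly-one rule can be checked locally before starting a run. The sketch below uses plain Python rather than Pydantic and is not the Actor's validator. The name select_mode is hypothetical, and treating an empty string or empty list as "not provided" is an assumption.

```python
from typing import Any, Mapping

MODE_FIELDS = ("projectIds", "searchQuery", "programmeFilter", "countryFilter")


def select_mode(actor_input: Mapping[str, Any]) -> str:
    """Return the single input mode that is set, or raise ValueError.

    Empty strings and empty lists count as unset (an assumption of this sketch).
    """
    provided = [name for name in MODE_FIELDS if actor_input.get(name)]
    if len(provided) != 1:
        raise ValueError(
            f"expected exactly one of {', '.join(MODE_FIELDS)}; got {provided or 'none'}"
        )
    return provided[0]
```

For example, select_mode({"searchQuery": "quantum computing", "framework": "HORIZON"}) returns "searchQuery", while an input that sets both searchQuery and countryFilter raises ValueError.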

📤 Output

One row per project, pushed to the default dataset and available as JSON, CSV, Excel, or XML.

{
  "project_id": "101069357",
  "project_acronym": "Photo2Fuel",
  "project_title": "Artificial PHOTOsynthesis to produce FUELs and chemicals",
  "project_url": "https://cordis.europa.eu/project/id/101069357/en",
  "programme": "HORIZON.2.5",
  "programme_display_name": "Climate, Energy and Mobility",
  "call_id": "HORIZON-CL5-2021-D2-01",
  "funding_scheme": "HORIZON-RIA",
  "start_date": "2022-09-01",
  "end_date": "2025-08-31",
  "total_cost_eur": 2493171.25,
  "eu_contribution_eur": 2493171.0,
  "status": "SIGNED",
  "coordinator_organization": "IDENER RESEARCH & DEVELOPMENT AIE",
  "coordinator_country": "ES",
  "participating_organizations": ["FUNDACION TECNALIA RESEARCH & INNOVATION", "UPPSALA UNIVERSITET"],
  "participating_countries": ["DE", "ES", "SE"],
  "objective_summary": "The Photo2Fuel project will develop a breakthrough technology that converts CO2 into useful fuels and chemicals...",
  "keywords": ["solar energy", "bacteria", "archaea", "solar fuels", "CO2 reduction"],
  "scraped_at": "2026-05-16T12:00:00.000000+00:00"
}

Export formats

  • JSON: full fidelity, all 20 fields, newline-delimited
  • CSV: flat, one row per project (array fields joined)
  • Excel: .xlsx via the Apify dataset converter
  • XML: structured per-item

All formats are available via the Apify API: GET /datasets/{id}/items?format=csv&clean=true.
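
A small sketch of building that request URL, assuming the standard Apify API base https://api.apify.com/v2. The helper name dataset_items_url is hypothetical, and authentication is not shown.

```python
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(dataset_id: str, fmt: str = "csv", clean: bool = True) -> str:
    """Build the dataset-items export URL for a given dataset ID and format."""
    query = urlencode({"format": fmt, "clean": "true" if clean else "false"})
    return f"{API_BASE}/datasets/{dataset_id}/items?{query}"
```

Swapping fmt for json, xlsx, or xml selects the other export formats listed above.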

💰 Pricing

Pay-Per-Event (PPE) — you pay only for what you use:

Event                 Price (USD)   When
actor-start           $0.05         Once per run, at boot
result-row            $0.003        Per project row written when includeFullAbstract=false
result-row-detailed   $0.005        Per project row written when includeFullAbstract=true

Example costs

Run                           Rows    Mode       Cost
1 project lookup              1       standard   $0.053
50 search results             50      standard   $0.20
500 programme rows            500     standard   $1.55
1,000 country-filtered rows   1,000   standard   $3.05
1,000 rows, full abstracts    1,000   detailed   $5.05

At scale the per-row charge dominates: ~$3.05 per 1,000 rows in standard mode, ~$5.05 per 1,000 rows in detailed mode. Comparable NIH / NSF grant Actors run $1-3 per 1,000 records — EU grants are higher value per record due to larger deal sizes (median Horizon Europe grant ~2-3M EUR vs ~500k USD for typical NIH R01) and richer multi-organisation metadata.
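
The example costs follow directly from the event prices: one actor-start charge plus a per-row charge. A short sketch of that arithmetic, using the prices listed in this section; the function name estimate_cost is hypothetical, and Decimal is used to avoid floating-point rounding in currency values.

```python
from decimal import Decimal

ACTOR_START = Decimal("0.05")     # charged once per run
ROW = Decimal("0.003")            # per row, includeFullAbstract=false
ROW_DETAILED = Decimal("0.005")   # per row, includeFullAbstract=true


def estimate_cost(rows: int, detailed: bool = False) -> Decimal:
    """Estimated run cost in USD: start fee plus per-row fee times row count."""
    per_row = ROW_DETAILED if detailed else ROW
    return ACTOR_START + per_row * rows
```

For instance, 500 standard rows cost 0.05 + 500 × 0.003 = 1.55 USD, and 1,000 detailed rows cost 0.05 + 1,000 × 0.005 = 5.05 USD.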

🚧 Limitations

  • Public CORDIS search API only. No authenticated CORDIS portal features, no My CORDIS saved searches, no non-public backstage endpoints.
  • HORIZON and H2020 only. FP7 and earlier framework programmes are not indexed in the current CORDIS search, so this Actor returns zero rows for FP7-and-earlier queries.
  • No deliverables, publications, or result documents. Available on the project detail page but not in the search API; out of scope.
  • No organisation-level enrichment. Legal address, VAT ID, org type category — out of scope; use a dedicated organisation enrichment Actor.
  • Country filter is post-filter, not server-side. CORDIS's search API does not expose coordinator country as a query field, so country-filter mode may scan many more API pages than maxProjects to collect that many matching rows. API call count is uncapped; emitted row count is always ≤ maxProjects.
  • Status filter is not a user input. Live CORDIS data only emits SIGNED, CLOSED, or TERMINATED; the search API's status field path does not reliably filter — surfaced only on the output row.
  • Page size is fixed at 50. CORDIS's num=100 parameter returns only 10 results (verified 2026-05-16) — the Actor uses num=50 everywhere as the effective practical maximum.
  • 7-day default storage retention on the Apify FREE tier. Export your dataset immediately after the run or upgrade for longer retention.
  • CORDIS data is CC-BY 4.0. Attribution to the European Commission's CORDIS service is required when republishing.

❓ FAQ

Do I need an API key?

No. The CORDIS search endpoint at https://cordis.europa.eu/search/en?format=json is fully public and unauthenticated. The Actor reads only data that the European Commission already publishes openly.

Why is the objective_summary cut off?

The CORDIS search API returns the objective text truncated to ~2000 characters in search results. Set includeFullAbstract=true to make the Actor follow up with a per-project HTTP GET to the detail page (/project/id/{id}/en) and extract the untruncated objective via a parsel CSS selector. This switches the PPE event from result-row ($0.003) to result-row-detailed ($0.005) per row.

Why is the country filter so slow?

CORDIS's search API does not expose coordinator country as a query field — coordinator/country=DE returns zero results (verified 2026-05-16). The Actor implements country filtering by post-filtering full search results: it scans pages of all-framework projects and keeps only rows where coordinator_country == countryFilter. To collect maxProjects=500 German projects, the Actor may scan 1500+ projects total. The emitted dataset row count is always ≤ maxProjects.
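
The scan-and-filter loop described above can be sketched as follows. This is an illustration under stated assumptions, not the Actor's code: fetch_page is a hypothetical callable that returns the parsed rows for a given 1-based page number (50 per page) and an empty list when results run out, and scan_by_country is a hypothetical name.

```python
from typing import Callable, Dict, Iterator, List


def scan_by_country(fetch_page: Callable[[int], List[Dict]],
                    country: str, max_projects: int) -> Iterator[Dict]:
    """Yield rows whose coordinator_country matches, stopping at max_projects.

    Pages are fetched one by one until enough matches are found or the
    source returns an empty page.
    """
    wanted = country.upper()
    emitted = 0
    page = 1
    while emitted < max_projects:
        rows = fetch_page(page)
        if not rows:
            return  # no more results to scan
        for row in rows:
            if row.get("coordinator_country") == wanted:
                yield row
                emitted += 1
                if emitted >= max_projects:
                    return
        page += 1
```

The number of pages fetched depends on how rare the country is among the results, which is why this mode can be slow even though the emitted row count never exceeds max_projects.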

Can I filter by project status (SIGNED / CLOSED / TERMINATED)?

Not as a direct input — CORDIS's search API status field path does not produce reliable filtered results. Status is emitted on every output row, so you can filter the dataset client-side after the run.
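
Client-side filtering on the status column is a one-liner once the dataset is loaded. A minimal sketch, assuming rows is a list of dicts from a JSON export; filter_by_status is a hypothetical helper name.

```python
from typing import Dict, Iterable, List

KNOWN_STATUSES = {"SIGNED", "CLOSED", "TERMINATED"}


def filter_by_status(rows: Iterable[Dict], statuses: Iterable[str]) -> List[Dict]:
    """Keep rows whose status is one of the requested values (case-insensitive input)."""
    wanted = {s.upper() for s in statuses}
    unknown = wanted - KNOWN_STATUSES
    if unknown:
        raise ValueError(f"unknown status values: {sorted(unknown)}")
    return [row for row in rows if row.get("status") in wanted]
```

Rows with a null status never match, since status is nullable in the output schema.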

What about FP7 or earlier frameworks?

Out of scope. The current CORDIS search index covers HORIZON (Horizon Europe, 2021-2027) and H2020 (Horizon 2020, 2014-2020) only. The Actor will return zero rows for any FP7-or-earlier query.

Are CORDIS results free to redistribute?

Yes, under CC-BY 4.0 (EU Open Data policy). Attribution to the European Commission's CORDIS service is required.

Part of the Devil Scrapes Research Intelligence Suite:

  • arXiv Papers Scraper — preprint metadata across all arXiv categories.
  • PubMed Papers Scraper — biomedical literature from the NIH PubMed index.
  • SEC EDGAR Filings Scraper — US public-company filings for company funding context.
  • USPTO Patents Scraper — US patent metadata for IP landscape work.
  • Hugging Face Hub Scraper — model and dataset metadata for AI research overlap.

All suite Actors share consistent PPE pricing ($0.001-$0.005 per row, $0.01-$0.05 per run) and scraped_at ISO 8601 UTC timestamps so cross-source joins work cleanly.

💬 Your feedback

Found a bug, hit a rate limit, or need a new field on the output row? Open an issue on the Actor's Apify Store page or contact the Devil Scrapes team at apify.com/DevilScrapes. We ship updates within days of validated reports.