GBIF Species & Occurrence API Scraper


Extract biodiversity data from the Global Biodiversity Information Facility (GBIF). Dual-mode: species mode retrieves taxonomy, vernacular names, synonyms, and distributions; occurrence mode streams georeferenced observation records for species distribution modelling and conservation analysis.

Developer

BowTiedRaccoon

Extract biodiversity data from the Global Biodiversity Information Facility (GBIF) — the world's largest aggregator of species taxonomy and georeferenced occurrence records.

This actor exposes a dual-mode interface:

  • Species mode — paginate species/search, then fan-out to 4 enrichment endpoints per taxon: vernacular names, synonyms, geographic distributions, and descriptions. Produces fully-denormalised taxonomy rows.
  • Occurrence mode — stream occurrence/search results with geo coordinates, collection metadata, and observer info. Year-range chunking automatically bypasses GBIF's 100,000-offset cap for deep pulls.

No API key required. No proxy required. Pure public REST JSON.


Why GBIF?

GBIF indexes 1.4 million Plantae species and billions of occurrence records, many of them georeferenced, contributed by natural-history museums, herbaria, and citizen-science platforms worldwide. The data is published under open licenses that vary by contributing dataset, and it is updated continuously.

Who uses this actor:

  • Ecological niche / species distribution modellers (SDM/MaxEnt workflows)
  • Conservation NGOs building species-range databases
  • AI training-data builders (botanical vision datasets, biodiversity LLMs)
  • University labs needing bulk taxonomy or occurrence point extracts
  • Agtech and GIS / land-use analytics teams

Modes

Species Mode (mode: "species")

Paginates https://api.gbif.org/v1/species/search and emits one row per taxon. With enrich: true (default), each taxon gets 4 additional API calls:

| Sub-resource | What it adds |
| --- | --- |
| /species/{key}/vernacularNames | Common names in all languages |
| /species/{key}/synonyms | Synonym names |
| /species/{key}/distributions | Geographic range with establishment status |
| /species/{key}/descriptions | Habitat, ecology, and conservation notes |

Set enrich: false to skip enrichment for faster bulk taxonomy extraction.
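As a rough illustration of the fan-out described above, here is a minimal sketch using only the Python standard library. The helper names (`enrichment_urls`, `fetch_json`, `enrich`) are hypothetical and are not taken from the actor's source; the sketch assumes each sub-resource returns a JSON object with a `results` list.

```python
import json
import urllib.request

API_ROOT = "https://api.gbif.org/v1"
# The four sub-resources named in the species-mode description.
SUB_RESOURCES = ("vernacularNames", "synonyms", "distributions", "descriptions")


def enrichment_urls(taxon_key: int) -> dict[str, str]:
    """Return the four enrichment URLs for one taxon key."""
    return {name: f"{API_ROOT}/species/{taxon_key}/{name}" for name in SUB_RESOURCES}


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """GET a URL and decode the JSON body (performs a network call)."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def enrich(taxon_key: int, fetch=fetch_json) -> dict[str, list]:
    """Fetch each sub-resource and keep only its `results` list.

    `fetch` is injectable so the fan-out can be exercised without network access.
    """
    return {
        name: fetch(url).get("results", [])
        for name, url in enrichment_urls(taxon_key).items()
    }
```

Calling `enrich(3189834)` would issue four requests for the Sugar Maple taxon; this sketch runs them sequentially and does not add the actor's concurrency or delays.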

Occurrence Mode (mode: "occurrence")

Paginates https://api.gbif.org/v1/occurrence/search and emits one row per observation. Each row includes coordinates, collection metadata, observer, date, and basis of record.

100k offset cap: GBIF limits occurrence search offsets to 100,000 records per query. For deeper pulls, set yearFrom and yearTo; the actor then chunks by year, running one paginated query per year and concatenating the results. Each yearly query is still subject to the cap, so a 10-year range yields at most about 1M records (roughly 100k per year).
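The chunking idea can be sketched as a generator of per-page query parameters that restarts the offset for every year. This is an illustration under assumptions, not the actor's code: the function names are hypothetical, and `year`, `limit`, and `offset` are used here as the query parameters of occurrence search.

```python
from typing import Iterator

OFFSET_CAP = 100_000  # offset ceiling described in this section
PAGE_SIZE = 300       # page size the actor reports using


def year_chunks(year_from: int, year_to: int) -> list[int]:
    """Expand an inclusive year range into individual years."""
    if year_from > year_to:
        raise ValueError("yearFrom must not be later than yearTo")
    return list(range(year_from, year_to + 1))


def page_requests(base_params: dict, year_from: int, year_to: int) -> Iterator[dict]:
    """Yield one parameter dict per page, restarting at offset 0 for each year."""
    for year in year_chunks(year_from, year_to):
        # Offsets stay below the cap; a real loop would also stop early
        # once the API reports that a year has no more records.
        for offset in range(0, OFFSET_CAP, PAGE_SIZE):
            yield {**base_params, "year": year, "limit": PAGE_SIZE, "offset": offset}
```

Because the offset restarts per year, no single query has to page past the cap, which is what makes multi-year pulls larger than 100k records possible.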


Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| mode | string | "species" | "species" or "occurrence" |
| query | string | | Free-text search (e.g. "Acer saccharum", "Quercus") |
| higherTaxonKey | integer | | GBIF backbone key for a higher taxon (6 = Plantae, 1 = Animalia) |
| rank | string | | Species-mode filter: SPECIES, GENUS, FAMILY, etc. |
| taxonomicStatus | string | | Species-mode filter: ACCEPTED, SYNONYM, DOUBTFUL |
| datasetKey | string | | Restrict to a specific GBIF dataset UUID |
| countryCode | string | | Occurrence-mode filter: ISO 3166-1 alpha-2 country code (e.g. "US") |
| yearFrom | integer | | Occurrence-mode: earliest year; enables year-chunking |
| yearTo | integer | | Occurrence-mode: latest year |
| hasCoordinate | boolean | false | Occurrence-mode: only return records with coordinates |
| enrich | boolean | true | Species-mode: fetch vernacular names, synonyms, distributions, descriptions |
| maxItems | integer | 15 | Maximum records to return (0 = unlimited, requires a filter) |
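To make the defaults and the "0 = unlimited, requires a filter" rule concrete, here is a small validation sketch. It is an assumption-laden illustration: `normalise_input` is a hypothetical helper, and the list of fields that count as "a filter" is a guess, since the README does not enumerate them.

```python
# Defaults as listed in the input table.
DEFAULTS = {"mode": "species", "hasCoordinate": False, "enrich": True, "maxItems": 15}

# Assumption: any of these fields counts as "a filter" for unlimited runs.
FILTER_FIELDS = (
    "query", "higherTaxonKey", "rank", "taxonomicStatus",
    "datasetKey", "countryCode", "yearFrom", "yearTo",
)


def normalise_input(raw: dict) -> dict:
    """Apply defaults and reject inputs that contradict the documented rules."""
    cfg = {**DEFAULTS, **raw}
    if cfg["mode"] not in ("species", "occurrence"):
        raise ValueError('mode must be "species" or "occurrence"')
    if cfg["maxItems"] < 0:
        raise ValueError("maxItems must be 0 (unlimited) or a positive integer")
    has_filter = any(cfg.get(field) not in (None, "") for field in FILTER_FIELDS)
    if cfg["maxItems"] == 0 and not has_filter:
        raise ValueError("maxItems = 0 (unlimited) requires at least one filter")
    year_from, year_to = cfg.get("yearFrom"), cfg.get("yearTo")
    if year_from is not None and year_to is not None and year_from > year_to:
        raise ValueError("yearFrom must not be later than yearTo")
    return cfg
```

For example, `normalise_input({"query": "Quercus"})` returns the input with `mode`, `enrich`, `hasCoordinate`, and `maxItems` filled in from the defaults.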

Output

Each record is emitted as a flat JSON object. Fields unused by the current mode are null.

Species record example

{
  "record_type": "species",
  "gbif_key": 3189834,
  "nub_key": 3189834,
  "scientific_name": "Acer saccharum Marshall",
  "canonical_name": "Acer saccharum",
  "authorship": "Marshall",
  "rank": "SPECIES",
  "taxonomic_status": "ACCEPTED",
  "kingdom": "Plantae",
  "phylum": "Tracheophyta",
  "class": "Magnoliopsida",
  "order": "Sapindales",
  "family": "Sapindaceae",
  "genus": "Acer",
  "species": "Acer saccharum",
  "dataset_key": "d7dddbf4-2cf0-4f39-9b2a-bb099caae36c",
  "vernacular_names": "Sugar Maple [en] | Érable à sucre [fr] | Zuckerahorn [de]",
  "synonyms": "Acer saccharophorum K.Koch | Saccharodendron saccharum (Marshall) Nieuwl.",
  "distributions": "United States (NATIVE) | Canada (NATIVE)",
  "descriptions": "[habitat] Mesic deciduous forests, often with beech and yellow birch",
  "gbif_url": "https://www.gbif.org/species/3189834",
  "scraped_at": "2026-05-18T10:00:00.000Z"
}
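The multi-valued fields in the species record are packed into single strings separated by `|`, with vernacular names carrying a bracketed language tag. A downstream consumer might unpack them as sketched below; the helper names are hypothetical, and the format is inferred only from the example record, so values that themselves contain `|` would not survive this split.

```python
import re
from typing import Optional

# Matches "Sugar Maple [en]" -> name="Sugar Maple", lang="en".
_NAME_WITH_TAG = re.compile(r"^(?P<name>.*?)\s*\[(?P<lang>[^\]]+)\]$")


def split_multi(value: Optional[str]) -> list[str]:
    """Split a pipe-delimited field into trimmed, non-empty items."""
    if not value:
        return []
    return [part.strip() for part in value.split("|") if part.strip()]


def parse_vernacular_names(value: Optional[str]) -> list[tuple[str, Optional[str]]]:
    """Return (name, language) pairs; language is None when no tag is present."""
    parsed = []
    for item in split_multi(value):
        match = _NAME_WITH_TAG.match(item)
        if match:
            parsed.append((match["name"], match["lang"]))
        else:
            parsed.append((item, None))
    return parsed
```

The same `split_multi` helper applies to the `synonyms` and `distributions` fields, which use the same separator in the example.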

Occurrence record example

{
  "record_type": "occurrence",
  "gbif_key": 3189834,
  "scientific_name": "Acer saccharum Marshall",
  "canonical_name": "Acer saccharum",
  "rank": "SPECIES",
  "kingdom": "Plantae",
  "family": "Sapindaceae",
  "occurrence_key": 4530178441,
  "decimal_latitude": 44.3601,
  "decimal_longitude": -72.6519,
  "country_code": "US",
  "state_province": "Vermont",
  "event_date": "2023-09-24",
  "basis_of_record": "HUMAN_OBSERVATION",
  "recorded_by": "John Doe",
  "institution_code": "iNaturalist",
  "coordinate_uncertainty_m": 12,
  "gbif_url": "https://www.gbif.org/occurrence/4530178441",
  "scraped_at": "2026-05-18T10:00:00.000Z"
}
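For GIS or species-distribution workflows, occurrence rows like the one above are often converted to GeoJSON points. The sketch below shows one way to do that with hypothetical helper names; it relies only on the `decimal_latitude` and `decimal_longitude` fields from the example and skips rows where either is missing.

```python
from typing import Iterable, Optional

_COORD_FIELDS = ("decimal_latitude", "decimal_longitude")


def occurrence_to_feature(record: dict) -> Optional[dict]:
    """Convert one occurrence row to a GeoJSON Feature, or None without coordinates."""
    lat = record.get("decimal_latitude")
    lon = record.get("decimal_longitude")
    if lat is None or lon is None:
        return None
    properties = {k: v for k, v in record.items() if k not in _COORD_FIELDS}
    return {
        "type": "Feature",
        # GeoJSON positions are ordered longitude first, then latitude.
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


def to_feature_collection(records: Iterable[dict]) -> dict:
    """Wrap all convertible rows in a GeoJSON FeatureCollection."""
    features = [f for f in map(occurrence_to_feature, records) if f is not None]
    return {"type": "FeatureCollection", "features": features}
```

Serialising the result with `json.dumps` gives a file that most GIS tools can open directly.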

Usage Examples

All Plantae species (ACCEPTED, no enrichment, fast)

{
  "mode": "species",
  "higherTaxonKey": 6,
  "rank": "SPECIES",
  "taxonomicStatus": "ACCEPTED",
  "enrich": false,
  "maxItems": 10000
}

Sugar Maple occurrences in the US with coordinates (2020-2024)

{
  "mode": "occurrence",
  "query": "Acer saccharum",
  "countryCode": "US",
  "hasCoordinate": true,
  "yearFrom": 2020,
  "yearTo": 2024,
  "maxItems": 0
}

Quercus genus with full enrichment

{
  "mode": "species",
  "query": "Quercus",
  "rank": "SPECIES",
  "enrich": true,
  "maxItems": 500
}
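Inputs like these can also be submitted programmatically. The following is a hedged sketch using the third-party `apify-client` package (`pip install apify-client`); `ACTOR_ID` is a placeholder because the README does not state the actor's ID, and `build_occurrence_input` and `run_actor` are hypothetical helper names.

```python
# Placeholder: replace with the actor's real "<username>/<actor-name>" ID.
ACTOR_ID = "<username>/<actor-name>"


def build_occurrence_input(query: str, country_code: str,
                           year_from: int, year_to: int) -> dict:
    """Build an occurrence-mode input like the Sugar Maple example above."""
    return {
        "mode": "occurrence",
        "query": query,
        "countryCode": country_code,
        "hasCoordinate": True,
        "yearFrom": year_from,
        "yearTo": year_to,
        "maxItems": 0,
    }


def run_actor(run_input: dict, token: str) -> list[dict]:
    """Start a run, wait for it to finish, and return the dataset items."""
    # Imported lazily so the input builder works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

A typical call would be `run_actor(build_occurrence_input("Acer saccharum", "US", 2020, 2024), token)`, with the token read from an environment variable rather than hard-coded.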

Rate Limits & Polite Use

GBIF does not publish a hard rate limit but asks heavy users to be considerate. This actor uses:

  • 300 records per page (GBIF maximum for occurrence search)
  • 200 ms delay between pages
  • Up to 3 concurrent enrichment calls per species batch
  • Identifies itself via User-Agent: OrbLabs/gbif-species-occurrence-api-scraper
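The paging behaviour listed above (300 records per page, a 200 ms pause between pages) can be sketched as a generic loop. This is not the actor's implementation: `paginate` and `fetch_page` are hypothetical names, and the loop assumes each page is a JSON object with a `results` list and an `endOfRecords` flag.

```python
import time
from typing import Callable, Iterator

PAGE_SIZE = 300     # records per page, as listed above
PAGE_DELAY_S = 0.2  # 200 ms pause between pages


def paginate(
    fetch_page: Callable[[int, int], dict],
    max_items: int = 0,
    delay_s: float = PAGE_DELAY_S,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield rows page by page; max_items = 0 means no limit.

    `fetch_page(offset, limit)` must return one decoded page; `sleep` is
    injectable so the loop can be exercised without real delays.
    """
    offset = 0
    emitted = 0
    while True:
        page = fetch_page(offset, PAGE_SIZE)
        rows = page.get("results", [])
        for row in rows:
            yield row
            emitted += 1
            if max_items and emitted >= max_items:
                return
        # Stop on the last page, or defensively when a page comes back empty.
        if page.get("endOfRecords", True) or not rows:
            return
        offset += PAGE_SIZE
        sleep(delay_s)
```

Keeping the delay inside the loop, rather than in the fetch function, means the pause applies only between pages and never after the final one.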

For very large runs (millions of records), consider using the GBIF download API for a pre-packaged export instead.


Pricing

Pay per event (PPE). You pay only for records actually extracted. No charge for idle time spent waiting on GBIF's API.


Notes

  • GBIF data is licensed under CC0, CC BY 4.0, or CC BY-NC 4.0, depending on the contributing dataset. Always check individual dataset licenses before commercial use.
  • The GBIF backbone (datasetKey: d7dddbf4-2cf0-4f39-9b2a-bb099caae36c) is the authoritative taxonomy for 49M+ taxa.
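  • Occurrence records are observations and specimen records contributed by sources such as iNaturalist, eBird, natural history collections, and citizen science platforms. Not every record carries coordinates; set hasCoordinate: true to keep only georeferenced ones.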