GBIF Species Occurrences Scraper

Pricing

from $3.00 / 1,000 results

Extract species occurrence records from GBIF, the Global Biodiversity Information Facility (2B+ records). Filter by scientific name, country, taxon, year, and basis of record. Returns taxonomy, coordinates, dates, dataset provenance, and collector metadata for biodiversity and ESG research.

Developer

Compute Edge

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: a day ago

Extract structured biodiversity occurrence records from the Global Biodiversity Information Facility (GBIF) — the world's largest open-access repository of species occurrence data. This Actor queries the free GBIF API and returns clean JSON records with taxonomic classification, geographic coordinates, collection dates, and dataset provenance for conservation research, ecological analysis, and species distribution modeling.

GBIF aggregates occurrence records from thousands of institutions worldwide — museums, herbaria, citizen science initiatives, and research organizations. A single search can return millions of historical and contemporary observations spanning centuries of biodiversity data.

Key Features

  • Global occurrence database access — Query 2+ billion species observations with no authentication required
  • Taxonomic filtering — Filter by scientific name, taxon key, or multiple ranks (kingdom, phylum, class, order, family, genus, species)
  • Geographic filtering — Restrict results by country, coordinates, or location polygon
  • Temporal filtering — Filter by year or year range (e.g., "2020" or "2010,2020")
  • Record type filtering — Isolate human observations, museum specimens, fossil records, or experimental observations
  • Coordinate validation — Option to return only records with verified GPS coordinates
  • Pagination handling — Automatic batching with GBIF's 300-record-per-page limit; respects 100k deep-paging offset cap
  • Dataset provenance — Retain dataset name, publication key, institution code, and license for proper attribution
  • Clean JSON output — All fields mapped to safe names (e.g., taxonClass instead of class) ready for analysis pipelines
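
The pagination behaviour described above can be pictured as a plan of (offset, limit) pairs. The following is a minimal sketch, assuming the 300-record page size and treating the 100,000 deep-paging cap as a ceiling on total records per search. `plan_pages` is a hypothetical helper for illustration, not the Actor's actual code.

```python
from typing import List, Tuple

PAGE_SIZE = 300        # per-page maximum stated in the feature list
OFFSET_CAP = 100_000   # deep-paging cap stated in the feature list


def plan_pages(max_results: int) -> List[Tuple[int, int]]:
    """Return (offset, limit) pairs covering up to max_results records.

    A max_results of 0 (or less) is treated as "as many as the cap allows".
    """
    target = OFFSET_CAP if max_results <= 0 else min(max_results, OFFSET_CAP)
    pages = []
    offset = 0
    while offset < target:
        # The last page may be shorter than PAGE_SIZE.
        limit = min(PAGE_SIZE, target - offset)
        pages.append((offset, limit))
        offset += limit
    return pages
```

For example, `plan_pages(500)` yields two requests: a full page of 300 followed by a page of 200.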

Output Data Fields

| Field | Type | Description |
| --- | --- | --- |
| key | integer | GBIF unique occurrence identifier |
| scientificName | string | Species scientific name (e.g., Panthera leo) |
| kingdom | string | Kingdom classification (e.g., Animalia, Plantae, Fungi) |
| phylum | string | Phylum classification |
| taxonClass | string | Class classification (mapped from class) |
| taxonOrder | string | Order classification (mapped from order) |
| family | string | Family name |
| genus | string | Genus name |
| species | string | Species epithet |
| taxonRank | string | Taxonomic rank of the record |
| country | string | Country name |
| countryCode | string | 2-letter ISO country code |
| stateProvince | string | State or province |
| locality | string | Specific locality description |
| decimalLatitude | number | Latitude (WGS84) |
| decimalLongitude | number | Longitude (WGS84) |
| elevation | number | Elevation in meters (above sea level) |
| depth | number | Depth in meters (below water surface) |
| eventDate | string | Date of observation (ISO format) |
| year | integer | Year of observation |
| month | integer | Month of observation (1-12) |
| day | integer | Day of observation |
| basisOfRecord | string | Record type (HUMAN_OBSERVATION, PRESERVED_SPECIMEN, OBSERVATION, FOSSIL_SPECIMEN, etc.) |
| occurrenceStatus | string | Status (PRESENT, ABSENT) |
| individualCount | integer | Number of individuals observed |
| datasetName | string | Name of the dataset contributing the record |
| datasetKey | string | GBIF dataset UUID |
| publishingOrgKey | string | GBIF organization UUID of the data publisher |
| institutionCode | string | Museum/institution acronym |
| collectionCode | string | Collection identifier within the institution |
| catalogNumber | string | Catalog ID within the collection |
| recordedBy | string | Observer/collector name |
| identifiedBy | string | Person who identified the specimen |
| license | string | Data license (e.g., CC_BY_4_0, CC0_1_0) |
| lastInterpreted | string | Date GBIF last interpreted/validated the record |
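
The renaming of `class` and `order` noted in the field list can be sketched as a simple key mapping. This is an illustration only: `to_safe_record` and `RENAMES` are hypothetical names, and the Actor's real mapping logic is not shown in this document.

```python
from typing import Any, Dict

# Raw GBIF field names that clash with common reserved words, mapped to the
# safe output names listed in the field table.
RENAMES = {"class": "taxonClass", "order": "taxonOrder"}


def to_safe_record(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a raw occurrence with reserved-word keys renamed."""
    return {RENAMES.get(name, name): value for name, value in raw.items()}
```

Keys that are not in `RENAMES` pass through unchanged, so the helper is safe to apply to any record.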

How to Scrape GBIF Species Occurrence Data

  1. Navigate to the GBIF Species Occurrences Scraper Actor page on Apify Store.
  2. Click Start to open the input configuration form.
  3. (Optional) Enter a Scientific Name to filter by species (e.g., "Panthera leo", "Quercus", "Drosophila").
  4. (Optional) Enter a Country ISO code to limit results to one country (e.g., "US", "BR", "ZA").
  5. (Optional) Enter a Taxon Key to filter by GBIF's unique taxon identifier.
  6. (Optional) Enter a Year Range to filter by observation date (e.g., "2020" for a single year or "2010,2020" for a range).
  7. (Optional) Toggle Has Coordinates to only return records with GPS coordinates.
  8. (Optional) Enter a Basis of Record filter (e.g., "HUMAN_OBSERVATION", "PRESERVED_SPECIMEN").
  9. Set Max Results to control output size (default: 1000; set to 0 to fetch everything up to the 100,000-record cap).
  10. Click Start to run the Actor.
  11. Download results as JSON, CSV, or Excel from the Dataset tab.

Input Example

{
  "scientificName": "Panthera leo",
  "country": "ZA",
  "year": "2015,2024",
  "hasCoordinate": true,
  "basisOfRecord": "HUMAN_OBSERVATION",
  "maxResults": 500
}
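
To show how an input like this could translate into a request, here is a minimal sketch that builds one GBIF occurrence search URL. It assumes the Actor forwards its filters to `https://api.gbif.org/v1/occurrence/search` under the same parameter names; that one-to-one mapping and the helper name `build_search_url` are assumptions, not documented behaviour of the Actor.

```python
from typing import Any, Dict
from urllib.parse import urlencode

SEARCH_URL = "https://api.gbif.org/v1/occurrence/search"
# Input fields assumed to map directly to GBIF query parameters.
FILTER_KEYS = ("scientificName", "country", "taxonKey", "year", "basisOfRecord")


def build_search_url(actor_input: Dict[str, Any], offset: int = 0, limit: int = 300) -> str:
    """Translate an Actor-style input dict into a single GBIF search URL."""
    params: Dict[str, Any] = {}
    for key in FILTER_KEYS:
        value = actor_input.get(key)
        if value not in (None, ""):   # skip filters the user left empty
            params[key] = value
    if actor_input.get("hasCoordinate"):
        params["hasCoordinate"] = "true"
    params["offset"] = offset
    params["limit"] = limit
    return f"{SEARCH_URL}?{urlencode(params)}"
```

`maxResults` is deliberately not sent to the API in this sketch; it would govern how many pages the caller requests.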

Output Example

{
  "key": 4156523045,
  "scientificName": "Panthera leo",
  "kingdom": "Animalia",
  "phylum": "Chordata",
  "taxonClass": "Mammalia",
  "taxonOrder": "Carnivora",
  "family": "Felidae",
  "genus": "Panthera",
  "species": "leo",
  "taxonRank": "SPECIES",
  "country": "South Africa",
  "countryCode": "ZA",
  "stateProvince": "Kruger National Park",
  "locality": "Central section",
  "decimalLatitude": -24.56,
  "decimalLongitude": 31.82,
  "elevation": 380,
  "depth": null,
  "eventDate": "2020-06-15",
  "year": 2020,
  "month": 6,
  "day": 15,
  "basisOfRecord": "HUMAN_OBSERVATION",
  "occurrenceStatus": "PRESENT",
  "individualCount": 1,
  "datasetName": "African Mammal Sightings",
  "datasetKey": "98e50e68-6ecb-4dca-ab8b-b0e3eabc1234",
  "publishingOrgKey": "1234a5bc-d67e-8f90-1234-567a890123ab",
  "institutionCode": "SANBI",
  "collectionCode": "MAMMAL",
  "catalogNumber": "ZA2020-06-15-001",
  "recordedBy": "Jane Smith",
  "identifiedBy": "Dr. John Doe",
  "license": "CC_BY_4_0",
  "lastInterpreted": "2021-03-10T12:34:56"
}
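
Records shaped like this one are straightforward to map. As an illustration, the sketch below converts a single output record into a GeoJSON Feature, skipping records without coordinates. The helper name and the choice of properties to carry over are assumptions made for this example.

```python
from typing import Any, Dict, Optional


def to_geojson_feature(record: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Convert one output record to a GeoJSON Feature, or None if it has no coordinates."""
    lat = record.get("decimalLatitude")
    lon = record.get("decimalLongitude")
    if lat is None or lon is None:
        return None
    return {
        "type": "Feature",
        # GeoJSON orders coordinates as [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {
            "key": record.get("key"),
            "scientificName": record.get("scientificName"),
            "eventDate": record.get("eventDate"),
        },
    }
```

Wrapping the non-None results in a `{"type": "FeatureCollection", "features": [...]}` object gives a file that most GIS tools can open.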

Pricing

This Actor queries the free GBIF API with automatic pagination. Compute costs depend on result volume.

  • Compute cost per run: ~$0.001 for small queries (fewer than 1,000 records), ~$0.005-0.01 for large queries (10k+ records)
  • Actor start event: Default platform rate
  • Per-result pricing: $0.003/result

Typical run time is 30 seconds for 1000 records, up to several minutes for 50k+ records due to pagination.
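
The per-result portion of the bill is simple arithmetic. The sketch below estimates it from the $0.003 per-result price quoted above; it leaves out the Actor start event and compute costs, and `estimate_result_charge` is a hypothetical helper name.

```python
PRICE_PER_RESULT_USD = 0.003  # per-result price quoted in the Pricing section


def estimate_result_charge(result_count: int) -> float:
    """Estimate the per-result charge in USD, excluding start and compute fees."""
    if result_count < 0:
        raise ValueError("result_count must be non-negative")
    # Round to a tenth of a cent to hide floating-point noise.
    return round(result_count * PRICE_PER_RESULT_USD, 3)
```

At this rate, 1,000 results come to $3.00 and a run at the 100,000-record cap comes to $300.00 in per-result charges.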

Use Cases

  • Conservation planning — Map species distributions by country/region to inform protected area designation
  • Climate change research — Analyze occurrence trends over time to assess range shifts in response to warming
  • Invasive species monitoring — Track the geographic spread of non-native species using historical records
  • Biodiversity assessment — Quantify species richness and endemism in target regions
  • Museum collection analytics — Export specimen metadata from natural history collections for cataloging
  • Citizen science validation — Retrieve expert-verified occurrence records to train models on observation quality
  • Ecological niche modeling — Feed occurrence records into MaxEnt or other ENM algorithms
  • Taxonomic research — Gather all known specimens of a genus to study morphological variation

FAQ

Is GBIF data free to access and use?

Yes. GBIF is a public, open-access repository of biodiversity data. The API is free and requires no authentication. Most records are published under open licenses (CC_BY_4_0 or CC0_1_0). Always check the license field of each record and cite the dataset and publishing organization as their terms require.
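
When preparing attribution, it can help to see which datasets and licenses a result set actually draws on. The following is a minimal sketch that tallies output records by `datasetKey`, `datasetName`, and `license`; the helper name and the "unknown" placeholder for missing values are assumptions.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple


def attribution_summary(records: Iterable[Dict[str, Any]]) -> List[Tuple[str, str, str, int]]:
    """Count records per (datasetKey, datasetName, license), most frequent first."""
    counts = Counter(
        (
            r.get("datasetKey") or "unknown",
            r.get("datasetName") or "unknown",
            r.get("license") or "unknown",
        )
        for r in records
    )
    return [(key, name, lic, n) for (key, name, lic), n in counts.most_common()]
```

The resulting list is a starting point for a citation table; the exact citation wording still has to come from each dataset's own requirements.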

How many records can I retrieve?

GBIF hosts 2+ billion occurrence records, but the API has a hard limit of 100,000 records per search because of deep-paging limitations. For larger datasets, split the query by taxon, country, or date range so that each subset stays under the limit, and schedule one run per subset.
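
One way to target different subsets is to generate one input per year. The sketch below does this for a base input; `split_by_year` is a hypothetical helper, and a single year can still exceed the cap for very common taxa, in which case a further split (for example by country) would be needed.

```python
from typing import Any, Dict, Iterator


def split_by_year(base_input: Dict[str, Any], first_year: int, last_year: int) -> Iterator[Dict[str, Any]]:
    """Yield one Actor input per year so each run covers a smaller subset."""
    if first_year > last_year:
        raise ValueError("first_year must not exceed last_year")
    for year in range(first_year, last_year + 1):
        run_input = dict(base_input)   # shallow copy; the base input stays unchanged
        run_input["year"] = str(year)  # single-year filter, e.g. "2020"
        yield run_input
```

Each yielded dict can be used as the input for a separate run.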

What does "Basis of Record" mean?

It indicates the source type of the occurrence:

  • HUMAN_OBSERVATION: Direct observation by a person (e.g., field sighting, photo)
  • PRESERVED_SPECIMEN: Museum or herbarium specimen
  • OBSERVATION: Observation of unknown source
  • FOSSIL_SPECIMEN: Fossilized remains
  • LIVING_SPECIMEN: Living specimen (e.g., in a zoo or botanic garden)

Can I export GBIF data to Excel or CSV?

Yes. Apify supports exporting results in JSON, CSV, Excel, XML, and other formats directly from the Dataset tab after a run completes.

What does a null value in eventDate mean?

It means GBIF did not have a specific date for that occurrence record. Dates may be missing for historical specimens or citizen observations without timestamp metadata. The year, month, and day fields may also be null independently.
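
Because the date parts can be null independently, downstream code usually needs a fallback. The sketch below returns `eventDate` when present and otherwise the most precise date the `year`, `month`, and `day` fields allow. The fallback order and the helper name are assumptions made for this illustration.

```python
from typing import Any, Dict, Optional


def best_available_date(record: Dict[str, Any]) -> Optional[str]:
    """Return eventDate if present, else the most precise date the parts allow."""
    if record.get("eventDate"):
        return record["eventDate"]
    year, month, day = record.get("year"), record.get("month"), record.get("day")
    if year is None:
        return None                      # nothing usable
    if month is None:
        return f"{year:04d}"             # year only
    if day is None:
        return f"{year:04d}-{month:02d}" # year and month
    return f"{year:04d}-{month:02d}-{day:02d}"
```

A record with only `year` and `month` set, for instance, yields a string such as "1987-04".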

How often is GBIF updated?

GBIF receives new data continuously from connected institutions. Updates can happen daily. Schedule this Actor to run periodically to capture newly published occurrences.
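
With periodic runs, the same occurrence will usually appear in more than one run. A minimal sketch of picking out only the new ones, assuming the `key` field is stable across runs and that previously seen keys are kept somewhere by the caller; `new_occurrences` is a hypothetical helper.

```python
from typing import Any, Dict, Iterable, List, Set


def new_occurrences(seen_keys: Set[int], records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return records whose key was not seen before and remember their keys."""
    fresh = []
    for record in records:
        key = record.get("key")
        if key is not None and key not in seen_keys:
            seen_keys.add(key)   # mutates the caller's set on purpose
            fresh.append(record)
    return fresh
```

Persisting `seen_keys` between runs (for example in a file or key-value store) is left to the caller.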

Legal Disclaimer

This Actor extracts publicly available biodiversity data from the GBIF API. GBIF data is published by institutions under various open licenses (primarily CC_BY_4_0 or CC0_1_0). Users are responsible for respecting individual dataset licenses and providing proper attribution to data publishers and GBIF. Ensure your use complies with applicable laws and the terms of any derived datasets. For support, contact the Actor developer through the Apify Store.