GBIF Species Occurrences Scraper
Pricing
from $3.00 / 1,000 results
Extract species occurrence records from GBIF, the Global Biodiversity Information Facility (2B+ records). Filter by scientific name, country, taxon, year, and basis of record. Returns taxonomy, coordinates, dates, dataset provenance, and collector metadata for biodiversity and ESG research.
Developer
Compute Edge
Maintained by Community
Extract structured biodiversity occurrence records from the Global Biodiversity Information Facility (GBIF) — the world's largest open-access repository of species occurrence data. This Actor queries the free GBIF API and returns clean JSON records with taxonomic classification, geographic coordinates, collection dates, and dataset provenance for conservation research, ecological analysis, and species distribution modeling.
GBIF aggregates occurrence records from thousands of institutions worldwide — museums, herbaria, citizen science initiatives, and research organizations. A single search can return millions of historical and contemporary observations spanning centuries of biodiversity data.
Key Features
- Global occurrence database access — Query 2+ billion species observations with no authentication required
- Taxonomic filtering — Filter by scientific name, taxon key, or multiple ranks (kingdom, phylum, class, order, family, genus, species)
- Geographic filtering — Restrict results by country, coordinates, or location polygon
- Temporal filtering — Filter by year or year range (e.g., "2020" or "2010,2020")
- Record type filtering — Isolate human observations, preserved museum specimens, fossil records, or machine observations
- Coordinate validation — Option to return only records with verified GPS coordinates
- Pagination handling — Automatic batching with GBIF's 300-record-per-page limit; respects 100k deep-paging offset cap
- Dataset provenance — Retain dataset name, publication key, institution code, and license for proper attribution
- Clean JSON output — All fields mapped to safe names (e.g., `taxonClass` instead of `class`) ready for analysis pipelines
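The pagination behaviour described in the list above can be sketched as a small planning function. This is an illustration only, not the Actor's actual code: the name `page_plan` is hypothetical, and the two constants simply restate the 300-record page size and 100k offset cap mentioned above.

```python
from typing import Iterator, Tuple

PAGE_SIZE = 300        # per-page limit stated in the feature list
OFFSET_CAP = 100_000   # deep-paging cap stated in the feature list


def page_plan(max_results: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, limit) pairs covering up to max_results records.

    A max_results of 0 is treated as "no limit", which still stops at the cap.
    """
    target = OFFSET_CAP if max_results == 0 else min(max_results, OFFSET_CAP)
    offset = 0
    while offset < target:
        # The last page is shortened so the total never exceeds the target.
        limit = min(PAGE_SIZE, target - offset)
        yield offset, limit
        offset += limit
```

For example, `list(page_plan(700))` gives three pages: two full pages of 300 records and a final page of 100.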
Output Data Fields
| Field | Type | Description |
|---|---|---|
| `key` | integer | GBIF unique occurrence identifier |
| `scientificName` | string | Species scientific name (e.g., Panthera leo) |
| `kingdom` | string | Kingdom classification (e.g., Animalia, Plantae, Fungi) |
| `phylum` | string | Phylum classification |
| `taxonClass` | string | Class classification (mapped from `class`) |
| `taxonOrder` | string | Order classification (mapped from `order`) |
| `family` | string | Family name |
| `genus` | string | Genus name |
| `species` | string | Species epithet |
| `taxonRank` | string | Taxonomic rank of the record |
| `country` | string | Country name |
| `countryCode` | string | 2-letter ISO country code |
| `stateProvince` | string | State or province |
| `locality` | string | Specific locality description |
| `decimalLatitude` | number | Latitude in decimal degrees (WGS84) |
| `decimalLongitude` | number | Longitude in decimal degrees (WGS84) |
| `elevation` | number | Elevation in meters (above sea level) |
| `depth` | number | Depth in meters (below water surface) |
| `eventDate` | string | Date of observation (ISO 8601 format) |
| `year` | integer | Year of observation |
| `month` | integer | Month of observation (1-12) |
| `day` | integer | Day of observation (1-31) |
| `basisOfRecord` | string | Record type (HUMAN_OBSERVATION, PRESERVED_SPECIMEN, OBSERVATION, FOSSIL_SPECIMEN, etc.) |
| `occurrenceStatus` | string | Status (PRESENT, ABSENT) |
| `individualCount` | integer | Number of individuals observed |
| `datasetName` | string | Name of the dataset contributing the record |
| `datasetKey` | string | GBIF dataset UUID |
| `publishingOrgKey` | string | GBIF organization UUID of the data publisher |
| `institutionCode` | string | Museum/institution acronym |
| `collectionCode` | string | Collection identifier within the institution |
| `catalogNumber` | string | Catalog ID within the collection |
| `recordedBy` | string | Observer/collector name |
| `identifiedBy` | string | Person who identified the specimen |
| `license` | string | Data license (e.g., CC_BY_4_0, CC0_1_0) |
| `lastInterpreted` | string | Date GBIF last interpreted/validated the record |
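The renaming noted in the table (`class` to `taxonClass`, `order` to `taxonOrder`) could be done with a mapping step like the one below. It is a hedged sketch, not the Actor's implementation: `to_output_record`, `RENAMED`, and `PASS_THROUGH` are hypothetical names, and only a subset of the fields is copied to keep the example short.

```python
from typing import Any, Dict

# Raw names that collide with reserved words in many languages,
# mapped to the safe names used in the table above.
RENAMED = {"class": "taxonClass", "order": "taxonOrder"}

# A subset of the output fields that keep their original names.
PASS_THROUGH = (
    "key", "scientificName", "kingdom", "phylum", "family", "genus",
    "countryCode", "decimalLatitude", "decimalLongitude",
    "eventDate", "basisOfRecord", "license",
)


def to_output_record(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Copy known fields from a raw record; missing fields become None."""
    out = {name: raw.get(name) for name in PASS_THROUGH}
    for source, target in RENAMED.items():
        out[target] = raw.get(source)
    return out
```

Filling absent fields with `None` keeps every output record on the same set of keys, which makes CSV and Excel exports line up column by column.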
How to Scrape GBIF Species Occurrence Data
- Navigate to the GBIF Species Occurrences Scraper Actor page on Apify Store.
- Click Start to open the input configuration form.
- (Optional) Enter a Scientific Name to filter by species (e.g., "Panthera leo", "Quercus", "Drosophila").
- (Optional) Enter a Country ISO code to limit results to one country (e.g., "US", "BR", "ZA").
- (Optional) Enter a Taxon Key to filter by GBIF's unique taxon identifier.
- (Optional) Enter a Year Range to filter by observation date (e.g., "2020" for a single year or "2010,2020" for a range).
- (Optional) Toggle Has Coordinates to only return records with GPS coordinates.
- (Optional) Enter a Basis of Record filter (e.g., "HUMAN_OBSERVATION", "PRESERVED_SPECIMEN").
- Set Max Results to control output size (default: 1000; set to 0 for no limit, up to GBIF's 100,000-record cap).
- Click Start to run the Actor.
- Download results as JSON, CSV, or Excel from the Dataset tab.
Input Example
    {
      "scientificName": "Panthera leo",
      "country": "ZA",
      "year": "2015,2024",
      "hasCoordinate": true,
      "basisOfRecord": "HUMAN_OBSERVATION",
      "maxResults": 500
    }
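To show how an input like this might translate into a GBIF request, here is a rough sketch that builds a search URL against GBIF's public occurrence search endpoint. The function name `build_search_url` is hypothetical, and the assumption that the Actor passes these input fields straight through as query parameters is mine, not something the page confirms.

```python
from typing import Any, Dict
from urllib.parse import urlencode

GBIF_SEARCH_URL = "https://api.gbif.org/v1/occurrence/search"


def build_search_url(actor_input: Dict[str, Any], offset: int = 0, limit: int = 300) -> str:
    """Build one page's search URL from an Actor-style input dict."""
    params: Dict[str, Any] = {}
    # Assumed one-to-one mapping between input fields and query parameters.
    for name in ("scientificName", "country", "year", "basisOfRecord"):
        value = actor_input.get(name)
        if value:
            params[name] = value
    if actor_input.get("hasCoordinate"):
        params["hasCoordinate"] = "true"
    # maxResults is deliberately not sent; it only controls how many pages are fetched.
    params["limit"] = limit
    params["offset"] = offset
    return GBIF_SEARCH_URL + "?" + urlencode(params)
```

Note that `urlencode` percent-encodes the comma in a year range such as "2015,2024", which is still a valid way to send that value.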
Output Example
    {
      "key": 4156523045,
      "scientificName": "Panthera leo",
      "kingdom": "Animalia",
      "phylum": "Chordata",
      "taxonClass": "Mammalia",
      "taxonOrder": "Carnivora",
      "family": "Felidae",
      "genus": "Panthera",
      "species": "leo",
      "taxonRank": "SPECIES",
      "country": "South Africa",
      "countryCode": "ZA",
      "stateProvince": "Kruger National Park",
      "locality": "Central section",
      "decimalLatitude": -24.56,
      "decimalLongitude": 31.82,
      "elevation": 380,
      "depth": null,
      "eventDate": "2020-06-15",
      "year": 2020,
      "month": 6,
      "day": 15,
      "basisOfRecord": "HUMAN_OBSERVATION",
      "occurrenceStatus": "PRESENT",
      "individualCount": 1,
      "datasetName": "African Mammal Sightings",
      "datasetKey": "98e50e68-6ecb-4dca-ab8b-b0e3eabc1234",
      "publishingOrgKey": "1234a5bc-d67e-8f90-1234-567a890123ab",
      "institutionCode": "SANBI",
      "collectionCode": "MAMMAL",
      "catalogNumber": "ZA2020-06-15-001",
      "recordedBy": "Jane Smith",
      "identifiedBy": "Dr. John Doe",
      "license": "CC_BY_4_0",
      "lastInterpreted": "2021-03-10T12:34:56"
    }
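Records in this shape are straightforward to turn into map layers. The sketch below converts a list of output records into a GeoJSON FeatureCollection; `to_geojson` is a hypothetical helper, and the choice of which properties to carry over is arbitrary.

```python
from typing import Any, Dict, Iterable


def to_geojson(records: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Convert occurrence records to a GeoJSON FeatureCollection of points."""
    features = []
    for rec in records:
        lat = rec.get("decimalLatitude")
        lon = rec.get("decimalLongitude")
        if lat is None or lon is None:
            continue  # records without coordinates cannot be mapped
        features.append({
            "type": "Feature",
            # GeoJSON positions are ordered [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {
                "key": rec.get("key"),
                "scientificName": rec.get("scientificName"),
                "eventDate": rec.get("eventDate"),
                "license": rec.get("license"),
            },
        })
    return {"type": "FeatureCollection", "features": features}
```

Keeping `license` in the properties means the attribution information travels with each point when the layer is shared.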
Pricing
This Actor queries the free GBIF API with automatic pagination. Compute costs depend on result volume.
- Compute cost per run: ~$0.001 for small queries (fewer than 1,000 records), ~$0.005-0.01 for large queries (10k+ records)
- Actor start event: Default platform rate
- Per-result pricing: $0.003/result ($3.00 per 1,000 results)
Typical run time is about 30 seconds for 1,000 records, and up to several minutes for 50k+ records because of pagination.
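As a quick budgeting aid, the per-result charge can be estimated with simple arithmetic. This sketch covers only the $0.003 per-result price quoted above; the Actor start fee is left as a parameter because the page gives it only as the "default platform rate", and compute costs are not modelled. The name `estimate_result_charge` is hypothetical.

```python
from decimal import Decimal

PRICE_PER_RESULT = Decimal("0.003")  # per-result price quoted above


def estimate_result_charge(num_results: int, start_fee: Decimal = Decimal("0")) -> Decimal:
    """Estimate the charge in USD for a run returning num_results records.

    start_fee is a placeholder for the Actor start event, whose rate is
    not stated on this page.
    """
    if num_results < 0:
        raise ValueError("num_results must be non-negative")
    return start_fee + PRICE_PER_RESULT * num_results
```

Under these assumptions, the 500-record input example would come to $1.50 in per-result charges, and a run that hits the 100,000-record cap would come to $300.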
Use Cases
- Conservation planning — Map species distributions by country/region to inform protected area designation
- Climate change research — Analyze occurrence trends over time to assess range shifts in response to warming
- Invasive species monitoring — Track the geographic spread of non-native species using historical records
- Biodiversity assessment — Quantify species richness and endemism in target regions
- Museum collection analytics — Export specimen metadata from natural history collections for cataloging
- Citizen science validation — Retrieve expert-verified occurrence records to train models on observation quality
- Ecological niche modeling — Feed occurrence records into MaxEnt or other ENM algorithms
- Taxonomic research — Gather all known specimens of a genus to study morphological variation
FAQ
Is it legal to scrape GBIF data?
Yes. GBIF is a public, open-access repository of biodiversity data. The search API is free and requires no authentication. Records are published under one of three Creative Commons licenses: CC0_1_0, CC_BY_4_0, or CC_BY_NC_4_0, the last of which restricts commercial use. Always check the license field and cite the dataset and publishing organization per their requirements.
How many records can I retrieve?
GBIF hosts 2+ billion occurrence records. The API has a hard limit of 100,000 records per search (due to deep paging limitations). For larger datasets, filter by taxon, country, or date range to narrow results. Schedule multiple runs targeting different subsets.
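One way to split a large query into subsets is by year. The sketch below produces year filters in the same "2020" or "2010,2020" format the input accepts, one per run. The helper name `year_windows` is hypothetical, and a fixed window width is a simplification: a busy taxon may still exceed 100,000 records within a single window, in which case a narrower window or an extra country filter would be needed.

```python
from typing import List


def year_windows(start: int, end: int, width: int) -> List[str]:
    """Split [start, end] into consecutive year filters of at most `width` years."""
    if width < 1:
        raise ValueError("width must be at least 1")
    if start > end:
        raise ValueError("start must not be after end")
    windows = []
    year = start
    while year <= end:
        last = min(year + width - 1, end)
        # A single-year window uses the plain "YYYY" form.
        windows.append(str(year) if year == last else f"{year},{last}")
        year = last + 1
    return windows
```

For example, `year_windows(2010, 2020, 4)` returns `["2010,2013", "2014,2017", "2018,2020"]`, which could be used as the `year` input of three separate runs.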
What does "Basis of Record" mean?
It indicates the source type of the occurrence:
- HUMAN_OBSERVATION — Direct observation by a person (e.g., field sighting, photo)
- PRESERVED_SPECIMEN — Preserved museum or herbarium specimen
- OBSERVATION — Observation of unknown source
- FOSSIL_SPECIMEN — Fossilized remains
- LIVING_SPECIMEN — Living specimen (e.g., in a zoo or botanic garden)
Can I export GBIF data to Excel or CSV?
Yes. Apify supports exporting results in JSON, CSV, Excel, XML, and other formats directly from the Dataset tab after a run completes.
What does a null value in eventDate mean?
It means GBIF did not have a specific date for that occurrence record. Dates may be missing for historical specimens or citizen observations without timestamp metadata. The year, month, and day fields may also be null independently.
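Because the date parts can be null independently, downstream code may want to fall back to whatever precision is available. The following is a minimal sketch of that idea; `best_available_date` is a hypothetical helper, not something the Actor provides.

```python
from typing import Any, Dict, Optional


def best_available_date(rec: Dict[str, Any]) -> Optional[str]:
    """Return eventDate if present, else the most precise date the parts allow."""
    if rec.get("eventDate"):
        return rec["eventDate"]
    year, month, day = rec.get("year"), rec.get("month"), rec.get("day")
    if year is None:
        return None  # nothing usable: month or day alone is not a date
    if month is None:
        return f"{year:04d}"
    if day is None:
        return f"{year:04d}-{month:02d}"
    return f"{year:04d}-{month:02d}-{day:02d}"
```

A record with only `year` set yields a string such as "1897", which is often enough for decade-level trend analysis of historical specimens.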
How often is GBIF updated?
GBIF receives new data continuously from connected institutions. Updates can happen daily. Schedule this Actor to run periodically to capture newly published occurrences.
Other Scrapers by SeatSignal
- CISA Known Exploited Vulnerabilities Scraper — Extract CVE threat intelligence
- NIST NVD Scraper — Extract National Vulnerability Database records
- Hotfrog Business Directory Scraper — Extract business listings and contact info
Legal Disclaimer
This Actor extracts publicly available biodiversity data from the GBIF API. GBIF data is published by institutions under various open licenses (primarily CC_BY_4_0 or CC0_1_0). Users are responsible for respecting individual dataset licenses and providing proper attribution to data publishers and GBIF. Ensure your use complies with applicable laws and the terms of any derived datasets. For support, contact the Actor developer through the Apify Store.