# GBIF Species Occurrence Scraper — Biodiversity Records API (`compute-edge/gbif-occurrences-scraper`) Actor

Extract species occurrence records from GBIF (Global Biodiversity Information Facility). Filter by scientific name, country, date range, and basis of record. Outputs taxonomy, geolocation, and dataset attribution.

- **URL**: https://apify.com/compute-edge/gbif-occurrences-scraper.md
- **Developed by:** [Compute Edge](https://apify.com/compute-edge) (community)
- **Categories:** Lead generation
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $3.00 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
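As a rough illustration of the arithmetic, the sketch below estimates the cost of a run. It assumes the listed starting rate of $3.00 per 1,000 results applies uniformly to every result and that no other events are charged; the actual rate for your plan may differ, and `estimate_cost_usd` is a hypothetical helper, not part of any Apify library.

```python
from decimal import Decimal

# Assumption: flat rate taken from the "from $3.00 / 1,000 results" listing.
PRICE_PER_1000_RESULTS = Decimal("3.00")


def estimate_cost_usd(result_count: int) -> Decimal:
    """Estimate the charge in USD for a run that returns `result_count` results."""
    if result_count < 0:
        raise ValueError("result_count must be non-negative")
    cost = PRICE_PER_1000_RESULTS * result_count / 1000
    # Round to whole cents for display.
    return cost.quantize(Decimal("0.01"))


print(estimate_cost_usd(500))      # the default maxResults
print(estimate_cost_usd(100_000))  # the maximum maxResults
```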

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, built for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, an API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) or as [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## GBIF Species Occurrence Scraper — Biodiversity Records API

Extract species occurrence records from the **GBIF (Global Biodiversity Information Facility)** open API — the world's largest aggregator of biodiversity observation data, with over 3.8 billion records from museums, herbaria, citizen science projects, and government surveys.

### What this Actor does

This Actor wraps the public GBIF Occurrence Search API and returns clean, structured JSON records ready for ecological research, conservation reporting, biodiversity assessments, environmental impact studies, and species distribution modeling.

**Use cases:**
- **Conservation biology**: Track sightings of threatened species across geographies and time periods.
- **Environmental consulting**: Pull baseline biodiversity for a project area before a development EIA/EIS.
- **Academic research**: Bulk-pull occurrence data for niche modeling (MaxEnt, SDM) without writing API glue code.
- **Education**: Build classroom datasets of charismatic megafauna, native flora, or invasive species.
- **GIS pipelines**: Feed standardized lat/lng + taxonomy records into ArcGIS, QGIS, or PostGIS.

### How to scrape GBIF biodiversity data

1. Enter a **Scientific Name** (e.g., `Panthera leo`, `Quercus alba`). Leave blank for any species.
2. Optionally restrict by **Country** (ISO 2-letter code), **Basis of Record**, **Year From / Year To**.
3. Toggle **Only Records With Coordinates** to filter out occurrences without GPS data.
4. Set **Max Results** (1–100,000).
5. Run. Output appears in the dataset as JSON, CSV, or Excel.

### Output fields

| Field | Description |
|-------|-------------|
| gbifID | Globally unique GBIF occurrence key |
| scientificName | Original scientific name as recorded |
| acceptedScientificName | Currently accepted name (post-taxonomic backbone match) |
| taxonRank | Rank (species, subspecies, etc.) |
| kingdom / phylum / class / order / family / genus / species | Linnaean classification |
| country / stateProvince / locality | Geographic context |
| decimalLatitude / decimalLongitude | WGS84 coordinates |
| coordinateUncertaintyInMeters | Reported positional uncertainty |
| eventDate / year / month / day | When the observation occurred |
| basisOfRecord | HUMAN_OBSERVATION, PRESERVED_SPECIMEN, etc. |
| individualCount | Number of individuals recorded |
| recordedBy | Collector/observer name |
| institutionCode / collectionCode / catalogNumber | Specimen provenance |
| datasetKey / datasetName | Source dataset attribution |
| publishingOrgKey / publishingCountry | Dataset publisher info |
| license | CC license terms |
| issues | GBIF data-quality flags |
| gbifUrl | Direct link to the occurrence detail page |

### Pricing

This Actor uses **pay-per-result** pricing. The first results are typically returned in under a minute. GBIF itself charges nothing for API access, and the search endpoint is not rate-limited within polite use.

### Example input

```json
{
  "scientificName": "Panthera leo",
  "country": "KE",
  "yearFrom": 2018,
  "hasCoordinate": true,
  "maxResults": 1000
}
````

### Example output (single record)

```json
{
  "gbifID": 4011982334,
  "scientificName": "Panthera leo (Linnaeus, 1758)",
  "kingdom": "Animalia",
  "family": "Felidae",
  "country": "Kenya",
  "decimalLatitude": -1.4063,
  "decimalLongitude": 35.0094,
  "eventDate": "2023-08-14",
  "basisOfRecord": "HUMAN_OBSERVATION",
  "datasetName": "iNaturalist Research-grade Observations"
}
```
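For GIS pipelines, a record like the one above can be turned into GeoJSON before loading it into QGIS or PostGIS. The following is a minimal sketch, not part of the Actor: `to_geojson_feature` and `to_feature_collection` are hypothetical helper names, and the sketch assumes records are plain dictionaries with the field names listed under "Output fields".

```python
import json
from typing import Iterable, Optional


def to_geojson_feature(record: dict) -> Optional[dict]:
    """Convert one occurrence record to a GeoJSON Feature, or None if it has no coordinates."""
    lat = record.get("decimalLatitude")
    lon = record.get("decimalLongitude")
    if lat is None or lon is None:
        return None
    # Everything except the coordinates becomes a feature property.
    properties = {
        key: value
        for key, value in record.items()
        if key not in ("decimalLatitude", "decimalLongitude")
    }
    return {
        "type": "Feature",
        # GeoJSON orders coordinates as [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


def to_feature_collection(records: Iterable[dict]) -> dict:
    """Build a FeatureCollection, skipping records without coordinates."""
    features = [f for f in map(to_geojson_feature, records) if f is not None]
    return {"type": "FeatureCollection", "features": features}


example_record = {
    "gbifID": 4011982334,
    "scientificName": "Panthera leo (Linnaeus, 1758)",
    "country": "Kenya",
    "decimalLatitude": -1.4063,
    "decimalLongitude": 35.0094,
}
collection = to_feature_collection([example_record, {"gbifID": 1}])
print(json.dumps(collection, indent=2))
```

Records without coordinates are dropped rather than emitted with a null geometry; adjust that choice if your downstream tool expects every record to appear.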

### FAQ

**Q: Is this legal?** Yes. GBIF aggregates data under Creative Commons licenses (CC0, CC BY, CC BY-NC). Each record carries its license field — respect it in downstream use.

**Q: What if I get fewer results than I asked for?** GBIF's `endOfRecords` flag signals the end of the result set for your filters. Broaden your query (remove country, expand year range) to retrieve more.
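To make the role of `endOfRecords` concrete, here is a minimal paging sketch. How this Actor pages internally is not documented here, so treat this as an illustration only: `fetch_page` is a hypothetical callable standing in for one request to GBIF's occurrence search with `offset` and `limit` parameters, and `fake_fetch` simulates a result set of seven records so the example runs offline.

```python
from typing import Callable, Iterator


def iter_occurrences(
    fetch_page: Callable[[int, int], dict],
    max_results: int,
    page_size: int = 300,
) -> Iterator[dict]:
    """Yield records page by page until max_results is reached or GBIF reports the end."""
    offset = 0
    while offset < max_results:
        limit = min(page_size, max_results - offset)
        page = fetch_page(offset, limit)
        results = page.get("results", [])
        yield from results
        offset += len(results)
        # Stop when the filters are exhausted, even if max_results was not reached.
        if page.get("endOfRecords", True) or not results:
            break


def fake_fetch(offset: int, limit: int) -> dict:
    """Hypothetical stand-in for an HTTP call; pretends only 7 records match."""
    total = 7
    items = [{"gbifID": i} for i in range(offset, min(offset + limit, total))]
    return {"results": items, "endOfRecords": offset + limit >= total}


records = list(iter_occurrences(fake_fetch, max_results=1000, page_size=3))
print(len(records))  # fewer than the 1000 requested, because the result set ended
```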

**Q: Can I request specific occurrence IDs?** This Actor uses the search endpoint. For single-record lookups, GBIF's `/occurrence/{key}` endpoint is best called directly.

**Q: Coordinate uncertainty seems high. Is that a bug?** No. Many citizen-science records have ±1,000–10,000 m uncertainty. Use the `coordinateUncertaintyInMeters` field to keep only points precise enough for your analysis.
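A minimal sketch of such a filter follows. The 1,000 m threshold and the `keep_unknown` switch are illustrative assumptions, not recommendations from GBIF or from this Actor; pick values that suit your study.

```python
def filter_by_uncertainty(records, max_meters=1000.0, keep_unknown=False):
    """Keep records whose reported coordinate uncertainty is at most max_meters."""
    kept = []
    for record in records:
        uncertainty = record.get("coordinateUncertaintyInMeters")
        if uncertainty is None:
            # Many records report no uncertainty at all; the caller decides how to treat them.
            if keep_unknown:
                kept.append(record)
            continue
        if uncertainty <= max_meters:
            kept.append(record)
    return kept


sample = [
    {"gbifID": 1, "coordinateUncertaintyInMeters": 30},
    {"gbifID": 2, "coordinateUncertaintyInMeters": 5000},
    {"gbifID": 3},
]
print([r["gbifID"] for r in filter_by_uncertainty(sample)])
```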

### Legal disclaimer

This Actor accesses the public GBIF API in accordance with GBIF's published terms. Output records remain subject to their original Creative Commons licenses — attribute the source dataset and publisher when reusing data. This Actor is not affiliated with or endorsed by GBIF. For commercial reuse of large data volumes, consult GBIF's [citation guidelines](https://www.gbif.org/citation-guidelines).
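To help with attribution, the output can be summarized per source dataset. The sketch below is an assumption-laden illustration: `summarize_attribution` is a hypothetical helper, the sample dataset keys and license strings are placeholders, and the output is a plain summary rather than GBIF's official citation format.

```python
from collections import Counter


def summarize_attribution(records):
    """Return one summary line per (dataset, license) pair found in the records."""
    counts = Counter()
    for record in records:
        key = (
            record.get("datasetName") or "Unknown dataset",
            record.get("datasetKey") or "no key",
            record.get("license") or "unspecified license",
        )
        counts[key] += 1
    return [
        f"{name} ({dataset_key}): {n} record(s), {license_}"
        for (name, dataset_key, license_), n in sorted(counts.items())
    ]


# Placeholder values for illustration only.
sample = [
    {"datasetName": "Dataset A", "datasetKey": "key-a", "license": "CC_BY_4_0"},
    {"datasetName": "Dataset A", "datasetKey": "key-a", "license": "CC_BY_4_0"},
    {"datasetName": "Dataset B", "datasetKey": "key-b", "license": "CC0_1_0"},
]
for line in summarize_attribution(sample):
    print(line)
```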

# Actor input Schema

## `scientificName` (type: `string`):

Filter by full or partial scientific name (e.g. 'Panthera leo', 'Quercus alba').

## `country` (type: `string`):

Two-letter country code to filter occurrences (e.g. 'US', 'BR', 'DE'). Leave blank for all countries.

## `basisOfRecord` (type: `string`):

Filter by record type. Examples: `HUMAN_OBSERVATION`, `PRESERVED_SPECIMEN`, `MACHINE_OBSERVATION`, `MATERIAL_SAMPLE`. Leave blank for all.

## `yearFrom` (type: `integer`):

Earliest observation year (e.g. 2020).

## `yearTo` (type: `integer`):

Latest observation year (inclusive).

## `hasCoordinate` (type: `boolean`):

If enabled, only return occurrences with valid latitude/longitude.

## `maxResults` (type: `integer`):

Maximum number of occurrence records to return. The GBIF API caps at 100,000; use a sample size that matches your need.

## Actor input object example

```json
{
  "scientificName": "",
  "country": "",
  "basisOfRecord": "",
  "hasCoordinate": true,
  "maxResults": 500
}
```
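Before starting a run, it can help to validate the input locally. The sketch below builds an input object and checks it against the limits given in the OpenAPI specification further down (years 1700 to 2100, `maxResults` 1 to 100,000). `build_input` is a hypothetical helper, and the check that `yearFrom` does not exceed `yearTo` is an added assumption rather than a documented rule.

```python
def build_input(
    scientific_name="",
    country="",
    basis_of_record="",
    year_from=None,
    year_to=None,
    has_coordinate=True,
    max_results=500,
):
    """Return an Actor input dict, raising ValueError on out-of-range values."""
    if country and (len(country) != 2 or not country.isalpha()):
        raise ValueError("country must be a two-letter ISO 3166-1 alpha-2 code")
    for label, year in (("yearFrom", year_from), ("yearTo", year_to)):
        if year is not None and not 1700 <= year <= 2100:
            raise ValueError(f"{label} must be between 1700 and 2100")
    # Assumption: an inverted year range is treated as a caller mistake.
    if year_from is not None and year_to is not None and year_from > year_to:
        raise ValueError("yearFrom must not be later than yearTo")
    if not 1 <= max_results <= 100_000:
        raise ValueError("maxResults must be between 1 and 100,000")

    run_input = {
        "scientificName": scientific_name,
        "country": country.upper(),
        "basisOfRecord": basis_of_record,
        "hasCoordinate": has_coordinate,
        "maxResults": max_results,
    }
    # Year bounds are optional, so include them only when set.
    if year_from is not None:
        run_input["yearFrom"] = year_from
    if year_to is not None:
        run_input["yearTo"] = year_to
    return run_input


print(build_input("Panthera leo", "ke", year_from=2018, max_results=1000))
```

The resulting dictionary can be passed as `run_input` in the Python example in the API section.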

# Actor output Schema

## `dataset` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {};

// Run the Actor and wait for it to finish
const run = await client.actor("compute-edge/gbif-occurrences-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {}

# Run the Actor and wait for it to finish
run = client.actor("compute-edge/gbif-occurrences-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{}' |
apify call compute-edge/gbif-occurrences-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=compute-edge/gbif-occurrences-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "GBIF Species Occurrence Scraper — Biodiversity Records API",
        "description": "Extract species occurrence records from GBIF (Global Biodiversity Information Facility). Filter by scientific name, country, date range, and basis of record. Outputs taxonomy, geolocation, and dataset attribution.",
        "version": "0.1",
        "x-build-id": "epQHscZYF7E8piEYb"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/compute-edge~gbif-occurrences-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-compute-edge-gbif-occurrences-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/compute-edge~gbif-occurrences-scraper/runs": {
            "post": {
                "operationId": "runs-sync-compute-edge-gbif-occurrences-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/compute-edge~gbif-occurrences-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-compute-edge-gbif-occurrences-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "scientificName": {
                        "title": "Scientific Name",
                        "type": "string",
                        "description": "Filter by full or partial scientific name (e.g. 'Panthera leo', 'Quercus alba').",
                        "default": ""
                    },
                    "country": {
                        "title": "Country (ISO 3166-1 alpha-2)",
                        "type": "string",
                        "description": "Two-letter country code to filter occurrences (e.g. 'US', 'BR', 'DE'). Leave blank for all countries.",
                        "default": ""
                    },
                    "basisOfRecord": {
                        "title": "Basis of Record",
                        "type": "string",
                        "description": "Filter by record type. Examples: HUMAN_OBSERVATION, PRESERVED_SPECIMEN, MACHINE_OBSERVATION, MATERIAL_SAMPLE. Leave blank for all.",
                        "default": ""
                    },
                    "yearFrom": {
                        "title": "Year From",
                        "minimum": 1700,
                        "maximum": 2100,
                        "type": "integer",
                        "description": "Earliest observation year (e.g. 2020)."
                    },
                    "yearTo": {
                        "title": "Year To",
                        "minimum": 1700,
                        "maximum": 2100,
                        "type": "integer",
                        "description": "Latest observation year (inclusive)."
                    },
                    "hasCoordinate": {
                        "title": "Only Records With Coordinates",
                        "type": "boolean",
                        "description": "If enabled, only return occurrences with valid latitude/longitude.",
                        "default": true
                    },
                    "maxResults": {
                        "title": "Max Results",
                        "minimum": 1,
                        "maximum": 100000,
                        "type": "integer",
                        "description": "Maximum number of occurrence records to return. The GBIF API caps at 100,000; use a sample size that matches your need.",
                        "default": 500
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
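If you prefer not to install a client library, the `run-sync-get-dataset-items` path from the specification above can be called with the Python standard library. This is a minimal sketch: `build_run_request` is a hypothetical helper, the token is a placeholder, and the request is only constructed here; sending it requires a valid token and network access.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/compute-edge~gbif-occurrences-scraper/run-sync-get-dataset-items"


def build_run_request(token: str, run_input: dict) -> urllib.request.Request:
    """Build a POST request that runs the Actor and returns its dataset items."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request(
    "<YOUR_API_TOKEN>",
    {"scientificName": "Panthera leo", "country": "KE", "maxResults": 10},
)
print(request.get_method(), request.full_url.split("?")[0])

# To send the request (assumes the response body is a JSON array of items):
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)
```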
