GBIF Occurrence Search Scraper

Pricing: from $18.00 / 1,000 result items

Search 1.5B+ species occurrence records from the Global Biodiversity Information Facility. Filter by taxon, country, year, dataset, and basis of record. Pulls coordinates, taxonomy, dates, recorder, IUCN status, and license for each record.

Rating: 0.0 (0)

Developer: ParseForge (maintained by Community)

Actor stats: 0 bookmarked · 2 total users · 1 monthly active user · last modified 2 days ago

🦋 GBIF Occurrence Search Scraper

🚀 Export species occurrence records in seconds. Filter 1.5 billion+ biodiversity observations and museum specimens by taxon, country, year, dataset, and basis of record. No GBIF API key, no GBIF registration, no SQL pipelines.

🕒 Last updated: 2026-05-22 · 📊 31 fields per record · 🦋 1.5B+ occurrences · 🌍 250+ countries · 🔬 10 record types

The GBIF Occurrence Search Scraper pulls structured species observation and specimen records from the Global Biodiversity Information Facility, returning 31 fields per record, including full taxonomy, coordinates, event date, recorder, license, and links back to the source occurrence page. GBIF aggregates field observations, museum specimens, and citizen-science contributions from thousands of natural history institutions and biodiversity networks worldwide.

The dataset covers every kingdom, more than 250 ISO country codes, and ten distinct basis-of-record categories ranging from preserved specimens to fossil records and human observations. This Actor turns that catalog into downloadable CSV, Excel, JSON, or XML files in under five minutes. Server-side filters do the heavy lifting, so you skip building your own ingestion pipeline.

🎯 Target audience: Ecologists, conservation NGOs, biodiversity researchers, GIS analysts, museum curators, environmental consultancies, agritech teams.

💡 Primary use cases: Range maps, species distribution modeling, conservation status reports, red-list inputs, biosecurity feeds, ecology dashboards.

📋 What the GBIF Occurrence Search Scraper does

Five filtering workflows in a single run:

  • 🌿 Taxonomy search. Pull every occurrence for a scientific name, or for a taxon key at any rank (species, genus, class, and so on).
  • 🌍 Country filter. Restrict to one or many ISO codes such as US, BR, KE, AU.
  • 📅 Date window. Limit to a year range, useful for trend analysis or recent surveys.
  • 📐 Coordinate filter. Keep only records with decimal latitude and longitude for mapping.
  • 🔬 Record type filter. Choose from human observations, machine observations, preserved specimens, fossils, living specimens, material samples, and more.

Each record includes the GBIF ID, full Linnaean hierarchy, scientific name, taxon rank, basis of record, country, locality, decimal coordinates, elevation, event date, recorder, dataset key, license, IUCN status when available, and the canonical occurrence URL.

💡 Why it matters: biodiversity data drives conservation funding, environmental impact reports, climate research, and legal compliance. Building your own GBIF pipeline means handling pagination, country-code lookups, and taxonomic backbone joins. This Actor handles all of that for you and pulls fresh data on every run.
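To illustrate what that pipeline involves, here is a minimal Python sketch that pages through GBIF's public occurrence search endpoint (https://api.gbif.org/v1/occurrence/search) using only the standard library. It is not this Actor's implementation: the mapping from the Actor's input fields to GBIF query parameters is an assumption, and for simplicity the sketch only sends a year filter when both ends of the window are set.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

API_URL = "https://api.gbif.org/v1/occurrence/search"
PAGE_SIZE = 300  # GBIF caps `limit` at 300 records per search request


def build_params(actor_input: dict, offset: int, limit: int) -> list[tuple[str, str]]:
    """Translate Actor-style input (assumed mapping) into GBIF query parameters."""
    params = [("offset", str(offset)), ("limit", str(limit))]
    if actor_input.get("taxonKey") is not None:
        params.append(("taxonKey", str(actor_input["taxonKey"])))
    for field in ("scientificName", "datasetKey"):
        if actor_input.get(field):
            params.append((field, actor_input[field]))
    # Multi-valued filters are sent as repeated parameters.
    for code in actor_input.get("country", []):
        params.append(("country", code))
    for basis in actor_input.get("basisOfRecord", []):
        params.append(("basisOfRecord", basis))
    year_from, year_to = actor_input.get("yearFrom"), actor_input.get("yearTo")
    if year_from is not None and year_to is not None:
        params.append(("year", f"{year_from},{year_to}"))  # inclusive range syntax
    if actor_input.get("hasCoordinate"):
        params.append(("hasCoordinate", "true"))
    return params


def fetch_occurrences(actor_input: dict) -> list[dict]:
    """Request pages until maxItems records arrive or GBIF reports endOfRecords."""
    max_items = actor_input.get("maxItems", 10)
    records: list[dict] = []
    offset = 0
    while len(records) < max_items:
        limit = min(PAGE_SIZE, max_items - len(records))
        query = urllib.parse.urlencode(build_params(actor_input, offset, limit))
        with urllib.request.urlopen(f"{API_URL}?{query}", timeout=30) as resp:
            page = json.load(resp)
        records.extend(page["results"])
        if page.get("endOfRecords", True) or not page["results"]:
            break
        offset += len(page["results"])
    return records[:max_items]
```

Even this sketch leaves out retries, rate limiting, and GBIF's limits on deep paging, which is the maintenance burden the Actor is meant to absorb.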


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.


⚙️ Input

Input (type, default): behavior

  • maxItems (integer, default 10): Records to return. Free plan caps at 10, paid plan at 1,000,000.
  • taxonKey (integer, default null): GBIF taxon key, e.g. 212 = Aves, 359 = Mammalia, 6 = Plantae.
  • scientificName (string, default ""): Filter by scientific name such as Anas platyrhynchos.
  • datasetKey (string, default ""): Single publisher dataset UUID.
  • country (array, default []): ISO 3166-1 alpha-2 codes. Empty list = worldwide.
  • yearFrom, yearTo (integer, default null): Inclusive event-year window.
  • hasCoordinate (boolean, default false): Keep only records with decimal lat/lon.
  • basisOfRecord (array, default []): One or more of the 10 record types.

Example: 200 recent bird observations across the United States.

{
  "maxItems": 200,
  "taxonKey": 212,
  "country": ["US"],
  "yearFrom": 2020,
  "yearTo": 2024,
  "hasCoordinate": true,
  "basisOfRecord": ["HUMAN_OBSERVATION"]
}

Example: up to 500 preserved specimens of the mallard (Anas platyrhynchos).

{
  "maxItems": 500,
  "scientificName": "Anas platyrhynchos",
  "basisOfRecord": ["PRESERVED_SPECIMEN"]
}
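A quick client-side check can catch typos before you spend credits on a large run. This is a hypothetical helper, not part of the Actor: the field names come from the input table, the plan caps come from the maxItems row, and the accepted basisOfRecord values are assumed to mirror GBIF's BasisOfRecord vocabulary, which may not match the Actor's list of ten exactly.

```python
from __future__ import annotations

import re

# Assumed to mirror GBIF's BasisOfRecord vocabulary; the Actor's exact list may differ.
KNOWN_BASIS_OF_RECORD = {
    "PRESERVED_SPECIMEN", "FOSSIL_SPECIMEN", "LIVING_SPECIMEN", "OBSERVATION",
    "HUMAN_OBSERVATION", "MACHINE_OBSERVATION", "MATERIAL_SAMPLE",
    "MATERIAL_CITATION", "OCCURRENCE", "LITERATURE",
}
PLAN_CAPS = {"free": 10, "paid": 1_000_000}  # from the maxItems row of the input table


def check_input(actor_input: dict, plan: str = "free") -> list[str]:
    """Return warnings about the input; an empty list means nothing looks wrong."""
    warnings: list[str] = []
    max_items = actor_input.get("maxItems", 10)
    if max_items > PLAN_CAPS[plan]:
        warnings.append(f"maxItems {max_items} exceeds the {plan} plan cap of {PLAN_CAPS[plan]}")
    for code in actor_input.get("country", []):
        if not re.fullmatch(r"[A-Z]{2}", code):
            warnings.append(f"country {code!r} is not an uppercase ISO 3166-1 alpha-2 code")
    year_from, year_to = actor_input.get("yearFrom"), actor_input.get("yearTo")
    if year_from is not None and year_to is not None and year_from > year_to:
        warnings.append("yearFrom is later than yearTo, so the window is empty")
    for basis in actor_input.get("basisOfRecord", []):
        if basis not in KNOWN_BASIS_OF_RECORD:
            warnings.append(f"unrecognized basisOfRecord {basis!r}")
    return warnings
```

Run against the two examples, the helper flags maxItems on the free plan: both ask for more than 10 records, so a free account would receive only 10.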

⚠️ Good to Know: GBIF aggregates from thousands of publishers. Coordinate precision varies by source. For ecological modeling, set hasCoordinate to true and inspect the issues array for known data-quality flags.
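As a sketch of that advice, the helper below keeps records that have both coordinates and none of a chosen set of issue flags. The flag names are GBIF occurrence-issue codes, but which ones to reject is a judgment call; this set is an assumption to adapt to your model. A record flagged only with GEODETIC_DATUM_ASSUMED_WGS84 (the flag in the schema's example) passes.

```python
from __future__ import annotations

# Assumed rejection list: GBIF issue codes that usually make a point unusable for mapping.
REJECT_ISSUES = frozenset({
    "ZERO_COORDINATE",
    "COORDINATE_INVALID",
    "COORDINATE_OUT_OF_RANGE",
    "COUNTRY_COORDINATE_MISMATCH",
    "PRESUMED_SWAPPED_COORDINATE",
})


def is_model_ready(record: dict, reject: frozenset[str] = REJECT_ISSUES) -> bool:
    """True if the record has lat/lon and carries none of the rejected issue flags."""
    # Compare against None explicitly: 0.0 is a valid latitude or longitude.
    if record.get("decimalLatitude") is None or record.get("decimalLongitude") is None:
        return False
    return reject.isdisjoint(record.get("issues") or [])


def model_ready(records: list[dict]) -> list[dict]:
    """Filter a downloaded dataset down to records suitable for mapping or modeling."""
    return [r for r in records if is_model_ready(r)]
```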


📊 Output

Each occurrence record contains up to 31 fields. Download the dataset as CSV, Excel, JSON, or XML.

🧾 Schema

Field (type): example

  • 🆔 gbifID (string): "4011034398"
  • 🔬 scientificName (string): "Anas platyrhynchos Linnaeus, 1758"
  • 🦜 vernacularName (string | null): "Mallard"
  • 👑 kingdom (string): "Animalia"
  • 🧬 phylum (string): "Chordata"
  • 🐦 class (string): "Aves"
  • 📚 order (string): "Anseriformes"
  • 🪶 family (string): "Anatidae"
  • 🏷️ genus (string): "Anas"
  • 🧪 species (string): "Anas platyrhynchos"
  • 🔢 taxonRank (string): "SPECIES"
  • taxonomicStatus (string): "ACCEPTED"
  • 📦 basisOfRecord (string): "HUMAN_OBSERVATION"
  • 🏳️ country (string | null): "United States"
  • 🆔 countryCode (string | null): "US"
  • 🗺️ stateProvince (string | null): "California"
  • 📍 locality (string | null): "Bolsa Chica Ecological Reserve"
  • 📍 decimalLatitude (number | null): 33.6849
  • 📍 decimalLongitude (number | null): -118.0353
  • ⛰️ elevation (number | null): 5
  • 🌊 depth (number | null): 0
  • 📅 eventDate (string | null): "2024-05-12T08:30"
  • 📆 year / month / day (number | null): 2024 / 5 / 12
  • 👤 recordedBy (string | null): "Jane Doe"
  • 🆔 identifiedBy (string | null): "M. Smith"
  • 🏛️ collectionCode (string | null): "EBIRD"
  • 🔖 catalogNumber (string | null): "OBS123456"
  • 📚 datasetKey (string): "4fa7b334-ce0d-4e88-aaae-2e0c138d049e"
  • 🏢 publishingOrgKey (string): "e2e717bf-551a-4917-bdc9-4fa0f342c530"
  • ⚖️ license (string): "CC_BY_4_0"
  • 🛡️ iucnRedListCategory (string | null): "LC"
  • ⚠️ issues (string[]): ["GEODETIC_DATUM_ASSUMED_WGS84"]
  • 🔗 occurrenceUrl (string): "https://www.gbif.org/occurrence/4011034398"
  • 🕒 scrapedAt (ISO 8601): "2026-05-22T10:00:00.000Z"
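eventDate is a string and, in Darwin Core, may hold a partial date (such as 2024-05) or an interval (start/end), while year, month, and day are split out separately. A minimal sketch, assuming you want one calendar date per record: take the start of eventDate when Python can parse it, otherwise fall back to the year/month/day fields, otherwise return None.

```python
from __future__ import annotations

from datetime import date, datetime


def event_day(record: dict) -> date | None:
    """Best-effort calendar date for one occurrence record."""
    raw = record.get("eventDate")
    if raw:
        start = raw.split("/")[0]  # keep the start of a "start/end" interval
        try:
            return datetime.fromisoformat(start).date()
        except ValueError:
            pass  # partial dates like "2024-05" are not accepted by fromisoformat
    year, month, day = record.get("year"), record.get("month"), record.get("day")
    if year and month and day:
        return date(year, month, day)
    return None
```

Returning None rather than guessing a day keeps month-only records out of day-level trend analyses; aggregate them by year and month instead if you need them.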


✨ Why choose this Actor

  • 🌍 Global coverage. 1.5 billion+ occurrence records across every continent and ocean basin.
  • 🎯 Multi-dimensional filtering. Combine taxon, country, year, dataset, coordinate, and basis-of-record filters in one run.
  • 🧬 Full Linnaean hierarchy. Kingdom through species in every record, ready to join with any external biodiversity store.
  • ⚡ Fast. 10 records in under 5 seconds, 1,000 records in under a minute.
  • ⚖️ License-aware. Every record carries its publisher license, so you can check reuse terms, including commercial use, record by record.
  • 🔁 Always fresh. Each run hits the live GBIF index, so the dataset reflects current contributions.
  • 🚫 No GBIF authentication. Works against the public GBIF index. No GBIF login or API key needed.

📊 Accurate biodiversity records are the foundation of conservation policy, species distribution models, and environmental impact assessments.


📈 How it compares to alternatives

  • ⭐ GBIF Occurrence Search Scraper (this Actor). Cost: $5 free credit, then pay-per-use · Coverage: 1.5B+ worldwide · Refresh: live per run · Filters: taxon, country, year, dataset, basis · Setup: ⚡ 2 min
  • Manual GBIF portal exports. Cost: free · Coverage: full · Refresh: days for large requests · Filters: limited UI filters · Setup: 🐢 hours
  • In-house ETL on GBIF dumps. Cost: free + infra · Coverage: full · Refresh: quarterly snapshot · Filters: build it yourself · Setup: 🛠️ weeks
  • Commercial biodiversity APIs. Cost: $99+/month · Coverage: subset · Refresh: daily · Filters: curated only · Setup: ⏳ hours

Pick this Actor when you want broad coverage, fresh data, and zero pipeline maintenance.


🚀 How to use

  1. 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. 🌐 Open the Actor. Go to the GBIF Occurrence Search Scraper page on the Apify Store.
  3. 🎯 Set input. Pick a taxon, country, year range, and basis of record. Set maxItems.
  4. 🚀 Run it. Click Start and let the Actor collect your data.
  5. 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.


💼 Business use cases

🌳 Conservation NGOs

  • Species range refresh for protected-area planning
  • Red-list inputs and population trend baselines
  • Country-level occurrence reports for funders
  • Threat overlap maps with land-use change

🏛️ Museums & Research Institutes

  • Specimen network analyses across collections
  • Pull a partner institution's occurrences by datasetKey
  • Reconcile catalog numbers across publishers
  • Reference data for taxonomic revisions

🌾 Agritech & Biosecurity

  • Pest and pathogen distribution feeds
  • Pollinator presence mapping by region
  • Invasive species early-warning datasets
  • Crop wild relative surveys

🏢 Environmental Consultancies

  • ESG and impact assessment inputs
  • Pre-construction biodiversity baselines
  • Mitigation hierarchy supporting datasets
  • Compliance reports for regulators and lenders

🔌 Automating GBIF Occurrence Search Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

  • 🟢 Node.js. Install the apify-client NPM package.
  • 🐍 Python. Use the apify-client PyPI package.
  • 📚 See the Apify API documentation for full details.
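A minimal Python sketch of a programmatic run, assuming the apify-client package's ApifyClient, actor(...).call(run_input=...), and dataset(...).iterate_items() calls; check them against the version you install. The Actor ID below is a hypothetical placeholder, not confirmed: copy the real one from the Actor's page. The input builder reuses the field names from the input table and reproduces the bird-observation example.

```python
from __future__ import annotations

# Hypothetical Actor ID: replace with the ID shown on the Actor's Apify Store page.
ACTOR_ID = "parseforge/gbif-occurrence-search-scraper"


def build_run_input(taxon_key: int, countries: list[str], year_from: int, year_to: int,
                    max_items: int = 200) -> dict:
    """Assemble run input using the field names from the input table."""
    return {
        "maxItems": max_items,
        "taxonKey": taxon_key,
        "country": countries,
        "yearFrom": year_from,
        "yearTo": year_to,
        "hasCoordinate": True,
        "basisOfRecord": ["HUMAN_OBSERVATION"],
    }


def run_actor(token: str, run_input: dict) -> list[dict]:
    """Start a run, wait for it to finish, and return the items of its default dataset."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Usage, with your API token in an environment variable of your choosing: items = run_actor(os.environ["APIFY_TOKEN"], build_run_input(212, ["US"], 2020, 2024)).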

The Apify Schedules feature lets you trigger this Actor on any cron interval. Weekly or monthly pulls keep downstream species distribution models in sync automatically.
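For those recurring pulls, one simple way to keep a downstream store in sync is an upsert keyed by gbifID. This is a sketch under two assumptions: gbifID identifies the same occurrence across runs, and scrapedAt is the only field that changes on every run, so it is ignored when deciding whether a record was updated. Records deleted upstream are not detected.

```python
from __future__ import annotations

VOLATILE_FIELDS = {"scrapedAt"}  # changes on every run, so it is not a real update


def _content(record: dict) -> dict:
    """Record fields that matter for change detection."""
    return {k: v for k, v in record.items() if k not in VOLATILE_FIELDS}


def upsert_records(store: dict[str, dict], new_items: list[dict]) -> tuple[int, int]:
    """Merge a run's items into `store` keyed by gbifID; return (added, updated)."""
    added = updated = 0
    for item in new_items:
        key = str(item["gbifID"])
        previous = store.get(key)
        if previous is None:
            added += 1
        elif _content(previous) != _content(item):
            updated += 1
        store[key] = item  # keep the newest copy, including its scrapedAt
    return added, updated
```

The (added, updated) counts make a cheap signal for deciding whether a scheduled run changed enough to retrain a distribution model.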


🌟 Beyond business use cases

Biodiversity records support far more than enterprise workflows. The same structured occurrences power research, education, civic projects, and citizen science.

🎓 Research and academia

  • Species distribution modeling for ecology PhDs
  • Reproducible datasets cited in peer-reviewed papers
  • GIS coursework with real coordinate data
  • Comparative phylogeography studies

🎨 Personal and creative

  • Nature blogs, birding maps, field guide apps
  • Wildlife photography location research
  • Educational visualizations of species ranges
  • Hobbyist range databases for collectors

🤝 Non-profit and civic

  • Local conservation group baseline reports
  • Citizen-science dashboards tied to occurrence data
  • Investigative journalism on biodiversity loss
  • Open-data contributions to OpenStreetMap nature layers

🧪 Experimentation

  • Train species classification ML models on labeled coords
  • Prototype agents that resolve scientific to common names
  • Build climate-impact dashboards with occurrence trends
  • Test ecology product hypotheses with real survey data


❓ Frequently Asked Questions

🧩 How does it work?

Configure your taxon, country, year, and basis-of-record filters in the input form, click Start, and the Actor applies the filters server-side and emits a clean structured record per occurrence. No browser automation, no captchas, no setup.

📏 How accurate is the data?

GBIF aggregates from thousands of publishers, so accuracy varies by source. Each record exposes an issues array with known data-quality flags. For modeling work, set hasCoordinate to true and review issues before downstream use.

🔁 How often is the dataset refreshed?

The GBIF index ingests publisher updates continuously. Every run of this Actor pulls the latest records, so your dataset reflects the live catalog at run time.

🐦 Can I filter by specific species or taxa?

Yes. Use taxonKey to scope to a clade like Aves (212) or Mammalia (359), or use scientificName for an exact binomial such as Anas platyrhynchos.

⏰ Can I schedule regular runs?

Yes. Use Apify Schedules to run this Actor on any cron interval (daily, weekly, monthly) and keep a downstream database in sync.

⚖️ What licenses apply to the data?

GBIF records carry per-publisher licenses (CC0, CC BY, CC BY-NC). The Actor exposes the license field on every record so you can filter for commercial use. Always credit GBIF and the dataset publisher per the license terms.

💼 Can I use this data commercially?

Yes, for records under CC0 or CC BY. Filter the dataset by the license field to exclude non-commercial publishers. You are responsible for attribution and license compliance.
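A minimal sketch of that filter, assuming the license field uses GBIF's enumeration codes as in the schema example (CC_BY_4_0), with CC0_1_0 assumed as the code for CC0. Anything else, including non-commercial or unspecified licenses, is dropped, and the helper also collects the distinct datasetKey values whose publishers you need to credit.

```python
from __future__ import annotations

# Assumed GBIF license codes that permit commercial reuse (CC BY still requires attribution).
COMMERCIAL_LICENSES = {"CC0_1_0", "CC_BY_4_0"}


def commercial_subset(records: list[dict]) -> tuple[list[dict], list[str]]:
    """Return (records whose license allows commercial reuse, sorted datasetKeys to credit)."""
    kept = [r for r in records if r.get("license") in COMMERCIAL_LICENSES]
    datasets = sorted({r["datasetKey"] for r in kept})
    return kept, datasets
```

Using an allow-list rather than a deny-list means any license value you have not reviewed is excluded by default.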

💳 Do I need a paid Apify plan to use this Actor?

No. The free Apify plan is enough for testing and small runs (10 records per run). A paid plan lifts the limit and gives you access to scheduling, higher concurrency, and larger datasets.

🔁 What happens if a run fails or gets interrupted?

Apify automatically retries transient errors. If a run still fails, you can inspect the log in the Runs tab, fix the input, and re-run. Records saved to the dataset before the failure remain available, so a partial run still leaves usable output.

🛰️ What if I need climate or environmental layers next to the occurrences?

This Actor returns occurrence records only. For climate, soil, or land-use overlays, reach out via the contact form below to request a companion environmental data scraper.

🆘 What if I need help?

Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.


🔌 Integrate with any app

GBIF Occurrence Search Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get run notifications in your channels
  • Airbyte - Pipe occurrence data into your warehouse
  • GitHub - Trigger runs from commits and releases
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Push fresh occurrence data into your model retraining loop, or alert your team in Slack.
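As a sketch of the receiving side, here is a tiny standard-library HTTP endpoint. It assumes Apify's default webhook payload, where eventType names the event (for example ACTOR.RUN.SUCCEEDED) and resource is the run object carrying defaultDatasetId; the print call stands in for your own retraining or Slack hook.

```python
from __future__ import annotations

import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def dataset_to_process(payload: dict) -> str | None:
    """Return the run's dataset ID for successful runs, else None."""
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    return (payload.get("resource") or {}).get("defaultDatasetId")


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        dataset_id = dataset_to_process(payload)
        if dataset_id:
            # Placeholder for your downstream action (retraining, Slack alert, ...).
            print(f"Fresh occurrence data in dataset {dataset_id}")
        self.send_response(204)  # acknowledge receipt; no body needed
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Block forever, handling webhook POSTs on the given port."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Call serve() on a publicly reachable host and register its URL as the webhook target; failed or aborted runs are acknowledged but ignored.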


💡 Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.


🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by the Global Biodiversity Information Facility (GBIF) or any of its publishing institutions. All trademarks mentioned are the property of their respective owners. Only publicly available open biodiversity data is collected.