Paleobiology Database Fossils Scraper

Pricing

from $6.00 / 1,000 results


Search the Paleobiology Database by taxon name and pull every fossil occurrence beneath it. Returns taxon, rank, occurrence and collection IDs, geologic interval, early and late age in millions of years, country, coordinates, and formation. Filter by interval or country.

Developer

ParseForge

Maintained by Community

🦴 Paleobiology Database Fossils Scraper

🚀 Pull fossil occurrence records in seconds. Search any taxon and get back where it was found, how old it is, and which collection it came from, straight from the Paleobiology Database (PBDB).

🕒 Last updated: 2026-06-05 · 📊 25 fields per record (24 data fields plus a scrapedAt timestamp) · Global coverage · Keyless public data service

The Paleobiology Database is a public, scientist-curated archive of the fossil record, holding millions of occurrence records contributed by paleontologists worldwide. This Actor queries the PBDB data service by taxonomic name and returns clean, structured fossil occurrence records ready for analysis.

Coverage: Every taxonomic level is searchable, from a single genus like Tyrannosaurus up to an entire class like Mammalia. Each occurrence carries its taxon identification, geologic age in millions of years, country and coordinates, and the rock formation it was recovered from. Filter by geologic interval (period, epoch, or age) and by country to focus a query.

| 🎯 Target Audience | 💡 Primary Use Cases |
| --- | --- |
| Paleontologists and geologists | Build occurrence datasets for a taxon or clade |
| Researchers and students | Map fossil distribution across time and place |
| Data scientists and educators | Feed biodiversity and macroevolution models |
| Museum and collection staff | Cross-reference specimens with literature |

📋 What the Paleobiology Database Fossils Scraper does

It calls the PBDB occurrences endpoint for the taxon name you provide and emits one row per fossil occurrence. Each row captures the taxon name and rank, the occurrence and collection identifiers, the geologic interval with its early and late age in millions of years (Ma), the reference identifier, the country, the latitude and longitude, and the rock formation. Optional interval and country filters narrow the search.
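The Actor's internal request is not documented here, so the following is only a sketch of how an equivalent query against the public PBDB data service might be assembled. The parameter names (`base_name`, `interval`, `cc`, `limit`, `show`, `vocab`) follow the PBDB data service version 1.2 as far as we know; the helper name `build_pbdb_query` and the choice of `show` blocks are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Public PBDB occurrence-list endpoint (data service version 1.2).
PBDB_OCCS_URL = "https://paleobiodb.org/data1.2/occs/list.json"


def build_pbdb_query(base_name, interval=None, country=None, limit=None):
    """Build a PBDB occurrence query URL (hypothetical helper, not the Actor's code)."""
    params = {
        "base_name": base_name,             # hierarchical: matches the taxon and everything below it
        "show": "coords,class,loc,strat",   # assumed blocks: coordinates, classification, location, stratigraphy
        "vocab": "pbdb",                    # request full field names
    }
    if interval:
        params["interval"] = interval       # named geologic interval, e.g. "Cretaceous"
    if country:
        params["cc"] = country              # two-letter country code, e.g. "US"
    if limit is not None:
        params["limit"] = limit
    # Keep commas readable in the "show" list instead of percent-encoding them.
    return f"{PBDB_OCCS_URL}?{urlencode(params, safe=',')}"


url = build_pbdb_query("Tyrannosaurus", interval="Maastrichtian", country="US", limit=50)
print(url)
```

Only the filters that are actually set end up in the query string, which mirrors the optional nature of the interval and country inputs.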

🎬 Full Demo (🚧 Coming soon)

⚙️ Input

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| baseName | string | No (defaults to Tyrannosaurus) | Taxonomic name to search. Hierarchical, so a genus, family, or class returns every occurrence beneath it. |
| interval | string | No | Named geologic interval such as Cretaceous, Maastrichtian, or Miocene. |
| country | select | No | Restrict to one country by ISO code (for example US, CA, MN). |
| maxItems | integer | No | Maximum number of records to return. Free users are limited to 10; paid users can request up to 1,000,000. |

Example 1 — all Tyrannosaurus occurrences:

{
  "baseName": "Tyrannosaurus",
  "maxItems": 50
}

Example 2 — Cretaceous dinosaurs from the United States:

{
  "baseName": "Dinosauria",
  "interval": "Cretaceous",
  "country": "US",
  "maxItems": 200
}

⚠️ Good to Know: PBDB queries are hierarchical. A broad name like Mammalia or Trilobita can match tens of thousands of occurrences, so set maxItems to keep runs focused. Stratigraphic and locality fields such as formation, group, member, and county are filled only when the original collection recorded them, so they may be empty on some occurrences.
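When inputs are generated by a script rather than typed into the form, a small builder can keep them consistent with the table above. This is a minimal sketch: `build_run_input` is a hypothetical helper, and its validation rules (a lower bound of 1 for maxItems, two-letter upper-case country codes) are assumptions inferred from the examples rather than documented Actor behavior.

```python
import json


def build_run_input(base_name="Tyrannosaurus", interval=None, country=None, max_items=10):
    """Assemble an Actor input dict, omitting optional fields that are unset."""
    if not base_name or not base_name.strip():
        raise ValueError("baseName must be a non-empty taxonomic name")
    # Assumption: at least 1 item; the upper bound is the documented paid limit.
    if not 1 <= max_items <= 1_000_000:
        raise ValueError("maxItems must be between 1 and 1,000,000")

    run_input = {"baseName": base_name.strip(), "maxItems": max_items}
    if interval:
        run_input["interval"] = interval
    if country:
        # Assumption: two-letter ISO codes, as in the examples (US, CA, MN).
        if len(country) != 2 or not country.isalpha():
            raise ValueError("country must be a two-letter ISO code such as US")
        run_input["country"] = country.upper()
    return run_input


# Reproduces Example 2: Cretaceous dinosaurs from the United States.
example = build_run_input("Dinosauria", interval="Cretaceous", country="us", max_items=200)
print(json.dumps(example, indent=2))
```

Capping maxItems explicitly is the main safeguard here, since a broad taxon name can otherwise match a very large number of occurrences.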

📊 Output

| 🏷 Field | Description |
| --- | --- |
| 🦴 taxonName | Accepted taxonomic name of the occurrence |
| 🏷 taxonRank | Rank of the name (species, genus, family, and so on) |
| 🔎 identifiedName | Name as originally identified, when it differs |
| 🆔 occurrenceId | PBDB occurrence identifier |
| 🗂 collectionId | PBDB collection identifier |
| 🧬 taxonId | PBDB taxon identifier |
| 🧫 phylum | Phylum classification |
| 🐾 class | Class classification |
| 📚 order | Order classification |
| 👪 family | Family classification |
| 🦕 genus | Genus classification |
| earlyInterval | Earliest geologic interval for the occurrence |
| lateInterval | Latest geologic interval, when bounded |
| earlyAgeMa | Early age boundary in millions of years |
| lateAgeMa | Late age boundary in millions of years |
| 🌍 country | Country code where the fossil was found |
| 🗺 state | State or province |
| 🏘 county | County, when recorded |
| 📍 lat | Latitude in decimal degrees |
| 📍 lng | Longitude in decimal degrees |
| 🪨 formation | Geologic formation |
| 🗻 geologicGroup | Geologic group, when recorded |
| 🧱 member | Geologic member, when recorded |
| 📖 referenceId | PBDB bibliographic reference identifier |
| 🕒 scrapedAt | Timestamp the record was collected |

Real sample records:

[
  {
    "taxonName": "Tyrannosaurus rex",
    "taxonRank": "species",
    "occurrenceId": "occ:139292",
    "collectionId": "col:11917",
    "phylum": "Chordata",
    "class": "Reptilia",
    "family": "Tyrannosauridae",
    "earlyInterval": "Late Maastrichtian",
    "earlyAgeMa": 72.2,
    "lateAgeMa": 66,
    "country": "CA",
    "state": "Alberta",
    "formation": "Scollard",
    "lat": 51.906399,
    "lng": -113.0289,
    "referenceId": "ref:4218"
  },
  {
    "taxonName": "Tyrannosaurus rex",
    "taxonRank": "species",
    "occurrenceId": "occ:139293",
    "collectionId": "col:11918",
    "phylum": "Chordata",
    "class": "Reptilia",
    "family": "Tyrannosauridae",
    "earlyInterval": "Late Maastrichtian",
    "earlyAgeMa": 72.2,
    "lateAgeMa": 66,
    "country": "CA",
    "state": "Alberta",
    "formation": "Scollard",
    "lat": 51.933334,
    "lng": -113.23333,
    "referenceId": "ref:4205"
  },
  {
    "taxonName": "Tyrannosaurus rex",
    "taxonRank": "species",
    "occurrenceId": "occ:220009",
    "collectionId": "col:22657",
    "phylum": "Chordata",
    "class": "Reptilia",
    "family": "Tyrannosauridae",
    "earlyInterval": "Late Campanian",
    "earlyAgeMa": 83.6,
    "lateAgeMa": 72.2,
    "country": "CA",
    "state": "Alberta",
    "formation": "Dinosaur Park",
    "lat": 50.727234,
    "lng": -111.524582,
    "referenceId": "ref:5721"
  }
]
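As a rough illustration of working with the age fields, the sketch below computes an age midpoint, tests whether an occurrence overlaps a time window, and groups occurrences by formation. It uses trimmed copies of the sample records above; the helper names `age_midpoint` and `overlaps` are hypothetical and not part of the Actor.

```python
from collections import defaultdict

# Trimmed copies of the sample records (only the fields used here).
records = [
    {"occurrenceId": "occ:139292", "formation": "Scollard", "earlyAgeMa": 72.2, "lateAgeMa": 66},
    {"occurrenceId": "occ:139293", "formation": "Scollard", "earlyAgeMa": 72.2, "lateAgeMa": 66},
    {"occurrenceId": "occ:220009", "formation": "Dinosaur Park", "earlyAgeMa": 83.6, "lateAgeMa": 72.2},
]


def age_midpoint(record):
    """Midpoint of the occurrence's age range, in Ma."""
    return (record["earlyAgeMa"] + record["lateAgeMa"]) / 2


def overlaps(record, older_ma, younger_ma):
    """True if the record's age range intersects the window [younger_ma, older_ma].

    Ages count backwards from the present, so the early (older) bound is the
    larger number. Ranges that only touch at a boundary count as overlapping.
    """
    return record["earlyAgeMa"] >= younger_ma and record["lateAgeMa"] <= older_ma


# Group occurrence IDs by formation, tolerating records where it is missing.
by_formation = defaultdict(list)
for rec in records:
    by_formation[rec.get("formation") or "(unrecorded)"].append(rec["occurrenceId"])

for rec in records:
    print(rec["occurrenceId"], round(age_midpoint(rec), 2), overlaps(rec, 70, 66))
print(dict(by_formation))
```

The fallback key for a missing formation matters in practice, because stratigraphic fields are only present when the original collection recorded them.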

✨ Why choose this Actor

  • Direct line to the PBDB data service, the reference archive for the fossil record.
  • One clean row per occurrence, with classification, age, location, and formation already separated into fields.
  • Hierarchical taxon search, so one query can pull a genus or an entire class.
  • Interval and country filters to scope a study without post-processing.
  • Ages returned numerically in millions of years, ready for plotting and modeling.

📈 How it compares to alternatives

| Approach | Effort | Structured output | Filters |
| --- | --- | --- | --- |
| This Actor | Enter a taxon name | Yes, 24 data fields plus scrapedAt | Interval and country |
| Manual PBDB web download | Build query strings by hand | Partial | Manual |
| Copying from publications | Very high | No | None |

🚀 How to use

  1. Create a free Apify account.
  2. Open the Paleobiology Database Fossils Scraper.
  3. Enter a taxon name in baseName, for example Triceratops, Canis, or Trilobita.
  4. Optionally set an interval and a country, then choose maxItems.
  5. Run the Actor and collect your fossil occurrence dataset.

💼 Business use cases

Research and academia

| Need | How it helps |
| --- | --- |
| Build a taxon occurrence dataset | Pull every record for a clade in one run |
| Study diversity over time | Age fields support range and turnover analysis |

Education

| Need | How it helps |
| --- | --- |
| Teach the fossil record | Real occurrences with ages and locations |
| Student projects | Ready data for maps and timelines |

Collections and museums

| Need | How it helps |
| --- | --- |
| Cross-reference holdings | Match specimens to PBDB collections |
| Trace literature | Reference identifiers link back to sources |

Data and analytics

| Need | How it helps |
| --- | --- |
| Feed biodiversity models | Clean, typed fields per occurrence |
| Build dashboards | Coordinates and ages drive maps and charts |
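For mapping, one common route is to convert the occurrence rows to GeoJSON, which most mapping tools can read. The function below is a sketch under that assumption; `to_geojson` is a hypothetical helper, not something the Actor provides, and the two input rows are trimmed versions of the sample records plus one made-up row without coordinates.

```python
import json


def to_geojson(records):
    """Convert occurrence records into a GeoJSON FeatureCollection of points."""
    features = []
    for rec in records:
        lat, lng = rec.get("lat"), rec.get("lng")
        if lat is None or lng is None:
            continue  # skip rows without coordinates rather than plotting them at (0, 0)
        # Everything except the coordinates becomes a feature property.
        properties = {key: value for key, value in rec.items() if key not in ("lat", "lng")}
        features.append({
            "type": "Feature",
            # GeoJSON positions are ordered [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lng, lat]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


rows = [
    {"occurrenceId": "occ:139292", "formation": "Scollard", "lat": 51.906399, "lng": -113.0289},
    {"occurrenceId": "occ:000000", "formation": None},  # illustrative row with no coordinates
]
collection = to_geojson(rows)
print(json.dumps(collection, indent=2))
```

Note the coordinate order: the output fields are named lat and lng, but GeoJSON expects longitude first.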

🔌 Automating Paleobiology Database Fossils Scraper

Connect runs to Make, Zapier, Slack, Airbyte, GitHub Actions, or Google Drive through the Apify API and integrations to schedule queries and route fresh occurrence data wherever your team works.
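As one possible way to start a run programmatically, the sketch below prepares a request to the Apify API's synchronous run endpoint, which runs an Actor and returns its dataset items in the response. The Actor ID shown is a hypothetical placeholder (copy the real one from the Actor's API tab), `build_run_request` is an illustrative helper, and the request is only built here, not sent.

```python
import json
import urllib.request

# Hypothetical Actor ID in "username~actor-name" form; replace with the real one.
ACTOR_ID = "parseforge~paleobiology-database-fossils-scraper"


def build_run_request(token, run_input, actor_id=ACTOR_ID):
    """Prepare a POST request that runs the Actor and returns its dataset items."""
    url = f"https://api.apify.com/v2/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # Apify API token
        },
        method="POST",
    )


request = build_run_request("YOUR_APIFY_TOKEN", {"baseName": "Triceratops", "maxItems": 10})
print(request.get_method(), request.full_url)

# To actually send it (requires a valid token and network access):
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)
```

A scheduler such as GitHub Actions or cron could call a script like this on a timer; for long runs, an asynchronous run followed by a dataset fetch is likely a better fit than the synchronous endpoint.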

🌟 Beyond business use cases

  • Research: assemble occurrence sets for macroevolution and biogeography studies.
  • Personal: explore where your favorite prehistoric animals once lived.
  • Non-profit: support science outreach and museum education programs.
  • Experimentation: prototype paleo data visualizations and maps.

🤖 Ask an AI assistant

Paste your dataset into ChatGPT, Claude, Perplexity, or Microsoft Copilot and ask it to summarize age ranges, cluster occurrences by formation, or draft a methods paragraph.

❓ Frequently Asked Questions

Is the data official? Yes, it comes from the Paleobiology Database public data service, curated by paleontologists.

Do I need an API key? No. The PBDB data service is public and keyless.

What does baseName accept? Any taxonomic name. The search is hierarchical and includes everything below the name.

How do the age fields work? earlyAgeMa is the older boundary of the occurrence and lateAgeMa the younger one, both in millions of years before present, so earlyAgeMa is the larger number.

Why are some stratigraphic and location fields empty? Formation, group, member, and county appear only when the original collection recorded them.

Can I filter by time? Yes, set interval to a period, epoch, or age such as Jurassic or Maastrichtian.

Can I filter by place? Yes, set country to an ISO code such as US, CN, or MN.

How many records can I get? Free runs return up to 10 records. Paid plans return up to 1,000,000.

What if a taxon has no occurrences? The run finishes with no records and a note to broaden the search.

Can I run it on a schedule? Yes, use Apify scheduling or any connected automation platform.

🔌 Integrate with any app

Use the Apify API, webhooks, and native integrations to push occurrence data into your own pipelines, databases, and notebooks.

💡 Pro Tip: browse the complete ParseForge collection.

🆘 Need Help? Open our contact form

⚠️ Disclaimer: independent tool, not affiliated with the Paleobiology Database. Only publicly available data is collected.