Pricing

from $19.00 / 1,000 results

Maven Central Scraper | Java Package Metadata

Extract Java and Kotlin artifacts from Maven Central including group ID, artifact ID, version history, dependencies, publisher, packaging, and license info. Audit JVM dependencies, track ecosystem trends, or feed developer security, SBOM, and intelligence tools at scale.

Developer

ParseForge

☕ Maven Central Repository Scraper

🚀 Export Java and JVM packages from Maven Central in seconds. Search by keyword, filter by groupId, and download artifact metadata - groupId, artifactId, version, packaging, version history, and timestamps - without touching a browser or writing a single parser.

🕒 Last updated: 2026-05-21 · 📊 9 fields per record · ☕ 500,000+ artifacts · 🌐 Maven Central · 🔓 No auth required

The Maven Central Scraper queries the official Maven Central Solr search API and returns structured metadata for every matching Java and JVM package. Each record includes the full Maven coordinates (groupId, artifactId, latestVersion), packaging type, version count, repository origin, and last-updated timestamp.
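For readers curious what a request to that search API could look like, here is a minimal sketch. It assumes the public endpoint `https://search.maven.org/solrsearch/select` and the standard Solr parameters `q`, `rows`, `start`, and `wt`; the helper name `build_search_url` is hypothetical, and the Actor's real request code is not shown on this page.

```python
from urllib.parse import urlencode

# Assumption: the public Solr search endpoint on search.maven.org.
SOLR_ENDPOINT = "https://search.maven.org/solrsearch/select"


def build_search_url(query: str, rows: int = 20, start: int = 0) -> str:
    """Return the URL for one page of Solr search results as JSON."""
    params = {"q": query, "rows": rows, "start": start, "wt": "json"}
    # urlencode percent-escapes reserved characters such as ":" in the query.
    return f"{SOLR_ENDPOINT}?{urlencode(params)}"
```

Calling `build_search_url("g:org.springframework", rows=5)` yields a URL that can be opened in a browser or passed to any HTTP client.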

Maven Central is the primary public repository for JVM ecosystem packages - the definitive source for Spring, Jackson, Hibernate, Guava, Apache Commons, and hundreds of thousands of other open-source libraries. This Actor makes the entire catalog searchable and downloadable as JSON, CSV, or Excel without any setup.

🎯 Target Audience	💡 Primary Use Cases
Java developers, security teams, DevOps engineers, data analysts, OSS researchers, enterprise architects	Dependency auditing, supply chain analysis, ecosystem research, package discovery, build tool integration, license compliance

📋 What the Maven Central Scraper does

Four search modes, with version counts included on every record:

🔍 Keyword search. Find all packages matching a library name, technology, or concept (e.g. spring, jackson, logging, kafka).
🏷 GroupId filter. Restrict to a specific Maven group like org.springframework, com.fasterxml.jackson, or org.apache.commons.
🔀 Combined search. Search by keyword within a specific groupId for targeted results.
📦 Full catalog browse. Leave both fields empty to iterate through Maven Central's entire public artifact index.
📊 Version history. Every record includes versionCount - the total number of published versions for that artifact.

Each record includes Maven coordinates, latest version, packaging type (jar, pom, bundle, aar), version count, repository ID, and the ISO timestamp of the last published version.
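The search modes above can be pictured as one function that turns the two inputs into a Solr query string. This is a sketch under assumptions: `build_query` is a hypothetical helper, the `g:"..."` field syntax follows Maven Central's documented search syntax, and using `*:*` for a full catalog browse is a guess at how the empty case might be handled.

```python
def build_query(search_query: str = "", group_id: str = "") -> str:
    """Combine the keyword and groupId inputs into a single Solr query."""
    terms = []
    if group_id.strip():
        # g:"..." restricts matches to one groupId.
        terms.append(f'g:"{group_id.strip()}"')
    if search_query.strip():
        terms.append(search_query.strip())
    # Assumption: with both inputs empty, the match-all query browses the catalog.
    return " AND ".join(terms) if terms else "*:*"
```

For example, `build_query("core", "org.springframework")` produces `g:"org.springframework" AND core`.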

💡 Why it matters: auditing your dependency tree means knowing what is actually published, when it was last updated, and how actively it is maintained. Building this yourself means maintaining a Solr client, handling pagination edge cases, and refreshing by hand. This Actor skips all of that.

🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.

⚙️ Input

Input	Type	Default	Behavior
searchQuery	string	""	Keyword search (e.g. "spring", "jackson", "logging"). Empty = browse all packages.
groupId	string	""	Filter by Maven groupId (e.g. "org.springframework"). Empty = all groups.
maxItems	integer	10	Records to return. Free plan caps at 10, paid plan at 1,000,000.

Example: all Spring Framework packages.

{
    "groupId": "org.springframework",
    "maxItems": 100
}

Example: top Jackson packages.

{
    "searchQuery": "jackson",
    "maxItems": 50
}
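If you generate inputs programmatically, a small validation step can catch mistakes before a run starts. The sketch below applies the defaults and plan caps from the input table; `ActorInput` and `normalize_input` are hypothetical names, and clamping `maxItems` to the cap (rather than rejecting the input) is an assumption about how the limit behaves.

```python
from dataclasses import dataclass

FREE_PLAN_CAP = 10          # from the input table above
PAID_PLAN_CAP = 1_000_000   # from the input table above


@dataclass(frozen=True)
class ActorInput:
    search_query: str = ""
    group_id: str = ""
    max_items: int = 10


def normalize_input(raw: dict, paid_plan: bool = False) -> ActorInput:
    """Apply defaults and clamp maxItems to the plan's cap."""
    cap = PAID_PLAN_CAP if paid_plan else FREE_PLAN_CAP
    requested = int(raw.get("maxItems", 10))
    if requested < 1:
        raise ValueError("maxItems must be at least 1")
    return ActorInput(
        search_query=str(raw.get("searchQuery", "")).strip(),
        group_id=str(raw.get("groupId", "")).strip(),
        max_items=min(requested, cap),
    )
```

With the Spring example above, a free-plan call would end up with `max_items` of 10, while `paid_plan=True` keeps the requested 100.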

⚠️ Good to Know: Maven Central's Solr index is updated continuously as new versions are published. versionCount reflects all historically published versions including snapshots promoted to release. The latestVersion field shows the most recently published version at time of scraping, which can be a pre-release: one of the sample records below reports 4.7.0-beta.1.

📊 Output

Each record contains 9 data fields. The sample records below also carry an `error` field, which is null in every sample shown:

Field	Type	Description
🏷 `groupId`	string	Maven groupId (e.g. `org.springframework`)
📦 `artifactId`	string	Maven artifactId (e.g. `spring-core`)
🔗 `url`	string	Direct link to the artifact on search.maven.org
🏷 `latestVersion`	string	Most recently published version
📁 `packaging`	string	Artifact type: jar, pom, bundle, aar, etc.
🔢 `versionCount`	integer	Total number of published versions
🗄 `repositoryId`	string	Repository origin (almost always `central`)
📅 `lastUpdated`	string	ISO 8601 timestamp of the last published version
🕒 `scrapedAt`	string	ISO 8601 timestamp when the record was collected

Sample records (real output):

[
  {
    "groupId": "org.apache.karaf.features",
    "artifactId": "spring",
    "url": "https://search.maven.org/artifact/org.apache.karaf.features/spring",
    "latestVersion": "4.4.11",
    "packaging": "pom",
    "versionCount": 77,
    "repositoryId": "central",
    "lastUpdated": "2026-04-27T13:48:38.000Z",
    "scrapedAt": "2026-05-22T01:45:56.208Z",
    "error": null
  },
  {
    "groupId": "au.com.dius.pact.provider",
    "artifactId": "spring",
    "url": "https://search.maven.org/artifact/au.com.dius.pact.provider/spring",
    "latestVersion": "4.7.0-beta.1",
    "packaging": "jar",
    "versionCount": 152,
    "repositoryId": "central",
    "lastUpdated": "2025-05-22T23:43:20.428Z",
    "scrapedAt": "2026-05-22T01:45:56.208Z",
    "error": null
  },
  {
    "groupId": "community.flock.wirespec.integration",
    "artifactId": "spring",
    "url": "https://search.maven.org/artifact/community.flock.wirespec.integration/spring",
    "latestVersion": "0.14.11",
    "packaging": "jar",
    "versionCount": 22,
    "repositoryId": "central",
    "lastUpdated": "2025-05-09T07:17:19.000Z",
    "scrapedAt": "2026-05-22T01:45:56.208Z",
    "error": null
  }
]
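Once downloaded, records like these are easy to load into typed objects. The following is a minimal sketch, not part of the Actor: the `Artifact` class and `parse_artifact` helper are hypothetical, and only the field names come from the output table above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Artifact:
    group_id: str
    artifact_id: str
    latest_version: str
    packaging: str
    version_count: int
    last_updated: datetime

    @property
    def coordinates(self) -> str:
        """Maven coordinates in groupId:artifactId:version form."""
        return f"{self.group_id}:{self.artifact_id}:{self.latest_version}"


def parse_timestamp(value: str) -> datetime:
    # Older Python versions reject a trailing "Z" in fromisoformat,
    # so replace it with an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_artifact(record: dict) -> Artifact:
    """Convert one dataset record (a dict) into an Artifact."""
    return Artifact(
        group_id=record["groupId"],
        artifact_id=record["artifactId"],
        latest_version=record["latestVersion"],
        packaging=record["packaging"],
        version_count=int(record["versionCount"]),
        last_updated=parse_timestamp(record["lastUpdated"]),
    )
```

Parsing `lastUpdated` into a timezone-aware `datetime` makes later date comparisons straightforward.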

✨ Why choose this Actor

Feature	Benefit
🔓 No auth required	Works out of the box - no API keys, no Maven account needed
⚡ Direct Solr API	Queries Maven Central's native search engine for fast, accurate results
🔍 Flexible search	Keyword search, groupId filter, or both combined
📄 Clean JSON output	Ready to load into BigQuery, Postgres, or any BI tool
🔢 Version history	`versionCount` reveals how actively a library is maintained
🕒 Timestamps	`lastUpdated` shows exactly when each artifact was last published
📦 All packaging types	Covers jar, pom, bundle, aar, and all other Maven packaging formats
🌐 Full catalog	Access to 500,000+ artifacts across the entire JVM ecosystem

📈 How it compares to alternatives

Approach	Setup	Maintenance	Data freshness
☕ This Actor	Zero - click Run	Automatic	Real-time from Solr
Manual curl scripts	Write pagination logic	Break on API changes	Manual refresh
Maven REST API	Custom client code	Maintain yourself	Manual refresh
Third-party package DBs	Varies	Varies	Often delayed

🚀 How to use

1. Create a free account with $5 credit on Apify.
2. Open the Maven Central Scraper Actor page.
3. Enter a searchQuery (e.g. spring) or groupId (e.g. org.springframework).
4. Set maxItems - the free plan allows up to 10, paid plans up to 1,000,000.
5. Click Run and wait a few seconds.
6. Download your dataset as JSON, CSV, Excel, or XML.

💼 Business use cases

Dependency auditing and security

Security and DevOps teams use Maven Central data to audit which versions of a library are published, identify abandoned packages (low versionCount or old lastUpdated), and flag dependencies that have not received updates in over 12 months.
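As one way to apply the 12-month rule to a downloaded dataset, the sketch below flags records whose `lastUpdated` is older than a cutoff. The function name `find_stale` and the 365-day default are illustrative assumptions; tune the threshold to your own policy.

```python
from datetime import datetime, timedelta, timezone


def find_stale(records, now=None, max_age_days=365):
    """Return records whose lastUpdated is older than max_age_days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for record in records:
        # Swap the trailing "Z" for an explicit offset so fromisoformat accepts it.
        last = datetime.fromisoformat(record["lastUpdated"].replace("Z", "+00:00"))
        if last < cutoff:
            stale.append(record)
    return stale
```

Passing an explicit `now` keeps audits reproducible, which helps when comparing reports generated on different days.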

Supply chain analysis

Engineering leads can inventory the published artifacts behind a technology stack by searching groupIds like org.springframework, com.google.guava, or io.netty. The output lists artifacts rather than dependency edges, but knowing how many versions exist and when they were last updated surfaces maintenance risk before it becomes a production incident.

Open-source ecosystem research

Academic researchers and analyst firms use Maven Central data to study Java ecosystem trends - which frameworks are gaining adoption, which are declining, and how publishing velocity correlates with community health metrics.

Build tool and IDE integration

Platform teams can feed Maven Central metadata into internal developer portals, Renovate/Dependabot dashboards, or package recommendation engines to help developers discover well-maintained alternatives to deprecated libraries.

🔌 Automating Maven Central Scraper

Connect this Actor to thousands of tools without writing code:

Make (Integromat) - Trigger a run on schedule, send results to Google Sheets or Slack.
Zapier - Connect to your ticketing system or internal wiki whenever new packages are found.
Slack - Pipe weekly dependency audit summaries directly to your engineering channel.
Apify Scheduler - Run daily or weekly to track ecosystem changes over time.
Webhooks - POST results to any internal endpoint when the run completes.

🌟 Beyond business use cases

Research and academia

Study how the JVM ecosystem has evolved by tracking version counts and publication timestamps across thousands of libraries. Identify periods of high activity, framework migrations, and the lifecycle of open-source Java projects.

Non-profit and open source

Open-source maintainers can audit competing implementations of a library concept, identify gaps in the ecosystem, and ensure their artifact metadata is consistent with similar packages.

Education and training

Java instructors and bootcamp curricula can use real Maven Central data to teach students about dependency management, semantic versioning, and the structure of the JVM package ecosystem without setting up a local Maven repository.

Experimentation and prototyping

Data engineers can prototype dependency graph visualizations, build custom package search UIs, or create internal developer tooling using live Maven Central data without building a Solr integration from scratch.

🤖 Ask an AI assistant about this scraper

Not sure which inputs to use? Paste this into any AI assistant:

"I want to scrape Maven Central packages using the ParseForge Maven Central Scraper on Apify. The inputs are: searchQuery (keyword), groupId (Maven group filter), and maxItems. Help me build an input for [your use case]."

❓ Frequently Asked Questions

❓ Do I need a Maven account or API key? No. Maven Central's search API is fully public. This Actor works without any credentials.

❓ How many packages does Maven Central have? Maven Central hosts over 500,000 unique artifacts from more than 50,000 groupIds. The index is updated in near-real-time as new versions are published.

❓ What does versionCount mean? It is the total number of distinct versions published for that artifact in Maven Central, including all historical releases.

❓ Can I filter by packaging type (jar vs pom)? The current version returns all packaging types. The packaging field in the output lets you filter results client-side after downloading.
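A client-side filter of this kind takes only a few lines. The helper below is a hypothetical sketch that works on the list of records loaded from a JSON export.

```python
def filter_by_packaging(records, allowed=("jar",)):
    """Keep only records whose packaging is in the allowed set."""
    wanted = {p.lower() for p in allowed}
    # Missing packaging values are treated as an empty string and dropped.
    return [r for r in records if str(r.get("packaging", "")).lower() in wanted]
```

For example, `filter_by_packaging(records, allowed=("jar", "aar"))` keeps library artifacts and drops pom-only entries.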

❓ How fresh is the data? Results come directly from Maven Central's live Solr index. The lastUpdated field reflects the exact timestamp of the most recently published version.

❓ Can I scrape an entire groupId like org.springframework? Yes. Set groupId to org.springframework (or any other group) and set maxItems to however many you need. The scraper will paginate through all matching artifacts.

❓ What is the difference between groupId and searchQuery? groupId is an exact Maven namespace match (e.g. only artifacts in com.fasterxml.jackson). searchQuery is a full-text keyword search across artifact names and descriptions. You can use both together for narrower results.

❓ Does this scrape private repositories? No. This Actor only queries Maven Central's public index. Private Nexus, Artifactory, or GitHub Packages repositories are not accessible.

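❓ How do I get all packages from a specific organization? Use the groupId field with the organization's Maven namespace. For example, com.google.guava returns all Guava artifacts. Because groupId is an exact match, com.google returns only artifacts published under that exact groupId, so to cover a whole organization, query each of its sub-namespaces (com.google.guava, com.google.inject, and so on) separately.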

❓ Is there a rate limit? Maven Central's public API is rate-limited per IP. This Actor includes a 300ms delay between pages to stay within polite limits. For large extractions, paid Apify plans include proxy rotation that further reduces throttling risk.
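To illustrate the idea of polite paging, here is a sketch of a paginator that pauses between pages. It is not the Actor's code: `paginate` and its `fetch_page(start, rows)` callback are hypothetical, the page size of 100 is an assumption, and only the 300 ms delay comes from the answer above.

```python
import time
from typing import Callable, Iterator


def paginate(
    fetch_page: Callable[[int, int], list],
    max_items: int,
    page_size: int = 100,
    delay_seconds: float = 0.3,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield up to max_items results, sleeping between page requests."""
    start = 0
    yielded = 0
    while yielded < max_items:
        rows = min(page_size, max_items - yielded)
        # Trim defensively in case the callback returns more than requested.
        page = fetch_page(start, rows)[:rows]
        if not page:
            return
        for doc in page:
            yield doc
            yielded += 1
        start += len(page)
        if len(page) < rows:
            return  # short page: no more matches in the index
        if yielded < max_items:
            sleep(delay_seconds)
```

The `sleep` parameter is injectable so the loop can be tested without real waiting; in normal use the default `time.sleep` applies.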

❓ Can I schedule regular runs to track ecosystem changes? Yes. Use Apify Scheduler to run this Actor daily or weekly and compare datasets over time to track new releases, abandoned packages, and version trends.
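Comparing two scheduled runs can be as simple as keying records by their coordinates. The sketch below assumes both datasets were exported as lists of records with the fields documented above; `diff_snapshots` is a hypothetical helper.

```python
def diff_snapshots(previous, current):
    """Return (new_artifacts, new_releases) between two dataset exports.

    new_releases holds (old_version, record) pairs for artifacts whose
    latestVersion changed since the previous run.
    """
    def key(record):
        return (record["groupId"], record["artifactId"])

    old = {key(r): r for r in previous}
    new_artifacts, new_releases = [], []
    for record in current:
        before = old.get(key(record))
        if before is None:
            new_artifacts.append(record)
        elif before["latestVersion"] != record["latestVersion"]:
            new_releases.append((before["latestVersion"], record))
    return new_artifacts, new_releases
```

Feeding the result into a Slack message or a spreadsheet gives a lightweight release tracker without extra infrastructure.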

🔌 Integrate with any app

Download your dataset in any format and connect to:

JSON - CSV - Excel - XML - Google Sheets - BigQuery - Snowflake - PostgreSQL - MySQL - MongoDB - Airtable - Notion - Zapier - Make - Slack - Microsoft Teams - Power BI - Tableau - Looker - dbt - Airflow - Prefect

🔗 Recommended Actors

Actor	Description
PyPI Scraper	Scrape Python packages from the PyPI registry
NPM Registry Scraper	Extract JavaScript packages from the npm registry
NuGet Scraper	Download .NET package metadata from NuGet.org
Crates.io Scraper	Scrape Rust packages from crates.io
GitHub Trending Scraper	Track trending repositories across programming languages

💡 Pro Tip: browse the complete ParseForge collection for scrapers covering package registries, developer tools, job boards, and public datasets.

This Actor queries Maven Central's public search API. All data is publicly available at search.maven.org. This tool is intended for lawful research, analysis, and development purposes only.
