Maven Central Scraper | Java Package Metadata
Pricing
from $19.00 / 1,000 results
Extract Java and Kotlin artifacts from Maven Central including group ID, artifact ID, version history, dependencies, publisher, packaging, and license info. Audit JVM dependencies, track ecosystem trends, or feed developer security, SBOM, and intelligence tools at scale.
Developer
ParseForge
Maven Central Repository Scraper
Export Java and JVM packages from Maven Central in seconds. Search by keyword, filter by groupId, and download artifact metadata - groupId, artifactId, version, packaging, version history, and timestamps - without touching a browser or writing a single parser.
Last updated: 2026-05-21 · 9 fields per record · 500,000+ artifacts · Maven Central · No auth required
The Maven Central Scraper queries the official Maven Central Solr search API and returns structured metadata for every matching Java and JVM package. Each record includes the full Maven coordinates (groupId, artifactId, latestVersion), packaging type, version count, repository origin, and last-updated timestamp.
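The mapping from a raw search document to the flat record described above can be sketched as follows. This is a minimal illustration, not the Actor's actual code: it assumes the public Solr response uses the short field names `g`, `a`, `p`, `latestVersion`, `versionCount`, `repositoryId`, and an epoch-millisecond `timestamp`, and the helper names `to_record` and `_iso` are hypothetical.

```python
from datetime import datetime, timezone


def _iso(dt: datetime) -> str:
    """Format a datetime as ISO 8601 in UTC with millisecond precision and a Z suffix."""
    return dt.astimezone(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")


def to_record(doc: dict, scraped_at: datetime) -> dict:
    """Map one Solr document (assumed field names) to the 9-field output shape."""
    group_id = doc["g"]
    artifact_id = doc["a"]
    ts_ms = doc.get("timestamp")  # assumed to be epoch milliseconds
    last_updated = None
    if ts_ms is not None:
        last_updated = _iso(datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc))
    return {
        "groupId": group_id,
        "artifactId": artifact_id,
        "url": f"https://search.maven.org/artifact/{group_id}/{artifact_id}",
        "latestVersion": doc.get("latestVersion"),
        "packaging": doc.get("p"),
        "versionCount": doc.get("versionCount"),
        "repositoryId": doc.get("repositoryId"),
        "lastUpdated": last_updated,
        "scrapedAt": _iso(scraped_at),
    }
```

Missing optional fields map to `None` rather than raising, so a sparse document still produces a complete row.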
Maven Central is the primary public repository for JVM ecosystem packages - the definitive source for Spring, Jackson, Hibernate, Guava, Apache Commons, and hundreds of thousands of other open-source libraries. This Actor makes the entire catalog searchable and downloadable as JSON, CSV, or Excel without any setup.
| Target Audience | Primary Use Cases |
|---|---|
| Java developers, security teams, DevOps engineers, data analysts, OSS researchers, enterprise architects | Dependency auditing, supply chain analysis, ecosystem research, package discovery, build tool integration, license compliance |
What the Maven Central Scraper does
Five search workflows in a single run:
- Keyword search. Find all packages matching a library name, technology, or concept (e.g. spring, jackson, logging, kafka).
- GroupId filter. Restrict to a specific Maven group like org.springframework, com.fasterxml.jackson, or org.apache.commons.
- Combined search. Search by keyword within a specific groupId for targeted results.
- Full catalog browse. Leave both fields empty to iterate through Maven Central's entire public artifact index.
- Version history. Every record includes versionCount - the total number of published versions for that artifact.
Each record includes Maven coordinates, latest version, packaging type (jar, pom, bundle, aar), version count, repository ID, and the ISO timestamp of the last published version.
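The workflows above reduce to building one Solr query string from two optional inputs. The sketch below shows one plausible way to do that; it assumes the public endpoint `https://search.maven.org/solrsearch/select` with the `g:` field prefix for groupId, and it assumes the match-all query `*:*` is accepted for a full catalog browse. The function names are hypothetical and the Actor's real query construction may differ.

```python
from urllib.parse import urlencode

# Public Maven Central search endpoint (assumed to be what the Actor queries).
SEARCH_ENDPOINT = "https://search.maven.org/solrsearch/select"


def build_query(search_query: str = "", group_id: str = "") -> str:
    """Combine the keyword and groupId inputs into a single Solr q expression."""
    terms = []
    if search_query.strip():
        terms.append(search_query.strip())
    if group_id.strip():
        terms.append(f"g:{group_id.strip()}")
    # Both inputs empty: fall back to Solr's match-all query (assumption).
    return " AND ".join(terms) if terms else "*:*"


def build_url(search_query: str = "", group_id: str = "", rows: int = 20, start: int = 0) -> str:
    """Build the full request URL for one page of results."""
    params = {
        "q": build_query(search_query, group_id),
        "rows": rows,
        "start": start,
        "wt": "json",
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"
```

For example, `build_query("spring", "org.springframework")` yields a combined search, while calling it with no arguments yields the browse-all query.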
Why it matters: auditing your dependency tree means knowing what is actually published, when it was last updated, and how actively it is maintained. Building this yourself means maintaining a Solr client, handling pagination edge cases, and refreshing by hand. This Actor skips all of that.
Full Demo
Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.
Input
| Input | Type | Default | Behavior |
|---|---|---|---|
| searchQuery | string | "" | Keyword search (e.g. "spring", "jackson", "logging"). Empty = browse all packages. |
| groupId | string | "" | Filter by Maven groupId (e.g. "org.springframework"). Empty = all groups. |
| maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000. |
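The defaults and caps in the table above can be expressed as a small normalization step. This is only a sketch of the documented behavior: the `paid` flag and the function name `normalize_input` are hypothetical, and how the Actor actually detects the plan is not stated here.

```python
FREE_PLAN_CAP = 10           # from the table: free plan caps at 10
PAID_PLAN_CAP = 1_000_000    # from the table: paid plan caps at 1,000,000


def normalize_input(raw: dict, paid: bool = False) -> dict:
    """Apply the documented defaults and clamp maxItems to the plan cap."""
    max_items = raw.get("maxItems", 10)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    cap = PAID_PLAN_CAP if paid else FREE_PLAN_CAP
    return {
        "searchQuery": str(raw.get("searchQuery", "")).strip(),
        "groupId": str(raw.get("groupId", "")).strip(),
        "maxItems": min(max_items, cap),
    }
```

Under these assumptions, a request for 100 items returns 10 on the free plan and 100 on a paid plan.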
Example: all Spring Framework packages.
{"groupId": "org.springframework","maxItems": 100}
Example: top Jackson packages.
{"searchQuery": "jackson","maxItems": 50}
Good to Know: Maven Central's Solr index is updated continuously as new versions are published. versionCount reflects all historically published versions, including snapshots promoted to release. The latestVersion field shows the most recently published version at the time of scraping, which may be a pre-release (for example 4.7.0-beta.1 in the sample output).
Output
Each record contains 9 data fields (the sample output also carries an error field, which is null in the records shown):
| Field | Type | Description |
|---|---|---|
| groupId | string | Maven groupId (e.g. org.springframework) |
| artifactId | string | Maven artifactId (e.g. spring-core) |
| url | string | Direct link to the artifact on search.maven.org |
| latestVersion | string | Most recently published version |
| packaging | string | Artifact type: jar, pom, bundle, aar, etc. |
| versionCount | integer | Total number of published versions |
| repositoryId | string | Repository origin (almost always central) |
| lastUpdated | string | ISO 8601 timestamp of the last published version |
| scrapedAt | string | ISO 8601 timestamp when the record was collected |
Sample records (real output):
[
  {
    "groupId": "org.apache.karaf.features",
    "artifactId": "spring",
    "url": "https://search.maven.org/artifact/org.apache.karaf.features/spring",
    "latestVersion": "4.4.11",
    "packaging": "pom",
    "versionCount": 77,
    "repositoryId": "central",
    "lastUpdated": "2026-04-27T13:48:38.000Z",
    "scrapedAt": "2026-05-22T01:45:56.208Z",
    "error": null
  },
  {
    "groupId": "au.com.dius.pact.provider",
    "artifactId": "spring",
    "url": "https://search.maven.org/artifact/au.com.dius.pact.provider/spring",
    "latestVersion": "4.7.0-beta.1",
    "packaging": "jar",
    "versionCount": 152,
    "repositoryId": "central",
    "lastUpdated": "2025-05-22T23:43:20.428Z",
    "scrapedAt": "2026-05-22T01:45:56.208Z",
    "error": null
  },
  {
    "groupId": "community.flock.wirespec.integration",
    "artifactId": "spring",
    "url": "https://search.maven.org/artifact/community.flock.wirespec.integration/spring",
    "latestVersion": "0.14.11",
    "packaging": "jar",
    "versionCount": 22,
    "repositoryId": "central",
    "lastUpdated": "2025-05-09T07:17:19.000Z",
    "scrapedAt": "2026-05-22T01:45:56.208Z",
    "error": null
  }
]
Why choose this Actor
| Feature | Benefit |
|---|---|
| No auth required | Works out of the box - no API keys, no Maven account needed |
| Direct Solr API | Queries Maven Central's native search engine for fast, accurate results |
| Flexible search | Keyword search, groupId filter, or both combined |
| Clean JSON output | Ready to load into BigQuery, Postgres, or any BI tool |
| Version history | versionCount reveals how actively a library is maintained |
| Timestamps | lastUpdated shows exactly when each artifact was last published |
| All packaging types | Covers jar, pom, bundle, aar, and all other Maven packaging formats |
| Full catalog | Access to 500,000+ artifacts across the entire JVM ecosystem |
How it compares to alternatives
| Approach | Setup | Maintenance | Data freshness |
|---|---|---|---|
| This Actor | Zero - click Run | Automatic | Real-time from Solr |
| Manual curl scripts | Write pagination logic | Breaks on API changes | Manual refresh |
| Maven REST API | Custom client code | Maintain yourself | Manual refresh |
| Third-party package DBs | Varies | Varies | Often delayed |
How to use
- Create a free account with $5 credit on Apify.
- Open the Maven Central Scraper Actor page.
- Enter a searchQuery (e.g. spring) or groupId (e.g. org.springframework).
- Set maxItems - free plan gives 10, paid plan up to 1,000,000.
- Click Run and wait a few seconds.
- Download your dataset as JSON, CSV, Excel, or XML.
Business use cases
Dependency auditing and security
Security and DevOps teams use Maven Central data to audit which versions of a library are published, identify abandoned packages (low versionCount or old lastUpdated), and flag dependencies that have not received updates in over 12 months.
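The audit described above can be sketched as a small post-processing step over a downloaded dataset. The 12-month window comes from the text; the `min_versions` threshold of 3 and the function names are hypothetical choices for illustration, so tune them to your own policy.

```python
from datetime import datetime, timedelta, timezone


def parse_iso(ts: str) -> datetime:
    """Parse timestamps like 2026-04-27T13:48:38.000Z into an aware datetime."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def flag_stale(records, now, max_age_days=365, min_versions=3):
    """Return (coordinate, reasons) pairs for records that look abandoned."""
    cutoff = now - timedelta(days=max_age_days)
    flagged = []
    for rec in records:
        reasons = []
        if parse_iso(rec["lastUpdated"]) < cutoff:
            reasons.append("stale")          # no release inside the window
        if rec["versionCount"] < min_versions:
            reasons.append("few-versions")   # very short release history
        if reasons:
            flagged.append((f'{rec["groupId"]}:{rec["artifactId"]}', reasons))
    return flagged
```

Passing `now` explicitly keeps the check reproducible when you re-run an audit against an older export.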
Supply chain analysis
Engineering leads can map the entire dependency graph of a technology stack by searching groupIds like org.springframework, com.google.guava, or io.netty. Knowing how many versions exist and when they were last updated surfaces maintenance risk before it becomes a production incident.
Open-source ecosystem research
Academic researchers and analyst firms use Maven Central data to study Java ecosystem trends - which frameworks are gaining adoption, which are declining, and how publishing velocity correlates with community health metrics.
Build tool and IDE integration
Platform teams can feed Maven Central metadata into internal developer portals, Renovate/Dependabot dashboards, or package recommendation engines to help developers discover well-maintained alternatives to deprecated libraries.
Automating Maven Central Scraper
Connect this Actor to thousands of tools without writing code:
- Make (Integromat) - Trigger a run on schedule, send results to Google Sheets or Slack.
- Zapier - Connect to your ticketing system or internal wiki whenever new packages are found.
- Slack - Pipe weekly dependency audit summaries directly to your engineering channel.
- Apify Scheduler - Run daily or weekly to track ecosystem changes over time.
- Webhooks - POST results to any internal endpoint when the run completes.
Beyond business use cases
Research and academia
Study how the JVM ecosystem has evolved by tracking version counts and publication timestamps across thousands of libraries. Identify periods of high activity, framework migrations, and the lifecycle of open-source Java projects.
Non-profit and open source
Open-source maintainers can audit competing implementations of a library concept, identify gaps in the ecosystem, and ensure their artifact metadata is consistent with similar packages.
Education and training
Java instructors and bootcamp curricula can use real Maven Central data to teach students about dependency management, semantic versioning, and the structure of the JVM package ecosystem without setting up a local Maven repository.
Experimentation and prototyping
Data engineers can prototype dependency graph visualizations, build custom package search UIs, or create internal developer tooling using live Maven Central data without building a Solr integration from scratch.
Ask an AI assistant about this scraper
Not sure which inputs to use? Paste this into any AI assistant:
"I want to scrape Maven Central packages using the ParseForge Maven Central Scraper on Apify. The inputs are: searchQuery (keyword), groupId (Maven group filter), and maxItems. Help me build an input for [your use case]."
Frequently Asked Questions
Do I need a Maven account or API key?
No. Maven Central's search API is fully public. This Actor works without any credentials.
How many packages does Maven Central have?
Maven Central hosts over 500,000 unique artifacts from more than 50,000 groupIds. The index is updated in near-real-time as new versions are published.
What does versionCount mean?
It is the total number of distinct versions published for that artifact in Maven Central, including all historical releases.
Can I filter by packaging type (jar vs pom)?
The current version returns all packaging types. The packaging field in the output lets you filter results client-side after downloading.
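A client-side packaging filter over a downloaded dataset might look like the sketch below. It only relies on the packaging field from the output table; the function names are hypothetical.

```python
from collections import Counter


def filter_by_packaging(records, allowed):
    """Keep only records whose packaging is in the allowed set (case-insensitive)."""
    allowed_set = {p.lower() for p in allowed}
    return [r for r in records if (r.get("packaging") or "").lower() in allowed_set]


def packaging_counts(records):
    """Count records per packaging type; missing values are grouped as 'unknown'."""
    return Counter((r.get("packaging") or "unknown").lower() for r in records)
```

Running `packaging_counts` first is a quick way to see which packaging values actually occur before deciding what to keep.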
How fresh is the data?
Results come directly from Maven Central's live Solr index. The lastUpdated field reflects the exact timestamp of the most recently published version.
Can I scrape an entire groupId like org.springframework?
Yes. Set groupId to org.springframework (or any other group) and set maxItems to however many you need. The scraper will paginate through all matching artifacts.
What is the difference between groupId and searchQuery?
groupId is an exact Maven namespace match (e.g. only artifacts in com.fasterxml.jackson). searchQuery is a full-text keyword search across artifact names and descriptions. You can use both together for narrower results.
Does this scrape private repositories?
No. This Actor only queries Maven Central's public index. Private Nexus, Artifactory, or GitHub Packages repositories are not accessible.
How do I get all packages from a specific organization?
Use the groupId field with the organization's Maven namespace. For example, com.google.guava returns all Guava artifacts, while com.google would return all Google-published artifacts.
Is there a rate limit?
Maven Central's public API is rate-limited per IP. This Actor includes a 300ms delay between pages to stay within polite limits. For large extractions, paid Apify plans include proxy rotation that further reduces throttling risk.
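Pagination with a fixed pause between pages can be sketched as a generator. This is an illustration under assumptions, not the Actor's implementation: it assumes Solr-style offset paging (`start` and `rows`), and `fetch_page` is a hypothetical callable you supply that returns the page's documents plus the total match count. The 0.3 second default mirrors the 300ms delay mentioned above.

```python
import time


def paginate(fetch_page, max_items, page_size=100, delay_s=0.3, sleep=time.sleep):
    """Yield up to max_items documents, pausing delay_s seconds between pages.

    fetch_page(start, rows) must return (docs, num_found).
    """
    start = 0
    yielded = 0
    while yielded < max_items:
        rows = min(page_size, max_items - yielded)
        docs, num_found = fetch_page(start, rows)
        docs = docs[:rows]  # never hand out more than was requested
        if not docs:
            break
        for doc in docs:
            yield doc
            yielded += 1
        start += len(docs)
        if start >= num_found or yielded >= max_items:
            break
        sleep(delay_s)  # polite pause before requesting the next page
```

Injecting `sleep` makes the loop easy to exercise in tests without actually waiting.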
Can I schedule regular runs to track ecosystem changes?
Yes. Use Apify Scheduler to run this Actor daily or weekly and compare datasets over time to track new releases, abandoned packages, and version trends.
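Comparing two scheduled runs can be sketched as a keyed diff over the exported records. The sketch assumes each record is uniquely identified by its groupId and artifactId and treats a changed latestVersion as a new release; the function names are hypothetical.

```python
def coordinate(rec):
    """Identify an artifact by its (groupId, artifactId) pair."""
    return (rec["groupId"], rec["artifactId"])


def diff_snapshots(old, new):
    """Compare two datasets and report added, removed, and updated artifacts."""
    old_by = {coordinate(r): r for r in old}
    new_by = {coordinate(r): r for r in new}
    added = sorted(k for k in new_by if k not in old_by)
    removed = sorted(k for k in old_by if k not in new_by)
    updated = sorted(
        k for k in new_by
        if k in old_by and new_by[k]["latestVersion"] != old_by[k]["latestVersion"]
    )
    return {"added": added, "removed": removed, "updated": updated}
```

An artifact that shows up as removed may simply have fallen outside the maxItems limit of the later run, so compare runs that use the same inputs.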
Integrate with any app
Download your dataset in any format and connect to:
JSON - CSV - Excel - XML - Google Sheets - BigQuery - Snowflake - PostgreSQL - MySQL - MongoDB - Airtable - Notion - Zapier - Make - Slack - Microsoft Teams - Power BI - Tableau - Looker - dbt - Airflow - Prefect
Recommended Actors
| Actor | Description |
|---|---|
| PyPI Scraper | Scrape Python packages from the PyPI registry |
| NPM Registry Scraper | Extract JavaScript packages from the npm registry |
| NuGet Scraper | Download .NET package metadata from NuGet.org |
| Crates.io Scraper | Scrape Rust packages from crates.io |
| GitHub Trending Scraper | Track trending repositories across programming languages |
Pro Tip: browse the complete ParseForge collection for scrapers covering package registries, developer tools, job boards, and public datasets.
This Actor queries Maven Central's public search API. All data is publicly available at search.maven.org. This tool is intended for lawful research, analysis, and development purposes only.