Maven Central Scraper | Java Package Metadata

Pricing

from $19.00 / 1,000 results

Extract Java and Kotlin artifact metadata from Maven Central, including group ID, artifact ID, latest version, version count, packaging, repository, and last-updated timestamp. Audit JVM dependencies, track ecosystem trends, or feed developer security, SBOM, and intelligence tools at scale.

Developer

ParseForge

Maintained by Community

☕ Maven Central Repository Scraper

🚀 Export Java and JVM packages from Maven Central in seconds. Search by keyword, filter by groupId, and download artifact metadata - groupId, artifactId, latest version, packaging, version count, and timestamps - without touching a browser or writing a single parser.

🕒 Last updated: 2026-05-21 · 📊 9 fields per record · ☕ 500,000+ artifacts · 🌐 Maven Central · 🔓 No auth required

The Maven Central Scraper queries the official Maven Central Solr search API and returns structured metadata for every matching Java and JVM package. Each record includes the full Maven coordinates (groupId, artifactId, latestVersion), packaging type, version count, repository origin, and last-updated timestamp.

Maven Central is the primary public repository for JVM ecosystem packages - the definitive source for Spring, Jackson, Hibernate, Guava, Apache Commons, and hundreds of thousands of other open-source libraries. This Actor makes the entire catalog searchable and downloadable as JSON, CSV, or Excel without any setup.

| 🎯 Target Audience | 💡 Primary Use Cases |
| --- | --- |
| Java developers, security teams, DevOps engineers, data analysts, OSS researchers, enterprise architects | Dependency auditing, supply chain analysis, ecosystem research, package discovery, build tool integration, license compliance |

📋 What the Maven Central Scraper does

Four search modes in a single Actor, plus version data on every record:

  • 🔍 Keyword search. Find all packages matching a library name, technology, or concept (e.g. spring, jackson, logging, kafka).
  • 🏷 GroupId filter. Restrict to a specific Maven group like org.springframework, com.fasterxml.jackson, or org.apache.commons.
  • 🔀 Combined search. Search by keyword within a specific groupId for targeted results.
  • 📦 Full catalog browse. Leave both fields empty to iterate through Maven Central's entire public artifact index.
  • 📊 Version count. Every record includes versionCount - the total number of published versions for that artifact.

Each record includes Maven coordinates, latest version, packaging type (jar, pom, bundle, aar), version count, repository ID, and the ISO timestamp of the last published version.
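As a rough illustration of how the search modes above could map onto a Solr query, the sketch below builds a request URL for Maven Central's public search endpoint. The endpoint path and the g: field prefix come from Maven Central's published search API. How this Actor composes its own queries is not documented here, so build_search_url and its parameters are hypothetical, and the *:* match-all query for catalog browsing is an assumption.

```python
from urllib.parse import urlencode

# Public search endpoint described in Maven Central's REST API guide.
SEARCH_ENDPOINT = "https://search.maven.org/solrsearch/select"


def build_search_url(search_query="", group_id="", rows=20, start=0):
    """Build a Solr search URL (hypothetical helper, not the Actor's own code)."""
    clauses = []
    if search_query:
        clauses.append(search_query)
    if group_id:
        # The g: prefix restricts results to a single groupId.
        clauses.append(f'g:"{group_id}"')
    # Assumption: with no filters, the Solr match-all query browses the catalog.
    q = " AND ".join(clauses) if clauses else "*:*"
    params = {"q": q, "rows": rows, "start": start, "wt": "json"}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


# Combined search: keyword plus groupId filter.
print(build_search_url(search_query="jackson", group_id="com.fasterxml.jackson.core"))
```

The start and rows parameters are what a paginating client would advance from page to page.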

💡 Why it matters: auditing your dependency tree means knowing what is actually published, when it was last updated, and how actively it is maintained. Building this yourself means maintaining a Solr client, handling pagination edge cases, and refreshing by hand. This Actor skips all of that.


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.


⚙️ Input

| Input | Type | Default | Behavior |
| --- | --- | --- | --- |
| searchQuery | string | "" | Keyword search (e.g. "spring", "jackson", "logging"). Empty = browse all packages. |
| groupId | string | "" | Filter by Maven groupId (e.g. "org.springframework"). Empty = all groups. |
| maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000. |

Example: all Spring Framework packages.

{
  "groupId": "org.springframework",
  "maxItems": 100
}

Example: top Jackson packages.

{
  "searchQuery": "jackson",
  "maxItems": 50
}
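If inputs are generated by a script rather than typed by hand, a small helper can apply the defaults and plan caps from the table above before a run is started. This is only a sketch: build_actor_input is a hypothetical name, and clamping maxItems locally is an assumption, since the page does not say how the Actor itself reacts to a value above the cap.

```python
import json

FREE_PLAN_CAP = 10          # from the input table above
PAID_PLAN_CAP = 1_000_000   # from the input table above


def build_actor_input(search_query="", group_id="", max_items=10, paid_plan=False):
    """Return an input dict using the documented field names (hypothetical helper)."""
    if not isinstance(max_items, int) or max_items < 1:
        raise ValueError("max_items must be a positive integer")
    cap = PAID_PLAN_CAP if paid_plan else FREE_PLAN_CAP
    # Assumption: clamp locally instead of letting the run decide.
    actor_input = {"maxItems": min(max_items, cap)}
    # Empty strings mean "no filter", so they are simply left out.
    if search_query:
        actor_input["searchQuery"] = search_query
    if group_id:
        actor_input["groupId"] = group_id
    return actor_input


print(json.dumps(build_actor_input(group_id="org.springframework",
                                   max_items=100, paid_plan=True), indent=2))
```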

⚠️ Good to Know: Maven Central's Solr index is updated continuously as new versions are published. versionCount reflects all historically published versions including snapshots promoted to release. The latestVersion field shows the most recently published version at the time of scraping, which may be a pre-release (the sample output includes 4.7.0-beta.1).


📊 Output

Each record contains 9 fields. The sample records further down also carry an error field, which is null in every example shown.

| Field | Type | Description |
| --- | --- | --- |
| 🏷 groupId | string | Maven groupId (e.g. org.springframework) |
| 📦 artifactId | string | Maven artifactId (e.g. spring-core) |
| 🔗 url | string | Direct link to the artifact on search.maven.org |
| 🏷 latestVersion | string | Most recently published version |
| 📁 packaging | string | Artifact type: jar, pom, bundle, aar, etc. |
| 🔢 versionCount | integer | Total number of published versions |
| 🗄 repositoryId | string | Repository origin (almost always central) |
| 📅 lastUpdated | string | ISO 8601 timestamp of the last published version |
| 🕒 scrapedAt | string | ISO 8601 timestamp when the record was collected |

Sample records (real output):

[
  {
    "groupId": "org.apache.karaf.features",
    "artifactId": "spring",
    "url": "https://search.maven.org/artifact/org.apache.karaf.features/spring",
    "latestVersion": "4.4.11",
    "packaging": "pom",
    "versionCount": 77,
    "repositoryId": "central",
    "lastUpdated": "2026-04-27T13:48:38.000Z",
    "scrapedAt": "2026-05-22T01:45:56.208Z",
    "error": null
  },
  {
    "groupId": "au.com.dius.pact.provider",
    "artifactId": "spring",
    "url": "https://search.maven.org/artifact/au.com.dius.pact.provider/spring",
    "latestVersion": "4.7.0-beta.1",
    "packaging": "jar",
    "versionCount": 152,
    "repositoryId": "central",
    "lastUpdated": "2025-05-22T23:43:20.428Z",
    "scrapedAt": "2026-05-22T01:45:56.208Z",
    "error": null
  },
  {
    "groupId": "community.flock.wirespec.integration",
    "artifactId": "spring",
    "url": "https://search.maven.org/artifact/community.flock.wirespec.integration/spring",
    "latestVersion": "0.14.11",
    "packaging": "jar",
    "versionCount": 22,
    "repositoryId": "central",
    "lastUpdated": "2025-05-09T07:17:19.000Z",
    "scrapedAt": "2026-05-22T01:45:56.208Z",
    "error": null
  }
]
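For downstream code it can help to load records like these into a typed structure. The sketch below is one possible shape, assuming the nine documented field names. MavenArtifact, from_record, and parse_iso are hypothetical names, and the error field is deliberately ignored because its semantics are not described on this page.

```python
from dataclasses import dataclass
from datetime import datetime


def parse_iso(ts):
    # Swap the trailing "Z" for an explicit offset so older Python versions can parse it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


@dataclass(frozen=True)
class MavenArtifact:
    group_id: str
    artifact_id: str
    url: str
    latest_version: str
    packaging: str
    version_count: int
    repository_id: str
    last_updated: datetime
    scraped_at: datetime

    @classmethod
    def from_record(cls, record):
        """Map one output record (a dict) onto the dataclass."""
        return cls(
            group_id=record["groupId"],
            artifact_id=record["artifactId"],
            url=record["url"],
            latest_version=record["latestVersion"],
            packaging=record["packaging"],
            version_count=int(record["versionCount"]),
            repository_id=record["repositoryId"],
            last_updated=parse_iso(record["lastUpdated"]),
            scraped_at=parse_iso(record["scrapedAt"]),
        )

    @property
    def coordinates(self):
        return f"{self.group_id}:{self.artifact_id}:{self.latest_version}"


# First record from the sample output above.
sample = {
    "groupId": "org.apache.karaf.features",
    "artifactId": "spring",
    "url": "https://search.maven.org/artifact/org.apache.karaf.features/spring",
    "latestVersion": "4.4.11",
    "packaging": "pom",
    "versionCount": 77,
    "repositoryId": "central",
    "lastUpdated": "2026-04-27T13:48:38.000Z",
    "scrapedAt": "2026-05-22T01:45:56.208Z",
    "error": None,
}
artifact = MavenArtifact.from_record(sample)
print(artifact.coordinates, artifact.last_updated.date())
```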

✨ Why choose this Actor

| Feature | Benefit |
| --- | --- |
| 🔓 No auth required | Works out of the box - no API keys, no Maven account needed |
| ⚡ Direct Solr API | Queries Maven Central's native search engine for fast, accurate results |
| 🔍 Flexible search | Keyword search, groupId filter, or both combined |
| 📄 Clean JSON output | Ready to load into BigQuery, Postgres, or any BI tool |
| 🔢 Version count | versionCount gives a rough signal of how actively a library is maintained |
| 🕒 Timestamps | lastUpdated shows exactly when each artifact was last published |
| 📦 All packaging types | Covers jar, pom, bundle, aar, and all other Maven packaging formats |
| 🌐 Full catalog | Access to 500,000+ artifacts across the entire JVM ecosystem |

📈 How it compares to alternatives

| Approach | Setup | Maintenance | Data freshness |
| --- | --- | --- | --- |
| ☕ This Actor | Zero - click Run | Automatic | Real-time from Solr |
| Manual curl scripts | Write pagination logic | Break on API changes | Manual refresh |
| Maven REST API | Custom client code | Maintain yourself | Manual refresh |
| Third-party package DBs | Varies | Varies | Often delayed |

🚀 How to use

  1. Create a free account with $5 credit on Apify.
  2. Open the Maven Central Scraper Actor page.
  3. Enter a searchQuery (e.g. spring) or groupId (e.g. org.springframework).
  4. Set maxItems - free plan gives 10, paid plan up to 1,000,000.
  5. Click Run and wait a few seconds.
  6. Download your dataset as JSON, CSV, Excel, or XML.

💼 Business use cases

Dependency auditing and security

Security and DevOps teams use Maven Central data to audit which versions of a library are published, identify abandoned packages (low versionCount or old lastUpdated), and flag dependencies that have not received updates in over 12 months.
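A minimal sketch of that audit, run against records shaped like the sample output: it flags artifacts with no release inside a 12-month window or with very few published versions. The function name find_stale and the min_versions threshold of 3 are illustrative assumptions. Only the 12-month window comes from the text above.

```python
from datetime import datetime, timedelta, timezone


def find_stale(records, now, max_age_days=365, min_versions=3):
    """Return (groupId:artifactId, reasons) pairs for records that look unmaintained."""
    cutoff = now - timedelta(days=max_age_days)
    flagged = []
    for r in records:
        last = datetime.fromisoformat(r["lastUpdated"].replace("Z", "+00:00"))
        reasons = []
        if last < cutoff:
            reasons.append(f"no release in the last {max_age_days} days")
        if r["versionCount"] < min_versions:  # threshold is an arbitrary example
            reasons.append(f"fewer than {min_versions} published versions")
        if reasons:
            flagged.append((f'{r["groupId"]}:{r["artifactId"]}', reasons))
    return flagged


# Two records trimmed from the sample output.
records = [
    {"groupId": "org.apache.karaf.features", "artifactId": "spring",
     "versionCount": 77, "lastUpdated": "2026-04-27T13:48:38.000Z"},
    {"groupId": "community.flock.wirespec.integration", "artifactId": "spring",
     "versionCount": 22, "lastUpdated": "2025-05-09T07:17:19.000Z"},
]
now = datetime(2026, 5, 22, tzinfo=timezone.utc)
for coords, reasons in find_stale(records, now):
    print(coords, "->", "; ".join(reasons))
```

Passing now explicitly keeps the check reproducible. A scheduled job would more likely use the record's scrapedAt or the current time.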

Supply chain analysis

Engineering leads can inventory the artifacts behind a technology stack by searching groupIds like org.springframework, com.google.guava, or io.netty. Knowing how many versions exist and when they were last updated surfaces maintenance risk before it becomes a production incident.

Open-source ecosystem research

Academic researchers and analyst firms use Maven Central data to study Java ecosystem trends - which frameworks are gaining adoption, which are declining, and how publishing velocity correlates with community health metrics.

Build tool and IDE integration

Platform teams can feed Maven Central metadata into internal developer portals, Renovate/Dependabot dashboards, or package recommendation engines to help developers discover well-maintained alternatives to deprecated libraries.


🔌 Automating Maven Central Scraper

Connect this Actor to thousands of tools without writing code:

  • Make (Integromat) - Trigger a run on schedule, send results to Google Sheets or Slack.
  • Zapier - Connect to your ticketing system or internal wiki whenever new packages are found.
  • Slack - Pipe weekly dependency audit summaries directly to your engineering channel.
  • Apify Scheduler - Run daily or weekly to track ecosystem changes over time.
  • Webhooks - POST results to any internal endpoint when the run completes.
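Scheduled runs become most useful when consecutive datasets are compared. The sketch below diffs two snapshots keyed by groupId and artifactId to surface newly listed artifacts and new releases. diff_snapshots is a hypothetical helper, and the com.example records are made-up illustration data, not real output.

```python
def diff_snapshots(previous, current):
    """Compare two lists of records and report what changed between runs."""
    def key(record):
        return (record["groupId"], record["artifactId"])

    old = {key(r): r for r in previous}
    new_artifacts, new_releases = [], []
    for r in current:
        before = old.get(key(r))
        if before is None:
            new_artifacts.append(r)
        elif r["latestVersion"] != before["latestVersion"]:
            # Keep the old version alongside the new record for reporting.
            new_releases.append((before["latestVersion"], r))
    return new_artifacts, new_releases


# Illustrative data only.
last_week = [
    {"groupId": "com.example", "artifactId": "core", "latestVersion": "1.0.0"},
    {"groupId": "com.example", "artifactId": "util", "latestVersion": "2.1.0"},
]
this_week = [
    {"groupId": "com.example", "artifactId": "core", "latestVersion": "1.1.0"},
    {"groupId": "com.example", "artifactId": "util", "latestVersion": "2.1.0"},
    {"groupId": "com.example", "artifactId": "cli", "latestVersion": "0.1.0"},
]

added, released = diff_snapshots(last_week, this_week)
for r in added:
    print("new artifact:", r["groupId"], r["artifactId"])
for old_version, r in released:
    print("new release:", r["artifactId"], old_version, "->", r["latestVersion"])
```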

🌟 Beyond business use cases

Research and academia

Study how the JVM ecosystem has evolved by tracking version counts and publication timestamps across thousands of libraries. Identify periods of high activity, framework migrations, and the lifecycle of open-source Java projects.

Non-profit and open source

Open-source maintainers can audit competing implementations of a library concept, identify gaps in the ecosystem, and ensure their artifact metadata is consistent with similar packages.

Education and training

Java instructors and bootcamp curricula can use real Maven Central data to teach students about dependency management, semantic versioning, and the structure of the JVM package ecosystem without setting up a local Maven repository.

Experimentation and prototyping

Data engineers can prototype dependency graph visualizations, build custom package search UIs, or create internal developer tooling using live Maven Central data without building a Solr integration from scratch.


🤖 Ask an AI assistant about this scraper

Not sure which inputs to use? Paste this into any AI assistant:

"I want to scrape Maven Central packages using the ParseForge Maven Central Scraper on Apify. The inputs are: searchQuery (keyword), groupId (Maven group filter), and maxItems. Help me build an input for [your use case]."


❓ Frequently Asked Questions

❓ Do I need a Maven account or API key? No. Maven Central's search API is fully public. This Actor works without any credentials.

❓ How many packages does Maven Central have? Maven Central hosts over 500,000 unique artifacts from more than 50,000 groupIds. The index is updated in near-real-time as new versions are published.

❓ What does versionCount mean? It is the total number of distinct versions published for that artifact in Maven Central, including all historical releases.

❓ Can I filter by packaging type (jar vs pom)? The current version returns all packaging types. The packaging field in the output lets you filter results client-side after downloading.
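A client-side filter of that kind can be very small. The sketch below assumes the dataset has already been downloaded and loaded as a list of dicts. filter_by_packaging is a hypothetical helper name.

```python
from collections import Counter


def filter_by_packaging(records, packaging="jar"):
    """Keep only records whose packaging field matches (client-side filter)."""
    return [r for r in records if r.get("packaging") == packaging]


# Two records trimmed from the sample output.
records = [
    {"groupId": "org.apache.karaf.features", "artifactId": "spring", "packaging": "pom"},
    {"groupId": "au.com.dius.pact.provider", "artifactId": "spring", "packaging": "jar"},
]

jars = filter_by_packaging(records, "jar")
print([r["groupId"] for r in jars])
# A quick breakdown of packaging types in the dataset.
print(Counter(r["packaging"] for r in records))
```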

❓ How fresh is the data? Results come directly from Maven Central's live Solr index. The lastUpdated field reflects the exact timestamp of the most recently published version.

❓ Can I scrape an entire groupId like org.springframework? Yes. Set groupId to org.springframework (or any other group) and set maxItems to however many you need. The scraper will paginate through all matching artifacts.

❓ What is the difference between groupId and searchQuery? groupId is an exact Maven namespace match (e.g. only artifacts in com.fasterxml.jackson). searchQuery is a full-text keyword search across artifact names and descriptions. You can use both together for narrower results.

❓ Does this scrape private repositories? No. This Actor only queries Maven Central's public index. Private Nexus, Artifactory, or GitHub Packages repositories are not accessible.

❓ How do I get all packages from a specific organization? Use the groupId field with the organization's Maven namespace. For example, com.google.guava returns all Guava artifacts. Because groupId is matched exactly, com.google likely returns only artifacts published directly under that groupId, not those in sub-namespaces such as com.google.guava, so an organization that publishes under several groupIds may need one run per groupId.

❓ Is there a rate limit? Maven Central's public API is rate-limited per IP. This Actor includes a 300ms delay between pages to stay within polite limits. For large extractions, paid Apify plans include proxy rotation that further reduces throttling risk.
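For readers curious what a polite pagination loop looks like, here is a minimal sketch of the idea: fetch one page at a time, pause between pages, and stop at maxItems or on a short page. It is not the Actor's code. paginate and fetch_page are hypothetical names, the page size of 100 is an assumption, and only the 300 ms delay is taken from the answer above. The fetcher is injected so the loop can run without network access.

```python
import time


def paginate(fetch_page, max_items, page_size=100, delay_s=0.3, sleep=time.sleep):
    """Yield up to max_items docs, pausing delay_s seconds between pages."""
    start = 0
    yielded = 0
    while yielded < max_items:
        if start > 0:
            sleep(delay_s)  # pause before every page after the first
        rows = min(page_size, max_items - yielded)
        docs = fetch_page(start, rows)
        if not docs:
            break
        for doc in docs[:rows]:
            yield doc
            yielded += 1
        if len(docs) < rows:
            break  # short page: the index has no more matches
        start += len(docs)


# Stand-in for a real HTTP call: slices an in-memory list of 250 fake docs.
fake_index = [{"id": i} for i in range(250)]


def fake_fetch(start, rows):
    return fake_index[start:start + rows]


pauses = []
docs = list(paginate(fake_fetch, max_items=230, sleep=pauses.append))
print(len(docs), "docs fetched with", len(pauses), "pauses")
```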

❓ Can I schedule regular runs to track ecosystem changes? Yes. Use Apify Scheduler to run this Actor daily or weekly and compare datasets over time to track new releases, abandoned packages, and version trends.


🔌 Integrate with any app

Download your dataset in any format and connect to:

JSON - CSV - Excel - XML - Google Sheets - BigQuery - Snowflake - PostgreSQL - MySQL - MongoDB - Airtable - Notion - Zapier - Make - Slack - Microsoft Teams - Power BI - Tableau - Looker - dbt - Airflow - Prefect


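| Actor | Description |
| --- | --- |
| PyPI Scraper | Scrape Python packages from the PyPI registry |
| NPM Registry Scraper | Extract JavaScript packages from the npm registry |
| NuGet Scraper | Download .NET package metadata from NuGet.org |
| Crates.io Scraper | Scrape Rust packages from crates.io |
| GitHub Trending Scraper | Track trending repositories across programming languages |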
GitHub Trending ScraperTrack trending repositories across programming languages

💡 Pro Tip: browse the complete ParseForge collection for scrapers covering package registries, developer tools, job boards, and public datasets.


This Actor queries Maven Central's public search API. All data is publicly available at search.maven.org. This tool is intended for lawful research, analysis, and development purposes only.