🛡️ npm Dependency Tree Scraper

Pricing

from $8.00 / 1,000 results

Scrape npm to map transitive dependency trees up to three levels deep. Extract exact license types, deprecation warnings, and active maintainer counts.

Rating

0.0

(0)

Developer

太郎 山田

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 8 days ago

📜 Open-Source License & Dependency Audit API

Supply chain security depends on auditing transitive dependency trees before deployment. This npm dependency scraper looks up packages and maps their dependencies up to three levels deep, so DevSecOps teams do not have to build and maintain custom web scraping tools. Instead of manually checking the registry website for deprecated modules, you can schedule weekly compliance audits and catch copyleft licenses before they merge into your production branch.

Engineering leads can use this scraper to monitor dependency health and extract maintainer signals. For each package it outputs the license type, deprecation status, and maintainer count, which you can feed into custom security policies that flag risky dependencies. The actor emits one structured summary row per package, so the risk profile of every module your application requires is visible in a single record. You supply a list of package names, and the extracted data helps your team clear compliance reviews with evidence rather than manual checks.

Store Quickstart

Run this actor with your target input. Results appear in the Apify Dataset and can be piped to webhooks for real-time delivery. Use dryRun to validate before committing to a schedule.

Key Features

  • License classification: Categorize as permissive, weak-copyleft, strong-copyleft, or unknown
  • Configurable policy: permissive (strict), copyleft-ok (tolerant), or custom allow/deny lists
  • Transitive dependency crawl: Audit nested deps up to 3 levels deep
  • Risk scoring: 0-100 score with A-F grades per package
  • Maintainer signals: Maintainer count, last publish date, deprecation status
  • Summary-first: One row per package with aggregate transitive risk
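
The classification and policy features above can be sketched as follows. This is a minimal illustration, not the actor's implementation: the SPDX groupings, the `license_family` and `risk_level` helper names, and the handling of unknown or unlisted licenses are all assumptions. Only the rule that the permissive policy rates copyleft as high risk while copyleft-ok rates it lower comes from the input description.

```python
from typing import Iterable, Optional

# Illustrative SPDX groupings only; the actor's actual mapping is not documented.
PERMISSIVE = {"MIT", "ISC", "0BSD", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0"}
WEAK_COPYLEFT = {"LGPL-2.1-only", "LGPL-3.0-only", "MPL-2.0", "EPL-2.0"}
STRONG_COPYLEFT = {"GPL-2.0-only", "GPL-3.0-only", "AGPL-3.0-only"}


def license_family(spdx_id: Optional[str]) -> str:
    """Map an SPDX identifier to one of the four families named above."""
    if spdx_id in PERMISSIVE:
        return "permissive"
    if spdx_id in WEAK_COPYLEFT:
        return "weak-copyleft"
    if spdx_id in STRONG_COPYLEFT:
        return "strong-copyleft"
    return "unknown"


def risk_level(
    spdx_id: Optional[str],
    policy: str = "permissive",
    allow_list: Iterable[str] = (),
    deny_list: Iterable[str] = (),
) -> str:
    """Return "low", "medium", or "high" for a license under a policy."""
    if policy == "custom":
        if spdx_id in set(deny_list):
            return "high"
        if spdx_id in set(allow_list):
            return "low"
        return "medium"  # assumption: unlisted licenses need manual review
    family = license_family(spdx_id)
    if family == "permissive":
        return "low"
    if family == "unknown":
        return "medium"  # assumption: unknown licenses need manual review
    # Copyleft: high under "permissive", medium under "copyleft-ok".
    return "high" if policy == "permissive" else "medium"


print(risk_level("GPL-3.0-only", "permissive"))
print(risk_level("GPL-3.0-only", "copyleft-ok"))
```

Keeping the family lookup separate from the policy decision means a new policy only changes `risk_level`, not the license tables.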

Use Cases

| Who | Why |
| --- | --- |
| Developers | Automate recurring data fetches without building custom scrapers |
| Data teams | Pipe structured output into analytics warehouses |
| Ops teams | Monitor changes via webhook alerts |
| Product managers | Track competitor/market signals without engineering time |

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| packages | array | prefilled | npm package names to audit (max 200). |
| licensePolicy | string | "permissive" | Which license policy to apply: 'permissive' flags copyleft licenses as high risk, 'copyleft-ok' treats copyleft as mediu… |
| allowList | array | [] | SPDX identifiers to treat as approved (only used when licensePolicy=custom). |
| denyList | array | [] | SPDX identifiers to treat as denied (only used when licensePolicy=custom). |
| maxDepth | integer | 1 | Maximum transitive dependency depth to crawl (0 = direct only, 1 = one level deep, etc.). |
| includeDevDeps | boolean | false | Also audit devDependencies of each package. |
| concurrency | integer | 5 | Number of parallel requests. |
| timeoutMs | integer | 15000 | Request timeout in milliseconds. |
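
To make the maxDepth semantics concrete, here is a minimal sketch of a depth-limited breadth-first crawl. It assumes that depth 0 means the direct dependencies of the audited package; the `GRAPH` data and the `collect_dependencies` helper are hypothetical stand-ins for registry lookups, not the actor's code.

```python
from collections import deque
from typing import Dict, List

# Hypothetical in-memory dependency graph standing in for registry lookups.
GRAPH: Dict[str, List[str]] = {
    "app-lib": ["a", "b"],
    "a": ["c"],
    "b": ["c", "d"],
    "c": ["e"],
    "d": [],
    "e": [],
}


def collect_dependencies(
    root: str, graph: Dict[str, List[str]], max_depth: int
) -> Dict[str, int]:
    """Return {dependency: depth}, where depth 0 is a direct dependency."""
    found: Dict[str, int] = {}
    queue = deque((dep, 0) for dep in graph.get(root, []))
    while queue:
        name, depth = queue.popleft()
        if name in found or name == root:
            continue  # already visited, or a cycle back to the root
        found[name] = depth
        if depth < max_depth:
            # Only descend while the depth budget allows it.
            queue.extend((child, depth + 1) for child in graph.get(name, []))
    return found


print(collect_dependencies("app-lib", GRAPH, 0))  # {'a': 0, 'b': 0}
print(collect_dependencies("app-lib", GRAPH, 1))  # {'a': 0, 'b': 0, 'c': 1, 'd': 1}
```

Breadth-first order records each package at its shallowest depth, so a dependency shared by several parents is only counted once.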

Input Example

{
  "packages": [
    "express",
    "react",
    "lodash",
    "axios"
  ],
  "licensePolicy": "permissive",
  "allowList": [],
  "denyList": [],
  "maxDepth": 1,
  "includeDevDeps": false,
  "concurrency": 5,
  "timeoutMs": 15000,
  "delivery": "dataset",
  "dryRun": false
}
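
Before submitting an input like the one above, it can help to validate it locally. The sketch below is one way to do that. The `build_input` helper is hypothetical, the upper bound of 3 for maxDepth is an assumption drawn from the "up to 3 levels deep" feature description, and rejecting allow/deny lists under a non-custom policy is a design choice rather than documented actor behavior.

```python
from typing import Any, Dict, List, Optional

ALLOWED_POLICIES = {"permissive", "copyleft-ok", "custom"}
MAX_PACKAGES = 200  # limit stated in the Input table
MAX_DEPTH = 3       # assumption based on "up to 3 levels deep"


def build_input(
    packages: List[str],
    license_policy: str = "permissive",
    allow_list: Optional[List[str]] = None,
    deny_list: Optional[List[str]] = None,
    max_depth: int = 1,
    include_dev_deps: bool = False,
) -> Dict[str, Any]:
    """Validate arguments and return a run input dict for the actor."""
    if not packages:
        raise ValueError("at least one package name is required")
    if len(packages) > MAX_PACKAGES:
        raise ValueError(f"at most {MAX_PACKAGES} packages per run")
    if license_policy not in ALLOWED_POLICIES:
        raise ValueError(f"unknown licensePolicy: {license_policy!r}")
    if not 0 <= max_depth <= MAX_DEPTH:
        raise ValueError(f"maxDepth must be between 0 and {MAX_DEPTH}")
    if license_policy != "custom" and (allow_list or deny_list):
        # The lists are only used with the custom policy, so flag the mismatch.
        raise ValueError("allowList/denyList require licensePolicy='custom'")
    return {
        "packages": list(packages),
        "licensePolicy": license_policy,
        "allowList": list(allow_list or []),
        "denyList": list(deny_list or []),
        "maxDepth": max_depth,
        "includeDevDeps": include_dev_deps,
    }


print(build_input(["express", "react"], max_depth=2))
```

Fields left out of the returned dict (concurrency, timeoutMs, delivery, dryRun) fall back to whatever defaults the actor applies.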

Output

| Field | Type | Description |
| --- | --- | --- |
| meta | object | |
| results | array | |
| results[].package | string | |
| results[].version | string | |
| results[].license | string | |
| results[].licenseFamily | string | |
| results[].riskLevel | string | |
| results[].description | string | |
| results[].author | null | |
| results[].homepage | string (url) | |
| results[].repository | string (url) | |
| results[].maintainerCount | number | |
| results[].lastPublish | timestamp | |
| results[].daysSincePublish | number | |
| results[].deprecated | null | |
| results[].directDeps | number | |
| results[].devDeps | number | |
| results[].transitiveDeps | number | |
| results[].transitiveRiskSummary | object | |
| results[].score | object | |
| results[].auditedAt | timestamp | |
| results[].error | null | |
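
As an illustration of how these fields might drive a review queue, the sketch below lists reasons a package deserves a closer look. The `review_reasons` helper, the 730-day staleness threshold, and the single-maintainer rule are assumptions chosen for the example, not part of the actor.

```python
from typing import Any, Dict, List


def review_reasons(result: Dict[str, Any], max_age_days: int = 730) -> List[str]:
    """Return human-readable reasons a result row needs manual review."""
    reasons: List[str] = []
    if result.get("error"):
        reasons.append("audit error")
    if result.get("riskLevel") == "high":
        reasons.append("high license risk")
    if result.get("deprecated"):
        reasons.append("deprecated")
    maintainers = result.get("maintainerCount")
    if maintainers is not None and maintainers <= 1:
        reasons.append("at most one maintainer")
    age = result.get("daysSincePublish")
    if age is not None and age > max_age_days:
        reasons.append("stale release")
    return reasons


# A row shaped like the documented output fields.
express = {
    "package": "express",
    "riskLevel": "low",
    "maintainerCount": 4,
    "daysSincePublish": 45,
    "deprecated": None,
    "error": None,
}
print(review_reasons(express))  # []
```

Missing fields are skipped rather than treated as failures, so rows that errored partway through an audit do not produce misleading reasons.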

Output Example

{
  "meta": {
    "generatedAt": "2026-06-15T12:00:00.000Z",
    "policy": "permissive",
    "maxDepth": 1,
    "totals": {
      "audited": 3,
      "errors": 0,
      "highRisk": 0,
      "mediumRisk": 0,
      "lowRisk": 3,
      "gradeA": 2,
      "gradeB": 1,
      "gradeC": 0,
      "gradeD": 0,
      "gradeF": 0,
      "deprecated": 0
    }
  },
  "results": [
    {
      "package": "express",
      "version": "4.21.0",
      "license": "MIT",
      "licenseFamily": "permissive",
      "riskLevel": "low",
      "description": "Fast, unopinionated, minimalist web framework",
      "author": null,
      "homepage": "http://expressjs.com/",
      "repository": "https://github.com/expressjs/express",
      "maintainerCount": 4,
      "lastPublish": "2024-09-11T00:00:00.000Z",
      "daysSincePublish": 45,
      "deprecated": null,
      "directDeps": 30,
      "devDeps": 0,
      "transitiveDeps": 48,
      "transitiveRiskSummary": {
        "total": 48,
        "high": 0,

API Usage

Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console → Settings → Integrations.

cURL

curl -X POST "https://api.apify.com/v2/acts/taroyamada~open-source-license-dependency-audit/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{ "packages": [ "express", "react", "lodash", "axios" ], "licensePolicy": "permissive", "allowList": [], "denyList": [], "maxDepth": 1, "includeDevDeps": false, "concurrency": 5, "timeoutMs": 15000, "delivery": "dataset", "dryRun": false }'

Python

from apify_client import ApifyClient

# Authenticate with your personal API token.
client = ApifyClient("YOUR_API_TOKEN")

# Start the actor and wait for the run to finish.
run = client.actor("taroyamada/open-source-license-dependency-audit").call(run_input={
    "packages": [
        "express",
        "react",
        "lodash",
        "axios",
    ],
    "licensePolicy": "permissive",
    "allowList": [],
    "denyList": [],
    "maxDepth": 1,
    "includeDevDeps": False,  # Python booleans are capitalized
    "concurrency": 5,
    "timeoutMs": 15000,
    "delivery": "dataset",
    "dryRun": False,
})

# Print every item stored in the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

JavaScript / Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('taroyamada/open-source-license-dependency-audit').call({
    packages: [
        'express',
        'react',
        'lodash',
        'axios',
    ],
    licensePolicy: 'permissive',
    allowList: [],
    denyList: [],
    maxDepth: 1,
    includeDevDeps: false,
    concurrency: 5,
    timeoutMs: 15000,
    delivery: 'dataset',
    dryRun: false,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Tips & Limitations

  • Schedule weekly runs against the packages your production applications depend on to catch newly deprecated modules and copyleft licenses.
  • Use webhook delivery to pipe findings into your SIEM (Splunk, Datadog, Elastic).
  • For CI integration, have your pipeline exit with a non-zero code when any result has a high riskLevel, so the release is blocked.
  • Combine with ssl-certificate-monitor to extend coverage from dependencies to certificates.
  • Each result includes the package's homepage and repository links; share them with dev teams via the webhook payload.
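
The CI tip can be sketched as a small gate script. The `flatten_results` and `gate` helpers are hypothetical, and because the exact shape of dataset items is not specified here, the sketch accepts both per-package rows and a single object carrying a `results` array (an assumption).

```python
import sys
from typing import Any, Dict, Iterable, List, Sequence


def flatten_results(items: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Accept per-package rows or wrapper objects that carry a results array."""
    rows: List[Dict[str, Any]] = []
    for item in items:
        if isinstance(item.get("results"), list):
            rows.extend(item["results"])
        else:
            rows.append(item)
    return rows


def gate(
    items: Iterable[Dict[str, Any]], fail_levels: Sequence[str] = ("high",)
) -> int:
    """Return 1 if any package has a blocking risk level, otherwise 0."""
    blocked = [
        row.get("package", "?")
        for row in flatten_results(items)
        if row.get("riskLevel") in fail_levels
    ]
    for name in blocked:
        print(f"blocked by license policy: {name}", file=sys.stderr)
    return 1 if blocked else 0


sample = [{"results": [{"package": "express", "riskLevel": "low"}]}]
print(gate(sample))  # 0
```

In a pipeline, pass the downloaded dataset items to `gate` and hand its return value to `sys.exit` so the job fails when a blocking package is found.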

FAQ

Is running this against the public npm registry legal?

Reading publicly available package metadata is generally permitted, but follow your own compliance policies.

How often should I scan?

Weekly for production dependencies; daily if your dependencies change frequently.

Can I export to a compliance tool?

Yes. Use webhook delivery or the Dataset API; the output formats map well to Drata, Vanta, and OneTrust import templates.

Is this a penetration test?

No. This actor performs passive compliance scanning only: no exploitation, fuzzing, or auth bypass.

Does this qualify as a SOC2 control?

This actor produces evidence artifacts suitable for SOC2 CC7.1 (continuous monitoring). It is not itself a SOC2 certification.

Cost

Pay Per Event:

  • actor-start: $0.01 (flat fee per run)
  • dataset-item: $0.003 per output item

Example: 1,000 items = $0.01 + (1,000 × $0.003) = $3.01

No subscription required — you only pay for what you use.
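
The pay-per-event arithmetic above generalizes to a small estimator. This is a sketch using the two fees listed in this section; the `estimate_cost` helper name is hypothetical, and `Decimal` is used to avoid floating-point rounding in currency values.

```python
from decimal import Decimal

ACTOR_START_FEE = Decimal("0.01")    # flat fee per run
DATASET_ITEM_FEE = Decimal("0.003")  # fee per output item


def estimate_cost(items: int, runs: int = 1) -> Decimal:
    """Estimate the total cost in USD for a number of items across runs."""
    if items < 0 or runs < 0:
        raise ValueError("items and runs must be non-negative")
    return ACTOR_START_FEE * runs + DATASET_ITEM_FEE * items


print(estimate_cost(1000))               # 3.010
print(estimate_cost(200 * 52, runs=52))  # 31.720
```

The second call models a weekly audit of 200 packages for a year: 52 start fees plus 10,400 items.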

⭐ Was this helpful?

If this actor saved you time, please leave a ★ rating on Apify Store. It takes 10 seconds, helps other developers discover it, and keeps updates free.

Bug report or feature request? Open an issue on the Issues tab of this actor.