# NPM Package Scraper (`glassventures/npm-package-scraper`) Actor

Scrape NPM package data. Extract name, version, downloads, dependencies, maintainers, and more. Export to JSON, CSV, Excel.

- **URL**: https://apify.com/glassventures/npm-package-scraper.md
- **Developed by:** [Glass Ventures](https://apify.com/glassventures) (community)
- **Categories:** Developer tools
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, NaN bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage, which gets cheaper the higher subscription plan you have.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are a software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

````bash
# MacOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```bash

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## NPM Package Scraper

Scrape detailed package data from the NPM registry. Extract name, version, downloads, dependencies, maintainers, license, and more from any NPM package.

### What does NPM Package Scraper do?

NPM Package Scraper extracts comprehensive metadata from the NPM registry for any package. It uses the public NPM registry API to gather package details including version history, download statistics, dependency counts, maintainer information, and more.

This actor is ideal for developers, researchers, and analysts who need bulk NPM package data without writing custom API integration code. It handles pagination, rate limiting, and data normalization automatically.

Whether you need to audit dependencies across your organization, research competing packages, or build a dataset of packages in a specific domain, this actor delivers clean, structured data ready for analysis.

### Use Cases

- **DevOps engineers** -- audit dependencies across projects, track package versions and licenses
- **Market researchers** -- analyze NPM ecosystem trends, compare package popularity and adoption
- **Security analysts** -- monitor packages for maintainer changes, dependency counts, and update frequency
- **Developers** -- discover and compare packages by search terms, evaluate alternatives

### Features

- Scrape packages by direct URL, package name, or search query
- Extract download statistics, dependency counts, and maintainer info
- Optional README content extraction
- Proxy support with automatic rotation
- Handles pagination for search results automatically
- Exports to JSON, CSV, Excel, or connect via API

### How much will it cost?

The NPM registry API is fully public and lightweight. This actor uses minimal compute resources.

| Results | Estimated Cost |
|---------|---------------|
| 100     | ~$0.01        |
| 1,000   | ~$0.05        |
| 10,000  | ~$0.40        |

| Cost Component | Per 1,000 Results |
|----------------|-------------------|
| Platform compute | ~$0.05 |
| Proxy (optional) | ~$0.00 |
| **Total** | **~$0.05** |

### How to use

1. Go to the NPM Package Scraper page on Apify Store
2. Click "Start" or "Try for free"
3. Enter package names, NPM URLs, or search terms
4. Set the maximum number of items
5. Click "Start" and wait for the results

### Input parameters

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| startUrls | array | NPM package page URLs to scrape | - |
| packageNames | array | Package names to look up directly | - |
| searchTerms | array | Search queries to find packages | - |
| includeReadme | boolean | Include full README in output | false |
| maxItems | number | Max results to return | 100 |
| proxyConfig | object | Proxy settings | Apify Proxy |

### Output

The actor produces a dataset with the following fields:

```json
{
    "url": "https://www.npmjs.com/package/express",
    "name": "express",
    "version": "4.18.2",
    "description": "Fast, unopinionated, minimalist web framework for node.",
    "author": "TJ Holowaychuk",
    "license": "MIT",
    "homepage": "http://expressjs.com/",
    "repository": "https://github.com/expressjs/express",
    "keywords": ["express", "framework", "sinatra", "web", "http"],
    "dependenciesCount": 31,
    "weeklyDownloads": 30000000,
    "lastPublished": "2023-10-11T17:00:00.000Z",
    "maintainers": ["dougwilson", "linusu", "wesleytodd"],
    "readme": null,
    "scrapedAt": "2024-01-15T10:30:00.000Z"
}
````

| Field | Type | Description |
|-------|------|-------------|
| url | string | NPM package page URL |
| name | string | Package name |
| version | string | Latest version |
| description | string | Package description |
| author | string | Package author |
| license | string | License type |
| homepage | string | Project homepage URL |
| repository | string | Source repository URL |
| keywords | array | Package keywords |
| dependenciesCount | integer | Number of runtime dependencies |
| weeklyDownloads | integer | Downloads in the last month (weekly average) |
| lastPublished | string | ISO 8601 date of last publish |
| maintainers | array | List of maintainer usernames |
| readme | string | Full README content (if enabled) |
| scrapedAt | string | ISO 8601 scrape timestamp |

### Integrations

Connect NPM Package Scraper with other tools:

- **Apify API** -- REST API for programmatic access
- **Webhooks** -- get notified when a run finishes
- **Zapier / Make** -- connect to 5,000+ apps
- **Google Sheets** -- export directly to spreadsheets

#### API Example (Node.js)

```javascript
import { ApifyClient } from 'apify-client';
const client = new ApifyClient({ token: 'YOUR_TOKEN' });
const run = await client.actor('YOUR_USERNAME/npm-package-scraper').call({
    packageNames: ['express', 'react', 'lodash'],
    maxItems: 100,
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
```

#### API Example (Python)

```python
from apify_client import ApifyClient
client = ApifyClient('YOUR_TOKEN')
run = client.actor('YOUR_USERNAME/npm-package-scraper').call(run_input={
    'packageNames': ['express', 'react', 'lodash'],
    'maxItems': 100,
})
items = client.dataset(run['defaultDatasetId']).list_items().items
```

#### API Example (cURL)

```bash
curl "https://api.apify.com/v2/acts/YOUR_USERNAME~npm-package-scraper/runs" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{"packageNames": ["express", "react", "lodash"], "maxItems": 100}'
```

### Tips and tricks

- Start with a small `maxItems` (10-20) to test before running large scrapes
- Use `packageNames` for known packages -- it is faster than search
- Search terms return up to 250 results per query from the NPM API
- The NPM registry is public and rate-limits are generous, so proxies are optional

### FAQ

**Q: Does this actor require login credentials?**
A: No. The NPM registry API is fully public and requires no authentication.

**Q: How fast is the scraping?**
A: Approximately 100-300 packages per minute depending on concurrency settings.

**Q: What should I do if I get rate limited?**
A: Enable proxy configuration in the settings. NPM rarely rate-limits, but proxies help for very large scrapes.

**Q: Can I get download counts?**
A: Yes. The actor fetches monthly download counts from the NPM downloads API and includes them as `weeklyDownloads`.

### Is it legal to scrape NPM?

The NPM registry API is a public API designed for programmatic access. This actor only accesses publicly available package metadata through the official API endpoints. Always review and respect NPM's Terms of Service. For more information, see [Apify's blog on web scraping legality](https://blog.apify.com/is-web-scraping-legal/).

### Limitations

- Download counts are monthly totals from the NPM downloads API (not real-time)
- Search results are limited to 250 per search term (NPM API limit)
- Private/scoped packages that require authentication are not supported
- README content can be large and significantly increases output size

### Changelog

- **v0.1** (2026-04-23) -- Initial release

# Actor input Schema

## `startUrls` (type: `array`):

List of NPM package URLs to scrape (e.g. https://www.npmjs.com/package/express).

## `packageNames` (type: `array`):

List of NPM package names to scrape directly (e.g. express, react, lodash).

## `searchTerms` (type: `array`):

Search queries to find packages on NPM. The actor will search and scrape the results.

## `includeReadme` (type: `boolean`):

Whether to include the full README content in the output. Increases data size significantly.

## `maxItems` (type: `integer`):

Maximum number of packages to scrape. Use 0 or leave empty for unlimited.

## `maxConcurrency` (type: `integer`):

Maximum number of requests processed in parallel.

## `debugMode` (type: `boolean`):

Enables verbose logging for debugging.

## `extendOutputFunction` (type: `string`):

A JavaScript function to customize each output item. Receives { data }.

## `proxyConfig` (type: `object`):

Select proxies to be used. NPM registry is public so proxies are optional.

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://www.npmjs.com/package/crawlee"
    }
  ],
  "packageNames": [
    "crawlee",
    "apify",
    "express"
  ],
  "includeReadme": false,
  "maxItems": 100,
  "maxConcurrency": 10,
  "debugMode": false,
  "extendOutputFunction": "async ({ data }) => {\n    return data;\n}",
  "proxyConfig": {
    "useApifyProxy": true
  }
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://www.npmjs.com/package/crawlee"
        }
    ],
    "packageNames": [
        "crawlee",
        "apify",
        "express"
    ],
    "extendOutputFunction": async ({ data }) => {
        return data;
    },
    "proxyConfig": {
        "useApifyProxy": true
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("glassventures/npm-package-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startUrls": [{ "url": "https://www.npmjs.com/package/crawlee" }],
    "packageNames": [
        "crawlee",
        "apify",
        "express",
    ],
    "extendOutputFunction": """async ({ data }) => {
    return data;
}""",
    "proxyConfig": { "useApifyProxy": True },
}

# Run the Actor and wait for it to finish
run = client.actor("glassventures/npm-package-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [
    {
      "url": "https://www.npmjs.com/package/crawlee"
    }
  ],
  "packageNames": [
    "crawlee",
    "apify",
    "express"
  ],
  "extendOutputFunction": "async ({ data }) => {\\n    return data;\\n}",
  "proxyConfig": {
    "useApifyProxy": true
  }
}' |
apify call glassventures/npm-package-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=glassventures/npm-package-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "NPM Package Scraper",
        "description": "Scrape NPM package data. Extract name, version, downloads, dependencies, maintainers, and more. Export to JSON, CSV, Excel.",
        "version": "0.1",
        "x-build-id": "zHfH1V7aKHTZ0i7a9"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/glassventures~npm-package-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-glassventures-npm-package-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/glassventures~npm-package-scraper/runs": {
            "post": {
                "operationId": "runs-sync-glassventures-npm-package-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/glassventures~npm-package-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-glassventures-npm-package-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "startUrls": {
                        "title": "Start URLs",
                        "type": "array",
                        "description": "List of NPM package URLs to scrape (e.g. https://www.npmjs.com/package/express).",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "packageNames": {
                        "title": "Package Names",
                        "type": "array",
                        "description": "List of NPM package names to scrape directly (e.g. express, react, lodash).",
                        "items": {
                            "type": "string"
                        }
                    },
                    "searchTerms": {
                        "title": "Search Terms",
                        "type": "array",
                        "description": "Search queries to find packages on NPM. The actor will search and scrape the results.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "includeReadme": {
                        "title": "Include README",
                        "type": "boolean",
                        "description": "Whether to include the full README content in the output. Increases data size significantly.",
                        "default": false
                    },
                    "maxItems": {
                        "title": "Max Items",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum number of packages to scrape. Use 0 or leave empty for unlimited.",
                        "default": 100
                    },
                    "maxConcurrency": {
                        "title": "Max Concurrency",
                        "minimum": 1,
                        "maximum": 100,
                        "type": "integer",
                        "description": "Maximum number of requests processed in parallel.",
                        "default": 10
                    },
                    "debugMode": {
                        "title": "Debug Mode",
                        "type": "boolean",
                        "description": "Enables verbose logging for debugging.",
                        "default": false
                    },
                    "extendOutputFunction": {
                        "title": "Extend Output Function",
                        "type": "string",
                        "description": "A JavaScript function to customize each output item. Receives { data }."
                    },
                    "proxyConfig": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Select proxies to be used. NPM registry is public so proxies are optional."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
