# CRAN R Packages Scraper - Metadata, Dependencies & Analytics (`klondikeking/cran-r-packages-scraper`) Actor

Extract comprehensive metadata from CRAN (Comprehensive R Archive Network) packages including descriptions, versions, dependencies, reverse dependencies, publication dates, authors, DOIs, vignettes, and download links. Perfect for R ecosystem research, dependency analysis, and package discovery.

- **URL**: https://apify.com/klondikeking/cran-r-packages-scraper.md
- **Developed by:** [Pierrick McD0nald](https://apify.com/klondikeking) (community)
- **Categories:** Developer tools, Education
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage, which gets cheaper the higher subscription plan you have.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## CRAN R Packages Scraper — Metadata, Dependencies & Analytics

Extract comprehensive metadata from CRAN (Comprehensive R Archive Network) packages. This Actor scrapes package detail pages to collect descriptions, versions, dependencies, reverse dependencies, publication dates, authors, DOIs, vignettes, download links, and more. Perfect for R ecosystem research, dependency analysis, package discovery, and academic data collection.

### Use Cases

- **R Ecosystem Research** — Analyze the CRAN package landscape, identify trending packages, and study dependency networks across the R statistical computing environment.
- **Dependency Analysis** — Map reverse dependencies to understand which packages rely on a given library, useful for security auditing and impact assessment.
- **Package Discovery** — Build curated lists of R packages by category, author, or publication date for research or teaching purposes.
- **Academic Data Collection** — Collect structured metadata from CRAN for bibliometric analysis, reproducibility studies, and software citation research.
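
As a rough illustration of the dependency-analysis use case, the sketch below ranks scraped packages by how many other packages depend on them. It assumes dataset items shaped like the sample in the Output section, where `reverseDepends` and `reverseImports` are comma-separated strings; `split_names` and `impact_ranking` are hypothetical helper names, not part of the Actor.

```python
from typing import Iterable


def split_names(field: str) -> list[str]:
    """Split a comma-separated field into unique names, keeping first-seen order."""
    unique: dict[str, None] = {}
    for part in (field or "").split(","):
        name = part.strip()
        if name:
            unique.setdefault(name, None)
    return list(unique)


def impact_ranking(items: Iterable[dict]) -> list[tuple[str, int]]:
    """Rank packages by hard dependents (reverseDepends plus reverseImports)."""
    ranking = []
    for item in items:
        hard = set(split_names(item.get("reverseDepends", "")))
        hard |= set(split_names(item.get("reverseImports", "")))
        ranking.append((item["packageName"], len(hard)))
    # Most depended-upon first; ties are broken alphabetically.
    return sorted(ranking, key=lambda pair: (-pair[1], pair[0]))
```

Names are de-duplicated because a scraped list can contain the same package more than once.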

### Input

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `packageNames` | Array | Yes | List of CRAN package names to scrape (e.g., ggplot2, dplyr). Pass an empty array to scrape all packages. |
| `maxItems` | Number | No | Maximum packages to scrape (default: 100, 0 for unlimited). |
| `includeDetails` | Boolean | No | Scrape full detail pages (true) or just list names (false). |
| `proxyConfiguration` | Object | No | Proxy configuration. Apify proxy included by default. |

### Output

The Actor outputs a dataset with the following fields:

```json
{
  "packageName": "ggplot2",
  "title": "Create Elegant Data Visualisations Using the Grammar of Graphics",
  "description": "A system for declaratively creating graphics...",
  "version": "4.0.3",
  "depends": "R (>= 4.1)",
  "imports": "cli, grDevices, grid, gtable, isoband, lifecycle, rlang, S7, scales, stats, vctrs, withr",
  "suggests": "broom, covr, dplyr, hexbin, Hmisc, hms, knitr, MASS, mgcv, multcomp, munsell, nlme, profvis, quantreg, quarto, ragg, RColorBrewer, roxygen2, rpart, sf, svglite, testthat, tibble, vdiffr, xml2",
  "enhances": "sp",
  "publishedDate": "2026-04-22",
  "author": "Hadley Wickham, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, Hiroaki Yutani, Dewey Dunnington, Teun van den Brand",
  "maintainer": "Thomas Lin Pedersen <thomasp85@gmail.com>",
  "doi": "10.32614/CRAN.package.ggplot2",
  "url": "https://cran.r-project.org/web/packages/ggplot2/index.html",
  "license": "MIT + file LICENSE",
  "needsCompilation": "no",
  "inViews": "ChemPhys, NetworkAnalysis, Phylogenetics, Spatial, TeachingStatistics",
  "cranChecks": "ggplot2 results",
  "referenceManual": "refman/ggplot2.html, ggplot2.pdf",
  "vignettes": "Extending ggplot2, Using ggplot2 in packages, Aesthetic specifications, Introduction to ggplot2, Profiling Performance",
  "materials": "README, NEWS",
  "citation": "ggplot2 citation info",
  "packageSource": "ggplot2_4.0.3.tar.gz",
  "windowsBinaries": "r-devel: ggplot2_4.0.3.zip, r-release: ggplot2_4.0.3.zip, r-oldrel: ggplot2_4.0.3.zip",
  "macosBinaries": "r-release (arm64): ggplot2_4.0.3.tgz, r-oldrel (arm64): ggplot2_4.0.3.tgz, r-release (x86_64): ggplot2_4.0.3.tgz, r-oldrel (x86_64): ggplot2_4.0.3.tgz",
  "oldSources": "https://CRAN.R-project.org/src/contrib/Archive/ggplot2",
  "reverseDepends": "accessrmd, afmToolkit, alakazam, alookr, AmpliconDuo, Anaconda, Anaquin, apisensr, applicable, ausplotsR, bacon, BasketballAnalyzeR, bayesDP, bayesnec, bbnet, bde, bhm, bootnet, bpcp, braidReports, bunching, CalibrationCurves, caret, CellNOptR, ceterisParibus, cfda, changepoint.geo, changeS, CHETAH, ChIPQC, circhelp, cjoint, ClassificationEnsembles, classifierplots, clustEff, ClusteredMutations, clustrd, CNVrd2, CNVScope, coefplot, cogena, cohorttools, colleyRstats, ConconiAnaerobicThresholdTest, ContourFunctions, corkscrew, CoSMoS, CRABS, CrispRVariants, crmPack, Crossover, CRTgeeDR, crumblr, CTxCC, curatedBreastData, cystiSim, cytofan, dae, DaMiRseq, dampack, dartR, dartR.base, dartR.sim, ddecompose, decompTumor2Sig, Deducer, deltaGseg, DendroSync, DepthProc, DEqMS, DHBins, diathor, diffEnrich, diffeR, DiSCos, dittoSeq, dittoViz, dnn, donutsk, dotwhisker, dowser, dpGMM, dreamlet, dslice, dynr, Eagle, echoice2, eeptools, egg, embryogrowth, EnhancedVolcano, EnsCat, EpiCurve, episensr, EQUALCompareImages, EQUALPrognosis, EQUALrepeat, erccdashboard, escheR, eVCGsampler, extraChIPs, FactoClass, factoextra, factorplot, Factoshiny, fbroc, findGSEP, FisherEM, flippant, ForecastingEnsembles, forestmodel, FormulR, freqparcoord, frequency, func2vis, funMoDisco, gam.hp, gapmap, garma, GARS, gcerisk, gde, GenericML, genlogis, GenomicOZone, geomtextpath, geotoolsR, GerminaR, gg4way, ggalign, ggallin, ggalluvial, GGally, gganimate, ggarrow, ggbeeswarm, ggbio, ggbiplot, ggbuildr, ggcharts, ggcorrplot, ggcube, ggcyto, ggdemetra, ggdensity, ggetho, ggExametrika, ggFishPlots, ggfixest, ggfocus, ggforce, ggformula, ggfortify, ggfoundry, gggda, gggenomes, ggghost, gggibbous, gggrid, ggh4x, gghighlight, ggHoriPlot, ggimage, ggincerta, gginnards, ggInterval, ggip, ggkegg, gglm, gglorenz, ggmanh, ggmap, ggmapcn, ggmatplot, ggmcmc, ggmulti, ggnetwork, ggOceanMaps, ggordiplots, ggpackets, ggparty, ggplot2.utils, ggpointless, ggpolar, ggpolypath, ggpp, ggpubr, ggragged, ggrain, ggraph, ggraptR, ggResidpanel, ggROC, ggsdc, ggsignif, ggsom, ggspatial, ggsurvfit, ggtext, ggthemes, ggthreed, ggtibble, ggtidy, ggtree, ggtrendline, ggunify, ggupset, ggvenn, ggVennDiagram, ggvis, ggwordcloud, ggx, ggxtend, ghiblipalettes, Gifi, ggsoccer, ggsolvencyii, ggstance, ggstats, ggstatsplot, ggsteam, ggstream, ggsubplot, ggswissmaps, ggtern, ggtexttable, ggTimeSeries, ggtrend, ggupset, ggvis, ggwordcloud, ggx, ggxtend",
  "reverseImports": "",
  "reverseSuggests": "",
  "reverseEnhances": ""
}
````
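
Several output fields pack structured data into a single string, for example `depends` (`"R (>= 4.1)"`) and `windowsBinaries` (`"r-devel: ggplot2_4.0.3.zip, ..."`). The following is a minimal sketch of how such strings could be unpacked, assuming the formats match the sample above; `parse_dependencies` and `parse_binaries` are hypothetical helpers, not functions provided by the Actor.

```python
import re

# Matches "name" or "name (op version)", e.g. "R (>= 4.1)".
_DEP_RE = re.compile(
    r"^\s*([A-Za-z][A-Za-z0-9.]*)\s*(?:\(\s*([<>=!]+)\s*([^)\s]+)\s*\))?\s*$"
)


def parse_dependencies(field: str) -> list[dict]:
    """Turn a comma-separated dependency string into name/op/version records."""
    deps = []
    for part in field.split(","):
        if not part.strip():
            continue
        match = _DEP_RE.match(part)
        if match is None:
            # Unrecognized shape: keep the raw text rather than guessing.
            deps.append({"name": part.strip(), "op": None, "version": None})
        else:
            deps.append(
                {"name": match.group(1), "op": match.group(2), "version": match.group(3)}
            )
    return deps


def parse_binaries(field: str) -> dict[str, str]:
    """Map each R flavor label (e.g. "r-release (arm64)") to its file name."""
    binaries = {}
    for part in field.split(","):
        label, sep, filename = part.partition(":")
        if sep:
            binaries[label.strip()] = filename.strip()
    return binaries
```

Entries without a version constraint come back with `op` and `version` set to `None`, so callers can tell "no constraint" apart from a parsed one.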

### Pricing

Pay per event: $0.001 per package extracted.
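
For budgeting, the per-event charge can be estimated with simple arithmetic. This sketch assumes the $0.001 rate above is charged once per extracted package and does not model any separate platform usage costs; `estimate_event_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Rate quoted in this README; adjust if the Actor's pricing changes.
PRICE_PER_PACKAGE_USD = Decimal("0.001")


def estimate_event_cost(package_count: int, rate: Decimal = PRICE_PER_PACKAGE_USD) -> Decimal:
    """Return the estimated per-event charge in USD for a given number of packages."""
    if package_count < 0:
        raise ValueError("package_count must be non-negative")
    # Decimal avoids binary floating-point rounding on currency values.
    return rate * package_count
```

Under these assumptions, a full run over roughly 20,000 packages would come to about $20 in per-event charges.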

### Limitations

- Scraping all ~20,000 CRAN packages may take significant time and compute. Use `maxItems` to limit scope.
- Package pages are static HTML; no JavaScript rendering required.
- CRAN's rate limits are generous; respect them by using the built-in proxy configuration.
- Some packages may have missing fields (e.g., no DOI, no vignettes) which will be returned as empty strings.
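
Because missing fields arrive as empty strings rather than being omitted, downstream code may want to convert them to `None` before analysis. A minimal sketch, assuming items are flat dictionaries of strings as in the sample output (`normalize_item` and `missing_fields` are hypothetical helper names):

```python
def normalize_item(item: dict) -> dict:
    """Replace empty or whitespace-only string values with None."""
    return {
        key: (None if isinstance(value, str) and not value.strip() else value)
        for key, value in item.items()
    }


def missing_fields(item: dict) -> list[str]:
    """List the field names that carry no data for this package."""
    return sorted(key for key, value in normalize_item(item).items() if value is None)
```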

### FAQ

**Q: Can I scrape all CRAN packages at once?**
A: Yes. Leave `packageNames` empty and set `maxItems` to 0. This will fetch all ~20,000 packages. Consider using a higher compute tier for large runs.

**Q: How do I find package names?**
A: Package names are the exact CRAN identifiers (e.g., `ggplot2`, not `ggplot 2`). You can find them on CRAN or by running the Actor with an empty `packageNames` list to discover them.
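
To catch typos such as `ggplot 2` before starting a run, names can be checked locally. The sketch below applies the naming rule from R's own documentation (ASCII letters, digits, and dots; at least two characters; starts with a letter; does not end with a dot). Whether the Actor performs the same check is not documented here, so treat this as an assumption; `is_valid_package_name` is a hypothetical helper.

```python
import re

# Starts with a letter, ends with a letter or digit, dots allowed in between.
_PACKAGE_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]")


def is_valid_package_name(name: str) -> bool:
    """Return True if the name follows R's package naming rule."""
    return _PACKAGE_NAME_RE.fullmatch(name) is not None
```

A name that passes this check is only well-formed; it may still not exist on CRAN.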

**Q: What is the difference between `includeDetails` true and false?**
A: When `includeDetails` is true, the Actor visits each package's detail page and extracts full metadata. When false, it only extracts name and title from the package list page (much faster but less data).

### Changelog

- **v1.0.0** — Initial release. Scrape CRAN package metadata including dependencies, reverse dependencies, authors, DOIs, vignettes, and download links.

# Actor input Schema

## `packageNames` (type: `array`):

List of CRAN package names to scrape (e.g., ggplot2, dplyr, shiny). Leave empty to scrape all packages.

## `maxItems` (type: `integer`):

Maximum number of packages to scrape. Use 0 for unlimited.

## `includeDetails` (type: `boolean`):

If enabled, scrapes full package detail pages. If disabled, only extracts name, title, and description from the package list.

## `proxyConfiguration` (type: `object`):

Configure proxy for scraping. Apify proxy is included by default.

## Actor input object example

```json
{
  "packageNames": [
    "ggplot2",
    "dplyr",
    "shiny"
  ],
  "maxItems": 10,
  "includeDetails": true,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```
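
An input object like the one above can also be assembled in code. This sketch uses the defaults listed in the OpenAPI specification (`maxItems` 100, `includeDetails` true, empty `packageNames`); rejecting negative `maxItems` values is an assumption, since the schema only documents 0 as "unlimited". `build_input` is a hypothetical helper.

```python
from typing import Iterable, Optional


def build_input(
    package_names: Optional[Iterable[str]] = None,
    max_items: int = 100,
    include_details: bool = True,
    use_apify_proxy: bool = True,
) -> dict:
    """Build an Actor input dict; an empty package list means all packages."""
    names = list(package_names or [])
    if not all(isinstance(name, str) and name for name in names):
        raise ValueError("packageNames must contain non-empty strings")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 0:
        raise ValueError("maxItems must be a non-negative integer (0 = unlimited)")
    return {
        "packageNames": names,
        "maxItems": max_items,
        "includeDetails": bool(include_details),
        "proxyConfiguration": {"useApifyProxy": bool(use_apify_proxy)},
    }
```

Calling `build_input(max_items=0)` produces the input for a full scrape of all packages.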

# Actor output Schema

## `results` (type: `string`):

No description

## `stats` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "packageNames": [
        "ggplot2",
        "dplyr",
        "shiny"
    ],
    "maxItems": 10,
    "includeDetails": true,
    "proxyConfiguration": {
        "useApifyProxy": true
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("klondikeking/cran-r-packages-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "packageNames": [
        "ggplot2",
        "dplyr",
        "shiny",
    ],
    "maxItems": 10,
    "includeDetails": True,
    "proxyConfiguration": { "useApifyProxy": True },
}

# Run the Actor and wait for it to finish
run = client.actor("klondikeking/cran-r-packages-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "packageNames": [
    "ggplot2",
    "dplyr",
    "shiny"
  ],
  "maxItems": 10,
  "includeDetails": true,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}' |
apify call klondikeking/cran-r-packages-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=klondikeking/cran-r-packages-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "CRAN R Packages Scraper - Metadata, Dependencies & Analytics",
        "description": "Extract comprehensive metadata from CRAN (Comprehensive R Archive Network) packages including descriptions, versions, dependencies, reverse dependencies, publication dates, authors, DOIs, vignettes, and download links. Perfect for R ecosystem research, dependency analysis, and package discovery.",
        "version": "1.0",
        "x-build-id": "RCnIMIZkxTvj2nywX"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/klondikeking~cran-r-packages-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-klondikeking-cran-r-packages-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/klondikeking~cran-r-packages-scraper/runs": {
            "post": {
                "operationId": "runs-sync-klondikeking-cran-r-packages-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/klondikeking~cran-r-packages-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-klondikeking-cran-r-packages-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "packageNames",
                    "maxItems",
                    "includeDetails"
                ],
                "properties": {
                    "packageNames": {
                        "title": "Package Names",
                        "type": "array",
                        "description": "List of CRAN package names to scrape (e.g., ggplot2, dplyr, shiny). Leave empty to scrape all packages.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxItems": {
                        "title": "Max Items",
                        "type": "integer",
                        "description": "Maximum number of packages to scrape. Use 0 for unlimited.",
                        "default": 100
                    },
                    "includeDetails": {
                        "title": "Include Detailed Metadata",
                        "type": "boolean",
                        "description": "If enabled, scrapes full package detail pages. If disabled, only extracts name, title, and description from the package list.",
                        "default": true
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Configure proxy for scraping. Apify proxy is included by default."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
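
For projects that cannot use a client library, the `run-sync-get-dataset-items` path from the specification above can be called directly. The following is a minimal standard-library sketch; it assumes the response body is JSON, and the `build_request` and `run_sync` names and the 300-second timeout are illustrative choices rather than anything defined by the API.

```python
import json
import urllib.parse
import urllib.request

# Server URL and path copied from the OpenAPI specification above.
API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/klondikeking~cran-r-packages-scraper/run-sync-get-dataset-items"


def build_request(token: str, run_input: dict) -> urllib.request.Request:
    """Prepare the POST request; the token goes in the query string per the spec."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sync(token: str, run_input: dict, timeout: float = 300.0):
    """Run the Actor, wait for it to finish, and return the decoded response."""
    request = build_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Assumption: the endpoint returns the dataset items as JSON.
        return json.loads(response.read().decode("utf-8"))
```

A call such as `run_sync("<YOUR_API_TOKEN>", {"packageNames": ["ggplot2"], "maxItems": 1, "includeDetails": True})` blocks until the run finishes, so large scrapes are better started through the asynchronous `/runs` path.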
