# OSS Insight Scraper (`crawlerbros/oss-insight-scraper`) Actor

Scrape OSS Insight - open-source intelligence on 5M+ GitHub repos. Get trending repos by language, top projects by collection (databases, AI, web frameworks, CI/CD, and more), stars, forks, contributors, and growth metrics.

- **URL**: https://apify.com/crawlerbros/oss-insight-scraper.md
- **Developed by:** [Crawler Bros](https://apify.com/crawlerbros) (community)
- **Categories:** Developer tools, Automation
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $3.00 / 1,000 results

This Actor is billed per event plus usage: you pay a fixed price for specific events and, separately, for Apify platform usage.
Because this Actor supports Apify Store discounts, the higher your subscription plan, the lower the price.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
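
As a rough illustration of the per-result charge, the sketch below multiplies a result count by the listed starting rate. It assumes your plan is billed at exactly $3.00 per 1,000 results; the helper name `estimate_event_cost` is hypothetical, and Apify platform usage is billed on top and is not included.

```python
from decimal import Decimal

# Listed starting rate from the Pricing section: $3.00 per 1,000 results.
# Assumption: your plan is billed at exactly this rate.
PRICE_PER_1000_RESULTS = Decimal("3.00")


def estimate_event_cost(num_results, price_per_1000=PRICE_PER_1000_RESULTS):
    """Estimate the per-result event charge in USD (platform usage excluded)."""
    if num_results < 0:
        raise ValueError("num_results must be non-negative")
    cost = Decimal(num_results) * price_per_1000 / Decimal(1000)
    return cost.quantize(Decimal("0.0001"))


print(estimate_event_cost(50))    # event charge for 50 results
print(estimate_event_cost(1000))  # event charge for 1,000 results
```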

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## OSS Insight Scraper

Scrape **OSS Insight** — open-source intelligence on 5+ million GitHub repositories. Discover trending repos, top-ranked open-source projects by category (databases, AI, web frameworks, CI/CD, programming languages, and more), with stars, forks, contributor lists, and growth metrics — directly from the public [OSS Insight](https://ossinsight.io) data API.

Built for product managers, founders, investors, developer-tooling teams, and OSS researchers who need fresh, structured, queryable intelligence on the open-source landscape.

### What you can do

- **Find what is trending right now** on GitHub by language and time period
- **Discover top projects in a category** — open-source databases, AI tools, headless CMSs, web frameworks, etc.
- **Track week-over-week growth** of repos within any collection
- **Enrich your dataset** of dev tools with descriptions, contributor lists, and language tags

### Modes

| Mode | Description |
| --- | --- |
| `trending` | Top trending repos across all languages for the chosen time period |
| `byLanguage` | Top trending repos for a specific programming language |
| `byCollection` | Top-ranked repos inside a curated collection (databases, AI, web frameworks, ...) |
| `listCollections` | List every available collection with its ID and name |
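
The sketch below shows one way to assemble an input object for each mode. The allowed values, defaults, and `maxItems` bounds are copied from the input schema further down; the helper name `build_input` is hypothetical, and the mapping of fields to modes is an assumption inferred from the schema titles and the example inputs (for instance, that `listCollections` needs no `period`).

```python
# Allowed values copied from the Actor's input schema.
MODES = {"trending", "byLanguage", "byCollection", "listCollections"}
PERIODS = {"past_24_hours", "past_week", "past_month", "past_3_months"}
RANK_BY = {"stars", "forks", "pull_requests"}


def build_input(mode, period="past_24_hours", language="Python",
                collection_id=None, rank_by="stars", max_items=50):
    """Hypothetical helper: return an input dict with only the fields a mode uses."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if not 1 <= max_items <= 1000:
        raise ValueError("maxItems must be between 1 and 1000")

    run_input = {"mode": mode, "maxItems": max_items}
    if mode == "listCollections":
        # Assumption: listing collections does not depend on a time period.
        return run_input

    if period not in PERIODS:
        raise ValueError(f"unknown period: {period!r}")
    run_input["period"] = period

    if mode == "byLanguage":
        run_input["language"] = language
    elif mode == "byCollection":
        # Assumption: a collection ID is needed here, since the schema has no default.
        if collection_id is None:
            raise ValueError("collectionId is required for mode=byCollection")
        if rank_by not in RANK_BY:
            raise ValueError(f"unknown rankBy: {rank_by!r}")
        run_input["collectionId"] = collection_id
        run_input["rankBy"] = rank_by
    return run_input


print(build_input("byCollection", period="past_month", collection_id=10010))
```

Validating locally like this catches a misspelled mode or period before a run is started.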

### Example inputs

#### Trending repos (past 24 hours)

```json
{
  "mode": "trending",
  "period": "past_24_hours",
  "maxItems": 20
}
````

#### Python trending repos (past week)

```json
{
  "mode": "byLanguage",
  "language": "Python",
  "period": "past_week",
  "maxItems": 30
}
```

#### Top AI repos by stars gained (past month)

```json
{
  "mode": "byCollection",
  "collectionId": 10010,
  "rankBy": "stars",
  "period": "past_month",
  "maxItems": 50
}
```

#### Discover every collection

```json
{
  "mode": "listCollections",
  "maxItems": 200
}
```

### Output fields

#### Repo records (`trending`, `byLanguage`)

| Field | Description |
| --- | --- |
| `repoId` | GitHub numeric repo ID |
| `repoName` | `owner/name` |
| `owner` | Repo owner |
| `repoShortName` | Repo short name |
| `repoUrl` | Direct GitHub URL |
| `ossInsightUrl` | OSS Insight analyze URL |
| `primaryLanguage` | Main programming language |
| `description` | Repository description |
| `stars` | Stars gained in the period |
| `forks` | Forks gained in the period |
| `pullRequests` | Pull requests in the period |
| `pushes` | Push events in the period |
| `trendingScore` | Composite trending score |
| `topContributors` | Top contributor logins |
| `collections` | Collection names this repo belongs to |
| `period` | Time period queried |
| `languageFilter` | Language filter applied (if any) |
| `scrapedAt` | UTC ISO timestamp |

#### Collection-ranking records (`byCollection`)

| Field | Description |
| --- | --- |
| `repoId`, `repoName`, `owner`, `repoUrl`, `ossInsightUrl` | Repo identifiers |
| `totalStars` | Total all-time stars |
| `currentPeriodGrowth` | Stars/forks/PRs gained in the current period |
| `currentPeriodRank` | Rank within the collection this period |
| `pastPeriodGrowth`, `pastPeriodRank` | Same metrics for the previous period |
| `growthPercentChange` | Growth change vs. previous period (%) |
| `rankChange` | Rank change vs. previous period |
| `collectionId`, `collectionName` | Collection identifiers |
| `rankBy` | Ranking metric used (`stars`, `forks`, `pull_requests`) |
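
To illustrate how the growth fields might be used, here is a minimal sketch that picks the fastest-growing repos from a list of `byCollection` records. It assumes `growthPercentChange` arrives as a number; the helper name `top_risers` is hypothetical and the sample records are made up for illustration.

```python
def top_risers(records, min_growth_pct=0.0, limit=10):
    """Return up to `limit` records with growthPercentChange >= min_growth_pct, highest first."""
    risers = [
        r for r in records
        # Skip records where the field was omitted or is not a number.
        if isinstance(r.get("growthPercentChange"), (int, float))
        and r["growthPercentChange"] >= min_growth_pct
    ]
    return sorted(risers, key=lambda r: r["growthPercentChange"], reverse=True)[:limit]


# Made-up sample records, for illustration only (not real OSS Insight data).
sample = [
    {"repoName": "example-org/alpha", "growthPercentChange": 12.5, "rankChange": 1},
    {"repoName": "example-org/beta", "growthPercentChange": -3.0, "rankChange": -2},
    {"repoName": "example-org/gamma", "growthPercentChange": 40.0, "rankChange": 3},
    {"repoName": "example-org/delta"},  # growth field omitted
]

for repo in top_risers(sample, min_growth_pct=10.0):
    print(repo["repoName"], repo["growthPercentChange"])
```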

#### Collection records (`listCollections`)

| Field | Description |
| --- | --- |
| `collectionId` | Numeric collection ID |
| `collectionName` | Human-readable name |
| `ossInsightUrl` | Direct OSS Insight collection URL |

Empty fields are omitted from output records.
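
Because empty fields are left out, code that reads records should not assume every key is present. The sketch below reads a repo record defensively with `dict.get`; the helper name `summarize_repo` is hypothetical, and it assumes `topContributors` is a list of login strings.

```python
def summarize_repo(record):
    """Build a one-line summary, tolerating fields that were omitted from the record."""
    name = record.get("repoName", "<unknown>")
    language = record.get("primaryLanguage", "n/a")
    stars = record.get("stars")
    stars_text = f"+{stars} stars" if stars is not None else "stars n/a"
    # Assumption: topContributors, when present, is a list of login strings.
    contributors = record.get("topContributors") or []
    contributors_text = ", ".join(contributors) if contributors else "none listed"
    return f"{name} [{language}] {stars_text}, contributors: {contributors_text}"


# Made-up records, for illustration only.
print(summarize_repo({"repoName": "example-org/alpha", "primaryLanguage": "Python",
                      "stars": 120, "topContributors": ["alice", "bob"]}))
print(summarize_repo({"repoName": "example-org/beta"}))
```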

### FAQ

**Does this scraper use cookies or API keys?**
No. OSS Insight serves an unauthenticated public REST API.

**Will the Actor work on the Apify free plan?**
Yes. No proxy is required — the API is reachable from any datacenter IP.

**How fresh is the data?**
OSS Insight refreshes its underlying GitHub event data continuously; trending endpoints typically reflect activity within the last hour.

**Can I filter by category?**
Yes. Run `mode=listCollections` once to enumerate every collection ID, then pass that ID with `mode=byCollection`. Common IDs include `2` (Open Source Database), `10010` (Artificial Intelligence), `10004` (Web Framework), `10024` (Programming Language).
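
The two-step lookup described above can be sketched as follows. The `client` argument is expected to be an `ApifyClient` from the official Python client, used the same way as in the Python example in the API section; the helper names `find_collection_id` and `top_repos_in_collection` are hypothetical, and matching collections by exact name (ignoring case) is an assumption.

```python
ACTOR_ID = "crawlerbros/oss-insight-scraper"


def find_collection_id(collections, name):
    """Return the collectionId whose collectionName equals `name` (case-insensitive), else None."""
    wanted = name.strip().lower()
    for item in collections:
        if str(item.get("collectionName", "")).strip().lower() == wanted:
            return item.get("collectionId")
    return None


def top_repos_in_collection(client, name, period="past_month", max_items=50):
    """Run listCollections, resolve `name` to an ID, then run byCollection with that ID."""
    actor = client.actor(ACTOR_ID)

    # Step 1: enumerate collections (1000 is the schema maximum for maxItems).
    listing = actor.call(run_input={"mode": "listCollections", "maxItems": 1000})
    collections = client.dataset(listing["defaultDatasetId"]).iterate_items()
    collection_id = find_collection_id(collections, name)
    if collection_id is None:
        raise LookupError(f"no collection named {name!r}")

    # Step 2: fetch the ranking for the resolved collection.
    ranking = actor.call(run_input={
        "mode": "byCollection",
        "collectionId": collection_id,
        "rankBy": "stars",
        "period": period,
        "maxItems": max_items,
    })
    return list(client.dataset(ranking["defaultDatasetId"]).iterate_items())


# Usage (requires `pip install apify-client` and a real token):
# from apify_client import ApifyClient
# repos = top_repos_in_collection(ApifyClient("<YOUR_API_TOKEN>"), "Artificial Intelligence")
```

Each call to the Actor is a separate run, so caching the collection list avoids paying for the lookup every time.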

**What about deleted or private repos?**
OSS Insight only indexes public GitHub repositories; deleted repos drop out automatically.

**Can I use this for competitive intelligence on dev-tools companies?**
Yes — pairing `byCollection` with a `period` window lets you see week-over-week or month-over-month growth, which is a strong leading indicator of category momentum.

### Source

- OSS Insight: https://ossinsight.io
- Public data API: https://api.ossinsight.io/v1/

# Actor input Schema

## `mode` (type: `string`):

What data to fetch.

## `period` (type: `string`):

Time window for trending/ranking data.

## `language` (type: `string`):

Filter trending repos to a specific language.

## `collectionId` (type: `integer`):

OSS Insight collection ID. Use mode=listCollections to discover IDs. Examples: 1=Static Site Generator, 2=Open Source Database, 10010=Artificial Intelligence, 10004=Web Framework, 10024=Programming Language.

## `rankBy` (type: `string`):

How to rank repos within a collection.

## `maxItems` (type: `integer`):

Hard cap on emitted records.

## Actor input object example

```json
{
  "mode": "trending",
  "period": "past_24_hours",
  "language": "Python",
  "rankBy": "stars",
  "maxItems": 20
}
```

# Actor output Schema

## `repos` (type: `string`):

Dataset containing all scraped OSS Insight records.

# API

You can run this Actor programmatically using our API. Below are code examples for JavaScript, Python, and the CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "mode": "trending",
    "period": "past_24_hours",
    "maxItems": 20
};

// Run the Actor and wait for it to finish
const run = await client.actor("crawlerbros/oss-insight-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "mode": "trending",
    "period": "past_24_hours",
    "maxItems": 20,
}

# Run the Actor and wait for it to finish
run = client.actor("crawlerbros/oss-insight-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "mode": "trending",
  "period": "past_24_hours",
  "maxItems": 20
}' |
apify call crawlerbros/oss-insight-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=crawlerbros/oss-insight-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "OSS Insight Scraper",
        "description": "Scrape OSS Insight - open-source intelligence on 5M+ GitHub repos. Get trending repos by language, top projects by collection (databases, AI, web frameworks, CI/CD, and more), stars, forks, contributors, and growth metrics.",
        "version": "1.0",
        "x-build-id": "QXVGwGxST5BDmkBCc"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/crawlerbros~oss-insight-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-crawlerbros-oss-insight-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/crawlerbros~oss-insight-scraper/runs": {
            "post": {
                "operationId": "runs-sync-crawlerbros-oss-insight-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/crawlerbros~oss-insight-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-crawlerbros-oss-insight-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "mode"
                ],
                "properties": {
                    "mode": {
                        "title": "Mode",
                        "enum": [
                            "trending",
                            "byLanguage",
                            "byCollection",
                            "listCollections"
                        ],
                        "type": "string",
                        "description": "What data to fetch.",
                        "default": "trending"
                    },
                    "period": {
                        "title": "Time period",
                        "enum": [
                            "past_24_hours",
                            "past_week",
                            "past_month",
                            "past_3_months"
                        ],
                        "type": "string",
                        "description": "Time window for trending/ranking data.",
                        "default": "past_24_hours"
                    },
                    "language": {
                        "title": "Programming language (mode=byLanguage)",
                        "enum": [
                            "JavaScript",
                            "TypeScript",
                            "Python",
                            "Java",
                            "Go",
                            "Rust",
                            "C",
                            "C++",
                            "C#",
                            "Ruby",
                            "PHP",
                            "Swift",
                            "Kotlin",
                            "Scala",
                            "Dart",
                            "Shell",
                            "HTML",
                            "CSS"
                        ],
                        "type": "string",
                        "description": "Filter trending repos to a specific language.",
                        "default": "Python"
                    },
                    "collectionId": {
                        "title": "Collection ID (mode=byCollection)",
                        "minimum": 1,
                        "maximum": 99999,
                        "type": "integer",
                        "description": "OSS Insight collection ID. Use mode=listCollections to discover IDs. Examples: 1=Static Site Generator, 2=Open Source Database, 10010=Artificial Intelligence, 10004=Web Framework, 10024=Programming Language."
                    },
                    "rankBy": {
                        "title": "Ranking metric (mode=byCollection)",
                        "enum": [
                            "stars",
                            "forks",
                            "pull_requests"
                        ],
                        "type": "string",
                        "description": "How to rank repos within a collection.",
                        "default": "stars"
                    },
                    "maxItems": {
                        "title": "Max items",
                        "minimum": 1,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Hard cap on emitted records.",
                        "default": 50
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
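
For projects that call the REST API directly, the `run-sync-get-dataset-items` path in the specification above can be exercised with nothing but the Python standard library. This is a minimal sketch: the helper names are hypothetical, the 300-second timeout is an arbitrary choice, and it assumes the response body is a JSON array of dataset items (the specification only documents a `200 OK`).

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
RUN_SYNC_PATH = "/acts/crawlerbros~oss-insight-scraper/run-sync-get-dataset-items"


def build_run_request(token, run_input):
    """Build (but do not send) the POST request described by the specification."""
    # The specification passes the API token as a required `token` query parameter.
    query = urllib.parse.urlencode({"token": token})
    body = json.dumps(run_input).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}{RUN_SYNC_PATH}?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_fetch_items(token, run_input, timeout=300):
    """Send the request and parse the response (assumed to be a JSON array of items)."""
    request = build_run_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (needs a real token and network access):
# items = run_and_fetch_items("<YOUR_API_TOKEN>", {"mode": "trending", "maxItems": 20})
```

Because this endpoint waits for the run to finish, the `/runs` path may suit long jobs better, since it returns run information straight away.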
