# OSINT Scraper (`crawlerbros/osint-scraper`) Actor

Search paste sites and code-sharing platforms (Pastebin, GitHub Gist, Ideone, Paste.org, Textbin) for leaked keywords, credentials, and sensitive data using Google SERP-based discovery.

- **URL**: https://apify.com/crawlerbros/osint-scraper.md
- **Developed by:** [Crawler Bros](https://apify.com/crawlerbros) (community)
- **Categories:** Developer tools, Other, SEO tools
- **Stats:** 1 total users, 1 monthly users, 100.0% runs succeeded, 15 bookmarks
- **User rating**: 5.00 out of 5 stars

## Pricing

from $1.00 / 1,000 results

This Actor is paid per event and usage. You are charged a fixed price for specific events plus the cost of Apify platform usage.
Because this Actor supports Apify Store discounts, the higher your subscription plan, the lower the price.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## OSINT Scraper

Search paste sites and code-sharing platforms for keywords using Google Search's `site:` operator. Ideal for security researchers, threat intelligence analysts, and compliance teams looking for leaked credentials, sensitive data, or public mentions of specific terms.

### Supported Sources

| Source | Domain | Description |
|---|---|---|
| **Pastebin** | pastebin.com | The most popular paste site |
| **GitHub Gist** | gist.github.com | GitHub's snippet sharing platform |
| **Ideone** | ideone.com | Online code compiler with shareable pastes |
| **Paste.org** | paste.org | General-purpose paste site |
| **Textbin** | textbin.net | Simple text sharing site |

### How It Works

Many paste sites actively block direct scraping or require paid API access. This scraper sidesteps that by searching Google for indexed paste URLs matching your keywords — Google has already crawled the public content, making it freely discoverable. No authentication, no per-site rate limits, and no residential proxies required.
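
The discovery step described above can be pictured as one Google query per keyword and source. The following is a minimal sketch, not the Actor's actual implementation: the domains come from the Supported Sources table, while the exact query format (a quoted keyword after `site:`) and the helper name `build_queries` are assumptions made for illustration.

```python
# Domains are taken from the Supported Sources table. The query format is an
# assumption, since the Actor's internals are not documented here.
SOURCE_DOMAINS = {
    "pastebin": "pastebin.com",
    "gist": "gist.github.com",
    "ideone": "ideone.com",
    "paste_org": "paste.org",
    "textbin": "textbin.net",
}


def build_queries(keywords, sources=None):
    """Return (source, keyword, query) tuples, one per keyword/source pair."""
    selected = list(sources) if sources else list(SOURCE_DOMAINS)
    queries = []
    for keyword in keywords:
        for source in selected:
            domain = SOURCE_DOMAINS[source]  # raises KeyError for an unknown source
            # Quote the keyword so multi-word terms are matched as a phrase.
            queries.append((source, keyword, f'site:{domain} "{keyword}"'))
    return queries


for source, keyword, query in build_queries(["api_key"], ["pastebin", "gist"]):
    print(query)
```

Each keyword is paired with each selected source, which matches the documented behavior that keywords are searched individually.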

### Input

| Field | Type | Description |
|---|---|---|
| `searchKeywords` | array of strings | Keywords to search across OSINT sources |
| `sources` | array of strings | Sources to search (default: all). Valid values: `pastebin`, `gist`, `ideone`, `paste_org`, `textbin` |
| `maxItemsPerSource` | integer | Maximum results per source per keyword (default: 10) |

#### Example Input

```json
{
    "searchKeywords": ["api_key", "password"],
    "sources": ["pastebin", "gist"],
    "maxItemsPerSource": 5
}
````
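
Before starting a run, it may help to check an input object on the client side. The sketch below applies the documented constraints (`searchKeywords` is required, `maxItemsPerSource` defaults to 10 and must be between 1 and 100, `sources` defaults to all five values). The function name `normalize_input` is hypothetical, and rejecting an empty keyword list is an assumption rather than something the schema states.

```python
VALID_SOURCES = ("pastebin", "gist", "ideone", "paste_org", "textbin")


def normalize_input(raw):
    """Validate a raw input dict and fill in the documented defaults."""
    keywords = raw.get("searchKeywords")
    # Assumption: an empty list is treated as invalid, although the schema
    # only marks the field as required.
    if not isinstance(keywords, list) or not keywords:
        raise ValueError("searchKeywords must be a non-empty list of strings")
    if not all(isinstance(k, str) and k.strip() for k in keywords):
        raise ValueError("every search keyword must be a non-empty string")

    sources = raw.get("sources", list(VALID_SOURCES))
    unknown = [s for s in sources if s not in VALID_SOURCES]
    if unknown:
        raise ValueError(f"unknown sources: {unknown}")

    max_items = raw.get("maxItemsPerSource", 10)
    is_int = isinstance(max_items, int) and not isinstance(max_items, bool)
    if not is_int or not 1 <= max_items <= 100:
        raise ValueError("maxItemsPerSource must be an integer from 1 to 100")

    return {
        "searchKeywords": keywords,
        "sources": list(sources),
        "maxItemsPerSource": max_items,
    }


print(normalize_input({"searchKeywords": ["api_key", "password"]}))
```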

### Output

Each dataset item represents one discovered paste or gist:

| Field | Type | Description |
|---|---|---|
| `source` | string | Source platform identifier |
| `url` | string | Direct URL to the paste |
| `title` | string | Paste title or identifier |
| `snippet` | string | Excerpt from the paste content (as indexed by Google) |
| `matchedKeyword` | string | The keyword that matched this result |
| `scrapedAt` | string | ISO 8601 scrape timestamp |

#### Example Output

```json
{
    "source": "pastebin",
    "url": "https://pastebin.com/ABC12345",
    "title": "Example Configuration Dump",
    "snippet": "api_key = example_key_value_here, connecting to production environment...",
    "matchedKeyword": "api_key",
    "scrapedAt": "2026-04-10T12:00:00+00:00"
}
```
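
Downstream code often wants typed records rather than raw dictionaries. The following sketch maps one dataset item onto a dataclass and groups hits by URL, since the same paste could plausibly match more than one keyword. The names `PasteHit`, `parse_item`, and `keywords_by_url` are hypothetical, and the timestamp parsing assumes the `+00:00` offset format shown in the example output.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PasteHit:
    source: str
    url: str
    title: str
    snippet: str
    matched_keyword: str
    scraped_at: datetime


def parse_item(item):
    """Convert one dataset item (a dict) into a PasteHit."""
    return PasteHit(
        source=item["source"],
        url=item["url"],
        title=item["title"],
        snippet=item["snippet"],
        matched_keyword=item["matchedKeyword"],
        # fromisoformat accepts the "+00:00" offset used in the example output.
        scraped_at=datetime.fromisoformat(item["scrapedAt"]),
    )


def keywords_by_url(hits):
    """Map each paste URL to the set of keywords that matched it."""
    grouped = {}
    for hit in hits:
        grouped.setdefault(hit.url, set()).add(hit.matched_keyword)
    return grouped


sample = {
    "source": "pastebin",
    "url": "https://pastebin.com/ABC12345",
    "title": "Example Configuration Dump",
    "snippet": "api_key = example_key_value_here, connecting to production environment...",
    "matchedKeyword": "api_key",
    "scrapedAt": "2026-04-10T12:00:00+00:00",
}
hit = parse_item(sample)
print(hit.source, hit.scraped_at.isoformat())
print(keywords_by_url([hit]))
```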

### FAQ

**Q: Does this scrape paste sites directly?**
No — it searches Google for indexed paste URLs. This avoids anti-bot protection and API rate limits on the individual sites.

**Q: Can I scrape the full paste content?**
The scraper returns the snippet Google shows in search results (typically 100–500 characters). Fetching the full paste is out of scope because many paste sites block it.

**Q: Is this legal?**
Searching publicly indexed content via Google is legal. However, use this scraper responsibly — only search for information you are authorized to look up (e.g., your own leaked credentials, authorized security research).

**Q: Why are some sources missing (Dumpz, Codepad)?**
Those sites are either defunct or no longer publicly accessible as of 2026. This scraper only includes live sources.

### Use Cases

- **Threat intelligence** — monitor for leaked company credentials
- **Breach detection** — search for employee emails across paste sites
- **Security research** — discover public proof-of-concept code
- **Compliance monitoring** — detect data exfiltration via public pastes

# Actor input Schema

## `searchKeywords` (type: `array`):

Keywords to search across OSINT paste sources. Each keyword is searched individually.

## `sources` (type: `array`):

Paste sources to search. Defaults to all supported.

## `maxItemsPerSource` (type: `integer`):

Maximum number of results to collect per source (per keyword).
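
Because the limit applies per source and per keyword, the upper bound on results for a run is the product of the three settings. The sketch below works that out and turns it into a rough event cost. It assumes the listed "from $1.00 / 1,000 results" rate applies to every dataset item and ignores platform usage charges, so treat the figure as illustrative rather than a quote.

```python
def max_results(num_keywords, num_sources, max_items_per_source):
    """Upper bound on dataset items: one limit per keyword/source pair."""
    return num_keywords * num_sources * max_items_per_source


def estimated_event_cost_usd(results, usd_per_1000=1.00):
    """Rough per-result cost; the rate is an assumed flat price per 1,000 results."""
    return results * usd_per_1000 / 1000


upper = max_results(num_keywords=2, num_sources=5, max_items_per_source=10)
print(upper)                            # 100
print(estimated_event_cost_usd(upper))  # 0.1
```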

## Actor input object example

```json
{
  "searchKeywords": [
    "example"
  ],
  "sources": [
    "pastebin",
    "gist",
    "ideone",
    "paste_org",
    "textbin"
  ],
  "maxItemsPerSource": 3
}
```

# Actor output Schema

## `results` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "searchKeywords": [
        "example"
    ],
    "sources": [
        "pastebin",
        "gist",
        "ideone",
        "paste_org",
        "textbin"
    ],
    "maxItemsPerSource": 3
};

// Run the Actor and wait for it to finish
const run = await client.actor("crawlerbros/osint-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "searchKeywords": ["example"],
    "sources": [
        "pastebin",
        "gist",
        "ideone",
        "paste_org",
        "textbin",
    ],
    "maxItemsPerSource": 3,
}

# Run the Actor and wait for it to finish
run = client.actor("crawlerbros/osint-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "searchKeywords": [
    "example"
  ],
  "sources": [
    "pastebin",
    "gist",
    "ideone",
    "paste_org",
    "textbin"
  ],
  "maxItemsPerSource": 3
}' |
apify call crawlerbros/osint-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=crawlerbros/osint-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "OSINT Scraper",
        "description": "Search paste sites and code sharing platforms (Pastebin, GitHub Gist, Ideone, Paste.org, Textbin) for leaked keywords, credentials, and sensitive data using Google SERP-based discovery.",
        "version": "1.0",
        "x-build-id": "MW5R2HPjm3SuQr4x8"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/crawlerbros~osint-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-crawlerbros-osint-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/crawlerbros~osint-scraper/runs": {
            "post": {
                "operationId": "runs-sync-crawlerbros-osint-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/crawlerbros~osint-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-crawlerbros-osint-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "searchKeywords"
                ],
                "properties": {
                    "searchKeywords": {
                        "title": "Search Keywords",
                        "type": "array",
                        "description": "Keywords to search across OSINT paste sources. Each keyword is searched individually.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "sources": {
                        "title": "Sources",
                        "type": "array",
                        "description": "Paste sources to search. Defaults to all supported.",
                        "default": [
                            "pastebin",
                            "gist",
                            "ideone",
                            "paste_org",
                            "textbin"
                        ],
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxItemsPerSource": {
                        "title": "Max Items per Source",
                        "minimum": 1,
                        "maximum": 100,
                        "type": "integer",
                        "description": "Maximum number of results to collect per source (per keyword).",
                        "default": 10
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
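
For projects that call the REST API directly, the `run-sync-get-dataset-items` path from the specification above can be used with only the Python standard library. This is a minimal sketch under a few assumptions: the helper names `build_run_request` and `run_actor` are hypothetical, the 300-second timeout is arbitrary, and the response body is assumed to be a JSON array of dataset items, which the specification itself does not describe.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/crawlerbros~osint-scraper/run-sync-get-dataset-items"


def build_run_request(token, actor_input):
    """Build the POST request; the token goes in the query string per the spec."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_actor(token, actor_input, timeout=300):
    """Send the request and decode the body (assumed to be a JSON array)."""
    request = build_run_request(token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network; run_actor() does.
request = build_run_request("<YOUR_API_TOKEN>", {"searchKeywords": ["example"]})
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes the URL and body easy to inspect or log before any run is charged.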
