# Threads Crawler (`zrkluke/threads-crawler`) Actor

One Actor for Threads profiles, tags, searches, posts with replies, and custom feeds. Supports bulk accounts, relative dates, engagement metrics, ISO timestamps, and media URLs. No login or API token required.

- **URL**: https://apify.com/zrkluke/threads-crawler.md
- **Developed by:** [Luke](https://apify.com/zrkluke) (community)
- **Categories:** Social media, Developer tools
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is billed per platform usage. The Actor itself is free to use; you pay only for the Apify platform usage, which gets cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Threads Crawler

An Apify Actor for scraping publicly visible Threads pages with Python, Crawlee, Playwright, and Camoufox.

The Actor is designed for local development and Apify deployment through GitHub. It loads dynamic Threads pages in a Camoufox browser, extracts visible profile and post data, and stores structured results in an Apify dataset.

### Features

- Crawl publicly visible Threads profile pages.
- Batch crawl up to 100 accounts in one run.
- Support five crawl modes:
  - Profile pages by username.
  - Tag / topic pages.
  - Keyword search pages.
  - Single thread / post URLs, including publicly visible replies.
  - Custom Threads feed URLs.
- Support username input with or without `@`.
- Support tags with or without `#`.
- Support bulk paste fields for accounts and keywords.
- Support relative date windows such as `7 days`, `1 month`, `24 hours`, `7 天`, or `1 個月`.
- Limit returned posts per account or target with `maxPostsPerAccount`.
- Support absolute `startDate` and `endDate` filters.
- Extract profile metadata when visible:
  - username
  - display name
  - bio
  - external URL
  - follower count
- Extract visible post data:
  - author
  - relative timestamp
  - best-effort ISO timestamp
  - post text
  - visible metrics
  - media URLs found on the page
- Optional raw visible text output for parser debugging.

### Important Limitations

This Actor does not log in to Threads and does not use a private API token. It only extracts data that Threads renders publicly in the browser.

Because of that:

- Full historical posts may not be available. Threads can show `Log in to see more`.
- Replies are only extracted when they are publicly visible on the loaded page.
- Engagement fields depend on what the public page exposes.
- `likes`, `replies`, `reposts`, `shares`, `views`, and `quotes` are best-effort mappings from visible metric numbers.
- Some metrics may be `null` if Threads does not expose them publicly.
- ISO timestamps are estimated from relative timestamps such as `12h`, `1d`, or `3w` using the scrape time.
- Exact post URLs, permanent IDs, and deeper media metadata may require additional parser work.
- Threads page structure can change, which may require selector/parser updates.
- Camoufox reduces common automation fingerprints, but it does not guarantee access or bypass platform limits.
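
The best-effort metric mapping mentioned in the list above can be pictured with a minimal sketch. It assumes the visible numbers appear in the order likes, replies, reposts, shares; both that order and the helper name `map_metrics` are illustrative assumptions, not the Actor's confirmed parser.

```python
# Assumed position of each visible counter; this order is a guess for
# illustration, not the Actor's confirmed parsing logic.
ASSUMED_ORDER = ["likes", "replies", "reposts", "shares"]
ALL_FIELDS = ["likes", "replies", "reposts", "shares", "views", "quotes"]


def map_metrics(raw: list[str]) -> dict[str, object]:
    """Map visible metric strings to named fields by position.

    Fields that have no visible number stay None.
    """
    metrics: dict[str, object] = {name: None for name in ALL_FIELDS}
    for name, value in zip(ASSUMED_ORDER, raw):
        metrics[name] = value
    # Keep the untouched numbers so a wrong mapping can be spotted later.
    metrics["raw"] = list(raw)
    return metrics
```

Keeping the `raw` list next to the named fields makes it possible to re-map the numbers later if the assumed order turns out to be wrong for a given page.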

Use this Actor only for public data and make sure your usage complies with applicable laws, Threads terms, and Apify platform rules.

### Input

The Actor input is configured in `.actor/input_schema.json`.

#### `mode`

Choose what to crawl:

```json
"profile"
````

Supported values:

- `profile`
- `tag`
- `search`
- `thread`
- `feed`

#### Profile Mode

Use `accounts` for structured input:

```json
{
  "mode": "profile",
  "accounts": ["largitdata"],
  "maxPostsPerAccount": 10
}
```

Or use `bulkAccounts`:

```json
{
  "mode": "profile",
  "bulkAccounts": "largitdata\nopenai\nmeta",
  "maxPostsPerAccount": 10
}
```
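
To illustrate how a `bulkAccounts` string could be turned into a clean list, here is a minimal sketch. The helper name `parse_bulk_accounts`, the de-duplication step, and the choice to truncate at 100 entries (rather than reject the input) are assumptions made for illustration; the Actor's own handling may differ.

```python
import re

MAX_ACCOUNTS = 100  # batch limit stated in this README


def parse_bulk_accounts(text: str) -> list[str]:
    """Split on new lines, commas, or spaces and strip a leading '@'."""
    seen: set[str] = set()
    accounts: list[str] = []
    for token in re.split(r"[\s,]+", text):
        username = token.strip().lstrip("@")
        # Skip empty tokens and usernames that were already collected.
        if username and username not in seen:
            seen.add(username)
            accounts.append(username)
    # Assumption: extra entries are dropped instead of raising an error.
    return accounts[:MAX_ACCOUNTS]
```

For example, `parse_bulk_accounts("largitdata\n@openai, meta")` yields the three usernames without the `@` prefix.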

#### Tag / Topic Mode

```json
{
  "mode": "tag",
  "keywordsOrTags": ["AI", "MachineLearning"],
  "maxPostsPerAccount": 10
}
```

#### Keyword Search Mode

```json
{
  "mode": "search",
  "keywordsOrTags": ["AI agent", "Crawlee"],
  "searchSort": "top",
  "maxPostsPerAccount": 10
}
```

#### Single Thread Mode

```json
{
  "mode": "thread",
  "threadUrls": [
    {
      "url": "https://www.threads.com/@largitdata/post/POST_ID"
    }
  ],
  "maxPostsPerAccount": 10
}
```

#### Custom Feed Mode

```json
{
  "mode": "feed",
  "feedUrls": [
    {
      "url": "https://www.threads.com/"
    }
  ],
  "maxPostsPerAccount": 10
}
```

#### Date Filters

```json
{
  "mode": "profile",
  "accounts": ["largitdata"],
  "relativeDate": "7 days"
}
```

You can also use absolute dates:

```json
{
  "mode": "profile",
  "accounts": ["largitdata"],
  "startDate": "2026-05-01",
  "endDate": "2026-05-12"
}
```

Date filtering is best-effort because public Threads pages often expose only relative timestamps.
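
A minimal sketch of why the filtering is only approximate: the absolute time has to be estimated by subtracting the relative label from the scrape time. The function names, the set of supported units (`m`, `h`, `d`, `w`), and the choice to keep posts whose label cannot be parsed are assumptions for illustration, not the Actor's actual code.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Seconds per unit for labels such as '12h', '1d', or '3w'.
# Treating 'm' as minutes is an assumption.
UNIT_SECONDS = {"m": 60, "h": 3600, "d": 86400, "w": 604800}


def estimate_posted_at(relative: str, scraped_at: datetime) -> Optional[datetime]:
    """Estimate an absolute time from a relative label and the scrape time."""
    match = re.fullmatch(r"(\d+)\s*([mhdw])", relative.strip().lower())
    if match is None:
        return None  # unknown format: no estimate available
    amount, unit = int(match.group(1)), match.group(2)
    return scraped_at - timedelta(seconds=amount * UNIT_SECONDS[unit])


def in_window(posted_at: Optional[datetime], start: datetime, end: datetime) -> bool:
    """Keep posts inside [start, end]; posts without an estimate are kept."""
    if posted_at is None:
        return True
    return start <= posted_at <= end
```

Because a label like `3w` only has week-level precision, the estimate can be off by several days, so posts near the edge of a window may be included or excluded unexpectedly.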

### Output

Each dataset item represents one crawled target.

Example shape:

```json
{
  "url": "https://www.threads.com/@largitdata",
  "mode": "profile",
  "target": "largitdata",
  "scraped_at": "2026-05-12T06:44:40.405672+00:00",
  "title": "@largitdata • Threads, Say more",
  "profile": {
    "username": "largitdata",
    "display_name": "largitdata",
    "bio": "...",
    "external_url": "largitdata.com",
    "followers": "4,847 followers"
  },
  "posts": [
    {
      "author": "largitdata",
      "posted_at": "15h",
      "posted_at_iso": "2026-05-11T15:44:40.405672+00:00",
      "text": "...",
      "metrics": {
        "likes": "10",
        "replies": "1",
        "reposts": "1",
        "shares": "2",
        "views": null,
        "quotes": null,
        "raw": ["10", "1", "1", "2"]
      }
    }
  ],
  "media_urls": [
    "https://..."
  ]
}
```
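
Since counts arrive as display strings, downstream code usually needs to convert them. The sketch below flattens one dataset item into one row per post and parses plain counts into integers. The helper names are hypothetical, and abbreviated counts such as `1.2K` are deliberately left as `None` because how Threads abbreviates them is not documented here.

```python
import re
from typing import Any, Optional

# Matches '10', '12345', '4,847', optionally followed by one word ('followers').
COUNT_PATTERN = re.compile(r"(\d{1,3}(?:,\d{3})*|\d+)(?:\s+\w+)?")


def parse_count(value: Optional[str]) -> Optional[int]:
    """Convert '10' or '4,847 followers' to an int; anything else gives None."""
    if not value:
        return None
    match = COUNT_PATTERN.fullmatch(value.strip())
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


def flatten_posts(item: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn one dataset item (one crawled target) into one row per post."""
    rows = []
    for post in item.get("posts") or []:
        metrics = post.get("metrics") or {}
        rows.append({
            "target": item.get("target"),
            "author": post.get("author"),
            "posted_at_iso": post.get("posted_at_iso"),
            "text": post.get("text"),
            "likes": parse_count(metrics.get("likes")),
            "replies": parse_count(metrics.get("replies")),
        })
    return rows
```

The same `parse_count` helper can be applied to `profile["followers"]` to obtain a numeric follower count.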

### Local Development

Install the Apify CLI:

```bash
npm install -g apify-cli
```

Create and activate a Python virtual environment:

```bash
python -m venv .venv
.venv\Scripts\activate       # Windows
source .venv/bin/activate    # macOS / Linux
```

Install dependencies:

```bash
python -m pip install -r requirements.txt
```

Run locally:

```bash
apify run --purge
```

Local test input is stored in:

```text
storage/key_value_stores/default/INPUT.json
```

Local dataset output is stored in:

```text
storage/datasets/default/
```
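
To inspect a local run, the dataset directory can be read back with a few lines of Python. This is a sketch under the assumption that each item is stored as its own `.json` file and that files whose names start with `__` hold metadata; the helper name `load_local_items` is hypothetical, so check the directory contents after your first run.

```python
import json
from pathlib import Path
from typing import Any


def load_local_items(dataset_dir: str = "storage/datasets/default") -> list[dict[str, Any]]:
    """Load dataset items from local JSON files in file-name order."""
    items = []
    for path in sorted(Path(dataset_dir).glob("*.json")):
        if path.name.startswith("__"):
            continue  # assumed to be a metadata file, not a dataset item
        items.append(json.loads(path.read_text(encoding="utf-8")))
    return items


if __name__ == "__main__":
    for item in load_local_items():
        print(item.get("target"), len(item.get("posts") or []))
```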

The `storage/` and `.venv/` directories are ignored by Git.

### Deploy to Apify

This project is intended to be deployed from GitHub.

1. Push changes to GitHub.
2. Open Apify Console.
3. Create or open the Actor.
4. Link the GitHub repository.
5. Build the Actor from the `main` branch.

You can also deploy directly with:

```bash
apify login
apify push
```

GitHub deployment is recommended because it keeps version history and makes future parser updates easier to review.

### Tech Stack

- Apify Python SDK
- Crawlee for Python
- Playwright
- Camoufox
- Python 3.12+

### Notes for Future Improvements

- Add deeper scrolling for profiles and feed pages.
- Improve single-thread reply parsing.
- Extract stable post URLs and IDs.
- Map metric numbers to labels more reliably by inspecting DOM structure.
- Add optional authenticated session support for private internal use cases.
- Add tests around parser behavior using saved page text fixtures.

# Actor input Schema

## `mode` (type: `string`):

Choose what kind of Threads page to crawl.

## `accounts` (type: `array`):

Threads usernames to crawl. You can enter usernames with or without @.

## `bulkAccounts` (type: `string`):

Paste up to 100 accounts, separated by new lines, commas, or spaces.

## `keywordsOrTags` (type: `array`):

Keywords or tags for tag/topic and search modes. Enter tags with or without #.

## `bulkKeywordsOrTags` (type: `string`):

Paste keywords or tags separated by new lines or commas.

## `searchSort` (type: `string`):

Sorting preference for keyword search when Threads exposes it in the page.

## `startDate` (type: `string`):

Optional absolute start date. Applies best-effort filtering to visible relative timestamps.

## `endDate` (type: `string`):

Optional absolute end date. Applies best-effort filtering to visible relative timestamps.

## `relativeDate` (type: `string`):

Optional relative date window, for example: 7 days, 1 month, 24 hours.

## `threadUrls` (type: `array`):

Single post / thread URLs. Replies are extracted when publicly visible.

## `feedUrls` (type: `array`):

Custom Threads feed URLs to crawl.

## `maxPostsPerAccount` (type: `integer`):

Maximum number of posts to return per crawled account or target.

## `includeRawText` (type: `boolean`):

Store the full visible page text for debugging parser changes.

## Actor input object example

```json
{
  "mode": "profile",
  "accounts": [
    "largitdata"
  ],
  "searchSort": "top",
  "relativeDate": "7 days",
  "maxPostsPerAccount": 10,
  "includeRawText": false
}
```
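
Before starting a run, it can help to check an input object on the client side. The sketch below covers only the constraints stated in the OpenAPI specification in this document (`mode` is required and must be one of five values, `searchSort` is `top` or `latest`, `maxPostsPerAccount` is an integer from 1 to 100). The function name `validate_input` is hypothetical, and the platform's own validation remains authoritative.

```python
from typing import Any

MODES = {"profile", "tag", "search", "thread", "feed"}
SEARCH_SORTS = {"top", "latest"}


def validate_input(actor_input: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    if actor_input.get("mode") not in MODES:
        problems.append("mode is required and must be one of: " + ", ".join(sorted(MODES)))
    if actor_input.get("searchSort", "top") not in SEARCH_SORTS:
        problems.append("searchSort must be 'top' or 'latest'")
    max_posts = actor_input.get("maxPostsPerAccount", 10)
    # bool is a subclass of int in Python, so it is excluded explicitly.
    is_int = isinstance(max_posts, int) and not isinstance(max_posts, bool)
    if not is_int or not 1 <= max_posts <= 100:
        problems.append("maxPostsPerAccount must be an integer from 1 to 100")
    return problems
```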

# Actor output Schema

## `results` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "mode": "profile",
    "accounts": [
        "largitdata"
    ],
    "relativeDate": "7 days"
};

// Run the Actor and wait for it to finish
const run = await client.actor("zrkluke/threads-crawler").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "mode": "profile",
    "accounts": ["largitdata"],
    "relativeDate": "7 days",
}

# Run the Actor and wait for it to finish
run = client.actor("zrkluke/threads-crawler").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "mode": "profile",
  "accounts": [
    "largitdata"
  ],
  "relativeDate": "7 days"
}' |
apify call zrkluke/threads-crawler --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=zrkluke/threads-crawler",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Threads Crawler",
        "description": "One Actor for Threads profiles, tags, searches, posts with replies, and custom feeds. Supports bulk accounts, relative dates, engagement metrics, ISO timestamps, and media URLs. No login or API token required.",
        "version": "0.0",
        "x-build-id": "yi1YpbHr51LhwJrfV"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/zrkluke~threads-crawler/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-zrkluke-threads-crawler",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/zrkluke~threads-crawler/runs": {
            "post": {
                "operationId": "runs-sync-zrkluke-threads-crawler",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/zrkluke~threads-crawler/run-sync": {
            "post": {
                "operationId": "run-sync-zrkluke-threads-crawler",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "mode"
                ],
                "properties": {
                    "mode": {
                        "title": "Mode",
                        "enum": [
                            "profile",
                            "tag",
                            "search",
                            "thread",
                            "feed"
                        ],
                        "type": "string",
                        "description": "Choose what kind of Threads page to crawl.",
                        "default": "profile"
                    },
                    "accounts": {
                        "title": "Account list",
                        "type": "array",
                        "description": "Threads usernames to crawl. You can enter usernames with or without @.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "bulkAccounts": {
                        "title": "Bulk paste accounts",
                        "type": "string",
                        "description": "Paste up to 100 accounts, separated by new lines, commas, or spaces."
                    },
                    "keywordsOrTags": {
                        "title": "Keywords / tags",
                        "type": "array",
                        "description": "Keywords or tags for tag/topic and search modes. Enter tags with or without #.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "bulkKeywordsOrTags": {
                        "title": "Bulk paste keywords",
                        "type": "string",
                        "description": "Paste keywords or tags separated by new lines or commas."
                    },
                    "searchSort": {
                        "title": "Search sort",
                        "enum": [
                            "top",
                            "latest"
                        ],
                        "type": "string",
                        "description": "Sorting preference for keyword search when Threads exposes it in the page.",
                        "default": "top"
                    },
                    "startDate": {
                        "title": "Start date",
                        "type": "string",
                        "description": "Optional absolute start date. Applies best-effort filtering to visible relative timestamps."
                    },
                    "endDate": {
                        "title": "End date",
                        "type": "string",
                        "description": "Optional absolute end date. Applies best-effort filtering to visible relative timestamps."
                    },
                    "relativeDate": {
                        "title": "Relative date",
                        "type": "string",
                        "description": "Optional relative date window, for example: 7 days, 1 month, 24 hours."
                    },
                    "threadUrls": {
                        "title": "Thread URLs",
                        "type": "array",
                        "description": "Single post / thread URLs. Replies are extracted when publicly visible.",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL",
                                    "description": "Threads post URL"
                                }
                            }
                        }
                    },
                    "feedUrls": {
                        "title": "Custom feed URLs",
                        "type": "array",
                        "description": "Custom Threads feed URLs to crawl.",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL",
                                    "description": "Threads feed URL"
                                }
                            }
                        }
                    },
                    "maxPostsPerAccount": {
                        "title": "Max posts per account",
                        "minimum": 1,
                        "maximum": 100,
                        "type": "integer",
                        "description": "Maximum number of posts to return per crawled account or target.",
                        "default": 10
                    },
                    "includeRawText": {
                        "title": "Include raw visible text",
                        "type": "boolean",
                        "description": "Store the full visible page text for debugging parser changes.",
                        "default": false
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
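
For projects that cannot use a client library, the `run-sync-get-dataset-items` endpoint from the specification above can be called with the Python standard library alone. This is a minimal sketch: the helper names, the 300-second timeout, and the assumption that the response body is a JSON array of dataset items are illustrative choices, since the specification only documents a `200` response without a schema.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/zrkluke~threads-crawler/run-sync-get-dataset-items"


def build_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request described by the specification."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_actor(token: str, actor_input: dict, timeout: float = 300.0) -> list:
    """Send the request and decode the response (assumed to be a JSON array)."""
    request = build_request(token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    items = run_actor("<YOUR_API_TOKEN>", {"mode": "profile", "accounts": ["largitdata"]})
    print(len(items), "items")
```

Because the synchronous endpoint waits for the run to finish, long crawls may exceed the timeout; in that case the `/runs` endpoint from the same specification starts the run without waiting.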
