# TikTok Keyword Scraper (`devine_device/tiktok-keyword-scraper`) Actor

Scrapes TikTok video search results by keyword using Playwright, with persistent browser profiles, CAPTCHA solving, and an optional Apify residential proxy that can be fully disabled to run direct (no proxy).

- **URL**: https://apify.com/devine_device/tiktok-keyword-scraper.md
- **Developed by:** [Yara Mohamed](https://apify.com/devine_device) (community)
- **Categories:** Developer tools, Automation, Social media
- **Stats:** 7 total users, 5 monthly users, 100.0% of runs succeeded, 1 bookmark
- **User rating**: 5.00 out of 5 stars

## Pricing

Pay per usage

This Actor is priced per platform usage: the Actor itself is free to use, and you pay only for Apify platform usage, which gets cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) and as [Markdown full text](https://docs.apify.com/llms-full.txt).


# README

## TikTok Keyword Scraper — Apify Actor

Scrapes TikTok video search results by keyword using Playwright, with persistent
browser profiles, CAPTCHA solving, proxy rotation via a local relay, and an
**optional Apify proxy that can be fully disabled** to run with no proxy at all.

This package runs two ways from the same codebase:

- **As an Apify Actor** (`.actor/` + `src/`) — the intended way to deploy on
  Apify's cloud.
- **As a local FastAPI server** (`server.py`) — for local development,
  debugging, or running outside Apify.

---

### Project Structure

````

.
├── .actor/
│   ├── actor.json          # Actor metadata
│   ├── INPUT_SCHEMA.json   # Input fields shown in Apify Console (incl. "Use proxy" toggle)
│   └── Dockerfile          # apify/actor-python-playwright:3.11 base image
├── src/
│   ├── __init__.py
│   ├── __main__.py         # Actor entrypoint (`python -m src`)
│   └── main.py             # Reads input, runs keywords concurrently, pushes to dataset
├── browser.py              # Browser/context factory, cookies, scroll/nav helpers
├── captcha.py              # CAPTCHA detection + solving (SadCaptcha → SolveCaptcha)
├── scrapers.py             # scrape_tiktok_search / hashtag / profile / download-url
├── data_helpers.py         # Video cleaning, API parsing, shared scroll loop
├── config.py               # Constants, toggleable proxy pool, selectors
├── profiles.py / profile_pool.py  # Persistent browser-profile pool + proxy assignment
├── proxy_relay.py          # Local TCP relay (works around Chromium proxy-auth bugs)
├── downloader.py           # Streaming video downloader
├── server.py               # FastAPI server for local/non-Apify use
├── tiktok_cookies.json     # Bundled fallback cookie set (see "Cookie fallback" below)
├── requirements.txt
└── .dockerignore

````

---

### Running Without a Proxy

The Actor input has a **"Use proxy"** checkbox (`useProxy`, default `true`).

- **ON** (default): every browser session routes through the Apify residential
  proxy pool (`BUYPROXIES94952` group).
- **OFF**: every session connects **directly, with no proxy**. This is useful for
  local testing, debugging, or when your Apify plan has no proxy quota left.

How it works under the hood: `src/main.py` sets the environment variable
`USE_PROXY=true|false` from that checkbox **before any scrape starts**.
`config.get_proxy_pool()` checks `USE_PROXY` on every call (it is not cached at
import time) and returns an empty list when proxying is off. With an empty
pool, `profiles.get_proxy_for_profile()` returns `None`, and
`browser.make_browser_and_context()` takes its existing "no proxy configured"
branch and connects directly, so the toggle requires no other code changes
anywhere in the scraper.
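The flow above can be sketched as follows. This is a minimal illustration, not the Actor's code: it assumes `get_proxy_pool()` and `get_proxy_for_profile()` behave as described, while the `PROXY_POOL` entries and the slot-modulo assignment rule are hypothetical placeholders.

```python
import os
from typing import Optional

# Hypothetical placeholder pool; the real config.py builds its own entries.
PROXY_POOL = [
    "http://user:pass@proxy-a.example:8000",
    "http://user:pass@proxy-b.example:8000",
]


def get_proxy_pool() -> list[str]:
    """Re-read USE_PROXY on every call so the toggle is honored at runtime."""
    use_proxy = os.environ.get("USE_PROXY", "true").strip().lower()
    if use_proxy in ("false", "0", "no", "off"):
        return []  # proxying disabled: an empty pool
    return list(PROXY_POOL)


def get_proxy_for_profile(slot: int) -> Optional[str]:
    """Return a proxy for a profile slot, or None to connect directly."""
    pool = get_proxy_pool()
    if not pool:
        return None
    # Hypothetical assignment rule: spread profile slots across the pool.
    return pool[slot % len(pool)]


if __name__ == "__main__":
    os.environ["USE_PROXY"] = "false"
    print(get_proxy_for_profile(0))  # None -> direct connection
    os.environ["USE_PROXY"] = "true"
    print(get_proxy_for_profile(3))
```

Reading the variable inside the function, rather than once at import, is what lets `src/main.py` flip it after the modules are already loaded.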

You can also flip this manually outside the Actor input by setting the
`USE_PROXY` env var directly (e.g. for local CLI/server runs):

```bash
USE_PROXY=false python server.py
````

There's also an optional `proxyPassword` input field (marked secret) if you
want to override the Apify proxy password baked into `config.py` with your
own, without editing code.
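For illustration, here is one way `proxyCountry` and `proxyPassword` could feed into an Apify Proxy URL. It assumes the standard `proxy.apify.com:8000` endpoint, where the username carries `groups-<GROUP>` plus an optional `country-<CC>` parameter and the password is the proxy password. The `build_apify_proxy_url` helper and `DEFAULT_PASSWORD` are hypothetical names, not taken from `config.py`.

```python
from typing import Optional
from urllib.parse import quote

# Hypothetical stand-in for the password baked into config.py.
DEFAULT_PASSWORD = "builtin-password"


def build_apify_proxy_url(
    group: str,
    country: Optional[str] = None,
    password_override: Optional[str] = None,
) -> str:
    """Compose an Apify Proxy URL from the Actor's proxy-related inputs."""
    # A blank override falls back to the built-in password.
    password = (password_override or "").strip() or DEFAULT_PASSWORD
    username = f"groups-{group}"
    if country and country.strip():
        # Two-letter country code pins the exit IP, e.g. "EG".
        username += f",country-{country.strip().upper()}"
    return f"http://{username}:{quote(password, safe='')}@proxy.apify.com:8000"


if __name__ == "__main__":
    print(build_apify_proxy_url("BUYPROXIES94952", country="eg", password_override="my-secret"))
    # http://groups-BUYPROXIES94952,country-EG:my-secret@proxy.apify.com:8000
```

When "Use proxy" is OFF, a helper like this would simply never be called, which matches both fields being ignored in that mode.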

***

### Cookie Fallback (Apify `/tmp` wipe fix)

Apify's containers are ephemeral — `/tmp` (and therefore every per-profile
`cookies.json` under `/tmp/tiktok_profiles/`) is wiped between separate Actor
runs. A brand-new container's first navigation then has zero session cookies,
which is what causes TikTok to silently route searches to the Users tab
instead of the Videos tab.

`browser.load_cookies()` falls back to the bundled `tiktok_cookies.json`
at the project root whenever a profile has no per-profile cookie file yet (or
only an expired one), giving every fresh container at least one valid baseline
session instead of a completely cold one. `COPY . ./` in the Dockerfile copies
this file into the image, so it ships with every build.
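A minimal sketch of that fallback, assuming cookies are stored as a Playwright-style JSON list where each entry has an `expires` Unix timestamp (`-1` for session cookies). The function signature and the rule "a file counts as valid if it holds at least one unexpired cookie" are assumptions, not the Actor's exact logic.

```python
import json
import time
from pathlib import Path

PROFILE_ROOT = Path("/tmp/tiktok_profiles")     # wiped between Apify runs
FALLBACK_COOKIES = Path("tiktok_cookies.json")  # bundled at the project root


def _is_live(cookie: dict, now: float) -> bool:
    """Session cookies (expires == -1) count as live; others must expire later."""
    expires = cookie.get("expires", -1)
    return expires == -1 or (isinstance(expires, (int, float)) and expires > now)


def _read_live_cookies(path: Path, now: float) -> list[dict]:
    """Return the unexpired cookies in a JSON file, or [] if missing or invalid."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return []
    if not isinstance(data, list):
        return []
    return [c for c in data if isinstance(c, dict) and _is_live(c, now)]


def load_cookies(profile_id: str, root: Path = PROFILE_ROOT,
                 fallback: Path = FALLBACK_COOKIES) -> list[dict]:
    """Prefer the profile's own cookies; fall back to the bundled baseline set."""
    now = time.time()
    cookies = _read_live_cookies(root / profile_id / "cookies.json", now)
    if cookies:
        return cookies
    # Fresh container (or only expired cookies): use the bundled session.
    return _read_live_cookies(fallback, now)
```

If the bundled file is in Playwright's cookie format, the returned list could then be passed to `BrowserContext.add_cookies()` before the first navigation.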

***

### Deploying to Apify

```bash
npm install -g apify-cli
apify login
cd tiktok-apify-actor/
apify push
```

`apify push` builds the Docker image from `.actor/Dockerfile` and uploads
everything else (`COPY . ./` in the Dockerfile copies the whole repo root
into the image, including `src/`, the scraper modules, and the bundled
cookie file).

#### Actor Input Example

```json
{
  "keywords": ["funny cats", "cooking recipe"],
  "maxResults": 50,
  "maxConcurrency": 3,
  "dateFilter": "0",
  "scrollPause": 3.0,
  "headless": true,
  "useProxy": true
}
```

Run with no proxy at all:

```json
{
  "keywords": ["funny cats"],
  "useProxy": false
}
```

#### Running via REST API

```python
import requests

APIFY_TOKEN = "your_token_here"
# In REST API paths, the owner and Actor name are joined with "~", not "/".
ACTOR_ID = "devine_device~tiktok-keyword-scraper"

response = requests.post(
    f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs",
    headers={"Authorization": f"Bearer {APIFY_TOKEN}"},  # keeps the token out of URLs
    json={"keywords": ["python tutorial"], "maxResults": 30, "useProxy": True},
    timeout=30,
)
response.raise_for_status()  # surface 4xx/5xx errors instead of a confusing KeyError
run_id = response.json()["data"]["id"]
```
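The request above only starts the run. One way to wait for it, sketched here with the standard library: poll `GET /v2/actor-runs/{runId}` with the `waitForFinish` parameter (the API holds each request open for up to 60 seconds) until the run reaches a terminal status. The `wait_for_run` helper and its injectable `get_json` argument are illustrative additions so the loop can be exercised without network access.

```python
import json
import urllib.request
from typing import Callable

API = "https://api.apify.com/v2"
TERMINAL = {"SUCCEEDED", "FAILED", "TIMED-OUT", "ABORTED"}


def _get_json(url: str, token: str) -> dict:
    """GET a JSON document from the Apify API using a bearer token."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=90) as resp:
        return json.load(resp)


def wait_for_run(run_id: str, token: str, max_polls: int = 60,
                 get_json: Callable[[str, str], dict] = _get_json) -> dict:
    """Block until the run reaches a terminal status; return the run object."""
    # waitForFinish makes the server wait (max 60 s), so no client-side sleep.
    url = f"{API}/actor-runs/{run_id}?waitForFinish=60"
    for _ in range(max_polls):
        run = get_json(url, token)["data"]
        if run["status"] in TERMINAL:
            return run
    raise TimeoutError(f"run {run_id} still not finished after {max_polls} polls")
```

With the `run_id` obtained above, `wait_for_run(run_id, APIFY_TOKEN)` returns the finished run object, whose `status` should be checked before reading its `defaultDatasetId`.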

Results land in the run's default **dataset** (one row per video, plus a
`search_keyword` field), and a run-level summary is written to the
**key-value store** under `OUTPUT`.
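With the official Python client (`apify-client`), both outputs can be read from the run object's `defaultDatasetId` and `defaultKeyValueStoreId`. A minimal sketch: `fetch_results` and `group_by_keyword` are illustrative helpers, the summary's fields aren't documented here so it is returned as-is, and the grouping relies only on the `search_keyword` field described above. The client is passed in as a parameter so the helpers can be tried with a stand-in object.

```python
from collections import defaultdict
from typing import Any, Optional


def fetch_results(client: Any, run: dict) -> tuple[list[dict], Optional[Any]]:
    """Return (video rows, OUTPUT summary or None) for a finished run.

    `client` is an apify_client.ApifyClient; `run` is a run object such as
    the dict returned by `client.actor(...).call(...)`.
    """
    items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
    record = client.key_value_store(run["defaultKeyValueStoreId"]).get_record("OUTPUT")
    # get_record() returns None when the key does not exist.
    summary = record["value"] if record is not None else None
    return items, summary


def group_by_keyword(items: list[dict]) -> dict[str, list[dict]]:
    """Bucket video rows by the keyword that produced them."""
    groups = defaultdict(list)
    for item in items:
        groups[item.get("search_keyword", "")].append(item)
    return dict(groups)
```

Typical use would be `items, summary = fetch_results(ApifyClient(token), run)` followed by `group_by_keyword(items)` to get one list of videos per keyword.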

***

### Running Locally (FastAPI server)

```bash
pip install -r requirements.txt
playwright install chromium
python server.py
# Interactive API docs: http://localhost:8000/docs
```

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/search` | Search by keyword |
| `POST` | `/batch-search` | Up to 100 keywords at once |
| `GET` | `/job/{job_id}` | Poll job status / results |
| `GET` | `/proxy-test` | Test every proxy in the pool |
| `POST` | `/download` | Download a video on demand |
| `GET` | `/health` | Health check |
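Against the local server, searches run as jobs that are polled through `/job/{job_id}`. The sketch below polls that endpoint with the standard library. The response shape is an assumption: a JSON object with a `status` field that ends as `"completed"` or `"failed"`. Check the server's `/docs` page for the real field names before relying on it; `poll_job` itself is a hypothetical helper.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:8000"
# Assumed terminal status values; confirm them against /docs.
DONE_STATES = {"completed", "failed"}


def _get_json(url: str) -> dict:
    """GET and decode a JSON response from the local server."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def poll_job(job_id: str, interval: float = 5.0, max_polls: int = 120,
             get_json: Callable[[str], dict] = _get_json,
             sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll GET /job/{job_id} until its (assumed) status field is terminal."""
    url = f"{BASE_URL}/job/{job_id}"
    for _ in range(max_polls):
        job = get_json(url)
        if job.get("status") in DONE_STATES:
            return job
        sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

A fixed polling interval keeps load on the local server predictable; raising `interval` is a reasonable choice for large `/batch-search` jobs.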

***

### Notes

- `maxConcurrency` is capped at `PROFILE_POOL_SIZE` (10) — each parallel job
  needs its own browser-profile slot.
- `dateFilter` matches TikTok's "Posted" search filter: `0`=all,
  `1`=24h, `7`=week, `30`=month, `90`=3mo, `180`=6mo.
- This tool is for educational/research purposes only.
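Before starting a run, it can help to check the input against the constraints the input schema declares: `maxResults` from 1 to 500, `maxConcurrency` from 1 to 10, `scrollPause` from 1 to 10 seconds, and `dateFilter` as one of the string values listed above. A minimal sketch; `build_input` and its label-to-value map are hypothetical conveniences, not part of the Actor.

```python
# Labels are descriptive; the values are the strings the input schema accepts.
DATE_FILTERS = {"all": "0", "24h": "1", "week": "7", "month": "30", "3mo": "90", "6mo": "180"}
PROFILE_POOL_SIZE = 10  # per the notes above


def build_input(keywords: list[str], max_results: int = 50, max_concurrency: int = 3,
                posted: str = "all", scroll_pause: float = 3.0,
                headless: bool = True, use_proxy: bool = True) -> dict:
    """Return an Actor input dict, raising ValueError on out-of-range values."""
    keywords = [k.strip() for k in keywords if k.strip()]
    if not keywords:
        raise ValueError("at least one non-empty keyword is required")
    if not 1 <= max_results <= 500:
        raise ValueError("maxResults must be between 1 and 500")
    if not 1 <= max_concurrency <= PROFILE_POOL_SIZE:
        raise ValueError(f"maxConcurrency must be between 1 and {PROFILE_POOL_SIZE}")
    if not 1 <= scroll_pause <= 10:
        raise ValueError("scrollPause must be between 1 and 10 seconds")
    if posted not in DATE_FILTERS:
        raise ValueError(f"posted must be one of {sorted(DATE_FILTERS)}")
    return {
        "keywords": keywords,
        "maxResults": max_results,
        "maxConcurrency": max_concurrency,
        "dateFilter": DATE_FILTERS[posted],  # a string such as "7", not an int
        "scrollPause": scroll_pause,
        "headless": headless,
        "useProxy": use_proxy,
    }
```

For example, `build_input(["funny cats"], posted="week", use_proxy=False)` yields a direct-connection input with `"dateFilter": "7"`.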

# Actor input Schema

## `keywords` (type: `array`):

One or more keywords/phrases to search on TikTok. Each one runs as its own job.

## `maxResults` (type: `integer`):

Stop scrolling once this many videos have been collected for a keyword.

## `maxConcurrency` (type: `integer`):

How many keywords to scrape in parallel in this run. Each one claims its own persistent browser-profile slot (profile pool size is 10, so keep this at or below 10).

## `dateFilter` (type: `string`):

TikTok's "Posted" search filter.

## `scrollPause` (type: `number`):

Delay between scroll actions. Lower is faster but more likely to trigger anti-bot detection.

## `headless` (type: `boolean`):

Keep this ON for Apify cloud runs — there is no display in the container. Turn it OFF only when running locally for visual debugging.

## `useProxy` (type: `boolean`):

When ON, every browser session routes through the Apify residential proxy pool. Turn this OFF to run with NO proxy at all — every session connects directly. Useful for local testing, debugging, or if your Apify plan has no proxy quota left.

## `proxyCountry` (type: `string`):

2-letter country code to pin the residential proxy's exit IP (e.g. "EG" for Egypt). Matching the proxy's country to the search keyword's language/region significantly improves TikTok search result relevance. Leave blank for auto/any country. Ignored when "Use proxy" is OFF.

## `proxyPassword` (type: `string`):

Override the Actor's built-in Apify residential-proxy password with your own. Leave blank to use the default baked into the Actor. Ignored entirely when "Use proxy" is OFF.

## Actor input object example

```json
{
  "keywords": [
    "funny cats"
  ],
  "maxResults": 50,
  "maxConcurrency": 3,
  "dateFilter": "0",
  "scrollPause": 3,
  "headless": true,
  "useProxy": true
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "keywords": [
        "funny cats"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("devine_device/tiktok-keyword-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "keywords": ["funny cats"] }

# Run the Actor and wait for it to finish
run = client.actor("devine_device/tiktok-keyword-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "keywords": [
    "funny cats"
  ]
}' |
apify call devine_device/tiktok-keyword-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=devine_device/tiktok-keyword-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "TikTok Keyword Scraper",
        "description": "Scrapes TikTok video search results by keyword using Playwright, with persistent browser profiles, CAPTCHA solving, and an optional Apify residential proxy that can be fully disabled to run direct (no proxy).",
        "version": "1.0",
        "x-build-id": "piyZT4sWoIbyfLGjv"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/devine_device~tiktok-keyword-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-devine_device-tiktok-keyword-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/devine_device~tiktok-keyword-scraper/runs": {
            "post": {
                "operationId": "runs-sync-devine_device-tiktok-keyword-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/devine_device~tiktok-keyword-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-devine_device-tiktok-keyword-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "keywords"
                ],
                "properties": {
                    "keywords": {
                        "title": "Search keywords",
                        "type": "array",
                        "description": "One or more keywords/phrases to search on TikTok. Each one runs as its own job.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxResults": {
                        "title": "Max videos per keyword",
                        "minimum": 1,
                        "maximum": 500,
                        "type": "integer",
                        "description": "Stop scrolling once this many videos have been collected for a keyword.",
                        "default": 50
                    },
                    "maxConcurrency": {
                        "title": "Max concurrent browsers",
                        "minimum": 1,
                        "maximum": 10,
                        "type": "integer",
                        "description": "How many keywords to scrape in parallel in this run. Each one claims its own persistent browser-profile slot (profile pool size is 10, so keep this at or below 10).",
                        "default": 3
                    },
                    "dateFilter": {
                        "title": "Posted within",
                        "enum": [
                            "0",
                            "1",
                            "7",
                            "30",
                            "90",
                            "180"
                        ],
                        "type": "string",
                        "description": "TikTok's \"Posted\" search filter.",
                        "default": "0"
                    },
                    "scrollPause": {
                        "title": "Scroll pause (seconds)",
                        "minimum": 1,
                        "maximum": 10,
                        "type": "number",
                        "description": "Delay between scroll actions. Lower is faster but more likely to trigger anti-bot detection.",
                        "default": 3
                    },
                    "headless": {
                        "title": "Run headless",
                        "type": "boolean",
                        "description": "Keep this ON for Apify cloud runs — there is no display in the container. Turn it OFF only when running locally for visual debugging.",
                        "default": true
                    },
                    "useProxy": {
                        "title": "Use proxy",
                        "type": "boolean",
                        "description": "When ON, every browser session routes through the Apify residential proxy pool. Turn this OFF to run with NO proxy at all — every session connects directly. Useful for local testing, debugging, or if your Apify plan has no proxy quota left.",
                        "default": true
                    },
                    "proxyCountry": {
                        "title": "Proxy exit country (optional)",
                        "type": "string",
                        "description": "2-letter country code to pin the residential proxy's exit IP (e.g. \"EG\" for Egypt). Matching the proxy's country to the search keyword's language/region significantly improves TikTok search result relevance. Leave blank for auto/any country. Ignored when \"Use proxy\" is OFF."
                    },
                    "proxyPassword": {
                        "title": "Apify proxy password (optional override)",
                        "type": "string",
                        "description": "Override the Actor's built-in Apify residential-proxy password with your own. Leave blank to use the default baked into the Actor. Ignored entirely when \"Use proxy\" is OFF."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
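The first path in the specification, `run-sync-get-dataset-items`, starts a run and returns its dataset items in a single HTTP call, which suits short runs. The API waits only a limited time for the run to finish, so check Apify's API reference for the current limit. The standard-library sketch below sends the token as a bearer header, which the Apify API accepts as an alternative to the `token` query parameter shown in the spec. `build_request` and `run_sync` are illustrative names.

```python
import json
import urllib.request

ENDPOINT = ("https://api.apify.com/v2/acts/devine_device~tiktok-keyword-scraper"
            "/run-sync-get-dataset-items")


def build_request(token: str, run_input: dict) -> urllib.request.Request:
    """Prepare the POST request; the body must satisfy inputSchema."""
    if not run_input.get("keywords"):
        raise ValueError("inputSchema requires a 'keywords' array")
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )


def run_sync(token: str, run_input: dict, timeout: float = 360) -> list:
    """Run the Actor and return its dataset items (a JSON array)."""
    with urllib.request.urlopen(build_request(token, run_input), timeout=timeout) as resp:
        return json.load(resp)
```

For longer runs, the asynchronous `/runs` path followed by polling is the safer choice, since a synchronous call can end before the scrape does.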
