TikTok Keyword Scraper

Under maintenance

Pricing

Pay per usage

Scrapes TikTok video search results by keyword using Playwright, with persistent browser profiles, CAPTCHA solving, and an optional Apify residential proxy that can be fully disabled to run direct (no proxy).

Rating

5.0

(2)

Developer

Yara Mohamed

Maintained by Community

Actor stats

1

Bookmarked

7

Total users

5

Monthly active users

a day ago

Last modified

TikTok Keyword Scraper — Apify Actor

Scrapes TikTok video search results by keyword using Playwright, with persistent browser profiles, CAPTCHA solving, proxy rotation via a local relay, and an optional Apify proxy that can be fully disabled to run with no proxy at all.

This package runs two ways from the same codebase:

  • As an Apify Actor (.actor/ + src/) — the intended way to deploy on Apify's cloud.
  • As a local FastAPI server (server.py) — for local development, debugging, or running outside Apify.

Project Structure

.
├── .actor/
│ ├── actor.json # Actor metadata
│ ├── INPUT_SCHEMA.json # Input fields shown in Apify Console (incl. "Use proxy" toggle)
│ └── Dockerfile # apify/actor-python-playwright:3.11 base image
├── src/
│ ├── __init__.py
│ ├── __main__.py # Actor entrypoint (`python -m src`)
│ └── main.py # Reads input, runs keywords concurrently, pushes to dataset
├── browser.py # Browser/context factory, cookies, scroll/nav helpers
├── captcha.py # CAPTCHA detection + solving (SadCaptcha → SolveCaptcha)
├── scrapers.py # scrape_tiktok_search / hashtag / profile / download-url
├── data_helpers.py # Video cleaning, API parsing, shared scroll loop
├── config.py # Constants, proxy pool (now toggleable), selectors
├── profiles.py / profile_pool.py # Persistent browser-profile pool + proxy assignment
├── proxy_relay.py # Local TCP relay (works around Chromium proxy-auth bugs)
├── downloader.py # Streaming video downloader
├── server.py # FastAPI server for local/non-Apify use
├── tiktok_cookies.json # Bundled fallback cookie set (see "Cookie fallback" below)
├── requirements.txt
└── .dockerignore

Running Without a Proxy

The Actor input has a "Use proxy" checkbox (useProxy, default true).

  • ON (default): every browser session routes through the Apify residential proxy pool (BUYPROXIES94952 group).
  • OFF: every session connects directly, with no proxy — useful for local testing, debugging, or if your Apify plan has no proxy quota left.

How it works under the hood: src/main.py sets the environment variable USE_PROXY to true or false from that checkbox before any scrape starts. config.get_proxy_pool() checks USE_PROXY on every call (the value is not cached at import time) and returns an empty list when proxying is off. With an empty pool, profiles.get_proxy_for_profile() returns None, and browser.make_browser_and_context() then takes its "no proxy configured" direct-connection branch, so turning the toggle off requires no other code changes anywhere in the scraper.
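The chain described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the pool entries are placeholder URLs, the helper use_proxy_enabled() is hypothetical, and the round-robin assignment of proxies to profile slots is an assumption. Only the function names get_proxy_pool() and get_proxy_for_profile() and the USE_PROXY variable come from the description.

```python
import os
from typing import List, Optional

# Placeholder pool entries (assumption); the real list lives in config.py.
_PROXY_POOL = [
    "http://proxy-1.example:8000",
    "http://proxy-2.example:8000",
]


def use_proxy_enabled() -> bool:
    """Hypothetical helper: read USE_PROXY on every call, never at import time."""
    return os.environ.get("USE_PROXY", "true").strip().lower() != "false"


def get_proxy_pool() -> List[str]:
    """Return the configured proxies, or an empty list when proxying is off."""
    return list(_PROXY_POOL) if use_proxy_enabled() else []


def get_proxy_for_profile(profile_index: int) -> Optional[str]:
    """Pick a proxy for a profile slot; None means 'connect directly'."""
    pool = get_proxy_pool()
    if not pool:
        return None
    # Round-robin assignment is an assumption made for this sketch.
    return pool[profile_index % len(pool)]
```

Because the environment variable is re-read on each call, flipping USE_PROXY after the module has been imported still takes effect for the next browser session.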

You can also flip this manually outside the Actor input by setting the USE_PROXY env var directly (e.g. for local CLI/server runs):

USE_PROXY=false python server.py

There's also an optional proxyPassword input field (marked secret) if you want to override the Apify proxy password baked into config.py with your own, without editing code.


Cookie Fallback

Apify's containers are ephemeral: /tmp (and therefore every per-profile cookies.json under /tmp/tiktok_profiles/) is wiped between separate Actor runs. The first navigation in a brand-new container then has zero session cookies, which causes TikTok to silently route searches to the Users tab instead of the Videos tab.

browser.load_cookies() falls back to the bundled tiktok_cookies.json at the project root whenever a profile has no per-profile cookie file yet, or only an expired one, so every fresh container starts with at least one baseline session instead of a completely cold one. This file is copied into the Docker image by COPY . ./ in the Dockerfile, so it ships with every build.
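The fallback order could look something like the sketch below. It is an illustration under stated assumptions, not the body of browser.load_cookies(): the signature, the helper _read_valid_cookies(), and the rule that a file counts as expired once none of its cookies are still live are all hypothetical. Cookies are assumed to be Playwright-style dicts where expires is a Unix timestamp, or -1 for session cookies.

```python
import json
import time
from pathlib import Path
from typing import List


def _is_live(cookie: dict, now: float) -> bool:
    """Session cookies (no expiry or -1) never expire; others must be in the future."""
    expires = cookie.get("expires", -1)
    return expires is None or expires < 0 or expires > now


def _read_valid_cookies(path: Path) -> List[dict]:
    """Return unexpired cookies from a JSON file, or [] if missing or unreadable."""
    try:
        cookies = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return []
    if not isinstance(cookies, list):
        return []
    now = time.time()
    return [c for c in cookies if isinstance(c, dict) and _is_live(c, now)]


def load_cookies(profile_dir: Path, bundled: Path) -> List[dict]:
    """Prefer the per-profile cookies.json; fall back to the bundled file."""
    cookies = _read_valid_cookies(profile_dir / "cookies.json")
    if cookies:
        return cookies
    # No file yet, or nothing in it is still valid: use the shipped baseline.
    return _read_valid_cookies(bundled)
```

A caller would pass the profile directory under /tmp/tiktok_profiles/ and the path to tiktok_cookies.json, then hand the returned list to the browser context.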


Deploying to Apify

npm install -g apify-cli
apify login
cd tiktok-apify-actor/
apify push

apify push uploads the project and builds the Docker image from .actor/Dockerfile on Apify. COPY . ./ in the Dockerfile copies the whole repo root into the image, including src/, the scraper modules, and the bundled cookie file.

Actor Input Example

{
  "keywords": ["funny cats", "cooking recipe"],
  "maxResults": 50,
  "maxConcurrency": 3,
  "dateFilter": 0,
  "scrollPause": 3.0,
  "headless": true,
  "useProxy": true
}

Run with no proxy at all:

{
  "keywords": ["funny cats"],
  "useProxy": false
}

Running via REST API

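import requests

APIFY_TOKEN = "your_token_here"
# The API expects "username~actor-name" (tilde, not slash) in the URL path.
ACTOR_ID = "your_username~tiktok-keyword-scraper"

# Start a run; the JSON body is passed to the Actor as its input.
response = requests.post(
    f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs",
    headers={"Content-Type": "application/json"},
    params={"token": APIFY_TOKEN},
    json={"keywords": ["python tutorial"], "maxResults": 30, "useProxy": True},
    timeout=30,
)
response.raise_for_status()  # fail loudly on a bad token or Actor ID
run_id = response.json()["data"]["id"]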

Results land in the run's default dataset (one row per video, plus a search_keyword field), and a run-level summary is written to the key-value store under OUTPUT.
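To read those results back, one option is to poll the run until it finishes and then download the dataset items. The sketch below uses only the standard library and Apify's public run and dataset endpoints. The function names, the polling interval, and the injectable get parameter (there to make the logic testable without network access) are choices made for this illustration, not part of the Actor.

```python
import json
import time
import urllib.request

API_BASE = "https://api.apify.com/v2"
# Run statuses after which Apify will not change the run any further.
TERMINAL = {"SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"}


def http_get_json(url, token):
    """GET a URL with a bearer token and decode the JSON body."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_run(run_id, token, get=http_get_json, poll_seconds=5.0, max_polls=120):
    """Poll the run until it reaches a terminal status; return the run object."""
    for _ in range(max_polls):
        run = get(f"{API_BASE}/actor-runs/{run_id}", token)["data"]
        if run["status"] in TERMINAL:
            return run
        time.sleep(poll_seconds)
    raise TimeoutError(f"run {run_id} did not finish after {max_polls} polls")


def fetch_videos(run, token, get=http_get_json):
    """Download the rows of the run's default dataset as a list of dicts."""
    dataset_id = run["defaultDatasetId"]
    return get(f"{API_BASE}/datasets/{dataset_id}/items?format=json&clean=true", token)
```

With a run_id from a started run, wait_for_run(run_id, APIFY_TOKEN) followed by fetch_videos(run, APIFY_TOKEN) should return one dict per video. The OUTPUT summary can be fetched in a similar way from the key-value store identified by the run's defaultKeyValueStoreId.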


Running Locally (FastAPI server)

pip install -r requirements.txt
playwright install chromium
python server.py
# → http://localhost:8000/docs

Method  Path            Description
POST    /search         Search by keyword
POST    /batch-search   Up to 100 keywords at once
GET     /job/{job_id}   Poll job status / results
GET     /proxy-test     Test every proxy in the pool
POST    /download       Download a video on demand
GET     /health         Health check

Notes

  • maxConcurrency is capped at PROFILE_POOL_SIZE (10) — each parallel job needs its own browser-profile slot.
  • dateFilter matches TikTok's "Posted" search filter: 0=all, 1=24h, 7=week, 30=month, 90=3mo, 180=6mo.
  • This tool is for educational/research purposes only.
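The two input constraints in the notes above (the maxConcurrency cap and the allowed dateFilter values) can be checked before a run is started. The following is a hypothetical client-side helper, not code from the Actor: normalize_input() and its defaults (maxConcurrency 1, dateFilter 0) are assumptions, while the cap of 10 and the filter values are taken from the notes.

```python
PROFILE_POOL_SIZE = 10  # cap quoted in the notes
# dateFilter values and their meaning, as listed in the notes.
DATE_FILTERS = {0: "all", 1: "24h", 7: "week", 30: "month", 90: "3mo", 180: "6mo"}


def normalize_input(raw: dict) -> dict:
    """Validate an Actor input dict and clamp maxConcurrency to the pool size."""
    keywords = [
        k.strip() for k in raw.get("keywords", [])
        if isinstance(k, str) and k.strip()
    ]
    if not keywords:
        raise ValueError("at least one non-empty keyword is required")

    date_filter = raw.get("dateFilter", 0)  # default of 0 is an assumption
    if date_filter not in DATE_FILTERS:
        allowed = sorted(DATE_FILTERS)
        raise ValueError(f"dateFilter must be one of {allowed}, got {date_filter!r}")

    requested = int(raw.get("maxConcurrency", 1))  # default of 1 is an assumption
    return {
        "keywords": keywords,
        "dateFilter": date_filter,
        # Each parallel job needs its own browser-profile slot.
        "maxConcurrency": max(1, min(requested, PROFILE_POOL_SIZE)),
        "useProxy": bool(raw.get("useProxy", True)),
    }
```

For example, an input requesting maxConcurrency 25 would be clamped to 10, and a dateFilter of 3 would be rejected before any browser is launched.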