SEO Keyword Extractor

Finds keyword phrases from a list of websites 🌐, groups similar ones into clear themes 🧩, and ranks them. Also suggests strong primary keywords ⭐ and candidate negative keywords 🚫 so you can plan SEO and ad campaigns with sharper focus 📈.

Pricing: from $20.00 / 1,000 results

Developer: Chris Xavier (Maintained by Community)

🔍 SEO Keyword Theme & Negative Keyword Analyzer 🚀

📘 Overview

This actor takes one or more URLs, extracts high-value multi-word SEO keyphrases, and then:

  • Clusters common cross-site keyword families (semantic variants across multiple domains).
  • Computes n-gram stats (e.g., “real estate lawyer”, “fort lauderdale real estate lawyer”) only for phrases that show up on at least three sites.
  • Builds keyword themes (ranked topics with all their variants and sites).
  • Suggests candidate negative keywords (likely competitor names / one-off phrases that only appear on a single site).

It’s built for serious competitive research, PPC planning, and semantic SEO clustering across your niche 🌐✨

🌟 Use Cases

  • 🔎 Competitor keyword intelligence: See which phrases multiple competitors converge on (strong themes) vs. one-off phrases (weak or brand-specific).
  • 🧩 Local + practice-area SEO: Quickly surface geo + service combos like “fort lauderdale real estate lawyer” or “west palm beach probate attorney.”
  • 🧠 Semantic clustering & topic planning: Get “keyword themes” with a primary phrase, all variants, and which sites use them.
  • 🎯 PPC campaign & ad group design: Use themes as ad groups and variants as match types; use single-site phrases as negative keyword candidates.
  • 🧹 Keyword cleanup & noise reduction: Filters out junky code-like phrases, numeric strings, and odd technical terms by default.

🧪 Output Structure

Results are written as flat dataset rows so they’re easy to export to CSV, Sheets, or BI tools. Each row has a record_type that tells you what kind of entity it is.
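
For example, a minimal export sketch (assuming the run's dataset has been downloaded to a local dataset_items.json file and pandas is installed; neither is required by the actor itself):

import json
import pandas as pd

# Load the exported dataset items (one dict per row, as shown in the sections below).
with open("dataset_items.json", "r", encoding="utf-8") as f:
    rows = json.load(f)

# Split the flat rows into one CSV per record_type for Sheets / BI imports.
for record_type, group in pd.DataFrame(rows).groupby("record_type"):
    group.dropna(axis=1, how="all").to_csv(f"{record_type}.csv", index=False)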

1️⃣ Per-page keywords

One row per URL:

{
  "record_type": "page_keywords",
  "page_url": "https://example.com",
  "top_keywords": [
    "west palm beach real estate attorney",
    "florida real estate lawyers",
    "business litigation fort lauderdale"
  ]
}

2️⃣ Common cross-site keyword families

Clusters of similar phrases that show up on more than one site, with similarity metrics:

{
  "record_type": "common_cross_site_keywords",
  "group_representative": "florida real estate attorney",
  "group_keywords": [
    "florida real estate attorney",
    "florida real estate lawyers",
    "florida real estate law",
    "law florida real estate",
    "real estate litigation attorneys"
  ],
  "keyword_count": 5,
  "site_count": 3,
  "sites": [
    "https://a.com",
    "https://b.com",
    "https://c.com"
  ],
  "levenshtein_avg_distance": 0.31,
  "levenshtein_max_distance": 0.53
}

Use these rows to see:

  • Which concepts recur across domains (site_count).
  • How tight the wording cluster is (lower Levenshtein distances = more similar).
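
The actor's exact distance math isn't exposed in the output, but the metric can be illustrated with RapidFuzz's normalized Levenshtein distance, where 0 means identical strings and 1 means completely different:

from rapidfuzz.distance import Levenshtein

a = "florida real estate attorney"
b = "florida real estate lawyers"

# Normalized edit distance in [0, 1]; lower values mean the two phrasings are closer.
print(round(Levenshtein.normalized_distance(a, b), 2))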

3️⃣ N-gram stats (cross-site phrases)

For each n (2, 3, …), the actor aggregates n-grams that appear on at least 3 different sites (strong cross-site themes):

{
  "record_type": "ngram_3",
  "ngram": "fort lauderdale real",
  "n": 3,
  "count": 8,
  "site_count": 4,
  "sites": [
    "https://a.com",
    "https://b.com",
    "https://c.com",
    "https://d.com"
  ],
  "sample_keywords": [
    "fort lauderdale real estate",
    "lauderdale real estate lawyer",
    "lauderdale real estate attorneys"
  ]
}

This is great for spotting standard phrases in the market (“real estate lawyer”, “west palm beach”, etc.).
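
The aggregation itself is easy to reason about. A minimal sketch of the idea (not the actor's actual code), counting 3-grams and the sites they appear on:

from collections import defaultdict

# Keyphrases per site, e.g. taken from the page_keywords rows above.
keywords_by_site = {
    "https://a.com": ["fort lauderdale real estate lawyer"],
    "https://b.com": ["fort lauderdale real estate attorneys"],
    "https://c.com": ["best fort lauderdale real estate firm"],
}

counts = defaultdict(int)
sites = defaultdict(set)

for site, phrases in keywords_by_site.items():
    for phrase in phrases:
        tokens = phrase.split()
        for i in range(len(tokens) - 2):  # sliding window of length 3
            ngram = " ".join(tokens[i:i + 3])
            counts[ngram] += 1
            sites[ngram].add(site)

# Keep only n-grams seen on at least 3 different sites (the "strong theme" rule).
strong = {g: counts[g] for g in counts if len(sites[g]) >= 3}
print(strong)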

4️⃣ Group-to-group similarity (Jaccard)

When two cross-site keyword families heavily overlap in their token sets, they’re connected with a Jaccard score:

{
  "record_type": "group_similarity",
  "group_a": "florida real estate attorney",
  "group_b": "real estate lawyer",
  "similarity": 0.63
}

These tell you which keyword families are basically talking about the same thing and should probably be treated as one theme in your planning.
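
Jaccard similarity is just the overlap of two token sets divided by their union. A toy illustration (the exact token sets the actor compares are not part of the output):

def jaccard(a: set, b: set) -> float:
    """|A intersect B| / |A union B|: 1.0 = identical token sets, 0.0 = no overlap."""
    return len(a & b) / len(a | b)

tokens_a = {"florida", "real", "estate", "attorney", "lawyers", "law"}
tokens_b = {"real", "estate", "lawyer", "attorney"}

print(round(jaccard(tokens_a, tokens_b), 2))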

5️⃣ Keyword themes (the “use this in campaigns” layer)

Themes merge similar groups into higher-level topics and rank them:

{
  "record_type": "keyword_theme",
  "primary_keyword": "florida real estate attorney",
  "score": 0.95,
  "site_count": 3,
  "groups_in_theme": 2,
  "all_variants": [
    "florida real estate attorney",
    "florida real estate law",
    "florida real estate lawyers",
    "law florida real estate",
    "real estate litigation attorneys"
  ],
  "all_sites": [
    "https://a.com",
    "https://b.com",
    "https://c.com"
  ]
}

How to use these:

  • Treat each keyword_theme as:
    • A core SEO topic / pillar page, or
    • A PPC ad group (primary = ad group name, variants = match types / ad copy phrases).

Higher score = stronger candidate.
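
As a quick illustration (a sketch that assumes you've already filtered the dataset down to keyword_theme rows), turning themes into a simple ad-group plan can be as small as:

# keyword_theme rows filtered from the dataset (trimmed example).
theme_rows = [
    {
        "primary_keyword": "florida real estate attorney",
        "score": 0.95,
        "all_variants": [
            "florida real estate attorney",
            "florida real estate lawyers",
        ],
    },
]

ad_groups = [
    {
        "ad_group": theme["primary_keyword"],  # primary phrase names the ad group
        "keywords": theme["all_variants"],     # variants become the keyword list
    }
    for theme in sorted(theme_rows, key=lambda t: t["score"], reverse=True)
]
print(ad_groups)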

6️⃣ Candidate negative keywords

The actor also flags n-grams that only appear on one site as negative keyword candidates (often brand names or very specific, non-generic terms):

{
  "record_type": "negative_keyword_candidate",
  "phrase": "ryan shipp",
  "n": 2,
  "count": 3,
  "site_count": 1,
  "sites": [
    "https://competitor.com"
  ],
  "reason": "single_site_ngram"
}

These are not auto-applied negatives. They’re suggestions that you should manually review before adding to a PPC negative list (especially competitor names or hyper-specific phrases you don’t want to pay for).
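
A small review helper (a sketch over exported dataset rows, not something the actor does for you) that surfaces the most frequent single-site phrases first:

# A few exported dataset items (normally loaded from the run's default dataset).
rows = [
    {"record_type": "negative_keyword_candidate", "phrase": "ryan shipp",
     "count": 3, "site_count": 1, "sites": ["https://competitor.com"]},
    {"record_type": "keyword_theme", "primary_keyword": "florida real estate attorney"},
]

negatives = [r for r in rows if r.get("record_type") == "negative_keyword_candidate"]

# Most frequent candidates first; nothing here is applied automatically.
for row in sorted(negatives, key=lambda r: r["count"], reverse=True):
    print(f"{row['phrase']:<30} count={row['count']}  site={row['sites'][0]}")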

⚙️ Input

Required fields

{
  "urls": [
    { "url": "https://example.com" },
    { "url": "https://another-site.com" }
  ],
  "min_ngram_n": 2
}
  • urls (array)

    • Uses the requestListSources editor in Apify.
    • Accepts either { "url": "..." } objects or plain strings "https://...".
  • min_ngram_n (integer, optional, default 2)

    • The minimum n-gram length to analyze.
    • 2 = start at bigrams (“real estate”), 3 = only 3+ word phrases (“real estate lawyer”, “fort lauderdale real estate”).
    • Unigrams (single words) are never computed to keep noise down.

Internally, the actor analyzes n-grams from min_ngram_n up to a safe cap (currently 6) to avoid combinatorial blow-ups on very long phrases.
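
If you drive the actor from Python, a run with this input might look like the sketch below (the actor ID is a placeholder; install the client with pip install apify-client):

from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")

# Replace the placeholder with the actor ID shown on the store page.
run = client.actor("<username>/seo-keyword-extractor").call(
    run_input={
        "urls": [
            {"url": "https://example.com"},
            {"url": "https://another-site.com"},
        ],
        "min_ngram_n": 2,
    }
)

# Iterate the flat dataset rows described in the Output Structure section.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["record_type"], item)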

🔄 How it works (under the hood)

  1. Fetch & clean

    • Fetches each URL via HTTP.
    • Strips scripts, styles, and other noise and extracts visible text.
  2. Keyword extraction

    • Uses a transformer-based model (all-MiniLM-L6-v2 via KeyBERT) to extract multi-word keyphrases from the page content (a simplified sketch of this step follows the list).
    • Filters out:
      • Numeric strings
      • Code-y / technical junk
      • Blacklisted tokens (e.g., obvious non-SEO boilerplate)
    • Keeps the most relevant 2–4 word keyphrases per page.
  3. Cross-site aggregation

    • Clusters similar phrases across sites using RapidFuzz (token-set similarity).
    • Keeps only clusters seen on multiple domains.
    • Computes Levenshtein distances inside each cluster to quantify how tight/loose the variants are.
  4. N-gram analysis

    • Builds n-gram stats across pages:
      • Only n in [min_ngram_n, 6].
      • Only n-grams seen on ≥ 3 sites are kept as strong cross-site themes.
  5. Theme building

    • Builds a graph of keyword groups connected by high Jaccard similarity.
    • Collapses connected components into themes.
    • Scores each theme by:
      • Cross-site importance (how many sites use it).
      • Cohesion (Levenshtein-based).
      • Phrase length (favoring 2–4 word phrases).
  6. Negative keyword suggestions

    • Separately scans all phrases for n-grams that appear on exactly one site.
    • Emits them as negative_keyword_candidate rows for manual review.
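
The extraction and clustering steps can be approximated with off-the-shelf pieces. This is a simplified sketch of steps 2–3 above (not the actor's exact code, thresholds, or parameters):

from keybert import KeyBERT
from rapidfuzz import fuzz

kw_model = KeyBERT("all-MiniLM-L6-v2")

# Step 2: extract 2-4 word keyphrases from a page's visible text.
page_text = (
    "We are Fort Lauderdale real estate lawyers handling closings, "
    "title disputes, and business litigation across South Florida."
)
keyphrases = [
    phrase
    for phrase, _score in kw_model.extract_keywords(
        page_text,
        keyphrase_ngram_range=(2, 4),
        stop_words="english",
        top_n=10,
    )
]

# Step 3: greedily group phrases whose token sets are highly similar.
groups: list[list[str]] = []
for phrase in keyphrases:
    for group in groups:
        if fuzz.token_set_ratio(phrase, group[0]) >= 85:  # similarity threshold (assumed)
            group.append(phrase)
            break
    else:
        groups.append([phrase])

print(groups)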

💰 Monetization & Scaling

This actor is designed to work cleanly with Apify Pay-Per-Event (PPE):

  • One event per run (apify-actor-start)
    Charged per actor start (each run).

  • One event per result row (apify-default-dataset-item)
    Every Actor.push_data(...) call creates a dataset item, which can be billed as a per-item event.

That means:

  • Small runs with a few URLs → a handful of items → lower cost.
  • Large competitive sweeps (many domains) → more items (pages, cross-site keywords, themes, negatives) → higher cost but also richer insight.

You can control cost by:

  • Limiting the number of input URLs.
  • Truncating or filtering which record types you care about (e.g., only page_keywords + keyword_theme).
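
As a rough back-of-the-envelope illustration of that scaling (using the listed $20.00 per 1,000 results and ignoring the per-run start event, whose price depends on the actor's PPE setup; the row counts are hypothetical):

price_per_item = 20.00 / 1000  # $0.02 per dataset row at the listed rate

urls = 10           # input URLs in the run
rows_per_url = 25   # assumed average rows per URL across all record types
total_rows = urls * rows_per_url

print(f"~{total_rows} rows -> ~${total_rows * price_per_item:.2f}")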

🔄 Workflow Examples

This actor is workflow-ready and plays nicely with other Apify tools:

  • serp-scraper: Scrape top-ranking Google results for a query, then feed the URLs here to see the shared themes across the SERP.
  • map-scraper: Collect local business websites from Google Maps, then compare cross-site phrasing for local SEO campaigns.
  • Other actors: Build end-to-end automations: harvest → extract → cluster → export to Sheets/Data Studio.

🚀 Ready to Launch?

Use this actor when you want more than just a list of keywords:

  • See which phrases truly define your niche (themes & n-grams).
  • Separate generic market language from brand-specific noise.
  • Build better SEO topics, tighter PPC ad groups, and smarter negative lists.

Perfect for:

  • SEO agencies
  • Performance marketers
  • Local law firms & service businesses
  • Content strategists and SERP analysts

Happy crawling & clustering! 🚀🌐