Pricing

from $0.01 / 1,000 results

Analyze Website Content: Extract Keywords and Terminology

The tool analyzes the textual content of a website: it scrapes pages, cleans the HTML, analyzes the text, and extracts the terminology (keywords, words, n-grams, and seed-related keywords). It can be used to identify the main topics covered, analyze competitor content, find new ideas or trends, and support SEO.

Rating

0.0

(0)

Developer

LilaK

Actor stats

Last modified: 19 days ago

Analyze Website Content

Description

This tool analyzes the textual content of a given website or domain name. It scrapes the pages down to a given depth, cleans the HTML pages (removing unimportant text such as navigation links and menus), analyzes the text, and extracts the content terminology (keywords, words, n-grams, and terms related to given seed keywords). Terminology (keyword) extraction summarizes the content of a website and identifies the main topics it covers. The tool can be used for many applications: SEO keyword research, analyzing competitor content, finding new ideas and trends, etc.

Main features

Scrape a given website at a specific depth (extra domain links are ignored)
Clean and process HTML and plain text pages
Extract the most frequent words (single-word terms) and the most frequent n-grams (multi-word terms of 2 to 4 words: bigrams, trigrams and quadrigrams)
Extract keywords from the HTML metadata
Identify terms similar to given seed keywords
Merge all the extracted data, by language, for a global website analysis
Extract social media links and emails
Output results in CSV/JSON formats and SVG wordcloud images

Supported formats

HTML
Plain text

Language identification

The language is identified for each scraped page.
The identified language is assigned to the terms extracted from the page.
Language-specific stopwords (the most common words and short function words, such as the, is, at, which, etc. for English) are used to filter the final term list.
Stopwords are discarded from the word list, and are not allowed as the first or last word of an n-gram.
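
The stopword rules above can be illustrated with a minimal Python sketch. This is not the Actor's actual implementation: the tokenizer, the tiny stopword set and the function names (tokenize, extract_terms) are assumptions made for illustration only.

```python
import re
from collections import Counter

# Tiny illustrative English stopword set (an assumption; the tool's
# real per-language lists are not documented here).
STOPWORDS = {"the", "is", "at", "which", "of", "a", "an", "and", "for", "to", "in"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def extract_terms(text, min_n=2, max_n=4):
    """Count single words and n-grams (2 to 4 words by default).

    Stopwords are dropped from the word counts, and any n-gram whose
    first or last token is a stopword is rejected. Stopwords inside
    an n-gram are allowed.
    """
    tokens = tokenize(text)
    words = Counter(t for t in tokens if t not in STOPWORDS)
    ngrams = Counter()
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            ngrams[" ".join(gram)] += 1
    return words, ngrams


words, ngrams = extract_terms(
    "The scraper is fast. The web scraper of the year is a web scraper."
)
print(words.most_common(3))
print(ngrams.most_common(3))
```

In this sketch "web scraper" is kept as a bigram, while "the web" is rejected because it starts with a stopword. Note that the sketch ignores sentence boundaries, which a real extractor would likely respect.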

Supported languages

French, English, German, Spanish, Italian, Portuguese

Input

The main input of the tool is a starting URL for the website to process.
An optional set of seed keywords. If provided, all terms (metadata keywords, n-grams or words) similar to one of the seed keywords are identified and grouped together in a separate category (Seed Related).
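
How the tool measures similarity between a term and a seed keyword is not documented here. The sketch below shows one possible approach under a loud assumption: a term counts as seed related when one of its words closely matches the seed by character similarity (Python's difflib). The function names and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def is_related(term, seed, threshold=0.8):
    """Hypothetical test: True if any word of the term closely matches the seed."""
    seed = seed.lower()
    return any(
        SequenceMatcher(None, word, seed).ratio() >= threshold
        for word in term.lower().split()
    )


def group_by_seed(terms, seeds, threshold=0.8):
    """Group the terms under each seed keyword they are related to."""
    groups = {seed: [] for seed in seeds}
    for term in terms:
        for seed in seeds:
            if is_related(term, seed, threshold):
                groups[seed].append(term)
    return groups


groups = group_by_seed(
    ["web scraping", "scraper api", "pricing plan"],
    ["web", "scraper"],
)
print(groups)
```

A stricter or looser threshold changes which terms land in the Seed Related category; a production system might instead use stemming or embeddings.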

Input configuration

Output

The analysis produces the following results:

A dataset with the most frequent extracted terms. The data includes keywords, seed-related terms, n-grams and words. For each term, the value, frequency, language, type and seed keyword are given. The dataset can be found in Output and in storage (terms.json).

Terms table view with seed keywords

The list of scraped pages is provided in JSON format. Each page is described by its url, title, description, author, date, keywords and language. The file can be found in storage (pages.json).
The emails and social media links are provided in JSON format. The file can be found in storage (contact.json).
A global file combining all the output (terms, contact and pages) can be found in storage (all.json).
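
As a rough illustration of working with the term records, the sketch below ranks terms by frequency for one language. The JSON key names (value, frequency, language, type, seed) and the type labels are assumptions inferred from the description above; check them against an actual terms.json before relying on them.

```python
import json
from collections import defaultdict


def top_terms(records, language, limit=3):
    """Return the most frequent term values per type for one language."""
    by_type = defaultdict(list)
    for rec in records:
        if rec.get("language") == language:
            by_type[rec.get("type")].append(rec)
    return {
        term_type: [
            r["value"]
            for r in sorted(recs, key=lambda r: r["frequency"], reverse=True)[:limit]
        ]
        for term_type, recs in by_type.items()
    }


# Hypothetical sample in the assumed shape of terms.json.
sample = json.loads("""
[
  {"value": "web scraper", "frequency": 12, "language": "en", "type": "ngram", "seed": "scraper"},
  {"value": "data extraction", "frequency": 7, "language": "en", "type": "ngram", "seed": null},
  {"value": "scraper", "frequency": 30, "language": "en", "type": "word", "seed": "scraper"},
  {"value": "extraction", "frequency": 5, "language": "fr", "type": "word", "seed": null}
]
""")

print(top_terms(sample, "en"))
```

To use a real export, replace the sample with the parsed contents of the downloaded file, for example json.load(open("terms.json")).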

Storage key Store contents

The extracted terms can also be rendered as SVG wordcloud images. The images can be found in storage (wordcloud..svg).

Seed keywords: web and scraper

Wordcloud representation of the seed related terms

WordCloud representation of the general terms (ngrams and words)

Wordcloud representation

Your feedback

If you have technical feedback, a bug to report, or a suggestion for improving the Actor, please create an issue on the Actor’s Issues tab.
