Bizsleuth
Pricing
from $5.00 / 1,000 processing urls
An AI powered lead generation tool that can extract useful information from business websites.
Rating
4.6
(2)
Developer
Ashar Malik
Maintained by Community
🕵️‍♂️ BizSleuth: AI-Powered Business Intelligence Tool
BizSleuth is an AI-powered web scraper that analyzes company websites to extract high-value business intelligence. It goes beyond simple regex matching by using Large Language Models (LLMs) to understand the content and identify specific details like business owners, operational size, and contact information.
You define what you want to extract. BizSleuth handles the rest.
💡 What does BizSleuth extract?
By default, BizSleuth extracts:
- Business name
- Business owner / founder name
- Business address
- Business size (small / medium / large)
- Contact email
- Phone number
- Business summary
But you're not limited to these. The output schema is fully customizable: you can add, remove, or replace fields with anything you want.
🎯 Use cases
- Lead generation: Build prospect lists enriched with owner names, emails, and business summaries.
- Market research: Survey a category of businesses and pull structured data at scale.
- Sales prospecting: Find out which booking platforms or tools your prospects use before reaching out.
- Directory building: Populate or refresh a business directory straight from each company's website.
⚙️ How it works
For each URL you provide, BizSleuth runs a two-stage crawl followed by an AI parsing stage:
Stage 1: Fast HTTP crawl. The actor sends standard HTTP requests to the homepage and crawls internal pages. This is fast and handles the majority of websites.
Stage 2: Browser fallback. If a URL fails in Stage 1 because the site requires JavaScript to render, BizSleuth retries it using a full Playwright browser. This covers single-page apps and JS-heavy sites that a plain HTTP crawler would miss.
Stage 3: AI parsing. Once the text is collected, it's sent to Gemini AI along with your field definitions. The AI extracts what it can find and returns a structured result for each URL.
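The control flow of the three stages can be sketched as follows. This is only an illustration of the fallback pattern, not BizSleuth's actual code: `crawl_with_fallback` and the three callables passed into it (`fetch_http`, `fetch_browser`, `parse_with_ai`) are hypothetical names standing in for the HTTP crawler, the Playwright browser, and the Gemini call.

```python
from typing import Callable, Optional


def crawl_with_fallback(
    url: str,
    fetch_http: Callable[[str], Optional[str]],
    fetch_browser: Callable[[str], Optional[str]],
    parse_with_ai: Callable[[str], dict],
) -> Optional[dict]:
    """Try the fast path first, fall back to a browser, then parse the text.

    All three callables are hypothetical stand-ins supplied by the caller.
    """
    text = fetch_http(url)  # Stage 1: plain HTTP crawl
    if not text:
        text = fetch_browser(url)  # Stage 2: browser retry for JS-rendered sites
    if not text:
        return None  # both crawl stages failed, so the URL is skipped
    result = dict(parse_with_ai(text))  # Stage 3: structured extraction
    result["url"] = url  # the url field is always part of the output
    return result
```

Passing the fetchers in as arguments keeps the sketch free of network calls, so the fallback order can be checked with simple stub functions.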
🛠️ How to use BizSleuth
- Click Try for free on the actor page.
- Paste your URL(s) directly, or upload a .txt file with one URL per line. Use the root homepage of each website (e.g. https://example.com), not a deep link. BizSleuth crawls outward from there.
- (Optional) Open Advanced Options → Output Fields to customize what you want extracted.
- Click Run and wait for it to finish.
- Download your results from the Dataset tab as JSON, CSV, XLSX, or JSONL.
📥 Input
| Field | Type | Required | Description |
|---|---|---|---|
| startUrls | Array | Yes | URLs to scrape. Accepts direct entries or a .txt file (one URL per line). |
| outputSchema | Object | No | Fields to extract. Key = field name, value = plain-English description for the AI. Uses the default schema if omitted. |

    {
      "startUrls": [
        { "url": "https://www.example-business.com" },
        { "url": "https://another-company.com" }
      ]
    }

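Since the actor expects root homepages, a list of mixed URLs can be tidied before it is pasted or uploaded. The helper below is a minimal sketch using only the Python standard library; `to_root_url` and `build_start_urls` are hypothetical names, and the sketch assumes `https` when a line has no scheme and that every business lives at the root of its host.

```python
from urllib.parse import urlsplit


def to_root_url(raw: str) -> str:
    """Reduce any URL to scheme://host, assuming https when no scheme is given."""
    raw = raw.strip()
    if "://" not in raw:
        raw = "https://" + raw
    parts = urlsplit(raw)
    return f"{parts.scheme}://{parts.netloc}"


def build_start_urls(lines):
    """Turn the lines of a .txt file into a de-duplicated startUrls input object."""
    seen = set()
    start_urls = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        root = to_root_url(line)
        if root not in seen:
            seen.add(root)
            start_urls.append({"url": root})
    return {"startUrls": start_urls}
```

Calling `build_start_urls(open("urls.txt"))` on a file with one URL per line would yield an object in the same shape as the input example above.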
📤 Output
Each item in the dataset corresponds to one successfully processed URL. The url field is always included.

    {
      "business_name": "Bloom Wellness Studio",
      "business_owner_name": "Sarah Chen",
      "business_address": "418 West 3rd Ave, Vancouver, BC V5Y 1E5",
      "business_size": "small",
      "contact_email": "hello@bloomwellness.ca",
      "phone_number": "+1-604-555-0172",
      "business_summary": "A boutique yoga and pilates studio offering small-group classes, private sessions, and corporate wellness programs.",
      "url": "https://www.bloomwellness.ca"
    }

- Fields the AI couldn't find are returned as "none".
- URLs that fail to load after both crawl stages are skipped and excluded from the output.
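When post-processing downloaded results, it can help to turn the "none" placeholder into a real null value so that missing fields are easy to filter. A minimal sketch, assuming each dataset item has been loaded as a Python dict; the helper names are hypothetical, and matching "none" case-insensitively is an assumption rather than documented behavior.

```python
def clean_item(item: dict) -> dict:
    """Replace the "none" placeholder with None, leaving other values untouched."""
    return {
        key: None if isinstance(value, str) and value.strip().lower() == "none" else value
        for key, value in item.items()
    }


def missing_fields(item: dict) -> list:
    """List the fields the AI could not find for this item, in sorted order."""
    return sorted(key for key, value in clean_item(item).items() if value is None)
```

For example, `missing_fields(item)` can be used to keep only leads that have both a contact email and a phone number.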
✏️ Customizing the output schema
The output schema is the most powerful part of BizSleuth. Each key becomes a field in your output, and the value is a plain-English description that tells the AI what to look for. You can extract practically anything that appears on a website.
Here's an example for a fitness studio lead list:

    {
      "startUrls": [{ "url": "https://www.example-studio.com" }],
      "outputSchema": {
        "studio_name": "The name of the studio",
        "owner_name": "The owner or founder's name",
        "class_types": "Types of fitness classes offered, e.g. yoga, pilates, HIIT, barre",
        "booking_platform": "Online booking software used, e.g. Mindbody, Vagaro, ClassPass, Jane",
        "instagram_url": "Instagram page URL of the business",
        "membership_offered": "Whether the studio offers memberships or class packs (yes or no)"
      }
    }

The more specific your descriptions, the better the results.
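If you generate inputs programmatically, a small helper can assemble the object and catch empty descriptions before a run is started. This is a sketch under assumptions: `build_run_input` is a hypothetical helper, and the non-empty check is a convenience of this example, not a validation rule documented by the actor.

```python
import json


def build_run_input(urls, output_schema=None):
    """Assemble the actor input; leave output_schema as None to use the default fields."""
    run_input = {"startUrls": [{"url": u} for u in urls]}
    if output_schema is not None:
        for name, description in output_schema.items():
            # Each value should be a plain-English description for the AI.
            if not isinstance(description, str) or not description.strip():
                raise ValueError(f"Field {name!r} needs a plain-English description")
        run_input["outputSchema"] = dict(output_schema)
    return run_input


payload = build_run_input(
    ["https://www.example-studio.com"],
    {
        "studio_name": "The name of the studio",
        "booking_platform": "Online booking software used, e.g. Mindbody, Vagaro",
    },
)
print(json.dumps(payload, indent=2))
```

The printed JSON has the same shape as the fitness studio example and can be pasted into the actor's JSON input editor.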
⚠️ Limitations
- Homepage URL required: Provide the root URL, not a deep link. The actor crawls from whatever URL you give it.
- JavaScript-heavy sites: Most are covered by the browser fallback, but heavily bot-protected or CAPTCHA-gated sites may still fail.
- Text-only extraction: The AI works from page text. Information that only exists in images won't be extracted.
- AI accuracy: The AI won't invent information; if something isn't on the site, it returns "none". That said, like any LLM, it can occasionally misread complex or cluttered pages.
❓ Frequently asked questions
What kind of URLs should I provide?
Always use the root homepage, e.g. https://example.com. The actor crawls from there and discovers internal pages on its own.
Can I upload a large list of URLs?
Yes. Use the file upload option and provide a .txt file with one URL per line.
What happens to sites that fail to load?
They're silently skipped. You'll only see results for URLs that were successfully processed.
How accurate is the AI extraction?
It depends on whether the information is actually on the website. The AI won't make things up; if a field isn't there, it returns "none". Writing specific descriptions in your output schema helps significantly on ambiguous pages.
Can I use this on social media profiles?
BizSleuth is built for business websites. Social platforms typically block scrapers or require authentication, so results there would be unreliable.
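Because failed sites are skipped without an error, one way to see which inputs produced no result is to compare your URL list against the url field of the downloaded items. The sketch below uses hypothetical helper names and compares by host name only, on the assumption that the url in the output may differ slightly from the input (a trailing slash or a "www." prefix, for instance).

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Lower-cased host without a leading "www." for loose comparison."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def find_skipped(input_urls, dataset_items):
    """Return the input URLs whose host never appears in any item's url field."""
    processed = {host_of(item["url"]) for item in dataset_items if item.get("url")}
    return [url for url in input_urls if host_of(url) not in processed]
```

The returned list can be retried in a separate run or checked by hand.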