# SourceForge & TrustRadius — Software Vendor Leads (`ryanclinton/g2-company-scraper`) Actor

Scrapes SourceForge and TrustRadius for software company leads by category. Returns vendor name, website, rating, review count, pricing tier, and category tags. Filter by rating or review count. $0.05 per company.

- **URL**: https://apify.com/ryanclinton/g2-company-scraper.md
- **Developed by:** [ryan clinton](https://apify.com/ryanclinton) (community)
- **Categories:** Other
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $100.00 / 1,000 companies scraped

This Actor is paid per event. You are not charged for the Apify platform usage, but only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Software Directory Scraper — SourceForge & TrustRadius

This software directory scraper extracts software company leads from SourceForge and TrustRadius by category. Point it at any category slug (`crm`, `project-management`, `email-marketing`) and it returns structured records with vendor name, website, star rating, review count, pricing tier, and category tags. It is built for sales teams, marketing agencies, and SaaS founders who need targeted lists of software companies without manual browsing.

The actor runs on CheerioCrawler, which means it is fast, lightweight, and requires no proxies — SourceForge and TrustRadius serve their listing pages to datacenter IPs without blocking. Both sources are scraped simultaneously, results are deduplicated by domain, and quality filters let you exclude low-review or low-rated products before they reach your dataset.

### What data can you extract?

| Data Point | Source | Example |
|---|---|---|
| 🏢 **Company Name** | SourceForge / TrustRadius | Pinnacle CRM Technologies |
| 📦 **Product Name** | SourceForge / TrustRadius | PinnacleCRM Pro |
| 🌐 **Website URL** | SourceForge / TrustRadius | https://pinnaclecrm.io |
| 🔗 **Domain** | Extracted from website | pinnaclecrm.io |
| 📋 **Directory Profile URL** | SourceForge / TrustRadius | https://sourceforge.net/software/product/pinnaclecrm/ |
| ⭐ **Rating** | SourceForge / TrustRadius | 4.3 (unified 1–5 scale) |
| 💬 **Review Count** | SourceForge / TrustRadius | 1,842 |
| 💰 **Pricing Tier** | SourceForge / TrustRadius | $29/month |
| 🏷️ **Categories** | SourceForge / TrustRadius | ["crm", "Sales Force Automation"] |
| 🏆 **Badges** | SourceForge only | ["Leader", "Top Performer"] |
| 📝 **Description** | SourceForge / TrustRadius | Cloud-based CRM for SMB sales teams... |
| 🗂️ **Source** | Actor metadata | sourceforge |

### Why use Software Directory Scraper?

Building a list of software companies in a niche by hand means clicking through dozens of directory pages, copying names into a spreadsheet, and Googling websites one by one. For a single category on SourceForge you might spend two hours to collect 50 companies — with no rating data, no pricing context, and no structured output.

This actor automates the entire process. Provide a list of category slugs and it crawls SourceForge pagination (page-by-page using `?page=N`) and TrustRadius product sitemaps (fetching from `product-reviews-sitemap-1.xml` through sitemap 5, giving 2,500+ product URLs per crawl). Every record is cleaned, normalized, and written to the Apify dataset in minutes.

- **Scheduling** — run weekly to refresh your list of active software vendors as new products are added to directories
- **API access** — trigger runs from Python, JavaScript, or any HTTP client and pipe results directly into your CRM or enrichment pipeline
- **Proxy rotation** — proxies are not required for these sources, but the actor accepts an optional proxy configuration for custom setups
- **Monitoring** — get Slack or email alerts when runs fail or produce fewer results than expected via Apify's built-in monitoring
- **Integrations** — connect to Zapier, Make, Google Sheets, HubSpot, or webhooks without writing a line of code

### Features

- **Dual-source scraping** — crawls both SourceForge (100k+ products, paginated category listings) and TrustRadius (B2B-focused, Next.js SSR pages) in a single run, configurable per source
- **Automatic pagination on SourceForge** — follows `?page=N` links until the per-category limit is reached or no more listing cards are found
- **TrustRadius sitemap discovery** — parses `product-reviews-sitemap-{1..5}.xml` files to collect product URLs, then fetches each product page individually for structured data
- **Dual-extraction strategy for TrustRadius** — first attempts to parse the `__NEXT_DATA__` JSON embedded in the server-rendered page across three property paths (`pageProps.product`, `pageProps.data.product`, `pageProps.productReviews.product`), then falls back to eight named `data-testid` HTML selectors
- **Unified rating scale** — TrustRadius uses a 10-point scoring system; the actor converts all scores to a 1–5 scale using `Math.round((val / 2) * 10) / 10` so ratings from both sources are directly comparable
- **Domain deduplication** — strips `www.` prefixes, normalizes to registrable domain, and tracks seen domains in a shared `Set<string>` across all categories and sources to prevent duplicate vendor rows
- **Per-category per-source limits** — the `maxCompaniesPerCategory` limit applies independently to each `{source}:{category}` pair, so a limit of 50 means up to 50 from SourceForge CRM and 50 from TrustRadius CRM
- **Quality filters** — `minReviews` and `minRating` filters are applied after extraction and before charging; products that fail are logged and skipped at no cost
- **Pricing normalization** — raw pricing strings are mapped to standard tiers: `Free`, `Freemium`, `Open Source`, `Contact Vendor`, or a cleaned price string like `$29/month`
- **Badge extraction from SourceForge** — scrapes award badges from `.badge-container .badge` and `[class*="award"]` elements, useful for identifying "Leader" and "Top Performer" products
- **Resilient SourceForge selectors** — uses four CSS selector strategies (`[class*="project-cell"]`, `.sf-project-listing-item`, `ul.projects-listing > li`, `.inner-cell`) with a filter for elements containing an `h3 a` title link, ensuring coverage across markup changes
- **Pay-per-event billing** — charged $0.05 per company that passes quality filters; the actor stops automatically when your spending limit is reached and data is always pushed before the charge fires
- **Run summary record** — every run ends with a `type: "summary"` record showing totals by category and source, useful for monitoring and pipeline auditing
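
The rating conversion in the feature list can be mirrored client-side, for example to sanity-check values in an export. The following is a minimal Python sketch and is not taken from the actor's source. The name `to_five_point` is hypothetical, and `floor(x + 0.5)` stands in for JavaScript's `Math.round`, because Python's built-in `round()` rounds halves to even.

```python
import math


def to_five_point(score_out_of_ten: float) -> float:
    """Mirror Math.round((val / 2) * 10) / 10 for non-negative scores.

    Hypothetical helper for illustration only. floor(x + 0.5) is used
    because Python's round() rounds halves to even, unlike Math.round.
    """
    return math.floor((score_out_of_ten / 2) * 10 + 0.5) / 10


print(to_five_point(8.6))  # 4.3
```

A TrustRadius score of 7.5 maps to 3.8 under this rule, since 3.75 rounds up at one decimal place.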

### Use cases for software directory scraping

#### Sales prospecting for SaaS tools

Sales development reps building outbound lists can use this actor to find every CRM, help-desk, or marketing-automation vendor in a category. With website and domain data in the output, results feed directly into [Website Contact Scraper](https://apify.com/ryanclinton/website-contact-scraper) to find decision-maker emails, or into [Waterfall Contact Enrichment](https://apify.com/ryanclinton/waterfall-contact-enrichment) for a full contact cascade. A list of 200 CRM vendors takes under 10 minutes and costs $10.

#### Marketing agency lead generation

Agencies that serve software companies — design studios, content agencies, SEO firms — can scrape target categories to find prospect companies with their websites pre-extracted. Filter by `minReviews: 10` to exclude unestablished products and focus on vendors that are already investing in their market presence. Rating data helps prioritize outreach toward well-reviewed products that likely have marketing budgets.

#### Competitive intelligence and market mapping

Founders and product managers can scrape their own category to map the competitive landscape. The output includes category tags, pricing tiers, and badge data, giving a structured view of which products lead the category. Combine with [Website Tech Stack Detector](https://apify.com/ryanclinton/website-tech-stack-detector) to identify which technology platforms your competitors are built on.

#### Data enrichment for existing company lists

If you already have a list of software company domains, run this actor against the categories those companies belong to, then join the results to your list on the `domain` field to add rating, review count, pricing tier, and category context from SourceForge and TrustRadius. Set `deduplicateByDomain: false` when you want complete coverage across multiple categories for the same company.

#### Recruiting and talent sourcing

Recruiters targeting software companies in specific verticals can use category data to find employers. A search for `project-management` or `hr-software` returns companies with their websites, which feed into contact extraction to find hiring manager contacts. The badge data (Leader, Top Performer) helps identify fast-growing companies likely to be actively hiring.

#### B2B lead qualification and scoring

The combination of rating, review count, and pricing tier gives enough signal to score leads before enrichment. A high rating, a high review count, and a paid pricing tier together indicate an established, revenue-generating business. Pipe the output into [B2B Lead Qualifier](https://apify.com/ryanclinton/b2b-lead-qualifier) to apply a formal 0–100 score before committing to enrichment cost.

### How to scrape software company leads from SourceForge and TrustRadius

1. **Enter your target categories** — Type the category slugs you want to scrape. Use lowercase, hyphenated slugs that match the directory URL: `crm`, `project-management`, `email-marketing`, `accounting`, `help-desk`. You can enter multiple categories in one run.
2. **Configure quality filters** — Set `minReviews` to 5 or 10 to exclude newly listed products with no track record. Set `minRating` to 3.5 to focus on well-reviewed vendors. Leave both at 0 to collect everything.
3. **Run the actor** — Click "Start" and wait. A single category with the default limit of 50 companies per source typically completes in 3–5 minutes.
4. **Download results** — Open the Dataset tab, then export to JSON, CSV, or Excel. The dataset includes one row per company plus a summary record at the end showing totals by category and source.

### Input parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `categories` | array | Yes | `["crm"]` | Category slugs to scrape. Use lowercase hyphenated slugs matching directory URLs (e.g. `crm`, `project-management`, `email-marketing`). |
| `sources` | array | No | `["sourceforge", "trustradius"]` | Which directories to scrape. Options: `sourceforge`, `trustradius`. Omit to scrape both. |
| `maxCompaniesPerCategory` | integer | No | `50` | Max companies per category per source. `0` = no limit. Range: 0–1000. |
| `minReviews` | integer | No | `0` | Minimum number of reviews a product must have to be included. |
| `minRating` | number | No | `0` | Minimum average rating (1.0–5.0 scale) a product must have. |
| `deduplicateByDomain` | boolean | No | `true` | Remove duplicate companies when the same domain appears across categories or sources. |
| `proxyConfiguration` | object | No | none | Optional Apify proxy configuration. SourceForge and TrustRadius work without proxies. |

#### Input examples

**Standard scrape — two categories, both sources:**
```json
{
  "categories": ["crm", "project-management"],
  "sources": ["sourceforge", "trustradius"],
  "maxCompaniesPerCategory": 50,
  "minReviews": 5,
  "minRating": 3.5,
  "deduplicateByDomain": true
}
````

**Large batch — five categories, SourceForge only, higher limit:**

```json
{
  "categories": ["crm", "project-management", "email-marketing", "accounting", "help-desk"],
  "sources": ["sourceforge"],
  "maxCompaniesPerCategory": 200,
  "minReviews": 0,
  "minRating": 0,
  "deduplicateByDomain": true
}
```

**Quick test — one category, minimal filters:**

```json
{
  "categories": ["crm"],
  "sources": ["sourceforge"],
  "maxCompaniesPerCategory": 10,
  "deduplicateByDomain": false
}
```

#### Input tips

- **Start with a small limit** — set `maxCompaniesPerCategory: 10` for a first run to verify results match your expectations before scaling up.
- **Use both sources together** — SourceForge skews toward SMB and open-source tools; TrustRadius skews toward enterprise B2B. Combined, you get broader coverage of a category.
- **Category slugs must match the SourceForge URL** — verify by visiting `https://sourceforge.net/software/{your-slug}/` in a browser before running.
- **Batch multiple categories in one run** — processing 5 categories in a single run is more efficient than 5 separate runs, because the deduplication set is shared across the entire run.
- **Set a spending limit** — use Apify's per-run budget control to cap costs before running against a large category list.

### Output example

```json
{
  "companyName": "Pinnacle CRM Technologies",
  "productName": "PinnacleCRM Pro",
  "website": "https://pinnaclecrm.io",
  "domain": "pinnaclecrm.io",
  "profileUrl": "https://sourceforge.net/software/product/pinnaclecrm/",
  "rating": 4.3,
  "reviewCount": 1842,
  "pricingTier": "$29/month",
  "categories": ["crm", "Sales Force Automation", "Contact Management"],
  "badges": ["Leader", "Top Performer Q1 2025"],
  "description": "Cloud-based CRM for SMB sales teams. Includes pipeline management, email sequences, and native Slack integration. Free 14-day trial.",
  "source": "sourceforge",
  "sourceCategory": "crm",
  "scrapedAt": "2026-03-22T09:14:32.451Z"
}
```

The final record in every dataset is a summary record:

```json
{
  "type": "summary",
  "categoriesScraped": ["crm", "project-management"],
  "sourcesUsed": ["sourceforge", "trustradius"],
  "totalCompaniesFound": 187,
  "totalDeduplicated": 14,
  "companiesByCategory": {
    "crm": 98,
    "project-management": 89
  },
  "companiesBySource": {
    "sourceforge": 94,
    "trustradius": 93
  },
  "scrapedAt": "2026-03-22T09:21:08.772Z"
}
```

### Output fields

| Field | Type | Description |
|---|---|---|
| `companyName` | string \| null | Vendor or company name. Falls back to product name when the directory does not list the vendor separately. |
| `productName` | string \| null | Software product name as listed in the directory. |
| `website` | string \| null | Vendor website URL as listed in the directory profile. |
| `domain` | string \| null | Registrable domain extracted from the website URL (e.g. `pinnaclecrm.io`). Used for deduplication and CRM join keys. |
| `profileUrl` | string \| null | Direct link to the product's SourceForge or TrustRadius profile page. |
| `rating` | number \| null | Average rating on a unified 1.0–5.0 scale. TrustRadius 10-point scores are divided by 2 and rounded to one decimal. |
| `reviewCount` | number \| null | Total number of user reviews or ratings in the directory. |
| `pricingTier` | string \| null | Normalized pricing: `Free`, `Freemium`, `Open Source`, `Contact Vendor`, or a price string like `$29/month`. |
| `categories` | string[] | Category tags from the listing. Always includes the source category slug used to discover the product. |
| `badges` | string[] | SourceForge award badges (e.g. `Leader`, `Top Performer`). Empty array for TrustRadius results. |
| `description` | string \| null | Short product description from the listing page. |
| `source` | string | Which directory this record came from: `sourceforge` or `trustradius`. |
| `sourceCategory` | string | The category slug used to discover this company (e.g. `crm`). |
| `scrapedAt` | string | ISO 8601 timestamp when this record was extracted. |

### How much does it cost to scrape software company leads?

Software Directory Scraper uses **pay-per-event pricing** — you pay **$0.05 per company** extracted. Platform compute costs are included. Companies filtered out by `minReviews` or `minRating` are not charged.

| Scenario | Companies | Cost per company | Total cost |
|---|---|---|---|
| Quick test | 10 | $0.05 | $0.50 |
| Single category | 50 | $0.05 | $2.50 |
| Two categories, both sources | 200 | $0.05 | $10.00 |
| Five categories | 500 | $0.05 | $25.00 |
| Full market map | 1,000 | $0.05 | $50.00 |

You can set a **maximum spending limit** per run to control costs. The actor stops when your budget is reached, so a $5 limit will collect up to 100 companies.
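
The budget arithmetic above can be worked out before a run. This is a rough Python sketch with hypothetical function names. It assumes the $0.05 per-company price quoted in this section and a non-zero `maxCompaniesPerCategory`, and it gives an upper bound only, since filters and deduplication reduce the number of charged companies.

```python
PRICE_CENTS_PER_COMPANY = 5  # $0.05 per company, the figure used in this section


def max_billable_companies(categories: int, sources: int, per_category_limit: int) -> int:
    """Upper bound on charged companies; the limit applies per source and per category."""
    if per_category_limit <= 0:
        raise ValueError("a limit of 0 means 'no limit', so no upper bound exists")
    return categories * sources * per_category_limit


def worst_case_cost_usd(categories: int, sources: int, per_category_limit: int) -> float:
    """Worst-case charge in dollars, computed in integer cents to avoid float drift."""
    total = max_billable_companies(categories, sources, per_category_limit)
    return total * PRICE_CENTS_PER_COMPANY / 100


def companies_within_budget(budget_usd: float) -> int:
    """How many companies a spending limit can pay for."""
    return int(round(budget_usd * 100)) // PRICE_CENTS_PER_COMPANY


print(worst_case_cost_usd(2, 2, 50))  # 10.0
print(companies_within_budget(5))     # 100
```

Two categories on both sources with a limit of 50 give at most 200 charged companies, which matches the $10.00 row in the table.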

Compare this to manually browsing directories at roughly 30 seconds per company — 200 companies would take 100 minutes of manual work. At $10 for the same output, you get clean structured data with no subscription commitment. Tools like ZoomInfo or Apollo charge $100–500/month and still require manual filtering to narrow to a specific software category.

### Scrape software company leads using the API

#### Python

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("ryanclinton/g2-company-scraper").call(run_input={
    "categories": ["crm", "project-management"],
    "sources": ["sourceforge", "trustradius"],
    "maxCompaniesPerCategory": 50,
    "minReviews": 5,
    "minRating": 3.5,
    "deduplicateByDomain": True,
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    if item.get("type") == "summary":
        print(f"Total companies found: {item['totalCompaniesFound']}")
    else:
        print(f"{item['productName']} ({item['companyName']}) — {item['domain']} — {item['rating']} stars, {item['reviewCount']} reviews")
```

#### JavaScript

```javascript
import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

const run = await client.actor("ryanclinton/g2-company-scraper").call({
    categories: ["crm", "project-management"],
    sources: ["sourceforge", "trustradius"],
    maxCompaniesPerCategory: 50,
    minReviews: 5,
    minRating: 3.5,
    deduplicateByDomain: true,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
for (const item of items) {
    if (item.type === "summary") {
        console.log(`Total: ${item.totalCompaniesFound} companies`);
    } else {
        console.log(`${item.productName} — ${item.domain} — ${item.rating} stars, ${item.pricingTier}`);
    }
}
```

#### cURL

```bash
# Start the actor run
curl -X POST "https://api.apify.com/v2/acts/ryanclinton~g2-company-scraper/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "categories": ["crm", "project-management"],
    "sources": ["sourceforge", "trustradius"],
    "maxCompaniesPerCategory": 50,
    "minReviews": 5,
    "minRating": 3.5,
    "deduplicateByDomain": true
  }'

# Fetch results once the run has finished (replace DATASET_ID with defaultDatasetId from the run response above)
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN&format=json"
```

### How Software Directory Scraper works

#### Phase 1 — Request generation

On startup the actor reads your `categories` and `sources` inputs, normalizes category slugs (lowercase, hyphens, trim), then builds start requests. For SourceForge, it constructs one URL per category at `https://sourceforge.net/software/{category}/` (page 1). For TrustRadius, it generates five sitemap URLs per category — `https://www.trustradius.com/sitemaps/product-reviews-sitemap-{1..5}.xml` — to maximize product URL discovery. All requests are handed to a CheerioCrawler instance running at `maxConcurrency: 5` with session pooling, persistent cookies per session, a 30-second navigation timeout, and 3 retries per request.
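
As an illustration of this phase, the sketch below builds the start URLs described above. It is written in Python for consistency with the API examples and is not the actor's own code. The exact normalization rules are an assumption (whitespace and underscores become hyphens), and `normalize_slug` and `start_urls` are hypothetical names.

```python
import re

SOURCEFORGE_BASE = "https://sourceforge.net/software"
TRUSTRADIUS_SITEMAP = "https://www.trustradius.com/sitemaps/product-reviews-sitemap-{n}.xml"


def normalize_slug(raw: str) -> str:
    """Lowercase, trim, and hyphenate a category name (assumed rules)."""
    slug = re.sub(r"[\s_]+", "-", raw.strip().lower())
    return re.sub(r"-{2,}", "-", slug).strip("-")


def start_urls(category: str, sources: list[str]) -> list[str]:
    """One SourceForge listing URL plus sitemaps 1 through 5 for TrustRadius."""
    slug = normalize_slug(category)
    urls: list[str] = []
    if "sourceforge" in sources:
        urls.append(f"{SOURCEFORGE_BASE}/{slug}/")
    if "trustradius" in sources:
        urls.extend(TRUSTRADIUS_SITEMAP.format(n=n) for n in range(1, 6))
    return urls


print(start_urls(" Project Management ", ["sourceforge"]))
# ['https://sourceforge.net/software/project-management/']
```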

#### Phase 2 — SourceForge category crawling

The SourceForge route handler (`SF_CATEGORY`) parses category listing pages using four CSS selector strategies for resilience against markup changes. It extracts product name from `h3 a` title links, vendor name from `.project-company` or `[class*="company"]` elements, star rating from `[class*="rating-avg"]` or `[itemprop="ratingValue"]` attributes, review count from `[class*="rating-count"]`, pricing from `[class*="price"]`, category tags from `[class*="tag"] a`, award badges via the `extractSFBadges` helper, and the vendor website link identified by `data-ga-label="website"` or `rel="nofollow"` links pointing outside `sourceforge.net`. After processing each page it automatically enqueues the next page (`?page=N+1`) until the per-category limit is reached or no listing cards are found.
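
The stop conditions for pagination can be expressed as a small pure function. This Python sketch is an interpretation of the description above, not the handler itself. `next_page_url` and its parameters are hypothetical, and a limit of `0` is treated as "no limit" to match `maxCompaniesPerCategory`.

```python
from typing import Optional


def next_page_url(category: str, current_page: int, cards_on_page: int,
                  collected: int, limit: int) -> Optional[str]:
    """Return the next listing URL, or None when crawling should stop."""
    if cards_on_page == 0:
        return None  # no listing cards found: the category is exhausted
    if limit > 0 and collected >= limit:
        return None  # per-category limit reached
    return f"https://sourceforge.net/software/{category}/?page={current_page + 1}"


print(next_page_url("crm", 1, 25, 25, 50))
# https://sourceforge.net/software/crm/?page=2
```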

#### Phase 3 — TrustRadius sitemap and product crawling

The TrustRadius route has two stages. The sitemap handler (`TR_SITEMAP`) parses XML sitemap files using Cheerio's XML support (enabled via `additionalMimeTypes: ['application/xml', 'text/xml']`), filters for `/products/` URLs that are not comparison, pricing, video, or competitor pages, then enqueues up to `targetCount * 2` product URLs to account for items that will be filtered. The product handler (`TR_PRODUCT`) first attempts to parse structured data from the `__NEXT_DATA__` JSON block embedded in the server-rendered page, checking three property paths. If the JSON path yields no product name, it falls back to eight named `data-testid` HTML selectors — `product-name`, `vendor-name`, `overall-score`, `reviews-count`, `product-description`, `pricing-summary`, `category`, and `vendor-website` — plus a `meta[name="description"]` fallback for descriptions.
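
The three-path lookup in `__NEXT_DATA__` might look roughly like the Python sketch below. Two things are assumptions here: that `pageProps` sits under the top-level `props` key (the usual Next.js layout), and that the product object carries its name under a `name` key. `dig` and `find_product` are hypothetical helpers.

```python
import json
from typing import Any, Optional

# The three property paths named above, prefixed with the assumed "props" root.
PRODUCT_PATHS = [
    ("props", "pageProps", "product"),
    ("props", "pageProps", "data", "product"),
    ("props", "pageProps", "productReviews", "product"),
]


def dig(data: Any, path: tuple) -> Optional[Any]:
    """Walk nested dicts along `path`; return None if any key is missing."""
    for key in path:
        if not isinstance(data, dict) or key not in data:
            return None
        data = data[key]
    return data


def find_product(next_data_json: str) -> Optional[dict]:
    """Return the first product object that has a name, trying each path in order."""
    root = json.loads(next_data_json)
    for path in PRODUCT_PATHS:
        candidate = dig(root, path)
        if isinstance(candidate, dict) and candidate.get("name"):
            return candidate
    return None  # the caller would fall back to the data-testid selectors
```

Returning `None` rather than raising keeps the fallback to HTML selectors a simple conditional in the calling code.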

#### Phase 4 — Normalization, filtering, and pay-per-event charging

Every extracted record passes through `transformRawToClean()`, which applies domain extraction (stripping `www.` prefixes via `URL` parsing), rating scale conversion, pricing tier normalization, and whitespace collapsing via regex. The record is then checked against `minReviews` and `minRating` in `passesFilters()`. If it passes, the actor calls `Actor.pushData()` first, then `Actor.charge({ eventName: 'company-found', count: 1 })` — following Apify's data-before-charge rule. The `eventChargeLimitReached` flag is checked after each charge; if set, all active route handlers stop and the run completes cleanly with a summary record.
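
Two of the steps above, domain extraction and the quality filter, are easy to reproduce when post-processing a dataset. The Python sketch below is illustrative rather than a port of `transformRawToClean()` or `passesFilters()`. It only strips a leading `www.` (reducing subdomains to a registrable domain would need a public-suffix list), and it assumes a missing rating or review count is treated as 0.

```python
from typing import Optional
from urllib.parse import urlparse


def extract_domain(website: Optional[str]) -> Optional[str]:
    """Hostname without a leading "www.", or None when no usable URL is given."""
    if not website:
        return None
    url = website if "://" in website else f"https://{website}"
    host = urlparse(url).hostname  # hostname is lowercased by urlparse
    if not host:
        return None
    return host[4:] if host.startswith("www.") else host


def passes_filters(record: dict, min_reviews: int = 0, min_rating: float = 0.0) -> bool:
    """Assumption: a null rating or review count counts as 0."""
    reviews = record.get("reviewCount") or 0
    rating = record.get("rating") or 0.0
    return reviews >= min_reviews and rating >= min_rating


print(extract_domain("https://www.pinnaclecrm.io/pricing"))  # pinnaclecrm.io
```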

### Tips for best results

1. **Check category slugs against the SourceForge URL before running.** Visit `https://sourceforge.net/software/your-slug/` in a browser. If the page returns results, the slug is valid. Invalid slugs return empty pages and produce zero results for the SourceForge source.

2. **Use `minReviews: 5` as a baseline filter.** Products with fewer than 5 reviews are often newly listed or inactive. Filtering them reduces noise without significantly reducing volume in established categories.

3. **Combine categories strategically to avoid redundancy.** Categories on SourceForge overlap significantly — `crm` and `sales-force-automation` share many products. Running them together with `deduplicateByDomain: true` catches products in both without doubling your cost.

4. **Run TrustRadius-only for enterprise B2B focus.** TrustRadius skews heavily toward enterprise software with large review counts and detailed scoring. If your target market is enterprise buyers, set `sources: ["trustradius"]` and `minReviews: 20` for a focused list.

5. **Pipe directly into contact enrichment.** The `domain` field is a ready-made key for [Website Contact Scraper](https://apify.com/ryanclinton/website-contact-scraper). Extract the domains from your dataset and run them in a batch to get email addresses and phone numbers for each vendor.

6. **Schedule weekly refreshes for fast-moving categories.** Categories like `ai-tools` or `marketing-automation` add new products frequently. A weekly scheduled run keeps your lead list current as new vendors appear in the directories.

7. **Use the summary record for run monitoring.** Every run ends with a `type: "summary"` record. If `totalCompaniesFound` drops significantly week-over-week, that signals a markup change or category rename worth investigating before the next run.
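
For the monitoring tip, a week-over-week check on the summary record could be sketched as follows. The function names are hypothetical and the 30% threshold is an arbitrary choice to tune for your own categories.

```python
from typing import Iterable, Optional


def find_summary(items: Iterable[dict]) -> Optional[dict]:
    """Return the record with type == "summary", if the dataset has one."""
    for item in items:
        if item.get("type") == "summary":
            return item
    return None


def dropped_significantly(previous_total: int, current_total: int,
                          threshold: float = 0.3) -> bool:
    """True when the total fell by more than `threshold` relative to the last run."""
    if previous_total <= 0:
        return False  # nothing to compare against
    return (previous_total - current_total) / previous_total > threshold


items = [{"productName": "PinnacleCRM Pro"}, {"type": "summary", "totalCompaniesFound": 90}]
summary = find_summary(items)
print(dropped_significantly(187, summary["totalCompaniesFound"]))  # True
```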

### Combine with other Apify actors

| Actor | How to combine |
|---|---|
| [Website Contact Scraper](https://apify.com/ryanclinton/website-contact-scraper) | Feed the `domain` or `website` field from each company record to extract emails, phone numbers, and contact pages from vendor websites |
| [Waterfall Contact Enrichment](https://apify.com/ryanclinton/waterfall-contact-enrichment) | Run a 10-step enrichment cascade on the vendor domain to find decision-maker emails and LinkedIn profiles |
| [Email Pattern Finder](https://apify.com/ryanclinton/email-pattern-finder) | Detect the email naming convention used by each vendor (e.g. `first.last@domain.com`) before building outbound sequences |
| [B2B Lead Qualifier](https://apify.com/ryanclinton/b2b-lead-qualifier) | Score each company 0–100 using rating, review count, pricing tier, and other signals to prioritize enrichment spend |
| [Website Tech Stack Detector](https://apify.com/ryanclinton/website-tech-stack-detector) | Detect 100+ web technologies on each vendor's website to qualify leads by tech profile or integration fit |
| [HubSpot Lead Pusher](https://apify.com/ryanclinton/hubspot-lead-pusher) | Push the structured company records directly into HubSpot as contacts or companies after enrichment |
| [Bulk Email Verifier](https://apify.com/ryanclinton/bulk-email-verifier) | Verify email addresses found via contact scraping before importing into your sending tool |

### Limitations

- **TrustRadius category filtering is approximate.** The actor discovers TrustRadius products via sitemaps that list all products, not filtered by category. The `sourceCategory` field reflects the category you searched for, not a TrustRadius taxonomy match. Products from adjacent segments may appear in results.
- **SourceForge badge extraction depends on CSS class naming patterns.** Badges are extracted using selectors like `[class*="badge"]` and `[class*="award"]`. If SourceForge changes its CSS class naming, badge data may be incomplete while other fields remain accurate.
- **Vendor websites are not always present.** Some directory listings do not include a vendor website link. In those cases `website` and `domain` will be `null`, and the record cannot be used for downstream website-based enrichment.
- **No JavaScript rendering.** The actor uses CheerioCrawler (HTTP-based), not a browser. Pages that require client-side JavaScript to render their content will return incomplete data. Both SourceForge and TrustRadius use server-rendered HTML so this does not currently affect results, but any future sources added to this actor that require browser execution would need a separate implementation.
- **TrustRadius sitemap coverage is 5 shards out of approximately 25.** The actor fetches sitemaps 1–5, covering thousands of product URLs. Products in higher-numbered shards are not discovered in a standard run. For complete TrustRadius coverage across all shards, contact us about a custom configuration.
- **Deduplication is run-scoped.** The `seenDomains` set is created fresh each run. If you run the actor twice against the same categories, the same companies can appear in both datasets. Use the `domain` field as a unique key in your downstream storage to handle cross-run deduplication.
- **No employee count, funding, or HQ data.** Neither SourceForge nor TrustRadius consistently exposes firmographic data in their listing HTML. Use [Company Deep Research](https://apify.com/ryanclinton/company-deep-research) or a downstream enrichment actor to add firmographic context.
- **Rating scale conversion is a linear approximation.** TrustRadius 10-point scores are divided by 2. This does not account for distribution differences between the two rating systems; a TrustRadius 8.6 becomes 4.3, but the populations rated by each platform differ.

### Integrations

- [Zapier](https://apify.com/integrations/zapier) — trigger a Zap when a run completes to push new software companies into a Google Sheet or CRM automatically
- [Make](https://apify.com/integrations/make) — build a multi-step scenario that scrapes companies, enriches contacts, and adds leads to your outbound sequence tool
- [Google Sheets](https://apify.com/integrations/google-sheets) — export the dataset directly to a sheet for manual review and prioritization before enrichment
- [Apify API](https://docs.apify.com/api/v2) — trigger runs programmatically from your sales or marketing automation platform and receive results via webhook
- [Webhooks](https://docs.apify.com/platform/integrations/webhooks) — post the completed dataset URL to a Slack channel or internal dashboard when a run finishes
- [LangChain / LlamaIndex](https://docs.apify.com/platform/integrations) — use scraped software company descriptions and category data as a knowledge base for AI-powered market research agents

### Troubleshooting

**Zero results despite providing a valid category.** The most common cause is a category slug that does not match the SourceForge URL structure. Verify by visiting `https://sourceforge.net/software/your-slug/` directly. If the page shows no products, try a more general slug (e.g. `crm` instead of `crm-software`). For TrustRadius, results depend on sitemap coverage — if the category has few matching products in the first five sitemaps, output will be low.

**All results have null website and domain fields.** Some SourceForge categories list products without a vendor website link in the listing card. This is more common in open-source or niche categories. The `profileUrl` still links to the directory listing and can be used as a secondary identifier for manual lookup.

**TrustRadius results are empty or very few.** TrustRadius products are discovered via sitemap, not via category-filtered listings. Lowering `minReviews` to 0 and `minRating` to 0 confirms whether any records can be found. The actor enqueues `targetCount * 2` product URLs to account for filtering, but the absolute maximum is bounded by what appears in the first five sitemap shards.

**Run completes faster than expected with fewer results than the limit.** This means the actor exhausted all available listing pages before reaching your `maxCompaniesPerCategory` limit. SourceForge categories vary in size — smaller niches may have fewer than 50 products total. Check the summary record's `companiesByCategory` field to see how many were found per category.

**Duplicate companies appearing across multiple runs.** Deduplication only operates within a single run. Across multiple runs, the same company can appear again. Use the `domain` field as a unique key in your downstream storage — a Google Sheets `VLOOKUP`, CRM deduplication rule, or a database unique constraint on `domain` will handle this cleanly.
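
As one way to apply the database option, the sketch below stores records in SQLite with `domain` as the primary key, so repeated runs only add companies that were not seen before. The table name, the stored columns, and the `store_new_companies` helper are illustrative choices. Records without a `domain` are skipped because they cannot be keyed this way.

```python
import sqlite3


def store_new_companies(conn: sqlite3.Connection, items: list) -> int:
    """Insert records keyed by domain; return how many rows were newly added."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS companies (domain TEXT PRIMARY KEY, source TEXT)"
    )
    before = conn.total_changes
    for item in items:
        domain = item.get("domain")
        if not domain:
            continue  # no domain, so no stable key for deduplication
        # INSERT OR IGNORE leaves the existing row untouched on a key conflict.
        conn.execute(
            "INSERT OR IGNORE INTO companies (domain, source) VALUES (?, ?)",
            (domain.lower(), item.get("source")),
        )
    conn.commit()
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")  # use a file path to persist between runs
first_run = [{"domain": "example.com", "source": "sourceforge"}]
second_run = [
    {"domain": "example.com", "source": "trustradius"},
    {"domain": "example.org", "source": "trustradius"},
]
print(store_new_companies(conn, first_run), store_new_companies(conn, second_run))
```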

### Responsible use

- This actor only accesses publicly available software directory listings on SourceForge and TrustRadius.
- Respect each platform's terms of service and `robots.txt` directives.
- Comply with GDPR, CAN-SPAM, and other applicable data protection laws when using scraped company data for outreach.
- Do not use extracted data to send unsolicited bulk email or for spam campaigns.
- For guidance on web scraping legality, see [Apify's guide](https://blog.apify.com/is-web-scraping-legal/).

### FAQ

**How many software companies can I scrape in one run?**
There is no hard cap on the total number of companies per run. The `maxCompaniesPerCategory` parameter (default 50, max 1000) controls per-category volume, and you can run as many categories as you like in a single run. Your practical limit is your Apify spending budget: at $0.05 per company, a $50 budget yields up to 1,000 companies.
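
The budget arithmetic can be sketched as follows. The $0.05 price is the figure quoted in this FAQ, so check the Actor's pricing page before relying on it. The worst-case estimate assumes every category reaches the limit on every source and that each returned company is billed once. Both function names are hypothetical.

```python
from decimal import Decimal

# Price per company as quoted in this FAQ (an assumption; verify before budgeting).
PRICE_PER_COMPANY_USD = Decimal("0.05")


def max_companies_for_budget(budget_usd) -> int:
    """Largest number of companies a budget covers at the assumed price."""
    return int(Decimal(str(budget_usd)) // PRICE_PER_COMPANY_USD)


def worst_case_cost(num_categories: int, num_sources: int, max_per_category: int) -> Decimal:
    """Upper bound on cost if every category hits the limit on every source."""
    return num_categories * num_sources * max_per_category * PRICE_PER_COMPANY_USD


print(max_companies_for_budget(50))
print(worst_case_cost(num_categories=2, num_sources=2, max_per_category=50))
```

Deduplication and the rating and review filters can only reduce the number of records returned, so the real cost should not exceed this bound under the stated assumptions.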

**Does this actor work for any software category?**
It works for any category that has a valid slug on SourceForge (`https://sourceforge.net/software/{slug}/`). Common slugs include `crm`, `project-management`, `email-marketing`, `accounting`, `help-desk`, `marketing-automation`, `hr-software`, `erp`, `business-intelligence`, and `video-conferencing`. TrustRadius coverage depends on sitemap inclusion and is not category-filtered.

**How accurate is the rating data from this scraper?**
Ratings are taken directly from the directory listings and reflect each platform's own aggregated scores. TrustRadius 10-point scores are converted to a 5-point scale by dividing by 2. The accuracy of the underlying ratings is determined by each directory's own review processes — the actor extracts them without modification beyond scale normalization.
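
The scale normalization described above amounts to the following. This illustrates the arithmetic only and is not the actor's source code. The source names are taken from the `sources` input options, and the function name is hypothetical.

```python
def normalize_rating(raw_score: float, source: str) -> float:
    """Convert a directory score to the 5-point scale used in the output."""
    if source == "trustradius":
        # TrustRadius scores are on a 10-point scale, so halve them.
        return round(raw_score / 2, 2)
    # SourceForge ratings are already on a 5-point scale.
    return raw_score


print(normalize_rating(8.6, "trustradius"))
print(normalize_rating(4.5, "sourceforge"))
```

To recover the original TrustRadius score from an output record, multiply the normalized rating by 2.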

**What is the difference between SourceForge and TrustRadius results?**
SourceForge has 100k+ products including many open-source and SMB-focused tools, with paginated category listings, explicit pricing, and badge data. TrustRadius focuses on enterprise B2B software with in-depth review scoring. Using both sources together gives broader category coverage across company sizes and market segments.

**How is this different from scraping G2, Capterra, or GetApp?**
G2, Capterra, and GetApp aggressively block HTTP scrapers with Cloudflare's JS challenge — extracting data from them requires a full browser with anti-detection measures, which is slower and more expensive. SourceForge and TrustRadius serve their listing pages to datacenter IPs without blocking, making this actor fast, reliable, and proxy-free.

**Can I scrape software company leads from multiple categories at once?**
Yes. Pass multiple slugs in the `categories` array: `["crm", "project-management", "email-marketing"]`. The actor processes all categories in parallel using a shared crawler queue. Deduplication operates across the entire run, so a company appearing in two categories is only returned once when `deduplicateByDomain: true`.

**How long does a typical software directory scraping run take?**
A single category at `maxCompaniesPerCategory: 50` from both sources typically completes in 3–6 minutes. Five categories at the same limit take 10–20 minutes. TrustRadius runs slightly longer because each product requires an individual page fetch after sitemap parsing.

**Can I filter out free and open-source software from the results?**
There is no dedicated filter for this, but you can filter the output dataset by the `pricingTier` field. Records where `pricingTier` is `"Free"` or `"Open Source"` can be excluded in post-processing in Excel, Google Sheets, or your pipeline code.
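
In pipeline code, that post-processing step might look like the sketch below. It matches the two `pricingTier` values named above case-insensitively, which is a defensive assumption about the data. Records with a missing or null `pricingTier` are kept, since an unknown tier does not imply a free product.

```python
EXCLUDED_TIERS = {"free", "open source"}


def exclude_free_and_open_source(items: list) -> list:
    """Drop records whose pricingTier is "Free" or "Open Source"."""
    kept = []
    for item in items:
        tier = (item.get("pricingTier") or "").strip().lower()
        if tier not in EXCLUDED_TIERS:
            kept.append(item)
    return kept


# Illustrative records, not real output.
sample = [
    {"domain": "a.example", "pricingTier": "Free"},
    {"domain": "b.example", "pricingTier": "Paid"},
    {"domain": "c.example", "pricingTier": None},
]
print([item["domain"] for item in exclude_free_and_open_source(sample)])
```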

**Is it legal to scrape SourceForge and TrustRadius?**
Scraping publicly available data from software directories is generally considered lawful in most jurisdictions. Both SourceForge and TrustRadius publish their listings publicly without authentication requirements. Always respect the platforms' terms of service and use the data responsibly. See [Apify's web scraping legality guide](https://blog.apify.com/is-web-scraping-legal/) for a detailed overview.

**Can I schedule this actor to run automatically every week?**
Yes. Use Apify's built-in scheduler to run on any cron schedule — daily, weekly, or monthly. Weekly runs against fast-moving categories like `ai-tools` or `marketing-automation` keep your lead list current as new products are added to the directories.

**What happens if the same company appears on both SourceForge and TrustRadius?**
With `deduplicateByDomain: true` (the default), the first occurrence is kept and the duplicate is skipped. The `source` field on the kept record shows which directory found it first. With `deduplicateByDomain: false`, both records are returned so you can compare ratings and review counts across sources.
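
With `deduplicateByDomain: false`, a small grouping step makes the cross-source comparison easier. The sketch below groups records by `domain` and keeps the rating and review count per `source`. The field names `rating` and `reviewCount` are assumptions about the output schema, so rename them to match your dataset.

```python
from collections import defaultdict


def ratings_by_domain(records: list) -> dict:
    """Map domain -> {source: (rating, review_count)} for side-by-side comparison."""
    grouped = defaultdict(dict)
    for record in records:
        domain = record.get("domain")
        if not domain:
            continue  # cannot match records across sources without a domain
        # "rating" and "reviewCount" are assumed field names.
        grouped[domain.lower()][record.get("source")] = (
            record.get("rating"),
            record.get("reviewCount"),
        )
    return dict(grouped)


def listed_on_both(grouped: dict) -> dict:
    """Keep only domains that appear under more than one source."""
    return {domain: by_source for domain, by_source in grouped.items() if len(by_source) > 1}
```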

**Can I connect the output directly to HubSpot or Salesforce?**
Yes. Use [HubSpot Lead Pusher](https://apify.com/ryanclinton/hubspot-lead-pusher) to push company records into HubSpot, or use Apify's Zapier or Make integrations to route data to Salesforce, Pipedrive, or any other CRM. The `domain` field is a reliable unique key for CRM deduplication.

### Help us improve

If you encounter issues, you can help us debug faster by enabling run sharing in your Apify account:

1. Go to [Account Settings > Privacy](https://console.apify.com/account/privacy)
2. Enable **Share runs with public Actor creators**

This lets us see your run details when something goes wrong, so we can fix issues faster. Your data is only visible to the actor developer, not publicly.

### Support

Found a bug or have a feature request? Open an issue in the [Issues tab](https://console.apify.com/actors/g2-company-scraper/issues) on this actor's page. For custom solutions or enterprise integrations, reach out through the Apify platform.

# Actor input Schema

## `categories` (type: `array`):

Category slugs to scrape. Use lowercase, hyphenated slugs matching the directory URL. Examples: crm, email-marketing, project-management, accounting, help-desk, marketing-automation.

## `sources` (type: `array`):

Which software directories to scrape. Options: sourceforge, trustradius. SourceForge has 100k+ products with ratings and pricing. TrustRadius focuses on B2B software with in-depth review scores.

## `maxCompaniesPerCategory` (type: `integer`):

Maximum number of companies to collect per category per source (up to 1000). Set to 0 for no limit. Default: 50.

## `minReviews` (type: `integer`):

Only include products with at least this many reviews. Set to 0 to include all. Useful for filtering out newly listed software with no reviews.

## `minRating` (type: `number`):

Only include products with an average rating of at least this value (1.0–5.0 scale). Set to 0 to include all ratings.

## `deduplicateByDomain` (type: `boolean`):

When the same software vendor appears across multiple categories or sources, keep only the first occurrence. Prevents duplicate company rows for enrichment pipelines.

## `proxyConfiguration` (type: `object`):

Optional proxy configuration. SourceForge and TrustRadius work without proxies — datacenter IPs are not blocked. Leave empty to use Apify's default connection.

## Actor input object example

```json
{
  "categories": [
    "crm",
    "project-management"
  ],
  "sources": [
    "sourceforge",
    "trustradius"
  ],
  "maxCompaniesPerCategory": 50,
  "minReviews": 5,
  "minRating": 3.5,
  "deduplicateByDomain": true
}
```
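
Before starting a paid run, it can help to check an input object against the limits published in the input schema. The sketch below is a client-side convenience and not something the actor ships. The bounds are copied from the OpenAPI specification in this document, the allowed source names come from the `sources` description, and the `validate_input` helper is hypothetical.

```python
ALLOWED_SOURCES = {"sourceforge", "trustradius"}
INTEGER_BOUNDS = {"maxCompaniesPerCategory": (0, 1000), "minReviews": (0, 100000)}


def validate_input(actor_input: dict) -> list:
    """Return a list of human-readable problems; an empty list means the input looks valid."""
    errors = []
    categories = actor_input.get("categories")
    if not isinstance(categories, list) or not all(isinstance(c, str) for c in categories):
        errors.append("categories: required, must be a list of slug strings")
    for source in actor_input.get("sources", []):
        if source not in ALLOWED_SOURCES:
            errors.append(f"sources: unknown source {source!r}")
    for field, (low, high) in INTEGER_BOUNDS.items():
        value = actor_input.get(field)
        if value is None:
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not low <= value <= high:
            errors.append(f"{field}: must be an integer between {low} and {high}")
    rating = actor_input.get("minRating")
    if rating is not None:
        if isinstance(rating, bool) or not isinstance(rating, (int, float)) or not 0 <= rating <= 5:
            errors.append("minRating: must be a number between 0 and 5")
    return errors


print(validate_input({"categories": ["crm"], "minRating": 3.5}))
```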

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "categories": [
        "crm",
        "project-management"
    ],
    "sources": [
        "sourceforge",
        "trustradius"
    ],
    "maxCompaniesPerCategory": 50,
    "minReviews": 5,
    "minRating": 3.5
};

// Run the Actor and wait for it to finish
const run = await client.actor("ryanclinton/g2-company-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs
```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "categories": [
        "crm",
        "project-management",
    ],
    "sources": [
        "sourceforge",
        "trustradius",
    ],
    "maxCompaniesPerCategory": 50,
    "minReviews": 5,
    "minRating": 3.5,
}

# Run the Actor and wait for it to finish
run = client.actor("ryanclinton/g2-company-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start
```

## CLI example

```bash
echo '{
  "categories": [
    "crm",
    "project-management"
  ],
  "sources": [
    "sourceforge",
    "trustradius"
  ],
  "maxCompaniesPerCategory": 50,
  "minReviews": 5,
  "minRating": 3.5
}' |
apify call ryanclinton/g2-company-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=ryanclinton/g2-company-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "SourceForge & TrustRadius — Software Vendor Leads",
        "description": "Scrapes SourceForge and TrustRadius for software company leads by category. Returns vendor name, website, rating, review count, pricing tier, and category tags. Filter by rating or review count. $0.05 per company.",
        "version": "1.0",
        "x-build-id": "eeuE1TfJOLPnIxLoI"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/ryanclinton~g2-company-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-ryanclinton-g2-company-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/ryanclinton~g2-company-scraper/runs": {
            "post": {
                "operationId": "runs-sync-ryanclinton-g2-company-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/ryanclinton~g2-company-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-ryanclinton-g2-company-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "categories"
                ],
                "properties": {
                    "categories": {
                        "title": "Software Categories",
                        "type": "array",
                        "description": "Category slugs to scrape. Use lowercase, hyphenated slugs matching the directory URL. Examples: crm, email-marketing, project-management, accounting, help-desk, marketing-automation.",
                        "default": [
                            "crm"
                        ],
                        "items": {
                            "type": "string"
                        }
                    },
                    "sources": {
                        "title": "Data Sources",
                        "type": "array",
                        "description": "Which software directories to scrape. Options: sourceforge, trustradius. SourceForge has 100k+ products with ratings and pricing. TrustRadius focuses on B2B software with in-depth review scores.",
                        "default": [
                            "sourceforge",
                            "trustradius"
                        ],
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxCompaniesPerCategory": {
                        "title": "Max Companies Per Category",
                        "minimum": 0,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Maximum number of companies to collect per category per source. Set to 0 for no limit. Default: 50.",
                        "default": 50
                    },
                    "minReviews": {
                        "title": "Minimum Review Count",
                        "minimum": 0,
                        "maximum": 100000,
                        "type": "integer",
                        "description": "Only include products with at least this many reviews. Set to 0 to include all. Useful for filtering out newly listed software with no reviews.",
                        "default": 0
                    },
                    "minRating": {
                        "title": "Minimum Rating",
                        "minimum": 0,
                        "maximum": 5,
                        "type": "number",
                        "description": "Only include products with an average rating of at least this value (1.0–5.0 scale). Set to 0 to include all ratings.",
                        "default": 0
                    },
                    "deduplicateByDomain": {
                        "title": "Deduplicate by Domain",
                        "type": "boolean",
                        "description": "When the same software vendor appears across multiple categories or sources, keep only the first occurrence. Prevents duplicate company rows for enrichment pipelines.",
                        "default": true
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Optional proxy configuration. SourceForge and TrustRadius work without proxies — datacenter IPs are not blocked. Leave empty to use Apify's default connection."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
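
If you prefer not to install a client library, the `run-sync-get-dataset-items` endpoint from the specification above can be called with the Python standard library. This is a minimal sketch: the helper names are hypothetical, the token is passed as the `token` query parameter as the specification describes, and the response is assumed to be a JSON array of dataset items.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "ryanclinton~g2-company-scraper"


def build_sync_request(token: str, actor_input: dict) -> Request:
    """Build the POST request for run-sync-get-dataset-items without sending it."""
    query = urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    return Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_fetch(token: str, actor_input: dict) -> list:
    """Send the request and parse the response body as JSON (assumed to be a list of items)."""
    with urlopen(build_sync_request(token, actor_input)) as response:
        return json.load(response)
```

Calling `run_and_fetch("<YOUR_API_TOKEN>", {"categories": ["crm"]})` starts a billable run and blocks until it finishes, so for long runs the asynchronous `/runs` endpoint may be a better fit.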
