Product Hunt Scraper — Comments, Products & User Profiles

Pricing

from $5.00 / 1,000 products

Scrape Product Hunt comments, product details, and user profiles at scale. Extract reviews, upvotes, maker badges, product descriptions, media, launch rankings, and commenter social links (Twitter, LinkedIn, GitHub). Supports daily leaderboard URLs and individual product pages.

Rating: 0.0 (0)
Developer: VulnV
Maintained by: Community
Actor stats: 1 bookmarked, 2 total users, 1 monthly active user
Last modified: 2 days ago

Scrape Product Hunt comments, product data, and user profiles at scale. Extract reviews, upvotes, maker badges, product descriptions, screenshots, launch rankings, categories, and commenter social media links — all in structured JSON.

Works with individual product URLs, post URLs, and daily leaderboard URLs (automatically expanded into ranked products).

Key features

  • Comments & reviews — full text (plain + HTML), vote counts, maker/hunter badges, timestamps, sticky/pinned flags
  • Product profiles — name, tagline, description, logo, screenshots, videos, website, social links, categories, awards, launch rank and score, hunter info
  • User profiles — name, headline, bio, avatar, Twitter, LinkedIn, GitHub, personal website, follower counts, verified status
  • Leaderboard support — pass a /leaderboard/daily/YYYY/M/D URL to scrape the top N products from any day
  • Three structured datasets with row-level cross-references between comments, products, and users
  • Toggleable outputs — enable/disable comments, products, or user profile scraping independently
  • Concurrent scraping — processes multiple products, comment pages, and user profiles in parallel

Use cases

  • Market research — analyze Product Hunt launch performance, comment sentiment, and community reception
  • Competitor analysis — compare products by upvotes, reviews, ratings, categories, and launch rankings
  • Lead generation — extract commenter profiles with Twitter, LinkedIn, GitHub, and website links
  • Trend tracking — monitor daily leaderboards to identify trending products and categories
  • Content research — study what products and launches generate the most engagement

What data you get

| Dataset | Contents |
|---|---|
| Comments (default) | Comment text, HTML body, vote count, badges, timestamps, author reference, product reference |
| Products | Name, slug, tagline, description, logo, media, website, social URLs, categories, awards, launch details, review stats |
| Users | Name, username, headline, avatar, profile URL, Twitter, LinkedIn, GitHub, website, follower/following counts, verified status |

Input parameters

| Field | Type | Default | Description |
|---|---|---|---|
| start_urls | array | required | Product Hunt URLs to scrape; accepts /products/…, /posts/…, or /leaderboard/daily/YYYY/M/D |
| max_products | integer | 1 | Maximum products to scrape (top N for leaderboards, cap for direct URLs) |
| max_comments | integer | 100 | Maximum comments to extract per product |
| scrape_comments | boolean | true | Save comments to the default dataset |
| scrape_products | boolean | true | Save product info to the products dataset |
| scrape_users | boolean | true | Fetch and save commenter profiles to the users dataset |

Example: scrape top 5 products from the daily leaderboard

{
  "start_urls": [
    { "url": "https://www.producthunt.com/leaderboard/daily/2026/3/23" }
  ],
  "max_products": 5,
  "max_comments": 50,
  "scrape_comments": true,
  "scrape_products": true,
  "scrape_users": true
}

Example: scrape a single product page

{
  "start_urls": [
    { "url": "https://www.producthunt.com/products/tobira-ai" }
  ],
  "max_products": 1,
  "max_comments": 200
}

Example: scrape only product metadata (no comments or users)

{
  "start_urls": [
    { "url": "https://www.producthunt.com/leaderboard/daily/2026/3/23" }
  ],
  "max_products": 10,
  "scrape_comments": false,
  "scrape_products": true,
  "scrape_users": false
}
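
The three inputs above can also be assembled programmatically. The following is a minimal sketch, not part of the actor itself: the helper names `leaderboard_url` and `build_run_input` are hypothetical, and the defaults simply mirror the parameter table. It assumes leaderboard URLs use unpadded month and day values, as in the `/2026/3/23` example.

```python
from datetime import date

BASE_URL = "https://www.producthunt.com"


def leaderboard_url(day: date) -> str:
    """Build a daily leaderboard URL (month and day are not zero-padded)."""
    return f"{BASE_URL}/leaderboard/daily/{day.year}/{day.month}/{day.day}"


def build_run_input(urls, max_products=1, max_comments=100,
                    scrape_comments=True, scrape_products=True,
                    scrape_users=True):
    """Assemble an input dict using the field names from the parameter table.

    Hypothetical helper: the defaults copy the documented ones, and the
    checks below are illustrative rather than the actor's own validation.
    """
    if not urls:
        raise ValueError("start_urls is required and must not be empty")
    if max_products < 1 or max_comments < 0:
        raise ValueError("max_products must be >= 1 and max_comments >= 0")
    return {
        "start_urls": [{"url": u} for u in urls],
        "max_products": max_products,
        "max_comments": max_comments,
        "scrape_comments": scrape_comments,
        "scrape_products": scrape_products,
        "scrape_users": scrape_users,
    }


# Reproduces the "top 5 products" example for 23 March 2026.
run_input = build_run_input(
    [leaderboard_url(date(2026, 3, 23))],
    max_products=5,
    max_comments=50,
)
```

The resulting dict has the same shape as the JSON examples, so it should be usable as the run input when starting the actor from a client library.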

Output format

Results are split into three datasets. The Apify Console Output tab links directly to each dataset with pre-configured table views.

Comments dataset (default)

{
  "type": "comment",
  "id": "5217461",
  "body": "Hey PH! ...",
  "body_html": "<p>Hey PH!</p>...",
  "vote_count": 21,
  "created_at": "2026-03-17T06:24:57-07:00",
  "is_sticky": true,
  "is_pinned": false,
  "badges": ["maker"],
  "award": null,
  "product_name": "Tobira.ai",
  "product_tagline": "A network where AI agents find deals for their humans",
  "product_url": "https://www.producthunt.com/posts/tobira-ai",
  "user": { "id": "4231147", "name": "Vlad Shipilov", "username": "vlad_shipilov" },
  "user_id": "4231147",
  "username": "vlad_shipilov",
  "product_id": "1183905",
  "product_slug": "tobira-ai"
}

Products dataset

{
  "id": "1183905",
  "name": "Tobira.ai",
  "slug": "tobira-ai",
  "tagline": "A network where AI agents find deals for their humans",
  "description": "...",
  "url": "https://www.producthunt.com/products/tobira-ai",
  "logo_url": "https://ph-files.imgix.net/...",
  "website_url": "https://tobira.ai",
  "twitter_url": "https://twitter.com/shipilov_vlad",
  "followers_count": 582,
  "reviews_count": 0,
  "reviews_rating": 0,
  "categories": [{ "name": "AI Agents", "slug": "ai-agents", "path": "/categories/ai-agents" }],
  "awards": [{ "position": 1, "period": "daily", "date": "2026-03-23" }],
  "media": [{ "type": "image", "url": "https://ph-files.imgix.net/..." }],
  "launch": {
    "id": "1100794",
    "slug": "tobira-ai",
    "name": "Tobira.ai",
    "daily_rank": "1",
    "launch_day_score": 447,
    "featured": true,
    "featured_at": "2026-03-23T00:01:00-07:00",
    "primary_link": "https://tobira.ai",
    "hunter": { "name": "fmerian", "username": "fmerian" }
  }
}

Users dataset

{
  "id": "4231147",
  "name": "Vlad Shipilov",
  "username": "vlad_shipilov",
  "headline": "Founder Tobira.ai and Revoly.ai",
  "profile_url": "https://www.producthunt.com/@vlad_shipilov",
  "avatar_url": "https://ph-avatars.imgix.net/...",
  "twitter_username": "VladShipilov",
  "website_url": null,
  "linkedin_url": null,
  "github_username": null,
  "followers_count": 134,
  "is_verified": true,
  "social_links": [{ "type": "twitter", "url": "https://twitter.com/VladShipilov" }]
}
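
Because comment rows carry `user_id` and user rows carry `id`, the two datasets can be joined after download. The sketch below is one way to do that with plain dicts; the function name `attach_profiles` and the selection of output columns are illustrative choices, and the sample rows are trimmed copies of the records shown above plus one made-up comment whose author has no saved profile.

```python
def attach_profiles(comments, users):
    """Left-join comment rows to user rows on comments.user_id == users.id."""
    users_by_id = {user["id"]: user for user in users}
    joined = []
    for comment in comments:
        # profile is None when the commenter has no row in the users dataset,
        # e.g. because scrape_users was disabled for the run.
        profile = users_by_id.get(comment["user_id"])
        joined.append({
            "comment_id": comment["id"],
            "product_slug": comment["product_slug"],
            "vote_count": comment["vote_count"],
            "username": comment["username"],
            "twitter_username": profile["twitter_username"] if profile else None,
            "followers_count": profile["followers_count"] if profile else None,
        })
    return joined


comments = [
    {"id": "5217461", "user_id": "4231147", "username": "vlad_shipilov",
     "product_slug": "tobira-ai", "vote_count": 21},
    # Hypothetical second comment with no matching user row.
    {"id": "0000001", "user_id": "0000002", "username": "someone_else",
     "product_slug": "tobira-ai", "vote_count": 3},
]
users = [
    {"id": "4231147", "username": "vlad_shipilov",
     "twitter_username": "VladShipilov", "followers_count": 134},
]

rows = attach_profiles(comments, users)
```

A left join keeps every comment even when the profile is missing, which matters for runs where user scraping was turned off.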

Cross-dataset references

Every saved record includes a storage object with row-level pointers for joining data across datasets:

| Dataset | storage fields |
|---|---|
| Comments | storage.item (this row), storage.product (parent product), storage.user (commenter profile) |
| Products | storage.item (this row) |
| Users | storage.item (this row) |

Each pointer contains dataset_id, dataset_name, item_offset, api_url, and console_url for direct API access.

A DATASET_LINKS manifest is also written to the default key-value store with dataset IDs and console URLs for all three datasets.
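
A pointer can be turned into a single-row request against the Apify dataset items endpoint. This is a hedged sketch: it assumes `item_offset` is a zero-based row index that can be passed as the endpoint's `offset` parameter, the helper name `pointer_to_items_url` is hypothetical, and the dataset ID `abc123` is a placeholder. When the pointer already carries an `api_url`, that value is used as-is.

```python
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"


def pointer_to_items_url(pointer: dict) -> str:
    """Return a URL that fetches the single row a storage pointer refers to."""
    # Prefer the ready-made URL stored in the pointer, if there is one.
    if pointer.get("api_url"):
        return pointer["api_url"]
    # Assumption: item_offset is a zero-based index into the dataset.
    query = urlencode({
        "offset": pointer["item_offset"],
        "limit": 1,
        "format": "json",
    })
    return f"{API_BASE}/datasets/{pointer['dataset_id']}/items?{query}"


# Placeholder pointer shaped like the fields listed above.
product_pointer = {"dataset_id": "abc123", "dataset_name": "products", "item_offset": 7}
url = pointer_to_items_url(product_pointer)
```

Fetching that URL for private datasets would additionally require an API token, which is omitted here.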

How it works

  1. Leaderboard expansion — leaderboard URLs are fetched via Zyte API's browser rendering and parsed for ranked product links
  2. Apollo SSR data extraction — product pages are rendered in a headless browser; the scraper parses window[Symbol.for("ApolloSSRDataTransport")] for structured data from Product Hunt's Next.js/Apollo SSR layer
  3. GraphQL enrichment — a persisted GraphQL query (ProductsPageLayout), issued in parallel, fetches additional fields such as description, media, categories, awards, and external links
  4. Comment pagination — pages beyond the first are fetched concurrently in batches of 5
  5. User profile scraping — commenter profiles are fetched in batches of 10 and deduplicated across all products in the run
  6. Product concurrency — up to 5 products are scraped in parallel
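
The batching and deduplication described in steps 4 and 5 can be illustrated with `asyncio`. This is a sketch of the general pattern under stated assumptions, not the actor's source code: `run_in_batches`, `unique_usernames`, and `fetch_profile` are hypothetical names, and `fetch_profile` is a stand-in that performs no network request.

```python
import asyncio


async def run_in_batches(items, batch_size, worker):
    """Process items in consecutive batches; each batch runs concurrently.

    Results come back in the same order as the input items.
    """
    results = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        results.extend(await asyncio.gather(*(worker(item) for item in batch)))
    return results


def unique_usernames(comments):
    """Deduplicate commenter usernames while keeping first-seen order."""
    seen = set()
    ordered = []
    for comment in comments:
        name = comment["username"]
        if name not in seen:
            seen.add(name)
            ordered.append(name)
    return ordered


async def fetch_profile(username):
    """Stand-in for a real profile request (no network access here)."""
    await asyncio.sleep(0)
    return {"username": username}


# Comments gathered from several products; "a" commented twice.
comments = [{"username": "a"}, {"username": "b"}, {"username": "a"}]
profiles = asyncio.run(
    run_in_batches(unique_usernames(comments), batch_size=10, worker=fetch_profile)
)
```

Fixed-size batches cap the number of in-flight requests at the batch size, at the cost of waiting for the slowest request in each batch before the next batch starts.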

Pricing

This actor uses Apify's pay-per-event model. You pay only for the records that are saved:

| Event | Triggered when |
|---|---|
| fetch_product | A product record is saved |
| fetch_comment | A comment record is saved |
| fetch_user | A user profile is saved |

Disable any dataset type (e.g. scrape_users: false) to skip both the scraping and the charge.
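
An upper bound on the charge for a run can be estimated from the input limits. The sketch below is illustrative only: the function names are hypothetical, the bound assumes at most one saved user profile per saved comment, and the unit prices are placeholders. Only the product price is derived from the listing ("from $5.00 / 1,000 products"); the comment and user prices are invented for the example and should be replaced with the actual event prices.

```python
from decimal import Decimal


def max_billable_events(max_products, max_comments, scrape_comments=True,
                        scrape_products=True, scrape_users=True):
    """Upper bound on saved records per event type for one run."""
    comment_cap = max_products * max_comments
    return {
        "fetch_product": max_products if scrape_products else 0,
        "fetch_comment": comment_cap if scrape_comments else 0,
        # Assumption: commenters are deduplicated, so there can be no more
        # user profiles than comments.
        "fetch_user": comment_cap if scrape_users else 0,
    }


def estimate_cost(events, unit_prices):
    """Sum count * unit price over all event types, using exact decimals."""
    return sum(
        (Decimal(count) * unit_prices[name] for name, count in events.items()),
        Decimal("0"),
    )


# Placeholder prices in USD per event; only fetch_product reflects the listing.
unit_prices = {
    "fetch_product": Decimal("0.005"),
    "fetch_comment": Decimal("0.001"),  # hypothetical
    "fetch_user": Decimal("0.001"),     # hypothetical
}

events = max_billable_events(max_products=5, max_comments=50)
worst_case = estimate_cost(events, unit_prices)
```

Actual charges will usually be lower, since products often have fewer comments than `max_comments` and the same user may comment more than once.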

Technical details

  • Runtime: Python 3.13, Apify SDK 3.3+
  • Dependencies: httpx, beautifulsoup4, lxml
  • External service: Zyte API for browser-rendered HTML and HTTP proxy requests
  • Memory: 512 MB