Lemmy Scraper: Posts, Comments, Communities & Users
Pricing
$1.00 / 1,000 result items
Scrape any Lemmy instance (lemmy.world, lemmy.ml, beehaw.org and other Lemmyverse nodes) via the official /api/v3/* REST API. Posts with upvote/downvote counts, comment trees, communities with subscriber counts, user profiles, full-text search. No auth, no proxies. Pay per result.
Rating: 0.0 (0 reviews)
Developer: Perconey (community-maintained)
Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 5 days ago
What does Lemmy Scraper do?
Lemmy Scraper pulls structured data from any Lemmy instance via the official /api/v3/* REST API: posts with upvote AND downvote counts (unlike Reddit or HN, Lemmy exposes both), full comment trees, communities with subscriber and activity counts, user profiles, and full-text search. The actor calls the documented public API directly: no browser, no proxies, no cookies, no anti-bot workarounds. One actor works with every Lemmy-protocol server in the Lemmyverse: lemmy.world, lemmy.ml, beehaw.org, sh.itjust.works, lemm.ee, feddit.org, programming.dev, and hundreds more.
Try it instantly: pick getPosts, leave instance as https://lemmy.world, and click Start. You get the current Hot feed (30 posts) with creator, community, score, upvote/downvote counts, and comment count in under 5 seconds, for $0.03.
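For reference, getPosts corresponds to Lemmy's public GET /api/v3/post/list endpoint. The sketch below is not the actor's source: it assumes Lemmy 0.19-style v3 responses (a posts array of PostView objects with post, creator, community and counts), uses only the Python standard library, and the helper names and User-Agent value are hypothetical placeholders. fetch_posts needs network access; the other two functions run offline.

```python
import json
import urllib.parse
import urllib.request


def post_list_url(instance: str, sort: str = "Hot", limit: int = 30, page: int = 1) -> str:
    """Build a GET /api/v3/post/list URL; public feeds need no auth."""
    query = urllib.parse.urlencode({"sort": sort, "limit": limit, "page": page})
    return f"{instance.rstrip('/')}/api/v3/post/list?{query}"


def flatten_post_view(view: dict, instance: str) -> dict:
    """Flatten one PostView ({post, creator, community, counts}) into a flat record."""
    base = instance.rstrip("/")
    post, counts = view["post"], view["counts"]
    return {
        "_type": "post",
        "_instance": base,
        "id": post["id"],
        "name": post["name"],
        "url": post.get("url"),
        "score": counts["score"],
        "upvotes": counts["upvotes"],
        "downvotes": counts["downvotes"],
        "comments": counts["comments"],
        "post_url": f"{base}/post/{post['id']}",
        "creator": {k: view["creator"][k] for k in ("id", "name", "actor_id")},
        "community": {k: view["community"][k] for k in ("id", "name", "title", "actor_id")},
    }


def fetch_posts(instance: str = "https://lemmy.world", sort: str = "Hot", limit: int = 30) -> list:
    """Fetch one page of the feed and flatten it (requires network access)."""
    request = urllib.request.Request(
        post_list_url(instance, sort=sort, limit=limit),
        headers={"User-Agent": "lemmy-sketch/0.1 (placeholder)"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        data = json.load(response)
    return [flatten_post_view(view, instance) for view in data["posts"]]
```

The flattened record deliberately mirrors a subset of the output example in the Output section, so records from this sketch and from the actor can be compared side by side.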
Why use Lemmy Scraper?
- Federation researchers: Compare community dynamics across instances. The same technology community name exists on multiple instances with different culture, content, and activity.
- Trend analysts: Both upvotes AND downvotes are exposed. Compute controversy ratios that platforms like HN and Reddit hide.
- Community managers: Track mentions of your project across federated Lemmy. Schedule daily searchPosts runs with your product name.
- OSS maintainers: Lemmy's user base is heavily developer-skewed. Running getCommunityPosts on programming@programming.dev, linuxmemes@lemmy.world, and selfhosted@lemmy.world gives a cheap signal of which dev tools are getting attention.
- Academic researchers: Lemmy is a federated, decentralized social network built on ActivityPub, which makes it rich material for network-science studies. Per-instance counts from getSiteInfo (users, posts, communities, activity, federated instance count) give you node-level data for mapping the federation.
- Reddit refugees / content migrators: Identify Lemmy communities that mirror the subreddits you care about and move your subscription pattern to the Lemmyverse.
How to use Lemmy Scraper
- Open the Input tab.
- Pick an action from the dropdown. getPosts is the simplest starting point.
- Set instance (default https://lemmy.world). To scrape a different server, paste its URL.
- For community, user, post, and search actions, fill queries (one entry per line). For getPosts, listCommunities, and getSiteInfo, leave queries empty.
- Tune sort (Hot / New / Active / TopWeek / TopAll / etc.), maxItems, and searchType (searchPosts only).
- Toggle includeComments for getPostDetail or includeSubmissions for getUserProfile to fetch the nested data.
- Click Start.
Query format by action
| Action | Query format |
|---|---|
| getSiteInfo | leave empty (uses the instance field) |
| getPosts | leave empty |
| listCommunities | leave empty |
| getCommunityDetail | technology or technology@lemmy.world or https://lemmy.world/c/technology |
| getCommunityPosts | same as getCommunityDetail |
| getUserProfile | nutomic or nutomic@lemmy.ml (auto-routes to user's home instance) |
| getPostDetail | post id (e.g. 1) or full URL (e.g. https://lemmy.world/post/1234) |
| searchPosts | free-text search query |
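Every query format in the table reduces to an (instance, name) or (instance, id) pair before any API call is made. A minimal sketch of that normalization, assuming bare name@host queries should be reached over https; parse_community_or_user and parse_post_ref are hypothetical names, not the actor's internals.

```python
import re
from urllib.parse import urlparse

DEFAULT_INSTANCE = "https://lemmy.world"


def parse_community_or_user(query: str, default_instance: str = DEFAULT_INSTANCE) -> tuple[str, str]:
    """Return (instance_url, name) for 'name', 'name@host', or a /c/<name> or /u/<name> URL."""
    query = query.strip()
    if query.startswith(("http://", "https://")):
        parsed = urlparse(query)
        parts = [p for p in parsed.path.split("/") if p]
        if len(parts) >= 2 and parts[0] in ("c", "u"):
            return f"{parsed.scheme}://{parsed.netloc}", parts[1]
        raise ValueError(f"not a /c/ or /u/ URL: {query}")
    if "@" in query:
        # name@host: route to the host, which holds the canonical record.
        name, host = query.split("@", 1)
        return f"https://{host}", name
    return default_instance.rstrip("/"), query


def parse_post_ref(query: str, default_instance: str = DEFAULT_INSTANCE) -> tuple[str, int]:
    """Return (instance_url, post_id) for a bare id or a https://host/post/<id> URL."""
    query = query.strip()
    if query.isdigit():
        # A bare id is only meaningful on the default instance.
        return default_instance.rstrip("/"), int(query)
    parsed = urlparse(query)
    match = re.fullmatch(r"/post/(\d+)/?", parsed.path)
    if not parsed.netloc or match is None:
        raise ValueError(f"not a post id or /post/<id> URL: {query}")
    return f"{parsed.scheme}://{parsed.netloc}", int(match.group(1))
```

Keeping the instance in the returned pair is what makes the routing safe: a post id is only meaningful together with the server it came from.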
Input
| Field | Required | Description |
|---|---|---|
| action | yes | Which API call to make. Eight options. |
| instance | yes | Lemmy instance URL. Default https://lemmy.world. |
| queries | sometimes | Required for getCommunityDetail, getCommunityPosts, getUserProfile, getPostDetail, and searchPosts. Leave empty for getSiteInfo, getPosts, and listCommunities. |
| maxItems | no | Max items per query. Default 30. |
| sort | no | Hot / Active / New / TopDay / TopWeek / TopMonth / TopYear / TopAll / MostComments / NewComments. Default Hot. |
| searchType | no | searchPosts only: Posts / Comments / Communities / Users / All. Default Posts. |
| includeComments | no | getPostDetail only: also walk the comment tree. |
| includeSubmissions | no | getUserProfile only: also fetch recent posts and comments by the user. |
| token | no | Lemmy JWT for actions an instance restricts to logged-in users (rare). |
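The Input fields map onto a JSON run input. A minimal Python sketch, assuming queries is sent as a list of strings (the actor's schema could instead expect a single multi-line string) and that validation mirrors the query-format table; build_run_input and run_actor are hypothetical helpers. run_actor uses the official apify-client package (ApifyClient, actor().call(), dataset().iterate_items()); actor_id is a placeholder you copy from the Store page.

```python
NO_QUERY_ACTIONS = {"getSiteInfo", "getPosts", "listCommunities"}
QUERY_ACTIONS = {
    "getCommunityDetail", "getCommunityPosts", "getUserProfile",
    "getPostDetail", "searchPosts",
}


def build_run_input(action, queries=None, instance="https://lemmy.world",
                    max_items=30, **options):
    """Assemble an input dict using the field names from the Input table.

    Assumption: queries is a list of strings; extra options such as sort,
    searchType, includeComments, includeSubmissions, or token pass through.
    """
    if action not in NO_QUERY_ACTIONS | QUERY_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    cleaned = [q.strip() for q in (queries or []) if q.strip()]
    if action in QUERY_ACTIONS and not cleaned:
        raise ValueError(f"{action} needs at least one query")
    if action in NO_QUERY_ACTIONS and cleaned:
        raise ValueError(f"{action} takes no queries; leave queries empty")
    run_input = {"action": action, "instance": instance, "maxItems": max_items, **options}
    if cleaned:
        run_input["queries"] = cleaned
    return run_input


def run_actor(run_input, api_token, actor_id):
    """Start a run and return its dataset items (needs `pip install apify-client`)."""
    from apify_client import ApifyClient  # imported lazily so the rest works offline

    client = ApifyClient(api_token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Failing fast on a missing or superfluous queries value is a design choice of this sketch; it catches a wrong action/query pairing before any paid run starts.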
Output
Every item carries _type (post / comment / community / user / site / error) plus _action and _instance for filtering.
```json
{
  "_type": "post",
  "_action": "getPosts",
  "_instance": "https://lemmy.world",
  "id": 9876543,
  "name": "Rust 1.85 released - 2024 edition stable",
  "url": "https://blog.rust-lang.org/...",
  "body": "...",
  "creator_id": 12345,
  "community_id": 678,
  "published": "2026-05-12T10:30:00.000Z",
  "nsfw": false,
  "score": 542,
  "upvotes": 569,
  "downvotes": 27,
  "comments": 89,
  "post_url": "https://lemmy.world/post/9876543",
  "creator": { "id": 12345, "name": "rustacean", "actor_id": "https://lemmy.world/u/rustacean" },
  "community": { "id": 678, "name": "rust", "title": "Rust", "actor_id": "https://lemmy.world/c/rust" }
}
```
You can download the dataset in JSON, CSV, XML, Excel, RSS or HTML format from the Output tab.
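Because every item carries _type, a mixed dataset splits cleanly after download. A minimal sketch, assuming the JSON export is a single array of items; load_export and split_by_type are hypothetical helper names.

```python
import json
from collections import defaultdict


def load_export(path):
    """Read a dataset exported from the Output tab as JSON (assumed: a list of items)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def split_by_type(items):
    """Group items by _type (post / comment / community / user / site / error)."""
    groups = defaultdict(list)
    for item in items:
        groups[item.get("_type", "unknown")].append(item)
    return dict(groups)
```

For example, split_by_type(load_export("dataset.json")).get("error", []) lists every failed query in a run, and the same grouping separates the mixed results of searchType: All.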
Data fields
| Type | Key fields |
|---|---|
| post | id, name (title), url, body, score, upvotes, downvotes, comments, published, nsfw, locked, post_url, creator, community |
| comment | id, post_id, content, path, score, upvotes, downvotes, child_count, published, creator, community |
| community | id, name, title, description, subscribers, posts, comments, users_active_day/week/month/half_year, icon, banner, actor_id, nsfw |
| user | id, name, display_name, bio, avatar, banner, post_count, post_score, comment_count, comment_score, banned, published |
| site | name, description, users, posts, comments, communities, users_active_day/month, admins (array), version, federated_instances count |
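The path field on comment items encodes thread structure: in Lemmy it is a dot-separated chain of comment ids rooted at 0, so a top-level comment has path 0.<id> and a reply to it 0.<id>.<reply_id>. The offline sketch below rebuilds the tree from a flat list of comments for one post and walks it breadth-first with an item cap. It only illustrates the structure; the actor's own walk fetches from the API, and these function names are hypothetical.

```python
from collections import deque


def parent_id(path: str):
    """Parent comment id from a path like '0.5.12', or None for top-level ('0.5')."""
    parts = path.split(".")
    return int(parts[-2]) if len(parts) > 2 else None


def depth(path: str) -> int:
    """0 for a top-level comment, 1 for a direct reply, and so on."""
    return len(path.split(".")) - 2


def bfs_capped(comments, max_items):
    """Breadth-first order over a flat comment list, stopping at max_items."""
    children = {}
    for comment in comments:
        children.setdefault(parent_id(comment["path"]), []).append(comment)
    queue = deque(children.get(None, []))  # start from top-level comments
    out = []
    while queue and len(out) < max_items:
        comment = queue.popleft()
        out.append(comment)
        queue.extend(children.get(comment["id"], []))
    return out
```

Breadth-first order means a tight cap keeps the top of every thread rather than one deep reply chain.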
Pricing
Pay-per-result: $0.001 per item. No flat monthly fee.
Cost examples:
- Daily Hot feed (30 posts): $0.03
- 1,000 posts from technology@lemmy.world for content research: $1.00
- 100 user profiles with submissions (~4,000 items): $4.00
- Full comment tree of a 500-comment thread: $0.50
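The cost examples above are item count times $0.001. A small helper makes budgeting a scheduled run explicit; estimate_cost is a hypothetical name, and rounding to whole cents is a display choice of this sketch, not a statement about how billing rounds.

```python
from decimal import ROUND_HALF_UP, Decimal

PRICE_PER_ITEM_USD = Decimal("0.001")  # $1.00 per 1,000 result items


def estimate_cost(items: int) -> Decimal:
    """Pay-per-result cost in USD, rounded to cents for display."""
    return (items * PRICE_PER_ITEM_USD).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

For example, the daily Hot feed run every day for 30 days is 900 items, so estimate_cost(900) gives 0.90, about $0.90 per month. Decimal avoids float artifacts such as 0.9000000000000001.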
Tips
- Cross-instance routing is automatic for users and communities. If you write nutomic@lemmy.ml, the actor queries lemmy.ml's API (not the default instance), because that's where nutomic's account record lives. Communities work the same way: technology@lemmy.world queries lemmy.world. This avoids a common gotcha where Lemmy user records return 404 when queried on the wrong instance.
- Upvotes and downvotes are both exposed. Compute controversy = min(up, down) / max(up, down) to find heated threads.
- Comment trees can be huge. Use maxItems to cap the breadth-first (BFS) walk; the actor stops as soon as the budget is reached.
- searchPosts supports searchType: All to search posts, comments, communities, and users in one call. All result types are currently flattened into the same post-style record shape in the dataset, so filter by _type to separate them.
- Federation aware: post and community IDs are NOT portable across instances. ID 1234 on lemmy.world is a different post than ID 1234 on lemmy.ml. Always pass full URLs to getPostDetail so the actor extracts the correct (instance, id) pair.
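The controversy formula from the Tips can be applied directly to post or comment items in the dataset. A minimal sketch: the zero-vote guard and the min_votes noise floor are assumptions added here (without a floor, a post with one upvote and one downvote scores a perfect 1.0), and the function names are hypothetical.

```python
def controversy(up: int, down: int) -> float:
    """min(up, down) / max(up, down): 1.0 = evenly split, 0.0 = one-sided or no votes."""
    top = max(up, down)
    return min(up, down) / top if top else 0.0


def most_controversial(items, min_votes=20, top_n=10):
    """Rank items with upvotes/downvotes fields by controversy, skipping low-vote items."""
    eligible = [i for i in items if i["upvotes"] + i["downvotes"] >= min_votes]
    eligible.sort(key=lambda i: controversy(i["upvotes"], i["downvotes"]), reverse=True)
    return eligible[:top_n]
```

For the sample post in the Output section (569 up, 27 down), controversy is 27 / 569, about 0.047: a clearly one-sided thread.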
FAQ, disclaimers, support
Is this legal? The actor calls each Lemmy instance's official public REST API with documented endpoints. Public read access is the design intent of the AGPL-licensed Lemmy software. We send a clear User-Agent identifying the actor and honor rate-limit / Retry-After headers.
Does it work with Mbin / kbin? Mbin (kbin's active fork) is an alternative Threadiverse server that mostly speaks the same Lemmy API. Most actions should work; trending or community-specific endpoints may differ. Open an Issue if you hit one.
Why is getUserProfile failing for a user? Lemmy user records live on the user's HOME instance. If you queried nutomic against lemmy.world, the actor returns a friendly hint to retry as nutomic@lemmy.ml. The actor auto-routes when you include the home-instance suffix.
Will I get rate-limited? Lemmy's per-IP rate limits are generous for read-only traffic, and the actor backs off on 429 / Retry-After. For very heavy scraping, consider donating to the instance you're hitting most.
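If you call a Lemmy API yourself, the same back-off behavior is easy to reproduce. A standard-library sketch (not the actor's code) that honors Retry-After in either form allowed by HTTP, delta-seconds or an HTTP-date, and otherwise falls back to exponential delays; the attempt count, default delays, and User-Agent value are assumptions.

```python
import json
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, default=5.0):
    """Seconds to wait for a Retry-After header value, or default if absent/unparseable."""
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())


def get_json_with_backoff(url, max_attempts=5):
    """GET a JSON URL, sleeping and retrying on HTTP 429 up to max_attempts times."""
    for attempt in range(max_attempts):
        request = urllib.request.Request(url, headers={"User-Agent": "lemmy-sketch/0.1 (placeholder)"})
        try:
            with urllib.request.urlopen(request, timeout=30) as response:
                return json.load(response)
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            # Prefer the server's hint; otherwise wait 1, 2, 4, ... seconds.
            time.sleep(retry_after_seconds(err.headers.get("Retry-After"), default=2.0 ** attempt))
```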
Bug or feature request? Open an Issue on the actor's Issues tab. I usually respond within a day.
Need a scraper for Mastodon, Bluesky, Stack Overflow, or Hacker News? See my other actors at https://apify.com/perconey, or open an Issue for a federated platform you need.