Lemmy Scraper: Posts, Comments, Communities & Users

Pricing

$1.00 / 1,000 result items

Scrape any Lemmy instance (lemmy.world, lemmy.ml, beehaw.org and other Lemmyverse nodes) via the official /api/v3/* REST API. Posts with upvote/downvote counts, comment trees, communities with subscriber counts, user profiles, full-text search. No auth, no proxies. Pay per result.

Developer

Perconey

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 5 days ago

What does Lemmy Scraper do?

Lemmy Scraper pulls structured data from any Lemmy instance via the official /api/v3/* REST API. Posts with upvote AND downvote counts (unlike Reddit or HN, Lemmy exposes both), full comment trees, communities with subscriber and activity counts, user profiles, full-text search. The actor calls the documented public API directly - no browser, no proxies, no cookies, no anti-bot fight. One actor works with every Lemmy-protocol server in the Lemmyverse: lemmy.world, lemmy.ml, beehaw.org, sh.itjust.works, lemm.ee, feddit.org, programming.dev, and hundreds more.
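
As a rough illustration of what calling the API directly looks like, the sketch below builds the kind of request URL behind a Hot feed fetch. It is not the actor's code: build_post_list_url is a hypothetical helper, and it assumes the feed comes from Lemmy's GET /api/v3/post/list endpoint with its sort and limit query parameters.

```python
from urllib.parse import urlencode, urlsplit


def build_post_list_url(instance: str, sort: str = "Hot", limit: int = 30) -> str:
    """Build a GET URL for the post list endpoint of a Lemmy instance.

    Hypothetical helper for illustration; the actor's internals may differ.
    """
    parts = urlsplit(instance)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not a valid instance URL: {instance!r}")
    # Keep only scheme and host so a trailing slash or path does not leak in.
    base = f"{parts.scheme}://{parts.netloc}"
    query = urlencode({"sort": sort, "limit": limit})
    return f"{base}/api/v3/post/list?{query}"


print(build_post_list_url("https://lemmy.world"))
```

Because the request is a plain HTTPS GET against a documented endpoint, the same URL shape should apply to any instance that exposes the v3 API.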

Try it instantly: pick getPosts, leave instance as https://lemmy.world, and click Start. You get the current Hot feed (30 posts) with creator, community, score, upvote and downvote counts, and comment count in under 5 seconds for $0.03.

Why use Lemmy Scraper?

  • Federation researchers: Compare community dynamics across instances. The same technology community name exists on multiple instances with different culture, content, and activity.
  • Trend analysts: Both upvotes AND downvotes are exposed. Compute controversy ratios that platforms like HN and Reddit hide.
  • Community managers: Track mentions of your project across federated Lemmy. Schedule daily searchPosts runs with your product name.
  • OSS maintainers: Lemmy's user base is heavily developer-skewed. Running getCommunityPosts on programming@programming.dev, linuxmemes@lemmy.world, and selfhosted@lemmy.world gives a low-cost signal on which dev tools are getting attention.
  • Academic researchers: Lemmy is a federated, decentralized social network built on ActivityPub, which makes it rich material for network science studies. getSiteInfo returns per-instance user, post, and community counts plus a federated-instance count, a starting point for sizing the federation.
  • Reddit refugees / content migrators: Identify Lemmy communities that mirror Reddit subreddits you care about. Move your subscription pattern to the Lemmyverse.

How to use Lemmy Scraper

  1. Open the Input tab.
  2. Pick an action from the dropdown. getPosts is the simplest starting point.
  3. Set instance (default https://lemmy.world). To scrape a different server, paste its URL.
  4. For community/user/post/search actions, fill queries (one entry per line). For getPosts / listCommunities / getSiteInfo, leave queries empty.
  5. Tune sort (Hot, New, Active, TopWeek, TopAll, etc.), maxItems, and searchType if using searchPosts.
  6. Toggle includeComments for getPostDetail or includeSubmissions for getUserProfile to get the deep dive.
  7. Click Start.

Query format by action

| Action | Query format |
| --- | --- |
| getSiteInfo | leave empty (uses the instance field) |
| getPosts | leave empty |
| listCommunities | leave empty |
| getCommunityDetail | technology or technology@lemmy.world or https://lemmy.world/c/technology |
| getCommunityPosts | same as getCommunityDetail |
| getUserProfile | nutomic or nutomic@lemmy.ml (auto-routes to user's home instance) |
| getPostDetail | post id (e.g. 1) or full URL (e.g. https://lemmy.world/post/1234) |
| searchPosts | free-text search query |
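
The three community query forms can be normalized to an (instance, name) pair along the following lines. This is a sketch of one plausible approach, not the actor's implementation: parse_community_query is a hypothetical name, and it assumes that name@host forms are reachable over https.

```python
from urllib.parse import urlsplit

DEFAULT_INSTANCE = "https://lemmy.world"


def parse_community_query(query: str, default_instance: str = DEFAULT_INSTANCE) -> tuple[str, str]:
    """Return (instance_url, community_name) for a bare name, name@host, or /c/ URL."""
    query = query.strip()
    instance = default_instance.rstrip("/")
    if query.startswith(("http://", "https://")):
        parts = urlsplit(query)
        segments = [s for s in parts.path.split("/") if s]
        if len(segments) != 2 or segments[0] != "c":
            raise ValueError(f"not a community URL: {query!r}")
        instance = f"{parts.scheme}://{parts.netloc}"
        query = segments[1]  # may itself be name@host for a remote community
    name, sep, host = query.partition("@")
    if sep:
        if not host:
            raise ValueError(f"missing host after '@': {query!r}")
        # Assumption: the home instance is served over https.
        instance = f"https://{host}"
    if not name:
        raise ValueError("empty community name")
    return instance, name


for q in ("technology", "technology@lemmy.world", "https://lemmy.world/c/technology"):
    print(q, "->", parse_community_query(q))
```

All three example inputs resolve to the same pair, which is why the forms can be used interchangeably.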

Input

| Field | Required | Description |
| --- | --- | --- |
| action | yes | Which API call to make. Eight options. |
| instance | yes | Lemmy instance URL. Default https://lemmy.world. |
| queries | sometimes | Required for community/user/post/search actions. Empty for site-info / posts feed / list-communities. |
| maxItems | no | Max items per query. Default 30. |
| sort | no | Hot / Active / New / TopDay / TopWeek / TopMonth / TopYear / TopAll / MostComments / NewComments. Default Hot. |
| searchType | no | searchPosts only: Posts / Comments / Communities / Users / All. Default Posts. |
| includeComments | no | getPostDetail only: also walk the comment tree. |
| includeSubmissions | no | getUserProfile only: also fetch recent posts and comments by the user. |
| token | no | Lemmy JWT for actions an instance restricts to logged-in users (rare). |
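
To make the "sometimes required" rule for queries concrete, here is a minimal sketch that assembles an input object and checks it before a run. The field names come from the table above; build_input is a hypothetical helper, and representing queries as a list of strings is an assumption about how the one-entry-per-line field is serialized.

```python
import json

# Actions that need at least one query, and actions that take none.
NEEDS_QUERIES = {
    "getCommunityDetail", "getCommunityPosts", "getUserProfile",
    "getPostDetail", "searchPosts",
}
NO_QUERIES = {"getSiteInfo", "getPosts", "listCommunities"}


def build_input(action, queries=None, instance="https://lemmy.world",
                max_items=30, sort="Hot"):
    """Return a run-input dict, raising ValueError on an inconsistent combination."""
    if action not in NEEDS_QUERIES | NO_QUERIES:
        raise ValueError(f"unknown action: {action!r}")
    queries = list(queries or [])
    if action in NEEDS_QUERIES and not queries:
        raise ValueError(f"{action} requires at least one query")
    run_input = {"action": action, "instance": instance,
                 "maxItems": max_items, "sort": sort}
    if action in NEEDS_QUERIES:
        # Assumption: queries is serialized as a list of strings.
        run_input["queries"] = queries
    return run_input


print(json.dumps(build_input("getCommunityPosts", ["technology@lemmy.world"]), indent=2))
```

Validating locally like this catches an empty queries field before a run is started.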

Output

Every item carries _type (post / comment / community / user / site / error) plus _action and _instance for filtering.

{
  "_type": "post",
  "_action": "getPosts",
  "_instance": "https://lemmy.world",
  "id": 9876543,
  "name": "Rust 1.85 released - 2024 edition stable",
  "url": "https://blog.rust-lang.org/...",
  "body": "...",
  "creator_id": 12345,
  "community_id": 678,
  "published": "2026-05-12T10:30:00.000Z",
  "nsfw": false,
  "score": 542,
  "upvotes": 569,
  "downvotes": 27,
  "comments": 89,
  "post_url": "https://lemmy.world/post/9876543",
  "creator": { "id": 12345, "name": "rustacean", "actor_id": "https://lemmy.world/u/rustacean" },
  "community": { "id": 678, "name": "rust", "title": "Rust", "actor_id": "https://lemmy.world/c/rust" }
}

You can download the dataset in JSON, CSV, XML, Excel, RSS or HTML format from the Output tab.
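
Since every item carries _type, a JSON export can be split into per-type lists with a few lines of post-processing. The sketch below uses two made-up items in place of a real export; split_by_type is a hypothetical helper, not part of the actor.

```python
import json
from collections import defaultdict

# Two made-up items standing in for a JSON export of the dataset.
EXPORT = """[
  {"_type": "post", "_action": "getPostDetail", "id": 1, "name": "Example post"},
  {"_type": "comment", "_action": "getPostDetail", "id": 2, "post_id": 1, "content": "Example reply"}
]"""


def split_by_type(items):
    """Group dataset items by their _type field."""
    groups = defaultdict(list)
    for item in items:
        # Items without a _type are kept under "unknown" rather than dropped.
        groups[item.get("_type", "unknown")].append(item)
    return dict(groups)


groups = split_by_type(json.loads(EXPORT))
for kind, rows in groups.items():
    print(kind, len(rows))
```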

Data fields

| Type | Key fields |
| --- | --- |
| post | id, name (title), url, body, score, upvotes, downvotes, comments, published, nsfw, locked, post_url, creator, community |
| comment | id, post_id, content, path, score, upvotes, downvotes, child_count, published, creator, community |
| community | id, name, title, description, subscribers, posts, comments, users_active_day/week/month/half_year, icon, banner, actor_id, nsfw |
| user | id, name, display_name, bio, avatar, banner, post_count, post_score, comment_count, comment_score, banned, published |
| site | name, description, users, posts, comments, communities, users_active_day/month, admins (array), version, federated_instances count |
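
The comment path field can be used to rebuild thread structure. The sketch below assumes the actor passes through Lemmy's native path format, a dot-separated chain that starts with 0 and ends with the comment's own id (for example 0.100.205 for a reply to comment 100). If the actor reshapes this field, the helpers, which are hypothetical, would need adjusting.

```python
from typing import Optional


def _segments(path: str) -> list:
    """Split and validate a path such as '0.100.205'."""
    segments = path.split(".")
    if len(segments) < 2 or segments[0] != "0" or not all(s.isdigit() for s in segments):
        raise ValueError(f"unexpected comment path: {path!r}")
    return segments


def comment_depth(path: str) -> int:
    """Depth in the tree: 0 for a top-level comment, 1 for a direct reply, and so on."""
    return len(_segments(path)) - 2


def parent_id(path: str) -> Optional[int]:
    """Id of the parent comment, or None for a top-level comment."""
    segments = _segments(path)
    return int(segments[-2]) if len(segments) > 2 else None


for p in ("0.100", "0.100.205", "0.100.205.311"):
    print(p, comment_depth(p), parent_id(p))
```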

Pricing

Pay-per-result: $0.001 per item. No flat monthly fee.

Cost examples:

  • Daily Hot feed (30 posts): $0.03
  • 1,000 posts from technology@lemmy.world for content research: $1.00
  • 100 user profiles with submissions (~4000 items): $4.00
  • Full comment tree of a 500-comment thread: $0.50
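
The cost examples above are item count times $0.001. A small sketch of that arithmetic, using Decimal to avoid floating-point rounding (estimate_cost is a hypothetical helper, and it assumes every dataset item is billed at the same rate):

```python
from decimal import Decimal

PRICE_PER_ITEM = Decimal("0.001")  # USD per result item, from the pricing above


def estimate_cost(items: int) -> Decimal:
    """Estimated run cost in USD for a given number of result items."""
    if items < 0:
        raise ValueError("items must be non-negative")
    return PRICE_PER_ITEM * items


for label, items in [("Daily Hot feed", 30), ("1,000 posts", 1000),
                     ("100 profiles with submissions", 4000),
                     ("500-comment thread", 500)]:
    print(f"{label}: ${estimate_cost(items):.2f}")
```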

Tips

  • Cross-instance routing is automatic for users and communities. If you write nutomic@lemmy.ml, the actor queries lemmy.ml's API (not the default instance), because that's where nutomic's account record lives. Same for communities: technology@lemmy.world queries lemmy.world. This avoids a common gotcha where Lemmy user records 404 when queried on the wrong instance.
  • Upvotes and downvotes are both exposed. Compute controversy = min(up, down) / max(up, down) to find heated threads; the ratio runs from 0 (one-sided) to 1 (evenly split). Treat items with no votes as 0 to avoid dividing by zero.
  • Comment trees can be huge. Use maxItems to cap the breadth-first (BFS) walk. The actor stops as soon as the budget is reached.
  • searchPosts supports searchType: All to search across posts, comments, communities, and users in one call. Mixed results are currently flattened into the same item shape as posts; filter by _type to separate them.
  • Federation aware: post/community IDs are NOT portable across instances. ID 1234 on lemmy.world is a different post than ID 1234 on lemmy.ml. Always pass full URLs to getPostDetail so the actor extracts the correct (instance, id) pair.
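
Two of the tips above translate directly into small post-processing helpers: the controversy ratio (with a guard for items that have no votes) and splitting a post URL into its (instance, id) pair. Both functions are illustrative sketches with hypothetical names, not the actor's code.

```python
from urllib.parse import urlsplit


def controversy(upvotes: int, downvotes: int) -> float:
    """min(up, down) / max(up, down); 0.0 when there are no votes at all."""
    high = max(upvotes, downvotes)
    if high == 0:
        return 0.0
    return min(upvotes, downvotes) / high


def parse_post_url(url: str) -> tuple[str, int]:
    """Split a URL like https://lemmy.world/post/1234 into (instance, post id)."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != 2 or segments[0] != "post" or not segments[1].isdigit():
        raise ValueError(f"not a post URL: {url!r}")
    return f"{parts.scheme}://{parts.netloc}", int(segments[1])


# Vote counts taken from the sample output item (569 up, 27 down).
print(round(controversy(569, 27), 4))
print(parse_post_url("https://lemmy.world/post/1234"))
```

Keeping the instance next to the id makes it explicit that the same numeric id on another server refers to a different post.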

FAQ, disclaimers, support

Is this legal? The actor calls each Lemmy instance's official public REST API with documented endpoints. Public read access is the design intent of the AGPL-licensed Lemmy software. We send a clear User-Agent identifying the actor and honor rate-limit / Retry-After headers.

Does it work with Mbin / kbin? Mbin (kbin's active fork) is an alternative Threadiverse server that mostly speaks the same Lemmy API. Most actions should work; trending or community-specific endpoints may differ. Open an Issue if you hit one.

Why is getUserProfile failing for a user? Lemmy user records live on the user's HOME instance. If you queried nutomic against lemmy.world, the actor returns a friendly hint to retry as nutomic@lemmy.ml. The actor auto-routes when you include the home-instance suffix.

Will I get rate-limited? Lemmy's per-IP rate limits are generous for read-only traffic and the actor backs off on 429 / Retry-After. For very heavy scraping consider donating to the instance you're hitting most.
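
For readers writing their own client, backing off on 429 / Retry-After can be sketched as below. This is an assumption-laden illustration, not the actor's actual strategy: retry_delay is a hypothetical helper, the exponential fallback and the 60-second cap are arbitrary choices, and only the numeric (seconds) form of Retry-After is handled.

```python
from typing import Mapping, Optional


def retry_delay(status: int, headers: Mapping[str, str], attempt: int,
                base: float = 1.0, cap: float = 60.0) -> Optional[float]:
    """Seconds to wait before retrying, or None if the response was not a 429.

    Assumes header names are already normalized to 'Retry-After'.
    """
    if status != 429:
        return None
    value = headers.get("Retry-After", "").strip()
    if value.isdigit():
        # The server said how long to wait; respect it, up to the cap.
        return min(float(value), cap)
    # No usable header (missing, or in HTTP-date form): exponential backoff.
    return min(base * 2 ** attempt, cap)


print(retry_delay(429, {"Retry-After": "5"}, attempt=0))
print(retry_delay(429, {}, attempt=3))
```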

Bug or feature request? Open an Issue on the actor's Issues tab. I usually respond within a day.

Need a scraper for Mastodon, Bluesky, Stack Overflow, or Hacker News? See my other actors at https://apify.com/perconey, or open an Issue for a federated platform you need.