Lemmy Community Scraper
Pricing
Pay per event
Scrape posts and comments from any public Lemmy community on any Lemmy instance — the federated Reddit alternative. We handle the pagination, retries, fingerprint rotation, and rate-limit pacing — you get typed dataset rows ready to export to CSV or JSON.
Developer
DevilScrapes
Lemmy Community Scraper
We do the dirty work so your dataset stays clean. 😈
$2.05 / 1,000 posts — Export posts and optional comments from any public Lemmy community on any Lemmy instance, via the public /api/v3/ REST API. No Lemmy account. No API key. No browser automation.
Lemmy is the federated Reddit alternative running the ActivityPub protocol; every Lemmy instance exposes the same versioned REST API with no authentication required for public reads. This Actor fetches community metadata once, cursor-paginates through posts, optionally fetches comments per post, and emits a flat dataset with community + post metadata denormalised onto every row — so flat CSV exports are self-contained for analytics, BI, or SQL with zero joins.
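The cursor-pagination step described above can be sketched roughly as follows. This is an illustration, not the Actor's actual code: `fetch_page` is a hypothetical callable standing in for an HTTP GET against the post-listing endpoint, and the `posts` / `next_page` response keys are assumptions about the v0.19 response shape.

```python
from typing import Callable, Iterator, Optional

# Hypothetical page fetcher: takes a cursor (None for the first page) and
# returns a dict shaped like {"posts": [...], "next_page": "<cursor>" or None}.
FetchPage = Callable[[Optional[str]], dict]


def iter_posts(fetch_page: FetchPage, max_posts: int) -> Iterator[dict]:
    """Yield up to max_posts posts, following cursors until they run out."""
    cursor: Optional[str] = None
    emitted = 0
    while emitted < max_posts:
        page = fetch_page(cursor)
        posts = page.get("posts", [])
        if not posts:
            return  # empty page: nothing more to read
        for post in posts:
            yield post
            emitted += 1
            if emitted >= max_posts:
                return  # reached the caller's cap
        cursor = page.get("next_page")
        if cursor is None:
            return  # no further cursor: last page reached
```

Passing the fetcher in as a parameter keeps the loop independent of any particular HTTP client, so it can be exercised with canned pages.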
🎯 What this scrapes
This Actor exports two row types into a single dataset, discriminated by the row_type column:
- Post rows (`row_type="post"`): one row per post in the target community. Always emitted.
- Comment rows (`row_type="comment"`): one row per comment on each post. Emitted only when `includeComments` is enabled.
Every row carries the community context (name, title, subscriber count, posts count, ActivityPub actor_id) and the post context (id, title, score, upvotes, downvotes, comment count) so downstream tools see a fully denormalised, join-free table. Comment rows additionally carry the comment id, content, path, score, and a derived comment_parent_id field that preserves the parent-child relationship for tree reconstruction.
| Field | Type | Description |
|---|---|---|
| `row_type` | string | `post` or `comment` |
| `instance_url` | string | Lemmy instance base URL |
| `community_actor_id` | string | ActivityPub actor_id of the community |
| `community_name` | string | Local community name (e.g. `asklemmy`) |
| `community_title` | string | Human-readable community title |
| `community_description` | string or null | Community description, if set |
| `community_subscribers` | integer | Subscriber count |
| `community_posts_count` | integer | Total posts in the community |
| `community_local` | boolean | True if the community is local to the instance |
| `post_id` | integer | Lemmy local post id |
| `post_ap_id` | string | ActivityPub canonical URL of the post |
| `post_url` | string | Canonical post URL (same as `post_ap_id`) |
| `post_title` | string | Post title |
| `post_body` | string or null | Post body text; null on link posts |
| `post_external_url` | string or null | External link URL; null on text posts |
| `post_score` | integer | Net post score (upvotes - downvotes) |
| `post_upvotes` | integer | Number of upvotes on the post |
| `post_downvotes` | integer | Number of downvotes on the post |
| `post_comments_count` | integer | Total comments on the post |
| `post_published` | string | ISO 8601 UTC datetime |
| `post_updated` | string or null | ISO 8601 UTC datetime of last edit |
| `comment_id` | integer or null | Lemmy local comment id (comment rows only) |
| `comment_path` | string or null | Lemmy thread path (e.g. `0.12345.67890`) |
| `comment_content` | string or null | Comment body text |
| `comment_score` | integer or null | Net comment score |
| `comment_published` | string or null | ISO 8601 UTC datetime |
| `comment_parent_id` | integer or null | Parent comment id derived from path |
| `author_name` | string | Author username (local form) |
| `author_display_name` | string or null | Author display name, if set |
| `scraped_at` | string | ISO 8601 UTC datetime this row was written |
🔥 Features
- No Lemmy account required: uses the public unauthenticated `/api/v3/` REST API.
- Supports any Lemmy instance: `lemmy.world`, `lemmy.ml`, `beehaw.org`, `sh.itjust.works`, and any other public instance running Lemmy v0.19.
- Two operating modes from one input: posts only, or posts + comments in the same dataset.
- Federated community syntax: pass `memes@lemmy.world` to scrape a remote community from any other instance, or `asklemmy` for a local community on the chosen instance.
- 16-token sort enum: `Active`, `Hot`, `New`, `Old`, `Scaled`, `Controversial`, `MostComments`, `NewComments`, `TopHour`, `TopSixHour`, `TopTwelveHour`, `TopDay`, `TopWeek`, `TopMonth`, `TopYear`, `TopAll`.
- Cursor-based post pagination + integer-page comment pagination, both verified against Lemmy v0.19 on live instances.
- Denormalised output: community metadata on every row, no joins needed for downstream analytics or CSV exports.
- `comment_parent_id` derived from `comment.path`: preserves the comment tree so callers can reconstruct threaded discussion.
- Exponential backoff with `Retry-After` honoured for `408 / 429 / 503` responses; max 5 attempts.
- Pure HTTP client (`curl-cffi` with browser fingerprint impersonation): no browser automation, low compute footprint.
- Pydantic v2 input validation with named sort enum and range bounds; bare `Top` (invalid on v0.19) is rejected up-front before any network call.
- Pairs with `bluesky-feed-posts` and `bluesky-starter-pack` as the Federated Social Suite.
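The retry policy in the list above (exponential backoff, `Retry-After` honoured for 408 / 429 / 503, at most 5 attempts) could look roughly like the sketch below. The base delay, the 60-second cap, and the 10% jitter are assumptions chosen for illustration; the Actor's real values are not documented here.

```python
import random
from typing import Optional

RETRYABLE_STATUSES = {408, 429, 503}
MAX_ATTEMPTS = 5


def retry_delay(status: int, attempt: int, retry_after: Optional[str] = None,
                base: float = 1.0, cap: float = 60.0) -> Optional[float]:
    """Return seconds to wait before the next attempt, or None to stop.

    attempt is 1-based and refers to the attempt that just failed.
    base, cap, and the jitter factor are illustrative assumptions.
    """
    if status not in RETRYABLE_STATUSES or attempt >= MAX_ATTEMPTS:
        return None
    if retry_after is not None:
        try:
            # Only the delta-seconds form of Retry-After is handled here.
            return min(max(float(retry_after), 0.0), cap)
        except ValueError:
            pass  # HTTP-date form: fall back to exponential backoff
    delay = min(base * 2 ** (attempt - 1), cap)
    return delay + random.uniform(0, delay * 0.1)  # small random jitter
```

A caller would sleep for the returned number of seconds and retry, and give up as soon as the function returns `None`.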
💡 Use cases
- Reddit-alternative migration research — track how communities and engagement migrate from Reddit to Lemmy after policy changes; compare subscriber and post growth across instances.
- Newsroom monitoring — subscribe to journalism, politics, or breaking-news Lemmy communities and pipe the latest top posts to Slack or Google Sheets via Apify integrations.
- Brand monitoring on the fediverse — Lemmy is a growing channel for product complaints, support discussions, and competitor mentions outside the Reddit walled garden; this Actor surfaces them on a schedule.
- Academic federated-social research — Lemmy's open, public REST API is significantly more accessible than Reddit's gated API; ideal for longitudinal community studies, sentiment tracking, or content-moderation research.
- Community trend analysis: pull `TopWeek` posts across multiple communities and rank by score, comment count, or upvote ratio to benchmark community health.
- Comment-tree reconstruction: combine post rows with comment rows (joined by `post_id`) and the `comment_parent_id` field to rebuild the full discussion tree for NLP or moderation pipelines.
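As a small illustration of the trend-analysis use case, the sketch below ranks post rows by upvote ratio using only the documented `row_type`, `post_upvotes`, and `post_downvotes` columns. The function names are hypothetical, and treating posts with no votes as ratio 0.0 is an assumption.

```python
from typing import Iterable, List


def upvote_ratio(row: dict) -> float:
    """Share of votes that are upvotes; 0.0 when a post has no votes."""
    votes = row["post_upvotes"] + row["post_downvotes"]
    return row["post_upvotes"] / votes if votes else 0.0


def rank_posts(rows: Iterable[dict]) -> List[dict]:
    """Return post rows sorted by upvote ratio, highest first."""
    posts = [r for r in rows if r["row_type"] == "post"]
    return sorted(posts, key=upvote_ratio, reverse=True)
```

The same pattern works with `post_score` or `post_comments_count` as the sort key.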
⚙️ How to use it
- Open the Actor input form.
- Set Lemmy instance URL to the base URL of any public Lemmy instance, e.g. `https://lemmy.world` or `https://lemmy.ml`. A trailing slash is stripped automatically.
- Set Community name to the community you want to scrape: either the local form (`asklemmy`) for a community on the chosen instance, or the federated form (`memes@lemmy.world`) for a remote community visible from the chosen instance.
- Pick a Post sort order from the 16 valid tokens. `Hot` (default) blends recency and engagement; `TopWeek` returns the highest-scoring posts of the last 7 days; `New` returns posts newest first.
- Adjust Max posts (default 100, max 5,000).
- Toggle Include comments if you want comment rows alongside post rows. When enabled, set Max comments per post (default 50, max 500).
- Leave Use Apify Proxy off unless you are behind a restrictive ISP. Lemmy instances generally do not block datacenter IPs, so direct routing is faster and free.
- Click Start. Results stream into the default dataset and can be exported as JSON, CSV, Excel, or XML via the Export button.
Single-community example
```json
{
  "instanceUrl": "https://lemmy.world",
  "communityName": "asklemmy",
  "sort": "TopWeek",
  "maxPosts": 200,
  "includeComments": false,
  "useProxy": false
}
```
Posts + comments example
```json
{
  "instanceUrl": "https://lemmy.ml",
  "communityName": "memes@lemmy.world",
  "sort": "Hot",
  "maxPosts": 50,
  "includeComments": true,
  "maxCommentsPerPost": 25,
  "useProxy": false
}
```
📥 Input
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `instanceUrl` | string | yes | n/a | Base URL of the Lemmy instance |
| `communityName` | string | yes | n/a | Community name (local or federated form) |
| `sort` | string (enum) | no | `Hot` | One of the 16 valid Lemmy v0.19 sort tokens listed below |
| `maxPosts` | integer | no | 100 | Max post rows emitted (1–5000) |
| `includeComments` | boolean | no | false | If true, also fetch comments per post |
| `maxCommentsPerPost` | integer | no | 50 | Max comment rows per post (1–500) |
| `useProxy` | boolean | no | false | Route through Apify Proxy (BUYPROXIES94952) |
The full list of valid sort tokens is: Active, Hot, New, Old, Scaled, Controversial, MostComments, NewComments, TopHour, TopSixHour, TopTwelveHour, TopDay, TopWeek, TopMonth, TopYear, TopAll. Bare Top is not valid on Lemmy v0.19 and is rejected by the input validator before any network call.
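The up-front rejection of bare `Top` can be illustrated with a plain-Python stand-in. The Actor itself is described as using Pydantic v2; the helper below, including its name and error wording, is a hypothetical sketch that only mirrors the documented behaviour.

```python
# Sort tokens copied from the list above.
VALID_SORTS = frozenset({
    "Active", "Hot", "New", "Old", "Scaled", "Controversial",
    "MostComments", "NewComments", "TopHour", "TopSixHour",
    "TopTwelveHour", "TopDay", "TopWeek", "TopMonth", "TopYear", "TopAll",
})


def validate_sort(value: str = "Hot") -> str:
    """Return the token unchanged if valid, else raise ValueError."""
    if value not in VALID_SORTS:
        # Bare "Top" gets a pointer to the compound tokens.
        hint = " Use a compound token such as TopWeek." if value == "Top" else ""
        raise ValueError(f"Invalid sort token {value!r}.{hint}")
    return value
```

Because the check runs before any request is built, an invalid token fails without any network call being made.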
📤 Output
One row per post or per comment. Community + post metadata is denormalised onto every comment row so a flat CSV is self-contained.
```json
{
  "row_type": "post",
  "instance_url": "https://lemmy.world",
  "community_actor_id": "https://lemmy.world/c/asklemmy",
  "community_name": "asklemmy",
  "community_title": "Ask Lemmy",
  "community_description": "A Lemmy equivalent of Ask Reddit.",
  "community_subscribers": 39567,
  "community_posts_count": 8853,
  "community_local": true,
  "post_id": 46934299,
  "post_ap_id": "https://leminal.space/post/35446889",
  "post_url": "https://leminal.space/post/35446889",
  "post_title": "What's something that you feel genuinely sad about?",
  "post_body": null,
  "post_external_url": null,
  "post_score": 34,
  "post_upvotes": 38,
  "post_downvotes": 4,
  "post_comments_count": 18,
  "post_published": "2026-05-16T09:31:47.723504Z",
  "post_updated": null,
  "post_nsfw": false,
  "post_featured_community": false,
  "post_locked": false,
  "comment_id": null,
  "comment_ap_id": null,
  "comment_path": null,
  "comment_content": null,
  "comment_score": null,
  "comment_published": null,
  "comment_parent_id": null,
  "author_actor_id": "https://leminal.space/u/FosterMolasses",
  "author_name": "FosterMolasses",
  "author_display_name": null,
  "author_bot_account": false,
  "author_published": "2025-01-17T14:35:51.850105Z",
  "scraped_at": "2026-05-16T12:00:00.000Z"
}
```
Comment rows have the same shape, with row_type="comment" and the comment fields populated. The comment_parent_id field is derived from the penultimate segment of comment.path — null for top-level comments, otherwise the integer id of the parent comment — so callers can reconstruct the full thread tree without an extra API call.
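The derivation described above fits in a few lines. The function name is hypothetical; the logic follows the stated rule: take the penultimate path segment and treat the root marker `0` as "no parent".

```python
from typing import Optional


def parent_id_from_path(path: str) -> Optional[int]:
    """Derive the parent comment id from a Lemmy comment path.

    "0.12345"       -> None   (top-level: penultimate segment is the root 0)
    "0.12345.67890" -> 12345  (reply to comment 12345)
    """
    segments = path.split(".")
    if len(segments) < 2:
        return None  # malformed or single-segment path: no parent to report
    parent = int(segments[-2])
    return parent if parent != 0 else None
```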
Export formats
- JSON: full fidelity, all fields, newline-delimited
- CSV: flat, one row per post or comment, all columns including denormalised community + post metadata
- Excel: `.xlsx` via the Apify dataset converter
- XML: structured per-item
All formats are available via the Apify API: GET /datasets/{id}/items?format=csv&clean=true.
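A small helper for building that export URL might look like the sketch below. It assumes the standard Apify API base `https://api.apify.com/v2`, and the dataset id used in the usage line is a placeholder. Depending on the dataset's access settings, the request may also need an API token.

```python
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"  # assumed Apify API base URL


def dataset_items_url(dataset_id: str, fmt: str = "csv", clean: bool = True) -> str:
    """Build the dataset-items export URL for the given format."""
    query = urlencode({"format": fmt, "clean": "true" if clean else "false"})
    return f"{API_BASE}/datasets/{dataset_id}/items?{query}"


# Usage with a placeholder dataset id:
url = dataset_items_url("YOUR_DATASET_ID", fmt="json")
```

The resulting URL can be fetched with any HTTP client or pasted into a browser.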
💰 Pricing
Pay-Per-Event (PPE) — you pay only for what you use:
| Event | Price (USD) | When |
|---|---|---|
| `actor-start` | $0.05 | Once per run, at boot |
| `result-row` | $0.002 | Per post row written to the dataset |
| `result-row-comment` | $0.001 | Per comment row written to the dataset |
Example costs
| Run | Cost |
|---|---|
| 100 posts (no comments) | $0.25 |
| 500 posts (no comments) | $1.05 |
| 1,000 posts (no comments) | $2.05 |
| 100 posts × 50 comments = 5,100 rows | $5.25 |
| 500 posts × 25 comments = 13,000 rows | $13.55 |
Comment rows are half the price of post rows because they are higher-volume and have lower per-row commercial value for most analytics use cases. Disable comment fetching when you only need post-level data — the Actor is significantly cheaper and faster.
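To estimate a run before starting it, the three event prices from the table can be combined as in this sketch. The function name is hypothetical, and it assumes every post yields exactly `comments_per_post` comment rows, which is an upper bound when posts have fewer comments than the cap.

```python
from decimal import Decimal

# Event prices in USD, taken from the pricing table above.
ACTOR_START = Decimal("0.05")
POST_ROW = Decimal("0.002")
COMMENT_ROW = Decimal("0.001")


def estimate_cost(posts: int, comments_per_post: int = 0) -> Decimal:
    """Estimated run cost: one start event plus per-row charges."""
    comment_rows = posts * comments_per_post
    return ACTOR_START + posts * POST_ROW + comment_rows * COMMENT_ROW
```

`Decimal` is used instead of floats so that sub-cent prices add up exactly.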
🚧 Limitations
- Public communities only. Private or access-restricted communities require authentication and are out of scope.
- One community per run. This Actor scrapes one community per run; multi-community or instance-wide listings need separate runs (or a different Actor).
- No media download. Image, video, and external link URLs are captured in the row, but the Actor does not download the media content itself.
- No real-time streaming. The Actor takes a snapshot at run time; for live updates schedule recurring runs via Apify Schedules.
- 7-day default storage retention on the Apify FREE tier. Export your dataset immediately after the run or upgrade for longer retention.
- Lemmy v0.19 only. The Actor targets the v0.19 API surface and field paths. v0.20 (when released) may require updates; Pydantic `extra="ignore"` absorbs additive changes, but breaking removals will require a new version.
- Comment-tree depth is preserved via `comment_path`, but the Actor does not return comments in a pre-nested structure; callers reconstruct the tree from `comment_parent_id`.
❓ FAQ
Do I need a Lemmy account?
No. Lemmy's public /api/v3/ REST API is unauthenticated by design — every endpoint this Actor calls is open to anyone without a login, signup, or API key.
What is the federated community format?
Lemmy communities can live on any instance but be subscribed to and read from any other instance via ActivityPub federation. The federated form is community@instance.tld — for example memes@lemmy.world refers to the memes community hosted on lemmy.world, accessible from any other Lemmy instance that has fetched it via federation. Local communities (on the instance you point this Actor at) use the bare local name like asklemmy.
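Splitting the two forms apart is straightforward. The sketch below is an illustration only; `CommunityRef` and `parse_community` are hypothetical names, not part of the Actor.

```python
from typing import NamedTuple, Optional


class CommunityRef(NamedTuple):
    name: str
    host: Optional[str]  # None for a local community


def parse_community(value: str) -> CommunityRef:
    """Split 'name' or 'name@instance.tld' into its parts."""
    name, sep, host = value.strip().partition("@")
    if not name or (sep and not host):
        raise ValueError(f"Invalid community name: {value!r}")
    return CommunityRef(name, host or None)
```

For example, `parse_community("memes@lemmy.world")` yields the name `memes` with host `lemmy.world`, while `parse_community("asklemmy")` yields a host of `None`.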
Why does sort=Top not work?
Lemmy v0.19 removed the bare Top sort token and replaced it with compound tokens that embed a time range: TopHour, TopSixHour, TopTwelveHour, TopDay, TopWeek, TopMonth, TopYear, TopAll. Passing bare Top returns {"error": "unknown"} from the Lemmy API. This Actor rejects bare Top up-front during input validation — pick the compound token whose time range you want.
Which Lemmy instance should I point this at?
Any public Lemmy instance running v0.19. lemmy.world and lemmy.ml are the two largest general-purpose instances and good defaults. Topic-specific instances like beehaw.org, sh.itjust.works, or programming.dev are also fully supported. The instance URL you choose determines which "view" of the federated network this Actor scrapes from — different instances may have fetched different remote communities and different comment ranges, so re-running against a different instance can return slightly different remote-community data.
How do I reconstruct the comment tree from the dataset?
Each comment row carries comment_id and comment_parent_id. Group rows by post_id, then for each post: top-level comments have comment_parent_id IS NULL; replies have comment_parent_id pointing at the parent comment's comment_id within the same post. The comment_path field encodes the full ancestry (e.g. 0.12345.67890 means parent comment 12345, this comment 67890) for callers who need it.
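One way to carry out that grouping is sketched below, using only the documented `row_type`, `post_id`, `comment_id`, and `comment_parent_id` columns. The function name and the `replies` key are hypothetical, and treating a comment whose parent is missing from the dataset as top-level is an assumption made for the case where the per-post cap cut the parent off.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_comment_tree(rows: Iterable[dict]) -> Dict[int, List[dict]]:
    """Return {post_id: [top-level comment nodes]}.

    Each node is a copy of its row with an added 'replies' list.
    """
    nodes = {}
    for row in rows:
        if row["row_type"] == "comment":
            nodes[row["comment_id"]] = {**row, "replies": []}

    roots = defaultdict(list)
    for node in nodes.values():
        parent = nodes.get(node["comment_parent_id"])
        if parent is not None:
            parent["replies"].append(node)
        else:
            # Top-level comment, or its parent is not in the dataset.
            roots[node["post_id"]].append(node)
    return dict(roots)
```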
Is scraping public Lemmy data legal?
Lemmy is AGPL-3.0 free and open-source software. Lemmy instances explicitly serve public community data through an unauthenticated REST API designed for federation. Always verify the specific instance's terms of service and your local jurisdiction's data-protection rules before using scraped data for commercial purposes.
Related Actors
Part of the Devil Scrapes Federated Social Suite:
- Bluesky Starter Pack Scraper — export full member lists from any public Bluesky Starter Pack via the AT Protocol public API.
- Bluesky Feed Posts Scraper — export posts from any public Bluesky custom feed or algorithm feed, with denormalised feed metadata on every row.
All three Actors share consistent pricing ($0.002 per post row, $0.05 per run) and field-naming conventions (<entity>_<field> snake_case) so cross-network federated-social analyses can join cleanly on author_handle / author_name.
💬 Your feedback
Found a bug, hit a rate limit, or need a new field on the output row? Open an issue on the Actor's Apify Store page or contact the Devil Scrapes team at apify.com/DevilScrapes. We ship updates within days of validated reports.