Scrape posts and comments from any public community on any instance of Lemmy, the federated Reddit alternative. We handle pagination, retries, fingerprint rotation, and rate-limit pacing; you get typed dataset rows ready to export to CSV or JSON.
All notable changes to this project will be documented in this file.
[0.2.0] — 2026-05-20
Fixed
Added a prefill to the discriminating input field so that Apify's automated daily QA receives a runnable payload. Empty-input runs were failing the Pydantic model_validate XOR/required-field check within 100 ms; after three consecutive days of such failures, the Actor was flagged "Under maintenance" and unlisted from the Apify Store.
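To illustrate why an empty payload fails immediately while a prefilled one runs, here is a minimal sketch of an exactly-one-of (XOR) input check. It uses plain Python rather than the Actor's Pydantic model, and the field names community_url and community_name are hypothetical placeholders; the Actor's real input schema is not reproduced in this changelog.

```python
from typing import Any, Mapping

# Hypothetical field names, used only for illustration.
XOR_FIELDS = ("community_url", "community_name")


def check_exactly_one(payload: Mapping[str, Any]) -> str:
    """Return the single XOR field that is set, or raise ValueError."""
    # A field counts as "present" only if it holds a non-empty value.
    present = [name for name in XOR_FIELDS if payload.get(name)]
    if len(present) != 1:
        raise ValueError(
            f"expected exactly one of {XOR_FIELDS}, got {len(present)}"
        )
    return present[0]
```

With a check of this shape, an empty input such as {} raises before any network request is made, whereas a payload carrying one prefilled field passes and lets the run proceed.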
[0.1.0] — 2026-05-16 (published)
Published
Pushed to Apify Store as DevilScrapes/lemmy-community-scraper (actor id: 4yyYOxupijLPfFJIJ), build 0.1.1.
Cloud QA: run jsoPuEEcvAt1DAb5f — SUCCEEDED, 17 rows (5 posts + 12 comments), discriminator field present.
[0.1.0] — 2026-05-16
Added
Initial release: posts plus optional comments for each public Lemmy community, with cursor-paginated posts and integer-paginated comments fetched via the public /api/v3/ REST API.
Discriminated ResultRow: row_type="post" or "comment" in one dataset, with community + parent-post metadata denormalised onto every comment row.
17-token sort enum, compatible with Lemmy v0.19 (a bare Top is rejected at input).
comment_parent_id derived from comment.path for downstream tree reconstruction.
Three PPE events: actor-start ($0.05), result-row ($0.002), result-row-comment ($0.001).
Pydantic v2 input + output models per ADR-0004.
Part of the Federated Social Suite alongside bluesky-starter-pack and bluesky-feed-posts.
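The comment_parent_id derivation listed above can be sketched as follows. This is an illustration, not the Actor's code: it assumes comment.path is a dot-separated chain of integer IDs that starts at a root marker 0 and ends with the comment's own ID (for example "0.101.205"), and the function name parent_id_from_path is hypothetical.

```python
from typing import Optional


def parent_id_from_path(path: str) -> Optional[int]:
    """Derive the parent comment ID from a dot-separated comment path.

    Assumes the path looks like "0.<ancestor>...<parent>.<self>",
    where the leading 0 marks the root rather than a real comment.
    """
    ids = [int(part) for part in path.split(".")]
    if len(ids) < 2:
        raise ValueError(f"unexpected comment path: {path!r}")
    parent = ids[-2]
    # A parent of 0 means the comment replies directly to the post.
    return parent if parent != 0 else None
```

Under these assumptions, top-level comments get no parent ID, and every reply points to the comment directly above it, which is enough to rebuild the tree downstream by grouping rows on comment_parent_id.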