# 👾 Lemmy Scraper - Federated Reddit Posts & Comments (`benthepythondev/lemmy-scraper`) Actor

Scrape Lemmy (the federated Reddit alternative) from any instance via the public API — no login needed. Get front-page or per-community posts, comments, keyword search, and community data. Clean JSON with scores, upvotes & comment counts.

- **URL**: https://apify.com/benthepythondev/lemmy-scraper.md
- **Developed by:** [ben](https://apify.com/benthepythondev) (community)
- **Categories:** Social media
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is priced per platform usage: the Actor itself is free, and you pay only for Apify platform usage, which gets cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anywhere from a few seconds to a few hours,
and optionally produces well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, an API, or an MCP server.
"Actor" is always written with a capital "A".

## How to integrate an Actor?

When integrating an Actor, adapt to the project's stack and aim for an integration that is safe, well documented, and production-ready.
The recommended approaches are as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) or as [Markdown full text](https://docs.apify.com/llms-full.txt).


# README

## 👾 Lemmy Scraper — Posts, Comments & Communities (Federated Reddit)

Extract **Lemmy** data — the open, federated **Reddit alternative** — from **any
instance** (lemmy.world, lemmy.ml, sh.itjust.works, beehaw.org and more) through the
public API. Pull front-page or per-community posts, comments, keyword search results,
or community data as clean, structured JSON with Reddit-style scores,
upvotes/downvotes and comment counts — **no login required**. Perfect for communities
that left Reddit and for open-social research. Export to JSON/CSV/Excel, run on a
schedule, call via API, or connect to Make, Zapier or n8n.

### 👾 What is the Lemmy Scraper?

It turns any Lemmy instance into a structured dataset. Point it at a server, pick a
mode (front-page posts, posts from specific communities, comments, a keyword search,
or a list of communities), set a sort order, and it returns matching records, up to
your `maxItems` cap, straight from Lemmy's public REST API. Query the whole federated
network or just one instance, and reach cross-instance communities like
`technology@lemmy.world`. Because it calls a JSON API instead of driving a headless
browser, runs are fast and cheap.
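For orientation, a post listing on Lemmy is a single HTTP GET. The sketch below only builds such a URL, assuming Lemmy's v3 API (`/api/v3/post/list` with `type_`, `sort`, `limit`, `page` and an optional `community_name`, as in Lemmy 0.19). It illustrates the public API, not this Actor's source, and `build_post_list_url` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import urlencode


def build_post_list_url(
    instance: str,
    sort: str = "Hot",
    listing_type: str = "All",
    community: Optional[str] = None,
    page: int = 1,
    limit: int = 50,
) -> str:
    """Build a Lemmy v3 post-list URL (assumed endpoint; hypothetical helper)."""
    # type_ maps to the Actor's listingType ("All" or "Local").
    params = {"type_": listing_type, "sort": sort, "limit": limit, "page": page}
    if community:
        # Assumption: Lemmy resolves "name@instance" for federated communities.
        params["community_name"] = community
    return f"https://{instance}/api/v3/post/list?{urlencode(params)}"


print(build_post_list_url("lemmy.world", sort="TopWeek", community="technology"))
```

Under these assumptions, a paginating client increments `page` until a page comes back empty; the sketch defaults `limit` to 50, which is Lemmy's usual per-request maximum.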

#### What data does it extract?

- **Post title, body and the link URL** it points to
- **Reddit-style metrics** — score, upvotes, downvotes and comment count
- **Community info** — name, title and community URL
- **Creator (author) name and profile URL**
- **Publish date, NSFW flag and thumbnail image**
- **Comments** — content, score, parent post title and author (comments mode)
- **Community listings** — description, subscribers, post/comment totals and monthly active users
- **Canonical post/comment URLs** (`ap_id`), plus a `scraped_at` timestamp

### ⬇️ Input

Choose an `instance` and a `mode`, then add `communities`, a `query` or a `sort` as
needed:

| Field | Description |
|-------|-------------|
| `mode` | `posts`, `community`, `search`, `comments` or `communities` |
| `instance` | Lemmy server to query, e.g. `lemmy.world`, `lemmy.ml`, `beehaw.org` |
| `communities` | Community names, e.g. `technology`, `asklemmy`, or cross-instance `technology@lemmy.world` |
| `query` | Keyword to search: posts in `search` mode, community names in `communities` mode |
| `sort` | `Hot`, `Active`, `New`, `TopDay`, `TopWeek`, `TopMonth`, `TopYear`, `TopAll` or `MostComments` |
| `listingType` | `All` (whole federated network) or `Local` (this instance only) |
| `maxItems` | Max records to return (1–50000) |
| `proxyConfiguration` | Optional Apify Proxy for IP rotation on large runs |
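The constraints in this table can be checked client-side before starting a run. This is a minimal sketch that mirrors the enums, defaults and the 1 to 50000 range from the input schema further down; `validate_input` is a hypothetical helper, and the rules that `community` mode needs `communities` and `search` mode needs a `query` are assumptions rather than documented behavior.

```python
MODES = {"posts", "community", "search", "comments", "communities"}
SORTS = {"Hot", "Active", "New", "TopDay", "TopWeek", "TopMonth",
         "TopYear", "TopAll", "MostComments"}
LISTING_TYPES = {"All", "Local"}


def validate_input(run_input: dict) -> dict:
    """Return a copy of run_input with schema defaults applied, or raise ValueError."""
    # Defaults taken from the Actor's input schema.
    cfg = {"mode": "posts", "instance": "lemmy.world", "sort": "Hot",
           "listingType": "All", "maxItems": 100, **run_input}
    if cfg["mode"] not in MODES:
        raise ValueError(f"unknown mode: {cfg['mode']!r}")
    if cfg["sort"] not in SORTS:
        raise ValueError(f"unknown sort: {cfg['sort']!r}")
    if cfg["listingType"] not in LISTING_TYPES:
        raise ValueError(f"unknown listingType: {cfg['listingType']!r}")
    max_items = cfg["maxItems"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_items, bool) or not isinstance(max_items, int) \
            or not 1 <= max_items <= 50000:
        raise ValueError("maxItems must be an integer between 1 and 50000")
    # The leading "!" on community names is optional, so strip it.
    cfg["communities"] = [c.strip().lstrip("!") for c in cfg.get("communities", [])]
    # Assumed rules (not documented by the Actor).
    if cfg["mode"] == "community" and not cfg["communities"]:
        raise ValueError("community mode needs at least one entry in communities")
    if cfg["mode"] == "search" and not cfg.get("query"):
        raise ValueError("search mode needs a query")
    return cfg
```

For example, `validate_input({"mode": "community", "communities": ["!technology"]})` returns the input with `sort: "Hot"`, `listingType: "All"` and `maxItems: 100` filled in and the `!` removed.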

#### Example input

```json
{
  "mode": "community",
  "instance": "lemmy.world",
  "communities": ["technology", "asklemmy"],
  "sort": "TopWeek",
  "maxItems": 500
}
````

### ⬆️ Output

Every post (or comment/community) is one clean row — view it as a **table**, or export
**JSON / CSV / Excel**:

```json
{
  "type": "post",
  "id": 48685969,
  "title": "Self-hosting is easier than ever in 2026",
  "body": "Here's my setup...",
  "link_url": "https://example.com/article",
  "post_url": "https://lemmy.world/post/48685969",
  "published": "2026-06-26T09:00:00Z",
  "nsfw": false,
  "score": 842,
  "upvotes": 901,
  "downvotes": 59,
  "comments_count": 137,
  "community_name": "technology",
  "community_title": "Technology",
  "community_url": "https://lemmy.world/c/technology",
  "creator_name": "alice",
  "creator_url": "https://lemmy.world/u/alice",
  "thumbnail_url": "https://lemmy.world/pictrs/image/abc.jpg",
  "scraped_at": "2026-06-26T15:30:00.000Z"
}
```
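To make the fields concrete, here is a sketch of how one Lemmy post record could be flattened into a row shaped like the example above. It assumes the Lemmy 0.19 `PostView` layout (`post`, `counts`, `community`, `creator`, with `post.name` as the title and `actor_id` as profile/community URLs); the Actor's real mapping is not published, and `flatten_post_view` is a hypothetical helper.

```python
from datetime import datetime, timezone


def flatten_post_view(view: dict) -> dict:
    """Map one Lemmy PostView (assumed 0.19 field names) to a flat output row."""
    post = view["post"]
    counts = view.get("counts", {})
    community = view.get("community", {})
    creator = view.get("creator", {})
    return {
        "type": "post",
        "id": post["id"],
        "title": post.get("name"),          # Lemmy stores the post title as "name"
        "body": post.get("body"),
        "link_url": post.get("url"),
        "post_url": post.get("ap_id"),      # canonical ActivityPub URL
        "published": post.get("published"),
        "nsfw": post.get("nsfw", False),
        "score": counts.get("score"),
        "upvotes": counts.get("upvotes"),
        "downvotes": counts.get("downvotes"),
        "comments_count": counts.get("comments"),
        "community_name": community.get("name"),
        "community_title": community.get("title"),
        "community_url": community.get("actor_id"),
        "creator_name": creator.get("name"),
        "creator_url": creator.get("actor_id"),
        "thumbnail_url": post.get("thumbnail_url"),
        # UTC timestamp of this call, formatted like "2026-06-26T15:30:00.000Z".
        "scraped_at": datetime.now(timezone.utc)
        .isoformat(timespec="milliseconds")
        .replace("+00:00", "Z"),
    }
```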

### 💡 Use cases

- **👂 Community & topic monitoring:** track discussions about a product, brand or topic across the fediverse.
- **🔄 Reddit-migration research:** follow the communities and audiences that moved off Reddit.
- **📈 Trend & sentiment analysis:** feed posts and comments straight into an LLM.
- **🔥 Content discovery:** surface the top posts by community and time window with one sort setting.
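Beyond the server-side `sort`, the exported rows can also be ranked locally. As a sketch of the content-discovery case, this groups rows by `community_name` and keeps the highest `score` per community, using only field names from the output example; `top_posts_by_community` is a hypothetical helper, and it assumes post rows carry `"type": "post"` as shown above.

```python
from collections import defaultdict


def top_posts_by_community(rows, n=3):
    """Group post rows by community_name and keep the n highest-scoring in each."""
    groups = defaultdict(list)
    for row in rows:
        # Skip comment/community rows; assumes posts are tagged type == "post".
        if row.get("type") == "post":
            groups[row["community_name"]].append(row)
    return {
        name: sorted(posts, key=lambda r: r.get("score") or 0, reverse=True)[:n]
        for name, posts in groups.items()
    }
```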

### ❓ FAQ

**How do I scrape Lemmy posts?** Set `mode: posts` for the front page, or
`mode: community` with one or more `communities`, choose an `instance` and a `sort`,
then run the Actor. Each post comes back with its title, body, link, scores and comment count.

**Do I need an API key or login?** No — public posts, comments and communities all
work with no login, straight from Lemmy's public REST API.

**Does it work on any instance, and is it federated?** Yes. Point `instance` at any
Lemmy server. With `listingType: All` it covers every community the chosen instance
knows about across the federated network; with `Local` it stays on that one instance.
`lemmy.world` is the largest starting point.

**Can I scrape a community on another instance?** Yes — use `community@instance`
(e.g. `technology@lemmy.world`), since Lemmy is federated and resolves it for you.
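A reference like `technology@lemmy.world` names both the community and its home instance. This sketch splits such a reference, assuming that a bare name means a community on the queried `instance`; `parse_community_ref` is hypothetical and not part of the Actor, which resolves these references itself.

```python
from typing import Tuple


def parse_community_ref(ref: str, default_instance: str) -> Tuple[str, str]:
    """Split 'name@instance' (optionally prefixed with '!') into (name, home instance)."""
    ref = ref.strip().lstrip("!")
    name, sep, host = ref.partition("@")
    if not name:
        raise ValueError(f"empty community name in {ref!r}")
    # Without an '@host' part, assume the community lives on the queried instance.
    return name, (host if sep and host else default_instance)
```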

**Can I get comments, not just posts?** Yes — `mode: comments` returns comments per
community (via `communities`) or instance-wide, with content, score, the parent post
title and the author.

**How do I find communities to scrape?** Use `mode: communities` with a `query` to
search community names, or leave the query empty to list the instance's top
communities with subscriber and activity counts.

**How many records can it return?** Up to your `maxItems` cap (up to 50,000); it
paginates automatically and, in community/comments modes, splits the cap across the
communities you give it.
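The exact splitting strategy is not documented. Assuming an even split with any remainder going to the first communities in the list, and a page size of 50 (both assumptions), the arithmetic is simple: `maxItems: 500` over `technology` and `asklemmy` gives each community 250 items, or 5 pages of 50. `split_quota` is a hypothetical helper.

```python
import math
from typing import Dict, List


def split_quota(max_items: int, communities: List[str], page_size: int = 50) -> Dict[str, dict]:
    """Split max_items as evenly as possible; report the pages each community needs."""
    if not communities:
        raise ValueError("need at least one community")
    base, extra = divmod(max_items, len(communities))
    plan = {}
    for i, name in enumerate(communities):
        # The first `extra` communities absorb the remainder, one item each.
        quota = base + (1 if i < extra else 0)
        plan[name] = {"quota": quota, "pages": math.ceil(quota / page_size)}
    return plan
```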

**Can I run it on a schedule or via API?** Yes — schedule recurring runs in Apify,
call it via the API/SDK, or connect it to Make, Zapier or n8n.

**Is scraping Lemmy legal?** It reads publicly available data via Lemmy's own public
API. Use it responsibly for research and monitoring, and follow applicable laws and
each instance's terms.

### 🔗 You might also like

- **[Bluesky Scraper](https://apify.com/benthepythondev/bluesky-scraper)** — posts, profiles, followers & search
- **[Mastodon Scraper](https://apify.com/benthepythondev/mastodon-scraper)** — posts, hashtags & trends from any instance
- **[Reddit Scraper](https://apify.com/benthepythondev/reddit-scraper)** — posts, comments & communities
- **[Hacker News Intelligence](https://apify.com/benthepythondev/hacker-news-intelligence)** — stories, comments & trends

***

**Keywords:** Lemmy scraper, Lemmy API, fediverse scraper, federated Reddit, Reddit alternative scraper, Lemmy posts, Lemmy comments, Lemmy communities, ActivityPub, lemmy.world scraper, social media scraper, social listening, sentiment analysis, open social data, Reddit migration.

# Actor input Schema

## `mode` (type: `string`):

posts = front-page/instance posts; community = posts from specific communities; search = keyword search; comments = comments; communities = list/search communities.

## `instance` (type: `string`):

The Lemmy server to query, e.g. 'lemmy.world', 'lemmy.ml', 'sh.itjust.works', 'beehaw.org'. Public data needs no login.

## `communities` (type: `array`):

Community names, e.g. 'technology', 'asklemmy', or cross-instance 'technology@lemmy.world'. The leading ! is optional.

## `query` (type: `string`):

Keyword to search (search mode = posts; communities mode = community names).

## `sort` (type: `string`):

Sort order for posts/comments.

## `listingType` (type: `string`):

All = whole federated network; Local = only this instance's communities.

## `maxItems` (type: `integer`):

Maximum number of records to return.

## `proxyConfiguration` (type: `object`):

Optional. Lemmy's API is public, so a proxy is not required; Apify Proxy (auto) is fine for IP rotation on large runs.

## Actor input object example

```json
{
  "mode": "posts",
  "instance": "lemmy.world",
  "sort": "Hot",
  "listingType": "All",
  "maxItems": 100,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "mode": "posts",
    "instance": "lemmy.world",
    "proxyConfiguration": {
        "useApifyProxy": true
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("benthepythondev/lemmy-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "mode": "posts",
    "instance": "lemmy.world",
    "proxyConfiguration": { "useApifyProxy": True },
}

# Run the Actor and wait for it to finish
run = client.actor("benthepythondev/lemmy-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
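Dataset items are plain dicts, so the standard library can turn them into a CSV file. This sketch reuses the same `apify-client` calls as the example above and reads the token from an `APIFY_TOKEN` environment variable (a choice made here, not a requirement). `run_lemmy_scraper` and `write_csv` are illustrative helpers, not part of the client library.

```python
import csv
import os
from typing import Iterable, List


def run_lemmy_scraper(token: str, run_input: dict) -> List[dict]:
    """Run the Actor and return its dataset items (needs `pip install apify-client`)."""
    # Imported lazily so write_csv can be used without apify-client installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor("benthepythondev/lemmy-scraper").call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


def write_csv(items: Iterable[dict], path: str) -> int:
    """Write rows to CSV using the union of keys (first-seen order); return the row count."""
    rows = list(items)
    columns: List[str] = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns)  # missing keys become ""
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    token = os.environ.get("APIFY_TOKEN")
    if token:
        items = run_lemmy_scraper(token, {"mode": "community", "instance": "lemmy.world",
                                          "communities": ["technology"], "maxItems": 100})
        print(write_csv(items, "lemmy_posts.csv"), "rows written")
```

Taking the union of keys matters when one dataset mixes row types (posts, comments, communities) with different fields.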

## CLI example

```bash
echo '{
  "mode": "posts",
  "instance": "lemmy.world",
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}' |
apify call benthepythondev/lemmy-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=benthepythondev/lemmy-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "👾 Lemmy Scraper - Federated Reddit Posts & Comments",
        "description": "Scrape Lemmy (the federated Reddit alternative) from any instance via the public API — no login needed. Get front-page or per-community posts, comments, keyword search, and community data. Clean JSON with scores, upvotes & comment counts.",
        "version": "0.1",
        "x-build-id": "EvFYpF7bxo7tgpqOV"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/benthepythondev~lemmy-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-benthepythondev-lemmy-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/benthepythondev~lemmy-scraper/runs": {
            "post": {
                "operationId": "runs-sync-benthepythondev-lemmy-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/benthepythondev~lemmy-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-benthepythondev-lemmy-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "mode": {
                        "title": "What to scrape",
                        "enum": [
                            "posts",
                            "community",
                            "search",
                            "comments",
                            "communities"
                        ],
                        "type": "string",
                        "description": "posts = front-page/instance posts; community = posts from specific communities; search = keyword search; comments = comments; communities = list/search communities.",
                        "default": "posts"
                    },
                    "instance": {
                        "title": "Lemmy instance",
                        "type": "string",
                        "description": "The Lemmy server to query, e.g. 'lemmy.world', 'lemmy.ml', 'sh.itjust.works', 'beehaw.org'. Public data needs no login.",
                        "default": "lemmy.world"
                    },
                    "communities": {
                        "title": "Communities (for community / comments mode)",
                        "type": "array",
                        "description": "Community names, e.g. 'technology', 'asklemmy', or cross-instance 'technology@lemmy.world'. The leading ! is optional.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "query": {
                        "title": "Search query",
                        "type": "string",
                        "description": "Keyword to search (search mode = posts; communities mode = community names)."
                    },
                    "sort": {
                        "title": "Sort",
                        "enum": [
                            "Hot",
                            "Active",
                            "New",
                            "TopDay",
                            "TopWeek",
                            "TopMonth",
                            "TopYear",
                            "TopAll",
                            "MostComments"
                        ],
                        "type": "string",
                        "description": "Sort order for posts/comments.",
                        "default": "Hot"
                    },
                    "listingType": {
                        "title": "Listing type",
                        "enum": [
                            "All",
                            "Local"
                        ],
                        "type": "string",
                        "description": "All = whole federated network; Local = only this instance's communities.",
                        "default": "All"
                    },
                    "maxItems": {
                        "title": "Max items",
                        "minimum": 1,
                        "maximum": 50000,
                        "type": "integer",
                        "description": "Maximum number of records to return.",
                        "default": 100
                    },
                    "proxyConfiguration": {
                        "title": "Proxy",
                        "type": "object",
                        "description": "Optional. Lemmy's API is public, so a proxy is not required; Apify Proxy (auto) is fine for IP rotation on large runs.",
                        "default": {
                            "useApifyProxy": true
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
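Without an official client, the `run-sync-get-dataset-items` endpoint from the specification above can be called directly. This Python sketch uses only the standard library to show the request shape: a POST with a JSON body matching `inputSchema` and the `token` query parameter. `build_run_sync_request` and `run_sync` are illustrative names, and the 300-second value is only a client-side socket timeout chosen here.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "benthepythondev~lemmy-scraper"  # "/" in the Actor ID becomes "~" in URLs


def build_run_sync_request(token: str, run_input: dict) -> urllib.request.Request:
    """Build a POST to run-sync-get-dataset-items with the token as a query parameter."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sync(token: str, run_input: dict, timeout: float = 300) -> list:
    """Send the request and decode the JSON array of dataset items."""
    request = build_run_sync_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```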
