Pricing

from $0.35 / 1,000 posts

Substack Newsletter Content Scraper

Scrape Substack newsletter posts, authors, dates, likes, comments, restacks, and article text. Built for content research, competitor tracking, and AI-ready datasets.

Rating

2.6

(2)

Developer

LIAICHI MUSTAPHA

Actor stats

Last modified

8 days ago

Categories

Social media

News

You can access the Substack Newsletter Content Scraper programmatically from your own applications through the Apify API. To use the Apify API, you need an Apify account and your API token, which you can find under Integrations settings in Apify Console.
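As a rough illustration of programmatic access, the sketch below builds a POST request for the `run-sync-get-dataset-items` endpoint listed in the OpenAPI definition on this page, using only the Python standard library. The function names `build_run_request` and `run_and_fetch_items` are hypothetical helpers, not part of any Apify client, and the sketch assumes the response body is JSON, which the definition does not spell out.

```python
import json
import urllib.parse
import urllib.request

# Server URL and Actor path are taken from the OpenAPI definition on this page.
API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "scraper_guru~substack-scraper"


def build_run_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that runs the Actor synchronously.

    Hypothetical helper: the token is passed as the `token` query parameter
    and the Actor input as a JSON request body, as the definition describes.
    """
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items?{query}"
    body = json.dumps(actor_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_fetch_items(token: str, actor_input: dict, timeout: float = 300.0):
    """Send the request and decode the response.

    Assumption: the response body is JSON. The timeout value is arbitrary.
    """
    request = build_run_request(token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (requires a real token and network access):
# items = run_and_fetch_items(
#     "<YOUR_APIFY_TOKEN>",
#     {"substackUrls": ["https://newsletter.substack.com"]},
# )
```

Keeping request construction separate from sending makes it possible to inspect the URL and body without spending any Actor usage.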

{
  "openapi": "3.0.1",
  "info": {
    "version": "1.0",
    "x-build-id": "UZ0TlNZYATaMun0ks"
  },
  "servers": [
    {
      "url": "https://api.apify.com/v2"
    }
  ],
  "paths": {
    "/acts/scraper_guru~substack-scraper/run-sync-get-dataset-items": {
      "post": {
        "operationId": "run-sync-get-dataset-items-scraper_guru-substack-scraper",
        "x-openai-isConsequential": false,
        "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
        "tags": [
          "Run Actor"
        ],
        "requestBody": {
          "required": true,
          "content": {
            "application/json": {
              "schema": {
                "$ref": "#/components/schemas/inputSchema"
              }
            }
          }
        },
        "parameters": [
          {
            "name": "token",
            "in": "query",
            "required": true,
            "schema": {
              "type": "string"
            },
            "description": "Enter your Apify token here"
          }
        ],
        "responses": {
          "200": {
            "description": "OK"
          }
        }
      }
    },
    "/acts/scraper_guru~substack-scraper/runs": {
      "post": {
        "operationId": "runs-sync-scraper_guru-substack-scraper",
        "x-openai-isConsequential": false,
        "summary": "Executes an Actor and returns information about the initiated run in response.",
        "tags": [
          "Run Actor"
        ],
        "requestBody": {
          "required": true,
          "content": {
            "application/json": {
              "schema": {
                "$ref": "#/components/schemas/inputSchema"
              }
            }
          }
        },
        "parameters": [
          {
            "name": "token",
            "in": "query",
            "required": true,
            "schema": {
              "type": "string"
            },
            "description": "Enter your Apify token here"
          }
        ],
        "responses": {
          "200": {
            "description": "OK",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/runsResponseSchema"
                }
              }
            }
          }
        }
      }
    },
    "/acts/scraper_guru~substack-scraper/run-sync": {
      "post": {
        "operationId": "run-sync-scraper_guru-substack-scraper",
        "x-openai-isConsequential": false,
        "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
        "tags": [
          "Run Actor"
        ],
        "requestBody": {
          "required": true,
          "content": {
            "application/json": {
              "schema": {
                "$ref": "#/components/schemas/inputSchema"
              }
            }
          }
        },
        "parameters": [
          {
            "name": "token",
            "in": "query",
            "required": true,
            "schema": {
              "type": "string"
            },
            "description": "Enter your Apify token here"
          }
        ],
        "responses": {
          "200": {
            "description": "OK"
          }
        }
      }
    }
  },
  "components": {
    "schemas": {
      "inputSchema": {
        "type": "object",
        "required": [
          "substackUrls"
        ],
        "properties": {
          "substackUrls": {
            "title": "Substack URLs",
            "minItems": 1,
            "type": "array",
            "description": "List of Substack newsletter URLs to scrape. Format: https://newsletter.substack.com",
            "items": {
              "type": "string"
            }
          },
          "scrapingMethod": {
            "title": "Scraping Method",
            "enum": [
              "sitemap",
              "archive"
            ],
            "type": "string",
            "description": "Choose how to discover posts: Sitemap (faster, recommended) or Archive page (fallback if sitemap unavailable)",
            "default": "sitemap"
          },
          "maxPostsPerSubstack": {
            "title": "Max Posts Per Substack",
            "minimum": 0,
            "maximum": 10000,
            "type": "integer",
            "description": "Maximum number of posts to scrape from each Substack. Set to 0 for unlimited.",
            "default": 50
          },
          "batchSize": {
            "title": "Batch Size",
            "minimum": 1,
            "maximum": 100,
            "type": "integer",
            "description": "Number of Substacks to process in each batch. Lower values = more stable.",
            "default": 5
          },
          "postConcurrency": {
            "title": "Post Concurrency",
            "minimum": 1,
            "maximum": 10,
            "type": "integer",
            "description": "Number of post pages to scrape in parallel for each Substack. Lower values reduce timeouts.",
            "default": 3
          },
          "generateNewsletterDigest": {
            "title": "Generate Newsletter Digest",
            "type": "boolean",
            "description": "Build and save an optional HTML newsletter digest after scraping. Disabled by default for faster, more reliable runs.",
            "default": false
          }
        }
      },
      "runsResponseSchema": {
        "type": "object",
        "properties": {
          "data": {
            "type": "object",
            "properties": {
              "id": {
                "type": "string"
              },
              "actId": {
                "type": "string"
              },
              "userId": {
                "type": "string"
              },
              "startedAt": {
                "type": "string",
                "format": "date-time",
                "example": "2025-01-08T00:00:00.000Z"
              },
              "finishedAt": {
                "type": "string",
                "format": "date-time",
                "example": "2025-01-08T00:00:00.000Z"
              },
              "status": {
                "type": "string",
                "example": "READY"
              },
              "meta": {
                "type": "object",
                "properties": {
                  "origin": {
                    "type": "string",
                    "example": "API"
                  },
                  "userAgent": {
                    "type": "string"
                  }
                }
              },
              "stats": {
                "type": "object",
                "properties": {
                  "inputBodyLen": {
                    "type": "integer",
                    "example": 2000
                  },
                  "rebootCount": {
                    "type": "integer",
                    "example": 0
                  },
                  "restartCount": {
                    "type": "integer",
                    "example": 0
                  },
                  "resurrectCount": {
                    "type": "integer",
                    "example": 0
                  },
                  "computeUnits": {
                    "type": "integer",
                    "example": 0
                  }
                }
              },
              "options": {
                "type": "object",
                "properties": {
                  "build": {
                    "type": "string",
                    "example": "latest"
                  },
                  "timeoutSecs": {
                    "type": "integer",
                    "example": 300
                  },
                  "memoryMbytes": {
                    "type": "integer",
                    "example": 1024
                  },
                  "diskMbytes": {
                    "type": "integer",
                    "example": 2048
                  }
                }
              },
              "buildId": {
                "type": "string"
              },
              "defaultKeyValueStoreId": {
                "type": "string"
              },
              "defaultDatasetId": {
                "type": "string"
              },
              "defaultRequestQueueId": {
                "type": "string"
              },
              "buildNumber": {
                "type": "string",
                "example": "1.0.0"
              },
              "containerUrl": {
                "type": "string"
              },
              "usage": {
                "type": "object",
                "properties": {
                  "ACTOR_COMPUTE_UNITS": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATASET_READS": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATASET_WRITES": {
                    "type": "integer",
                    "example": 0
                  },
                  "KEY_VALUE_STORE_READS": {
                    "type": "integer",
                    "example": 0
                  },
                  "KEY_VALUE_STORE_WRITES": {
                    "type": "integer",
                    "example": 1
                  },
                  "KEY_VALUE_STORE_LISTS": {
                    "type": "integer",
                    "example": 0
                  },
                  "REQUEST_QUEUE_READS": {
                    "type": "integer",
                    "example": 0
                  },
                  "REQUEST_QUEUE_WRITES": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATA_TRANSFER_INTERNAL_GBYTES": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATA_TRANSFER_EXTERNAL_GBYTES": {
                    "type": "integer",
                    "example": 0
                  },
                  "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                    "type": "integer",
                    "example": 0
                  },
                  "PROXY_SERPS": {
                    "type": "integer",
                    "example": 0
                  }
                }
              },
              "usageTotalUsd": {
                "type": "number",
                "example": 0.00005
              },
              "usageUsd": {
                "type": "object",
                "properties": {
                  "ACTOR_COMPUTE_UNITS": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATASET_READS": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATASET_WRITES": {
                    "type": "integer",
                    "example": 0
                  },
                  "KEY_VALUE_STORE_READS": {
                    "type": "integer",
                    "example": 0
                  },
                  "KEY_VALUE_STORE_WRITES": {
                    "type": "number",
                    "example": 0.00005
                  },
                  "KEY_VALUE_STORE_LISTS": {
                    "type": "integer",
                    "example": 0
                  },
                  "REQUEST_QUEUE_READS": {
                    "type": "integer",
                    "example": 0
                  },
                  "REQUEST_QUEUE_WRITES": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATA_TRANSFER_INTERNAL_GBYTES": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATA_TRANSFER_EXTERNAL_GBYTES": {
                    "type": "integer",
                    "example": 0
                  },
                  "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                    "type": "integer",
                    "example": 0
                  },
                  "PROXY_SERPS": {
                    "type": "integer",
                    "example": 0
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
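The `inputSchema` above can be mirrored in a small client-side check before a run is started. The sketch below is one possible way to do that: the constraint values (required `substackUrls`, the `scrapingMethod` enum, the integer ranges, and the defaults) are copied from the schema, while the function name `prepare_input` and the decision to fill in defaults on the client are assumptions of this example, not documented behavior of the Actor.

```python
# Values below are copied from the inputSchema in the OpenAPI definition.
ALLOWED_METHODS = ("sitemap", "archive")

DEFAULTS = {
    "scrapingMethod": "sitemap",
    "maxPostsPerSubstack": 50,
    "batchSize": 5,
    "postConcurrency": 3,
    "generateNewsletterDigest": False,
}

INTEGER_RANGES = {
    "maxPostsPerSubstack": (0, 10000),  # 0 means unlimited
    "batchSize": (1, 100),
    "postConcurrency": (1, 10),
}


def prepare_input(raw: dict) -> dict:
    """Fill in schema defaults and raise ValueError on a constraint violation.

    Hypothetical helper for illustration only.
    """
    urls = raw.get("substackUrls")
    if not isinstance(urls, list) or len(urls) < 1:
        raise ValueError("substackUrls must be a list with at least one URL")
    if not all(isinstance(url, str) for url in urls):
        raise ValueError("every entry in substackUrls must be a string")

    # Caller-supplied values override the defaults.
    prepared = {**DEFAULTS, **raw}

    if prepared["scrapingMethod"] not in ALLOWED_METHODS:
        raise ValueError(f"scrapingMethod must be one of {ALLOWED_METHODS}")

    for name, (low, high) in INTEGER_RANGES.items():
        value = prepared[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} must be an integer")
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")

    if not isinstance(prepared["generateNewsletterDigest"], bool):
        raise ValueError("generateNewsletterDigest must be a boolean")

    return prepared


actor_input = prepare_input({
    "substackUrls": ["https://newsletter.substack.com"],
    "maxPostsPerSubstack": 100,
})
print(actor_input)
```

Checking the input locally catches mistakes such as an empty URL list or an out-of-range `postConcurrency` before a request is made.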

Substack Newsletter Content Scraper OpenAPI definition

OpenAPI is a standard for designing and describing RESTful APIs, allowing developers to define API structure, endpoints, and data formats in a machine-readable way. It simplifies API development, integration, and documentation.

OpenAPI works well with AI agents and GPTs because it standardizes how these systems discover and call APIs, which makes integrations more reliable.

Because the specification is machine-readable, AI models such as GPTs can read an API's endpoints, parameters, and data formats directly instead of inferring them from prose documentation. This speeds up development, reduces integration errors, and helps models give context-aware responses.
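To make "machine-readable" concrete, the sketch below walks an OpenAPI document and lists each operation with its HTTP method, path, and `operationId`. It is a minimal illustration rather than a full OpenAPI parser: `list_operations` is a hypothetical helper, and `EXCERPT` is a shortened copy of two paths from the definition on this page.

```python
import json

# HTTP methods that may appear as keys of an OpenAPI 3.0 path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(spec: dict) -> list[dict]:
    """Return one summary dict per operation found under spec["paths"]."""
    operations = []
    for path, path_item in spec.get("paths", {}).items():
        for method, operation in path_item.items():
            # Skip path-level keys such as "parameters" or "summary".
            if method not in HTTP_METHODS:
                continue
            operations.append({
                "method": method.upper(),
                "path": path,
                "operationId": operation.get("operationId"),
                "summary": operation.get("summary"),
            })
    return operations


# Shortened excerpt of the definition shown on this page.
EXCERPT = json.loads("""
{
  "openapi": "3.0.1",
  "paths": {
    "/acts/scraper_guru~substack-scraper/run-sync-get-dataset-items": {
      "post": {
        "operationId": "run-sync-get-dataset-items-scraper_guru-substack-scraper",
        "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response."
      }
    },
    "/acts/scraper_guru~substack-scraper/runs": {
      "post": {
        "operationId": "runs-sync-scraper_guru-substack-scraper",
        "summary": "Executes an Actor and returns information about the initiated run in response."
      }
    }
  }
}
""")

for op in list_operations(EXCERPT):
    print(op["method"], op["path"], "->", op["operationId"])
```

An agent framework could use a listing like this to decide which endpoint to call, although real tooling would also resolve `$ref` pointers and read parameter schemas.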

You can download the OpenAPI definitions for Substack Newsletter Content Scraper from the options below:

OpenAPI.json

If you’d like to learn more about how OpenAPI powers GPTs, read our blog post.

You can also check out our other API clients:

Substack Newsletter Content Scraper API in Python

Substack Newsletter Content Scraper API in JavaScript

Substack Newsletter Content Scraper API through CLI

Substack Newsletter Content Scraper API

Substack Posts Scraper for Newsletter Research

skootle/substack-posts

Scrape Substack posts, authors, publication names, dates, excerpts, URLs, and metadata for newsletter research, creator tracking, content monitoring, and AI agents.

Skootle

Substack Scraper - Download Newsletter Content Fast

scrapers-hub/substack-scraper-download-newsletter-content-fast

Substack scraper extracts publicly available newsletter posts, titles, authors, publication dates, content, and metadata quickly 📰⚡ Perfect for content research, trend analysis, AI workflows, knowledge management, and newsletter monitoring.

Scrapers Hub

Substack Newsletter Scraper

scrapers-hub/substack-newsletter-scraper

Substack Newsletter scraper extracts publicly available newsletter posts, titles, authors, publication dates, subscriber-facing content, and metadata 📰📊 Perfect for content research, trend analysis, competitive intelligence, and newsletter monitoring.

Scrapers Hub

Substack Posts Scraper - Newsletter Data Extractor

klondikeking/substack-posts-scraper

Extract posts, engagement metrics, and newsletter data from Substack publications. Perfect for content research.

Pierrick McD0nald

Substack Newsletter Scraper

prince.sh/substack-scraper

Scrape Substack newsletter archives. Get post titles, body text, authors, and publish dates for any Substack publication. Perfect for content aggregation, news monitoring, writer research, and AI training datasets.

Prince Jain

Substack Posts Scraper - Newsletter Data

benthepythondev/substack-posts-scraper

Scrape public Substack newsletter posts from one or many publications. Extract titles, authors, dates, full content, images, categories and post URLs.

Ben

Substack Scraper — Newsletter Posts & Content

fast_api/substack-scraper

Extract Substack newsletter posts, titles, subtitles, publication dates, engagement metrics, and optional full text. Useful for media monitoring, creator research, market intelligence, and AI/RAG datasets.

Fast API

Substack Scraper — Posts, Engagement & Newsletter Analytics

samwise.agency/substack-scraper

Scrape any Substack publication: full post archives with likes, comments, restacks, paywall status, and optional full text. Works with custom domains. Clean JSON/CSV for newsletter research, competitive analysis, and content strategy.