Substack Scraper | $2 / 1k | All-In-One

Pricing: from $1.99 / 1,000 results

Get full articles, user profiles, and search results with the All-in-One Substack Scraper. Extract rich data including titles, bios, subscriber counts, social links, and engagement metrics. Ideal for market research, creator discovery, trend tracking, and audience analysis.
Developer: Fatih Tahta
Substack Scraper | All-In-One
Slug: fatihtahta/substack-scraper
Overview
Substack Scraper | All-In-One collects structured public data from https://substack.com, including search results, publications, people, posts, notes, comments, and reply-thread content where supported by the selected input. Records can include core identifiers, URLs, titles, publication metadata, author information, timestamps, engagement counts, and richer content fields when detail enrichment is enabled. Substack is a widely used publishing and newsletter platform, which makes its public content useful for market intelligence, editorial analysis, research, enrichment, and monitoring workflows. This actor is designed for repeatable, automated collection with consistent JSON output that can be used directly in downstream systems. It is suited to recurring data acquisition where users need a dependable, operationally clear workflow for collecting current public records over time.
Why Use This Actor
- Market research and analytics teams: collect structured Substack posts, publications, people, and recent discussions for market intelligence, topic tracking, coverage analysis, and operational reporting.
- Product and content teams: monitor subject trends, publication activity, post velocity, and audience-facing content patterns to support editorial planning and content benchmarking.
- Developers and data engineering teams: feed normalized JSON records into ETL jobs, search indexes, data warehouses, enrichment pipelines, and other downstream systems with minimal reshaping.
- Lead generation and enrichment workflows: gather public creator, publication, and content attributes that can supplement prospecting, segmentation, and account research processes.
- Monitoring and competitive tracking teams: run recurring collections to watch for newly published content, public conversation activity, and movement across tracked keywords or known pages.
Common Use Cases
- Market intelligence: track topic coverage, newly published posts, publication activity, and audience-visible conversation around specific themes.
- Lead generation: build curated lists of relevant writers, publications, or topic-specific public profiles for outreach research or enrichment.
- Competitive monitoring: watch how tracked publications or authors publish, position, and discuss topics over time.
- Catalog and directory building: populate internal databases with structured public publication, author, and post records.
- Data enrichment: append fresh public content and publication attributes to CRM, BI, or analytics datasets.
- Recurring reporting: schedule periodic runs to produce current datasets for dashboards, alerts, and trend reviews.
- Conversation tracking: collect notes, comments, and optional reply threads when the goal is to analyze public engagement rather than long-form posts alone.
Quick Start
- Choose your input strategy: add known `startUrls`, enter one or more `queries`, or combine both when you need discovery and direct collection in the same run.
- For a first validation run, set a small `limit` so you can quickly confirm the output shape and record types.
- Select the appropriate `result_type` for keyword searches, and add optional filters such as `publication_date`, `within_days`, or `language` if you need a narrower scope.
- Enable `enrich_data` when you want fuller records, and turn on `get_replies` only when reply-thread collection is relevant to explicit note or comment-thread targets.
- Run the actor in Apify Console and inspect the first dataset records.
- Increase coverage, refine filters, or add a schedule after the dataset matches your intended workflow.
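Once the console run looks right, the same validation run can be scripted. A minimal sketch using the Apify Python client (`pip install apify-client`); the input keys mirror the parameters documented below, and `build_run_input` / `run_and_fetch` are illustrative helper names, not part of the actor:

```python
def build_run_input(queries, result_type="posts", limit=20, enrich=True):
    """Assemble a small validation input from the documented parameters."""
    return {
        "queries": list(queries),
        "result_type": result_type,
        "enrich_data": enrich,
        "limit": limit,
    }

def run_and_fetch(token, run_input):
    """Start the actor and yield dataset items (requires the apify-client package)."""
    from apify_client import ApifyClient  # imported lazily; not needed to build inputs
    client = ApifyClient(token)
    run = client.actor("fatihtahta/substack-scraper").call(run_input=run_input)
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()
```

A capped input such as `build_run_input(["developer tools"], limit=5)` keeps the first validation cheap; inspect each item's `type` and `url` before scaling up.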
Input Parameters
Use startUrls for direct collection from known Substack pages, queries for keyword-driven discovery, or combine both in a single run.
| Parameter | Type | Description | Default |
|---|---|---|---|
| `startUrls` | array of strings | Exact Substack pages to collect from directly. Publication homepages are collected through the archive feed (`sort=new`, paged by offset/limit), while note/comment pages are treated as thread sources. Useful for known search pages, publication profiles, individual posts, notes, comment pages, or custom-domain pages. | – |
| `queries` | array of strings | One or more keyword searches. Each query becomes its own search target for discovery-oriented runs. | – |
| `result_type` | string | Result set to return for keyword searches only. Allowed values: `top`, `recent`, `posts`, `publications`, `people`. | `top` |
| `publication_date` | string | Optional recency filter for keyword-based post results. Allowed values: `last_day` (24 hours), `last_week` (7 days), `last_month`, `last_year`. | – |
| `within_days` | integer | Keep only posts published within the last N days. Minimum value: 1 day. | – |
| `language` | string | Restrict keyword-based post results to a specific language. Allowed values: `ar`, `zh`, `cs`, `nl`, `en`, `fr`, `de`, `el`, `hi`, `hu`, `id`, `it`, `ja`, `ko`, `la`, `no`, `pl`, `pt`, `ro`, `ru`, `es`, `sv`, `th`, `tr`, `vi`. | – |
| `enrich_data` | boolean | Fetch fuller records for supported posts, publications, or people instead of lighter search-result entries. Useful when detail is more important than run speed. | `true` |
| `get_replies` | boolean | Collect reply threads for supported note or comment detail pages. It does not expand publication homepage discovery into comment-thread output. | `false` |
| `max_replies` | integer | Maximum replies to save per post or thread. Use 0 to suppress replies when reply collection is enabled. | – |
| `limit` | integer | Per-input cap on the number of results saved for each `startUrls` entry or search query. Minimum value: 1. Leave empty for broader collection. | – |
| `proxyConfiguration` | object | Connection settings for Apify Proxy or custom proxy configuration when you need a different routing path. | `{"useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"]}` |
Choosing Inputs
Use startUrls when you already know the exact Substack pages you want to collect, such as a specific publication, post, search page, note, or curated list of known targets. Publication homepage startUrls collect posts from that publication through the archive API rather than a mixed homepage snapshot. Use queries when you want discovery-oriented collection based on topics, names, brands, or keywords.
Narrower filters such as result_type, publication_date, within_days, and language produce more targeted datasets, while broader settings improve discovery and surface more varied records. If you are validating a new workflow, start with a small limit, inspect the first records, and then increase coverage after confirming that the returned types and fields match your downstream needs.
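The date filters apply at collection time; when a run mixes filtered and unfiltered inputs, a matching local check can confirm the window downstream. A small sketch, assuming the ISO 8601 `published_at` values shown in the output examples; `within_last_days` is an illustrative helper, not an actor feature:

```python
from datetime import datetime, timedelta, timezone

def within_last_days(record, days):
    """True if the record's published_at falls within the last `days` days."""
    ts = record.get("published_at")
    if not ts:
        return False  # sparse records may omit optional timestamps
    # Assumes timezone-aware ISO 8601 strings like "2026-04-11T12:02:55.328000+00:00"
    return datetime.fromisoformat(ts) >= datetime.now(timezone.utc) - timedelta(days=days)
```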
Example Inputs
Scenario: keyword discovery for recent posts
    {
      "queries": ["artificial intelligence", "developer tools"],
      "result_type": "posts",
      "publication_date": "last_week",
      "language": "en",
      "enrich_data": true,
      "limit": 20
    }
Scenario: direct URL collection from known pages
    {
      "startUrls": ["https://substack.com/search/ai?searching=top", "https://www.platformer.news/"],
      "enrich_data": true,
      "get_replies": false,
      "within_days": 30,
      "limit": 15
    }
Scenario: targeted monitoring with reply collection
    {
      "queries": ["movie"],
      "result_type": "recent",
      "within_days": 7,
      "enrich_data": true,
      "get_replies": true,
      "max_replies": 25,
      "limit": 10
    }
Output
Output destination
The actor writes results to an Apify dataset as JSON records. The dataset is designed for direct consumption by analytics tools, ETL pipelines, and downstream APIs without post-processing.
Each item contains a stable record envelope plus a type-specific payload when multiple entity types are returned in the same run.
Record envelope (all items)
- type (string, required): record family such as `post` or `comment`.
- id (number, required): primary record identifier. Treat this as an opaque identifier in downstream systems because some record families may expose prefixed or string-serialized forms in the dataset.
- url (string, required): canonical public URL for the record.

Recommended idempotency key: `type + ":" + id`
Use that composite key for deduplication and upserts when loading repeated runs into warehouses, search indexes, CRMs, or operational databases. The stable envelope makes records easier to merge, deduplicate, and sync across recurring collections.
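Applied in code, the composite key makes repeated runs idempotent. A sketch of a dictionary-based upsert; a real warehouse or search index would use its own merge primitive:

```python
def idempotency_key(record):
    """Build the recommended composite key; id is treated as an opaque value."""
    return f'{record["type"]}:{record["id"]}'

def upsert(store, records):
    """Merge records into `store` keyed by the composite key, so a record
    re-collected in a later run overwrites its earlier copy instead of duplicating."""
    for record in records:
        store[idempotency_key(record)] = record
    return store
```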
Examples
Notes:
- The examples below match the current enriched dataset structure returned when `enrich_data` is enabled.
- Long `body_html` strings are abbreviated inside the value for readability, but the example records show the full field shape now returned by the actor.
Example: post (type = "post")
    {
      "id": "193819329",
      "type": "post",
      "title": "Substack Finally Introduced Notes Scheduling. This Is How To Do It.",
      "description": "Plus, at least FIVE ways you can use it to help your Substack grow and experiment intentionally.",
      "published_at": "2026-04-11T12:02:55.328000+00:00",
      "updated_at": "2026-04-11T12:10:13.090000+00:00",
      "url": "https://unstackit.substack.com/p/schedule-substack-notes",
      "author_name": "Kristi Keller \ud83c\udde8\ud83c\udde6",
      "author_handle": "kristikeller",
      "publication_name": "Unstack Substack",
      "publication_url": "https://unstackit.substack.com",
      "main_post_id": "193819329",
      "main_post_title": "Substack Finally Introduced Notes Scheduling. This Is How To Do It.",
      "main_post_url": "https://unstackit.substack.com/p/schedule-substack-notes",
      "main_post_published_at": "2026-04-11T12:02:55.328000+00:00",
      "subtitle": "Plus, at least FIVE ways you can use it to experiment and help your Substack grow intentionally.",
      "truncated_body_text": "This question has come up multiple times from several clients since Substack introduced the Notes feature:",
      "image_url": "https://substack-post-media.s3.amazonaws.com/public/images/626d8a7f-9db9-4f6e-933a-6663a9f9fbe8_2240x1260.png",
      "body_text": "This question has come up multiple times from several clients since Substack introduced the Notes feature:",
      "body_html": "<p>This question has come up multiple times from several clients since Substack introduced the Notes feature:</p><blockquote><p><em><strong>\u201cCan we schedule our notes to publish at a later date?\u201d</strong></em></p></blockquote><p>...</p>",
      "word_count": 549,
      "reaction_count": 60,
      "comment_count": 19,
      "child_comment_count": 11,
      "restack_count": 8,
      "post_tags": [{"id": "871d1cd1-6fa0-4a94-bf79-d4657addba2d", "publication_id": 2328962, "name": "Substack Notes", "slug": "substack-notes", "hidden": false}, {"id": "ed4d7b60-078b-404f-a2e9-21f49cf290f6", "publication_id": 2328962, "name": "Substack How-To", "slug": "substack-how-to", "hidden": false}],
      "published_bylines": [{"id": 165322060, "name": "Kristi Keller \ud83c\udde8\ud83c\udde6", "handle": "kristikeller", "photo_url": "https://substackcdn.com/image/fetch/$s_!TdHE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bee1713-6b2a-45fa-bbb6-e9437756af86_316x326.png", "bio": "Certified independent liberation advisor & builder of highly engaged Substack communities.", "is_guest": false}],
      "audio_items": [{"post_id": 193819329, "voice_id": "en-US-AlloyTurboMultilingualNeural", "audio_url": "https://substack-video.s3.amazonaws.com/video_upload/post/193819329/tts/12334368-98cb-45d6-a507-805e3a789351/en-US-AlloyTurboMultilingualNeural.mp3", "type": "tts", "status": "completed"}],
      "comments": [{"id": 241695637, "type": "comment", "body_text": "I scheduled a note everyday this week about tulips \ud83c\udf37 and it was quick and easy ...", "published_at": "2026-04-11T12:34:50.936000+00:00", "author_name": "Cerina Triglavcanin", "author_handle": "cerinatrig", "photo_url": "https://substack-post-media.s3.amazonaws.com/public/images/c3d05e18-ea68-4208-981e-47638e58953b_405x405.png", "reaction_count": 3, "children_count": 2, "childrenSummary": "2 replies by Kristi Keller \ud83c\udde8\ud83c\udde6 and others"}],
      "post": {"audience": "everyone", "canonical_url": "https://unstackit.substack.com/p/schedule-substack-notes", "id": 193819329, "post_date": "2026-04-11T12:02:55.328Z", "updated_at": "2026-04-11T12:10:13.090Z", "publication_id": 2328962, "search_engine_description": "FIVE ways you can use Notes scheduling to help your Substack grow and experiment intentionally.", "slug": "schedule-substack-notes", "subtitle": "Plus, at least FIVE ways you can use it to experiment and help your Substack grow intentionally.", "title": "Substack Finally Introduced Notes Scheduling. This Is How To Do It.", "type": "newsletter", "write_comment_permissions": "everyone", "is_published": true, "restacks": 8, "reactions": {"\u2764": 60}},
      "publication": {"id": 2328962, "name": "Unstack Substack", "subdomain": "unstackit", "logo_url": "https://substack-post-media.s3.amazonaws.com/public/images/9ec83ce1-c3b2-4b6f-87d9-41902e60a69f_500x500.png", "homepage_type": "magaziney", "community_enabled": true, "payments_state": "enabled"},
      "publication_settings": {"enable_new_publisher": true, "podcast_enabled": false},
      "page_rank": 0
    }
Example: comment (type = "comment")
    {
      "id": "c-247726222",
      "type": "comment",
      "title": "Movies that Made Me Love Movies: This one I got to take a friend to the drive-in and see it! I was so excited! ...",
      "description": "Movies that Made Me Love Movies: This one I got to take a friend to the drive-in and see it! I was so excited! ...",
      "published_at": "2026-04-23T00:49:31.944000+00:00",
      "url": "https://substack.com/@backyardmoviecritic/note/c-247726222",
      "author_name": "Backyard Movie Critic",
      "author_handle": "backyardmoviecritic",
      "publication_name": "Backyard Movie Critic",
      "publication_url": "https://backyardmoviecritic.substack.com",
      "image_url": "https://substack-post-media.s3.amazonaws.com/public/images/1edaf33e-19c0-4c26-ab04-0df6f0e34e3d_452x452.png",
      "body_text": "Movies that Made Me Love Movies: This one I got to take a friend to the drive-in and see it! I was so excited! ...",
      "body_json": {"type": "doc", "attrs": {"schemaVersion": "v1"}, "content": [{"type": "paragraph", "content": [{"type": "text", "text": "Movies that Made Me Love Movies:"}]}]},
      "attachments": [{"id": "3c033bf5-9636-4857-b762-dc298a726531", "type": "image", "imageUrl": "https://substack-post-media.s3.amazonaws.com/public/images/8057c6bf-2643-42a2-a5bd-800cc068e831_700x1050.png", "imageWidth": 700, "imageHeight": 1050, "explicit": false}],
      "reaction_count": 40,
      "comment_count": 6,
      "restack_count": 2,
      "context": {"timestamp": "2026-04-23T00:49:31.944Z", "type": "note", "source": "db-note"},
      "comment": {"id": 247726222, "body": "Movies that Made Me Love Movies: This one I got to take a friend to the drive-in and see it! I was so excited! ...", "body_json": {"type": "doc", "attrs": {"schemaVersion": "v1"}, "content": [{"type": "paragraph", "content": [{"type": "text", "text": "Movies that Made Me Love Movies:"}]}]}, "type": "feed", "date": "2026-04-23T00:49:31.944Z", "name": "Backyard Movie Critic", "photo_url": "https://substack-post-media.s3.amazonaws.com/public/images/b427d26c-e0cb-4908-b8b6-2033359c2565_399x399.jpeg", "bio": "Gen X film junkie: silent horror to streaming chaos. Grew up on VHS gore and HBO late nights.", "handle": "backyardmoviecritic", "reaction_count": 40, "reactions": {"\u2764": 40}, "restacks": 2, "children_count": 6, "attachments": [{"id": "3c033bf5-9636-4857-b762-dc298a726531", "type": "image", "imageUrl": "https://substack-post-media.s3.amazonaws.com/public/images/8057c6bf-2643-42a2-a5bd-800cc068e831_700x1050.png", "imageWidth": 700, "imageHeight": 1050, "explicit": false}], "user_primary_publication": {"id": 7618357, "subdomain": "backyardmoviecritic", "name": "Backyard Movie Critic", "logo_url": "https://substack-post-media.s3.amazonaws.com/public/images/1edaf33e-19c0-4c26-ab04-0df6f0e34e3d_452x452.png", "payments_state": "disabled"}, "language": "en"},
      "publication": {"id": 7618357, "subdomain": "backyardmoviecritic", "name": "Backyard Movie Critic", "logo_url": "https://substack-post-media.s3.amazonaws.com/public/images/1edaf33e-19c0-4c26-ab04-0df6f0e34e3d_452x452.png", "payments_state": "disabled"},
      "page_rank": 1
    }
Example: reply (type = "reply")
    {
      "id": "c-205156724",
      "type": "reply",
      "title": "This is such a thoughtful comment, thank you.",
      "published_at": "2026-01-26T01:26:46.936000+00:00",
      "url": "https://substack.com/@vanessathe/note/c-205156724",
      "author_name": "Vanessa",
      "author_handle": "vanessathe",
      "main_post_id": "173236288",
      "main_post_title": "eight films that made me fall madly in love with films",
      "main_post_url": "https://vanessateh.substack.com/p/eight-films-that-made-me-fall-madly",
      "main_post_published_at": "2026-01-16T05:06:39.097000+00:00",
      "body_text": "This is such a thoughtful comment, thank you.",
      "context": {"timestamp": "2026-01-26T01:26:46.936Z", "type": "reply"},
      "comment": {"id": 205156724, "type": "reply", "date": "2026-01-26T01:26:46.936Z", "body": "This is such a thoughtful comment, thank you.", "name": "Vanessa", "handle": "vanessathe", "children_count": 0},
      "comment_count": 0,
      "page_rank": 1
    }
Field Reference
Record type: post
- id (string, required): unique post identifier.
- type (string, required): record type, always `post`.
- url (string, required): public post URL.
- title (string, required): post title.
- description (string, optional): summary or teaser text.
- subtitle (string, optional): secondary heading or subtitle.
- published_at (string, required): publication timestamp in ISO 8601 format.
- author_name / author_handle (string, optional): primary author display name and handle.
- publication_name / publication_url (string, optional): source publication name and homepage URL.
- main_post_id / main_post_title / main_post_url / main_post_published_at (string, optional): parent post references when applicable.
- image_url (string, optional): primary image or media preview URL.
- truncated_body_text (string, optional): shortened plain-text excerpt from the post detail payload.
- body_text (string, optional): plain-text body content.
- body_html / body_json (string or object, optional): HTML and structured rich-text body content.
- word_count (integer, optional): word count for the post body.
- child_comment_count (integer, optional): nested reply count returned by post detail endpoints.
- reaction_count / comment_count / restack_count (integer, optional): visible engagement counters.
- post_tags / published_bylines / audio_items (array, optional): structured post tags, bylines, and audio metadata when returned by enriched post detail.
- comments (array, optional): trimmed top-level comment preview objects included on enriched post records.
- post / publication / publication_settings (object, optional): cleaned public detail payloads preserved from the enriched response.
- page_rank (integer, optional): rank position within the collected result page.
- attributes (object, optional): generic normalized attributes bucket used for non-Substack-like fields such as price, brand, SKU, currency, or location when available.
Record type: comment
- id (string, required): unique comment or note identifier.
- type (string, required): record type, always `comment`.
- url (string, required): public URL of the note or comment.
- title (string, optional): short title derived from the content.
- description (string, optional): summary text or repeat of the body.
- published_at (string, required): publication timestamp in ISO 8601 format.
- author_name / author_handle (string, optional): primary author display name and handle.
- publication_name / publication_url (string, optional): related publication name and URL.
- main_post_id / main_post_title / main_post_url / main_post_published_at (string, optional): parent post references when the comment belongs to a post thread.
- subtitle (string, optional): subtitle when the record is comment-like but backed by a richer content item.
- image_url (string, optional): preview image URL when present.
- body_text (string, optional): plain-text comment or note body.
- body_html / body_json (string or object, optional): HTML or structured rich-text body content when the source exposes it.
- attachments (array, optional): media attachments associated with the comment or note.
- word_count (integer, optional): word count when available.
- reaction_count / comment_count / restack_count (integer, optional): visible engagement counters.
- context (object, optional): cleaned note or thread context metadata such as source type and timestamp.
- comment (object, optional): cleaned enriched comment payload, including public author-facing fields, attachments, and visible counters.
- parent_comments (array, optional): trimmed parent-thread metadata when the detail endpoint includes thread ancestry.
- can_reply / is_muted (boolean, optional): public interaction state returned by comment detail endpoints.
- publication / author / post (object, optional): cleaned public nested entities when exposed by the source detail endpoint.
- search_score (number, optional): ranking score returned with search results.
- page_rank (integer, optional): rank position within the collected result page.
- attributes (object, optional): generic normalized attributes bucket used for non-Substack-like fields when available.
Record type: reply
- id (string, required): unique reply identifier.
- type (string, required): record type, always `reply`.
- url (string, required): public URL of the reply.
- author_name / author_handle (string, optional): primary author display name and handle.
- main_post_id / main_post_title / main_post_url / main_post_published_at (string, optional): parent post references for the thread.
- body_text / body_html / body_json (string or object, optional): reply content in plain text, HTML, or structured rich-text form.
- attachments (array, optional): media attachments associated with the reply.
- context (object, optional): cleaned thread context metadata.
- comment (object, optional): cleaned enriched reply payload.
- publication / author / post (object, optional): cleaned nested entities when available from the detail endpoint.
- reaction_count / comment_count / restack_count (integer, optional): visible engagement counters when present.
- page_rank (integer, optional): rank position within the collected result page.
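Since optional fields vary by record, a light check on the shared envelope (type, id, url) catches malformed items before they reach downstream systems. A sketch; `validate_envelope` is an illustrative helper, not part of the actor's output:

```python
REQUIRED_ENVELOPE = ("type", "id", "url")

def validate_envelope(record):
    """Return the required envelope fields missing from a record (empty list = valid)."""
    return [field for field in REQUIRED_ENVELOPE if not record.get(field)]
```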
Data Quality, Guarantees, And Handling
- Structured records: results are normalized into predictable JSON objects for downstream use.
- Best-effort extraction: fields may vary by region, session, public availability, or source-side interface changes.
- Optional fields: may be null or absent, so null-check them in downstream code.
- Deduplication: use the recommended `type + ":" + id` composite key.
- Freshness: results reflect the publicly available data at run time.
- Repeated runs: use the recommended idempotency key when syncing data into warehouses, CRMs, or search indexes.
Tips For Best Results
- Start with a small `limit` to validate the output shape before scaling up.
- Use `startUrls` for known high-value targets and `queries` for broader discovery.
- Choose `result_type` carefully because it affects query-based searches, not direct `startUrls`.
- Apply `publication_date`, `within_days`, or `language` only when you need tighter targeting.
- Turn on `enrich_data` when downstream systems need fuller records rather than lightweight search entries.
- Enable `get_replies` only for runs where note or comment thread depth matters.
- Use `type + ":" + id` as your stable deduplication key across repeated runs.
How to Run on Apify
- Open the actor in Apify Console.
- Configure the available input fields for the target scope.
- Set the maximum number of outputs to collect with `limit` if you want a capped run.
- Click Start and wait for the run to finish.
- Review the dataset and download results in JSON, CSV, Excel, or other supported formats.
Scheduling & Automation
Automated Data Collection
You can schedule recurring runs to keep your dataset current without manual execution. This is useful for monitoring keywords, publications, and recent public conversation over time.
- Navigate to Schedules in Apify Console
- Create a new schedule (daily, weekly, or custom cron)
- Configure input parameters
- Enable notifications for run completion
- Add webhooks for automated processing
Integration Options
- Data warehouses: load normalized post and comment records into analytical stores for trend analysis, historical reporting, and topic-level monitoring.
- BI dashboards: track publication activity, content volume, engagement signals, and keyword coverage over time.
- Webhooks: trigger ingestion, validation, alerting, or transformation workflows after each completed run.
- API-driven applications: consume dataset records directly in internal tools, search services, or customer-facing applications.
- Google Sheets or Excel-based review: export smaller runs for editorial review, QA, or lightweight operational workflows.
- Enrichment pipelines: append fresh public publication, author, and content attributes to existing CRM or analytics records.
Export Formats And Downstream Use
Apify datasets can be exported or consumed in formats that fit both manual review and automated data delivery workflows.
- JSON: for APIs, applications, and data pipelines
- CSV or Excel: for spreadsheet workflows and manual review
- API access: for automated ingestion into internal systems
- BI and warehouses: for reporting, dashboards, and historical analysis
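When building a CSV export outside Apify's built-in formats, nested fields such as post_tags or comment need flattening to keep one row per record. One possible approach, JSON-encoding nested values (a sketch, not the platform's own export logic):

```python
import csv
import io
import json

def records_to_csv(records, fields):
    """Render selected top-level fields as CSV; dicts and lists are JSON-encoded
    so each dataset record stays a single spreadsheet row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    for record in records:
        row = {}
        for field in fields:
            value = record.get(field)
            if isinstance(value, (dict, list)):
                value = json.dumps(value, ensure_ascii=False)
            row[field] = value
        writer.writerow(row)
    return buf.getvalue()
```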
Performance
Estimated run times:
- Small runs (< 1,000 outputs): ~3-5 minutes
- Medium runs (1,000-5,000 outputs): ~5-15 minutes
- Large runs (5,000+ outputs): ~15-30 minutes
Execution time varies based on filters, result volume, and how much information is returned per record. Highly filtered runs can finish faster, while broad discovery or detail-rich records may take longer.
Limitations
- Availability depends on what https://substack.com publicly exposes at run time.
- Some optional fields may be absent on sparse records or record types with limited visible metadata.
- Very broad searches may take longer or require a higher `limit` to capture the desired coverage.
- Public field availability and naming can change when the target platform changes how content is presented.
- Regional, account-level, language, or visibility differences can affect which records and attributes are available.
- Reply-thread depth depends on the selected inputs and whether `get_replies` is enabled.
Troubleshooting
- No results returned: check your filters, keyword spelling, direct URLs, and whether the target has matching public records.
- Fewer results than expected: broaden filters, raise `limit`, or confirm that enough matching public records exist.
- Some fields are empty: optional fields depend on what each record publicly provides.
- Run takes longer than expected: reduce scope, lower `limit` for validation, or split broad collection into smaller runs.
- Output changed: compare the current dataset with the field reference and include a small sample if you need support.
FAQ
What data does this actor collect?
It collects public Substack records such as posts, publications, people, notes, comments, and optional reply-thread content, depending on your selected inputs and filters.
Can I use direct URLs and keyword queries in the same run?
Yes. You can provide startUrls, queries, or both if you want direct collection and discovery-oriented search in one workflow.
Can I filter by date or language?
Yes. The actor supports publication_date, within_days, and language where those filters apply to the selected search flow.
Why did I receive fewer results than my limit?
limit sets an upper bound, not a guarantee. The final count depends on how many matching public records are available for your inputs and filters.
Can I schedule recurring runs?
Yes. Apify schedules can run the actor automatically on a daily, weekly, or custom cadence.
How do I avoid duplicates across runs?
Use type + ":" + id as the idempotency key when storing or syncing records downstream.
Can I export the data to CSV, Excel, or JSON?
Yes. Apify datasets support JSON export for pipelines and CSV or Excel export for spreadsheet-based review.
Does this actor collect private data?
No. It is intended for publicly available information from https://substack.com.
What should I include when reporting an issue?
Include the input you used with sensitive values redacted, the run ID, a short description of expected versus actual behavior, and an optional small output sample that shows the issue.
Compliance & Ethics
Responsible Data Collection
This actor collects publicly available newsletter, publication, author, post, note, and comment information from https://substack.com for legitimate business purposes, including:
- Media and publishing research and market analysis
- Content monitoring and reporting
- Dataset enrichment and operational analytics
This section is informational and not legal advice. Users are responsible for ensuring their collection and use of data complies with applicable laws, regulations, and platform terms.
Best Practices
- Use collected data in accordance with applicable laws, regulations, and the target site's terms
- Respect individual privacy and personal information
- Use data responsibly and avoid disruptive or excessive collection
- Do not use this actor for spamming, harassment, or other harmful purposes
- Follow relevant data protection requirements where applicable (for example GDPR or CCPA)
Support
For help, use the actor page or the repository Issues section. When reporting a problem, include the input used with sensitive values redacted, the run ID, expected versus actual behavior, and, if helpful, a small sample of the output that demonstrates the issue.