Content Intelligence Extractor

Pricing

from $5.00 / 1,000 pages converted

Extract clean Markdown from Reddit threads and news sites. Built for LLM pipelines, n8n workflows, and AI content analysis. Uses Mozilla Readability + Reddit JSON API for noise-free output.

Developer

Andrew Luxem

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 7 days ago

Converts Reddit threads and film/entertainment news articles into clean, structured Markdown optimized for LLM pipelines, AI content analysis, and n8n automation workflows.

What it does

Give it a list of URLs (Reddit threads, or articles from Screen Rant, CBR, IGN, or any other news site) and it returns clean Markdown with engagement signals, metadata, and source-specific fields, ready to pipe directly into Claude, GPT, or any other LLM.

Reddit threads are extracted via Reddit's JSON API (no browser needed) with post body, top comments sorted by upvotes, and engagement data.
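
As a rough illustration of the Reddit path, here is a minimal Python sketch. It is not the actor's own code: it relies on Reddit serving a thread as JSON when `.json` is appended to the thread URL, and on the usual two-listing payload shape (post first, comments second). The function names `thread_json_url` and `parse_thread` and the sample payload are hypothetical, and the output keys simply mirror the example output shown later on this page.

```python
from typing import Any


def thread_json_url(thread_url: str) -> str:
    """Build the JSON endpoint for a thread URL (assumes no query string)."""
    return thread_url.rstrip("/") + ".json"


def parse_thread(payload: list[dict[str, Any]], max_comments: int = 10) -> dict[str, Any]:
    """Extract the post and its highest-scored top-level comments."""
    # First listing holds the post, second listing holds the comment tree.
    post = payload[0]["data"]["children"][0]["data"]
    comments = [
        child["data"]
        for child in payload[1]["data"]["children"]
        if child.get("kind") == "t1"  # "t1" is a comment; skips "more" placeholders
    ]
    comments.sort(key=lambda c: c.get("score", 0), reverse=True)
    return {
        "title": post["title"],
        "subreddit": post["subreddit"],
        "upvotes": post["score"],
        "commentCount": post["num_comments"],
        "body": post.get("selftext", ""),
        "topComments": [
            {"body": c["body"], "upvotes": c["score"]}
            for c in comments[:max_comments]
        ],
    }


# Hand-written payload in the shape Reddit returns; all values are made up.
sample = [
    {"data": {"children": [{"kind": "t3", "data": {
        "title": "Theory: The ending means something else entirely",
        "subreddit": "FanTheories",
        "score": 3200,
        "num_comments": 143,
        "selftext": "Full post body...",
    }}]}},
    {"data": {"children": [
        {"kind": "t1", "data": {"body": "Minor point", "score": 5}},
        {"kind": "t1", "data": {"body": "Best reply", "score": 412}},
        {"kind": "more", "data": {"count": 140}},
    ]}},
]

result = parse_thread(sample, max_comments=2)
print(result["topComments"])
```

Fetching the real payload is a single GET on `thread_json_url(...)` with a descriptive User-Agent header; that network call is left out so the sketch runs offline.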

Film/news sites are extracted with Mozilla Readability, the engine behind Firefox Reader View, which strips ads, sidebars, author bios, and newsletter popups. Turndown then converts the cleaned HTML to Markdown.

Use cases

  • Content gap analysis — feed competitor articles to an LLM to find unexplored angles
  • n8n content pipelines — schedule weekly runs, pipe output to Claude or GPT for article briefs
  • Reddit trend monitoring — extract high-upvote fan theories or discussions for content research
  • SEO research — extract and analyze top-ranking articles in bulk
  • RAG knowledge bases — clean Markdown is ideal for vector embeddings

Example input

{
  "urls": [
    "https://www.reddit.com/r/FanTheories/comments/abc123/my_theory/",
    "https://screenrant.com/some-article/"
  ],
  "maxRedditComments": 10,
  "includeEngagementData": true
}
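
One way to send this input from a script is Apify's REST endpoint that runs an actor synchronously and returns its dataset items. The sketch below only builds the request, so it runs without a network connection. The actor ID is a hypothetical placeholder (the real one is shown on the actor's API tab), `build_run_request` is an illustrative name, and `<APIFY_TOKEN>` stands in for your own API token.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
# Hypothetical placeholder; replace with the actor's real "username~actor-name" ID.
ACTOR_ID = "username~content-intelligence-extractor"


def build_run_request(urls, token, max_reddit_comments=10, include_engagement=True):
    """Build a POST request that runs the actor and returns its dataset items."""
    run_input = {
        "urls": list(urls),
        "maxRedditComments": max_reddit_comments,
        "includeEngagementData": include_engagement,
    }
    return urllib.request.Request(
        f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items",
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_run_request(
    [
        "https://www.reddit.com/r/FanTheories/comments/abc123/my_theory/",
        "https://screenrant.com/some-article/",
    ],
    token="<APIFY_TOKEN>",
)

# Sending the request starts a real, billable run, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)
```

The synchronous endpoint suits short URL lists. For large batches, starting the run asynchronously and reading the dataset afterwards is likely the safer choice.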

Example output

{
  "url": "https://www.reddit.com/r/FanTheories/comments/abc123/",
  "sourceType": "reddit",
  "title": "Theory: The ending means something else entirely",
  "markdown": "# Theory: The ending...\n\nFull post body...\n\n## Top Comments\n\n...",
  "metadata": {
    "wordCount": 847,
    "estimatedReadTime": 4,
    "engagementSignal": 3200
  },
  "redditSpecific": {
    "subreddit": "FanTheories",
    "upvotes": 3200,
    "commentCount": 143,
    "topComments": [{ "body": "...", "upvotes": 412 }]
  }
}
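
To show how one such item might be fed to an LLM, here is a small Python sketch that prefixes the Markdown with a short header built from the metadata. The function name `to_llm_context` and the header layout are illustrative choices, not part of the actor. It assumes `redditSpecific` is absent for non-Reddit sources, which the example above does not confirm.

```python
from typing import Any


def to_llm_context(item: dict[str, Any]) -> str:
    """Prefix an item's Markdown with a compact, human-readable header."""
    meta = item.get("metadata", {})
    header = [
        f"Source: {item['url']}",
        f"Type: {item['sourceType']}",
        f"Words: {meta.get('wordCount', 'unknown')}",
    ]
    # Assumption: only Reddit items carry this block.
    reddit = item.get("redditSpecific")
    if reddit:
        header.append(
            f"Subreddit: r/{reddit['subreddit']} "
            f"({reddit['upvotes']} upvotes, {reddit['commentCount']} comments)"
        )
    return "\n".join(header) + "\n\n" + item["markdown"]


# The example output item from this page, trimmed to the fields used here.
item = {
    "url": "https://www.reddit.com/r/FanTheories/comments/abc123/",
    "sourceType": "reddit",
    "markdown": "# Theory: The ending...\n\nFull post body...",
    "metadata": {"wordCount": 847},
    "redditSpecific": {"subreddit": "FanTheories", "upvotes": 3200, "commentCount": 143},
}

context = to_llm_context(item)
print(context)
```

Keeping the engagement numbers in the header lets a prompt ask the model to weigh high-upvote threads more heavily without parsing JSON.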

n8n integration

Use the native Apify n8n node to trigger this actor on a schedule:

  1. Schedule Trigger — weekly or daily
  2. Apify: Run Actor — pass your URL list
  3. Apify: Get Dataset — fetch results
  4. Loop + LLM node — Claude/GPT analysis prompt
  5. Google Sheets / Notion — store content briefs

Pricing

Pay-per-page: $0.005 per URL processed. First 20 pages free.
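
For budgeting, the arithmetic can be sketched in Python as below. This assumes the 20 free pages are a one-time allowance deducted before billing; the listing does not say whether the allowance resets per run or per month, so treat `estimate_cost` (a hypothetical helper) as an estimate only.

```python
from decimal import Decimal

PRICE_PER_PAGE = Decimal("0.005")  # USD per URL processed, from the pricing above
FREE_PAGES = 20                    # assumed to be a one-time allowance


def estimate_cost(pages: int, free_pages_left: int = FREE_PAGES) -> Decimal:
    """Estimate the USD cost of processing `pages` URLs."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    billable = max(0, pages - free_pages_left)
    return billable * PRICE_PER_PAGE


print(estimate_cost(1000))                     # 980 billable pages
print(estimate_cost(1000, free_pages_left=0))  # allowance already used up
```

With the allowance exhausted, 1,000 pages cost $5.00, which matches the "from $5.00 / 1,000" figure in the listing header.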

Supported sources

  • Reddit (all subreddits via JSON API)
  • Screen Rant, CBR, IGN, Variety, Hollywood Reporter
  • Any article-based news or blog site
  • Custom CSS selectors to strip site-specific noise