🧪 Blog & Content Extractor (Testing Mode – Free!)

Pricing

Pay per usage

caring_dizi/blog-content-scraper-fixed

Developed by

Jeff Halverson

Maintained by Community

Please try it out and send feedback so I know what needs fixing. It is greatly appreciated! I'll leave it free for a week, then switch to paid pricing once the bugs are fixed and it is close to polished. Even then, I plan to offer a good product at nearly half the price of everyone else on the platform.

0.0 (0)

0

Monthly users

1

Last modified

4 hours ago

📄 Simple Web Content Scraper

This Actor crawls blogs, news sites, and other websites and extracts their content, including:

  • ✅ Page Title
  • 🧠 Meta Description, Keywords, Publish Date
  • ✍️ All readable content (<p>, <h1>–<h3>)
  • 🖼️ Image URLs
  • 📤 Exports in CSV or JSON format (via Apify)
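To make the field list above concrete, here is a minimal sketch of how such fields could be pulled out of a page with only the Python standard library. It is not the Actor's actual implementation. The `PageExtractor` class, the `extract` function, and the choice of `article:published_time` as the source of the publish date are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageExtractor(HTMLParser):
    """Collects the title, selected meta tags, readable text, and image URLs."""

    TEXT_TAGS = {"p", "h1", "h2", "h3"}
    # Assumed mapping from meta tag names to output keys (hypothetical).
    META_KEYS = {
        "description": "description",
        "keywords": "keywords",
        "article:published_time": "publishDate",
    }

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.blocks = []   # finished text blocks, in document order
        self.images = []   # raw src values, resolved later
        self._in_title = False
        self._depth = 0    # nesting depth inside readable tags
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = attrs.get("name") or attrs.get("property")
            key = self.META_KEYS.get(name)
            if key and attrs.get("content"):
                self.meta[key] = attrs["content"]
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])
        elif tag in self.TEXT_TAGS:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.TEXT_TAGS and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Collapse whitespace and store the finished block.
                text = " ".join("".join(self._buffer).split())
                if text:
                    self.blocks.append(text)
                self._buffer = []

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._depth > 0:
            self._buffer.append(data)


def extract(url, html):
    """Return one record with url, title, meta, content, and images."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "title": parser.title.strip(),
        "meta": parser.meta,
        "content": "\n".join(parser.blocks),
        # Resolve relative image paths against the page URL.
        "images": [urljoin(url, src) for src in parser.images],
    }
```

The sketch assumes well-formed markup with explicit closing tags. A real crawler would also need to handle unclosed `<p>` elements, lazy-loaded images, and sites that publish their date in other tags.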

💡 Use Cases

  • Fine-tune ChatGPT or GPT-4 with your own web data
  • Build a blog summarizer, translator, or SEO tool
  • Extract content from competitors or client sites
  • Collect articles and visuals for newsletters

📥 Input

The actor accepts the following input:

```json
{
  "startUrls": [{ "url": "https://example.com" }],
  "sitemapUrls": ["https://example.com/sitemap.xml"],
  "maxRequestsPerCrawl": 50,
  "maxConcurrency": 5
}
```

`maxConcurrency` controls speed/load (default: 5).

📤 Output

The output is structured JSON saved to the dataset. Example structure:

```json
{
  "url": "https://example.com",
  "title": "Page Title",
  "meta": {
    "description": "Meta description...",
    "keywords": "some,keywords",
    "publishDate": "2024-04-01"
  },
  "content": "The main article content...",
  "images": ["https://example.com/image1.jpg", "https://example.com/image2.jpg"]
}
```
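
For anyone scripting around the Actor, the following sketch shows one way to assemble an input object in the shape described under Input and to flatten output records into CSV rows. The helper names `build_input`, `flatten`, and `to_csv` are hypothetical. The flat column names are my own choice and may differ from the platform's built-in CSV export. The value 50 for `maxRequestsPerCrawl` is simply the example value shown above, not a documented default.

```python
import csv
import io

# Assumed flat column layout (hypothetical; not the platform's export format).
CSV_FIELDS = ["url", "title", "description", "keywords",
              "publishDate", "content", "images"]


def build_input(start_urls, sitemap_urls=(), max_requests=50, max_concurrency=5):
    """Build an input object matching the example in the Input section."""
    if max_requests < 1 or max_concurrency < 1:
        raise ValueError("limits must be positive integers")
    return {
        "startUrls": [{"url": u} for u in start_urls],
        "sitemapUrls": list(sitemap_urls),
        "maxRequestsPerCrawl": max_requests,
        "maxConcurrency": max_concurrency,
    }


def flatten(record):
    """Turn one nested output record into a flat dict suitable for CSV."""
    meta = record.get("meta") or {}
    return {
        "url": record.get("url", ""),
        "title": record.get("title", ""),
        "description": meta.get("description", ""),
        "keywords": meta.get("keywords", ""),
        "publishDate": meta.get("publishDate", ""),
        "content": record.get("content", ""),
        # Join the image list into one cell; the separator is arbitrary.
        "images": " | ".join(record.get("images") or []),
    }


def to_csv(records):
    """Serialize records to CSV text with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=CSV_FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(flatten(record))
    return out.getvalue()
```

Missing fields become empty cells rather than raising an error, which keeps the conversion tolerant of pages that lack a meta description or publish date.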

Pricing

Pricing model

Pay per usage

This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage.