# Smart Article Extractor (`parseforge/article-extractor`) Actor

Extract clean article content from any news, blog, or publisher site! Pull full body text, author, publish date, word count, language, reading time, images, and metadata at scale. Ideal for content research, media monitoring, SEO audits, and AI training. Start extracting articles in minutes!

- **URL**: https://apify.com/parseforge/article-extractor.md
- **Developed by:** [ParseForge](https://apify.com/parseforge) (community)
- **Categories:** News, AI, Automation
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $40.00 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the price decreases on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

![ParseForge Banner](https://github.com/ParseForge/apify-assets/blob/ad35ccc13ddd068b9d6cba33f323962e39aed5b2/banner.jpg?raw=true)

## 📰 Smart Article Extractor

> 🚀 **Parse any news article or blog post into clean structured text in seconds.** Get **23 metadata fields** per article including authors, tags, publish date, lead image, paywall flag, and reading time. No API key, no registration, no manual parser maintenance.

> 🕒 **Last updated:** 2026-04-21 · **📊 23 fields** per article · **🌐 Works on any site** · **⚡ 10 articles in ~10 seconds** · **💰 Paywall detection**

The **Smart Article Extractor** takes any article URL and returns the main body as clean Markdown alongside 22 metadata fields. It scores DOM nodes by paragraph count, word count, and link density to identify the main content block, then strips navigation, sidebars, and ads. Author, tags, section, publishedAt, modifiedAt, and canonical URL are pulled from meta tags, JSON-LD, and itemprop attributes.
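
The scoring idea can be illustrated with a small standard-library sketch. This is not the Actor's implementation: the class `BlockScorer`, the helper `best_block`, the list of container tags, and the weight of 25 points per paragraph are all assumptions chosen for illustration. It credits each container element with its paragraph count and word count, then discounts the total by link density so that navigation blocks made mostly of links score near zero.

```python
from html.parser import HTMLParser

# Elements treated as candidate content blocks (an assumption for this sketch).
CONTAINERS = {"div", "article", "section", "main", "nav", "aside"}


class BlockScorer(HTMLParser):
    """Collect paragraph, word, and link-word counts per container element."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks = []      # one stats dict per container seen
        self.stack = []       # indices of containers that are currently open
        self.link_depth = 0   # > 0 while inside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag in CONTAINERS:
            self.blocks.append(
                {"tag": tag, "paragraphs": 0, "words": 0, "link_words": 0}
            )
            self.stack.append(len(self.blocks) - 1)
        elif tag == "p" and self.stack:
            self.blocks[self.stack[-1]]["paragraphs"] += 1
        elif tag == "a":
            self.link_depth += 1

    def handle_endtag(self, tag):
        if tag in CONTAINERS and self.stack:
            self.stack.pop()
        elif tag == "a" and self.link_depth:
            self.link_depth -= 1

    def handle_data(self, data):
        if not self.stack:
            return
        count = len(data.split())
        block = self.blocks[self.stack[-1]]  # credit the innermost container
        block["words"] += count
        if self.link_depth:
            block["link_words"] += count


def score(block) -> float:
    """Hypothetical weighting: paragraphs and words, discounted by link density."""
    if block["words"] == 0:
        return 0.0
    link_density = block["link_words"] / block["words"]
    return (block["paragraphs"] * 25 + block["words"]) * (1 - link_density)


def best_block(html: str):
    """Return the stats of the highest-scoring container, or None if there is none."""
    scorer = BlockScorer()
    scorer.feed(html)
    return max(scorer.blocks, key=score, default=None)
```

A production extractor would also propagate scores to parent elements and handle malformed markup; the sketch only shows why link-heavy blocks such as menus and sidebars lose to the article body.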

Extras include a paywall-detection heuristic, inline image collection, lead image (Open Graph), language detection, word count, and reading time. Concurrent fetching keeps 10 articles flying in parallel, so a list of 100 news URLs finishes in about 15 seconds. Works out of the box on most major news sites, blogs, and publishing platforms.
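
Word count and reading time can be derived from the extracted plain text. The sketch below is an assumption rather than the Actor's documented formula: it counts whitespace-separated tokens and assumes a reading speed of 200 words per minute, rounded up to whole minutes. That assumption is at least consistent with the sample records in this README (742 words and 4 minutes, 120 words and 1 minute).

```python
import math

WORDS_PER_MINUTE = 200  # assumed reading speed, not confirmed by the Actor


def word_count(text: str) -> int:
    """Count whitespace-separated tokens in plain text."""
    return len(text.split())


def reading_time_minutes(words: int, wpm: int = WORDS_PER_MINUTE) -> int:
    """Round up to whole minutes, never returning less than one minute."""
    return max(1, math.ceil(words / wpm))
```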

| 🎯 Target Audience | 💡 Primary Use Cases |
|---|---|
| News aggregators, media monitoring teams, AI app developers, content researchers, data journalists, archivists | News datasets, summarization pipelines, media monitoring, sentiment analysis, archive assembly |

---

### 📋 What the Smart Article Extractor does

Five extraction workflows in a single run:

- 📝 **Main body extraction.** DOM scoring isolates the article content and strips navigation, ads, and sidebars.
- 👥 **Author detection.** Pulls authors from meta tags, JSON-LD, and itemprop attributes.
- 📅 **Date stamps.** Captures both `article:published_time` and `article:modified_time`.
- 🏷️ **Tags and section.** Extracts `article:tag` and `article:section` metadata.
- 💰 **Paywall flag.** Heuristic detects common paywall markers so you can filter downstream.

Every record also includes the canonical URL, lead image, inline images, word count, reading time, language, site name, HTTP status, and timestamp.
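
The meta-tag part of this lookup can be sketched with the Python standard library. The names `MetaTagCollector` and `extract_article_meta` are hypothetical, and the sketch covers only `<meta>` tags; JSON-LD and itemprop handling, which the Actor also uses, are left out.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect the content of <meta property=...> and <meta name=...> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.values: dict[str, list[str]] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = attributes.get("property") or attributes.get("name")
        content = attributes.get("content")
        if key and content:
            # Keys such as article:tag may repeat, so keep every value.
            self.values.setdefault(key, []).append(content)


def extract_article_meta(html: str) -> dict:
    """Map Open Graph article:* tags onto the field names used in this README."""
    collector = MetaTagCollector()
    collector.feed(html)

    def first(key):
        found = collector.values.get(key)
        return found[0] if found else None

    return {
        "publishedAt": first("article:published_time"),
        "modifiedAt": first("article:modified_time"),
        "section": first("article:section"),
        "tags": collector.values.get("article:tag", []),
    }
```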

> 💡 **Why it matters:** news sites each have their own HTML structure. Writing per-site parsers is brittle and breaks every time a publisher redesigns their pages. This Actor uses readability-style scoring that works across any article-shaped page.

---

### 🎬 Full Demo

_🚧 Coming soon: a 3-minute walkthrough showing extraction across news sites, blogs, and platforms._

---

### ⚙️ Input

<table>
<thead>
<tr><th>Input</th><th>Type</th><th>Default</th><th>Behavior</th></tr>
</thead>
<tbody>
<tr><td><code>startUrls</code></td><td>array of URLs</td><td>required</td><td>One or more article URLs to extract.</td></tr>
<tr><td><code>maxItems</code></td><td>integer</td><td><code>10</code></td><td>Maximum number of articles returned. The free plan caps this at 10; paid plans allow up to 1,000,000.</td></tr>
</tbody>
</table>

**Example: extract a single article.**

```json
{
    "startUrls": [
        { "url": "https://techcrunch.com/2025/01/10/openai-launches-gpt-store/" }
    ],
    "maxItems": 1
}
````

**Example: batch extraction for media monitoring.**

```json
{
    "startUrls": [
        { "url": "https://www.theverge.com/2025/ai-coverage-1" },
        { "url": "https://www.wired.com/story/ai-agents-2026" },
        { "url": "https://arstechnica.com/ai/article" }
    ],
    "maxItems": 100
}
```

> ⚠️ **Good to Know:** works best on article-shaped pages (one headline, one author, one body). Homepages, category pages, and list views return thin extractions because there is no single article to score.

***

### 📊 Output

Each record contains **23 fields**. Download the dataset as CSV, Excel, JSON, or XML.

#### 🧾 Schema

| Field | Type | Example |
|---|---|---|
| 🔗 `url` | string | `"https://techcrunch.com/.../gpt-store/"` |
| 🔁 `canonicalUrl` | string \| null | `"https://techcrunch.com/.../gpt-store/"` |
| 🏷️ `title` | string \| null | `"OpenAI launches GPT Store"` |
| 📑 `subtitle` | string \| null | `"Available to Plus, Team, Enterprise"` |
| 🧑 `author` | string \| null | `"Kyle Wiggers"` |
| 👥 `authors` | string\[] | `["Kyle Wiggers"]` |
| 📅 `publishedAt` | ISO 8601 \| null | `"2025-01-10T14:00:00Z"` |
| 🔁 `modifiedAt` | ISO 8601 \| null | `"2025-01-10T16:30:00Z"` |
| 🏢 `siteName` | string \| null | `"TechCrunch"` |
| 🗂️ `section` | string \| null | `"AI"` |
| 🏷️ `tags` | string\[] | `["openai", "gpt-store"]` |
| 🌍 `language` | string \| null | `"en-US"` |
| 📝 `description` | string \| null | `"OpenAI rolled out the long-teased GPT Store..."` |
| 🖼️ `leadImage` | string \| null | `"https://.../og.jpg"` |
| 🎨 `images` | string\[] | `["https://...", "https://..."]` |
| 📃 `markdown` | string | `"## OpenAI launches GPT Store..."` |
| 💬 `text` | string | plain text without markdown markers |
| 🧾 `html` | string | cleaned article HTML |
| 🔢 `wordCount` | number | `742` |
| ⏱️ `readingTimeMinutes` | number | `4` |
| 💰 `hasPaywall` | boolean | `false` |
| 🟢 `httpStatus` | number | `200` |
| 🕒 `scrapedAt` | ISO 8601 | `"2026-04-21T12:00:00.000Z"` |
| ❗ `error` | string \| null | `"Timeout"` on failure |
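
For typed Python pipelines, the table can be mirrored as a `TypedDict`. This is a sketch, not a type shipped with the Actor: the name `ArticleRecord` is hypothetical, `number` is assumed to mean an integer, ISO 8601 values are kept as strings, and `total=False` is used because the sample records in this README omit some keys (`text`, `html`, `error`).

```python
from typing import Optional, TypedDict


class ArticleRecord(TypedDict, total=False):
    """Hypothetical type mirroring the schema table, one key per row."""

    url: str
    canonicalUrl: Optional[str]
    title: Optional[str]
    subtitle: Optional[str]
    author: Optional[str]
    authors: list[str]
    publishedAt: Optional[str]   # ISO 8601 timestamp
    modifiedAt: Optional[str]    # ISO 8601 timestamp
    siteName: Optional[str]
    section: Optional[str]
    tags: list[str]
    language: Optional[str]
    description: Optional[str]
    leadImage: Optional[str]
    images: list[str]
    markdown: str
    text: str
    html: str
    wordCount: int               # "number" in the table, assumed integer
    readingTimeMinutes: int      # "number" in the table, assumed integer
    hasPaywall: bool
    httpStatus: int
    scrapedAt: str               # ISO 8601 timestamp
    error: Optional[str]         # set when extraction failed
```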

#### 📦 Sample records

<details>
<summary><strong>📰 Typical news article with full metadata</strong></summary>

```json
{
    "url": "https://techcrunch.com/2025/01/10/openai-launches-gpt-store/",
    "canonicalUrl": "https://techcrunch.com/2025/01/10/openai-launches-gpt-store/",
    "title": "OpenAI launches GPT Store for custom chatbots",
    "subtitle": "Available to ChatGPT Plus, Team and Enterprise users",
    "author": "Kyle Wiggers",
    "authors": ["Kyle Wiggers"],
    "publishedAt": "2025-01-10T14:00:00Z",
    "modifiedAt": "2025-01-10T16:30:00Z",
    "siteName": "TechCrunch",
    "section": "AI",
    "tags": ["openai", "gpt-store", "chatbots"],
    "language": "en-US",
    "description": "OpenAI rolled out the long-teased GPT Store today...",
    "leadImage": "https://techcrunch.com/wp/gpt-store-og.jpg",
    "images": ["https://.../1.jpg", "https://.../2.jpg"],
    "markdown": "## OpenAI launches GPT Store\n\nOpenAI rolled out...",
    "wordCount": 742,
    "readingTimeMinutes": 4,
    "hasPaywall": false,
    "httpStatus": 200,
    "scrapedAt": "2026-04-21T12:00:00.000Z"
}
```

</details>

<details>
<summary><strong>💰 Paywalled article detected</strong></summary>

```json
{
    "url": "https://www.nytimes.com/2025/01/10/opinion/ai-regulation.html",
    "canonicalUrl": "https://www.nytimes.com/2025/01/10/opinion/ai-regulation.html",
    "title": "A New Era of AI Regulation",
    "subtitle": "The next two years will reshape the rules",
    "author": "Editorial Board",
    "authors": ["Editorial Board"],
    "publishedAt": "2025-01-10T10:00:00Z",
    "modifiedAt": null,
    "siteName": "The New York Times",
    "section": "Opinion",
    "tags": ["ai", "regulation"],
    "language": "en",
    "description": "A preview paragraph before the paywall...",
    "leadImage": "https://static01.nyt.com/...opinion-og.jpg",
    "images": [],
    "markdown": "## A New Era of AI Regulation\n\nA preview paragraph...",
    "wordCount": 120,
    "readingTimeMinutes": 1,
    "hasPaywall": true,
    "httpStatus": 200,
    "scrapedAt": "2026-04-21T12:00:00.000Z"
}
```

</details>

<details>
<summary><strong>🚧 Minimal blog post with sparse metadata</strong></summary>

```json
{
    "url": "https://example-blog.com/hello",
    "canonicalUrl": "https://example-blog.com/hello",
    "title": "Hello world",
    "subtitle": null,
    "author": null,
    "authors": [],
    "publishedAt": null,
    "modifiedAt": null,
    "siteName": null,
    "section": null,
    "tags": [],
    "language": "en",
    "description": null,
    "leadImage": null,
    "images": [],
    "markdown": "## Hello world\n\nThis is a short post.",
    "wordCount": 6,
    "readingTimeMinutes": 1,
    "hasPaywall": false,
    "httpStatus": 200,
    "scrapedAt": "2026-04-21T12:00:00.000Z"
}
```

</details>
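
Records shaped like the samples above can be partitioned before further processing. A minimal sketch, assuming `records` is a list of dataset items already loaded as dictionaries; `split_records` is a hypothetical helper, not part of the Actor or the Apify client.

```python
def split_records(records):
    """Partition dataset items into usable, paywalled, and failed lists."""
    usable, paywalled, failed = [], [], []
    for record in records:
        if record.get("error"):
            # Extraction failed for this URL; see the `error` field.
            failed.append(record)
        elif record.get("hasPaywall"):
            # Likely only a preview of the article body was captured.
            paywalled.append(record)
        else:
            usable.append(record)
    return usable, paywalled, failed
```

Checking `error` first keeps failed fetches out of the paywalled bucket even if they carry a stale `hasPaywall` value.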

***

### ✨ Why choose this Actor

| | Capability |
|---|---|
| 🧠 | **DOM scoring.** Readability-style extraction works across any article-shaped page without per-site rules. |
| 📊 | **23 fields.** Authors, tags, section, dates, images, paywall, reading time, and canonical URL. |
| 💰 | **Paywall detection.** Flags articles likely behind a paywall so you can filter them out. |
| ⚡ | **Fast.** 10 articles in under 10 seconds with parallel fetching. |
| 🖼️ | **Image capture.** Lead image plus every inline image URL in the article body. |
| 🚫 | **No credentials.** Runs on any public article URL. |
| 🔌 | **Integrations.** Plugs into RSS feeds, newsroom tools, and news datasets. |

> 📊 Clean article text is the foundation of news summarization, sentiment analysis, and media monitoring. This Actor delivers it consistently without per-site parsers.

***

### 📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Filters | Setup |
|---|---|---|---|---|---|
| **⭐ Smart Article Extractor** *(this Actor)* | $5 free credit, then pay-per-use | Any public article URL | **Live per run** | 23 metadata fields | ⚡ 2 min |
| Open-source readability libs | Free | Whatever you host | Your code | Whatever you build | 🐢 Days |
| News API services | $99+/month | Curated feeds | Real-time | Per-plan limits | ⏳ Hours |
| Paid media monitoring | $$$+/month | Managed sources | Real-time | Rich UI | 🕒 Variable |

Pick this Actor when you want article text from arbitrary URLs without maintaining your own extraction library.

***

### 🚀 How to use

1. 📝 **Sign up.** [Create a free account with $5 credit](https://console.apify.com/sign-up?fpr=vmoqkp) (takes 2 minutes).
2. 🌐 **Open the Actor.** Go to the Smart Article Extractor page on the Apify Store.
3. 🎯 **Paste URLs.** Add article URLs to the `startUrls` field and set `maxItems`.
4. 🚀 **Run it.** Click **Start** and let the Actor extract the content.
5. 📥 **Download.** Grab your results in the **Dataset** tab as CSV, Excel, JSON, or XML.

> ⏱️ Total time from signup to downloaded dataset: **3-5 minutes.** No coding required.

***

### 💼 Business use cases

<table>
<tr>
<td width="50%" valign="top">

#### 📰 News Aggregation

- Build custom news feeds across sources
- Deduplicate stories across outlets
- Normalize article structure for downstream apps
- Feed summarization pipelines

</td>
<td width="50%" valign="top">

#### 🧠 AI & Summarization

- Extract clean text for LLM summaries
- Build news datasets for fine-tuning
- Ground chatbots with current media
- Power question-answering over news

</td>
</tr>
<tr>
<td width="50%" valign="top">

#### 📡 Media Monitoring

- Track brand mentions across outlets
- Monitor coverage of products or events
- Capture executive quotes and bylines
- Detect paywalled coverage to license

</td>
<td width="50%" valign="top">

#### 📚 Research & Archives

- Build academic text corpora
- Archive public journalism
- Extract metadata for bibliographies
- Preserve retracted or deleted articles

</td>
</tr>
</table>

***

### 🔌 Automating Smart Article Extractor

Control the scraper programmatically for scheduled runs and pipeline integrations:

- 🟢 **Node.js.** Install the `apify-client` NPM package.
- 🐍 **Python.** Use the `apify-client` PyPI package.
- 📚 See the [Apify API documentation](https://docs.apify.com/api/v2) for full details.

The [Apify Schedules feature](https://docs.apify.com/platform/schedules) lets you trigger this Actor on any cron interval. Pair it with an RSS reader or Google News feed for continuous media monitoring.

***

### ❓ Frequently Asked Questions

<details>
<summary><strong>🧩 How does it work?</strong></summary>

Pass a list of article URLs. Each page is fetched and scored to identify the main content. Meta tags, JSON-LD, and itemprop attributes supply author, date, tags, and image. Output is clean Markdown plus structured metadata.

</details>

<details>
<summary><strong>📏 How accurate is the extraction?</strong></summary>

Very accurate on article-shaped pages from major news sites and blogs. Pages with no clear article body (homepages, list views) return sparse records.

</details>

<details>
<summary><strong>💰 How does paywall detection work?</strong></summary>

A heuristic checks for common paywall markers in the HTML (class names, meta tags, JSON-LD flags). It's a best-effort flag, not a guarantee.

</details>

<details>
<summary><strong>🔁 Does it follow redirects?</strong></summary>

Yes. The `canonicalUrl` field shows the canonical URL after redirects and `<link rel="canonical">` resolution.

</details>

<details>
<summary><strong>⏰ Can I schedule regular runs?</strong></summary>

Yes. Use Apify Schedules to feed the Actor a rolling list of URLs on any cron interval.

</details>

<details>
<summary><strong>⚖️ Is it legal to use extracted article text?</strong></summary>

Fetching public articles is generally fine. Commercial redistribution of full article text usually requires a license from the publisher. Consult legal counsel for your specific use case.

</details>

<details>
<summary><strong>💼 Can I use this commercially?</strong></summary>

Internal analysis, monitoring, and research are generally fine. Republishing full article text typically requires publisher permission.

</details>

<details>
<summary><strong>💳 Do I need a paid Apify plan to use this Actor?</strong></summary>

No. The free plan covers testing (10 articles per run). A paid plan lifts that limit and raises concurrency.

</details>

<details>
<summary><strong>🔁 What happens if a run fails?</strong></summary>

Apify retries transient errors. Failed URLs include an `error` field. Partial datasets from failed runs are preserved.
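
Because failed URLs carry an `error` field, a follow-up run can be limited to them. A minimal sketch, assuming `items` is the list of dataset records from a previous run; `build_retry_input` is a hypothetical helper, not part of the Actor or the Apify client.

```python
def build_retry_input(items):
    """Build an Actor input containing only the URLs that failed, or None."""
    failed_urls = []
    for item in items:
        url = item.get("url")
        # Keep each failed URL once, in the order it first appeared.
        if item.get("error") and url and url not in failed_urls:
            failed_urls.append(url)
    if not failed_urls:
        return None
    return {
        "startUrls": [{"url": url} for url in failed_urls],
        "maxItems": len(failed_urls),
    }
```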

</details>

<details>
<summary><strong>🖼️ Does it capture images?</strong></summary>

Yes. `leadImage` from Open Graph, plus every `<img>` URL in the extracted article body.

</details>

<details>
<summary><strong>🧾 Can I get the cleaned HTML instead of Markdown?</strong></summary>

Yes. The `html` field contains the cleaned article HTML after navigation and ads have been stripped.

</details>

<details>
<summary><strong>🆘 What if I need help?</strong></summary>

Our team is available through the Apify platform and the Tally form below.

</details>

***

### 🔌 Integrate with any app

Smart Article Extractor connects to any cloud service via [Apify integrations](https://apify.com/integrations):

- [**Make**](https://docs.apify.com/platform/integrations/make) - Automate multi-step workflows
- [**Zapier**](https://docs.apify.com/platform/integrations/zapier) - Connect with 5,000+ apps
- [**Slack**](https://docs.apify.com/platform/integrations/slack) - Post article summaries to channels
- [**Airbyte**](https://docs.apify.com/platform/integrations/airbyte) - Pipe articles into your warehouse
- [**GitHub**](https://docs.apify.com/platform/integrations/github) - Trigger runs from commits
- [**Google Drive**](https://docs.apify.com/platform/integrations/drive) - Export articles to Docs

You can also use webhooks to trigger summarization and alerting pipelines when new articles finish extracting.

***

### 🔗 Recommended Actors

- [**🤖 RAG Web Browser**](https://apify.com/parseforge/rag-web-browser) - Search or fetch URLs with LLM-ready output
- [**🕸️ Website Content Crawler**](https://apify.com/parseforge/website-content-crawler) - Deep-crawl a domain with depth control
- [**🔍 Google Search Scraper**](https://apify.com/parseforge/google-search-scraper) - SERP results with rank and description
- [**📈 Google Trends Scraper**](https://apify.com/parseforge/google-trends-scraper) - Interest over time and related queries
- [**📧 Contact Info Scraper**](https://apify.com/parseforge/contact-info-scraper) - Emails, phones, and socials from URLs

> 💡 **Pro Tip:** browse the complete [ParseForge collection](https://apify.com/parseforge) for more content-extraction tools.

***

**🆘 Need Help?** [**Open our contact form**](https://tally.so/r/BzdKgA) to request a new scraper, propose a custom data project, or report an issue.

***

> **⚠️ Disclaimer:** this Actor is an independent tool and is not affiliated with any publisher, news outlet, or readability library. Only publicly accessible article URLs are processed. Respect the copyright and terms of service of every publisher you extract from.

# Actor input Schema

## `startUrls` (type: `array`):

URLs to process

## `maxItems` (type: `integer`):

Free users: Limited to 10 items (preview). Paid users: Optional, max 1,000,000

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://www.bbc.com/news/articles/c86w8elez74o"
    }
  ],
  "maxItems": 10
}
```
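
An input object of this shape can be assembled and checked before a run is started. The sketch below is hypothetical (`build_input` is not part of any Apify library). The 1 to 1,000,000 range for `maxItems` comes from the OpenAPI specification in this document; rejecting an empty URL list and non-HTTP URLs are extra assumptions of the sketch.

```python
from urllib.parse import urlparse

MAX_ITEMS_LIMIT = 1_000_000  # upper bound stated in the input schema


def build_input(urls, max_items=10):
    """Return an Actor input dict, raising ValueError on invalid values."""
    if not urls:
        raise ValueError("startUrls is required and must not be empty")
    if not 1 <= max_items <= MAX_ITEMS_LIMIT:
        raise ValueError(f"maxItems must be between 1 and {MAX_ITEMS_LIMIT}")
    start_urls = []
    for url in urls:
        parsed = urlparse(url)
        # Assumption: only absolute http(s) URLs make sense as article URLs.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
        start_urls.append({"url": url})
    return {"startUrls": start_urls, "maxItems": max_items}
```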

# Actor output Schema

## `results` (type: `string`):

Complete dataset

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://www.bbc.com/news/articles/c86w8elez74o"
        }
    ],
    "maxItems": 10
};

// Run the Actor and wait for it to finish
const run = await client.actor("parseforge/article-extractor").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startUrls": [{ "url": "https://www.bbc.com/news/articles/c86w8elez74o" }],
    "maxItems": 10,
}

# Run the Actor and wait for it to finish
run = client.actor("parseforge/article-extractor").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
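
Where adding the `apify-client` dependency is not an option, the same run can be started through the `run-sync-get-dataset-items` endpoint listed in the OpenAPI specification, which takes the token as a query parameter. The following is a standard-library sketch: the helper names `build_request` and `run_sync` and the 300-second timeout are assumptions, and the response is assumed to be a JSON array of dataset items.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/parseforge~article-extractor/run-sync-get-dataset-items"


def build_request(token: str, run_input: dict) -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sync(token: str, run_input: dict, timeout: float = 300):
    """Send the request and return the decoded JSON response body."""
    request = build_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the URL and payload easy to inspect or log before any network call is made.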

## CLI example

```bash
echo '{
  "startUrls": [
    {
      "url": "https://www.bbc.com/news/articles/c86w8elez74o"
    }
  ],
  "maxItems": 10
}' |
apify call parseforge/article-extractor --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=parseforge/article-extractor",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Smart Article Extractor",
        "description": "Extract clean article content from any news, blog, or publisher site! Pull full body text, author, publish date, word count, language, reading time, images, and metadata at scale. Ideal for content research, media monitoring, SEO audits, and AI training. Start extracting articles in minutes!",
        "version": "1.0",
        "x-build-id": "i7E26HRvIqNpHcvr1"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/parseforge~article-extractor/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-parseforge-article-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/parseforge~article-extractor/runs": {
            "post": {
                "operationId": "runs-sync-parseforge-article-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/parseforge~article-extractor/run-sync": {
            "post": {
                "operationId": "run-sync-parseforge-article-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "startUrls"
                ],
                "properties": {
                    "startUrls": {
                        "title": "Article URLs",
                        "type": "array",
                        "description": "URLs to process",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "maxItems": {
                        "title": "Max Items",
                        "minimum": 1,
                        "maximum": 1000000,
                        "type": "integer",
                        "description": "Free users: Limited to 10 items (preview). Paid users: Optional, max 1,000,000"
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "number",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
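
The run object described by the schema above carries the storage IDs, resource options, and per-category costs of a run. As a minimal sketch, assuming the run object has already been fetched and unwrapped into a plain Python dictionary, the helper below pulls out the fields most integrations need. The function name `summarize_run` and the `sample_run` values are hypothetical; the sample reuses the schema's example values and is not real run output.

```python
from typing import Any


def summarize_run(run: dict[str, Any]) -> dict[str, Any]:
    """Collect storage IDs, memory, restarts, and cost from a run object."""
    usage_usd = run.get("usageUsd") or {}
    # Keep only the usage categories with a non-zero cost.
    billed = {name: cost for name, cost in usage_usd.items() if cost}
    return {
        "dataset_id": run.get("defaultDatasetId"),
        "key_value_store_id": run.get("defaultKeyValueStoreId"),
        "memory_mbytes": (run.get("options") or {}).get("memoryMbytes"),
        "restarts": (run.get("stats") or {}).get("restartCount", 0),
        "total_usd": run.get("usageTotalUsd", 0.0),
        "billed_categories": billed,
    }


# Hypothetical sample assembled from the example values in the schema.
sample_run = {
    "defaultDatasetId": "example-dataset-id",
    "defaultKeyValueStoreId": "example-store-id",
    "stats": {"inputBodyLen": 2000, "restartCount": 0, "computeUnits": 0},
    "options": {"build": "latest", "timeoutSecs": 300, "memoryMbytes": 1024},
    "usageTotalUsd": 0.00005,
    "usageUsd": {"ACTOR_COMPUTE_UNITS": 0, "KEY_VALUE_STORE_WRITES": 0.00005},
}

summary = summarize_run(sample_run)
print(summary)
```

Every lookup uses `.get()` with a fallback because the schema marks none of these properties as required, so a partially populated run object does not raise a `KeyError`.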
