# HTML Metadata Extractor (OG, Twitter Cards, Schema.org) (`gochujang/html-metadata-extractor`) Actor

Extract structured metadata from any URL: title, description, OpenGraph (og:title/image/type/url/site\_name), Twitter Cards, canonical, favicon, JSON-LD schema.org, language, h1 count, images, links. Used for link previews, SEO audits, content cataloging. $0.005/URL.

- **URL**: https://apify.com/gochujang/html-metadata-extractor.md
- **Developed by:** [Hojun Lee](https://apify.com/gochujang) (community)
- **Categories:** Developer tools, Automation, Other
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage, which gets cheaper the higher subscription plan you have.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
The word "Actor" is always written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) and as [Markdown full text](https://docs.apify.com/llms-full.txt).


# README

## HTML Metadata Extractor

> Extract structured metadata from any URL: **title, description, OpenGraph (og:*), Twitter Cards, canonical, favicon, JSON-LD schema.org, language, h1 count, images, links.** Used for link previews, SEO audits, content cataloging. **$0.005 per URL.**

---

### Why this exists

Whenever you build a link-preview feature (Slack-style unfurl, Discord embed, content sharing), you end up writing the same dozen meta-tag lookups. This Actor does them in one call, returning the `<head>` metadata along with a few page-structure stats.
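
To make "the same dozen meta-tag lookups" concrete, here is a minimal local sketch using only Python's standard `html.parser`. It is not this Actor's implementation; the class and function names are hypothetical, and the output keys simply reuse the field names from the tables below.

```python
from html.parser import HTMLParser


class HeadMetaParser(HTMLParser):
    """Hypothetical helper: collects <title>, <html lang>, <meta> and <link rel="canonical">."""

    def __init__(self) -> None:
        super().__init__()
        self.meta = {}
        self.title = ""
        self.lang = ""
        self.canonical = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}
        if tag == "html":
            self.lang = a.get("lang", "")
        elif tag == "title":
            self._in_title = True
        elif tag == "meta":
            # OpenGraph uses property="og:*"; description and Twitter Cards use name="..."
            key = a.get("property") or a.get("name")
            if key and "content" in a:
                self.meta.setdefault(key.lower(), a["content"])  # keep the first occurrence
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonical = a.get("href", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_head_metadata(html: str) -> dict:
    """Return a small subset of the documented fields; missing tags map to None."""
    p = HeadMetaParser()
    p.feed(html)
    p.close()
    return {
        "title": p.title.strip(),
        "meta_description": p.meta.get("description"),
        "html_lang": p.lang,
        "canonical_url": p.canonical,
        "og_title": p.meta.get("og:title"),
        "og_image": p.meta.get("og:image"),
        "twitter_card": p.meta.get("twitter:card"),
    }
```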

---

### What you get per row

#### Title / description / language / URL
| Field | Example |
|---|---|
| `title` | `Apify: Full-stack web scraping...` |
| `meta_description` | `Cloud platform for web scraping...` |
| `meta_keywords` | `web scraping, automation, ...` |
| `html_lang` | `en` |
| `domain` | `www.apify.com` |
| `canonical_url` | `https://www.apify.com/` |

#### OpenGraph
| Field | Example |
|---|---|
| `og_title` | `Apify` |
| `og_description` | `...` |
| `og_image` | `https://...og.png` |
| `og_type` | `website` |
| `og_url` | `https://...` |
| `og_site_name` | `Apify` |
| `og_locale` | `en_US` |

#### Twitter Card
| Field | Example |
|---|---|
| `twitter_card` | `summary_large_image` |
| `twitter_title` | `...` |
| `twitter_description` | `...` |
| `twitter_image` | `https://...` |
| `twitter_site` | `@apify` |
| `twitter_creator` | `@apify` |

#### Article-specific
| Field | Example |
|---|---|
| `article_published_time` | `2026-06-09T14:00:00Z` |
| `article_modified_time` | `2026-06-10T10:00:00Z` |
| `article_author` | `Jane Doe` |
| `article_section` | `Engineering` |
| `article_tags` | `["python","scraping"]` |

#### JSON-LD schema.org
The full array of structured-data objects found on the page (Article, Product, FAQPage, etc.).
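
JSON-LD lives in `<script type="application/ld+json">` blocks, and a single block may hold one object, a list, or an `{"@graph": [...]}` wrapper. The sketch below flattens all three shapes into one list using the standard library. It is an illustration under those assumptions, not the Actor's code; the function name is hypothetical, and it skips malformed blocks rather than failing the page.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks = []
        self._buf = None  # list of text chunks while inside a JSON-LD script, else None

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buf is not None:
            self.blocks.append("".join(self._buf))
            self._buf = None


def extract_json_ld(html: str) -> list:
    """Hypothetical helper: return every schema.org object found on the page."""
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    objects = []
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the whole page
        if isinstance(data, dict) and "@graph" in data:
            data = data["@graph"]  # unwrap the @graph container (its @context is dropped)
        objects.extend(data if isinstance(data, list) else [data])
    return objects
```

For a quick schema.org presence check, `{o.get("@type") for o in extract_json_ld(html)}` gives the set of types declared on a page.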

#### Structural stats
| Field | Example |
|---|---|
| `h1_count` | `1` |
| `first_h1` | `Welcome to Apify` |
| `image_count` | `34` |
| `link_count` | `127` |
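
The structural stats can be approximated locally with a single pass over the document. This is a hedged sketch: the README does not say exactly what the Actor counts, so the rules here (every `<img>`, only `<a>` tags that have a non-empty `href`, whitespace-collapsed text of the first `<h1>`) are assumptions, and the names are hypothetical.

```python
from html.parser import HTMLParser


class StructureStats(HTMLParser):
    """Counts <h1>, <img> and <a href> tags and keeps the text of the first <h1>."""

    def __init__(self) -> None:
        super().__init__()
        self.h1_count = 0
        self.image_count = 0
        self.link_count = 0
        self.first_h1 = ""
        self._in_first_h1 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.h1_count += 1
            self._in_first_h1 = self.h1_count == 1
        elif tag == "img":
            self.image_count += 1
        elif tag == "a" and dict(attrs).get("href"):
            self.link_count += 1  # assumption: anchors without href are not links

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_first_h1 = False

    def handle_data(self, data):
        if self._in_first_h1:
            self.first_h1 += data  # includes text of nested inline tags like <b>


def structural_stats(html: str) -> dict:
    p = StructureStats()
    p.feed(html)
    p.close()
    return {
        "h1_count": p.h1_count,
        "first_h1": " ".join(p.first_h1.split()),
        "image_count": p.image_count,
        "link_count": p.link_count,
    }
```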

---

### Quick start

#### Single URL
```json
{
  "url": "https://www.apify.com/"
}
````

#### Batch

```json
{
  "urls": [
    "https://www.apify.com/",
    "https://docs.apify.com/",
    "https://www.techcrunch.com/"
  ]
}
```

***

### Pricing

**Pay-per-event**: $0.005 per URL.

| Run | URLs | Cost |
|---|---|---|
| Single | 1 | $0.005 |
| Batch of 100 | 100 | $0.50 |
| Daily 1K SEO audit | 1000 | $5.00 |
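
The table is just the URL count times the per-URL price. The helper below reproduces it with `Decimal` to avoid float rounding. It is a sketch that covers only the $0.005 per-URL price quoted in this README, not any Apify platform usage; whether failed URLs are also charged is not stated, so it assumes every submitted URL is billed.

```python
from decimal import Decimal

PRICE_PER_URL = Decimal("0.005")  # per-URL price quoted in this README


def estimate_cost(url_count: int, price_per_url: Decimal = PRICE_PER_URL) -> Decimal:
    """Hypothetical helper: per-URL charge only, assuming every submitted URL is billed."""
    if url_count < 0:
        raise ValueError("url_count must be non-negative")
    return price_per_url * url_count
```

For example, running the daily 1K audit for 30 days comes to `estimate_cost(1000) * 30`, i.e. $150.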

***

### Use cases

1. **Slack/Discord link unfurl** — Power your own bot's URL preview
2. **SEO competitor audit** — Pull OG / Twitter / canonical from 100 sites at once
3. **Content catalog** — Build a database of article metadata from your RSS sources
4. **Schema.org validation** — Verify structured data is present on key pages
5. **Newsletter aggregation** — Get clean previews for each linked article
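
For the link-unfurl and newsletter use cases, a common pattern is to fall back across tag families: OpenGraph first, then Twitter Card, then plain HTML metadata. The sketch below applies that order to one output row. The field names come from the tables above; the fallback order and the function name are illustrative choices, not something the Actor does for you.

```python
def build_link_preview(row: dict) -> dict:
    """Hypothetical helper: pick the best available value for each preview field."""

    def first(*keys):
        # Return the first non-blank string among the given fields, else None.
        for key in keys:
            value = row.get(key)
            if isinstance(value, str) and value.strip():
                return value.strip()
        return None

    return {
        "title": first("og_title", "twitter_title", "title"),
        "description": first("og_description", "twitter_description", "meta_description"),
        "image": first("og_image", "twitter_image"),
        "site": first("og_site_name", "domain"),
        "url": first("canonical_url", "og_url"),
    }
```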

***

### Related Actors (same author)

- [Web Page → Markdown Converter](https://apify.com/gochujang/web-to-markdown) — Get the article body too
- [Sitemap URL Discovery](https://apify.com/gochujang/sitemap-url-discovery) — Find all URLs to extract metadata from
- [PDF Text Extractor](https://apify.com/gochujang/pdf-text-extractor)
- [JSON Schema Generator](https://apify.com/gochujang/json-schema-generator)

***

### Feedback

A short review helps SEO / content engineers find it: [Leave a review on Apify Store](https://apify.com/gochujang/html-metadata-extractor#reviews)

# Actor input Schema

## `urls` (type: `array`):

List of URLs to extract metadata from. Each is billed separately.

## `url` (type: `string`):

Single-URL shortcut; used only when `urls` is empty.

## `userAgent` (type: `string`):

Custom User-Agent string. If left empty, the Actor uses a default User-Agent that looks like a desktop browser.

## Actor input object example

```json
{
  "urls": [
    "https://www.apify.com/"
  ],
  "url": "",
  "userAgent": ""
}
```
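
Because each entry in `urls` is billed separately and `url` is only read when `urls` is empty, it can pay to clean the list before starting a run. This hypothetical helper drops blank and duplicate URLs, falls back to `url` only when the list ends up empty, and omits `userAgent` unless one is given. It mirrors the input schema above on the client side; it is not part of the Actor.

```python
def build_actor_input(urls=None, url: str = "", user_agent: str = "") -> dict:
    """Hypothetical helper: build an input object matching the schema above."""
    cleaned, seen = [], set()
    for u in urls or []:
        u = u.strip()
        if u and u not in seen:  # duplicates would be billed twice
            seen.add(u)
            cleaned.append(u)
    if not cleaned and not url.strip():
        raise ValueError("provide at least one URL via `urls` or `url`")
    run_input = {"urls": cleaned}
    if not cleaned:
        # Mirrors the documented rule: `url` is only used when `urls` is empty.
        run_input["url"] = url.strip()
    if user_agent:
        run_input["userAgent"] = user_agent
    return run_input
```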

# Actor output Schema

## `dataset` (type: `string`):

No description

## `summary` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "urls": [
        "https://www.apify.com/"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("gochujang/html-metadata-extractor").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "urls": ["https://www.apify.com/"] }

# Run the Actor and wait for it to finish
run = client.actor("gochujang/html-metadata-extractor").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
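
Once the items are fetched, a few checks turn them into a basic SEO audit. The sketch below flags rows that do not have exactly one `<h1>` or that lack a description, canonical URL, `og:title`, or `og:image`. The check list is an illustrative assumption, and it assumes missing values come back as absent keys, `None`, or empty strings.

```python
def audit_items(items):
    """Hypothetical helper: return one entry per row that fails a basic SEO check."""
    problems = []
    for index, item in enumerate(items):
        issues = []
        if item.get("h1_count") != 1:
            issues.append(f"expected 1 <h1>, found {item.get('h1_count')}")
        for field in ("meta_description", "canonical_url", "og_title", "og_image"):
            if not item.get(field):
                issues.append(f"missing {field}")
        if issues:
            problems.append({"index": index, "domain": item.get("domain"), "issues": issues})
    return problems
```

With the client from the example above, you could run `audit_items(client.dataset(run["defaultDatasetId"]).iterate_items())` and print each entry.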

## CLI example

```bash
echo '{
  "urls": [
    "https://www.apify.com/"
  ]
}' |
apify call gochujang/html-metadata-extractor --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=gochujang/html-metadata-extractor",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "HTML Metadata Extractor (OG, Twitter Cards, Schema.org)",
        "description": "Extract structured metadata from any URL: title, description, OpenGraph (og:title/image/type/url/site_name), Twitter Cards, canonical, favicon, JSON-LD schema.org, language, h1 count, images, links. Used for link previews, SEO audits, content cataloging. $0.005/URL.",
        "version": "0.1",
        "x-build-id": "0X8WcStsnroMLq5c8"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/gochujang~html-metadata-extractor/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-gochujang-html-metadata-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/gochujang~html-metadata-extractor/runs": {
            "post": {
                "operationId": "runs-sync-gochujang-html-metadata-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/gochujang~html-metadata-extractor/run-sync": {
            "post": {
                "operationId": "run-sync-gochujang-html-metadata-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "urls": {
                        "title": "URLs",
                        "type": "array",
                        "description": "List of URLs to extract metadata from. Each is billed separately.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "url": {
                        "title": "Single URL (shortcut)",
                        "type": "string",
                        "description": "Used when 'urls' is empty.",
                        "default": ""
                    },
                    "userAgent": {
                        "title": "User-Agent override",
                        "type": "string",
                        "description": "Custom UA. Default looks like a desktop browser.",
                        "default": ""
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
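
If you cannot use an Apify client library, the `run-sync-get-dataset-items` path from the specification above can be called with plain HTTP. The sketch below uses only Python's standard library and passes the token as the `token` query parameter, as the spec describes. It assumes the endpoint's default JSON response, which is a list of dataset items; the function names and the 300-second timeout are illustrative choices.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "gochujang~html-metadata-extractor"


def build_run_sync_request(token: str, urls: list) -> urllib.request.Request:
    """Hypothetical helper: POST request for /acts/{actor}/run-sync-get-dataset-items."""
    query = urllib.parse.urlencode({"token": token})
    endpoint = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items?{query}"
    body = json.dumps({"urls": urls}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_metadata(token: str, urls: list, timeout: float = 300) -> list:
    """Run the Actor, wait for it to finish, and return its dataset items (assumes JSON output)."""
    with urllib.request.urlopen(build_run_sync_request(token, urls), timeout=timeout) as resp:
        return json.load(resp)
```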
