# Hugging Face Scraper — (Models, Datasets, Spaces, Papers etc.) (`khadinakbar/huggingface-all-in-one-scraper`) Actor

All-in-one Hugging Face Hub scraper. Paste any URL or text query — auto-detects model, dataset, space, paper, user, org, or collection. Deep model card, lineage, evaluation results, dataset configs. MCP-ready. $0.006 per result.

- **URL**: https://apify.com/khadinakbar/huggingface-all-in-one-scraper.md
- **Developed by:** [Khadin Akbar](https://apify.com/khadinakbar) (community)
- **Categories:** AI, Automation, Developer tools
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $6.00 / 1,000 results

This Actor is paid per event and usage. You are charged both the fixed price for specific events and for Apify platform usage.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
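
As a rough illustration of the listed price, the sketch below estimates the per-result event charge for a run. It models only the $6.00 per 1,000 results figure above; platform usage is billed separately and is not included. The function name `estimate_result_cost` is hypothetical.

```python
from decimal import Decimal

# Listed price: $6.00 per 1,000 results, i.e. $0.006 per result.
PRICE_PER_RESULT = Decimal("6.00") / Decimal("1000")


def estimate_result_cost(result_count: int) -> Decimal:
    """Estimate the per-result event charge in USD (platform usage excluded)."""
    if result_count < 0:
        raise ValueError("result_count must be non-negative")
    return PRICE_PER_RESULT * result_count


# 250 results at $0.006 each.
print(estimate_result_cost(250))
```

`Decimal` is used instead of `float` so that small per-result prices do not pick up rounding noise when multiplied.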

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Hugging Face Scraper — Models, Datasets, Spaces, Papers, Users (All-in-One)

Paste any Hugging Face URL or text query — this actor auto-detects whether it is a **model**, **dataset**, **space**, **paper**, **user**, **organization**, or **collection**, and returns deep structured data per entity. The single most-queried source by AI coding agents and ML researchers, now in one MCP-ready actor.

### What you get

| Target you paste | Returns |
|---|---|
| `https://huggingface.co/meta-llama/Llama-3.1-8B` | Full model card, license, downloads/likes, eval results (`model-index`), siblings (files), base model, adapter children, quantized children, datasets cited |
| `https://huggingface.co/datasets/squad` | Dataset metadata, license, configs + splits + row counts (via `datasets-server`), task categories, language, size buckets |
| `https://huggingface.co/spaces/HuggingFaceM4/idefics_playground` | Space SDK (Gradio/Streamlit/Docker/Static), hardware tier, runtime stage, siblings, README |
| `https://huggingface.co/papers/2310.06825` | Paper title, abstract, authors with HF user links, upvotes, discussion count, ArXiv ID, publication date |
| `https://huggingface.co/karpathy` | User profile (followers, likes, PRO flag, orgs) + portfolio of models/datasets/spaces sorted by your `sortBy` |
| `https://huggingface.co/meta-llama` | Organization profile + portfolio |
| `https://huggingface.co/collections/HuggingFaceH4/zephyr-7b-65118eb...` | Collection metadata + every item inside (model/dataset/space/paper references) |
| Free-text query like `qwen 3 instruct` | Top results from the `entityType` you pick (models / datasets / spaces / papers / all) |

**Price:** `$0.006` per result returned + `$0.00005` per actor start (per GB memory). Pay-per-event and Pay-per-usage both enabled — pick whichever fits your workload.

### Why this actor

Everyone else's Hugging Face scraper hands you 8 fields and stops. This one ships:

- **URL auto-detection** across 7 entity types. No mode toggling.
- **Deep model lineage** — base models, finetune/adapter children, quantized children (GGUF/AWQ/GPTQ).
- **Evaluation results** — parses the full `model-index` block from the model card so you can rank by benchmark scores, not just download counts.
- **Dataset configs + splits + row counts** via the official `datasets-server.huggingface.co` API — most actors skip this entirely.
- **Spaces hardware & runtime stage** — `t4-small`, `a10g-large`, `RUNNING`, `BUILDING`, `PAUSED`.
- **Collections** — every curated reading list, leaderboard, or pinned set on the Hub.
- **MCP-first design** — `responseFormat: "concise"` returns ~200 tokens per item so Claude/GPT can sample many results without blowing the context window.
- **Stable, low-cost runtime** — pure HTTP against the public HF API. No browser. No proxy churn. 99%+ success rate.

### Input

```json
{
  "targets": [
    "https://huggingface.co/meta-llama/Llama-3.1-8B",
    "https://huggingface.co/datasets/squad",
    "https://huggingface.co/papers/2310.06825",
    "https://huggingface.co/karpathy",
    "qwen 3 instruct"
  ],
  "entityType": "models",
  "resultsPerTarget": 50,
  "sortBy": "downloads",
  "filterTask": "text-generation",
  "filterLibrary": "transformers",
  "filterLanguage": "en",
  "includeReadme": true,
  "responseFormat": "detailed"
}
````

- `targets` (required) — array of URLs and/or text queries. Mix freely.
- `entityType` — when a target is a text query, search this entity type (`models` / `datasets` / `spaces` / `papers` / `all`). Ignored for URL targets.
- `resultsPerTarget` — cap per target. URL targets always return 1 record (+ child items for collections/profiles). Default 50, max 500.
- `sortBy` — `downloads`, `likes`, `modified`, `trending`. Drives search results and profile portfolios.
- `filterTask` / `filterLibrary` / `filterLanguage` — model/dataset filters (e.g. `text-generation`, `transformers`, `gguf`, `en`).
- `includeReadme` — when true, fetches the full Markdown README (one extra HTTP request per record). Turn it off to keep bulk searches cheap.
- `responseFormat` — `detailed` returns every parsed field. `concise` returns ~200 tokens/item for AI agents.
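
A minimal sketch of a client-side pre-check for the fields above, useful for catching typos before a paid run starts. The allowed values and the 1 to 500 range mirror the input schema; the function name `validate_input` and the choice to reject an empty `targets` list are assumptions of this sketch, not the Actor's own validation logic.

```python
ENTITY_TYPES = {"models", "datasets", "spaces", "papers", "all"}
SORT_KEYS = {"downloads", "likes", "modified", "trending"}
RESPONSE_FORMATS = {"concise", "detailed"}


def validate_input(run_input: dict) -> list[str]:
    """Return a list of problems found in a run input; empty means it looks valid."""
    problems = []

    targets = run_input.get("targets")
    if not isinstance(targets, list) or not targets:
        problems.append("targets must be a non-empty list")
    elif any(not isinstance(t, str) or not t.strip() for t in targets):
        problems.append("every target must be a non-blank string")

    per_target = run_input.get("resultsPerTarget", 50)
    if not isinstance(per_target, int) or not 1 <= per_target <= 500:
        problems.append("resultsPerTarget must be an integer from 1 to 500")

    # Enum-like fields fall back to their documented defaults when omitted.
    for key, allowed, default in (
        ("entityType", ENTITY_TYPES, "models"),
        ("sortBy", SORT_KEYS, "downloads"),
        ("responseFormat", RESPONSE_FORMATS, "detailed"),
    ):
        if run_input.get(key, default) not in allowed:
            problems.append(f"{key} must be one of {sorted(allowed)}")
    return problems


print(validate_input({"targets": ["qwen 3 instruct"], "sortBy": "stars"}))
```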

#### Optional: higher rate limit

Hugging Face allows ~1,000 requests/hour without auth. Set the `HF_TOKEN` environment variable on the actor (Console → Settings → Environment variables) with a token from <https://huggingface.co/settings/tokens> to raise the cap to ~5,000/h.

### Output

Mixed dataset — each record has an `itemType` discriminator (`model`, `dataset`, `space`, `paper`, `user`, `org`, `collection`, `collection_item`, `search_result`). Pre-built dataset views in the Output tab let you slice by entity.
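
Because one run can mix entity types, client code will usually split records on `itemType` before processing them. The sketch below shows one way to do that on already-downloaded items; the helper name `group_by_item_type`, the `"unknown"` fallback bucket, and the third sample id are assumptions for illustration.

```python
from collections import defaultdict


def group_by_item_type(items):
    """Bucket dataset records by their `itemType` discriminator."""
    groups = defaultdict(list)
    for item in items:
        # Records without the field land in an "unknown" bucket (an assumption).
        groups[item.get("itemType", "unknown")].append(item)
    return dict(groups)


sample = [
    {"itemType": "model", "id": "meta-llama/Llama-3.1-8B"},
    {"itemType": "dataset", "id": "squad"},
    {"itemType": "model", "id": "example-org/example-model"},  # made-up id
]
groups = group_by_item_type(sample)
print({kind: len(records) for kind, records in groups.items()})
```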

#### Model record (excerpt)

```json
{
  "itemType": "model",
  "id": "meta-llama/Llama-3.1-8B",
  "url": "https://huggingface.co/meta-llama/Llama-3.1-8B",
  "author": "meta-llama",
  "downloads": 12345678,
  "likes": 4321,
  "pipelineTag": "text-generation",
  "libraryName": "transformers",
  "license": "llama3.1",
  "tags": ["llama-3", "text-generation", "facebook"],
  "language": ["en"],
  "lastModified": "2026-04-22T18:51:00.000Z",
  "siblings": [{ "rfilename": "config.json", "size": 1234, "lfs": null }],
  "modelIndex": [{ "name": "...", "results": [] }],
  "baseModels": [],
  "adapterChildren": ["someone/llama-3.1-8b-lora-medical"],
  "quantizedChildren": ["bartowski/Meta-Llama-3.1-8B-GGUF"],
  "datasetsUsed": ["allenai/c4", "EleutherAI/pile"],
  "readme": "## Model Card for Llama 3.1 8B..."
}
```

#### Dataset record (excerpt)

```json
{
  "itemType": "dataset",
  "id": "squad",
  "url": "https://huggingface.co/datasets/squad",
  "downloads": 987654,
  "taskCategories": ["question-answering"],
  "language": ["en"],
  "sizeCategories": ["10K<n<100K"],
  "configs": [
    { "config": "plain_text", "splits": [{ "name": "train", "numExamples": 87599 }, { "name": "validation", "numExamples": 10570 }] }
  ]
}
```
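
To show how the nested `configs` structure can be consumed, here is a small sketch that totals `numExamples` across the splits of each config. It assumes records shaped like the excerpt above; `rows_per_config` is a hypothetical helper, and treating a missing count as zero is a choice made for this example.

```python
def rows_per_config(record):
    """Map each config name to the total `numExamples` across its splits."""
    totals = {}
    for config in record.get("configs", []):
        totals[config["config"]] = sum(
            split.get("numExamples") or 0 for split in config.get("splits", [])
        )
    return totals


squad = {
    "itemType": "dataset",
    "id": "squad",
    "configs": [
        {
            "config": "plain_text",
            "splits": [
                {"name": "train", "numExamples": 87599},
                {"name": "validation", "numExamples": 10570},
            ],
        }
    ],
}
print(rows_per_config(squad))  # {'plain_text': 98169}
```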

### Use it from code

#### JavaScript / TypeScript

```js
import { ApifyClient } from 'apify-client';
const client = new ApifyClient({ token: process.env.APIFY_TOKEN });
const run = await client.actor('khadinakbar/huggingface-all-in-one-scraper').call({
  targets: ['https://huggingface.co/meta-llama/Llama-3.1-8B', 'qwen 3 instruct'],
  entityType: 'models',
  resultsPerTarget: 25,
  responseFormat: 'concise',
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```

#### Python

```python
import os

from apify_client import ApifyClient

# Read the token from the environment rather than hard-coding it.
client = ApifyClient(token=os.environ["APIFY_TOKEN"])
run = client.actor("khadinakbar/huggingface-all-in-one-scraper").call(
    run_input={
        "targets": ["https://huggingface.co/datasets/squad", "image classification"],
        "entityType": "all",
        "resultsPerTarget": 50,
    }
)
items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

#### From Claude / GPT via MCP

This actor is exposed as `apify--huggingface-all-in-one-scraper` on the Apify MCP server. Any MCP-capable agent (Claude Desktop, ChatGPT custom GPTs with MCP, Cursor, Windsurf) can call it directly — set `responseFormat: "concise"` so item tokens stay small.

### FAQ

**Does it require a Hugging Face account?**
No. The actor uses public API endpoints. Add an `HF_TOKEN` env var only if you want higher rate limits.

**Can it scrape gated or private repos?**
No. The scraper only sees what the public API exposes. Gated repos return their public metadata (license, tags) but no file contents. Private repos return 401.

**How fresh is the data?**
Real-time — every record is a fresh API hit, not a cached crawl.

**Does it pull discussions / community tabs?**
Discussions are not part of v1. Papers include their HF discussion count (`commentsCount`).

**What's the difference between this and `apify/rag-web-browser`?**
`rag-web-browser` fetches arbitrary URLs and returns Markdown. This actor returns structured records with typed fields (downloads as int, lastModified as ISO 8601, evaluation results parsed from the model card's `model-index` block into `modelIndex`). Use this when you need data, not prose.

**Why is the price higher than some other HF scrapers?**
You get deeper extraction (model lineage, eval results, dataset configs, full README), URL auto-detection across 7 entity types, and MCP-first design. The cheapest scrapers return ~8 fields with no lineage and no eval.

### Legal & TOS

Hugging Face's public API and content are designed for programmatic access — the Hub publishes [llms.txt](https://huggingface.co/llms.txt) and OpenAPI specs explicitly for AI consumption. This actor only hits public, unauthenticated endpoints and respects rate limits. You are responsible for complying with the licenses of any models, datasets, or papers you retrieve. Gated content is not bypassed.

### Changelog

#### 1.2 (2026-05-29)

- Fix: user and organization records now report accurate `modelsCount`, `datasetsCount`, `spacesCount` (HF API does not expose `x-total-count`; uses `limit=1000` array length).

#### 1.1 (2026-05-29)

- Reliability hardening — 620-record brutal test battery, 100% success rate, 0 schema validation failures.
- Fix: `gated: false` boolean from HF API normalized through `normalizeGated()` (was failing detailed-mode pushes for non-gated repos).
- Fix: dataset counts no longer increment on failed pushes (moved counter after `pushData` success).
- Fix: per-record try/catch — one bad portfolio entry no longer fails the whole user/org target.
- Fix: empty/whitespace targets surface as explicit warnings instead of silent drops.
- Fix: URL paths with a `/tree`, `/blob`, `/commits`, `/discussions`, or `/settings` suffix are now stripped correctly.
- Fix: trailing slashes and uppercase hostnames are now normalized.
- Fix: bare `https://huggingface.co/` returns clear "not actionable" warning.
- Add: `sanitizeRecord()` strips undefined fields before push.
- Add: `dataset_schema` `gated` accepts boolean defensively.
- Add: concise mode summary field truncates at 600 chars to keep token budget.
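
The URL fixes in this release can be pictured with the sketch below. It is an illustration of the described behavior (suffix stripping, trailing-slash and hostname normalization, rejecting the bare Hub URL), not the Actor's actual code; `normalize_hf_url` and `SUBPAGE_SEGMENTS` are hypothetical names.

```python
from urllib.parse import urlsplit

# Path segments named in the changelog that mark a sub-page of a repo.
SUBPAGE_SEGMENTS = {"tree", "blob", "commits", "discussions", "settings"}


def normalize_hf_url(url: str):
    """Return a canonical Hub URL, or None if the URL is not actionable."""
    parts = urlsplit(url.strip())
    # `hostname` is returned lowercased, which handles uppercase hosts.
    if parts.hostname != "huggingface.co":
        return None
    # Splitting on "/" and dropping empty pieces removes any trailing slash.
    segments = [s for s in parts.path.split("/") if s]
    for index, segment in enumerate(segments):
        if segment in SUBPAGE_SEGMENTS:
            segments = segments[:index]  # cut the sub-page suffix
            break
    if not segments:
        return None  # bare https://huggingface.co/
    return "https://huggingface.co/" + "/".join(segments)


print(normalize_hf_url("https://HuggingFace.CO/meta-llama/Llama-3.1-8B/tree/main/"))
```

One known limitation of this simplified version: a repo whose own name equals one of the sub-page segments would be cut too early.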

#### 1.0 (2026-05-28)

- Initial release.
- URL auto-detection across models, datasets, spaces, papers, users, organizations, collections.
- Deep model card extraction with base/adapter/quantized lineage, evaluation results, dataset citations.
- Dataset configs + splits + row counts via `datasets-server`.
- Space SDK + hardware + runtime stage.
- `concise` / `detailed` response modes for AI-agent vs human consumers.
- Optional `HF_TOKEN` for 5K req/h rate limit.

### Related actors

- [khadinakbar/etsy-all-in-one-scraper](https://apify.com/khadinakbar/etsy-all-in-one-scraper)
- [khadinakbar/ebay-all-in-one-scraper](https://apify.com/khadinakbar/ebay-all-in-one-scraper)
- [khadinakbar/rightmove-all-in-one-scraper](https://apify.com/khadinakbar/rightmove-all-in-one-scraper)
- [khadinakbar/goodreads-all-in-one-scraper](https://apify.com/khadinakbar/goodreads-all-in-one-scraper)
- [khadinakbar/chatgpt-gpt-store-scraper](https://apify.com/khadinakbar/chatgpt-gpt-store-scraper)
- [khadinakbar/ai-search-brand-monitor](https://apify.com/khadinakbar/ai-search-brand-monitor)

# Actor input Schema

## `targets` (type: `array`):

List of Hugging Face URLs OR plain-text search queries. Each item is auto-detected: model URL (https://huggingface.co/meta-llama/Llama-3.1-8B), dataset URL (/datasets/squad), space URL (/spaces/HuggingFaceM4/idefics\_playground), paper URL (/papers/2310.06825), user URL (/karpathy), org URL (/meta-llama), collection URL (/collections/...), or a free-text query like 'qwen 3 instruct'. NOT for inference API calls — use the hugging-face-image-ai or hugging-face-audio-ai actors for that.
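
The path-based detection described above might look roughly like the following sketch. It expects full `https://huggingface.co/...` URLs and treats everything else as a free-text query. The function name, the returned labels, and the `user_or_org` bucket are assumptions: a single-segment path such as `/karpathy` or `/meta-llama` cannot be classified from the path alone, and how the Actor resolves that is not documented here.

```python
from urllib.parse import urlsplit

# First path segment -> entity type, for the prefixed URL families.
PREFIX_TYPES = {
    "datasets": "dataset",
    "spaces": "space",
    "papers": "paper",
    "collections": "collection",
}


def detect_target_type(target: str) -> str:
    """Guess the entity type of a target from its URL path alone."""
    parts = urlsplit(target.strip())
    if parts.hostname != "huggingface.co":
        return "query"  # anything that is not a Hub URL is treated as free text
    segments = [s for s in parts.path.split("/") if s]
    if not segments:
        return "unknown"
    if segments[0] in PREFIX_TYPES:
        return PREFIX_TYPES[segments[0]]
    if len(segments) == 1:
        # Users and orgs share the same URL shape; telling them apart
        # needs an API lookup, which this sketch does not attempt.
        return "user_or_org"
    return "model"


for target in ("https://huggingface.co/datasets/squad", "qwen 3 instruct"):
    print(target, "->", detect_target_type(target))
```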

## `entityType` (type: `string`):

When a target is a text query (not a URL), search this entity type on Hugging Face. Ignored when the target is a URL — type is auto-detected from the path. Defaults to 'models'. Use 'all' to search models, datasets, and spaces and return the union.

## `resultsPerTarget` (type: `integer`):

Cap on records returned per target. For a single URL (model/dataset/space/paper) you get 1 deep record. For a search query, user, org, or collection target, this caps how many entity records are returned. Defaults to 50. Maximum 500 per target to keep runs predictable.

## `sortBy` (type: `string`):

Sort key for search-query targets and user/org/collection portfolio listings. 'downloads' returns most-downloaded first (best for popularity), 'likes' returns community-favored first, 'modified' returns recently updated first, 'trending' surfaces what is hot right now. Has no effect on single-URL targets.

## `filterTask` (type: `string`):

Filter search-query model targets by pipeline\_tag (Hugging Face task taxonomy). Examples: 'text-generation', 'image-classification', 'automatic-speech-recognition'. Leave empty for no filter. Ignored for dataset, space, paper, user, org, and URL targets.

## `filterLibrary` (type: `string`):

Filter search-query model targets by library\_name. Examples: 'transformers', 'diffusers', 'gguf', 'safetensors'. Leave empty for no filter. Ignored for dataset, space, paper, user, org, and URL targets.

## `filterLanguage` (type: `string`):

Filter search-query model and dataset targets by language code. Use an ISO 639-1 code such as 'en', 'es', or 'zh', or the value 'multilingual'. Leave empty for no filter. Ignored for space, paper, user, org, and URL targets.

## `includeReadme` (type: `boolean`):

When true, fetches the full Markdown README for each model/dataset/space (one extra HTTP request per record). When false, returns only API metadata. Defaults to true for URL targets, false for bulk searches to keep token cost down.

## `responseFormat` (type: `string`):

Controls dataset item richness. 'concise' returns a token-efficient subset (id, downloads, likes, task, lastModified — best for AI agents and ≤200 tokens/item). 'detailed' returns every parsed field including tags, siblings, eval results, full README. Defaults to 'detailed'.

## `proxyConfiguration` (type: `object`):

Apify proxy settings. Datacenter is fine — Hugging Face API has no anti-bot. Residential is overkill. Defaults to Apify datacenter pool.

## Actor input object example

```json
{
  "targets": [
    "https://huggingface.co/meta-llama/Llama-3.1-8B"
  ],
  "entityType": "models",
  "resultsPerTarget": 50,
  "sortBy": "downloads",
  "filterTask": "text-generation",
  "filterLibrary": "transformers",
  "filterLanguage": "en",
  "includeReadme": true,
  "responseFormat": "detailed",
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```

# Actor output Schema

## `dataset` (type: `string`):

Default dataset items in JSON. Use the dataset\_schema views (overview / models / datasets / spaces) for filtered slices.

## `datasetUi` (type: `string`):

Browse the dataset in Apify Console with the predefined views.

## `models` (type: `string`):

Filtered dataset items where itemType=model.

## `datasets` (type: `string`):

Filtered dataset items where itemType=dataset.

## `spaces` (type: `string`):

Filtered dataset items where itemType=space.

## `summary` (type: `string`):

Counts per itemType, warnings, and runtime — written to key-value store key SUMMARY.

## `runConsole` (type: `string`):

Inspect logs, dataset, and KV store for this run.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.
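
For environments without a client library, the request can also be assembled by hand. The sketch below builds, but does not send, a POST to the `run-sync-get-dataset-items` endpoint listed in the OpenAPI specification, passing the token as the `token` query parameter. It uses only the Python standard library; `build_sync_request` is a hypothetical helper name.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "khadinakbar~huggingface-all-in-one-scraper"


def build_sync_request(token: str, run_input: dict) -> Request:
    """Build (but do not send) a POST to run-sync-get-dataset-items."""
    query = urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    return Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_sync_request("<YOUR_API_TOKEN>", {"targets": ["qwen 3 instruct"]})
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it. A long-running Actor may outlast a synchronous HTTP call, in which case the asynchronous `/runs` endpoint is the safer choice.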

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "targets": [
        "https://huggingface.co/meta-llama/Llama-3.1-8B",
        "https://huggingface.co/datasets/squad",
        "qwen 3 instruct"
    ],
    "proxyConfiguration": {
        "useApifyProxy": true
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("khadinakbar/huggingface-all-in-one-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "targets": [
        "https://huggingface.co/meta-llama/Llama-3.1-8B",
        "https://huggingface.co/datasets/squad",
        "qwen 3 instruct",
    ],
    "proxyConfiguration": { "useApifyProxy": True },
}

# Run the Actor and wait for it to finish
run = client.actor("khadinakbar/huggingface-all-in-one-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "targets": [
    "https://huggingface.co/meta-llama/Llama-3.1-8B",
    "https://huggingface.co/datasets/squad",
    "qwen 3 instruct"
  ],
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}' |
apify call khadinakbar/huggingface-all-in-one-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=khadinakbar/huggingface-all-in-one-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Hugging Face Scraper — (Models, Datasets, Spaces, Papers etc.)",
        "description": "All-in-one Hugging Face Hub scraper. Paste any URL or text query — auto-detects model, dataset, space, paper, user, org, or collection. Deep model card, lineage, evaluation results, dataset configs. MCP-ready. $0.006 per result.",
        "version": "1.2",
        "x-build-id": "UTptsB08J8BtMNtN3"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/khadinakbar~huggingface-all-in-one-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-khadinakbar-huggingface-all-in-one-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/khadinakbar~huggingface-all-in-one-scraper/runs": {
            "post": {
                "operationId": "runs-sync-khadinakbar-huggingface-all-in-one-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/khadinakbar~huggingface-all-in-one-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-khadinakbar-huggingface-all-in-one-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "targets"
                ],
                "properties": {
                    "targets": {
                        "title": "Hugging Face URLs or text queries",
                        "type": "array",
                        "description": "List of Hugging Face URLs OR plain-text search queries. Each item is auto-detected: model URL (https://huggingface.co/meta-llama/Llama-3.1-8B), dataset URL (/datasets/squad), space URL (/spaces/HuggingFaceM4/idefics_playground), paper URL (/papers/2310.06825), user URL (/karpathy), org URL (/meta-llama), collection URL (/collections/...), or a free-text query like 'qwen 3 instruct'. NOT for inference API calls — use the hugging-face-image-ai or hugging-face-audio-ai actors for that.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "entityType": {
                        "title": "Entity type for text queries",
                        "enum": [
                            "models",
                            "datasets",
                            "spaces",
                            "papers",
                            "all"
                        ],
                        "type": "string",
                        "description": "When a target is a text query (not a URL), search this entity type on Hugging Face. Ignored when the target is a URL — type is auto-detected from the path. Defaults to 'models'. Use 'all' to search models, datasets, and spaces and return the union.",
                        "default": "models"
                    },
                    "resultsPerTarget": {
                        "title": "Max results per target",
                        "minimum": 1,
                        "maximum": 500,
                        "type": "integer",
                        "description": "Cap on records returned per target. For a single URL (model/dataset/space/paper) you get 1 deep record. For a search query, user, org, or collection target, this caps how many entity records are returned. Defaults to 50. Maximum 500 per target to keep runs predictable.",
                        "default": 50
                    },
                    "sortBy": {
                        "title": "Sort search results",
                        "enum": [
                            "downloads",
                            "likes",
                            "modified",
                            "trending"
                        ],
                        "type": "string",
                        "description": "Sort key for search-query targets and user/org/collection portfolio listings. 'downloads' returns most-downloaded first (best for popularity), 'likes' returns community-favored first, 'modified' returns recently updated first, 'trending' surfaces what is hot right now. Has no effect on single-URL targets.",
                        "default": "downloads"
                    },
                    "filterTask": {
                        "title": "Filter by task / pipeline tag",
                        "type": "string",
                        "description": "Filter search-query model targets by pipeline_tag (Hugging Face task taxonomy). Examples: 'text-generation', 'image-classification', 'automatic-speech-recognition'. Leave empty for no filter. Ignored for dataset, space, paper, user, org, and URL targets."
                    },
                    "filterLibrary": {
                        "title": "Filter by library",
                        "type": "string",
                        "description": "Filter search-query model targets by library_name. Examples: 'transformers', 'diffusers', 'gguf', 'safetensors'. Leave empty for no filter. Ignored for dataset, space, paper, user, org, and URL targets."
                    },
                    "filterLanguage": {
                        "title": "Filter by language",
                        "type": "string",
                        "description": "Filter search-query model and dataset targets by language code. Use an ISO 639-1 code such as 'en', 'es', or 'zh', or the value 'multilingual'. Leave empty for no filter. Ignored for space, paper, user, org, and URL targets."
                    },
                    "includeReadme": {
                        "title": "Include full README text",
                        "type": "boolean",
                        "description": "When true, fetches the full Markdown README for each model/dataset/space (one extra HTTP request per record). When false, returns only API metadata. Defaults to true for URL targets, false for bulk searches to keep token cost down.",
                        "default": true
                    },
                    "responseFormat": {
                        "title": "Response shape for AI agents",
                        "enum": [
                            "concise",
                            "detailed"
                        ],
                        "type": "string",
                        "description": "Controls dataset item richness. 'concise' returns a token-efficient subset (id, downloads, likes, task, lastModified — best for AI agents and ≤200 tokens/item). 'detailed' returns every parsed field including tags, siblings, eval results, full README. Defaults to 'detailed'.",
                        "default": "detailed"
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Apify proxy settings. Datacenter is fine — Hugging Face API has no anti-bot. Residential is overkill. Defaults to Apify datacenter pool.",
                        "default": {
                            "useApifyProxy": true
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
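
The optional fields at the end of the input schema above (`includeReadme`, `responseFormat`, `proxyConfiguration`) can be assembled with a small helper before a run is started. The following is a minimal sketch, not part of the Actor itself: the function name `build_optional_input` is hypothetical, and the target fields (URLs or search queries) defined earlier in the schema still have to be merged in by the caller.

```python
from typing import Any, Dict

# Allowed values copied from the "responseFormat" enum in the schema above.
ALLOWED_RESPONSE_FORMATS = ("concise", "detailed")


def build_optional_input(
    response_format: str = "detailed",
    include_readme: bool = True,
    use_apify_proxy: bool = True,
) -> Dict[str, Any]:
    """Build the optional part of the Actor input (hypothetical helper).

    Defaults mirror the "default" values listed in the schema.
    """
    if response_format not in ALLOWED_RESPONSE_FORMATS:
        raise ValueError(
            f"responseFormat must be one of {ALLOWED_RESPONSE_FORMATS}, "
            f"got {response_format!r}"
        )
    return {
        "includeReadme": include_readme,
        "responseFormat": response_format,
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }


# Example: a token-efficient configuration for bulk searches.
bulk_input = build_optional_input(response_format="concise", include_readme=False)
```

Validating `responseFormat` locally is a design choice that surfaces typos before a paid run starts; the platform would presumably reject an invalid value as well.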

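The `runsResponseSchema` above describes the JSON object returned when a run is started. As an illustration, the fields most callers need (run ID, status, default dataset ID, and total cost) can be pulled out as follows. This is a sketch under assumptions: `summarize_run` is a hypothetical helper, and the sample payload is invented from the `example` values in the schema, not taken from a real run.

```python
from typing import Any, Dict


def summarize_run(response: Dict[str, Any]) -> Dict[str, Any]:
    """Extract commonly used fields from a run response (hypothetical helper).

    Missing fields come back as None instead of raising, since every
    property in the schema is optional.
    """
    data = response.get("data", {})
    return {
        "run_id": data.get("id"),
        "status": data.get("status"),
        "dataset_id": data.get("defaultDatasetId"),
        "cost_usd": data.get("usageTotalUsd"),
    }


# Sample payload assembled from the schema's example values; the IDs are made up.
sample_response = {
    "data": {
        "id": "run-123",
        "status": "READY",
        "defaultDatasetId": "dataset-456",
        "usageTotalUsd": 0.00005,
    }
}

summary = summarize_run(sample_response)
```

The `dataset_id` value is the one to pass to the dataset endpoints or client when fetching the scraped items.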