# HuggingFace Models Datasets Spaces Scraper - Low-cost💲🔥🤖🤗 (`delectable_incubator/huggingface-models-datasets-spaces-scraper-low-cost`) Actor

Scrape Hugging Face Models, Datasets & Spaces 🤖📊 with a powerful AI ecosystem scraper. Extract repository names, owners, tags, downloads, likes, update dates, source URLs and more from keyword searches. Ideal for AI research, model discovery, dataset analysis and machine learning intelligence 🚀🌐

- **URL**: https://apify.com/delectable_incubator/huggingface-models-datasets-spaces-scraper-low-cost.md
- **Developed by:** [Prime Scrape](https://apify.com/delectable_incubator) (community)
- **Categories:** AI, Developer tools
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $0.00005 / actor start

This Actor is paid per event and usage. You are charged both the fixed price for specific events and for Apify platform usage.
This Actor supports Apify Store discounts, so the higher your subscription plan, the lower the price.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
"Actor" is always written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

<p align="center">
  <img src="https://i.ibb.co/jkNS73wX/readme.png" alt="HuggingFace All-in-One Full-Text Search Scraper" width="100%">
</p>

---

## 🤗🔎 HuggingFace All-in-One Full-Text Search Scraper | Models, Datasets & Spaces | Apify Actor

### 🚀 Extract Hugging Face Search Results in Bulk (No Code)

The **HuggingFace All-in-One Full-Text Search Scraper (Apify Actor)** is a powerful, scalable and SEO-optimized scraping tool designed to extract **Models, Datasets and Spaces directly from Hugging Face full-text search results**.

Whether you're researching AI models, tracking dataset adoption, monitoring machine learning trends, discovering open-source projects, or building AI intelligence datasets, this Actor helps you collect structured repository-level and file-level search data at scale.

---

### 🔥 Why This Hugging Face Scraper?

✔ All-in-One Models + Datasets + Spaces scraper

✔ Bulk keyword search support (SEO BOOST 🚀)

✔ Full-text repository search automation

✔ Repository-level + file-level match extraction

✔ Structured JSON / CSV / Excel exports

✔ Perfect for AI research & trend monitoring

✔ No coding required

✔ Fast & scalable cloud execution

---

### 🎯 What This Scraper Does

This Apify Actor performs automated full-text searches across the Hugging Face ecosystem and extracts structured search results.

#### 📌 Core Features

✅ Search Hugging Face Models

✅ Search Hugging Face Datasets

✅ Search Hugging Face Spaces

✅ Combine all content types in one run

✅ Bulk keyword processing

✅ Independent keyword tracking

✅ Extract repository metadata

✅ Extract matched file information

✅ Extract code snippets

✅ Extract tags & classifications

✅ Auto-pagination handling

✅ Structured export-ready output

---

### ⚡ Input Configuration (Simple & Powerful)

#### 🔥 BULK KEYWORD MODE (SEO BOOST 🚀)

```json
{
  "keywords": [
    "bert",
    "llama",
    "stable-diffusion",
    "rag",
    "mistral",
    "multimodal"
  ],
  "searchTypes": [
    "Models",
    "Datasets",
    "Spaces"
  ],
  "maxItemsPerType": 60
}
```

---

### 📊 Extracted Data Fields

| Field | Description |
|---------|---------|
| contentType | Model, Dataset or Space |
| owner | Repository owner |
| repoName | Repository name |
| repoHref | Repository path |
| repoFullUrl | Full repository URL |
| fileName | Matched file name |
| fileHref | File path |
| fileFullUrl | Full file URL |
| matchCount | Number of keyword matches, returned as text (for example, `"12 matches"`) |
| keyword | Search keyword |
| tags | Parsed repository tags |
| tagsRaw | Raw tags string |
| codeSnippet | Extracted matching content |
| searchTypes | Selected content filters |
| sourceUrl | Original Hugging Face search URL |
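
Because `matchCount` appears as text such as `"12 matches"` in the example output, it may be convenient to convert it to a number before sorting or filtering. The sketch below is one way to do that; `parse_match_count` is a hypothetical helper, and the assumption that the value always starts with the count (possibly with thousands separators) is inferred from a single example rather than documented.

```python
import re


def parse_match_count(value):
    """Return the first integer found in a matchCount value, or None if there is none.

    Assumption: values look like "12 matches"; other shapes are not documented.
    """
    if isinstance(value, int):
        return value
    # Accept an optional thousands separator, e.g. "1,204 matches".
    found = re.search(r"\d[\d,]*", str(value))
    return int(found.group().replace(",", "")) if found else None


print(parse_match_count("12 matches"))
```

Returning `None` for values without digits lets downstream code decide whether to skip or flag such records.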

---

### 💡 Use Cases (High Demand AI SEO Keywords)

This Hugging Face scraper is ideal for:

🤖 AI model discovery

📊 Machine learning research

🧠 LLM ecosystem monitoring

🔎 Open-source AI intelligence

📈 AI trend analysis

📚 Dataset discovery

⚡ Full-text repository search

🏢 Competitive AI research

📡 AI monitoring pipelines

🔄 Automated AI market intelligence

🚀 RAG project research

🎯 Generative AI tracking

---

### 🚀 Key Features

⚡ Bulk keyword scraping support

🤖 Models, Datasets & Spaces extraction

📌 Full-text search automation

🔎 File-level match extraction

🧠 Repository intelligence gathering

📊 Structured output datasets

💾 Export-ready results

🔁 Reliable cloud execution

⚙️ Apify-native scalability

---

### 📊 Preconfigured Dataset Views

The Actor automatically generates ready-to-use dataset views.

#### 🔹 Overview View

Includes:

• Content Type

• Repository Owner

• Repository Name

• Match Count

• Keyword

• Repository URL

• Matched File

Perfect for quick analysis.

#### 🔹 Detailed View

Includes:

• Repository URLs

• File URLs

• Match counts

• Tags

• Code snippets

• Search URLs

Ideal for:

🤖 AI research

📊 Dataset intelligence

🔎 Keyword monitoring

🧠 Repository analysis

#### 🔹 By Keyword View

Group results by keyword.

Perfect for topic comparison.

#### 🔹 By Type View

Group results by:

• Models

• Datasets

• Spaces

Perfect for ecosystem distribution analysis.
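
The By Keyword and By Type groupings can also be reproduced locally once the items are downloaded. The following is a minimal sketch, assuming each item is a dictionary with the `keyword` and `contentType` fields listed above; `count_by` and the sample records are hypothetical and only illustrate the idea.

```python
from collections import Counter


def count_by(items, field):
    """Count dataset items per value of `field`, e.g. 'keyword' or 'contentType'."""
    return Counter(item.get(field, "unknown") for item in items)


# Illustrative records only; real items carry many more fields.
sample_items = [
    {"keyword": "bert", "contentType": "dataset"},
    {"keyword": "bert", "contentType": "model"},
    {"keyword": "llama", "contentType": "model"},
]

print(count_by(sample_items, "keyword"))
print(count_by(sample_items, "contentType"))
```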

---

### 📤 Output Formats Supported

✔ JSON

✔ CSV

✔ Excel XLSX

✔ XML

✔ HTML
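
For scripted exports, the Apify API's dataset items endpoint accepts a `format` query parameter. The sketch below only builds the download URL and does not send a request; `dataset_export_url` is a hypothetical helper, the dataset ID is a placeholder, and non-public datasets would additionally need an API token.

```python
from urllib.parse import urlencode

# Formats listed in this README, written as the lowercase values used in the URL.
SUPPORTED_FORMATS = {"json", "csv", "xlsx", "xml", "html"}


def dataset_export_url(dataset_id, fmt="json"):
    """Build a dataset items download URL for one of the listed export formats."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported format: {fmt}")
    query = urlencode({"format": fmt})
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}"


print(dataset_export_url("<DATASET_ID>", "csv"))
```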

---

### 📦 Example Output

```json
{
  "contentType": "dataset",
  "owner": "Giannis79",
  "repoName": "BERT_Journalism_Sentiment",
  "repoHref": "/datasets/Giannis79/BERT_Journalism_Sentiment",
  "repoFullUrl": "https://huggingface.co/datasets/Giannis79/BERT_Journalism_Sentiment",
  "fileName": "README.md",
  "fileHref": "/datasets/Giannis79/BERT_Journalism_Sentiment/blob/main/README.md?code=true",
  "fileFullUrl": "https://huggingface.co/datasets/Giannis79/BERT_Journalism_Sentiment/blob/main/README.md?code=true",
  "matchCount": "12 matches",
  "tags": [
    "region:us"
  ],
  "tagsRaw": "tags: region:us",
  "codeSnippet": "BERT Model Sentiment Analysis Project Overview...",
  "keyword": "bert",
  "searchTypes": [
    "Datasets",
    "Spaces"
  ],
  "sourceUrl": "https://huggingface.co/search/full-text?q=bert&type=dataset&type=space"
}
```
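
The `sourceUrl` in this example suggests how a keyword and the selected content types map onto a Hugging Face full-text search URL. The sketch below reconstructs that URL shape; `build_search_url` is a hypothetical helper, the `model` parameter value is an assumption (only `dataset` and `space` appear in the example), and how the Actor builds its URLs internally is not documented here.

```python
from urllib.parse import urlencode

# Mapping from input values to URL parameter values.
# "dataset" and "space" come from the example output; "model" is assumed.
TYPE_PARAM = {"Models": "model", "Datasets": "dataset", "Spaces": "space"}


def build_search_url(keyword, search_types):
    """Return a full-text search URL with one `type` parameter per content type."""
    params = [("q", keyword)] + [("type", TYPE_PARAM[t]) for t in search_types]
    return "https://huggingface.co/search/full-text?" + urlencode(params)


print(build_search_url("bert", ["Datasets", "Spaces"]))
```

Comparing a rebuilt URL with the `sourceUrl` field can help confirm which search produced a given record.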

---

### 🔥 Why This is the BEST Hugging Face Full-Text Search Scraper on Apify?

✔ All-in-One search solution

✔ Models + Datasets + Spaces support

✔ Bulk keyword processing

✔ File-level result extraction

✔ AI ecosystem intelligence

✔ Enterprise-ready scalability

✔ SEO optimized marketplace listing

✔ High-performance extraction engine

---

### 💸 Pricing

This scraper runs on a **pay-per-result pricing model**.

You only pay for successfully extracted records.

💳 **Price:** $0.98 / 1,000 results
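
As a rough budgeting aid, the per-result rate above can be turned into an estimate. This is a sketch only: `estimate_result_cost` is a hypothetical helper, and it covers the $0.98 per 1,000 results figure alone, not Actor-start events or Apify platform usage mentioned at the top of this page.

```python
def estimate_result_cost(result_count, price_per_thousand=0.98):
    """Estimate the per-result charge in USD for a given number of records."""
    return result_count / 1000 * price_per_thousand


# Example: the 360-result upper bound from 3 types x 2 keywords x 60 items.
print(f"${estimate_result_cost(360):.4f}")
```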

---

### ❓ FAQ (SEO BOOST SECTION)

#### Can I search multiple keywords at once?

Yes — bulk keyword mode is fully supported.

#### Can I scrape Models, Datasets and Spaces together?

Yes — all content types can be combined in a single run.

#### Does the scraper extract file-level matches?

Yes — matched files, URLs and snippets are included.

#### Is coding required?

No — 100% no-code Apify Actor.

#### Can I export the results?

Yes — JSON, CSV, Excel, XML and HTML are supported.

#### Is this useful for AI research?

Absolutely. It is designed specifically for AI ecosystem intelligence and trend monitoring.

---

### ⚠️ Disclaimer

This tool is an independent automation solution and is not affiliated with, endorsed by, or sponsored by Hugging Face.

---

### 🔗 Related Actors

- Hugging Face Models Scraper - Cheap 🤗🤖🔎

- GitHub Repositories Scraper 📦🐙🔍

And many more in the PrimeScrape ecosystem.

---

### 🌍 PrimeScrape Ecosystem

Built for large-scale:

🤖 AI intelligence

📊 Data extraction

📈 Market research

🔎 Search monitoring

🏢 Competitive intelligence

⚙️ Automation pipelines

🧠 AI training datasets

🚀 Enterprise scraping

---

### 📬 Support

⭐⭐⭐⭐⭐ Leave a review if you enjoy this scraper.

📩 Contact us for custom scraping solutions, enterprise automation projects, and private data extraction services.

# Actor input Schema

## `keywords` (type: `array`):

One or more keywords to search on HuggingFace. Each keyword is scraped independently. Examples: 'bert', 'llama', 'stable-diffusion'.

## `searchTypes` (type: `array`):

Select which HuggingFace content types to include. Each type is scraped independently so the per-type limit is applied separately. Example: 3 types × 2 keywords × 60 items = up to 360 total results.

## `maxItemsPerType` (type: `integer`):

Maximum number of results to collect for each combination of keyword + content type. For example, 60 means up to 60 Models, 60 Datasets, and 60 Spaces per keyword. Total maximum = maxItemsPerType × number of types × number of keywords.
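
The total-maximum formula can be checked before starting a run. The helper below is a minimal sketch of that arithmetic; `max_total_results` is a hypothetical name, and the result is an upper bound, since a search may return fewer items than the limit.

```python
def max_total_results(keywords, search_types, max_items_per_type=60):
    """Upper bound on results: maxItemsPerType x number of types x number of keywords."""
    return max_items_per_type * len(search_types) * len(keywords)


# 3 types x 2 keywords x 60 items, as in the searchTypes description.
print(max_total_results(["bert", "llama"], ["Models", "Datasets", "Spaces"], 60))  # 360
```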

## Actor input object example

```json
{
  "keywords": [
    "gpt"
  ],
  "searchTypes": [
    "Models",
    "Datasets",
    "Spaces"
  ],
  "maxItemsPerType": 60
}
````

# Actor output Schema

## `overview` (type: `string`):

No description

## `detailed` (type: `string`):

No description

## `by_keyword` (type: `string`):

No description

## `by_type` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "keywords": [
        "gpt"
    ],
    "searchTypes": [
        "Models",
        "Datasets",
        "Spaces"
    ],
    "maxItemsPerType": 60
};

// Run the Actor and wait for it to finish
const run = await client.actor("delectable_incubator/huggingface-models-datasets-spaces-scraper-low-cost").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "keywords": ["gpt"],
    "searchTypes": [
        "Models",
        "Datasets",
        "Spaces",
    ],
    "maxItemsPerType": 60,
}

# Run the Actor and wait for it to finish
run = client.actor("delectable_incubator/huggingface-models-datasets-spaces-scraper-low-cost").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "keywords": [
    "gpt"
  ],
  "searchTypes": [
    "Models",
    "Datasets",
    "Spaces"
  ],
  "maxItemsPerType": 60
}' |
apify call delectable_incubator/huggingface-models-datasets-spaces-scraper-low-cost --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=delectable_incubator/huggingface-models-datasets-spaces-scraper-low-cost",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "HuggingFace Models Datasets Spaces Scraper - Low-cost💲🔥🤖🤗",
        "description": "Scrape Hugging Face Models, Datasets & Spaces 🤖📊 with a powerful AI ecosystem scraper. Extract repository names, owners, tags, downloads, likes, update dates, source URLs and more from keyword searches. Ideal for AI research, model discovery, dataset analysis and machine learning intelligence 🚀🌐",
        "version": "0.0",
        "x-build-id": "8QizpUPFftBW5BWjb"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/delectable_incubator~huggingface-models-datasets-spaces-scraper-low-cost/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-delectable_incubator-huggingface-models-datasets-spaces-scraper-low-cost",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/delectable_incubator~huggingface-models-datasets-spaces-scraper-low-cost/runs": {
            "post": {
                "operationId": "runs-sync-delectable_incubator-huggingface-models-datasets-spaces-scraper-low-cost",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/delectable_incubator~huggingface-models-datasets-spaces-scraper-low-cost/run-sync": {
            "post": {
                "operationId": "run-sync-delectable_incubator-huggingface-models-datasets-spaces-scraper-low-cost",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "keywords"
                ],
                "properties": {
                    "keywords": {
                        "title": "Search Keywords",
                        "type": "array",
                        "description": "One or more keywords to search on HuggingFace. Each keyword is scraped independently. Examples: 'bert', 'llama', 'stable-diffusion'.",
                        "items": {
                            "type": "string"
                        },
                        "default": [
                            "gpt"
                        ]
                    },
                    "searchTypes": {
                        "title": "Content Types",
                        "type": "array",
                        "description": "Select which HuggingFace content types to include. Each type is scraped independently so the per-type limit is applied separately. Example: 3 types × 2 keywords × 60 items = up to 360 total results.",
                        "items": {
                            "type": "string",
                            "enum": [
                                "Models",
                                "Datasets",
                                "Spaces"
                            ]
                        },
                        "default": [
                            "Models",
                            "Datasets",
                            "Spaces"
                        ]
                    },
                    "maxItemsPerType": {
                        "title": "Max Items per Keyword per Type",
                        "type": "integer",
                        "description": "Maximum number of results to collect for each combination of keyword + content type. For example, 60 means up to 60 Models, 60 Datasets, and 60 Spaces per keyword. Total maximum = maxItemsPerType × number of types × number of keywords.",
                        "default": 60
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
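
For projects that call the REST API directly, the `run-sync-get-dataset-items` path from the specification above can be targeted with Python's standard library. The sketch below only prepares the request object and does not send it; `build_run_sync_request` is a hypothetical helper, and the token is passed as the `token` query parameter as the specification describes.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Actor path segment as it appears in the OpenAPI paths (note the "~" separator).
ACTOR_PATH = "delectable_incubator~huggingface-models-datasets-spaces-scraper-low-cost"


def build_run_sync_request(token, actor_input):
    """Prepare a POST request that runs the Actor and returns its dataset items."""
    url = (
        f"https://api.apify.com/v2/acts/{ACTOR_PATH}/run-sync-get-dataset-items?"
        + urlencode({"token": token})
    )
    body = json.dumps(actor_input).encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "application/json"}, method="POST")


request = build_run_sync_request(
    "<YOUR_API_TOKEN>",
    {"keywords": ["gpt"], "searchTypes": ["Models"], "maxItemsPerType": 60},
)
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would execute the run; since the call waits for the Actor to finish, a generous timeout is advisable.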
