ModelScope Model Catalog Scraper

Pricing

Pay per event

Developer

BowTiedRaccoon

Maintained by Community

Scrape the ModelScope (modelscope.cn) AI model catalog — China's Alibaba-backed model registry hosting ~200k models. Export model IDs, tasks, frameworks, download statistics, star counts, licenses, READMEs, and full metadata for all models in the catalog.

What it does

Sweeps the ModelScope JSON API task by task (text-generation, image-generation, multimodal, and 26 other task categories), deduplicates models that appear under more than one task, and optionally enriches each model record with the full README and quantization metadata from the per-model detail endpoint.
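
The deduplication step described above can be sketched roughly as follows. This is a minimal illustration, not the actor's actual code: the merge_task_sweeps function, the tasks_seen field, and the sample model IDs are all hypothetical, and the sketch assumes each task sweep yields records that carry a model_id.

```python
from typing import Iterable


def merge_task_sweeps(sweeps: dict[str, Iterable[dict]]) -> list[dict]:
    """Merge per-task result lists into one record per model_id.

    Hypothetical helper: `sweeps` maps a task slug to the records
    returned while sweeping that task.
    """
    merged: dict[str, dict] = {}
    for task, records in sweeps.items():
        for record in records:
            model_id = record["model_id"]
            if model_id not in merged:
                # First sighting: keep the record and note the discovering task.
                merged[model_id] = {**record, "task": task, "tasks_seen": [task]}
            elif task not in merged[model_id]["tasks_seen"]:
                # Same model found under another task: record the overlap only.
                merged[model_id]["tasks_seen"].append(task)
    return list(merged.values())


# Made-up placeholder data, only to show the shape of the merge.
sweeps = {
    "text-generation": [{"model_id": "org-a/model-x"}, {"model_id": "org-b/model-y"}],
    "multimodal": [{"model_id": "org-a/model-x"}],
}
catalog = merge_task_sweeps(sweeps)
print(len(catalog))  # 2
```

Keying on model_id keeps the first task that surfaced a model as its primary task, which matches the description of the task field as the tag used for discovery.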

Output fields per model:

  • model_id — full identifier (namespace/name)
  • namespace, name — publisher slug and model name
  • chinese_name — display name in Chinese if present
  • task — primary task tag used for discovery
  • tasks_all — all task tags, pipe-separated
  • frameworks — ML frameworks (pytorch, tensorflow, mindspore, etc.), pipe-separated
  • languages — supported languages (en, zh, multilingual, etc.), pipe-separated
  • license — SPDX identifier (apache-2.0, mit, etc.)
  • downloads_30d — downloads in the last 30 days
  • stars — star count
  • last_updated, created_at — ISO-8601 timestamps
  • readme_text — README content, truncated to 8 KB (requires includeDetails: true)
  • model_size_params — parameter count label when tagged (7B, 72B, MoE-22B-A2B)
  • quantization_variants — available quantization types from tensor metadata, pipe-separated (requires includeDetails: true)
  • base_model — base model ID if this is a fine-tune
  • publisher_org, publisher_url — organization name and profile URL
  • has_demo, has_inference_api — boolean flags

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| tasks | array | (all tasks) | Limit to specific task slugs (e.g. text-generation, image-generation). Leave empty to sweep all 29 canonical tasks. |
| maxItems | integer | 100 | Maximum number of models to return. Set to 0 for unlimited (full catalog run). |
| includeDetails | boolean | true | Fetch the per-model detail endpoint for full README text and quantization variant metadata. Disabling this speeds up runs but leaves readme_text and quantization_variants empty. |
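
As an illustration of the input fields above, the following sketch assembles an input object for a faster, metadata-only run. The build_run_input helper is hypothetical and its validation rule is an assumption; only the three field names and their defaults come from the table.

```python
import json


def build_run_input(tasks=None, max_items=100, include_details=True):
    """Assemble an actor input dict (hypothetical helper, defaults from the table)."""
    if max_items < 0:
        # Assumption: negative values are not meaningful; 0 means unlimited.
        raise ValueError("maxItems must be 0 (unlimited) or a positive integer")
    return {
        "tasks": list(tasks or []),        # empty list = sweep all tasks
        "maxItems": max_items,             # 0 = full catalog run
        "includeDetails": include_details, # False skips README and quantization data
    }


# Cap the run at 500 text-generation models and skip the detail endpoint.
run_input = build_run_input(
    tasks=["text-generation"], max_items=500, include_details=False
)
print(json.dumps(run_input, indent=2))
```

The resulting JSON can be pasted into the actor's input editor or passed as the run input through an API client.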

Example use cases

  • West+East parity datasets — pair with the HuggingFace Model Scraper to build a combined index of both Western and Chinese open-weights releases (Qwen, DeepSeek, Yi, GLM, InternLM, ERNIE, MiniMax, etc.).
  • Model landscape research — filter by task, framework, or license to survey which Chinese labs are publishing in specific domains.
  • Download trend tracking — schedule regular runs and track downloads_30d growth for specific namespaces or model families.
  • README content analysis — extract model cards from readme_text for NLP-based capability assessment or feature extraction.
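
For the download trend use case, one possible approach is to compare downloads_30d between two scheduled runs. The sketch below assumes both runs were exported as lists of records. The downloads_growth function and the sample values are hypothetical.

```python
def downloads_growth(previous: list[dict], current: list[dict]) -> dict[str, int]:
    """Return the change in downloads_30d for models present in both runs."""
    before = {r["model_id"]: r["downloads_30d"] for r in previous}
    growth = {}
    for record in current:
        model_id = record["model_id"]
        if model_id in before:
            # downloads_30d is a rolling window, so this is the change in the
            # 30-day count between runs, not the downloads since the last run.
            growth[model_id] = record["downloads_30d"] - before[model_id]
    return growth


# Made-up placeholder values for two runs.
last_week = [
    {"model_id": "org-a/model-x", "downloads_30d": 1000},
    {"model_id": "org-b/model-y", "downloads_30d": 400},
]
this_week = [
    {"model_id": "org-a/model-x", "downloads_30d": 1500},
    {"model_id": "org-b/model-y", "downloads_30d": 300},
    {"model_id": "org-c/model-z", "downloads_30d": 50},
]
print(downloads_growth(last_week, this_week))
```

Models that appear only in the newer run are skipped here. Treating them as new releases with a baseline of zero would be an equally reasonable choice.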

Notes

  • The API requires no authentication. No proxy is needed — direct access from Apify infrastructure works without restriction.
  • Full catalog sweeps (all tasks, includeDetails: true) are long-running. Use maxItems to cap output for targeted queries.
  • Array output fields (tasks_all, frameworks, languages, quantization_variants) use | as separator for flat dataset compatibility. Split on | in downstream processing.
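
Splitting the pipe-separated fields downstream might look like the sketch below. The expand_pipe_fields function and the sample row are hypothetical, and the sketch assumes an empty or missing field should become an empty list.

```python
PIPE_FIELDS = ("tasks_all", "frameworks", "languages", "quantization_variants")


def expand_pipe_fields(record: dict) -> dict:
    """Return a copy of the record with pipe-separated fields as lists."""
    expanded = dict(record)
    for field in PIPE_FIELDS:
        raw = record.get(field) or ""
        # Drop empty parts so "" becomes [] rather than [""].
        expanded[field] = [part for part in raw.split("|") if part]
    return expanded


# Made-up placeholder row in the flat dataset shape.
row = {
    "model_id": "org-a/model-x",
    "tasks_all": "text-generation",
    "frameworks": "pytorch|tensorflow",
    "languages": "en|zh",
    "quantization_variants": "",
}
parsed = expand_pipe_fields(row)
print(parsed["frameworks"])  # ['pytorch', 'tensorflow']
```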