Wikidata Entity Search Scraper

Pricing: from $14.00 / 1,000 result items · Developer: ParseForge · Maintained by Community

🌐 Wikidata Entity Search Scraper

🚀 Search Wikidata's open knowledge graph of 100M+ entities by name.

🕒 Last updated: 2026-05-06 · 📊 22 fields per record · 100M+ entities · people, places, brands, books, films, concepts · all claims, sitelinks, multilingual labels

The Wikidata Entity Search Scraper searches Wikidata's open knowledge graph of 100M+ entities by name. Output includes the canonical Q-ID, label, description, aliases, all claims (P-properties), sitelinks to every Wikipedia language edition, and structured facts.

Wikidata is the structured-data backbone of Wikipedia and one of the largest open knowledge graphs in the world. Filters run server-side, so a single run can resolve every entity matching a name, fetch full claim trees, or pull entities in non-English languages.

🎯 Target audience: ML pipelines, knowledge-graph engineers, journalists, fact-checkers, content recommendation engines, search developers.
💡 Primary use cases: entity resolution, knowledge-graph augmentation, fact-checking, content enrichment, multilingual search, ML training datasets.

📋 What the Wikidata Entity Search Scraper does

Five filtering workflows in a single run:

  • 🔍 Free-text search. Match entity labels and aliases.
  • 🌐 Multilingual. Search in 20+ languages (en, es, fr, de, it, ja, zh, ko, ar, hi, pt, nl, ru).
  • 🆔 Item or property. Search Q-entities (items) or P-entities (properties).
  • 📊 Full claims fetch. Optional: pull every statement, sitelink, and structured fact per entity.
  • 🏷️ Image extraction. Auto-extracts the entity's primary image from claim P18.

💡 Why it matters: clean, server-side filtering removes the parser-and-pagination work from your team and keeps your dataset fresh on every run.


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.


⚙️ Input

| Input | Type | Default | Behavior |
|---|---|---|---|
| maxItems | integer | 10 | Records to return. Free plan caps at 10; paid plans go up to 1,000,000. |
| query | string | "tesla" | Term to search Wikidata entities for. |
| language | string | "en" | Search language code (ISO 639). |
| entityType | string | "item" | `item` (Q) or `property` (P). |
| fetchClaims | boolean | true | Fetch full claims, sitelinks, and aliases per entity. |

Example: all entities matching Tesla.

{
  "maxItems": 50,
  "query": "tesla",
  "language": "en",
  "fetchClaims": true
}

Example: Spanish-language Madrid entities.

{
  "maxItems": 100,
  "query": "madrid",
  "language": "es"
}

📊 Output

Each record contains 22 fields. Download the dataset as CSV, Excel, JSON, or XML.

🧾 Schema

| Field | Type | Example |
|---|---|---|
| 🖼️ thumbnailUrl | string | null |
| 🆔 entityId | string | "Q478214" |
| 📛 label | string | "Tesla" |
| 📝 description | string | "American automotive, energy storage and solar power company" |
| 🏷️ aliases | array | ["Tesla Inc", "Tesla Motors"] |
| 🆔 instanceOfId | string | null |
| 📊 sitelinkCount | number | 106 |
| 📊 claimCount | number | 147 |
| 📊 claims | object | {"P31": ["Q43229"], "P159": ["Q485176"]} |
| 🌐 wikidataUrl | string | "https://www.wikidata.org/wiki/Q478214" |
| 📚 wikipediaEnUrl | string | "https://en.wikipedia.org/wiki/Tesla,_Inc." |


✨ Why choose this Actor

  • 📚 100M+ entities. People, places, brands, books, films, concepts in a single query.
  • 🌐 Multilingual. 20+ languages with native-language labels and aliases.
  • 📊 Full structured facts. All P-property claims for entity-resolution pipelines.
  • 🔗 Sitelinks to Wikipedia. Direct links to every Wikipedia language edition.
  • ⚡ Fast. 100 entities in under 30 seconds.

📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Filters | Setup |
|---|---|---|---|---|---|
| ⭐ This Actor | $5 free credit | 100M+ entities | Live per run | query, language, type, claims | ⚡ 2 min |
| Wikidata SPARQL endpoint | Free | All | Live | SPARQL | 🐢 Requires SPARQL knowledge |
| Manual Wikidata browsing | Free | Manual | Live | Web filters | 🕒 Manual |
| DBpedia | Free | Subset | Stale | SPARQL | 🐢 Setup |

Pick this Actor when you want broad coverage, server-side filtering, and no pipeline maintenance.


🚀 How to use

  1. 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. 🌐 Open the Actor. Go to the Wikidata Entity Search Scraper page on the Apify Store.
  3. 🎯 Set input. Pick your filters and maxItems.
  4. 🚀 Run it. Click Start and let the Actor collect your data.
  5. 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.


💼 Business use cases

🤖 Knowledge Graphs

  • Entity resolution and disambiguation
  • Augment internal KGs with Wikidata facts
  • Build cross-language entity links
  • Train named-entity-recognition models

🔍 Search & Discovery

  • Power semantic search with structured facts
  • Build autocomplete with multilingual labels
  • Resolve ambiguous entity names
  • Cross-language search experiments

📰 Journalism & Fact-Checking

  • Verify entities mentioned in stories
  • Pull biographical and corporate facts
  • Cross-reference claims via P-properties
  • Map relationship networks

🤖 ML & NLP

  • Train entity-linking models
  • Build retrieval-augmented agents
  • Generate training datasets for NER
  • Multilingual KB embedding

🔌 Automating Wikidata Entity Search Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

  • 🟢 Node.js. Install the apify-client NPM package.
  • 🐍 Python. Use the apify-client PyPI package.
  • 📚 See the Apify API documentation for full details.
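The steps above can be sketched with the official apify-client Python package. The field names follow this page's Input table; the Actor ID string below is an assumption (copy the exact one from the Actor's API tab), and the helper names are ours.

```python
import os

# Hypothetical Actor ID -- check the Actor's API tab for the real one.
ACTOR_ID = "parseforge/wikidata-entity-search-scraper"

def build_run_input(query, language="en", max_items=50, fetch_claims=True):
    """Assemble the run input document described in the Input table above."""
    return {
        "query": query,
        "language": language,
        "maxItems": max_items,
        "fetchClaims": fetch_claims,
    }

def run_scraper(query, token=None):
    """Start a run and yield its dataset records one by one."""
    from apify_client import ApifyClient  # pip install apify-client
    client = ApifyClient(token or os.environ["APIFY_TOKEN"])
    run = client.actor(ACTOR_ID).call(run_input=build_run_input(query))
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()
```

For scheduled pipelines, the same `run_scraper` call can sit behind an Apify Schedule or any external cron job.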

The Apify Schedules feature lets you trigger this Actor on any cron interval. Hourly, daily, or weekly refreshes keep downstream databases in sync automatically.


🌟 Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

  • Knowledge-graph research
  • Reproducible KB snapshots
  • Cross-cultural KB studies
  • Course material on Wikidata

🎨 Personal and creative

  • Personal knowledge dashboards
  • Side projects with structured facts
  • Newsletter content
  • Hobbyist KB exploration

🤝 Non-profit and civic

  • Open-knowledge contributions
  • Civic literacy datasets
  • Cultural heritage cataloging
  • Multilingual literacy projects

🧪 Experimentation

  • Train entity-linking ML models
  • Prototype KB-aware chat agents
  • Build entity-resolution pipelines
  • Test cross-language search


❓ Frequently Asked Questions

🧩 How does it work?

Provide a query and language. The Actor queries Wikidata's wbsearchentities endpoint and optionally fetches full claims via wbgetentities.
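As a rough illustration of that flow, here is how a wbsearchentities request can be assembled. The parameter names are the MediaWiki API's own; the helper itself is a sketch, not the Actor's internal code.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def search_url(query, language="en", entity_type="item", limit=10):
    """Build a wbsearchentities request URL for a name lookup."""
    params = {
        "action": "wbsearchentities",
        "search": query,
        "language": language,
        "type": entity_type,   # "item" for Q-entities, "property" for P-entities
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"
```

Fetching that URL returns candidate entities whose Q-IDs can then be passed to wbgetentities for full claims.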

📊 How many fields per record?

22 base fields plus a claims object with every P-property and a sitelinks map across Wikipedia languages.

🌐 Which languages are supported?

20+, including English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Japanese, Chinese, Korean, Arabic, Hindi, Turkish, Polish, Swedish, Finnish, Danish, Norwegian, Czech.

🆔 What's the difference between Q and P entities?

Q-entities are items (people, places, things). P-entities are properties (relations like 'instance of', 'located in', 'date of birth').

🔁 Can I schedule runs?

Yes. Use Apify Schedules to refresh entity caches or track entity creations on a topic.

⚖️ Is this data public?

Yes. Wikidata publishes under CC0; you can use the data freely without attribution.

💳 Do I need a paid Apify plan?

No. The free plan covers preview runs. A paid plan unlocks higher item counts and scheduling.

🆘 What if a run fails?

Apify retries transient errors. Partial datasets are preserved.

🖼️ Does it return entity images?

Yes when the entity has claim P18 set. The Actor extracts the Commons image URL automatically.
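For reference, a P18 value is a Wikimedia Commons file name, and the standard Special:FilePath redirect turns it into a direct image URL. This helper sketches that convention and is not the Actor's internal code.

```python
from urllib.parse import quote

def commons_file_url(p18_filename):
    """Turn a P18 claim value (a Commons file name) into a direct image URL.

    Special:FilePath redirects to the current version of the file on
    Wikimedia Commons, so the link stays valid if the file is re-uploaded.
    """
    return "https://commons.wikimedia.org/wiki/Special:FilePath/" + quote(p18_filename)
```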

📚 How do I use the claims field?

Each P-property maps to an array of values. Decode P-IDs via Wikidata's property page.
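A minimal sketch of that lookup, using a trimmed claims object shaped like the schema example above (the helper name is ours):

```python
# A trimmed claims object, shaped like the schema example in the Output section.
claims = {
    "P31": ["Q43229"],    # P31 = "instance of"; Q43229 = "organization"
    "P159": ["Q485176"],  # P159 = "headquarters location"
}

def values_for(claims, prop):
    """Return the value list for a P-property, or [] when the claim is absent."""
    return claims.get(prop, [])
```

P-IDs resolve to human-readable names on their Wikidata property pages (e.g. https://www.wikidata.org/wiki/Property:P31).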


🔌 Integrate with any app

Wikidata Entity Search Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get run notifications in your channels
  • Airbyte - Pipe data into your warehouse
  • GitHub - Trigger runs from commits and releases
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes.


💡 Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.


🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by the Wikimedia Foundation, Wikidata, Wikipedia, or any contributing editor. All trademarks mentioned are the property of their respective owners. Only publicly available open data is collected.