Arabic Language Scrapper / Datasets for LLMs

Pricing: Pay per event

This scraper is designed to collect and structure rich Arabic lexical data from authoritative Arabic dictionary sources. It extracts Arabic words along with their definitions, contextual meanings, related words, synonyms, and morphological derivations.

Developer: Amr Ashour

Maintained by Community

❓ What is the Arabic Language Scrapper / Datasets for LLMs actor?

A web scraping and data processing pipeline that builds high-quality, structured Arabic lexical datasets from online Arabic dictionary sources. It extracts words, meanings, contextual usages, related terms, synonyms, and morphological root derivations, transforming traditionally unstructured dictionary content into machine-readable datasets suitable for LLM training, NLP research, search engines, and language applications.

📚 What data can I extract?

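  • Word (word): The main Arabic word entry

  • Primary meanings (meanings_list): Dictionary definitions, grouped by category or part of speech

  • Contextual meanings (contextual_examples): Usage-based examples, such as lines of poetry that contain the word

  • Synonyms and similar words (similar_words): Words with similar meaning

  • Related words (related_words): Associated or nearby dictionary terms

  • Morphological derivations (word_derivative): Root-based derivation and analysis; for example, كاتب (writer) and مكتوب (written) both derive from the root ك-ت-ب (k-t-b)

  • Root form: The trilateral or quadrilateral Arabic root, reported inside word_derivative

  • Source URL (link): Reference page for traceability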
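In the sample output below, the root form is not a separate field; it sits inside the word_derivative.derivative sentence (…وجذرها (وسم) وجذعها (وسم)…). A minimal Python sketch for pulling the root, stem, and segmentation out of that sentence, assuming every entry uses the same Arabic phrasing as the sample. The parse_derivative helper and its regular expressions are hypothetical and not part of the actor:

```python
import re
from typing import Optional

# Hypothetical patterns: they assume every word_derivative.derivative string
# uses the same Arabic phrasing as the sample output; other entries may differ.
_PATTERNS = {
    "root": re.compile(r"وجذرها \(([^)]+)\)"),        # "its root is (...)"
    "stem": re.compile(r"وجذعها \(([^)]+)\)"),        # "its stem is (...)"
    "analysis": re.compile(r"وتحليلها \(([^)]+)\)"),  # "its segmentation is (...)"
}


def parse_derivative(text: str) -> dict[str, Optional[str]]:
    """Extract root, stem and segmentation; missing parts come back as None."""
    result: dict[str, Optional[str]] = {}
    for key, pattern in _PATTERNS.items():
        match = pattern.search(text)
        result[key] = match.group(1).strip() if match else None
    return result


# The derivative sentence from the sample output.
sample = ("الْوَسْم : كلمة أصلها الاسم (وَسْمٌ) في صورة مفرد مذكر "
          "وجذرها (وسم) وجذعها (وسم) وتحليلها (ال + وسم)")
print(parse_derivative(sample))
```

Returning None for a missing part, rather than raising, keeps a batch run going when an entry's sentence is phrased differently; those entries can then be reviewed separately.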

🌍 Why scrape Arabic language data?

Arabic is one of the most widely spoken languages globally, yet high-quality structured Arabic linguistic datasets remain limited. Most authoritative dictionary resources exist only as unstructured web content.

By scraping and structuring Arabic lexical data, this project enables:

  • Better Arabic understanding in LLMs

  • Higher quality Arabic NLP pipelines

  • Semantic search in Arabic applications

  • Knowledge graph creation

  • Educational and linguistic research tools

This project helps close the Arabic data gap in AI.

📥 Input

{
  "maxItems": 10
}
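The input is a plain JSON object. A hedged Python sketch for building it and starting a run with the official apify-client package (written against its 1.x interface). The actor ID is a placeholder to copy from the Store page, and build_run_input with its positive-integer check is an assumption, since maxItems is the only documented field:

```python
import json


def build_run_input(max_items: int = 10) -> dict:
    """Build the actor input shown above.

    maxItems is the only documented field; requiring a positive integer is an
    assumption, not a documented constraint.
    """
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    return {"maxItems": max_items}


def run_actor(token: str, actor_id: str, run_input: dict) -> list[dict]:
    """Start the actor and return its dataset items (apify-client 1.x style).

    actor_id is a placeholder such as "<username>/<actor-name>"; copy the real
    ID from the Apify Store page. Requires `pip install apify-client`.
    """
    # Imported here so build_run_input works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)  # waits for the run to finish
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


print(json.dumps(build_run_input(10)))  # {"maxItems": 10}
```

Calling run_actor needs an Apify API token; each returned item should have the shape shown under Output.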

📤 Output

{
  "word": "الوسم",
  "link": "https://www.almaany.com/ar/dict/ar-ar/",
  "title": "تعريف و معنى الوسم في معجم المعاني الجامع - معجم عربي عربيتعريف و معنى الوسم في قاموس الكل. قاموس عربي عربي",
  "meanings_list": [
    {
      "title": "الوسم: (مصطلحات)",
      "description": [
        "بفتح فسكون جمع وسوم من وسم ( انظر: وسام ) ، أثر الكي بالميسم. والسمة: العلامة. (فقهية)"
      ]
    },
    {
      "title": "وَسَّمَ: (فعل)",
      "description": [
        "وسَّمَ يُوسِّم ، توسيمًا ، فهو مُوسِّم ، والمفعول مُوسَّم",
        "وسَّم فلانًا: أعطاه أو منحه وسامًا"
      ]
    }
  ],
  "contextual_examples": {
    "title": "أمثلة سياقية: الوسم، جمل ورد بها الوسم",
    "examples": [
      "وسمي النبالة َ بالملاحمِ تتسمْ …................. وسمي الصبابة َ بالعواطف تخلدِ (شعر الشاعر: أحمد شوقي )",
      "ولو وسمَ الناسُ الجباهَ بمدحهِ …................. إذاً لاستلذوا الوسمَ والوسمُ يؤلمُ (شعر الشاعر: ابن الرومي )",
      "وَهِيَ الْمَحامِدُ أَبْقَتْ خامِلاً أَبَداً …................. منْ لمْ تسمْ وسما ملكٌ بها وسما (شعر الشاعر:ابن حيوس )"
    ]
  },
  "similar_words": {
    "title": "كلمات ذات صلة",
    "words": [
      "اِتِّسام",
      "أَوْسِمة",
      "أوسَم",
      "تَوَسَّمَ",
      "توسيم"
    ]
  },
  "related_words": {
    "title": "كلمات قريبة",
    "words": [
      "الوسط الهندسي بين مقدارين",
      "الوسط الهندسي لطولين أو عددين",
      "الوسط من الشيء"
    ]
  },
  "word_derivative": {
    "title": "انظر معنى وَسْمٌ مشتقات و تحليل الْوَسْم",
    "derivative": "الْوَسْم : كلمة أصلها الاسم (وَسْمٌ) في صورة مفرد مذكر وجذرها (وسم) وجذعها (وسم) وتحليلها (ال + وسم)"
  }
}
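Because the records are aimed at LLM training, one possible next step is flattening them into JSON Lines. The Python sketch below turns each meaning block and the similar-words list into prompt/response pairs. The prompt templates and the instruction/response/source keys are illustrative assumptions, not a format the actor produces:

```python
import json


# Hypothetical conversion: the prompt wording and output keys are assumptions
# chosen for illustration; adapt them to your training format.
def record_to_examples(record: dict) -> list[dict]:
    """Turn one scraped dictionary record into instruction/response pairs."""
    word = record["word"]
    source = record.get("link")
    examples = []
    # One example per meaning block; its description lines are joined.
    for meaning in record.get("meanings_list", []):
        definition = " ".join(meaning.get("description", []))
        if definition:
            examples.append({
                "instruction": f"Define the Arabic word {word} ({meaning.get('title', '')}).",
                "response": definition,
                "source": source,
            })
    # One extra example listing the similar words, if any were scraped.
    similar = record.get("similar_words", {}).get("words", [])
    if similar:
        examples.append({
            "instruction": f"List Arabic words related to {word}.",
            "response": "، ".join(similar),  # Arabic comma as separator
            "source": source,
        })
    return examples


def to_jsonl(records: list[dict]) -> str:
    """Serialize all examples as JSON Lines, keeping Arabic text unescaped."""
    lines = [json.dumps(ex, ensure_ascii=False)
             for record in records
             for ex in record_to_examples(record)]
    return "".join(line + "\n" for line in lines)


# Usage with a trimmed copy of the sample record above.
sample = {
    "word": "الوسم",
    "link": "https://www.almaany.com/ar/dict/ar-ar/",
    "meanings_list": [
        {"title": "وَسَّمَ: (فعل)",
         "description": ["وسَّم فلانًا: أعطاه أو منحه وسامًا"]},
    ],
    "similar_words": {"title": "كلمات ذات صلة", "words": ["أَوْسِمة", "توسيم"]},
}
print(to_jsonl([sample]), end="")
```

ensure_ascii=False keeps the Arabic readable in the output file instead of turning it into \uXXXX escapes; keeping the link on every example preserves the traceability the Source URL field is meant to provide.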