# Multilingual Telegram Corpus for LLM Training

**Use case:** Collect bulk message text from Telegram channels in low-resource languages and clean it into multilingual datasets that ML and NLP teams can use for LLM pretraining and fine-tuning.

## Input

```json
{
  "channels": [
    "bbcpersian",
    "SwahiliNews",
    "civilge",
    "kaztrend"
  ],
  "maxItems": 2000,
  "includeChannelInfoRow": true,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
```
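
The same input can be assembled and submitted from Python. The following is a minimal sketch, not the Actor's documented workflow: `build_run_input` and `run_actor` are hypothetical helper names, the Actor ID is taken from the Actor URL on this page, and the run step assumes the `apify-client` package is installed and that you supply your own API token.

```python
import json

# Actor ID taken from the Actor URL on this page.
ACTOR_ID = "dami_studio/telegram-channel-scraper"


def build_run_input(channels, max_items=2000):
    """Build the input object shown above (hypothetical helper)."""
    cleaned = []
    for channel in channels:
        name = channel.strip()
        # Drop blanks and duplicates while keeping the original order.
        if name and name not in cleaned:
            cleaned.append(name)
    if not cleaned:
        raise ValueError("at least one channel name is required")
    return {
        "channels": cleaned,
        "maxItems": max_items,
        "includeChannelInfoRow": True,
        "proxyConfiguration": {"useApifyProxy": False},
    }


def run_actor(token, run_input):
    """Start a run and return its dataset items as a list of dicts.

    Assumes `pip install apify-client`; imported lazily so that
    build_run_input stays usable without the package.
    """
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    run_input = build_run_input(
        ["bbcpersian", "SwahiliNews", "civilge", "kaztrend"]
    )
    print(json.dumps(run_input, indent=2))
```

Keeping input construction separate from the network call makes it possible to review or version the configuration before spending any run credits.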

## Output

```json
{
  "channel": {
    "label": "Channel",
    "format": "text"
  },
  "date": {
    "label": "Date",
    "format": "text"
  },
  "text": {
    "label": "Text",
    "format": "text"
  },
  "views": {
    "label": "Views",
    "format": "text"
  },
  "mediaType": {
    "label": "Media",
    "format": "text"
  },
  "postUrl": {
    "label": "Post URL",
    "format": "link"
  }
}
```
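
To turn scraped rows into training text, a cleaning pass along the following lines may be a reasonable starting point. This is a sketch under stated assumptions: each dataset item is taken to be a flat object keyed by the field names in the schema above, rows without a string `text` value (for example a channel-info row or a media-only post) are skipped, and the helper names and the `min_chars` threshold are illustrative rather than part of the Actor.

```python
import json
import re
import unicodedata

_WHITESPACE = re.compile(r"\s+")


def clean_text(raw):
    """Normalize to Unicode NFC and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", raw)
    return _WHITESPACE.sub(" ", text).strip()


def to_training_rows(items, min_chars=20):
    """Yield deduplicated rows suitable for a JSONL corpus.

    `items` is an iterable of dicts keyed by the output field names
    (assumption). `min_chars` is an illustrative threshold, not a
    recommendation.
    """
    seen = set()
    for item in items:
        raw = item.get("text")
        if not isinstance(raw, str):
            continue  # no usable text in this row
        text = clean_text(raw)
        if len(text) < min_chars or text in seen:
            continue  # too short, or an exact duplicate after cleaning
        seen.add(text)
        yield {
            "channel": item.get("channel"),
            "date": item.get("date"),
            "text": text,
            "source": item.get("postUrl"),
        }


def write_jsonl(rows, path):
    """Write one JSON object per line, keeping non-ASCII scripts readable."""
    with open(path, "w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(json.dumps(row, ensure_ascii=False) + "\n")
```

Exact-match deduplication only catches verbatim reposts; near-duplicate detection and language identification would be separate steps and are not shown here.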

## About this Actor

This example demonstrates how to use [Telegram Channel Scraper](https://apify.com/dami_studio/telegram-channel-scraper) with a specific input configuration. Visit the [Actor detail page](https://apify.com/dami_studio/telegram-channel-scraper) to learn more, explore other use cases, and run it yourself.