Yet Another Dataset Translator

mvolfik/yet-another-dataset-translator

Actor to translate datasets with field selection and source language detection. Requires Google Translate API Key.

Features

Language detection

For each dataset item, this actor performs language detection. If it detects that the item is already in the target language, it skips translation of that item, thus saving your Google Translate budget.

Mock run

You can run this Actor with an empty API key to mock-translate items. That way you can test your setup without spending money. Additionally, the Actor prints statistics, including a price estimate, to help you predict your Google Cloud spending.

Note: the estimate is provided solely for your convenience, without any guarantees of accuracy or correctness. Always check Google Translate API pricing and perform your own estimation.
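As a rough illustration of how such an estimate can be made, the sketch below counts the characters that would be sent for translation and multiplies by a per-million-character rate. The function name and the rate are hypothetical: the rate is a required argument precisely because it must come from the current Google pricing page, and the Actor's own statistics may be computed differently.

```python
def estimate_cost(strings, price_per_million_chars):
    """Return (total_chars, estimated_cost) for the strings to be translated.

    Hypothetical helper: the rate is supplied by the caller and is not a
    quoted Google price. Characters are counted with len(), i.e. as Unicode
    code points.
    """
    total_chars = sum(len(s) for s in strings)
    return total_chars, total_chars * price_per_million_chars / 1_000_000


# Placeholder rate for demonstration only; look up the real one yourself.
PLACEHOLDER_RATE = 20.0
chars, cost = estimate_cost(["Auf Wiedersehen.", "Guten Morgen"], PLACEHOLDER_RATE)
print(f"{chars} characters, estimated cost {cost:.5f}")
```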

Input

dataset_ids

List of IDs of datasets to translate. This allows you to combine items from multiple Actor runs if needed.

api_key

Google Translate API key. This field is stored securely encrypted on Apify servers. If you don't provide a key, the Actor runs in "mock translation" mode, which only prefixes each string with "TRANSLATED " instead of calling Google servers.
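The mock mode can be pictured with a minimal sketch like the one below. Only the "TRANSLATED " prefix comes from the description above; the function names and the real_translate callback are hypothetical stand-ins, not the Actor's actual code.

```python
MOCK_PREFIX = "TRANSLATED "


def mock_translate(texts):
    """Pretend to translate by prefixing each string, without any network call."""
    return [MOCK_PREFIX + text for text in texts]


def translate(texts, api_key, real_translate=None):
    """Use mock mode when no API key is given, else delegate to a real client.

    `real_translate` is a hypothetical callable (texts, api_key) -> list[str]
    standing in for whatever code calls the Google Translate API.
    """
    if not api_key:
        return mock_translate(texts)
    if real_translate is None:
        raise ValueError("an API key was given but no translation client")
    return real_translate(texts, api_key)


print(translate(["Auf Wiedersehen."], api_key=""))
```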

field_patterns_to_translate

Provide a list of globs to identify fields that should be translated. Supported wildcards:

  • *: any number of any characters: *Field matches Field, someField, 1Field, but not field
  • ?: a single character: ?ield matches yield, Field, and " ield" (with a leading space), but not ield or aField
  • [chars]: a single occurrence of any character in chars: [fF]ield matches field and Field, but not any of ffield, yield, ield
  • [!chars]: a single occurrence of any character not in chars: [!y ]ield matches field and Yield, but not ield, yield, or " ield" (with a leading space)

(Wildcards can appear at any position in a pattern, and you can combine them in any way. Use the single pattern * to translate all fields.)
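To try patterns out before a run, Python's standard fnmatch module happens to support the same four wildcards with case-sensitive matching. The sketch below is only an approximation for experimenting: whether the Actor uses this exact matcher, and whether it looks at nested fields, is not stated here, so the helper fields_to_translate is hypothetical and considers top-level string fields only.

```python
from fnmatch import fnmatchcase


def fields_to_translate(item, patterns):
    """Return names of top-level string fields matching any of the globs."""
    return [
        name
        for name, value in item.items()
        if isinstance(value, str) and any(fnmatchcase(name, p) for p in patterns)
    ]


item = {"title": "Hallo", "titleShort": "Hi", "price": 12, "url": "https://example.com"}
print(fields_to_translate(item, ["title*"]))
print(fields_to_translate(item, ["*"]))
```

Note that the numeric price field is never selected, because this sketch only treats strings as translatable.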

detect_language_threshold

Language detection threshold. The default value of 0.7 is suitable for most use cases, but if you need to be sure that all output text is in the given language, you can increase it to a value like 0.95.

If you provide 0, language detection won't be performed at all and all fields matched by patterns will be sent for translation.

Detection is performed per item, on the first 500 characters of the concatenation of the fields that are to be translated. This means that for a given item, either all matched fields are translated or none are.
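The per-item decision described above might look roughly like the following sketch. Several details are assumptions rather than documented behavior: the detect callable (returning a language code and a confidence between 0 and 1) is a hypothetical stand-in for the detector, the fields are joined with a space, and an item is skipped only when the detector reports the target language with confidence at or above the threshold.

```python
SAMPLE_LENGTH = 500  # number of characters inspected, as described above


def should_translate(item, fields, target_lang, threshold, detect):
    """Decide once per item whether its matched fields get translated.

    `detect` is a hypothetical callable: text -> (language_code, confidence).
    """
    if threshold == 0:
        # Detection disabled: everything matched is sent for translation.
        return True
    sample = " ".join(item[name] for name in fields)[:SAMPLE_LENGTH]
    language, confidence = detect(sample)
    already_in_target = language == target_lang and confidence >= threshold
    return not already_in_target


def fake_detect(text):
    """Stub detector for demonstration: always fairly sure it sees English."""
    return ("en", 0.8)


item = {"text": "Goodbye.", "note": "See you soon."}
print(should_translate(item, ["text", "note"], "en", 0.7, fake_detect))
print(should_translate(item, ["text", "note"], "en", 0.95, fake_detect))
```

Under these assumptions, raising the threshold makes the Actor skip fewer items, which is why a higher value gives more certainty about the output language at the cost of more API calls.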

output_dataset_id

ID of output dataset, if you need to aggregate items from multiple runs into one dataset. If not provided, Actor will use its own default dataset.

translation_marker_field

Default value = wasTranslated. Each output item will contain this field, specifying if the item was translated (→ true) or not (→ false).

If you set this option to an empty string (or null), the marker field is omitted from the output.

original_value_field_prefix

Default value = original_. Each translated item will also contain a copy of each translated field, prefixed with this value, holding the original, untranslated string. For example, for input item

{ "text": "Auf Wiedersehen." }

The output would be

{
    "text": "Goodbye.",
    "original_text": "Auf Wiedersehen."
}

If you set original_value_field_prefix to an empty string (or null), the original values will not be included in the output.
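How translation_marker_field and original_value_field_prefix shape an output item can be sketched as follows. The function build_output_item and its translations argument (a mapping from field name to translated string) are hypothetical; the sketch only mirrors the behavior described above and assumes an item with no translated fields counts as not translated.

```python
def build_output_item(item, translations,
                      marker_field="wasTranslated",
                      original_prefix="original_"):
    """Combine an input item with its translated fields.

    `translations` maps field names to translated strings and is empty when
    the item was skipped. An empty or None option disables that feature.
    """
    output = dict(item)
    for field, translated in translations.items():
        if original_prefix:
            # Keep the untranslated string under the prefixed name.
            output[original_prefix + field] = item[field]
        output[field] = translated
    if marker_field:
        output[marker_field] = bool(translations)
    return output


source_item = {"text": "Auf Wiedersehen."}
print(build_output_item(source_item, {"text": "Goodbye."}))
print(build_output_item(source_item, {"text": "Goodbye."},
                        marker_field="", original_prefix=None))
```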


Disclaimer: This Actor serves as a tool that interfaces with the Google Translate API and does not hold any responsibility for the quality of translations provided by this third-party service. By supplying an API key, the user consents to this Actor accessing the Google Translate API on their behalf. Users are responsible for ensuring that the amount of text submitted for translation is within their allocated quota and adheres to the Google Translate API's terms of service.

Developer
Maintained by Community

Actor Metrics

  • 1 monthly user

  • 1 star

  • Created in May 2023

  • Modified 2 years ago
