Changelog
Release v1.0.3
- The IndicProcessor class has been rewritten in Cython for faster processing. This yields a speedup of at least 10 lines/s.
- A new visualize argument has been added to preprocess_batch to track processing with a tqdm progress bar.
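As a rough illustration of the visualize argument, the sketch below forwards the flag to preprocess_batch. The wrapper name preprocess_with_progress is hypothetical, and the src_lang and tgt_lang keyword names are assumptions about the preprocess_batch signature; check the toolkit's README before relying on them.

```python
from typing import Any, List


def preprocess_with_progress(
    processor: Any,
    sentences: List[str],
    src_lang: str,
    tgt_lang: str,
    visualize: bool = True,
) -> List[str]:
    """Hypothetical wrapper: run preprocess_batch with the progress bar toggled.

    `processor` is expected to be an IndicProcessor-like object. The keyword
    names `src_lang` and `tgt_lang` are assumed, not taken from this changelog.
    """
    # The processor itself decides how progress is rendered (a tqdm bar,
    # according to the release note); this wrapper only forwards the flag.
    return processor.preprocess_batch(
        sentences,
        src_lang=src_lang,
        tgt_lang=tgt_lang,
        visualize=visualize,
    )
```

With the toolkit installed, passing an IndicProcessor instance as processor together with language codes such as "eng_Latn" and "hin_Deva" should show the progress bar while the batch is processed. Passing visualize=False keeps the output quiet, which is usually preferable in logs or automated jobs.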
Release v1.0.2
- The repository has been renamed to IndicTransToolkit.
- The custom tokenizer has been removed from the repository. To keep using it (strongly discouraged), revert to a previous commit (v1.0.1). The official, and only, tokenizer is available on HF along with the models.
Release v1.0.0
- The PreTrainedTokenizer for IndicTrans2 is now available on HF. Note that you still need the IndicProcessor to pre-process sentences before tokenization.
- In favor of the standard PreTrainedTokenizer, the custom tokenizer is deprecated. It will remain available here for backward compatibility, but it will receive no further updates or bug fixes.
- The indic_evaluate function is now consolidated into a concrete IndicEvaluator class.
- The data collation function for training is consolidated into a concrete IndicDataCollator class.
- A simple batching method is now available in the IndicProcessor.
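The changelog does not show the name or signature of the batching method, so the following is only a standalone sketch of what simple batching typically means: splitting a list of sentences into fixed-size chunks. The helper iter_batches is hypothetical and is not part of the toolkit.

```python
from typing import Iterator, List, Sequence


def iter_batches(sentences: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `batch_size` sentences.

    Hypothetical helper for illustration; the toolkit's own batching
    method may differ in name, arguments, and behavior.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    # Step through the input in strides of batch_size; the final chunk
    # may be shorter than the others.
    for start in range(0, len(sentences), batch_size):
        yield list(sentences[start:start + batch_size])
```

Each yielded chunk can then be pre-processed, tokenized, and translated on its own, which keeps memory use bounded for long inputs.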