Smoliakov PRO

Added a model for the reverse direction, Ukrainian to English: https://huggingface.co/spaces/Yehor/uk-en-translator

Image translation is available now too: https://huggingface.co/spaces/Yehor/vision-en-uk-translator

You can now translate audio as well: https://huggingface.co/spaces/Yehor/audio-en-uk-translator

Facts:
- Fine-tuned for 1.4 epochs on 40M samples, selected by a quality metric from ~53.5M
- 354M params
- Requires 1 GB of RAM to run with bf16
- BLEU on FLORES-200: 27.24
- Tokens per second: 229.93 (bs=1), 1664.40 (bs=10), 8392.48 (bs=64)
- License: lfm1.0
Model page: Yehor/kulyk-en-uk
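A quick sanity check of the numbers above, as a rough sketch: bf16 stores each parameter in 2 bytes, so the weights alone come to about 708 MB. That the rest of the quoted 1 GB goes to activations and runtime overhead is my assumption, not something measured here. The helper names `speedup` and `per_sequence_rate` are made up for this example.

```python
# Back-of-the-envelope numbers derived from the facts listed above.
PARAMS = 354_000_000        # 354M parameters
BYTES_PER_PARAM_BF16 = 2    # bf16 keeps each value in 16 bits

# Memory taken by the weights alone, in megabytes (10^6 bytes).
weights_mb = PARAMS * BYTES_PER_PARAM_BF16 / 1_000_000

# Reported throughput in tokens per second, keyed by batch size.
tokens_per_second = {1: 229.93, 10: 1664.40, 64: 8392.48}


def speedup(batch_size: int) -> float:
    """Total throughput relative to batch size 1."""
    return tokens_per_second[batch_size] / tokens_per_second[1]


def per_sequence_rate(batch_size: int) -> float:
    """Tokens per second that each sequence in the batch receives."""
    return tokens_per_second[batch_size] / batch_size


print(f"bf16 weights: ~{weights_mb:.0f} MB")
for bs in tokens_per_second:
    print(f"bs={bs}: x{speedup(bs):.1f} total, "
          f"{per_sequence_rate(bs):.1f} tok/s per sequence")
```

Batching raises total throughput a lot, while each individual sequence gets slower, which is the usual trade-off between throughput and latency.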

Repository: https://github.com/egorsmkv/speech-to-text-using-php


I also tested it on an A100 with TensorRT:
https://colab.research.google.com/drive/1-agoo5ll-hWEecWQAtO1FM39sqavJxph?usp=sharing
The results are not clear-cut, but it works with the base_rfdetr_fp16.onnx model and takes ~10 ms/img

Check it out: https://github.com/egorsmkv/rf-detr-usls

This program does what the datasets library does: when you push a dataset created by the audiofolder script, datasets creates Parquet data and shards it internally.
So you can use audios-to-dataset instead if you need more speed than datasets provides.

Repository with pre-built binaries: https://github.com/crs-org/audios-to-dataset
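To illustrate what sharding means here, a minimal sketch in Python: group audio files into shards capped by a byte budget. This is not the algorithm audios-to-dataset or datasets actually uses; `plan_shards` and `max_shard_bytes` are hypothetical names for this example.

```python
def plan_shards(file_sizes, max_shard_bytes):
    """Greedily group (name, size) pairs into shards under a byte cap.

    Illustrative only: the real tools may split differently.
    """
    shards = []
    current = []
    current_bytes = 0
    for name, size in file_sizes:
        # Close the current shard when this file would push it over the cap.
        if current and current_bytes + size > max_shard_bytes:
            shards.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        shards.append(current)
    return shards


# Sizes are toy values in bytes; a file larger than the cap gets its own shard.
files = [("a.wav", 40), ("b.wav", 50), ("c.wav", 30), ("d.wav", 120)]
shards = plan_shards(files, max_shard_bytes=100)
print(shards)  # [['a.wav', 'b.wav'], ['c.wav'], ['d.wav']]
```

Each shard would then be written out as one Parquet file.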

My channel in Telegram: https://t.me/doing_something

I slightly improved a nice project that creates spectrograms and built binaries for different platforms using cross-rs, which I mentioned earlier in my channel.
Repo: https://github.com/crs-org/sonogram


See my previous post - https://huggingface.co/posts/Yehor/654118712490771
Repository: https://github.com/crs-org/extract-audio

https://github.com/egorsmkv/argilla-audio-annotation

Check out IREE (iree.dev): it converts models to MLIR and then executes them on different platforms.
I have tested it in Rust on CPU and CUDA: https://github.com/egorsmkv/eerie-yolo11


With this tool you can extract audio files from a Parquet or Arrow file generated by the Hugging Face datasets library.
Repository: https://github.com/egorsmkv/extract-audio
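For comparison, the same idea as a rough Python sketch, not the tool's actual code. It assumes each row holds an `audio` value with `bytes` and `path` fields, which is how datasets usually stores audio. The function name `write_audio_rows` and the `.wav` fallback extension are made up for this example.

```python
from pathlib import Path


def write_audio_rows(rows, out_dir, column="audio"):
    """Write the raw audio bytes of each row into out_dir.

    Assumes row[column] is a dict with "bytes" and an optional "path".
    With pyarrow installed, rows could come from
    pyarrow.parquet.read_table(parquet_path).to_pylist().
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for index, row in enumerate(rows):
        audio = row[column]
        # Fall back to a numbered name when the row carries no path.
        # The ".wav" extension is a guess; the bytes may be another format.
        name = Path(audio.get("path") or f"{index}.wav").name
        target = out / name
        target.write_bytes(audio["bytes"])
        written.append(target)
    return written
```

Only the file name from `path` is kept, so a row cannot write outside `out_dir`.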