import whisper
from transformers import MarianMTModel, MarianTokenizer, AutoTokenizer, AutoModelForSeq2SeqLM

# Load Whisper model
model = whisper.load_model("base")
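

def format_timestamp(seconds):
    """Convert a time in seconds to an SRT timestamp (HH:MM:SS,mmm)."""
    # SRT requires HH:MM:SS,mmm; writing raw second counts would produce
    # invalid timestamps for any segment past the first minute.
    millis = int(round(seconds * 1000))
    hours, remainder = divmod(millis, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"
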

def process_video(video_file, language):
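    """Transcribe a video with Whisper and write subtitles to an SRT file.

    video_file is a binary file-like object (e.g. an upload from a web UI);
    language selects the subtitle language. Returns the path to the .srt
    file, or an "Error: ..." string if anything fails.
    """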
    # Save uploaded video locally
    video_path = "/tmp/video.mp4"
    with open(video_path, "wb") as f:
        f.write(video_file.read())

    try:
        print("Transcribing video to English...")
        result = model.transcribe(video_path, language="en")

        segments = []
        if language == "English":
            segments = result["segments"]
        else:
            if language == "Telugu":
                # There is no dedicated opus-mt en-te checkpoint, so use the
                # multilingual NLLB-200 model, which covers Telugu as "tel_Telu".
                model_name = "facebook/nllb-200-distilled-600M"
                tokenizer = AutoTokenizer.from_pretrained(model_name)
                translation_model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
                tgt_lang = "tel_Telu"
                print("Translating to Telugu using NLLB-200 Distilled...")
                for segment in result["segments"]:
                    inputs = tokenizer(segment["text"], return_tensors="pt", padding=True)
                    # forced_bos_token_id makes NLLB decode in the target language
                    translated_tokens = translation_model.generate(
                        **inputs,
                        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
                    )
                    translated_text = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
                    segments.append({"text": translated_text, "start": segment["start"], "end": segment["end"]})
            else:
                model_map = {
                    "Hindi": "Helsinki-NLP/opus-mt-en-hi",
                    "Spanish": "Helsinki-NLP/opus-mt-en-es",
                    "French": "Helsinki-NLP/opus-mt-en-fr",
                    "German": "Helsinki-NLP/opus-mt-en-de",
                    # There is no plain opus-mt-en-pt model on the Hub; the
                    # Tatoeba-Challenge "tc-big" checkpoint fills that gap.
                    "Portuguese": "Helsinki-NLP/opus-mt-tc-big-en-pt",
                    "Russian": "Helsinki-NLP/opus-mt-en-ru",
                    "Chinese": "Helsinki-NLP/opus-mt-en-zh",
                    "Arabic": "Helsinki-NLP/opus-mt-en-ar",
                    # opus-mt-en-jap is trained largely on Bible text, so
                    # Japanese output quality may be limited.
                    "Japanese": "Helsinki-NLP/opus-mt-en-jap",
                }
                if language not in model_map:
                    raise ValueError(f"Unsupported subtitle language: {language}")
                model_name = model_map[language]
                tokenizer = MarianTokenizer.from_pretrained(model_name)
                translation_model = MarianMTModel.from_pretrained(model_name)
                print(f"Translating to {language}...")
                for segment in result["segments"]:
                    inputs = tokenizer(segment["text"], return_tensors="pt", padding=True)
                    translated = translation_model.generate(**inputs)
                    translated_text = tokenizer.decode(translated[0], skip_special_tokens=True)
                    segments.append({"text": translated_text, "start": segment["start"], "end": segment["end"]})

        # Write the segments out as an SRT file
        srt_path = "/tmp/subtitles.srt"
        with open(srt_path, "w", encoding="utf-8") as f:
            for i, segment in enumerate(segments, 1):
                start = format_timestamp(segment["start"])
                end = format_timestamp(segment["end"])
                text = segment["text"].strip()
                f.write(f"{i}\n{start} --> {end}\n{text}\n\n")
        return srt_path

    except Exception as e:
        return f"Error: {str(e)}"