description = '''
Using PaddleOCR for text extraction, GoogleTranslator for translation, and Google Text-to-Speech (gTTS) for audio conversion, the application provides a seamless experience for users needing real-time text translation from images with audio playback support.

Users start by uploading an image, which the system processes to extract text using PaddleOCR. The extracted text is then translated into a selected language via GoogleTranslator from the deep_translator library, which supports a wide array of global languages. The translated text is subsequently converted into audio using gTTS, allowing users to listen to the translation in the chosen language. This multi-component design enables a comprehensive service flow in which user input is transformed into text, translated, and delivered as both written and spoken output, making the application a robust tool for on-the-go linguistic assistance. By modularizing the OCR, translation, and TTS functions, the application ensures scalability, maintainability, and ease of integration with other services, making it well suited to enhancing accessibility and communication in diverse, multilingual environments.
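As a minimal sketch of this pipeline (assuming the paddleocr, deep_translator, and gtts packages; the file name, language choices, and result parsing here are illustrative, and the application's own code organizes these calls into the classes described below):

```python
# Minimal sketch of the image -> text -> translation -> speech pipeline.
from paddleocr import PaddleOCR
from deep_translator import GoogleTranslator
from gtts import gTTS

ocr = PaddleOCR(lang="en")          # OCR engine configured for English
result = ocr.ocr("photo.png")       # hypothetical input image path
# Assumes PaddleOCR's classic result layout: result[0] holds [box, (text, score)] entries.
text = " ".join(entry[1][0] for entry in result[0])

translated = GoogleTranslator(source="en", target="es").translate(text)
gTTS(text=translated, lang="es").save("translated.mp3")  # file the user can play back
```

Each library call in this sketch maps onto one of the three modular functions, which is what makes the components independently replaceable.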
To know more about the project, study the UML diagrams below:
The class diagram shows the application's main modules and classes. OCR.py extracts text from the uploaded image, while translate_speak.py translates that text and converts it to speech. Together, OCR.py and translate_speak.py handle the core functionalities: text extraction, translation, and audio generation.

App:
- langs_list: List of supported languages for translation.
- langs_dict: Dictionary of supported languages with their language codes.
- main_interface: The Gradio interface instance.
- encode_image(image_path): Encodes an image file to a base64 string for display.

OCR:
- ocr_with_paddle(img): Uses PaddleOCR to perform OCR on an image and return the extracted text and the audio path.

TranslateSpeak:
- output_path: Path where the output audio file is saved.
- translate_path: Path for the translated audio file.
- get_lang(lang): Determines the appropriate language code for text-to-speech in the target language.
- audio_streaming(txt, lang, to): Converts text to audio in the specified language, saving the result to a path determined by the to argument.
- translate_txt(lang, text): Translates text to the specified language and generates audio, returning the translated text and the audio path.

PaddleOCR (External Library Class):
- ocr(img): Accepts an image input and returns OCR results, which the OCR class uses in ocr_with_paddle().

GoogleTranslator (External Library Class):
- get_supported_languages(as_dict): Returns a list or dictionary of supported languages, depending on the argument.
- translate(text, source, target): Translates the given text from the source language to the target language.

gTTS (External Library Class):
- save(output_path): Saves the generated audio to a specified path.
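As a minimal illustration (assuming the deep_translator package, not the application's exact code), langs_list and langs_dict can be populated straight from GoogleTranslator's supported-language query:

```python
# Sketch: populating App.langs_list and App.langs_dict via deep_translator.
from deep_translator import GoogleTranslator

langs_list = GoogleTranslator().get_supported_languages()              # e.g. ['arabic', ..., 'english', ...]
langs_dict = GoogleTranslator().get_supported_languages(as_dict=True)  # e.g. {'english': 'en', 'spanish': 'es', ...}
```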
The object diagram shows representative instances of these classes:

app_instance:
- langs_list: A list of language codes supported by the application, such as English ("en"), Spanish ("es"), and French ("fr").
- langs_dict: A dictionary mapping language codes to their corresponding language names.
- main_interface: The main Gradio interface used for user interactions.

ocr_instance:
- finaltext: Contains the extracted text from an uploaded image after OCR processing.

translate_speak_instance:
- output_path: Path for saving the original audio output in WAV format.
- translate_path: Path for saving the translated audio in WAV format.

paddle_ocr_instance:
- language: The language setting for OCR processing, defaulted to "en" (English).

google_translator_instance:
- source: The source language code for translation, set to "en" (English).
- target: The target language code for translation, here set to "es" (Spanish).

gtts_instance:
- lang: The language code for text-to-speech, set to "en" (English) in this example.
- slow: A boolean indicating the speed of the generated speech (False indicates normal speed).

Each object in this diagram represents a specific instance in the application, showcasing how the components work together to provide OCR, translation, and audio features.
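A short sketch of gtts_instance (illustrative only, assuming the gtts package):

```python
# Sketch of the gtts_instance settings above.
from gtts import gTTS

tts = gTTS(text="Hello, world", lang="en", slow=False)  # lang and slow mirror gtts_instance
tts.save("output.wav")  # note: gTTS always writes MP3-encoded audio, regardless of the file extension
```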
""" sequence_diagram = '''This sequence diagram illustrates how the components of the application interact during a typical workflow where a user uploads an image, the system extracts text from the image using OCR, translates the text into a different language, and finally generates audio from the translated text.
1. User Uploads an Image: The user submits an image through the Gradio interface.
2. Performing OCR: The App calls the ocr_with_paddle(image) method from the OCR class to process the uploaded image, which in turn calls the ocr(image) method of the PaddleOCR instance to extract text from the image.
3. Translating the Text: The App calls the translate_txt(lang, extractedText) method from the TranslateSpeak class to translate the extracted text into the desired language, which uses the translate(extractedText, source, target) method of the GoogleTranslator to perform the translation.
4. Generating Audio: TranslateSpeak calls its audio_streaming(translatedText, lang, to) method, which uses the gTTS class to generate audio from the translated text.
5. Returning Results to the User: The translated text and the generated audio are returned to the App and presented to the user.
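A rough sketch of step 2 follows, assuming paddleocr's classic ocr() result layout of [box, (text, score)] entries; it is simplified to return only the text, whereas the real ocr_with_paddle() also returns an audio path:

```python
# Sketch of OCR.ocr_with_paddle() from step 2; assumes the paddleocr package.
from paddleocr import PaddleOCR

def ocr_with_paddle(img_path: str) -> str:
    ocr = PaddleOCR(lang="en", use_angle_cls=True)  # "en" matches paddle_ocr_instance
    result = ocr.ocr(img_path)
    # Each entry in result[0] is [bounding_box, (text, confidence)].
    return " ".join(entry[1][0] for entry in result[0])
```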
The collaboration diagram illustrates how different components of the "Linguistic Lens Application" interact with each other to fulfill a particular user request, such as uploading an image, performing OCR, translating the text, and generating audio. This diagram focuses on the relationships and messages exchanged between the components rather than the chronological order of operations.
- User Uploads an Image: The user sends an image to the App component.
- OCR Processing: The OCR component invokes the ocr(image) method on the PaddleOCR component to extract text from the image.
- Translation of Text: The App calls the translate_txt(lang, extractedText) method on the TranslateSpeak component, passing in the extracted text and the target language; TranslateSpeak delegates the actual translation to GoogleTranslator's translate(extractedText, source, target) method.
- Generating Audio from Translated Text: TranslateSpeak calls its audio_streaming(translatedText, lang, to) method, which uses the gTTS component to convert the translated text into audio.
- Returning Results to the User: The translated text and the audio file are passed back through the App to the user.
This diagram helps in understanding the architecture of the application and the responsibilities of each component within the system. It is particularly useful for identifying how components are interrelated and how data flows through the application.
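A minimal sketch of the TranslateSpeak collaboration, assuming deep_translator and gtts (the real translate_txt may differ in details such as paths and error handling, and the default translate_path below is hypothetical):

```python
# Sketch of TranslateSpeak.translate_txt() as a pair of library calls.
from deep_translator import GoogleTranslator
from gtts import gTTS

def translate_txt(lang: str, text: str, translate_path: str = "translated_audio.wav"):
    # Translate from English (the object diagram's source) into the target code.
    translated = GoogleTranslator(source="en", target=lang).translate(text)
    # Assumes lang is valid for gTTS too; the real app's get_lang() handles that mapping.
    gTTS(text=translated, lang=lang, slow=False).save(translate_path)
    return translated, translate_path
```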
The component diagram illustrates the architecture of the "Linguistic Lens Application" by showing the major components, their roles, and the relationships between them. This diagram helps in understanding how the system is structured and how each part contributes to the overall functionality.
- App: The main application component; it hosts the Gradio interface, receives user input, and coordinates the other components.
- OCR: Wraps PaddleOCR to extract text from uploaded images.
- TranslateSpeak: Handles translation of the extracted text and generation of audio from the translated text.
- PaddleOCR: The external OCR engine used by the OCR component.
- GoogleTranslator: The external translation service, accessed via the deep_translator library, used by TranslateSpeak.
- gTTS (Google Text-to-Speech): The external text-to-speech library used to produce the audio output.
This component diagram provides a high-level view of the application’s architecture, highlighting how different components work together to deliver the desired functionality. It helps stakeholders understand the modular structure and promotes better maintenance and scalability.
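To make the App component concrete, here is a hedged sketch of how the Gradio interface might wire these pieces together; it reuses the ocr_with_paddle and translate_txt sketches above, and the actual main_interface likely differs (the class diagram, for instance, also includes encode_image for display):

```python
# Sketch: the App component wiring the OCR and TranslateSpeak helpers with Gradio.
import gradio as gr

def process(image_path, target_lang):
    extracted = ocr_with_paddle(image_path)                          # OCR component
    translated, audio_path = translate_txt(target_lang, extracted)   # TranslateSpeak component
    return extracted, translated, audio_path

main_interface = gr.Interface(
    fn=process,
    inputs=[gr.Image(type="filepath"),
            gr.Dropdown(choices=["en", "es", "fr"], label="Target language")],
    outputs=[gr.Textbox(label="Extracted text"),
             gr.Textbox(label="Translated text"),
             gr.Audio(label="Playback")],
)

if __name__ == "__main__":
    main_interface.launch()
```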
The activity diagram illustrates the flow of activities in the "Linguistic Lens Application" from the moment the user uploads an image to the final display of translated text and audio playback. This diagram is useful for visualizing the dynamic behavior of the system and understanding how different processes interact.
1. User Uploads Image: The user uploads an image through the interface.
2. App Receives Image: The App accepts the image for processing.
3. OCR Processing: The App calls the ocr_with_paddle(image) method from the OCR component.
4. PaddleOCR Performs OCR: PaddleOCR extracts the text from the image.
5. OCR Returns Extracted Text: The extracted text is returned to the App.
6. App Displays Extracted Text: The App shows the extracted text to the user.
7. User Selects Target Language: The user chooses the language to translate into.
8. App Calls Translate Function: The App calls the translate_txt(lang, extractedText) method from the TranslateSpeak component to translate the extracted text.
9. TranslateSpeak Calls GoogleTranslator: The extracted text is translated into the target language.
10. GoogleTranslator Returns Translated Text: The translation is handed back to TranslateSpeak.
11. TranslateSpeak Calls gTTS for Audio Generation: gTTS converts the translated text into speech.
12. gTTS Returns Audio Path: The path to the saved audio file is returned.
13. TranslateSpeak Returns Results to App: The translated text and audio path are sent back to the App.
14. App Displays Translated Text and Provides Audio Playback: The user sees the translation and can listen to the audio.
This diagram is beneficial for stakeholders and developers to understand the application's flow, facilitating better communication and ensuring a smoother development process.
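One step worth illustrating is the language handoff in step 11: the translator and gTTS do not support identical language sets, so a helper like get_lang can validate the TTS code and fall back to English. This is a sketch under that assumption, not the app's exact logic:

```python
# Sketch of TranslateSpeak.get_lang(): validate a TTS code, fall back to English.
from gtts.lang import tts_langs

def get_lang(lang: str) -> str:
    # tts_langs() returns the codes gTTS can speak, e.g. {'en': 'English', 'es': 'Spanish', ...}
    return lang if lang in tts_langs() else "en"
```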
'''