Commit History

added textbox to use user token, removed text translation for now
85b75e6

mjuvilla committed on

fix token acquisition
d0e5d23

mjuvilla committed on

Improved feedback to the user
acd3a4d

mjuvilla committed on

fixed button issues
02be53b

mjuvilla committed on

fixed dockerfile to download tikal and build fast_align
deb0ca6

mjuvilla committed on

add fast align models
9c3b040

mjuvilla committed on

Merge branch 'main' of hf.co:spaces/mjuvilla/document-translator
3081fac

mjuvilla committed on

initial commit
fc19a34
verified

mjuvilla committed on

Merge pull request #3 from langtech-bsc/any-doc
8f1143c
unverified

mjuvilla committed on

Merge branch 'main' of hf.co:spaces/mjuvilla/document-translator into any-doc
e3602d9

mjuvilla committed on

initial commit
97eaf6b
verified

mjuvilla committed on

Fixed some bugs, added dockerfile
5c92fe7

mjuvilla committed on

Removed old code, added support for any* document, disabled "upload" button when we are processing the document
5a4c72f

mjuvilla committed on

Fixed a lot of errors; the script should now crash much less often.
186c0af

mjuvilla committed on

changed nltk tokenizer to multilingual tokenizers
41ceab1

mjuvilla committed on

Fixed bug when processing docx files
4420a7f

mjuvilla committed on

updated requirements
7743917

mjuvilla committed on

added language dropdown menus to document translation tab
0b349b6

mjuvilla committed on

enforce the usage of the hf token
164a644

mjuvilla committed on

turns out odts work a bit differently and runs could have more than one tag id, which complicates things quite a lot
c79a1ef

mjuvilla committed on

Updated readme and added salamandraTA7b translator class
bc3b289

mjuvilla committed on

integrated any-doc into the gradio app, separated the translation side to make it easier to implement other translation models
6e54822

mjuvilla committed on

forgot to remove placeholder text
127870b

mjuvilla committed on

moved scripts to src folder, created new script that hopefully should be able to work with any type of document
ad4ed41

mjuvilla committed on

Merge pull request #2 from langtech-bsc/multithreading-and-optimizations
580106a
unverified

mjuvilla committed on

added execution time computation for alignments and cleanup old imports
36f2ac1

mjuvilla committed on

fixed some formatting issues
0348f21

mjuvilla committed on

now both fastaligns run in parallel, also added some improvements to the code (avoid using shell in popen, class Aligner creates the temporary files instead of doing it externally)
0efc9da

mjuvilla committed on

Merge pull request #1 from langtech-bsc/windows
fd61039
unverified

mjuvilla committed on

removed old file
f5f4b70

mjuvilla committed on

Added support for windows and linux, removed unused function, added more logs
1792639

mjuvilla committed on

gradio app for windows
100f3e3

carlosep93 committed on

added salamandraTA translation, updated requirements
8030df1

mjuvilla committed on

Modified the script so we only run fastalign once instead of once per paragraph, significantly reducing the run time. This involves flattening all the text while keeping each original paragraph index so the original structure can be reconstructed.
0fc4acd

mjuvilla committed on

forgot this continue
08ca2fd

mjuvilla committed on

Fixed issues when dealing with hyperlinks (for now we keep the text and formatting but not the link), also improved format handling and sped things up a bit by avoiding loading fastalign with empty paragraphs
595da73

mjuvilla committed on

fixed some formatting errors, still haven't fixed line spacing
b568903

mjuvilla committed on

First commit. For now the translation has not been integrated but reading a docx and writing its translation while keeping the formatting and style should work
978cbf1

mjuvilla committed on