alienet committed on
Commit 12f36d4 · 1 Parent(s): b9b16a4
.gitignore ADDED
@@ -0,0 +1,4 @@
+ **/__pycache__
+ run.bat
+ openai_api_test.py
+ assets/
README.md CHANGED
@@ -10,53 +10,8 @@ pinned: false
  license: apache-2.0
  ---
 
- # EasyTranslator v1.0.6
+ # EasyTranslator v1.1.0
  A gradio-based tool for assisting Chinese localization
- ## v1.0.6 changes
- 1. Improved the file-merge feature for easier collaboration. On the file-merge page, two JSON files can be merged by following the instructions, synchronizing manual-translation progress. Exporting small JSON files for convenient transfer is also supported.
- 
- ## v1.0.5 changes
- 1. Added keyboard shortcuts<br>
- shift+w: ↑<br>
- shift+x: ↓<br>
- shift+s: save json<br>
- shift+r: replace<br>
- shift+g: gpt translate<br>
- shift+b: baidu translate<br>
- 
- ## v1.0.4 changes
- 1. Added moyu (slacking-off) mode, which gathers the essential components into half the screen. Set `moyu_mode` to 1 in `config.json` to enable it, or 0 to disable it
- 2. Added timeout detection for GPT translation. The limit is set in seconds at `time_limit` under `openai_api_settings` in `config.json`. If a request times out, a timeout message is printed instead of an error being raised
- 3. GPT translation no longer returns duplicate results
- 
- ## v1.0.3 changes
- 1. Translations can now be edited directly on the preview page; saving the JSON first is recommended
- 2. Immediate updating of the last-edited id is now optional
- 
- Set the `"if_save_id_immediately"` parameter in `config.json`: with 1, the behavior is as before and the id is saved to `config.json` whenever it changes; with 0, a `SAVE last edited position` button is shown and the id is written to `config.json` only when it is clicked.
- 
- ## v1.0.2 changes
- 1. Added batch machine translation
- 
- ## v1.0.1 changes
- 1. Improved file-loading logic
- 2. Added error messages and warnings; a successful JSON save now reports the number of updated translations
- 3. The prompt sent to GPT, and the source and target languages for Baidu translation, are now customizable
- 4. Added context preview with a configurable number of lines and numbering; the selected id is marked with double asterisks, and edited translations are prefixed with an asterisk
- 5. Improved button responsiveness
- 
- ## Features
- 1. One-click machine-translation interface with a copy-to-clipboard button
- 2. Convenient previous/next-line switching and direct jumping
- 3. Remembers the last edited position
- 4. Name-translation memory: a single edit is propagated everywhere. The name dictionary is read at program startup and saved whenever the JSON file is saved. While the program is running you can edit `name_cn` directly; after closing it you can edit the name dictionary, whose contents overwrite `name_cn` in the JSON file at the next startup.
- 5. Text-translation memory: once text is machine-translated or edited, switching lines or refreshing the page does not affect it as long as the program stays open
- 6. Translation caching. The source text, by contrast, is not cached, so an accidental edit or deletion is undone by switching lines or refreshing. You can therefore edit the source text and machine-translate it to check a particular word without affecting the original text.
- 7. One-click replacement for mistranslated proper nouns: replaces every occurrence in both machine- and human-translated text. The replacement dictionary can be edited while the program is running, without a restart.
- 8. Convenient API-key management, prompt editing, and more
- 9. JSON-to-CSV and CSV-to-JSON conversion
- 10. Context preview
- <br><br>
 
  ## Usage
  Requires at least Python 3 (the author uses 3.10; other versions are untested)
@@ -102,10 +57,60 @@ The required JSON file format is:
  ```
  python EasyTranslator.py
  ```
- (for the HF version, the entry point is app.py)
  Then open the URL the program prints in a browser (e.g. http://127.0.0.1:7860 )
  <br><br>
 
+ ## v1.1.0 changes
+ 1. Models such as gemini, claude, qwen, deepseek, etc. can now be freely selected and invoked in both normal and batch translation. Enter the API keys in `config.json` or on the API page (note: when an OpenRouter API key is set, the OpenRouter endpoint takes priority). To use the official gemini, claude, doubao, etc. interfaces, install the corresponding dependency packages. You can edit MODEL_LIST in `utils.py` to adjust the selectable models and avoid clutter. (The keys each backend reads are summarized right after this diff.)
+ 2. Changed the keyboard shortcuts<br>
+ alt+w: ↑<br>
+ alt+x: ↓<br>
+ alt+s: save json<br>
+ alt+r: replace<br>
+ alt+q: model1 translate<br>
+ alt+e: model2 translate<br>
+ 
+ ## v1.0.6 changes
+ 1. Improved the file-merge feature for easier collaboration. On the file-merge page, two JSON files can be merged by following the instructions, synchronizing manual-translation progress. Exporting small JSON files for convenient transfer is also supported.
+ 
+ ## v1.0.5 changes
+ 1. Added keyboard shortcuts
+ 
+ ## v1.0.4 changes
+ 1. Added moyu (slacking-off) mode, which gathers the essential components into half the screen. Set `moyu_mode` to 1 in `config.json` to enable it, or 0 to disable it
+ 2. Added timeout detection for GPT translation. The limit is set in seconds at `time_limit` under `openai_api_settings` in `config.json`. If a request times out, a timeout message is printed instead of an error being raised
+ 3. GPT translation no longer returns duplicate results
+ 
+ ## v1.0.3 changes
+ 1. Translations can now be edited directly on the preview page; saving the JSON first is recommended
+ 2. Immediate updating of the last-edited id is now optional
+ 
+ Set the `"if_save_id_immediately"` parameter in `config.json`: with 1, the behavior is as before and the id is saved to `config.json` whenever it changes; with 0, a `SAVE last edited position` button is shown and the id is written to `config.json` only when it is clicked.
+ 
+ ## v1.0.2 changes
+ 1. Added batch machine translation
+ 
+ ## v1.0.1 changes
+ 1. Improved file-loading logic
+ 2. Added error messages and warnings; a successful JSON save now reports the number of updated translations
+ 3. The prompt sent to GPT, and the source and target languages for Baidu translation, are now customizable
+ 4. Added context preview with a configurable number of lines and numbering; the selected id is marked with double asterisks, and edited translations are prefixed with an asterisk
+ 5. Improved button responsiveness
+ 
+ ## Features
+ 1. One-click machine-translation interface with a copy-to-clipboard button
+ 2. Convenient previous/next-line switching and direct jumping
+ 3. Remembers the last edited position
+ 4. Name-translation memory: a single edit is propagated everywhere. The name dictionary is read at program startup and saved whenever the JSON file is saved. While the program is running you can edit `name_cn` directly; after closing it you can edit the name dictionary, whose contents overwrite `name_cn` in the JSON file at the next startup.
+ 5. Text-translation memory: once text is machine-translated or edited, switching lines or refreshing the page does not affect it as long as the program stays open
+ 6. Translation caching. The source text, by contrast, is not cached, so an accidental edit or deletion is undone by switching lines or refreshing. You can therefore edit the source text and machine-translate it to check a particular word without affecting the original text.
+ 7. One-click replacement for mistranslated proper nouns: replaces every occurrence in both machine- and human-translated text. The replacement dictionary can be edited while the program is running, without a restart.
+ 8. Convenient API-key management, prompt editing, and more
+ 9. JSON-to-CSV and CSV-to-JSON conversion
+ 10. Context preview
+ <br><br>
+ 
+ 
  ## Demo
  Moyu (slacking-off) mode \
  ![image](https://github.com/alienet1109/EasyTranslator/blob/master/assets/moyu_mode.png) \
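For reference, the wrappers this commit adds read their provider keys from environment variables; a sketch summarizing them (placeholder values, collected from the modules below):

```python
import os

# One key per provider; OPENROUTER_API_KEY, when set to a real value,
# takes priority in utils.get_models().
os.environ.setdefault("OPENROUTER_API_KEY", "sk-or-...")  # OpenRouter
os.environ.setdefault("OPENAI_API_KEY", "sk-...")         # LangChainGPT
os.environ.setdefault("ANTHROPIC_API_KEY", "sk-ant-...")  # Claude
os.environ.setdefault("DEEPSEEK_API_KEY", "sk-...")       # DeepSeek
os.environ.setdefault("ARK_API_KEY", "...")               # Doubao (Volcengine Ark)
os.environ.setdefault("GEMINI_API_KEY", "...")            # Gemini
os.environ.setdefault("DASHSCOPE_API_KEY", "sk-...")      # Qwen (DashScope)
```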
modules/llm/BaseLLM.py ADDED
@@ -0,0 +1,32 @@
+ from abc import ABC, abstractmethod
+ 
+ class BaseLLM(ABC):
+ 
+     def __init__(self):
+         pass
+ 
+     @abstractmethod
+     def initialize_message(self):
+         pass
+ 
+     @abstractmethod
+     def ai_message(self, payload):
+         pass
+ 
+     @abstractmethod
+     def system_message(self, payload):
+         pass
+ 
+     @abstractmethod
+     def user_message(self, payload):
+         pass
+ 
+     @abstractmethod
+     def get_response(self):
+         pass
+ 
+     @abstractmethod
+     def print_prompt(self):
+         pass
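Every wrapper below implements this interface, so a caller can drive any backend identically. A minimal usage sketch (hypothetical driver code, not part of the commit; it assumes a valid OPENROUTER_API_KEY):

```python
# Hypothetical driver: any BaseLLM subclass from this commit works here.
from modules.llm.OpenRouter import OpenRouter

llm = OpenRouter(model="deepseek/deepseek-r1:free")
llm.initialize_message()                        # reset the message buffer
llm.system_message("You are a translator.")     # queue the system prompt
llm.user_message("Translate to English: 你好")  # queue the user turn
print(llm.get_response(temperature=0.7))        # one completion from the buffer
```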
modules/llm/Claude.py ADDED
@@ -0,0 +1,55 @@
+ import anthropic
+ import os
+ from .BaseLLM import BaseLLM
+ 
+ class Claude(BaseLLM):
+ 
+     def __init__(self, model="claude-3-5-sonnet-20240620"):
+         super(Claude, self).__init__()
+         self.model_name = model
+         self.client = anthropic.Anthropic(
+             api_key=os.environ.get("ANTHROPIC_API_KEY")
+         )
+         # TODO: add api_base
+         self.messages = []
+ 
+     def initialize_message(self):
+         self.messages = []
+ 
+     def ai_message(self, payload):
+         # "assistant" is the role name the Anthropic Messages API accepts.
+         self.messages.append({"role": "assistant", "content": payload})
+ 
+     def system_message(self, payload):
+         # Note: the Messages API takes system prompts via the top-level
+         # `system` parameter; a "system" role inside `messages` is rejected.
+         self.messages.append({"role": "system", "content": payload})
+ 
+     def user_message(self, payload):
+         self.messages.append({"role": "user", "content": payload})
+ 
+     def get_response(self, temperature=0.8):
+         message = self.client.messages.create(
+             model=self.model_name,
+             max_tokens=4096,
+             temperature=temperature,
+             messages=self.messages
+         )
+         return message.content[0].text  # .content is a list of content blocks
+ 
+     def chat(self, text, temperature=0.8):
+         self.initialize_message()
+         if isinstance(text, str):
+             self.user_message(text)
+         response = self.get_response(temperature=temperature)
+         return response
+ 
+     def print_prompt(self):
+         for message in self.messages:
+             print(message)
+ 
+ if __name__ == '__main__':
+     llm = Claude()
+     print(llm.chat("Say it is a test."))
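One caveat flagged in the comments above: Anthropic's Messages API takes the system prompt as a top-level parameter rather than as a message role. A hedged standalone sketch of that pattern (SDK usage, not code from this commit):

```python
import os
import anthropic

client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
message = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=4096,
    system="You are a translation assistant.",  # top-level system prompt
    messages=[{"role": "user", "content": "Translate to English: 你好"}],
)
print(message.content[0].text)
```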
modules/llm/DeepSeek.py ADDED
@@ -0,0 +1,49 @@
+ from .BaseLLM import BaseLLM
+ from openai import OpenAI
+ import os
+ 
+ class DeepSeek(BaseLLM):
+ 
+     def __init__(self, model="deepseek-chat"):
+         super(DeepSeek, self).__init__()
+         self.client = OpenAI(
+             api_key=os.getenv("DEEPSEEK_API_KEY"),
+             base_url="https://api.deepseek.com",
+         )
+         self.model_name = model
+         self.messages = []
+ 
+     def initialize_message(self):
+         self.messages = []
+ 
+     def ai_message(self, payload):
+         self.messages.append({"role": "assistant", "content": payload})
+ 
+     def system_message(self, payload):
+         self.messages.append({"role": "system", "content": payload})
+ 
+     def user_message(self, payload):
+         self.messages.append({"role": "user", "content": payload})
+ 
+     def get_response(self, temperature=0.8):
+         # Send the accumulated conversation through the configured model.
+         response = self.client.chat.completions.create(
+             model=self.model_name,
+             messages=self.messages,
+             temperature=temperature,
+             stream=False
+         )
+         return response.choices[0].message.content
+ 
+     def chat(self, text, temperature=0.8):
+         self.initialize_message()
+         self.user_message(text)
+         response = self.get_response(temperature=temperature)
+         return response
+ 
+     def print_prompt(self):
+         for message in self.messages:
+             print(message)
modules/llm/Doubao.py ADDED
@@ -0,0 +1,44 @@
+ from .BaseLLM import BaseLLM
+ from volcenginesdkarkruntime import Ark
+ import os
+ 
+ class Doubao(BaseLLM):
+ 
+     def __init__(self, model="ep-20241228220355-cqxcs"):
+         # Ark deployments are addressed by endpoint id (ep-...), not model name.
+         super(Doubao, self).__init__()
+         self.client = Ark(api_key=os.environ.get("ARK_API_KEY"))
+         self.model_name = model
+         self.messages = []
+ 
+     def initialize_message(self):
+         self.messages = []
+ 
+     def ai_message(self, payload):
+         self.messages.append({"role": "assistant", "content": payload})
+ 
+     def system_message(self, payload):
+         self.messages.append({"role": "system", "content": payload})
+ 
+     def user_message(self, payload):
+         self.messages.append({"role": "user", "content": payload})
+ 
+     def get_response(self, temperature=0.8):
+         completion = self.client.chat.completions.create(
+             model=self.model_name,
+             messages=self.messages,
+             temperature=temperature,
+             top_p=0.8
+         )
+         return completion.choices[0].message.content
+ 
+     def chat(self, text, temperature=0.8):
+         self.initialize_message()
+         self.user_message(text)
+         response = self.get_response(temperature=temperature)
+         return response
+ 
+     def print_prompt(self):
+         for message in self.messages:
+             print(message)
modules/llm/Gemini.py ADDED
@@ -0,0 +1,45 @@
+ from .BaseLLM import BaseLLM
+ import google.generativeai as genai
+ import os
+ import time
+ 
+ class Gemini(BaseLLM):
+     def __init__(self, model="gemini-1.5-flash"):
+         super(Gemini, self).__init__()
+         genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
+         self.model_name = model
+         self.model = genai.GenerativeModel(model)
+         self.messages = []
+ 
+     def initialize_message(self):
+         self.messages = []
+ 
+     def ai_message(self, payload):
+         self.messages.append({"role": "model", "parts": payload})
+ 
+     def system_message(self, payload):
+         # Gemini chat history only accepts "user" and "model" roles; system
+         # instructions belong in GenerativeModel(system_instruction=...),
+         # so system text is queued as a user turn here.
+         self.messages.append({"role": "user", "parts": payload})
+ 
+     def user_message(self, payload):
+         self.messages.append({"role": "user", "parts": payload})
+ 
+     def get_response(self, temperature=0.8):
+         time.sleep(3)  # crude rate limiting
+         # The last queued message is sent; everything before it is history.
+         *history, last = self.messages
+         chat = self.model.start_chat(history=history)
+         response = chat.send_message(
+             last["parts"],
+             generation_config=genai.GenerationConfig(temperature=temperature),
+         )
+         return response.text
+ 
+     def chat(self, text, temperature=0.8):
+         chat = self.model.start_chat()
+         response = chat.send_message(
+             text,
+             generation_config=genai.GenerationConfig(temperature=temperature),
+         )
+         return response.text
+ 
+     def print_prompt(self):
+         for message in self.messages:
+             print(message)
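Gemini's conversation format differs from the OpenAI-style wrappers above: roles are "user"/"model" with "parts" payloads, and system prompts are supplied separately. A hedged sketch of the system_instruction pattern (SDK usage, not code from this commit):

```python
import os
import google.generativeai as genai

genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
# System prompts go through system_instruction rather than a history role.
model = genai.GenerativeModel(
    "gemini-1.5-flash",
    system_instruction="You are a translation assistant.",
)
print(model.generate_content("Translate to English: 你好").text)
```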
modules/llm/LangChainGPT.py ADDED
@@ -0,0 +1,46 @@
+ from .BaseLLM import BaseLLM
+ from openai import OpenAI
+ import os
+ 
+ class LangChainGPT(BaseLLM):
+ 
+     def __init__(self, model="gpt-4o-mini"):
+         super(LangChainGPT, self).__init__()
+         self.client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
+         self.model_name = model
+         # TODO: add api_base
+         self.messages = []
+ 
+     def initialize_message(self):
+         self.messages = []
+ 
+     def ai_message(self, payload):
+         self.messages.append({"role": "assistant", "content": payload})
+ 
+     def system_message(self, payload):
+         self.messages.append({"role": "system", "content": payload})
+ 
+     def user_message(self, payload):
+         self.messages.append({"role": "user", "content": payload})
+ 
+     def get_response(self, temperature=0.8):
+         completion = self.client.chat.completions.create(
+             model=self.model_name,
+             messages=self.messages,
+             temperature=temperature,
+             top_p=0.8
+         )
+         return completion.choices[0].message.content
+ 
+     def chat(self, text, temperature=0.8):
+         self.initialize_message()
+         self.user_message(text)
+         response = self.get_response(temperature=temperature)
+         return response
+ 
+     def print_prompt(self):
+         for message in self.messages:
+             print(message)
modules/llm/LocalModel.py ADDED
@@ -0,0 +1,66 @@
+ from .BaseLLM import BaseLLM
+ from peft import PeftModel
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ 
+ 
+ class LocalModel(BaseLLM):
+     def __init__(self, model, adapter_path=None):
+         super(LocalModel, self).__init__()
+         model_name = model
+         self.model = AutoModelForCausalLM.from_pretrained(
+             model_name,
+             torch_dtype="auto",
+             device_map="auto",
+         )
+         # Optionally stack one or more PEFT/LoRA adapters on the base model.
+         if isinstance(adapter_path, str):
+             self.model = PeftModel.from_pretrained(self.model, adapter_path)
+         elif isinstance(adapter_path, list):
+             for path in adapter_path:
+                 self.model = PeftModel.from_pretrained(self.model, path)
+ 
+         self.tokenizer = AutoTokenizer.from_pretrained(model_name)
+         self.model_name = model
+         self.messages = []
+ 
+     def initialize_message(self):
+         self.messages = []
+ 
+     def ai_message(self, payload):
+         # Chat templates expect the "assistant" role name.
+         self.messages.append({"role": "assistant", "content": payload})
+ 
+     def system_message(self, payload):
+         self.messages.append({"role": "system", "content": payload})
+ 
+     def user_message(self, payload):
+         self.messages.append({"role": "user", "content": payload})
+ 
+     def get_response(self, temperature=0.8):
+         text = self.tokenizer.apply_chat_template(
+             self.messages,
+             tokenize=False,
+             add_generation_prompt=True
+         )
+         model_inputs = self.tokenizer([text], return_tensors="pt").to(self.model.device)
+         generated_ids = self.model.generate(
+             **model_inputs,
+             max_new_tokens=512,
+             do_sample=True,
+             temperature=temperature
+         )
+         # Strip the prompt tokens so only newly generated text is decoded.
+         generated_ids = [
+             output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+         ]
+         response = self.tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+         return response
+ 
+     def chat(self, text, temperature=0.8):
+         self.initialize_message()
+         self.user_message(text)
+         response = self.get_response(temperature=temperature)
+         return response
+ 
+     def print_prompt(self):
+         for message in self.messages:
+             print(message)
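A hedged usage sketch for the local backend (the model id is illustrative, not one this commit pins; weights are downloaded on first run):

```python
from modules.llm.LocalModel import LocalModel

# Any Hugging Face chat model with a chat template works here;
# "Qwen/Qwen2.5-0.5B-Instruct" is just an illustrative choice.
llm = LocalModel("Qwen/Qwen2.5-0.5B-Instruct")
print(llm.chat("Say this is a test.", temperature=0.7))
```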
modules/llm/OpenRouter.py ADDED
@@ -0,0 +1,42 @@
+ from .BaseLLM import BaseLLM
+ import os
+ from openai import OpenAI
+ 
+ class OpenRouter(BaseLLM):
+     def __init__(self, model="deepseek/deepseek-r1:free"):
+         super(OpenRouter, self).__init__()
+         # OpenRouter exposes an OpenAI-compatible endpoint.
+         self.client = OpenAI(
+             api_key=os.getenv("OPENROUTER_API_KEY"),
+             base_url="https://openrouter.ai/api/v1",
+         )
+         self.model_name = model
+         self.messages = []
+ 
+     def initialize_message(self):
+         self.messages = []
+ 
+     def ai_message(self, payload):
+         self.messages.append({"role": "assistant", "content": payload})
+ 
+     def system_message(self, payload):
+         self.messages.append({"role": "system", "content": payload})
+ 
+     def user_message(self, payload):
+         self.messages.append({"role": "user", "content": payload})
+ 
+     def get_response(self, temperature=0.8):
+         completion = self.client.chat.completions.create(
+             model=self.model_name,
+             messages=self.messages,
+             temperature=temperature
+         )
+         return completion.choices[0].message.content
+ 
+     def chat(self, text, temperature=0.8):
+         self.initialize_message()
+         self.user_message(text)
+         response = self.get_response(temperature=temperature)
+         return response
+ 
+     def print_prompt(self):
+         for message in self.messages:
+             print(message)
modules/llm/Qwen.py ADDED
@@ -0,0 +1,48 @@
+ from .BaseLLM import BaseLLM
+ from openai import OpenAI
+ import os
+ 
+ class Qwen(BaseLLM):
+ 
+     def __init__(self, model="qwen-max"):
+         # Available tiers include qwen-max, qwen-plus, and qwen-turbo.
+         super(Qwen, self).__init__()
+         # DashScope's OpenAI-compatible endpoint.
+         self.client = OpenAI(
+             api_key=os.getenv("DASHSCOPE_API_KEY"),
+             base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
+         )
+         self.model_name = model
+         # TODO: add api_base
+         self.messages = []
+ 
+     def initialize_message(self):
+         self.messages = []
+ 
+     def ai_message(self, payload):
+         self.messages.append({"role": "assistant", "content": payload})
+ 
+     def system_message(self, payload):
+         self.messages.append({"role": "system", "content": payload})
+ 
+     def user_message(self, payload):
+         self.messages.append({"role": "user", "content": payload})
+ 
+     def get_response(self, temperature=0.8):
+         completion = self.client.chat.completions.create(
+             model=self.model_name,
+             messages=self.messages,
+             temperature=temperature,
+             top_p=0.8
+         )
+         return completion.choices[0].message.content
+ 
+     def chat(self, text, temperature=0.8):
+         self.initialize_message()
+         self.user_message(text)
+         response = self.get_response(temperature=temperature)
+         return response
+ 
+     def print_prompt(self):
+         for message in self.messages:
+             print(message)
utils.py CHANGED
@@ -4,9 +4,71 @@ import random
  import json
  from hashlib import md5
  from os import path as osp
+ import os
  import csv
  import threading
 
+ MODEL_NAME_DICT = {
+     "gpt-4": "openai/gpt-4",
+     "gpt-4o": "openai/gpt-4o",
+     "gpt-4o-mini": "openai/gpt-4o-mini",
+     "gpt-3.5-turbo": "openai/gpt-3.5-turbo",
+     "deepseek-r1": "deepseek/deepseek-r1",
+     "deepseek-v3": "deepseek/deepseek-chat",
+     "gemini-2": "google/gemini-2.0-flash-001",
+     "gemini-1.5": "google/gemini-flash-1.5",
+     "llama3-70b": "meta-llama/llama-3.3-70b-instruct",
+     "qwen-turbo": "qwen/qwen-turbo",
+     "qwen-plus": "qwen/qwen-plus",
+     "qwen-max": "qwen/qwen-max",
+     "qwen-2.5-72b": "qwen/qwen-2.5-72b-instruct",
+     "claude-3.5-sonnet": "anthropic/claude-3.5-sonnet",
+     "phi-4": "microsoft/phi-4",
+ }
+ 
+ def get_models(model_name):
+     # Return the LLM wrapper for the requested model name. OpenRouter is
+     # preferred whenever a usable OPENROUTER_API_KEY is set (placeholder
+     # keys containing "YOUR" are ignored).
+     if os.getenv("OPENROUTER_API_KEY", default="") and "YOUR" not in os.getenv("OPENROUTER_API_KEY", default="") and model_name in MODEL_NAME_DICT:
+         from modules.llm.OpenRouter import OpenRouter
+         return OpenRouter(model=MODEL_NAME_DICT[model_name])
+     elif model_name == 'openai':
+         from modules.llm.LangChainGPT import LangChainGPT
+         return LangChainGPT()
+     elif model_name.startswith('gpt-3.5'):
+         from modules.llm.LangChainGPT import LangChainGPT
+         return LangChainGPT(model="gpt-3.5-turbo")
+     elif model_name == 'gpt-4':
+         from modules.llm.LangChainGPT import LangChainGPT
+         return LangChainGPT(model="gpt-4")
+     elif model_name == 'gpt-4o':
+         from modules.llm.LangChainGPT import LangChainGPT
+         return LangChainGPT(model="gpt-4o")
+     elif model_name == "gpt-4o-mini":
+         from modules.llm.LangChainGPT import LangChainGPT
+         return LangChainGPT(model="gpt-4o-mini")
+     elif model_name.startswith("claude-3-5"):
+         from modules.llm.Claude import Claude
+         return Claude(model="claude-3-5-sonnet-20241022")
+     elif model_name in ["qwen-turbo", "qwen-plus", "qwen-max"]:
+         from modules.llm.Qwen import Qwen
+         return Qwen(model=model_name)
+     elif model_name.startswith('doubao'):
+         from modules.llm.Doubao import Doubao
+         return Doubao()
+     elif model_name.startswith('gemini-2'):
+         from modules.llm.Gemini import Gemini
+         return Gemini("gemini-2.0-flash")
+     elif model_name.startswith('gemini-1.5'):
+         from modules.llm.Gemini import Gemini
+         return Gemini("gemini-1.5-flash")
+     elif model_name.startswith("deepseek"):
+         from modules.llm.DeepSeek import DeepSeek
+         return DeepSeek()
+     else:
+         print(f'Warning! Undefined model {model_name}, using gpt-4o-mini instead.')
+         from modules.llm.LangChainGPT import LangChainGPT
+         return LangChainGPT()
+ 
  def load_config(filepath):
      with open(filepath, "r", encoding="utf-8") as file:
          args = json.load(file)
@@ -46,6 +108,7 @@ def get_baidu_completion(text,api_id,api_key,from_lang,to_lang):
  openai_api_key = args["openai_api_settings"]["openai_api_key"]
  time_limit = float(args["openai_api_settings"]["time_limit"])
  client = openai.OpenAI(api_key = openai_api_key)
+ 
  class GPTThread(threading.Thread):
      def __init__(self, model, messages, temperature):
          super().__init__()
@@ -63,19 +126,44 @@
          )
          self.result = response.choices[0].message.content
 
- def get_gpt_completion(prompt, model="gpt-3.5-turbo",api_key = openai_api_key):
+ def get_gpt_completion(prompt, time_limit=10, model="gpt-4o-mini"):
      messages = [{"role": "user", "content": prompt}]
      temperature = random.uniform(0, 1)
      thread = GPTThread(model, messages, temperature)
      thread.start()
-     thread.join(10)
+     thread.join(time_limit)
      if thread.is_alive():
          thread.terminate()
          print("请求超时")  # request timed out
          return "TimeoutError", False
      else:
          return thread.result, True
- 
+ 
+ class LLMThread(threading.Thread):
+     def __init__(self, llm, prompt, temperature):
+         super().__init__()
+         self.llm = llm
+         self.prompt = prompt
+         self.temperature = temperature
+         self.result = ""
+     def terminate(self):
+         # Best-effort flag only: Python threads cannot be killed, so a
+         # timed-out request may keep running in the background.
+         self._running = False
+     def run(self):
+         self.result = self.llm.chat(self.prompt, temperature=self.temperature)
+ 
+ def get_llm_completion(prompt, time_limit=10, model_name="gpt-4o-mini"):
+     llm = get_models(model_name)
+     temperature = 0.7
+     thread = LLMThread(llm, prompt, temperature)
+     thread.start()
+     thread.join(time_limit)
+     if thread.is_alive():
+         thread.terminate()
+         print("请求超时")  # request timed out
+         return "TimeoutError", False
+     else:
+         return thread.result, True
+ 
  def left_pad_zero(number, digit):
      number_str = str(number)
      padding_count = digit - len(number_str)
@@ -101,7 +189,7 @@ def convert_to_json(files, text_col, name_col, id_col):
      with open(path, "r", encoding="utf-8") as f:
          reader = csv.DictReader(f)
          line_num = sum(1 for _ in open(path, "r", encoding="utf-8"))
-         fieldnames = reader.fieldnames
+         fieldnames = reader.fieldnames if reader.fieldnames else []
          if id_col not in fieldnames:
              ids = generate_ids(line_num)
              i = 0
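A hedged usage sketch of the new dispatch path added above (function and dict names are from this commit; the prompt and model choice are illustrative):

```python
from utils import get_llm_completion

# Dispatches through get_models(); with an OpenRouter key set, "deepseek-v3"
# maps to "deepseek/deepseek-chat" via MODEL_NAME_DICT.
result, ok = get_llm_completion("Translate to Chinese: Hello",
                                time_limit=15, model_name="deepseek-v3")
print(result if ok else "request timed out")  # result is "TimeoutError" on timeout
```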