---
language:
- uz
tags:
- Text Generation
- PyTorch
- TensorFlow
- Transformers
- mit
- uz
- gpt2
license: apache-2.0
widget:
- text: "Covid-19 га қарши эмлаш бошланди,"
  example_title: "Namuna 1"
- text: "Суъний интеллект энг ривожланган"
  example_title: "Namuna 2"
---

<p><b>GPTuz model</b>

GPTuz is a state-of-the-art language model for Uzbek, based on the GPT-2 small model.

The model was trained for more than one day on an NVIDIA V100 32GB GPU, using transfer learning and fine-tuning on 0.53 GB of text collected from kun.uz.
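
The training script itself is not included in this card; the sketch below only illustrates how GPT-2 fine-tuning of this kind is commonly set up with the Hugging Face Trainer API. The corpus file name and every hyperparameter shown are assumptions for illustration, not the configuration actually used for GPTuz.

<pre><code class="language-python">

# Minimal illustrative sketch of GPT-2 fine-tuning with the Trainer API.
# The corpus path "kun_uz_corpus.txt" and all hyperparameters are assumptions,
# not the settings actually used to train GPTuz.
from transformers import (AutoTokenizer, AutoModelForCausalLM,
                          DataCollatorForLanguageModeling, TextDataset,
                          Trainer, TrainingArguments)

base = "gpt2"  # start from the pretrained GPT-2 small checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Plain-text corpus scraped from kun.uz (placeholder path).
train_dataset = TextDataset(tokenizer=tokenizer,
                            file_path="kun_uz_corpus.txt",
                            block_size=128)
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

training_args = TrainingArguments(output_dir="gptuz-finetuned",
                                  num_train_epochs=1,
                                  per_device_train_batch_size=4,
                                  save_steps=10_000)

Trainer(model=model,
        args=training_args,
        data_collator=data_collator,
        train_dataset=train_dataset).train()

</code></pre>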

<p><b>How to use</b>

<pre><code class="language-python">
  
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the tokenizer and the causal language model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("rifkat/GPTuz")
model = AutoModelForCausalLM.from_pretrained("rifkat/GPTuz")

# GPT-2 supports a context window of up to 1024 tokens.
tokenizer.model_max_length = 1024

</code></pre>
<p><b>Generate a single word</b>
<pre><code class="language-python">

text = "Covid-19 га қарши эмлаш бошланди,"
inputs = tokenizer(text, return_tensors="pt")

outputs = model(**inputs, labels=inputs["input_ids"])
loss, logits = outputs[:2]
predicted_index = torch.argmax(logits[0, -1, :]).item()
predicted_text = tokenizer.decode([predicted_index])

print('input text:', text)
print('predicted text:', predicted_text)

</code></pre>
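
The argmax above keeps only the single most likely next token. As an optional, purely illustrative extension (reusing the logits tensor and torch import from the blocks above), you can also inspect the runner-up candidates:

<pre><code class="language-python">

# Optional: list the five highest-scoring candidate next tokens
# instead of only the argmax (illustrative extension, not part of
# the original example).
top5 = torch.topk(logits[0, -1, :], k=5)
for score, idx in zip(top5.values, top5.indices):
    print(tokenizer.decode([idx.item()]), float(score))

</code></pre>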
<p><b>Generate one full sequence</b>

<pre><code class="language-python">

text = "Covid-19 га қарши эмлаш бошланди, "
inputs = tokenizer(text, return_tensors="pt")


sample_outputs = model.generate(inputs.input_ids,
                                pad_token_id=50256,
                                do_sample=True, 
                                max_length=50, # kerakli token raqamini qo'ying
                                top_k=40,
                                num_return_sequences=1)


for i, sample_output in enumerate(sample_outputs):
    print(">> Generated text {}\n\n{}".format(i+1, tokenizer.decode(sample_output.tolist())))

</code></pre>
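
As an alternative to calling model.generate directly, the same checkpoint can be loaded through the Transformers text-generation pipeline; the sampling parameters below simply mirror the example above and can be adjusted.

<pre><code class="language-python">

from transformers import pipeline

# High-level text-generation pipeline around the same checkpoint.
generator = pipeline("text-generation", model="rifkat/GPTuz")

result = generator("Covid-19 га қарши эмлаш бошланди,",
                   max_length=50, do_sample=True, top_k=40)
print(result[0]["generated_text"])

</code></pre>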

<p><b>Citation</b>

<pre><code class="language-bibtex">
@misc{rifkat_davronov_2022,
  author       = { Adilova Fatima and Rifkat Davronov and Samariddin Kushmuratov and Ruzmat Safarov },
  title        = { GPTuz (Revision 2a7e6c0) },
  year         = 2022,
  url          = { https://huggingface.co/rifkat/GPTuz },
  doi          = { 10.57967/hf/0143 },
  publisher    = { Hugging Face }
}
</code></pre>