Commit d17e301 (verified) by gitlost-murali · 1 parent: 4201e9d

Add usage and library details

Files changed (1): README.md (+65, -0)
README.md changed: @@ -1,3 +1,68 @@
---
license: gpl-3.0
tags:
- ui-automation
- automation
- agents
- llm-agents
- vision
---

# Model card for PTA-Text: A *Text-Only* Click Model

# Table of Contents

1. [TL;DR](#tldr)
2. [Installation](#installation)
3. [Running the model](#running-the-model)
4. [Contribution](#contribution)
5. [Citation](#citation)

# TL;DR

## Details for PTA-Text

- __Input__: an image (e.g. a UI screenshot) with a header that contains the desired UI click command.
- __Output__: the `[x, y]` click coordinate, relative to the image size (both values in the 0-1 range).

__PTA-Text__ is based on MatCha, which extends Pix2Struct, an image-encoder/text-decoder model.
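Because the output is relative, a caller has to scale it by the screenshot size before it can click anything. The sketch below shows one way to do that conversion; `to_pixel_coords` is a hypothetical helper (not part of `askui_ml_helper`), and it assumes that `(0, 0)` is the top-left corner of the image and that coordinates are rounded to the nearest pixel.

```python
def to_pixel_coords(rel_xy, width, height):
    """Map a relative [x, y] prediction (0-1 range) to integer pixel coordinates.

    Hypothetical helper: assumes (0, 0) is the top-left corner of the image.
    """
    x_rel, y_rel = rel_xy
    if not (0.0 <= x_rel <= 1.0 and 0.0 <= y_rel <= 1.0):
        raise ValueError(f"expected values in [0, 1], got {rel_xy}")
    # Scale to the image size, then clamp so a value of 1.0 still lands on the last pixel.
    x = min(int(round(x_rel * width)), width - 1)
    y = min(int(round(y_rel * height)), height - 1)
    return x, y


# Example: a prediction of [0.25, 0.5] on a 1920x1080 screenshot.
print(to_pixel_coords([0.25, 0.5], 1920, 1080))  # (480, 540)
```

Clamping keeps the result a valid pixel index even when the model predicts exactly 1.0 on either axis.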

# Installation

```bash
pip install askui-ml-helper
```

Download the `.pt` checkpoint (`pta-text-v0.1.pt` in the example below) from the files of this model card.

## Running the model

### In full precision, on CPU

You can run the model in full precision on CPU:
```python
import requests
from PIL import Image
from askui_ml_helper.utils.pta_text import PtaTextInference

# Load the checkpoint downloaded in the Installation step.
pta_text_inference = PtaTextInference("pta-text-v0.1.pt")

# Fetch an example screenshot and convert it to RGB.
url = "https://docs.askui.com/assets/images/how_askui_works_architecture-363bc8be35bd228e884c83d15acd19f7.png"
response = requests.get(url, stream=True)
response.raise_for_status()
image = Image.open(response.raw).convert("RGB")

# The click command, phrased in natural language.
prompt = 'click on the text "Operating System"'

# Predict the click location and mark it on the image with a circle of radius 15.
render_image = pta_text_inference.process_image_and_draw_circle(image, prompt, radius=15)
render_image.show()  # displays the image with a red dot at the predicted click location
```

![Example screenshot with the predicted click location marked by a red dot](https://cdn-uploads.huggingface.co/production/uploads/5f993a63777efc07d7f1e2ce/ZNwjdENJqn-1VpXDcm_Wg.png)
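`process_image_and_draw_circle` marks the prediction with a circle of the given `radius`. The sketch below shows how such a marker's bounding box could be derived from a relative prediction; it is not the library's implementation. `click_marker_bbox` is a hypothetical helper, and it assumes `radius` is measured in pixels and `size` is `(width, height)`, the same order as PIL's `Image.size`.

```python
def click_marker_bbox(rel_xy, size, radius=15):
    """Bounding box (x0, y0, x1, y1) of a circle centred on a relative click point.

    Hypothetical helper: assumes `radius` is in pixels and `size` is (width, height).
    """
    width, height = size
    # Centre of the marker in pixel space.
    cx = rel_xy[0] * width
    cy = rel_xy[1] * height
    # The box is not clamped: Pillow clips shapes that extend past the image edge,
    # so clamping here would only distort a circle near the border into an ellipse.
    return (cx - radius, cy - radius, cx + radius, cy + radius)


print(click_marker_bbox([0.5, 0.5], (200, 100), radius=15))  # (85.0, 35.0, 115.0, 65.0)
```

With Pillow, such a box could then be drawn with `ImageDraw.Draw(image).ellipse(bbox, fill="red")`.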

# Contribution

An open-source initiative by AskUI. This model was contributed to the Hugging Face ecosystem by [Murali Manohar @ AskUI](https://huggingface.co/gitlost-murali).

# Citation

TODO