Add usage and library details
README.md
CHANGED
@@ -1,3 +1,68 @@
---
license: gpl-3.0
tags:
- ui-automation
- automation
- agents
- llm-agents
- vision
---

# Model card for PTA-Text - A *Text Only* Click Model

# Table of Contents

0. [TL;DR](#tldr)
1. [Using the model](#running-the-model)
2. [Contribution](#contribution)
3. [Citation](#citation)

# TL;DR

## Details for PTA-Text:
-> __Input__: An image with a header containing the desired UI click command.

-> __Output__: An [x, y] click coordinate, given in relative coordinates in the range 0 to 1.

__PTA-Text__ is an image encoder based on MatCha, which is an extension of Pix2Struct.
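
Because the output is relative, it has to be scaled by the image size before it can be used as a pixel position. The sketch below shows one way to do that. The helper name `to_pixel_coords` is hypothetical and not part of any library, and it assumes that x runs along the image width and y along the image height, starting from the top-left corner, which this model card does not state.

```python
def to_pixel_coords(rel_xy, image_size):
    """Convert a relative [x, y] pair (each in 0-1) to integer pixel coordinates.

    Hypothetical helper: assumes x scales with width and y with height,
    with the origin at the top-left corner of the image.
    """
    rel_x, rel_y = rel_xy
    width, height = image_size
    if not (0.0 <= rel_x <= 1.0 and 0.0 <= rel_y <= 1.0):
        raise ValueError(f"expected relative coordinates in [0, 1], got {rel_xy}")
    # Clamp so that a value of exactly 1.0 maps to the last valid pixel index.
    px = min(int(rel_x * width), width - 1)
    py = min(int(rel_y * height), height - 1)
    return px, py


print(to_pixel_coords([0.5, 0.25], (1920, 1080)))  # (960, 270)
```

The `(width, height)` order matches what `PIL.Image.Image.size` returns, so `to_pixel_coords(prediction, image.size)` would work directly with a Pillow image.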

# Installation

```bash
pip install askui-ml-helper
```

Download the `.pt` model checkpoint from the files in this model card.

## Running the model

### In full precision, on CPU:

The example below runs the model in full precision on CPU. It loads the checkpoint, fetches a screenshot, and draws the predicted click location for a text prompt:
```python
import requests
from PIL import Image
from askui_ml_helper.utils.pta_text import PtaTextInference

# Load the checkpoint downloaded from this model card
pta_text_inference = PtaTextInference("pta-text-v0.1.pt")

# Fetch an example screenshot and convert it to RGB
url = "https://docs.askui.com/assets/images/how_askui_works_architecture-363bc8be35bd228e884c83d15acd19f7.png"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
prompt = 'click on the text "Operating System"'

# Predict the click location and draw a circle of radius 15 at it
render_image = pta_text_inference.process_image_and_draw_circle(image, prompt, radius=15)
render_image.show()
# Shows the image with a red dot where the click operation is predicted
```
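
To make the rendered result easier to interpret, here is a minimal sketch of how a predicted relative coordinate could be marked on an image with Pillow. It is not the implementation of `process_image_and_draw_circle`; the function `draw_click_marker` is a hypothetical stand-in, and it assumes the prediction is an [x, y] pair in the 0-1 range measured from the top-left corner.

```python
from PIL import Image, ImageDraw


def draw_click_marker(image, rel_xy, radius=15, color="red"):
    """Return a copy of `image` with a filled circle at the relative [x, y] position.

    Hypothetical helper for illustration; assumes a top-left origin.
    """
    marked = image.copy()  # leave the caller's image untouched
    cx = rel_xy[0] * marked.width
    cy = rel_xy[1] * marked.height
    draw = ImageDraw.Draw(marked)
    # The bounding box of the circle is centered on the predicted point.
    draw.ellipse([cx - radius, cy - radius, cx + radius, cy + radius], fill=color)
    return marked


# Usage on a blank canvas, with a made-up prediction at the image center
canvas = Image.new("RGB", (200, 100), "white")
marked = draw_click_marker(canvas, [0.5, 0.5], radius=10)
print(marked.getpixel((100, 50)))  # (255, 0, 0)
```

Drawing on a copy keeps the original screenshot available for further prompts.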



# Contribution

An open-source initiative by AskUI. This model was contributed and added to the Hugging Face ecosystem by [Murali Manohar @ AskUI](https://huggingface.co/gitlost-murali).

# Citation

TODO