AskUI/pta-text-0.1

gitlost-murali committed on Feb 15, 2024

Commit d17e301 (verified) · 1 Parent(s): 4201e9d

Add usage and library details

Files changed (1)

README.md +65 -0

@@ -1,3 +1,68 @@
 ---
 license: gpl-3.0
+tags:
+- ui-automation
+- automation
+- agents
+- llm-agents
+- vision
 ---
+# Model card for PTA-Text - A *Text Only* Click Model
+# Table of Contents
+0. [TL;DR](#tldr)
+1. [Installation](#installation)
+2. [Running the model](#running-the-model)
+3. [Contribution](#contribution)
+4. [Citation](#citation)
+# TL;DR
+## Details for PTA-Text:
+- __Input__: An image with a header containing the desired UI click command.
+- __Output__: An `[x, y]` coordinate in relative coordinates (range 0 to 1).
+
+__PTA-Text__ is an image encoder based on MatCha, which is an extension of Pix2Struct.
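
Because the output is relative, it has to be scaled by the screenshot size before it can drive a real click. The snippet below is a minimal sketch of that conversion, not part of `askui-ml-helper`: the helper name `relative_to_pixel` is hypothetical, and it assumes that `x` is a fraction of the image width, `y` is a fraction of the image height, and the origin is the top-left corner.

```python
from typing import Tuple


def relative_to_pixel(
    rel_xy: Tuple[float, float], width: int, height: int
) -> Tuple[int, int]:
    """Map a relative (x, y) in the 0-1 range to integer pixel coordinates.

    Hypothetical helper. Assumes x scales with width, y scales with height,
    and the origin is the top-left corner of the image.
    """
    rel_x, rel_y = rel_xy
    if not (0.0 <= rel_x <= 1.0 and 0.0 <= rel_y <= 1.0):
        raise ValueError(f"expected coordinates in [0, 1], got {rel_xy}")
    # Clamp so that a relative value of exactly 1.0 maps to the last valid pixel.
    pixel_x = min(int(rel_x * width), width - 1)
    pixel_y = min(int(rel_y * height), height - 1)
    return pixel_x, pixel_y
```

For a 1920x1080 screenshot, a prediction of `(0.5, 0.25)` would map to pixel `(960, 270)` under these assumptions.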
+# Installation
+```bash
+pip install askui-ml-helper
+```
+Download the `.pt` model checkpoint from the files in this model card.
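
The download can also be scripted. The sketch below relies on the Hub's `resolve` URL pattern and only the Python standard library. It assumes the checkpoint is stored in the `AskUI/pta-text-0.1` repository under the file name `pta-text-v0.1.pt` (the name used in the inference example) and that the repository is public; `checkpoint_url` and `download_checkpoint` are hypothetical helper names.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import urlretrieve


def checkpoint_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the direct download URL for a file hosted in a Hub repository."""
    return (
        "https://huggingface.co/"
        f"{quote(repo_id)}/resolve/{quote(revision, safe='')}/{quote(filename)}"
    )


def download_checkpoint(
    repo_id: str, filename: str, dest: Optional[str] = None
) -> str:
    """Download the file to `dest` (defaults to `filename`) and return the path."""
    target = dest or filename
    urlretrieve(checkpoint_url(repo_id, filename), target)
    return target


# Example call (assumed repository and file name, needs network access):
# download_checkpoint("AskUI/pta-text-0.1", "pta-text-v0.1.pt")
```

If the file name in the repository differs, adjust the `filename` argument accordingly.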
+# Running the model
+## In full precision, on CPU
+The example below loads the checkpoint, runs the model in full precision on CPU, and draws the predicted click location on the image:
+```python
+import requests
+from PIL import Image
+from askui_ml_helper.utils.pta_text import PtaTextInference
+
+# Load the checkpoint downloaded from this model card.
+pta_text_inference = PtaTextInference("pta-text-v0.1.pt")
+
+# Fetch a sample image and convert it to RGB.
+url = "https://docs.askui.com/assets/images/how_askui_works_architecture-363bc8be35bd228e884c83d15acd19f7.png"
+image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
+
+# Describe the desired click as a text prompt.
+prompt = 'click on the text "Operating System"'
+
+# Predict the click location and mark it with a circle on the image.
+render_image = pta_text_inference.process_image_and_draw_circle(image, prompt, radius=15)
+render_image.show()
+# Displays the image with a red dot at the predicted click location.
+```
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/5f993a63777efc07d7f1e2ce/ZNwjdENJqn-1VpXDcm_Wg.png)
+# Contribution
+An open-source initiative by AskUI. This model was contributed to the Hugging Face ecosystem by [Murali Manohar @ AskUI](https://huggingface.co/gitlost-murali).
+# Citation
+TODO