Update app.py
app.py
CHANGED
@@ -12,6 +12,8 @@ pipes = {
 inputs = [
     gr.Image(type='pil',
              label="Image"),
+    gr.Textbox(lines=1,
+               label="Candidate Labels", placeholder="Add a class label, one by one"),
     gr.Radio(choices=[
         "ViT/B-16",
         "ViT/L-14",
@@ -20,8 +22,6 @@ inputs = [
              label="Prompt Template Prompt",
              placeholder="Optional prompt template as prefix",
              value="a photo of a {}"),
-    gr.Textbox(lines=1,
-               label="Candidate Labels", placeholder="Add a class label, one by one",),
 ]
 images="festival.jpg"

@@ -35,7 +35,7 @@ def shot(image, labels_text, model_name, hypothesis_template):
 iface = gr.Interface(shot,
     inputs,
     "label",
-    examples=[["festival.jpg", "ViT/B-16", "a photo of a {}"]],
+    examples=[["festival.jpg", "lantern, firecracker, couplet", "ViT/B-16", "a photo of a {}"]],
     description="""<p>Chinese CLIP is a contrastive-learning-based vision-language foundation model pretrained on large-scale Chinese data. For more information, please refer to the paper and official github. Also, Chinese CLIP has already been merged into Huggingface Transformers! <br><br>
 Paper: <a href='https://arxiv.org/pdf/2403.02714'>https://arxiv.org/pdf/2403.02714</a> <br>
 To begin with the demo, provide a picture (either upload manually, or select from the given examples) and add class labels one by one. Optionally, you can also add template as a prefix to the class labels. <br>""",
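Context for the change: gr.Interface passes input component values to the callback positionally, so once the Candidate Labels textbox moves up to second place, both the shot signature (visible in the last hunk header) and every examples row must follow the order Image, Candidate Labels, model choice, prompt template. The old three-element example row no longer matched, which is why it is replaced rather than kept. Below is a minimal sketch of how the surrounding, unchanged parts of app.py plausibly wire this together; the pipeline task and checkpoint names are assumptions for illustration, not taken from this commit.

# Sketch only: `pipes` and `shot` here are reconstructions; the real
# definitions live in the unchanged lines of app.py. Checkpoint names
# are illustrative Chinese-CLIP models from the Hugging Face Hub.
from transformers import pipeline

pipes = {
    "ViT/B-16": pipeline("zero-shot-image-classification",
                         model="OFA-Sys/chinese-clip-vit-base-patch16"),
    "ViT/L-14": pipeline("zero-shot-image-classification",
                         model="OFA-Sys/chinese-clip-vit-large-patch14"),
}

def shot(image, labels_text, model_name, hypothesis_template):
    # Argument order mirrors the reordered `inputs` list:
    # Image, Candidate Labels textbox, model Radio, template Textbox.
    labels = [label.strip() for label in labels_text.split(",")]
    results = pipes[model_name](image,
                                candidate_labels=labels,
                                hypothesis_template=hypothesis_template)
    # The "label" output component accepts a {class: confidence} dict.
    return {r["label"]: r["score"] for r in results}

Under this ordering, the new example row ["festival.jpg", "lantern, firecracker, couplet", "ViT/B-16", "a photo of a {}"] lines up one-to-one with (image, labels_text, model_name, hypothesis_template).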