---
title: ScouterAI
emoji: 👓
colorFrom: green
colorTo: gray
sdk: gradio
sdk_version: 5.33.0
app_file: app.py
pinned: true
license: apache-2.0
tag: agent-demo-track
short_description: The agent using over 9000 vision models from the HF Hub.
---

# ScouterAI - The Vision-Enhanced Agent

Welcome to ScouterAI, my [Agents - MCP Hackathon](https://huggingface.co/Agents-MCP-Hackathon) submission.
This app falls under Track 3: Agentic Demo.
The goal of the app is to demonstrate the capabilities of agentic LLMs combined with more "traditional" deep learning computer vision.
LLMs (and VLMs) are great at interacting with users and understanding their queries, but they are not (yet) capable of precise perception of the images presented to them.
Computer vision models, such as object detection or image segmentation models, are tailored to these tasks but require some engineering to wrap them and make them user-ready.
The idea of the agentic demo is to give a powerful LLM access to these expert vision models.
The agent can then carry out precise perception tasks on any object present in the image: detection, localization, classification, masking, counting, etc.

## Overview

In this preliminary app, the agent is a CodeAgent provided by the smolagents framework.
Its interface consists of a chat interface with examples and a gallery that displays the agent's work.
The agent is provided with a set of tools:
- Task model retriever: a RAG tool which, given a task (object-detection or image-segmentation) and a query (e.g. car), returns a list of models with their model ID and the list of classes each model can detect or segment. The list is based on a curated dataset of all the models available on the Hugging Face Hub, returns the mo
- Computer vision models: any object detection or image segmentation model available on Hugging Face
- Image processing functions: resizing, cropping, ...
- Image annotation functions: label, bounding box and mask annotators
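
To make the retriever's contract concrete, here is a minimal sketch of a tool that takes a task and a query and returns matching model IDs with their classes. It is not the app's actual implementation: the catalog entries, the `ModelEntry` type and the `retrieve_models` function are hypothetical, and a plain keyword match stands in for the RAG lookup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    """One row of a (hypothetical) curated model catalog."""
    model_id: str
    task: str
    classes: tuple[str, ...]


# Hypothetical catalog: these model IDs and class lists are made up for illustration.
CATALOG = [
    ModelEntry("example-org/street-detector", "object-detection", ("car", "truck", "person")),
    ModelEntry("example-org/pet-detector", "object-detection", ("cat", "dog")),
    ModelEntry("example-org/road-segmenter", "image-segmentation", ("road", "car", "sky")),
]


def retrieve_models(task: str, query: str, catalog=CATALOG) -> list[dict]:
    """Return models for `task` whose class list contains any word of `query`.

    Assumption: exact keyword matching replaces the retrieval step of the real tool.
    """
    query_terms = set(query.lower().split())
    results = []
    for entry in catalog:
        if entry.task != task:
            continue  # wrong task, skip
        if query_terms & {c.lower() for c in entry.classes}:
            results.append({"model_id": entry.model_id, "classes": list(entry.classes)})
    return results


if __name__ == "__main__":
    for match in retrieve_models("object-detection", "car"):
        print(match["model_id"], match["classes"])
```

The agent would then pick one of the returned model IDs and pass it to the matching computer vision tool.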



To complete a user request

## Use-cases

## Stack

- Agent framework: smolagents
- LLM: Anthropic
- Compute: Modal