intent_output_schema = {
    "type": "object",
    "properties": {
        "intents": {
            "type": "array",
            "items": {
                "type": "string",
                "enum": ["hello_world_of_pysdk", "single_model_inference", "running_yolo_models", "model_pipelining", "class_filtering", "overlay_effects", "running_inference_on_video", "person_re_identification_or_extracting_embeddings", "zone_counting", "custom_video_source", "not_supported"]
            },
        }
    },
    "required": ["intents"],
    "additionalProperties": False
}
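
# Hedged usage sketch (not part of the original app): one way to validate the
# classifier's raw JSON reply against intent_output_schema before trusting it
# downstream. Assumes the third-party `jsonschema` package is available;
# `parse_intent_reply` is an illustrative helper, not a DeGirum API.
import json

from jsonschema import validate

def parse_intent_reply(raw_reply: str) -> list:
    """Parse the classifier's JSON reply and schema-check it."""
    parsed = json.loads(raw_reply)
    validate(instance=parsed, schema=intent_output_schema)  # raises ValidationError on malformed output
    return parsed["intents"]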
intent_system_prompt = '''
DeGirum PySDK is a Python-based SDK built to support AI inference of computer vision models on multiple AI accelerators such as Intel OpenVINO, Hailo, MemryX, N2X, etc. The software stack provides a unified way to run inference on any model from our cloud model zoo or a local model zoo and perform post-processing on the results.
I will provide you with a user query, and your job is to classify the intent the query belongs to. There are 11 intents. Each intent corresponds to a feature or function that DeGirum supports.
Intents:
1. hello_world_of_pysdk: Basic setup: creating an account on the DeGirum AI Hub, generating a dg_token, and running a simple example where you load a model from the DeGirum model zoo, pass it an example image, and see the output. For any question about setup, what PySDK is, generating tokens, getting started with PySDK, or a beginner-friendly example, select this intent.
2. single_model_inference: Running an image through a single model, either from the DeGirum AI Hub model zoo or a local model zoo. A basic example where you pass an image to the model and get results. It can be a segmentation, object detection, classification, or pose estimation model, and the result type differs for each. Any question about a simple example of running a single AI model on images only, choosing an inference device, or displaying and printing results should fall under this intent.
3. running_yolo_models: This intent covers running different flavours of YOLO models, such as YOLOv5, v8, and v11, for object detection, pose estimation, classification, etc. It is similar to the previous intent but specific to YOLO models. It also includes selecting different inference options, such as cloud or local, and visualizing the output on images. If the user asks about an image use-case, e.g. face detection or car detection, that you think can be fulfilled by any of these models in any flavour (COCO dataset), select this intent.
4. model_pipelining: This intent covers running two or more models one after another in a pipeline to achieve a goal. For example, for an emotion classification use-case, we first run a face detection model to extract faces and then run an emotion classification model on them. For any query or use-case that involves running two or more models in pipeline mode, select this intent.
5. class_filtering: If a model has multiple classes but you only want to detect particular ones, there is a way to do it in DeGirum PySDK. For any query where the user wants to detect only a particular subset of the classes the model was trained on, select this intent. For example, someone using a COCO model who wants to detect only person and car.
6. overlay_effects: DeGirum PySDK supports multiple overlay effects applied after we get results from the model, such as blurring the detected object, changing the bounding box colour or thickness, changing the size and position of the labels, changing the font, showing probabilities, etc. For any query or use-case where some kind of overlay effect is required, use this intent.
7. running_inference_on_video: This is similar to intents #2 and #3, which cover running model inference on images, but here we run inference on videos. The video can be a saved video file, a webcam stream, or an RTSP URL. For any query about running inference on a live camera feed or saved video files, select this intent. Most real-world use-cases run inference on videos, but while prototyping a user may test on images, so choose between this intent and intents #2 and #3 carefully.
8. person_re_identification_or_extracting_embeddings: Some use-cases require extracting embeddings from an image to store them or to compute similarity. For example, in a person re-identification use-case, we extract face embeddings of a user and match them against stored embeddings to decide whether this is the same person. If the user's query or use-case involves this kind of mechanism, select this intent.
9. zone_counting: Some use-cases involve selecting an ROI for detection and then counting objects, e.g. counting the number of people in a zone. DeGirum has a specific function for this that lets you draw a polygon for the ROI, counts objects of whichever class you specify, and returns the count. If the user query falls under this use-case, select this intent.
10. custom_video_source: Sometimes a user may want to modify the incoming video source (whether a live stream like RTSP or a saved video file) and apply some pre-processing, such as rotating, cropping, or changing colour channels, perhaps to improve detection. DeGirum PySDK allows the user to modify the incoming video stream before passing it to model predictions. Any use-case or query related to modifying the video source falls under this intent.
11. not_supported: Anything outside the above intents is something DeGirum PySDK does not support. Don't try to force a query into an intent; if we don't support something, it is better to let the user know it is not supported. This intent exists specifically to avoid hallucinations and made-up information being given to the user for features we don't support. For any query about a feature not covered by the intents above, select this intent.
For any question or query I ask, analyse it and think deeply. Try to identify the flow and consider all applicable intents. Try to generalise the query and understand which intents apply. Don't focus on answering the query; just select the relevant intents from the list above. As I said, try to generalise the query and identify which category the use-case falls into. If the feature is not listed above, return intent #11, not_supported.
When asked a question, return only the intent names, nothing else. Don't focus on answering the query. Just generalise the question and return the matching intents.
Example of chain of thought:
User: I want to blur the faces of females in the video stream
Think: This requires running a model that can detect faces. Then the user will want to run a classification model to classify the gender of the detected faces.
Then the user will want to apply an overlay effect to the video stream to blur the faces of females.
So it requires running two models one after the other, hence model pipelining, and also overlay effects to blur the faces.
Hence the applicable intents are #4 model_pipelining and #6 overlay_effects.
Pay special attention to whether the user's request requires multiple models or model pipelines. Analyse the user's use-case and think through the implementation to identify the intents.
Response:
{{
"intents": ["model_pipelining", "overlay_effects"]
}}
Only return the list of intent names in JSON format per the provided schema. If there is only one intent, still return it as a list.
'''
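
# Hedged wiring sketch (assumption): how intent_system_prompt and
# intent_output_schema might be passed to an OpenAI-style chat client with
# JSON-schema structured output. The client construction and model name are
# placeholders; the original app's LLM backend is not shown in this file.
from openai import OpenAI

def classify_intents(user_query: str) -> list:
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": intent_system_prompt},
            {"role": "user", "content": user_query},
        ],
        response_format={
            "type": "json_schema",
            "json_schema": {"name": "intents", "schema": intent_output_schema, "strict": True},
        },
    )
    # Reuse the schema-checked parser sketched above.
    return parse_intent_reply(response.choices[0].message.content)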
get_answer_system_prompt = '''
DeGirum PySDK is a Python-based SDK built to support AI inference of computer vision models on multiple
AI accelerators such as Intel OpenVINO, Hailo, MemryX, N2X, etc. The software stack provides a unified way to run inference on
any model from our cloud model zoo or a local model zoo and perform post-processing on the results. We are building a system to
generate answers to our users' questions about PySDK. Based on the user's query, we have a system that identifies which examples
or parts of the documentation can help answer the query. I will provide you with example code (with comments) and
documentation; your job is to analyse the examples and understand them conceptually. Based on that, identify what the user is
looking for and generate code accordingly. The user's request may not match exactly, but it can be derived from the example code
that I pass to you as reference. You may use commonly used OpenCV functions or other Python frameworks to help
satisfy the user's request, but do not make up any DeGirum PySDK functions that are not in the example references. Be very
careful with the syntax when using degirum_tools functions. Don't make anything up; use exactly the same syntax as in the examples.
Use PySDK code as much as possible. Try to generalise the user's question to help better.
The model name and device name can be left blank unless the user asks for something specific. The result object follows this schema:
InferenceResults objects contain the following data:
degirum.postprocessor.InferenceResults.image: Original input image as a NumPy array or PIL image.
degirum.postprocessor.InferenceResults.image_overlay: Original image with inference results drawn on top. The drawing is model-dependent:
Classification models: class labels with probabilities are printed below the original image.
Object detection models: bounding boxes are drawn on the original image.
Hand and pose detection models: keypoints and keypoint connections are drawn on the original image.
Segmentation models: segments are drawn on the original image.
degirum.postprocessor.InferenceResults.results: A list of inference results in dictionary form. Follow the property link for a detailed explanation of all result formats.
degirum.postprocessor.InferenceResults.image_model: Preprocessed image tensor that was fed into the model (in binary form). Populated only if you enable Model.save_model_image before performing predictions.
The results property is what you will typically use in your code. This property contains the core prediction data. Note that if the model outputs coordinates (e.g., bounding boxes), these are converted back to the coordinates of the original image for your convenience.
I will attach example references and ask the user's question; your job is to generate a response along with code as instructed above. '''
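
# Hedged illustration (assumption): the typical access pattern for the
# InferenceResults fields described in the prompt above; `model` and the image
# path are placeholders, so this is left as a comment rather than live code.
#
#   result = model("sample.jpg")
#   print(result.results)           # list of dicts with the core prediction data
#   overlay = result.image_overlay  # original image with results drawn on top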
get_txt_files = {
    "hello_world_of_pysdk": "text_files/001_quick_start.txt",
    "single_model_inference": "text_files/object_detection_image.txt",
    "running_yolo_models": "text_files/002_yolov8.txt",
    "model_pipelining": "text_files/007_model_pipelining.txt",
    "class_filtering": "text_files/object_detection_class_filtering.txt",
    "overlay_effects": "text_files/013_overlay_effects.txt",
    "running_inference_on_video": "text_files/object_detection_video_stream.txt",
    "person_re_identification_or_extracting_embeddings": "text_files/014_person_reid.txt",
    "zone_counting": "text_files/009_zone_counting.txt",
    "custom_video_source": "text_files/016_custom_video_source.txt",
}
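
# Hedged sketch (assumption): collecting the reference text files for the
# matched intents so they can be attached to the answer prompt. The helper
# name and heading format are illustrative.
def load_references(intents: list) -> str:
    chunks = []
    for intent in intents:
        path = get_txt_files.get(intent)
        if path is None:  # e.g. "not_supported" has no reference file
            continue
        with open(path, "r", encoding="utf-8") as f:
            chunks.append(f"Reference for {intent}:\n{f.read()}")
    return "\n\n".join(chunks)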
intent_description_map = {
    "hello_world_of_pysdk": "Basic setup: creating an account on the DeGirum AI Hub, generating a dg_token, and running a simple example where you load a model from the DeGirum model zoo, pass it an example image, and see the output. For any question about setup, what PySDK is, generating tokens, getting started with PySDK, or a beginner-friendly example, select this intent.",
    "single_model_inference": "Running an image through a single model, either from the DeGirum AI Hub model zoo or a local model zoo. A basic example where you pass an image to the model and get results. It can be a segmentation, object detection, classification, or pose estimation model, and the result type differs for each. Any question about a simple example of running a single AI model on images only, choosing an inference device, or displaying and printing results should fall under this intent.",
    "running_yolo_models": "This intent covers running different flavours of YOLO models, such as YOLOv5, v8, and v11, for object detection, pose estimation, classification, etc. It is similar to the previous intent but specific to YOLO models. It also includes selecting different inference options, such as cloud or local, and visualizing the output on images. If the user asks about an image use-case, e.g. face detection or car detection, that you think can be fulfilled by any of these models in any flavour (COCO dataset), select this intent.",
    "model_pipelining": "This intent covers running two or more models one after another in a pipeline to achieve a goal. For example, for an emotion classification use-case, we first run a face detection model to extract faces and then run an emotion classification model on them. For any query or use-case that involves running two or more models in pipeline mode, select this intent.",
    "class_filtering": "If a model has multiple classes but you only want to detect particular ones, there is a way to do it in DeGirum PySDK. For any query where the user wants to detect only a particular subset of the classes the model was trained on, select this intent. For example, someone using a COCO model who wants to detect only person and car.",
    "overlay_effects": "DeGirum PySDK supports multiple overlay effects applied after we get results from the model, such as blurring the detected object, changing the bounding box colour or thickness, changing the size and position of the labels, changing the font, showing probabilities, etc. For any query or use-case where some kind of overlay effect is required, use this intent.",
    "running_inference_on_video": "This is similar to intents #2 and #3, which cover running model inference on images, but here we run inference on videos. The video can be a saved video file, a webcam stream, or an RTSP URL. For any query about running inference on a live camera feed or saved video files, select this intent. Most real-world use-cases run inference on videos, but while prototyping a user may test on images, so choose between this intent and intents #2 and #3 carefully.",
    "person_re_identification_or_extracting_embeddings": "Some use-cases require extracting embeddings from an image to store them or to compute similarity. For example, in a person re-identification use-case, we extract face embeddings of a user and match them against stored embeddings to decide whether this is the same person. If the user's query or use-case involves this kind of mechanism, select this intent.",
    "zone_counting": "Some use-cases involve selecting an ROI for detection and then counting objects, e.g. counting the number of people in a zone. DeGirum has a specific function for this that lets you draw a polygon for the ROI, counts objects of whichever class you specify, and returns the count. If the user query falls under this use-case, select this intent.",
    "custom_video_source": "Sometimes a user may want to modify the incoming video source (whether a live stream like RTSP or a saved video file) and apply some pre-processing, such as rotating, cropping, or changing colour channels, perhaps to improve detection. DeGirum PySDK allows the user to modify the incoming video stream before passing it to model predictions. Any use-case or query related to modifying the video source falls under this intent.",
    "not_supported": "Anything outside the above intents is something DeGirum PySDK does not support. Don't try to force a query into an intent; if we don't support something, it is better to let the user know it is not supported. This intent exists specifically to avoid hallucinations and made-up information being given to the user for features we don't support."
}
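
# Hedged end-to-end sketch (assumption): one way the pieces above could fit
# together - classify the query, gather matching reference files, then ask the
# answer model with the references attached. Client and model name mirror the
# placeholders used earlier; answer_query is illustrative, not an existing API.
def answer_query(user_query: str) -> str:
    intents = classify_intents(user_query)
    if intents == ["not_supported"]:
        return "This request falls outside what DeGirum PySDK currently supports."
    references = load_references(intents)
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": get_answer_system_prompt},
            {"role": "user", "content": f"Example references:\n{references}\n\nUser question: {user_query}"},
        ],
    )
    return response.choices[0].message.content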