|
--- |
|
license: other |
|
license_name: aplux-model-farm-license |
|
license_link: https://aiot.aidlux.com/api/v1/files/license/model_farm_license_en.pdf |
|
pipeline_tag: image-classification |
|
tags: |
|
- AIoT |
|
- QNN |
|
--- |
|
|
|
 |
|
|
|
## ViT: Image Classification |
|
|
|
ViT (Vision Transformer) is a vision model introduced by Google in 2020 that applies the Transformer architecture to images. Unlike traditional convolutional neural networks (CNNs), ViT splits an image into fixed-size patches, linearly embeds each patch, and feeds the resulting sequence of embeddings into a standard Transformer encoder. Self-attention lets the model capture long-range dependencies across the whole image without using any convolutions. Although Transformers were originally designed for natural language processing, ViT performs strongly on image classification, particularly when trained on large datasets such as ImageNet, and its scalability allows it to handle larger image datasets and adapt to other vision tasks such as object detection.
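The patch-embedding step described above can be sketched in a few lines of PyTorch. This is illustrative only, not the Model Farm pipeline; the patch size (16) and embedding dimension (768) are assumed typical ViT-B/16 values and are not specified by this card.

```python
# Minimal sketch of ViT patch embedding: split the image into fixed-size
# patches and linearly embed each one, producing the token sequence that
# the Transformer encoder consumes.
import torch
import torch.nn as nn

image = torch.randn(1, 3, 224, 224)          # NCHW input, 224x224 as listed below
patch_size, embed_dim = 16, 768              # assumed ViT-B/16 settings

# A stride-16 conv over 16x16 patches is equivalent to a per-patch linear projection
to_patches = nn.Conv2d(3, embed_dim, kernel_size=patch_size, stride=patch_size)
tokens = to_patches(image)                   # -> (1, 768, 14, 14)
tokens = tokens.flatten(2).transpose(1, 2)   # -> (1, 196, 768): 196 patch tokens

print(tokens.shape)  # torch.Size([1, 196, 768])
```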
|
|
|
### Source model |
|
|
|
- Input shape: 224x224 |
|
- Number of parameters: 82.55M |
|
- Model size: 330.5MB
|
- Output shape: 1x1000 |
|
|
|
Source model repository: [ViT](https://github.com/pytorch/vision/blob/main/torchvision/models/vision_transformer.py) |
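As a quick sanity check on the figures above, the source model can be instantiated from torchvision (the file linked above) and run on a dummy input. This is a minimal sketch assuming the `vit_b_16` variant; the full torchvision model's parameter count may differ slightly from the 82.55M listed, depending on how the count is measured.

```python
# Instantiate the torchvision source model and verify input/output shapes.
import torch
from torchvision.models import vit_b_16

model = vit_b_16(weights=None).eval()
print(f"parameters: {sum(p.numel() for p in model.parameters()) / 1e6:.2f}M")

with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))  # 224x224 input as listed above
print(logits.shape)  # torch.Size([1, 1000]) -> the 1x1000 output shape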
|
|
|
## Performance Reference |
|
|
|
Please search for this model by name in [Model Farm](https://aiot.aidlux.com/en/models).
|
|
|
## Inference & Model Conversion |
|
|
|
Please search for this model by name in [Model Farm](https://aiot.aidlux.com/en/models) for inference and model conversion instructions.
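For orientation, a common first step before converting a PyTorch model for an on-device toolchain is an ONNX export. The sketch below is only illustrative and assumes the torchvision `vit_b_16` source model; the actual Model Farm / QNN conversion flow is documented on the Model Farm page linked above.

```python
# Hedged sketch: export the source PyTorch model to ONNX as a typical
# intermediate format for downstream conversion.
import torch
from torchvision.models import vit_b_16

model = vit_b_16(weights=None).eval()
dummy = torch.randn(1, 3, 224, 224)   # fixed 224x224 input, per the spec above

torch.onnx.export(
    model, dummy, "vit_b_16.onnx",
    input_names=["image"], output_names=["logits"],
    opset_version=17,                 # assumed; choose per target toolchain
)
```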
|
|
|
## License |
|
|
|
- Source Model: [BSD-3-Clause](https://github.com/pytorch/vision/blob/main/LICENSE)
|
|
|
- Deployable Model: [APLUX-MODEL-FARM-LICENSE](https://aiot.aidlux.com/api/v1/files/license/model_farm_license_en.pdf) |
|
|