Image Classification Finetuning and Inference
Solution Technical Overview
This workflow demonstrates image classification fine-tuning (transfer learning) and inference pipelines built with the Intel® Transfer Learning Tool. It runs on Intel-optimized software (toolkits, domain kits, packages, frameworks, and other libraries) that uses Intel's AI instructions for faster processing on Intel hardware. Applications and reference kits can reuse the workflows directly.
The workflow supports:
Image Classification Finetuning
Image Classification Inference
Table of Contents
- Overview
- Dataset
- Validated Hardware Details
- Software Requirements
- How it Works?
- Get Started
- Run Using Docker
- Run Using Bare Metal
- Expected Output
- Learn More
- Support
Overview
The vision workflow trains an image classifier on contrast-enhanced spectral mammography (CESM) images. The pipeline produces predictions for the diagnosis of breast cancer. The goal is to minimize an expert's involvement in categorizing samples as normal, benign, or malignant by developing and optimizing a decision support system that automatically categorizes CESM images with the help of a radiologist.
Dataset
The dataset is a collection of 2,006 high-resolution contrast-enhanced spectral mammography (CESM) images (1,003 low-energy images and 1,003 subtracted CESM images) with annotations, from 326 female patients (see Figure-1). Each patient has 8 images, 4 per breast side: two views (top-down and angled top view), each as a low-energy image and a subtracted CESM image. Medical reports written by radiologists are provided for each case, along with manual segmentation annotations for the abnormal findings in each image. As a preprocessing step, we segment the images based on the manual segmentation to get the region of interest, and we group the annotation notes by subject and breast side.
Figure-1: Samples of low energy and subtracted CESM images and Medical reports, written by radiologists from the Categorized contrast enhanced mammography dataset. (Khaled, 2022)
For more details of the dataset, visit the wikipage of the CESM and read Categorized contrast enhanced mammography dataset for diagnostic and artificial intelligence research.
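Grouping images by subject and breast side, as described above, could be sketched as follows. The filename layout `P<patient>_<side>_<modality>_<view>.jpg` is an assumption inferred from sample names such as `P100_R_CM_CC.jpg` in the Expected Output section. The `DM` code for low-energy images and the helper names `parse_name` and `group_by_breast` are also assumptions, not part of the workflow's code.

```python
import re
from collections import defaultdict

# Assumed filename layout: P<patient>_<side>_<modality>_<view>.jpg
# CM appears in the sample output; DM (low energy) is an assumption.
NAME_RE = re.compile(
    r"^P(?P<patient>\d+)_(?P<side>[LR])_(?P<modality>DM|CM)_(?P<view>CC|MLO)\.jpg$"
)

def parse_name(filename):
    """Return the filename fields as a dict, or None if the name does not match."""
    match = NAME_RE.match(filename)
    return match.groupdict() if match else None

def group_by_breast(filenames):
    """Group filenames by (patient, side); non-matching names are skipped."""
    groups = defaultdict(list)
    for name in filenames:
        fields = parse_name(name)
        if fields is not None:
            groups[(fields["patient"], fields["side"])].append(name)
    return dict(groups)

if __name__ == "__main__":
    names = ["P100_R_CM_CC.jpg", "P100_R_CM_MLO.jpg", "P100_L_CM_CC.jpg", "notes.txt"]
    for key, members in sorted(group_by_breast(names).items()):
        print(key, members)
```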
Validated Hardware Details
The hardware and software setup requirements depend on how the workflow is run. A bare metal development system and a Docker image running locally have the same system requirements.
Recommended Hardware | Precision |
---|---|
Intel® 4th Gen Xeon® Scalable Performance processors | BF16 |
Intel® 1st, 2nd, 3rd, and 4th Gen Xeon® Scalable Performance processors | FP32 |
To execute the reference solution presented here, use CPU for fine tuning.
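To decide between the BF16 and FP32 rows of the table on a given Linux machine, one option is to look for BF16-related CPU flags. The sketch below is a heuristic and not part of the workflow: it assumes the Linux `/proc/cpuinfo` format and the kernel flag names `avx512_bf16` and `amx_bf16`, and the function names are hypothetical. A present flag only shows hardware capability, not that the workflow was validated on that CPU.

```python
from pathlib import Path

# Linux kernel flag names that indicate hardware BF16 support.
BF16_FLAGS = {"avx512_bf16", "amx_bf16"}

def cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags

def pick_precision(cpuinfo_text):
    """Return 'bf16' if any BF16 flag is present, otherwise 'fp32'."""
    return "bf16" if cpu_flags(cpuinfo_text) & BF16_FLAGS else "fp32"

if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    text = path.read_text() if path.exists() else ""
    print(pick_precision(text))
```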
Software Requirements
Linux OS (Ubuntu 22.04) is used to validate this reference solution. Make sure the following dependencies are installed.
sudo apt-get update
sudo apt-get install -y build-essential gcc git libgl1-mesa-glx libglib2.0-0 python3-dev
sudo apt-get install python3.9 python3-pip
A virtual environment tool is also needed: virtualenv (through python3-venv) or conda.
pip install dataset-librarian
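A quick way to confirm the prerequisites before continuing is a small check script. This is a sketch only: the helper names are hypothetical, and treating Python 3.9 as a minimum version (rather than an exact requirement) is an assumption.

```python
import shutil
import sys

def missing_tools(tools):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

def python_ok(version=None, required=(3, 9)):
    """True if the (major, minor) version is at least `required`."""
    version = sys.version_info[:2] if version is None else tuple(version[:2])
    return version >= required

if __name__ == "__main__":
    print("missing tools:", missing_tools(["git", "gcc", "pip"]))
    print("python version ok:", python_ok())
```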
How It Works?
The vision reference implementation uses a vision workload based on the Intel® Transfer Learning Tool, which is optimized for image fine-tuning and inference. The workload starts from TensorFlow Hub's ResNet-50 model and fine-tunes a new convolutional neural network on the subtracted CESM image dataset. The images are preprocessed using segmented regions defined by domain experts, which reduces redundancy during training.
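The general idea of training a new classifier on top of a fixed pretrained backbone can be illustrated with a toy example in plain Python. This is not the Intel Transfer Learning Tool API and not ResNet-50: `frozen_features` is a hypothetical stand-in for the pretrained network, and only the small softmax head is trained.

```python
import math

def frozen_features(pixels):
    """Hypothetical stand-in for a pretrained backbone; it is never updated."""
    mean = sum(pixels) / len(pixels)
    spread = max(pixels) - min(pixels)
    return [mean, spread, 1.0]  # the constant 1.0 acts as a bias input

def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def head_probs(weights, features):
    """Class probabilities from the trainable linear head."""
    scores = [sum(w * f for w, f in zip(row, features)) for row in weights]
    return softmax(scores)

def train_head(samples, labels, num_classes, epochs=300, lr=0.5):
    """Train only the head with SGD on cross-entropy loss."""
    feats = [frozen_features(s) for s in samples]  # backbone output, computed once
    weights = [[0.0] * len(feats[0]) for _ in range(num_classes)]
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            probs = head_probs(weights, x)
            for c in range(num_classes):
                grad = probs[c] - (1.0 if c == y else 0.0)  # d(loss)/d(score_c)
                for j, value in enumerate(x):
                    weights[c][j] -= lr * grad * value
    return weights

def predict(weights, pixels):
    """Return (predicted class index, class probabilities)."""
    probs = head_probs(weights, frozen_features(pixels))
    return probs.index(max(probs)), probs
```

In the real workflow the backbone is a deep network and the inputs are segmented CESM images; the toy only shows which part stays fixed and which part learns.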
Get Started
Download the repository
git clone https://github.com/IntelAI/transfer-learning.git vision_workflow
cd vision_workflow/workflows/disease_prediction
Create a new python environment
conda create -n <env name> python=3.9
conda activate <env name>
Download and Preprocess the Datasets
Use the links below to download the image datasets. Or skip to the Docker section to download the dataset using a container.
Once you have downloaded the image files and placed them into the data directory, proceed by executing the following command. This command will initiate the download of segmentation and annotation data, followed by the application of segmentation and preprocessing operations.
Command-line Interface:
- -d: Directory where the raw dataset is saved on your system. The preprocessed dataset files are also written there. If not set, a directory with the dataset name is created.
- --split_ratio: Fraction of the data held out as test data. The default value is 0.1.
More details of the dataset_librarian can be found here.
python -m dataset_librarian.dataset -n brca --download --preprocess -d data/ --split_ratio 0.1
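To make the effect of `--split_ratio 0.1` concrete, the sketch below holds out 10% of a file list as test data. It illustrates the concept only and is an assumption about the behavior: it is not the `dataset_librarian` implementation, and `split_files` is a hypothetical helper.

```python
import random

def split_files(filenames, split_ratio=0.1, seed=0):
    """Shuffle a copy of `filenames` and hold out `split_ratio` of them for testing.

    Returns (train_files, test_files).
    """
    if not 0.0 <= split_ratio <= 1.0:
        raise ValueError("split_ratio must be between 0 and 1")
    shuffled = sorted(filenames)  # sort first so the result depends only on the seed
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * split_ratio)
    return shuffled[n_test:], shuffled[:n_test]

if __name__ == "__main__":
    names = [f"img_{i}.jpg" for i in range(50)]
    train, test = split_files(names, split_ratio=0.1)
    print(len(train), "train files,", len(test), "test files")
```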
Note: See this dataset's applicable license for terms and conditions. Intel Corporation does not own the rights to this dataset and does not confer any rights to it.
Run Using Bare Metal
Install the packages needed to run the vision fine-tuning and inference workflow
pip install -r requirements.txt
Note: Set the parameters in config.yaml to match your setup before running.
The config.yaml file includes the following parameters:
- args:
  - dataset_dir: Path to the dataset directory
  - finetune_output: YAML file where the fine-tuning results are saved
  - inference_output: YAML file where the model's results on the test data are saved
  - model: Pretrained model name (default: resnet_v1_50)
  - finetune: Runs vision fine-tuning
  - inference: If true, runs inference only; if false, fine-tunes the model before inference
  - saved_model_dir: Directory where the trained model is saved
- training_args:
  - batch_size: Batch size for training (default: 32)
  - bf16: Enables BF16 (enabled by default)
  - epochs: Number of epochs for training
  - output_dir: Output directory for the trained model
python src/run.py --config_file config/config.yaml
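Before launching a run, it can help to confirm that a loaded configuration contains every parameter listed above. The following is a minimal sketch, assuming the config has already been loaded into a Python dict. The key names come from the parameter list, while the expected types and the `validate_config` helper are assumptions rather than part of the workflow.

```python
# Key names are taken from the parameter list; the expected types are assumptions.
EXPECTED = {
    "args": {
        "dataset_dir": str,
        "finetune_output": str,
        "inference_output": str,
        "model": str,
        "finetune": bool,
        "inference": bool,
        "saved_model_dir": str,
    },
    "training_args": {
        "batch_size": int,
        "bf16": bool,
        "epochs": int,
        "output_dir": str,
    },
}

def validate_config(config):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for section, keys in EXPECTED.items():
        values = config.get(section)
        if not isinstance(values, dict):
            problems.append(f"missing section: {section}")
            continue
        for key, expected_type in keys.items():
            if key not in values:
                problems.append(f"missing key: {section}.{key}")
            elif type(values[key]) is not expected_type:  # exact match, so True is not an int
                problems.append(f"wrong type: {section}.{key}")
    return problems
```

An empty result only means the structure looks as expected; it does not check that paths exist or that the model name is supported.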
Run Using Docker
1. Set Up Docker Engine And Docker Compose
You'll need to install Docker Engine on your development system. Note that while Docker Engine is free to use, Docker Desktop may require you to purchase a license. See the Docker Engine Server installation instructions for details.
To build and run this workload inside a Docker Container, ensure you have Docker Compose installed on your machine. If you don't have this tool installed, consult the official Docker Compose installation documentation.
DOCKER_CONFIG=${DOCKER_CONFIG:-$HOME/.docker}
mkdir -p $DOCKER_CONFIG/cli-plugins
curl -SL https://github.com/docker/compose/releases/download/v2.7.0/docker-compose-linux-x86_64 -o $DOCKER_CONFIG/cli-plugins/docker-compose
chmod +x $DOCKER_CONFIG/cli-plugins/docker-compose
docker compose version
2. Set Up Docker Image
Pull the provided Docker image.
docker pull intel/ai-workflows:pa-vision-tlt-disease-prediction
3. Run With Docker Compose
cd docker
export CONFIG=<config_file_name_without_.yaml>
docker compose run dev
Environment Variable Name | Default Value | Description |
---|---|---|
CONFIG | n/a | Config file name |
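Because `CONFIG` expects the file name without the `.yaml` extension, a small wrapper can derive it from a config path before calling Docker Compose. This is a sketch only: `compose_invocation` is a hypothetical helper, and stripping the directory part of the path is an assumption about what the compose file expects.

```python
import os

def compose_invocation(config_file, base_env=None):
    """Return (command, env) for `docker compose run dev` with CONFIG set."""
    name = os.path.basename(config_file)
    if name.endswith(".yaml"):
        name = name[: -len(".yaml")]  # CONFIG is the file name without .yaml
    if not name:
        raise ValueError("config file name is empty")
    env = dict(os.environ if base_env is None else base_env)
    env["CONFIG"] = name
    return ["docker", "compose", "run", "dev"], env

if __name__ == "__main__":
    command, env = compose_invocation("config/config.yaml")
    print(" ".join(command), "with CONFIG =", env["CONFIG"])
```

The returned pair can be passed to `subprocess.run(command, env=env, cwd="docker", check=True)` to start the container from the `docker` directory.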
4. Clean Up Docker Container
Stop the container created by Docker Compose and remove it.
docker compose down
Expected Output
A successful run writes its output to a YAML file. The output looks like this:
label:
- Benign
- Malignant
- Normal
label_id:
- 0
- 1
- 2
metric:
  acc: 0.7163363099098206
  loss: 0.6603914499282837
results:
  P100_R_CM_CC.jpg:
  - label: Normal
    pred: Normal
    pred_prob:
    - 0.3331606984138489
    - 0.28302037715911865
    - 0.38381892442703247
  P100_R_CM_MLO.jpg:
  - label: Normal
    pred: Benign
    pred_prob:
    - 0.38328817486763
    - 0.2962200343608856
    - 0.320491760969162
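The output can be read as follows: each `pred_prob` list appears to follow the order of the `label` list, and `pred` is the label with the highest probability. The sketch below checks this on a dict that mirrors the excerpt above. The ordering is an assumption that happens to hold for both sample entries, and `check_predictions` is a hypothetical helper.

```python
def check_predictions(output):
    """Return (accuracy over the listed results, files whose pred is not the argmax)."""
    labels = output["label"]
    correct = 0
    total = 0
    mismatched = []
    for filename, entries in output["results"].items():
        for entry in entries:
            probs = entry["pred_prob"]
            top_label = labels[probs.index(max(probs))]
            if top_label != entry["pred"]:
                mismatched.append(filename)
            correct += entry["pred"] == entry["label"]
            total += 1
    accuracy = correct / total if total else 0.0
    return accuracy, mismatched

# The two entries shown in the excerpt above.
SAMPLE = {
    "label": ["Benign", "Malignant", "Normal"],
    "results": {
        "P100_R_CM_CC.jpg": [{
            "label": "Normal", "pred": "Normal",
            "pred_prob": [0.3331606984138489, 0.28302037715911865, 0.38381892442703247],
        }],
        "P100_R_CM_MLO.jpg": [{
            "label": "Normal", "pred": "Benign",
            "pred_prob": [0.38328817486763, 0.2962200343608856, 0.320491760969162],
        }],
    },
}

if __name__ == "__main__":
    print(check_predictions(SAMPLE))
```

On these two entries one prediction of two matches its label, so the computed accuracy is 0.5. The `acc` value in the file covers the full test set, so the two numbers are not expected to agree.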
How to customize this use case
Tunable configurations and parameters are exposed through YAML config files, so users can change model training hyperparameters, data types, paths, and dataset settings without modifying or searching through the code.
Adapt to your dataset
To run this reference use case on a different or customized dataset, edit the config.yaml file: point dataset_dir at the new dataset and, if needed, update the finetune_output and inference_output paths. For example:
dataset_dir: <updated dataset path>
Adapt to your model
To run this reference use case with a different pretrained model, change the model field in the config.yaml file. For example:
model: <new model name> (for example, efficientnet_b0)
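When several variants of a run are needed, the same changes can be applied programmatically to a loaded config instead of editing the file by hand. This is a minimal sketch, assuming the config is a Python dict with an `args` section. `with_overrides` is a hypothetical helper, and the values in the usage example are placeholders.

```python
import copy

def with_overrides(config, **overrides):
    """Return a deep copy of `config` with the given keys under `args` replaced."""
    updated = copy.deepcopy(config)
    args = updated.setdefault("args", {})
    unknown = [key for key in overrides if key not in args]
    if unknown:
        raise KeyError(f"unknown args keys: {unknown}")  # guards against typos
    args.update(overrides)
    return updated

if __name__ == "__main__":
    base = {"args": {"dataset_dir": "data/", "model": "<current model name>"}}
    custom = with_overrides(base, dataset_dir="my_dataset/", model="<new model name>")
    print(custom["args"])
```

Returning a copy keeps the base configuration unchanged, so it can be reused for the next variant.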
Learn More
For more information or to read about other relevant workflow examples, see these guides and software resources:
Support
If you have questions about this workflow, need help with troubleshooting, or want to report a bug or submit an enhancement request, please submit a GitHub issue.