Luis Oala committed
Commit 603bb17 · 1 Parent(s): a1d6a9c
yaml in readme
- README.md +9 -0
- README.md~ +120 -0
README.md
CHANGED
@@ -1,3 +1,12 @@
1 |
[](https://github.com/tterb/atomic-design-ui/blob/master/LICENSEs) [](https://doi.org/10.5281/zenodo.5235536)
|
2 |
|
3 |
# Dataset Drift Controls Using Raw Image Data and Differentiable ISPs: From Raw to Logit
|
|
|
1 |
+
---
|
2 |
+
title: raw2logit
|
3 |
+
colorFrom: blue
|
4 |
+
colorTo: blue
|
5 |
+
sdk: gradio
|
6 |
+
app_file: app.py
|
7 |
+
pinned: true
|
8 |
+
---
|
9 |
+
|
10 |
[](https://github.com/tterb/atomic-design-ui/blob/master/LICENSEs) [](https://doi.org/10.5281/zenodo.5235536)
|
11 |
|
12 |
# Dataset Drift Controls Using Raw Image Data and Differentiable ISPs: From Raw to Logit
|
README.md~
ADDED
@@ -0,0 +1,120 @@
1 |
+
[](https://github.com/tterb/atomic-design-ui/blob/master/LICENSEs) [](https://doi.org/10.5281/zenodo.5235536)
|
2 |
+
|
3 |
+
# Dataset Drift Controls Using Raw Image Data and Differentiable ISPs: From Raw to Logit
|
4 |
+
|
5 |
+
<!-- *This anonymous repository hosts the code for manuscript #4471 "Dataset Drift Controls Using Raw Image Data and Differentiable ISPs: From Raw to Logit", submitted to CVPR 2022.* -->
|
6 |
+
|
7 |
+
## A short introduction
|
8 |
+
Two ingredients are required for the **Raw2Logit** dataset drift controls: raw sensor data and an image processing model. This code repository contains the materials for the second ingredient, the image processing model, as well as scripts to load data and run experiments.
|
9 |
+
|
10 |
+

|
11 |
+
|
12 |
+
To create an image, raw sensor data traverses complex image signal processing (ISP) pipelines. These pipelines are used by cameras and scientific instruments to produce the images fed into machine learning systems. The processing pipelines vary by device, influencing the resulting image statistics and ultimately contributing to dataset drift. However, this processing is rarely considered in machine learning modelling. In this study, we examine the role raw sensor data and differentiable processing models can play in controlling performance risks related to dataset drift. The findings are distilled into three applications.
|
13 |
+
|
14 |
+
1. **Drift forensics** can be used to isolate performance-sensitive data processing configurations that should be avoided when deploying a machine learning model.
|
15 |
+
2. **Drift synthesis** enables the controlled generation of drift test cases. The experiments presented here show that the average decrease in model performance is four to ten times less severe than under post-hoc perturbation testing.
|
16 |
+
3. **Drift adjustment** opens up the possibility for processing adjustments in the face of drift
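
To make the effect of device-specific processing concrete, here is a toy sketch of a three-stage pipeline (black-level correction, white balance, gamma) applied to a single raw pixel. It is not the repository's image processing model; the function `toy_isp`, the stage selection and all parameter values are hypothetical and chosen only for illustration.

```python
# Toy illustration only: NOT the repository's ISP. The stages and the
# parameter values below are hypothetical.

def toy_isp(raw, black_level, wb_gains, gamma):
    """Map one raw RGB triple (values in [0, 1]) to a processed triple."""
    out = []
    for value, gain in zip(raw, wb_gains):
        # Black-level correction, rescaled back to [0, 1]
        v = max(value - black_level, 0.0) / (1.0 - black_level)
        # Per-channel white-balance gain, clipped at 1
        v = min(v * gain, 1.0)
        # Gamma correction
        out.append(v ** (1.0 / gamma))
    return tuple(out)


raw_pixel = (0.20, 0.40, 0.30)
# The same raw measurement processed with two different device configurations
device_a = toy_isp(raw_pixel, black_level=0.05, wb_gains=(1.8, 1.0, 1.5), gamma=2.2)
device_b = toy_isp(raw_pixel, black_level=0.02, wb_gains=(1.2, 1.0, 1.9), gamma=1.8)
print(device_a)
print(device_b)
```

The two outputs differ although the raw input is identical, which is the kind of processing-induced shift in image statistics described above.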
|
17 |
+
|
18 |
+
We make two datasets available.
|
19 |
+
1. **Raw-Microscopy** contains
|
20 |
+
* **940 raw bright-field microscopy images** of human blood smear slides for leukocyte classification alongside
|
21 |
+
* **5,640 variations measured at six different intensities** and twelve additional sets totalling
|
22 |
+
* **11,280 images of the raw sensor data processed through different pipelines**.
|
23 |
+
2. **Raw-Drone** contains
|
24 |
+
* **548 raw drone camera images** for car segmentation, alongside
|
25 |
+
* **3,288 variations measured at six different intensities** and also twelve additional sets totalling
|
26 |
+
* **6,576 images of the raw sensor data processed through different pipelines**.
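
The totals listed above are consistent with each raw image yielding six intensity variations and twelve pipeline-processed versions. The short check below reproduces the numbers under that reading, which is an interpretation of the list rather than something it states explicitly; the variable names are illustrative.

```python
# Base counts and multipliers are taken from the dataset list; the
# per-image interpretation (6 intensities, 12 pipelines) is an assumption.
base_images = {"Raw-Microscopy": 940, "Raw-Drone": 548}
n_intensities = 6
n_pipeline_sets = 12

totals = {
    name: (n_raw * n_intensities, n_raw * n_pipeline_sets)
    for name, n_raw in base_images.items()
}

for name, (intensity_variations, pipeline_images) in totals.items():
    print(f"{name}: {intensity_variations:,} intensity variations, "
          f"{pipeline_images:,} pipeline-processed images")
```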
|
27 |
+
## Data access
|
28 |
+
If you use our code, you can rely on the built-in cloud storage integration: data is downloaded automatically from a cloud storage bucket and stored on your working machine. The following snippet handles the bucket access:
|
29 |
+
|
30 |
+
```python
# Import path assumed; the original snippet omits it. Requires the `b2sdk` package.
from b2sdk.v2 import B2Api, InMemoryAccountInfo


def get_b2_bucket():
    # Bucket name and application key used by the project
    bucket_name = 'perturbed-minds'
    application_key_id = '003d6b042de536a0000000008'
    application_key = 'K003HMNxnoa91Dy9c0V8JVCKNUnwR9U'
    # Keep account info in memory only, so nothing is written to disk
    info = InMemoryAccountInfo()
    b2_api = B2Api(info)
    b2_api.authorize_account('production', application_key_id, application_key)
    bucket = b2_api.get_bucket_by_name(bucket_name)
    return bucket
```
|
41 |
+
We also maintain a copy of the entire dataset with a permanent identifier at Zenodo: [https://doi.org/10.5281/zenodo.5235536](https://doi.org/10.5281/zenodo.5235536).
|
42 |
+
## Code
|
43 |
+
### Dependencies
|
44 |
+
#### Conda environment and dependencies
|
45 |
+
To run this code out of the box, install the latest project conda environment stored in `environment.yml`:
|
46 |
+
```console
$ conda env create -f environment.yml
```
|
49 |
+
#### segmentation_models_pytorch newest version
|
50 |
+
We noticed that the PyPI package for `segmentation_models_pytorch` is sometimes behind the project's GitHub repository. If you encounter `smp`-related problems, we recommend installing directly from the `smp` repository via
|
51 |
+
```console
$ python -m pip install git+https://github.com/qubvel/segmentation_models.pytorch
```
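
Before reinstalling, it can help to see which version is currently installed. The sketch below uses only the standard library; the helper name `installed_version` is hypothetical, and the distribution name `segmentation-models-pytorch` is assumed to be the name under which the package is registered in your environment.

```python
from importlib import metadata


def installed_version(dist_name="segmentation-models-pytorch"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string, or None when the package is not installed
print(installed_version())
```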
|
54 |
+
#### mlflow tracking
|
55 |
+
Note that we maintain a collaborative mlflow virtual lab server. The tracking API is integrated into the code. By default, anyone has read access, e.g. to browse results and fetch trained, stored models. For the purpose of anonymization, the link to the tracking server is removed here, as it contains identifiable information about persons who submitted jobs. You can set up your own mlflow server for this anonymized version of the code, or disable mlflow tracking and use `train.py` without the virtual lab log.
|
56 |
+
### Recreate experiments
|
57 |
+
The central file for running experiments with the **Raw2Logit** framework as in the paper is `train.py`, which provides a rich set of arguments to experiment with raw image data, different image processing models and task models for regression or classification. Below we provide three example commands for the three experiments reported in the manuscript.
|
58 |
+
|
59 |
+
#### Drift forensics
|
60 |
+
```console
$ python train.py \
    --experiment_name YOUR-EXPERIMENT-NAME \
    --run_name YOUR-RUN-NAME \
    --dataset Microscopy \
    --adv_training \
    --lr 1e-5 \
    --n_splits 5 \
    --epochs 5 \
    --classifier_pretrained \
    --processing_mode parametrized \
    --augmentation weak \
    --log_model True \
    --iso 0.01 \
    --track_processing \
    --track_every_epoch \
    --track_predictions \
    --track_processing_gradients \
    --track_save_tensors
```
|
80 |
+
#### Drift synthesis
|
81 |
+
```console
$ python train.py \
    --experiment_name YOUR-EXPERIMENT-NAME \
    --run_name YOUR-RUN-NAME \
    --dataset Microscopy \
    --lr 1e-5 \
    --n_splits 5 \
    --epochs 5 \
    --classifier_pretrained \
    --processing_mode static \
    --augmentation weak \
    --log_model True \
    --iso 0.01 \
    --freeze_processor \
    --track_processing \
    --track_every_epoch \
    --track_predictions \
    --track_processing_gradients \
    --track_save_tensors
```
|
101 |
+
#### Drift adjustment
|
102 |
+
```console
$ python train.py \
    --experiment_name YOUR-EXPERIMENT-NAME \
    --run_name YOUR-RUN-NAME \
    --dataset Microscopy \
    --lr 1e-5 \
    --n_splits 5 \
    --epochs 5 \
    --classifier_pretrained \
    --processing_mode parametrized \
    --augmentation weak \
    --log_model True \
    --iso 0.01 \
    --track_processing \
    --track_every_epoch \
    --track_predictions \
    --track_processing_gradients \
    --track_save_tensors
```
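
The three commands share most of their flags and differ only in a few. As a rough sketch, the differences can be captured in a small helper that assembles the argument list. The names `COMMON`, `SPECIFIC` and `build_command` are hypothetical; every flag is copied from the commands above, and it is assumed that `train.py` accepts them in any order.

```python
import shlex

# Flags shared by all three example commands.
COMMON = [
    "--dataset", "Microscopy", "--lr", "1e-5", "--n_splits", "5", "--epochs", "5",
    "--classifier_pretrained", "--augmentation", "weak", "--log_model", "True",
    "--iso", "0.01", "--track_processing", "--track_every_epoch",
    "--track_predictions", "--track_processing_gradients", "--track_save_tensors",
]

# Flags that differ between the three experiments.
SPECIFIC = {
    "forensics": ["--adv_training", "--processing_mode", "parametrized"],
    "synthesis": ["--processing_mode", "static", "--freeze_processor"],
    "adjustment": ["--processing_mode", "parametrized"],
}


def build_command(experiment, experiment_name, run_name):
    """Assemble the argument list for one of the three example runs."""
    head = ["python", "train.py",
            "--experiment_name", experiment_name, "--run_name", run_name]
    return head + SPECIFIC[experiment] + COMMON


# Print the assembled command as a single shell-quoted line
print(shlex.join(build_command("synthesis", "YOUR-EXPERIMENT-NAME", "YOUR-RUN-NAME")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to launch a run from Python instead of the shell.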
|