|
## Tutorial: VISTA2D Model Creation |
|
|
|
This tutorial guides the user through setting up all the datasets, running pre-processing, and creating the organized JSON file lists that are provided to the VISTA-2D training pipeline.
|
Some datasets need to be downloaded manually; the others are downloaded by a provided script. Please do not unzip any of the downloaded files manually; extraction is handled automatically in the final step.
|
|
|
### List of Datasets |
|
1.) [Cellpose](https://www.cellpose.org/dataset) |
|
|
|
2.) [TissueNet](https://datasets.deepcell.org/login) |
|
|
|
3.) [Kaggle Nuclei Segmentation](https://www.kaggle.com/c/data-science-bowl-2018/data) |
|
|
|
4.) [Omnipose - OSF repository](https://osf.io/xmury/) |
|
|
|
5.) [NIPS Cell Segmentation Challenge](https://neurips22-cellseg.grand-challenge.org/) |
|
|
|
6.) [LiveCell](https://sartorius-research.github.io/LIVECell/) |
|
|
|
7.) [Deepbacs](https://github.com/HenriquesLab/DeepBacs/wiki/Segmentation) |
|
|
|
Datasets 1-4 need to be downloaded manually; instructions for downloading them are provided below.
|
|
|
### Manual Dataset Download Instructions |
|
#### 1.) Cellpose: |
|
The dataset can be downloaded from this [link](https://www.cellpose.org/dataset). The screenshots below walk through the download steps.
|
 |
|
Please enter your email address and accept the terms and conditions to download the dataset.
|
|
|
 |
|
Click on `train.zip` and `test.zip` to download each archive separately. Both files need to be placed in a `cellpose_dataset` directory, which the user must create in the root data directory. A minimal Python sketch of this placement step is shown below.
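
The sketch assumes the two archives were saved to `~/Downloads` and that the root data path matches the placeholder used in the commands later in this tutorial; both locations are assumptions, not part of the provided scripts.

```python
from pathlib import Path
import shutil

# Assumed locations -- adjust to where your browser saved the archives
# and to the root data directory you plan to use for VISTA-2D.
downloads = Path.home() / "Downloads"
root_data_path = Path("provide_the_same_root_data_path")  # placeholder

# Create the cellpose_dataset directory inside the root data directory.
cellpose_dir = root_data_path / "cellpose_dataset"
cellpose_dir.mkdir(parents=True, exist_ok=True)

# Move train.zip and test.zip into it without unzipping them.
for archive in ("train.zip", "test.zip"):
    shutil.move(str(downloads / archive), str(cellpose_dir / archive))
```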
|
|
|
#### 2.) TissueNet |
|
Login credentials have to be created at the link provided below. The screenshots below offer further assistance.
|
|
|
 |
|
Please create an account at the provided [link](https://datasets.deepcell.org/login). |
|
|
|
 |
|
After logging in, the page shown above will be visible. Please make sure that version 1.0 of TissueNet is selected before clicking the download button.
|
All the downloaded files need to be placed in a `tissuenet_dataset` directory, which has to be created by the user.
|
|
|
#### 3.) Kaggle Nuclei Segmentation |
|
Kaggle credentials are required to access this dataset at this [link](https://www.kaggle.com/c/data-science-bowl-2018/data); the user will have to register for the challenge before the data can be downloaded.
|
Please refer to the screenshots below for additional help.
|
|
|
 |
|
Use the `Download All` button so that all files are downloaded, and place the files in a user-created `kaggle_dataset` directory.
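
As an optional alternative to the web UI, the download can also be scripted with the official Kaggle Python package. This is not part of the provided scripts; it assumes a Kaggle API token is configured in `~/.kaggle/kaggle.json` and that the competition rules have already been accepted on the website.

```python
from pathlib import Path
from kaggle.api.kaggle_api_extended import KaggleApi

root_data_path = Path("provide_the_same_root_data_path")  # placeholder
kaggle_dir = root_data_path / "kaggle_dataset"
kaggle_dir.mkdir(parents=True, exist_ok=True)

# Authenticate with the API token in ~/.kaggle/kaggle.json, then download
# the Data Science Bowl 2018 competition files into kaggle_dataset.
api = KaggleApi()
api.authenticate()
api.competition_download_files("data-science-bowl-2018", path=str(kaggle_dir))
```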
|
|
|
#### 4.) Omnipose |
|
The Omnipose dataset is hosted on an [OSF repository](https://osf.io/xmury/), from which only the dataset portion needs to be downloaded. Please refer to the screenshots below for further assistance.
|
|
|
 |
|
Select the `datasets` directory as highlighted in the screenshot, then press `Download as zip` to download the dataset. Place all the files in a user-created directory named `omnipose_dataset`.
|
|
|
### The remaining datasets will be downloaded by a Python script
|
To run the script, use the following example command: `python all_file_downloader.py --dir provide_the_same_root_data_path`
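
Before running the downloader, it can be useful to confirm that the four manually prepared directories are already in place under the root data path. The sanity-check sketch below is not part of the provided scripts; `provide_the_same_root_data_path` is the same placeholder used in the command above.

```python
from pathlib import Path

root_data_path = Path("provide_the_same_root_data_path")  # placeholder

# Directories that must be created and populated manually (datasets 1-4).
manual_dirs = [
    "cellpose_dataset",
    "tissuenet_dataset",
    "kaggle_dataset",
    "omnipose_dataset",
]

# Report anything that is missing or still empty before the automated download.
for name in manual_dirs:
    path = root_data_path / name
    if not path.is_dir():
        print(f"Missing directory: {path}")
    elif not any(path.iterdir()):
        print(f"Directory is empty: {path}")
```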
|
|
|
After all datasets have been downloaded, the data root directory should look as follows:
|
|
|
 |
|
|
|
### Process the downloaded data |
|
To execute the VISTA-2D training pipeline, some datasets require label conversion. Please use the `root_data_path` as the input to the script; an example command is given below:
|
|
|
`python generate_json.py --dir provide_the_same_root_data_path` |
|
|
|
### Generation of JSON data lists (Optional)
|
If you want to generate the JSON files from scratch, the `generate_json.py` script performs both the processing and the creation of the JSON files.

To execute the VISTA-2D training pipeline, some datasets require label conversion followed by the creation of a JSON file list in the format that VISTA-2D training uses as input.

To create the JSON lists from the raw dataset sources, please use the `root_data_path` as the input to the script; an example command is given below:
|
|
|
`python generate_json.py --dir provide_the_same_root_data_path` |
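
Once `generate_json.py` has finished, the resulting JSON lists can be inspected with a few lines of Python. The file name below is a hypothetical placeholder, since the exact output names depend on the datasets processed; the snippet only reports top-level keys and entry counts and makes no assumptions about the schema.

```python
import json
from pathlib import Path

# Hypothetical placeholder -- substitute an actual JSON file produced
# by generate_json.py in your root data directory.
json_path = Path("provide_the_same_root_data_path") / "example_datalist.json"

with open(json_path) as f:
    datalist = json.load(f)

# Print each top-level key and how many entries it contains.
if isinstance(datalist, dict):
    for key, value in datalist.items():
        count = len(value) if isinstance(value, (list, dict)) else 1
        print(f"{key}: {count} entries")
else:
    print(f"Top-level list with {len(datalist)} entries")
```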
|
|