
Tutorial: VISTA2D Model Creation

This tutorial guides users through setting up all the datasets, running pre-processing, and creating the organized JSON file lists that can be provided to the VISTA-2D training pipeline. Some datasets need to be downloaded manually; others are downloaded by a provided script. Please do not manually unzip any of the downloaded files; this is handled automatically in the final step.

List of Datasets

1.) Cellpose

2.) TissueNet

3.) Kaggle Nuclei Segmentation

4.) Omnipose - OSF repository

5.) NIPS Cell Segmentation Challenge

6.) LiveCell

7.) Deepbacs

Datasets 1-4 need to be downloaded manually; instructions for downloading them are provided below.

Manual Dataset Download Instructions

1.) Cellpose:

The dataset can be downloaded from this link. Please see the screenshots below for help with the download. Enter your email and accept the terms and conditions to download the dataset.

Click on train.zip and test.zip to download both archives independently. Both need to be placed in a cellpose_dataset directory, which the user has to create in the root data directory.

2.) TissueNet

Please create an account at the provided link.

After logging in, the download page will be visible. Please make sure that version 1.0 is selected for TissueNet before clicking the download button. All the downloaded files need to be placed in a tissuenet_dataset directory, which has to be created by the user.

3.) Kaggle Nuclei Segmentation

Kaggle credentials are required to access this dataset at this link. The user has to register for the challenge to access and download the dataset. Please refer to the screenshots below for additional help.

Use the Download All button so that all files are downloaded. The files need to be placed in a user-created directory named kaggle_dataset.

4.) Omnipose

The Omnipose dataset is hosted on an OSF repository, and the dataset part needs to be downloaded from it. Please refer to the screenshots below for further assistance.

Select the datasets directory as highlighted in the screenshot, then press download as zip to download the dataset. Place all the files in a user-created directory named omnipose_dataset.
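The four manual downloads above each go into a directory that the user creates. The following is a minimal sketch of how those directories could be created and checked before continuing. It assumes that all four directories sit directly under the root data directory (the text states this explicitly only for cellpose_dataset), and the helper names create_manual_dirs and find_empty_dirs are hypothetical and not part of the VISTA-2D scripts.

```python
from pathlib import Path

# Directory names taken from the manual download instructions above.
MANUAL_DATASET_DIRS = [
    "cellpose_dataset",
    "tissuenet_dataset",
    "kaggle_dataset",
    "omnipose_dataset",
]


def create_manual_dirs(root):
    """Create the four user-managed dataset directories under `root`.

    Assumption: every directory lives directly under the root data path.
    """
    root = Path(root)
    created = []
    for name in MANUAL_DATASET_DIRS:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


def find_empty_dirs(root):
    """Return the names of manual dataset directories that are missing or empty."""
    root = Path(root)
    empty = []
    for name in MANUAL_DATASET_DIRS:
        path = root / name
        if not path.is_dir() or not any(path.iterdir()):
            empty.append(name)
    return empty
```

Calling create_manual_dirs(root) once before downloading, and find_empty_dirs(root) afterwards, gives a quick indication of which manual downloads are still outstanding. The check only looks for the presence of files; it does not validate their contents.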

The remaining datasets will be downloaded by a Python script.

To run the script, use the following example command:

python all_file_downloader.py --dir provide_the_same_root_data_path

After all datasets have finished downloading, the data root directory should contain the user-created directories described above alongside the datasets fetched by the script.

Process the downloaded data

To execute the VISTA-2D training pipeline, some datasets require label conversion. Use the root_data_path as the input to the script. An example command is given below:

python generate_json.py --dir provide_the_same_root_data_path
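The download and processing commands can also be chained from Python. The following is a minimal sketch, not part of the provided scripts. It uses only the script names and the --dir flag shown in this tutorial. The helper names build_commands and run_pipeline, and the script_dir parameter (the directory assumed to contain both scripts), are hypothetical. It uses sys.executable in place of a bare python so that the same interpreter runs both steps.

```python
import subprocess
import sys


def build_commands(root_data_path):
    """Return the download and processing commands, in the order they should run."""
    root = str(root_data_path)
    return [
        # Step 1: download the datasets that do not need manual handling.
        [sys.executable, "all_file_downloader.py", "--dir", root],
        # Step 2: process the data using the same root data path.
        [sys.executable, "generate_json.py", "--dir", root],
    ]


def run_pipeline(root_data_path, script_dir="."):
    """Run both steps in sequence, stopping at the first failure.

    Assumption: `script_dir` is the directory that contains both scripts.
    """
    for cmd in build_commands(root_data_path):
        # check=True raises CalledProcessError if a script exits with a non-zero code.
        subprocess.run(cmd, cwd=script_dir, check=True)
```

Passing the same root_data_path to both steps mirrors the provide_the_same_root_data_path placeholder used in the example commands.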

Generation of JSON data lists (Optional)

If one wishes to generate the JSON files from scratch, the generate_json.py script performs both the processing and the creation of the JSON files. To execute the VISTA-2D training pipeline, some datasets require label conversion followed by a JSON file list in the format that the VISTA-2D training uses. To create the JSON lists from the raw dataset sources, use the root_data_path as the input to the script. An example command is given below:

python generate_json.py --dir provide_the_same_root_data_path