| How the data got here: | |
| 1. Downloaded the [DocLayNet_core.zip dataset](https://codait-cos-dax.s3.us.cloud-object-storage.appdomain.cloud/dax-doclaynet/1.0.0/DocLayNet_core.zip) using `wget`. | |
| 2. Uploaded the downloaded `DocLayNet_core.zip` file to the Hugging Face space. | |
| 3. Loaded the `chainyo/rvl-cdip-invoice` dataset using the `load_dataset` function from the Hugging Face `datasets` library. | |
| 4. Extracted each image from the `train` split of the loaded dataset and saved it as a PNG file in the `RVL-CDIP-invoice` directory. | |
| 5. Compressed the `RVL-CDIP-invoice` directory into a zip file (`RVL-CDIP-invoice.zip`) using the `zip` command. | |
| 6. Uploaded the zip file to the Hugging Face space. | |
| ``` | |
| # !wget https://codait-cos-dax.s3.us.cloud-object-storage.appdomain.cloud/dax-doclaynet/1.0.0/DocLayNet_core.zip -O DocLayNet_core.zip | |
| # upload_to_huggingface_space( | |
| # space_name = HUGGINGFACE_SPACE_NAME, | |
| # private = True, | |
| # path_or_fileobj = './DocLayNet_core.zip', | |
| # path_in_repo = 'data/raw/DocLayNet_core.zip') | |
| # | |
| # | |
| # invoices = load_dataset('chainyo/rvl-cdip-invoice') | |
| # # can also be found at: https://huggingface.co/datasets/aharley/rvl_cdip | |
| # os.makedirs('./RVL-CDIP-invoice', exist_ok=True)  # does not fail if the directory already exists | |
| # for index, invoice in enumerate(tqdm(invoices['train'])): | |
| # invoice['image'].save(f'./RVL-CDIP-invoice/{index}.png', format="png") | |
| # | |
| # !ls ./RVL-CDIP-invoice -1 | wc -l | |
| # !zip -r RVL-CDIP-invoice.zip ./RVL-CDIP-invoice | |
| # | |
| # upload_to_huggingface_space( | |
| # space_name = HUGGINGFACE_SPACE_NAME, | |
| # private = True, | |
| # path_or_fileobj = './RVL-CDIP-invoice.zip', | |
| # path_in_repo = 'data/raw/RVL-CDIP-invoice.zip') | |
| ``` |