---
title: Phronesis Medical Report Generator
emoji: 🧠
colorFrom: green
colorTo: gray
sdk: gradio
app_file: app.py
pinned: false
short_description: Report generation and classification model

---

# 🧠 Phronesis: Medical Image Diagnosis & Report Generator

**Phronesis** is a multimodal AI tool that classifies medical CT scan images (DICOM or standard formats) and generates diagnostic reports using a combination of video classification and medical language generation.

---

## 🚀 Demo

Upload a set of DICOM (`.dcm`, `.ima`) or image (`.png`, `.jpg`) files representing slices of a CT scan. The model will:

- 🏷️ Predict a class: **acute**, **normal**, **chronic**, or **lacunar**
- 📋 Generate a short **radiology report**

[Live App →](https://huggingface.co/spaces/baliddeki/phronesis-ml-endpoint)

---

## πŸ—οΈ Model Architecture

- **Vision Backbone**: `3D ResNet-18` pretrained on Kinetics-400
- **Language Head**: `BioBART v2` (pretrained biomedical seq2seq model)
- **Bridge Module**: Custom `ImageToTextProjector` to align visual features with the language model
- **CombinedModel**: Unified architecture for classification + report generation
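The wiring described above can be sketched roughly as follows. Only the component names (`ImageToTextProjector`, `CombinedModel`) come from this README; the feature dimensions (512 for the 3D ResNet-18 trunk, 768 for BioBART's embedding space), the classifier head, and the forward signature are assumptions for illustration:

```python
# Sketch of the described architecture, not the actual app.py implementation.
# Feature sizes (512 -> 768) and layer names beyond the README are assumed.
import torch
import torch.nn as nn

class ImageToTextProjector(nn.Module):
    """Bridge module: maps pooled 3D-CNN features into the language model's space."""
    def __init__(self, vision_dim=512, text_dim=768):
        super().__init__()
        self.proj = nn.Linear(vision_dim, text_dim)

    def forward(self, feats):
        return self.proj(feats)

class CombinedModel(nn.Module):
    """Unified classification + report-conditioning model."""
    def __init__(self, video_backbone, projector, num_classes=4):
        super().__init__()
        self.backbone = video_backbone            # e.g. a 3D ResNet-18 trunk
        self.projector = projector
        self.classifier = nn.Linear(512, num_classes)

    def forward(self, clips):                     # clips: (B, C, T, H, W)
        feats = self.backbone(clips)              # (B, 512) pooled features
        logits = self.classifier(feats)           # 4-way class prediction
        text_embed = self.projector(feats)        # conditioning for BioBART
        return logits, text_embed
```

In the real model the text embedding would be fed to BioBART's decoder to generate the report; here it is simply returned.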

---

## 🧪 Tasks

- **Image Classification**: Categorizes brain CT scans into one of four classes.
- **Report Generation**: Produces diagnostic text conditioned on image features.

---

## πŸ–ΌοΈ Input Format

- Minimum 1, maximum ~30 image slices per scan.
- Acceptable file formats:
  - DICOM (`.dcm`, `.ima`)
  - PNG, JPEG

The model will sample or pad the series to 16 frames for temporal context.
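The sample-or-pad step could look like the sketch below, assuming slices arrive as a list of equally sized 2-D arrays; the actual preprocessing in `app.py` may differ (e.g. it might pad with blank frames instead of repeating the last slice):

```python
# Hypothetical sketch of the 16-frame sampling/padding step.
import numpy as np

NUM_FRAMES = 16

def sample_or_pad(slices, num_frames=NUM_FRAMES):
    """Return exactly `num_frames` slices: uniformly subsample long series,
    repeat the last slice to pad short ones (padding strategy is assumed)."""
    n = len(slices)
    if n >= num_frames:
        idx = np.linspace(0, n - 1, num_frames).round().astype(int)
        return [slices[i] for i in idx]
    return list(slices) + [slices[-1]] * (num_frames - n)
```

A 30-slice series is thus thinned to 16 evenly spaced slices, while a 5-slice series is padded up to 16.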

---

## 📦 Dependencies

This app uses:
- `torch`
- `transformers`
- `torchvision`
- `huggingface_hub`
- `pydicom`
- `gradio`
- `PIL`, `numpy`

---

## πŸ” Notes

- This demo loads a private model from the Hugging Face Hub. Set `HF_TOKEN` as a secret for the Space if needed.
- Do **not** use for real clinical decisions; this demo is intended for research and demonstration only.
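Fetching the private checkpoint with the token might look like this minimal sketch; the checkpoint filename (`pytorch_model.bin`) is an assumption, not confirmed by the repo:

```python
# Hedged sketch: downloading private weights via huggingface_hub.
import os
from huggingface_hub import hf_hub_download

def load_weights():
    """Download the checkpoint; a valid HF_TOKEN is required for private repos."""
    return hf_hub_download(
        repo_id="baliddeki/phronesis-ml",
        filename="pytorch_model.bin",      # assumed filename
        token=os.environ.get("HF_TOKEN"),  # set as a Space secret
    )
```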

---

## πŸ™‹β€β™‚οΈ Credits

Developed by [@baliddeki](https://huggingface.co/baliddeki)

Model weights: [`baliddeki/phronesis-ml`](https://huggingface.co/baliddeki/phronesis-ml)  
Language model: [`GanjinZero/biobart-v2-base`](https://huggingface.co/GanjinZero/biobart-v2-base)

---

## πŸ“„ License

MIT or Apache 2.0 (add yours here)