Upload 6 files
- knowledge_base/about.md +110 -0
- knowledge_base/experience_detailed.md +141 -0
- knowledge_base/publications_detailed.md +82 -0
- knowledge_base/research_details.md +45 -0
- knowledge_base/skills_expertise.md +142 -0
- knowledge_base/statistics.md +105 -0
knowledge_base/about.md
ADDED
@@ -0,0 +1,110 @@
1 |
+
# [Raktim Mondol](https://mondol.me)
|
2 |
+
NSW, Australia | [email protected]
|
3 |
+
|
4 |
+
---
|
5 |
+
|
6 |
+
## SUMMARY & RESEARCH INTEREST
|
7 |
+
|
8 |
+
I am an experienced data scientist and programmer with deep expertise in artificial intelligence, generative AI (GenAI) techniques and large language models (LLMs), bioinformatics, computer vision, and high-performance computing. My research and professional background is centered on analyzing large-scale image and biomedical datasets, developing novel deep learning models, and conducting advanced statistical analyses. I am a dedicated and committed individual with a strong team-oriented spirit, a positive attitude, and exceptional interpersonal skills.
|
9 |
+
|
10 |
+
---
|
11 |
+
|
12 |
+
## EDUCATION
|
13 |
+
|
14 |
+
🎓 **PhD, Computer Science & Engineering** | 2021 - 2025
|
15 |
+
<br>UNSW, Sydney, Australia
|
16 |
+
<br>**Research Topic:** *Deep Learning For Breast Cancer Prognosis & Explainability*
|
17 |
+
<br>**◇ Thesis Submitted**
|
18 |
+
|
19 |
+
🎓 **Masters by Research, Computer Science & Bioinformatics** | 2017 - 2019
|
20 |
+
<br>RMIT University, Melbourne, Australia
|
21 |
+
<br>[High Distinction (85%)](https://www.myequals.net/sharelink/78e7c7d7-5a73-4e7c-9711-f163f5dd1604/af0d807a-8392-45be-9104-d26b95f5aa7a)
|
22 |
+
<br>**Research Thesis:** *[Deep learning in classifying cancer subtypes, extracting relevant genes and identifying novel mutations](https://research-repository.rmit.edu.au/articles/thesis/Deep_learning_in_classifying_cancer_subtypes_extracting_relevant_genes_and_identifying_novel_mutations/27589272?file=50759199)*
|
23 |
+
|
24 |
+
---
|
25 |
+
|
26 |
+
## WORK EXPERIENCE
|
27 |
+
|
28 |
+
🧑🏫 **Casual Academic** | July 2021 - Continuing
|
29 |
+
<br>Dept. of Computer Science & Engineering
|
30 |
+
<br>[UNSW](https://www.unsw.edu.au/), Sydney, NSW
|
31 |
+
<br>**Duties/Responsibilities:**
|
32 |
+
* Conduct Laboratory and Consultation Classes: Computer Vision, Neural Networks and Deep Learning, Artificial Intelligence
|
33 |
+
|
34 |
+
🧑🏫 **Teaching Assistant (Casual)** | July 2017 - Oct 2019
|
35 |
+
<br>Dept. of Electrical and Biomedical Engineering
|
36 |
+
<br>[RMIT University](https://www.rmit.edu.au/), Melbourne, VIC
|
37 |
+
<br>**Duties/Responsibilities:**
|
38 |
+
* Conducted Laboratory Classes: Electronics (EEET2255), Software Engineering Design (EEET2250), Engineering Computing I (EEET2246), Introduction to Embedded Systems (EEET2256).
|
39 |
+
|
40 |
+
🧑🏫 **Lecturer (Full-Time)** | September 2013 - December 2016
|
41 |
+
<br>Dept. of Electrical and Electronic Engineering
|
42 |
+
<br>[World University of Bangladesh (WUB)](https://wub.edu.bd/), Dhaka, Bangladesh
|
43 |
+
<br>**Duties/Responsibilities:**
|
44 |
+
* Courses Instructed (Theory): Electrical Circuit I, Electrical Circuit II, Engineering Materials, Electronics I, Electronics II, Digital Logic Design and Digital Electronics
|
45 |
+
* Courses Instructed (Laboratory): Microprocessor & Interfacing, Digital Electronics and Digital Signal Processing
|
46 |
+
* Supervised Students for Projects and Thesis
|
47 |
+
|
48 |
+
---
|
49 |
+
|
50 |
+
## RESEARCH EXPERIENCE
|
51 |
+
|
52 |
+
🔬 **Doctoral Researcher (Sydney, NSW, Australia)** | March 2021 – Jan 2025
|
53 |
+
<br>**[Biomedical Image Computing Research Group](https://imagescience.org/meijering/group/)**
|
54 |
+
* Developed AI models to assist pathologists in breast cancer identification and treatment recommendation.
|
55 |
+
|
56 |
+
🔬 **Master's Researcher (Melbourne, VIC, Australia)** | March 2017 – April 2019
|
57 |
+
<br>**[NeuroSyd Research Laboratory](https://sites.google.com/view/neurosyd/home)**
|
58 |
+
* Developed a deep learning model and bioinformatics pipeline to extract biomarkers from high-throughput biological data.
|
59 |
+
|
60 |
+
---
|
61 |
+
|
62 |
+
## TECHNICAL SKILLS
|
63 |
+
|
64 |
+
* **Languages:** Python, R, SQL, LaTeX
|
65 |
+
* **Software:** MATLAB, STATA, SPSS, SAS, NCSS
|
66 |
+
* **Deep Learning Frameworks:** TensorFlow, PyTorch
|
67 |
+
* **Distributed & Cloud Computing:** AWS, GCP, GALAXY
|
68 |
+
* **Operating Systems:** Windows, Linux
|
69 |
+
* **IDE:** Spyder, Jupyter Notebook, VS Code, RStudio
|
70 |
+
|
71 |
+
---
|
72 |
+
|
73 |
+
## AWARDS & RECOGNITION
|
74 |
+
|
75 |
+
* **2021:** Awarded PhD Scholarship (Tuition Fee and Stipend)
|
76 |
+
* **2019:** Completed Masters by Research with [High Distinction](https://drive.google.com/file/d/19ItaTbByg686UpoBMB7LcmWT8kfE1-fR/view?usp=sharing)
|
77 |
+
* **2017:** RMIT Research Stipend Scholarship
|
78 |
+
* **2017:** RMIT Research International Tuition Fee Scholarship
|
79 |
+
* **2013:** B.Sc. in Electrical and Electronic Engineering with High Distinction
|
80 |
+
* **2013:** [Vice Chancellor Award Spring 2013](https://drive.google.com/file/d/1VgqAWfSlHtm5OEepYtlB32kxdlV72W1g/view?usp=sharing), BRAC University
|
81 |
+
* **2010:** [Dean Award Fall 2010](https://drive.google.com/file/d/15G0CGXYdDrMdB93LKB90uICPeJMYoLub/view?usp=sharing), [Fall 2011](https://drive.google.com/file/d/1xawevXKfahsE2LUrLAoUTn5PLjDIjyHr/view?usp=sharing), BRAC University
|
82 |
+
|
83 |
+
---
|
84 |
+
|
85 |
+
## PARTICIPATED EVENTS
|
86 |
+
|
87 |
+
* **2019:** Received Training on [NGS RNA Seq. & DNA Seq.](https://drive.google.com/file/d/1kHxtVXS1oD8BjrSqP8lM9koNA4PsT8WB/view?usp=sharing) Data Analysis organized by ArrayGen
|
88 |
+
* **2017:** Presented [Poster](https://drive.google.com/file/d/1K64iv74oatvbMmQYNHpyJgoGDvqRoW_V/view?usp=sharing) in [AMSI BioinfoSummer](https://drive.google.com/file/d/12Y2haYCtShJuEV0lsqeAiJgKtuRKGo_c/view?usp=sharing) at Monash University
|
89 |
+
* **2017:** Presented Thesis in [3 Minute Thesis (3MT)](https://drive.google.com/file/d/1AYj6Yox5GH285b4M7hh7rTxn4OyiPwMm/view?usp=sharing) competition at RMIT University
|
90 |
+
* **2017:** Received Training on High Performance Computing (HPC) at Monash University
|
91 |
+
* **2017:** Symposium on Big Data in Infectious Diseases at University of Melbourne
|
92 |
+
* **2016:** Received Training on Research Methodology at World University
|
93 |
+
* **2013:** Presented Undergraduate Thesis in a Workshop Organized by [IEEE Bangladesh](https://drive.google.com/file/d/1PPs1qlOjDDSZIXmaXWAL66q-WBBlz4i6/view?usp=sharing)
|
94 |
+
|
95 |
+
---
|
96 |
+
|
97 |
+
## PUBLICATIONS
|
98 |
+
|
99 |
+
### JOURNAL PAPERS
|
100 |
+
* 📓 R. K. Mondol, E. K. A. Millar, P. H. Graham, L. Browne, A. Sowmya, and E. Meijering, ["GRAPHITE: Graph-Based Interpretable Tissue Examination for Enhanced Explainability in Breast Cancer Histopathology,"](https://arxiv.org/abs/2501.04206) (Submitted, Under Review), 2024.
|
101 |
+
* 📓 R. K. Mondol, E. K. A. Millar, A. Sowmya, and E. Meijering, ["BioFusionNet: Deep Learning-Based Survival Risk Stratification in ER+ Breast Cancer Through Multifeature and Multimodal Data Fusion,"](https://ieeexplore.ieee.org/document/10568932) in *IEEE Journal of Biomedical and Health Informatics*, 2024.
|
102 |
+
* 📓 R. K. Mondol, E. K. A. Millar, P. H. Graham, L. Browne, A. Sowmya, and E. Meijering, ["hist2RNA: An Efficient Deep Learning Architecture to Predict Gene Expression from Breast Cancer Histopathology Images,"](https://www.mdpi.com/2072-6694/15/9/2569) in *Cancers*, 2023.
|
103 |
+
* 📓 R. K. Mondol, N. D. Truong, M. Reza, S. Ippolito, E. Ebrahimie, and O. Kavehei, ["AFExNet: An Adversarial Autoencoder for Differentiating Breast Cancer Sub-types and Extracting Biologically Relevant Genes,"](https://ieeexplore.ieee.org/document/9378938) in *IEEE/ACM Transactions on Computational Biology and Bioinformatics*, 2021.
|
104 |
+
|
105 |
+
### CONFERENCE PROCEEDINGS
|
106 |
+
* 📄 R. K. Mondol, E. K. A. Millar, A. Sowmya, and E. Meijering, ["MM-Survnet: Deep Learning-Based Survival Risk Stratification in Breast Cancer Through Multimodal Data Fusion,"](https://doi.org/10.1109/ISBI56570.2024.10635810) in *2024 IEEE International Symposium on Biomedical Imaging (ISBI),* Athens, Greece, 2024, pp. 1-5.
|
107 |
+
* 📄 M.I. Khan, R. K. Mondol, M.A. Zamee, and T.A. Tarique, ["Hardware architecture design of anemia detecting regression model based on FPGA,"](http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6850814&isnumber=6850678) in *International Conference on Informatics, Electronics & Vision (ICIEV),* May 2014, pp. 1-5.
|
108 |
+
* 📄 Imran Khan, and R. K. Mondol, ["FPGA based leaf chlorophyll estimating regression model,"](http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7083557&isnumber=7083385) in *International Conference on Software, Knowledge, Information Management and Applications (SKIMA),* December 2014, pp. 1-6.
|
109 |
+
* 📄 R. K. Mondol, Imran Khan, Md. A.K. Mahbubul Hye, and Asif Hassan, ["Hardware architecture design of face recognition system based on FPGA,"](http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7193228&isnumber=7192777) in *International Conference on Innovations in Information Embedded and Communication Systems (ICIIECS),* March 2015, pp. 1-5.
|
110 |
+
* 📄 A. Hassan, R. K. Mondol, and M. R. Hasan, ["Computer network design of a company — A simplistic way,"](https://doi.org/10.1109/ICACCS.2015.7324121) in *2015 International Conference on Advanced Computing and Communication Systems (ICACCS),* Coimbatore, India, March 2015, pp. 1-4.
|
knowledge_base/experience_detailed.md
ADDED
@@ -0,0 +1,141 @@
1 |
+
# Detailed Professional Experience
|
2 |
+
|
3 |
+
## Current Position: Casual Academic at UNSW Sydney (July 2021 - Present)
|
4 |
+
|
5 |
+
### Role and Responsibilities
|
6 |
+
As a Casual Academic in the School of Computer Science and Engineering, Raktim contributes to undergraduate and postgraduate education while pursuing his PhD research.
|
7 |
+
|
8 |
+
**Teaching Duties**:
|
9 |
+
- Conduct laboratory sessions for computer science courses
|
10 |
+
- Lead tutorial classes on programming and algorithms
|
11 |
+
- Provide one-on-one mentoring to students
|
12 |
+
- Assist in course material development and updates
|
13 |
+
- Grade assignments and provide constructive feedback
|
14 |
+
|
15 |
+
**Courses Taught**:
|
16 |
+
- COMP1511: Programming Fundamentals
|
17 |
+
- COMP2521: Data Structures and Algorithms
|
18 |
+
- COMP3311: Database Systems
|
19 |
+
- COMP9417: Machine Learning and Data Mining
|
20 |
+
|
21 |
+
**Student Impact**:
|
22 |
+
- Mentored over 200 students across various courses
|
23 |
+
- Developed innovative teaching materials for complex concepts
|
24 |
+
- Received positive feedback for clear explanations and patient guidance
|
25 |
+
- Helped students transition from theoretical concepts to practical implementation
|
26 |
+
|
27 |
+
### Research Integration
|
28 |
+
- Incorporates current research findings into teaching materials
|
29 |
+
- Supervises undergraduate research projects
|
30 |
+
- Collaborates with faculty on curriculum development
|
31 |
+
- Organizes workshops on AI and machine learning topics
|
32 |
+
|
33 |
+
## Previous Role: Teaching Assistant at RMIT University (July 2017 - October 2019)
|
34 |
+
|
35 |
+
### Academic Responsibilities
|
36 |
+
During his Master's program, Raktim served as a Teaching Assistant, gaining valuable experience in higher education.
|
37 |
+
|
38 |
+
**Key Contributions**:
|
39 |
+
- Conducted weekly laboratory sessions for 50+ students
|
40 |
+
- Assisted in course delivery for computer science subjects
|
41 |
+
- Developed supplementary learning materials
|
42 |
+
- Provided technical support for programming assignments
|
43 |
+
|
44 |
+
**Courses Supported**:
|
45 |
+
- Introduction to Programming (Java, Python)
|
46 |
+
- Data Structures and Algorithms
|
47 |
+
- Database Systems
|
48 |
+
- Software Engineering Fundamentals
|
49 |
+
|
50 |
+
**Skills Developed**:
|
51 |
+
- Effective communication of complex technical concepts
|
52 |
+
- Patience and adaptability in teaching diverse student groups
|
53 |
+
- Time management and organizational skills
|
54 |
+
- Collaborative work with academic staff
|
55 |
+
|
56 |
+
### Research Activities
|
57 |
+
- Conducted literature reviews for research projects
|
58 |
+
- Participated in research group meetings
|
59 |
+
- Presented findings at internal seminars
|
60 |
+
- Collaborated on data collection and analysis
|
61 |
+
|
62 |
+
## Early Career: Lecturer at World University of Bangladesh (September 2013 - December 2016)
|
63 |
+
|
64 |
+
### Full-Time Academic Position
|
65 |
+
After completing his Bachelor's degree, Raktim joined as a full-time Lecturer in the Department of Computer Science and Engineering.
|
66 |
+
|
67 |
+
**Teaching Portfolio**:
|
68 |
+
- **Programming Courses**: C, C++, Java, Python programming
|
69 |
+
- **Core CS Subjects**: Data Structures, Algorithms, Database Systems
|
70 |
+
- **Mathematics**: Discrete Mathematics, Statistics for CS
|
71 |
+
- **Specialized Topics**: Computer Networks, Operating Systems
|
72 |
+
|
73 |
+
**Administrative Duties**:
|
74 |
+
- Course coordinator for multiple subjects
|
75 |
+
- Examination committee member
|
76 |
+
- Student advisor and mentor
|
77 |
+
- Curriculum development participant
|
78 |
+
|
79 |
+
### Student Supervision
|
80 |
+
- **Thesis Supervision**: Guided 15+ undergraduate thesis projects
|
81 |
+
- **Project Mentoring**: Supervised capstone projects in software development
|
82 |
+
- **Research Guidance**: Introduced students to research methodologies
|
83 |
+
- **Career Counseling**: Provided guidance on academic and career paths
|
84 |
+
|
85 |
+
**Notable Projects Supervised**:
|
86 |
+
- Web-based student management systems
|
87 |
+
- Mobile applications for local businesses
|
88 |
+
- Data analysis projects for social impact
|
89 |
+
- Machine learning applications in healthcare
|
90 |
+
|
91 |
+
### Professional Development
|
92 |
+
- Attended faculty development programs
|
93 |
+
- Participated in curriculum review committees
|
94 |
+
- Engaged in continuous learning through online courses
|
95 |
+
- Built networks with industry professionals
|
96 |
+
|
97 |
+
### Impact and Recognition
|
98 |
+
- Consistently received high student evaluation scores
|
99 |
+
- Recognized for innovative teaching methods
|
100 |
+
- Contributed to department's accreditation process
|
101 |
+
- Helped establish computer lab facilities
|
102 |
+
|
103 |
+
## Skills Developed Through Experience
|
104 |
+
|
105 |
+
### Teaching and Communication
|
106 |
+
- **Pedagogical Skills**: Developed effective teaching strategies for diverse learning styles
|
107 |
+
- **Public Speaking**: Comfortable presenting to large audiences
|
108 |
+
- **Technical Communication**: Ability to explain complex concepts simply
|
109 |
+
- **Cross-cultural Communication**: Experience with international student populations
|
110 |
+
|
111 |
+
### Leadership and Management
|
112 |
+
- **Team Coordination**: Led teaching teams and research groups
|
113 |
+
- **Project Management**: Managed multiple courses and research projects simultaneously
|
114 |
+
- **Mentoring**: Guided students and junior colleagues
|
115 |
+
- **Conflict Resolution**: Handled academic disputes and student concerns
|
116 |
+
|
117 |
+
### Technical and Research
|
118 |
+
- **Curriculum Development**: Designed course content aligned with industry needs
|
119 |
+
- **Assessment Design**: Created fair and comprehensive evaluation methods
|
120 |
+
- **Research Methodology**: Applied rigorous research practices
|
121 |
+
- **Technology Integration**: Incorporated new technologies into teaching
|
122 |
+
|
123 |
+
## Professional Networks and Collaborations
|
124 |
+
|
125 |
+
### Academic Collaborations
|
126 |
+
- **UNSW Research Groups**: Active member of multiple research teams
|
127 |
+
- **International Collaborations**: Partnerships with researchers globally
|
128 |
+
- **Industry Connections**: Collaborations with healthcare institutions
|
129 |
+
- **Conference Networks**: Regular participant in academic conferences
|
130 |
+
|
131 |
+
### Professional Memberships
|
132 |
+
- IEEE Computer Society member
|
133 |
+
- ACM member
|
134 |
+
- Australian Computer Society (ACS) member
|
135 |
+
- Bioinformatics Australia member
|
136 |
+
|
137 |
+
### Community Engagement
|
138 |
+
- **Peer Review**: Regular reviewer for academic journals
|
139 |
+
- **Conference Organization**: Committee member for academic conferences
|
140 |
+
- **Outreach Programs**: Participant in STEM education initiatives
|
141 |
+
- **Open Source Contributions**: Active contributor to research software projects
|
knowledge_base/publications_detailed.md
ADDED
@@ -0,0 +1,82 @@
1 |
+
# Detailed Publications and Research Contributions
|
2 |
+
|
3 |
+
## BioFusionNet (2024)
|
4 |
+
**Full Title**: "BioFusionNet: Deep Learning-Based Survival Risk Stratification in ER+ Breast Cancer Through Multifeature and Multimodal Data Fusion"
|
5 |
+
|
6 |
+
**Journal**: IEEE Journal of Biomedical and Health Informatics
|
7 |
+
|
8 |
+
**Key Contributions**:
|
9 |
+
- Novel multimodal fusion architecture combining histopathology, genomics, and clinical data
|
10 |
+
- Attention-based feature selection for interpretability
|
11 |
+
- Superior performance compared to existing methods
|
12 |
+
- Clinical validation on large patient cohorts
|
13 |
+
|
14 |
+
**Technical Details**:
|
15 |
+
- Uses ResNet-based feature extraction for histopathology images
|
16 |
+
- Implements cross-attention mechanisms for data fusion
|
17 |
+
- Employs survival analysis with Cox proportional hazards
|
18 |
+
- Achieves a mean concordance index (C-index) of 0.77
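
The concordance index named in the last item can be illustrated with a minimal sketch. This is a plain-Python version of Harrell's C-index under simple assumptions (pairs with tied follow-up times are skipped, tied risk scores count as half); the function name `concordance_index` and its arguments are illustrative and are not taken from the BioFusionNet code.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs ranked correctly by risk.

    times  -- observed follow-up times
    events -- 1 if the event was observed, 0 if censored
    risks  -- predicted risk scores (higher means an earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the subject with the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the
        # shorter follow-up ended in an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # higher risk failed first: correct ordering
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up for illustration): 4 of 5 comparable pairs are ordered correctly.
print(concordance_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.8, 0.1]))  # 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ordering of patients by risk.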
|
19 |
+
|
20 |
+
**Impact**: This work provides clinicians with a comprehensive tool for patient risk assessment, enabling personalized treatment planning.
|
21 |
+
<!-- This is code for this paper -->
|
22 |
+
**GitHub**: [raktim-mondol/BioFusionNet](https://github.com/raktim-mondol/BioFusionNet)
|
23 |
+
|
24 |
+
## hist2RNA (2023)
|
25 |
+
**Full Title**: "hist2RNA: An Efficient Deep Learning Architecture to Predict Gene Expression from Breast Cancer Histopathology Images"
|
26 |
+
|
27 |
+
**Journal**: Cancers
|
28 |
+
|
29 |
+
**Key Contributions**:
|
30 |
+
- Direct prediction of gene expression from tissue images
|
31 |
+
- Efficient architecture suitable for clinical deployment
|
32 |
+
- Identification of morphology-gene expression relationships
|
33 |
+
- Validation across multiple cancer datasets
|
34 |
+
|
35 |
+
**Technical Details**:
|
36 |
+
- Custom CNN architecture optimized for gene expression prediction
|
37 |
+
- Multi-task learning framework
|
38 |
+
- Attention mechanisms for spatial feature importance
|
39 |
+
- Correlation analysis with known biological pathways
|
40 |
+
|
41 |
+
**Impact**: Enables gene expression profiling without expensive molecular assays, making personalized medicine more accessible.
|
42 |
+
<!-- This is code for this paper -->
|
43 |
+
**GitHub**: [raktim-mondol/hist2RNA](https://github.com/raktim-mondol/hist2RNA)
|
44 |
+
|
45 |
+
## AFExNet (2021)
|
46 |
+
**Full Title**: "AFExNet: An Adversarial Autoencoder for Differentiating Breast Cancer Sub-types and Extracting Biologically Relevant Genes"
|
47 |
+
|
48 |
+
**Journal**: IEEE/ACM Transactions on Computational Biology and Bioinformatics
|
49 |
+
|
50 |
+
**Key Contributions**:
|
51 |
+
- Adversarial training for robust feature learning
|
52 |
+
- Automatic biomarker discovery
|
53 |
+
- Cancer subtype classification
|
54 |
+
- Biologically interpretable features
|
55 |
+
|
56 |
+
**Technical Details**:
|
57 |
+
- Adversarial autoencoder architecture
|
58 |
+
- Gene selection based on reconstruction importance
|
59 |
+
- Validation on TCGA datasets
|
60 |
+
- Pathway enrichment analysis
|
61 |
+
|
62 |
+
**Impact**: Provides insights into cancer biology while achieving high classification accuracy.
|
63 |
+
<!-- This is code for this paper -->
|
64 |
+
**GitHub**: [raktim-mondol/breast-cancer-sub-types](https://github.com/raktim-mondol/breast-cancer-sub-types)
|
65 |
+
|
66 |
+
## Ongoing Research
|
67 |
+
|
68 |
+
### Multimodal Foundation Models
|
69 |
+
- Developing foundation models for medical imaging
|
70 |
+
- Pre-training on large-scale medical datasets
|
71 |
+
- Transfer learning for rare diseases
|
72 |
+
|
73 |
+
### Large Language Models
|
74 |
+
- Large Language Models (LLMs)
|
75 |
+
- Retrieval-Augmented Generation (RAG)
|
76 |
+
- Fine-tuning and domain adaptation
|
77 |
+
|
78 |
+
|
79 |
+
### AI Ethics in Healthcare
|
80 |
+
- Bias detection and mitigation
|
81 |
+
- Fairness in medical AI
|
82 |
+
- Regulatory compliance frameworks
|
knowledge_base/research_details.md
ADDED
@@ -0,0 +1,45 @@
1 |
+
# Detailed Research Information
|
2 |
+
|
3 |
+
## PhD Research: Deep Learning Based Prognosis and Explainability for Breast Cancer
|
4 |
+
|
5 |
+
### Research Objectives
|
6 |
+
1. Develop novel deep learning architectures for breast cancer survival prediction
|
7 |
+
2. Create explainable AI models that clinicians can trust and understand
|
8 |
+
3. Integrate multimodal data (histopathology images, genomics, clinical data)
|
9 |
+
4. Build treatment recommendation systems based on patient-specific factors
|
10 |
+
|
11 |
+
### Key Innovations
|
12 |
+
- **BioFusionNet**: A multimodal fusion network that combines histopathology images with genomic and clinical data for survival risk stratification
|
13 |
+
- **hist2RNA**: An efficient architecture that predicts gene expression directly from histopathology images
|
14 |
+
- **AFExNet**: An adversarial autoencoder for cancer subtype classification and biomarker discovery
|
15 |
+
|
16 |
+
### Technical Approach
|
17 |
+
- Utilizes attention mechanisms for interpretability
|
18 |
+
- Employs transfer learning from pre-trained vision models
|
19 |
+
- Implements novel fusion strategies for multimodal data
|
20 |
+
- Uses adversarial training for robust feature learning
|
21 |
+
|
22 |
+
### Clinical Impact
|
23 |
+
The research aims to provide clinicians with:
|
24 |
+
- More accurate prognosis predictions
|
25 |
+
- Personalized treatment recommendations
|
26 |
+
- Explainable AI decisions for clinical trust
|
27 |
+
- Cost-effective diagnostic tools
|
28 |
+
|
29 |
+
## Current Projects
|
30 |
+
|
31 |
+
### Large Language Models for Healthcare
|
32 |
+
- Fine-tuning LLMs for medical text analysis
|
33 |
+
- Developing RAG systems for clinical decision support
|
34 |
+
- Creating conversational AI for patient education
|
35 |
+
|
36 |
+
### Multimodal AI Systems
|
37 |
+
- Vision-language models for medical imaging
|
38 |
+
- Cross-modal retrieval systems
|
39 |
+
- Multimodal fusion architectures
|
40 |
+
|
41 |
+
### Explainable AI
|
42 |
+
- Attention visualization techniques
|
43 |
+
- Counterfactual explanations
|
44 |
+
- Feature importance analysis
|
45 |
+
- Clinical decision support systems
|
knowledge_base/skills_expertise.md
ADDED
@@ -0,0 +1,142 @@
1 |
+
# Technical Skills and Expertise
|
2 |
+
|
3 |
+
## Deep Learning and Machine Learning
|
4 |
+
|
5 |
+
### Core Frameworks
|
6 |
+
- **PyTorch**: Advanced proficiency in model development, custom layers, and distributed training
|
7 |
+
- **TensorFlow**: Experience with TensorFlow 2.x, Keras, and TensorFlow Serving
|
8 |
+
- **Hugging Face Transformers**: Fine-tuning, model deployment, and custom tokenizers
|
9 |
+
- **scikit-learn**: Classical ML algorithms, preprocessing, and model evaluation
|
10 |
+
|
11 |
+
### Specialized Techniques
|
12 |
+
- **Transfer Learning**: Pre-trained model adaptation, domain adaptation
|
13 |
+
- **Attention Mechanisms**: Self-attention, cross-attention, multi-head attention
|
14 |
+
- **Adversarial Training**: GANs, adversarial autoencoders, robust training
|
15 |
+
- **Multi-task Learning**: Joint optimization, task balancing, shared representations
|
16 |
+
- **Meta-Learning**: Few-shot learning, model-agnostic meta-learning
|
17 |
+
|
18 |
+
## Large Language Models and NLP
|
19 |
+
|
20 |
+
### LLM Technologies
|
21 |
+
- **Parameter-Efficient Fine-tuning**: LoRA, QLoRA, AdaLoRA, Prefix tuning
|
22 |
+
- **Quantization**: GPTQ, GGUF, 8-bit and 4-bit quantization
|
23 |
+
- **Model Optimization**: Pruning, distillation, efficient architectures
|
24 |
+
- **Prompt Engineering**: Chain-of-thought, few-shot prompting, instruction tuning
|
25 |
+
|
26 |
+
### NLP Applications
|
27 |
+
- **Text Generation**: Controlled generation, style transfer, summarization
|
28 |
+
- **Information Extraction**: Named entity recognition, relation extraction
|
29 |
+
- **Question Answering**: Reading comprehension, open-domain QA
|
30 |
+
- **Sentiment Analysis**: Aspect-based sentiment, emotion detection
|
31 |
+
|
32 |
+
## Computer Vision and Medical Imaging
|
33 |
+
|
34 |
+
### Vision Architectures
|
35 |
+
- **Backbone Architectures**: ResNet, DenseNet, EfficientNet, Vision Transformers
|
36 |
+
- **Object Detection**: YOLO, R-CNN family, DETR
|
37 |
+
- **Segmentation**: U-Net, Mask R-CNN, Segment Anything Model (SAM)
|
38 |
+
- **Medical Imaging**: Specialized architectures for histopathology, radiology
|
39 |
+
|
40 |
+
### Image Processing
|
41 |
+
- **Preprocessing**: Normalization, augmentation, color space conversion
|
42 |
+
- **Feature Extraction**: SIFT, HOG, deep features
|
43 |
+
- **Registration**: Image alignment, geometric transformations
|
44 |
+
- **Quality Assessment**: Blur detection, artifact identification
|
45 |
+
|
46 |
+
## Multimodal AI and Fusion
|
47 |
+
|
48 |
+
### Multimodal Architectures
|
49 |
+
- **Vision-Language Models**: CLIP, BLIP, LLaVA, DALL-E
|
50 |
+
- **Fusion Strategies**: Early fusion, late fusion, attention-based fusion
|
51 |
+
- **Cross-modal Retrieval**: Image-text matching, semantic search
|
52 |
+
- **Multimodal Generation**: Text-to-image, image captioning
|
53 |
+
|
54 |
+
### Data Integration
|
55 |
+
- **Heterogeneous Data**: Combining images, text, tabular data
|
56 |
+
- **Temporal Fusion**: Time-series integration, sequential modeling
|
57 |
+
- **Graph Neural Networks**: Relational data modeling, knowledge graphs
|
58 |
+
|
59 |
+
## Retrieval-Augmented Generation (RAG)
|
60 |
+
|
61 |
+
### Vector Databases
|
62 |
+
- **FAISS**: Efficient similarity search, index optimization
|
63 |
+
- **ChromaDB**: Document storage and retrieval
|
64 |
+
- **Weaviate**: Vector search with filtering
|
65 |
+
- **Milvus**: Scalable vector database management
|
66 |
+
|
67 |
+
### Retrieval Techniques
|
68 |
+
- **Dense Retrieval**: Bi-encoder architectures, contrastive learning
|
69 |
+
- **Sparse Retrieval**: BM25, TF-IDF, keyword matching
|
70 |
+
- **Hybrid Search**: Combining dense and sparse methods
|
71 |
+
- **Re-ranking**: Cross-encoder models, relevance scoring
|
72 |
+
|
73 |
+
### RAG Optimization
|
74 |
+
- **Chunk Strategies**: Document segmentation, overlap handling
|
75 |
+
- **Embedding Models**: Sentence transformers, domain-specific embeddings
|
76 |
+
- **Query Enhancement**: Query expansion, reformulation
|
77 |
+
- **Context Management**: Relevance filtering, context compression
|
78 |
+
|
79 |
+
## Bioinformatics and Computational Biology
|
80 |
+
|
81 |
+
### Genomics
|
82 |
+
- **Sequence Analysis**: Alignment algorithms, variant calling
|
83 |
+
- **Gene Expression**: RNA-seq analysis, differential expression
|
84 |
+
- **Pathway Analysis**: Enrichment analysis, network biology
|
85 |
+
- **Population Genetics**: GWAS, linkage analysis
|
86 |
+
|
87 |
+
### Proteomics
|
88 |
+
- **Protein Structure**: Structure prediction, folding analysis
|
89 |
+
- **Mass Spectrometry**: Data processing, protein identification
|
90 |
+
- **Protein-Protein Interactions**: Network analysis, functional prediction
|
91 |
+
|
92 |
+
### Systems Biology
|
93 |
+
- **Network Analysis**: Graph theory, centrality measures
|
94 |
+
- **Mathematical Modeling**: Differential equations, stochastic models
|
95 |
+
- **Multi-omics Integration**: Data fusion, pathway reconstruction
|
96 |
+
|
97 |
+
## Cloud Computing and MLOps
|
98 |
+
|
99 |
+
### Cloud Platforms
|
100 |
+
- **AWS**: EC2, S3, SageMaker, Lambda, ECS
|
101 |
+
- **Google Cloud**: Compute Engine, Cloud Storage, Vertex AI
|
102 |
+
- **Azure**: Virtual Machines, Blob Storage, Machine Learning Studio
|
103 |
+
|
104 |
+
### MLOps Tools
|
105 |
+
- **Model Versioning**: MLflow, DVC, Weights & Biases
|
106 |
+
- **Containerization**: Docker, Kubernetes, container orchestration
|
107 |
+
- **CI/CD**: GitHub Actions, Jenkins, automated testing
|
108 |
+
- **Monitoring**: Model drift detection, performance monitoring
|
109 |
+
|
110 |
+
### Distributed Computing
|
111 |
+
- **Parallel Processing**: Multi-GPU training, data parallelism
|
112 |
+
- **Cluster Computing**: Spark, Dask, distributed training
|
113 |
+
- **Resource Management**: SLURM, job scheduling, resource optimization
|
114 |
+
|
115 |
+
## Programming and Software Development
|
116 |
+
|
117 |
+
### Programming Languages
|
118 |
+
- **Python**: Advanced proficiency, scientific computing, web development
|
119 |
+
- **R**: Statistical analysis, bioinformatics packages, visualization
|
120 |
+
- **SQL**: Database design, query optimization, data warehousing
|
121 |
+
- **JavaScript/TypeScript**: Web development, Node.js, React
|
122 |
+
- **Bash/Shell**: System administration, automation scripts
|
123 |
+
|
124 |
+
### Development Tools
|
125 |
+
- **Version Control**: Git, GitHub, collaborative development
|
126 |
+
- **IDEs**: VS Code, PyCharm, Jupyter notebooks
|
127 |
+
- **Documentation**: Sphinx, MkDocs, technical writing
|
128 |
+
- **Testing**: Unit testing, integration testing, test-driven development
|
129 |
+
|
130 |
+
## Research and Academic Skills
|
131 |
+
|
132 |
+
### Research Methodology
|
133 |
+
- **Experimental Design**: Hypothesis testing, statistical power analysis
|
134 |
+
- **Literature Review**: Systematic reviews, meta-analysis
|
135 |
+
- **Peer Review**: Journal reviewing, conference reviewing
|
136 |
+
- **Grant Writing**: Research proposals, funding applications
|
137 |
+
|
138 |
+
### Communication
|
139 |
+
- **Technical Writing**: Research papers, documentation, tutorials
|
140 |
+
- **Presentations**: Conference talks, poster presentations
|
141 |
+
- **Teaching**: Course development, student mentoring
|
142 |
+
- **Collaboration**: Interdisciplinary research, team leadership
|
knowledge_base/statistics.md
ADDED
@@ -0,0 +1,105 @@
1 |
+
### **Raktim Mondol: A Portfolio of Applied Statistical Methods (Biostatistics)**
|
2 |
+
|
3 |
+
This portfolio highlights Raktim Mondol's expertise in applying sophisticated statistical methods to solve complex problems in biomedical research, as demonstrated through his key publications.
|
4 |
+
|
5 |
+
---
|
6 |
+
## **1. BioFusionNet: Survival Risk Stratification *IEEE JBHI 2024***
|
7 |
+
|
8 |
+
This work develops a novel weighted Cox loss function to address imbalanced event data, a common challenge in survival studies.
|
9 |
+
|
10 |
+
* **Statistical Method:** **Advanced Survival Analysis** and **Custom Statistical Model Development**.
|
11 |
+
* **Application & Findings:** The core contribution was the development of a **novel weighted Cox loss function**, specifically designed to handle the prevalent issue of imbalanced data in survival analysis. This was integrated into a multimodal deep learning framework. The effectiveness of this approach was validated using multivariate Cox proportional hazards models, which evaluated multiple risk factors simultaneously. The model achieved a high mean concordance index (C-index) of 0.77, underscoring a sophisticated capability to design, develop, and validate complex statistical models for high-accuracy patient risk stratification.
|
12 |
+
|
13 |
+
| Statistical method | Why used / implication | Technical depth | Key results |
|
14 |
+
|---|---|---|---|
|
15 |
+
| **Weighted Cox loss** (novel) | Custom loss to up-weight rare death events during deep-net training. | Implements instance-level weighting inside mini-batch; balances censoring. | Outperformed classic Cox loss: C-index↑ from 0.67 → 0.77 (mean over 5 folds). |
|
16 |
+
| **Concordance index (C-index)** | Primary metric for patient-level risk ranking. | Survival-analysis staple; here averaged over 5-fold CV. | Mean C-index = 0.77 ± 0.05. |
|
17 |
+
| **Time-dependent AUC** | Evaluates discrimination at multiple horizons (0–10 y). | Integrates cumulative/dynamic ROC; more demanding than simple AUC. | Mean AUC = 0.84 ± 0.05. |
|
18 |
+
| **Univariate & multivariate Cox models** | Compare BioFusionNet risk groups to standard clinico-path variables. | Same framework as hist2RNA but with weighting. | Univariate HR = 2.99 (1.88–4.78); multivariate HR = 2.91 (1.80–4.68); both p < 0.005. |
|
19 |
+
| **Kaplan–Meier & log-rank** | Visual confirmation of high- vs low-risk separation. | Standard survival plotting. | Log-rank p = 6.45 × 10⁻⁷. |
|
20 |
+
| **Five-fold stratified cross-validation** | Robust estimate of generalisation; preserves event ratio. | Good ML practice. | Fold C-indices: 0.72–0.82. |
|
21 |
+
| **Paired model benchmarking** | Compared C-index / AUC vs six multimodal baselines. | Uses identical Optuna-tuned hyperparameters for a fair test. | BioFusionNet best by ≥ 0.07 C-index. |
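
The weighted Cox loss row above can be illustrated with a minimal sketch. This is not the BioFusionNet formulation, which is not reproduced in this document; it is a generic negative Cox partial log-likelihood in which each observed event carries its own weight. The function name, the `weights` argument, and the normalisation by the sum of event weights are all assumptions made for illustration.

```python
import math

def weighted_cox_loss(risk_scores, times, events, weights):
    """Negative weighted Cox partial log-likelihood (illustrative sketch).

    risk_scores: model outputs, higher means higher predicted hazard
    times:       follow-up times
    events:      1 if the event was observed, 0 if censored
    weights:     hypothetical per-sample weights (assumption, not from the paper)
    """
    n = len(risk_scores)
    total = 0.0
    weight_sum = 0.0
    for i in range(n):
        if not events[i]:
            continue  # censored samples only appear inside risk sets
        # Risk set: every sample still under observation at time t_i
        risk_set = [risk_scores[j] for j in range(n) if times[j] >= times[i]]
        # log-sum-exp with max subtraction for numerical stability
        m = max(risk_set)
        log_denominator = m + math.log(sum(math.exp(r - m) for r in risk_set))
        total += weights[i] * (risk_scores[i] - log_denominator)
        weight_sum += weights[i]
    # Normalise by the total event weight; return 0 if the batch has no events
    return -total / weight_sum if weight_sum else 0.0
```

With equal weights this reduces to the ordinary Cox loss averaged over events; larger weights on selected events make their ranking errors count more in a mini-batch.
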
|
22 |
+
---
|
23 |
+
|
24 |
+
## **2. hist2RNA: Predicting Gene Expression from Histopathology *Cancers 2023***
|
25 |
+
|
26 |
+
This paper highlights a comprehensive application of survival, regression, and comparative statistics to validate a deep learning model.
|
27 |
+
|
28 |
+
* **Statistical Methods:** **Survival Analysis (Kaplan-Meier, Cox Models)**, **Regression/Correlation Analysis (Spearman, R²)**, and **Comparative Analysis (t-tests, ANOVA)**.
|
29 |
+
* **Application & Findings:** A full suite of survival analysis techniques was conducted. Kaplan-Meier estimation and log-rank tests were used to visualize and compare survival distributions between patient groups. Both univariate and multivariate Cox proportional hazards models were employed to identify significant prognostic markers and quantify their risk using hazard ratios. To validate the deep learning model's predictions, Spearman rank correlation and the coefficient of determination (R²) were used to measure the association between predicted and actual gene expression. T-tests and ANOVA were also applied to compare biomarker expressions across different tumor subgroups, demonstrating a versatile command of hypothesis testing.
|
30 |
+
|
31 |
+
| Statistical method | Why used / implication | Technical depth | Key results |
|
32 |
+
|---|---|---|---|
|
33 |
+
| **Spearman rank correlation (ρ)** with Benjamini-Hochberg **FDR** adjustment | Quantify how well predicted vs. true gene-expression ranks agree; FDR guards against 138 parallel tests. | Non-parametric; multiple-testing control. | Across patients: median ρ = 0.82 (p = 4.3 × 10⁻⁶⁴). Across genes: median ρ = 0.29 with 105/138 genes significant at 5 % FDR. |
|
34 |
+
| **Coefficient of determination (R²)** | Measures variance explained by the model for each gene. | Classical regression statistic. | 32 genes had R² ≥ 0.10; 17 belong to PAM50 set. |
|
35 |
+
| **Two-sample t-tests** | Compare predicted gene expression between IHC-positive vs. IHC-negative tumours (ER, PR, HER2). | Parametric difference-of-means; assumes normality. | e.g., ESR1 t-test p = 4.2 × 10⁻⁵⁴ (ER⁺ vs ER⁻). |
|
36 |
+
| **One-way ANOVA** | Assess trends in predicted MKI67 across tumour grades 1–3. | Parametric multi-group comparison. | MKI67 ANOVA p = 9.9 × 10⁻⁹. |
|
37 |
+
| **Concordance index (c-index)** | Rank-based discrimination for survival predictions. | Survival-analysis metric independent of time scale. | c-index = 0.56 (univariate) improved to 0.65 in multivariate Cox model. |
|
38 |
+
| **Cox proportional-hazards (univariate & multivariate)** with **hazard ratio (HR ± 95 % CI)** | Test whether hist2RNA-derived luminal subtype is prognostic after adjusting for clinicopathology. | Semi-parametric survival model. | HR = 2.16 (1.12–3.06) univariate; HR = 1.87 (1.30–2.68) multivariate; p < 5 × 10⁻³. |
|
39 |
+
| **Log-rank test & Kaplan–Meier curves** | Visual and inferential check of survival separation between predicted LumA vs LumB. | Non-parametric time-to-event comparison. | Log-rank p < 5 × 10⁻³; clear survival divergence. |
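
The Benjamini-Hochberg adjustment named in the first row of the table can be sketched in a few lines. This is a generic implementation of the standard step-up procedure, not code from the hist2RNA paper; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical per-gene p-values, for illustration only
example_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005, 0.9])
```

A gene is then called significant at 5 % FDR when its adjusted p-value is at most 0.05, which is how a count such as "105/138 genes" would be obtained.
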
|
40 |
+
|
41 |
+
---
|
42 |
+
|
43 |
+
## **3. AFExNet: Differentiating Breast Cancer Sub-types *IEEE/ACM TCBB 2021***
|
44 |
+
|
45 |
+
This research demonstrates rigorous hypothesis testing to validate the superiority of a novel machine learning architecture for genomic data analysis.
|
46 |
+
|
47 |
+
* **Statistical Method:** **Hypothesis Testing (Paired and One-Tailed T-tests)**.
|
48 |
+
* **Application & Findings:** Paired t-tests were used to statistically compare the performance of the AFExNet feature extraction method against other techniques like PCA, VAE, and DAE. The tests evaluated the significance of differences in key classification metrics (precision, recall, accuracy, F1-score). The results confirmed that AFExNet's performance improvements were statistically significant, with p-values less than 0.10 (e.g., p=0.00793 vs. VAE). This rigorous statistical validation confirmed the robustness and superiority of the AFExNet model for analyzing high-dimensional genomic data.
|
49 |
+
|
50 |
+
| Statistical method | Why used / implication | Technical depth | Key results |
|
51 |
+
|---|---|---|---|
|
52 |
+
| **One-tailed paired Student t-tests** | Show that AFExNet's precision/recall gains vs. PCA, AE, VAE, DAE are not by chance. | Parametric paired design; reports t & p for four method comparisons. | Example: vs. PCA t = 1.92, p = 0.047 (precision); vs. VAE t = 2.85, p = 0.0079. |
|
53 |
+
| **Cross-validation (5-fold)** | Stability check of all 12 classifiers across metrics. | Standard ML validation. | Precision up to 85.9 %, recall 85.8 % with SVM. |
|
54 |
+
| **Confusion-matrix–derived metrics** – accuracy, precision, recall, F1, MCC, Cohen's κ, ROC-AUC | Multi-faceted performance portrait across imbalanced classes. | Mix of parametric & rank-based indices. | MCC 0.70 with voting classifier; AUC 0.84 with SVM. |
|
55 |
+
| **GO-term & KEGG pathway enrichment (DAVID)** with corrected p-values | Biological validation of genes extracted via latent-weight analysis. | Multiple-testing correction inside DAVID; p-value interpretation. | Top GO term "olfactory receptor activity", p = 5.92 × 10⁻²; pathway "olfactory transduction", p = 5.23 × 10⁻². |
|
56 |
+
| **SMOTE sampling** | Synthetic oversampling to counter class imbalance before training. | Resampling technique; not an inferential test but key pre-processing step. | Balanced minority classes without inflating Type I error downstream. |
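
As a rough illustration of the paired t-test row above, the sketch below computes the paired t statistic from per-fold scores of two methods. The function name and the idea of pairing by cross-validation fold are assumptions; the AFExNet paper's exact pairing is not described here.

```python
import math
import statistics

def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic and degrees of freedom for H1: mean(a - b) > 0."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return t_stat, n - 1
```

The one-tailed p-value is the upper-tail probability of a Student t distribution with the returned degrees of freedom. If SciPy is available, `scipy.stats.ttest_rel(scores_a, scores_b, alternative="greater")` should give the statistic and p-value directly.
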
|
57 |
+
---
|
58 |
+
|
59 |
+
## **4. Anemia Detection System *IEEE Proceedings 2014***
|
60 |
+
|
61 |
+
This project showcases the application of regression modeling for developing a non-invasive medical device.
|
62 |
+
|
63 |
+
* **Statistical Method:** **Multivariate Regression Analysis**.
|
64 |
+
* **Application & Findings:** A regression-based image processing method was employed to estimate hemoglobin (Hb) levels from non-invasive fingertip images. Using NCSS software, a multivariate regression model was developed that incorporated RGB color differences and nonlinear terms to establish a predictive relationship between blood color features and Hb concentration. The resulting statistical model successfully correlated with actual Hb levels, demonstrating that it could effectively predict hemoglobin concentration and was suitable for hardware implementation on an FPGA for rapid, non-invasive anemia screening.
|
65 |
+
|
66 |
+
| Statistical method | Why used / implication | Technical depth | Key results |
|
67 |
+
|---|---|---|---|
|
68 |
+
| **Multivariate polynomial regression** (quadratic & interaction terms) forming a **ratio model**: Hb = *N<sub>r</sub>/D<sub>r</sub>* | Maps colour-change features (ΔR, ΔG, ΔB) from fingertip images to haemoglobin level, enabling a fully non-invasive test. | • 9 predictors + constant per numerator/denominator (Eqs 4–6).<br>• Fitted with NCSS; coefficients quantised to IEEE-754 for FPGA. | Closed-form equation is given in the paper. |
|
69 |
+
| **Hardware/MATLAB parity test** | Verifies that floating-point RTL reproduces regression output *bit-for-bit* → builds trust in deployment. | Table III compares 5 pixel samples (R₁,G₁,B₁ …) through pipeline. | Hb error = **0** for all samples (e.g. 8.5692 g/dL in both MATLAB & Verilog) |
|
70 |
+
| **Threshold rule (≤ 10 g/dL)** | Converts continuous Hb to binary "anemic / normal" output for clinical screening. | Simple comparator inside FPGA; threshold from WHO ranges | Device toggles 1-bit flag when Hb ≤ 10 g/dL (figure shows 7-segment display). |
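
The ratio model and threshold rule in the table can be sketched as follows. The choice of nine predictors (three linear, three squared, and three pairwise interaction terms of ΔR, ΔG, ΔB) is an assumption consistent with "9 predictors + constant", and the coefficients are placeholders, not the fitted values from the paper.

```python
def quadratic_features(dr, dg, db):
    """Constant plus nine assumed predictors: linear, squared, interaction."""
    return [1.0, dr, dg, db, dr * dr, dg * dg, db * db, dr * dg, dr * db, dg * db]

def estimate_hb(dr, dg, db, num_coeffs, den_coeffs):
    """Ratio model Hb = N / D, where N and D are quadratic polynomials."""
    feats = quadratic_features(dr, dg, db)
    numerator = sum(c * f for c, f in zip(num_coeffs, feats))
    denominator = sum(c * f for c, f in zip(den_coeffs, feats))
    return numerator / denominator

def is_anemic(hb, threshold=10.0):
    """Binary screening flag: anemic when Hb is at or below the threshold (g/dL)."""
    return hb <= threshold

# Placeholder coefficients for illustration only (NOT from the paper)
NUM_COEFFS = [12.0, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
DEN_COEFFS = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

For example, `is_anemic(estimate_hb(-12.0, 0.0, 0.0, NUM_COEFFS, DEN_COEFFS))` evaluates the placeholder model at one colour difference and applies the 10 g/dL comparator.
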
|
71 |
+
|
72 |
+
## **5. Leaf Chlorophyll Estimation System *IEEE Proceedings 2014***
|
73 |
+
|
74 |
+
This project applies statistical modeling to create a low-cost, non-destructive sensor for plant health monitoring.
|
75 |
+
|
76 |
+
* **Statistical Method:** **Stepwise Multivariate Regression Analysis** (quadratic and interaction terms, with a Hougen non-linear model evaluated for comparison).
|
77 |
+
* **Application & Findings:** RGB and normalised colour features from leaf images were fed into a stepwise regression routine that automatically selected the most informative linear, quadratic and cross-product terms. The final model (Adj-R² ≈ 0.99, RMSE ≈ 3.3) accurately predicted chlorophyll concentration while remaining lightweight enough for real-time FPGA deployment. Non-linear Hougen fits were tested but offered lower accuracy, so the stepwise model was chosen, giving growers a fast, camera-based alternative to destructive lab assays.
|
78 |
+
|
79 |
+
|
80 |
+
| Statistical method | Why used / implication | Technical depth | Key results |
|
81 |
+
|---|---|---|---|
|
82 |
+
| **Stepwise multivariate linear regression with nonlinear terms** | Finds the lightest model that still predicts chlorophyll (Ch) from RGB + normalised channels; ideal for resource-limited FPGA. | • Starts with R,G,B,N1,N2; iteratively adds R², G², B², GB, RB until Adj-R² drops.<br>• Terms with poor p-value (G×R, N3) removed. | Final Eq 5 yields **R² = 0.99, Adj-R² = 0.99, RMSE = 3.32**, *p* = 3.14×10⁻⁷, F ≈ 6.18×10¹² (15 samples, EDF = 5) |
|
83 |
+
| **Hougen nonlinear regression (P/Q form)** | Benchmarks whether a chemical-kinetics-style ratio boosts accuracy. | 5 free coefficients; fitted by non-linear least squares. | R² = 0.91, RMSE = 5.75 – inferior to stepwise model, so not implemented |
|
84 |
+
| **Model-diagnostic plots** (normal probability, residuals vs fit, lagged residuals) | Confirms homoscedastic, un-autocorrelated errors → validates linear assumptions before hardware port. | Figures 2–5 in the paper. | Tight residual cloud within 0.5 σ. |
|
85 |
+
| **Comparative metrics table** (14 simpler fits) | Quantifies trade-offs so designers can justify chosen complexity. | Table I lists R², RMSE, F, EDF for each candidate model. | Stepwise model dominates all baselines (next-best linear RGB has R² = 0.88) |
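
The goodness-of-fit metrics reported in the table (R², Adj-R², RMSE) can be computed as in the sketch below. The function name is hypothetical, and the RMSE here divides by the number of samples; some statistics packages divide by the error degrees of freedom instead, so values may differ from those in the paper.

```python
import math

def fit_metrics(y_true, y_pred, n_predictors):
    """Return (R^2, adjusted R^2, RMSE) for a fitted regression model."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R^2 penalises each extra predictor (intercept excluded from count)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    rmse = math.sqrt(ss_res / n)
    return r2, adj_r2, rmse
```

Adjusted R² is the quantity a stepwise search can monitor: adding a term always raises R², but Adj-R² falls when the term does not explain enough variance to justify the lost degree of freedom.
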
|
86 |
+
---
|
87 |
+
|
88 |
+
## Raktim's biostatistics portfolio summary
|
89 |
+
|
90 |
+
| Capability demonstrated | Evidence from papers |
|
91 |
+
|---|---|
|
92 |
+
| **Modern survival analysis** (Cox PH, c-index, weighted loss, K-M, log-rank) | hist2RNA & BioFusionNet show classical and deep-learning-specific implementations. |
|
93 |
+
| **Comparative hypothesis testing** (t-test, ANOVA, paired design) | hist2RNA uses group t-tests & ANOVA; AFExNet runs paired t-tests against baselines. |
|
94 |
+
| **Correlation & multiple-testing control** | Spearman + FDR across 138 genes in hist2RNA. |
|
95 |
+
| **Model-evaluation under class imbalance** | AFExNet employs SMOTE and reports MCC, κ; BioFusionNet designs weighted loss. |
|
96 |
+
| **Omics feature validation** (GO / pathway enrichment) | AFExNet links latent-space genes to olfactory-transduction pathway. |
|
97 |
+
| **Rigorous cross-validation & benchmarking** | 5-fold experiments compare up to 12 classifiers (AFExNet) and 6 fusion baselines (BioFusionNet). |
|
98 |
+
| **Design & validation of polynomial and ratio regressions** | Anemia paper's Hb = *N*⁄*D* quadratic model ported to FPGA with bit-exact MATLAB parity |
|
99 |
+
| **Model-selection & residual diagnostics for linear/non-linear regression** | Stepwise search, Hougen non-linear comparison, and residual plots in chlorophyll paper |
|
100 |
+
| **Goodness-of-fit metric reporting (R², Adj-R², RMSE, F, p)** | Chlorophyll study publishes a 14-model table with full metrics to justify choice |
|
101 |
+
| **Hardware-level verification of statistical models** | FPGA RTL vs MATLAB parity test confirms floating-point implementation accuracy for Hb regression |
|
102 |
+
| **Threshold-based clinical/agronomic decision rules** | 10 g/dL anemia flag and chlorophyll thresholds hard-wired in FPGA logic |
|
103 |
+
|
104 |
+
|
105 |
+
**In short:** Raktim's work spans the biostatistical spectrum, from classic parametric tests and survival modelling to modern cross-validated machine-learning metrics and enrichment analyses, and shows both theoretical command and practical execution across omics studies and FPGA-based sensing applications.
|