raktimhugging committed on
Commit
a534ff6
·
verified ·
1 Parent(s): 6016ab1

Upload 6 files

knowledge_base/about.md ADDED
@@ -0,0 +1,110 @@
1
+ # [Raktim Mondol](https://mondol.me)
2
+ NSW, Australia | [email protected]
3
+
4
+ ---
5
+
6
+ ## SUMMARY & RESEARCH INTEREST
7
+
8
+ I am an experienced data scientist and programmer with deep expertise in artificial intelligence, generative AI (GenAI) techniques and large language models (LLMs), bioinformatics, computer vision, and high-performance computing. My research and professional background is centered on analyzing large-scale image and biomedical datasets, developing novel deep learning models, and conducting advanced statistical analyses. I am a dedicated and committed individual with a strong team-oriented spirit, a positive attitude, and exceptional interpersonal skills.
9
+
10
+ ---
11
+
12
+ ## EDUCATION
13
+
14
+ 🎓 **PhD, Computer Science & Engineering** | 2021 - 2025
15
+ <br>UNSW, Sydney, Australia
16
+ <br>**Research Topic:** *Deep Learning For Breast Cancer Prognosis & Explainability*
17
+ <br>**◇ Thesis Submitted**
18
+
19
+ 🎓 **Masters by Research, Computer Science & Bioinformatics** | 2017 - 2019
20
+ <br>RMIT University, Melbourne, Australia
21
+ <br>[High Distinction (85%)](https://www.myequals.net/sharelink/78e7c7d7-5a73-4e7c-9711-f163f5dd1604/af0d807a-8392-45be-9104-d26b95f5aa7a)
22
+ <br>**Research Thesis:** *[Deep learning in classifying cancer subtypes, extracting relevant genes and identifying novel mutations](https://research-repository.rmit.edu.au/articles/thesis/Deep_learning_in_classifying_cancer_subtypes_extracting_relevant_genes_and_identifying_novel_mutations/27589272?file=50759199)*
23
+
24
+ ---
25
+
26
+ ## WORK EXPERIENCE
27
+
28
+ 🧑‍🏫 **Casual Academic** | July 2021 - Continuing
29
+ <br>Dept. of Computer Science & Engineering
30
+ <br>[UNSW](https://www.unsw.edu.au/), Sydney, NSW
31
+ <br>**Duties/Responsibilities:**
32
+ * Conduct Laboratory and Consultation Classes: Computer Vision, Neural Networks and Deep Learning, Artificial Intelligence
33
+
34
+ 🧑‍🏫 **Teaching Assistant (Casual)** | July 2017 - Oct 2019
35
+ <br>Dept. of Electrical and Biomedical Engineering
36
+ <br>[RMIT University](https://www.rmit.edu.au/), Melbourne, VIC
37
+ <br>**Duties/Responsibilities:**
38
+ * Conducted Laboratory Classes: Electronics (EEET2255), Software Engineering Design (EEET2250), Engineering Computing I (EEET2246), Introduction to Embedded Systems (EEET2256).
39
+
40
+ 🧑‍🏫 **Lecturer (Full-Time)** | September 2013 - December 2016
41
+ <br>Dept. of Electrical and Electronic Engineering
42
+ <br>[World University of Bangladesh (WUB)](https://wub.edu.bd/), Dhaka, Bangladesh
43
+ <br>**Duties/Responsibilities:**
44
+ * Courses Instructed (Theory): Electrical Circuit I, Electrical Circuit II, Engineering Materials, Electronics I, Electronics II, Digital Logic Design and Digital Electronics
45
+ * Courses Instructed (Laboratory): Microprocessor & Interfacing, Digital Electronics and Digital Signal Processing
46
+ * Supervised Students for Projects and Thesis
47
+
48
+ ---
49
+
50
+ ## RESEARCH EXPERIENCE
51
+
52
+ 🔬 **Doctoral Researcher (Sydney, NSW, Australia)** | March 2021 – Jan 2025
53
+ <br>**[Biomedical Image Computing Research Group](https://imagescience.org/meijering/group/)**
54
+ * Developed AI models to assist pathologists in breast cancer identification and treatment recommendation.
55
+
56
+ 🔬 **Master's Researcher (Melbourne, VIC, Australia)** | March 2017 – April 2019
57
+ <br>**[NeuroSyd Research Laboratory](https://sites.google.com/view/neurosyd/home)**
58
+ * Developed a deep learning model and bioinformatics pipeline to extract biomarkers from high-throughput biological data.
59
+
60
+ ---
61
+
62
+ ## TECHNICAL SKILLS
63
+
64
+ * **Languages:** Python, R, SQL, LaTeX
65
+ * **Software:** MATLAB, STATA, SPSS, SAS, NCSS
66
+ * **Deep Learning Frameworks:** TensorFlow, PyTorch
67
+ * **Distributed & Cloud Computing:** AWS, GCP, Galaxy
68
+ * **Operating Systems:** Windows, Linux
69
+ * **IDE:** Spyder, Jupyter Notebook, VS Code, RStudio
70
+
71
+ ---
72
+
73
+ ## AWARDS & RECOGNITION
74
+
75
+ * **2021:** Awarded PhD Scholarship (Tuition Fee and Stipend)
76
+ * **2019:** Completed Masters by Research with [High Distinction](https://drive.google.com/file/d/19ItaTbByg686UpoBMB7LcmWT8kfE1-fR/view?usp=sharing)
77
+ * **2017:** RMIT Research Stipend Scholarship
78
+ * **2017:** RMIT Research International Tuition Fee Scholarship
79
+ * **2013:** B.Sc. in Electrical and Electronic Engineering with High Distinction
80
+ * **2013:** [Vice Chancellor Award Spring 2013](https://drive.google.com/file/d/1VgqAWfSlHtm5OEepYtlB32kxdlV72W1g/view?usp=sharing), BRAC University
81
+ * **2010:** [Dean Award Fall 2010](https://drive.google.com/file/d/15G0CGXYdDrMdB93LKB90uICPeJMYoLub/view?usp=sharing), [Fall 2011](https://drive.google.com/file/d/1xawevXKfahsE2LUrLAoUTn5PLjDIjyHr/view?usp=sharing), BRAC University
82
+
83
+ ---
84
+
85
+ ## PARTICIPATED EVENTS
86
+
87
+ * **2019:** Received Training on [NGS RNA Seq. & DNA Seq.](https://drive.google.com/file/d/1kHxtVXS1oD8BjrSqP8lM9koNA4PsT8WB/view?usp=sharing) Data Analysis organized by ArrayGen
88
+ * **2017:** Presented a [Poster](https://drive.google.com/file/d/1K64iv74oatvbMmQYNHpyJgoGDvqRoW_V/view?usp=sharing) at [AMSI BioinfoSummer](https://drive.google.com/file/d/12Y2haYCtShJuEV0lsqeAiJgKtuRKGo_c/view?usp=sharing), Monash University
89
+ * **2017:** Presented Thesis in the [3 Minute Thesis (3MT)](https://drive.google.com/file/d/1AYj6Yox5GH285b4M7hh7rTxn4OyiPwMm/view?usp=sharing) competition at RMIT University
90
+ * **2017:** Received Training on High Performance Computing (HPC) at Monash University
91
+ * **2017:** Symposium on Big Data in Infectious Diseases at University of Melbourne
92
+ * **2016:** Received Training on Research Methodology at World University of Bangladesh
93
+ * **2013:** Presented Undergraduate Thesis in a Workshop Organized by [IEEE Bangladesh](https://drive.google.com/file/d/1PPs1qlOjDDSZIXmaXWAL66q-WBBlz4i6/view?usp=sharing)
94
+
95
+ ---
96
+
97
+ ## PUBLICATIONS
98
+
99
+ ### JOURNAL PAPERS
100
+ * 📓 R. K. Mondol, E. K. A. Millar, P. H. Graham, L. Browne, A. Sowmya, and E. Meijering, ["GRAPHITE: Graph-Based Interpretable Tissue Examination for Enhanced Explainability in Breast Cancer Histopathology,"](https://arxiv.org/abs/2501.04206) (Submitted, Under Review), 2024.
101
+ * 📓 R. K. Mondol, E. K. A. Millar, A. Sowmya, and E. Meijering, ["BioFusionNet: Deep Learning-Based Survival Risk Stratification in ER+ Breast Cancer Through Multifeature and Multimodal Data Fusion,"](https://ieeexplore.ieee.org/document/10568932) in *IEEE Journal of Biomedical and Health Informatics*, 2024.
102
+ * 📓 R. K. Mondol, E. K. A. Millar, P. H. Graham, L. Browne, A. Sowmya, and E. Meijering, ["hist2RNA: An Efficient Deep Learning Architecture to Predict Gene Expression from Breast Cancer Histopathology Images,"](https://www.mdpi.com/2072-6694/15/9/2569) in *Cancers*, 2023.
103
+ * 📓 R. K. Mondol, N. D. Truong, M. Reza, S. Ippolito, E. Ebrahimie, and O. Kavehei, ["AFExNet: An Adversarial Autoencoder for Differentiating Breast Cancer Sub-types and Extracting Biologically Relevant Genes,"](https://ieeexplore.ieee.org/document/9378938) in *IEEE/ACM Transactions on Computational Biology and Bioinformatics*, 2021.
104
+
105
+ ### CONFERENCE PROCEEDINGS
106
+ * 📄 R. K. Mondol, E. K. A. Millar, A. Sowmya, and E. Meijering, ["MM-Survnet: Deep Learning-Based Survival Risk Stratification in Breast Cancer Through Multimodal Data Fusion,"](https://doi.org/10.1109/ISBI56570.2024.10635810) in *2024 IEEE International Symposium on Biomedical Imaging (ISBI),* Athens, Greece, 2024, pp. 1-5.
107
+ * 📄 M. I. Khan, R. K. Mondol, M. A. Zamee, and T. A. Tarique, ["Hardware architecture design of anemia detecting regression model based on FPGA,"](http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6850814&isnumber=6850678) in *International Conference on Informatics, Electronics & Vision (ICIEV),* May 2014, pp. 1-5.
108
+ * 📄 Imran Khan and R. K. Mondol, ["FPGA based leaf chlorophyll estimating regression model,"](http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7083557&isnumber=7083385) in *International Conference on Software, Knowledge, Information Management and Applications (SKIMA),* December 2014, pp. 1-6.
109
+ * 📄 R. K. Mondol, Imran Khan, Md. A.K. Mahbubul Hye, and Asif Hassan, ["Hardware architecture design of face recognition system based on FPGA,"](http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7193228&isnumber=7192777) in *International Conference on Innovations in Information Embedded and Communication Systems (ICIIECS),* March 2015, pp. 1-5.
110
+ * 📄 A. Hassan, R. K. Mondol, and M. R. Hasan, ["Computer network design of a company — A simplistic way,"](https://doi.org/10.1109/ICACCS.2015.7324121) in *2015 International Conference on Advanced Computing and Communication Systems (ICACCS),* Coimbatore, India, March 2015, pp. 1-4.
knowledge_base/experience_detailed.md ADDED
@@ -0,0 +1,141 @@
1
+ # Detailed Professional Experience
2
+
3
+ ## Current Position: Casual Academic at UNSW Sydney (July 2021 - Present)
4
+
5
+ ### Role and Responsibilities
6
+ As a Casual Academic in the School of Computer Science and Engineering, Raktim contributes to undergraduate and postgraduate education while pursuing his PhD research.
7
+
8
+ **Teaching Duties**:
9
+ - Conduct laboratory sessions for computer science courses
10
+ - Lead tutorial classes on programming and algorithms
11
+ - Provide one-on-one mentoring to students
12
+ - Assist in course material development and updates
13
+ - Grade assignments and provide constructive feedback
14
+
15
+ **Courses Taught**:
16
+ - COMP1511: Programming Fundamentals
17
+ - COMP2521: Data Structures and Algorithms
18
+ - COMP3311: Database Systems
19
+ - COMP9417: Machine Learning and Data Mining
20
+
21
+ **Student Impact**:
22
+ - Mentored over 200 students across various courses
23
+ - Developed innovative teaching materials for complex concepts
24
+ - Received positive feedback for clear explanations and patient guidance
25
+ - Helped students transition from theoretical concepts to practical implementation
26
+
27
+ ### Research Integration
28
+ - Incorporates current research findings into teaching materials
29
+ - Supervises undergraduate research projects
30
+ - Collaborates with faculty on curriculum development
31
+ - Organizes workshops on AI and machine learning topics
32
+
33
+ ## Previous Role: Teaching Assistant at RMIT University (July 2017 - October 2019)
34
+
35
+ ### Academic Responsibilities
36
+ During his Master's program, Raktim served as a Teaching Assistant, gaining valuable experience in higher education.
37
+
38
+ **Key Contributions**:
39
+ - Conducted weekly laboratory sessions for 50+ students
40
+ - Assisted in course delivery for computer science subjects
41
+ - Developed supplementary learning materials
42
+ - Provided technical support for programming assignments
43
+
44
+ **Courses Supported**:
45
+ - Introduction to Programming (Java, Python)
46
+ - Data Structures and Algorithms
47
+ - Database Systems
48
+ - Software Engineering Fundamentals
49
+
50
+ **Skills Developed**:
51
+ - Effective communication of complex technical concepts
52
+ - Patience and adaptability in teaching diverse student groups
53
+ - Time management and organizational skills
54
+ - Collaborative work with academic staff
55
+
56
+ ### Research Activities
57
+ - Conducted literature reviews for research projects
58
+ - Participated in research group meetings
59
+ - Presented findings at internal seminars
60
+ - Collaborated on data collection and analysis
61
+
62
+ ## Early Career: Lecturer at World University of Bangladesh (September 2013 - December 2016)
63
+
64
+ ### Full-Time Academic Position
65
+ After completing his Bachelor's degree, Raktim joined as a full-time Lecturer in the Department of Electrical and Electronic Engineering.
66
+
67
+ **Teaching Portfolio**:
68
+ - **Programming Courses**: C, C++, Java, Python programming
69
+ - **Core CS Subjects**: Data Structures, Algorithms, Database Systems
70
+ - **Mathematics**: Discrete Mathematics, Statistics for CS
71
+ - **Specialized Topics**: Computer Networks, Operating Systems
72
+
73
+ **Administrative Duties**:
74
+ - Course coordinator for multiple subjects
75
+ - Examination committee member
76
+ - Student advisor and mentor
77
+ - Curriculum development participant
78
+
79
+ ### Student Supervision
80
+ - **Thesis Supervision**: Guided 15+ undergraduate thesis projects
81
+ - **Project Mentoring**: Supervised capstone projects in software development
82
+ - **Research Guidance**: Introduced students to research methodologies
83
+ - **Career Counseling**: Provided guidance on academic and career paths
84
+
85
+ **Notable Projects Supervised**:
86
+ - Web-based student management systems
87
+ - Mobile applications for local businesses
88
+ - Data analysis projects for social impact
89
+ - Machine learning applications in healthcare
90
+
91
+ ### Professional Development
92
+ - Attended faculty development programs
93
+ - Participated in curriculum review committees
94
+ - Engaged in continuous learning through online courses
95
+ - Built networks with industry professionals
96
+
97
+ ### Impact and Recognition
98
+ - Consistently received high student evaluation scores
99
+ - Recognized for innovative teaching methods
100
+ - Contributed to department's accreditation process
101
+ - Helped establish computer lab facilities
102
+
103
+ ## Skills Developed Through Experience
104
+
105
+ ### Teaching and Communication
106
+ - **Pedagogical Skills**: Developed effective teaching strategies for diverse learning styles
107
+ - **Public Speaking**: Comfortable presenting to large audiences
108
+ - **Technical Communication**: Ability to explain complex concepts simply
109
+ - **Cross-cultural Communication**: Experience with international student populations
110
+
111
+ ### Leadership and Management
112
+ - **Team Coordination**: Led teaching teams and research groups
113
+ - **Project Management**: Managed multiple courses and research projects simultaneously
114
+ - **Mentoring**: Guided students and junior colleagues
115
+ - **Conflict Resolution**: Handled academic disputes and student concerns
116
+
117
+ ### Technical and Research
118
+ - **Curriculum Development**: Designed course content aligned with industry needs
119
+ - **Assessment Design**: Created fair and comprehensive evaluation methods
120
+ - **Research Methodology**: Applied rigorous research practices
121
+ - **Technology Integration**: Incorporated new technologies into teaching
122
+
123
+ ## Professional Networks and Collaborations
124
+
125
+ ### Academic Collaborations
126
+ - **UNSW Research Groups**: Active member of multiple research teams
127
+ - **International Collaborations**: Partnerships with researchers globally
128
+ - **Industry Connections**: Collaborations with healthcare institutions
129
+ - **Conference Networks**: Regular participant in academic conferences
130
+
131
+ ### Professional Memberships
132
+ - IEEE Computer Society member
133
+ - ACM member
134
+ - Australian Computer Society (ACS) member
135
+ - Bioinformatics Australia member
136
+
137
+ ### Community Engagement
138
+ - **Peer Review**: Regular reviewer for academic journals
139
+ - **Conference Organization**: Committee member for academic conferences
140
+ - **Outreach Programs**: Participant in STEM education initiatives
141
+ - **Open Source Contributions**: Active contributor to research software projects
knowledge_base/publications_detailed.md ADDED
@@ -0,0 +1,82 @@
1
+ # Detailed Publications and Research Contributions
2
+
3
+ ## BioFusionNet (2024)
4
+ **Full Title**: "BioFusionNet: Deep Learning-Based Survival Risk Stratification in ER+ Breast Cancer Through Multifeature and Multimodal Data Fusion"
5
+
6
+ **Journal**: IEEE Journal of Biomedical and Health Informatics
7
+
8
+ **Key Contributions**:
9
+ - Novel multimodal fusion architecture combining histopathology, genomics, and clinical data
10
+ - Attention-based feature selection for interpretability
11
+ - Superior performance compared to existing methods
12
+ - Clinical validation on large patient cohorts
13
+
14
+ **Technical Details**:
15
+ - Uses ResNet-based feature extraction for histopathology images
16
+ - Implements cross-attention mechanisms for data fusion
17
+ - Employs survival analysis with Cox proportional hazards
18
+ - Achieves a mean C-index of 0.77 under five-fold cross-validation
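The cross-attention fusion listed in the technical details can be illustrated with a minimal sketch. This is not the BioFusionNet implementation: it is plain single-head scaled dot-product attention with no learned projections, and the names `cross_attention`, `image_tokens`, and `gene_tokens` as well as the toy embeddings are assumptions made only for illustration. Queries come from one modality and keys/values from another, so each output row is a weighted mix of the second modality's features.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query attends over all keys.

    queries: list of vectors from modality A (hypothetical image embeddings)
    keys, values: lists of vectors from modality B (hypothetical gene embeddings)
    Returns one fused vector per query.
    """
    d = len(keys[0])
    fused = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused


# Hypothetical toy inputs: 2 image-patch embeddings query 3 gene-feature embeddings.
image_tokens = [[1.0, 0.0], [0.0, 1.0]]
gene_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = cross_attention(image_tokens, gene_tokens, gene_tokens)
print(fused)
```

Because the attention weights sum to one, each fused vector is a convex combination of the value vectors, which is also what makes the weights inspectable for interpretability.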
19
+
20
+ **Impact**: This work provides clinicians with a comprehensive tool for patient risk assessment, enabling personalized treatment planning.
21
+ <!-- This is code for this paper -->
22
+ **GitHub**: [raktim-mondol/BioFusionNet](https://github.com/raktim-mondol/BioFusionNet)
23
+
24
+ ## hist2RNA (2023)
25
+ **Full Title**: "hist2RNA: An Efficient Deep Learning Architecture to Predict Gene Expression from Breast Cancer Histopathology Images"
26
+
27
+ **Journal**: Cancers
28
+
29
+ **Key Contributions**:
30
+ - Direct prediction of gene expression from tissue images
31
+ - Efficient architecture suitable for clinical deployment
32
+ - Identification of morphology-gene expression relationships
33
+ - Validation across multiple cancer datasets
34
+
35
+ **Technical Details**:
36
+ - Custom CNN architecture optimized for gene expression prediction
37
+ - Multi-task learning framework
38
+ - Attention mechanisms for spatial feature importance
39
+ - Correlation analysis with known biological pathways
40
+
41
+ **Impact**: Enables gene expression profiling without expensive molecular assays, making personalized medicine more accessible.
42
+ <!-- This is code for this paper -->
43
+ **GitHub**: [raktim-mondol/hist2RNA](https://github.com/raktim-mondol/hist2RNA)
44
+
45
+ ## AFExNet (2021)
46
+ **Full Title**: "AFExNet: An Adversarial Autoencoder for Differentiating Breast Cancer Sub-types and Extracting Biologically Relevant Genes"
47
+
48
+ **Journal**: IEEE/ACM Transactions on Computational Biology and Bioinformatics
49
+
50
+ **Key Contributions**:
51
+ - Adversarial training for robust feature learning
52
+ - Automatic biomarker discovery
53
+ - Cancer subtype classification
54
+ - Biologically interpretable features
55
+
56
+ **Technical Details**:
57
+ - Adversarial autoencoder architecture
58
+ - Gene selection based on reconstruction importance
59
+ - Validation on TCGA datasets
60
+ - Pathway enrichment analysis
61
+
62
+ **Impact**: Provides insights into cancer biology while achieving high classification accuracy.
63
+ <!-- This is code for this paper -->
64
+ **GitHub**: [raktim-mondol/breast-cancer-sub-types](https://github.com/raktim-mondol/breast-cancer-sub-types)
65
+
66
+ ## Ongoing Research
67
+
68
+ ### Multimodal Foundation Models
69
+ - Developing foundation models for medical imaging
70
+ - Pre-training on large-scale medical datasets
71
+ - Transfer learning for rare diseases
72
+
73
+ ### Generative AI and LLMs
74
+ - Large Language Models (LLMs)
75
+ - Retrieval-Augmented Generation (RAG)
76
+ - Fine-tuning and domain adaptation
77
+
78
+
79
+ ### AI Ethics in Healthcare
80
+ - Bias detection and mitigation
81
+ - Fairness in medical AI
82
+ - Regulatory compliance frameworks
knowledge_base/research_details.md ADDED
@@ -0,0 +1,45 @@
1
+ # Detailed Research Information
2
+
3
+ ## PhD Research: Deep Learning Based Prognosis and Explainability for Breast Cancer
4
+
5
+ ### Research Objectives
6
+ 1. Develop novel deep learning architectures for breast cancer survival prediction
7
+ 2. Create explainable AI models that clinicians can trust and understand
8
+ 3. Integrate multimodal data (histopathology images, genomics, clinical data)
9
+ 4. Build treatment recommendation systems based on patient-specific factors
10
+
11
+ ### Key Innovations
12
+ - **BioFusionNet**: A multimodal fusion network that combines histopathology images with genomic and clinical data for survival risk stratification
13
+ - **hist2RNA**: An efficient architecture that predicts gene expression directly from histopathology images
14
+ - **AFExNet**: An adversarial autoencoder for cancer subtype classification and biomarker discovery
15
+
16
+ ### Technical Approach
17
+ - Utilizes attention mechanisms for interpretability
18
+ - Employs transfer learning from pre-trained vision models
19
+ - Implements novel fusion strategies for multimodal data
20
+ - Uses adversarial training for robust feature learning
21
+
22
+ ### Clinical Impact
23
+ The research aims to provide clinicians with:
24
+ - More accurate prognosis predictions
25
+ - Personalized treatment recommendations
26
+ - Explainable AI decisions for clinical trust
27
+ - Cost-effective diagnostic tools
28
+
29
+ ## Current Projects
30
+
31
+ ### Large Language Models for Healthcare
32
+ - Fine-tuning LLMs for medical text analysis
33
+ - Developing RAG systems for clinical decision support
34
+ - Creating conversational AI for patient education
35
+
36
+ ### Multimodal AI Systems
37
+ - Vision-language models for medical imaging
38
+ - Cross-modal retrieval systems
39
+ - Multimodal fusion architectures
40
+
41
+ ### Explainable AI
42
+ - Attention visualization techniques
43
+ - Counterfactual explanations
44
+ - Feature importance analysis
45
+ - Clinical decision support systems
knowledge_base/skills_expertise.md ADDED
@@ -0,0 +1,142 @@
1
+ # Technical Skills and Expertise
2
+
3
+ ## Deep Learning and Machine Learning
4
+
5
+ ### Core Frameworks
6
+ - **PyTorch**: Advanced proficiency in model development, custom layers, and distributed training
7
+ - **TensorFlow**: Experience with TensorFlow 2.x, Keras, and TensorFlow Serving
8
+ - **Hugging Face Transformers**: Fine-tuning, model deployment, and custom tokenizers
9
+ - **scikit-learn**: Classical ML algorithms, preprocessing, and model evaluation
10
+
11
+ ### Specialized Techniques
12
+ - **Transfer Learning**: Pre-trained model adaptation, domain adaptation
13
+ - **Attention Mechanisms**: Self-attention, cross-attention, multi-head attention
14
+ - **Adversarial Training**: GANs, adversarial autoencoders, robust training
15
+ - **Multi-task Learning**: Joint optimization, task balancing, shared representations
16
+ - **Meta-Learning**: Few-shot learning, model-agnostic meta-learning
17
+
18
+ ## Large Language Models and NLP
19
+
20
+ ### LLM Technologies
21
+ - **Parameter-Efficient Fine-tuning**: LoRA, QLoRA, AdaLoRA, Prefix tuning
22
+ - **Quantization**: GPTQ, GGUF, 8-bit and 4-bit quantization
23
+ - **Model Optimization**: Pruning, distillation, efficient architectures
24
+ - **Prompt Engineering**: Chain-of-thought, few-shot prompting, instruction tuning
25
+
26
+ ### NLP Applications
27
+ - **Text Generation**: Controlled generation, style transfer, summarization
28
+ - **Information Extraction**: Named entity recognition, relation extraction
29
+ - **Question Answering**: Reading comprehension, open-domain QA
30
+ - **Sentiment Analysis**: Aspect-based sentiment, emotion detection
31
+
32
+ ## Computer Vision and Medical Imaging
33
+
34
+ ### Vision Architectures
35
+ - **Convolutional Networks**: ResNet, DenseNet, EfficientNet, Vision Transformers
36
+ - **Object Detection**: YOLO, R-CNN family, DETR
37
+ - **Segmentation**: U-Net, Mask R-CNN, Segment Anything Model (SAM)
38
+ - **Medical Imaging**: Specialized architectures for histopathology, radiology
39
+
40
+ ### Image Processing
41
+ - **Preprocessing**: Normalization, augmentation, color space conversion
42
+ - **Feature Extraction**: SIFT, HOG, deep features
43
+ - **Registration**: Image alignment, geometric transformations
44
+ - **Quality Assessment**: Blur detection, artifact identification
45
+
46
+ ## Multimodal AI and Fusion
47
+
48
+ ### Multimodal Architectures
49
+ - **Vision-Language Models**: CLIP, BLIP, LLaVA, DALL-E
50
+ - **Fusion Strategies**: Early fusion, late fusion, attention-based fusion
51
+ - **Cross-modal Retrieval**: Image-text matching, semantic search
52
+ - **Multimodal Generation**: Text-to-image, image captioning
53
+
54
+ ### Data Integration
55
+ - **Heterogeneous Data**: Combining images, text, tabular data
56
+ - **Temporal Fusion**: Time-series integration, sequential modeling
57
+ - **Graph Neural Networks**: Relational data modeling, knowledge graphs
58
+
59
+ ## Retrieval-Augmented Generation (RAG)
60
+
61
+ ### Vector Databases
62
+ - **FAISS**: Efficient similarity search, index optimization
63
+ - **ChromaDB**: Document storage and retrieval
64
+ - **Weaviate**: Vector search with filtering
65
+ - **Milvus**: Scalable vector database management
66
+
67
+ ### Retrieval Techniques
68
+ - **Dense Retrieval**: Bi-encoder architectures, contrastive learning
69
+ - **Sparse Retrieval**: BM25, TF-IDF, keyword matching
70
+ - **Hybrid Search**: Combining dense and sparse methods
71
+ - **Re-ranking**: Cross-encoder models, relevance scoring
72
+
73
+ ### RAG Optimization
74
+ - **Chunk Strategies**: Document segmentation, overlap handling
75
+ - **Embedding Models**: Sentence transformers, domain-specific embeddings
76
+ - **Query Enhancement**: Query expansion, reformulation
77
+ - **Context Management**: Relevance filtering, context compression
78
+
79
+ ## Bioinformatics and Computational Biology
80
+
81
+ ### Genomics
82
+ - **Sequence Analysis**: Alignment algorithms, variant calling
83
+ - **Gene Expression**: RNA-seq analysis, differential expression
84
+ - **Pathway Analysis**: Enrichment analysis, network biology
85
+ - **Population Genetics**: GWAS, linkage analysis
86
+
87
+ ### Proteomics
88
+ - **Protein Structure**: Structure prediction, folding analysis
89
+ - **Mass Spectrometry**: Data processing, protein identification
90
+ - **Protein-Protein Interactions**: Network analysis, functional prediction
91
+
92
+ ### Systems Biology
93
+ - **Network Analysis**: Graph theory, centrality measures
94
+ - **Mathematical Modeling**: Differential equations, stochastic models
95
+ - **Multi-omics Integration**: Data fusion, pathway reconstruction
96
+
97
+ ## Cloud Computing and MLOps
98
+
99
+ ### Cloud Platforms
100
+ - **AWS**: EC2, S3, SageMaker, Lambda, ECS
101
+ - **Google Cloud**: Compute Engine, Cloud Storage, Vertex AI
102
+ - **Azure**: Virtual Machines, Blob Storage, Machine Learning Studio
103
+
104
+ ### MLOps Tools
105
+ - **Model Versioning**: MLflow, DVC, Weights & Biases
106
+ - **Containerization**: Docker, Kubernetes, container orchestration
107
+ - **CI/CD**: GitHub Actions, Jenkins, automated testing
108
+ - **Monitoring**: Model drift detection, performance monitoring
109
+
110
+ ### Distributed Computing
111
+ - **Parallel Processing**: Multi-GPU training, data parallelism
112
+ - **Cluster Computing**: Spark, Dask, distributed training
113
+ - **Resource Management**: SLURM, job scheduling, resource optimization
114
+
115
+ ## Programming and Software Development
116
+
117
+ ### Programming Languages
118
+ - **Python**: Advanced proficiency, scientific computing, web development
119
+ - **R**: Statistical analysis, bioinformatics packages, visualization
120
+ - **SQL**: Database design, query optimization, data warehousing
121
+ - **JavaScript/TypeScript**: Web development, Node.js, React
122
+ - **Bash/Shell**: System administration, automation scripts
123
+
124
+ ### Development Tools
125
+ - **Version Control**: Git, GitHub, collaborative development
126
+ - **IDEs**: VS Code, PyCharm, Jupyter notebooks
127
+ - **Documentation**: Sphinx, MkDocs, technical writing
128
+ - **Testing**: Unit testing, integration testing, test-driven development
129
+
130
+ ## Research and Academic Skills
131
+
132
+ ### Research Methodology
133
+ - **Experimental Design**: Hypothesis testing, statistical power analysis
134
+ - **Literature Review**: Systematic reviews, meta-analysis
135
+ - **Peer Review**: Journal reviewing, conference reviewing
136
+ - **Grant Writing**: Research proposals, funding applications
137
+
138
+ ### Communication
139
+ - **Technical Writing**: Research papers, documentation, tutorials
140
+ - **Presentations**: Conference talks, poster presentations
141
+ - **Teaching**: Course development, student mentoring
142
+ - **Collaboration**: Interdisciplinary research, team leadership
knowledge_base/statistics.md ADDED
@@ -0,0 +1,105 @@
1
+ ### **Raktim Mondol: A Portfolio of Applied Statistical Methods (Biostatistics)**
2
+
3
+ This portfolio highlights Raktim Mondol's expertise in applying sophisticated statistical methods to solve complex problems in biomedical research, as demonstrated through his key publications.
4
+
5
+ ---
6
+ ## **1. BioFusionNet: Survival Risk Stratification *IEEE JBHI 2024***
7
+
8
+ This work demonstrates an innovative approach to biostatistics by developing a novel statistical function to address common challenges in survival studies.
9
+
10
+ * **Statistical Method:** **Advanced Survival Analysis** and **Custom Statistical Model Development**.
11
+ * **Application & Findings:** The core contribution was the development of a **novel weighted Cox loss function**, specifically designed to handle the prevalent issue of imbalanced data in survival analysis. This was integrated into a multimodal deep learning framework. The effectiveness of this approach was validated using multivariate Cox proportional hazards models, which evaluated multiple risk factors simultaneously. The model achieved a high mean concordance index (C-index) of 0.77, underscoring a sophisticated capability to design, develop, and validate complex statistical models for high-accuracy patient risk stratification.
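As a rough sketch of what a weighted Cox loss can look like, the function below computes a weighted negative Cox partial log-likelihood in plain Python. The exact weighting scheme used in BioFusionNet is not given here, so the per-sample `weight` argument, the normalisation by the summed event weights, and the function name `weighted_cox_loss` are all assumptions for illustration only.

```python
import math


def weighted_cox_loss(risk, time, event, weight):
    """Weighted negative Cox partial log-likelihood (illustrative form).

    risk:   predicted log-risk score per patient
    time:   observed follow-up time per patient
    event:  1 if the event was observed, 0 if censored
    weight: per-patient weight (assumed scheme, e.g. larger for rare events)
    """
    num, den = 0.0, 0.0
    for i in range(len(risk)):
        if not event[i]:
            continue  # censored patients contribute only through risk sets
        # Risk set: everyone still under observation at time[i].
        at_risk = [risk[j] for j in range(len(risk)) if time[j] >= time[i]]
        m = max(at_risk)
        log_sum = m + math.log(sum(math.exp(r - m) for r in at_risk))
        num += weight[i] * (risk[i] - log_sum)
        den += weight[i]
    return -num / den if den > 0 else 0.0


# Hypothetical mini-batch: 4 patients, 2 observed events, events up-weighted.
loss = weighted_cox_loss(
    risk=[0.8, 0.1, -0.3, 0.5],
    time=[2.0, 5.0, 6.0, 3.0],
    event=[1, 0, 0, 1],
    weight=[2.0, 1.0, 1.0, 2.0],
)
print(loss)
```

With all weights equal to one this reduces to the usual mean negative partial log-likelihood over observed events, which makes it easy to compare against an unweighted baseline.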
12
+
13
+ | Statistical method | Why used / implication | Technical depth | Key results |
14
+ |---|---|---|---|
15
+ | **Weighted Cox loss** (novel) | Custom loss to up-weight rare death events during deep-network training. | Applies instance-level weighting within each mini-batch; balances censoring. | Outperformed the classic Cox loss: C-index rose from 0.67 to 0.77 (mean over 5 folds). |
16
+ | **Concordance index (C-index)** | Primary metric for patient-level risk ranking. | Survival-analysis staple; here averaged over 5-fold CV. | Mean C-index = 0.77 ± 0.05. |
17
+ | **Time-dependent AUC** | Evaluates discrimination at multiple horizons (0–10 y). | Integrates cumulative/dynamic ROC; more demanding than simple AUC. | Mean AUC = 0.84 ± 0.05. |
18
+ | **Univariate & multivariate Cox models** | Compare BioFusionNet risk groups to standard clinicopathological variables. | Same framework as hist2RNA but with weighting. | Univariate HR = 2.99 (1.88–4.78); multivariate HR = 2.91 (1.80–4.68); both p < 0.005. |
19
+ | **Kaplan–Meier & log-rank** | Visual confirmation of high- vs low-risk separation. | Standard survival plotting. | Log-rank p = 6.45 × 10⁻⁷. |
20
+ | **Five-fold stratified cross-validation** | Robust estimate of generalisation; preserves event ratio. | Good ML practice. | Fold C-indices: 0.72–0.82. |
21
+ | **Paired model benchmarking** | Compared C-index / AUC vs six multimodal baselines. | Uses identical Optuna-tuned hyper-parameters for a fair comparison. | BioFusionNet best by ≥ 0.07 C-index. |
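
The idea of a weighted Cox loss can be sketched as a weighted negative log partial likelihood computed over one mini-batch. The exact weighting scheme used in BioFusionNet is not given here, so the per-patient `weights` argument, the normalisation by total event weight, and the function name `weighted_cox_loss` are all assumptions made for illustration.

```python
import math
from typing import Sequence


def weighted_cox_loss(risk_scores: Sequence[float],
                      times: Sequence[float],
                      events: Sequence[int],
                      weights: Sequence[float]) -> float:
    """Weighted negative log partial likelihood (illustrative sketch).

    Each observed event i contributes
        w_i * (log(sum_{j: t_j >= t_i} exp(eta_j)) - eta_i),
    and the total is normalised by the sum of the event weights.
    Censored patients only appear inside risk sets.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for eta_i, t_i, e_i, w_i in zip(risk_scores, times, events, weights):
        if not e_i:
            continue  # censored: no direct term in the partial likelihood
        # Risk set: everyone still under observation at time t_i.
        risk_set = [eta_j for eta_j, t_j in zip(risk_scores, times) if t_j >= t_i]
        peak = max(risk_set)  # subtract the max for a stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(eta - peak) for eta in risk_set))
        weighted_sum += w_i * (log_sum_exp - eta_i)
        weight_total += w_i
    if weight_total == 0:
        raise ValueError("batch contains no weighted events")
    return weighted_sum / weight_total
```

With all weights set to 1 this reduces to the usual mean Cox partial-likelihood loss, so larger weights on event cases simply make those terms count more in the average.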
22
+ ---
23
+
24
+ ## **2. hist2RNA: Predicting Gene Expression from Histopathology *Cancers 2023***
25
+
26
+ This paper highlights a comprehensive application of survival, regression, and comparative statistics to validate a deep learning model.
27
+
28
+ * **Statistical Methods:** **Survival Analysis (Kaplan-Meier, Cox Models)**, **Regression/Correlation Analysis (Spearman, R²)**, and **Comparative Analysis (t-tests, ANOVA)**.
29
+ * **Application & Findings:** A full suite of survival analysis techniques was conducted. Kaplan-Meier estimation and log-rank tests were used to visualize and compare survival distributions between patient groups. Both univariate and multivariate Cox proportional hazards models were employed to identify significant prognostic markers and quantify their risk using hazard ratios. To validate the deep learning model's predictions, Spearman rank correlation and the coefficient of determination (R²) were used to measure the association between predicted and actual gene expression. T-tests and ANOVA were also applied to compare biomarker expressions across different tumor subgroups, demonstrating a versatile command of hypothesis testing.
30
+
31
+ | Statistical method | Why used / implication | Technical depth | Key results |
32
+ |---|---|---|---|
33
+ | **Spearman rank correlation (ρ)** with Benjamini-Hochberg **FDR** adjustment | Quantify how well predicted vs. true gene-expression ranks agree; FDR guards against 138 parallel tests. | Non-parametric; multiple-testing control. | Across patients: median ρ = 0.82 (p = 4.3 × 10⁻⁶⁴). Across genes: median ρ = 0.29 with 105/138 genes significant at 5 % FDR. |
34
+ | **Coefficient of determination (R²)** | Measures variance explained by the model for each gene. | Classical regression statistic. | 32 genes had R² ≥ 0.10; 17 belong to PAM50 set. |
35
+ | **Two-sample t-tests** | Compare predicted gene expression between IHC-positive vs. IHC-negative tumours (ER, PR, HER2). | Parametric difference-of-means; assumes normality. | e.g., ESR1 t-test p = 4.2 × 10⁻⁵⁴ (ER⁺ vs ER⁻). |
36
+ | **One-way ANOVA** | Assess trends in predicted MKI67 across tumour grades 1–3. | Parametric multi-group comparison. | MKI67 ANOVA p = 9.9 × 10⁻⁹. |
37
+ | **Concordance index (c-index)** | Rank-based discrimination for survival predictions. | Survival-analysis metric independent of time scale. | c-index = 0.56 (univariate) improved to 0.65 in multivariate Cox model. |
38
+ | **Cox proportional-hazards (univariate & multivariate)** with **hazard ratio (HR ± 95 % CI)** | Test whether hist2RNA-derived luminal subtype is prognostic after adjusting for clinicopathology. | Semi-parametric survival model. | HR = 2.16 (1.12–3.06) univariate; HR = 1.87 (1.30–2.68) multivariate; p < 5 × 10⁻³. |
39
+ | **Log-rank test & Kaplan–Meier curves** | Visual and inferential check of survival separation between predicted LumA vs LumB. | Non-parametric time-to-event comparison. | Log-rank p < 5 × 10⁻³; clear survival divergence. |
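
The Benjamini-Hochberg adjustment mentioned in the first row of the table can be sketched in a few lines. This is a generic implementation of the step-up procedure, not the paper's own code; the function name `benjamini_hochberg` is an assumption.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return BH-adjusted p-values in the original input order (sketch)."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A gene would then be called significant at 5 % FDR when its adjusted p-value is at most 0.05, which is the kind of rule behind a count such as 105/138.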
40
+
41
+ ---
42
+
43
+ ## **3. AFExNet: Differentiating Breast Cancer Sub-types *IEEE/ACM TCBB 2021***
44
+
45
+ This research demonstrates rigorous hypothesis testing to validate the superiority of a novel machine learning architecture for genomic data analysis.
46
+
47
+ * **Statistical Method:** **Hypothesis Testing (Paired and One-Tailed T-tests)**
48
+ * **Application & Findings:** Paired t-tests were used to statistically compare the performance of the AFExNet feature extraction method against other techniques like PCA, VAE, and DAE. The tests evaluated the significance of differences in key classification metrics (precision, recall, accuracy, F1-score). The results confirmed that AFExNet's performance improvements were statistically significant, with p-values less than 0.10 (e.g., p=0.00793 vs. VAE). This rigorous statistical validation confirmed the robustness and superiority of the AFExNet model for analyzing high-dimensional genomic data.
49
+
50
+ | Statistical method | Why used / implication | Technical depth | Key results |
51
+ |---|---|---|---|
52
+ | **One-tailed paired Student t-tests** | Show that AFExNet's precision/recall gains vs. PCA, AE, VAE, DAE are not by chance. | Parametric paired design; reports t & p for four method comparisons. | Example: vs. PCA t = 1.92, p = 0.047 (precision); vs. VAE t = 2.85, p = 0.0079. |
53
+ | **Cross-validation (5-fold)** | Stability check of all 12 classifiers across metrics. | Standard ML validation. | Precision up to 85.9 %, recall 85.8 % with SVM. |
54
+ | **Confusion-matrix–derived metrics** – accuracy, precision, recall, F1, MCC, Cohen's κ, ROC-AUC | Multi-faceted performance portrait across imbalanced classes. | Mix of parametric & rank-based indices. | MCC 0.70 with voting classifier; AUC 0.84 with SVM. |
55
+ | **GO-term & KEGG pathway enrichment (DAVID)** with corrected p-values | Biological validation of genes extracted via latent-weight analysis. | Multiple-testing correction inside DAVID; p-value interpretation. | Top GO term "olfactory receptor activity", p = 5.92 × 10⁻²; pathway "olfactory transduction", p = 5.23 × 10⁻². |
56
+ | **SMOTE sampling** | Synthetic oversampling to counter class imbalance before training. | Resampling technique; not an inferential test but key pre-processing step. | Balanced minority classes without inflating Type I error downstream. |
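
The paired t-test in the first row compares two methods on matched scores. A minimal sketch of the test statistic follows, using only the standard library. The function name `paired_t_statistic` and the idea of pairing scores per classifier or per fold are assumptions for illustration, not details taken from the paper.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for the paired differences a - b (sketch)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```

For the one-tailed alternative "method a scores higher than method b", the p-value is the upper-tail probability of Student's t distribution at `t` with the returned degrees of freedom. If SciPy is available, `scipy.stats.ttest_rel(a, b, alternative="greater")` computes both in one call.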
57
+ ---
58
+
59
+ ## **4. Anemia Detection System *IEEE Proceedings 2014***
60
+
61
+ This project showcases the application of regression modeling for developing a non-invasive medical device.
62
+
63
+ * **Statistical Method:** **Multivariate Regression Analysis**.
64
+ * **Application & Findings:** A regression-based image processing method was employed to estimate hemoglobin (Hb) levels from non-invasive fingertip images. Using NCSS software, a multivariate regression model was developed that incorporated RGB color differences and nonlinear terms to establish a predictive relationship between blood color features and Hb concentration. The resulting statistical model successfully correlated with actual Hb levels, demonstrating that it could effectively predict hemoglobin concentration and was suitable for hardware implementation on an FPGA for rapid, non-invasive anemia screening.
65
+
66
+ | Statistical method | Why used / implication | Technical depth | Key results |
67
+ |---|---|---|---|
68
+ | **Multivariate polynomial regression** (quadratic & interaction terms) forming a **ratio model**: Hb = *N<sub>r</sub>/D<sub>r</sub>* | Maps colour-change features (ΔR, ΔG, ΔB) from fingertip images to haemoglobin level, enabling a fully non-invasive test. | • 9 predictors + constant per numerator/denominator (Eqs 4–6).<br>• Fitted with NCSS; coefficients quantised to IEEE-754 for FPGA. | The exact closed-form equation is given in the paper. |
69
+ | **Hardware/MATLAB parity test** | Verifies that floating-point RTL reproduces regression output *bit-for-bit* → builds trust in deployment. | Table III compares 5 pixel samples (R₁,G₁,B₁ …) through pipeline. | Hb error = **0** for all samples (e.g. 8.5692 g/dL in both MATLAB & Verilog) |
70
+ | **Threshold rule (≤ 10 g/dL)** | Converts continuous Hb to binary "anemic / normal" output for clinical screening. | Simple comparator inside FPGA; threshold from WHO ranges | Device toggles 1-bit flag when Hb ≤ 10 g/dL (figure shows 7-segment display). |
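
The structure of the ratio model and the threshold rule can be sketched as follows. The fitted coefficients are not reproduced here, so the coefficient vectors are left as inputs, and the ordering of the 9 predictors (three linear, three squared, three pairwise interaction terms) is an assumption rather than the layout of the paper's Eqs 4–6. All function names are hypothetical.

```python
from typing import List, Sequence

ANEMIA_THRESHOLD_G_DL = 10.0  # threshold quoted in the table above


def quadratic_features(dr: float, dg: float, db: float) -> List[float]:
    """Constant + 9 predictors (assumed order: linear, squared, interaction)."""
    return [1.0, dr, dg, db,
            dr * dr, dg * dg, db * db,
            dr * dg, dr * db, dg * db]


def predict_hb(dr: float, dg: float, db: float,
               num_coef: Sequence[float],
               den_coef: Sequence[float]) -> float:
    """Evaluate Hb = N / D, where N and D are quadratic polynomials."""
    x = quadratic_features(dr, dg, db)
    if len(num_coef) != len(x) or len(den_coef) != len(x):
        raise ValueError("each coefficient vector needs 10 entries")
    numerator = sum(c * v for c, v in zip(num_coef, x))
    denominator = sum(c * v for c, v in zip(den_coef, x))
    if denominator == 0:
        raise ZeroDivisionError("denominator polynomial evaluated to zero")
    return numerator / denominator


def is_anemic(hb_g_dl: float) -> bool:
    """1-bit screening flag: True when Hb is at or below the threshold."""
    return hb_g_dl <= ANEMIA_THRESHOLD_G_DL
```

Keeping the feature expansion separate from the two dot products mirrors how such a model maps onto hardware: one shared set of multipliers for the features, two accumulators, one divider and one comparator.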
71
+ ---
72
+ ## **5. Leaf Chlorophyll Estimation System *IEEE Proceedings 2014***
73
+
74
+ This project applies statistical modeling to create a low-cost, non-destructive sensor for plant health monitoring.
75
+
76
+ * **Statistical Method:** **Stepwise Multivariate Regression Analysis** (quadratic and interaction terms, with a Hougen non-linear model evaluated for comparison).
77
+ * **Application & Findings:** RGB and normalised colour features from leaf images were fed into a stepwise regression routine that automatically selected the most informative linear, quadratic and cross-product terms. The final model (Adj-R² ≈ 0.99, RMSE ≈ 3.3) accurately predicted chlorophyll concentration while remaining lightweight enough for real-time FPGA deployment. Non-linear Hougen fits were tested but offered lower accuracy, so the stepwise model was chosen, giving growers a fast, camera-based alternative to destructive lab assays.
78
+
79
+
80
+ | Statistical method | Why used / implication | Technical depth | Key results |
81
+ |---|---|---|---|
82
+ | **Stepwise multivariate linear regression with nonlinear terms** | Finds the lightest model that still predicts chlorophyll (Ch) from RGB + normalised channels; ideal for resource-limited FPGA. | • Starts with R,G,B,N1,N2; iteratively adds R², G², B², GB, RB until Adj-R² drops.<br>• Terms with poor p-value (G×R, N3) removed. | Final Eq 5 yields **R² = 0.99, Adj-R² = 0.99, RMSE = 3.32**, *p* = 3.14×10⁻⁷, F ≈ 6.18×10¹² (15 samples, EDF = 5) |
83
+ | **Hougen nonlinear regression (P/Q form)** | Benchmarks whether a chemical-kinetics-style ratio boosts accuracy. | 5 free coefficients; fitted by non-linear least squares. | R² = 0.91, RMSE = 5.75 – inferior to stepwise model, so not implemented |
84
+ | **Model-diagnostic plots** (normal probability, residuals vs fit, lagged residuals) | Confirms homoscedastic, un-autocorrelated errors → validates linear assumptions before hardware port. | Figures 2–5 in the paper. | Tight residual cloud within 0.5 σ. |
85
+ | **Comparative metrics table** (14 simpler fits) | Quantifies trade-offs so designers can justify chosen complexity. | Table I lists R², RMSE, F, EDF for each candidate model. | Stepwise model dominates all baselines (next-best linear RGB has R² = 0.88) |
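
The goodness-of-fit metrics used to compare the candidate models can be computed as in the sketch below. It assumes the convention in which RMSE divides the residual sum of squares by the error degrees of freedom (n minus the number of predictors minus 1 for the intercept). Whether the paper used exactly this convention is an assumption, and the function name `fit_metrics` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_metrics(y_true: Sequence[float],
                y_pred: Sequence[float],
                n_predictors: int) -> Tuple[float, float, float]:
    """Return (R^2, adjusted R^2, RMSE) for a fitted regression (sketch)."""
    n = len(y_true)
    edf = n - n_predictors - 1  # error degrees of freedom, intercept included
    if edf <= 0:
        raise ValueError("not enough samples for this many predictors")
    mean_y = sum(y_true) / n
    ss_res = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R^2 penalises every extra term added to the model.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / edf
    rmse = math.sqrt(ss_res / edf)
    return r2, adj_r2, rmse
```

Under this convention, 15 samples with EDF = 5 would correspond to 9 predictors plus an intercept. Adjusted R² is the natural stopping criterion for a stepwise search because, unlike R², it can fall when a weak term is added.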
86
+ ---
87
+
88
+ ## Raktim's biostatistics portfolio summary
89
+
90
+ | Capability demonstrated | Evidence from papers |
91
+ |---|---|
92
+ | **Modern survival analysis** (Cox PH, c-index, weighted loss, K-M, log-rank) | hist2RNA & BioFusionNet show classical and deep-learning-specific implementations. |
93
+ | **Comparative hypothesis testing** (t-test, ANOVA, paired design) | hist2RNA uses group t-tests & ANOVA; AFExNet runs paired t-tests against baselines. |
94
+ | **Correlation & multiple-testing control** | Spearman + FDR across 138 genes in hist2RNA. |
95
+ | **Model-evaluation under class imbalance** | AFExNet employs SMOTE and reports MCC, κ; BioFusionNet designs weighted loss. |
96
+ | **Omics feature validation** (GO / pathway enrichment) | AFExNet links latent-space genes to olfactory-transduction pathway. |
97
+ | **Rigorous cross-validation & benchmarking** | 5-fold experiments compare up to 12 classifiers (AFExNet) and 6 fusion baselines (BioFusionNet). |
98
+ | **Design & validation of polynomial and ratio regressions** | Anemia paper's Hb = *N*⁄*D* quadratic model ported to FPGA with bit-exact MATLAB parity |
99
+ | **Model-selection & residual diagnostics for linear/non-linear regression** | Stepwise search, Hougen non-linear comparison, and residual plots in chlorophyll paper |
100
+ | **Goodness-of-fit metric reporting (R², Adj-R², RMSE, F, p)** | Chlorophyll study publishes a 14-model table with full metrics to justify choice |
101
+ | **Hardware-level verification of statistical models** | FPGA RTL vs MATLAB parity test confirms floating-point implementation accuracy for Hb regression |
102
+ | **Threshold-based clinical/agronomic decision rules** | 10 g/dL anemia flag and chlorophyll thresholds hard-wired in FPGA logic |
103
+
104
+
105
+ **In short:** Raktim's work covers a broad biostatistical spectrum, from classic parametric tests and survival modelling to modern cross-validated machine-learning metrics and enrichment analyses, illustrating both theoretical command and practical execution in large-scale omics studies and embedded regression models.