Project 1: Predicting Appropriate and Inappropriate Inpatient Cost Estimations Using Machine Learning
Collaborating Hospital: Aga Khan University Hospital (AKUH) - Department of Paediatrics and Child Health.
Medical Collaborators: Dr. Zahra Hoodbhoy and Dr. Babar Hasan
Publication: https://scibasejournals.org/clinical-and-medical-case-reports/1021.pdf
IBA Students: Mishaal Amin, Muhammad Affan, Saleha Zuberi
Brief: The project involved building multiple machine learning models to assess cost estimates for inpatients. The goal was to classify whether the estimated inpatient bill would deviate from the actual expenses. This research helped AKUH understand the features that most strongly influence the actual bill, in order to improve patient management processes. The dataset covered 22,000 patients who visited the facility between 2015 and 2019. Textual features were handled using NLP techniques, and multiple ML algorithms were then trained on the dataset. Ensemble techniques gave the best performance, with 80% accuracy and a 70% F1 score. Four features were found to be highly relevant to the model's decisions: patient age, visit reason, financial class, and location. The deviation was also observed to increase with length of stay.
Research Impact on Healthcare:
- Improved Financial Transparency
- Reduced Financial Stress for Patients
- Increased Patient Satisfaction
- Informed Decision-Making
- Targeted Improvement
- Enhanced Predictive Capability
- Foundation for Cost Prediction
- Supporting Value-Based Care
Project 2: Pediatric Care Pattern Explorer: A Fuzzy C-Means-Based Framework for Mortality Analysis in the Pediatric ICU
Collaborating Hospital: Aga Khan University Hospital (AKUH) - Department of Paediatrics and Child Health.
Medical Collaborator: Dr. Naveed ur Rehman Siddiqui.
Publication: Abstract accepted at the Critical Care Congress 2026
https://sccm.org/annual-congress/critical-care-conference
IBA Student: Shiza Azam
Brief: In this project, a framework was constructed for the analysis of PICU mortality data at the Aga Khan University Hospital. It involved developing a data pipeline tailored to the specific requirements of the institution’s tertiary care environment. Feature selection was performed systematically, using benchmark techniques in conjunction with domain expert knowledge, to support better knowledge discovery and, in turn, personalized care management strategies for patients. The framework uses soft clustering to partition the data and better understand the stochastic patterns that lead to mortality. It groups patients into low-, medium-, and high-risk clusters and reveals patterns within them that can enhance clinical decision making.
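The soft-clustering idea can be illustrated with a minimal fuzzy c-means sketch. It is not the project's pipeline: the function name `fuzzy_c_means`, the fuzzifier `m=2.0`, the iteration count, and the toy points in the usage note are all assumptions made for illustration. The point it shows is that every record receives a degree of membership in each cluster rather than a single hard label.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, seed=0):
    """Illustrative fuzzy c-means; returns (centers, memberships)."""
    rng = random.Random(seed)
    dim = len(points[0])

    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Update each center as the membership-weighted mean of the points.
        centers = []
        for k in range(n_clusters):
            weights = [u[i][k] ** m for i in range(len(points))]
            wsum = sum(weights)
            centers.append(
                [sum(w * p[d] for w, p in zip(weights, points)) / wsum
                 for d in range(dim)]
            )

        # Update memberships from the relative distances to all centers.
        for i, p in enumerate(points):
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            for k in range(n_clusters):
                u[i][k] = 1.0 / sum(
                    (dists[k] / dj) ** (2.0 / (m - 1.0)) for dj in dists
                )
    return centers, u
```

Calling `fuzzy_c_means(points, 3)` on numeric feature vectors returns one membership row per record. A record whose largest membership is only slightly above the others sits between risk groups, which is the kind of pattern a hard clustering would hide.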
Research Impact on Healthcare:
- Proactive risk stratification
- Personalized care management
- Improved clinical pathways
- Enhanced data-driven quality improvement
- Deeper clinical understanding
Project 3: PEDICTOR: A Framework for Prediction of Mortality in Pediatric Patients Requiring Intensive Care
Collaborating Hospital: Aga Khan University Hospital (AKUH) - Department of Paediatrics and Child Health.
Medical Collaborator: Dr. Naveed ur Rehman Siddiqui
Publication: Abstract accepted at the Critical Care Congress 2026
https://sccm.org/annual-congress/critical-care-conference
IBA Student: Muhammad Amin Ghias
Brief: This work consisted of developing a framework to predict the mortality of patients admitted to the pediatric intensive care unit (PICU) at AKUH. A data pipeline was set up to handle missing values and irrelevant signal data. A statistical oversampling technique was applied to the data to better capture the complicated decision boundaries. Furthermore, a weighted mechanism was implemented for feature selection using both statistical techniques and domain expertise. The data was both cross-sectional and time-series. For the cross-sectional data, ML models with high bias and low variance, and vice versa, were developed. For the time-series data, models such as LSTM, RNN and GRU were implemented. The hyperparameters were optimized via grid search. A stacking ensemble was set up to select the best of 30 model combinations. The resulting model gave high accuracy and practical precision. It achieved an F1-score of 95% with a confidence interval (CI) of 0.441 (0.439–0.442) for expired patients and an AUROC of 95% with a CI of 0.932 (0.9314–0.9325) for PICU patients. In addition, precision ranging from 0.785 to 0.967 and recall ranging from 0.399 to 0.612 were achieved for expired patients.
Research Impact on Healthcare:
- Proactive intervention
- Targeted care
- Performance monitoring leading to quality improvement review
Project 4: Machine Learning for disease cluster arrangement of mortality in patients with Diabetes Mellitus admitted to a hospital setting in Karachi, Pakistan
Collaborating Hospital: Aga Khan University Hospital (AKUH) - Department of Medicine
Medical Collaborators: Dr. Aysha Almas and Dr. Zainab Samad
Publication: https://pubmed.ncbi.nlm.nih.gov/39722636/
IBA Student: Namra Aziz
Brief: A machine learning methodology was developed specifically for knowledge discovery. The initial task was constructing a pipeline to convert 13 years of data from a hospital information system, lab records, and pharmacy records into a usable form. The data was reduced from 223 columns to 47 by applying variance thresholding and domain expert scoring. K-modes clustering was then applied, with the clustering optimized using multiple measures such as the silhouette score, gap statistic, and elbow curve. Both thematic and algorithmic labels were applied to the clusters. The algorithmic labels used TF-IDF to identify distinctive diagnostic labels and then LDA to find comorbidities. The clustered data was then used in an explainable machine learning model to predict mortality, which confirmed that the clustering separated patients into groups with significantly different mortality risks.
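As a rough illustration of K-modes clustering on categorical records, the sketch below assigns each record to the nearest mode by Hamming distance and then recomputes each mode as the most frequent value per column. The function names, the fixed initial modes, and the iteration cap are assumptions for illustration, and the project's own initialization and cluster-count selection are not reproduced here.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(records, initial_modes, n_iter=10):
    """Illustrative K-modes; returns (modes, labels)."""
    modes = [tuple(mode) for mode in initial_modes]
    labels = []
    for _ in range(n_iter):
        # Assignment step: nearest mode under Hamming distance.
        labels = [
            min(range(len(modes)), key=lambda j: hamming(record, modes[j]))
            for record in records
        ]

        # Update step: the most frequent category in every column.
        new_modes = []
        for j, old_mode in enumerate(modes):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if not members:
                new_modes.append(old_mode)  # keep the mode of an empty cluster
                continue
            new_modes.append(
                tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
            )

        if new_modes == modes:  # converged, no mode changed
            break
        modes = new_modes
    return modes, labels
```

Each returned mode is itself a readable record (one category per column), which is part of why mode-based clusters lend themselves to thematic labelling.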
Research Impact on Healthcare:
- Discovery of high-risk subgroups
- Possibility of tailored, personalized intervention medicine
- Data-driven quality improvement
- Enhanced clinical feature utility
- Possibility of adoption in clinical setting due to explainability
Project 5: Pediatric patient hospital length of stay prediction: A comparative analysis of Bayesian inference and machine learning approaches
Collaborating Hospital: Aga Khan University Hospital (AKUH) - Department of Paediatrics and Child Health.
Medical Collaborators: Dr. Zahra Hoodbhoy and Dr. Babar Hasan
IBA Student: Sarmad Zafar
Brief: This project focused on developing both a machine learning model and a statistical model to predict and understand the length of stay of pediatric patients. The preprocessing pipeline required specific NLP techniques to handle certain types of textual data. Several machine learning algorithms were then applied, and Extreme Gradient Boosting (XGBoost) gave the best results with an MSLE (mean squared logarithmic error) of 0.23. A Bayesian model was also constructed to give a better understanding of the statistical foundations of decision making with the available data. Although its MSLE was slightly higher at 0.25, it allows for well-informed decisions.
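For readers unfamiliar with the metric, MSLE is the mean of the squared differences between log(1 + actual) and log(1 + predicted). The sketch below is a plain restatement of that standard definition; the function name `msle` is chosen here for illustration and is not taken from the project code.

```python
import math


def msle(y_true, y_pred):
    """Mean squared logarithmic error for non-negative targets."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_log_errors = [
        (math.log1p(actual) - math.log1p(predicted)) ** 2
        for actual, predicted in zip(y_true, y_pred)
    ]
    return sum(squared_log_errors) / len(squared_log_errors)
```

Because the error is taken on a log scale, being off by two days on a three-day stay costs far more than being off by two days on a thirty-day stay, which suits right-skewed targets such as length of stay.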
Research Impact on Healthcare:
- Optimized resource allocation
- Enhanced operational efficiency
- Informed clinical decision making
- Timely intervention
- Foundation for predictive analytics
Project 6: Predicting the Length of Stay of Cardiac Patients Based on Pre-Operative Variables
Collaborating Hospital: Department of Clinical Health Cardiology, Tabba Heart Institute
Technology Collaborating Institute: Faculty of Computer and Information Systems, Islamic University of Madinah
Medical Collaborators: Saba Aijaz, Sana Sheikh, Muhammad Kashif, Ahson Memon, Imran Ali, Ghazal Peerwani, Asad Pathan
IBA Student: Ibrahim Abdul Rab
Brief: The main focus of this project was the construction of a predictive model on the available retrospective cross-sectional data. The data had a high degree of positive skewness and variability, leading to outliers that the model could not ignore. The data consisted of preoperative clinical features of 5363 adult patients who had undergone coronary artery bypass. Permutation feature importance was used as the criterion for feature relevance. Multiple ML models were developed and tested on this data, and the Hierarchical Bayesian Model (HBM) gave the best results without removing or capping the extreme values, compared to other ML algorithms known for learning nonlinear patterns. The HBM provided a better explanation of how the effects of predictors change across different "levels" (hierarchies) of length of stay, allowing for sound causal analysis. This analysis revealed an inverse relationship between certain high-risk variables (like cardiogenic shock) and very short length of stay (Levels 0 and 1), which is a clinically relevant indicator of high patient mortality risk.
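Permutation feature importance can be sketched in a few lines: shuffle one column at a time and measure how much the model's error grows. The version below is a generic illustration, not the project's code. The names `permutation_importance` and `mean_absolute_error`, the choice of error metric, and the repeat count are assumptions.

```python
import random


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, metric=mean_absolute_error,
                           n_repeats=10, seed=0):
    """Average increase in error when each column is shuffled in turn.

    `predict` is any function mapping one feature row to one prediction.
    """
    rng = random.Random(seed)
    rows = [list(row) for row in X]
    baseline = metric(y, [predict(row) for row in rows])

    importances = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in rows]
            rng.shuffle(shuffled)
            # Rebuild the rows with only this column permuted.
            permuted = [
                row[:col] + [value] + row[col + 1:]
                for row, value in zip(rows, shuffled)
            ]
            error = metric(y, [predict(row) for row in permuted])
            increases.append(error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

A feature the model never uses gets an importance of exactly zero, while a feature the model depends on shows a clear increase in error when shuffled.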
Research Impact on Healthcare:
- Optimized resource allocation
- Superior accuracy and understandability of complex data
- Early mortality risk identification
Project 7: Designing A Framework with a Novel Feature Selection Method for Predicting Mortality in Cardiac Patients
Collaborating Hospital: Department of Clinical Health Cardiology, Tabba Heart Institute
Medical Collaborators: Saba Aijaz, Sana Sheikh, Muhammad Kashif, Ahson Memon, Imran Ali, Ghazal Peerwani, Asad Pathan
Brief: The goal of this work was to develop a universal framework for predicting operative mortality in cardiac patients. The available dataset was high-dimensional and highly imbalanced. A two-pronged approach was used to select relevant features. First, a qualitative survey was conducted to score the relevance of each feature. Each score was then translated into a weight, which was applied to the feature in conjunction with statistical feature selection techniques, leading to a selection of understandable features. A baseline analysis was then performed with conventional ML models. Although these models gave high accuracy, they did not show good F1 scores, probably due to quasi-complete separation. Bayesian inference using MCMC (Markov Chain Monte Carlo) and Firth logistic regression were then applied to the dataset, giving better results in terms of ROC, accuracy and understandability.
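One simple way to combine survey-based weights with a statistical relevance score is to multiply the two and rank the features by the product. The brief does not state the actual combination rule, so the multiplication, the 5-point survey scale, the function name `select_features`, and the feature names in the usage note are all hypothetical.

```python
def select_features(stat_scores, expert_scores, top_k, max_expert_score=5):
    """Rank features by statistical score scaled by an expert weight.

    stat_scores:   feature name -> statistical relevance (higher is better)
    expert_scores: feature name -> survey score from 0 to max_expert_score
    Returns the top_k feature names and the full weighted-score mapping.
    """
    weighted = {
        name: stat_scores[name] * (expert_scores[name] / max_expert_score)
        for name in stat_scores
    }
    ranked = sorted(weighted, key=weighted.get, reverse=True)
    return ranked[:top_k], weighted
```

With hypothetical inputs such as `{"ejection_fraction": 0.6, "record_number": 0.9, "age": 0.5}` and survey scores `{"ejection_fraction": 5, "record_number": 1, "age": 4}`, the statistically strong but clinically meaningless `record_number` drops out of the top two, which is the intended effect of bringing expert judgment into the selection.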
Research Impact on Healthcare:
- Improved patient safety
- Clinically informed modeling and trust
- Enhanced data robustness for real-world scenarios
- Optimized and prioritized resource allocation
Project 8: Predicting Subclinical Rheumatic Heart Disease in Children using Deep Learning
Collaborating Hospital: Sindh Institute of Urology and Transplantation
Medical Collaborators: Dr. Babar Hasan and Dr. Fatima Azeemi
IBA Student: Muhammad Kamil Shaheen
Brief: The goal of this project is to detect rheumatic heart disease in young children using machine learning, avoiding invasive techniques. The technique involves synchronizing electrocardiogram (ECG) and phonocardiogram (PCG) data using a variational autoencoder to obtain a combined representation, which is novel in this specific context. The synchronized features are combined with demographic data and used to train multiple machine learning models. Mitral valve recordings of 612 children were used. Because the dataset was imbalanced, public datasets (PhysioNet Challenge, EPHNOGRAM and SensSmart tech) were also used, and the SMOTE technique was applied to the dataset. It was observed that the ECG data could be transformed to PCG but not vice versa, suggesting that ECG carries more information. Multiple ML algorithms were then applied to the data, with Gradient Boosting achieving the best AUROC and AUPRC of 1.
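The core of SMOTE is interpolation: a synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours. The sketch below shows only that step. The function name `smote_oversample`, the default `k`, and the seed are assumptions, and a maintained library implementation would normally be preferred in practice.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points from a list of minority-class vectors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen base point.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            [b + gap * (n - b) for b, n in zip(base, neighbour)]
        )
    return synthetic
```

Because every synthetic point lies between two real minority samples, the new data stays inside the region the minority class already occupies. Oversampling of this kind should be applied to the training split only, so that evaluation data contains no synthetic records.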
Research Impact on Healthcare:
- Increases Accessibility and Affordability
- Enables Effective Early Screening
- Improves Diagnostic Accuracy
Project 9: Constraint Based Clinical Clustering of Surgical Procedures using Gaussian Mixture Models for Optimized Scheduling
Collaborating Hospital: AKUH
Medical Collaborator: Dr. Asad Latif
Academic Collaborator: Dr. Taslim Murad
Brief: The project involved formulating a novel approach for scheduling hospital operations. The approach proposes using mixture models to create clusters of related daily hospital activities that require resources. Furthermore, soft features were incorporated, i.e. features that do not have fixed values and vary depending on multiple variables, e.g. length of stay, duration of surgery, etc. Ideally, the clusters identified through mixture models can better capture the uncertainty in scheduling, leading to a more robust scheduling algorithm. The work is still in progress.
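To make the mixture-model idea concrete, the sketch below fits a one-dimensional Gaussian mixture with the EM algorithm, for example to surgery durations in minutes. It is a simplified stand-in rather than the project's model: the single feature, the function names, the initial means, and the variance floor are all assumptions.

```python
import math
import statistics


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(xs, initial_means, n_iter=50):
    """EM for a 1-D Gaussian mixture.

    Returns (weights, means, variances, responsibilities).
    """
    k = len(initial_means)
    means = list(initial_means)
    variances = [statistics.pvariance(xs)] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: soft assignment of every observation to every component.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, mu, v)
                    for w, mu, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])

        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj,
                1e-6,  # floor that keeps a component from collapsing
            )
    return weights, means, variances, resp
```

The responsibilities are the useful part for scheduling: a procedure that belongs 60% to a "short" cluster and 40% to a "long" one carries explicit uncertainty that a planner can budget for, instead of a single point estimate.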
Research Impact on Healthcare:
- Improved Operational Efficiency
- Enhanced Resource Utilization
- Increased Scheduling Flexibility and Resilience
- Clinically Relevant Groupings
- Simplified Planning and Management
- Shift to Predictive Planning
Project 10: Optimizing Surgical Processes Using Process Mining
Collaborating Hospital: AKUH
Medical Collaborator: Dr. Asad Latif
Academic Collaborator: Dr. Taslim Murad
Brief: Process mining is a technique used in healthcare to visualize how different activities interact in a particular context, e.g. a lab, hospital, ward, or room. The first step in this work was to identify the limitations of the Alpha Miner, which is prevalent in this domain. Three shortcomings were identified:
- Descriptive overreliance
- Lack of constraint integration
- Underrepresentation of surgical workflow
The main goal of this work was to enhance the Alpha Miner by embedding domain-specific clinical constraints. A new version, named CAAM (Constraint-Aware Alpha Miner), was proposed with an added constraint validation layer. Three types of constraints were identified: temporal, exclusive, and mandatory. The intended outcome is process models that are both data-driven and clinically coherent. This work is also in progress.
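The three constraint types can be illustrated with a small checker that validates a single trace (an ordered list of activities). This is only a sketch of what a validation step might look like. The constraint semantics, the function name `check_trace`, and the activity names in the usage note are assumptions, and CAAM's actual validation layer is not described in enough detail here to reproduce.

```python
def check_trace(trace, mandatory=(), exclusive=(), temporal=()):
    """Return a list of human-readable constraint violations for one trace.

    mandatory: activities that must appear in the trace
    exclusive: (a, b) pairs that must not both appear
    temporal:  (before, after) pairs; when both appear, the first
               occurrence of `before` must precede that of `after`
    """
    seen = set(trace)
    violations = []

    for activity in mandatory:
        if activity not in seen:
            violations.append(f"missing mandatory activity: {activity}")

    for a, b in exclusive:
        if a in seen and b in seen:
            violations.append(f"mutually exclusive activities both present: {a}, {b}")

    for before, after in temporal:
        if before in seen and after in seen:
            if trace.index(after) < trace.index(before):
                violations.append(f"{after} occurs before {before}")

    return violations
```

For example, with `mandatory=["consent"]`, `exclusive=[("laparoscopic", "open")]` and `temporal=[("anesthesia", "incision")]`, a trace that starts with `incision` and contains both surgical approaches but no `consent` yields three violations, while a clinically coherent trace yields an empty list.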
Research Impact on Healthcare:
- Prevents implausible models
- Compliance verification
- Identifies process waste
- Pinpoints bottlenecks
- Objective process view
- Optimal resource allocation
- Reduced patient stay
- Lower process-related error rates
Project 11: A Novel Framework for Breast Cancer Prediction in Pakistani Context
Collaborating Hospital: AKUH
Medical Collaborator: Dr. Munira Memon
Publications:
https://accscience.com/journal/AIH/articles/online_first/4425
https://ieeexplore.ieee.org/abstract/document/10912222
IBA Student: Saadia Humayun
Brief: Conventional ML models try to optimize a single objective, i.e. the best value of a performance metric. This approach can lead to clinically irrelevant decision making, because shortcomings in the available data produce spurious correlations that would not mislead a domain expert but are detrimental to ML algorithms. In this work the problem is treated as multi-objective, with three objectives: maximizing the AUC, maximizing clinical relevance (quantified on a 5-point scale of expert-validated feature importance), and maximizing feature simplicity (measured by the number of features). Three variants of multi-objective genetic algorithms are implemented, i.e. expert-guided GA, adaptive GA, and multi-population GA, each approaching the problem from a different perspective. This led to the development of different models for the different circumstances that might occur in a clinical environment.
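With three objectives there is usually no single best model, only a set of trade-offs. The sketch below shows the Pareto-dominance test that multi-objective genetic algorithms commonly rely on, applied to the three objectives named above. The `Candidate` type and the function names are assumptions for illustration, and the genetic operators themselves are not shown.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    auc: float          # higher is better
    relevance: float    # mean expert score on a 5-point scale, higher is better
    n_features: int     # lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on every objective and better on one."""
    no_worse = (a.auc >= b.auc and a.relevance >= b.relevance
                and a.n_features <= b.n_features)
    strictly_better = (a.auc > b.auc or a.relevance > b.relevance
                       or a.n_features < b.n_features)
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]
```

Every candidate on the front is a defensible choice. One may offer slightly higher AUC, another higher clinical relevance with fewer features, which matches the idea of keeping different models for different clinical circumstances.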
Research Impact on Healthcare:
- Improves Physician-Patient Communication
- Promotes Model Parsimony and Efficiency
- Offers Flexible Decision Support
- Advances Clinical Research Methodology
Project 12: Grey Wolf Optimizer-Tuned Decision Trees for Enhanced Prediction of Mortality in Burns Patients
Collaborating Hospital: PIMS
Medical Collaborator: Dr. Zoofishaan Jabeen
IBA Student: Akifa Khan
Brief: The focus of this project was optimizing the decision tree construction process to obtain more accurate decision trees. Decision trees are more interpretable and closer to human thought processes, but this interpretability comes at the cost of accuracy. Since the problem posed is multi-modal, multiple evolutionary and swarm intelligence algorithms were applied (genetic algorithm, PSO, ACO, and grey wolf). The grey wolf algorithm gave the best results. The project proposed an enhanced grey wolf optimizer based on optimized wolf frequency and fuzzification of stochastic wolf selection.
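For orientation, the sketch below implements the standard grey wolf optimizer, in which every wolf moves toward the average of positions suggested by the three best wolves (alpha, beta, delta) while an exploration coefficient decays from 2 to 0. It does not include the project's enhancements (optimized wolf frequency, fuzzified wolf selection) and it minimizes a generic objective rather than tuning a decision tree. The function name, population size, and iteration count are assumptions.

```python
import random


def grey_wolf_optimize(objective, dim, bounds, n_wolves=20, n_iter=200, seed=0):
    """Minimise `objective` over a box; returns (best_position, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    best = min(wolves, key=objective)[:]

    for t in range(n_iter):
        # Copy the three leaders so in-place moves below do not alter them.
        leaders = [w[:] for w in sorted(wolves, key=objective)[:3]]
        if objective(leaders[0]) < objective(best):
            best = leaders[0][:]

        a = 2.0 * (1.0 - t / n_iter)  # decays linearly from 2 toward 0
        for wolf in wolves:
            for d in range(dim):
                estimate = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    distance = abs(C * leader[d] - wolf[d])
                    estimate += leader[d] - A * distance
                # Average of the three leader-guided positions, kept in bounds.
                wolf[d] = min(hi, max(lo, estimate / 3.0))

    final = min(wolves, key=objective)
    if objective(final) < objective(best):
        best = final[:]
    return best, objective(best)
```

To tune a decision tree in this way, the position vector would encode hyperparameters (for example depth and minimum leaf size, suitably rounded) and the objective would return a validation error. That mapping is an assumption about how such an optimizer is typically applied, not a description of the project's encoding.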
Research Impact on Healthcare:
- Earlier, More Reliable Sepsis Prediction
- Minimizing Dangerous False Negatives
- Enhanced Clinical Trust through Interpretability
- Robustness in Handling Imbalanced Data
- Reproducibility and Scalability
- Validated for Clinical Utility