Sammy Mustafa, AI Data Scientist

Research


Welcome to my portfolio of research projects, showcasing a dynamic blend of scientific inquiry and data-driven innovation! My journey began with a deep dive into molecular research, laying a strong foundation in analytical thinking and problem-solving. As my career progressed, I pivoted towards leveraging data analytics in healthcare, focusing on evidence-based solutions to complex challenges. This transition highlights my ability to apply scientific principles in diverse contexts.

My current projects underscore my proficiency in bioinformatics and data science, demonstrating skills in data manipulation, statistical analysis, machine learning, and AI-driven strategies. These projects reflect my commitment to transforming vast datasets into meaningful insights, directly applicable to healthcare and beyond.

Please click on the project titles for the write-ups! For a detailed display of my technical skills, including coding proficiency in various languages and tools integral to data science and AI, please visit my GitHub page linked below. Each repository displays my ability to integrate multidisciplinary knowledge into practical, impactful applications.



Current Pursuit:

Driven by a commitment to reduce the trial-and-error approach in mental health treatment, my current work at the Center for Precision Psychiatry at Massachusetts General Hospital uses whole-genome regression models to elucidate the pathophysiology of major depressive disorder (MDD). My capstone, in turn, focuses on pioneering the use of dynamic causal modeling to predict the onset of depressive biotypes in individuals before any clinical manifestation of the disorder. This approach uses resting-state functional MRI (rs-fMRI) data to forecast neuronal activity patterns that may indicate susceptibility to depression.

The analysis of neuroimaging data, particularly through generative embedding and machine learning classifiers, has shown promising results in predicting future depressive episodes from large datasets like the UK Biobank. Integrating dynamic causal modeling with machine learning, using techniques such as support vector machines with sigmoid kernels, has enabled us to create predictive models that distinguish at-risk individuals more accurately than traditional analysis methods allowed. The classifiers are validated with nested cross-validation, which keeps hyperparameter tuning separate from performance estimation and so gives a less biased view of the model’s generalizability.
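The validation scheme described above can be sketched with scikit-learn. This is a minimal illustration and not the project's actual pipeline: the synthetic feature matrix stands in for DCM-derived connectivity parameters (the generative embedding), and the fold counts, hyperparameter grid, scoring metric, and the function name `nested_cv_scores` are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def nested_cv_scores(X, y, seed=0):
    """Return outer-fold scores for a sigmoid-kernel SVM tuned in an inner loop."""
    # Inner folds choose hyperparameters; outer folds estimate performance.
    inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
    outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed + 1)

    # Scaling lives inside the pipeline so it is refit on each training fold.
    pipeline = make_pipeline(StandardScaler(), SVC(kernel="sigmoid"))

    # Hypothetical grid; real values would depend on the data.
    param_grid = {"svc__C": [0.1, 1.0, 10.0], "svc__gamma": ["scale", 0.01]}
    search = GridSearchCV(pipeline, param_grid, cv=inner, scoring="balanced_accuracy")

    # Each outer fold reruns the full inner search on its own training split.
    return cross_val_score(search, X, y, cv=outer, scoring="balanced_accuracy")


# Synthetic placeholder for per-subject connectivity features (assumption).
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)
scores = nested_cv_scores(X, y)
print("outer-fold balanced accuracy:", np.round(scores, 3))
```

Because the hyperparameter search never sees the outer test fold, the averaged outer score is a less optimistic estimate than tuning and testing on the same splits.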

This research not only pushes the boundaries of precision psychiatry but also offers a potential revolution in the early detection and prevention of depressive disorders. By identifying at-risk individuals before the onset of symptoms, we can target interventions more effectively, profoundly impacting patient care and mental health management. This approach exemplifies the novel application of AI in medicine, leveraging computational techniques to tackle some of the most complex challenges in mental health.



Enhancing Molecular Docking: Studying and Optimizing Generative Model Priors

In the realm of drug discovery, this research focuses on enhancing molecular docking through machine learning. Traditional docking methods, rooted in fundamental physics, often fall short in accurately predicting ligand binding poses. This work advances the field by utilizing generative models like DiffDock and HarmonicFlow, which learn from data rather than relying solely on predetermined physical principles.

Our analysis revealed that while the predicted molecular structures were physically feasible, their radius of gyration estimates were notably low. This suggested a significant shortfall in the models' ability to represent long-range atomic information, meaning relationships between atoms separated by three or more bonds. Through this examination, we established that the existing models, though structurally sound, lacked the long-range information needed for precise docking predictions.
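For reference, the radius of gyration is the root-mean-square distance of the atoms from their (optionally mass-weighted) center. The sketch below computes it with NumPy; the function name and the toy coordinates are illustrative assumptions, not data from the study.

```python
import numpy as np


def radius_of_gyration(coords, masses=None):
    """Root-mean-square distance of atoms from their (mass-weighted) center."""
    coords = np.asarray(coords, dtype=float)  # shape (n_atoms, 3)
    if masses is None:
        masses = np.ones(len(coords))  # unweighted if no masses are given
    masses = np.asarray(masses, dtype=float)

    center = np.average(coords, axis=0, weights=masses)
    sq_dist = np.sum((coords - center) ** 2, axis=1)
    return float(np.sqrt(np.average(sq_dist, weights=masses)))


# Toy four-atom geometries (hypothetical): an extended chain and a compact square.
extended = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [3.0, 0.0, 0.0], [4.5, 0.0, 0.0]])
compact = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0], [0.0, 1.5, 0.0]])

rg_extended = radius_of_gyration(extended)
rg_compact = radius_of_gyration(compact)
print(rg_extended, rg_compact)
```

A systematically low value across predicted poses, relative to reference structures, is the kind of signal that points to ligands being generated in overly compact shapes.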

To address these challenges, my team and I developed two innovative priors aimed at refining the predictive models. The first prior harnessed the 3D structural information of the ligand, while the second utilized consensus distances across multiple ligand conformers. We set a threshold on long-range pairwise distances to ensure the consistency of information fed into the prior. These novel approaches significantly improved the accuracy and consistency of our predictions, marking a substantial advancement in the field of drug discovery and computational biology by enhancing the reliability of generative molecular docking models.
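One way to picture the consensus-distance idea is sketched below. It is a simplified reading rather than the implementation used in the project: it assumes the threshold is applied to how much each pairwise distance varies across conformers, it does not filter pairs by bond separation, and the function name, the `max_std` parameter, and the toy coordinates are all hypothetical.

```python
import numpy as np


def consensus_distance_prior(conformers, max_std=0.5):
    """Mean pairwise distances across conformers plus a mask of consistent pairs.

    conformers: array of shape (n_conformers, n_atoms, 3).
    max_std: assumed consistency threshold on the per-pair standard deviation.
    """
    conformers = np.asarray(conformers, dtype=float)

    # Pairwise distance matrix for every conformer: (n_conf, n_atoms, n_atoms).
    diffs = conformers[:, :, None, :] - conformers[:, None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)

    mean_dist = dists.mean(axis=0)
    std_dist = dists.std(axis=0)

    # Keep only atom pairs whose distance is stable across conformers.
    mask = std_dist <= max_std
    np.fill_diagonal(mask, False)  # an atom paired with itself carries no information
    return mean_dist, mask


# Two toy conformers of a three-atom fragment; only atom 2 moves (hypothetical data).
conformers = np.array([
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]],
])
mean_dist, mask = consensus_distance_prior(conformers, max_std=0.1)
print(mean_dist)
print(mask)
```

In this toy case the 0-2 distance changes between conformers and is masked out, while the 0-1 and 1-2 distances are retained as consistent constraints.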



PTENP1: A Pivotal Pseudogene in Glioblastomas

In this research, I delved into the therapeutic implications of the PTENP1 pseudogene in the context of glioblastoma treatment, a challenging and aggressive form of brain cancer. Historically considered genomic 'junk', pseudogenes have recently gained recognition for their regulatory roles in gene expression. My research focused on the PTENP1 pseudogene, which closely resembles the PTEN tumor suppressor gene.

The core of my research involved the strategic overexpression of PTENP1 in glioblastoma cells. This approach exploited the similarity between PTENP1 and PTEN to sequester microRNA-26a away from the PTEN gene. Such sequestration enabled PTEN to evade microRNA-mediated silencing, thus restoring its tumor-suppressing activity. The experimental outcomes were substantial, indicating that overexpression of PTENP1 curtailed cell proliferation, augmented the efficacy of chemotherapy, initiated cell cycle arrest, promoted autophagy, and attenuated the angiogenesis potential of the tumor cells.

This study presents a pioneering approach to cancer therapy, illustrating the potential of pseudogenes as therapeutic targets. By leveraging the PTENP1 pseudogene as a molecular decoy, the research demonstrates a novel method to reactivate tumor suppressor genes, thereby offering promising new pathways for glioblastoma treatment.



SERBP1: Exploiting RNA-Binding Protein-Mediated PAI-1 Inhibition

Here, I investigated the intricate relationship between SERBP1, an mRNA-binding protein, and PAI-1, a key protein associated with aging-related conditions such as obesity, insulin resistance, Alzheimer's, and various fibroses. The focus was on exploring SERBP1's role in regulating PAI-1 mRNA, particularly in the context of cardiac fibrosis.

SERBP1, which demonstrates homology to intrinsically disordered proteins, possesses unique prion-like properties that facilitate transgenerational epigenetic inheritance. This characteristic suggests that SERBP1 can self-template in a non-amyloid, prion-like manner, potentially amplifying its regulatory effects on PAI-1 levels.

My research aimed to elucidate and utilize the mechanism by which SERBP1 interacts with and influences PAI-1. The study involved manipulating SERBP1's localization in human cell models to observe its impact on PAI-1 expression. By targeting specific regions such as the RG/RGG region or influencing cyclic nucleotide levels, the research sought to determine the functional differences between cytoplasmic and nuclear SERBP1 and their subsequent effects on PAI-1 levels.

The potential of SERBP1 to serve as a novel therapeutic target in cardiac fibrosis is significant. The unique prion-like behavior of SERBP1 offers a new avenue to modulate PAI-1 levels, which could lead to groundbreaking treatments for cardiac fibrosis and possibly other aging-related diseases. This research provides vital insights into the pathogenesis of age-dependent cardiac fibrosis and paves the way for developing innovative therapeutic strategies.