Research

My research interests lie in Operations Research, specifically in applying advanced analytics, optimization techniques, and machine learning models to solve complex problems. I have always valued a multidisciplinary approach: I believe that research spanning several disciplines can bridge gaps in our understanding, cultivate broader perspectives, and ultimately foster innovative solutions.


AlphaFold Accessibility: Resource-Optimized Open-Source Application for Protein Structure Prediction

September 2024 - March 2025

In collaboration with Dr. William Lai at the Epigenomics Facility at Cornell University, I developed an optimized open-source implementation of AlphaFold 2 and 3 that addresses a fundamental challenge in computational structural biology: resource inefficiency. By analyzing the AlphaFold workflow, we found that approximately 75% of the runtime is consumed by CPU-intensive multiple sequence alignment (MSA) generation, while GPU resources are needed only for the final structure prediction phase. Our solution separates these phases within a single Open OnDemand (OOD) instance, so that GPUs are not allocated and left idle during the CPU-bound steps. I presented this work at the Global Open OnDemand Conference at Harvard University in March 2025, where we demonstrated both the performance benefits and practical implementation strategies for HPC administrators and researchers. The implementation is particularly timely given the release of AlphaFold 3, which natively supports running the CPU and GPU stages separately, making our framework directly compatible with this latest iteration. Our work has been adopted by several major HPC centers in the United States. The code is available at github.com/EpiGenomicsCode/ProteinStructure-OOD.
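To make the 75% figure concrete, the sketch below computes idealized GPU-hours for one job under two allocation policies: holding the GPU for the entire run versus requesting it only for the structure-prediction phase. The 8-hour, single-GPU job is hypothetical, and the model deliberately ignores queueing delays, data staging between phases, and any change in wall-clock time.

```python
def gpu_hours(total_runtime_h: float, msa_fraction: float = 0.75, gpus: int = 1) -> tuple[float, float]:
    """Return (monolithic, split) GPU-hours for one prediction job.

    monolithic: GPUs are held for the whole job, including the CPU-only MSA phase.
    split: GPUs are requested only for the structure-prediction phase.
    Idealized model (assumption): phase durations are unchanged by splitting.
    """
    if not 0.0 <= msa_fraction <= 1.0:
        raise ValueError("msa_fraction must be in [0, 1]")
    monolithic = total_runtime_h * gpus
    split = total_runtime_h * (1.0 - msa_fraction) * gpus
    return monolithic, split


# Hypothetical job: 8 hours end to end on a single GPU.
monolithic, split = gpu_hours(8.0)
saved = 1 - split / monolithic
print(monolithic, split, saved)  # 8.0 2.0 0.75
```

Under this simple model, the fraction of GPU-hours saved per job equals the MSA share of the runtime, regardless of how long the job runs.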

Unsupervised Embeddings, Networks, and LLMs for Hypothesis Generation

January 2023 - December 2024

This project is part of my M.Sc. thesis under the supervision of Dr. Soundar Kumara and Dr. Peter Butler at Penn State. It aims to extract insights from biomedical literature by training word embeddings and using them to build weighted word networks. Advances in large language models have also made biomedical named entity recognition considerably more reliable. Depending on the research question, we query these embedding models with keywords to extract relevant information, and we carefully select the data sources used for training. When we restricted our models to literature published before a specific year, we found mentions of entities in our network that were not officially recognized until later, opening new avenues for the research questions we are investigating.
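A minimal sketch of the embedding-to-network step follows, assuming word vectors have already been trained (for example, with a word2vec-style model) on a corpus restricted to a cutoff year. The toy vectors, the document schema with `year` and `text` fields, the helper names, and the 0.5 similarity threshold are all hypothetical; edges are weighted by cosine similarity.

```python
import math
from itertools import combinations


def corpus_before(docs, cutoff_year):
    """Keep only texts published strictly before cutoff_year (hypothetical schema)."""
    return [d["text"] for d in docs if d["year"] < cutoff_year]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_word_network(embeddings, threshold=0.5):
    """Undirected weighted edges {(w1, w2): similarity} for pairs at or above threshold."""
    edges = {}
    for w1, w2 in combinations(sorted(embeddings), 2):
        sim = cosine(embeddings[w1], embeddings[w2])
        if sim >= threshold:
            edges[(w1, w2)] = sim
    return edges


def neighbors(edges, keyword):
    """Words linked to keyword, strongest edge first."""
    linked = [(b if a == keyword else a, w)
              for (a, b), w in edges.items() if keyword in (a, b)]
    return sorted(linked, key=lambda pair: -pair[1])


# Toy 2-D vectors standing in for embeddings trained on a time-sliced corpus.
toy = {"insulin": [1.0, 0.0], "glucose": [0.9, 0.1], "kinase": [0.0, 1.0]}
net = build_word_network(toy)
print(neighbors(net, "insulin"))
```

In this toy example only "glucose" clears the threshold as a neighbor of "insulin", while "kinase" stays isolated. Retraining on successive cutoff years and comparing each keyword's neighborhood is one simple way to look for entities that appear in the network before they are formally recognized.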

Causal Effect Estimation from Longitudinal Data using CFRNet and Gaussian Processes

January 2023 - May 2023

In this study, I used Counterfactual Regression Networks (CFRNet) and Gaussian processes to estimate the Average Treatment Effect (ATE) from longitudinal data. The CFRNet model was applied to both the treatment and control groups, with a Gaussian process integrated into the loss function to account for temporal patterns and multilevel correlations in the data. I am deeply appreciative of Professor Vasant Honavar's guidance, as many of the methods adopted stemmed from our insightful discussions. (Source code is available in my GitHub repository.)
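The sketch below illustrates two quantities at the core of a CFRNet-style model. It assumes the network's two outcome heads have already produced potential-outcome predictions `mu1` (treated) and `mu0` (control), plus a shared representation `phi`, for each unit. It shows the ATE as the mean predicted difference, and a training loss that combines factual error with a balance penalty between treated and control representations. The balance term here is the squared distance between group means (a linear-kernel MMD), one of the simplest choices. The Gaussian-process term for temporal correlation is omitted, and all function names are illustrative.

```python
def ate(mu1, mu0):
    """ATE estimate: mean over all units of the predicted mu1(x) - mu0(x)."""
    return sum(a - b for a, b in zip(mu1, mu0)) / len(mu1)


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def linear_mmd2(phi_treated, phi_control):
    """Squared distance between group mean representations (linear-kernel MMD^2)."""
    m_t, m_c = mean_vector(phi_treated), mean_vector(phi_control)
    return sum((a - b) ** 2 for a, b in zip(m_t, m_c))


def cfr_style_loss(y, t, mu1, mu0, phi, alpha=1.0):
    """Factual MSE plus alpha times the balance penalty (GP term omitted)."""
    # Only the head matching each unit's observed treatment is scored.
    y_hat = [m1 if ti == 1 else m0 for ti, m1, m0 in zip(t, mu1, mu0)]
    factual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat)) / len(y)
    treated = [p for p, ti in zip(phi, t) if ti == 1]
    control = [p for p, ti in zip(phi, t) if ti == 0]
    return factual + alpha * linear_mmd2(treated, control)


print(ate([3.0, 5.0], [1.0, 2.0]))  # 2.5
```

The balance penalty pushes the learned representation to look similar across treatment groups, which is what allows the counterfactual head to generalize to units it never observed under that treatment.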

From Health Challenges to Hunger: Insights into Food Insecurity in the United States

August 2022 - December 2022

This work explores the relationship between chronic diseases and food insecurity in the U.S. The study uses health metrics to forecast food insecurity rates at both the state and county levels from 2016 to 2020. Two populations were considered, adults and children, with data drawn from Feeding America and the National Survey of Children's Health, respectively. Using tree-based and regression methods, the project identifies key health factors that exacerbate food insecurity; for instance, parental age, adverse childhood experiences, and certain health conditions emerge as pivotal determinants. A notable finding is that neighboring counties often share similar health challenges, hinting at potential regional strategies for addressing food insecurity. The study highlights the potential of targeted interventions, informed by specific health metrics, to improve food security in vulnerable communities. This project was completed in partial fulfillment of IE 582: Engineering Analytics under the guidance of Dr. Soundar Kumara.
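One standard way to quantify the observation that neighboring counties share similar values is Moran's I, a spatial autocorrelation statistic. The sketch below illustrates that idea; it is not necessarily the analysis used in the project. It assumes binary adjacency weights supplied as a neighbor list, and the four-county chain in the example is hypothetical. Positive values indicate that similar values cluster among neighbors, and negative values indicate that neighbors tend to differ.

```python
def morans_i(values, neighbors):
    """Moran's I with binary weights: I = (N / W) * sum_ij w_ij z_i z_j / sum_i z_i^2.

    values: one observation per county (e.g., a health metric).
    neighbors: dict mapping county index -> list of adjacent county indices
               (symmetric adjacency assumed).
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]  # deviations from the mean
    denom = sum(d * d for d in z)
    w_total = sum(len(nbrs) for nbrs in neighbors.values())  # W: total weight
    if denom == 0 or w_total == 0:
        raise ValueError("need non-constant values and at least one adjacency")
    num = sum(z[i] * z[j] for i, nbrs in neighbors.items() for j in nbrs)
    return (n / w_total) * (num / denom)


# Hypothetical chain of four counties: 0 - 1 - 2 - 3.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(round(morans_i([1.0, 1.0, 0.0, 0.0], chain), 3))  # 0.333 (clustered)
print(round(morans_i([1.0, 0.0, 1.0, 0.0], chain), 3))  # -1.0 (alternating)
```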

University Timetable Scheduler with a Genetic Algorithm-Based Population Alteration Mechanism

June 2019 - August 2019

During my summer research internship at IIT Guwahati under Professor Swarup Bag, I developed a university timetable generator using a genetic algorithm with a parent-centric recombination operator. The goal was to reconcile room constraints, subject schedules, and faculty availability and preferences while ensuring that labs or lectures were not scheduled consecutively. The algorithm starts from a random set of timetables, evaluates their fitness, performs crossover, and iteratively improves the solutions. The final product was designed to structure timetables as efficiently as possible. I have shared the code on my GitHub for both academic and real-world use.
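The loop described above can be sketched as follows. This is a simplified illustration, not the original implementation. The toy instance (sessions, teachers, 3 days of 4 slots, 2 rooms) and the GA parameters are hypothetical. Fitness is the number of hard-constraint violations: room capacity, teacher double-booking, and the same course placed in adjacent slots on one day. Uniform crossover is used as a simple stand-in; the parent-centric recombination operator from the project is not reproduced here.

```python
import random
from collections import Counter

# Hypothetical toy instance: each (course, teacher) pair is one weekly session.
SESSIONS = [("Math", "A"), ("Math", "A"), ("Physics", "B"),
            ("Physics", "B"), ("Chem Lab", "C"), ("Chem Lab", "A")]
SLOTS_PER_DAY, DAYS, ROOMS = 4, 3, 2
N_SLOTS = SLOTS_PER_DAY * DAYS


def violations(timetable):
    """Count hard-constraint violations; timetable[i] is the slot of session i."""
    v = sum(max(0, c - ROOMS) for c in Counter(timetable).values())  # room capacity
    for i in range(len(SESSIONS)):
        for j in range(i + 1, len(SESSIONS)):
            (ci, ti), (cj, tj) = SESSIONS[i], SESSIONS[j]
            si, sj = timetable[i], timetable[j]
            if ti == tj and si == sj:  # teacher double-booked
                v += 1
            same_day = si // SLOTS_PER_DAY == sj // SLOTS_PER_DAY
            if ci == cj and same_day and abs(si - sj) <= 1:  # same course back to back
                v += 1
    return v


def tournament(pop, rng):
    """Binary tournament: the fitter (fewer violations) of two random timetables."""
    a, b = rng.sample(pop, 2)
    return a if violations(a) <= violations(b) else b


def evolve(pop_size=30, generations=200, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    n = len(SESSIONS)
    # Start from a population of random timetables.
    pop = [[rng.randrange(N_SLOTS) for _ in range(n)] for _ in range(pop_size)]
    best = min(pop, key=violations)
    for _ in range(generations):
        if violations(best) == 0:
            break
        children = [best[:]]  # elitism: the best timetable so far always survives
        while len(children) < pop_size:
            p1, p2 = tournament(pop, rng), tournament(pop, rng)
            # Uniform crossover (a stand-in for parent-centric recombination).
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            # Mutation: occasionally move a session to a random slot.
            child = [rng.randrange(N_SLOTS) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = min(pop, key=violations)
    return best


best = evolve()
print(best, violations(best))
```

Elitism guarantees that the best violation count never gets worse from one generation to the next. Soft constraints such as faculty preferences could be added as smaller weighted penalties in the same fitness function.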