I am currently a Lead NLP Researcher at Triomics, where I focus on harnessing LLMs and Generative AI to extract insights from the Electronic Health Records (EHRs) of cancer patients, with the goal of enhancing oncology research and care. Before joining Triomics, I was a Data Scientist (MLE-3) at RingCentral Innovation (India), where I worked primarily on Speech- and NLP-related challenges for Conversational AI.
Before joining industry, I completed my bachelor's and master's degrees at IIT Kanpur, where I was advised by
Dr. Gabriel Kreiman (Harvard Medical School, Boston, USA) and
Prof. K. S. Venkatesh (IIT Kanpur, India).
For my master's thesis (An Integrated Computational Model of Visual Search Combining Eccentricity, Bottom-up, and Top-down Cues), I worked at the intersection of computer vision, deep learning, cognitive science, and neuroscience. During my undergraduate years, I was awarded the prestigious Khorana Scholarship by IUSSTF, WINStep, and DBT (India) to work with Dr. Gabriel Kreiman at Harvard Medical School, Boston, USA on computational neuroscience and deep learning. Before that, I also touched upon robotics during my internship at the Centre for Smart System, SUTD Singapore, and as part of the Humanoid IITK team. During my academic career, I have had the good fortune to work with some incredible researchers: Dr. Gabriel Kreiman (Harvard Medical School, Boston, and CBMM, MIT), Mengmi Zhang (NTU Singapore, and A*STAR Singapore), Prof. K. S. Venkatesh (IIT Kanpur, India), and Prof. Nisheeth Srivastava (IIT Kanpur, India).
In my free time, I love gardening, pencil sketching, kirigami, and watching anime. I also love 😴 (more than 50% of the time).
Publications
Shashi Kant Gupta, Aditya Basu, Mauro Nievas, Jerrin Thomas, Nathan Wolfrath, Adhitya Ramamurthi, Bradley Taylor, Anai N. Kothari, Regina Schwind, Therica M. Miller, Sorena Nadaf-Rahrov, Yanshan Wang, Hrituraj Singh, "PRISM: Patient Records Interpretation for Semantic Clinical Trial Matching using Large Language Models", Accepted at npj Digital Medicine (Nature) [abstract] [paper]
Clinical trial matching is the task of identifying trials for which patients may be potentially eligible. Typically, this task is labor-intensive and requires detailed verification of patient electronic health records (EHRs) against the stringent inclusion and exclusion criteria of clinical trials. This process is manual, time-intensive, and challenging to scale up, resulting in many patients missing out on potential therapeutic options. Recent advancements in Large Language Models (LLMs) have made automating patient-trial matching possible, as shown in multiple concurrent research studies. However, the current approaches are confined to constrained, often synthetic datasets that do not adequately mirror the complexities encountered in real-world medical data. In this study, we present the first end-to-end, large-scale empirical evaluation of clinical trial matching using real-world EHRs. Our study showcases the capability of LLMs to accurately match patients with appropriate clinical trials. We perform experiments with proprietary LLMs, including GPT-4 and GPT-3.5, as well as our custom fine-tuned model called OncoLLM and show that OncoLLM, despite its significantly smaller size, not only outperforms GPT-3.5 but also matches the performance of qualified medical doctors. All experiments were carried out on real-world EHRs that include clinical notes and available clinical trials from a single cancer center in the United States.
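At its core, the matching step reduces to checking a patient's notes against each inclusion and exclusion criterion of a trial. The sketch below illustrates that criterion-by-criterion structure; it is not the actual PRISM/OncoLLM pipeline, and `call_llm`, `check_criterion`, and the prompt wording are hypothetical placeholders for whichever LLM backend is used.

```python
# Hypothetical sketch of criterion-level patient-trial matching with an LLM.
# `call_llm` is a placeholder for any chat-completion backend (e.g., GPT-4 or
# a fine-tuned model); the prompt format here is illustrative only.

def call_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to an LLM and return its text response."""
    raise NotImplementedError

def check_criterion(patient_notes: str, criterion: str) -> str:
    """Ask the LLM whether the patient meets a single trial criterion."""
    prompt = (
        "You are assisting with clinical trial matching.\n"
        f"Patient notes:\n{patient_notes}\n\n"
        f"Criterion:\n{criterion}\n\n"
        "Answer with one of: MET, NOT_MET, INSUFFICIENT_INFORMATION."
    )
    return call_llm(prompt).strip()

def match_patient_to_trial(patient_notes: str,
                           inclusion: list[str],
                           exclusion: list[str]) -> bool:
    """A patient is a candidate if every inclusion criterion is met
    and no exclusion criterion is met."""
    met_inclusion = all(check_criterion(patient_notes, c) == "MET" for c in inclusion)
    hit_exclusion = any(check_criterion(patient_notes, c) == "MET" for c in exclusion)
    return met_inclusion and not hit_exclusion
```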
This work focuses on improving Spoken Language Identification (LangId) for a challenge aimed at developing robust language identification systems that are reliable for non-standard, accented (Singaporean-accented), spontaneous, code-switched, and child-directed speech collected via Zoom. We propose a two-stage Encoder-Decoder-based E2E model. The encoder module consists of 1D depth-wise separable convolutions with Squeeze-and-Excitation (SE) layers with a global context. The decoder module uses an attentive temporal pooling mechanism to obtain a fixed-length, time-independent feature representation. The total number of parameters in the model is around 22.1 M, which is relatively lightweight compared to some large-scale pre-trained speech models. We achieved an EER of 15.6% in the closed track and 11.1% in the open track (baseline system 22.1%). We also curated additional LangId data from YouTube videos (featuring Singaporean speakers), which will be released for public use.
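The sketch below illustrates, in PyTorch, the two building blocks named above: a 1D depthwise separable convolution with a Squeeze-and-Excitation gate, and attentive temporal pooling that collapses a variable-length utterance into a fixed-length embedding. Channel counts, kernel size, and the SE reduction factor are illustrative assumptions, not the exact submitted architecture.

```python
import torch
import torch.nn as nn

class SEDepthwiseSeparableConv1d(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 15, se_reduction: int = 8):
        super().__init__()
        # Depthwise conv (one filter per channel) followed by a pointwise conv.
        self.depthwise = nn.Conv1d(channels, channels, kernel_size,
                                   padding=kernel_size // 2, groups=channels)
        self.pointwise = nn.Conv1d(channels, channels, kernel_size=1)
        # Squeeze-and-Excitation: global (time-averaged) context -> channel gates.
        self.se = nn.Sequential(
            nn.Linear(channels, channels // se_reduction),
            nn.ReLU(),
            nn.Linear(channels // se_reduction, channels),
            nn.Sigmoid(),
        )
        self.activation = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, channels, time)
        y = self.activation(self.pointwise(self.depthwise(x)))
        gate = self.se(y.mean(dim=-1))        # squeeze over time
        return y * gate.unsqueeze(-1)         # excite: rescale each channel

class AttentiveTemporalPooling(nn.Module):
    """Collapse variable-length features into a fixed-length embedding."""
    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Conv1d(channels, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, channels, time)
        weights = torch.softmax(self.score(x), dim=-1)   # (batch, 1, time)
        return (x * weights).sum(dim=-1)                 # (batch, channels)

# Example: 80-dim filterbank features, 3-second utterance at 100 frames/sec.
feats = torch.randn(2, 80, 300)
encoder = SEDepthwiseSeparableConv1d(channels=80)
pooling = AttentiveTemporalPooling(channels=80)
embedding = pooling(encoder(feats))          # (2, 80) fixed-length vector
```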
Visual search is a ubiquitous and often challenging daily task, exemplified by looking for the car keys at home or a friend in a crowd. An intriguing property of some classical search tasks is an asymmetry such that finding a target A among distractors B can be easier than finding B among A. To elucidate the mechanisms responsible for asymmetry in visual search, we propose a computational model that takes a target and a search image as inputs and produces a sequence of eye movements until the target is found. The model integrates eccentricity-dependent visual recognition with target-dependent top-down cues. We compared the model against human behavior in six paradigmatic search tasks that show asymmetry in humans. Without prior exposure to the stimuli or task-specific training, the model provides a plausible mechanism for search asymmetry. We hypothesized that the polarity of search asymmetry arises from experience with the natural environment. We tested this hypothesis by training the model on augmented versions of ImageNet where the biases of natural images were either removed or reversed. The polarity of search asymmetry disappeared or was altered depending on the training protocol. This study highlights how classical perceptual properties can emerge in neural network models, without the need for task-specific training, but rather as a consequence of the statistical properties of the developmental diet fed to the model. All source code and data are publicly available here.
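The search procedure can be pictured as a fixation loop: extract eccentricity-dependent features around the current fixation, combine them with target features into a top-down attention map, fixate the map's maximum, and repeat with inhibition of return until the target is fixated. The sketch below is schematic; `eccentricity_features` and `top_down_map` are hypothetical placeholders, not the released model code.

```python
import numpy as np

def eccentricity_features(image: np.ndarray, fixation: tuple[int, int]) -> np.ndarray:
    """Placeholder: features whose resolution falls off with distance from fixation."""
    raise NotImplementedError

def top_down_map(search_feats: np.ndarray, target_feats: np.ndarray) -> np.ndarray:
    """Placeholder: 2D attention map from similarity between search and target features."""
    raise NotImplementedError

def search(search_img, target_img, target_loc, start=(0, 0), max_fixations=50):
    """Generate fixations until the (experimenter-known) target location is fixated."""
    fixation, fixations = start, []
    visited = np.zeros(search_img.shape[:2], dtype=bool)      # inhibition of return
    target_feats = eccentricity_features(target_img, start)
    for _ in range(max_fixations):
        attn = top_down_map(eccentricity_features(search_img, fixation), target_feats)
        attn[visited] = -np.inf                                # do not revisit locations
        fixation = np.unravel_index(np.argmax(attn), attn.shape)
        fixations.append(fixation)
        visited[fixation] = True
        if tuple(fixation) == tuple(target_loc):               # target found
            break
    return fixations
```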
Deep Neural Network representations correlate very well with neural responses measured in primates' brains and with psychological representations in human similarity judgement tasks, making them possible models for human behavior-related tasks. This study investigates whether DNNs can learn an implicit association (between colors and emotions) for images. An experiment was conducted in which subjects were asked to select a color for a given emotion-inducing image. These human responses (decision probabilities) were modeled with neural networks using representations extracted from pre-trained DNNs for the images and the colors (a square of the color). The model presented showed a fuzzy linear relationship with the decision probabilities. Finally, this model was applied to emotion classification tasks, specifically with very few training examples, showing an improvement in accuracy over a standard classification model. This analysis can be of relevance to psychologists studying these associations and AI researchers modelling emotional intelligence in machines.
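A minimal sketch of this setup: frozen pre-trained DNN features for an image and for each candidate color square are combined by a small head whose scores, softmaxed over the colors, play the role of the human decision probabilities. The backbone choice, feature dimension, and head architecture here are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class ColorChoiceModel(nn.Module):
    def __init__(self, feat_dim: int = 512):
        super().__init__()
        # Small head over concatenated (image, color) features.
        self.head = nn.Sequential(
            nn.Linear(2 * feat_dim, 128), nn.ReLU(), nn.Linear(128, 1)
        )

    def forward(self, image_feats: torch.Tensor, color_feats: torch.Tensor) -> torch.Tensor:
        """Unnormalized score for an (image, color) pair; softmax over colors
        gives the predicted choice probabilities."""
        return self.head(torch.cat([image_feats, color_feats], dim=-1)).squeeze(-1)

# Scores for one image against, say, 11 candidate colors -> choice distribution.
model = ColorChoiceModel()
image_feats = torch.randn(11, 512)   # same image's DNN features, repeated per color
color_feats = torch.randn(11, 512)   # DNN features of each solid-color square
probs = torch.softmax(model(image_feats, color_feats), dim=0)
```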
Deep Learning has become increasingly popular in computer vision, often attaining near or above human-level performance on various vision tasks. However, recent work has also demonstrated that these deep neural networks are very vulnerable to adversarial examples (inputs that look very similar to the original data but fool the model into classifying them into the wrong class). Humans are very robust against such perturbations; one possible reason could be that humans do not learn to classify based on an error between a "target label" and a "predicted label", but rather from the reinforcements they receive on their predictions. In this work, we proposed a novel method to train deep learning models on an image classification task. We used a reward-based optimization function, similar to the vanilla policy gradient method used in reinforcement learning, to train our model instead of the conventional cross-entropy loss. An empirical evaluation on the CIFAR-10 dataset showed that our method learns a more robust classifier than the same model architecture trained using the cross-entropy loss function (with adversarial training). At the same time, our method shows better generalization: the gap between train and test accuracy remains below 2% most of the time, compared to the cross-entropy model, where it mostly remains above 2%.
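A minimal sketch of the objective, assuming a +1/-1 reward for correct/incorrect sampled predictions: sample a class from the model's softmax output and apply a REINFORCE-style (vanilla policy gradient) update in place of cross-entropy. The reward scheme and hyperparameters are illustrative, not the exact ones used in this work.

```python
import torch
import torch.nn.functional as F

def policy_gradient_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """REINFORCE-style loss for classification: -reward * log p(sampled class)."""
    probs = F.softmax(logits, dim=-1)
    dist = torch.distributions.Categorical(probs=probs)
    actions = dist.sample()                               # sampled predictions
    rewards = (actions == labels).float() * 2.0 - 1.0     # +1 correct, -1 incorrect
    return -(rewards * dist.log_prob(actions)).mean()

# Usage inside a standard training loop, replacing F.cross_entropy:
logits = torch.randn(8, 10, requires_grad=True)           # e.g. a CIFAR-10 batch
labels = torch.randint(0, 10, (8,))
loss = policy_gradient_loss(logits, labels)
loss.backward()
```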
The backpropagation algorithm is often debated for its biological plausibility, and various learning methods for neural architectures have been proposed in the search for more biologically plausible learning. Most of them try to solve the "weight transport problem" and propagate errors backward through the architecture via some alternative mechanism. In this work, we investigated a slightly different approach that uses only local information capturing spike-timing information, with no propagation of errors. The proposed learning rule is derived from the concepts of spike-timing-dependent plasticity and neuronal association. A preliminary evaluation on binary classification of the MNIST and IRIS datasets with two hidden layers shows performance comparable to backpropagation. The model learned using this method also shows the possibility of better adversarial robustness against the FGSM attack compared to a model trained through backpropagation of the cross-entropy loss. The local nature of the learning rule opens up the possibility of large-scale distributed and parallel learning in the network. Finally, the proposed method is more biologically grounded and can probably help in understanding how biological neurons learn different abstractions.
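To make "local, error-free learning" concrete, the sketch below shows a generic Hebbian-style update in which a layer's weights change based only on its own pre- and post-synaptic activity, with an Oja-like decay term to keep weights bounded. The actual rule in this work additionally uses spike-timing information and neuronal association; this is an illustrative stand-in, not the paper's rule.

```python
import numpy as np

def local_update(W: np.ndarray, pre: np.ndarray, post: np.ndarray, lr: float = 0.01) -> np.ndarray:
    """Strengthen weights where pre- and post-synaptic units are co-active;
    no error signal from other layers is used."""
    hebbian = np.outer(post, pre)                          # activity correlation
    decay = W * (post[:, None] ** 2)                       # Oja-like term bounds weight growth
    return W + lr * (hebbian - decay)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(3, 5))                     # 5 inputs -> 3 hidden units
x = rng.random(5)                                          # pre-synaptic activity
h = np.tanh(W @ x)                                         # post-synaptic activity
W = local_update(W, pre=x, post=h)                         # purely local weight change
```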