Bio
I am currently a final-year Ph.D. student in the College of Information and Computer Sciences at UMass Amherst, working with Prof. Yair Zick at FED.
My research focuses on Trustworthy Reinforcement Learning (RL) and Machine Learning, with a particular emphasis on developing practical, fair, and robust algorithms.
Before starting my Ph.D., I completed a Master's degree in Computer Science at UMass Amherst in 2020, during which I had the privilege of working with external collaborators Dr. Marek Petrik and Dr. Hima Lakkaraju. Before that, I spent two years in industry as a Software Engineer at Flipkart and Endurance International Group. I graduated from NIT Durgapur with a B.Tech in Electronics and Communication Engineering.
Research Areas
Robust and Fair Decision-Making Systems: My Ph.D. research centers on reinforcement learning and resource allocation under uncertainty, adversarial conditions, and fairness constraints. In Soft-Robust Algorithms for Batch Reinforcement Learning, I propose the soft-robust criterion as a principled alternative to the standard percentile criterion, which often leads to overly conservative policies, and develop two approximate algorithms that yield more balanced and effective policies, with both theoretical and empirical support.
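As a rough illustration (schematic notation, not necessarily the paper's exact formulation), a soft-robust objective interpolates between average-case and worst-case performance over plausible models:

$$
\max_{\pi} \;\; \lambda \, \mathbb{E}_{P \sim \hat{f}}\big[\rho(\pi, P)\big] \;+\; (1-\lambda) \, \min_{P \in \mathcal{U}} \rho(\pi, P),
$$

where $\rho(\pi, P)$ denotes the return of policy $\pi$ under transition model $P$, $\hat{f}$ is a distribution (e.g., a posterior) over plausible models, $\mathcal{U}$ is an uncertainty set, and $\lambda \in [0, 1]$ controls how conservative the resulting policy is.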
Expanding on this, Percentile Criterion Optimization in Offline Reinforcement Learning introduces a Value-at-Risk-based dynamic programming approach for robust policy optimization without constructing explicit uncertainty sets, enabling the learning of less conservative, uncertainty-aware policies.
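For intuition, the percentile criterion itself can be written schematically (again, illustrative notation) as a Value-at-Risk of the return under model uncertainty:

$$
\max_{\pi} \; \sup \Big\{ y \;:\; \Pr_{P \sim \hat{f}}\big(\rho(\pi, P) \geq y\big) \geq 1 - \delta \Big\},
$$

i.e., the policy maximizes a return guarantee $y$ that holds with probability at least $1 - \delta$ under the posterior $\hat{f}$; the paper optimizes this VaR-style objective directly via dynamic programming rather than by first constructing uncertainty sets.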
In Data Poisoning Attacks on Off-Policy Policy Evaluation Methods, we present the first data poisoning framework targeting off-policy evaluation. Using influence functions, we show how small, targeted data perturbations can significantly skew policy value estimates, highlighting the need for robust evaluation techniques.
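For context, this style of attack builds on the standard first-order influence-function approximation (sketched here in generic notation) for how an estimator's parameters shift when a logged data point $z$ is perturbed to $z_{\delta}$:

$$
\hat{\theta}(z \to z_{\delta}) \;\approx\; \hat{\theta} \;-\; \tfrac{1}{n} H_{\hat{\theta}}^{-1} \Big( \nabla_{\theta} \ell(z_{\delta}; \hat{\theta}) - \nabla_{\theta} \ell(z; \hat{\theta}) \Big),
$$

where $H_{\hat{\theta}}$ is the Hessian of the training loss; chaining this with the gradient of the off-policy value estimate reveals which small perturbations move the estimate the most.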
In Fair and Welfare-Efficient Constrained Multi-Matchings under Uncertainty, I address resource allocation with unknown agent utilities, using both stochastic and robust optimization to balance fairness and efficiency, validated on a real-world reviewer assignment dataset.
Large Language Models (LLMs): As an additional research direction, I investigate the reasoning capabilities, fine-tuning dynamics, and unlearning behavior of LLMs. In On the Impact of Fine-Tuning on Chain-of-Thought Reasoning, I study how fine-tuning affects LLM reasoning. The results show that while fine-tuning improves task-specific performance, it can reduce the consistency and faithfulness of chain-of-thought reasoning across datasets, revealing trade-offs between optimization and reasoning integrity.
I am also developing counterfactual verifiers for mathematical and logical reasoning tasks, leveraging counterfactual data to improve robustness.
In Matching Table Metadata with Business Glossaries Using Large Language Models, I apply LLMs to match enterprise metadata with business glossaries, demonstrating that LLMs can infer complex relationships between column names and glossary descriptions without manual tuning, enabling scalable metadata alignment in restricted-access environments. To further improve matching quality, we also trained Flan-T5 models using RLHF.
In Hierarchical Planning Agent for Web-Browsing Tasks, I introduce Structured Agent, a web-browsing agent that dynamically plans and executes tasks using an explicit AND/OR tree, enhancing robustness and interpretability on long-horizon tasks. As part of this work, I also trained a subplan reward model using Direct Preference Optimization (DPO) to help distinguish between effective and ineffective plan decompositions. I am currently working on improving the robustness of LLM verifiers for reasoning tasks.
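For reference, the subplan reward model was trained with the DPO objective, shown here in its generic form (Rafailov et al.), with $y_w$ and $y_l$ standing for preferred and dispreferred plan decompositions:

$$
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x, y_w, y_l)}\Big[ \log \sigma\Big( \beta \log \tfrac{\pi_{\theta}(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \tfrac{\pi_{\theta}(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \Big) \Big],
$$

where $\pi_{\mathrm{ref}}$ is the frozen reference model, $\sigma$ is the logistic function, and $\beta$ scales the implicit reward.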
Fairness-Centric and Interpretable Machine Learning: In Axiomatic Aggregations of Abductive Explanations, we tackle the challenge of generating robust and meaningful feature importance scores from multiple valid abductive explanations per data point, proposing aggregation techniques grounded in cooperative game theory (via power indices) and causal strength measures. These methods are axiomatically characterized to ensure desirable interpretability properties, and unlike popular methods such as SHAP and LIME, our approaches demonstrate improved robustness against adversarial perturbations.
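As one simple example of such an aggregation (a Deegan-Packel-style power index, shown only as an illustration and not necessarily in the paper's exact form), the importance of feature $i$ can be computed from the set $\mathcal{E}$ of minimal abductive explanations as

$$
\phi_{i} \;=\; \frac{1}{|\mathcal{E}|} \sum_{E \in \mathcal{E} \,:\, i \in E} \frac{1}{|E|},
$$

so features that appear in many small explanations receive higher importance.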
I also contributed to On Welfare-Centric Fair Reinforcement Learning, where we introduce a framework for agents receiving vector-valued rewards from multiple beneficiaries and optimizing a welfare function. We show that welfare-optimal policies are inherently stochastic and start-state dependent, and present the E4 learner, which operates within an adversarial-fair learning framework to manage exploration and maintain welfare guarantees.
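Schematically (illustrative notation), the objective in this setting is

$$
\max_{\pi} \; \mathrm{W}\big( V_{1}^{\pi}(s_0), \ldots, V_{k}^{\pi}(s_0) \big),
$$

where $V_{i}^{\pi}(s_0)$ is beneficiary $i$'s expected return from start state $s_0$ and $\mathrm{W}$ is a (typically concave) welfare function such as the egalitarian minimum or Nash welfare; because $\mathrm{W}$ is nonlinear in the value vector, optimal policies can indeed be stochastic and start-state dependent.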
Education
Ph.D. in Computer Science
University of Massachusetts Amherst. Started in Fall 2022.
Research: Robust Decision-Making Systems. Supervised by Prof. Yair Zick.
Master's in Computer Science
University of Massachusetts Amherst, 2018–2020.
Thesis: Soft-Robust Algorithms for Batch RL.
Bachelor of Technology (B.Tech) in Electronics and Communication Engineering
National Institute of Technology, Durgapur. 2016.
Experience
Amazon (Central ML Team), Seattle Spring 2024
- RESEARCH INTERN
- Developed robust and interpretable web-browsing agents for shopping tasks.
Harvard Business School, MA Summer 2024
- RESEARCH INTERN
- Investigated the effects of fine-tuning on reasoning abilities of Large Language Models (LLMs).
Microsoft Research, India Summer 2023
- RESEARCH INTERN, MENTORS - GAURAV SINHA, NAGARAJAN NATARAJAN
- Developed methods to improve robustness of alignment algorithms like DPO for Small Language Models (SLMs).
IBM Research, Yorktown Heights, NY Summer 2023
- RESEARCH INTERN
- Developed novel methods that leverage LLMs and human feedback for accurate metadata-to-business-glossary matching.
- Fine-tuned LLMs using RLHF with contrastive loss to improve matching accuracy.
IBM Watson, Yorktown Heights, NY Summer 2022
- RESEARCH INTERN
- Designed novel algorithms for efficient hyperparameter tuning in reinforcement learning.
IBM Watson, Yorktown Heights, NY Summer 2021
- RESEARCH INTERN
- Integrated Off-Policy Policy Evaluation algorithms into an automated optimization framework.
- Developed a variance-minimizing technique for risk estimators using influence functions from robust statistics.
Harvard Business School, MA Winter 2020-2021
- RESEARCH INTERN
- Developed a novel data-poisoning attack framework to analyze the sensitivity of off-policy evaluation methods.
Flipkart, Bangalore, India Aug 2017 - Jul 2018
- SOFTWARE ENGINEER
- Built Deep Learning model to detect anomalous payouts in accounting systems.
- Developed stock ledger generator API and invoice register API.
- Provided on-call support for inventory valuation and warehouse transfer systems.
Endurance International Group, Bangalore, India Jul 2016 - Aug 2017
- SOFTWARE ENGINEER
- Created APIs for web orchestration, smart search, and session management.
- Developed ML system to detect parked domains.
- Built Imagio: a fast keyword- and color-filtered image search tool.
Ongoing Work
FinHOP: Benchmarking Retrieval-Augmented Generation for Multi-Hop Questions on Long Financial Documents
Vinitra Muralikrishna, Prit Shah, Manas Wadhwa, Aeyan Ashraf, Wenlong Zhao, Eliot Brenner, Lanlan Ji, Dominic Seyler
(Under Review)
Counterfactual LLM Verifiers for Math and Logic Reasoning Tasks
Elita Lobo, Shiv Shankar, Chirag Agarwal, Yair Zick
(In Progress)
Publications
Please visit my Google Scholar page for an updated list of publications.
A Hierarchical Planning Framework for LLM-based Web Agents
Elita Lobo*, Frank Chen, Jingjing Meng, Yang Jiao, Nan Xi, Yan Gao
Efficient Reasoning Workshop, NeurIPS 2025
On the Impact of Fine-Tuning on Chain-of-Thought Reasoning in LLMs
Elita Lobo*, Chirag Agarwal, Hima Lakkaraju
NAACL 2025
Alternate Preference Optimization for Unlearning Factual Knowledge in Large Language Models
Anmol Mekala, Vineeth Dorna, Shreya Dubey, Abhishek Lalwani, David Koleczek, Mukund Rungta, Sadid Hasan, Elita Lobo
COLING 2024
Fair and Welfare-Efficient Resource Allocation under Uncertainty
Elita Lobo*, Justin Payan*, Cyrus Cousins, Yair Zick
NeurIPS 2024
On Welfare-Centric Fair Reinforcement Learning
Cyrus Cousins, Elita Lobo, Kavosh Asadi, Michael L. Littman
Reinforcement Learning Conference 2024
(Outstanding Paper Award)
Axiomatic Aggregations of Abductive Explanations
Vignesh Viswanathan*, Elita Lobo*, Yacine Izza, Gagan Biradar, Yair Zick
AAAI 2024
Percentile Criterion Optimization in Offline Reinforcement Learning
Elita Lobo*, Cyrus Cousins, Marek Petrik, Yair Zick
NeurIPS 2023
Data Poisoning Attacks on Off-Policy Policy Evaluation Methods
Elita Lobo*, Harvineet Singh, Cynthia Rudin, Himabindu Lakkaraju
UAI 2022
(Oral Presentation, Top 5%)
A Novel System for Metadata to Glossary Matching in Data Lakes Using Human Feedback and Generative Models
Elita Lobo*, Nhan Pham, Oktie Hassanzadeh, Dharmashankar Subramanian, Nandana Sampath Mihindukulasooriya, Long Vu
Patent (Under Review), 2024
A Metahyperparameter Tuning Framework for Reinforcement Learning
Elita Lobo*, Nhan Pham, Dharmashankar Subramanian, Tejaswini Pedapati
Patent, 2023
Matching Table Metadata with Business Glossaries Using Large Language Models
Elita Lobo*, Oktie Hassanzadeh, Nhan Pham, Nandana Mihindukulasooriya, Dharmashankar Subramanian, Horst Samulowitz
International Workshop on Ontology Matching, 2023
Soft-Robust Algorithms for Batch Reinforcement Learning
Elita Lobo*, Mohammad Ghavamzadeh, Marek Petrik
R2AW Workshop, IJCAI 2021
Behavior Policy Search for Risk Estimators in RL
Elita Lobo*, Yash Chandak, Dharmashankar Subramanian, Josiah Hanna, Marek Petrik
NeurIPS Workshop on Safe and Robust Control, 2021
Skills
- Programming Languages: Python, C++, Java
- Libraries, Frameworks & Tools: MySQL, PyTorch, TensorFlow, Transformers, Spring Boot, Elasticsearch
- Research Areas: Machine Learning, Reinforcement Learning, Convex Optimization, Natural Language Processing
Mentorship Experience
- PhD Applicant Support Program (2023, 2024): Mentored underrepresented students applying to graduate school.
- CS 696DS Industry Mentorship Program - Primary PhD Mentor (UMass Amherst, 2025): Mentored master's students through an NLP-focused research project, Alternate Preference Optimization (AltPO), in collaboration with Microsoft. Guided the team from project conception to publication. Our work was published at COLING 2024 and is available on arXiv (2409.13474).
- CS 696DS Industry Mentorship Program - PhD Mentor (UMass Amherst, 2024–2025): Mentored master's students through an NLP-focused research project on multi-hop finance QA generation for RAG evaluation, in collaboration with Goldman Sachs.
Awards & Achievements
- Outstanding Paper Award, RLC 2024
- Graduate Scholarship: Awarded the Anuradha and Hanuma Kodavalla Graduate Scholarship in Computer Science, UMass Amherst, 2023 ($10,000).
- Fellowship: Recipient of the UNH CEPS Graduate Fellowship, University of New Hampshire, 2020.
- AI Fellowship: Recipient of the Robin Popplestone Fellowship in Robotics and Artificial Intelligence, UMass Amherst, 2019 ($5,000).
- 1st Place: Hackday 10 (Marketplace Category) at Flipkart, 2018.
- Top 1%: 99th percentile in the All India Engineering Entrance Exam (State Rank 9), India, 2012.
- Top Rank: State Rank 11 (99.9th percentile) in the Goa Engineering Entrance Exam, India, 2012.
Teaching Experience
- Teaching Assistant, University of Massachusetts Amherst: Supported instruction and grading for core Computer Science courses, including Operating Systems (CS377, Fall 2018), Reasoning under Uncertainty (CS240, Spring 2019), Numerical Optimization (CS590OP, Fall 2019), and Convex Optimization (CS690OP, Spring 2020).