Bio
I am currently a 3rd-year Ph.D. student in the College of Information and Computer Sciences at UMass Amherst, working with
Prof. Yair Zick at FED.
My research focuses on Trustworthy Reinforcement Learning (RL) and Machine Learning, with a particular emphasis on developing practical, fair, and robust algorithms.
Before starting my Ph.D., I completed a Master's degree in Computer Science at UMass Amherst in 2020, during which I had the privilege of working with external collaborators Dr. Marek Petrik and Dr. Hima Lakkaraju. I also spent two years in industry as a Software Engineer at Flipkart and Endurance International Group. I graduated from NIT Durgapur with a B.Tech in Electronics and Communication Engineering.
Research Areas
Robust and Fair Decision Making Systems: My Ph.D. research centers on reinforcement learning and resource allocation under uncertainty, adversarial conditions, and fairness constraints. In Soft-Robust Algorithms for Batch Reinforcement Learning, I propose the soft-robust criterion as a principled alternative to the standard percentile criterion, which often results in overly conservative policies. I develop two approximate algorithms and show, both theoretically and empirically, that they yield more balanced and effective decision-making.
Expanding on this, Percentile Criterion Optimization in Offline Reinforcement Learning introduces a Value-at-Risk-based dynamic programming approach for optimizing robust policies without constructing explicit uncertainty sets. The method enables learning less conservative, uncertainty-aware policies.
In Data Poisoning Attacks on Off-Policy Policy Evaluation Methods, we present the first known data poisoning framework targeting off-policy evaluation. Using influence functions, we show how small, targeted data perturbations can significantly skew policy value estimates, underscoring the need for robust evaluation techniques.
In Fair and Welfare-Efficient Constrained Multi-Matchings under Uncertainty, I address resource allocation when agent utilities are unknown, using both stochastic and robust optimization to balance fairness and efficiency. These methods are validated on a real-world reviewer assignment dataset.
Large Language Models (LLMs): As an additional line of research, I have explored the reasoning capabilities, fine-tuning dynamics, and unlearning behavior of LLMs. In
On the Impact of Fine-Tuning on Chain-of-Thought Reasoning, I investigate how fine-tuning influences LLM reasoning. The study shows that while fine-tuning improves task-specific performance, it can reduce the consistency and faithfulness of chain-of-thought reasoning across datasets, highlighting trade-offs between optimization and reasoning integrity. I am also developing counterfactual verifiers for mathematical and logical reasoning tasks, using counterfactual data augmentation and contrastive loss to enhance robustness.
In
Matching Table Metadata with Business Glossaries Using Large Language Models, I apply LLMs to align enterprise metadata with business glossaries. This work demonstrates that LLMs can infer complex relationships between table column names and glossary descriptions without manual tuning, enabling scalable metadata alignment in restricted-access environments.
Fairness-Centric and Interpretable Machine Learning: I also contributed to
On Welfare-Centric Fair Reinforcement Learning, where we introduce a framework in which an agent receives vector-valued rewards from multiple beneficiaries and optimizes a specified welfare function. We show that welfare-optimal policies are inherently stochastic and start-state dependent, and present the E4 learner, which operates within an adversarial-fair learning framework to manage exploration and maintain welfare guarantees.
Additionally, I contributed to
Axiomatic Aggregations of Abductive Explanations, which addresses the challenge of multiple valid abductive explanations per data point by proposing aggregation techniques that generate feature importance scores. Grounded in cooperative game theory (via power indices) and causal strength measures, these techniques are axiomatically characterized to meet desirable interpretability properties. Unlike popular methods such as SHAP and LIME, the proposed explanations exhibit improved robustness against adversarial perturbations.
Education
Ph.D. in Computer Science
University of Massachusetts Amherst.
Started in Fall 2022.
Research focus: Robust Decision-Making Systems. Supervised by
Prof. Yair Zick.
Master's in Computer Science
University of Massachusetts Amherst, 2018–2020.
Thesis: Soft-Robust Algorithms for Batch Reinforcement Learning.
Bachelor of Technology (B.Tech) in Electronics and Communication Engineering
National Institute of Technology, Durgapur, 2016.
Experience
-
Amazon (Central ML Team), Seattle Spring 2024
- RESEARCH INTERN
- Developed robust and interpretable web-browsing agents for shopping tasks.
-
Harvard Business School, MA Summer 2024
- RESEARCH INTERN
- Investigated the effects of fine-tuning on the reasoning abilities of Large Language Models (LLMs).
-
Microsoft Research, India Summer 2023
- RESEARCH INTERN, MENTORS - GAURAV SINHA, NAGARAJAN NATARAJAN
- Developed methods to improve the robustness of alignment algorithms such as DPO for Small Language Models (SLMs).
-
IBM Research, Yorktown Heights, NY Summer 2023
- RESEARCH INTERN
- Developed novel methods that leverage LLMs and human feedback for accurate metadata-to-business-glossary matching.
- Fine-tuned LLMs using RLHF with contrastive loss to improve matching accuracy.
-
IBM Watson, Yorktown Heights, NY Summer 2022
- RESEARCH INTERN
- Designed novel algorithms for efficient hyperparameter tuning in reinforcement learning.
-
IBM Watson, Yorktown Heights, NY Summer 2021
- RESEARCH INTERN
- Integrated Off-Policy Policy Evaluation algorithms into an automated optimization framework.
- Developed a variance-minimizing technique for risk estimators using influence functions from robust statistics.
-
Harvard Business School, MA Winter 2020-2021
- RESEARCH INTERN
- Developed a novel data-poisoning attack framework to analyze the sensitivity of off-policy evaluation methods.
-
Flipkart, Bangalore, India Aug 2017 - Jul 2018
- SOFTWARE ENGINEER
- Built a deep learning model to detect anomalous payouts in accounting systems.
- Developed stock ledger generator and invoice register APIs.
- Provided on-call support for inventory valuation and warehouse transfer systems.
-
Endurance International Group, Bangalore, India Jul 2016 - Aug 2017
- SOFTWARE ENGINEER
- Created APIs for web orchestration, smart search, and session management.
- Developed ML system to detect parked domains.
- Built Imagio: a fast keyword- and color-filtered image search tool.
Publications
Please visit my Google Scholar page for an updated list of publications.
-
Elita Lobo, Nhan Pham, Dharmashankar Subramanian, Tejaswini Pedapati.
A Metahyperparameter Tuning Framework for Reinforcement Learning.
In-Submission: Patents 2023 - Reinforcement Learning
[Under Review]
-
Elita Lobo, Nhan Pham, Oktie Hassanzadeh, Dharmashankar Subramanian, Nandana Sampath Mihindukulasooriya, Long Vu.
A novel system for metadata to glossary matching in data lakes using human feedback and generative models.
In-Submission: Patents 2024 - Data Systems
-
Elita Lobo, Chirag Agarwal, Hima Lakkaraju.
On the Impact of Fine-Tuning on Chain-of-Thought Reasoning in LLMs.
NAACL 2025 - Large Language Models
[Paper]
-
Anmol Mekala, Vineeth Dorna, Shreya Dubey, Abhishek Lalwani, David Koleczek, Mukund Rungta, Sadid Hasan, Elita Lobo*.
Alternate Preference Optimization for Unlearning Factual Knowledge in Large Language Models.
COLING 2024 - Knowledge Unlearning
[Paper]
-
Elita Lobo*, Justin Payan*, Cyrus Cousins, Yair Zick.
Fair and Welfare-Efficient Resource Allocation under Uncertainty.
NeurIPS 2024 - Fairness & Optimization
[Paper]
-
Cyrus Cousins, Elita Lobo, Kavosh Asadi, Michael L. Littman.
On Welfare-Centric Fair Reinforcement Learning.
RLC 2024 - Reinforcement Learning
(Outstanding Paper)
[Paper]
-
Vignesh Viswanathan, Elita Lobo, Yacine Izza, Gagan Biradar, Yair Zick.
Axiomatic Aggregations of Abductive Explanations.
AAAI 2023 - Explainable AI
[Paper]
-
Elita Lobo, Cyrus Cousins, Marek Petrik, Yair Zick.
Percentile Criterion Optimization in Offline Reinforcement Learning.
NeurIPS 2023 - Offline RL
[Paper]
-
Elita Lobo, Harvineet Singh, Cynthia Rudin, Himabindu Lakkaraju.
Data Poisoning Attacks on Off-Policy Policy Evaluation Methods.
UAI 2022 - Robustness in RL
(Top 5%)
[Paper]
-
Elita Lobo, Oktie Hassanzadeh, Nhan Pham, Nandana Mihindukulasooriya, Dharmashankar Subramanian, Horst Samulowitz.
Matching table metadata with business glossaries using large language models.
Ontology Matching Workshop 2023 - Data Integration
[Paper]
-
Elita Lobo, Mohammad Ghavamzadeh, Marek Petrik.
Soft-robust Algorithms for Batch Reinforcement Learning.
IJCAI R2AW Workshop 2021 - Robust RL
[Paper]
-
Elita Lobo, Yash Chandak, Dharmashankar Subramanian, Josiah Hanna, Marek Petrik.
Behavior Policy Search for Risk Estimators in RL.
NeurIPS Safe-RL Workshop 2021 - Safe RL
[Paper]
-
Elita Lobo, Harvineet Singh, Cynthia Rudin, Himabindu Lakkaraju.
Data Poisoning Attacks on Off-Policy Policy Evaluation Methods (Workshop version).
ICLR PAIR2Struct Workshop 2022 - Security in ML
[Paper]
Skills
- Programming Languages: Python, C++, Java
- Libraries & Tools: PyTorch, TensorFlow, Transformers, Spring Boot, MySQL, Elasticsearch
- Research Areas: Machine Learning, Reinforcement Learning, Convex Optimization, Natural Language Processing
Mentorship Experience
-
PhD Applicant Support Program: Mentored underrepresented students applying to graduate school.
2023, 2024
-
CS 696DS Industry Mentorship Program - Primary PhD Mentor
UMass Amherst, 2025
Mentored master's students through an NLP-focused research project, Alternate Preference Optimization (AltPO), in collaboration with Microsoft. Guided the team from project conception to publication; our work was published at COLING 2024 and is available on arXiv (2409.13474).
-
CS 696DS Industry Mentorship Program - PhD Mentor
UMass Amherst, 2024–2025
Mentored master's students through an NLP-focused research project on multi-hop finance QA generation for RAG evaluation, in collaboration with Goldman Sachs.
Awards & Achievements
-
Graduate Scholarship: Awarded the Anuradha and Hanuma Kodavalla Graduate Scholarship in Computer Science.
UMass Amherst, 2023 - $10,000
-
Fellowship: Recipient of the UNH CEPS Graduate Fellowship.
University of New Hampshire, 2020
-
AI Fellowship: Recipient of the Robin Popplestone Fellowship in Robotics and Artificial Intelligence.
UMass Amherst, 2019 - $5,000
-
1st Place: Hackday 10 (Marketplace Category).
Flipkart, 2018
-
Top 1%: 99th percentile in the All India Engineering Entrance Exam (State Rank 9).
India, 2012
-
Top Rank: State Rank 11 (99.9th percentile) in the Goa Engineering Entrance Exam.
India, 2012
Teaching Experience
-
Teaching Assistant: Supported instruction and grading for core Computer Science courses, including:
Operating Systems (CS377, Fall 2018),
Reasoning under Uncertainty (CS240, Spring 2019),
Numerical Optimization (CS590OP, Fall 2019),
Convex Optimization (CS690OP, Spring 2020).
University of Massachusetts Amherst