
 Elita Lobo

  PhD Student
  University of Massachusetts Amherst
  elobo@umass.edu
  Google Scholar
  GitHub
  LinkedIn

Bio

I am a final-year Ph.D. student in the College of Information and Computer Sciences at UMass Amherst, working with Prof. Yair Zick at FED.
My research focuses on Trustworthy Reinforcement Learning (RL) and Machine Learning, with a particular emphasis on developing practical, fair, and robust algorithms. Before starting my Ph.D., I completed a Master's degree in Computer Science at UMass Amherst in 2020, during which I had the privilege of working with external collaborators Dr. Marek Petrik and Dr. Hima Lakkaraju. I also spent two years in industry as a Software Engineer at Flipkart and Endurance International Group. I graduated from NIT Durgapur with a B.Tech in Electronics and Communication Engineering.

Research Areas

Robust and Fair Decision-Making Systems: My PhD research centers on reinforcement learning and resource allocation under uncertainty, adversarial conditions, and fairness constraints. In Soft-Robust Algorithms for Batch Reinforcement Learning, I propose the soft-robust criterion as a principled alternative to the standard percentile criterion, which often leads to overly conservative policies. I develop two approximate algorithms that achieve more balanced and effective decision-making, both theoretically and empirically.
Expanding on this, Percentile Criterion Optimization in Offline Reinforcement Learning introduces a Value-at-Risk-based dynamic programming approach for robust policy optimization without constructing explicit uncertainty sets, enabling the learning of less conservative, uncertainty-aware policies.
In Data Poisoning Attacks on Off-Policy Policy Evaluation Methods, we present the first data poisoning framework targeting off-policy evaluation. Using influence functions, I show how small, targeted data perturbations can significantly skew policy value estimates, highlighting the need for robust evaluation techniques.
In Fair and Welfare-Efficient Constrained Multi-Matchings under Uncertainty, I address resource allocation with unknown agent utilities, using both stochastic and robust optimization to balance fairness and efficiency, validated on a real-world reviewer assignment dataset.

Large Language Models (LLMs): As an additional research direction, I investigate the reasoning capabilities, fine-tuning dynamics, and unlearning behavior of LLMs. In On the Impact of Fine-Tuning on Chain-of-Thought Reasoning, I study how fine-tuning affects LLM reasoning. The results show that while fine-tuning improves task-specific performance, it can reduce the consistency and faithfulness of chain-of-thought reasoning across datasets, revealing trade-offs between optimization and reasoning integrity.
I am also developing counterfactual verifiers for mathematical and logical reasoning tasks, leveraging counterfactual data to improve robustness.
In Matching Table Metadata with Business Glossaries Using Large Language Models, I apply LLMs to match enterprise metadata with business glossaries, demonstrating that LLMs can infer complex relationships between column names and glossary descriptions without manual tuning, enabling scalable metadata alignment in restricted-access environments. To further improve matching quality, we also trained Flan-T5 models using RLHF.
In Hierarchical Planning Agent for Web-Browsing Tasks, I introduce Structured Agent, a web-browsing agent that dynamically plans and executes tasks using an explicit AND/OR tree, enhancing robustness and interpretability on long-horizon tasks. As part of this work, I also trained a subplan reward model using Direct Preference Optimization (DPO) to help distinguish between effective and ineffective plan decompositions. I am currently working on improving the robustness of LLM verifiers for reasoning tasks.

Fairness-Centric and Interpretable Machine Learning: In Axiomatic Aggregations of Abductive Explanations, we tackle the challenge of generating robust and meaningful feature importance scores from multiple valid abductive explanations per data point, proposing aggregation techniques grounded in cooperative game theory (via power indices) and causal strength measures. These methods are axiomatically characterized to ensure desirable interpretability properties, and unlike popular methods such as SHAP and LIME, our approaches demonstrate improved robustness against adversarial perturbations.
I also contributed to On Welfare-Centric Fair Reinforcement Learning, where we introduce a framework for agents receiving vector-valued rewards from multiple beneficiaries and optimizing a welfare function. We show that welfare-optimal policies are inherently stochastic and start-state dependent, and present the E4 learner, which operates within an adversarial-fair learning framework to manage exploration and maintain welfare guarantees.

Education

Ph.D. in Computer Science
University of Massachusetts Amherst. Started in Fall 2022.
Focus: Robust Decision-Making Systems. Supervised by Prof. Yair Zick.

Master's in Computer Science
University of Massachusetts Amherst. 2018.
Thesis: Soft-Robust Algorithms for Batch RL.

Bachelor of Technology (B.Tech) in Electronics and Communication Engineering
National Institute of Technology, Durgapur. 2016.

Experience

  • Amazon (Central ML Team), Seattle. Spring 2024
  • Harvard Business School, MA. Summer 2024
  • Microsoft Research, India. Summer 2023
  • IBM Research, Yorktown Heights, NY. Summer 2023
  • IBM Watson, Yorktown Heights, NY. Summer 2022
  • IBM Watson, Yorktown Heights, NY. Summer 2021
  • Harvard Business School, MA. Winter 2020-2021
  • Flipkart, Bangalore, India. Aug 2017 - Jul 2018
  • Endurance International Group, Bangalore, India. Jul 2016 - Aug 2017

Ongoing Work

FinHOP: Benchmarking Retrieval-Augmented Generation for Multi-Hop Questions on Long Financial Documents
Vinitra Muralikrishna, Prit Shah, Manas Wadhwa, Aeyan Ashraf, Wenlong Zhao, Eliot Brenner, Lanlan Ji, Dominic Seyler
(Under Review)
Counterfactual LLM Verifiers for Math and Logic Reasoning Tasks
Elita Lobo, Shiv Shankar, Chirag Agarwal, Yair Zick
(In Progress)

Publications

Please visit my Google Scholar page for an updated list of publications.

A Hierarchical Planning Framework for LLM-based Web Agents
Elita Lobo*, Frank Chen, Jingjing Meng, Yang Jiao, Nan Xi, Yan Gao
Efficient Reasoning Workshop, NeurIPS 2025
On the Impact of Fine-Tuning on Chain-of-Thought Reasoning in LLMs
Elita Lobo*, Chirag Agarwal, Hima Lakkaraju
NAACL 2025
Alternate Preference Optimization for Unlearning Factual Knowledge in Large Language Models
Anmol Mekala, Vineeth Dorna, Shreya Dubey, Abhishek Lalwani, David Koleczek, Mukund Rungta, Sadid Hasan, Elita Lobo
COLING 2024
Fair and Welfare-Efficient Resource Allocation under Uncertainty
Elita Lobo*, Justin Payan*, Cyrus Cousins, Yair Zick
NeurIPS 2024
On Welfare-Centric Fair Reinforcement Learning
Cyrus Cousins, Elita Lobo, Kavosh Asadi, Michael L. Littman
Reinforcement Learning Conference 2024
(Outstanding Paper Award)
Axiomatic Aggregations of Abductive Explanations
Vignesh Viswanathan*, Elita Lobo*, Yacine Izza, Gagan Biradar, Yair Zick
AAAI 2024
Percentile Criterion Optimization in Offline Reinforcement Learning
Elita Lobo*, Cyrus Cousins, Marek Petrik, Yair Zick
NeurIPS 2023
Data Poisoning Attacks on Off-Policy Policy Evaluation Methods
Elita Lobo*, Harvineet Singh, Cynthia Rudin, Himabindu Lakkaraju
UAI 2022
(Oral Presentation, Top 5%)
A Novel System for Metadata to Glossary Matching in Data Lakes Using Human Feedback and Generative Models
Elita Lobo*, Nhan Pham, Oktie Hassanzadeh, Dharmashankar Subramanian, Nandana Sampath Mihindukulasooriya, Long Vu
Patent (Under Review), 2024
A Metahyperparameter Tuning Framework for Reinforcement Learning
Elita Lobo*, Nhan Pham, Dharmashankar Subramanian, Tejaswini Pedapati
Patent, 2023
Matching Table Metadata with Business Glossaries Using Large Language Models
Elita Lobo*, Oktie Hassanzadeh, Nhan Pham, Nandana Mihindukulasooriya, Dharmashankar Subramanian, Horst Samulowitz
International Workshop on Ontology Matching, 2023
Soft-Robust Algorithms for Batch Reinforcement Learning
Elita Lobo*, Mohammad Ghavamzadeh, Marek Petrik
R2AW Workshop, IJCAI 2021
Behavior Policy Search for Risk Estimators in RL
Elita Lobo*, Yash Chandak, Dharmashankar Subramanian, Josiah Hanna, Marek Petrik
NeurIPS Workshop on Safe and Robust Control, 2021

Skills

Published Software

Mentorship Experience