TTIC Student Workshop

February 23, 2024

Congratulations to the award winners!
Best Talk Award: Jiahao Li, Kavya Ravichandran
Best Poster Award: Kumar Kshitij Patel

The 7th TTIC Student Workshop took place in person on February 23, 2024. It included student talks, a poster session, an invited talk, and a panel discussion.

Organizing Committee: Han Shao, Chung-Ming Chien, Ron Mosenzon, Madhur Tulsiani, Erica Cocom
Talk/Award Committee: Anand Bhattad, Liren Shan, Lingxiao Wang, Jiawei Zhou, Sam Buchanan

Schedule

Times in CST.
9:00-9:20 Breakfast
9:20-9:30 Opening Remarks
9:30-10:30 Talk Session 1 (Chair: Lingxiao Wang)
Nirmit Joshi Noisy Interpolation Learning with Shallow Univariate ReLU Networks
Abstract. Understanding how overparameterized neural networks generalize despite perfect interpolation of noisy training data is a fundamental question. Mallinar et al. noted that neural networks seem to often exhibit "tempered overfitting", wherein the population risk does not converge to the Bayes optimal error, but neither does it approach infinity, yielding non-trivial generalization. However, this has not been studied rigorously. We provide the first rigorous analysis of the overfitting behavior of regression with the minimum-norm ($\ell_2$ norm of the weights) interpolant, focusing on univariate two-layer ReLU networks. We show that overfitting is tempered (with high probability) when measured with respect to the $L_1$ loss, but also show that the situation is more complex than suggested by Mallinar et al.: overfitting is catastrophic with respect to the $L_2$ loss, or when taking an expectation over the training set.
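As a hands-on companion to this abstract, the sketch below fits a wide two-layer univariate ReLU network to pure-noise labels and compares its $L_1$ and $L_2$ test errors; the zero target function, width, noise level, and tiny weight decay are illustrative assumptions, and the weight-decay run is only a rough proxy for the minimum-norm interpolant, not the paper's construction.

# Illustrative sketch only; constants and setup are assumptions, not the paper's.
import torch

torch.manual_seed(0)
n, width = 50, 2000
x_train = torch.rand(n, 1) * 2 - 1                     # inputs in [-1, 1]
y_train = 0.5 * torch.randn(n, 1)                      # pure-noise labels; target f* = 0

model = torch.nn.Sequential(
    torch.nn.Linear(1, width), torch.nn.ReLU(), torch.nn.Linear(width, 1)
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-6)
for _ in range(10000):                                 # train to (near-)interpolation
    opt.zero_grad()
    loss = ((model(x_train) - y_train) ** 2).mean()
    loss.backward()
    opt.step()

x_test = torch.linspace(-1, 1, 2000).unsqueeze(1)
with torch.no_grad():
    pred = model(x_test)
    print("train MSE:", ((model(x_train) - y_train) ** 2).mean().item())
    print("test  L1 :", pred.abs().mean().item())      # error against f* = 0
    print("test  L2 :", (pred ** 2).mean().item())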
Kumar Kshitij Patel New Perspectives on Local SGD for Federated Learning
Abstract. Federated Averaging, or Local Stochastic Gradient Descent (Local SGD), is a predominant optimization technique in federated learning. It often outperforms simpler methods like mini-batch SGD for convex and non-convex objectives. However, there is a gap between its practical success and its theoretical understanding, as the effectiveness of Local SGD is challenging to prove. Our first result addresses this gap by presenting new lower bounds for Federated Averaging, which call into question existing heterogeneity assumptions that aim to explain its "unreasonable effectiveness." We demonstrate that accelerated mini-batch SGD can achieve optimal performance within specific heterogeneity frameworks in general convex settings. These findings underscore the importance of strong convexity in optimization and highlight the need for refined heterogeneity assumptions to characterize Local SGD's behavior accurately. We further show new upper bounds in the strongly convex setting and highlight a new perspective on Local SGD's convergence by controlling its fixed point.
Complementing these bounds, our second result delves into a personalized variant of Local SGD. We establish new convergence guarantees for this personalized variant, improving the dependence on data heterogeneity assumptions. Our analysis reveals that personalized Local SGD surpasses pure local training and conventional federated learning algorithms that produce a single consensus model for all devices. We highlight that personalization can do so because it mitigates the perils of Local SGD by correcting its fixed-point discrepancy. The insights from these results collectively advance our understanding of the dynamics of Local SGD in federated learning, offering a comprehensive view of its strengths and limitations and paving the way for future research and applications in this field.
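A minimal sketch of the Local SGD update described above, on synthetic heterogeneous quadratic client objectives; all constants are illustrative and not from the paper. Varying local_steps and the spread of the per-client optima b makes the fixed-point discrepancy mentioned in the abstract visible.

# Local SGD (FedAvg) on f_m(x) = 0.5 * ||x - b_m||^2; illustrative sketch only.
import numpy as np

rng = np.random.default_rng(0)
M, d = 8, 5                        # clients, dimension
b = rng.normal(size=(M, d))        # per-client optima (controls heterogeneity)
x_star = b.mean(axis=0)            # minimizer of the average objective

def local_sgd(x0, rounds=50, local_steps=10, lr=0.1, noise=0.1):
    x = x0.copy()
    for _ in range(rounds):
        updates = []
        for m in range(M):
            xm = x.copy()
            for _ in range(local_steps):
                grad = (xm - b[m]) + noise * rng.normal(size=d)  # stochastic gradient
                xm -= lr * grad
            updates.append(xm)
        x = np.mean(updates, axis=0)   # server averages the local iterates
    return x

x_final = local_sgd(np.zeros(d))
print("distance to optimum:", np.linalg.norm(x_final - x_star))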
Anmol Kabra Surrogate score design to incentivize behavior in rating systems
Abstract. Rating agencies evaluate entities such as hospitals and universities on surrogates of performance metrics. Rated entities are thus incentivized to optimize the surrogates, which leads to the unintended consequence of degrading unrepresented performance metrics. The agency's objective is thus to design succinct surrogate scores that, when optimized, also improve all performance metrics. Taking inspiration from two hospital rating agencies, the U.S. Government's Medicare agency and U.S. News & World Report, we present a model for designing surrogate scores that satisfy the agency's objective and interpretability restrictions. We cast this minimal score design problem as a problem of constructing polyhedral cones, and determine structural properties of performance metrics that dictate the succinctness of scores. Moreover, we give algorithms to design surrogate scores using tools from convex analysis and computational geometry. In doing so, we describe algorithms for decomposing polyhedral cones and finding minimal frames, which is of independent technical interest.
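One rough formalization consistent with the abstract (the notation and exact model here are my own reading, not necessarily the paper's): each entity has a vector of performance metrics $m \in \mathbb{R}^d$, and the agency publishes linear scores $s_j(m) = c_j^\top m$ collected as the rows of a matrix $C \in \mathbb{R}^{k \times d}$. The scores serve the agency's objective if every change $\delta$ that improves (or preserves) all scores also improves (or preserves) all metrics,
$$\{\delta \in \mathbb{R}^d : C\delta \ge 0\} \subseteq \mathbb{R}^d_{\ge 0},$$
and succinctness means using as few rows $k$ as possible, which leads to questions about generators (frames) of polyhedral cones.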
10:30-11:00 Invited Talk: Behnam Neyshabur Thriving in the Ever-Changing AI Job Market
Abstract. The AI job market's constant evolution can be daunting for researchers striving to stay ahead. In this talk, I'm hoping to draw on my personal struggles and experiences to share some practical strategies for adapting your skills and mindset to thrive in the dynamic world of AI. I'll also highlight some overlooked areas in academia for researchers seeking to make a meaningful impact.
11:00-11:45 Poster Session (Chair: Sam Buchanan) 5th floor common area
Kavya Ravichandran Nearly-tight Approximation Guarantees for the Improving Multi-Armed Bandits Problem
Abstract. We give nearly-tight upper and lower bounds for the improving multi-armed bandits problem. An instance of this problem has $k$ arms, each of whose reward functions is a concave and increasing function of the number of times that arm has been pulled so far. We show that for any randomized online algorithm, there exists an instance on which it must suffer at least an $\Omega(\sqrt{k})$ approximation factor relative to the optimal reward. We then provide a randomized online algorithm that guarantees an $O(\sqrt{k})$ approximation factor, if it is told the maximum reward achievable by the optimal arm in advance. We then show how to remove this assumption at the cost of an extra $O(\log k)$ approximation factor, achieving an overall $O(\sqrt{k} \log k)$ approximation relative to optimal.
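The snippet below is only a small simulator of the problem setup (concave, increasing per-arm reward curves) together with a naive sample-then-commit baseline; it is not the paper's algorithm, and the reward curves are hypothetical.

# Toy simulator of improving bandits; the policy is an illustrative baseline only.
import numpy as np

rng = np.random.default_rng(0)

def make_instance(k):
    """Each arm's reward after t pulls is c_i * sqrt(t): concave and increasing."""
    caps = rng.uniform(0.1, 1.0, size=k)
    return [lambda t, c=c: c * np.sqrt(t) for c in caps]

def sample_then_commit(arms, horizon, probe_pulls=5):
    pulls = np.zeros(len(arms), dtype=int)
    total = 0.0
    # probe each arm a few times, then commit to the best-looking one
    for i, f in enumerate(arms):
        for _ in range(probe_pulls):
            pulls[i] += 1
            total += f(pulls[i])
    best = int(np.argmax([f(pulls[i]) for i, f in enumerate(arms)]))
    for _ in range(horizon - probe_pulls * len(arms)):
        pulls[best] += 1
        total += arms[best](pulls[best])
    return total

arms = make_instance(k=10)
print("total reward:", sample_then_commit(arms, horizon=500))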
Kumar Kshitij Patel New Perspectives on Local SGD for Federated Learning
Abstract. Federated Averaging, or Local Stochastic Gradient Descent (Local SGD), is a predominant optimization technique in federated learning. It often outperforms simpler methods like mini-batch SGD for convex and non-convex objectives. However, there is a gap between its practical success and its theoretical understanding, as the effectiveness of Local SGD is challenging to prove. Our first result addresses this gap by presenting new lower bounds for Federated Averaging, which call into question existing heterogeneity assumptions that aim to explain its "unreasonable effectiveness." We demonstrate that accelerated mini-batch SGD can achieve optimal performance within specific heterogeneity frameworks in general convex settings. These findings underscore the importance of strong convexity in optimization and highlight the need for refined heterogeneity assumptions to characterize Local SGD's behavior accurately. We further show new upper bounds in the strongly convex setting and highlight a new perspective on Local SGD's convergence by controlling its fixed point.
Complementing these bounds, our second result delves into a personalized variant of Local SGD. We establish new convergence guarantees for this personalized variant, improving the dependence on data heterogeneity assumptions. Our analysis reveals that personalized Local SGD surpasses pure local training and conventional federated learning algorithms that produce a single consensus model for all devices. We highlight that personalization can do so because it mitigates the perils of Local SGD by correcting its fixed-point discrepancy. The insights from these results collectively advance our understanding of the dynamics of Local SGD in federated learning, offering a comprehensive view of its strengths and limitations and paving the way for future research and applications in this field.
Chung-Ming Chien Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning
Abstract. As the scale of generative models continues to grow, the efficient reuse and adaptation of pre-trained models have become crucial considerations. In this work, we propose Voicebox Adapter, a novel approach that integrates fine-grained conditions into a pre-trained text-conditioned speech generation model using a cross-attention module. To ensure a smooth integration of newly introduced modules with the pre-trained ones, we explore various efficient fine-tuning approaches. Our investigation reveals that the LoRA with bias-tuning configuration yields the best performance, enhancing controllability without compromising the quality of generated speech. Across three fine-grained conditional generation tasks, we empirically demonstrate the efficacy of Voicebox Adapter and its remarkable resource efficiency. Follow-up experiments further highlight the robustness of Voicebox Adapter across diverse training and data setups.
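For readers unfamiliar with the fine-tuning configuration named above, here is a generic LoRA-with-bias-tuning linear layer in PyTorch; it is a self-contained illustration with made-up dimensions, not the Voicebox Adapter code.

# Generic LoRA + bias-tuning layer; dimensions and hyperparameters are illustrative.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)        # freeze the pre-trained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(True)        # bias-tuning: bias stays trainable
        self.lora_a = nn.Parameter(torch.zeros(rank, base.in_features))
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        nn.init.normal_(self.lora_a, std=0.02)         # lora_b stays zero, so the update starts at 0
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(nn.Linear(512, 512))
out = layer(torch.randn(4, 10, 512))                   # (batch, time, dim)
print(out.shape)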
11:45-12:30 Lunch 4th floor kitchen / common area
12:30-13:30 Research at TTIC Talk: Ohad Trabelsi (Almost) Ruling Out SETH Lower Bounds for All-Pairs Max-Flow
Abstract. TBD
13:30-14:30 Talk Session 2 (Chair: Anand Bhattad)
Pushkar Shukla TIBET: Identifying and Evaluating Biases in Text-to-Image Generative Models
Abstract. Text-to-Image (TTI) generative models have shown great progress in the past few years in terms of their ability to generate complex and high-quality imagery. At the same time, these models have been shown to suffer from harmful biases, including exaggerated societal biases (e.g., gender, ethnicity), as well as incidental correlations that limit such models' ability to generate more diverse imagery. In this paper, we propose a general approach to study and quantify a broad spectrum of biases, for any TTI model and for any prompt, using counterfactual reasoning. Unlike other works that evaluate generated images on a predefined set of bias axes, our approach automatically identifies potential biases that might be relevant to the given prompt, and measures those biases. In addition, our paper extends quantitative scores with post-hoc explanations in terms of semantic concepts in the images generated. We show that our method is uniquely capable of explaining complex multi-dimensional biases through semantic concepts, as well as the intersectionality between different biases for any given prompt. We perform extensive user studies to illustrate that the results of our method and analysis are consistent with human judgements.
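As a toy illustration of the counterfactual idea (the bias axis, substitution target, and prompts below are hand-picked and hypothetical; the method described above identifies relevant axes automatically):

# Hypothetical counterfactual prompts along a single, hand-picked bias axis.
def counterfactual_prompts(prompt, target, axis_values):
    """Return one variant of `prompt` per value on the bias axis."""
    return [prompt.replace(target, value) for value in axis_values]

prompt = "a photo of a doctor in a hospital"
for variant in counterfactual_prompts(prompt, "a doctor", ["a male doctor", "a female doctor"]):
    print(variant)
# Generating images for the original prompt and each counterfactual, then comparing
# the concepts present in them, is one way to surface prompt-specific biases.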
Jiahao Li Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model
Abstract. Text-to-3D with diffusion models has achieved remarkable progress in recent years. However, existing methods either rely on score distillation-based optimization, which suffers from slow inference, low diversity, and the Janus problem, or are feed-forward methods that generate low-quality results due to the scarcity of 3D training data. In this paper, we propose Instant3D, a novel method that generates high-quality and diverse 3D assets from text prompts in a feed-forward manner. We adopt a two-stage paradigm, which first generates a sparse set of four structured and consistent views from text in one shot with a fine-tuned 2D text-to-image diffusion model, and then directly regresses the NeRF from the generated images with a novel transformer-based sparse-view reconstructor. Through extensive experiments, we demonstrate that our method can generate diverse 3D assets of high visual quality within 20 seconds, which is two orders of magnitude faster than previous optimization-based methods that can take 1 to 10 hours.
Jiading Fang Transcrib3D: 3D Referring Expression Resolution through Large Language Models
Abstract. 3D referring expression resolution involves identifying a unique target object within a 3D scene based on a natural language description, which may include the target’s semantic class, relation with other objects, and other attributes. We introduce Transcrib3D, an approach that brings together 3D detection methods and the emergent reasoning capabilities of large language models (LLMs), using text as the unifying medium. Text is a natural connective format, as the native representation of LLMs and the output format of detectors. This synergy allows us to sidestep the need for learning complex multi-modality “adapters” from limited 3D annotated data. The use of general-purpose LLMs to support the reasoning process brings multiple benefits: (1) It allows flexible, multi-step compositional reasoning with code generation and execution; (2) It provides a human-readable and interpretable reasoning process; (3) It allows for iterative development through prompt engineering. Transcrib3D demonstrates its effectiveness in experiments, achieving state-of-the-art results on 3D referring benchmarks, with a great leap in performance over previous multi-modality baselines. Remarkably, after the visual object detection, the reasoning module runs in a zero-shot setting without any training on the dataset, in contrast to most of the competitive baselines. Additionally, we provide an analysis of common failure scenarios to inform potential enhancements in future work.
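A toy sketch of the "text as the unifying medium" idea: serialize detector output as text and pair it with the referring expression in a single prompt. The detection fields and prompt format are hypothetical placeholders, not the Transcrib3D prompt, and no specific LLM API is assumed.

# Hypothetical serialization of 3D detections into an LLM prompt.
def detections_to_text(detections):
    lines = []
    for d in detections:
        x, y, z = d["center"]
        lines.append(f'object {d["id"]}: a {d["label"]} at ({x:.2f}, {y:.2f}, {z:.2f}), '
                     f'size {d["size"]}')
    return "\n".join(lines)

def build_prompt(detections, referring_expression):
    return (
        "Here are the objects detected in a 3D scene:\n"
        f"{detections_to_text(detections)}\n\n"
        f'Which object id does this refer to: "{referring_expression}"?\n'
        "Reason step by step, then answer with a single id."
    )

detections = [
    {"id": 0, "label": "chair", "center": (1.0, 0.2, 0.0), "size": "small"},
    {"id": 1, "label": "chair", "center": (3.5, 0.1, 0.0), "size": "small"},
    {"id": 2, "label": "table", "center": (2.2, 0.0, 0.0), "size": "large"},
]
prompt = build_prompt(detections, "the chair closest to the table")
print(prompt)   # then pass the prompt to an LLM of your choice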
14:30-14:45 Coffee & Tea Break
14:45-15:45 Talk Session 3 (Chair: Liren Shan)
Melissa Dutz Winning Without Observing Payoffs: Exploiting Behavioral Bias to Win Nearly Every Round
Abstract. Gameplay under various forms of uncertainty has been widely studied. Feldman et al. (2010) studied a particularly low-information setting in which one observes the opponent's actions but no payoffs, not even one's own, and introduced an algorithm which guarantees one's payoff nonetheless approaches the minimax optimal value (i.e., zero) in a symmetric zero-sum game. Against an opponent playing a minimax-optimal strategy, approaching the value of the game is the best one can hope to guarantee. However, a wealth of research in behavioral economics shows that people often do not make perfectly rational, optimal decisions. Here we consider whether it is possible to actually win in this setting if the opponent is behaviorally biased. We model several deterministic, biased opponents and show that even without knowing the game matrix in advance or observing any payoffs, it is possible to take advantage of each bias in order to win nearly every round (so long as the game has the property that each action beats and is beaten by at least one other action). We also provide a partial characterization of the kinds of biased strategies that can be exploited to win nearly every round, and provide algorithms for beating some kinds of biased strategies even when we don't know which strategy the opponent uses.
Max Ovsiankin Improved Approximations for $\ell_p$-Shortest Path and $\ell_p$-Network Design
Abstract. We present quasipolynomial-time approximation algorithms for the $\ell_p$-Shortest Path problem and for a class of $\ell_p$-network design problems. The $\ell_p$-version of a network design problem is one in which the edges of the instance network have vector-valued weights, and one is tasked with finding a subset of the edges minimizing the $\ell_p$-norm of the sum of its weight vectors, subject to some constraints.
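For concreteness, the $\ell_p$-Shortest Path objective described above can be written as follows (notation mine): given a graph $G = (V, E)$ with weight vectors $w_e \in \mathbb{R}_{\ge 0}^d$ and terminals $s, t$, find an $s$-$t$ path $P$ minimizing
$$\Big\| \sum_{e \in P} w_e \Big\|_p = \Big( \sum_{i=1}^{d} \Big( \sum_{e \in P} w_e(i) \Big)^p \Big)^{1/p},$$
which recovers the classical shortest-path problem when $d = 1$.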
Kavya Ravichandran Nearly-tight Approximation Guarantees for the Improving Multi-Armed Bandits Problem
Abstract. We give nearly-tight upper and lower bounds for the improving multi-armed bandits problem. An instance of this problem has $k$ arms, each of whose reward functions is a concave and increasing function of the number of times that arm has been pulled so far. We show that for any randomized online algorithm, there exists an instance on which it must suffer at least an $\Omega(\sqrt{k})$ approximation factor relative to the optimal reward. We then provide a randomized online algorithm that guarantees an $O(\sqrt{k})$ approximation factor, if it is told the maximum reward achievable by the optimal arm in advance. We then show how to remove this assumption at the cost of an extra $O(\log k)$ approximation factor, achieving an overall $O(\sqrt{k} \log k)$ approximation relative to optimal.
15:45-16:00 Coffee & Tea Break
16:00-16:40 Talk Session 4 (Chair: Jiawei Zhou)
David Yunis Subwords as Skills: Tokenization for Sparse-Reward Reinforcement Learning
Abstract. Exploration in sparse-reward reinforcement learning is difficult due to the need for long, coordinated sequences of actions in order to achieve any reward. Moreover, in continuous action spaces there are an infinite number of possible actions, which only increases the difficulty of exploration. One class of methods designed to address these issues forms temporally extended actions, often called skills, from interaction data collected in the same domain, and optimizes a policy on top of this new action space. Typically such methods require a lengthy pretraining phase, especially in continuous action spaces, in order to form the skills before reinforcement learning can begin. Given prior evidence that the full range of the continuous action space is not required in such tasks, we propose a novel approach to skill-generation with two components. First we discretize the action space through clustering, and second we leverage a tokenization technique borrowed from natural language processing to generate temporally extended actions. Such a method outperforms baselines for skill-generation in several challenging sparse-reward domains, and requires orders-of-magnitude less computation in skill-generation and online rollouts.
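A rough sketch of the two components described above: discretize continuous actions by clustering, then run BPE-style merges over the resulting action-token sequences so that frequent pairs become temporally extended "skills". The cluster count, merge count, and random data are illustrative, not the paper's settings.

# Illustrative sketch: k-means discretization + BPE-style merges over action tokens.
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans

def discretize_actions(actions, n_clusters=16, seed=0):
    """Step 1: cluster continuous actions into a discrete vocabulary."""
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit(actions)
    return km, km.labels_

def bpe_merges(token_seqs, n_merges=10):
    """Step 2: repeatedly merge the most frequent adjacent pair into a new 'skill' token."""
    seqs = [list(map(int, s)) for s in token_seqs]
    next_id = max(t for s in seqs for t in s) + 1
    skills = {}
    for _ in range(n_merges):
        pairs = Counter((s[i], s[i + 1]) for s in seqs for i in range(len(s) - 1))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        skills[next_id] = (a, b)
        new_seqs = []
        for s in seqs:                        # replace each occurrence of (a, b) with the new token
            out, i = [], 0
            while i < len(s):
                if i + 1 < len(s) and s[i] == a and s[i + 1] == b:
                    out.append(next_id)
                    i += 2
                else:
                    out.append(s[i])
                    i += 1
            new_seqs.append(out)
        seqs = new_seqs
        next_id += 1
    return skills, seqs

rng = np.random.default_rng(0)
actions = rng.normal(size=(500, 2))           # stand-in for 2-D continuous actions
km, labels = discretize_actions(actions)
trajectories = np.array_split(labels, 10)     # pretend these are 10 trajectories
skills, tokenized = bpe_merges(trajectories, n_merges=5)
print(skills)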
Ju-Chieh Chou Toward Joint Language Modeling for Speech Units and Text
Abstract. Speech and text are two major forms of human language. The research community has been focusing on mapping speech to text or vice versa for many years. However, in the field of language modeling, very little effort has been made to model them jointly. In light of this, we explore joint language modeling for speech units and text. Specifically, we compare different speech tokenizers to transform continuous speech signals into discrete units and use different methods to construct mixed speech-text data. We introduce automatic metrics to evaluate how well the joint LM mixes speech and text. We also fine-tune the LM on downstream spoken language understanding (SLU) tasks with different modalities (speech or text) and test its performance to assess the model's learning of shared representations. Our results show that by mixing speech units and text with our proposed mixing techniques, the joint LM improves over a speech-only baseline on SLU tasks and shows zero-shot cross-modal transferability.
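A toy illustration of one way to construct mixed speech-text training sequences: concatenate discrete speech-unit tokens and text tokens behind modality tags. The tag names, unit inventory, and mixing scheme here are hypothetical and not necessarily those used in the work above.

# Hypothetical construction of a mixed speech-unit/text sequence.
import random

random.seed(0)
SPEECH_TAG, TEXT_TAG = "<speech>", "<text>"

def speech_units(n, vocab_size=100):
    """Stand-in for a speech tokenizer: n discrete unit ids."""
    return [f"u{random.randrange(vocab_size)}" for _ in range(n)]

def mix_example(text_tokens, unit_tokens, p_speech_first=0.5):
    """Build one mixed training sequence; the modality order is randomized."""
    speech_seq = [SPEECH_TAG] + unit_tokens
    text_seq = [TEXT_TAG] + text_tokens
    return speech_seq + text_seq if random.random() < p_speech_first else text_seq + speech_seq

example = mix_example("the cat sat on the mat".split(), speech_units(12))
print(" ".join(example))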
16:40-17:20 Panel Discussion (Coordinator: Emily Ruth Diana) From Research to Job Search: Suggestions for Future Ph.D. Graduates
Panelists: Avrim Blum, Behnam Neyshabur, Madhur Tulsiani, Rad Niazadeh, Aly Azeem Khan
17:20-17:30 Awards & Final Remarks
17:30-18:30 TGIF Come enjoy the food and beer, and have fun!