The 7th TTIC Student Workshop took place in person on February 23, 2024. It included student talks, a poster session, an invited talk, and a panel discussion.
Organizing Committee: Han Shao, Chung-Ming Chien, Ron Mosenzon, Madhur Tulsiani, Erica Cocom
Talk/Award Committee: Anand Bhattad, Liren Shan, Lingxiao Wang, Jiawei Zhou, Sam Buchanan
9:00-9:20 | Breakfast
9:20-9:30 | Opening Remarks
9:30-10:30 | Talk Session 1 (Chair: Lingxiao Wang)
Nirmit Joshi | Noisy Interpolation Learning with Shallow Univariate ReLU Networks
Abstract. Understanding how overparameterized neural networks generalize despite perfect interpolation of noisy training data is a fundamental question. Mallinar et al. noted that neural networks seem to often exhibit "tempered overfitting", wherein the population risk does not converge to the Bayes optimal error, but neither does it approach infinity, yielding non-trivial generalization. However, this has not been studied rigorously. We provide the first rigorous analysis of the overfitting behavior of regression with minimum norm ($\ell_2$ of weights), focusing on univariate two-layer ReLU networks. We show overfitting is tempered (with high probability) when measured with respect to the $L_1$ loss, but also show that the situation is more complex than suggested by Mallinar et al., and overfitting is catastrophic with respect to the $L_2$ loss, or when taking an expectation over the training set.
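For readers unfamiliar with the setting, one natural formalization (our own illustrative sketch; the exact norm and the treatment of bias terms follow common conventions and may differ in the paper) is as follows. The predictor is a two-layer univariate ReLU network $f_\theta(x) = \sum_{j=1}^{m} a_j \, [w_j x + b_j]_+ + c$, and among all networks that fit the noisy training set exactly, one studies the interpolant of minimum weight norm,
$$ \min_{\theta}\ \sum_{j=1}^{m} \big(a_j^2 + w_j^2\big) \quad \text{subject to} \quad f_\theta(x_i) = y_i \ \text{ for all } i = 1, \dots, n, $$
and asks how its population $L_1$ and $L_2$ risks compare to the Bayes error.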
Kumar Kshitij Patel | New Perspectives on Local SGD for Federated Learning
Abstract. Federated Averaging, or Local Stochastic Gradient Descent (Local SGD), is a predominant optimization technique in federated learning. It often outperforms simpler methods like mini-batch SGD for convex and non-convex objectives. However, there exists a gap between its practical success and theoretical understanding, as the effectiveness of Local SGD is challenging to prove. Our first result addresses this gap by presenting new lower bounds for Federated Averaging, which question existing heterogeneity assumptions that hope to explain its "unreasonable effectiveness." We demonstrate that accelerated mini-batch SGD can achieve optimal performance within specific heterogeneity frameworks in general convex settings. These findings underscore the importance of strong convexity in optimization and highlight the need for refined heterogeneity assumptions to characterize Local SGD's behavior accurately. We further show new upper bounds in the strongly convex setting and highlight a new perspective on Local SGD's convergence by controlling its fixed point. Complementing these bounds, our second result delves into a personalized variant of Local SGD. We establish new convergence guarantees for this personalized variant, improving the dependence on data heterogeneity assumptions. Our analysis reveals that personalized Local SGD surpasses pure local training and conventional federated learning algorithms that generate a consensus model for all devices. We highlight that personalization can do so because it mitigates the perils of Local SGD by correcting its fixed point discrepancy. The insights from these results collectively advance our understanding of the dynamics of Local SGD in federated learning, offering a comprehensive view of its strengths and limitations and paving the way for future research and applications in this field.
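For readers less familiar with the algorithm itself, Local SGD / Federated Averaging alternates several local gradient steps on each client with periodic averaging of the client models. The following is a generic, illustrative sketch of that pattern (the function names and toy objectives are ours, not from the paper):

```python
import numpy as np

def local_sgd(clients, init_w, rounds=100, local_steps=10, lr=0.1):
    """Generic Local SGD / FedAvg sketch (illustrative only, not the paper's code).

    clients : list of callables; clients[i](w) returns a stochastic gradient
              of client i's objective at parameters w.
    init_w  : initial parameter vector (numpy array).
    """
    w = init_w.copy()
    for _ in range(rounds):
        local_models = []
        for grad_fn in clients:            # in practice, these run in parallel on devices
            w_local = w.copy()
            for _ in range(local_steps):   # K local SGD steps between communications
                w_local -= lr * grad_fn(w_local)
            local_models.append(w_local)
        w = np.mean(local_models, axis=0)  # communication round: average the client models
    return w

# Toy usage: two clients with heterogeneous quadratic objectives 0.5 * ||w - c_i||^2.
centers = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
clients = [lambda w, c=c: (w - c) + 0.01 * np.random.randn(2) for c in centers]
print(local_sgd(clients, init_w=np.zeros(2)))  # settles near the averaged client optimum
```

The "fixed point" mentioned in the abstract is the parameter vector this average-and-repeat procedure settles around, which under data heterogeneity can differ from the minimizer of the average objective.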
Anmol Kabra | Surrogate score design to incentivize behavior in rating systems
Abstract. Rating agencies evaluate entities such as hospitals and universities on surrogates of performance metrics. Rated entities are thus incentivized to optimize surrogates, which leads to the unintended consequence of degrading unrepresented performance metrics. The agency's objective is thus to design succinct surrogate scores which, when optimized, also improve all performance metrics. Taking inspiration from two hospital rating agencies, the U.S. Government's Medicare agency and U.S. News & World Report, we present a model for designing surrogate scores that satisfy the agency's objective and interpretability restrictions. We cast this minimal score design problem as a problem of constructing polyhedral cones, and determine structural properties of performance metrics that dictate the succinctness of scores. Moreover, we give algorithms to design surrogate scores, using tools from convex analysis and computational geometry. In doing so, we describe algorithms for decomposing polyhedral cones and finding minimal frames, which are of independent technical interest.
10:30-11:00 | Invited Talk: Behnam Neyshabur | Thriving in the Ever-Changing AI Job Market
Abstract. The AI job market's constant evolution can be daunting for researchers striving to stay ahead. In this talk, I'm hoping to draw on my personal struggles and experiences to share some practical strategies for adapting your skills and mindset to thrive in the dynamic world of AI. I'll also highlight some overlooked areas in academia for researchers seeking to make a meaningful impact.
11:00-11:45 | Poster Session (Chair: Sam Buchanan) | 5th floor common area
Kavya Ravichandran | Nearly-tight Approximation Guarantees for the Improving Multi-Armed Bandits Problem
Abstract. We give nearly-tight upper and lower bounds for the improving multi-armed bandits problem. An instance of this problem has $k$ arms, each of whose reward functions is a concave and increasing function of the number of times that arm has been pulled so far. We show that for any randomized online algorithm, there exists an instance on which it must suffer at least an $\Omega(\sqrt{k})$ approximation factor relative to the optimal reward. We then provide a randomized online algorithm that guarantees an $O(\sqrt{k})$ approximation factor, if it is told the maximum reward achievable by the optimal arm in advance. We then show how to remove this assumption at the cost of an extra $O(\log k)$ approximation factor, achieving an overall $O(\sqrt{k} \log k)$ approximation relative to optimal.
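In notation (an illustrative formalization based on the abstract's wording; see the paper for the precise model), the $t$-th pull of arm $i$ yields reward $f_i(t)$, where each $f_i$ is concave and non-decreasing, and over a horizon of $T$ pulls the benchmark is the best single arm,
$$ \mathrm{OPT} \;=\; \max_{i \in [k]} \sum_{t=1}^{T} f_i(t), $$
so the results say that every randomized algorithm suffers $\mathrm{OPT}/\mathbb{E}[\mathrm{ALG}] = \Omega(\sqrt{k})$ on some instance, while an $O(\sqrt{k}\log k)$ approximation is achievable even without knowing $\mathrm{OPT}$ in advance.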
Kumar Kshitij Patel | New Perspectives on Local SGD for Federated Learning
Abstract. Federated Averaging, or Local Stochastic Gradient Descent (Local SGD), is a predominant optimization technique in federated learning. It often outperforms simpler methods like mini-batch SGD for convex and non-convex objectives. However, there exists a gap between its practical success and theoretical understanding, as the effectiveness of Local SGD is challenging to prove. Our first result addresses this gap by presenting new lower bounds for Federated Averaging, which question existing heterogeneity assumptions that hope to explain its "unreasonable effectiveness." We demonstrate that accelerated mini-batch SGD can achieve optimal performance within specific heterogeneity frameworks in general convex settings. These findings underscore the importance of strong convexity in optimization and highlight the need for refined heterogeneity assumptions to characterize Local SGD's behavior accurately. We further show new upper bounds in the strongly convex setting and highlight a new perspective on Local SGD's convergence by controlling its fixed point. Complementing these bounds, our second result delves into a personalized variant of Local SGD. We establish new convergence guarantees for this personalized variant, improving the dependence on data heterogeneity assumptions. Our analysis reveals that personalized Local SGD surpasses pure local training and conventional federated learning algorithms that generate a consensus model for all devices. We highlight that personalization can do so because it mitigates the perils of Local SGD by correcting its fixed point discrepancy. The insights from these results collectively advance our understanding of the dynamics of Local SGD in federated learning, offering a comprehensive view of its strengths and limitations and paving the way for future research and applications in this field.
Chung-Ming Chien | Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning
Abstract. As the scale of generative models continues to grow, the efficient reuse and adaptation of pre-trained models have become crucial considerations. In this work, we propose Voicebox Adapter, a novel approach that integrates fine-grained conditions into a pre-trained text-conditioned speech generation model using a cross-attention module. To ensure a smooth integration of newly introduced modules with the pre-trained ones, we explore various efficient fine-tuning approaches. Our investigation reveals that the LoRA with bias-tuning configuration yields the best performance, enhancing controllability without compromising the quality of generated speech. Across three fine-grained conditional generation tasks, we empirically demonstrate the efficacy of Voicebox Adapter and its remarkable resource efficiency. Follow-up experiments further highlight the robustness of Voicebox Adapter across diverse training and data setups.
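As background on the fine-tuning configuration mentioned above, a LoRA-augmented linear layer keeps the pre-trained weight frozen and learns only a low-rank update plus, in the bias-tuning variant, a trainable bias. The sketch below is a generic illustration of that idea, not code from Voicebox Adapter:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Generic LoRA-with-bias-tuning sketch (illustrative; not Voicebox Adapter code)."""

    def __init__(self, in_dim, out_dim, rank=8, alpha=16.0):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)   # stands in for a pre-trained layer
        self.base.weight.requires_grad_(False)   # freeze the pre-trained weight
        self.base.bias.requires_grad_(True)      # bias-tuning: keep the bias trainable
        self.lora_a = nn.Parameter(torch.randn(rank, in_dim) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_dim, rank))  # zero-init: training starts at the base model
        self.scale = alpha / rank

    def forward(self, x):
        # frozen base projection + scaled low-rank correction
        return self.base(x) + self.scale * (x @ self.lora_a.T) @ self.lora_b.T

layer = LoRALinear(256, 256)
trainable = sorted(n for n, p in layer.named_parameters() if p.requires_grad)
print(trainable)  # ['base.bias', 'lora_a', 'lora_b'] -- only a small fraction of the parameters
```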
11:45-12:30 | Lunch | 4th floor kitchen / common area
12:30-13:30 | Research at TTIC Talk: Ohad Trabelsi | (Almost) Ruling Out SETH Lower Bounds for All-Pairs Max-Flow
Abstract. TBD
13:30-14:30 | Talk Session 2 (Chair: Anand Bhattad)
Pushkar Shukla | TIBET: Identifying and Evaluating Biases in Text-to-Image Generative Models
Abstract. Text-to-Image (TTI) generative models have shown great progress in the past few years in terms of their ability to generate complex and high-quality imagery. At the same time, these models have been shown to suffer from harmful biases, including exaggerated societal biases (e.g., gender, ethnicity), as well as incidental correlations that limit such models' ability to generate more diverse imagery. In this paper, we propose a general approach to study and quantify a broad spectrum of biases, for any TTI model and for any prompt, using counterfactual reasoning. Unlike other works that evaluate generated images on a predefined set of bias axes, our approach automatically identifies potential biases that might be relevant to the given prompt, and measures those biases. In addition, our paper extends quantitative scores with post-hoc explanations in terms of semantic concepts in the images generated. We show that our method is uniquely capable of explaining complex multi-dimensional biases through semantic concepts, as well as the intersectionality between different biases for any given prompt. We perform extensive user studies to illustrate that the results of our method and analysis are consistent with human judgements.
Jiahao Li | Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model
Abstract. Text-to-3D with diffusion models has achieved remarkable progress in recent years. However, existing methods either rely on score distillation-based optimization, which suffers from slow inference, low diversity, and Janus problems, or are feed-forward methods that generate low-quality results due to the scarcity of 3D training data. In this paper, we propose Instant3D, a novel method that generates high-quality and diverse 3D assets from text prompts in a feed-forward manner. We adopt a two-stage paradigm, which first generates a sparse set of four structured and consistent views from text in one shot with a fine-tuned 2D text-to-image diffusion model, and then directly regresses the NeRF from the generated images with a novel transformer-based sparse-view reconstructor. Through extensive experiments, we demonstrate that our method can generate diverse 3D assets of high visual quality within 20 seconds, which is two orders of magnitude faster than previous optimization-based methods that can take 1 to 10 hours.
Jiading Fang | Transcrib3D: 3D Referring Expression Resolution through Large Language Models
Abstract. 3D referring expression resolution involves identifying a unique target object within a 3D scene based on a natural language description, which may include the target's semantic class, relation with other objects, and other attributes. We introduce Transcrib3D, an approach that brings together 3D detection methods and the emergent reasoning capabilities of large language models (LLMs), using text as the unifying medium. Text is a natural connective format, as the native representation of LLMs and the output format of detectors. This synergy allows us to sidestep the need for learning complex multi-modality "adapters" from limited 3D annotated data. The use of general-purpose LLMs to support the reasoning process brings multiple benefits: (1) it allows flexible, multi-step compositional reasoning with code generation and execution; (2) it provides a human-readable and interpretable reasoning process; (3) it allows for iterative development through prompt engineering. Transcrib3D demonstrates its effectiveness in experiments, achieving state-of-the-art results on 3D referring benchmarks, with a great leap in performance from previous multi-modality baselines. Remarkably, after the visual object detection, the reasoning module runs in a zero-shot setting without any training on the dataset, in contrast to most of the competitive baselines. Additionally, we provide an analysis of common failure scenarios to inform potential enhancements in future work.
14:30-14:45 | Coffee & Tea Break
14:45-15:45 | Talk Session 3 (Chair: Liren Shan)
Melissa Dutz | Winning Without Observing Payoffs: Exploiting Behavioral Bias to Win Nearly Every Round
Abstract. Gameplay under various forms of uncertainty has been widely studied. Feldman et al. (2010) studied a particularly low-information setting in which one observes the opponent's actions but no payoffs, not even one's own, and introduced an algorithm which guarantees one's payoff nonetheless approaches the minimax optimal value (i.e., zero) in a symmetric zero-sum game. Against an opponent playing a minimax-optimal strategy, approaching the value of the game is the best one can hope to guarantee. However, a wealth of research in behavioral economics shows that people often do not make perfectly rational, optimal decisions. Here we consider whether it is possible to actually win in this setting if the opponent is behaviorally biased. We model several deterministic, biased opponents and show that even without knowing the game matrix in advance or observing any payoffs, it is possible to take advantage of each bias in order to win nearly every round (so long as the game has the property that each action beats and is beaten by at least one other action). We also provide a partial characterization of the kinds of biased strategies that can be exploited to win nearly every round, and provide algorithms for beating some kinds of biased strategies even when we don't know which strategy the opponent uses.
Max Ovsiankin | Improved Approximations for $\ell_p$-Shortest Path and $\ell_p$-Network Design
Abstract. We present quasipolynomial-time approximation algorithms for the $\ell_p$-Shortest Path problem and for a class of $\ell_p$ network design problems. The $\ell_p$-version of a network design problem is one in which the edges of the instance network have vector-valued weights, and one is tasked with finding a subset of the edges minimizing the $\ell_p$-norm of the sum of its weight vectors subject to some constraints.
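Concretely (a schematic formulation based on the abstract), in $\ell_p$-Shortest Path each edge $e$ carries a weight vector $w_e \in \mathbb{R}_{\ge 0}^{d}$, and one seeks an $s$-$t$ path $P$ minimizing
$$ \Big\| \sum_{e \in P} w_e \Big\|_p \;=\; \Big( \sum_{j=1}^{d} \Big( \sum_{e \in P} w_e(j) \Big)^{p} \Big)^{1/p}; $$
other $\ell_p$ network design problems replace the path requirement with a different combinatorial constraint on the chosen edge set.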
Kavya Ravichandran | Nearly-tight Approximation Guarantees for the Improving Multi-Armed Bandits Problem
Abstract. We give nearly-tight upper and lower bounds for the improving multi-armed bandits problem. An instance of this problem has $k$ arms, each of whose reward functions is a concave and increasing function of the number of times that arm has been pulled so far. We show that for any randomized online algorithm, there exists an instance on which it must suffer at least an $\Omega(\sqrt{k})$ approximation factor relative to the optimal reward. We then provide a randomized online algorithm that guarantees an $O(\sqrt{k})$ approximation factor, if it is told the maximum reward achievable by the optimal arm in advance. We then show how to remove this assumption at the cost of an extra $O(\log k)$ approximation factor, achieving an overall $O(\sqrt{k} \log k)$ approximation relative to optimal.
15:45-16:00 | Coffee & Tea Break
16:00-16:40 | Talk Session 4 (Chair: Jiawei Zhou)
David Yunis | Subwords as Skills: Tokenization for Sparse-Reward Reinforcement Learning
Abstract. Exploration in sparse-reward reinforcement learning is difficult due to the need for long, coordinated sequences of actions in order to achieve any reward. Moreover, in continuous action spaces there are an infinite number of possible actions, which only increases the difficulty of exploration. One class of methods designed to address these issues forms temporally extended actions, often called skills, from interaction data collected in the same domain, and optimizes a policy on top of this new action space. Typically such methods require a lengthy pretraining phase, especially in continuous action spaces, in order to form the skills before reinforcement learning can begin. Given prior evidence that the full range of the continuous action space is not required in such tasks, we propose a novel approach to skill-generation with two components. First we discretize the action space through clustering, and second we leverage a tokenization technique borrowed from natural language processing to generate temporally extended actions. Such a method outperforms baselines for skill-generation in several challenging sparse-reward domains, and requires orders-of-magnitude less computation in skill-generation and online rollouts.
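To make the two components concrete, the sketch below shows the generic pattern of clustering continuous actions into discrete tokens and then merging frequent adjacent tokens, BPE-style, into longer "skills". It is an illustration of the idea under our own simplifying assumptions, not the authors' implementation:

```python
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans

def discretize_actions(actions, n_clusters=8, seed=0):
    """Step 1: cluster continuous actions into discrete tokens (illustrative sketch)."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(actions)
    return km.labels_.tolist(), km.cluster_centers_

def bpe_skills(token_seq, num_merges=10):
    """Step 2: BPE-style merging of frequent adjacent tokens into temporally extended skills."""
    seq = [(t,) for t in token_seq]          # each skill is a tuple of primitive tokens
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))   # count adjacent skill pairs in the trajectory
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]  # most frequent pair becomes a new, longer skill
        merged, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                merged.append(a + b)         # concatenate the primitive-token tuples
                i += 2
            else:
                merged.append(seq[i])
                i += 1
        seq = merged
    return sorted(set(seq), key=len, reverse=True)  # the resulting skill vocabulary

# Toy usage: random 2-D continuous actions standing in for a logged trajectory.
rng = np.random.default_rng(0)
tokens, centers = discretize_actions(rng.normal(size=(500, 2)))
print(bpe_skills(tokens)[:5])  # longest discovered skills, each a tuple of cluster indices
```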
Ju-Chieh Chou | Toward Joint Language Modeling for Speech Units and Text
Abstract. Speech and text are two major forms of human language. The research community has been focusing on mapping speech to text or vice versa for many years. However, in the field of language modeling, very little effort has been made to model them jointly. In light of this, we explore joint language modeling for speech units and text. Specifically, we compare different speech tokenizers to transform continuous speech signals into discrete units and use different methods to construct mixed speech-text data. We introduce automatic metrics to evaluate how well the joint LM mixes speech and text. We also fine-tune the LM on downstream spoken language understanding (SLU) tasks with different modalities (speech or text) and test its performance to assess the model's learning of shared representations. Our results show that by mixing speech units and text with our proposed mixing techniques, the joint LM improves over a speech-only baseline on SLU tasks and shows zero-shot cross-modal transferability.
16:40-17:20 | Panel Discussion (Coordinator: Emily Ruth Diana) | From Research to Job Search: Suggestions for Future Ph.D. Graduates. Panelists: Avrim Blum, Behnam Neyshabur, Madhur Tulsiani, Rad Niazadeh, Aly Azeem Khan
17:20-17:30 | Awards & Final Remarks
17:30-18:30 | TGIF | Come enjoy the food and beer, and have fun!