The 6th TTIC Student Workshop took place in person on November 18, 2022. It included student talks, a poster session, and an invited talk.
Organizing Committee: Ben Lai, Max Ovsiankin, Karen Livescu, Madhur Tulsiani, Erica Cocom
Talk/Award Committee: Ali Vakilian, Lee Cohen, Hongyuan Mei
8:30-9:05 | Breakfast & Opening Remarks | ||
9:05-10:45 | Talk Session 1 (Chair: Ali Vakilian) | ||
Ben Lai | Flexible Backbone Construction via Deep Generative Model for de novo Protein Design
Abstract. Current computational protein design paradigms rely heavily on robust backbone assembly for downstream sequence design programs such as Rosetta, and these methods often require an idealized backbone structure and extensive manual calibration to successfully design the target fold. Such design approaches typically involve expert curation of template structures and conformation databases, and are therefore time-consuming in practice. Recently, many deep-learning-powered fixed-backbone sequence design methods have been proposed; these methods are robust to degenerate backbone structures because noise is introduced during training of the deep learning module. Fast and accurate structure prediction methods can enable rapid scanning of designed sequences for structural compliance.
Here, we present a VAE-based backbone generative model that constructs designable backbone structures from coarse-grained topological constraints, enabling flexible and efficient backbone generation. To form an end-to-end design pipeline, we will combine our backbone generative model with a deep-learning-based fixed-backbone sequence design method, ProteinMPNN, and validate the designs in silico with the AlphaFold2 structure predictor. We will demonstrate the capability of our design paradigm through parametric design of (i) helix bundles with customized topological properties and (ii) ligand-binding proteins ab initio.
|
||
Ron Mosenzon | Exact Flow Sparsification Requires Unbounded Size
Abstract. Given a large edge-capacitated network $G$ and a subset of $k$ vertices called
terminals,
an (exact) flow sparsifier is a small network $G'$
that preserves (exactly) all multicommodity flows that can be routed between the terminals.
Flow sparsifiers were introduced by Leighton and Moitra [STOC 2010],
and have been studied and used in many algorithmic contexts.
A fundamental question that remained open for over a decade
asks whether every $k$-terminal network admits an exact flow sparsifier
whose size is bounded by some function $f(k)$
(regardless of the size of $G$ or its capacities).
We resolve this question in the negative
by proving that there exist $6$-terminal networks $G$
whose flow sparsifiers $G'$ must have arbitrarily large size.
This unboundedness is perhaps surprising,
since the analogous sparsification that preserves all terminal cuts
(called exact cut sparsifier or mimicking network)
admits sparsifiers of size $f_0(k)\leq 2^{2^k}$
[Hagerup, Katajainen, Nishimura, and Ragde, JCSS 1998].
We prove our results by analyzing the set of all feasible demands in the network,
known as the demand polytope.
We identify an invariant of this polytope,
essentially the slope of certain facets,
that can be made arbitrarily large even for $k=6$,
and implies an explicit lower bound on the size of the network.
We further use this technique to answer, again in the negative,
an open question of Seymour [JCTB 2015] regarding flow-sparsification
that uses only contractions and preserves the infeasibility of one demand vector.
|
||
Kumar Kshitij Patel | Distributed Online and Bandit Convex Optimization
Abstract. We study the problems of distributed online and bandit convex optimization against an adaptive adversary. We aim to minimize the average regret on $M$ machines working in parallel for $T$ time steps, which are allowed to communicate $R$ times intermittently. Assuming the underlying cost functions are convex, our results show that collaboration is not beneficial if the machines have access to first-order gradient information at the queried points: in this setting, simple non-collaborative algorithms are minimax optimal. This contrasts with the stochastic setting, where each machine samples the cost functions from a fixed distribution. Next, we consider the harder setting of distributed optimization with bandit (zeroth-order) feedback, where the machines can only access values of the cost functions at the queried points. The key finding here is the identification of a high-dimensional regime where collaboration is beneficial and may even lead to a linear speedup in the number of machines. Our results are a first attempt towards bridging the gap between distributed online optimization against stochastic and adaptive adversaries.
|
||
Naren Sarayu Manoj | Group Lewis Weights and Group Norm Sparsification
Abstract. Given $p \ge 2$, a matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$, and a partitioning of $[m]$ into $k$ size-$r$ groups $S_1, \dots, S_k$, we prove that there exists a subset $T$ of $[k]$ and a corresponding group reweighting such that $\sum_{i=1}^k \max_{j \in S_i} \lvert\langle \mathbf{a_j}, \mathbf{x}\rangle\rvert^p \approx_{1+\varepsilon} \sum_{i\in T} \max_{j \in S_i} \lvert\langle \mathbf{a_j}, \mathbf{x}\rangle\rvert^p$ for $\lvert T\rvert = \widetilde{O}\left(\varepsilon^{-2}p^2n^{\max(1,p/2)}\right)$. Additionally, we show that there exists an algorithm with runtime $\widetilde{O}\left(\mathsf{nnz}(\mathbf{A})+n^{\omega}\right)$ that returns such a subset, where $\omega$ is the matrix multiplication runtime exponent. Our results partially resolve an open question posed in the paper Chaining, Group Leverage Score Overestimates, and Fast Spectral Hypergraph Sparsification by Jambulapati, Liu, and Sidford. Our results also yield an alternate analysis of "one-shot" Lewis sampling for $p\ge 2$, a new result proven in the SODA 2023 paper Online Lewis Weight Sampling by Woodruff and Yasuda.
|
||
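The abstract above sparsifies sums of groupwise maxima of inner products. As a minimal sketch, the objective being preserved can be computed directly (function and variable names here are ours, purely illustrative):

```python
import numpy as np

def group_max_objective(A, x, groups, weights=None, p=2):
    """Compute sum_i w_i * max_{j in S_i} |<a_j, x>|^p for a partition of
    the rows of A into groups S_1, ..., S_k (default weights w_i = 1)."""
    scores = np.abs(A @ x) ** p
    if weights is None:
        weights = np.ones(len(groups))
    return sum(w * scores[list(S)].max() for w, S in zip(weights, groups))

# A sparsifier keeps a reweighted subset T of the groups so that this
# objective is preserved up to a 1 +/- eps factor for every x.
A = np.eye(4)
x = np.array([1.0, 2.0, 3.0, 4.0])
groups = [[0, 1], [2, 3]]
val = group_max_objective(A, x, groups)  # max(1,4) + max(9,16) = 20
```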
10:45-10:55 | Lightning Talks (Chairs: Talk/Award Committee) | ||
Anmol Kabra | Exponential Family Model-Based Reinforcement Learning via Score Matching
Abstract. We propose an optimistic model-based algorithm, dubbed SMRL, for finite-horizon episodic reinforcement learning (RL) when the transition model is specified by exponential family distributions with $d$ parameters and the reward is bounded and known. SMRL uses score matching, an unnormalized density estimation technique that enables efficient estimation of the model parameter by ridge regression. Under standard regularity assumptions, SMRL achieves $\tilde{O}(d \sqrt{H^3 T})$ online regret, where $H$ is the length of each episode and $T$ is the total number of interactions (ignoring polynomial dependence on structural scale parameters).
|
||
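The abstract reduces model-parameter estimation to ridge regression. As a sketch of that regression primitive only (not SMRL's actual estimator), a closed-form ridge solve looks like:

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge regression: argmin_w ||X w - y||^2 + lam * ||w||^2,
    solved via the normal equations (X^T X + lam I) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# With negligible regularization, ridge recovers the true weights of a
# noiseless linear model.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w_true = np.array([2.0, -1.0])
w_hat = ridge(X, X @ w_true, lam=1e-8)
```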
Jiading Fang | NeRF Registration with 3D Feature Field
Abstract. Neural Radiance Fields (NeRF) have demonstrated impressive representational power for 3D scenes, and such implicit representations may well become the standard way of exchanging 3D information, in place of point clouds. Traditionally, point cloud registration has been an important problem when integrating point clouds from different sources, but no previous work addresses this problem for NeRFs.
We propose a novel method for the NeRF registration problem, where multiple overlapping NeRFs are constructed independently by different agents (e.g. individual vehicles within a large data-collecting fleet) and only the NeRFs are communicated, without the original images used to construct them. The method extends traditional feature matching techniques into 3D and leverages recent progress on feature fields distilled from 2D self-supervised features.
|
||
Pushkar Shukla | CAVLi: Generating Local Explanations for Human-Interpretable Concepts
Abstract. Interpretability techniques aim to provide the rationale behind a model's decision, typically by explaining either an individual prediction (local explanation, e.g. 'why is this patient diagnosed with this condition') or a class of predictions (global explanation, e.g. 'why is this set of patients diagnosed with this condition in general'). While there are many methods focused on either one, few frameworks can provide both local and global explanations consistently. In this work, we combine two powerful existing techniques, one local, Local Interpretable Model-agnostic Explanations (LIME), and one global, Testing with Concept Activation Vectors (TCAV), to provide local and global concept-based explanations. We sanity-check our results on synthetic datasets as well as with human concepts on the ImageNet dataset.
|
||
11:00-12:00 | Keynote Talk: Aly Azeem Khan | Computational Immunology: New computational approaches to understand immune function | |
12:00-12:40 | Lunch | 4th floor kitchen / common area | |
12:40-13:15 | Lunch + Poster Session (includes posters for lightning talks) | 5th floor common area | |
Ju-Chieh Chou | Joint language modeling for speech and text via speech-text splicing
Abstract. An open question in speech research is whether text data can be used to improve pre-trained speech models. We introduce an approach to learning a joint speech-text language model (LM) by augmenting the training data with alternating speech and text. By learning from the concatenated discretized speech and text, the model learns shared representations between speech and text. We show that the model is able to perform cross-modal continuation, and zero-shot transfer between speech and text after fine-tuning on spoken language understanding tasks. We conduct experiments on different units for speech and different subsets of speech and/or text training data to better understand the properties of the speech-text joint LM.
|
||
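The splicing idea above, alternating discretized speech units and text in one token stream, can be sketched minimally as follows (the token names and modality markers are illustrative assumptions, not the paper's actual vocabulary):

```python
def splice(segments):
    """Concatenate alternating speech-unit and text segments into one
    token stream, prefixing each segment with a modality marker."""
    SPEECH, TEXT = "<speech>", "<text>"
    stream = []
    for modality, tokens in segments:
        stream.append(SPEECH if modality == "speech" else TEXT)
        stream.extend(tokens)
    return stream

# Example: a spliced training sequence alternating speech units and words.
seq = splice([
    ("speech", ["u13", "u5", "u5", "u42"]),
    ("text",   ["the", "cat"]),
    ("speech", ["u7", "u19"]),
])
```

A standard LM trained on such streams sees both modalities in one context window, which is what allows cross-modal continuation.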
13:15-14:55 | Talk Session 2 (Chair: Lee Cohen) | ||
Gene Li | Pessimism for Offline Linear Contextual Bandits using $L_p$ Confidence Sets
Abstract. We present a family $\{\hat{\pi}_p\}_{p \geq 1}$ of pessimistic learning rules for offline learning of linear contextual bandits, relying on confidence sets with respect to different $\ell_p$ norms, where $\hat{\pi}_2$ corresponds to Bellman-consistent pessimism (BCP), while $\hat{\pi}_\infty$ is a novel generalization of the lower confidence bound (LCB) to the linear setting. We show that the novel $\hat{\pi}_\infty$ learning rule is, in a sense, adaptively optimal, as it achieves the minimax performance (up to log factors) against all $\ell_q$-constrained problems, and as such it strictly dominates all other predictors in the family, including $\hat{\pi}_2$.
|
||
Han Shao | A Theory of PAC Learnability under Transformation Invariances
Abstract. Transformation invariances are present in many real-world problems. For example, image classification is usually invariant to rotation and color transformation: a rotated car in a different color is still identified as a car. Data augmentation, which adds the transformed data to the training set and trains a model on the augmented data, is one commonly used technique to build these invariances into the learning process. However, it is unclear how data augmentation performs theoretically and what the optimal algorithm is in the presence of transformation invariances. In this paper, we study PAC learnability under transformation invariances in three settings according to different levels of realizability: (i) a hypothesis fits the augmented data; (ii) a hypothesis fits only the original data and the transformed data lying in the support of the data distribution; (iii) the agnostic case. One interesting observation is that distinguishing between the original data and the transformed data is necessary to achieve optimal accuracy in settings (ii) and (iii), which implies that any algorithm not differentiating between the original and transformed data (including data augmentation) is not optimal. Furthermore, such algorithms can even "harm" accuracy. In setting (i), although it is unnecessary to distinguish between the two data sets, data augmentation still does not perform optimally. Due to this difference, we propose two combinatorial measures characterizing the optimal sample complexity in setting (i) and in settings (ii)-(iii), and provide the optimal algorithms.
|
||
Keziah Naggita | The Strategic Perceptron
Abstract. The classical Perceptron algorithm provides a simple and elegant procedure for learning a linear classifier. In each step, the algorithm observes the sample's position and label and updates the current predictor accordingly if it makes a mistake. However, in the presence of strategic agents that desire to be classified as positive and are able to modify their position by a limited amount, the classifier may not observe the true position of an agent, but rather a position where the agent pretends to be. Unlike the original setting, with its perfect knowledge of positions, in this situation the Perceptron algorithm fails to achieve its guarantees: we illustrate examples with the predictor oscillating between two solutions forever, making an unbounded number of mistakes even though a perfect large-margin linear classifier exists. Our main contribution is a modified Perceptron-style algorithm which makes a bounded number of mistakes in the presence of strategic agents with both $\ell_2$ and weighted $\ell_1$ manipulation costs. In our baseline model, knowledge of the manipulation costs (i.e., the extent to which an agent may manipulate) is assumed. In our most general model, we relax this assumption and provide an algorithm which learns and refines both the classifier and its cost estimates to achieve good mistake bounds even when manipulation costs are unknown.
|
||
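For reference, the classical (non-strategic) Perceptron that the abstract starts from can be sketched as below; the strategic variant with manipulation costs is the paper's contribution and is not reproduced here:

```python
import numpy as np

def perceptron(samples, epochs=10):
    """Classical Perceptron with labels in {-1, +1}: on each mistake,
    move the predictor toward the misclassified sample."""
    d = len(samples[0][0])
    w = np.zeros(d)
    for _ in range(epochs):
        mistakes = 0
        for x, y in samples:
            if y * (w @ x) <= 0:  # mistake (or sample on the boundary)
                w = w + y * x     # additive update toward the sample
                mistakes += 1
        if mistakes == 0:         # separating predictor found
            break
    return w

# Tiny linearly separable example.
data = [(np.array([2.0, 1.0]), 1), (np.array([-1.0, -2.0]), -1)]
w = perceptron(data)
```

With truthfully reported positions and a margin, this makes a bounded number of mistakes; the abstract's point is that strategically shifted positions break exactly this guarantee.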
Omar Montasser | Boosting Barely Robust Learners
Abstract. We present an oracle-efficient algorithm for boosting the adversarial robustness of barely robust learners. Barely robust learning algorithms learn predictors that are adversarially robust only on a small fraction $\beta \ll 1$ of the data distribution. Our proposed notion of barely robust learning requires robustness with respect to a "larger" perturbation set, which we show is necessary for strongly robust learning; weaker relaxations are not sufficient. Our results reveal a qualitative and quantitative equivalence between two seemingly unrelated problems: strongly robust learning and barely robust learning.
|
||
14:55-15:15 | Coffee & Tea Break | ||
15:15-16:55 | Talk Session 3 (Chair: Hongyuan Mei) | ||
Freda Shi | Language Models are Multilingual Chain-of-Thought Reasoners
Abstract. We evaluate the reasoning abilities of large language models in multilingual settings.
We introduce the Multilingual Grade School Math (MGSM) benchmark, by manually translating 250
grade-school math problems from the GSM8K dataset (Cobbe et al., 2021) into ten typologically diverse
languages. We find that the ability to solve MGSM problems via chain-of-thought prompting emerges with
increasing model scale, and that models have strikingly strong multilingual reasoning abilities, even in
underrepresented languages such as Bengali and Swahili. Finally, we show that the multilingual reasoning
abilities of language models extend to other tasks such as commonsense reasoning and word-in-context
semantic judgment. The MGSM benchmark is publicly available at
https://github.com/google-research/url-nlp.
|
||
Haochen Wang | Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models to 3D for Generation and Reconstruction
Abstract. Inference of a diffusion model can be interpreted as optimization on a vector field of
gradients, and we can apply the chain rule to these gradients. We back-propagate the score of a diffusion model
through the Jacobian of a differentiable renderer, which we instantiate to be a voxel radiance field.
This setup aggregates 2D scores at multiple camera viewpoints into a 3D score, and repurposes a
pre-trained 2D model for 3D data generation. We run the algorithm on several off-the-shelf diffusion
image generative models, including the recently released Stable Diffusion trained on the large-scale
LAION 5B dataset. We develop applications for 3D asset generation, editing as well as single-view 3D
reconstruction, and explore the validity of our assumptions.
|
||
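The chaining step described above, aggregating 2D scores through the renderer's Jacobian, can be written schematically as follows (the notation is ours, not the paper's): with a differentiable renderer $\mathbf{x}_\pi = g(\theta, \pi)$ producing an image at camera viewpoint $\pi$ from 3D parameters $\theta$, and a pretrained 2D score function $\mathbf{s}(\cdot)$,

```latex
% Schematic of score Jacobian chaining (notation ours): render at each
% viewpoint pi, evaluate the 2D score, pull it back through the
% renderer's Jacobian, and aggregate over viewpoints.
\nabla_{\theta} \mathcal{L}
  \;=\; \sum_{\pi} J_{g}(\theta, \pi)^{\top}\, \mathbf{s}\big(g(\theta, \pi)\big),
\qquad
J_{g}(\theta, \pi) \;=\; \frac{\partial g(\theta, \pi)}{\partial \theta}.
```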
Chung-Ming Chien | Voice Conversion with Pre-Trained Self-Supervised Models
Abstract. The success of self-supervised learning has brought speech technology to a new era. With these powerful self-supervised pre-trained models, large amounts of labeled data are no longer necessary in a variety of tasks. In this work, we present one of the first attempts to apply self-supervised representations to a one-shot voice conversion task, where the model converts utterances spoken by any speaker to the voice of a target speaker it has never seen before, given only one utterance from that target speaker. We demonstrate that the phonetic and speaker information can be extracted and separated from self-supervised representations without any labeled data. Though no phonetic annotation is used during training, our analysis shows that the model implicitly learns to break down the utterances and align short speech fragments from different speakers with similar phonetic content, which enables the conversion of speaker identity without changing the content. Compared with previous work that does not use self-supervised pre-training, our method improves both the naturalness and speaker similarity of the converted speech.
|
||
Takuma Yoneda | Diffusion Models for Shared Autonomy
Abstract. Diffusion models have recently gained a lot of attention, especially in the context of computer vision. In this work, we focus on the property of a diffusion model that it learns to "reverse diffuse" a random sample back to a near in-distribution sample seen during training, and we adapt it as an assistive policy in shared autonomy. This property of diffusion models eliminates the need to model the user's behavior or to learn when to intervene, which previous methods had to deal with.
||
16:55-17:15 | Coffee & Tea Break | ||
17:15-17:30 | Awards & Final Remarks |