The 6th TTIC Student Workshop took place in person on November 18, 2022. It included student talks, a poster session, and an invited talk.
Organizing Committee: Ben Lai, Max Ovsiankin, Karen Livescu, Madhur Tulsiani, Erica Cocom
Talk/Award Committee: Ali Vakilian, Lee Cohen, Hongyuan Mei
8:30-9:05  Breakfast & Opening Remarks
9:05-10:45  Talk Session 1 (Chair: Ali Vakilian)
Ben Lai  Flexible Backbone Construction via Deep Generative Model for de novo Protein Design
Abstract. Current computational protein design paradigms rely heavily on robust backbone assembly for downstream sequence design programs such as Rosetta, and these methods often require idealized backbone structures and extensive manual calibration to successfully design the target fold. Such design approaches often involve expert curation of template structures as well as conformation databases, and are therefore time-consuming in practice. Recently, many deep-learning-powered fixed-backbone sequence design methods have been proposed; these methods show robustness to degenerate backbone structures by introducing noise at the training stage of the deep learning module. Fast and accurate structure prediction methods can enable rapid scanning of the designed sequences for structure compliance.
Here, we present a VAE-based backbone generative model that can construct designable backbone structures from coarse-grained topological constraints for flexible and efficient backbone generation. To form an end-to-end design pipeline, we will combine our backbone generative model with ProteinMPNN, a deep-learning-based fixed-backbone sequence design method, and validate the designs in silico with the AlphaFold2 structure predictor. We will demonstrate the capability of our design paradigm through parametric design of (i) helix bundles with customized topological properties and (ii) ligand-binding proteins ab initio.

Ron Mosenzon  Exact Flow Sparsification Requires Unbounded Size
Abstract. Given a large edge-capacitated network $G$ and a subset of $k$ vertices called terminals, an (exact) flow sparsifier is a small network $G'$ that preserves (exactly) all multicommodity flows that can be routed between the terminals. Flow sparsifiers were introduced by Leighton and Moitra [STOC 2010], and have been studied and used in many algorithmic contexts. A fundamental question, open for over a decade, asks whether every $k$-terminal network admits an exact flow sparsifier whose size is bounded by some function $f(k)$ (regardless of the size of $G$ or its capacities). We resolve this question in the negative by proving that there exist $6$-terminal networks $G$ whose flow sparsifiers $G'$ must have arbitrarily large size. This unboundedness is perhaps surprising, since the analogous sparsification that preserves all terminal cuts (called an exact cut sparsifier or mimicking network) admits sparsifiers of size $f_0(k)\leq 2^{2^k}$ [Hagerup, Katajainen, Nishimura, and Ragde, JCSS 1998]. We prove our results by analyzing the set of all feasible demands in the network, known as the demand polytope. We identify an invariant of this polytope, essentially the slope of certain facets, that can be made arbitrarily large even for $k=6$ and implies an explicit lower bound on the size of the network. We further use this technique to answer, again in the negative, an open question of Seymour [JCTB 2015] regarding flow sparsification that uses only contractions and preserves the infeasibility of one demand vector.

Kumar Kshitij Patel  Distributed Online and Bandit Convex Optimization
Abstract. We study the problems of distributed online and bandit convex optimization against an adaptive adversary. We aim to minimize the average regret on $M$ machines working in parallel for $T$ time steps that are allowed to communicate $R$ times intermittently. Assuming the underlying cost functions are convex, our results show that collaboration is not beneficial if the machines have access to first-order gradient information at the queried points. We show that in this setting, simple non-collaborative algorithms are minimax optimal. This contrasts with the stochastic setting, where each machine samples the cost functions from a fixed distribution. Next, we consider the harder setting of distributed optimization with bandit (zeroth-order) feedback, where the machines can only access values of the cost functions at the queried points. The key finding here is the identification of a high-dimensional regime where collaboration is beneficial and may even lead to a linear speedup in the number of machines. Our results are a first attempt at bridging the gap between distributed online optimization against stochastic and adaptive adversaries.
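As a concrete point of reference, the non-collaborative baseline in the first-order setting is just each machine independently running online gradient descent on its own loss sequence. A minimal sketch, with a hypothetical sequence of quadratic losses standing in for the adversary's choices:

```python
import numpy as np

def online_gradient_descent(grad_fns, x0, eta):
    """One machine running online gradient descent on its own loss
    sequence; no communication with other machines is required."""
    x = np.asarray(x0, dtype=float)
    iterates = [x.copy()]
    for g in grad_fns:            # one adversarially chosen loss per round
        x = x - eta * g(x)        # first-order (gradient) feedback only
        iterates.append(x.copy())
    return iterates

# Hypothetical loss sequence: f_t(x) = 0.5 * (x - 4)^2 for every round t,
# whose gradient is x - 4.
grad_fns = [lambda x: x - 4.0] * 20
iterates = online_gradient_descent(grad_fns, x0=[0.0], eta=0.5)
```

With a fixed loss the iterates contract toward the minimizer at each round; against a truly adaptive adversary the losses would change every round, and the regret guarantee (rather than convergence) is the relevant quantity.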


Naren Sarayu Manoj  Group Lewis Weights and Group Norm Sparsification
Abstract. Given $p \ge 2$, a matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$, and a partitioning of $[m]$ into $k$ size-$r$ groups $S_1, \dots, S_k$, we prove that there exists a subset $T$ of $[k]$ and a corresponding group reweighting such that $\sum_{i=1}^k \max_{j \in S_i} \lvert\langle \mathbf{a_j}, \mathbf{x}\rangle\rvert^p \approx_{1+\varepsilon} \sum_{i\in T} \max_{j \in S_i} \lvert\langle \mathbf{a_j}, \mathbf{x}\rangle\rvert^p$ for $\lvert T\rvert = \widetilde{O}\left(\varepsilon^{-2}p^2n^{\max(1,p/2)}\right)$. Additionally, we show that there exists an algorithm with runtime $\widetilde{O}\left(\mathsf{nnz}(\mathbf{A})+n^{\omega}\right)$ that returns such a subset, where $\omega$ is the matrix multiplication runtime exponent. Our results partially resolve an open question posed in the paper Chaining, Group Leverage Score Overestimates, and Fast Spectral Hypergraph Sparsification by Jambulapati, Liu, and Sidford. Our results also yield an alternate analysis of "one-shot" Lewis sampling for $p\ge 2$, a new result proven in the SODA 2023 paper Online Lewis Weight Sampling by Woodruff and Yasuda.


10:45-10:55  Lightning Talks (Chairs: Talk/Award Committee)
Anmol Kabra  Exponential Family Model-Based Reinforcement Learning via Score Matching
Abstract. We propose an optimistic model-based algorithm, dubbed SMRL, for finite-horizon episodic reinforcement learning (RL) when the transition model is specified by exponential family distributions with $d$ parameters and the reward is bounded and known. SMRL uses score matching, an unnormalized density estimation technique that enables efficient estimation of the model parameter by ridge regression. Under standard regularity assumptions, SMRL achieves $\tilde{O}(d \sqrt{H^3 T})$ online regret, where $H$ is the length of each episode and $T$ is the total number of interactions (ignoring polynomial dependence on structural scale parameters).
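To illustrate why score matching reduces to ridge regression for exponential families, here is a toy 1-D sketch (not the paper's algorithm): for the family $p(x \mid \theta) \propto \exp(\theta x - x^2/2)$, i.e. $\mathcal{N}(\theta, 1)$, the score is $\frac{d}{dx} \log p = \theta - x$, so the score matching objective is quadratic in $\theta$ and its regularized minimizer has a closed form.

```python
import numpy as np

rng = np.random.default_rng(1)

def score_matching_ridge(x, lam=1e-3):
    """Ridge-regularized score matching estimate of theta for the toy
    family N(theta, 1). The objective
        (1/2n) * sum_i (theta - x_i)^2 - 1 + (lam/2) * theta^2
    is quadratic, so the minimizer is available in closed form."""
    n = len(x)
    return x.sum() / (n + lam * n)

# Samples from N(2, 1); the estimate should recover theta ≈ 2.
x = rng.normal(2.0, 1.0, size=10_000)
theta_hat = score_matching_ridge(x)
```

Crucially, no normalizing constant is ever computed, which is the appeal of score matching for richer exponential family transition models.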


Jiading Fang  NeRF Registration with 3D Feature Field
Abstract. Neural Radiance Fields (NeRFs) have demonstrated impressive representational power for 3D scenes. It is likely that such implicit representations will become the standard way, instead of point clouds, of exchanging 3D information. Traditionally, point-cloud registration has been an important problem when integrating point clouds from different sources. However, no previous work addresses this registration problem for NeRFs.
We propose a novel method that handles the NeRF registration problem, where there are multiple overlapping NeRFs constructed independently by different agents (e.g. individual vehicles within a large data-collecting fleet) and only the NeRFs are communicated, without the original images that constructed them. The method extends traditional feature matching techniques into 3D and leverages recent progress on feature fields distilled from 2D self-supervised features.


Pushkar Shukla  CAVLi: Generating Local Explanations for Human-Interpretable Concepts
Abstract. Interpretability techniques aim to provide the rationale behind a model's decision, typically by explaining either an individual prediction (local explanation, e.g. 'why is this patient diagnosed with this condition') or a class of predictions (global explanation, e.g. 'why is this set of patients diagnosed with this condition in general'). While there are many methods focused on either one, few frameworks can provide both local and global explanations consistently. In this work, we combine two powerful existing techniques, one local (Local Interpretable Model-agnostic Explanations, LIME) and one global (Testing with Concept Activation Vectors, TCAV), to provide local and global concept-based explanations. We sanity-check our results on synthetic datasets as well as with human concepts on the ImageNet dataset.


11:00-12:00  Keynote Talk: Aly Azeem Khan
Computational Immunology: New computational approaches to understand immune function
12:00-12:40  Lunch  4th floor kitchen / common area
12:40-13:15  Lunch + Poster Session (includes posters for lightning talks)  5th floor common area
Ju-Chieh Chou  Joint language modeling for speech and text via speech-text splicing
Abstract. An open question in speech research is whether text data can be used to improve pretrained speech models. We introduce an approach to learning a joint speech-text language model (LM) by augmenting the training data with alternating speech and text. By learning from the concatenated discretized speech and text, the model learns shared representations between speech and text. We show that the model is able to perform cross-modal continuation, and zero-shot transfer between speech and text after fine-tuning on spoken language understanding tasks. We conduct experiments on different units for speech and different subsets of speech and/or text training data to better understand the properties of the speech-text joint LM.


13:15-14:55  Talk Session 2 (Chair: Lee Cohen)
Gene Li  Pessimism for Offline Linear Contextual Bandits using $L_p$ Confidence Sets
Abstract. We present a family $\{\hat{\pi}_p\}_{p \geq 1}$ of pessimistic learning rules for offline learning of linear contextual bandits, relying on confidence sets with respect to different $\ell_p$ norms, where $\hat{\pi}_2$ corresponds to Bellman-consistent pessimism (BCP), while $\hat{\pi}_\infty$ is a novel generalization of the lower confidence bound (LCB) to the linear setting. We show that the novel $\hat{\pi}_\infty$ learning rule is, in a sense, adaptively optimal, as it achieves the minimax performance (up to log factors) against all $\ell_q$-constrained problems, and as such it strictly dominates all other predictors in the family, including $\hat{\pi}_2$.


Han Shao  A Theory of PAC Learnability under Transformation Invariances
Abstract. Transformation invariances are present in many real-world problems. For example, image classification is usually invariant to rotation and color transformation: a rotated car in a different color is still identified as a car. Data augmentation, which adds the transformed data into the training set and trains a model on the augmented data, is one commonly used technique to build these invariances into the learning process. However, it is unclear how data augmentation performs theoretically and what the optimal algorithm is in the presence of transformation invariances. In this paper, we study PAC learnability under transformation invariances in three settings according to different levels of realizability: (i) a hypothesis fits the augmented data; (ii) a hypothesis fits only the original data and the transformed data lying in the support of the data distribution; (iii) the agnostic case. One interesting observation is that distinguishing between the original data and the transformed data is necessary to achieve optimal accuracy in settings (ii) and (iii), which implies that any algorithm not differentiating between the original and transformed data (including data augmentation) is not optimal. Furthermore, this type of algorithm can even "harm" the accuracy. In setting (i), although it is unnecessary to distinguish between the two data sets, data augmentation still does not perform optimally. Due to this difference, we propose two combinatorial measures characterizing the optimal sample complexity in setting (i) and settings (ii)-(iii), and provide the optimal algorithms.
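For reference, the data augmentation procedure discussed above is, in its simplest form, just extending the training set with every transformed copy of each example. A minimal sketch, with a hypothetical list of transformations:

```python
def augment(data, transforms):
    """Data augmentation: extend the training set with every transformed
    copy of each (example, label) pair, keeping the original label."""
    augmented = []
    for x, y in data:
        augmented.append((x, y))
        for g in transforms:
            augmented.append((g(x), y))  # transformed copy, same label
    return augmented

# Toy example with a sign-flip standing in for a real transformation
# such as rotation or a color change.
aug = augment([(1.0, "car"), (2.0, "cat")], [lambda x: -x])
```

Note that the augmented set treats original and transformed examples identically, which is exactly the property the abstract shows can be suboptimal in settings (ii) and (iii).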


Keziah Naggita  The Strategic Perceptron
Abstract. The classical Perceptron algorithm provides a simple and elegant procedure for learning a linear classifier. In each step, the algorithm observes the sample's position and label and updates the current predictor accordingly if it makes a mistake. However, in the presence of strategic agents that desire to be classified as positive and that are able to modify their position by a limited amount, the classifier may not be able to observe the true position of agents but rather a position where the agent pretends to be. Unlike the original setting with perfect knowledge of positions, in this situation the Perceptron algorithm fails to achieve its guarantees, and we illustrate examples with the predictor oscillating between two solutions forever, making an unbounded number of mistakes even though a perfect large-margin linear classifier exists. Our main contribution is a modified Perceptron-style algorithm which makes a bounded number of mistakes in the presence of strategic agents with both $\ell_2$ and weighted $\ell_1$ manipulation costs. In our baseline model, knowledge of the manipulation costs (i.e., the extent to which an agent may manipulate) is assumed. In our most general model, we relax this assumption and provide an algorithm which learns and refines both the classifier and its cost estimates to achieve good mistake bounds even when manipulation costs are unknown.
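The classical, non-strategic Perceptron that this work modifies is the standard mistake-driven update rule. A minimal sketch, assuming labels in $\{-1, +1\}$ and truthfully reported positions:

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """Classical Perceptron: observe each sample's (true) position and
    label, and update the predictor whenever it makes a mistake."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            if y_i * np.dot(w, x_i) <= 0:  # mistake (wrong side or boundary)
                w += y_i * x_i             # move the predictor toward x_i
                mistakes += 1
        if mistakes == 0:                  # converged on separable data
            break
    return w

# Linearly separable toy data.
X = np.array([[2.0, 1.0], [1.0, 2.0], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = perceptron(X, y)
```

On margin-$\gamma$ separable data with bounded norms, this rule makes a bounded number of mistakes; the abstract's point is that the bound breaks down once reported positions can be strategically manipulated.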


Omar Montasser  Boosting Barely Robust Learners
Abstract. We present an oracle-efficient algorithm for boosting the adversarial robustness of barely robust learners. Barely robust learning algorithms learn predictors that are adversarially robust only on a small fraction $\beta \ll 1$ of the data distribution. Our proposed notion of barely robust learning requires robustness with respect to a "larger" perturbation set, which we show is necessary for strongly robust learning; weaker relaxations are not sufficient for strongly robust learning. Our results reveal a qualitative and quantitative equivalence between two seemingly unrelated problems: strongly robust learning and barely robust learning.


14:55-15:15  Coffee & Tea Break
15:15-16:55  Talk Session 3 (Chair: Hongyuan Mei)
Freda Shi  Language Models are Multilingual Chain-of-Thought Reasoners
Abstract. We evaluate the reasoning abilities of large language models in multilingual settings. We introduce the Multilingual Grade School Math (MGSM) benchmark, by manually translating 250 grade-school math problems from the GSM8K dataset (Cobbe et al., 2021) into ten typologically diverse languages. We find that the ability to solve MGSM problems via chain-of-thought prompting emerges with increasing model scale, and that models have strikingly strong multilingual reasoning abilities, even in underrepresented languages such as Bengali and Swahili. Finally, we show that the multilingual reasoning abilities of language models extend to other tasks such as commonsense reasoning and word-in-context semantic judgment. The MGSM benchmark is publicly available at https://github.com/google-research/url-nlp.


Haochen Wang  Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models to 3D for Generation and Reconstruction
Abstract. Inference of a diffusion model can be interpreted as optimization on a vector field of gradients, and we can apply the chain rule on gradients. We backpropagate the score of a diffusion model through the Jacobian of a differentiable renderer, which we instantiate to be a voxel radiance field. This setup aggregates 2D scores at multiple camera viewpoints into a 3D score, and repurposes a pretrained 2D model for 3D data generation. We run the algorithm on several off-the-shelf diffusion image generative models, including the recently released Stable Diffusion trained on the large-scale LAION-5B dataset. We develop applications for 3D asset generation, editing, as well as single-view 3D reconstruction, and explore the validity of our assumptions.
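The chaining itself can be illustrated with stand-in components: per-view 2D scores are pulled back through the renderer's Jacobian transpose and summed into one gradient on the 3D parameters. Both the "renderer" and the "score" below are hypothetical linear toys, not the paper's voxel radiance field or diffusion model:

```python
import numpy as np

def render(theta, view):
    """Hypothetical differentiable renderer: image = view @ theta."""
    return view @ theta

def image_score(img, target):
    """Stand-in for a 2D diffusion score; pulls the image toward target."""
    return target - img

def sjc_grad(theta, views, targets):
    """Chain each 2D score through the renderer Jacobian (here view.T,
    since the toy renderer is linear) and sum over camera viewpoints."""
    grad = np.zeros_like(theta)
    for view, tgt in zip(views, targets):
        grad += view.T @ image_score(render(theta, view), tgt)
    return grad

# Ascend the aggregated 3D score until each view matches its target.
views = [np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])]
targets = [np.array([1.0, 2.0]), np.array([2.0, 1.0])]
theta = np.zeros(2)
for _ in range(100):
    theta += 0.1 * sjc_grad(theta, views, targets)
```

The design point is that only Jacobian-vector products with the renderer are needed, so any differentiable renderer can be dropped in.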


Chung-Ming Chien  Voice Conversion with Pre-Trained Self-Supervised Models
Abstract. The success of self-supervised learning has brought speech technology to a new era. With these powerful self-supervised pre-trained models, large amounts of labeled data are no longer necessary in a variety of tasks. In this work, we present one of the first attempts to apply self-supervised representations to a one-shot voice conversion task, where the model converts utterances spoken by any speaker to the voice of a target speaker it has never seen before, given only one utterance from that target speaker. We demonstrate that phonetic and speaker information can be extracted and separated from self-supervised representations without any labeled data. Though no phonetic annotation is used during training, our analysis shows that the model implicitly learns to break down the utterances and align short speech fragments from different speakers with similar phonetic content, which enables the conversion of speaker identity without changing the content. Compared with previous work that does not use self-supervised pre-training, our method improves both the naturalness and speaker similarity of the converted speech.


Takuma Yoneda  Diffusion Models for Shared Autonomy
Abstract. Diffusion models have recently gained a lot of attention, especially in the context of computer vision. In this work, we focus on the property of a diffusion model that it learns to "reverse diffuse" a random sample back to a near in-distribution sample seen during training, and adapt it as an assistive policy in shared autonomy. This property of diffusion models eliminates the need to model the user's behavior or to learn when to intervene, both of which previous methods had to deal with.
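To make the "reverse diffusion" property concrete, here is a minimal 1-D DDPM-style ancestral sampler. The data distribution is chosen to be Gaussian so that the score of every noised marginal is available in closed form; in the setting above the score would come from a trained network, and all schedule constants here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data distribution N(mu, sigma^2) and a linear noise schedule.
mu, sigma = 3.0, 0.1
T = 200
betas = np.linspace(1e-4, 0.05, T)
alphas = 1.0 - betas
abar = np.cumprod(alphas)

def score(x, t):
    """Exact score of the noised marginal q(x_t) for Gaussian data:
    x_t ~ N(sqrt(abar_t)*mu, abar_t*sigma^2 + 1 - abar_t)."""
    m = np.sqrt(abar[t]) * mu
    v = abar[t] * sigma**2 + (1.0 - abar[t])
    return -(x - m) / v

def reverse_diffuse(n=2000):
    """DDPM ancestral sampling: start from pure noise and denoise
    step by step back toward the data distribution."""
    x = rng.standard_normal(n)
    for t in range(T - 1, -1, -1):
        eps_hat = -np.sqrt(1.0 - abar[t]) * score(x, t)  # score -> noise pred
        x = (x - betas[t] / np.sqrt(1.0 - abar[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            x = x + np.sqrt(betas[t]) * rng.standard_normal(n)
    return x

samples = reverse_diffuse()
```

Starting from an arbitrary (random) input and landing near the training distribution is exactly the behavior repurposed here as an assistive policy: a possibly-noisy user action is "reverse diffused" toward a nearby in-distribution action.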


16:55-17:15  Coffee & Tea Break
17:15-17:30  Awards & Final Remarks