Casey Chu

Casey Chu

Hello! I’m a researcher at OpenAI, currently working on multimodal AI systems. Let’s chat sometime!

Previously, I was a grad student in computational math at Stanford University and an undergrad majoring in math at Harvey Mudd College. Find me on Twitter, Stack Overflow, and GitHub, or check out my resume.

A Bayesian trains a classifier

November 2018

Perspectives on the variational autoencoder

November 2018

The principle of maximum entropy

July 2018

Flavors of Wasserstein GAN

March 2018

Why does algebra work?

January 2013

The Dirichlet function in closed form

March 2012

The equivalence between Stein variational gradient descent and black-box variational inference

2020

ICLR 2020 DeepDiffEq Workshop

We formalize an equivalence between two popular methods for Bayesian inference: Stein variational gradient descent (SVGD) and black-box variational inference (BBVI). In particular, we show that BBVI corresponds precisely to SVGD when the kernel is the neural tangent kernel. Furthermore, we interpret SVGD and BBVI as kernel gradient flows; we do this by leveraging the recent perspective that views SVGD as a gradient flow in the space of probability distributions and showing that BBVI naturally motivates a Riemannian structure on that space. We observe that kernel gradient flow also describes dynamics found in the training of generative adversarial networks (GANs). This work thereby unifies several existing techniques in variational inference and generative modeling and identifies the kernel as a fundamental object governing the behavior of these algorithms, motivating deeper analysis of its properties.

Smoothness and Stability in GANs

2020

ICLR 2020

Generative adversarial networks, or GANs, commonly display unstable behavior during training. In this work, we develop a principled theoretical framework for understanding the stability of various types of GANs. In particular, we derive conditions that guarantee eventual stationarity of the generator when it is trained with gradient descent, conditions that must be satisfied by the divergence that is minimized by the GAN and the generator's architecture. We find that existing GAN variants satisfy some, but not all, of these conditions. Using tools from convex analysis, optimal transport, and reproducing kernels, we construct a GAN that fulfills these conditions simultaneously. In the process, we explain and clarify the need for various existing GAN stabilization techniques, including Lipschitz constraints, gradient penalties, and smooth activation functions.

Probability Functional Descent: A Unifying Perspective on GANs, Variational Inference, and Reinforcement Learning

2019

ICML 2019

This paper provides a unifying view of a wide range of problems of interest in machine learning by framing them as the minimization of functionals defined on the space of probability measures. In particular, we show that generative adversarial networks, variational inference, and actor-critic methods in reinforcement learning can all be seen through the lens of our framework. We then discuss a generic optimization algorithm for our formulation, called probability functional descent (PFD), and show how this algorithm recovers existing methods developed independently in the settings mentioned earlier.

CycleGAN, a Master of Steganography

2017

NIPS 2017 Workshop on Machine Deception

CycleGAN (Zhu et al. 2017) is one recent successful approach to learn a transformation between two image distributions. In a series of experiments, we demonstrate an intriguing property of the model: CycleGAN learns to "hide" information about a source image into the images it generates in a nearly imperceptible, high-frequency signal. This trick ensures that the generator can recover the original sample and thus satisfy the cyclic consistency requirement, while the generated image remains realistic. We connect this phenomenon with adversarial attacks by viewing CycleGAN's training procedure as training a generator of adversarial examples and demonstrate that the cyclic consistency loss causes CycleGAN to be especially vulnerable to adversarial attacks.