Posts by Tags

Gaussian mixture models

Gaussian mixture models

17 minute read

Published:

Gaussian mixture models are a very popular method for data clustering. Here I will define the Gaussian mixture model and also derive the EM algorithm for performing maximum likelihood estimation of its paramters.

Laplacian matrix

The graph Laplacian

12 minute read

Published:

At the heart of of a number of important machine learning algorithms, such as spectral clustering, lies a matrix called the graph Laplacian. In this post, I’ll walk through the intuition behind the graph Laplacian and describe how it represents the discrete analogue to the Laplacian operator on continuous multivariate functions.

Lebesgue integration

Demystifying measure-theoretic probability theory (part 3: expectation)

10 minute read

Published:

In this series of posts, I present my understanding of some basic concepts in measure theory — the mathematical study of objects with “size”— that have enabled me to gain a deeper understanding into the foundations of probability theory.

RNA-seq

Three strategies for cataloging cell types

7 minute read

Published:

In my previous post, I outlined a conceptual framework for defining and reasoning about “cell types”. Specifically, I noted that the idea of a “cell type” can be viewed as a human-made partition on the universal cellular state space. In this post, I attempt to distill three strategies for partitioning this state space and agreeing on cell type definitions.

On cell types and cell states

10 minute read

Published:

The advent of single-cell genomics has brought about new efforts to characterize and catalog all of the cell types in the human body. Despite these efforts, the very definition of a “cell type” is under debate. In this post, I will discuss a conceptual framework for defining cell types as subsets of states in an underlying cellular state space. Moreover, I will link the cellular state space to biomedical ontologies that attempt to capture biological knowledge regarding cell types.

RNA-seq: the basics

19 minute read

Published:

RNA sequencing (RNA-seq) has become a ubiquitous tool in biomedical research for measuring gene expression in a population of cells, or a single cell, across the genome. Despite its ubiquity, RNA-seq is relatively complex and there exists a large research effort towards developing statistical and computational methods for analyzing the raw data that it produces. In this post, I will provide a high level overview of RNA-seq and describe how to interpret some of the common units in which gene expression is measured from an RNA-seq experiment.

bioinformatics

Three strategies for cataloging cell types

7 minute read

Published:

In my previous post, I outlined a conceptual framework for defining and reasoning about “cell types”. Specifically, I noted that the idea of a “cell type” can be viewed as a human-made partition on the universal cellular state space. In this post, I attempt to distill three strategies for partitioning this state space and agreeing on cell type definitions.

On cell types and cell states

10 minute read

Published:

The advent of single-cell genomics has brought about new efforts to characterize and catalog all of the cell types in the human body. Despite these efforts, the very definition of a “cell type” is under debate. In this post, I will discuss a conceptual framework for defining cell types as subsets of states in an underlying cellular state space. Moreover, I will link the cellular state space to biomedical ontologies that attempt to capture biological knowledge regarding cell types.

RNA-seq: the basics

19 minute read

Published:

RNA sequencing (RNA-seq) has become a ubiquitous tool in biomedical research for measuring gene expression in a population of cells, or a single cell, across the genome. Despite its ubiquity, RNA-seq is relatively complex and there exists a large research effort towards developing statistical and computational methods for analyzing the raw data that it produces. In this post, I will provide a high level overview of RNA-seq and describe how to interpret some of the common units in which gene expression is measured from an RNA-seq experiment.

calculus of variations

Functionals and functional derivatives

13 minute read

Published:

The calculus of variations is a field of mathematics that deals with the optimization of functions of functions, called functionals. This topic was not taught to me in my computer science education, but it lies at the foundation of a number of important concepts and algorithms in the data sciences such as gradient boosting and variational inference. In this post, I will provide an explanation of the functional derivative and show how it relates to the gradient of an ordinary multivariate function.

cell type

Three strategies for cataloging cell types

7 minute read

Published:

In my previous post, I outlined a conceptual framework for defining and reasoning about “cell types”. Specifically, I noted that the idea of a “cell type” can be viewed as a human-made partition on the universal cellular state space. In this post, I attempt to distill three strategies for partitioning this state space and agreeing on cell type definitions.

On cell types and cell states

10 minute read

Published:

The advent of single-cell genomics has brought about new efforts to characterize and catalog all of the cell types in the human body. Despite these efforts, the very definition of a “cell type” is under debate. In this post, I will discuss a conceptual framework for defining cell types as subsets of states in an underlying cellular state space. Moreover, I will link the cellular state space to biomedical ontologies that attempt to capture biological knowledge regarding cell types.

clustering

Gaussian mixture models

17 minute read

Published:

Gaussian mixture models are a very popular method for data clustering. Here I will define the Gaussian mixture model and also derive the EM algorithm for performing maximum likelihood estimation of its paramters.

covariance

Visualizing covariance

1 minute read

Published:

Covariance quantifies to what extent two random variables are linearly correlated. In this post, I will outline a visualization of covariance that helped me better intuit this concept.

deep learning

Variational autoencoders

32 minute read

Published:

Variational autoencoders (VAEs) are a family of deep generative models with use cases that span many applications, from image processing to bioinformatics. There are two complimentary ways of viewing the VAE: as a probabilistic model that is fit using variational Bayesian inference, or as a type of autoencoding neural network. In this post, we present the mathematical theory behind VAEs, which is rooted in Bayesian inference, and how this theory leads to an emergent autoencoding algorithm. We also discuss the similarities and differences between VAEs and standard autoencoders. Lastly, we present an implementation of a VAE in PyTorch and apply it to the task of modeling the MNIST dataset of hand-written digits.

education

True understanding is “seeing” in 3D

3 minute read

Published:

In this post, I will discuss an analogy that I find useful for thinking about what it means to “understand” something: True understanding of a concept is akin to “seeing” the concept in its native three-dimensional space, whereas partial understanding is merely seeing a two-dimensional projection of that inherently three-dimensional concept.

entropy

Shannon’s Source Coding Theorem (Foundations of information theory: Part 3)

13 minute read

Published:

The mathematical field of information theory attempts to mathematically describe the concept of “information”. In the first two posts, we discussed the concepts of self-information and information entropy. In this post, we step through Shannon’s Source Coding Theorem to see how the information entropy of a probability distribution describes the best-achievable efficiency required to communicate samples from the distribution.

Information entropy (Foundations of information theory: Part 2)

8 minute read

Published:

The mathematical field of information theory attempts to mathematically describe the concept of “information”. In this series of posts, I will attempt to describe my understanding of how, both philosophically and mathematically, information theory defines the polymorphic, and often amorphous, concept of information. In the first post, we discussed the concept of self-information. In this second post, we will build on this foundation to discuss the concept of information entropy.

evidence lower bound

The evidence lower bound (ELBO)

3 minute read

Published:

The evidence lower bound is an important quantity at the core of a number of important algorithms used in statistical inference including expectation-maximization and variational inference. In this post, I describe its context, definition, and derivation.

expectation

Demystifying measure-theoretic probability theory (part 3: expectation)

10 minute read

Published:

In this series of posts, I present my understanding of some basic concepts in measure theory — the mathematical study of objects with “size”— that have enabled me to gain a deeper understanding into the foundations of probability theory.

functions

Matrices as functions

3 minute read

Published:

At the core of linear algebra is the idea that matrices represent functions. In this post, we’ll look at a few common, elementary functions and discuss their corresponding matrices.

gene expression

Three strategies for cataloging cell types

7 minute read

Published:

In my previous post, I outlined a conceptual framework for defining and reasoning about “cell types”. Specifically, I noted that the idea of a “cell type” can be viewed as a human-made partition on the universal cellular state space. In this post, I attempt to distill three strategies for partitioning this state space and agreeing on cell type definitions.

On cell types and cell states

10 minute read

Published:

The advent of single-cell genomics has brought about new efforts to characterize and catalog all of the cell types in the human body. Despite these efforts, the very definition of a “cell type” is under debate. In this post, I will discuss a conceptual framework for defining cell types as subsets of states in an underlying cellular state space. Moreover, I will link the cellular state space to biomedical ontologies that attempt to capture biological knowledge regarding cell types.

RNA-seq: the basics

19 minute read

Published:

RNA sequencing (RNA-seq) has become a ubiquitous tool in biomedical research for measuring gene expression in a population of cells, or a single cell, across the genome. Despite its ubiquity, RNA-seq is relatively complex and there exists a large research effort towards developing statistical and computational methods for analyzing the raw data that it produces. In this post, I will provide a high level overview of RNA-seq and describe how to interpret some of the common units in which gene expression is measured from an RNA-seq experiment.

information theory

Perplexity: a more intuitive measure of uncertainty than entropy

2 minute read

Published:

Like entropy, perplexity is an information theoretic quantity that describes the uncertainty of a random variable. In fact, perplexity is simply a monotonic function of entropy and thus, in some sense, they can be used interchangeabley. So why do we need it? In this post, I’ll discuss why perplexity is a more intuitive measure of uncertainty than entropy.

Shannon’s Source Coding Theorem (Foundations of information theory: Part 3)

13 minute read

Published:

The mathematical field of information theory attempts to mathematically describe the concept of “information”. In the first two posts, we discussed the concepts of self-information and information entropy. In this post, we step through Shannon’s Source Coding Theorem to see how the information entropy of a probability distribution describes the best-achievable efficiency required to communicate samples from the distribution.

Information entropy (Foundations of information theory: Part 2)

8 minute read

Published:

The mathematical field of information theory attempts to mathematically describe the concept of “information”. In this series of posts, I will attempt to describe my understanding of how, both philosophically and mathematically, information theory defines the polymorphic, and often amorphous, concept of information. In the first post, we discussed the concept of self-information. In this second post, we will build on this foundation to discuss the concept of information entropy.

What is information? (Foundations of information theory: Part 1)

4 minute read

Published:

The mathematical field of information theory attempts to mathematically describe the concept of “information”. In this series of posts, I will attempt to describe my understanding of how, both philosophically and mathematically, information theory defines the polymorphic, and often amorphous, concept of information. In this first post, I will describe Shannon’s self-information.

insight

True understanding is “seeing” in 3D

3 minute read

Published:

In this post, I will discuss an analogy that I find useful for thinking about what it means to “understand” something: True understanding of a concept is akin to “seeing” the concept in its native three-dimensional space, whereas partial understanding is merely seeing a two-dimensional projection of that inherently three-dimensional concept.

intrinsic dimensionality

Intrinsic dimensionality

6 minute read

Published:

In my formal education, I found that the concept of “intrinsic dimensionality” was never explicitly taught; however, it undergirds so many concepts in linear algebra and the data sciences such as the rank of a matrix and feature selection. In this post I will discuss the difference between the extrinsic dimensionality of a space versus its intrinsic dimensionality.

knowledge representation

Three strategies for cataloging cell types

7 minute read

Published:

In my previous post, I outlined a conceptual framework for defining and reasoning about “cell types”. Specifically, I noted that the idea of a “cell type” can be viewed as a human-made partition on the universal cellular state space. In this post, I attempt to distill three strategies for partitioning this state space and agreeing on cell type definitions.

On cell types and cell states

10 minute read

Published:

The advent of single-cell genomics has brought about new efforts to characterize and catalog all of the cell types in the human body. Despite these efforts, the very definition of a “cell type” is under debate. In this post, I will discuss a conceptual framework for defining cell types as subsets of states in an underlying cellular state space. Moreover, I will link the cellular state space to biomedical ontologies that attempt to capture biological knowledge regarding cell types.

linear algebra

Row reduction with elementary matrices

10 minute read

Published:

In this post we discuss the row reduction algorithm for solving a system of linear equations that have exactly one solution. We will then show how the row reduction algorithm can be represented as a process involving a sequence of matrix multiplications involving a special class of matrices called elementary matrices. That is, each elementary matrix represents a single elementary row operation in the row reduction algorithm.

Reasoning about systems of linear equations using linear algebra

5 minute read

Published:

In this blog post, we will discuss the relationship between matrices and systems of linear equations. Specifically, we will show how systems of linear equations can be represented as a single matrix equation. Solutions to the system of linear equations can be reasoned about by examining the characteristics of the matrices and vectors in that matrix equation.

Span and linear independence

4 minute read

Published:

A very important concept linear algebra is that of linear independence. In this blog post we present the definition for the span of a set of vectors. Then, we use this definition to discuss the definition for linear independence. Finally, we discuss some intuition into this fundamental idea.

Normed vector spaces

8 minute read

Published:

When first introduced to Euclidean vectors, one is taught that the length of the vector’s arrow is called the norm of the vector. In this post, we present the more rigorous and abstract definition of a norm and show how it generalizes the notion of “length” to non-Euclidean vector spaces. We also discuss how the norm induces a metric function on pairs of vectors so that one can discuss distances between vectors.

Vector spaces

11 minute read

Published:

The concept of a vector space is a foundational concept in mathematics, physics, and the data sciences. In this post, we first present and explain the definition of a vector space and then go on to describe properties of vector spaces. Lastly, we present a few examples of vector spaces that go beyond the usual Euclidean vectors that are often taught in introductory math and science courses.

Invertible matrices

13 minute read

Published:

In this post, we discuss invertible matrices: those matrices that characterize invertible linear transformations. We discuss three different perspectives for intuiting inverse matrices as well as several of their properties.

Matrix multiplication

11 minute read

Published:

At first glance, the definition for the product of two matrices can be unintuitive. In this post, we discuss three perspectives for viewing matrix multiplication. It is the third perspective that gives this “unintuitive” definition its power: that matrix multiplication represents the composition of linear transformations.

Matrices characterize linear transformations

5 minute read

Published:

Linear transformations are functions mapping vectors between two vector spaces that preserve vector addition and scalar multiplication. In this post, we show that there exists a one-to-one corresondence between linear transformations between coordinate vector spaces and matrices. Thus, we can view a matrix as representing a unique linear transformation between coordinate vector spaces.

Matrices as functions

3 minute read

Published:

At the core of linear algebra is the idea that matrices represent functions. In this post, we’ll look at a few common, elementary functions and discuss their corresponding matrices.

Matrix-vector multiplication

5 minute read

Published:

Matrix-vector multiplication is an operation between a matrix and a vector that produces a new vector. In this post, I’ll define matrix vector multiplication as well as three angles from which to view this concept. The third angle entails viewing matrices as functions between vector spaces

Introducing matrices

7 minute read

Published:

Here, I will introduce the three main ways of thinking about matrices. This high-level description of the multi-faceted way of thinking about matrices would have helped me better intuit matrices when I was first introduced to them in my undergraduate linear algebra course.

linear transformation

Matrices characterize linear transformations

5 minute read

Published:

Linear transformations are functions mapping vectors between two vector spaces that preserve vector addition and scalar multiplication. In this post, we show that there exists a one-to-one corresondence between linear transformations between coordinate vector spaces and matrices. Thus, we can view a matrix as representing a unique linear transformation between coordinate vector spaces.

Matrix-vector multiplication

5 minute read

Published:

Matrix-vector multiplication is an operation between a matrix and a vector that produces a new vector. In this post, I’ll define matrix vector multiplication as well as three angles from which to view this concept. The third angle entails viewing matrices as functions between vector spaces

machine learning

Variational autoencoders

32 minute read

Published:

Variational autoencoders (VAEs) are a family of deep generative models with use cases that span many applications, from image processing to bioinformatics. There are two complimentary ways of viewing the VAE: as a probabilistic model that is fit using variational Bayesian inference, or as a type of autoencoding neural network. In this post, we present the mathematical theory behind VAEs, which is rooted in Bayesian inference, and how this theory leads to an emergent autoencoding algorithm. We also discuss the similarities and differences between VAEs and standard autoencoders. Lastly, we present an implementation of a VAE in PyTorch and apply it to the task of modeling the MNIST dataset of hand-written digits.

Blackbox variational inference via the reparameterization gradient

21 minute read

Published:

Variational inference (VI) is a mathematical framework for doing Bayesian inference by approximating the posterior distribution over the latent variables in a latent variable model when the true posterior is intractable. In this post, we will discuss a flexible variational inference algorithm, called blackbox VI via the reparameterization gradient, that works “out of the box” for a wide variety of models with minimal need for the tedious mathematical derivations that deriving VI algorithms usually require. We will then use this method to do Bayesian linear regression.

Variational inference

5 minute read

Published:

In this post, I will present a high-level explanation of variational inference: a paradigm for estimating a posterior distribution when computing it explicitly is intractable. Variational inference finds an approximate posterior by solving a specific optimization problem that seeks to minimize the disparity between the true posterior and the approximate posterior.

Gaussian mixture models

17 minute read

Published:

Gaussian mixture models are a very popular method for data clustering. Here I will define the Gaussian mixture model and also derive the EM algorithm for performing maximum likelihood estimation of its paramters.

The evidence lower bound (ELBO)

3 minute read

Published:

The evidence lower bound is an important quantity at the core of a number of important algorithms used in statistical inference including expectation-maximization and variational inference. In this post, I describe its context, definition, and derivation.

Expectation-maximization: theory and intuition

13 minute read

Published:

Expectation-maximization (EM) is a popular algorithm for performing maximum-likelihood estimation of the parameters in a latent variable model. In this post, I discuss the theory behind, and intuition into this algorithm.

mathematics

Row reduction with elementary matrices

10 minute read

Published:

In this post we discuss the row reduction algorithm for solving a system of linear equations that have exactly one solution. We will then show how the row reduction algorithm can be represented as a process involving a sequence of matrix multiplications involving a special class of matrices called elementary matrices. That is, each elementary matrix represents a single elementary row operation in the row reduction algorithm.

Reasoning about systems of linear equations using linear algebra

5 minute read

Published:

In this blog post, we will discuss the relationship between matrices and systems of linear equations. Specifically, we will show how systems of linear equations can be represented as a single matrix equation. Solutions to the system of linear equations can be reasoned about by examining the characteristics of the matrices and vectors in that matrix equation.

Span and linear independence

4 minute read

Published:

A very important concept linear algebra is that of linear independence. In this blog post we present the definition for the span of a set of vectors. Then, we use this definition to discuss the definition for linear independence. Finally, we discuss some intuition into this fundamental idea.

Functionals and functional derivatives

13 minute read

Published:

The calculus of variations is a field of mathematics that deals with the optimization of functions of functions, called functionals. This topic was not taught to me in my computer science education, but it lies at the foundation of a number of important concepts and algorithms in the data sciences such as gradient boosting and variational inference. In this post, I will provide an explanation of the functional derivative and show how it relates to the gradient of an ordinary multivariate function.

Normed vector spaces

8 minute read

Published:

When first introduced to Euclidean vectors, one is taught that the length of the vector’s arrow is called the norm of the vector. In this post, we present the more rigorous and abstract definition of a norm and show how it generalizes the notion of “length” to non-Euclidean vector spaces. We also discuss how the norm induces a metric function on pairs of vectors so that one can discuss distances between vectors.

The overloaded equals sign

5 minute read

Published:

Two of the most important relationships in mathematics, namely equality and definition, are both denoted using the same symbol – namely, the equals sign. The overloading of this symbol confuses students in mathematics and computer programming. In this post, I argue for the use of two different symbols for these two fundamentally different operators.

Vector spaces

11 minute read

Published:

The concept of a vector space is a foundational concept in mathematics, physics, and the data sciences. In this post, we first present and explain the definition of a vector space and then go on to describe properties of vector spaces. Lastly, we present a few examples of vector spaces that go beyond the usual Euclidean vectors that are often taught in introductory math and science courses.

Invertible matrices

13 minute read

Published:

In this post, we discuss invertible matrices: those matrices that characterize invertible linear transformations. We discuss three different perspectives for intuiting inverse matrices as well as several of their properties.

Intrinsic dimensionality

6 minute read

Published:

In my formal education, I found that the concept of “intrinsic dimensionality” was never explicitly taught; however, it undergirds so many concepts in linear algebra and the data sciences such as the rank of a matrix and feature selection. In this post I will discuss the difference between the extrinsic dimensionality of a space versus its intrinsic dimensionality.

Matrix multiplication

11 minute read

Published:

At first glance, the definition for the product of two matrices can be unintuitive. In this post, we discuss three perspectives for viewing matrix multiplication. It is the third perspective that gives this “unintuitive” definition its power: that matrix multiplication represents the composition of linear transformations.

Matrices characterize linear transformations

5 minute read

Published:

Linear transformations are functions mapping vectors between two vector spaces that preserve vector addition and scalar multiplication. In this post, we show that there exists a one-to-one corresondence between linear transformations between coordinate vector spaces and matrices. Thus, we can view a matrix as representing a unique linear transformation between coordinate vector spaces.

Matrices as functions

3 minute read

Published:

At the core of linear algebra is the idea that matrices represent functions. In this post, we’ll look at a few common, elementary functions and discuss their corresponding matrices.

Matrix-vector multiplication

5 minute read

Published:

Matrix-vector multiplication is an operation between a matrix and a vector that produces a new vector. In this post, I’ll define matrix vector multiplication as well as three angles from which to view this concept. The third angle entails viewing matrices as functions between vector spaces

Introducing matrices

7 minute read

Published:

Here, I will introduce the three main ways of thinking about matrices. This high-level description of the multi-faceted way of thinking about matrices would have helped me better intuit matrices when I was first introduced to them in my undergraduate linear algebra course.

The graph Laplacian

12 minute read

Published:

At the heart of of a number of important machine learning algorithms, such as spectral clustering, lies a matrix called the graph Laplacian. In this post, I’ll walk through the intuition behind the graph Laplacian and describe how it represents the discrete analogue to the Laplacian operator on continuous multivariate functions.

Demystifying measure-theoretic probability theory (part 3: expectation)

10 minute read

Published:

In this series of posts, I present my understanding of some basic concepts in measure theory — the mathematical study of objects with “size”— that have enabled me to gain a deeper understanding into the foundations of probability theory.

matrices

Invertible matrices

13 minute read

Published:

In this post, we discuss invertible matrices: those matrices that characterize invertible linear transformations. We discuss three different perspectives for intuiting inverse matrices as well as several of their properties.

Matrix multiplication

11 minute read

Published:

At first glance, the definition for the product of two matrices can be unintuitive. In this post, we discuss three perspectives for viewing matrix multiplication. It is the third perspective that gives this “unintuitive” definition its power: that matrix multiplication represents the composition of linear transformations.

Matrices as functions

3 minute read

Published:

At the core of linear algebra is the idea that matrices represent functions. In this post, we’ll look at a few common, elementary functions and discuss their corresponding matrices.

Matrix-vector multiplication

5 minute read

Published:

Matrix-vector multiplication is an operation between a matrix and a vector that produces a new vector. In this post, I’ll define matrix vector multiplication as well as three angles from which to view this concept. The third angle entails viewing matrices as functions between vector spaces

Introducing matrices

7 minute read

Published:

Here, I will introduce the three main ways of thinking about matrices. This high-level description of the multi-faceted way of thinking about matrices would have helped me better intuit matrices when I was first introduced to them in my undergraduate linear algebra course.

measure theory

Demystifying measure-theoretic probability theory (part 3: expectation)

10 minute read

Published:

In this series of posts, I present my understanding of some basic concepts in measure theory — the mathematical study of objects with “size”— that have enabled me to gain a deeper understanding into the foundations of probability theory.

measureable function

ontologies

Three strategies for cataloging cell types

7 minute read

Published:

In my previous post, I outlined a conceptual framework for defining and reasoning about “cell types”. Specifically, I noted that the idea of a “cell type” can be viewed as a human-made partition on the universal cellular state space. In this post, I attempt to distill three strategies for partitioning this state space and agreeing on cell type definitions.

On cell types and cell states

10 minute read

Published:

The advent of single-cell genomics has brought about new efforts to characterize and catalog all of the cell types in the human body. Despite these efforts, the very definition of a “cell type” is under debate. In this post, I will discuss a conceptual framework for defining cell types as subsets of states in an underlying cellular state space. Moreover, I will link the cellular state space to biomedical ontologies that attempt to capture biological knowledge regarding cell types.

pedagogy

The overloaded equals sign

5 minute read

Published:

Two of the most important relationships in mathematics, namely equality and definition, are both denoted using the same symbol – namely, the equals sign. The overloading of this symbol confuses students in mathematics and computer programming. In this post, I argue for the use of two different symbols for these two fundamentally different operators.

probabilistic modeling

Blackbox variational inference via the reparameterization gradient

21 minute read

Published:

Variational inference (VI) is a mathematical framework for doing Bayesian inference by approximating the posterior distribution over the latent variables in a latent variable model when the true posterior is intractable. In this post, we will discuss a flexible variational inference algorithm, called blackbox VI via the reparameterization gradient, that works “out of the box” for a wide variety of models with minimal need for the tedious mathematical derivations that deriving VI algorithms usually require. We will then use this method to do Bayesian linear regression.

probabilistic models

Variational autoencoders

32 minute read

Published:

Variational autoencoders (VAEs) are a family of deep generative models with use cases that span many applications, from image processing to bioinformatics. There are two complimentary ways of viewing the VAE: as a probabilistic model that is fit using variational Bayesian inference, or as a type of autoencoding neural network. In this post, we present the mathematical theory behind VAEs, which is rooted in Bayesian inference, and how this theory leads to an emergent autoencoding algorithm. We also discuss the similarities and differences between VAEs and standard autoencoders. Lastly, we present an implementation of a VAE in PyTorch and apply it to the task of modeling the MNIST dataset of hand-written digits.

probability

Perplexity: a more intuitive measure of uncertainty than entropy

2 minute read

Published:

Like entropy, perplexity is an information theoretic quantity that describes the uncertainty of a random variable. In fact, perplexity is simply a monotonic function of entropy and thus, in some sense, they can be used interchangeabley. So why do we need it? In this post, I’ll discuss why perplexity is a more intuitive measure of uncertainty than entropy.

Variational inference

5 minute read

Published:

In this post, I will present a high-level explanation of variational inference: a paradigm for estimating a posterior distribution when computing it explicitly is intractable. Variational inference finds an approximate posterior by solving a specific optimization problem that seeks to minimize the disparity between the true posterior and the approximate posterior.

Gaussian mixture models

17 minute read

Published:

Gaussian mixture models are a very popular method for data clustering. Here I will define the Gaussian mixture model and also derive the EM algorithm for performing maximum likelihood estimation of its paramters.

The evidence lower bound (ELBO)

3 minute read

Published:

The evidence lower bound is an important quantity at the core of a number of important algorithms used in statistical inference including expectation-maximization and variational inference. In this post, I describe its context, definition, and derivation.

Visualizing covariance

1 minute read

Published:

Covariance quantifies to what extent two random variables are linearly correlated. In this post, I will outline a visualization of covariance that helped me better intuit this concept.

Expectation-maximization: theory and intuition

13 minute read

Published:

Expectation-maximization (EM) is a popular algorithm for performing maximum-likelihood estimation of the parameters in a latent variable model. In this post, I discuss the theory behind, and intuition into this algorithm.

Demystifying measure-theoretic probability theory (part 3: expectation)

10 minute read

Published:

In this series of posts, I present my understanding of some basic concepts in measure theory — the mathematical study of objects with “size”— that have enabled me to gain a deeper understanding into the foundations of probability theory.

random variable

self-information

What is information? (Foundations of information theory: Part 1)

4 minute read

Published:

The mathematical field of information theory attempts to mathematically describe the concept of “information”. In this series of posts, I will attempt to describe my understanding of how, both philosophically and mathematically, information theory defines the polymorphic, and often amorphous, concept of information. In this first post, I will describe Shannon’s self-information.

single-cell

Three strategies for cataloging cell types

7 minute read

Published:

In my previous post, I outlined a conceptual framework for defining and reasoning about “cell types”. Specifically, I noted that the idea of a “cell type” can be viewed as a human-made partition on the universal cellular state space. In this post, I attempt to distill three strategies for partitioning this state space and agreeing on cell type definitions.

On cell types and cell states

10 minute read

Published:

The advent of single-cell genomics has brought about new efforts to characterize and catalog all of the cell types in the human body. Despite these efforts, the very definition of a “cell type” is under debate. In this post, I will discuss a conceptual framework for defining cell types as subsets of states in an underlying cellular state space. Moreover, I will link the cellular state space to biomedical ontologies that attempt to capture biological knowledge regarding cell types.

spectral graph theory

The graph Laplacian

12 minute read

Published:

At the heart of of a number of important machine learning algorithms, such as spectral clustering, lies a matrix called the graph Laplacian. In this post, I’ll walk through the intuition behind the graph Laplacian and describe how it represents the discrete analogue to the Laplacian operator on continuous multivariate functions.

statistics

Perplexity: a more intuitive measure of uncertainty than entropy

2 minute read

Published:

Like entropy, perplexity is an information theoretic quantity that describes the uncertainty of a random variable. In fact, perplexity is simply a monotonic function of entropy and thus, in some sense, they can be used interchangeabley. So why do we need it? In this post, I’ll discuss why perplexity is a more intuitive measure of uncertainty than entropy.

Variational inference

5 minute read

Published:

In this post, I will present a high-level explanation of variational inference: a paradigm for estimating a posterior distribution when computing it explicitly is intractable. Variational inference finds an approximate posterior by solving a specific optimization problem that seeks to minimize the disparity between the true posterior and the approximate posterior.

Gaussian mixture models

17 minute read

Published:

Gaussian mixture models are a very popular method for data clustering. Here I will define the Gaussian mixture model and also derive the EM algorithm for performing maximum likelihood estimation of its paramters.

The evidence lower bound (ELBO)

3 minute read

Published:

The evidence lower bound is an important quantity at the core of a number of important algorithms used in statistical inference including expectation-maximization and variational inference. In this post, I describe its context, definition, and derivation.

Visualizing covariance

1 minute read

Published:

Covariance quantifies to what extent two random variables are linearly correlated. In this post, I will outline a visualization of covariance that helped me better intuit this concept.

Expectation-maximization: theory and intuition

13 minute read

Published:

Expectation-maximization (EM) is a popular algorithm for performing maximum-likelihood estimation of the parameters in a latent variable model. In this post, I discuss the theory behind, and intuition into this algorithm.

Demystifying measure-theoretic probability theory (part 3: expectation)

10 minute read

Published:

In this series of posts, I present my understanding of some basic concepts in measure theory — the mathematical study of objects with “size”— that have enabled me to gain a deeper understanding into the foundations of probability theory.

tutorial

Variational autoencoders

32 minute read

Published:

Variational autoencoders (VAEs) are a family of deep generative models with use cases that span many applications, from image processing to bioinformatics. There are two complimentary ways of viewing the VAE: as a probabilistic model that is fit using variational Bayesian inference, or as a type of autoencoding neural network. In this post, we present the mathematical theory behind VAEs, which is rooted in Bayesian inference, and how this theory leads to an emergent autoencoding algorithm. We also discuss the similarities and differences between VAEs and standard autoencoders. Lastly, we present an implementation of a VAE in PyTorch and apply it to the task of modeling the MNIST dataset of hand-written digits.

Blackbox variational inference via the reparameterization gradient

21 minute read

Published:

Variational inference (VI) is a mathematical framework for doing Bayesian inference by approximating the posterior distribution over the latent variables in a latent variable model when the true posterior is intractable. In this post, we will discuss a flexible variational inference algorithm, called blackbox VI via the reparameterization gradient, that works “out of the box” for a wide variety of models with minimal need for the tedious mathematical derivations that deriving VI algorithms usually require. We will then use this method to do Bayesian linear regression.

Row reduction with elementary matrices

10 minute read

Published:

In this post we discuss the row reduction algorithm for solving a system of linear equations that have exactly one solution. We will then show how the row reduction algorithm can be represented as a process involving a sequence of matrix multiplications involving a special class of matrices called elementary matrices. That is, each elementary matrix represents a single elementary row operation in the row reduction algorithm.

Reasoning about systems of linear equations using linear algebra

5 minute read

Published:

In this blog post, we will discuss the relationship between matrices and systems of linear equations. Specifically, we will show how systems of linear equations can be represented as a single matrix equation. Solutions to the system of linear equations can be reasoned about by examining the characteristics of the matrices and vectors in that matrix equation.

Span and linear independence

4 minute read

Published:

A very important concept linear algebra is that of linear independence. In this blog post we present the definition for the span of a set of vectors. Then, we use this definition to discuss the definition for linear independence. Finally, we discuss some intuition into this fundamental idea.

Functionals and functional derivatives

13 minute read

Published:

The calculus of variations is a field of mathematics that deals with the optimization of functions of functions, called functionals. This topic was not taught to me in my computer science education, but it lies at the foundation of a number of important concepts and algorithms in the data sciences such as gradient boosting and variational inference. In this post, I will provide an explanation of the functional derivative and show how it relates to the gradient of an ordinary multivariate function.

Normed vector spaces

8 minute read

Published:

When first introduced to Euclidean vectors, one is taught that the length of the vector’s arrow is called the norm of the vector. In this post, we present the more rigorous and abstract definition of a norm and show how it generalizes the notion of “length” to non-Euclidean vector spaces. We also discuss how the norm induces a metric function on pairs of vectors so that one can discuss distances between vectors.

The overloaded equals sign

5 minute read

Published:

Two of the most important relationships in mathematics, namely equality and definition, are both denoted using the same symbol – namely, the equals sign. The overloading of this symbol confuses students in mathematics and computer programming. In this post, I argue for the use of two different symbols for these two fundamentally different operators.

Vector spaces

11 minute read

Published:

The concept of a vector space is a foundational concept in mathematics, physics, and the data sciences. In this post, we first present and explain the definition of a vector space and then go on to describe properties of vector spaces. Lastly, we present a few examples of vector spaces that go beyond the usual Euclidean vectors that are often taught in introductory math and science courses.

Invertible matrices

13 minute read

Published:

In this post, we discuss invertible matrices: those matrices that characterize invertible linear transformations. We discuss three different perspectives for intuiting inverse matrices as well as several of their properties.

Perplexity: a more intuitive measure of uncertainty than entropy

2 minute read

Published:

Like entropy, perplexity is an information theoretic quantity that describes the uncertainty of a random variable. In fact, perplexity is simply a monotonic function of entropy and thus, in some sense, they can be used interchangeabley. So why do we need it? In this post, I’ll discuss why perplexity is a more intuitive measure of uncertainty than entropy.

Variational inference

5 minute read

Published:

In this post, I will present a high-level explanation of variational inference: a paradigm for estimating a posterior distribution when computing it explicitly is intractable. Variational inference finds an approximate posterior by solving a specific optimization problem that seeks to minimize the disparity between the true posterior and the approximate posterior.

RNA-seq: the basics

19 minute read

Published:

RNA sequencing (RNA-seq) has become a ubiquitous tool in biomedical research for measuring gene expression in a population of cells, or a single cell, across the genome. Despite its ubiquity, RNA-seq is relatively complex and there exists a large research effort towards developing statistical and computational methods for analyzing the raw data that it produces. In this post, I will provide a high level overview of RNA-seq and describe how to interpret some of the common units in which gene expression is measured from an RNA-seq experiment.

Intrinsic dimensionality

6 minute read

Published:

In my formal education, I found that the concept of “intrinsic dimensionality” was never explicitly taught; however, it undergirds so many concepts in linear algebra and the data sciences such as the rank of a matrix and feature selection. In this post I will discuss the difference between the extrinsic dimensionality of a space versus its intrinsic dimensionality.

Matrix multiplication

11 minute read

Published:

At first glance, the definition for the product of two matrices can be unintuitive. In this post, we discuss three perspectives for viewing matrix multiplication. It is the third perspective that gives this “unintuitive” definition its power: that matrix multiplication represents the composition of linear transformations.

Matrices characterize linear transformations

5 minute read

Published:

Linear transformations are functions mapping vectors between two vector spaces that preserve vector addition and scalar multiplication. In this post, we show that there exists a one-to-one corresondence between linear transformations between coordinate vector spaces and matrices. Thus, we can view a matrix as representing a unique linear transformation between coordinate vector spaces.

Matrices as functions

3 minute read

Published:

At the core of linear algebra is the idea that matrices represent functions. In this post, we’ll look at a few common, elementary functions and discuss their corresponding matrices.

Matrix-vector multiplication

5 minute read

Published:

Matrix-vector multiplication is an operation between a matrix and a vector that produces a new vector. In this post, I’ll define matrix vector multiplication as well as three angles from which to view this concept. The third angle entails viewing matrices as functions between vector spaces

Introducing matrices

7 minute read

Published:

Here, I will introduce the three main ways of thinking about matrices. This high-level description of the multi-faceted way of thinking about matrices would have helped me better intuit matrices when I was first introduced to them in my undergraduate linear algebra course.

Gaussian mixture models

17 minute read

Published:

Gaussian mixture models are a very popular method for data clustering. Here I will define the Gaussian mixture model and also derive the EM algorithm for performing maximum likelihood estimation of its paramters.

The graph Laplacian

12 minute read

Published:

At the heart of of a number of important machine learning algorithms, such as spectral clustering, lies a matrix called the graph Laplacian. In this post, I’ll walk through the intuition behind the graph Laplacian and describe how it represents the discrete analogue to the Laplacian operator on continuous multivariate functions.

Shannon’s Source Coding Theorem (Foundations of information theory: Part 3)

13 minute read

Published:

The mathematical field of information theory attempts to mathematically describe the concept of “information”. In the first two posts, we discussed the concepts of self-information and information entropy. In this post, we step through Shannon’s Source Coding Theorem to see how the information entropy of a probability distribution describes the best-achievable efficiency required to communicate samples from the distribution.

Information entropy (Foundations of information theory: Part 2)

8 minute read

Published:

The mathematical field of information theory attempts to mathematically describe the concept of “information”. In this series of posts, I will attempt to describe my understanding of how, both philosophically and mathematically, information theory defines the polymorphic, and often amorphous, concept of information. In the first post, we discussed the concept of self-information. In this second post, we will build on this foundation to discuss the concept of information entropy.

What is information? (Foundations of information theory: Part 1)

4 minute read

Published:

The mathematical field of information theory attempts to mathematically describe the concept of “information”. In this series of posts, I will attempt to describe my understanding of how, both philosophically and mathematically, information theory defines the polymorphic, and often amorphous, concept of information. In this first post, I will describe Shannon’s self-information.

The evidence lower bound (ELBO)

3 minute read

Published:

The evidence lower bound is an important quantity at the core of a number of important algorithms used in statistical inference including expectation-maximization and variational inference. In this post, I describe its context, definition, and derivation.

Visualizing covariance

1 minute read

Published:

Covariance quantifies to what extent two random variables are linearly correlated. In this post, I will outline a visualization of covariance that helped me better intuit this concept.

Expectation-maximization: theory and intuition

13 minute read

Published:

Expectation-maximization (EM) is a popular algorithm for performing maximum-likelihood estimation of the parameters in a latent variable model. In this post, I discuss the theory behind, and intuition into this algorithm.

Demystifying measure-theoretic probability theory (part 3: expectation)

10 minute read

Published:

In this series of posts, I present my understanding of some basic concepts in measure theory — the mathematical study of objects with “size”— that have enabled me to gain a deeper understanding into the foundations of probability theory.

variational inference

Blackbox variational inference via the reparameterization gradient

21 minute read

Published:

Variational inference (VI) is a mathematical framework for doing Bayesian inference by approximating the posterior distribution over the latent variables in a latent variable model when the true posterior is intractable. In this post, we will discuss a flexible variational inference algorithm, called blackbox VI via the reparameterization gradient, that works “out of the box” for a wide variety of models with minimal need for the tedious mathematical derivations that deriving VI algorithms usually require. We will then use this method to do Bayesian linear regression.