
Joseph Van Name @UCIwKbqWH48vHf-QvRqmyx0Q@youtube.com


Ph.D. in Mathematics. So far, I am posting animations of the …


02:22 - Singular values of Jacobians of a deep neural network at select points.
02:15 - A 1302 parameter neural network interpolates 100x2 real numbers without gradient descent.
05:28 - Spectrum of the Jacobian of a vector field when traveling in the vector field direction.
00:44 - The spectrum of the Hessian during gradient descent training with a different loss function.
00:59 - The spectrum of the Hessian during gradient descent training.
00:46 - Using a CLSRDR to find a submatrix of a non-square matrix with maximum sum
04:52 - I fed a deep neural network complex inputs during training and it suffered from exploding gradients.
06:07 - The evolutionary algorithm maximizes the sum of a 32 by 32 submatrix of a 64 by 64 matrix 10 times.
01:29 - Using a CLSRDR to find a submatrix of a non-square matrix with maximum sum
04:27 - An evolutionary algorithm maximizes the sum of a 32 by 32 submatrix of a 64 by 64 matrix 10 times.
01:24 - Two 40 layer neural networks solving the same problem with zero initialization converge similarly.
01:26 - The mean and variance of inputs through the layers of a deep narrow neural network
01:13 - The similarity between weight matrices of the layers in a deep neural network during training
01:34 - A 660 parameter neural network interpolates 250 x 2 training parameters and fails on the test data.
01:00 - The columns of a weight matrix of a shallow network form a sphere after training: Visualization 2
01:00 - The columns of a weight matrix of a shallow network form a sphere after training.
00:59 - The columns of a weight matrix of a shallow network form a circle after training.
01:32 - Sorted final weight vector of a shallow neural network during training
01:37 - The weight matrices of shallow neural networks trained identically except for the learning rates
02:14 - A 1064 parameter neural network with sine activation interpolates 450x2 real numbers.
05:34 - A 1344 parameter 10 layer deep neural network successfully interpolating 450x2 real numbers.
02:23 - A 1302 parameter ReLU neural network successfully interpolating 450x2 real numbers.
01:40 - A 1302 parameter neural network successfully interpolating 450x2 real numbers.
04:20 - Loss visualization of a 341 parameter neural network successfully interpolating 250 real numbers.
02:02 - A ReLU network with simplistic initialization tries and fails to emulate its parent network.
01:15 - A daughter neural network fails to learn the structure of its parent network: Visualization 4
01:14 - A daughter neural network fails to learn the structure of its parent network: Visualization 3
01:06 - A daughter neural network fails to learn the structure of its parent network: Visualization 2
02:24 - A daughter neural network fails to learn the structure of its parent network.
06:57 - Linear feedback shift register: The powers of the rational normal form of primitive polynomials
02:01 - Weight matrices of a neural network trained to find an eigenvector of a linear operator: Round 4
03:01 - Weight matrices of a neural network trained to find an eigenvector of a linear operator: Round 3
00:42 - Weight matrices of a neural network trained to find an eigenvector of a linear operator: Round 2
02:38 - Weight matrices of a neural network trained to find an eigenvector of a linear operator: Round 1
04:38 - A neural network N does not stabilize when mistrained to get N(N(x))=x for all x.
06:45 - Error map as evolutionary algorithm gets binary operation to satisfy the identity (x*y)*x=x*(y*x).
10:00:00 - 10 more hours of spectra from the cryptanalysis of monomial S-boxes and linear layer: no repeats
08:32 - Spectra from the cryptanalysis of AES-like S-boxes
02:29:12 - Optimal play for Bennett's pebble game: The algorithm of the future.
01:57 - Weight matrices of a neural network with too high learning rate as it learns the identity function.
17:28 - A cellular automaton reverses itself and reverts back to its original state.
20:00 - The type of reversible cellular automaton that evolves very slowly
03:49 - The gradient of a neural network during training: adaptive learning rate goes to zero.
02:00 - Weight matrices of a neural network being trained to compute the identity function.
03:20 - Training a neural network while alternating between two data sets
02:50 - A neural network reverts back to its original state as we turn L1 regularization on and off.
19:41 - Swapping the rows of a matrix with random positive entries to maximize the spectral radius
20:15 - Swapping the rows of a matrix with random positive entries to minimize the spectral radius
02:27 - A daughter neural network transitions from memorizing to understanding its parent network.
03:20 - A neural network transitions from memorizing to understanding data.
04:27 - A neural network retains imperfections after training and resetting neurons.
04:27 - A neural network retains imperfections after training and resetting neurons. Unbrightened version.
02:31 - A daughter neural network perfectly learns from a parent neural network after being ablated.
01:57 - I fixed a neural network by ablating then regrowing neurons.
01:10 - Affine neural network trained with data that splits the network into two.
02:25 - A linear neural network splits into two parallel networks after I ablated it.
00:42 - Train with this kind of data and your neural network will split into two networks. A linear network.
03:34:09 - Entries in matrices from the cryptanalysis of monomial S-boxes in Fourier transform basis
12:18 - Sums of matrices from the cryptanalysis of monomial S-boxes in Fourier transform of reordered basis
12:18 - Sums of matrices from the cryptanalysis of monomial S-boxes: Now in reordered basis