
Herman Kamper @UCBu4J-JIs-UORp5pQ6M48nw@youtube.com

5.7K subscribers

More from this channel


59:15 - Can we solve inequality in South Africa? Interview with Dieter von Fintel (TGIF 2024)
14:41 - Reinforcement learning from human feedback (NLP817 12.3)
13:10 - The difference between GPT and ChatGPT (NLP817 12.2)
13:56 - Large language model training and inference (NLP817 12.1)
07:41 - Extensions of RNNs (NLP817 9.7)
10:18 - Solutions to exploding and vanishing gradients (in RNNs) (NLP817 9.6)
12:59 - Vanishing and exploding gradients in RNNs (NLP817 9.5)
24:41 - Backpropagation through time (NLP817 9.4)
03:22 - RNN definition and computational graph (NLP817 9.3)
08:31 - RNN language model loss function (NLP817 9.2)
15:04 - From feedforward to recurrent neural networks (NLP817 9.1)
10:19 - Embedding layers in neural networks
16:24 - Git workflow extras (including merge conflicts)
23:26 - A Git workflow
21:11 - Evaluating word embeddings (NLP817 7.12)
11:58 - GloVe word embeddings (NLP817 7.11)
15:47 - Skip-gram with negative sampling (NLP817 7.10)
05:35 - Continuous bag-of-words (CBOW) (NLP817 7.9)
02:30 - Skip-gram example (NLP817 7.8)
09:47 - Skip-gram as a neural network (NLP817 7.7)
09:49 - Skip-gram optimisation (NLP817 7.6)
07:32 - Skip-gram model structure (NLP817 7.5)
07:59 - Skip-gram loss function (NLP817 7.4)
07:28 - Skip-gram introduction (NLP817 7.3)
06:19 - One-hot word embeddings (NLP817 7.2)
08:57 - Why word embeddings? (NLP817 7.1)
24:00 - What can large spoken language models tell us about speech? (IndabaX South Africa 2023)
04:12 - Hidden Markov models in practice (NLP817 5.13)
08:33 - The log-sum-exp trick (NLP817 5.12)
12:15 - Why expectation maximisation works (NLP817 5.11)
20:18 - Soft expectation maximisation for HMMs (NLP817 5.10)
12:06 - Hard expectation maximisation for HMMs (NLP817 5.9)
08:29 - Learning in HMMs (NLP817 5.8)
19:12 - The forward algorithm for HMMs (NLP817 5.7)
07:11 - Why do we want the marginal probability in an HMM? (NLP817 5.6)
19:16 - Viterbi HMM example (NLP817 5.5)
23:50 - The Viterbi algorithm for HMMs (NLP817 5.4)
02:54 - The three HMM problems (NLP817 5.3)
08:35 - Hidden Markov model definition (NLP817 5.2)
13:52 - A first hidden Markov model example (NLP817 5.1)
14:20 - What are perplexity and entropy? (NLP817 4)
02:07 - Are N-gram language models still used today? (NLP817 3.12)
07:51 - Kneser-Ney smoothing (NLP817 3.11)
04:28 - Language model backoff (NLP817 3.10)
10:36 - Language model interpolation (NLP817 3.9)
04:37 - Absolute discounting in language models (NLP817 3.8)
10:26 - Additive smoothing in language models (NLP817 3.7)
06:00 - Language model smoothing intuition (NLP817 3.6)
08:47 - Evaluating language models using perplexity (NLP817 3.5)
03:46 - Why use log in language models? (NLP817 3.4)
10:47 - Start and end of sentence tokens in language models (NLP817 3.3)
12:57 - N-gram language models (NLP817 3.2)
10:08 - The language modelling problem (NLP817 3.1)
03:35 - Transformer (NLP817 11.10)
07:27 - Cross-attention (NLP817 11.9)
04:43 - Masking the future in self-attention (NLP817 11.8)
05:22 - Multi-head attention (NLP817 11.7)
05:02 - The clock analogy for positional encodings (NLP817 11.6)
19:29 - Positional encodings in transformers (NLP817 11.5)
04:44 - Self-attention in matrix form (NLP817 11.4)