Olewave | Poke

45:46

Meta's Movie Gen vs. OpenAI's Sora: a Detailed Review

42:53

Google Researcher's In-Depth Analysis on End-to-End Speech Recognition, Part 1: Overview & Modeling

22:56

Deduce OpenAI GPT-4o's Neural Network Architecture

56:48

Google's Universal Speech Model for 100+ languages beats OpenAI's Whisper Model

48:47

Long Review: Apple's MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

10:11

A Quick Review of Apple's SOTA Multimodal LLM: MM1

52:05

The Secret That Made Claude 3 Trump GPT-4

54:20

Generative AI Models Related to Sora: Normalizing Flows

34:09

Variational Autoencoder (VAE) and Reparameterization Trick - Revisiting the Classic Generative Model

19:06

A Review of Microsoft+OpenAI, Google, Meta, and Nvidia's Open Source Large Speech Models for ASR

01:16:59

[Detailed Paper Reading] Zipformer: A faster and better encoder for automatic speech recognition

17:07

Tycho: a toolkit for building high-ROI in-house speech-related services (ASR/TTS/Translation): Overview

20:42

From OpenAI's Whisper Model to Your Own In-House ASR Service: Postprocessing and Language Modeling

19:40

From OpenAI's Whisper Model to Your Own In-House ASR Service: Long Audio and Streaming (Part 3)

19:41

From OpenAI's Whisper Model to Your Own In-House ASR Service: ROI (Return-on-Investment) (Part 2)

25:11

From OpenAI's Whisper Model to Your Own In-House ASR Service: Overview (Part 1)

39:12

Why are word timestamps generated by OpenAI Whisper not accurate? How to make them accurate again?

42:02

Speech Generative AI: VoiceBox by Meta AI (also Flow Matching and Neural ODE)

12:58

I-JEPA: Yann LeCun's First 'World Model' for Computer Vision

32:14

LoRA: allow a high school student to train Large Language Model (GPT-3) with a gaming graphics card

33:41

A Review of SpeechT5: Introducing Google's T5 into Speech (ASR, TTS, SID, ...) Tasks

35:44

A Review of DeepMind's WaveNet for TTS/Audio Synthesis (Does it look like GPT to you?)

35:04

Review of HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

38:06

In-depth Review of Google's SoundStream: An End-to-End Neural Audio Codec

30:12

Disclosing OpenAI GPT-4's vision+text model, data, and cost to train (speculated)

34:42

A Review of GPT-4's Technical Report (GPT-4 in a Nutshell)

01:11:12

[Olewave's Review] AudioLM: a Language Modeling Approach to Audio Generation

47:23

In-depth review of OpenAI's GPT-3: Language Models are Few-Shot Learners (Part 3/3: Results & Rest)

15:27

[10 mins] Explain Why OpenAI's Whisper API Isn't As Good As ChatGPT

44:19

In-depth review of OpenAI's GPT-3: Language Models are Few-Shot Learners (Part 2/3: Results)

03:31

WSJ Made a Mistake in Translating China's Foreign Minister's Speech; What has the Diplomat Said?

54:43

In-depth review of OpenAI's GPT-3: Language Models are Few-Shot Learners (Part 1/3: Intro & Approach)

06:07

Understand Microsoft's VALL-E in 3 Minutes (SOTA Zero-shot TTS)

57:46

In-depth Review of VALL-E: Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers

01:05:10

ChatGPT/ChatGPT Plus/InstructGPT: Training language models to follow instructions with human feedback

06:39

Explain how ChatGPT/ChatGPT Plus works in 3 minutes

01:33:00

[Olewave's Review] CLIP (3/3): Learning Transferable Visual Models From Natural Language Supervision

01:38:05

[Olewave's Review] CLIP (2/3): Learning Transferable Visual Models From Natural Language Supervision

55:00

[Olewave's Review] CLIP (1/3): Learning Transferable Visual Models From Natural Language Supervision

55:06

[Olewave's Review] Token-level Sequence Labeling for SLU using Compositional E2E Models

30:36

[Olewave's Review] Branchformer: Parallel MLP-Attention Architectures, and E-Branchformer

17:33

Non Collision Mispronunciation Addition (NCMA) for Accented ASR

44:26

[Olewave's Review] OpenAI's Whisper ASR: Robust Speech Recognition via Large-Scale Weak Supervision

22:01

One-Edit-Distance FSA/Network-based (OEDN) in Mispronunciation Detection and Accented ASR

01:40:04

Olewave's most detailed illustration of RNN-T: Sequence Transduction with Recurrent Neural Networks

32:33

Google fired Blake Lemoine for saying an AI bot is sentient? Does LaMDA or ChatGPT think like a human?

38:31

[Olewave's Long Review] Efficient Training of Neural Transducer for Speech Recognition

09:26

Boris Johnson’s Rise and Fall - an analysis of the mics

51:57

[Olewave's Long Review] Xception: Deep Learning with Depthwise Separable Convolutions

02:23

[Olewave's Short Review] Xception: Deep Learning with Depthwise Separable Convolutions

12:17

Dr. Fauci was caught on a hot mic - 'what a moron' -- analysis from a research perspective

30:56

Phased Array Radar on China's Aircraft Carrier Fujian 003 and Its Connection with Speech Beamforming

25:34

How Does the All-New Dictation in iOS 16 Work? Reveal Apple's Secret Sauce by a Speech Researcher!

42:22

[Long Review] Conformer: Convolution-augmented Transformer for Speech Recognition

03:20

[Short Review] Conformer: Convolution-augmented Transformer for Speech Recognition

28:30

A Ph.D. uses a computer to "cheat" and crush the final problem of the 2022 Gaokao (college entrance exam) math paper

57:31

[Long Review] Cascaded Diffusion Models for High Fidelity Image Generation

04:30

[Short Review] Cascaded Diffusion Models for High Fidelity Image Generation

35:51

[Long Review] Axial Attention in Multidimensional Transformers

02:42

[Short Review] Axial Attention in Multidimensional Transformers