Olewave @UCm99ZwZ1bODHskkHwMOue0w@youtube.com

1.3K subscribers

Olewave.com: Bespoke Data Labeling and Customized NLP/CV/Spe


42:53
Google Researcher's In-Depth Analysis on End-to-End Speech Recognition, Part 1: Overview & Modeling
22:56
Deduce OpenAI GPT-4o's Neural Network Architecture
56:48
Google's Universal Speech Model for 100+ languages beats OpenAI's Whisper Model
48:47
Long Review: Apple's MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
10:11
A Quick Review of Apple's SOTA Multimodal LLM: MM1
52:05
The Secret That Made Claude 3 Trump GPT-4
54:20
Generative AI Models Related to Sora: Normalizing Flows
34:09
Variational Autoencoder (VAE) and Reparameterization Trick - Revisiting the Classic Generative Model
19:06
A Review of Microsoft+OpenAI, Google, Meta, and Nvidia's Open Source Large Speech Models for ASR
01:16:59
[Detailed Paper Reading] Zipformer: A faster and better encoder for automatic speech recognition
17:07
Tycho: a toolkit for building high-ROI in-house speech-related services (ASR/TTS/Translation): Overview
20:42
From OpenAI's Whisper Model to Your Own In-House ASR Service: Postprocessing and Language Modeling
19:40
From OpenAI's Whisper Model to Your Own In-House ASR Service: Long Audio and Streaming (Part 3)
19:41
From OpenAI's Whisper Model to Your Own In-House ASR Service: ROI (Return-on-Investment) (Part 2)
25:11
From OpenAI's Whisper Model to Your Own In-House ASR Service: Overview (Part 1)
39:12
Why word timestamps generated by OpenAI Whisper are not accurate? How to make them accurate again?
42:02
Speech Generative AI: VoiceBox by Meta AI (also Flow Matching and Neural ODE)
12:58
I-JEPA: Yann LeCun's First 'World Model' for Computer Vision
32:14
LoRA: allow a high school student to train Large Language Model (GPT-3) with a gaming graphics card
33:41
A Review of SpeechT5: Introducing Google's T5 into Speech (ASR, TTS, SID, ...) Tasks
35:44
A Review of Deepmind's WaveNet for TTS/Audio Synthesis (Does it look like GPT to you?)
35:04
Review of HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
38:06
In-depth Review of Google's SoundStream: An End-to-End Neural Audio Codec
30:12
Disclosing OpenAI GPT-4's vision+text model, data, and cost to train (speculated)
34:42
A Review of GPT-4's Technical Report (GPT-4 in a Nutshell)
01:11:12
[Olewave's Review] AudioLM: a Language Modeling Approach to Audio Generation
47:23
In-depth review of OpenAI's GPT-3 : Language Models are Few-Shot Learners (Part 3/3: Results&Rest)
15:27
[10 mins] Explain Why OpenAI's Whisper API Isn't As Good As ChatGPT
44:19
In-depth review of OpenAI's GPT-3 : Language Models are Few-Shot Learners (Part 2/3: Results)
03:31
WSJ Made a Mistake in Translating China's Foreign Minister's Speech; What has the Diplomat Said?
54:43
In-depth review of OpenAI's GPT-3 : Language Models are Few-Shot Learners (Part 1/3: Intro&Approach)
06:07
Understand Microsoft's VALL-E in 3 Minutes (SOTA Zero-shot TTS)
57:46
In-depth Review of VALL-E: Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
01:05:10
ChatGPT/ChatGPT Plus/InstructGPT:Training language models to follow instructions with human feedback
06:39
Explain how ChatGPT/ChatGPT Plus works in 3 minutes
01:33:00
[Olewave's Review] CLIP (3/3): Learning Transferable Visual Models From Natural Language Supervision
01:38:05
[Olewave's Review] CLIP (2/3): Learning Transferable Visual Models From Natural Language Supervision
55:00
[Olewave's Review] CLIP (1/3): Learning Transferable Visual Models From Natural Language Supervision
55:06
[Olewave's Review] Token-level Sequence Labeling for SLU using Compositional E2E Models
30:36
[Olewave's Review] Branchformer: Parallel MLP-Attention Architectures, and E-Branchformer
17:33
Non Collision Mispronunciation Addition (NCMA) for Accented ASR
44:26
[Olewave's Review] OpenAI's Whisper ASR: Robust Speech Recognition via Large-Scale Weak Supervision
22:01
One-Edit-Distance FSA/Network-based (OEDN) in Mispronunciation Detection and Accented ASR
01:40:04
Olewave's most detailed illustration of RNN-T: Sequence Transduction with Recurrent Neural Networks
32:33
Google fired Blake Lemoine for saying AI bot is sentient? Does LaMDA or ChatGPT think like human?
38:31
[Olewave's Long Review] Efficient Training of Neural Transducer for Speech Recognition
09:26
Boris Johnson’s Rise and Fall - an analysis of the mics
51:57
[Olewave's Long Review] Xception: Deep Learning with Depthwise Separable Convolutions
02:23
[Olewave's Short Review] Xception: Deep Learning with Depthwise Separable Convolutions
12:17
Dr. Fauci was caught on hot mic - 'what a moron' -- analysis from a research perspective
30:56
Phased Array Radar on China's Aircraft Carrier Fujian 003 and Its Connection with Speech Beamforming
25:34
How Does the All-New Dictation in iOS 16 Work? Reveal Apple's Secret Sauce by a Speech Researcher!
42:22
[Long Review] Conformer: Convolution-augmented Transformer for Speech Recognition
03:20
[Short Review] Conformer: Convolution-augmented Transformer for Speech Recognition
28:30
A Ph.D. uses a computer to "cheat" on the final problem of the 2022 college entrance math exam
57:31
[Long Review] Cascaded Diffusion Models for High Fidelity Image Generation
04:30
[Short Review] Cascaded Diffusion Models for High Fidelity Image Generation
35:51
[Long Review] Axial Attention in Multidimensional Transformers
02:42
[Short Review] Axial Attention in Multidimensional Transformers
01:03:49
[Long Review] Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis