Views: 17,614
Genre: Science & Technology
Date of upload: May 1, 2024 ^^
Rating: 4.925 (10/526 LTDR)
RYD date created: 2024-05-18T02:30:51.752578Z
Top Comments of this video!! :3
26:30 that 'really?' and the following struggle with basic math is WAAAAY too relatable
15 |
The main loss function (7) looks like it can be meaningfully simplified with school-level math.
L_OR = -log(sigm(log(odds(y_w|x) / odds(y_l|x)))), where sigm(a) = 1/(1 + exp(-a)) = exp(a) / (1 + exp(a))
Both odds(y_w|x) and odds(y_l|x) are positive, because the softmax probabilities lie strictly between 0 and 1.
Plugging in the second form of the sigmoid, we get
L_OR = -log( exp(log(odds(y_w|x) / odds(y_l|x))) / (1 + exp(log(odds(y_w|x) / odds(y_l|x)))) )
Note that exp(log(odds(y_w|x) / odds(y_l|x))) = odds(y_w|x) / odds(y_l|x). We use this to simplify:
L_OR = -log( [odds(y_w|x) / odds(y_l|x)] / (1 + odds(y_w|x) / odds(y_l|x)) )
Finally, multiply both the numerator and the denominator by odds(y_l|x) to get
L_OR = -log( odds(y_w|x) / (odds(y_w|x) + odds(y_l|x)) )
Intuitively, this is the negative log of (odds of the good response) / (odds of the good response + odds of the bad response), i.e., the probability of picking the winner when the two odds are normalized over the pair.
Minimizing the average loss over many texts is then the same as maximizing the probability that the model prefers the winning response in every pair (of winning + losing responses).
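A quick numerical sanity check of the simplification (the odds values below are made up purely for illustration):

import math

def sigm(a):
    return 1.0 / (1.0 + math.exp(-a))

# Hypothetical odds for the winning and losing responses.
odds_w, odds_l = 3.2, 0.7

# Original form: L_OR = -log(sigm(log(odds_w / odds_l)))
original = -math.log(sigm(math.log(odds_w / odds_l)))

# Simplified form: L_OR = -log(odds_w / (odds_w + odds_l))
simplified = -math.log(odds_w / (odds_w + odds_l))

assert math.isclose(original, simplified)  # both come out to about 0.198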
12 |
I really like the more technical content from you. I usually read tech news on Telegram, and your NL New posts are great but quite plain and simple. Paper explanations like this are a real contribution to the DS community: such videos spark new ideas and deepen understanding of the field for those who try to dive in deep. Of course it's less popular, since the material is complex for the audience, but it's much more interesting. So thank you for this format.
1 |
27:57
“the corresponding side”
Maybe they mistakenly switched the w and l terms in the denominators?
1 |
"Specifically, 1 - p(y|x) in the denominators amplifies the gradients when the corresponding side of the likelihood p(y|x) is low". I think that (1 - p(y|x)) have two different meanings here: it can be the result of differentiation by coincidence and also the "corresponding side" of the likelihood, i.e., 1 - p(y|x). So, when it says the "corresponding side" of p(y|x) is low, it means that 1 - p(y|x) is low.
1 |
@r9999t
2 weeks ago
Glad you're back to technical content this time. Any AI YouTuber can give us the latest AI news, but you're just about the only one who can give technical insight into the stories.
23 |