Effects of L2 loss and smooth L1 loss
Can anyone tell me what the effects of $L_2$ loss and smooth $L_1$ loss (i.e. Huber loss with $\alpha = 1$) are, and when to use each of them?
loss-function
asked Apr 3 at 4:29 by HOANG GIANG · edited Apr 3 at 11:33 by bradS
1 Answer
First, Huber loss only works in one dimension, as it requires $$\left\|\boldsymbol{a}\right\|_2=\left\|\boldsymbol{a}\right\|_1=\delta$$ at the intersection of its two pieces, which only holds in one dimension; the $L_2$ and $L_1$ norms are defined for vectors. Therefore, in my opinion, Huber loss is better compared with squared loss than with "$L_2$ loss", since "$L_2$" suggests a multi-dimensional input while "squared" does not.
Huber loss is the same as squared loss for differences smaller than $\delta$, and the same as absolute loss for differences larger than $\delta$, i.e.
$$\begin{align*}
L_\delta(y_n, f_\theta(\boldsymbol{x}_n))
=\left\{
\begin{matrix}
\frac{1}{2}\left(y_n - f_\theta(\boldsymbol{x}_n)\right)^2 & \left|y_n - f_\theta(\boldsymbol{x}_n)\right| \leq \delta,\\
\delta\left|y_n - f_\theta(\boldsymbol{x}_n)\right| - \frac{1}{2}\delta^2 & \text{otherwise,}
\end{matrix}
\right.
\end{align*}$$
where $y_n$ is the target of data point $n$, and $f_\theta(\boldsymbol{x}_n)$ is the model's prediction. Note that $L_\delta$ has nothing to do with the $L_p$ norm, despite the similar notation.
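As a sanity check, this definition is easy to transcribe directly. A minimal NumPy sketch (the function name and vectorized form are my own, not from any particular library):

    import numpy as np

    def huber_loss(y, y_pred, delta=1.0):
        """Elementwise Huber loss L_delta for one-dimensional targets."""
        diff = y - y_pred
        quadratic = 0.5 * diff ** 2                       # |diff| <= delta branch
        linear = delta * np.abs(diff) - 0.5 * delta ** 2  # |diff| >  delta branch
        return np.where(np.abs(diff) <= delta, quadratic, linear)

With delta=1 this is exactly the smooth $L_1$ loss from the question.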
Because of this definition, for large differences caused by outliers, the gradient of the loss function stays constant at $\pm\delta$ times the prediction's gradient, the same as for absolute loss, i.e.
$$\frac{\partial\, \delta\left|y_n - f_\theta(\boldsymbol{x}_n)\right|}{\partial \theta_i} = \pm \delta\, \frac{\partial f_\theta(\boldsymbol{x}_n)}{\partial \theta_i},$$
compared to squared loss, where the gradient grows with the difference, i.e.
$$\frac{\partial\, \frac{1}{2}\left(y_n - f_\theta(\boldsymbol{x}_n)\right)^2}{\partial \theta_i} = -\left(y_n - f_\theta(\boldsymbol{x}_n)\right)\frac{\partial f_\theta(\boldsymbol{x}_n)}{\partial \theta_i},$$
which leads to large contributions from outliers when we update a parameter based solely on squared loss:
$$\begin{align*}
\theta'_i &= \theta_i + \lambda \sum_n \frac{\partial f_\theta(\boldsymbol{x}_n)}{\partial \theta_i}\left(y_n - f_\theta(\boldsymbol{x}_n)\right) \\
&= \theta_i + \lambda\sum_{n \notin \text{outliers}} \frac{\partial f_\theta(\boldsymbol{x}_n)}{\partial \theta_i}(\text{small}) + \lambda\sum_{n \in \text{outliers}} \frac{\partial f_\theta(\boldsymbol{x}_n)}{\partial \theta_i}(\text{large})
\end{align*}$$
It is worth noting that, here, outliers are irregularities in the joint input-output space $(\boldsymbol{x}_n, y_n)$, not necessarily in the input space $\boldsymbol{x}_n$ alone, as we usually visualize them in unsupervised tasks. For example, on the linear trend $y = 2x$, none of $(x, y)=(1, 2)$, $(5, 10)$, $(10, 20)$ is an outlier, but $(1, 10)$ is: it produces the large difference $10 - 2 = 8$ when the model predicts $f_\theta(1)=2$.
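To make the bounded-gradient point concrete, here is a small numeric check on those four points (the model $f_\theta(x) = 2x$ and the gradient helpers are my own illustration, not part of the original argument):

    import numpy as np

    def squared_grad(y, y_pred):
        # derivative of 0.5*(y - f)^2 w.r.t. the prediction f
        return -(y - y_pred)                # grows linearly with the residual

    def huber_grad(y, y_pred, delta=1.0):
        # derivative of Huber loss w.r.t. f: linear inside, clipped at +/- delta outside
        diff = y - y_pred
        return np.where(np.abs(diff) <= delta, -diff, -delta * np.sign(diff))

    y_true = np.array([2.0, 10.0, 20.0, 10.0])  # last point is the outlier (1, 10)
    y_pred = np.array([2.0, 10.0, 20.0, 2.0])   # f(x) = 2x predicts 2 at x = 1
    print(squared_grad(y_true, y_pred))  # [ 0.  0.  0. -8.] -- outlier dominates
    print(huber_grad(y_true, y_pred))    # [ 0.  0.  0. -1.] -- clipped at delta

The outlier's pull on the update is eight times larger under squared loss, while Huber loss caps it at $\delta$.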
When to use each of them?
Bearing in mind that we are only talking about one-dimensional targets, Huber loss is a complete replacement for squared loss when dealing with outliers. However, the challenge is the choice of $\delta$, which makes it a less favorable first choice when we are not yet familiar with the problem at hand. Therefore, we may start with squared loss (or other losses) and later experiment with Huber loss for different values of $\delta$.
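For that experimentation, off-the-shelf implementations are convenient; for example, PyTorch's torch.nn.HuberLoss exposes $\delta$ directly. A minimal sketch (the $\delta$ values scanned here are arbitrary, chosen only for illustration):

    import torch

    y_true = torch.tensor([2.0, 10.0, 20.0, 10.0])
    y_pred = torch.tensor([2.0, 10.0, 20.0, 2.0])

    # Try a few thresholds and compare the resulting mean losses.
    for delta in (0.5, 1.0, 5.0):
        loss = torch.nn.HuberLoss(delta=delta)(y_pred, y_true)
        print(f"delta={delta}: mean Huber loss = {loss.item():.3f}")

Note that torch.nn.SmoothL1Loss is a scaled variant (it divides the Huber loss by its beta parameter), so HuberLoss matches the formula above more directly; scikit-learn's HuberRegressor plays the analogous role for linear models, with the threshold named epsilon.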
answered Apr 3 at 7:38 by Esmailian · edited Apr 3 at 9:03