Replacing mean by median over batch-size to lessen the impact of outliers


Suppose a neural network is trained on a regression task, the data contains a significant number of outliers, and the error must be RMS rather than MAE. Could it be better (that is, less sensitive to the outliers) to replace the average over the batch in the weight update with a median over the batch?
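In symbols, the two updates can be sketched as follows. The notation is assumed here rather than taken from the question: $\theta_j$ is one scalar weight, $\eta$ the learning rate, $B$ the batch size and $\ell_i$ the squared-error loss of sample $i$. The median is taken separately for each weight $j$.

$$\theta_j \leftarrow \theta_j - \eta \, \frac{1}{B} \sum_{i=1}^{B} \frac{\partial \ell_i}{\partial \theta_j} \qquad \text{(usual mean over the batch)}$$

$$\theta_j \leftarrow \theta_j - \eta \, \operatorname{median}_{1 \le i \le B} \frac{\partial \ell_i}{\partial \theta_j} \qquad \text{(proposed median over the batch)}$$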

For a large enough batch size, this should lessen the contribution of the outliers. It does not seem to be common practice, though, at least to my knowledge. What are the shortcomings of this approach?
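A minimal sketch of the effect in the simplest possible setting, assuming a model with a single parameter $b$ (a constant prediction) trained with squared error. The function name `train_constant` and the toy data are hypothetical, chosen only for illustration. Each step aggregates the per-sample gradients $2(b - y_i)$ with either the mean (standard) or the median (proposed).

```python
import statistics
from typing import Callable, Sequence


def train_constant(ys: Sequence[float],
                   reducer: Callable[[Sequence[float]], float],
                   lr: float = 0.1,
                   steps: int = 200) -> float:
    """Fit a constant prediction b to ys with squared error.

    `reducer` aggregates the per-sample gradients over the batch:
    statistics.mean gives ordinary gradient descent, statistics.median
    gives the median-over-batch variant.
    """
    b = 0.0
    for _ in range(steps):
        # d/db (b - y)**2 = 2 * (b - y), one value per sample
        grads = [2.0 * (b - y) for y in ys]
        b -= lr * reducer(grads)
    return b


# Hypothetical toy batch: six values near 1.0 plus one large outlier.
ys = [1.0, 1.2, 0.9, 1.1, 0.8, 1.0, 50.0]
b_mean = train_constant(ys, statistics.mean)
b_median = train_constant(ys, statistics.median)
print(f"mean-aggregated:   b = {b_mean:.4f}")
print(f"median-aggregated: b = {b_median:.4f}")
```

With these numbers the mean-aggregated run settles at the sample mean of `ys` (8.0), dragged up by the single outlier, while the median-aggregated run settles at the sample median (1.0). In this toy case the median fixed point is the minimizer of absolute error rather than squared error, so the median update no longer targets the RMS-optimal prediction.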

asked Mar 29 at 14:26

Learning is a mess


To be more specific, do we want to replace the average of the weight gradients with the median of the gradients, for each one-dimensional weight?
– Esmailian
Mar 29 at 15:05

@Esmailian Yes, maybe I was not clear enough: at weight-update time, the gradients are not averaged over the batch (sample) dimension; instead, the median is taken over that same axis.
– Learning is a mess
Mar 29 at 15:07

This could be a breakthrough :) It makes sense, at least remotely. There could be a correspondence between outlier samples and outlier gradients.
– Esmailian
Mar 29 at 15:23

@Esmailian I have yet to be convinced that this holds a breakthrough, but I am very curious about the cases in which it is more efficient, and how far it can go =)
– Learning is a mess
Apr 1 at 15:25
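The per-weight version described in the comments (median taken over the batch axis, separately for each scalar weight) can be sketched for a linear model $\hat y = w x + b$ with squared error. This is an illustration under assumptions, not code from the thread: the helper names `per_sample_grads` and `aggregate` and the toy data with one outlier are hypothetical. It evaluates both aggregates at the parameters that fit the clean points exactly.

```python
import statistics
from typing import Callable, List, Sequence, Tuple

Grad = Tuple[float, float]  # (d loss / d w, d loss / d b) for one sample


def per_sample_grads(w: float, b: float,
                     xs: Sequence[float], ys: Sequence[float]) -> List[Grad]:
    """Gradient of (w*x + b - y)**2 for each sample, kept un-aggregated."""
    grads = []
    for x, y in zip(xs, ys):
        r = w * x + b - y  # residual of this sample
        grads.append((2.0 * r * x, 2.0 * r))
    return grads


def aggregate(grads: Sequence[Grad],
              reducer: Callable[[Sequence[float]], float]) -> Grad:
    """Reduce over the batch axis independently for each weight."""
    return reducer([g[0] for g in grads]), reducer([g[1] for g in grads])


# Hypothetical batch: 20 points on y = 3x + 1, one of them shifted by +60.
xs = [i / 10 for i in range(20)]
ys = [3 * x + 1 for x in xs]
ys[18] += 60.0

grads = per_sample_grads(3.0, 1.0, xs, ys)  # evaluated at the clean-data fit
print("mean over batch:  ", aggregate(grads, statistics.mean))
print("median over batch:", aggregate(grads, statistics.median))
```

Here the 19 clean samples each contribute a zero gradient, so the coordinate-wise median is exactly zero and the outlier is ignored, whereas the mean is pulled to roughly (-10.8, -6.0) by the single outlier. For an even batch size, `statistics.median` averages the two middle values.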

neural-network training outlier
