Replacing mean by median over batch-size to lessen the impact of outliers





Consider training a neural network on a regression task where the data contains a significant number of outliers, and where the error metric must be RMS rather than MAE. Could it be better (in the sense of being less sensitive to the outliers) to replace the average over the batch in the weight update with a median over the batch?



For a large enough batch size, this should lessen the contribution of the outliers. It does not seem to be common practice, though, at least as far as I know. What are the shortcomings of this approach?
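A minimal NumPy sketch of the proposed update, assuming a hypothetical one-feature linear model trained by mini-batch SGD on synthetic data with 10% large outliers. The data, learning rate, and helper names (`per_sample_grads`, `aggregate`, `train`) are illustrative assumptions, not taken from the question. It computes per-sample squared-error gradients (for the batch-mean case these point in the same direction as the RMS gradient) and reduces them over the batch axis either with the mean or with a coordinate-wise median, i.e. an independent median for each weight.

```python
import numpy as np

def per_sample_grads(w, b, X, y):
    """Per-sample gradients of 0.5 * (w.x + b - y)^2 for a linear model.

    Returns an array of shape (batch, n_features + 1); the last column is
    the gradient with respect to the bias b.
    """
    residual = X @ w + b - y            # shape (batch,)
    grad_w = residual[:, None] * X      # shape (batch, n_features)
    grad_b = residual[:, None]          # shape (batch, 1)
    return np.hstack([grad_w, grad_b])

def aggregate(grads, how="mean"):
    # Reduce over the batch (sample) axis, independently for each weight.
    if how == "mean":
        return grads.mean(axis=0)
    if how == "median":
        return np.median(grads, axis=0)
    raise ValueError(f"unknown aggregation: {how}")

def train(X, y, how, lr=0.1, epochs=200, batch_size=32, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        idx = rng.permutation(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            g = aggregate(per_sample_grads(w, b, X[batch], y[batch]), how)
            w -= lr * g[:-1]
            b -= lr * g[-1]
    return w, b

# Hypothetical synthetic data: y = 2x + 1 plus small noise,
# with 50 of the 500 targets shifted up by 20 (outliers).
rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(500, 1))
y = 2 * X[:, 0] + 1 + 0.05 * rng.normal(size=500)
outliers = rng.choice(500, size=50, replace=False)
y[outliers] += 20.0

results = {how: train(X, y, how) for how in ("mean", "median")}
for how, (w, b) in results.items():
    print(how, "slope:", w.round(2), "intercept:", round(float(b), 2))
```

On this toy data, the mean-aggregated intercept is pulled toward the outliers (toward the least-squares value of roughly 1 + 0.1 × 20 = 3), while the median-aggregated intercept should stay close to the true value of 1. This only illustrates the mechanics of the update; it does not show how the approach behaves on deep networks.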





















  • To be more specific: do we want to replace the average of the weight gradients with the median of the gradients, separately for each one-dimensional weight?
    – Esmailian
    Mar 29 at 15:05











  • @Esmailian Yes, maybe I was not clear enough: at weight-update time, the gradients are not averaged over the batch (sample) dimension; instead, the median is taken over that same axis.
    – Learning is a mess
    Mar 29 at 15:07










  • This could be a breakthrough :) It remotely makes sense. There could be a correspondence between outlier samples and outlier gradients.
    – Esmailian
    Mar 29 at 15:23










  • @Esmailian I am yet to be convinced that this holds a breakthrough, but I am very curious about the cases in which it is more efficient, and how far it can go =)
    – Learning is a mess
    Apr 1 at 15:25






















neural-network training outlier
















asked Mar 29 at 14:26









Learning is a mess











