Replacing mean by median over batch-size to lessen the impact of outliers
Suppose we are training a neural network on a regression task, the data contains a significant number of outliers, and the error must be RMS rather than MAE. Could it be better (that is, less sensitive to the outliers) to replace the average over the batch in the weight update with a median over the batch?

For a large enough batch size, this should lessen the contribution of the outliers. Yet it does not seem to be common practice, at least to my knowledge. What are the shortcomings of this approach?

neural-network training outlier
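To make the proposal concrete, here is a minimal NumPy sketch of the contrast between mean and coordinate-wise median aggregation of per-sample gradients. The data, model (plain squared-error linear regression), and all names are illustrative assumptions, not from the question:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data; a few targets are corrupted to act as outliers.
X = rng.normal(size=(64, 3))
w_true = np.array([1.0, -2.0, 0.5])
y_clean = X @ w_true + 0.1 * rng.normal(size=64)
y_dirty = y_clean.copy()
y_dirty[:4] += 50.0  # gross outliers

def per_sample_grads(w, X, y):
    """Per-sample gradients of 0.5 * (x.w - y)^2, shape (batch, n_weights)."""
    resid = X @ w - y
    return resid[:, None] * X

w = np.zeros(3)
G_clean = per_sample_grads(w, X, y_clean)
G_dirty = per_sample_grads(w, X, y_dirty)

# Standard SGD aggregates per-sample gradients with a mean over the batch
# axis; the proposal takes a coordinate-wise median over that axis instead.
mean_shift = np.linalg.norm(G_dirty.mean(axis=0) - G_clean.mean(axis=0))
median_shift = np.linalg.norm(np.median(G_dirty, axis=0)
                              - np.median(G_clean, axis=0))

print(f"mean aggregate moved by {mean_shift:.3f}, median by {median_shift:.3f}")
```

Note that the coordinate-wise median is only one reading of "median over batch size": the median of per-sample gradients is not the gradient of any batch loss, so the resulting update direction need not be a descent direction for the RMS objective.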
$begingroup$
In the case of training a Neural Network on a regression task. Assuming the data has a significant amount of outliers. Provided that the error needs to be RMS and not MAE. Can it be better (as in less sensitive to the outliers) to replace the average over batch size in the weights update by a median over batch size computation?
For a batch size large enough, this should lessen the impact the contribution from the outliers. It does not seem to be common though, at least to current knowledge. What are the shortcomings of this approach?
neural-network training outlier
$endgroup$
To be more specific: do you want to replace the average of the weight gradients with the median of the gradients for each scalar weight?
– Esmailian
Mar 29 at 15:05
@Esmailian Yes, maybe I was not clear enough: at weight-update time the gradients are not averaged over the batch-of-samples dimension; instead the median is taken over that same axis.
– Learning is a mess
Mar 29 at 15:07
This could be a breakthrough :) It makes some intuitive sense: there could be a correspondence between outlier samples and outlier gradients.
– Esmailian
Mar 29 at 15:23
@Esmailian I am not yet convinced that this is a breakthrough, but I am very curious about the cases in which it is more efficient, and how far it can go =)
– Learning is a mess
Apr 1 at 15:25
asked Mar 29 at 14:26 by Learning is a mess