How to Normalise features for small datasets?
I am working with a small dataset (N = 50).
I would like to normalise my input features.
I am facing the following issues:
- Because the dataset is so small, the range of the training input features is likely to differ from the range of the test input features.
- The input features do not have a theoretical upper bound.
Can you suggest normalisation techniques suitable for this task? Any paper suggestions would also be appreciated.
machine-learning preprocessing
asked Feb 16 at 7:35
Pranav Garg
2 Answers
You can fit a MinMaxScaler on your training set, which scales each feature to [0, 1]. The same fitted scaler can then transform the test set; if a test value is greater than the maximum found in the training set, the scaler simply returns a value greater than 1 (and a value below the training minimum maps below 0). Either way, your test set is scaled consistently with your training set.
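To make the behaviour described above concrete, here is a minimal plain-Python sketch of the arithmetic. It is not scikit-learn itself: the helper names `fit_min_max` and `apply_min_max` and the sample values are made up for illustration, and it assumes a single feature and the default [0, 1] target range.

```python
def fit_min_max(train_column):
    """Learn the scaling parameters from the training values only."""
    lo, hi = min(train_column), max(train_column)
    if hi == lo:
        raise ValueError("feature is constant in the training set")
    return lo, hi


def apply_min_max(column, lo, hi):
    """Scale values using the training minimum and maximum."""
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical values for one feature.
train = [2.0, 4.0, 6.0, 10.0]
test = [1.0, 5.0, 14.0]  # 1.0 and 14.0 fall outside the training range

lo, hi = fit_min_max(train)
train_scaled = apply_min_max(train, lo, hi)
test_scaled = apply_min_max(test, lo, hi)

print(train_scaled)  # [0.0, 0.25, 0.5, 1.0]
print(test_scaled)   # [-0.125, 0.375, 1.5]
```

The test values outside the training range land outside [0, 1], but they are still on the same scale as the training data, which is what matters for the model.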
answered Mar 18 at 8:56
pcko1
You should normalise your dataset after the split. You could try standard scaling, since it does not depend on the minimum and maximum values.
Having two different ranges in the training and test sets is not a good sign. In this case you could try some resampling, such as SMOTE.
Let me know.
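A rough sketch of "split first, then standard-scale" could look like the following. It uses only the Python standard library; the data are randomly generated stand-ins for one unbounded feature, the 40/10 split is an arbitrary choice, and the population standard deviation is used as an assumption about which convention you want.

```python
import random
import statistics

# Hypothetical single feature with N = 50 values and no upper bound.
random.seed(0)
values = [random.expovariate(0.1) for _ in range(50)]

# Split first, so the test values play no part in the scaling parameters.
random.shuffle(values)
train, test = values[:40], values[40:]

# Fit on the training part only.
mean = statistics.mean(train)
std = statistics.pstdev(train)  # population standard deviation


def standardise(column):
    """Centre and scale values with the training mean and deviation."""
    return [(x - mean) / std for x in column]


train_scaled = standardise(train)
test_scaled = standardise(test)

print(statistics.mean(train_scaled), statistics.pstdev(train_scaled))
print(statistics.mean(test_scaled), statistics.pstdev(test_scaled))
```

The scaled training values have mean 0 and standard deviation 1 up to rounding; the test values generally will not, and with so few samples the gap can be noticeable.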
edited 2 days ago
answered Feb 16 at 7:55
3nomis
Normalizing before splitting is bad practice, because you incorporate knowledge from the test set into your preprocessing.
– pcko1
Mar 18 at 8:53