Normalizing the data set


I have two questions:

  1. Why doesn't normalization have any effect on the performance of a linear regressor (a mathematical explanation is appreciated)?

  2. When we normalize the training set, we ought to normalize the target set too. Won't that affect the performance? I mean, won't the data set change completely, because the model sees different ranges of features compared to the ranges in the target set?

I tried googling these questions but was not able to come to a conclusion. Any help would be appreciated.

Thanks!
linear-regression normalization

asked 2 days ago by Apoorv Jain (edited yesterday by I_Play_With_Data)
  • Can you share the code in which you implement this linear regressor? – Alireza Zolanvari, yesterday

  • What do you mean by "performance"? Computational performance, score performance, residuals? – gented, yesterday

  • I mean score performance. – Apoorv Jain, yesterday

  • A linear regressor will be affected by the scaling for sure, so make sure that you did it correctly. Otherwise, since it assigns weights to the columns, it will just pick the ones that help it reach the target. – Aditya, yesterday
1 Answer


  1. Why doesn't normalization have any effect on linear regressor performance (mathematical approach is appreciated)?



Theoretically, normalization does not influence the performance of the model. In order to understand this, let us have a look at a standard linear regression:
$$y_i = \boldsymbol{w}^T\boldsymbol{x}_i + b + \varepsilon_i.$$



If we introduce the scaled independent variable $\boldsymbol{z}_i=\dfrac{1}{\sigma}\left[\boldsymbol{x}_i - \bar{\boldsymbol{x}}\right] \implies \boldsymbol{x}_i=\bar{\boldsymbol{x}}+\sigma\boldsymbol{z}_i$, this results in
$$y_i = \boldsymbol{w}^T\bar{\boldsymbol{x}}+\sigma\boldsymbol{w}^T\boldsymbol{z}_i+b+\varepsilon_i.$$



If we introduce $\tilde{b}=b+\boldsymbol{w}^T\bar{\boldsymbol{x}}$ and $\tilde{\boldsymbol{w}}^T=\sigma\boldsymbol{w}^T$, we can rewrite the equation as



$$y_i=\tilde{\boldsymbol{w}}^T\boldsymbol{z}_i+\tilde{b}+\varepsilon_i.$$



Hence, we see that the transformed independent variables just change the bias (if a translation is included), and the weights are rescaled by $\sigma$. The significance of the parameters does not change; only their numerical values do.



But if normalization does not really enhance the model, why do we still do it? The reason is more computational. If we had very large input values, the weights would need to be very small so that the output of the regression stays in a reasonable range. A large range of numerical values forces us to reserve more memory for the variables that we use. Hence, it is better to normalize our inputs, so that the parameters don't need to scale the inputs down to fit the output.
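
As a quick sanity check, here is a minimal Python sketch (synthetic data and scikit-learn are illustrative choices, not part of the derivation above) showing that ordinary least squares fitted on raw and on standardized inputs reaches the same fit quality, even though each feature here is standardized with its own mean and standard deviation:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.normal(loc=50.0, scale=10.0, size=(200, 3))              # raw features on an arbitrary scale
    y = X @ np.array([1.5, -2.0, 0.5]) + 3.0 + rng.normal(size=200)

    Z = StandardScaler().fit_transform(X)                            # standardized features

    raw = LinearRegression().fit(X, y)
    std = LinearRegression().fit(Z, y)

    # The weights differ (they absorb the scaling), but the scores agree up to floating point.
    print(raw.score(X, y), std.score(Z, y))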




  2. When we normalize the training set we ought to normalize the target set too. Won't it affect the performance? I mean, won't the data set change completely, because the model had different ranges of features as compared to the ranges of features in the target set?



Normalizing the output is not necessary, but it can also improve numerical efficiency. You can apply the same linear transformation to your dependent variable (the output) and you will see that the result can again be rewritten as a standard linear regression in the new output. Just remember to transform your inputs and to retransform your outputs if you want to work with the original variables.
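
For example, a minimal sketch (hypothetical variable names, synthetic data) of standardizing the target before fitting and retransforming the predictions afterwards; the retransformed predictions coincide with those of a fit on the original target:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))
    y = X @ np.array([2.0, -1.0, 0.5]) + 10.0 + rng.normal(size=200)

    y_mean, y_std = y.mean(), y.std()
    model = LinearRegression().fit(X, (y - y_mean) / y_std)   # fit on the standardized target

    y_pred = model.predict(X) * y_std + y_mean                # retransform to the original units
    print(np.allclose(y_pred, LinearRegression().fit(X, y).predict(X)))  # True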



Does the significance of the parameters change?



In order to show that scaling the inputs by a constant factor $\sigma$ does not influence the significance of the parameters, we will calculate the $t$-value for a given coefficient. If the $t$-value stays invariant, the $p$-value will also stay invariant.



For the linear regression $y_i = \boldsymbol{w}^T\boldsymbol{x}_i+\varepsilon_i$ (bias absorbed into the weight vector), the regression coefficients are given by



$$\hat{\boldsymbol{w}}=\left[\boldsymbol{X}^T\boldsymbol{X}\right]^{-1}\boldsymbol{X}^T\boldsymbol{y}, \quad \text{in which} \quad
\boldsymbol{X}=\begin{bmatrix}\boldsymbol{x}_1^T \\ \vdots \\ \boldsymbol{x}_N^T\end{bmatrix}$$



is the data matrix with an added $1$-column for the bias.



Additionally, we need the matrix
$$\boldsymbol{C}=\left[\boldsymbol{X}^T\boldsymbol{X}\right]^{-1} \quad \text{and} \quad s_e = \sqrt{\dfrac{(\boldsymbol{y}-\hat{\boldsymbol{y}})^T(\boldsymbol{y}-\hat{\boldsymbol{y}})}{N-p-1}},$$



in which $N$ is the number of observations, $p$ is the number of predictors, $\boldsymbol{y}$ is the vector of outputs and $\hat{\boldsymbol{y}}$ is the vector of predicted outputs. We saw that the predicted values stay invariant under scaling; hence $s_e$ is invariant under scaling.



The $t$-value for a regression weight is given by



$$t_i= \dfrac{\hat{w}_i-\mathbb{E}\left[\hat{w}_i\right]}{s_e\sqrt{c_{ii}}}.$$



The $c_{ii}$-values are the corresponding diagonal values of the matrix $\boldsymbol{C}$.
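
To make the notation concrete, here is a small NumPy sketch (illustrative only) that computes $\hat{\boldsymbol{w}}$, $s_e$ and the $t$-values directly from the formulas above, taking the hypothesized value $\mathbb{E}[\hat{w}_i]=0$ as in the usual significance test:

    import numpy as np

    rng = np.random.default_rng(2)
    N, p = 200, 3
    X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])  # data matrix with the added 1-column
    y = X @ np.array([1.0, 2.0, 0.0, -0.5]) + rng.normal(size=N)

    C = np.linalg.inv(X.T @ X)
    w_hat = C @ X.T @ y                                         # [X^T X]^{-1} X^T y
    resid = y - X @ w_hat
    s_e = np.sqrt(resid @ resid / (N - p - 1))

    t = w_hat / (s_e * np.sqrt(np.diag(C)))                     # t-value per coefficient, with E[w_i] = 0
    print(t)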



If we scale our inputs by $\sigma$ (we ignore the $1$-column of the data set, which is only important for the bias), then we observe



$$\boldsymbol{X}'=\sigma\boldsymbol{X}, \quad \hat{\boldsymbol{w}}' = \dfrac{1}{\sigma}\hat{\boldsymbol{w}}, \quad \mathbb{E}\left[\hat{\boldsymbol{w}}'\right]=\dfrac{1}{\sigma}\mathbb{E}\left[\hat{\boldsymbol{w}}\right] \quad \text{and} \quad \boldsymbol{C}' = \dfrac{1}{\sigma^2}\boldsymbol{C}.$$

The last relation implies
$$c_{ii}' = \dfrac{1}{\sigma^2}c_{ii} \implies \sqrt{c_{ii}'} = \dfrac{1}{\sigma}\sqrt{c_{ii}}.$$



By these observations, we see that the $t$-value stays invariant.



$$t_i' = \dfrac{\hat{w}_i'-\mathbb{E}\left[\hat{w}_i'\right]}{s_e\sqrt{c_{ii}'}}=\dfrac{\hat{w}_i-\mathbb{E}\left[\hat{w}_i\right]}{s_e\sqrt{c_{ii}}}=t_i.$$



Hence, the significance of the regression coefficients does not change either. In theory, the same should hold when scaling the variables individually, but the algebra gets more involved.
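
A numerical check of this invariance (statsmodels is used here only for convenience; any OLS implementation that reports $t$-values would do): scaling all inputs by a common factor $\sigma$ shrinks the slope coefficients by $1/\sigma$ but leaves the $t$-values, and hence the $p$-values, unchanged.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    X = rng.normal(size=(200, 3))
    y = X @ np.array([1.0, 0.0, -0.5]) + rng.normal(size=200)

    sigma = 7.3
    fit_raw = sm.OLS(y, sm.add_constant(X)).fit()
    fit_scaled = sm.OLS(y, sm.add_constant(sigma * X)).fit()

    # Slope coefficients differ by a factor of 1/sigma, but the t-values match.
    print(np.allclose(fit_raw.tvalues[1:], fit_scaled.tvalues[1:]))  # True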






answered yesterday (edited 17 hours ago) by MachineLearner, a new contributor to this site

  • This is slightly incorrect. The $\sigma$ aren't necessarily the same for all components, and in that case I'm not sure that eventually you can factor them out. – gented, yesterday

  • Yes, you can, but the coefficients you end up with are different from the original ones (not just a re-scaling if the $\sigma$ are different). Essentially you just showed that a composition of two affine maps is an affine map, which is obvious - but that has nothing to do with invariance of the performance. – gented, 19 hours ago

  • The intercept changes and the coefficients are re-scaled individually: this means that the significance tests may result in different values (they may or they may not). Basically my point is that an answer to the question must prove that such tests don't change - which you haven't. Same goes for the residuals (they may or may not change, but it must be proven). A sketch of the answer is provided here: stats.stackexchange.com/questions/162399/… – gented, 19 hours ago

  • @gented I added the claimed result for uniform scaling. Nonuniform scaling should work as well, but the algebra is more involved. – MachineLearner, 17 hours ago

  • Thank you, it's now a thorough answer, +1 :) – gented, 17 hours ago