Is dimension reduction helpful to select features for a classification problem?
Let's say I have a data set, but I don't know which features are relevant for solving a classification/regression problem.
In this case, is it worthwhile to apply a dimensionality reduction algorithm and then run the classification algorithm? Or can I just pick my features "by hand" using common sense and then tune the algorithm afterwards?
Also, if someone could explain dimensionality reduction with a real-life use case, that would be great, because I feel my understanding of dimensionality reduction is off!
machine-learning classification data-mining pca dimensionality-reduction
asked Feb 20 at 23:02 by FK IE
4 Answers
Well, it depends on the distribution of your data. An approach like PCA does not care about the labels at all: it only looks for the directions of maximum variance and takes them as the new basis. Because the labels are ignored, the projected data may end up easier to separate, or harder, and you cannot tell in advance which it will be; you have to apply it and then check whether it helped. Approaches like LDA (and its variants) do take the labels into account, but they are linear, so they are not very powerful in the raw feature space if you have not done any feature engineering.
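As a concrete way to "apply it and then check", here is a minimal sketch that compares the same classifier with and without a PCA step. It assumes scikit-learn; the synthetic dataset, the number of components, and the logistic-regression classifier are illustrative choices, not part of the original answer.

```python
# Minimal sketch: does unsupervised PCA before a classifier help?
# The only way to know is to measure both pipelines.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=50, n_informative=10,
                           n_redundant=25, random_state=0)

# Baseline: classifier on the raw (scaled) features.
baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Same classifier after PCA. PCA never sees y, so there is no guarantee the
# retained directions are the ones that separate the classes.
with_pca = make_pipeline(StandardScaler(), PCA(n_components=10),
                         LogisticRegression(max_iter=1000))

print("raw features :", cross_val_score(baseline, X, y, cv=5).mean())
print("with PCA     :", cross_val_score(with_pca, X, y, cv=5).mean())
```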
answered 14 hours ago by Media
The question is: why do you want to apply feature selection in the first place?
With many algorithms you can feed in all the features, and the model itself will pick out the ones that matter most for the prediction (as in the sketch below).
To me, the main reasons to apply feature selection are:
- the business cost of using more features
- interpretability of the results
- the risk that noise in the data leads the model to pick up the wrong features and bias the results
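As a hedged illustration of the "let the model pick the features" point above, here is a small sketch using an L1-penalised logistic regression inside scikit-learn's SelectFromModel; the synthetic dataset and the regularisation strength are made-up assumptions, and other model families with built-in selection would work just as well.

```python
# Minimal sketch: an L1 penalty drives the weights of unhelpful features to zero,
# and SelectFromModel keeps only the survivors.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=30, n_informative=5,
                           random_state=0)

# L1 penalty -> sparse coefficients; features with zero weight are dropped.
selector = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1))
X_selected = selector.fit_transform(X, y)

print("features kept:", X_selected.shape[1], "out of", X.shape[1])
print("kept indices :", np.where(selector.get_support())[0])
```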
answered 12 hours ago by VD93 (new contributor)
If you don't care which features end up in the model, using PCA (or something similar) can help.
If you do have some information about which features influence the classification or regression, you can certainly try to fit a model without dimensionality reduction.
PCA, one of the more common dimensionality reduction techniques, yields components that are all orthogonal (and hence uncorrelated). This means that even if your original features are correlated, after the reduction your model won't struggle with collinearity; depending on your model type, this can be crucial. A real-life example is any housing dataset where the features describe the house and the target is the price. Many of the features will be correlated (e.g. number of bathrooms and number of bedrooms, or number of rooms and square footage), so a linear regression model may get tripped up by the collinearity. Dimensionality reduction captures the variance across the features while yielding fewer, uncorrelated columns.
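To make the collinearity point concrete, here is a small sketch (assuming NumPy and scikit-learn; the "housing" feature names and numbers are invented for illustration): the raw features are strongly correlated, while the PCA components are uncorrelated by construction.

```python
# Minimal sketch: correlated housing-style features vs. their PCA components.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n = 1000
rooms = rng.normal(6, 1.5, n)
sqft = 150 * rooms + rng.normal(0, 30, n)   # strongly tied to the room count
baths = rng.normal(2, 0.5, n)
X = np.column_stack([rooms, sqft, baths])

print("correlation of raw features:\n",
      np.round(np.corrcoef(X, rowvar=False), 2))

# After PCA the new columns are orthogonal, so their pairwise correlations vanish.
Z = PCA(n_components=3).fit_transform(X)
print("correlation of PCA components:\n",
      np.round(np.corrcoef(Z, rowvar=False), 2))
```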
answered Feb 21 at 1:13 by David Atlas
For feature selection you can also use a random forest (see this tutorial, and the sketch below):
https://chrisalbon.com/machine_learning/trees_and_forests/feature_selection_using_random_forest/
Forward/backward stepwise variable selection is another option; see:
https://gerardnico.com/data_mining/stepwise_regression
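A brief sketch in the spirit of the first link, assuming scikit-learn; the synthetic dataset and the default mean-importance threshold are illustrative choices.

```python
# Minimal sketch: rank features by random-forest importance and keep those
# whose importance exceeds the mean (SelectFromModel's default threshold here).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_samples=500, n_features=20, n_informative=4,
                           random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print("importances:", forest.feature_importances_.round(3))

# prefit=True: reuse the already-fitted forest, then transform directly.
X_reduced = SelectFromModel(forest, prefit=True).transform(X)
print("kept", X_reduced.shape[1], "of", X.shape[1], "features")
```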
answered 15 hours ago by Anju (new contributor)