Convert categorical data to numeric while preserving Euclidean distance
I'm looking for a way to preserve Euclidean distance when a dataset contains a categorical attribute.
For example, suppose I have a dataset describing people, with attributes such as age and weight, and I find an attribute "sex" that contains "female" and "male". How should I handle it in my analysis?
I have seen that I can transform it into 0 and 1, but that doesn't make much sense to me. Why can't I choose 10 and 20 as the numbers for male and female?
I would like these values to be meaningful in my analysis.
Can someone suggest or explain a good technique?
data pca dimensionality-reduction
One-hot encode and compute a similarity. It will be bounded for the categorical parts. If you are trying to mix categorical and continuous attributes to compute a distance, there are already answers here
– Kiritee Gak
Mar 26 at 6:19
Are you talking about cosine similarity? Can I apply PCA after the dummy transformation? That doesn't make much sense to me @KiriteeGak
– theantomc
Mar 26 at 9:02
Yes, cosine similarity will do. I have no clue why you are thinking of PCA, but no, it is not useful.
– Kiritee Gak
Mar 26 at 9:06
So there is no need to apply PCA afterwards for clustering? Is there a way to use cosine similarity for clustering?
– theantomc
Mar 26 at 11:17
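To make the "one-hot encode and compute a similarity" suggestion in the comments above concrete, here is a minimal sketch in plain Python. The `cosine_similarity` helper, the second attribute (subscribed yes/no) and the column layout are assumptions made for illustration; they are not taken from any library. When every row has the same number of ones, the cosine similarity of two one-hot rows works out to the fraction of attributes on which they agree, which is why it stays bounded between 0 and 1.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Two one-hot encoded attributes side by side (assumed layout):
# [sex=M, sex=F, subscribed=yes, subscribed=no]
person_1 = [1, 0, 1, 0]  # male, subscribed
person_2 = [1, 0, 0, 1]  # male, not subscribed
person_3 = [0, 1, 0, 1]  # female, not subscribed

same = cosine_similarity(person_1, person_1)  # both attributes match, close to 1
half = cosine_similarity(person_1, person_2)  # one of two matches, close to 0.5
none = cosine_similarity(person_1, person_3)  # nothing matches, 0
print(same, half, none)
```

The results are floating-point values, so compare them with a tolerance rather than with `==`.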
asked Mar 25 at 15:10 by theantomc (edited Mar 26 at 6:04)
1 Answer
If I understand your question correctly, you are misusing the word categorical. In a categorical (one-hot) encoding, each value is always a 0 or 1 at the index of its category.
For example:
Data: [M, F, M, M]
Categorical data: [[1, 0], [0, 1], [1, 0], [1, 0]]
If there are three classes, each data point becomes a vector of length 3.
If you feel that the numbers 10 and 20 for male and female are meaningful to you, you can go ahead and use them. There's nothing wrong with that. Keep in mind that in a Euclidean distance the gap between the two codes acts as a weight: 10 and 20 put the two groups 10 units apart on that axis, while 0 and 1 put them 1 unit apart. And when you finally train a model on the data, say an LSTM, the one-hot encoded form is usually preferred.
But if you are talking about the input attributes, you need not worry about the 0-1 problem; just use them as they are.
Vote up if you find this helpful ;)
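As a rough illustration of the encoding described in the answer, the sketch below one-hot encodes the example data in plain Python and compares the resulting distances with those of plain integer codes. The helper names `one_hot` and `euclidean` and the three-category attribute are hypothetical, introduced only for this example.

```python
import math

def one_hot(values, categories):
    """Return one 0/1 vector per value, with a 1 at the index of its category."""
    index = {category: i for i, category in enumerate(categories)}
    return [[1 if index[v] == i else 0 for i in range(len(categories))]
            for v in values]

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# The example from the answer: two categories, M and F.
encoded = one_hot(["M", "F", "M", "M"], categories=["M", "F"])
print(encoded)  # [[1, 0], [0, 1], [1, 0], [1, 0]]

# Hypothetical three-category attribute, to compare against integer codes.
a, b, c = one_hot(["a", "b", "c"], categories=["a", "b", "c"])
# One-hot: every pair of distinct categories is sqrt(2) apart.
one_hot_gaps = [euclidean(a, b), euclidean(a, c), euclidean(b, c)]
# Integer codes 0, 1, 2: "a" ends up twice as far from "c" as from "b".
integer_gaps = [abs(0 - 1), abs(0 - 2), abs(1 - 2)]
print(one_hot_gaps)
print(integer_gaps)  # [1, 2, 1]
```

With one-hot vectors no category is artificially closer to another, whereas integer codes impose an ordering that the data may not have.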
But suppose I have two attributes, for example sex (M and F) and another feature that records whether the person is subscribed to a website (so it only takes the values yes and no). With M=1, F=0 and Yes=1, No=0, males who are subscribed will be clustered together. Does this make sense?
– theantomc
Mar 26 at 9:01
Yes. In that case, the values you define will just make the intra-cluster distance larger. But even if the values are binary, the clusters will be the same.
– William Scott
Mar 26 at 22:01
I want to avoid people being clustered together merely because of the values I put in my dataset
– theantomc
Mar 27 at 7:49
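The concern in the last comment can be checked on a toy example. The sketch below uses made-up ages and two codings of the same sex column (0/1 and 10/20) and shows that on raw values the coding can change which person is nearest. It then applies z-score standardization, a technique not mentioned in the thread and brought in here only as an assumption, after which both codings give the same result. All names and numbers are hypothetical.

```python
import math

def standardize(column):
    """Rescale a column to zero mean and unit (population) standard deviation."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / len(column))
    return [(x - mean) / std for x in column]

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_to_first(rows):
    """Index of the row that is closest to rows[0] (excluding rows[0] itself)."""
    distances = [euclidean(rows[0], row) for row in rows[1:]]
    return 1 + distances.index(min(distances))

# Hypothetical people: ages, plus two codings of the same sex column.
ages = [25, 27, 32]
sex_0_1 = [0, 1, 0]        # M=0, F=1
sex_10_20 = [10, 20, 10]   # M=10, F=20

# Raw values: the nearest neighbour of person 0 depends on the coding.
raw_0_1 = list(zip(ages, sex_0_1))
raw_10_20 = list(zip(ages, sex_10_20))
print(nearest_to_first(raw_0_1))    # 1
print(nearest_to_first(raw_10_20))  # 2

# Standardized columns: both codings collapse to the same values.
scaled_0_1 = list(zip(standardize(ages), standardize(sex_0_1)))
scaled_10_20 = list(zip(standardize(ages), standardize(sex_10_20)))
print(nearest_to_first(scaled_0_1) == nearest_to_first(scaled_10_20))  # True
```

Standardization only removes the arbitrary scale and offset of the codes; how much weight a categorical attribute should carry relative to age or weight is still a modelling decision.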
answered Mar 26 at 6:41 by William Scott