


Converting categorical data to numeric values while preserving Euclidean distance















I'm looking for a way to preserve Euclidean distance when working with categorical attributes.

For example, suppose I have a dataset of people with attributes such as age, weight, etc., and there is also an attribute "sex" that contains the values "female" and "male". How can I handle it in my analysis?

I have seen that I can transform it into 0 and 1, but that doesn't make much sense to me. Why can't I choose numbers like 10 and 20 for male and female?
I would like these values to be meaningful in my analysis.

Can someone suggest or explain a good technique?











  • One-hot encode and compute a similarity. It will be bounded for the categorical parts. If you are trying to mix categorical and continuous attributes to compute a distance, there are already answers here.
    – Kiritee Gak
    Mar 26 at 6:19

  • Are you talking about cosine similarity? Can I apply PCA after the dummy transformation? It doesn't make much sense to me. @KiriteeGak
    – theantomc
    Mar 26 at 9:02

  • Yes, cosine similarity will do. I have no idea why you are thinking of PCA, but no, it is not useful.
    – Kiritee Gak
    Mar 26 at 9:06

  • So there is no need to apply PCA afterwards for clustering? Is there a way to use cosine similarity for clustering?
    – theantomc
    Mar 26 at 11:17
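
The suggestion in the comments above (one-hot encode, then compare with cosine similarity) could be sketched roughly as follows. This is a minimal illustration in plain Python, not the commenters' code: the sample records and the helper names `one_hot`, `cosine_similarity`, `normalize`, and `squared_euclidean` are assumptions made up for the example. It also hints at one way to cluster with cosine similarity: once every vector is scaled to unit length, the squared Euclidean distance equals 2 × (1 − cosine similarity), so a Euclidean-based clustering method run on the normalized vectors ranks pairs the same way cosine similarity does.

```python
import math

def one_hot(value, categories):
    """Return a 0/1 vector with a single 1 at the index of `value`."""
    return [1.0 if value == c else 0.0 for c in categories]

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def normalize(v):
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical records (sex, subscribed); not taken from the question.
records = [("male", "yes"), ("male", "no"), ("female", "yes")]
encoded = [
    one_hot(sex, ["female", "male"]) + one_hot(sub, ["no", "yes"])
    for sex, sub in records
]

sim_01 = cosine_similarity(encoded[0], encoded[1])  # one shared category -> 0.5
sim_12 = cosine_similarity(encoded[1], encoded[2])  # no shared category  -> 0.0

# On unit-length vectors: ||a - b||^2 == 2 * (1 - cos(a, b))
sq_dist_01 = squared_euclidean(normalize(encoded[0]), normalize(encoded[1]))
print(sim_01, sim_12, sq_dist_01, 2 * (1 - sim_01))
```

Because each one-hot block contributes exactly one 1, the similarity between two records here is simply the fraction of attributes on which they agree, which is why it stays bounded between 0 and 1.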






















Tags: data, pca, dimensionality-reduction

asked Mar 25 at 15:10 by theantomc (edited Mar 26 at 6:04)





















1 Answer












If I understand your question correctly, there is some confusion about what "categorical" means here. A categorical attribute is usually fed to a model in one-hot form: each category gets its own index, and each index holds either 0 or 1.

For example:

Data: [M, F, M, M]

One-hot encoded data: [[1, 0], [0, 1], [1, 0], [1, 0]]

If there are three classes, each data point is encoded as a vector of length 3.

If you feel that using the numbers 10 and 20 for male and female is meaningful to you, then you can go ahead and use them. There's nothing wrong with that; it only changes the scale, so the attribute contributes a difference of 10 instead of 1 to the Euclidean distance. But when you finally want to train a model on the data, say an LSTM, it prefers to take the one-hot encoded data.

But if you are talking about the input attributes, then you need not worry about the 0-1 problem; just use them as they are.

Vote up if you find this helpful ;)
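
As a rough illustration of the encoding described in this answer, the sketch below one-hot encodes the example data and then compares a 0/1 coding with a 10/20 coding of the same binary attribute. It is plain Python written for this example; the helper names (`one_hot`, `euclidean`, `person`) and the sample ages are assumptions, not part of the answer. The point it tries to show: for a two-valued attribute, any pair of distinct numbers keeps "same" and "different" apart, but the gap between the two numbers decides how strongly the attribute weighs in the Euclidean distance compared with the other columns.

```python
import math

def one_hot(values, categories):
    """One-hot encode a list of categorical values."""
    return [[1 if v == c else 0 for c in categories] for v in values]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

data = ["M", "F", "M", "M"]
encoded = one_hot(data, ["M", "F"])
# encoded is [[1, 0], [0, 1], [1, 0], [1, 0]]

# Any two different categories are sqrt(2) apart in one-hot form.
d_one_hot = euclidean(encoded[0], encoded[1])

def person(age, sex, coding):
    """Hypothetical record: age plus a single numeric code for sex."""
    return [age, coding[sex]]

coding_01 = {"M": 0, "F": 1}
coding_10_20 = {"M": 10, "F": 20}

# Same two people (5 years apart, different sex) under both codings.
d_01 = euclidean(person(30, "M", coding_01), person(35, "F", coding_01))
d_10_20 = euclidean(person(30, "M", coding_10_20), person(35, "F", coding_10_20))

print(d_one_hot)  # sqrt(2), about 1.41
print(d_01)       # sqrt(25 + 1), about 5.10: age dominates
print(d_10_20)    # sqrt(25 + 100), about 11.18: sex dominates
```

Under these assumptions, neither coding is "wrong"; choosing 10 and 20 instead of 0 and 1 amounts to giving the attribute ten times the weight in the distance.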












  • But suppose I have two attributes, for example sex (M and F) and another feature that indicates whether the person is subscribed to a website (which can only take the values yes and no). With M = 1, F = 0 and Yes = 1, No = 0, the male users who are subscribed will be clustered together. Does this make sense?
    – theantomc
    Mar 26 at 9:01

  • Yes. In that case, the values you choose will only make the distances larger. But even if the values are binary, the clusters will be the same.
    – William Scott
    Mar 26 at 22:01

  • I would like to avoid people being clustered together just because of the values I put in my dataset.
    – theantomc
    Mar 27 at 7:49
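
The exchange above can be checked with a small sketch. It assumes two binary attributes (sex and subscription) and compares a 0/1 coding with a 10/20 coding applied to both attributes; the records and helper names are hypothetical and chosen only for illustration. Under that assumption every pairwise Euclidean distance is multiplied by the same factor, so which records are closest to which does not change.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def encode(people, sex_codes, sub_codes):
    """Map each (sex, subscribed) record to a 2-dimensional numeric vector."""
    return [[sex_codes[sex], sub_codes[sub]] for sex, sub in people]

# Hypothetical records; the first and last are identical on purpose.
people = [("M", "yes"), ("M", "no"), ("F", "yes"), ("F", "no"), ("M", "yes")]

binary = encode(people, {"M": 1, "F": 0}, {"yes": 1, "no": 0})
scaled = encode(people, {"M": 20, "F": 10}, {"yes": 20, "no": 10})

pairs = list(combinations(range(len(people)), 2))
d_binary = [euclidean(binary[i], binary[j]) for i, j in pairs]
d_scaled = [euclidean(scaled[i], scaled[j]) for i, j in pairs]

# Every distance grows by the same factor (10), so the ordering of pairs,
# and therefore the grouping a distance-based method finds, is unchanged.
same_up_to_scale = all(
    math.isclose(ds, 10 * db) for db, ds in zip(d_binary, d_scaled)
)
print(same_up_to_scale)
```

If only one of the two attributes were rescaled, that attribute would carry more weight in the distance, so the grouping could change. Identical records sit at distance 0 under either coding, which is why people with the same values end up together whatever numbers are chosen.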













































answered Mar 26 at 6:41 by William Scott



