Convert categorical data to numeric while preserving Euclidean distance

I am looking for a way to preserve Euclidean distance when a dataset has a categorical attribute.

For example, suppose I have a dataset describing people, with attributes such as age and weight, and an attribute "sex" that contains "female" and "male". How should I handle it in the analysis?

I have seen that I can transform it into 0 and 1, but that does not make much sense to me. Why can't I choose 10 and 20 as the numbers for male and female?
I would like these values to be meaningful in my analysis.

Can someone suggest or explain a good technique?

edited Mar 26 at 6:04

asked Mar 25 at 15:10

theantomc

143

One-hot encode and compute a similarity. It will be bounded for the categorical parts. If you are trying to mix categorical and continuous attributes to compute a distance, there are already answers here.
– Kiritee Gak
Mar 26 at 6:19

Are you talking about cosine similarity? Can I apply PCA after the dummy transformation? That does not make much sense to me. @KiriteeGak
– theantomc
Mar 26 at 9:02

Yes, cosine similarity will do. I have no clue why you are thinking of PCA, but no, it is not useful.
– Kiritee Gak
Mar 26 at 9:06

So there is no need to apply PCA afterwards for clustering? Is there a way to use cosine similarity for clustering?
– theantomc
Mar 26 at 11:17
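
A minimal sketch of the suggestion in the comments above, assuming two one-hot encoded attributes. The column layout, the `smoker` attribute and the `cosine_similarity` helper are illustrative and not taken from the thread. For one-hot vectors the result stays between 0 and 1, which is the bound the first comment refers to.

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical column layout: [male, female, smoker_yes, smoker_no]
male_smoker = [1, 0, 1, 0]
male_nonsmoker = [1, 0, 0, 1]
female_nonsmoker = [0, 1, 0, 1]

# One shared category out of two gives roughly 0.5
print(cosine_similarity(male_smoker, male_nonsmoker))
# No shared category gives 0.0
print(cosine_similarity(male_smoker, female_nonsmoker))
```

The similarity only counts how many categories two rows share, so it does not depend on any numeric code chosen for "male" or "female".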

data pca dimensionality-reduction

1 Answer

If I understand your question correctly, you are using the word "categorical" loosely. In a categorical (one-hot) encoding, every value is a 0 or a 1 at the index of its class.

For example:

Data: [M, F, M, M]

Categorical data: [[1, 0], [0, 1], [1, 0], [1, 0]]

If there are three classes, each data point is encoded as an array of three values.
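
The encoding above can be sketched in plain Python. This is only an illustration: the `one_hot` helper and the three colour labels are hypothetical names, and the columns are assumed to follow the order in which labels first appear so that the output matches the example.

```python
from math import dist

def one_hot(values):
    """One-hot encode labels; columns follow order of first appearance."""
    categories = list(dict.fromkeys(values))
    index = {label: i for i, label in enumerate(categories)}
    encoded = []
    for label in values:
        row = [0] * len(categories)
        row[index[label]] = 1
        encoded.append(row)
    return categories, encoded

categories, encoded = one_hot(["M", "F", "M", "M"])
print(encoded)  # [[1, 0], [0, 1], [1, 0], [1, 0]]

# With three classes (hypothetical labels) each row has three entries,
# and any two different classes are the same Euclidean distance apart.
_, colours = one_hot(["red", "green", "blue"])
pairwise = [
    dist(colours[0], colours[1]),
    dist(colours[0], colours[2]),
    dist(colours[1], colours[2]),
]
print(pairwise)
```

Because every pair of different classes ends up equally far apart, a one-hot encoding does not impose an order or a scale on the categories.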

If the numbers 10 and 20 for male and female are meaningful to you, you can go ahead and use them; there is nothing wrong with that. But when you finally train a model on the data, say an LSTM, it generally prefers the categorical (one-hot) encoding.
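
One thing to keep in mind when picking the codes is that the gap between them sets how much the attribute contributes to a Euclidean distance. A small sketch, assuming an age column already rescaled to the range 0..1; the three records and the helper names are made up for illustration.

```python
from math import dist

# Hypothetical records: (age rescaled to the range 0..1, sex)
alice = (0.30, "female")
bob = (0.30, "male")
carol = (0.90, "female")

def encode(person, sex_codes):
    """Replace the sex label by its numeric code."""
    age, sex = person
    return (age, sex_codes[sex])

def distances(sex_codes):
    """Return the (alice-bob, alice-carol) Euclidean distances."""
    a, b, c = (encode(p, sex_codes) for p in (alice, bob, carol))
    return dist(a, b), dist(a, c)

print(distances({"female": 0, "male": 1}))
print(distances({"female": 10, "male": 20}))
```

With the codes 0 and 1, Alice and Bob (same age, different sex) are 1 apart; with 10 and 20 they are 10 apart, while the Alice-Carol distance (same sex, different age) stays at about 0.6 in both cases. The larger codes therefore make sex dominate the distance.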

But if you are talking about the input attributes, you need not worry about the 0-1 problem; just use them as they are.

Vote up if you find this helpful ;)

answered Mar 26 at 6:41

William Scott

1063

New contributor

But suppose I have two attributes, for example sex (M and F) and another feature that represents whether someone is subscribed to a website (which can only take the values yes and no). I have M = 1, F = 0 and Yes = 1, No = 0. So males who are subscribed will be clustered together. Does this make sense?
– theantomc
Mar 26 at 9:01

Yes. In that case, the values you define will just make the intra-cluster distance larger. But even if the values are binary, the clusters will be the same.
– William Scott
Mar 26 at 22:01

I wish to avoid people being clustered together just because of the values I put in my dataset.
– theantomc
Mar 27 at 7:49
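
To illustrate the exchange above: with only two binary attributes there are at most four distinct points, and rows with the same labels coincide whatever numbers are used for them. A small sketch, in which the rows and the helper name are hypothetical:

```python
from collections import defaultdict

# Hypothetical rows: (sex, subscribed)
people = [("M", "yes"), ("F", "no"), ("M", "yes"), ("M", "no")]

def groups_of_identical_points(rows, sex_codes, sub_codes):
    """Group row indices whose encoded vectors coincide."""
    groups = defaultdict(list)
    for i, (sex, sub) in enumerate(rows):
        groups[(sex_codes[sex], sub_codes[sub])].append(i)
    return sorted(groups.values())

binary = groups_of_identical_points(people, {"M": 1, "F": 0}, {"yes": 1, "no": 0})
scaled = groups_of_identical_points(people, {"M": 20, "F": 10}, {"yes": 20, "no": 10})
print(binary)            # [[0, 2], [1], [3]]
print(binary == scaled)  # True
```

Rows 0 and 2 land on the same point under both codings. The grouping comes from the labels themselves, so changing the numbers only changes how far apart the groups are.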
