Aggregating target-encoded array-like categorical features?


I am trying to find commonly used techniques for handling high-cardinality, multi-valued categorical variables in machine learning classification algorithms.

One-hot encoding leads to very high dimensionality. The approach I've landed on is target encoding (mean encoding). I understand how to use this when the categorical feature is a single choice (e.g. current zip code). But when the feature can take on multiple values from a large list (e.g. favorite hobbies, illness symptoms, university coursework), I am not sure how to combine the encoded values.

My intuition says that the wrong approach would be to treat each unique combination of values as its own level and encode that, since most combinations would occur only a few times and the encoding would overfit. Other things that come to mind are simple aggregations of the per-value encodings, such as sum, average, product, or variance.
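For concreteness, the simple-aggregation idea could be sketched roughly as below. This is only an illustration under stated assumptions, not an established recipe: the function names (`smoothed_target_means`, `aggregate_encoding`), the smoothing weight `m`, and the toy data are all hypothetical. Each individual value is target-encoded with a smoothed mean, and the encodings of a row's values are then reduced to mean/min/max. The sketch also assumes a unique row index, and in practice the per-value means would presumably be fitted out-of-fold to limit target leakage.

```python
import pandas as pd


def smoothed_target_means(df, col, target, m=5.0):
    """Smoothed mean of `target` for each individual value in list-valued `col`.

    `m` is a hypothetical smoothing weight that pulls rare values
    toward the global target mean (the prior).
    """
    prior = df[target].mean()
    # One row per (original row, value) pair; rows with empty lists drop out.
    exploded = df[[col, target]].explode(col).dropna(subset=[col])
    stats = exploded.groupby(col)[target].agg(["sum", "count"])
    means = (stats["sum"] + m * prior) / (stats["count"] + m)
    return means, prior


def aggregate_encoding(df, col, value_means, prior):
    """Reduce the per-value encodings of each row to a few summary features.

    Assumes `df` has a unique index. Unseen values and empty lists
    fall back to the prior.
    """
    exploded = df[col].explode()
    encoded = exploded.map(value_means).fillna(prior).astype(float)
    grouped = encoded.groupby(level=0)
    return pd.DataFrame({
        f"{col}_te_mean": grouped.mean(),
        f"{col}_te_min": grouped.min(),
        f"{col}_te_max": grouped.max(),
    })


# Toy data, purely illustrative.
train = pd.DataFrame({
    "hobbies": [["chess", "golf"], ["golf"], ["chess", "ski"], []],
    "y": [1, 0, 1, 0],
})
value_means, prior = smoothed_target_means(train, "hobbies", "y", m=2.0)
features = aggregate_encoding(train, "hobbies", value_means, prior)
print(features)
```

Mean, min, and max are just one possible choice here; sum, product, or variance could be added to the same `DataFrame` in the same way, and which of them actually helps is exactly what I am unsure about.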

How should target encoded values be combined?

asked Apr 9 at 18:41

user4446237

1135

machine-learning feature-engineering encoding
