


LightGBM - Why Exclusive Feature Bundling (EFB)?

















I'm currently studying GBDT and have started reading LightGBM's research paper.

In Section 4 the authors explain the Exclusive Feature Bundling (EFB) algorithm, which aims to reduce the number of features by grouping mutually exclusive features into bundles and treating each bundle as a single feature. They emphasize that one must be able to recover the original feature values from the bundle.

Question: If a categorical feature has been one-hot encoded, won't this algorithm simply reverse the one-hot encoding back to a numeric encoding, thereby cancelling the benefits of that encoding (no implied ordering between categories, etc.)?
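To make the concern concrete, here is a rough sketch of how I picture the bundling (my own illustration of the idea in the paper, not LightGBM's actual implementation): mutually exclusive columns are merged into a single column by giving each original feature its own value range, so the original values stay recoverable.

    import numpy as np

    def bundle(exclusive_cols):
        """Merge mutually exclusive columns (at most one non-zero per row)
        into one column by adding a per-feature offset."""
        n_rows, n_cols = exclusive_cols.shape
        bundled = np.zeros(n_rows)
        offsets = []
        offset = 0
        for j in range(n_cols):
            offsets.append(offset)
            nonzero = exclusive_cols[:, j] != 0
            bundled[nonzero] = exclusive_cols[nonzero, j] + offset
            offset += exclusive_cols[:, j].max()  # reserve a value range for this feature
        return bundled, offsets

    # One-hot encoding of the categories [0, 2, 1, 2]:
    one_hot = np.array([[1, 0, 0],
                        [0, 0, 1],
                        [0, 1, 0],
                        [0, 0, 1]])
    print(bundle(one_hot)[0])   # [1. 3. 2. 3.]

The bundled result is again a single numeric column with one value per category, which is exactly what the one-hot encoding was meant to avoid.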










      feature-selection decision-trees xgboost machine-learning-model gbm






asked Nov 30 '18 at 14:36 by T. Morvan, edited Apr 10 at 13:07 by ebrahimi




















          2 Answers



















I've read that paper many times. What I can say is that it does not explicitly describe what the framework actually does; it only gives a hint of the authors' intuitive idea of bundling features efficiently. In particular, it does not say that it performs a 'reversal of one-hot encoding', which is what your question asks about.

I tried passing categorical inputs both directly and one-hot encoded to compare compute time. There was a significant difference: passing them directly was consistently faster across multiple datasets than passing them one-hot encoded.
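For reference, the two ways of feeding the data look roughly like this (an illustrative sketch, not my actual benchmark; the toy DataFrame, column names and parameters are made up):

    import lightgbm as lgb
    import pandas as pd

    # Toy data with one categorical column -- placeholder values.
    df = pd.DataFrame({"color": ["red", "green", "blue", "green"] * 250,
                       "x": range(1000),
                       "y": [0, 1, 0, 1] * 250})

    # Option 1: pass the category directly; LightGBM handles it natively.
    X_direct = df[["color", "x"]].copy()
    X_direct["color"] = X_direct["color"].astype("category")
    ds_direct = lgb.Dataset(X_direct, label=df["y"], categorical_feature=["color"])

    # Option 2: one-hot encode first and pass the dummy columns.
    X_onehot = pd.get_dummies(df[["color", "x"]], columns=["color"], dtype=float)
    ds_onehot = lgb.Dataset(X_onehot, label=df["y"])

    params = {"objective": "binary", "verbosity": -1}
    booster_direct = lgb.train(params, ds_direct, num_boost_round=50)
    booster_onehot = lgb.train(params, ds_onehot, num_boost_round=50)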



Possibilities:

1) It is possible that LightGBM can detect from the sparsity that the features were given one-hot encoded, and that it does not apply EFB to them.

2) It is also possible that LightGBM applies EFB to one-hot encoded inputs, but that this is harmful, or at least not as good as applying EFB to categorical inputs passed directly. (I lean toward this one.)

Still, I do not think that EFB reverses one-hot encoding, since EFB is described as its own way of treating categorical features; more likely it simply 'bundles the unbundled features' when given one-hot encoded inputs.

I used words like 'possible' so many times because the paper is not explicit. My advice is to e-mail one of the authors of the paper; I do not think they would refuse to explain it. Or, if you are brave, go through the LightGBM GitHub repo and check the code yourself.
I hope this gives you some insight. If you come up with an exact answer, please let me know, and do not hesitate to discuss this further; I'll be around. Good luck, have fun!






answered Dec 2 '18 at 21:03 by Ugur MULUK, edited Apr 10 at 13:07 by ebrahimi





















From what the paper describes, EFB serves to speed up training by reducing the number of features. I don't think it claims there are no other effects; whether those other effects are a real concern is another question.

Also, EFB does not only deal with one-hot encoded features; it handles continuous features as well.

I also don't think it would bundle all one-hot encoded features together, given the possibility of the bundled values overflowing.
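As a toy illustration of that last point (my own sketch, not LightGBM internals): if the bundled values have to stay within some cap on the number of bins, such as LightGBM's max_bin (255 by default), then only a limited number of one-hot columns can share one bundle, so they could not all collapse into a single feature anyway.

    import math

    # Assumption for illustration: each one-hot column needs one non-zero value
    # in the bundle, and bundled values must stay within the max_bin budget.
    MAX_BIN = 255
    n_onehot_columns = 1000               # a high-cardinality categorical, one-hot encoded

    columns_per_bundle = MAX_BIN          # at most this many one-hot columns per bundle
    n_bundles = math.ceil(n_onehot_columns / columns_per_bundle)
    print(n_bundles)                      # 4 bundles, not 1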






answered Apr 10 at 10:51 by Raymond Kwok












