How do I create a data set that has a set of features for multiple options, with one option being the expected outcome?


Most datasets I see are:

feature 1, feature 2, feature 3, outcome

where outcome is binary, e.g. 1 if the patient is cancer-positive and 0 if they don't have cancer.

How do I create a dataset where there are multiple outcomes and each possible outcome has a set of features for it?

e.g. I have a question with 3 possible answers:

"What organ pumps blood around the human body?"

A. Heart

B. Liver

C. Church Organ

Each answer has its own set of features, and exactly one answer is correct. How would I lay this out in a CSV file? I want to read it into XGBoost for training. My guess is something like:

question, option1 and features, option2 and features, option3 and features, correct option

Many thanks for your help!

asked Mar 24 at 12:23

OultimoCoder


machine-learning xgboost

1 Answer

The final feature vector would be a concatenation of the per-option features, one row per question (for multi-class prediction):

Question google count | option A google count | option B google count | option C google count | option C no. words | option A no. words | other features | label

where label is 1, 2 or 3, i.e. the index of the correct option. (XGBoost expects class labels numbered from 0, so encode them as 0, 1, 2 before training.)

There is no need to keep the features related to option A next to each other, or in any particular order. Each feature just needs to be in the same column for every row, regardless of the label.
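As a rough illustration of that layout, here is a minimal sketch that writes and reads such a file with Python's standard csv module. The column names, the second question, and all the counts are hypothetical placeholders (only the 1,000,000 / 200,000 / 0 figures echo the example from the comments), and the label is stored 0-based (0 = option A) on the assumption that it will be fed to XGBoost.

```python
import csv
import io

# Hypothetical column names: one column per (option, feature) pair, plus the label.
COLUMNS = [
    "question_google_count",
    "option_a_google_count", "option_b_google_count", "option_c_google_count",
    "option_a_num_words", "option_b_num_words", "option_c_num_words",
    "label",
]

# One row per question. All numbers are illustrative; label 0 means option A is correct.
ROWS = [
    # "What organ pumps blood around the human body?" -> A. Heart
    [5000000, 1000000, 200000, 0, 1, 1, 2, 0],
    # A second, made-up question whose correct answer is option C.
    [120000, 300, 4500, 90000, 2, 3, 1, 2],
]


def write_csv(rows):
    """Serialise the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()


def read_csv(text):
    """Parse CSV text back into (features, labels) as lists of ints."""
    reader = csv.DictReader(io.StringIO(text))
    features, labels = [], []
    for record in reader:
        features.append([int(record[name]) for name in COLUMNS[:-1]])
        labels.append(int(record["label"]))
    return features, labels


csv_text = write_csv(ROWS)
features, labels = read_csv(csv_text)
print(csv_text)
```

Because each option's features always sit in the same columns, the model can learn, for instance, that a large value in the option A count column tends to go with label 0, without the columns being grouped in any special way.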

XGBoost parameters for multi-class classification are:

'objective': 'multi:softprob',
'num_class': 3
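To show where these parameters go, here is a hedged sketch using the xgboost Python package. The helper names (to_zero_based, pick_option, train_and_predict) are hypothetical and not part of the original answer. It assumes labels arrive as 1, 2, 3 and shifts them to 0, 1, 2, since XGBoost requires class labels in the range 0 to num_class - 1.

```python
PARAMS = {"objective": "multi:softprob", "num_class": 3}


def to_zero_based(labels):
    """Shift labels 1, 2, 3 to the 0, 1, 2 range XGBoost expects."""
    return [label - 1 for label in labels]


def pick_option(probabilities, options="ABC"):
    """Return the option letter with the highest predicted probability."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return options[best]


def train_and_predict(features, labels, num_boost_round=10):
    """Train a booster and return per-class probabilities for each row."""
    # Imported lazily so the helpers above can be used without xgboost installed.
    import numpy as np
    import xgboost as xgb

    dtrain = xgb.DMatrix(
        np.asarray(features, dtype=float),
        label=np.asarray(to_zero_based(labels)),
    )
    booster = xgb.train(PARAMS, dtrain, num_boost_round=num_boost_round)
    # Predicting on the training data only to show the output format:
    # with multi:softprob, each row holds one probability per class.
    return booster.predict(dtrain)
```

With 'multi:softprob' each prediction is a set of three probabilities, so something like pick_option(row) turns it back into an answer letter. If only the winning class is needed, 'multi:softmax' returns the class index directly.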

edited Mar 24 at 14:01

answered Mar 24 at 12:44

Esmailian


Ok, I understand what you're saying, thank you. But let's say one of my additional features is the number of pages returned by a Google search for the question plus the answer, e.g. 1,000,000 for the question and option A, 200,000 for option B and 0 for option C. How would I add these features to the dataset? Do I just add 3 more columns (optionaresults, optionbresults, optioncresults)? What I don't understand is whether the model will attribute the results to the correct option, if that makes sense.

– OultimoCoder
Mar 24 at 13:35

@OultimoCoder updated the example

– Esmailian
Mar 24 at 13:40

@Esmailian Ahhhh thank you. Where I was going wrong was assuming the features for each label had to be explicitly tied to it. I didn't fully understand it, and your edits helped explain it better. I'll wait a day before choosing your answer as the correct one.

– OultimoCoder
Mar 24 at 13:55
