How do I create a data set that has a set of features for multiple options, with one option being the expected outcome?
Most datasets I see look like this:
feature 1, feature 2, feature 3, outcome
where the outcome is binary, e.g. 1 if the patient is cancer positive and 0 if they are not.
How do I create a dataset where there are multiple possible outcomes and each possible outcome has its own set of features?
For example, I have a question with 3 possible answers:
"What organ pumps blood around the human body?"
A. Heart
B. Liver
C. Church Organ
Each answer has a set of features, and exactly one answer is correct. How would I lay this out in a CSV file? I want to read it into XGBoost for training. Something like:
question, option1 and features, option2 and features, option3 and features, correct option
Many thanks for your help!
Tags: machine-learning, xgboost
asked Mar 24 at 12:23 by OultimoCoder
1 Answer
The final feature vector would be a concatenation of all the features into a single row per question (for multi-class prediction), for example:
Question google count | option A google count | option B google count | option C google count | option C no. words | option A no. words | other features | label
where the label is 1, 2, or 3, identifying the correct option. (XGBoost expects class labels numbered from 0 to num_class - 1, so store them as 0, 1, 2 or shift them before training.)
There is no need to put the features related to option A next to each other (or in any particular order); each feature just needs to be in the same column for all rows, regardless of the label.
The XGBoost parameters for multi-class classification are:
'objective': 'multi:softprob',
'num_class': 3
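As a rough illustration of the layout and parameters described in this answer, here is a minimal sketch. The column names, the toy values, and the helper names `load_rows` and `train` are all assumptions made up for the example, not part of the original answer. The `train` function needs the third-party `xgboost` and `numpy` packages and is only defined here, not called.

```python
import csv
import io

# Hypothetical toy data: one row per question, one column per (option, feature).
# Column names and values are made up for illustration.
CSV_TEXT = """\
question_google_count,option_a_google_count,option_b_google_count,option_c_google_count,option_c_num_words,option_a_num_words,label
5000000,1000000,200000,0,2,1,1
3000000,10,900000,40,1,1,2
7000000,300,50,800000,3,2,3
"""

LABEL_COLUMN = "label"


def load_rows(text):
    """Parse the wide CSV into feature names, a feature matrix X and labels y."""
    reader = csv.DictReader(io.StringIO(text))
    feature_names = [name for name in reader.fieldnames if name != LABEL_COLUMN]
    X, y = [], []
    for row in reader:
        X.append([float(row[name]) for name in feature_names])
        # The file stores the correct option as 1, 2 or 3; XGBoost expects
        # class labels in the range 0 .. num_class - 1, so shift by one.
        y.append(int(row[LABEL_COLUMN]) - 1)
    return feature_names, X, y


# Parameters quoted in the answer.
PARAMS = {"objective": "multi:softprob", "num_class": 3}


def train(X, y, num_boost_round=10):
    """Train a booster; requires the third-party xgboost and numpy packages."""
    import numpy as np
    import xgboost as xgb

    dtrain = xgb.DMatrix(np.array(X), label=np.array(y))
    return xgb.train(PARAMS, dtrain, num_boost_round=num_boost_round)


feature_names, X, y = load_rows(CSV_TEXT)
```

Note that the two word-count columns are deliberately out of option order, as in the answer: what matters is that every row uses the same header, not where each option's columns sit.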
Ok, I understand what you're saying, thank you. But let's say one of my additional features is the number of pages returned by a Google search for the question together with each answer: e.g. 1,000,000 for the question and option A, 200,000 for option B, and 0 for option C. How would I add these features to the dataset? Do I just add 3 more rows: optionaresults, optionbresults, optioncresults? What I don't understand is whether the results will be attributed to the correct option in the model, if this makes sense.
– OultimoCoder, Mar 24 at 13:35
@OultimoCoder updated the example
– Esmailian, Mar 24 at 13:40
@Esmailian Ahhhh thank you. Where I was going wrong was in assuming that the features for each label had to be explicit; I didn't fully understand it. Your edits helped explain it better. I'll wait a day before choosing your answer as the correct one.
– OultimoCoder, Mar 24 at 13:55
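For the Google result counts raised in the comments, one possible way to keep each count tied to its option is to give every (option, feature) pair its own named column, so the counts become three extra columns on the question's single row. The sketch below assumes a hypothetical helper `flatten` and made-up column names. The counts 1,000,000, 200,000 and 0 come from the comment, and the question-level count is invented.

```python
OPTIONS = ("a", "b", "c")  # fixed option order shared by every row


def flatten(question_features, option_features, correct_option):
    """Turn per-option features into one flat row with a fixed column order.

    question_features: dict of features describing the question itself
    option_features:   dict mapping option -> dict of that option's features
    correct_option:    "a", "b" or "c"
    """
    row = dict(question_features)
    for option in OPTIONS:
        for name, value in option_features[option].items():
            # The column name carries the option, e.g. "option_a_google_count".
            row[f"option_{option}_{name}"] = value
    # Zero-based class index: a -> 0, b -> 1, c -> 2.
    row["label"] = OPTIONS.index(correct_option)
    return row


row = flatten(
    {"question_google_count": 5_000_000},  # made-up value
    {
        "a": {"google_count": 1_000_000},
        "b": {"google_count": 200_000},
        "c": {"google_count": 0},
    },
    correct_option="a",
)
print(row)
```

Because the same function builds every row, the option A count always lands in the same column, which is the only thing the model relies on to associate a feature with an option.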
Answer posted Mar 24 at 12:44 by Esmailian; edited Mar 24 at 14:01