How do I create a data set that has a set of features for multiple options, with one option being the expected outcome?

Most datasets I see look like this:

feature 1, feature 2, feature 3, outcome

where the outcome is binary, e.g. 1 if the patient is cancer-positive and 0 if they don't have cancer.

How do I create a dataset where there are multiple possible outcomes and each possible outcome has its own set of features?

E.g. I have a question with 3 possible answers:

"What organ pumps blood around the human body?"

A. Heart
B. Liver
C. Church Organ

Each answer has a set of features, and exactly one answer is correct. How would I lay this out in a CSV file? I want to read it into XGBoost for training. Something like:

question, option1 and features, option2 and features, option3 and features, correct option

Many thanks for your help!
















machine-learning xgboost

asked Mar 24 at 12:23
OultimoCoder

          1 Answer
For multi-class prediction, each row holds one question. Its feature vector is the concatenation of the question-level features and the features of every option, followed by the label:

Question google count | option A google count | option B google count | option C google count | option C no. words | option A no. words | other features | label (0, 1, 2)

The label is the index of the correct option. XGBoost expects class labels in the range 0 to num_class - 1, so encode options A, B, C as 0, 1, 2 rather than 1, 2, 3.

There is no need to put the features related to option A next to each other (or in any particular order); each feature just needs to be in the same column for all rows, regardless of the label.

XGBoost parameters for multi-class classification are:

'objective': 'multi:softprob',
'num_class': 3
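As a rough illustration of the layout described above, here is a minimal sketch that flattens one question and its three options into a single wide CSV row using only the Python standard library. The column names, the helper functions `make_row` and `to_csv`, the question-level count, and the word counts are all assumptions made for this example; only the per-option Google counts (1,000,000, 200,000, 0) come from the discussion.

```python
import csv
import io

# Hypothetical column names; only the "one column per feature" layout matters.
COLUMNS = [
    "question_google_count",
    "option_a_google_count",
    "option_b_google_count",
    "option_c_google_count",
    "option_a_num_words",
    "option_b_num_words",
    "option_c_num_words",
    "label",
]

# Map the correct option's letter to the 0-based class index XGBoost expects.
LABEL_INDEX = {"A": 0, "B": 1, "C": 2}


def make_row(question_count, option_counts, option_words, correct):
    """Flatten one question and its three options into a single wide row."""
    return [question_count, *option_counts, *option_words, LABEL_INDEX[correct]]


def to_csv(rows):
    """Serialise the header and the rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()


# Illustrative numbers for the "What organ pumps blood..." question; A (Heart) is correct.
rows = [make_row(5_000_000, [1_000_000, 200_000, 0], [1, 1, 2], "A")]
csv_text = to_csv(rows)
print(csv_text)
```

Each further question becomes one more row with the same columns in the same order, which is what lets the model treat, say, the second column as "option A's Google count" for every row.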












• OK, I understand what you're saying, thank you. But let's say one of my additional features is the number of pages returned by a Google search for the question together with an answer, e.g. 1,000,000 for the question and option A, 200,000 for option B, and 0 for option C. How would I add these features to the dataset? Do I just add 3 more columns (optionaresults, optionbresults, optioncresults)? What I don't understand is whether the model will attribute the results to the correct option, if this makes sense.
  – OultimoCoder
  Mar 24 at 13:35

• @OultimoCoder updated the example
  – Esmailian
  Mar 24 at 13:40

• @Esmailian Ahhhh, thank you. Where I was going wrong was assuming that the features for each label had to be explicit. I didn't fully understand it. Your edits helped explain it better. I'll wait a day before choosing your answer as the correct one.
  – OultimoCoder
  Mar 24 at 13:55
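To make the point from this exchange concrete, the following is a small sketch rather than a definitive implementation: the three per-option Google counts are simply three columns, and the model learns which column goes with which class from the training rows. The data is synthetic and rests on a made-up assumption that the correct option tends to get far more hits. The function names `make_dataset` and `train_and_predict` are hypothetical. Training needs the third-party packages numpy and xgboost, which are imported inside the function so that the data-building part runs without them.

```python
import random

# Parameters quoted in the answer above.
PARAMS = {"objective": "multi:softprob", "num_class": 3}


def make_dataset(n_questions, seed=0):
    """Build synthetic wide rows [count_a, count_b, count_c] with 0-based labels.

    Synthetic assumption: the correct option gets many more search hits.
    """
    rng = random.Random(seed)
    features, labels = [], []
    for _ in range(n_questions):
        correct = rng.randrange(3)  # column position of the correct option
        counts = [rng.randint(0, 200_000) for _ in range(3)]
        counts[correct] += 1_000_000
        features.append(counts)
        labels.append(correct)
    return features, labels


def train_and_predict(features, labels, new_rows, num_boost_round=20):
    """Train on wide rows and return per-class probabilities for new_rows."""
    # Imported lazily: these are third-party dependencies.
    import numpy as np
    import xgboost as xgb

    dtrain = xgb.DMatrix(np.array(features, dtype=float), label=np.array(labels))
    booster = xgb.train(PARAMS, dtrain, num_boost_round=num_boost_round)
    return booster.predict(xgb.DMatrix(np.array(new_rows, dtype=float)))


features, labels = make_dataset(200)
```

With numpy and xgboost installed, calling `train_and_predict(features, labels, [[1_000_000, 200_000, 0]])` should give one row of three probabilities, one per option, in recent XGBoost versions. Nothing in the code names the options; the association comes only from keeping each option's features in the same columns for every row.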












edited Mar 24 at 14:01
answered Mar 24 at 12:44
Esmailian
