Is filtering a dataset still a good option if the dataset is very small?













Suppose I have a dataset as follows:

    var1  var2  ...  varN  test1  test2
    x     x     ...  x     good   v.good
    x     x     ...  x     good   bad
    x     x     ...  x     meh    bad
    x     x     ...  x     good   good
    x     x     ...  x     v.bad  bad
    x     x     ...  x     bad    bad
    x     x     ...  x     meh    good
    x     x     ...  x     good   good
    x     x     ...  x     v.bad  good
    x     x     ...  x     good   bad

test2 is a more sophisticated version of test1. I want to know what makes test2 bad when test1 has the value good. To do that, I filtered the data to keep only the rows where test1 is good, and added a binary column Y (1 when test2 is good or v.good, 0 when it is bad).
My dataset becomes:

    var1  var2  ...  varN  test1  test2   Y
    x     x     ...  x     good   v.good  1
    x     x     ...  x     good   bad     0
    x     x     ...  x     good   bad     0
    x     x     ...  x     good   good    1
    x     x     ...  x     good   good    1
    x     x     ...  x     good   bad     0

I did this because it should let me see exactly which changes in var1, ..., varN make the result go from good (test1) to bad (test2), using logistic regression or some heuristic approach.

My question is: does this approach still hold if we have, say, a dataset of 100 observations and the filtering cuts it in half?
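To make the setup concrete, here is a minimal sketch of the filtering and labelling step, plus a rough look at what halving the sample does to uncertainty. It is an illustration under assumptions, not the asker's actual code: the helper names `filter_and_label` and `proportion_std_error` are hypothetical, the var1..varN features are left out because the question only shows them as `x`, the rule for Y is inferred from the filtered table, and the 50/50 class split used for the standard error is an arbitrary choice.

```python
import math

# Toy (test1, test2) pairs mirroring the first table in the question.
# The var1..varN columns are omitted because the question leaves them as "x".
rows = [
    ("good", "v.good"), ("good", "bad"), ("meh", "bad"), ("good", "good"),
    ("v.bad", "bad"), ("bad", "bad"), ("meh", "good"), ("good", "good"),
    ("v.bad", "good"), ("good", "bad"),
]


def filter_and_label(pairs):
    """Keep rows where test1 == 'good' and append the binary target Y.

    Assumption: Y is 1 when test2 is 'good' or 'v.good' and 0 otherwise,
    which is how the filtered table in the question appears to be labelled.
    """
    return [
        (t1, t2, 1 if t2 in ("good", "v.good") else 0)
        for t1, t2 in pairs
        if t1 == "good"
    ]


def proportion_std_error(p, n):
    """Standard error of a sample proportion p estimated from n rows."""
    return math.sqrt(p * (1 - p) / n)


filtered = filter_and_label(rows)
labels = [y for _, _, y in filtered]
share_good = sum(labels) / len(labels)

# Hypothetical sizes from the question: 100 rows before filtering, 50 after.
# p = 0.5 is an assumed class balance, used only for illustration.
se_full = proportion_std_error(0.5, 100)
se_half = proportion_std_error(0.5, 50)
ratio = se_half / se_full  # equals sqrt(2): halving n widens the error by ~41%

print(f"rows kept: {len(filtered)}, share with Y=1: {share_good:.2f}")
print(f"std. error at n=100: {se_full:.3f}, at n=50: {se_half:.3f}, ratio: {ratio:.3f}")
```

The filter itself does not bias the comparison, since it only restricts attention to the test1 == good subgroup. What changes is precision: with half the rows, a simple proportion is estimated with a standard error about √2 times larger, and a logistic regression fitted on var1..varN would have correspondingly fewer observations per coefficient.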























Tags: dataset, feature-engineering, feature-construction
















asked Mar 26 at 17:08
Mohamed Nidabdella



















