Is filtering a dataset still a good option if the dataset is very small?

Suppose I have a data set as follows:

var1 var2 ... varN test1 test2
x    x    ... x    good  v.good
x    x    ... x    good  bad
x    x    ... x    meh   bad
x    x    ... x    good  good
x    x    ... x    v.bad bad
x    x    ... x    bad   bad
x    x    ... x    meh   good
x    x    ... x    good  good
x    x    ... x    v.bad good
x    x    ... x    good  bad

test2 is a more sophisticated version of test1. I want to know what makes test2 bad when test1 has the value good. To do that, I filtered the data to include only rows where test1 is good, and added a binary column Y (1 when test2 is good or v.good, 0 when it is bad).
My dataset becomes:

var1 var2 ... varN test1 test2  Y
x    x    ... x    good  v.good 1
x    x    ... x    good  bad    0
x    x    ... x    good  bad    0
x    x    ... x    good  good   1
x    x    ... x    good  good   1
x    x    ... x    good  bad    0

I did this because it should let me see exactly which changes in var1, ..., varN make the test go from good to bad, using logistic regression or some heuristic approach.

My question is: does this approach still hold if we have, say, a dataset of 100 observations and the filtering cuts it in half?
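The filtering and labelling step described above can be sketched in plain Python. This is only an illustration under stated assumptions: the `rows` list mirrors the test1/test2 columns of the first example table, var1..varN are left out because the post does not give their values, and the helper name `filter_and_label` and the `PASSING` set are hypothetical names introduced here, not something from the original post.

```python
# Hypothetical rows mirroring the test1/test2 columns of the example table.
# var1..varN are omitted because the post elides their values.
rows = [
    {"test1": "good", "test2": "v.good"},
    {"test1": "good", "test2": "bad"},
    {"test1": "meh", "test2": "bad"},
    {"test1": "good", "test2": "good"},
    {"test1": "v.bad", "test2": "bad"},
    {"test1": "bad", "test2": "bad"},
    {"test1": "meh", "test2": "good"},
    {"test1": "good", "test2": "good"},
    {"test1": "v.bad", "test2": "good"},
    {"test1": "good", "test2": "bad"},
]

# Assumption: "good" and "v.good" both count as a passing test2 (Y = 1).
PASSING = {"good", "v.good"}


def filter_and_label(rows, keep="good"):
    """Keep rows whose test1 equals `keep` and add a binary Y derived from test2."""
    return [
        {**row, "Y": 1 if row["test2"] in PASSING else 0}
        for row in rows
        if row["test1"] == keep
    ]


filtered = filter_and_label(rows)
for row in filtered:
    print(row["test1"], row["test2"], row["Y"])
```

The resulting Y column is what a logistic regression on var1, ..., varN would then try to predict; the sketch stops before any model fitting, since that depends on the actual feature values.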
















      dataset feature-engineering feature-construction
















      asked Mar 26 at 17:08









      Mohamed Nidabdella


























