Random Forests Feature Selection on Time Series Data

$begingroup$


I have a dataset with N features, each observed at 500 time instances.

Say the features are x, y, v_x, v_y, a_x, a_y, j_x, j_y. One sample consists of 500 instances (rows in a table) for each feature; another sample consists of a different 500 instances, and each sample has a class label.

I'd like to select a subset of the features automatically with the Random Forests algorithm. The problem is that the implementation I'm using (scikit-learn's RandomForestClassifier) accepts only a 2D array as its X input, of shape [N_samples, N_features]. If I pass the data as it is, that is, a vector of length 500 for feature x, another of length 500 for feature y, and so on, I get an N_samples x N_features x 500 array, which is incompatible with the requirements of RandomForestClassifier.

I tried unrolling the matrix into a vector, giving a 500 x N_features array, but then the selection treats every element as an independent feature and breaks my structure.

How can I reduce the features (by selection), keeping the time instances of each feature together? I'd prefer to use this algorithm, but I'm open to other libraries and/or algorithms.

My goal is classification, so forecasting resources are of limited use to me. I also have the constraint that each sample contains all of those instances; unfortunately I don't have them as separate samples.










$endgroup$
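For concreteness, the shape mismatch described in the question can be reproduced with synthetic data; the sizes and names below are illustrative, not taken from the asker's dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_samples, n_features, n_timesteps = 40, 8, 500
X3d = rng.normal(size=(n_samples, n_features, n_timesteps))  # samples x features x time
y = rng.integers(0, 2, size=n_samples)

clf = RandomForestClassifier(n_estimators=10, random_state=0)
try:
    clf.fit(X3d, y)  # scikit-learn only accepts 2D [n_samples, n_features] input
except ValueError as exc:
    print("3D input rejected:", type(exc).__name__)

# Flattening is mechanically valid but treats every time step as an
# independent feature, which is exactly the loss of structure described above.
X_flat = X3d.reshape(n_samples, n_features * n_timesteps)
clf.fit(X_flat, y)
print(X_flat.shape)  # (40, 4000)
```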











  • $begingroup$
    Welcome to this site! If you want to treat 500 values per feature as "all or nothing", i.e. not breaking the structure, one way is to use the average for each feature thus reducing 500 to 1.
    $endgroup$
    – Esmailian
    Mar 28 at 22:16










  • $begingroup$
    But the features carry semantics that are largely lost if I just take the average. I did try something similar: I computed the DTW distance of each feature against the corresponding feature sequence of a target sample (the average of 3-4 target samples), where the target's class is one of the active classes (in a binarized, one-vs-all comparison), and still had no success. On the class I'm interested in I get up to 0.50 precision and 1.00 recall if I leave the difficult class out, and less if I include it.
    $endgroup$
    – user1714647
    Mar 28 at 22:59
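For reference, the DTW distance mentioned in this comment can be computed with a short dynamic program; a minimal sketch (not the commenter's actual code):

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two 1D sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

print(dtw_distance([0, 1, 2], [0, 1, 2]))  # 0.0 for identical sequences
```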










  • $begingroup$
    Can you say something more about what kind of data this is?
    $endgroup$
    – jonnor
    2 days ago










  • $begingroup$
    What is the performance when you flatten the features? Sometimes it actually works fine, with a strong enough model and enough data...
    $endgroup$
    – jonnor
    2 days ago















python scikit-learn time-series feature-selection random-forest






asked Mar 28 at 22:04









user1714647

2 Answers
$begingroup$

Some EDA might be needed to create new features for each time series. You might want to mine for patterns and let the random forest reduce the overfitting. Exactly how the mining is done depends on the nature of the problem, which might point to things like:

  • interesting time periods,
  • events that happen at a particular time,
  • time lags between different series,
  • dynamical systems,
  • latent variables,
  • heteroscedasticity.

Breiman's landmark paper on random forests gives some theoretical guarantees that a random forest works well when the individual classifiers are good and the correlation between them is low. This can also serve as a heuristic for pruning features.

$endgroup$



answered Mar 29 at 3:38
Yee Sern Tan
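One way to act on this advice is to engineer a few summary statistics per series and then group the resulting importances back by original feature, so selection happens at the level of whole time series. A sketch under those assumptions (the statistics and sizes chosen here are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_samples, n_features, n_timesteps = 60, 8, 500
X3d = rng.normal(size=(n_samples, n_features, n_timesteps))
y = rng.integers(0, 3, size=n_samples)

# Engineer a few per-series summary features so each original feature
# contributes a small, named group of columns instead of 500 raw values.
stats = {
    "mean": X3d.mean(axis=2),
    "std": X3d.std(axis=2),
    "min": X3d.min(axis=2),
    "max": X3d.max(axis=2),
}
X2d = np.concatenate(list(stats.values()), axis=1)  # (n_samples, 4 * n_features)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X2d, y)

# Sum the importances of the columns derived from the same original feature.
imp = clf.feature_importances_.reshape(len(stats), n_features).sum(axis=0)
ranking = np.argsort(imp)[::-1]
print("feature ranking:", ranking)
```

Libraries such as tsfresh automate this kind of per-series feature extraction at scale, if hand-picking statistics becomes unwieldy.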

$begingroup$

If you want to preserve and utilize the 2D structure, use something like a Convolutional Neural Network. Feature selection can then be done with L1 regularization. Otherwise you will have to do the feature engineering outside the classifier.

This 2D structure, with one axis being time, is quite similar to the spectrograms used in audio, where CNNs are frequently applied. So check out the literature on Acoustic Event Recognition and Acoustic Scene Classification for more details.

$endgroup$



answered 2 days ago
jonnor
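Independently of any particular deep-learning library, the way a convolution respects the feature-by-time layout can be illustrated with a single 1D convolution over the time axis in plain NumPy; sizes and filter counts below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_timesteps, n_filters, kernel = 8, 500, 4, 7

x = rng.normal(size=(n_features, n_timesteps))        # one sample: channels x time
w = rng.normal(size=(n_filters, n_features, kernel))  # each filter spans all channels

# 1D convolution over time: each output step mixes all features within a
# short window, so the per-feature time structure is used, not flattened away.
out_len = n_timesteps - kernel + 1
out = np.empty((n_filters, out_len))
for f in range(n_filters):
    for t in range(out_len):
        out[f, t] = np.sum(w[f] * x[:, t:t + kernel])

# Global average pooling yields a fixed-size descriptor per sample that a
# classifier (or an L1-regularized layer) can consume.
descriptor = out.mean(axis=1)
print(descriptor.shape)  # (4,)
```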
