Is it OK to try to find the best PCA k parameter as we do with other hyperparameters?



Principal Component Analysis (PCA) is used to reduce n-dimensional data to k-dimensional data, typically to speed things up in machine learning. After PCA is applied, one can check how much of the original dataset's variance is retained in the reduced dataset. A common goal is to retain between 90% and 99% of the variance.



My question is: is it considered good practice to try different values of the parameter k (the dimensionality of the reduced dataset) and then check the resulting models against a cross-validation set, in the same way we pick good values for other hyperparameters such as regularization lambdas and thresholds?
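
For concreteness, here is a minimal sketch of what such a search could look like, assuming scikit-learn; the pipeline, the classifier, and the candidate values of k are all illustrative:

    from sklearn.datasets import make_classification
    from sklearn.pipeline import Pipeline
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV

    # Toy data standing in for a real dataset.
    X, y = make_classification(n_samples=500, n_features=60, random_state=0)

    # Treat the number of components k like any other hyperparameter and let
    # cross-validation pick it jointly with the classifier's regularization.
    pipe = Pipeline([("pca", PCA()), ("clf", LogisticRegression(max_iter=1000))])
    param_grid = {
        "pca__n_components": [5, 10, 20, 50],  # candidate values of k (illustrative)
        "clf__C": [0.1, 1.0, 10.0],            # regularization strength, tuned jointly
    }

    search = GridSearchCV(pipe, param_grid, cv=5).fit(X, y)
    print(search.best_params_)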










machine-learning pca hyperparameter






asked Mar 27 at 18:58 by J. Doe









1 Answer

Your emphasis on using a validation set rather than the training set for selecting $k$ is a good practice and should be followed. However, we can do even better!



The parameter $k$ in $\text{PCA}$ is more special than a general hyper-parameter, because the solution to $\text{PCA}(k)$ is already contained in $\text{PCA}(K)$ for $K > k$: it is simply the first $k$ eigenvectors (those corresponding to the $k$ largest eigenvalues) of $\text{PCA}(K)$. Therefore, instead of running $\text{PCA}(1)$, $\text{PCA}(4)$, $\dots$, $\text{PCA}(K)$ separately on the training data, as we do for a hyper-parameter in general, we only need to run $\text{PCA}(K)$ to have the solution for all $k \in \{1, \dots, K\}$.
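
As an illustration of this nesting property, here is a minimal sketch assuming scikit-learn; the toy data and the component counts are arbitrary:

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))           # toy data, only to illustrate the nesting

    pca_full = PCA(n_components=8).fit(X)    # one fit with the largest K
    pca_small = PCA(n_components=3).fit(X)   # a separate fit with a smaller k

    # The first 3 components of PCA(8) coincide with those of PCA(3) (up to sign).
    print(np.allclose(np.abs(pca_full.components_[:3]),
                      np.abs(pca_small.components_[:3])))   # True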



As a result, the process would be as follows (a code sketch is given after the list):



1. Run $\text{PCA}$ for the largest acceptable $K$ on the training set,

2. Plot, or prepare, ($k$, variance) on the validation set,

3. Select the $k$ that gives the minimum acceptable variance, e.g. 90% or 99%.
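
A minimal sketch of steps 1–3, assuming scikit-learn and NumPy; the toy data, the value of $K$, and the 99% threshold are all illustrative:

    import numpy as np
    from sklearn.decomposition import PCA

    # Toy low-rank data standing in for a real training/validation split.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 60)) + 0.1 * rng.normal(size=(500, 60))
    X_train, X_val = X[:400], X[400:]

    K = 50                                       # largest acceptable number of components
    pca = PCA(n_components=K).fit(X_train)       # step 1: fit once on the training set

    # Step 2: fraction of validation-set variance captured by the first k components.
    X_val_c = X_val - pca.mean_                  # center with the *training* mean
    scores = X_val_c @ pca.components_.T         # projections onto the K components
    cum_var = np.cumsum(np.sum(scores ** 2, axis=0)) / np.sum(X_val_c ** 2)

    # Step 3: smallest k that retains at least 99% of the validation variance
    # (assumes the threshold is reachable within K components).
    k_best = int(np.searchsorted(cum_var, 0.99)) + 1
    print(k_best)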

And N-fold cross-validation would be as follows (see the sketch after this list):



1. Run $\text{PCA}$ for the largest acceptable $K$ on each of the N training folds,

2. Plot, or prepare, ($k$, average of the N variances) on the held-out folds,

3. Select the $k$ that gives the minimum acceptable average variance, e.g. 90% or 99%.
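
The same idea wrapped in N-fold cross-validation, again as a sketch assuming scikit-learn's KFold; X can be the toy matrix from the previous sketch (before splitting), and the fold count, $K$, and threshold are illustrative:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.model_selection import KFold

    def cum_variance(pca, X_held_out):
        """Fraction of held-out variance captured by the first k components, for every k."""
        Xc = X_held_out - pca.mean_
        scores = Xc @ pca.components_.T
        return np.cumsum(np.sum(scores ** 2, axis=0)) / np.sum(Xc ** 2)

    K, curves = 50, []
    for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        pca_k = PCA(n_components=K).fit(X[train_idx])    # step 1: fit on the training fold
        curves.append(cum_variance(pca_k, X[val_idx]))   # step 2: variance on the held-out fold

    avg_curve = np.mean(curves, axis=0)                  # average the N curves
    k_best = int(np.searchsorted(avg_curve, 0.99)) + 1   # step 3: smallest k reaching 99% on average
    print(k_best)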

Also, here is a related post that asks "why do we choose principal components based on maximum variance explained?".






answered Mar 27 at 20:01 by Esmailian, edited Mar 30 at 22:06

– Pedro Henrique Monforte (Mar 28 at 2:36): Is K-PCA the correct name for this? It sounds a bit confusing and reminds me of Kernel Principal Component Analysis (KPCA), which is a non-linear version of PCA.

– Esmailian (Mar 28 at 11:05): @PedroHenriqueMonforte Thanks! Notation updated.










