How to correctly apply the same data transformation, used on the training dataset, to real data in a webservice?


Let's say I used MinMaxScaler while creating my model.
Now I'm loading that model via pickle in a Flask app. Upon receiving a request containing a data point, I would like to apply the same transformations to it that I applied to my training dataset, before calling the predict() method. How do I transfer that set of transformations from the training script to the webservice?










Tags: machine-learning, data






edited Mar 27 at 3:23 by Ethan · asked Mar 26 at 13:52 by Blenzus
2 Answers






Rather than storing and loading many separate files, create a scikit-learn transformation pipeline containing all of your transformations, and save that single pipeline as a pickle or joblib file.



from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestClassifier
import joblib  # sklearn.externals.joblib is deprecated; import joblib directly

pipeline = Pipeline([
    ('normalization', MinMaxScaler()),
    ('classifier', RandomForestClassifier())
])

pipeline.fit(X_train, y_train)  # fits the scaler and the classifier together
joblib.dump(pipeline, 'transform_predict.joblib')


You can then load that one pipeline and call predict(); the pipeline applies every fitted transformation to the input data before producing predictions:



import joblib

pipeline = joblib.load('transform_predict.joblib')
predictions = pipeline.predict(new_data)





answered Mar 26 at 14:40 by Dan Carter (edited Mar 26 at 14:45)
Comments:

– Blenzus (Mar 26 at 14:43): Thanks, this is what I was looking for.

– Blenzus (Mar 26 at 14:55): Does this apply to dummy variables?

– Dan Carter (Mar 26 at 15:34): If you're using scikit-learn's OneHotEncoder, then yes. Any scikit-learn 'transformer' can be used in a pipeline, i.e. anything that implements TransformerMixin and BaseEstimator: github.com/scikit-learn/scikit-learn/blob/7b136e9/sklearn/… This also means you can create your own custom 'transformers' to add to a pipeline, by implementing these in the same way.
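The custom-transformer idea from the last comment can be sketched as follows; the `LogTransformer` name and its log1p behavior are illustrative assumptions, not something from the thread:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

class LogTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical custom transformer: applies log1p to every feature."""
    def fit(self, X, y=None):
        return self  # stateless: nothing to learn from the data

    def transform(self, X):
        return np.log1p(X)

# It drops into a Pipeline exactly like a built-in transformer:
pipeline = Pipeline([
    ('log', LogTransformer()),
    ('scale', MinMaxScaler()),
])

X = np.array([[0.0, 1.0], [3.0, 7.0]])
print(pipeline.fit_transform(X))  # log1p first, then per-column min-max scaling
```

Because the whole pipeline (custom step included) is a single object, it can be dumped and loaded with joblib just like in the answer above.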


















You need to save the MinMaxScaler along with the model. In the Flask app, you can then:

1. Load the scaler from file

2. Use that scaler instance to scale the input values


# While training

import joblib  # sklearn.externals.joblib is deprecated; import joblib directly

scaler_filename = "saved_scaler"
joblib.dump(scaler, scaler_filename)


# In the Flask app

import joblib

scaler_filename = "saved_scaler"
scaler = joblib.load(scaler_filename)
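Putting the two halves together, a minimal end-to-end sketch might look like this. The route name, JSON shape, file names, and the toy LogisticRegression/MinMaxScaler pair are all assumptions for illustration, not from the answer:

```python
import joblib
import numpy as np
from flask import Flask, jsonify, request
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler

# --- training side (normally a separate script) ---
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0, 0, 1, 1])
scaler = MinMaxScaler().fit(X_train)
model = LogisticRegression().fit(scaler.transform(X_train), y_train)
joblib.dump(scaler, "saved_scaler")
joblib.dump(model, "saved_model")

# --- webservice side ---
app = Flask(__name__)
model = joblib.load("saved_model")   # load once at startup, not per request
scaler = joblib.load("saved_scaler")

@app.route("/predict", methods=["POST"])
def predict():
    # Assumed request body: {"features": [1.0, 2.0, ...]}
    features = np.array(request.get_json()["features"]).reshape(1, -1)
    scaled = scaler.transform(features)  # same transformation as in training
    return jsonify({"prediction": model.predict(scaled).tolist()})
```

The key point is that transform (not fit_transform) is called at serving time, so the scaler's parameters learned during training are reused unchanged.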






answered Mar 26 at 14:08 by Shamit Verma

Comments:

– Blenzus (Mar 26 at 14:17): Do I need to do this for every normalization object I use? I'll be loading many files into memory; isn't there a way to load something that contains every step of the data transformations?

– Shamit Verma (Mar 26 at 14:33): You can save and load all scalers at the same time. Example: stackoverflow.com/questions/33497314/…
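One way to save several fitted transformation objects in a single file, as the comment suggests, is to dump a dict of them with joblib (a sketch; the key names and file name are my own assumptions):

```python
import joblib
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# Fit each transformation object once, then persist them all together
scalers = {
    "minmax": MinMaxScaler().fit(X),
    "standard": StandardScaler().fit(X),
}
joblib.dump(scalers, "all_scalers.joblib")

# Later (e.g. in the Flask app), a single load restores every step
restored = joblib.load("all_scalers.joblib")
scaled = restored["minmax"].transform(X)
```

A Pipeline (as in the other answer) is usually the cleaner option, since it also fixes the order in which the transformations are applied.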










