How to transform session-based data into training data?







I am about to build an ML model for an e-commerce shop and would like to hear some thoughts and ideas before I start.



In short: the purpose of the model is to prevent payment fraud. It will be called right before checkout and will predict whether or not a customer will eventually pay for their order. Based on this prediction, certain payment methods (such as invoice) may or may not be offered to the customer.



The workflow will be as follows: when the customer goes to the checkout for the first time, a session is created. Within this session the model predicts a risk score every time the customer goes through the checkout. This can happen several times, because the customer can go back and change the basket, for example. Hence, a session consists of several checkout-model-calls.



Obviously, a label can only be calculated for sessions that eventually become an order (e.g. "customer did pay after x weeks").



My question is: how do I turn this session-based data into the best possible training data for my model?



For example, should I use all the checkout-model-calls of a session, each combined with the label of the resulting order? Or should I use only the last checkout-model-call, since it is the one that led to the order being placed?
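To make the two options concrete, here is a minimal sketch in plain Python. The records, field names (`session_id`, `call_ts`, `basket_value`, `paid`) and helper functions are all hypothetical and only illustrate how the training rows would differ; they do not describe an existing pipeline.

```python
# Hypothetical log of checkout-model-calls; field names are illustrative only.
calls = [
    {"session_id": "s1", "call_ts": 1, "basket_value": 40.0},
    {"session_id": "s1", "call_ts": 2, "basket_value": 55.0},
    {"session_id": "s2", "call_ts": 1, "basket_value": 20.0},
    {"session_id": "s3", "call_ts": 1, "basket_value": 90.0},  # never became an order
]

# Hypothetical labels, known only for sessions that became an order (1 = paid).
labels = {"s1": 1, "s2": 0}


def all_calls_dataset(calls, labels):
    """Option A: every call of a labelled session inherits the order's label."""
    return [
        {**c, "paid": labels[c["session_id"]]}
        for c in calls
        if c["session_id"] in labels
    ]


def last_call_dataset(calls, labels):
    """Option B: keep only the latest call of each labelled session."""
    last = {}
    for c in calls:
        sid = c["session_id"]
        if sid in labels and (sid not in last or c["call_ts"] > last[sid]["call_ts"]):
            last[sid] = c
    return [{**c, "paid": labels[sid]} for sid, c in last.items()]


rows_a = all_calls_dataset(calls, labels)  # one row per call of s1 and s2
rows_b = last_call_dataset(calls, labels)  # one row per labelled session
```

In option A, sessions with many calls contribute many rows that share one label, so they carry more weight in training. Session `s3` appears in neither dataset because it has no label.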



If I use the data from all checkout-model-calls, I fear that training will take a lot of time and that the model quality might not be significantly better than if I had used only the last checkout-model-call.



On the other hand, if I use only the last model call of a session, I might lose valuable information and might not be able to use certain features, for example a count of the model calls within a session. Because the model would be trained only on last calls, it might be confused by the first checkout-model-calls in the live system, as it has not seen such calls in the training data.
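As an illustration of the call-count feature mentioned above, the following sketch annotates every call with the number of calls its session has made so far. The field names and the helper `add_call_index` are assumptions made for this example. The count uses only information available at the time of each call, so the same feature could be computed in the live system.

```python
from collections import defaultdict

# Hypothetical call log; only the fields needed for the feature are shown.
calls = [
    {"session_id": "s1", "call_ts": 1},
    {"session_id": "s2", "call_ts": 1},
    {"session_id": "s1", "call_ts": 2},
    {"session_id": "s1", "call_ts": 5},
]


def add_call_index(calls):
    """Annotate each call with a 1-based running count of calls in its session."""
    seen = defaultdict(int)
    annotated = []
    # Process calls in time order so the count reflects "calls so far".
    for c in sorted(calls, key=lambda c: c["call_ts"]):
        seen[c["session_id"]] += 1
        annotated.append({**c, "call_index": seen[c["session_id"]]})
    return annotated


rows = add_call_index(calls)
```

If only the last call of each session is used for training, `call_index` takes only final-call values there, while the live system also produces earlier calls with `call_index` equal to 1; this is the mismatch described above.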



Also, how can I make use of the sessions that did not end in an order? Or are these useless?



I hope I was able to describe my case well enough, and I am really curious about your ideas and opinions on this.

Thanks in advance!







Tags: machine-learning, data, training














asked Mar 23 at 15:15 by Siruphuhn (edited Mar 23 at 15:25)



















