How to transform session-based data into training data?
I am about to build an ML model for an e-commerce shop and would like to hear your thoughts and ideas before I start.
In short: the purpose of the model is to prevent payment fraud. It will be called right before checkout and will predict whether or not a customer will eventually pay for the order. Based on this prediction, certain payment methods (such as invoice) may or may not be offered to the customer.
The workflow will be as follows:
When the customer goes to the checkout for the first time, a session is created. Within this session, the model predicts a risk score every time the customer goes through the checkout. This can happen several times, since the customer can go back and change the basket. Hence, a session consists of several checkout-model-calls.
Obviously, a label can only be calculated for sessions that eventually become an order (e.g. "customer did pay after x weeks").
My question is: how do I turn this session-based data into the best possible training data for my model?
For example, should I use all the checkout-model-calls of a session in combination with the label of the resulting order? Or should I only use the last checkout-model-call, since it is the one that led to the order being placed?
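To make the two options concrete, here is a minimal sketch in plain Python. The record layout (`session_id`, `ts`, `basket_value`), the `paid` mapping, and both function names are assumptions made up for illustration, not an existing schema.

```python
# Hypothetical checkout-model-calls: one dict per call (field names are assumptions).
calls = [
    {"session_id": "s1", "ts": 1, "basket_value": 40.0},
    {"session_id": "s1", "ts": 2, "basket_value": 55.0},
    {"session_id": "s2", "ts": 1, "basket_value": 20.0},
]
# Hypothetical label per session that became an order: True = customer paid.
paid = {"s1": True, "s2": False}


def all_calls_dataset(calls, paid):
    """Option 1: every call of a labeled session becomes a training row."""
    return [dict(c, label=paid[c["session_id"]])
            for c in calls if c["session_id"] in paid]


def last_call_dataset(calls, paid):
    """Option 2: only the latest call (by ts) of each labeled session."""
    last = {}
    for c in calls:
        sid = c["session_id"]
        if sid in paid and (sid not in last or c["ts"] > last[sid]["ts"]):
            last[sid] = c
    return [dict(c, label=paid[c["session_id"]]) for c in last.values()]


rows_all = all_calls_dataset(calls, paid)
rows_last = last_call_dataset(calls, paid)
```

In option 1 every row of a session shares the same label, so rows from one session are not independent of each other; in option 2 each session contributes exactly one row.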
If I use all checkout-model-calls, I fear that training will take a lot of time and that model quality might not be significantly better than if I had used only the last checkout-model-call.
On the other hand, if I only use the last model-call of a session, I might lose valuable information and might not be able to use certain features:
For example, a count of the model-calls within a session. Since the model would only be trained on the last call, it might be confused by the first checkout-model-calls in the live system, as it has never seen such calls in the training data.
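The concern about the call count can be illustrated with a small sketch. The field names and the `add_call_number` helper are hypothetical; the point is only to compare which values of the count feature occur in live traffic versus in a last-call-only training set.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical calls: session s1 has three calls, session s2 has two.
calls = [
    {"session_id": "s1", "ts": 1},
    {"session_id": "s1", "ts": 2},
    {"session_id": "s1", "ts": 3},
    {"session_id": "s2", "ts": 1},
    {"session_id": "s2", "ts": 2},
]


def add_call_number(calls):
    """Return copies of the calls with a 1-based `call_no` per session, ordered by ts."""
    ordered = sorted(calls, key=itemgetter("session_id", "ts"))
    out = []
    for _, group in groupby(ordered, key=itemgetter("session_id")):
        for n, call in enumerate(group, start=1):
            out.append(dict(call, call_no=n))
    return out


numbered = add_call_number(calls)

# Values of the feature the live system would send to the model.
live_values = {c["call_no"] for c in numbered}

# Values present if only the last call of each session is kept for training.
last_by_session = {}
for c in numbered:  # rows are ordered, so later calls overwrite earlier ones
    last_by_session[c["session_id"]] = c
train_values = {c["call_no"] for c in last_by_session.values()}

# Feature values the model is scored on but was never trained on.
unseen = live_values - train_values
```

In this toy data, `call_no == 1` appears in live traffic but never in the last-call-only training rows, which is exactly the mismatch described above.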
Also, how can I make use of the sessions that didn't end up in an order? Or are these useless?
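Whatever the answer turns out to be, the split itself is easy to express. A minimal sketch, assuming hypothetical call records keyed by `session_id` and a `paid` mapping that only contains sessions that became orders; it merely separates the two groups and makes no claim that the unlabeled sessions are usable for training.

```python
# Hypothetical calls; only session s1 became an order and therefore has a label.
calls = [
    {"session_id": "s1", "ts": 1},
    {"session_id": "s1", "ts": 2},
    {"session_id": "s3", "ts": 1},  # session abandoned before an order was placed
]
paid = {"s1": True}


def split_by_label(calls, paid):
    """Separate calls of labeled sessions from calls of sessions without an order."""
    labeled, unlabeled = [], []
    for c in calls:
        (labeled if c["session_id"] in paid else unlabeled).append(c)
    return labeled, unlabeled


labeled_calls, unlabeled_calls = split_by_label(calls, paid)
```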
I hope I was able to describe my case well enough, and I am really curious about your ideas and opinions on this.
Thanks in advance!
machine-learning data training
asked Mar 23 at 15:15 by Siruphuhn; edited Mar 23 at 15:25