How important is the input data for a ML model?
For the last 4-6 weeks I have been learning and working with ML for the first time: reading blogs, articles, documentation, etc. and practising, and I have asked a lot of questions on Stack Overflow as well.
While I have gained some hands-on experience, I still have a very basic doubt:
When I train on an input data set of 1,000 records, the model's prediction accuracy is, say, 75%. When I use 50,000 records, the accuracy drops to 65%.
1) Does that mean the model's behaviour depends entirely on the input data being fed into it?
2) If #1 is true, then in the real world, where we don't control the input data, how will the model work?
Example: for suggesting products to a customer, the input data to the model would be the customers' past purchases. As the quantity of input data increases, will the prediction accuracy increase or decrease?
Please let me know if I need to add further details to my question.
Thanks.
Edit 1 - Added the frequency distribution of my input data below:
Edit 2 - Added the confusion matrix and classification report:
Tags: machine-learning, predictive-modeling, machine-learning-model
Asked Mar 18 at 22:31 by ranit.b; edited 2 days ago
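For reference, a minimal, self-contained sketch of how the frequency distribution, confusion matrix and classification report referenced in the edits above are typically produced with pandas and scikit-learn. The data and the classifier here are synthetic placeholders, not the question's actual dataset or model:

```python
# Hypothetical sketch: producing a class frequency distribution, a confusion
# matrix and a classification report. Data is synthetic, with deliberately
# imbalanced classes, to mirror the kind of output described in the edits.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_classes=3, n_informative=5,
                           weights=[0.7, 0.2, 0.1], random_state=0)

# Frequency distribution of the target classes (cf. Edit 1)
print(pd.Series(y).value_counts())

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Confusion matrix and per-class precision/recall (cf. Edit 2)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```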
Robin Nicole (Mar 18 at 23:00): It looks like your model overfits. Did you try a train/test split?

ranit.b (Mar 18 at 23:07): Thanks Robin. Yes, I have a 75/25 split. Just out of curiosity, may I ask what hint made you think that the model overfits? P.S. I added the frequency distribution of my input data to the question.

Robin Nicole (Mar 18 at 23:15): Then I guess it is your test accuracy which decreases. If your training accuracy keeps increasing but your test accuracy decreases, it means your model is overfitting.

I_Play_With_Data (2 days ago): Would you please consider moving this as a comment on the original question? It works much better that way than as a standalone answer.

ranit.b (2 days ago): Just added the confusion matrix and classification report. Do you think this is a case of class imbalance (where the output classes with less training data are never predicted)?
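Following up on the comments, a hedged sketch of the check Robin Nicole describes: compare accuracy on the 75% training split against accuracy on the held-out 25% test split. The dataset and the decision tree below are illustrative assumptions, not the asker's actual setup:

```python
# Sketch of the overfitting check discussed in the comments: compare accuracy
# on the training split against accuracy on the held-out test split.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=50_000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)   # the 75/25 split mentioned above

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)
print(f"train accuracy: {train_acc:.2%}, test accuracy: {test_acc:.2%}")
# A large gap (e.g. ~100% train vs ~65% test) points to overfitting;
# low accuracy on both splits points to underfitting.
```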
2 Answers
To answer your first question: the accuracy of a model depends heavily on the "quality" of the input data. Your training data should represent the same scenario as the environment in which the final model will be deployed.

There are two possible reasons for the behaviour you describe:

1. When you added more data, there may be no good relationship between the input features and the label for the new examples. It is often said that a small, clean dataset is better than a large, messy one.

2. If the 49,000 records added afterwards come from the same distribution as the original 1,000 (i.e. the relationship between features and label still holds), there are again two possibilities:

A. If accuracy is low on the training set as well as the test set (e.g. training accuracy 70% and test accuracy 65%), you are underfitting: the model is too simple for the data, or the features do not carry enough signal.

B. If your training accuracy is near 100% while test accuracy is 65%, you are overfitting: the model is too complex, so you should use a simpler algorithm or regularize it.

Note: since you haven't mentioned the training accuracy, it is difficult to say which of the two is happening.

Now for your second question about real-world deployment. There is a problem called model staleness, which is the tendency of a model's accuracy to degrade over time. There is an article by a product manager at Google explaining the staleness problem and how it can be addressed; that should answer your second question.

Let me know if something is not clear.

– Sagar Shelke (new contributor, answered 2 days ago)
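To illustrate the A/B distinction in this answer (this sketch is an editorial addition, not part of the original post), one can sweep a model-complexity parameter and watch where training and validation accuracy diverge. A decision tree's `max_depth` on synthetic data is used purely as an example:

```python
# Illustration of the underfitting/overfitting distinction above:
# sweep a model-complexity parameter and compare train vs. validation accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, flip_y=0.1, random_state=0)

depths = np.arange(1, 21)
train_scores, val_scores = validation_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    param_name="max_depth", param_range=depths, cv=5)

for d, tr, va in zip(depths, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"max_depth={d:2d}  train={tr:.2%}  validation={va:.2%}")
# Small depths: both scores low (case A, underfitting).
# Large depths: train approaches 100% while validation stalls or drops
# (case B, overfitting).
```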
ranit.b (2 days ago): Many thanks, Sagar, great advice. Let me reconsider my model from your viewpoint, and I will go through the article you posted later tonight.
There is a myth that more data automatically means better classification. The model also needs to grow in complexity accordingly; otherwise the model just ends up overfitting the data.

Taking a well-chosen random sample of the data is often a better strategy for training a model than feeding in every bit of data we can find.

– abunickabhi (answered 2 days ago)
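As a rough way to check whether more data actually helps a fixed model (again an illustrative addition, not part of the original answer), scikit-learn's `learning_curve` reports training and validation accuracy as the training set grows. The dataset and logistic regression below are assumptions:

```python
# Sketch: how accuracy evolves as the training set grows, to test the
# "more data = better model" assumption for a fixed model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=20_000, flip_y=0.2, random_state=0)

sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.05, 1.0, 5), cv=5)

for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n_train={n:6d}  train={tr:.2%}  validation={va:.2%}")
# If the validation score plateaus early, extra records add little;
# if it keeps climbing, the model can still benefit from more data.
```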
ranit.b (2 days ago): Thanks for sharing your thoughts. I guess I need to understand the overfitting concept more thoroughly.