Given a single discrete data set, how should I divide it into training data and test data?
I have a dataset in libSVM format consisting of 6000 entries, each with 5 indices, and each index has a binary value (0 or 1). Each of the 6000 entries has a label of 1 or 0, and I am trying to use various machine learning algorithms to determine the correct label (0 or 1) given a particular set of 5 indices/values.
For example, consider the following dataset (the real one is 6000 lines):
0 101:1 102:1 103:0 104:1 105:1
0 101:0 102:1 103:0 104:1 105:1
0 101:0 102:1 103:1 104:1 105:1
1 101:1 102:1 103:1 104:1 105:1
1 101:0 102:1 103:0 104:0 105:1
1 101:1 102:1 103:1 104:0 105:0
1 101:0 102:1 103:0 104:0 105:0
For an algorithm that performs binary classification, like xgboost: conceptually, how do I first use my dataset to train the model, and then apply the model to the data?
I ask because xgboost asks for two files, a training data set and a test data set. It seems to me that the algorithm should just require a single full set of data, use all of the data to train and build a model, and then apply that model to the original data set and determine if the labels are being assigned "0 or 1" accurately.
Any help in understanding this concept is much appreciated.
machine-learning xgboost training
asked Apr 10 at 2:39 by jake9115
2 Answers
In machine learning, it is important to evaluate the model you have built on data that was not used for training; this is how you detect overfitting. That is why you must split your data into a training set and a test set. There are many ways to make the split. You can split the data set randomly, so that 80% of the samples are used for training and 20% for testing. You may also want to consider stratified sampling, so that positive labels occur in both the test and training sets. This is especially important if you only have a few positively labeled samples, as you could otherwise easily end up with a test set that contains no positive samples at all. In Python, scikit-learn's train_test_split has a stratify argument that makes the split preserve the class proportions.
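A minimal sketch of such a stratified split with scikit-learn (the file path, the 80/20 ratio, and random_state are just placeholders):
from sklearn.datasets import load_svmlight_file
from sklearn.model_selection import train_test_split
# load the libSVM-format data (placeholder path)
X, y = load_svmlight_file('path/to/libsvm/data')
# 80% training, 20% testing; stratify=y keeps the label proportions the same in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)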
answered Apr 11 at 3:08 by fractalnature
You can also use cross-validation: split your data into $k$ parts, train on $k-1$ of them and test on the remaining one, and repeat until every part has been used for testing. This is called $k$-fold cross-validation; it tells you how trustworthy your model is (from the variance of the results) and how accurate it is (from the mean result). You can also do leave-one-out cross-validation, which is a $k$-fold CV where $k$ equals the size of the dataset.
– Pedro Henrique Monforte, Apr 11 at 3:25
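For reference, a minimal $k$-fold sketch with scikit-learn's cross_val_score (the file path is a placeholder, and XGBClassifier with default settings is only an illustrative choice of estimator):
from sklearn.datasets import load_svmlight_file
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier
# load the libSVM-format data (placeholder path)
X, y = load_svmlight_file('path/to/libsvm/data')
# 5-fold cross-validation: each fold is held out once as the test set
scores = cross_val_score(XGBClassifier(), X, y, cv=5)
print('mean accuracy:', scores.mean(), 'spread:', scores.std())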
@PedroHenriqueMonforte I completely agree, cross-validation is a better route than just using one test set. However, since this individual seems to be new to ML and confused about the general concept, I decided to keep my response limited.
– fractalnature, Apr 11 at 16:15
I understood your approach; that is why I just added a comment for others looking for it to see, rather than editing your answer. I have also upvoted your answer.
– Pedro Henrique Monforte, Apr 11 at 16:17
Assuming you are using Python, an easy way to do this is to use utilities available in scikit-learn:
from sklearn.datasets import load_svmlight_file, dump_svmlight_file
from sklearn.model_selection import train_test_split
# load features and labels
X, y = load_svmlight_file('path/to/libsvm/data')
# split into train/test sets (change test_size if you like)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# write the train & test datasets to disk
dump_svmlight_file(X_train, y_train, 'train.svm')
dump_svmlight_file(X_test, y_test, 'test.svm')
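To then train on the training split and evaluate on the held-out test split, one option is xgboost's scikit-learn wrapper; a minimal sketch (the hyper-parameter values are illustrative, not tuned):
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
# fit on the training split only
model = XGBClassifier(n_estimators=100, max_depth=3)
model.fit(X_train, y_train)
# evaluate on the held-out test split
test_predictions = model.predict(X_test)
print('test accuracy:', accuracy_score(y_test, test_predictions))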
In reference to your comment:
It seems to me that the algorithm should just require a single full set of data, use all of the data to train and build a model, and then apply that model to the original data set and determine if the labels are being assigned "0 or 1" accurately.
I would recommend reading about overfitting. In short, overfitting happens if your model is very good at classifying the data that you used to train the model, but performs poorly on unseen data. If you fit a model to a dataset, and then test the model on the same dataset, you will likely get very optimistic estimates for performance that may lead you to believe that your model is much better than it actually is.
Once you have found a set of hyper-parameters that works well and verified that your model isn't overfitting, you can retrain the model on the full dataset using those hyper-parameters.
Some good references on overfitting:
- Why Is Overfitting Bad in Machine Learning?
- Overfitting in Machine Learning: What It Is and How to Prevent It
answered Apr 10 at 3:09 by timleathart