When to question output of model
$begingroup$
I'm not sure how to ask this without it sounding like a code review question. At what point should you question whether you've actually implemented an algorithm and/or model correctly? Getting spot-on results is great, but it seems highly suspect. Also, what checks can be done to ensure that the algorithm and/or model is implemented correctly? I'm asking because I'm getting perfect classification, and consequently perfect accuracy, precision, etc., with my implementation of SVM.
I am including the code, but feel free to ignore it.
import numpy as np
from sklearn import svm
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Make a copy of the df (iris_df is assumed to be loaded already, with a 'Class' column)
iris_df_copy = iris_df.copy()
# Create a new column, labeled 'T/F', whose value will be based on the value in the 'Class' column. If the value in the
# 'Class' column is 'Iris-setosa', then set the value of the 'T/F' column to 1. If the value in the 'Class' column is
# not 'Iris-setosa', then set the value of the 'T/F' column to 0.
iris_df_copy.loc[iris_df_copy.Class == 'Iris-setosa', 'T/F'] = 1
iris_df_copy.loc[iris_df_copy.Class != 'Iris-setosa', 'T/F'] = 0
X_svm = np.array(iris_df_copy[['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width']])
y_svm = np.ravel(iris_df_copy[['T/F']])
# Split the samples into two subsets, use one for training and the other for testing
X_train_svm, X_test_svm, y_train_svm, y_test_svm = train_test_split(X_svm, y_svm, test_size=0.25, random_state=4)
# Instantiate the learning model - Linear SVM
linear_svm = svm.SVC(kernel='linear')
# Fit the model - Linear SVM
linear_svm.fit(X_train_svm, y_train_svm)
# Predict the response - Linear SVM
linear_svm_pred = linear_svm.predict(X_test_svm)
# Confusion matrix and quantitative metrics - Linear SVM
# (the built-in str() replaces np.str, an alias that was removed in NumPy 1.24)
print("The confusion matrix is: " + str(confusion_matrix(y_test_svm, linear_svm_pred)))
print("The accuracy score is: " + str(accuracy_score(y_test_svm, linear_svm_pred)))
print("The precision is: " + str(precision_score(y_test_svm, linear_svm_pred, average="macro")))
print("The recall is: " + str(recall_score(y_test_svm, linear_svm_pred, average="macro")))
machine-learning scikit-learn svm
$endgroup$
asked Mar 22 at 22:39
user3727648
1 Answer
$begingroup$
You need to know what outcome to expect from a given test on a dataset before you try a new method on it. Ask yourself, 'What do I expect from this?'
A linear SVM finds a hyperplane that cuts through the data to best separate the two classes.
If you look at what you are separating (Iris setosa from Iris virginica and Iris versicolor), you'll find that the clusters themselves are perfectly separated. You can easily draw a separating line on each scatter plot you care to use, and that is what I have done in the picture below. If the clusters are perfectly separated, then the SVM will return a perfectly separated result.
[Image: scatter plots of the Iris data set with separating lines drawn in.] By Nicoguaro - Own work, CC BY 4.0, https://commons.wikimedia.org/w/index.php?curid=46257808
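The separation described above can also be checked numerically instead of by eye. The following is only a rough sketch, and it assumes the copy of the Iris data bundled with scikit-learn (`load_iris`, where label 0 is setosa) rather than the asker's own `iris_df`. It looks at a single feature, petal length, and checks whether a gap exists between setosa and the other two species.

```python
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris data stands in for the asker's iris_df.
iris = load_iris()
X, y = iris.data, iris.target  # target 0 is setosa in this copy

# Column order is sepal length, sepal width, petal length, petal width.
petal_length = X[:, 2]
setosa_max = petal_length[y == 0].max()
others_min = petal_length[y != 0].min()

print(f"largest setosa petal length:      {setosa_max}")
print(f"smallest non-setosa petal length: {others_min}")

# If every setosa flower is shorter than every other flower on this feature,
# a single threshold already separates the classes, so a linear SVM can too.
separable_on_petal_length = setosa_max < others_min
print("separable on petal length alone:", separable_on_petal_length)
```

If the check comes back true, perfect scores on the setosa-versus-rest task are what you should expect, not a sign of a bug.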
To see how the SVM does in a more difficult context, test it on separating virginica from versicolor. Alternatively, generate a dataset of your own from randomly placed Gaussian points.
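Both suggestions can be sketched in a few lines. This is one possible way to set them up, not the only one: it assumes scikit-learn's bundled Iris data (labels 1 and 2 are versicolor and virginica), and the cluster centres, spread, sample counts, and random seeds in the synthetic part are arbitrary choices made for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

# Part 1: the harder Iris task, versicolor vs. virginica.
iris = load_iris()
mask = iris.target != 0  # drop setosa (label 0)
X_hard, y_hard = iris.data[mask], iris.target[mask]

# 5-fold cross-validation gives several scores instead of one lucky split.
cv_scores = cross_val_score(SVC(kernel="linear"), X_hard, y_hard, cv=5)
# Accuracy on the data the model was fitted on, as a separability check.
train_accuracy = SVC(kernel="linear").fit(X_hard, y_hard).score(X_hard, y_hard)
print("versicolor vs. virginica, CV accuracy per fold:", cv_scores)
print("versicolor vs. virginica, training accuracy:", train_accuracy)

# Part 2: two overlapping Gaussian clouds (all parameters are illustrative).
rng = np.random.default_rng(0)
n_per_class = 200
X_a = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(n_per_class, 2))
X_b = rng.normal(loc=[2.0, 0.0], scale=1.0, size=(n_per_class, 2))
X_syn = np.vstack([X_a, X_b])
y_syn = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])

X_train, X_test, y_train, y_test = train_test_split(
    X_syn, y_syn, test_size=0.25, random_state=0, stratify=y_syn
)
syn_accuracy = SVC(kernel="linear").fit(X_train, y_train).score(X_test, y_test)
print("overlapping Gaussians, test accuracy:", syn_accuracy)
```

On problems like these the classes overlap, so the scores should fall short of perfect. If the same pipeline still reported 100% here, that would be a good reason to go looking for a bug such as test data leaking into training.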
$endgroup$
answered Mar 23 at 0:15
Ingolifs