When to question output of model





I'm not sure how to ask this without making it sound like a code-review question. At what point should one question whether an algorithm or model has actually been implemented correctly? Getting spot-on results is great, but it also seems highly suspect. What checks can be done to make sure the algorithm or model is implemented correctly? I'm asking because my SVM implementation gives perfect classification, and therefore perfect accuracy, precision, etc.



I am including the code, but feel free to ignore it.



import numpy as np
import pandas as pd
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Assumption: iris_df holds the Iris data with columns Sepal_Length, Sepal_Width,
# Petal_Length, Petal_Width and Class (values such as 'Iris-setosa'). It is rebuilt
# here from scikit-learn's bundled copy so the snippet runs on its own.
iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width'])
iris_df['Class'] = ['Iris-' + iris.target_names[t] for t in iris.target]

# Make a copy of the df
iris_df_copy = iris_df.copy()

# Create a new column, labeled 'T/F', based on the value in the 'Class' column:
# 1 if the class is 'Iris-setosa', 0 otherwise.
iris_df_copy.loc[iris_df_copy.Class == 'Iris-setosa', 'T/F'] = 1
iris_df_copy.loc[iris_df_copy.Class != 'Iris-setosa', 'T/F'] = 0

X_svm = np.array(iris_df_copy[['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width']])
y_svm = np.ravel(iris_df_copy[['T/F']])

# Split the samples into two subsets, use one for training and the other for testing
X_train_svm, X_test_svm, y_train_svm, y_test_svm = train_test_split(X_svm, y_svm, test_size=0.25, random_state=4)

# Instantiate the learning model - Linear SVM
linear_svm = svm.SVC(kernel='linear')

# Fit the model - Linear SVM
linear_svm.fit(X_train_svm, y_train_svm)

# Predict the response - Linear SVM
linear_svm_pred = linear_svm.predict(X_test_svm)

# Confusion matrix and quantitative metrics - Linear SVM
# (np.str was only an alias of the builtin str and has been removed from NumPy)
print("The confusion matrix is: " + str(confusion_matrix(y_test_svm, linear_svm_pred)))
print("The accuracy score is: " + str(accuracy_score(y_test_svm, linear_svm_pred)))
print("The precision is: " + str(precision_score(y_test_svm, linear_svm_pred, average="macro")))
print("The recall is: " + str(recall_score(y_test_svm, linear_svm_pred, average="macro")))










      machine-learning scikit-learn svm






      asked Mar 22 at 22:39









user3727648

          1 Answer

Before you test a new method on a dataset, you need to know what outcome to expect. Ask yourself: "What do I expect from this?"
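One way to pin down what to expect (a swapped-in illustration, not part of the original answer) is to compare the model against a trivial baseline and against the same model trained on shuffled labels. This sketch assumes scikit-learn's bundled Iris data with setosa coded as 1, as in the question; `DummyClassifier` and `cross_val_score` are standard scikit-learn tools. A real signal should beat the baseline clearly, and the shuffled-label score should drop back to roughly baseline level; if it does not, suspect data leakage or a bug.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Setosa-vs-rest target, as in the question (1 = setosa, 0 = other)
X, target = load_iris(return_X_y=True)
y = (target == 0).astype(int)

# Baseline: always predict the majority class (0, "not setosa")
baseline = cross_val_score(DummyClassifier(strategy="most_frequent"), X, y, cv=5).mean()

# The model under test
svm_score = cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean()

# Sanity check: shuffling the labels destroys any real relationship to X
rng = np.random.default_rng(0)
y_shuffled = rng.permutation(y)
shuffled_score = cross_val_score(SVC(kernel="linear"), X, y_shuffled, cv=5).mean()

print(f"baseline={baseline:.3f}  svm={svm_score:.3f}  shuffled={shuffled_score:.3f}")
```

With one setosa for every two non-setosa samples, the majority-class baseline sits at about two thirds, so that is the number any useful classifier has to beat.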



A linear SVM finds the hyperplane (a line in 2-D, a plane in 3-D) that separates the two classes with the widest possible margin.
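For a linear kernel the fitted hyperplane can be read off directly: scikit-learn's `SVC(kernel="linear")` exposes its normal vector as `coef_` and its offset as `intercept_`, and `decision_function` returns w·x + b, whose sign gives the predicted class. A minimal sketch on the setosa-vs-rest task, assuming scikit-learn's bundled Iris data:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, target = load_iris(return_X_y=True)
y = (target == 0).astype(int)  # 1 = setosa, 0 = other

clf = SVC(kernel="linear").fit(X, y)
w = clf.coef_[0]       # normal vector of the separating hyperplane
b = clf.intercept_[0]  # offset of the hyperplane

# Signed score w . x + b; positive values correspond to classes_[1] (setosa)
scores = X @ w + b
manual_pred = np.where(scores > 0, clf.classes_[1], clf.classes_[0])

print("hyperplane weights:", np.round(w, 3), "offset:", round(b, 3))
print("matches decision_function():", np.allclose(scores, clf.decision_function(X)))
print("matches predict():", np.array_equal(manual_pred, clf.predict(X)))
```

Inspecting `w` also shows which features the plane leans on, which is a useful check that the model is using the inputs you think it is.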



If you look at what you are separating (Iris setosa from Iris virginica and Iris versicolor), you'll find that the setosa cluster is perfectly separated from the other two. You can easily draw a separating line on any of the pairwise plots, which is what I have done in the picture below. When the clusters are linearly separable like this, you should expect a linear SVM to separate them perfectly, so perfect accuracy and precision are exactly what you should see.
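The separability claim is easy to check numerically. This sketch assumes scikit-learn's bundled Iris data, where column 2 is petal length: if the largest setosa petal length is below the smallest petal length of the other species, a single threshold already separates the classes, and a linear SVM on all four features should then score 100% on the training data.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, target = load_iris(return_X_y=True)
y = (target == 0).astype(int)  # 1 = setosa, 0 = versicolor or virginica

# A positive gap means one threshold on petal length splits the two groups
petal_length = X[:, 2]
largest_setosa = petal_length[y == 1].max()
smallest_other = petal_length[y == 0].min()
print(f"setosa max petal length: {largest_setosa}, others min: {smallest_other}")

clf = SVC(kernel="linear")
train_acc = clf.fit(X, y).score(X, y)
cv_acc = cross_val_score(clf, X, y, cv=5)
print("training accuracy:", train_acc)
print("5-fold CV accuracy per fold:", cv_acc)
```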
[Image: scatter-plot matrix of the Iris dataset features, with separating lines drawn]
          By Nicoguaro - Own work, CC BY 4.0, https://commons.wikimedia.org/w/index.php?curid=46257808



To see how the SVM does in a harder setting, test it on separating virginica from versicolor, which overlap. Alternatively, generate a dataset of your own from randomly placed Gaussian points.
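Both suggested experiments can be sketched as follows, assuming scikit-learn's bundled Iris data and a synthetic set of two overlapping 2-D Gaussian clouds (the means, spread, seed and sample size are arbitrary illustrative choices). Because the classes overlap in both cases, cross-validated accuracy should fall below 100%, which is what a correctly working classifier ought to show on such data.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# 1) Versicolor (target 1) vs virginica (target 2), the overlapping pair
X, target = load_iris(return_X_y=True)
pair = target != 0                        # drop setosa
X_pair = X[pair]
y_pair = (target[pair] == 2).astype(int)  # 1 = virginica, 0 = versicolor
iris_cv = cross_val_score(SVC(kernel="linear"), X_pair, y_pair, cv=5)

# 2) Synthetic data: two overlapping 2-D Gaussian clouds (illustrative parameters)
rng = np.random.default_rng(42)
n_per_class = 200
class0 = rng.normal(loc=0.0, scale=1.0, size=(n_per_class, 2))
class1 = rng.normal(loc=1.0, scale=1.0, size=(n_per_class, 2))
X_gauss = np.vstack([class0, class1])
y_gauss = np.repeat([0, 1], n_per_class)
gauss_cv = cross_val_score(SVC(kernel="linear"), X_gauss, y_gauss, cv=5)

print("versicolor vs virginica, mean CV accuracy:", iris_cv.mean())
print("overlapping Gaussians, mean CV accuracy:", gauss_cv.mean())
```

Moving the two Gaussian means further apart (or shrinking `scale`) should push the synthetic score toward 100%, which gives a controlled way to see the classifier's accuracy track how separable the data really is.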






                answered Mar 23 at 0:15









Ingolifs
