How to measure the similarity between two images?












I have two groups of images, one of cats and one of dogs, and each group contains 2000 images.

My goal is to cluster the images using k-means.

Assume image1 is x and image2 is y. To cluster, I need to measure the similarity between any two images. What is the common way to measure the similarity between two images?







machine-learning k-means similarity image
















asked Apr 5 at 0:36 by jason











  • You can use Siamese networks; see “Face Recognition from Scratch using Siamese Networks and TensorFlow” by Shubham Panchal (link.medium.com/HT66TqDdCV). They output a similarity score, which is binary.
    – Shubham Panchal, Apr 5 at 2:15











  • Thanks, that is a really cool method. Is there any other traditional machine learning method, i.e. one that does not use a neural network?
    – jason, Apr 5 at 2:31


























1 Answer






























Check this handout!

Well, there are a few, so let's go:

Given two images $J[x,y]$ and $I[x,y]$ with $(x,y) \in \mathbb{N}^{N \times M}$...

A - Used in template matching:

Template Matching is linear and not invariant to rotation (actually not even robust to it), but it is pretty simple and robust to noise such as the noise in photographs taken with low illumination.

You can easily implement these using OpenCV Template Matching. Below are the mathematical equations defining some of the similarity measures (adapted for comparing two equal-sized images) used by cv2.matchTemplate:



1 - Sum of Squared Differences

$$ S_{sq} = \sum_{(n,m) \in \mathbb{N}^{M \times N}} \bigl(J[n,m] - I[n,m]\bigr)^2 $$

This can be normalized as
$$ \frac{S_{sq}}{\sqrt{\sum J[n,m]^2 \times \sum I[n,m]^2}} $$

2 - Cross-Correlation

$$ C_{crr} = \sum_{(n,m) \in \mathbb{N}^{M \times N}} \bigl(J[n,m] \times I[n,m]\bigr)^2 $$

This can be normalized as
$$ \frac{C_{crr}}{\sqrt{\sum J[n,m]^2 \times \sum I[n,m]^2}} $$
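As a quick illustration, the two measures above can be computed directly in NumPy for equal-sized images. This is a minimal sketch with made-up toy arrays; cv2.matchTemplate exposes the same family of scores through its TM_SQDIFF_NORMED and TM_CCORR_NORMED modes:

```python
import numpy as np

def ssd(J, I):
    # Sum of squared differences: 0 for identical images, grows with dissimilarity
    return float(np.sum((J.astype(float) - I.astype(float)) ** 2))

def ncc(J, I):
    # Normalized cross-correlation: 1.0 for identical (non-zero) images
    J, I = J.astype(float), I.astype(float)
    return float(np.sum(J * I) / np.sqrt(np.sum(J ** 2) * np.sum(I ** 2)))

# toy 2x2 "images"
a = np.array([[1., 2.], [3., 4.]])
b = np.array([[1., 2.], [3., 5.]])
print(ssd(a, b))  # 1.0
print(ncc(a, a))  # 1.0
```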



B - Image descriptors/feature detectors:

Many descriptors have been developed for images. Their main use is to register images/objects and search for them in other scenes, but they still offer a lot of information about the image; they have been used in pupil detection (A joint cascaded framework for simultaneous eye detection and eye state estimation), and I have even seen them used for lip reading (I can't direct you to that work since I am not sure it has been published yet).

They detect points that can be considered features of the image (relevant points). The local texture of these points, or even their geometrical position relative to each other, can be used as features.

You can learn more about this in Stanford's image processing classes (check the handouts for classes 12, 13, and 14). If you want to keep researching computer vision, I recommend you check the whole course, and maybe Rich Radke's classes on Digital Image Processing and Computer Vision for Visual Effects; there is a lot of information there that can be useful for the hard-working computer-vision path you're trying to take.



1 - SIFT and SURF:

These are scale-invariant methods. SURF is a sped-up, open version of SIFT; SIFT itself is proprietary.

2 - BRIEF, BRISK and FAST:

These are binary descriptors and are really fast (mainly on processors with a popcount instruction), and they can be used in a similar way to SIFT and SURF. Also, I've used BRIEF features as substitutes for template matching in facial landmark detection, with a large gain in speed and no loss in accuracy for both the IPD and the KIPD classifiers, although I haven't published any of it yet (this is just an incremental observation for future articles, so I don't think there is harm in sharing).
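For a sense of why binary descriptors are so fast: comparing two of them boils down to a Hamming distance, i.e. a XOR followed by a popcount. A minimal NumPy sketch (the descriptors here are made up for illustration; in practice they would come from cv2's BRIEF/BRISK extractors):

```python
import numpy as np

def hamming(d1, d2):
    # Hamming distance between two binary descriptors packed as uint8 bytes
    return int(np.unpackbits(np.bitwise_xor(d1, d2)).sum())

# two hypothetical 256-bit (32-byte) BRIEF-style descriptors
d1 = np.zeros(32, dtype=np.uint8)
d2 = d1.copy()
d2[0] = 0b00000111  # flip three bits in the first byte
print(hamming(d1, d2))  # 3
```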



3 - Histogram of Oriented Gradients (HoG):



This is rotation invariant and is used for face detection...



C - Convolutional Neural Networks:

I know you don't want to use NNs, but I think it is fair to point out that they are really powerful. Training a CNN with triplet loss can be really nice for learning a representative feature space for clustering (and classification).

Check Wesley's GitHub for an example of its power in facial recognition, using triplet loss to get features and then an SVM to classify.

Also, if your problem with deep learning is the computational cost, you can easily find layers pre-trained on cats and dogs around.

D - Check on previous work:

This cats-and-dogs fight has been going on for a long time... you can check solutions from Kaggle competitions (forums and kernels); there were two on cats and dogs: This One and That One.



E - Famous Measures:

  • SSIM (Structural Similarity Index)

  • L2 norm (or Euclidean distance)

  • Mahalanobis distance
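SSIM compares luminance, contrast, and structure. The sketch below computes a single-window (global) SSIM in NumPy just to show the formula; in practice scikit-image's structural_similarity, which averages the measure over a sliding window, is the usual implementation:

```python
import numpy as np

def global_ssim(x, y, L=1.0):
    # Single-window SSIM with the standard stabilizing constants;
    # L is the dynamic range of the pixel values (e.g. 1.0 or 255)
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

img = np.random.default_rng(0).random((8, 8))
print(round(float(global_ssim(img, img)), 6))  # 1.0 (image vs. itself)
```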

F - Check on other kinds of features:

  • Cats and dogs can be easy to identify by their ears and noses... size too, but I have had cats as big as dogs, so size is not really that safe to use.

  • You can try segmenting the images into animal and background and then doing region-property analysis...

  • Also, check this image similarity metrics toolkit page; it is in C, but...

  • Check this paper on image similarity.

  • Take a look at this Stack Overflow question and this ResearchGate one.

  • If you have the time, the book Feature Extraction & Image Processing for Computer Vision by Mark S. Nixon has much information on this kind of procedure.

  • You can try Fisher discriminant analysis and PCA to create a mapping, and then evaluate with the Mahalanobis distance or the L2 norm.
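As a rough sketch of that last bullet, assuming each image is flattened into a row vector: project with PCA and measure L2 distances in the reduced space. sklearn's PCA and KMeans would be the usual tools; this NumPy version just shows the mapping, with random data standing in for real images:

```python
import numpy as np

def pca_embed(X, k=2):
    # Project flattened images (n_samples x n_pixels) onto the top-k
    # principal components via SVD of the centered data matrix
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

rng = np.random.default_rng(0)
X = rng.random((10, 64))        # ten hypothetical flattened 8x8 images
Z = pca_embed(X, k=2)
print(Z.shape)                  # (10, 2)

# L2 distance in the reduced space is the similarity measure k-means would use
print(float(np.linalg.norm(Z[0] - Z[1])) >= 0.0)  # True
```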



















    Your Answer








    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "557"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader:
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    ,
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );













    draft saved

    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f48642%2fhow-to-measure-the-similarity-between-two-images%23new-answer', 'question_page');

    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    4












    $begingroup$

    Check this handout!



    Well, there a few so... lets go:



    Given two images $J[x,y]$ and $I[x,y]$ with $(x,y) in N^N times M$...



    A - Used in template matching:



    Template Matching is linear and is not invariant to rotation (actually not even robust to it) but it is pretty simple and robust to noise such as the ones in photography taken with low illumination.



    You can easily implement these using OpenCV Template Matching. Bellow there are mathematical equations defining some of the similarity measures (adapted for comparing 2 equal sized images) used by cv2.matchTemplate:



    1 - Sum Square Difference



    $$ S_sq = sum_(n,m) in N^M times N bigr(J[n,m] - I[n,m]bigl)^2$$



    This can be normalized as
    $$ fracS_sqsqrtsum J[n,m]^2 times sum I[n,m]^2 $$



    2 - Cross-Correlation



    $$ C_crr = sum_(n,m) in N^M times N bigr(J[n,m] times I[n,m]bigl)^2$$



    This can be normalized as
    $$ fracC_crrsqrtsum J[n,m]^2 times sum I[n,m]^2 $$



    B - Image descriptors/feature detectors:



    Many descriptors were developed for images, their main use is to register images/objects and search for them in other scenes. But, still they offer a lot of information about the image and were used in pupil detection (A joint cascaded framework for simultaneous eye detection and eye state estimation) and even seem it used for lip reading (can't direct you to it since I am not sure it was already published)



    They detect points that can be considered as features in images (relevant points) the local texture of these points or even their geometrical position to each other can be used as features.



    You can learn more about it in Stanford's Image Processing Classes (check handouts for classes 12,13 and 14, if you want to keep research on Computer vision I recomend you check the whole course and maybe Rich Radke classes on Digital Image Processing and Computer Vision for Visual Effects, there is a lot of information there that can be useful for this hard working computer vision style you're trying to take)



    1 - SIFT and SURF:



    These are Scale Invariant methods, SURF is a speed-up and open version of SIFT, SIFT is proprietary.



    2 - BRIEF, BRISK and FAST:



    These are binary descriptors and are really fast (mainly on processors with a pop_count instruction) and can be used in a similar way to SIFT and SURF. Also, I've used BRIEF features as substitutes on template matching for Facial Landmark Detection with high gain on speed and no loss on accuracy for both the IPD and the KIPD classifiers, although I didn't publish any of it yet (and this is just an incremental observation on the future articles so I don't think there is harm in sharing).



    3 - Histogram of Oriented Gradients (HoG):



    This is rotation invariant and is used for face detection...



    C - Convolutional Neural Networks:



    I know you don't want to used NN's but I think it is fair to point they are REALLY POWERFULL, training a CNN with Triplet Loss can be really nice for learning a representative feature space for clustering (and classification).



    Check Wesley's GitHub for a example of it's power in facial recognition using Triplet Loss to get features and then SVM to classify.



    Also, if your problem with Deep Learning is computational cost, you can easily find pre-trained layers with cats and dogs around.



    D - Check on previous work:



    This cats and dogs fight has been going on for a long time... you can check solutions on Kaggle Competitions (Forum and Kernels), there were 2 on cats and dogs This One and That One



    E - Famous Measures:




    • SSIM Structural similarity Index

    • L2 Norm (Or Euclidean Distance)

    • Mahalanobis Distance

    F - Check on other kind of features



    • Cats and dogs can be a easy to identify by their ears and nose... size too but I had cats as big as dogs... so not really that safe to use size.


    • But you can try segmenting the images into animals and background and then try to do region property analisys...


    • Also, check on this image similarity metrics toolkit page it is in C but...


    • Check this paper on image similarity


    • Take a look on this Stack Overflow question and this Research Gate one


    • If you have the time, this book here: Feature Extraction & Image Processing for Computer Vision from Mark S. Nixon have much information on this kind of procedure


    • You can try Fisher Discriminant Analysis and PCA to create a mapping and the evaluate with Mahalanobis Distance or L2 Norm






    share|improve this answer









    $endgroup$

















      4












      $begingroup$

      Check this handout!



      Well, there a few so... lets go:



      Given two images $J[x,y]$ and $I[x,y]$ with $(x,y) in N^N times M$...



      A - Used in template matching:



      Template Matching is linear and is not invariant to rotation (actually not even robust to it) but it is pretty simple and robust to noise such as the ones in photography taken with low illumination.



      You can easily implement these using OpenCV Template Matching. Bellow there are mathematical equations defining some of the similarity measures (adapted for comparing 2 equal sized images) used by cv2.matchTemplate:



      1 - Sum Square Difference



      $$ S_sq = sum_(n,m) in N^M times N bigr(J[n,m] - I[n,m]bigl)^2$$



      This can be normalized as
      $$ fracS_sqsqrtsum J[n,m]^2 times sum I[n,m]^2 $$



      2 - Cross-Correlation



      $$ C_crr = sum_(n,m) in N^M times N bigr(J[n,m] times I[n,m]bigl)^2$$



      This can be normalized as
      $$ fracC_crrsqrtsum J[n,m]^2 times sum I[n,m]^2 $$



      B - Image descriptors/feature detectors:



      Many descriptors were developed for images, their main use is to register images/objects and search for them in other scenes. But, still they offer a lot of information about the image and were used in pupil detection (A joint cascaded framework for simultaneous eye detection and eye state estimation) and even seem it used for lip reading (can't direct you to it since I am not sure it was already published)



      They detect points that can be considered as features in images (relevant points) the local texture of these points or even their geometrical position to each other can be used as features.



      You can learn more about it in Stanford's Image Processing Classes (check handouts for classes 12,13 and 14, if you want to keep research on Computer vision I recomend you check the whole course and maybe Rich Radke classes on Digital Image Processing and Computer Vision for Visual Effects, there is a lot of information there that can be useful for this hard working computer vision style you're trying to take)



      1 - SIFT and SURF:



      These are Scale Invariant methods, SURF is a speed-up and open version of SIFT, SIFT is proprietary.



      2 - BRIEF, BRISK and FAST:



      These are binary descriptors and are really fast (mainly on processors with a pop_count instruction) and can be used in a similar way to SIFT and SURF. Also, I've used BRIEF features as substitutes on template matching for Facial Landmark Detection with high gain on speed and no loss on accuracy for both the IPD and the KIPD classifiers, although I didn't publish any of it yet (and this is just an incremental observation on the future articles so I don't think there is harm in sharing).



      3 - Histogram of Oriented Gradients (HoG):



      This is rotation invariant and is used for face detection...



      C - Convolutional Neural Networks:



      I know you don't want to used NN's but I think it is fair to point they are REALLY POWERFULL, training a CNN with Triplet Loss can be really nice for learning a representative feature space for clustering (and classification).



      Check Wesley's GitHub for a example of it's power in facial recognition using Triplet Loss to get features and then SVM to classify.



      Also, if your problem with Deep Learning is computational cost, you can easily find pre-trained layers with cats and dogs around.



      D - Check on previous work:



      This cats and dogs fight has been going on for a long time... you can check solutions on Kaggle Competitions (Forum and Kernels), there were 2 on cats and dogs This One and That One



      E - Famous Measures:




      • SSIM Structural similarity Index

      • L2 Norm (Or Euclidean Distance)

      • Mahalanobis Distance

      F - Check on other kind of features



      • Cats and dogs can be a easy to identify by their ears and nose... size too but I had cats as big as dogs... so not really that safe to use size.


      • But you can try segmenting the images into animals and background and then try to do region property analisys...


      • Also, check on this image similarity metrics toolkit page it is in C but...


      • Check this paper on image similarity


      • Take a look on this Stack Overflow question and this Research Gate one


      • If you have the time, this book here: Feature Extraction & Image Processing for Computer Vision from Mark S. Nixon have much information on this kind of procedure


      • You can try Fisher Discriminant Analysis and PCA to create a mapping and the evaluate with Mahalanobis Distance or L2 Norm






      share|improve this answer









      $endgroup$















        4












        4








        4





        $begingroup$

        Check this handout!



        Well, there a few so... lets go:



        Given two images $J[x,y]$ and $I[x,y]$ with $(x,y) in N^N times M$...



        A - Used in template matching:



        Template Matching is linear and is not invariant to rotation (actually not even robust to it) but it is pretty simple and robust to noise such as the ones in photography taken with low illumination.



        You can easily implement these using OpenCV Template Matching. Bellow there are mathematical equations defining some of the similarity measures (adapted for comparing 2 equal sized images) used by cv2.matchTemplate:



        1 - Sum Square Difference



        $$ S_sq = sum_(n,m) in N^M times N bigr(J[n,m] - I[n,m]bigl)^2$$



        This can be normalized as
        $$ fracS_sqsqrtsum J[n,m]^2 times sum I[n,m]^2 $$



        2 - Cross-Correlation



        $$ C_crr = sum_(n,m) in N^M times N bigr(J[n,m] times I[n,m]bigl)^2$$



        This can be normalized as
        $$ fracC_crrsqrtsum J[n,m]^2 times sum I[n,m]^2 $$



        B - Image descriptors/feature detectors:



        Many descriptors were developed for images, their main use is to register images/objects and search for them in other scenes. But, still they offer a lot of information about the image and were used in pupil detection (A joint cascaded framework for simultaneous eye detection and eye state estimation) and even seem it used for lip reading (can't direct you to it since I am not sure it was already published)



        They detect points that can be considered as features in images (relevant points) the local texture of these points or even their geometrical position to each other can be used as features.



        You can learn more about it in Stanford's Image Processing Classes (check handouts for classes 12,13 and 14, if you want to keep research on Computer vision I recomend you check the whole course and maybe Rich Radke classes on Digital Image Processing and Computer Vision for Visual Effects, there is a lot of information there that can be useful for this hard working computer vision style you're trying to take)



        1 - SIFT and SURF:



        These are Scale Invariant methods, SURF is a speed-up and open version of SIFT, SIFT is proprietary.



        2 - BRIEF, BRISK and FAST:



        These are binary descriptors and are really fast (mainly on processors with a pop_count instruction) and can be used in a similar way to SIFT and SURF. Also, I've used BRIEF features as substitutes on template matching for Facial Landmark Detection with high gain on speed and no loss on accuracy for both the IPD and the KIPD classifiers, although I didn't publish any of it yet (and this is just an incremental observation on the future articles so I don't think there is harm in sharing).



        3 - Histogram of Oriented Gradients (HoG):



        This is rotation invariant and is used for face detection...



        C - Convolutional Neural Networks:



        I know you don't want to used NN's but I think it is fair to point they are REALLY POWERFULL, training a CNN with Triplet Loss can be really nice for learning a representative feature space for clustering (and classification).



        Check Wesley's GitHub for a example of it's power in facial recognition using Triplet Loss to get features and then SVM to classify.



        Also, if your problem with Deep Learning is computational cost, you can easily find pre-trained layers with cats and dogs around.



        D - Check on previous work:



        This cats and dogs fight has been going on for a long time... you can check solutions on Kaggle Competitions (Forum and Kernels), there were 2 on cats and dogs This One and That One



        E - Famous Measures:




        • SSIM Structural similarity Index

        • L2 Norm (Or Euclidean Distance)

        • Mahalanobis Distance

        F - Check on other kind of features



        • Cats and dogs can be a easy to identify by their ears and nose... size too but I had cats as big as dogs... so not really that safe to use size.


        • But you can try segmenting the images into animals and background and then try to do region property analisys...


        • Also, check on this image similarity metrics toolkit page it is in C but...


        • Check this paper on image similarity


        • Take a look on this Stack Overflow question and this Research Gate one


        • If you have the time, this book here: Feature Extraction & Image Processing for Computer Vision from Mark S. Nixon have much information on this kind of procedure


        • You can try Fisher Discriminant Analysis and PCA to create a mapping and the evaluate with Mahalanobis Distance or L2 Norm






        share|improve this answer









        $endgroup$



        Check this handout!



        Well, there a few so... lets go:



        Given two images $J[x,y]$ and $I[x,y]$ with $(x,y) in N^N times M$...



        A - Used in template matching:



        Template Matching is linear and is not invariant to rotation (actually not even robust to it) but it is pretty simple and robust to noise such as the ones in photography taken with low illumination.



        You can easily implement these using OpenCV Template Matching. Bellow there are mathematical equations defining some of the similarity measures (adapted for comparing 2 equal sized images) used by cv2.matchTemplate:



        1 - Sum Square Difference



        $$ S_sq = sum_(n,m) in N^M times N bigr(J[n,m] - I[n,m]bigl)^2$$



        This can be normalized as
        $$ fracS_sqsqrtsum J[n,m]^2 times sum I[n,m]^2 $$



        2 - Cross-Correlation



        $$ C_crr = sum_(n,m) in N^M times N bigr(J[n,m] times I[n,m]bigl)^2$$



        This can be normalized as
        $$ fracC_crrsqrtsum J[n,m]^2 times sum I[n,m]^2 $$



        B - Image descriptors/feature detectors:



        Many descriptors were developed for images, their main use is to register images/objects and search for them in other scenes. But, still they offer a lot of information about the image and were used in pupil detection (A joint cascaded framework for simultaneous eye detection and eye state estimation) and even seem it used for lip reading (can't direct you to it since I am not sure it was already published)



        They detect points that can be considered as features in images (relevant points) the local texture of these points or even their geometrical position to each other can be used as features.



        You can learn more about it in Stanford's Image Processing Classes (check handouts for classes 12,13 and 14, if you want to keep research on Computer vision I recomend you check the whole course and maybe Rich Radke classes on Digital Image Processing and Computer Vision for Visual Effects, there is a lot of information there that can be useful for this hard working computer vision style you're trying to take)



        1 - SIFT and SURF:



        These are Scale Invariant methods, SURF is a speed-up and open version of SIFT, SIFT is proprietary.



        2 - BRIEF, BRISK and FAST:



        These are binary descriptors and are really fast (mainly on processors with a pop_count instruction) and can be used in a similar way to SIFT and SURF. Also, I've used BRIEF features as substitutes on template matching for Facial Landmark Detection with high gain on speed and no loss on accuracy for both the IPD and the KIPD classifiers, although I didn't publish any of it yet (and this is just an incremental observation on the future articles so I don't think there is harm in sharing).



        3 - Histogram of Oriented Gradients (HoG):



        This is rotation invariant and is used for face detection...



        C - Convolutional Neural Networks:



        I know you don't want to used NN's but I think it is fair to point they are REALLY POWERFULL, training a CNN with Triplet Loss can be really nice for learning a representative feature space for clustering (and classification).



        Check Wesley's GitHub for a example of it's power in facial recognition using Triplet Loss to get features and then SVM to classify.



        Also, if your problem with Deep Learning is computational cost, you can easily find pre-trained layers with cats and dogs around.



        D - Check on previous work:



        This cats-and-dogs fight has been going on for a long time... you can check solutions from Kaggle competitions (forums and kernels); there were two on cats and dogs: This One and That One



        E - Famous Measures:




        • SSIM (Structural Similarity Index)

        • L2 Norm (Or Euclidean Distance)

        • Mahalanobis Distance
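All three measures above are easy to compute directly. Below is a sketch: the L2 norm and Mahalanobis distance are standard, and the SSIM shown is a simplified *global* version (a single window over the whole image); library implementations such as skimage's use a sliding window and average the local scores.

```python
import numpy as np

def l2_distance(x, y):
    """Euclidean (L2) distance between two feature vectors or images."""
    return float(np.linalg.norm(x - y))

def mahalanobis(x, y, cov):
    """Distance that accounts for feature scale and correlation via a
    covariance matrix; with the identity matrix it reduces to L2."""
    d = x - y
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

def ssim_global(x, y, L=255.0):
    """Simplified global SSIM over a single window; 1.0 means identical."""
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov_xy = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov_xy + c2)) / (
        (mx**2 + my**2 + c1) * (vx + vy + c2))

img = np.random.default_rng(0).uniform(0, 255, (16, 16))
print(ssim_global(img, img))   # identical images -> 1.0
```

SSIM is a similarity (higher is better, capped at 1), while the two norms are distances (lower is better); keep that in mind when thresholding.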

        F - Check on other kinds of features



        • Cats and dogs can be easy to identify by their ears and noses... size too, but I have had cats as big as dogs, so size is not really safe to use.


        • But you can try segmenting the images into animal and background and then doing region-property analysis...


        • Also, check this image similarity metrics toolkit page; it is in C, but...


        • Check this paper on image similarity


        • Take a look on this Stack Overflow question and this Research Gate one


        • If you have the time, the book Feature Extraction & Image Processing for Computer Vision by Mark S. Nixon has a lot of information on these kinds of procedures


        • You can try Fisher discriminant analysis or PCA to create a mapping, and then evaluate with the Mahalanobis distance or L2 norm
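The PCA-then-distance idea from the last bullet can be sketched as follows. This is a minimal NumPy version (the function name and interface are mine): fit PCA via SVD on a matrix with one feature vector per image, project into the top-k subspace, and measure similarity there with the L2 norm.

```python
import numpy as np

def pca_project(X, k):
    """Fit PCA on the rows of X (one feature vector per image) and return
    the centered data projected onto the top-k principal components,
    plus the mean and components so new samples can be mapped too."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered data: the rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    comps = Vt[:k]
    return Xc @ comps.T, mean, comps

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))         # 100 images, 10 raw features each
Z, mean, comps = pca_project(X, k=3)   # reduced to 3 dimensions

# Similarity between two images = L2 distance in the reduced space
d = np.linalg.norm(Z[0] - Z[1])
```

A new image's features would be mapped with `(x - mean) @ comps.T` before comparing; Fisher discriminant analysis follows the same fit-project-compare pattern but uses class labels to choose the directions.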







        answered Apr 5 at 4:11









        Pedro Henrique MonfortePedro Henrique Monforte



























