How to compare the performance of two unsupervised algorithms on the same data set?


I want to solve an anomaly detection problem on an unlabeled data set. The only prior information is that anomalies make up less than 0.1% of the population. Note also that each sample is a feature vector of size 40. Is there a clear way to compare the performance of unsupervised algorithms?










unsupervised-learning anomaly-detection unbalanced-classes evaluation






asked Mar 20 at 9:52 by Alireza Zolanvari, edited Mar 20 at 11:07

  • How do you measure the performance of a single model?
    – mikalai, 2 days ago

  • @mikalai That is exactly what I am asking.
    – Alireza Zolanvari, 2 days ago
1 Answer



















For unlabeled data sets, unsupervised anomaly detectors can be compared either subjectively or objectively.




  1. Subjective comparison: based on domain knowledge, and using visualizations and summary statistics, we can compare two detectors and subjectively select the one that outputs more plausible anomalies (see the sketch after this list).



    1. A well-cited survey of unsupervised anomaly detectors compares the algorithms on labeled data sets (with known, domain-specific outliers) using AUC, and concludes that local detectors (such as LOF, COF, INFLO, and LoOP) are not good candidates for global anomaly detection: A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data (2016).


  2. Objective comparison: possible in theory, impossible in practice.
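
For a concrete starting point, here is a minimal sketch of such a subjective comparison in Python. Everything in it is an illustrative assumption rather than part of the answer itself: the two detectors (scikit-learn's `IsolationForest` and `LocalOutlierFactor`), the top-$k$ budget derived from the stated 0.1% anomaly rate, and the synthetic `X` standing in for the real unlabeled data.

```python
# Sketch: compare two unsupervised detectors subjectively on unlabeled data.
# Detector choices and all parameter values are illustrative assumptions.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(20_000, 40))   # placeholder for the real 40-feature data

k = max(1, int(0.001 * len(X)))     # anomaly budget: < 0.1% of the samples

# Detector 1: Isolation Forest (lower score_samples = more anomalous).
iso = IsolationForest(random_state=0).fit(X)
iso_top = np.argsort(iso.score_samples(X))[:k]

# Detector 2: Local Outlier Factor (lower negative_outlier_factor_ = more anomalous).
lof = LocalOutlierFactor(n_neighbors=20).fit(X)
lof_top = np.argsort(lof.negative_outlier_factor_)[:k]

# One simple statistic to report alongside the visual check: how much the
# two detectors' top-k sets agree.
overlap = len(set(iso_top) & set(lof_top)) / k
print(f"top-{k} overlap between detectors: {overlap:.1%}")

# Project to 2D so a domain expert can eyeball which flagged set looks
# more like genuine anomalies.
Z = PCA(n_components=2).fit_transform(X)
plt.scatter(Z[:, 0], Z[:, 1], s=2, c="lightgray", label="all samples")
plt.scatter(Z[iso_top, 0], Z[iso_top, 1], s=15, c="red", label="IsolationForest top-k")
plt.scatter(Z[lof_top, 0], Z[lof_top, 1], s=15, c="blue", label="LOF top-k")
plt.legend()
plt.show()
```

Since a 2D projection of 40 features necessarily loses information, it helps to pair the plot with per-feature statistics of the flagged points against the bulk of the data before asking a domain expert to judge.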


Requirements for objective comparison:



  1. Anomaly definition: $x$ is an anomaly if $P(x) < t$ for some threshold $t$;


  2. Anomaly detector requirement: $D$ is an anomaly detector if $P(x) < t$ for every $x$ it detects;


  3. Comparing anomalies: $x_1$ is more anomalous than $x_2$ if $P(x_1) < P(x_2)$, or equivalently $r(x_1, x_2) = P(x_1) / P(x_2) < 1$;


  4. Comparing anomaly detectors: proposal $x_1$ from detector $D_1$ is better than proposal $x_2$ from $D_2$ if $r(x_1, x_2) < 1$.


As you can see, to qualify and compare two detectors we need to know $P(x)$, or at least $r(x_1, x_2)$. But if we knew these quantities (which act as a judge $J$), or even a close enough estimate of them, we would already have a better anomaly detector, namely $J$ itself, and could throw $D_1$ and $D_2$ away: plug any observation $x$, or any pair $x_1$ and $x_2$, into $J$ and read off which one is an anomaly, or which one is more anomalous. So it is impossible to compare two anomaly detectors objectively unless we already have a better anomaly detector to act as the judge, which leaves us with subjective comparison.
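
To make the circularity concrete, here is a small sketch that pretends (purely for illustration) that a kernel density estimate could serve as the judge $J$. Note that in 40 dimensions such an estimate is itself unreliable, which is exactly why the objective route fails in practice.

```python
# Sketch: if we had a judge J approximating P(x), comparing detectors
# becomes redundant. The KDE stand-in for P(x) is an illustrative assumption.
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(1)
X = rng.normal(size=(5_000, 40))   # placeholder unlabeled data
x1, x2 = X[10], X[20]              # proposals from detectors D1 and D2

J = KernelDensity(bandwidth=1.0).fit(X)   # the "judge": an estimate of P(x)

# r(x1, x2) = P(x1) / P(x2) < 1 means x1 is more anomalous; compare in
# log space for numerical stability.
log_p1, log_p2 = J.score_samples(np.vstack([x1, x2]))
print("D1's proposal wins" if log_p1 < log_p2 else "D2's proposal wins")

# The catch: J already ranks every point by estimated P(x), so sorting its
# scores is itself an anomaly detector; D1 and D2 are no longer needed.
judge_top = np.argsort(J.score_samples(X))[:5]
print("judge's own top anomalies:", judge_top)
```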






answered Mar 20 at 10:57 by Esmailian, edited Mar 20 at 14:21
  • Please check the question update. Each sample has about 40 features, so subjective comparison is not very practical.
    – Alireza Zolanvari, Mar 20 at 11:04









