Clustering of multi-label data

$begingroup$


The dataset consists of

1) a set of objects and

2) a set of labels, which are used to describe the objects.

For the moment, for simplicity's sake, each label can be marked as either true or false. (In a more complex setup, each label will have a value from 1 to 10.)

However, not all labels are actually applied to all objects. In principle, every label can and should be applied to every object, but in practice this is not the case. Also, when a label hasn't been applied to an object, one cannot simply assume that the label's value for that object is false. Therefore, the missing labels will be ignored in the model.

I need to cluster the objects based on their labels.



Any tips on how and what algorithms to use will be appreciated.
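
To make the setup concrete, here is a minimal sketch of how such a dataset might be stored: an objects-by-labels table in which a label that was never applied is kept as `None` instead of being treated as false. The object names, label names, and values are invented for illustration, and `observed_labels` and `coverage` are hypothetical helpers, not part of any library.

```python
# Hypothetical toy data: one row per object, one column per label.
# True/False are labels that were actually applied; None means the label
# was never applied to that object (unknown, NOT false).
LABELS = ["red", "round", "edible", "heavy"]

DATA = {
    "apple":  [True, True, True, None],
    "tomato": [True, True, True, False],
    "brick":  [True, False, None, True],
    "anvil":  [None, False, False, True],
}


def observed_labels(row):
    """Return the column indices of the labels that carry a value."""
    return [i for i, value in enumerate(row) if value is not None]


def coverage(data):
    """Fraction of object/label cells that carry a value (True or False)."""
    cells = [value for row in data.values() for value in row]
    return sum(value is not None for value in cells) / len(cells)


if __name__ == "__main__":
    for name, row in DATA.items():
        print(name, [LABELS[i] for i in observed_labels(row)])
    print("coverage:", coverage(DATA))
```

Keeping "missing" distinct from "false" in the storage format is what lets any later step skip the unknown cells instead of silently counting them as negatives.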










$endgroup$







  • 1




    $begingroup$
    First you need to decide whether you want to do clustering (ignore the labels?) or classification (predict missing labels).
    $endgroup$
    – Anony-Mousse
    Mar 31 at 7:13










  • $begingroup$
    Ignore the missing labels. Wrongly predicted missing labels can mess things up.
    $endgroup$
    – Yogesch
    Mar 31 at 7:42






  • 1




    $begingroup$
    That sounds pretty much like the standard setup of recommender systems?
    $endgroup$
    – Anony-Mousse
    Mar 31 at 10:40










  • $begingroup$
    OK, maybe... At first glance, the crux of any sort of clustering in a recommender system is being able to define a "distance" metric between arbitrary points (objects). For each point/object, I have a set of labels L1, L2, ..., Ln, where each Li can be 0, 1, or NA. So how do I define this "distance" metric in a consistent, coherent way? Should that be another question? Sorry, I have yet to figure out what counts as a trivial question and what counts as a serious one in the data science business.
    $endgroup$
    – Yogesch
    Mar 31 at 15:57
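
One possible answer to the "distance" question in the comment above is to compare two objects only on the labels that both of them actually have, and to average the disagreement over that shared set. The sketch below assumes this definition; it is not something proposed in the thread. The toy data, the function names, the `unknown` fallback for pairs with no shared labels, and the merge `threshold` are all illustrative assumptions. The `scale` parameter is there so the same function could cover the 1 to 10 variant (with `scale=9.0`).

```python
from itertools import combinations

# Hypothetical toy data: 1/0 are applied labels, None is a missing label.
ROWS = {
    "apple":  [1, 1, 1, None],
    "tomato": [1, 1, 1, 0],
    "brick":  [1, 0, None, 1],
    "anvil":  [None, 0, 0, 1],
}


def masked_distance(a, b, scale=1.0, min_overlap=1):
    """Mean absolute difference over the labels observed in BOTH objects.

    Returns None when fewer than `min_overlap` labels are shared, because
    the distance is then undefined. Use scale=1.0 for 0/1 labels and
    scale=9.0 for labels valued 1 to 10, so the result stays in [0, 1].
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if len(shared) < min_overlap:
        return None
    return sum(abs(x - y) for x, y in shared) / (scale * len(shared))


def cluster(rows, threshold=0.5, unknown=1.0):
    """Naive average-linkage agglomerative clustering on masked_distance.

    Pairs without shared labels get the pessimistic distance `unknown`
    (an assumption of this sketch). Merging stops once the two closest
    clusters are farther apart than `threshold`.
    """
    names = list(rows)
    dist = {}
    for p, q in combinations(names, 2):
        d = masked_distance(rows[p], rows[q])
        dist[frozenset((p, q))] = unknown if d is None else d

    def linkage(c1, c2):
        pairs = [dist[frozenset((p, q))] for p in c1 for q in c2]
        return sum(pairs) / len(pairs)

    clusters = [[name] for name in names]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        if linkage(clusters[i], clusters[j]) > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


if __name__ == "__main__":
    print(cluster(ROWS))
```

A dissimilarity defined this way can violate the triangle inequality, because each pair is compared on a different subset of labels. For that reason it seems safer to pair it with methods that only need a pairwise dissimilarity matrix (hierarchical clustering, k-medoids, DBSCAN) than with methods such as k-means that need coordinates and means.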






  • 1




    $begingroup$
    Consider each label to be a user!
    $endgroup$
    – Anony-Mousse
    Apr 1 at 5:42
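
Read in recommender-system terms, the suggestion above amounts to treating the object/label table like a ratings matrix: labels play the role of users, objects the role of items, and label values the role of ratings. The commenter did not name a specific algorithm, so the following is only one possible reading: a small low-rank factorization fitted by stochastic gradient descent on the observed cells only. The toy table, the function names, and all hyperparameters (`rank`, `steps`, `lr`, `reg`, `seed`) are assumptions made for this sketch.

```python
import math
import random

# Hypothetical toy table: rows are objects, columns are labels, None = not applied.
NAMES = ["apple", "tomato", "brick", "anvil"]
TABLE = [
    [1, 1, 1, None],
    [1, 1, 1, 0],
    [1, 0, None, 1],
    [None, 0, 0, 1],
]


def factorize(table, rank=2, steps=2000, lr=0.05, reg=0.01, seed=0):
    """Fit table ~ P @ Q^T by SGD, using only the observed (non-None) cells.

    P holds one latent vector per object (row), Q one per label (column).
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])
    P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    observed = [
        (i, j, v)
        for i, row in enumerate(table)
        for j, v in enumerate(row)
        if v is not None
    ]
    for _ in range(steps):
        rng.shuffle(observed)
        for i, j, v in observed:
            err = v - sum(p * q for p, q in zip(P[i], Q[j]))
            for k in range(rank):
                p, q = P[i][k], Q[j][k]
                P[i][k] += lr * (err * q - reg * p)
                Q[j][k] += lr * (err * p - reg * q)
    return P, Q


def observed_mse(table, P, Q):
    """Mean squared reconstruction error over the observed cells only."""
    errors = [
        (v - sum(p * q for p, q in zip(P[i], Q[j]))) ** 2
        for i, row in enumerate(table)
        for j, v in enumerate(row)
        if v is not None
    ]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    P, Q = factorize(TABLE)
    print("MSE on observed cells:", observed_mse(TABLE, P, Q))
    # Objects with similar latent vectors are candidates for the same cluster.
    for name, vec in zip(NAMES, P):
        print(name, [round(x, 2) for x in vec])
    print("apple-tomato:", math.dist(P[0], P[1]))
    print("apple-anvil: ", math.dist(P[0], P[3]))
```

The rows of `P` are ordinary dense vectors, so they could be passed to any standard clustering algorithm. Note that the latent vectors implicitly encode guesses for the missing cells, which may or may not be acceptable given the stated wish to ignore missing labels rather than predict them.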















classification clustering multilabel-classification labels






edited Mar 31 at 8:41 by Damini Jain

asked Mar 31 at 5:54 by Yogesch






