Machine learning model to predict the best candidate


Problem: I would like to build a machine learning model that can predict the best candidate from any given set. What could be a good architecture for such a model?



Given: I have several training examples, each of which consists of:

- a set of candidates.

- a descriptor for the set as a whole.

- a label that tells which one of those candidates is the best in that set.



Details:

- I will have around 10K such sets.

- The number of candidates varies from set to set (roughly from 10 to 100).

- Every set is unordered.

- The descriptor of each set is currently a fixed-length one-hot vector. I'm open to adding more features to it, though.

- Each candidate is represented by a fixed-length feature vector. (However, in the future, the number of features describing a candidate may also differ from candidate to candidate.)



What I tried but didn't work:

One approach I tried was a simple MLP that takes one candidate as input and outputs whether or not that candidate is the best. But since this MLP doesn't know which set the candidate belongs to, it fails when a candidate is the best in one set but the same candidate is not the best in another set.
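For illustration, a set-wise variant of this idea would be to keep a shared per-candidate scorer but train it on whole sets: score every candidate, apply a softmax over the set, and use the index of the labelled best candidate as the target of a cross-entropy loss. The sketch below is only an assumption-laden illustration, not something known to work here; the names (CandidateScorer, training_sets) and the feature size are made up.

    # Sketch only: a shared MLP scores each candidate, and training uses a
    # softmax/cross-entropy over the whole set, so the "best of this set"
    # label is used directly. All names and sizes here are hypothetical.
    import torch
    import torch.nn as nn

    class CandidateScorer(nn.Module):          # hypothetical name
        def __init__(self, n_features, n_hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_features, n_hidden),
                nn.ReLU(),
                nn.Linear(n_hidden, 1),
            )

        def forward(self, candidates):
            # candidates: (n_candidates, n_features) -> one raw score each
            return self.net(candidates).squeeze(-1)

    scorer = CandidateScorer(n_features=16)
    optimizer = torch.optim.Adam(scorer.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    # Toy stand-in for the real data: (candidates, index_of_best) per set;
    # sets of different sizes are fine because each set is processed alone.
    training_sets = [(torch.randn(12, 16), 3), (torch.randn(30, 16), 7)]

    for candidates, best_index in training_sets:
        scores = scorer(candidates)                    # (n_candidates,)
        loss = loss_fn(scores.unsqueeze(0),            # (1, n_candidates)
                       torch.tensor([best_index]))     # target: position of best
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

At prediction time the chosen candidate of a new set would simply be scores.argmax(), and the set-level descriptor could be concatenated onto every candidate's feature vector so the scorer sees it as well.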




To get into some more specifics: in my current problem, each candidate is a 2D polygon with a fixed number of line segments. The training examples are labelled manually by picking the best-looking polygon in each set. Each polygon is described by an array of (x, y) coordinates.



One problem I face is that a polygon has no natural starting point for its array of (x, y) coordinates. Currently I choose the starting point to be the vertex with the minimum value of x + y and go counterclockwise from there.
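To make that concrete, the normalization just described can be written in a few lines of NumPy. This is only a sketch (the function name normalize_polygon is made up); the counterclockwise check uses the signed area from the shoelace formula:

    import numpy as np

    def normalize_polygon(points):
        """Rotate the vertex list so it starts at the vertex with minimal x + y
        and runs counterclockwise, as described above (sketch only)."""
        pts = np.asarray(points, dtype=float)          # shape (n_vertices, 2)

        # Signed area via the shoelace formula; negative means clockwise order.
        x, y = pts[:, 0], pts[:, 1]
        signed_area = 0.5 * np.sum(x * np.roll(y, -1) - np.roll(x, -1) * y)
        if signed_area < 0:
            pts = pts[::-1]                            # flip to counterclockwise

        # Start the array at the vertex with the smallest x + y.
        start = np.argmin(pts[:, 0] + pts[:, 1])
        return np.roll(pts, -start, axis=0)

    poly = [(0, 1), (0, 3), (2, 3), (2, 1)]            # a clockwise rectangle
    print(normalize_polygon(poly))                     # starts at (0, 1), runs CCW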



Currently every 2D polygon has the same number of segments, but I will soon need to support polygons with varying numbers of segments.



In the future, I would like to extend this model to 3D polyhedra as well, but I don't yet know how to build a feature vector that describes a 3D polyhedron. I guess that's a problem for another day.
Tags: machine-learning, neural-network, prediction, machine-learning-model
asked Apr 7 at 5:50 by mak
edited Apr 9 at 8:08
  • Do you have labeled comparisons between polygons across sets? – jonnor, Apr 7 at 12:34

  • Nope. I do not have any comparisons across sets. – mak, Apr 7 at 13:52

  • That makes it a bit harder. The key here is to formulate the problem as a standard type of ML problem. You could have a look at ranking via pairwise comparisons for some inspiration, but I'm not sure it fits entirely... – jonnor, Apr 7 at 14:55

  • Thanks a lot for your suggestions! I had also considered pairwise comparisons, and I guess they might work, but the cost would grow as O(n^2). I also considered RNNs, but they are meant for ordered sequences, not for unordered sets. – mak, Apr 7 at 15:08

  • How many polygons are in each set, and how many sets? – jonnor, Apr 7 at 15:16
1 Answer
I think it would make more sense to train a model to grade each candidate (regression); then, within a particular set, you take the candidate with the best grade.



Also, you should try replacing the raw cloud of points with more meaningful geometric shape information, for example (a rough sketch of computing a few of these follows the list):



  • Number of vertices/segments
  • Segment length mean and variance
  • Skewness
  • Size and direction of the major and minor axes
  • Center position
  • Moments of area (first, second, third, ...)
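A rough sketch of how a few of these descriptors could be computed for one polygon with plain NumPy; the function name polygon_features and the exact feature set are only illustrative, not a prescribed implementation:

    import numpy as np

    def polygon_features(points):
        """Toy geometric descriptor for one polygon given as an (n, 2) array of
        vertices: segment statistics, a centre estimate, area, principal axes."""
        pts = np.asarray(points, dtype=float)
        closed = np.vstack([pts, pts[:1]])             # repeat the first vertex
        lengths = np.linalg.norm(np.diff(closed, axis=0), axis=1)

        # Shoelace formula for the enclosed area.
        x, y = pts[:, 0], pts[:, 1]
        area = 0.5 * abs(np.sum(x * np.roll(y, -1) - np.roll(x, -1) * y))

        # Principal axes from the covariance of the vertices (major/minor directions).
        centre = pts.mean(axis=0)                      # vertex mean as a simple centre proxy
        eigvals, eigvecs = np.linalg.eigh(np.cov((pts - centre).T))

        return np.concatenate([
            [len(pts)],                                # number of vertices/segments
            [lengths.mean(), lengths.var()],           # segment length mean and variance
            centre,                                    # centre position
            [area],
            eigvals,                                   # sizes of the minor/major axes
            eigvecs[:, -1],                            # direction of the major axis
        ])

    print(polygon_features([(0, 0), (4, 0), (4, 2), (0, 2)]))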

Update



To apply a regression model you will need grades for the training set, and generating those might be a challenge. Here are a few heuristics for generating them (a sketch of the cluster-based one follows the list):




  • Since you have about 10k samples, you could assign to every candidate the probability of its being the best candidate in any set (for example, if a candidate is the best in 10 sets, you can give it a grade of $\frac{10}{10{,}000}$).



  • You could try clustering the candidates and assigning a grade to every cluster: the probability $\frac{N_{\text{best}}}{N_{\text{cluster}}}$ that a candidate in that cluster is the best one, where $N_{\text{best}}$ is the number of best candidates (over all sets) that fall in that cluster and $N_{\text{cluster}}$ is the total number of candidates in that cluster.


  • You can assign each best candidate a grade of $1$ if it is the best candidate in every set in which it appears, and $1 - e^{-\alpha N}$ where $N$ is the number of times it appears in a set without being the best candidate. You will have to tune the decay rate $\alpha$ like any other hyperparameter.
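As a minimal sketch of the second (cluster-based) heuristic, assuming every candidate is already a fixed-length feature vector and using scikit-learn's KMeans (the number of clusters and the variable names are placeholders to adapt):

    import numpy as np
    from sklearn.cluster import KMeans

    def cluster_grades(X, is_best, n_clusters=50):
        """Grade each candidate by the fraction of labelled-best candidates in
        its cluster (sketch of the cluster heuristic; inputs are placeholders).
        X: (n_candidates_total, n_features), all candidates from all sets.
        is_best: boolean array, True where the candidate was the best of its set."""
        labels = KMeans(n_clusters=n_clusters, random_state=0).fit_predict(X)
        grades = np.zeros(len(X))
        for c in range(n_clusters):
            in_cluster = labels == c
            if not in_cluster.any():                   # guard against empty clusters
                continue
            grades[in_cluster] = (is_best & in_cluster).sum() / in_cluster.sum()
        return grades                                  # regression targets

    # Toy demonstration with random data.
    rng = np.random.default_rng(0)
    X_demo = rng.normal(size=(500, 16))
    best_demo = rng.random(500) < 0.1
    print(cluster_grades(X_demo, best_demo, n_clusters=10)[:5])

The resulting grades would then serve as the targets of the regression model described above.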







answered Apr 7 at 22:06 by Pedro Henrique Monforte
edited Apr 8 at 12:36
  • Thanks @Pedro for your useful answer! However, in order to create a regression model, I would need to train it with grades for each candidate in the training examples. How do you suggest I generate those grades? – mak, Apr 8 at 6:23

  • True, I proposed a few heuristics for that and updated the answer. Sorry for forgetting that crucial point! – Pedro Henrique Monforte, Apr 8 at 12:37

  • Could you come back to us with a small report on the success of any of these tips? I am a computer vision researcher, and geometry-related models are really useful in my field. – Pedro Henrique Monforte, Apr 8 at 14:59

  • Thanks @Pedro, all of your ideas are very useful. In my case, though, your first idea (individual probability based) and your third idea ($1-e^{-\alpha N}$) might be a bit difficult to implement, because I don't have ready information about which candidates appear in multiple sets. I can search for multiple occurrences, but that too is tricky, because two candidates may be very similar but not exactly identical due to numerical noise. Your second idea (probability associated with clusters) seems like it might work for me. I'll try it out and let you know. Thank you for your wonderful ideas! – mak, Apr 9 at 8:01