Detection model - Training with class-instance limit-awareness
Is there a way to make a detection model aware of the maximum number of objects of a given class that can appear in a single image?
For example, take a toy case with 2 classes: I know that in every single image, class A can have no more than 5 instances and class B no more than 1. Is there a way to incorporate this into the training process?
To be clear, I'm not talking about an additional algorithm that runs on top of the trained model (such as non-maximum suppression, which is used to select a single bounding box for an object). I am asking specifically about the model itself and its training process.
machine-learning deep-learning object-detection
asked Mar 29 at 8:33 by Mark.F
2 Answers
I can think of the following approach:
Let's say that you have two classes, A and B. Additionally, you know that class A has at most 5 instances (so 0, 1, 2, 3, 4 or 5) and class B at most 1 instance (0 or 1).
For this purpose, you can have 6 outputs for class A and 2 outputs for class B.
Among the 6 outputs for class A, only one should be active; the same holds for B, where only one of the two should be active.
For example, if an image contains 3 objects of class A and 0 of class B, the outputs would be [0, 0, 0, 1, 0, 0] for A and [1, 0] for B (or something very close to 0s and 1s, right?).
These outputs can then be combined with the other outputs needed for detection.
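As a rough illustration of the count outputs described above (a minimal sketch, not the answerer's code), the snippet below builds the one-hot count targets and scores a count head with softmax cross-entropy. The names `MAX_INSTANCES`, `encode_count`, `softmax` and `count_loss` are hypothetical, and cross-entropy is an assumed choice of loss for "only one output should be active"; the answer does not name one.

```python
import math

# Hypothetical per-class instance limits, taken from the toy case in the question.
MAX_INSTANCES = {"A": 5, "B": 1}


def encode_count(count, max_count):
    """One-hot target with max_count + 1 slots, one per possible count 0..max_count."""
    if not 0 <= count <= max_count:
        raise ValueError(f"count {count} is outside [0, {max_count}]")
    return [1 if k == count else 0 for k in range(max_count + 1)]


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def count_loss(logits, count):
    """Cross-entropy between softmax(logits) and the one-hot target for `count`."""
    return -math.log(softmax(logits)[count])


# The example from the answer: 3 objects of class A, 0 objects of class B.
targets = {c: encode_count(n, MAX_INSTANCES[c]) for c, n in {"A": 3, "B": 0}.items()}
print(targets)  # {'A': [0, 0, 0, 1, 0, 0], 'B': [1, 0]}
```

In a real detector the logits would come from an extra head on the shared backbone, and `count_loss` would be added to the usual detection loss.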
answered Mar 29 at 9:58 by Antonio Jurić
This is a very interesting form of supervision, but it is hard to achieve!
Why do we need this supervision?
The need for this supervision comes from the fact that the model may wrongly detect more objects than it should, and so must be punished (taught) for this violation; otherwise no supervision would be required, since the model would already be behaving correctly.
How to implement this supervision?
To this end, we need to fork some layers from the model to output the number of detected objects per class $c$ for input image $i$, namely $n'_{c,i}$, and then supervise this output with the true number of objects in image $i$, namely $n_{c,i}$, or merely with an upper limit $N_c$ per class, as you have suggested. Then add a term like $(n_{c,i} - n'_{c,i})^2$ or $\max(0, n'_{c,i} - N_c)$ to the loss function to punish the model for detecting the wrong number of objects or too many objects, respectively. Then proceed to train the model.
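The two penalty terms can be written out directly. This is only a sketch under stated assumptions: the function names, the `weight` factor and the idea of summing the penalty over classes are hypothetical choices, not something the answer specifies.

```python
def squared_count_penalty(true_count, predicted_count):
    """(n_{c,i} - n'_{c,i})^2: penalizes any mismatch with the true count."""
    return (true_count - predicted_count) ** 2


def over_limit_penalty(predicted_count, limit):
    """max(0, n'_{c,i} - N_c): penalizes only counts above the class limit."""
    return max(0.0, predicted_count - limit)


def total_loss(detection_loss, predicted_counts, limits, weight=0.1):
    """Detection loss plus a weighted over-limit penalty summed over classes.

    `weight` is an assumed hyperparameter balancing the two terms.
    """
    penalty = sum(over_limit_penalty(predicted_counts[c], limits[c]) for c in limits)
    return detection_loss + weight * penalty


# Toy case from the question: at most 5 instances of A and 1 of B.
limits = {"A": 5, "B": 1}
print(total_loss(1.0, {"A": 6.0, "B": 0.5}, limits, weight=0.5))  # 1.5
```

Here `predicted_counts` stands for the output of the forked counting layers, so it may be fractional.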
What may go wrong?
But here is the problem: the model can learn to lie about the number of detected objects by modifying the weights of the forked layers, since it is easier for the model to fabricate a valid $n'_{c,i}$ than to actually detect fewer objects, which is more complex. Also, if we use a constant, unfabricatable unit (e.g. a constant neural net) that counts the number of detected objects, there would be no gradient to punish (teach) the model!
This is why this type of supervision is hard to achieve.
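The "no gradient" point can be checked numerically. The sketch below assumes a fixed counting unit that thresholds confidence scores at 0.5 (a hypothetical stand-in for the "constant, unfabricatable unit"; the answer does not say how such a unit would count) and estimates its derivative by finite differences.

```python
def hard_count(scores, threshold=0.5):
    """Fixed counting unit: number of confidence scores above the threshold."""
    return sum(1 for s in scores if s > threshold)


def finite_difference(f, scores, index, eps=1e-4):
    """Forward-difference estimate of d f / d scores[index]."""
    bumped = list(scores)
    bumped[index] += eps
    return (f(bumped) - f(scores)) / eps


# Hypothetical detection confidences for one class in one image.
scores = [0.9, 0.8, 0.2]
grads = [finite_difference(hard_count, scores, i) for i in range(len(scores))]
print(grads)  # [0.0, 0.0, 0.0]
```

A small change in any score leaves the hard count unchanged, so a penalty built on it sends no learning signal back to the detector.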
answered Mar 29 at 11:07 by Esmailian (edited Mar 29 at 20:16)