Detection model - Training with class-instance limit-awareness



Is there a way to make a detection model aware of the maximum number of objects of a given class that can appear in a single image?

For example, take a toy case with 2 classes. Suppose I know that in every image class A can have no more than 5 instances and class B no more than 1. Is there a way to incorporate this into the training process?

To be clear, I'm not talking about an additional algorithm that runs on top of the trained model (such as non-maximum suppression, which is used to select a single bounding box for an object). I'm asking specifically about the model itself and its training process.







      machine-learning deep-learning object-detection






asked Mar 29 at 8:33 by Mark.F




















          2 Answers
I can think of the following approach:

Let's say that you have two classes, A and B. Additionally, you know that class A has at most 5 instances per image (so 0, 1, 2, 3, 4 or 5) and class B at most 1 (0 or 1).

For this purpose, you can have 6 outputs for class A and 2 outputs for class B. Among the 6 outputs for class A, only one should be active; the same goes for B, where only one of the two should be active.

For example, if an image contains 3 objects of class A and 0 of class B, the outputs would be [0, 0, 0, 1, 0, 0] for A and [1, 0] for B (or, in practice, values very close to those 0s and 1s).

These count outputs can be combined with the other outputs needed for detection.
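
As a rough sketch of the count outputs described above, here is how the one-hot targets and a per-head loss might look. The answer does not name a loss function, so softmax cross-entropy is an assumption here, and the names `MAX_INSTANCES`, `count_target` and `count_loss` are made up for illustration.

```python
import math

# Hypothetical per-class limits taken from the toy example (A: 5, B: 1).
MAX_INSTANCES = {"A": 5, "B": 1}


def count_target(count, max_count):
    """One-hot target with max_count + 1 slots; slot k means 'k instances'."""
    if not 0 <= count <= max_count:
        raise ValueError(f"count {count} is outside 0..{max_count}")
    return [1 if k == count else 0 for k in range(max_count + 1)]


def count_loss(logits, target):
    """Softmax cross-entropy for one count head (assumed loss, not from the answer).

    Uses the log-sum-exp trick so large logits do not overflow.
    """
    peak = max(logits)
    log_total = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return -sum(t * (x - log_total) for t, x in zip(target, logits))


# Image with 3 objects of class A and 0 of class B.
target_a = count_target(3, MAX_INSTANCES["A"])
target_b = count_target(0, MAX_INSTANCES["B"])
print(target_a)  # [0, 0, 0, 1, 0, 0]
print(target_b)  # [1, 0]
```

In a real detector these count heads would sit next to the box and class outputs, and `count_loss` would be added to the usual detection loss. Note that the heads only report a count; by themselves they do not force the box outputs to agree with it.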







answered Mar 29 at 9:58 by Antonio Jurić





















This is a very interesting form of supervision, but hard to achieve!

Why do we need this supervision?

The need comes from the fact that the model may wrongly detect more objects than it should, and must be punished (taught) for this violation. If the model already respected the limit, no extra supervision would be required.

How to implement this supervision?

To this end, we need to fork some layers from the model to output the number of detected objects per class $c$ for input image $i$, namely $n'_{c,i}$, and then supervise this output with the true number of objects in image $i$, namely $n_{c,i}$, or merely with an upper limit $N_c$ per class, as you have suggested. Then add a term like $(n_{c,i} - n'_{c,i})^2$ or $\text{max}(0, n'_{c,i} - N_c)$ to the loss function to punish the model for detecting, respectively, the wrong number of objects or more objects than the limit allows. Then proceed to train the model.
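
The two penalty terms above can be written out as a small sketch. The function names, the `weight` factor and the idea of summing the hinge term over classes are assumptions for illustration; `detection_loss` stands in for whatever loss the detector already uses.

```python
def count_mse_penalty(true_count, predicted_count):
    """Squared error (n_{c,i} - n'_{c,i})^2 against the true count."""
    return (true_count - predicted_count) ** 2


def limit_hinge_penalty(predicted_count, max_count):
    """Hinge term max(0, n'_{c,i} - N_c): zero unless the limit is exceeded."""
    return max(0.0, predicted_count - max_count)


def total_loss(detection_loss, predicted_counts, max_counts, weight=1.0):
    """Detection loss plus the weighted hinge penalty summed over classes.

    `weight` is a hypothetical balancing factor, not part of the answer.
    """
    penalty = sum(
        limit_hinge_penalty(predicted_counts[c], max_counts[c])
        for c in max_counts
    )
    return detection_loss + weight * penalty


# Predicted counts from the forked layers for one image (made-up values).
predicted = {"A": 6.5, "B": 0.2}
limits = {"A": 5, "B": 1}
print(total_loss(2.0, predicted, limits, weight=0.5))  # 2.75
```

Only class A exceeds its limit here (by 1.5), so the penalty is 1.5 and the total is 2.0 + 0.5 * 1.5.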



What may go wrong?

Here is the problem: the model can learn to lie about the number of detected objects by modifying the weights of the forked layers, since it is easier for the model to fabricate a valid $n'_{c,i}$ than to actually detect fewer objects, which is a more complex change. Also, if we use a fixed unit that cannot be fabricated (e.g. a constant neural net) to count the detected objects, there would be no gradient to punish (teach) the model!

This is why this type of supervision is hard to achieve.
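
The "no gradient" point can be checked numerically. In the sketch below, `hard_count` plays the role of the fixed counting unit: it thresholds detection scores and counts them. `soft_count` (a plain sum of scores) is not something the answer proposes; it is included only as a contrast, to show what a counter with a usable gradient looks like. The logits and the 0.5 threshold are made-up values.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hard_count(logits, threshold=0.5):
    """Fixed counter: number of detections whose score passes the threshold."""
    return sum(1 for x in logits if sigmoid(x) > threshold)


def soft_count(logits):
    """Contrast case (an assumption, not from the answer): sum of scores."""
    return sum(sigmoid(x) for x in logits)


def finite_difference(fn, logits, index, eps=1e-4):
    """Central-difference estimate of d fn / d logits[index]."""
    up = list(logits)
    down = list(logits)
    up[index] += eps
    down[index] -= eps
    return (fn(up) - fn(down)) / (2 * eps)


logits = [2.0, 1.5, -1.0]  # hypothetical detection logits for one class
print(hard_count(logits))                        # 2
print(finite_difference(hard_count, logits, 0))  # 0.0
print(finite_difference(soft_count, logits, 0))  # small positive value
```

The thresholded count is piecewise constant, so a small change in any logit leaves it unchanged and the penalty built on it gives the detector nothing to learn from.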







edited Mar 29 at 20:16

answered Mar 29 at 11:07 by Esmailian


























