

People detection methods



























$begingroup$


In my problem I want to distinguish people from other shapes in images, e.g. I want to know accurately how many people are in a specific region of an image (at least for a small number of people; for crowded places it is reasonable to get worse results).
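
To make the goal concrete, here is a minimal sketch of the counting step, assuming some detector has already returned bounding boxes as `(x, y, width, height)` tuples in pixels. The function name `count_in_region` and the "box centre inside the region" rule are illustrative assumptions, not part of any library.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def count_in_region(boxes: List[Box], region: Box) -> int:
    """Count boxes whose centre lies inside the region (edges inclusive)."""
    rx, ry, rw, rh = region
    count = 0
    for x, y, w, h in boxes:
        cx, cy = x + w / 2.0, y + h / 2.0  # centre of the detection
        if rx <= cx <= rx + rw and ry <= cy <= ry + rh:
            count += 1
    return count


# Hypothetical detections: two people inside the region, one outside.
detections = [(10, 20, 40, 80), (200, 30, 50, 100), (400, 50, 40, 90)]
print(count_in_region(detections, (0, 0, 300, 200)))
```

Using the box centre is only one possible rule; an overlap-ratio test may suit partially visible people better.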



OpenCV offers three methods with pretrained models: two basic HOG + SVM detectors (one trained on the INRIA dataset, the other on the Daimler dataset) and a DPM method, which I also used with an INRIA-trained model.
The models were trained on pedestrian images of size 64x128 (INRIA) and 48x96 (Daimler).



I've been running my tests on a video file with two people in it, who are either standing or sitting.
From my observations I can tell that:



  • DPM is the best algorithm, but it is really slow. It can detect a person in a sitting position.

  • The HOG detectors are very dependent on scale, e.g. the SVM trained on the Daimler dataset works better at low resolution (180p in my case, because the camera was close to the people), while the INRIA results are worse there.

  • The HOG detectors work better when the full human shape is clearly visible.

  • Daimler gives a lot of false positives.

The results match expectations, because the models were trained on standing pedestrians, but even when people were standing the accuracy was really bad.
Basically I need to train my own models, but I am concerned about such a strong dependence on scale, and I doubt whether using these methods makes sense at all.
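
The scale dependence can be illustrated with a rough sketch. As far as I understand it, `detectMultiScale` only shrinks the image step by step, so a person has to be at least as tall as the detection window (128 px for INRIA, 96 px for Daimler) to be found. The helper `pyramid_scales` below is a simplified approximation of that loop, not OpenCV's exact code, and `detect_people_hog` is a hypothetical wrapper around the stock detectors.

```python
def pyramid_scales(image_size, window_size, scale_step=1.05, max_levels=64):
    """Approximate the scales at which the window still fits in the image."""
    img_w, img_h = image_size
    win_w, win_h = window_size
    scales = []
    scale = 1.0
    for _ in range(max_levels):
        # Stop as soon as the shrunken image is smaller than the window.
        if img_w / scale < win_w or img_h / scale < win_h:
            break
        scales.append(scale)
        scale *= scale_step
    return scales


def detect_people_hog(image, use_daimler=False, scale_step=1.05):
    """Run one of OpenCV's pretrained HOG + SVM people detectors."""
    import cv2  # imported lazily so the helper above works without OpenCV

    if use_daimler:
        # The Daimler model expects a 48x96 window.
        hog = cv2.HOGDescriptor((48, 96), (16, 16), (8, 8), (8, 8), 9)
        hog.setSVMDetector(cv2.HOGDescriptor_getDaimlerPeopleDetector())
    else:
        # The default descriptor uses the 64x128 INRIA window.
        hog = cv2.HOGDescriptor()
        hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
    boxes, weights = hog.detectMultiScale(
        image, winStride=(8, 8), scale=scale_step
    )
    return boxes, weights


# At 320x180 the smaller Daimler window fits at more pyramid levels.
print(len(pyramid_scales((320, 180), (64, 128))))
print(len(pyramid_scales((320, 180), (48, 96))))
```

If the people appear smaller than the window, upscaling the frame before detection is one thing to try, at the cost of extra processing time.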



I was thinking about using CNNs, but my target device has an ARM CPU (https://www.mediatek.com/products/homeNetworking/mt7623n-a).
I don't need the detection to work in real time, and I'll be feeding the network only a part of the image (smaller than 480p).



Do you think such a network can run with decent performance on such a CPU?
Do you have any suggestions for the type of network or a (C++) library that I could try?



UPDATE



I am experimenting with OpenCV's DNN module and yolov3-tiny. It processes a whole frame in ~2 s and doesn't use a lot of RAM (on my target device). I am really satisfied with the predictions. Of course, full YOLO is better than the tiny version, but it is much slower and consumes a lot of RAM.
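
For reference, a minimal sketch of this kind of experiment, assuming a COCO-trained yolov3-tiny where `person` is class 0. The function names, thresholds and file paths are placeholders of mine; the `cv2.dnn` calls follow the usual OpenCV Darknet workflow. The decoding step is kept separate so it can be checked without a model.

```python
PERSON_CLASS_ID = 0  # 'person' is the first entry in the COCO class list


def decode_person_rows(rows, frame_w, frame_h, conf_threshold=0.5):
    """Keep rows whose best class is 'person'; return pixel boxes and scores.

    Each row is [cx, cy, w, h, objectness, class scores...] with the box
    given relative to the frame size.
    """
    boxes, scores = [], []
    for row in rows:
        class_scores = list(row[5:])
        best = max(range(len(class_scores)), key=class_scores.__getitem__)
        score = float(class_scores[best])
        if best != PERSON_CLASS_ID or score < conf_threshold:
            continue
        cx, cy = row[0] * frame_w, row[1] * frame_h
        w, h = row[2] * frame_w, row[3] * frame_h
        boxes.append([int(cx - w / 2), int(cy - h / 2), int(w), int(h)])
        scores.append(score)
    return boxes, scores


def detect_people_yolo(frame, cfg_path, weights_path, input_size=416):
    """Run a Darknet YOLO model through OpenCV's DNN module."""
    import cv2  # lazy imports: the decoder above needs neither package
    import numpy as np

    # Loading the network on every call is wasteful; cache it in real code.
    net = cv2.dnn.readNetFromDarknet(cfg_path, weights_path)
    blob = cv2.dnn.blobFromImage(
        frame, 1 / 255.0, (input_size, input_size), swapRB=True, crop=False
    )
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())
    frame_h, frame_w = frame.shape[:2]
    rows = [row for out in outputs for row in out]
    boxes, scores = decode_person_rows(rows, frame_w, frame_h)
    # Non-maximum suppression removes overlapping duplicates.
    keep = cv2.dnn.NMSBoxes(boxes, scores, 0.5, 0.4)
    return [boxes[int(i)] for i in np.array(keep).reshape(-1)]
```

Filtering on the class index means the other COCO classes are simply ignored at inference time, without retraining.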



I am confused by the number of models that exist and the datasets they were trained on. The Caffe Model Zoo has a lot of them, but I found that e.g. GoogLeNet doesn't have a person class, which is why I chose YOLO for my tests. But YOLO is also available trained on either COCO or VOC, and I just need to detect people, not dogs etc.



I still have to experiment with the input parameters in OpenCV to see how they affect performance and accuracy.
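
One parameter worth sweeping is the network input size. The sketch below rests on two assumptions: Darknet YOLO models expect input dimensions that are multiples of 32, and the cost of a fully convolutional network grows roughly with the number of input pixels. The helper names are hypothetical, and the printed ratios are rough estimates, not measurements.

```python
def candidate_input_sizes(min_size=224, max_size=608, stride=32):
    """List square input sizes that are multiples of the network stride."""
    first = ((min_size + stride - 1) // stride) * stride  # round up
    return list(range(first, max_size + 1, stride))


def relative_cost(size, reference=416):
    """Rough compute cost relative to the reference size (pixel ratio)."""
    return (size * size) / float(reference * reference)


for size in candidate_input_sizes():
    print(size, round(relative_cost(size), 2))
```

Smaller inputs should be faster but tend to miss small or distant people, so each size has to be checked against real footage from the target camera.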



I am also thinking about retraining YOLO for humans only; maybe this way I'll reach the accuracy of full YOLO with the performance of the tiny one. What do you think? In general, is it better to use the COCO or the VOC dataset?
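
As a small sketch of what a person-only retraining would change in a YOLOv3-style config: with 3 anchors per detection scale, the convolutional layer before each `[yolo]` layer needs `(classes + 5) * 3` filters. The function name is mine; the formula is the one used for custom Darknet training.

```python
def yolo_head_filters(num_classes, anchors_per_scale=3):
    """Filters in the conv layer feeding a [yolo] layer.

    Each anchor predicts 4 box values, 1 objectness score and one score
    per class.
    """
    return (num_classes + 5) * anchors_per_scale


print(yolo_head_filters(80))  # COCO, 80 classes
print(yolo_head_filters(20))  # VOC, 20 classes
print(yolo_head_filters(1))   # person only
```

Note that this only shrinks the final layers, so a person-only model would probably run at about the same speed as before. Whether it gets closer to the accuracy of full YOLO would have to be tested.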





















$endgroup$
















      neural-network cnn object-detection opencv














      asked Apr 9 at 17:00 by tobix10
      edited Apr 10 at 11:15 by tobix10




















          1 Answer












          $begingroup$

          If you don't need it to work in real time, you should not worry about your CPU that much.



          There are a few face detection models based on ResNet-10 that can be used from OpenCV; those might be enough if the people you are trying to count are facing forward.



          Otherwise, you can use a ResNet-10 model; it runs at up to 100 FPS on an Intel i5-7200U, which is not a particularly powerful CPU.
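
A minimal sketch of this approach, assuming the ResNet-10 SSD face detector that is commonly used with OpenCV's DNN module (a Caffe prototxt plus caffemodel; the paths are placeholders and the function names are mine). The network output is assumed to have shape (1, 1, N, 7), each row being [image id, label, confidence, x1, y1, x2, y2] with coordinates relative to the frame.

```python
def decode_ssd_faces(detections, frame_w, frame_h, conf_threshold=0.5):
    """Convert SSD output rows into pixel boxes with their confidence."""
    faces = []
    for det in detections:
        confidence = float(det[2])
        if confidence < conf_threshold:
            continue
        x1, y1 = int(det[3] * frame_w), int(det[4] * frame_h)
        x2, y2 = int(det[5] * frame_w), int(det[6] * frame_h)
        faces.append((x1, y1, x2, y2, confidence))
    return faces


def detect_faces(frame, prototxt_path, model_path):
    """Run the SSD face detector on one BGR frame."""
    import cv2  # lazy import: the decoder above works without OpenCV

    net = cv2.dnn.readNetFromCaffe(prototxt_path, model_path)
    # The model expects 300x300 input with these mean values subtracted.
    blob = cv2.dnn.blobFromImage(
        cv2.resize(frame, (300, 300)), 1.0, (300, 300), (104.0, 177.0, 123.0)
    )
    net.setInput(blob)
    output = net.forward()
    frame_h, frame_w = frame.shape[:2]
    return decode_ssd_faces(output[0, 0], frame_w, frame_h)
```

Counting faces only works for people facing the camera, so this fits the forward-facing case mentioned above.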



          Update



          Oh, this face detection library has optimizations for ARM processors using Tengine, which may help in your case. As for face versus person detection, I've seen a detection algorithm using OpenCV that detects human heads in any position, including the back of the head. If you want to count people, I think that should suffice. I will try to find it and post it here.

















          $endgroup$












          • $begingroup$
            I see, but I want to detect a person in general, not by face only. I will need to read more about all those different neural networks. I updated the post with information about my current experiments.
            $endgroup$
            – tobix10
            Apr 10 at 11:11











          answered Apr 10 at 0:18 by Pedro Henrique Monforte
          edited Apr 10 at 13:09










