Action Recognition for multiple objects and localization

I have a question about action detection in video using proposed frames. I used a Temporal 3D ConvNet for action recognition on video; I trained it successfully, and it can recognize actions in videos.

At inference time, I collect 20 frames from the video, feed them to the model, and get a prediction. The problem is that the events in different videos are not the same size: some cover 90% of the frame, others only 10%. For example, two objects may collide, and the collision can happen at very different scales; I want to detect this action regardless of its scale.
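A minimal sketch of the clip-sampling step described above, assuming the 20 frames are picked uniformly across the whole video (the post does not say how they are chosen, so the uniform strategy and the name `sample_frame_indices` are assumptions):

```python
from typing import List


def sample_frame_indices(num_frames: int, clip_len: int = 20) -> List[int]:
    """Pick clip_len frame indices spread evenly over a video of num_frames frames.

    The video is split into clip_len equal segments and the middle frame of
    each segment is taken, so the clip spans the whole video. Videos shorter
    than clip_len get repeated indices.
    """
    if num_frames <= 0 or clip_len <= 0:
        raise ValueError("num_frames and clip_len must be positive")
    step = num_frames / clip_len
    return [min(num_frames - 1, int(step * i + step / 2)) for i in range(clip_len)]


print(sample_frame_indices(100)[:4])  # [2, 7, 12, 17]
```

For long videos the same function could be applied to a sliding window of frames instead of the whole video, so that a short event is not diluted among unrelated frames.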

How can I give the model the exact location of the action, when the action can happen at different scales and involve different objects? One idea is to use YOLO to collect regions of interest and feed the cropped frames to the 3D ConvNet each time. But if there are many objects, this will be very slow. How should I handle this?
Are there any end-to-end solutions that combine object location proposals with the action recognition network?
I've already looked through papers and blogs, but I couldn't find a solution to the localization issue, i.e. how to make sure the action recognition model receives the correct regions of the frames.
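One hedged way to keep the YOLO-based idea affordable is to avoid running the 3D ConvNet once per object and instead run it only on object *pairs* that actually come close to each other during the clip (as in the collision example). The sketch below assumes each object has already been tracked across the sampled frames, giving one `(x1, y1, x2, y2)` box per frame; the tracker, the `tracks` format, the function names, and the `max_gap`/`margin` values are all hypothetical choices, not part of the original setup:

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # hypothetical format: (x1, y1, x2, y2) in pixels
Crop = Tuple[int, int, int, int]


def box_gap(a: Box, b: Box) -> float:
    """Euclidean gap between two axis-aligned boxes (0.0 if they touch or overlap)."""
    dx = max(0.0, max(a[0], b[0]) - min(a[2], b[2]))
    dy = max(0.0, max(a[1], b[1]) - min(a[3], b[3]))
    return math.hypot(dx, dy)


def candidate_crops(tracks: Dict[int, List[Box]], frame_w: int, frame_h: int,
                    max_gap: float = 20.0,
                    margin: float = 0.15) -> List[Tuple[Tuple[int, int], Crop]]:
    """Return ((id_a, id_b), crop) for each pair of tracked objects that come
    within max_gap pixels of each other in at least one frame of the clip.

    tracks maps an object id to its boxes, one per sampled frame (same order).
    The crop is the union of both objects' boxes over the clip, enlarged by
    `margin` (a fraction of its width/height) and clamped to the frame.
    """
    crops = []
    for (id_a, boxes_a), (id_b, boxes_b) in combinations(sorted(tracks.items()), 2):
        gaps = [box_gap(a, b) for a, b in zip(boxes_a, boxes_b)]
        if not gaps or min(gaps) > max_gap:
            continue  # the two objects never get close: no 3D ConvNet call needed
        boxes = list(boxes_a) + list(boxes_b)
        x1, y1 = min(b[0] for b in boxes), min(b[1] for b in boxes)
        x2, y2 = max(b[2] for b in boxes), max(b[3] for b in boxes)
        pad_w, pad_h = (x2 - x1) * margin, (y2 - y1) * margin
        crop = (max(0, math.floor(x1 - pad_w)), max(0, math.floor(y1 - pad_h)),
                min(frame_w, math.ceil(x2 + pad_w)), min(frame_h, math.ceil(y2 + pad_h)))
        crops.append(((id_a, id_b), crop))
    return crops
```

Each returned crop would then be cut out of all 20 frames, resized to the 3D ConvNet's input size, and the crops batched into a single forward pass, so the cost grows with the number of close pairs rather than with the total number of detected objects.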

Any advice? Could someone explain an approach?

Thank you

Regards, Dmitry

asked Mar 23 at 9:45

Dmitry

machine-learning classification object-detection activity-recognition

1 Answer

Finding actions in videos is a tricky task. I'm not familiar with the Temporal 3D ConvNet, but to tackle a problem like this I would apply a CNN to each individual frame of the video and then feed the resulting sequence of per-frame features into an LSTM layer, so that it captures the temporal context of the video.
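To make the CNN+LSTM idea concrete, here is a minimal plain-Python sketch of the LSTM half only: it consumes one feature vector per frame (what the per-frame CNN would output) and returns the final hidden state, which summarises the clip and would be passed to a classifier layer. The class name `TinyLSTM` is hypothetical and the random weights stand in for trained ones; in practice you would use your framework's LSTM layer (e.g. Keras `LSTM`) rather than this illustration:

```python
import math
import random
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Single-layer LSTM forward pass in plain Python (for illustration only)."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        n_in = input_size + hidden_size
        # Four gates (input, forget, candidate, output); each maps the
        # concatenation [x_t, h_{t-1}] to hidden_size values. Random weights
        # stand in for trained ones.
        self.W = [
            [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(hidden_size)]
            for _ in range(4)
        ]
        self.b = [[0.0] * hidden_size for _ in range(4)]

    def _gate(self, k: int, z: Sequence[float]) -> List[float]:
        # Affine map W_k @ z + b_k for gate k.
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(self.W[k], self.b[k])]

    def forward(self, frame_features: Sequence[Sequence[float]]) -> List[float]:
        """Run over one clip (one feature vector per frame); return the last hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in frame_features:
            z = list(x) + h
            i = [sigmoid(v) for v in self._gate(0, z)]    # input gate
            f = [sigmoid(v) for v in self._gate(1, z)]    # forget gate
            g = [math.tanh(v) for v in self._gate(2, z)]  # candidate cell values
            o = [sigmoid(v) for v in self._gate(3, z)]    # output gate
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h


# 20 frames, each summarised by a hypothetical 8-dimensional CNN feature vector
features = [[0.05 * t] * 8 for t in range(20)]
summary = TinyLSTM(input_size=8, hidden_size=4).forward(features)
print(len(summary))  # 4
```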

[Image: Video action]

Since the action in a video can cover anywhere from 10% to 90% of the frame, you can apply test-time augmentation (TTA) to the video to find the action with higher confidence. A similar approach can be found in this video by Google.
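The answer doesn't specify which augmentations to use, so the following is only one possible reading of the TTA suggestion: evaluate the clip on windows at several scales and positions, then average the class probabilities (plain TTA) and also keep the single most confident window, which doubles as a rough location of the action. `model(clip, box)` is a hypothetical callable that crops every frame to `box`, resizes the crop, and returns class probabilities; the function names and default scales are assumptions:

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def window_boxes(w: int, h: int, scales: Sequence[float] = (1.0, 0.5, 0.25),
                 stride_frac: float = 0.5) -> List[Box]:
    """Sliding windows at several scales; each window is scale*w by scale*h and
    consecutive windows are shifted by stride_frac of the window size. If the
    stride does not divide evenly, a thin strip at the right/bottom is skipped."""
    boxes = []
    for s in scales:
        cw, ch = max(1, int(w * s)), max(1, int(h * s))
        sx, sy = max(1, int(cw * stride_frac)), max(1, int(ch * stride_frac))
        for y in range(0, h - ch + 1, sy):
            for x in range(0, w - cw + 1, sx):
                boxes.append((x, y, x + cw, y + ch))
    return boxes


def multiscale_predict(model: Callable[[object, Box], Sequence[float]], clip: object,
                       w: int, h: int, scales: Sequence[float] = (1.0, 0.5, 0.25),
                       stride_frac: float = 0.5):
    """Run model on every window of the clip.

    Returns (mean_probs, best_box, best_probs): mean_probs is plain TTA
    averaging; best_box is the window whose top class probability is highest.
    """
    results = [(box, list(model(clip, box)))
               for box in window_boxes(w, h, scales, stride_frac)]
    n_classes = len(results[0][1])
    mean_probs = [sum(p[k] for _, p in results) / len(results) for k in range(n_classes)]
    best_box, best_probs = max(results, key=lambda r: max(r[1]))
    return mean_probs, best_box, best_probs
```

Note the cost: with the default scales on a 100×100 frame this runs the model on 59 windows, which is the same speed problem raised in the question, so the number of scales and the stride have to be chosen with the inference budget in mind.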

answered Mar 23 at 10:07

thanatoz

Thank you for the response. I've considered using CNN+LSTM, but I don't fully understand it either. First, how do I stitch the CNN and the LSTM together? Should I use an end-to-end approach or train the networks separately? If separately, how should I pass the features from the CNN to the LSTM?
– Dmitry
Mar 23 at 11:19

This is where your mathematics and deep learning fundamentals will come in handy. How you combine a CNN with an RNN or LSTM depends on the framework you are using. In Keras, refer to the documentation of the functional model API. Hope it helps.
– thanatoz
Mar 23 at 18:12
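For the "train separately" option asked about above, one common pattern (sketched here under assumptions) is: run the trained CNN once per frame and keep its pooled feature vector (for example the layer just before its classifier), so each clip becomes a sequence of T vectors of length D; save these and train the LSTM on them. Because clips can have different numbers of frames, the sequences are usually padded to a fixed length together with a mask. The `cnn` callable and both function names below are hypothetical. For the end-to-end option in Keras, the CNN is typically wrapped in a `TimeDistributed` layer and followed by an `LSTM` layer inside one functional `Model`.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def clip_to_features(frames: Sequence[object], cnn: Callable[[object], Sequence[float]]) -> List[Vector]:
    """Apply a (hypothetical) frozen per-frame CNN to every frame of one clip."""
    return [list(cnn(frame)) for frame in frames]


def pad_feature_sequences(seqs: Sequence[Sequence[Vector]],
                          max_len: int) -> Tuple[List[List[Vector]], List[List[int]]]:
    """Pad or truncate each (T_i, D) feature sequence to (max_len, D).

    Returns (batch, mask); mask[b][t] is 1 for a real frame and 0 for padding,
    so the LSTM can ignore the padded steps.
    """
    # Feature size D is taken from the first non-empty clip.
    dim = next(len(s[0]) for s in seqs if s)
    batch, mask = [], []
    for seq in seqs:
        kept = [list(v) for v in seq[:max_len]]
        n = len(kept)
        batch.append(kept + [[0.0] * dim for _ in range(max_len - n)])
        mask.append([1] * n + [0] * (max_len - n))
    return batch, mask
```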
