Can video descriptions be learned from videos + human descriptions? [on hold]
$begingroup$
I have a dataset of short videos with simple content, each paired with a human-written description, for example "ball is pushed", "cube is grabbed", "book is grabbed".
What strategies are there for learning to describe a new video (such as "cube is pushed")?
I have no fixed set of classes to sort the videos into; instead, I need to generate a natural-language sentence.
Can you point me to representative research I could study, or to the techniques I would need to use?
machine-learning natural-language-process labels
$endgroup$
put on hold as too broad by Spacedman, Ethan, Siong Thye Goh, Toros91, Mark.F Mar 20 at 8:20
Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. Avoid asking multiple distinct questions at once. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.
asked Mar 19 at 14:33
arcGuesser
1 Answer
$begingroup$
For this purpose, you would typically train several models in a seq2seq (sequence-to-sequence) setup: for example, one model that learns patterns across the video frames, and another that merges a language model with embeddings to learn features of the descriptions.
This paper gives a good overview of the problem domain and two promising solutions: http://cs231n.stanford.edu/reports/2017/pdfs/31.pdf
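To make the language side of such a setup a bit more concrete, here is a minimal sketch of how the human descriptions could be turned into target sequences for a decoder. It is only an illustration under my own assumptions: the special tokens and the helper names (`build_vocab`, `encode`, `decode`) are hypothetical, and the video encoder that would produce the frame features is not shown.

```python
# Hypothetical sketch: text-side preprocessing for a video-captioning seq2seq model.
# The video encoder (some model over the frames) is assumed and not shown here.

PAD, SOS, EOS, UNK = "<pad>", "<sos>", "<eos>", "<unk>"


def build_vocab(captions):
    """Map every token seen in the training captions to an integer id."""
    vocab = {PAD: 0, SOS: 1, EOS: 2, UNK: 3}
    for caption in captions:
        for token in caption.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(caption, vocab, max_len):
    """Turn a caption into a fixed-length id list: <sos> tokens <eos> <pad>..."""
    ids = [vocab[SOS]]
    ids += [vocab.get(tok, vocab[UNK]) for tok in caption.lower().split()]
    ids.append(vocab[EOS])
    if len(ids) > max_len:
        raise ValueError("caption longer than max_len")
    return ids + [vocab[PAD]] * (max_len - len(ids))


def decode(ids, vocab):
    """Invert encode(): stop at <eos>, skip <sos> and <pad>."""
    id_to_token = {i: t for t, i in vocab.items()}
    words = []
    for i in ids:
        token = id_to_token[i]
        if token == EOS:
            break
        if token not in (SOS, PAD):
            words.append(token)
    return " ".join(words)


train_captions = ["ball is pushed", "cube is grabbed", "book is grabbed"]
vocab = build_vocab(train_captions)

# A description that never appears in training, built only from seen tokens.
target = encode("cube is pushed", vocab, max_len=6)
print(target)
print(decode(target, vocab))
```

Note that "cube is pushed" contains no unknown token even though the sentence itself is not in the training set. That is the reason a model that generates the sentence word by word can, in principle, describe a new video with a new combination of object and action, whereas a classifier over whole sentences could not.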
$endgroup$
answered Mar 20 at 4:39
Shamit Verma