Training of Region Proposal Network (RPN)


There is some interesting literature about RPNs (Region Proposal Networks). The most concise and helpful explanation I have found so far is this one: https://www.quora.com/How-does-the-region-proposal-network-RPN-in-Faster-R-CNN-work?share=1.

But there is something I still don't understand from my various readings. RPNs are designed to propose several candidate regions, from which a selection is then made to determine which candidates fit our needs.

However, RPNs, and neural networks in general, are deterministic: once trained, they will always produce the same output for a given input, so there is no way to query new candidates for the same input image. As far as I understand, RPNs are trained to produce a fixed number of proposals for each new image. But how does the training work then? If the RPN has to produce 300 candidates, what should the labeled training data be, knowing that a training image probably won't have more than 5 ground-truth bounding boxes?

And then, knowing that the bounding-box sizes are not consistent among candidates, how does the downstream CNN operate on inputs of different sizes?







machine-learning neural-network deep-learning
asked Jun 20 '18 at 21:20
Emile D.
• For reference, I add another interesting post that I found: datascience.stackexchange.com/questions/27277/…
– Emile D., Jun 20 '18 at 21:22
2 Answers
The first answer at the link you posted in the comments addresses one point, namely how region proposals are selected: via the Intersection over Union metric (IoU, more formally the Jaccard index), i.e. how much of your anchor overlaps the label. A lower limit is usually set on this metric to filter out all the useless proposals, and the remaining matches can be sorted to choose the best.
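For concreteness, here is what that metric looks like in code. This is a generic sketch, not taken from any particular implementation, with boxes given as (y1, x1, y2, x2):

import numpy as np

def iou(box_a, box_b):
    """Intersection over Union (Jaccard index) of two boxes (y1, x1, y2, x2)."""
    y1 = max(box_a[0], box_b[0])
    x1 = max(box_a[1], box_b[1])
    y2 = min(box_a[2], box_b[2])
    x2 = min(box_a[3], box_b[3])
    inter = max(0, y2 - y1) * max(0, x2 - x1)              # overlap area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# An anchor covering half of a ground-truth box scores 0.5:
print(iou(np.array([0, 0, 10, 10]), np.array([0, 0, 10, 5])))  # 0.5

Anchors whose IoU with a ground-truth box exceeds an upper threshold become positive training examples, and those below a lower threshold become negatives; the second answer below gives the concrete thresholds.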




I recommend reading through this excellently explained version of a proposal network: Mask R-CNN (Mask Region-based CNN). If you prefer looking at code, there is the full repo here, implemented in Keras/TensorFlow (there is also a PyTorch implementation linked somewhere).

There is even an explanatory Jupyter notebook, which might help make things click for you.
answered Jun 20 '18 at 21:40 by n1k31t4
• Indeed. But my question was not so much about the selection of bounding boxes as about the training. What I mean is: whenever we do backpropagation, we need to compute the loss function and compare every output to its golden value. But since we could have a 300x4 output while we only have, for instance, a 5x4 ground-truth output, how do we do the backpropagation and the training?
– Emile D., Jun 21 '18 at 14:33
To see how the RPN is trained, we can dive into the code written by Matterport: a TensorFlow/Keras implementation of Mask R-CNN with over 10,000 stars.

You can check the build_rpn_targets function in mrcnn/model.py.

First, the generated anchors (which depend on your anchor scales, ratios, image size, etc.) are used to calculate the IoU between the anchors and the ground-truth boxes:



# Compute overlaps [num_anchors, num_gt_boxes]
overlaps = utils.compute_overlaps(anchors, gt_boxes)


This tells us how much each anchor overlaps each ground-truth box. Positive and negative anchors are then chosen based on their IoU with the ground truth. According to the Mask R-CNN paper, anchors with IoU > 0.7 are positive, anchors with IoU < 0.3 are negative, and everything in between is neutral and not used during training:



# 1. Set negative anchors first. They get overwritten below if a GT box is
# matched to them.
anchor_iou_argmax = np.argmax(overlaps, axis=1)
anchor_iou_max = overlaps[np.arange(overlaps.shape[0]), anchor_iou_argmax]
rpn_match[anchor_iou_max < 0.3] = -1
# 2. Set an anchor for each GT box (regardless of IoU value).
# If multiple anchors have the same IoU match all of them
gt_iou_argmax = np.argwhere(overlaps == np.max(overlaps, axis=0))[:,0]
rpn_match[gt_iou_argmax] = 1
# 3. Set anchors with high overlap as positive.
rpn_match[anchor_iou_max >= 0.7] = 1


To train the RPN effectively, you need to set RPN_TRAIN_ANCHORS_PER_IMAGE carefully to keep training balanced when there are only a few objects in an image. Please note that multiple anchors can match one ground-truth box, since each anchor regresses a bounding-box offset to fit the ground truth.
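To connect this to the original question (hundreds of anchor outputs versus only ~5 ground-truth boxes): the RPN loss is computed per anchor, using rpn_match as a mask, never by pairing proposals with ground-truth boxes one-to-one. Below is a minimal NumPy sketch of that idea; it is my own illustration, not the Matterport code, and it assumes target_deltas is stored per anchor, aligned with pred_deltas:

import numpy as np

def rpn_class_loss(rpn_match, class_logits):
    """rpn_match: [num_anchors] in {-1, 0, 1}; class_logits: [num_anchors, 2].
    Only non-neutral anchors contribute to the classification loss."""
    idx = np.where(rpn_match != 0)[0]
    labels = (rpn_match[idx] == 1).astype(np.int64)   # 1 = foreground
    logits = class_logits[idx]
    # Plain softmax cross-entropy over the sampled anchors.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(idx)), labels].mean()

def rpn_bbox_loss(rpn_match, target_deltas, pred_deltas):
    """Smooth-L1 loss on box deltas, computed for positive anchors only."""
    pos = np.where(rpn_match == 1)[0]
    diff = np.abs(target_deltas[pos] - pred_deltas[pos])
    return np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5).mean()

Neutral anchors (rpn_match == 0) contribute nothing to either term, which is how a handful of ground-truth boxes can supervise hundreds of anchor outputs.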



Hope the answer is clear to you!

answered Mar 12 at 2:04 (edited Apr 10 at 7:41) by jimmy15923
• Well, this does add more information. But the question I had is: at inference time, we still propose 300 regions (or whatever RPN_TRAIN_ANCHORS_PER_IMAGE is). How do we select the few good ones from the proposed regions? And from what I gather, we backpropagate by checking the error against the IoU. But we could be missing a region that was not proposed during the forward pass. Could we then converge to having all the proposals at the same location?
– Emile D., Mar 25 at 20:02
• At inference time, the RPN simply sends all positive ROIs (those predicted as foreground) to the second stage. Since most ROIs overlap with each other, Non-Maximum Suppression is applied to keep the highest-probability ROIs. The main purpose of the RPN is to catch every foreground object in an image, so it tends to produce some false positives, which can be fixed by the second-stage classifier.
– jimmy15923, Apr 1 at 4:35
• Thank you for the clarification about inference time; it is clearer to me now. But at training time, then, won't all the bounding boxes eventually converge to the same proposals? Is there some criterion that prevents too much similarity between proposals, to avoid convergence to a few identical locations?
– Emile D., Apr 9 at 22:46
• As I mentioned, Non-Maximum Suppression is applied to all anchor boxes during training and inference, so most proposals will not overlap too much with other proposals. The NMS threshold is a hyperparameter defined in config.py.
– jimmy15923, Apr 10 at 7:39