Commit 7a9934df authored by Zhichao Lu, committed by lzc5123016

Merged commit includes the following changes:

184048729  by Zhichao Lu:

    Modify target_assigner so that it creates regression targets taking keypoints into account.

--
184027183  by Zhichao Lu:

    ResNet V1 FPN-based feature extractors for the SSD meta architecture in the Object Detection V2 API.

--
184004730  by Zhichao Lu:

    Expose a lever to override the configured mask_type.

--
183933113  by Zhichao Lu:

    Weight shared convolutional box predictor as described in https://arxiv.org/abs/1708.02002

--
183929669  by Zhichao Lu:

    Expanding box list operations for future data augmentations.

--
183916792  by Zhichao Lu:

    Fix unrecognized assertion function in tests.

--
183906851  by Zhichao Lu:

    - Change ssd meta architecture to use regression weights to compute loss normalizer.

--
183871003  by Zhichao Lu:

    Fix wrong dependency in config_util_test.

--
183782120  by Zhichao Lu:

    Add __init__ file to third_party directories.

--
183779109  by Zhichao Lu:

    Set up regular version sync.

--
183768772  by Zhichao Lu:

    Make test compatible with numpy 1.12 and higher

--
183767893  by Zhichao Lu:

    Make test compatible with numpy 1.12 and higher

--
183719318  by Zhichao Lu:

    Use the new test interface in ssd feature extractor.

--
183714671  by Zhichao Lu:

    Use the new test_case interface for all anchor generators.

--
183708155  by Zhichao Lu:

    Change variable scopes in ConvolutionalBoxPredictor such that previously trained checkpoints are still compatible after the change in BoxPredictor interface

--
183705798  by Zhichao Lu:

    Internal change.

--
183636023  by Zhichao Lu:

    Fixing argument name for np_box_list_ops.concatenate() function.

--
183490404  by Zhichao Lu:

    Make sure code that relies on older SSD code still works.

--
183426762  by Zhichao Lu:

    Internal change

183412315  by Zhichao Lu:

    Internal change

183337814  by Zhichao Lu:

    Internal change

183303933  by Zhichao Lu:

    Internal change

183257349  by Zhichao Lu:

    Internal change

183254447  by Zhichao Lu:

    Internal change

183251200  by Zhichao Lu:

    Internal change

183135002  by Zhichao Lu:

    Internal change

182851500  by Zhichao Lu:

    Internal change

182839607  by Zhichao Lu:

    Internal change

182830719  by Zhichao Lu:

    Internal change

182533923  by Zhichao Lu:

    Internal change

182391090  by Zhichao Lu:

    Internal change

182262339  by Zhichao Lu:

    Internal change

182244645  by Zhichao Lu:

    Internal change

182241613  by Zhichao Lu:

    Internal change

182133027  by Zhichao Lu:

    Internal change

182058807  by Zhichao Lu:

    Internal change

181812028  by Zhichao Lu:

    Internal change

181788857  by Zhichao Lu:

    Internal change

181656761  by Zhichao Lu:

    Internal change

181541125  by Zhichao Lu:

    Internal change

181538702  by Zhichao Lu:

    Internal change

181125385  by Zhichao Lu:

    Internal change

180957758  by Zhichao Lu:

    Internal change

180941434  by Zhichao Lu:

    Internal change

180852569  by Zhichao Lu:

    Internal change

180846001  by Zhichao Lu:

    Internal change

180832145  by Zhichao Lu:

    Internal change

180740495  by Zhichao Lu:

    Internal change

180729150  by Zhichao Lu:

    Internal change

180589008  by Zhichao Lu:

    Internal change

180585408  by Zhichao Lu:

    Internal change

180581039  by Zhichao Lu:

    Internal change

180286388  by Zhichao Lu:

    Internal change

179934081  by Zhichao Lu:

    Internal change

179841242  by Zhichao Lu:

    Internal change

179831694  by Zhichao Lu:

    Internal change

179761005  by Zhichao Lu:

    Internal change

179610632  by Zhichao Lu:

    Internal change

179605363  by Zhichao Lu:

    Internal change

179603774  by Zhichao Lu:

    Internal change

179598614  by Zhichao Lu:

    Internal change

179597809  by Zhichao Lu:

    Internal change

179494630  by Zhichao Lu:

    Internal change

179367492  by Zhichao Lu:

    Internal change

179250050  by Zhichao Lu:

    Internal change

179247385  by Zhichao Lu:

    Internal change

179207897  by Zhichao Lu:

    Internal change

179076230  by Zhichao Lu:

    Internal change

178862066  by Zhichao Lu:

    Internal change

178854216  by Zhichao Lu:

    Internal change

178853109  by Zhichao Lu:

    Internal change

178709753  by Zhichao Lu:

    Internal change

178640707  by Zhichao Lu:

    Internal change

178421534  by Zhichao Lu:

    Internal change

178287174  by Zhichao Lu:

    Internal change

178257399  by Zhichao Lu:

    Internal change

177681867  by Zhichao Lu:

    Internal change

177654820  by Zhichao Lu:

    Internal change

177654052  by Zhichao Lu:

    Internal change

177638787  by Zhichao Lu:

    Internal change

177598305  by Zhichao Lu:

    Internal change

177538488  by Zhichao Lu:

    Internal change

177474197  by Zhichao Lu:

    Internal change

177271928  by Zhichao Lu:

    Internal change

177250285  by Zhichao Lu:

    Internal change

177210762  by Zhichao Lu:

    Internal change

177197135  by Zhichao Lu:

    Internal change

177037781  by Zhichao Lu:

    Internal change

176917394  by Zhichao Lu:

    Internal change

176683171  by Zhichao Lu:

    Internal change

176450793  by Zhichao Lu:

    Internal change

176388133  by Zhichao Lu:

    Internal change

176197721  by Zhichao Lu:

    Internal change

176195315  by Zhichao Lu:

    Internal change

176128748  by Zhichao Lu:

    Internal change

175743440  by Zhichao Lu:

    Use Toggle instead of bool to make the layout optimizer name and usage consistent with other optimizers.

--
175578178  by Zhichao Lu:

    Internal change

175463518  by Zhichao Lu:

    Internal change

175316616  by Zhichao Lu:

    Internal change

175302470  by Zhichao Lu:

    Internal change

175300323  by Zhichao Lu:

    Internal change

175269680  by Zhichao Lu:

    Internal change

175260574  by Zhichao Lu:

    Internal change

175122281  by Zhichao Lu:

    Internal change

175111708  by Zhichao Lu:

    Internal change

175110183  by Zhichao Lu:

    Internal change

174877166  by Zhichao Lu:

    Internal change

174868399  by Zhichao Lu:

    Internal change

174754200  by Zhichao Lu:

    Internal change

174544534  by Zhichao Lu:

    Internal change

174536143  by Zhichao Lu:

    Internal change

174513795  by Zhichao Lu:

    Internal change

174463713  by Zhichao Lu:

    Internal change

174403525  by Zhichao Lu:

    Internal change

174385170  by Zhichao Lu:

    Internal change

174358498  by Zhichao Lu:

    Internal change

174249903  by Zhichao Lu:

    Fix nasnet image classification and object detection by moving the option to turn batch norm training on or off into its own arg_scope used only by detection.

--
174216508  by Zhichao Lu:

    Internal change

174065370  by Zhichao Lu:

    Internal change

174048035  by Zhichao Lu:

    Fix the pointer for downloading the NAS Faster-RCNN model.

--
174042677  by Zhichao Lu:

    Internal change

173964116  by Zhichao Lu:

    Internal change

173790182  by Zhichao Lu:

    Internal change

173779919  by Zhichao Lu:

    Internal change

173753775  by Zhichao Lu:

    Internal change

173753160  by Zhichao Lu:

    Internal change

173737519  by Zhichao Lu:

    Internal change

173696066  by Zhichao Lu:

    Internal change

173611554  by Zhichao Lu:

    Internal change

173475124  by Zhichao Lu:

    Internal change

173412497  by Zhichao Lu:

    Internal change

173404010  by Zhichao Lu:

    Internal change

173375014  by Zhichao Lu:

    Internal change

173345107  by Zhichao Lu:

    Internal change

173298413  by Zhichao Lu:

    Internal change

173289754  by Zhichao Lu:

    Internal change

173275544  by Zhichao Lu:

    Internal change

173273275  by Zhichao Lu:

    Internal change

173271885  by Zhichao Lu:

    Internal change

173264856  by Zhichao Lu:

    Internal change

173263791  by Zhichao Lu:

    Internal change

173261215  by Zhichao Lu:

    Internal change

173175740  by Zhichao Lu:

    Internal change

173010193  by Zhichao Lu:

    Internal change

172815204  by Zhichao Lu:

    Allow for label maps in tf.Example decoding.

--
172696028  by Zhichao Lu:

    Internal change

172509113  by Zhichao Lu:

    Allow for label maps in tf.Example decoding.

--
172475999  by Zhichao Lu:

    Internal change

172166621  by Zhichao Lu:

    Internal change

172151758  by Zhichao Lu:

    Minor updates to some README files.

    As a result of these friendly issues:
    https://github.com/tensorflow/models/issues/2530
    https://github.com/tensorflow/models/issues/2534

--
172147420  by Zhichao Lu:

    Fix illegal summary name and migrate the deprecated get_or_create_global_step usage in slim from tf.contrib.framework.* to tf.train.*.
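
    For reference, the migration is the one-line swap below (both symbols are
    standard TF 1.x API):

        # Deprecated location:
        global_step = tf.contrib.framework.get_or_create_global_step()
        # Replacement:
        global_step = tf.train.get_or_create_global_step()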

--
172111377  by Zhichao Lu:

    Internal change

172004247  by Zhichao Lu:

    Internal change

171996881  by Zhichao Lu:

    Internal change

171835204  by Zhichao Lu:

    Internal change

171826090  by Zhichao Lu:

    Internal change

171784016  by Zhichao Lu:

    Internal change

171699876  by Zhichao Lu:

    Internal change

171053425  by Zhichao Lu:

    Internal change

170905734  by Zhichao Lu:

    Internal change

170889179  by Zhichao Lu:

    Internal change

170734389  by Zhichao Lu:

    Internal change

170705852  by Zhichao Lu:

    Internal change

170401574  by Zhichao Lu:

    Internal change

170352571  by Zhichao Lu:

    Internal change

170215443  by Zhichao Lu:

    Internal change

170184288  by Zhichao Lu:

    Internal change

169936898  by Zhichao Lu:

    Internal change

169763373  by Zhichao Lu:

    Fix broken GitHub links in tensorflow and tensorflow_models resulting from The Great Models Move (a.k.a. the research subfolder)

--
169744825  by Zhichao Lu:

    Internal change

169638135  by Zhichao Lu:

    Internal change

169561814  by Zhichao Lu:

    Internal change

169444091  by Zhichao Lu:

    Internal change

169292330  by Zhichao Lu:

    Internal change

169145185  by Zhichao Lu:

    Internal change

168906035  by Zhichao Lu:

    Internal change

168790411  by Zhichao Lu:

    Internal change

168708911  by Zhichao Lu:

    Internal change

168611969  by Zhichao Lu:

    Internal change

168535975  by Zhichao Lu:

    Internal change

168381815  by Zhichao Lu:

    Internal change

168244740  by Zhichao Lu:

    Internal change

168240024  by Zhichao Lu:

    Internal change

168168016  by Zhichao Lu:

    Internal change

168071571  by Zhichao Lu:

    Move display strings to below the bounding box if they would otherwise be outside the image.

--
168067771  by Zhichao Lu:

    Internal change

167970950  by Zhichao Lu:

    Internal change

167884533  by Zhichao Lu:

    Internal change

167626173  by Zhichao Lu:

    Internal change

167277422  by Zhichao Lu:

    Internal change

167249393  by Zhichao Lu:

    Internal change

167248954  by Zhichao Lu:

    Internal change

167189395  by Zhichao Lu:

    Internal change

167107797  by Zhichao Lu:

    Internal change

167061250  by Zhichao Lu:

    Internal change

166871147  by Zhichao Lu:

    Internal change

166867617  by Zhichao Lu:

    Internal change

166862112  by Zhichao Lu:

    Internal change

166715648  by Zhichao Lu:

    Internal change

166635615  by Zhichao Lu:

    Internal change

166383182  by Zhichao Lu:

    Internal change

166371326  by Zhichao Lu:

    Internal change

166254711  by Zhichao Lu:

    Internal change

166106294  by Zhichao Lu:

    Internal change

166081204  by Zhichao Lu:

    Internal change

165972262  by Zhichao Lu:

    Internal change

165816702  by Zhichao Lu:

    Internal change

165764471  by Zhichao Lu:

    Internal change

165724134  by Zhichao Lu:

    Internal change

165655829  by Zhichao Lu:

    Internal change

165587904  by Zhichao Lu:

    Internal change

165534540  by Zhichao Lu:

    Internal change

165177692  by Zhichao Lu:

    Internal change

165091822  by Zhichao Lu:

    Internal change

165019730  by Zhichao Lu:

    Internal change

165002942  by Zhichao Lu:

    Internal change

164897728  by Zhichao Lu:

    Internal change

164782618  by Zhichao Lu:

    Internal change

164710379  by Zhichao Lu:

    Internal change

164639237  by Zhichao Lu:

    Internal change

164069251  by Zhichao Lu:

    Internal change

164058169  by Zhichao Lu:

    Internal change

163913796  by Zhichao Lu:

    Internal change

163756696  by Zhichao Lu:

    Internal change

163524665  by Zhichao Lu:

    Internal change

163393399  by Zhichao Lu:

    Internal change

163385733  by Zhichao Lu:

    Internal change

162525065  by Zhichao Lu:

    Internal change

162376984  by Zhichao Lu:

    Internal change

162026661  by Zhichao Lu:

    Internal change

161956004  by Zhichao Lu:

    Internal change

161817520  by Zhichao Lu:

    Internal change

161718688  by Zhichao Lu:

    Internal change

161624398  by Zhichao Lu:

    Internal change

161575120  by Zhichao Lu:

    Internal change

161483997  by Zhichao Lu:

    Internal change

161462189  by Zhichao Lu:

    Internal change

161452968  by Zhichao Lu:

    Internal change

161443992  by Zhichao Lu:

    Internal change

161408607  by Zhichao Lu:

    Internal change

161262084  by Zhichao Lu:

    Internal change

161214023  by Zhichao Lu:

    Internal change

161025667  by Zhichao Lu:

    Internal change

160982216  by Zhichao Lu:

    Internal change

160666760  by Zhichao Lu:

    Internal change

160570489  by Zhichao Lu:

    Internal change

160553112  by Zhichao Lu:

    Internal change

160458261  by Zhichao Lu:

    Internal change

160349302  by Zhichao Lu:

    Internal change

160296092  by Zhichao Lu:

    Internal change

160287348  by Zhichao Lu:

    Internal change

160199279  by Zhichao Lu:

    Internal change

160160156  by Zhichao Lu:

    Internal change

160151954  by Zhichao Lu:

    Internal change

160005404  by Zhichao Lu:

    Internal change

159983265  by Zhichao Lu:

    Internal change

159819896  by Zhichao Lu:

    Internal change

159749419  by Zhichao Lu:

    Internal change

159596448  by Zhichao Lu:

    Internal change

159587801  by Zhichao Lu:

    Internal change

159587342  by Zhichao Lu:

    Internal change

159476256  by Zhichao Lu:

    Internal change

159463992  by Zhichao Lu:

    Internal change

159455585  by Zhichao Lu:

    Internal change

159270798  by Zhichao Lu:

    Internal change

159256633  by Zhichao Lu:

    Internal change

159141989  by Zhichao Lu:

    Internal change

159079098  by Zhichao Lu:

    Internal change

159078559  by Zhichao Lu:

    Internal change

159077055  by Zhichao Lu:

    Internal change

159072046  by Zhichao Lu:

    Internal change

159071092  by Zhichao Lu:

    Internal change

159069262  by Zhichao Lu:

    Internal change

159037430  by Zhichao Lu:

    Internal change

159035747  by Zhichao Lu:

    Internal change

159023868  by Zhichao Lu:

    Internal change

158939092  by Zhichao Lu:

    Internal change

158912561  by Zhichao Lu:

    Internal change

158903825  by Zhichao Lu:

    Internal change

158894348  by Zhichao Lu:

    Internal change

158884934  by Zhichao Lu:

    Internal change

158878010  by Zhichao Lu:

    Internal change

158874620  by Zhichao Lu:

    Internal change

158869501  by Zhichao Lu:

    Internal change

158842623  by Zhichao Lu:

    Internal change

158801298  by Zhichao Lu:

    Internal change

158775487  by Zhichao Lu:

    Internal change

158773668  by Zhichao Lu:

    Internal change

158771394  by Zhichao Lu:

    Internal change

158668928  by Zhichao Lu:

    Internal change

158596865  by Zhichao Lu:

    Internal change

158587317  by Zhichao Lu:

    Internal change

158586348  by Zhichao Lu:

    Internal change

158585707  by Zhichao Lu:

    Internal change

158577134  by Zhichao Lu:

    Internal change

158459749  by Zhichao Lu:

    Internal change

158459678  by Zhichao Lu:

    Internal change

158328972  by Zhichao Lu:

    Internal change

158324255  by Zhichao Lu:

    Internal change

158319576  by Zhichao Lu:

    Internal change

158290802  by Zhichao Lu:

    Internal change

158273041  by Zhichao Lu:

    Internal change

158240477  by Zhichao Lu:

    Internal change

158204316  by Zhichao Lu:

    Internal change

158154161  by Zhichao Lu:

    Internal change

158077203  by Zhichao Lu:

    Internal change

158041397  by Zhichao Lu:

    Internal change

158029233  by Zhichao Lu:

    Internal change

157976306  by Zhichao Lu:

    Internal change

157966896  by Zhichao Lu:

    Internal change

157945642  by Zhichao Lu:

    Internal change

157943135  by Zhichao Lu:

    Internal change

157942158  by Zhichao Lu:

    Internal change

157897866  by Zhichao Lu:

    Internal change

157866667  by Zhichao Lu:

    Internal change

157845915  by Zhichao Lu:

    Internal change

157842592  by Zhichao Lu:

    Internal change

157832761  by Zhichao Lu:

    Internal change

157824451  by Zhichao Lu:

    Internal change

157816531  by Zhichao Lu:

    Internal change

157782130  by Zhichao Lu:

    Internal change

157733752  by Zhichao Lu:

    Internal change

157654577  by Zhichao Lu:

    Internal change

157639285  by Zhichao Lu:

    Internal change

157530694  by Zhichao Lu:

    Internal change

157518469  by Zhichao Lu:

    Internal change

157514626  by Zhichao Lu:

    Internal change

157481413  by Zhichao Lu:

    Internal change

157267863  by Zhichao Lu:

    Internal change

157263616  by Zhichao Lu:

    Internal change

157234554  by Zhichao Lu:

    Internal change

157174595  by Zhichao Lu:

    Internal change

157169681  by Zhichao Lu:

    Internal change

157156425  by Zhichao Lu:

    Internal change

157024436  by Zhichao Lu:

    Internal change

157016195  by Zhichao Lu:

    Internal change

156941658  by Zhichao Lu:

    Internal change

156880859  by Zhichao Lu:

    Internal change

156790636  by Zhichao Lu:

    Internal change

156565969  by Zhichao Lu:

    Internal change

156522345  by Zhichao Lu:

    Internal change

156518570  by Zhichao Lu:

    Internal change

156509878  by Zhichao Lu:

    Internal change

156509134  by Zhichao Lu:

    Internal change

156472497  by Zhichao Lu:

    Internal change

156471429  by Zhichao Lu:

    Internal change

156470865  by Zhichao Lu:

    Internal change

156461563  by Zhichao Lu:

    Internal change

156437521  by Zhichao Lu:

    Internal change

156334994  by Zhichao Lu:

    Internal change

156319604  by Zhichao Lu:

    Internal change

156234305  by Zhichao Lu:

    Internal change

156226207  by Zhichao Lu:

    Internal change

156215347  by Zhichao Lu:

    Internal change

156127227  by Zhichao Lu:

    Internal change

156120405  by Zhichao Lu:

    Internal change

156113752  by Zhichao Lu:

    Internal change

156098936  by Zhichao Lu:

    Internal change

155924066  by Zhichao Lu:

    Internal change

155883241  by Zhichao Lu:

    Internal change

155806887  by Zhichao Lu:

    Internal change

155641849  by Zhichao Lu:

    Internal change

155593034  by Zhichao Lu:

    Internal change

155570702  by Zhichao Lu:

    Internal change

155515306  by Zhichao Lu:

    Internal change

155514787  by Zhichao Lu:

    Internal change

155445237  by Zhichao Lu:

    Internal change

155438672  by Zhichao Lu:

    Internal change

155264448  by Zhichao Lu:

    Internal change

155222148  by Zhichao Lu:

    Internal change

155106590  by Zhichao Lu:

    Internal change

155090562  by Zhichao Lu:

    Internal change

154973775  by Zhichao Lu:

    Internal change

154972880  by Zhichao Lu:

    Internal change

154871596  by Zhichao Lu:

    Internal change

154835007  by Zhichao Lu:

    Internal change

154788175  by Zhichao Lu:

    Internal change

154731169  by Zhichao Lu:

    Internal change

154721261  by Zhichao Lu:

    Internal change

154594626  by Zhichao Lu:

    Internal change

154588305  by Zhichao Lu:

    Internal change

154578994  by Zhichao Lu:

    Internal change

154571515  by Zhichao Lu:

    Internal change

154552873  by Zhichao Lu:

    Internal change

154549672  by Zhichao Lu:

    Internal change

154463631  by Zhichao Lu:

    Internal change

154437690  by Zhichao Lu:

    Internal change

154412359  by Zhichao Lu:

    Internal change

154374026  by Zhichao Lu:

    Internal change

154361648  by Zhichao Lu:

    Internal change

154310164  by Zhichao Lu:

    Internal change

154220862  by Zhichao Lu:

    Internal change

154187281  by Zhichao Lu:

    Internal change

154186651  by Zhichao Lu:

    Internal change

154119783  by Zhichao Lu:

    Internal change

154114285  by Zhichao Lu:

    Internal change

154095717  by Zhichao Lu:

    Internal change

154057972  by Zhichao Lu:

    Internal change

154055285  by Zhichao Lu:

    Internal change

153659288  by Zhichao Lu:

    Internal change

153637797  by Zhichao Lu:

    Internal change

153561771  by Zhichao Lu:

    Internal change

153540765  by Zhichao Lu:

    Internal change

153496128  by Zhichao Lu:

    Internal change

153473323  by Zhichao Lu:

    Internal change

153368812  by Zhichao Lu:

    Internal change

153367292  by Zhichao Lu:

    Internal change

153201890  by Zhichao Lu:

    Internal change

153074177  by Zhichao Lu:

    Internal change

152980017  by Zhichao Lu:

    Internal change

152978434  by Zhichao Lu:

    Internal change

152951821  by Zhichao Lu:

    Internal change

152904076  by Zhichao Lu:

    Internal change

152883703  by Zhichao Lu:

    Internal change

152869747  by Zhichao Lu:

    Internal change

152827463  by Zhichao Lu:

    Internal change

152756886  by Zhichao Lu:

    Internal change

152752840  by Zhichao Lu:

    Internal change

152736347  by Zhichao Lu:

    Internal change

152728184  by Zhichao Lu:

    Internal change

152720120  by Zhichao Lu:

    Internal change

152710964  by Zhichao Lu:

    Internal change

152706735  by Zhichao Lu:

    Internal change

152681133  by Zhichao Lu:

    Internal change

152517758  by Zhichao Lu:

    Internal change

152516381  by Zhichao Lu:

    Internal change

152511258  by Zhichao Lu:

    Internal change

152319164  by Zhichao Lu:

    Internal change

152316404  by Zhichao Lu:

    Internal change

152309261  by Zhichao Lu:

    Internal change

152308007  by Zhichao Lu:

    Internal change

152296551  by Zhichao Lu:

    Internal change

152188069  by Zhichao Lu:

    Internal change

152158644  by Zhichao Lu:

    Internal change

152153578  by Zhichao Lu:

    Internal change

152152285  by Zhichao Lu:

    Internal change

152055035  by Zhichao Lu:

    Internal change

152036778  by Zhichao Lu:

    Internal change

152020728  by Zhichao Lu:

    Internal change

152014842  by Zhichao Lu:

    Internal change

151848225  by Zhichao Lu:

    Internal change

151741308  by Zhichao Lu:

    Internal change

151740499  by Zhichao Lu:

    Internal change

151736189  by Zhichao Lu:

    Internal change

151612892  by Zhichao Lu:

    Internal change

151599502  by Zhichao Lu:

    Internal change

151538547  by Zhichao Lu:

    Internal change

151496530  by Zhichao Lu:

    Internal change

151476070  by Zhichao Lu:

    Internal change

151448662  by Zhichao Lu:

    Internal change

151411627  by Zhichao Lu:

    Internal change

151397737  by Zhichao Lu:

    Internal change

151169523  by Zhichao Lu:

    Internal change

151148956  by Zhichao Lu:

    Internal change

150944227  by Zhichao Lu:

    Internal change

150276683  by Zhichao Lu:

    Internal change

149986687  by Zhichao Lu:

    Internal change

149218749  by Zhichao Lu:

    Internal change

PiperOrigin-RevId: 184048729
parent 7ef602be
@@ -80,7 +80,6 @@ class LocalizationLossBuilderTest(tf.test.TestCase):
losses_text_proto = """
localization_loss {
weighted_smooth_l1 {
anchorwise_output: true
}
}
classification_loss {
@@ -245,7 +244,7 @@ class ClassificationLossBuilderTest(tf.test.TestCase):
targets = tf.constant([[[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]])
weights = tf.constant([[1.0, 1.0]])
loss = classification_loss(predictions, targets, weights=weights)
self.assertEqual(loss.shape, [1, 2])
self.assertEqual(loss.shape, [1, 2, 3])
def test_raise_error_on_empty_config(self):
losses_text_proto = """
......
@@ -106,6 +106,7 @@ def _build_ssd_feature_extractor(feature_extractor_config, is_training,
min_depth = feature_extractor_config.min_depth
pad_to_multiple = feature_extractor_config.pad_to_multiple
batch_norm_trainable = feature_extractor_config.batch_norm_trainable
use_explicit_padding = feature_extractor_config.use_explicit_padding
conv_hyperparams = hyperparams_builder.build(
feature_extractor_config.conv_hyperparams, is_training)
@@ -115,7 +116,8 @@ def _build_ssd_feature_extractor(feature_extractor_config, is_training,
feature_extractor_class = SSD_FEATURE_EXTRACTOR_CLASS_MAP[feature_type]
return feature_extractor_class(is_training, depth_multiplier, min_depth,
pad_to_multiple, conv_hyperparams,
batch_norm_trainable, reuse_weights)
batch_norm_trainable, reuse_weights,
use_explicit_padding)
def _build_ssd_model(ssd_config, is_training):
@@ -228,7 +230,7 @@ def _build_faster_rcnn_model(frcnn_config, is_training):
feature_extractor = _build_faster_rcnn_feature_extractor(
frcnn_config.feature_extractor, is_training)
first_stage_only = frcnn_config.first_stage_only
number_of_stages = frcnn_config.number_of_stages
first_stage_anchor_generator = anchor_generator_builder.build(
frcnn_config.first_stage_anchor_generator)
@@ -283,7 +285,7 @@ def _build_faster_rcnn_model(frcnn_config, is_training):
'num_classes': num_classes,
'image_resizer_fn': image_resizer_fn,
'feature_extractor': feature_extractor,
'first_stage_only': first_stage_only,
'number_of_stages': number_of_stages,
'first_stage_anchor_generator': first_stage_anchor_generator,
'first_stage_atrous_rate': first_stage_atrous_rate,
'first_stage_box_predictor_arg_scope':
......
@@ -196,7 +196,7 @@ class OptimizerBuilderTest(tf.test.TestCase):
optimizer = optimizer_builder.build(optimizer_proto, global_summaries)
self.assertTrue(
isinstance(optimizer, tf.contrib.opt.MovingAverageOptimizer))
# TODO(rathodv): Find a way to not depend on the private members.
# TODO: Find a way to not depend on the private members.
self.assertAlmostEqual(optimizer._ema._decay, 0.2)
def testBuildEmptyOptimizer(self):
......
@@ -53,7 +53,7 @@ py_library(
deps = [
":box_list",
"//tensorflow",
"//tensorflow_models/object_detection/utils:shape_utils",
"//tensorflow/models/research/object_detection/utils:shape_utils",
],
)
@@ -113,7 +113,7 @@ py_library(
":box_list",
":box_list_ops",
"//tensorflow",
"//tensorflow_models/object_detection/utils:ops",
"//tensorflow/models/research/object_detection/utils:ops",
],
)
@@ -162,6 +162,7 @@ py_library(
":keypoint_ops",
":standard_fields",
"//tensorflow",
"//tensorflow/models/research/object_detection/utils:shape_utils",
],
)
@@ -211,6 +212,7 @@ py_library(
":box_list_ops",
":standard_fields",
"//tensorflow",
"//tensorflow/models/research/object_detection/utils:shape_utils",
],
)
@@ -232,15 +234,16 @@ py_library(
],
deps = [
":box_list",
":box_list_ops",
":matcher",
":region_similarity_calculator",
":standard_fields",
"//tensorflow",
"//tensorflow_models/object_detection/box_coders:faster_rcnn_box_coder",
"//tensorflow_models/object_detection/box_coders:mean_stddev_box_coder",
"//tensorflow_models/object_detection/core:box_coder",
"//tensorflow_models/object_detection/matchers:argmax_matcher",
"//tensorflow_models/object_detection/matchers:bipartite_matcher",
"//tensorflow/models/research/object_detection/box_coders:faster_rcnn_box_coder",
"//tensorflow/models/research/object_detection/box_coders:mean_stddev_box_coder",
"//tensorflow/models/research/object_detection/core:box_coder",
"//tensorflow/models/research/object_detection/matchers:argmax_matcher",
"//tensorflow/models/research/object_detection/matchers:bipartite_matcher",
"//tensorflow/models/research/object_detection/utils:shape_utils",
],
)
@@ -254,8 +257,10 @@ py_test(
":region_similarity_calculator",
":target_assigner",
"//tensorflow",
"//tensorflow_models/object_detection/box_coders:mean_stddev_box_coder",
"//tensorflow_models/object_detection/matchers:bipartite_matcher",
"//tensorflow/models/research/object_detection/box_coders:keypoint_box_coder",
"//tensorflow/models/research/object_detection/box_coders:mean_stddev_box_coder",
"//tensorflow/models/research/object_detection/matchers:bipartite_matcher",
"//tensorflow/models/research/object_detection/utils:test_case",
],
)
@@ -274,9 +279,9 @@ py_library(
srcs = ["box_predictor.py"],
deps = [
"//tensorflow",
"//tensorflow_models/object_detection/utils:ops",
"//tensorflow_models/object_detection/utils:shape_utils",
"//tensorflow_models/object_detection/utils:static_shape",
"//tensorflow/models/research/object_detection/utils:ops",
"//tensorflow/models/research/object_detection/utils:shape_utils",
"//tensorflow/models/research/object_detection/utils:static_shape",
],
)
@@ -286,8 +291,9 @@ py_test(
deps = [
":box_predictor",
"//tensorflow",
"//tensorflow_models/object_detection/builders:hyperparams_builder",
"//tensorflow_models/object_detection/protos:hyperparams_py_pb2",
"//tensorflow/models/research/object_detection/builders:hyperparams_builder",
"//tensorflow/models/research/object_detection/protos:hyperparams_py_pb2",
"//tensorflow/models/research/object_detection/utils:test_case",
],
)
@@ -298,7 +304,7 @@ py_library(
],
deps = [
"//tensorflow",
"//tensorflow_models/object_detection/core:box_list_ops",
"//tensorflow/models/research/object_detection/core:box_list_ops",
],
)
@@ -309,7 +315,7 @@ py_test(
],
deps = [
":region_similarity_calculator",
"//tensorflow_models/object_detection/core:box_list",
"//tensorflow/models/research/object_detection/core:box_list",
],
)
@@ -330,7 +336,7 @@ py_library(
],
deps = [
"//tensorflow",
"//tensorflow_models/object_detection/utils:ops",
"//tensorflow/models/research/object_detection/utils:ops",
],
)
......
@@ -77,8 +77,8 @@ class AnchorGenerator(object):
def generate(self, feature_map_shape_list, **params):
"""Generates a collection of bounding boxes to be used as anchors.
TODO: remove **params from argument list and make stride and offsets (for
multiple_grid_anchor_generator) constructor arguments.
TODO: remove **params from argument list and make stride and
offsets (for multiple_grid_anchor_generator) constructor arguments.
Args:
feature_map_shape_list: list of (height, width) pairs in the format
@@ -140,3 +140,4 @@ class AnchorGenerator(object):
* feature_map_shape[0]
* feature_map_shape[1])
return tf.assert_equal(expected_num_anchors, anchors.num_boxes())
@@ -183,7 +183,8 @@ def prune_completely_outside_window(boxlist, window, scope=None):
scope: name scope.
Returns:
pruned_corners: a tensor with shape [M_out, 4] where M_out <= M_in
pruned_boxlist: a new BoxList with all bounding boxes partially or fully in
the window.
valid_indices: a tensor with shape [M_out] indexing the valid bounding boxes
in the input tensor.
"""
@@ -656,7 +657,7 @@ def filter_greater_than(boxlist, thresh, scope=None):
This op keeps the collection of boxes whose corresponding scores are
greater than the input threshold.
TODO: Change function name to filter_scores_greater_than
TODO: Change function name to FilterScoresGreaterThan
Args:
boxlist: BoxList holding N boxes. Must contain a 'scores' field
@@ -982,3 +983,79 @@ def pad_or_clip_box_list(boxlist, num_boxes, scope=None):
boxlist.get_field(field), num_boxes)
subboxlist.add_field(field, subfield)
return subboxlist
def select_random_box(boxlist,
default_box=None,
seed=None,
scope=None):
"""Selects a random bounding box from a `BoxList`.
Args:
boxlist: A BoxList.
default_box: A [1, 4] float32 tensor. If no boxes are present in `boxlist`,
this default box will be returned. If None, will use a default box of
[[-1., -1., -1., -1.]].
seed: Random seed.
scope: Name scope.
Returns:
bbox: A [1, 4] tensor with a random bounding box.
valid: A bool tensor indicating whether a valid bounding box is returned
(True) or whether the default box is returned (False).
"""
with tf.name_scope(scope, 'SelectRandomBox'):
bboxes = boxlist.get()
combined_shape = shape_utils.combined_static_and_dynamic_shape(bboxes)
number_of_boxes = combined_shape[0]
default_box = default_box or tf.constant([[-1., -1., -1., -1.]])
def select_box():
random_index = tf.random_uniform([],
maxval=number_of_boxes,
dtype=tf.int32,
seed=seed)
return tf.expand_dims(bboxes[random_index], axis=0), tf.constant(True)
return tf.cond(
tf.greater_equal(number_of_boxes, 1),
true_fn=select_box,
false_fn=lambda: (default_box, tf.constant(False)))
def get_minimal_coverage_box(boxlist,
default_box=None,
scope=None):
"""Creates a single bounding box which covers all boxes in the boxlist.
Args:
boxlist: A Boxlist.
default_box: A [1, 4] float32 tensor. If no boxes are present in `boxlist`,
this default box will be returned. If None, will use a default box of
[[0., 0., 1., 1.]].
scope: Name scope.
Returns:
A [1, 4] float32 tensor with a bounding box that tightly covers all the
boxes in the box list. If the boxlist does not contain any boxes, the
default box is returned.
"""
with tf.name_scope(scope, 'CreateCoverageBox'):
num_boxes = boxlist.num_boxes()
def coverage_box(bboxes):
y_min, x_min, y_max, x_max = tf.split(
value=bboxes, num_or_size_splits=4, axis=1)
y_min_coverage = tf.reduce_min(y_min, axis=0)
x_min_coverage = tf.reduce_min(x_min, axis=0)
y_max_coverage = tf.reduce_max(y_max, axis=0)
x_max_coverage = tf.reduce_max(x_max, axis=0)
return tf.stack(
[y_min_coverage, x_min_coverage, y_max_coverage, x_max_coverage],
axis=1)
default_box = default_box or tf.constant([[0., 0., 1., 1.]])
return tf.cond(
tf.greater_equal(num_boxes, 1),
true_fn=lambda: coverage_box(boxlist.get()),
false_fn=lambda: default_box)
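# --- Usage sketch (not part of the diff): a minimal, illustrative example of
# --- the two new ops; values are hypothetical and assume the object_detection
# --- package is importable.
import tensorflow as tf
from object_detection.core import box_list
from object_detection.core import box_list_ops

corners = tf.constant([[0., 0., 1., 1.],
                       [0., 1., 2., 3.]])
sample_boxlist = box_list.BoxList(corners)
# One box chosen uniformly at random; `valid` is False only for an empty list.
random_box, valid = box_list_ops.select_random_box(sample_boxlist, seed=0)
# Tightest single box covering every box in the list: [[0., 0., 2., 3.]] here.
coverage_box = box_list_ops.get_minimal_coverage_box(sample_boxlist)
with tf.Session() as sess:
  print(sess.run([random_box, valid, coverage_box]))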
@@ -153,6 +153,25 @@ class BoxListOpsTest(tf.test.TestCase):
extra_data_out = sess.run(pruned.get_field('extra_data'))
self.assertAllEqual(extra_data_out, [[1], [2], [3], [4], [6]])
def test_prune_completely_outside_window_with_empty_boxlist(self):
window = tf.constant([0, 0, 9, 14], tf.float32)
corners = tf.zeros(shape=[0, 4], dtype=tf.float32)
boxes = box_list.BoxList(corners)
boxes.add_field('extra_data', tf.zeros(shape=[0], dtype=tf.int32))
pruned, keep_indices = box_list_ops.prune_completely_outside_window(boxes,
window)
pruned_boxes = pruned.get()
extra = pruned.get_field('extra_data')
exp_pruned_boxes = np.zeros(shape=[0, 4], dtype=np.float32)
exp_extra = np.zeros(shape=[0], dtype=np.int32)
with self.test_session() as sess:
pruned_boxes_out, keep_indices_out, extra_out = sess.run(
[pruned_boxes, keep_indices, extra])
self.assertAllClose(exp_pruned_boxes, pruned_boxes_out)
self.assertAllEqual([], keep_indices_out)
self.assertAllEqual(exp_extra, extra_out)
def test_intersection(self):
corners1 = tf.constant([[4.0, 3.0, 7.0, 5.0], [5.0, 6.0, 10.0, 7.0]])
corners2 = tf.constant([[3.0, 4.0, 6.0, 8.0], [14.0, 14.0, 15.0, 15.0],
@@ -593,6 +612,58 @@ class BoxListOpsTest(tf.test.TestCase):
self.assertAllEqual(expected_classes, classes_out)
self.assertAllClose(expected_scores, scores_out)
def test_select_random_box(self):
boxes = [[0., 0., 1., 1.],
[0., 1., 2., 3.],
[0., 2., 3., 4.]]
corners = tf.constant(boxes, dtype=tf.float32)
boxlist = box_list.BoxList(corners)
random_bbox, valid = box_list_ops.select_random_box(boxlist)
with self.test_session() as sess:
random_bbox_out, valid_out = sess.run([random_bbox, valid])
norm_small = any(
[np.linalg.norm(random_bbox_out - box) < 1e-6 for box in boxes])
self.assertTrue(norm_small)
self.assertTrue(valid_out)
def test_select_random_box_with_empty_boxlist(self):
corners = tf.constant([], shape=[0, 4], dtype=tf.float32)
boxlist = box_list.BoxList(corners)
random_bbox, valid = box_list_ops.select_random_box(boxlist)
with self.test_session() as sess:
random_bbox_out, valid_out = sess.run([random_bbox, valid])
expected_bbox_out = np.array([[-1., -1., -1., -1.]], dtype=np.float32)
self.assertAllEqual(expected_bbox_out, random_bbox_out)
self.assertFalse(valid_out)
def test_get_minimal_coverage_box(self):
boxes = [[0., 0., 1., 1.],
[-1., 1., 2., 3.],
[0., 2., 3., 4.]]
expected_coverage_box = [[-1., 0., 3., 4.]]
corners = tf.constant(boxes, dtype=tf.float32)
boxlist = box_list.BoxList(corners)
coverage_box = box_list_ops.get_minimal_coverage_box(boxlist)
with self.test_session() as sess:
coverage_box_out = sess.run(coverage_box)
self.assertAllClose(expected_coverage_box, coverage_box_out)
def test_get_minimal_coverage_box_with_empty_boxlist(self):
corners = tf.constant([], shape=[0, 4], dtype=tf.float32)
boxlist = box_list.BoxList(corners)
coverage_box = box_list_ops.get_minimal_coverage_box(boxlist)
with self.test_session() as sess:
coverage_box_out = sess.run(coverage_box)
self.assertAllClose([[0.0, 0.0, 1.0, 1.0]], coverage_box_out)
class ConcatenateTest(tf.test.TestCase):
@@ -958,5 +1029,6 @@ class BoxRefinementTest(tf.test.TestCase):
self.assertAllClose(expected_scores, scores_out)
self.assertAllEqual(extra_field_out, [0, 1, 1])
if __name__ == '__main__':
tf.test.main()
@@ -27,6 +27,7 @@ These modules are separated from the main model since the same
few box predictor architectures are shared across many models.
"""
from abc import abstractmethod
import math
import tensorflow as tf
from object_detection.utils import ops
from object_detection.utils import shape_utils
@@ -59,8 +60,8 @@ class BoxPredictor(object):
def num_classes(self):
return self._num_classes
def predict(self, image_features, num_predictions_per_location, scope,
**params):
def predict(self, image_features, num_predictions_per_location,
scope=None, **params):
"""Computes encoded object locations and corresponding confidences.
Takes a high level image feature map as input and produces two predictions,
@@ -70,10 +71,10 @@ class BoxPredictor(object):
and do not assume anything about their shapes.
Args:
image_features: A float tensor of shape [batch_size, height, width,
channels] containing features for a batch of images.
num_predictions_per_location: an integer representing the number of box
predictions to be made per spatial location in the feature map.
image_features: A list of float tensors of shape [batch_size, height_i,
width_i, channels_i] containing features for a batch of images.
num_predictions_per_location: A list of integers representing the number
of box predictions to be made per spatial location for each feature map.
scope: Variable and Op scope name.
**params: Additional keyword arguments for specific implementations of
BoxPredictor.
@@ -86,8 +87,21 @@ class BoxPredictor(object):
class_predictions_with_background: A float tensor of shape
[batch_size, num_anchors, num_classes + 1] representing the class
predictions for the proposals.
Raises:
ValueError: If length of `image_features` is not equal to length of
`num_predictions_per_location`.
"""
with tf.variable_scope(scope):
if len(image_features) != len(num_predictions_per_location):
raise ValueError('image_feature and num_predictions_per_location must '
'be of same length, found: {} vs {}'.
format(len(image_features),
len(num_predictions_per_location)))
if scope is not None:
with tf.variable_scope(scope):
return self._predict(image_features, num_predictions_per_location,
**params)
else:
return self._predict(image_features, num_predictions_per_location,
**params)
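# --- Illustration (not part of the diff): a throwaway subclass showing the
# --- new list-based contract; `_ZeroBoxPredictor` is hypothetical and simply
# --- emits zero tensors of the expected shapes.
import tensorflow as tf
from object_detection.core import box_predictor

class _ZeroBoxPredictor(box_predictor.BoxPredictor):

  def _predict(self, image_features, num_predictions_per_location):
    batch_size = tf.shape(image_features[0])[0]
    # Total anchors: height_i * width_i * num_predictions_i over all maps.
    num_anchors = sum(
        tf.shape(feature)[1] * tf.shape(feature)[2] * num_per_location
        for feature, num_per_location in zip(image_features,
                                             num_predictions_per_location))
    return {
        box_predictor.BOX_ENCODINGS:
            tf.zeros([batch_size, num_anchors, 1, 4]),
        box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND:
            tf.zeros([batch_size, num_anchors, self.num_classes + 1]),
    }

predictor = _ZeroBoxPredictor(is_training=False, num_classes=3)
features = [tf.zeros([2, 4, 4, 16]), tf.zeros([2, 2, 2, 16])]
# Parallel lists: one num_predictions_per_location entry per feature map;
# `scope` is now optional and may be omitted.
predictions = predictor.predict(features, [3, 6], scope='BoxPredictor')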
@@ -98,10 +112,10 @@ class BoxPredictor(object):
"""Implementations must override this method.
Args:
image_features: A float tensor of shape [batch_size, height, width,
channels] containing features for a batch of images.
num_predictions_per_location: an integer representing the number of box
predictions to be made per spatial location in the feature map.
image_features: A list of float tensors of shape [batch_size, height_i,
width_i, channels_i] containing features for a batch of images.
num_predictions_per_location: A list of integers representing the number
of box predictions to be made per spatial location for each feature map.
**params: Additional keyword arguments for specific implementations of
BoxPredictor.
@@ -169,28 +183,35 @@ class RfcnBoxPredictor(BoxPredictor):
"""Computes encoded object locations and corresponding confidences.
Args:
image_features: A float tensor of shape [batch_size, height, width,
channels] containing features for a batch of images.
num_predictions_per_location: an integer representing the number of box
predictions to be made per spatial location in the feature map.
Currently, this must be set to 1, or an error will be raised.
image_features: A list of float tensors of shape [batch_size, height_i,
width_i, channels_i] containing features for a batch of images.
num_predictions_per_location: A list of integers representing the number
of box predictions to be made per spatial location for each feature map.
Currently, this must be set to [1], or an error will be raised.
proposal_boxes: A float tensor of shape [batch_size, num_proposals,
box_code_size].
Returns:
box_encodings: A float tensor of shape
[batch_size, 1, num_classes, code_size] representing the
[batch_size, num_anchors, num_classes, code_size] representing the
location of the objects.
class_predictions_with_background: A float tensor of shape
[batch_size, 1, num_classes + 1] representing the class
[batch_size, num_anchors, num_classes + 1] representing the class
predictions for the proposals.
Raises:
ValueError: if num_predictions_per_location is not 1.
ValueError: if num_predictions_per_location is not 1 or if
len(image_features) is not 1.
"""
if num_predictions_per_location != 1:
if (len(num_predictions_per_location) != 1 or
num_predictions_per_location[0] != 1):
raise ValueError('Currently RfcnBoxPredictor only supports '
'predicting a single box per class per location.')
if len(image_features) != 1:
raise ValueError('length of `image_features` must be 1. Found {}'.
format(len(image_features)))
image_feature = image_features[0]
num_predictions_per_location = num_predictions_per_location[0]
batch_size = tf.shape(proposal_boxes)[0]
num_boxes = tf.shape(proposal_boxes)[1]
def get_box_indices(proposals):
@@ -202,7 +223,7 @@ class RfcnBoxPredictor(BoxPredictor):
tf.range(start=0, limit=proposals_shape[0]), 1)
return tf.reshape(ones_mat * multiplier, [-1])
net = image_features
net = image_feature
with slim.arg_scope(self._conv_hyperparams):
net = slim.conv2d(net, self._depth, [1, 1], scope='reduce_depth')
# Location predictions.
@@ -280,6 +301,7 @@ class MaskRCNNBoxPredictor(BoxPredictor):
predict_instance_masks=False,
mask_height=14,
mask_width=14,
mask_prediction_num_conv_layers=2,
mask_prediction_conv_depth=256,
predict_keypoints=False):
"""Constructor.
@@ -304,13 +326,21 @@ class MaskRCNNBoxPredictor(BoxPredictor):
boxes.
mask_height: Desired output mask height. The default value is 14.
mask_width: Desired output mask width. The default value is 14.
mask_prediction_num_conv_layers: Number of convolution layers applied to
the image_features in mask prediction branch.
mask_prediction_conv_depth: The depth for the first conv2d_transpose op
applied to the image_features in the mask prediciton branch.
applied to the image_features in the mask prediction branch. If set
to 0, the depth of the convolution layers will be automatically chosen
based on the number of object classes and the number of channels in the
image features.
predict_keypoints: Whether to predict keypoints inside detection boxes.
Raises:
ValueError: If predict_instance_masks or predict_keypoints is true.
ValueError: If predict_instance_masks is true but conv_hyperparams is not
set.
ValueError: If predict_keypoints is true since it is not implemented yet.
ValueError: If mask_prediction_num_conv_layers is smaller than two.
"""
super(MaskRCNNBoxPredictor, self).__init__(is_training, num_classes)
self._fc_hyperparams = fc_hyperparams
@@ -321,6 +351,7 @@ class MaskRCNNBoxPredictor(BoxPredictor):
self._predict_instance_masks = predict_instance_masks
self._mask_height = mask_height
self._mask_width = mask_width
self._mask_prediction_num_conv_layers = mask_prediction_num_conv_layers
self._mask_prediction_conv_depth = mask_prediction_conv_depth
self._predict_keypoints = predict_keypoints
if self._predict_keypoints:
@@ -329,52 +360,33 @@ class MaskRCNNBoxPredictor(BoxPredictor):
self._conv_hyperparams is None):
raise ValueError('`conv_hyperparams` must be provided when predicting '
'masks.')
if self._mask_prediction_num_conv_layers < 2:
raise ValueError(
'Mask prediction should consist of at least 2 conv layers')
@property
def num_classes(self):
return self._num_classes
def _predict(self, image_features, num_predictions_per_location):
"""Computes encoded object locations and corresponding confidences.
Flattens image_features and applies fully connected ops (with no
non-linearity) to predict box encodings and class predictions. In this
setting, anchors are not spatially arranged in any way and are assumed to
have been folded into the batch dimension. Thus we output 1 for the
anchors dimension.
@property
def predicts_instance_masks(self):
return self._predict_instance_masks
Also optionally predicts instance masks.
The mask prediction head is based on the Mask RCNN paper with the following
modifications: We replace the deconvolution layer with a bilinear resize
and a convolution.
def _predict_boxes_and_classes(self, image_features):
"""Predicts boxes and class scores.
Args:
image_features: A float tensor of shape [batch_size, height, width,
channels] containing features for a batch of images.
num_predictions_per_location: an integer representing the number of box
predictions to be made per spatial location in the feature map.
Currently, this must be set to 1, or an error will be raised.
Returns:
A dictionary containing the following tensors.
box_encodings: A float tensor of shape
[batch_size, 1, num_classes, code_size] representing the
location of the objects.
class_predictions_with_background: A float tensor of shape
[batch_size, 1, num_classes + 1] representing the class
predictions for the proposals.
If predict_masks is True the dictionary also contains:
instance_masks: A float tensor of shape
[batch_size, 1, num_classes, image_height, image_width]
If predict_keypoints is True the dictionary also contains:
keypoints: [batch_size, 1, num_keypoints, 2]
Raises:
ValueError: if num_predictions_per_location is not 1.
box_encodings: A float tensor of shape
[batch_size, 1, num_classes, code_size] representing the location of the
objects.
class_predictions_with_background: A float tensor of shape
[batch_size, 1, num_classes + 1] representing the class predictions for
the proposals.
"""
if num_predictions_per_location != 1:
raise ValueError('Currently FullyConnectedBoxPredictor only supports '
'predicting a single box per class per location.')
spatial_averaged_image_features = tf.reduce_mean(image_features, [1, 2],
keep_dims=True,
name='AvgPool')
@@ -398,34 +410,155 @@ class MaskRCNNBoxPredictor(BoxPredictor):
box_encodings, [-1, 1, self._num_classes, self._box_code_size])
class_predictions_with_background = tf.reshape(
class_predictions_with_background, [-1, 1, self._num_classes + 1])
return box_encodings, class_predictions_with_background
def _get_mask_predictor_conv_depth(self, num_feature_channels, num_classes,
class_weight=3.0, feature_weight=2.0):
"""Computes the depth of the mask predictor convolutions.
Computes the depth of the mask predictor convolutions given feature channels
and number of classes by performing a weighted average of the two in
log space to compute the number of convolution channels. The weights that
are used for computing the weighted average do not need to sum to 1.
Args:
num_feature_channels: An integer containing the number of feature
channels.
num_classes: An integer containing the number of classes.
class_weight: Class weight used in computing the weighted average.
feature_weight: Feature weight used in computing the weighted average.
predictions_dict = {
BOX_ENCODINGS: box_encodings,
CLASS_PREDICTIONS_WITH_BACKGROUND: class_predictions_with_background
}
if self._predict_instance_masks:
with slim.arg_scope(self._conv_hyperparams):
upsampled_features = tf.image.resize_bilinear(
image_features,
[self._mask_height, self._mask_width],
align_corners=True)
Returns:
An integer containing the number of convolution channels used by mask
predictor.
"""
num_feature_channels_log = math.log(float(num_feature_channels), 2.0)
num_classes_log = math.log(float(num_classes), 2.0)
weighted_num_feature_channels_log = (
num_feature_channels_log * feature_weight)
weighted_num_classes_log = num_classes_log * class_weight
total_weight = feature_weight + class_weight
num_conv_channels_log = round(
(weighted_num_feature_channels_log + weighted_num_classes_log) /
total_weight)
return int(math.pow(2.0, num_conv_channels_log))
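# --- Worked example (not part of the diff), with hypothetical inputs of
# --- 256 feature channels and 90 classes:
#   log2(256) = 8.0 and log2(90) ~= 6.49, so the weighted average is
#   (2.0 * 8.0 + 3.0 * 6.49) / (2.0 + 3.0) ~= 7.1, which rounds to 7,
#   giving 2**7 = 128 convolution channels for the mask head.
import math
avg_log = (2.0 * math.log(256.0, 2.0) + 3.0 * math.log(90.0, 2.0)) / 5.0
print(int(math.pow(2.0, round(avg_log))))  # -> 128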
def _predict_masks(self, image_features):
"""Performs mask prediction.
Args:
image_features: A float tensor of shape [batch_size, height, width,
channels] containing features for a batch of images.
Returns:
instance_masks: A float tensor of shape
[batch_size, 1, num_classes, image_height, image_width].
"""
num_conv_channels = self._mask_prediction_conv_depth
if num_conv_channels == 0:
num_feature_channels = image_features.get_shape().as_list()[3]
num_conv_channels = self._get_mask_predictor_conv_depth(
num_feature_channels, self.num_classes)
with slim.arg_scope(self._conv_hyperparams):
upsampled_features = tf.image.resize_bilinear(
image_features,
[self._mask_height, self._mask_width],
align_corners=True)
for _ in range(self._mask_prediction_num_conv_layers - 1):
upsampled_features = slim.conv2d(
upsampled_features,
num_outputs=self._mask_prediction_conv_depth,
kernel_size=[2, 2])
mask_predictions = slim.conv2d(upsampled_features,
num_outputs=self.num_classes,
activation_fn=None,
kernel_size=[3, 3])
instance_masks = tf.expand_dims(tf.transpose(mask_predictions,
perm=[0, 3, 1, 2]),
axis=1,
name='MaskPredictor')
predictions_dict[MASK_PREDICTIONS] = instance_masks
num_outputs=num_conv_channels,
kernel_size=[3, 3])
mask_predictions = slim.conv2d(upsampled_features,
num_outputs=self.num_classes,
activation_fn=None,
kernel_size=[3, 3])
return tf.expand_dims(
tf.transpose(mask_predictions, perm=[0, 3, 1, 2]),
axis=1,
name='MaskPredictor')
def _predict(self, image_features, num_predictions_per_location,
predict_boxes_and_classes=True, predict_auxiliary_outputs=False):
"""Optionally computes encoded object locations, confidences, and masks.
Flattens image_features and applies fully connected ops (with no
non-linearity) to predict box encodings and class predictions. In this
setting, anchors are not spatially arranged in any way and are assumed to
have been folded into the batch dimension. Thus we output 1 for the
anchors dimension.
Also optionally predicts instance masks.
The mask prediction head is based on the Mask RCNN paper with the following
modifications: We replace the deconvolution layer with a bilinear resize
and a convolution.
Args:
image_features: A list of float tensors of shape [batch_size, height_i,
width_i, channels_i] containing features for a batch of images.
num_predictions_per_location: A list of integers representing the number
of box predictions to be made per spatial location for each feature map.
Currently, this must be set to [1], or an error will be raised.
predict_boxes_and_classes: If true, the function will perform box
refinement and classification.
predict_auxiliary_outputs: If true, the function will perform other
predictions such as mask, keypoint, boundaries, etc. if any.
Returns:
A dictionary containing the following tensors.
box_encodings: A float tensor of shape
[batch_size, 1, num_classes, code_size] representing the
location of the objects.
class_predictions_with_background: A float tensor of shape
[batch_size, 1, num_classes + 1] representing the class
predictions for the proposals.
If predict_masks is True the dictionary also contains:
instance_masks: A float tensor of shape
[batch_size, 1, num_classes, image_height, image_width]
If predict_keypoints is True the dictionary also contains:
keypoints: [batch_size, 1, num_keypoints, 2]
Raises:
ValueError: If num_predictions_per_location is not 1 or if both
predict_boxes_and_classes and predict_auxiliary_outputs are false or if
len(image_features) is not 1.
"""
if (len(num_predictions_per_location) != 1 or
num_predictions_per_location[0] != 1):
raise ValueError('Currently FullyConnectedBoxPredictor only supports '
'predicting a single box per class per location.')
if not predict_boxes_and_classes and not predict_auxiliary_outputs:
raise ValueError('Should perform at least one prediction.')
if len(image_features) != 1:
raise ValueError('length of `image_features` must be 1. Found {}'.
format(len(image_features)))
image_feature = image_features[0]
num_predictions_per_location = num_predictions_per_location[0]
predictions_dict = {}
if predict_boxes_and_classes:
(box_encodings, class_predictions_with_background
) = self._predict_boxes_and_classes(image_feature)
predictions_dict[BOX_ENCODINGS] = box_encodings
predictions_dict[
CLASS_PREDICTIONS_WITH_BACKGROUND] = class_predictions_with_background
if self._predict_instance_masks and predict_auxiliary_outputs:
predictions_dict[MASK_PREDICTIONS] = self._predict_masks(image_feature)
return predictions_dict
class _NoopVariableScope(object):
"""A dummy class that does not push any scope."""
def __enter__(self):
return None
def __exit__(self, exc_type, exc_value, traceback):
return False
class ConvolutionalBoxPredictor(BoxPredictor):
"""Convolutional Box Predictor.
@@ -497,14 +630,15 @@ class ConvolutionalBoxPredictor(BoxPredictor):
self._apply_sigmoid_to_scores = apply_sigmoid_to_scores
self._class_prediction_bias_init = class_prediction_bias_init
def _predict(self, image_features, num_predictions_per_location):
def _predict(self, image_features, num_predictions_per_location_list):
"""Computes encoded object locations and corresponding confidences.
Args:
image_features: A float tensor of shape [batch_size, height, width,
channels] containing features for a batch of images.
num_predictions_per_location: an integer representing the number of box
predictions to be made per spatial location in the feature map.
image_features: A list of float tensors of shape [batch_size, height_i,
width_i, channels_i] containing features for a batch of images.
num_predictions_per_location_list: A list of integers representing the
number of box predictions to be made per spatial location for each
feature map.
Returns:
A dictionary containing the following tensors.
@@ -514,53 +648,210 @@ class ConvolutionalBoxPredictor(BoxPredictor):
class_predictions_with_background: A float tensor of shape
[batch_size, num_anchors, num_classes + 1] representing the class
predictions for the proposals.
"""
# Add a slot for the background class.
num_class_slots = self.num_classes + 1
net = image_features
with slim.arg_scope(self._conv_hyperparams), \
slim.arg_scope([slim.dropout], is_training=self._is_training):
# Add additional conv layers before the class predictor.
features_depth = static_shape.get_depth(image_features.get_shape())
depth = max(min(features_depth, self._max_depth), self._min_depth)
tf.logging.info('depth of additional conv before box predictor: {}'.
format(depth))
if depth > 0 and self._num_layers_before_predictor > 0:
for i in range(self._num_layers_before_predictor):
net = slim.conv2d(
net, depth, [1, 1], scope='Conv2d_%d_1x1_%d' % (i, depth))
with slim.arg_scope([slim.conv2d], activation_fn=None,
normalizer_fn=None, normalizer_params=None):
box_encodings = slim.conv2d(
net, num_predictions_per_location * self._box_code_size,
[self._kernel_size, self._kernel_size],
scope='BoxEncodingPredictor')
if self._use_dropout:
net = slim.dropout(net, keep_prob=self._dropout_keep_prob)
class_predictions_with_background = slim.conv2d(
net, num_predictions_per_location * num_class_slots,
[self._kernel_size, self._kernel_size], scope='ClassPredictor',
biases_initializer=tf.constant_initializer(
self._class_prediction_bias_init))
if self._apply_sigmoid_to_scores:
class_predictions_with_background = tf.sigmoid(
class_predictions_with_background)
combined_feature_map_shape = shape_utils.combined_static_and_dynamic_shape(
image_features)
box_encodings = tf.reshape(
box_encodings, tf.stack([combined_feature_map_shape[0],
combined_feature_map_shape[1] *
combined_feature_map_shape[2] *
num_predictions_per_location,
1, self._box_code_size]))
class_predictions_with_background = tf.reshape(
class_predictions_with_background,
tf.stack([combined_feature_map_shape[0],
combined_feature_map_shape[1] *
combined_feature_map_shape[2] *
num_predictions_per_location,
num_class_slots]))
return {BOX_ENCODINGS: box_encodings,
box_encodings_list = []
class_predictions_list = []
# TODO: Come up with a better way to generate scope names
# in box predictor once we have time to retrain all models in the zoo.
# The following lines create scope names to be backwards compatible with the
# existing checkpoints.
box_predictor_scopes = [_NoopVariableScope()]
if len(image_features) > 1:
box_predictor_scopes = [
tf.variable_scope('BoxPredictor_{}'.format(i))
for i in range(len(image_features))
]
for (image_feature,
num_predictions_per_location, box_predictor_scope) in zip(
image_features, num_predictions_per_location_list,
box_predictor_scopes):
with box_predictor_scope:
# Add a slot for the background class.
num_class_slots = self.num_classes + 1
net = image_feature
with slim.arg_scope(self._conv_hyperparams), \
slim.arg_scope([slim.dropout], is_training=self._is_training):
# Add additional conv layers before the class predictor.
features_depth = static_shape.get_depth(image_feature.get_shape())
depth = max(min(features_depth, self._max_depth), self._min_depth)
tf.logging.info('depth of additional conv before box predictor: {}'.
format(depth))
if depth > 0 and self._num_layers_before_predictor > 0:
for i in range(self._num_layers_before_predictor):
net = slim.conv2d(
net, depth, [1, 1], scope='Conv2d_%d_1x1_%d' % (i, depth))
with slim.arg_scope([slim.conv2d], activation_fn=None,
normalizer_fn=None, normalizer_params=None):
box_encodings = slim.conv2d(
net, num_predictions_per_location * self._box_code_size,
[self._kernel_size, self._kernel_size],
scope='BoxEncodingPredictor')
if self._use_dropout:
net = slim.dropout(net, keep_prob=self._dropout_keep_prob)
class_predictions_with_background = slim.conv2d(
net, num_predictions_per_location * num_class_slots,
[self._kernel_size, self._kernel_size], scope='ClassPredictor',
biases_initializer=tf.constant_initializer(
self._class_prediction_bias_init))
if self._apply_sigmoid_to_scores:
class_predictions_with_background = tf.sigmoid(
class_predictions_with_background)
combined_feature_map_shape = (shape_utils.
combined_static_and_dynamic_shape(
image_feature))
box_encodings = tf.reshape(
box_encodings, tf.stack([combined_feature_map_shape[0],
combined_feature_map_shape[1] *
combined_feature_map_shape[2] *
num_predictions_per_location,
1, self._box_code_size]))
box_encodings_list.append(box_encodings)
class_predictions_with_background = tf.reshape(
class_predictions_with_background,
tf.stack([combined_feature_map_shape[0],
combined_feature_map_shape[1] *
combined_feature_map_shape[2] *
num_predictions_per_location,
num_class_slots]))
class_predictions_list.append(class_predictions_with_background)
return {BOX_ENCODINGS: tf.concat(box_encodings_list, axis=1),
CLASS_PREDICTIONS_WITH_BACKGROUND:
class_predictions_with_background}
tf.concat(class_predictions_list, axis=1)}
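# Editor's sketch (shapes assumed from the tests further below): for each
# feature map the conv output [batch, H, W, P * code_size] is flattened to
# [batch, H * W * P, 1, code_size], and the per-map results are concatenated
# along the anchor axis. E.g. with numpy:
#
#   import numpy as np
#   batch, h, w, p, code = 4, 8, 8, 5, 4
#   conv_out = np.zeros((batch, h, w, p * code), dtype=np.float32)
#   box_encodings = conv_out.reshape((batch, h * w * p, 1, code))
#   assert box_encodings.shape == (4, 320, 1, 4)  # 8 * 8 * 5 = 320 anchors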
# TODO: Merge the implementation with ConvolutionalBoxPredictor above
# since they are very similar.
class WeightSharedConvolutionalBoxPredictor(BoxPredictor):
"""Convolutional Box Predictor with weight sharing.
Defines the box predictor as defined in
https://arxiv.org/abs/1708.02002. This class differs from
ConvolutionalBoxPredictor in that it shares weights and biases while
predicting from different feature maps.
"""
def __init__(self,
is_training,
num_classes,
conv_hyperparams,
depth,
num_layers_before_predictor,
box_code_size,
kernel_size=3,
class_prediction_bias_init=0.0):
"""Constructor.
Args:
is_training: Indicates whether the BoxPredictor is in training mode.
num_classes: number of classes. Note that num_classes *does not*
include the background category, so if groundtruth labels take values
in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
assigned classification targets can range from {0,... K}).
conv_hyperparams: Slim arg_scope with hyperparameters for convolution ops.
depth: depth of conv layers.
num_layers_before_predictor: Number of additional conv layers before
the predictor.
box_code_size: Size of encoding for each box.
kernel_size: Size of final convolution kernel.
class_prediction_bias_init: constant value to initialize bias of the last
conv2d layer before class prediction.
"""
super(WeightSharedConvolutionalBoxPredictor, self).__init__(is_training,
num_classes)
self._conv_hyperparams = conv_hyperparams
self._depth = depth
self._num_layers_before_predictor = num_layers_before_predictor
self._box_code_size = box_code_size
self._kernel_size = kernel_size
self._class_prediction_bias_init = class_prediction_bias_init
def _predict(self, image_features, num_predictions_per_location_list):
"""Computes encoded object locations and corresponding confidences.
Args:
image_features: A list of float tensors of shape [batch_size, height_i,
width_i, channels] containing features for a batch of images. Note that
all tensors in the list must have the same number of channels.
num_predictions_per_location_list: A list of integers representing the
number of box predictions to be made per spatial location for each
feature map. Note that all values must be the same since the weights are
shared.
Returns:
A dictionary containing the following tensors.
box_encodings: A float tensor of shape [batch_size, num_anchors, 1,
code_size] representing the location of the objects, where
num_anchors = feat_height * feat_width * num_predictions_per_location
class_predictions_with_background: A float tensor of shape
[batch_size, num_anchors, num_classes + 1] representing the class
predictions for the proposals.
Raises:
ValueError: If the image feature maps do not have the same number of
channels or if the number of predictions per location differs between the
feature maps.
"""
if len(set(num_predictions_per_location_list)) > 1:
raise ValueError('num predictions per location must be the same for all '
'feature maps, found: {}'.format(
num_predictions_per_location_list))
feature_channels = [
image_feature.shape[3].value for image_feature in image_features
]
if len(set(feature_channels)) > 1:
raise ValueError('all feature maps must have the same number of '
'channels, found: {}'.format(feature_channels))
box_encodings_list = []
class_predictions_list = []
for (image_feature, num_predictions_per_location) in zip(
image_features, num_predictions_per_location_list):
# Add a slot for the background class.
with tf.variable_scope('WeightSharedConvolutionalBoxPredictor',
reuse=tf.AUTO_REUSE):
num_class_slots = self.num_classes + 1
net = image_feature
with slim.arg_scope(self._conv_hyperparams):
for i in range(self._num_layers_before_predictor):
net = slim.conv2d(net,
self._depth,
[self._kernel_size, self._kernel_size],
stride=1,
padding='SAME',
scope='conv2d_{}'.format(i))
box_encodings = slim.conv2d(
net, num_predictions_per_location * self._box_code_size,
[self._kernel_size, self._kernel_size],
activation_fn=None, stride=1, padding='SAME',
scope='BoxEncodingPredictor')
class_predictions_with_background = slim.conv2d(
net, num_predictions_per_location * num_class_slots,
[self._kernel_size, self._kernel_size],
activation_fn=None, stride=1, padding='SAME',
biases_initializer=tf.constant_initializer(
self._class_prediction_bias_init),
scope='ClassPredictor')
combined_feature_map_shape = (shape_utils.
combined_static_and_dynamic_shape(
image_feature))
box_encodings = tf.reshape(
box_encodings, tf.stack([combined_feature_map_shape[0],
combined_feature_map_shape[1] *
combined_feature_map_shape[2] *
num_predictions_per_location,
1, self._box_code_size]))
box_encodings_list.append(box_encodings)
class_predictions_with_background = tf.reshape(
class_predictions_with_background,
tf.stack([combined_feature_map_shape[0],
combined_feature_map_shape[1] *
combined_feature_map_shape[2] *
num_predictions_per_location,
num_class_slots]))
class_predictions_list.append(class_predictions_with_background)
return {BOX_ENCODINGS: tf.concat(box_encodings_list, axis=1),
CLASS_PREDICTIONS_WITH_BACKGROUND:
tf.concat(class_predictions_list, axis=1)}
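# Editor's sketch (TF 1.x variable-sharing semantics, assumed setup): because
# every feature map re-enters the same variable scope with
# reuse=tf.AUTO_REUSE, the conv weights are created once and then reused:
#
#   import tensorflow as tf
#   with tf.variable_scope('shared', reuse=tf.AUTO_REUSE):
#     a = tf.get_variable('w', shape=[3, 3])  # created on the first call
#   with tf.variable_scope('shared', reuse=tf.AUTO_REUSE):
#     b = tf.get_variable('w', shape=[3, 3])  # reused, not re-created
#   assert a is b
#
# This is why the variable-set test further below expects a single set of
# conv2d_*/BoxEncodingPredictor/ClassPredictor variables no matter how many
# feature maps are passed in.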
......@@ -14,7 +14,6 @@
# ==============================================================================
"""Tests for object_detection.core.box_predictor."""
import numpy as np
import tensorflow as tf
......@@ -22,6 +21,7 @@ from google.protobuf import text_format
from object_detection.builders import hyperparams_builder
from object_detection.core import box_predictor
from object_detection.protos import hyperparams_pb2
from object_detection.utils import test_case
class MaskRCNNBoxPredictorTest(tf.test.TestCase):
......@@ -55,7 +55,8 @@ class MaskRCNNBoxPredictorTest(tf.test.TestCase):
box_code_size=4,
)
box_predictions = mask_box_predictor.predict(
image_features, num_predictions_per_location=1, scope='BoxPredictor')
[image_features], num_predictions_per_location=[1],
scope='BoxPredictor')
box_encodings = box_predictions[box_predictor.BOX_ENCODINGS]
class_predictions_with_background = box_predictions[
box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND]
......@@ -93,12 +94,16 @@ class MaskRCNNBoxPredictorTest(tf.test.TestCase):
op_type=hyperparams_pb2.Hyperparams.CONV),
predict_instance_masks=True)
box_predictions = mask_box_predictor.predict(
image_features, num_predictions_per_location=1, scope='BoxPredictor')
[image_features],
num_predictions_per_location=[1],
scope='BoxPredictor',
predict_boxes_and_classes=True,
predict_auxiliary_outputs=True)
mask_predictions = box_predictions[box_predictor.MASK_PREDICTIONS]
self.assertListEqual([2, 1, 5, 14, 14],
mask_predictions.get_shape().as_list())
def test_do_not_return_instance_masks_and_keypoints_without_request(self):
def test_do_not_return_instance_masks_without_request(self):
image_features = tf.random_uniform([2, 7, 7, 3], dtype=tf.float32)
mask_box_predictor = box_predictor.MaskRCNNBoxPredictor(
is_training=False,
......@@ -108,7 +113,8 @@ class MaskRCNNBoxPredictorTest(tf.test.TestCase):
dropout_keep_prob=0.5,
box_code_size=4)
box_predictions = mask_box_predictor.predict(
image_features, num_predictions_per_location=1, scope='BoxPredictor')
[image_features], num_predictions_per_location=[1],
scope='BoxPredictor')
self.assertEqual(len(box_predictions), 2)
self.assertTrue(box_predictor.BOX_ENCODINGS in box_predictions)
self.assertTrue(box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND
......@@ -156,7 +162,8 @@ class RfcnBoxPredictorTest(tf.test.TestCase):
box_code_size=4
)
box_predictions = rfcn_box_predictor.predict(
image_features, num_predictions_per_location=1, scope='BoxPredictor',
[image_features], num_predictions_per_location=[1],
scope='BoxPredictor',
proposal_boxes=proposal_boxes)
box_encodings = box_predictions[box_predictor.BOX_ENCODINGS]
class_predictions_with_background = box_predictions[
......@@ -173,7 +180,7 @@ class RfcnBoxPredictorTest(tf.test.TestCase):
self.assertAllEqual(class_predictions_shape, [8, 1, 3])
class ConvolutionalBoxPredictorTest(tf.test.TestCase):
class ConvolutionalBoxPredictorTest(test_case.TestCase):
def _build_arg_scope_with_conv_hyperparams(self):
conv_hyperparams = hyperparams_pb2.Hyperparams()
......@@ -192,36 +199,94 @@ class ConvolutionalBoxPredictorTest(tf.test.TestCase):
return hyperparams_builder.build(conv_hyperparams, is_training=True)
def test_get_boxes_for_five_aspect_ratios_per_location(self):
image_features = tf.random_uniform([4, 8, 8, 64], dtype=tf.float32)
conv_box_predictor = box_predictor.ConvolutionalBoxPredictor(
is_training=False,
num_classes=0,
conv_hyperparams=self._build_arg_scope_with_conv_hyperparams(),
min_depth=0,
max_depth=32,
num_layers_before_predictor=1,
use_dropout=True,
dropout_keep_prob=0.8,
kernel_size=1,
box_code_size=4
)
box_predictions = conv_box_predictor.predict(
image_features, num_predictions_per_location=5, scope='BoxPredictor')
box_encodings = box_predictions[box_predictor.BOX_ENCODINGS]
objectness_predictions = box_predictions[
box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND]
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
(box_encodings_shape,
objectness_predictions_shape) = sess.run(
[tf.shape(box_encodings), tf.shape(objectness_predictions)])
self.assertAllEqual(box_encodings_shape, [4, 320, 1, 4])
self.assertAllEqual(objectness_predictions_shape, [4, 320, 1])
def graph_fn(image_features):
conv_box_predictor = box_predictor.ConvolutionalBoxPredictor(
is_training=False,
num_classes=0,
conv_hyperparams=self._build_arg_scope_with_conv_hyperparams(),
min_depth=0,
max_depth=32,
num_layers_before_predictor=1,
use_dropout=True,
dropout_keep_prob=0.8,
kernel_size=1,
box_code_size=4
)
box_predictions = conv_box_predictor.predict(
[image_features], num_predictions_per_location=[5],
scope='BoxPredictor')
box_encodings = box_predictions[box_predictor.BOX_ENCODINGS]
objectness_predictions = box_predictions[
box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND]
return (box_encodings, objectness_predictions)
image_features = np.random.rand(4, 8, 8, 64).astype(np.float32)
(box_encodings, objectness_predictions) = self.execute(graph_fn,
[image_features])
self.assertAllEqual(box_encodings.shape, [4, 320, 1, 4])
self.assertAllEqual(objectness_predictions.shape, [4, 320, 1])
def test_get_boxes_for_one_aspect_ratio_per_location(self):
image_features = tf.random_uniform([4, 8, 8, 64], dtype=tf.float32)
def graph_fn(image_features):
conv_box_predictor = box_predictor.ConvolutionalBoxPredictor(
is_training=False,
num_classes=0,
conv_hyperparams=self._build_arg_scope_with_conv_hyperparams(),
min_depth=0,
max_depth=32,
num_layers_before_predictor=1,
use_dropout=True,
dropout_keep_prob=0.8,
kernel_size=1,
box_code_size=4
)
box_predictions = conv_box_predictor.predict(
[image_features], num_predictions_per_location=[1],
scope='BoxPredictor')
box_encodings = box_predictions[box_predictor.BOX_ENCODINGS]
objectness_predictions = box_predictions[
box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND]
return (box_encodings, objectness_predictions)
image_features = np.random.rand(4, 8, 8, 64).astype(np.float32)
(box_encodings, objectness_predictions) = self.execute(graph_fn,
[image_features])
self.assertAllEqual(box_encodings.shape, [4, 64, 1, 4])
self.assertAllEqual(objectness_predictions.shape, [4, 64, 1])
def test_get_multi_class_predictions_for_five_aspect_ratios_per_location(
self):
num_classes_without_background = 6
image_features = np.random.rand(4, 8, 8, 64).astype(np.float32)
def graph_fn(image_features):
conv_box_predictor = box_predictor.ConvolutionalBoxPredictor(
is_training=False,
num_classes=num_classes_without_background,
conv_hyperparams=self._build_arg_scope_with_conv_hyperparams(),
min_depth=0,
max_depth=32,
num_layers_before_predictor=1,
use_dropout=True,
dropout_keep_prob=0.8,
kernel_size=1,
box_code_size=4
)
box_predictions = conv_box_predictor.predict(
[image_features],
num_predictions_per_location=[5],
scope='BoxPredictor')
box_encodings = box_predictions[box_predictor.BOX_ENCODINGS]
class_predictions_with_background = box_predictions[
box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND]
return (box_encodings, class_predictions_with_background)
(box_encodings,
class_predictions_with_background) = self.execute(graph_fn,
[image_features])
self.assertAllEqual(box_encodings.shape, [4, 320, 1, 4])
self.assertAllEqual(class_predictions_with_background.shape,
[4, 320, num_classes_without_background+1])
def test_get_predictions_with_feature_maps_of_dynamic_shape(
self):
image_features = tf.placeholder(dtype=tf.float32, shape=[4, None, None, 64])
conv_box_predictor = box_predictor.ConvolutionalBoxPredictor(
is_training=False,
num_classes=0,
......@@ -235,71 +300,177 @@ class ConvolutionalBoxPredictorTest(tf.test.TestCase):
box_code_size=4
)
box_predictions = conv_box_predictor.predict(
image_features, num_predictions_per_location=1, scope='BoxPredictor')
[image_features], num_predictions_per_location=[5],
scope='BoxPredictor')
box_encodings = box_predictions[box_predictor.BOX_ENCODINGS]
objectness_predictions = box_predictions[
box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND]
init_op = tf.global_variables_initializer()
resolution = 32
expected_num_anchors = resolution*resolution*5
with self.test_session() as sess:
sess.run(init_op)
(box_encodings_shape,
objectness_predictions_shape) = sess.run(
[tf.shape(box_encodings), tf.shape(objectness_predictions)])
self.assertAllEqual(box_encodings_shape, [4, 64, 1, 4])
self.assertAllEqual(objectness_predictions_shape, [4, 64, 1])
[tf.shape(box_encodings), tf.shape(objectness_predictions)],
feed_dict={image_features:
np.random.rand(4, resolution, resolution, 64)})
self.assertAllEqual(box_encodings_shape, [4, expected_num_anchors, 1, 4])
self.assertAllEqual(objectness_predictions_shape,
[4, expected_num_anchors, 1])
class WeightSharedConvolutionalBoxPredictorTest(test_case.TestCase):
def _build_arg_scope_with_conv_hyperparams(self):
conv_hyperparams = hyperparams_pb2.Hyperparams()
conv_hyperparams_text_proto = """
activation: RELU_6
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
"""
text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams)
return hyperparams_builder.build(conv_hyperparams, is_training=True)
def test_get_boxes_for_five_aspect_ratios_per_location(self):
def graph_fn(image_features):
conv_box_predictor = box_predictor.WeightSharedConvolutionalBoxPredictor(
is_training=False,
num_classes=0,
conv_hyperparams=self._build_arg_scope_with_conv_hyperparams(),
depth=32,
num_layers_before_predictor=1,
box_code_size=4)
box_predictions = conv_box_predictor.predict(
[image_features], num_predictions_per_location=[5],
scope='BoxPredictor')
box_encodings = box_predictions[box_predictor.BOX_ENCODINGS]
objectness_predictions = box_predictions[
box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND]
return (box_encodings, objectness_predictions)
image_features = np.random.rand(4, 8, 8, 64).astype(np.float32)
(box_encodings, objectness_predictions) = self.execute(
graph_fn, [image_features])
self.assertAllEqual(box_encodings.shape, [4, 320, 1, 4])
self.assertAllEqual(objectness_predictions.shape, [4, 320, 1])
def test_get_multi_class_predictions_for_five_aspect_ratios_per_location(
self):
num_classes_without_background = 6
image_features = tf.random_uniform([4, 8, 8, 64], dtype=tf.float32)
conv_box_predictor = box_predictor.ConvolutionalBoxPredictor(
is_training=False,
num_classes=num_classes_without_background,
conv_hyperparams=self._build_arg_scope_with_conv_hyperparams(),
min_depth=0,
max_depth=32,
num_layers_before_predictor=1,
use_dropout=True,
dropout_keep_prob=0.8,
kernel_size=1,
box_code_size=4
)
box_predictions = conv_box_predictor.predict(
image_features,
num_predictions_per_location=5,
scope='BoxPredictor')
box_encodings = box_predictions[box_predictor.BOX_ENCODINGS]
class_predictions_with_background = box_predictions[
box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND]
def graph_fn(image_features):
conv_box_predictor = box_predictor.WeightSharedConvolutionalBoxPredictor(
is_training=False,
num_classes=num_classes_without_background,
conv_hyperparams=self._build_arg_scope_with_conv_hyperparams(),
depth=32,
num_layers_before_predictor=1,
box_code_size=4)
box_predictions = conv_box_predictor.predict(
[image_features],
num_predictions_per_location=[5],
scope='BoxPredictor')
box_encodings = box_predictions[box_predictor.BOX_ENCODINGS]
class_predictions_with_background = box_predictions[
box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND]
return (box_encodings, class_predictions_with_background)
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
(box_encodings_shape, class_predictions_with_background_shape
) = sess.run([
tf.shape(box_encodings), tf.shape(class_predictions_with_background)])
self.assertAllEqual(box_encodings_shape, [4, 320, 1, 4])
self.assertAllEqual(class_predictions_with_background_shape,
[4, 320, num_classes_without_background+1])
def test_get_boxes_for_five_aspect_ratios_per_location_fully_convolutional(
image_features = np.random.rand(4, 8, 8, 64).astype(np.float32)
(box_encodings, class_predictions_with_background) = self.execute(
graph_fn, [image_features])
self.assertAllEqual(box_encodings.shape, [4, 320, 1, 4])
self.assertAllEqual(class_predictions_with_background.shape,
[4, 320, num_classes_without_background+1])
def test_get_multi_class_predictions_from_two_feature_maps(
self):
num_classes_without_background = 6
def graph_fn(image_features1, image_features2):
conv_box_predictor = box_predictor.WeightSharedConvolutionalBoxPredictor(
is_training=False,
num_classes=num_classes_without_background,
conv_hyperparams=self._build_arg_scope_with_conv_hyperparams(),
depth=32,
num_layers_before_predictor=1,
box_code_size=4)
box_predictions = conv_box_predictor.predict(
[image_features1, image_features2],
num_predictions_per_location=[5, 5],
scope='BoxPredictor')
box_encodings = box_predictions[box_predictor.BOX_ENCODINGS]
class_predictions_with_background = box_predictions[
box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND]
return (box_encodings, class_predictions_with_background)
image_features1 = np.random.rand(4, 8, 8, 64).astype(np.float32)
image_features2 = np.random.rand(4, 8, 8, 64).astype(np.float32)
(box_encodings, class_predictions_with_background) = self.execute(
graph_fn, [image_features1, image_features2])
self.assertAllEqual(box_encodings.shape, [4, 640, 1, 4])
self.assertAllEqual(class_predictions_with_background.shape,
[4, 640, num_classes_without_background+1])
def test_predictions_from_multiple_feature_maps_share_weights(self):
num_classes_without_background = 6
def graph_fn(image_features1, image_features2):
conv_box_predictor = box_predictor.WeightSharedConvolutionalBoxPredictor(
is_training=False,
num_classes=num_classes_without_background,
conv_hyperparams=self._build_arg_scope_with_conv_hyperparams(),
depth=32,
num_layers_before_predictor=2,
box_code_size=4)
box_predictions = conv_box_predictor.predict(
[image_features1, image_features2],
num_predictions_per_location=[5, 5],
scope='BoxPredictor')
box_encodings = box_predictions[box_predictor.BOX_ENCODINGS]
class_predictions_with_background = box_predictions[
box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND]
return (box_encodings, class_predictions_with_background)
with self.test_session(graph=tf.Graph()):
graph_fn(tf.random_uniform([4, 32, 32, 3], dtype=tf.float32),
tf.random_uniform([4, 32, 32, 3], dtype=tf.float32))
actual_variable_set = set(
[var.op.name for var in tf.trainable_variables()])
expected_variable_set = set([
'BoxPredictor/WeightSharedConvolutionalBoxPredictor/conv2d_0/weights',
'BoxPredictor/WeightSharedConvolutionalBoxPredictor/conv2d_0/biases',
'BoxPredictor/WeightSharedConvolutionalBoxPredictor/conv2d_1/weights',
'BoxPredictor/WeightSharedConvolutionalBoxPredictor/conv2d_1/biases',
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'BoxEncodingPredictor/weights'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'BoxEncodingPredictor/biases'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'ClassPredictor/weights'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'ClassPredictor/biases')])
self.assertEqual(expected_variable_set, actual_variable_set)
def test_get_predictions_with_feature_maps_of_dynamic_shape(
self):
image_features = tf.placeholder(dtype=tf.float32, shape=[4, None, None, 64])
conv_box_predictor = box_predictor.ConvolutionalBoxPredictor(
conv_box_predictor = box_predictor.WeightSharedConvolutionalBoxPredictor(
is_training=False,
num_classes=0,
conv_hyperparams=self._build_arg_scope_with_conv_hyperparams(),
min_depth=0,
max_depth=32,
depth=32,
num_layers_before_predictor=1,
use_dropout=True,
dropout_keep_prob=0.8,
kernel_size=1,
box_code_size=4
)
box_code_size=4)
box_predictions = conv_box_predictor.predict(
image_features, num_predictions_per_location=5, scope='BoxPredictor')
[image_features], num_predictions_per_location=[5],
scope='BoxPredictor')
box_encodings = box_predictions[box_predictor.BOX_ENCODINGS]
objectness_predictions = box_predictions[
box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND]
......
......@@ -50,8 +50,10 @@ class Loss(object):
"""Call the loss function.
Args:
prediction_tensor: a tensor representing predicted quantities.
target_tensor: a tensor representing regression or classification targets.
prediction_tensor: an N-d tensor of shape [batch, anchors, ...]
representing predicted quantities.
target_tensor: an N-d tensor of shape [batch, anchors, ...] representing
regression or classification targets.
ignore_nan_targets: whether to ignore nan targets in the loss computation.
E.g. can be used if the target tensor is missing groundtruth data that
shouldn't be factored into the loss.
......@@ -81,7 +83,8 @@ class Loss(object):
the Loss.
Returns:
loss: a tensor representing the value of the loss function
loss: an N-d tensor of shape [batch, anchors, ...] containing the loss per
anchor.
"""
pass
......@@ -92,15 +95,6 @@ class WeightedL2LocalizationLoss(Loss):
Loss[b,a] = .5 * ||weights[b,a] * (prediction[b,a,:] - target[b,a,:])||^2
"""
def __init__(self, anchorwise_output=False):
"""Constructor.
Args:
anchorwise_output: Outputs loss per anchor. (default False)
"""
self._anchorwise_output = anchorwise_output
def _compute_loss(self, prediction_tensor, target_tensor, weights):
"""Compute loss function.
......@@ -112,15 +106,13 @@ class WeightedL2LocalizationLoss(Loss):
weights: a float tensor of shape [batch_size, num_anchors]
Returns:
loss: a (scalar) tensor representing the value of the loss function
or a float tensor of shape [batch_size, num_anchors]
loss: a float tensor of shape [batch_size, num_anchors] representing the
value of the loss function.
"""
weighted_diff = (prediction_tensor - target_tensor) * tf.expand_dims(
weights, 2)
square_diff = 0.5 * tf.square(weighted_diff)
if self._anchorwise_output:
return tf.reduce_sum(square_diff, 2)
return tf.reduce_sum(square_diff)
return tf.reduce_sum(square_diff, 2)
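# Editor's worked example (values chosen to mirror the tests below): with
# all-ones predictions, all-zeros targets, unit weights and code_size 4, each
# anchor contributes 0.5 * 4 * 1^2 = 2.0:
#
#   import numpy as np
#   pred = np.ones((3, 10, 4), np.float32)
#   target = np.zeros((3, 10, 4), np.float32)
#   weights = np.ones((3, 10), np.float32)
#   per_anchor = 0.5 * np.sum((weights[..., None] * (pred - target)) ** 2,
#                             axis=2)
#   assert np.allclose(per_anchor, 2.0)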
class WeightedSmoothL1LocalizationLoss(Loss):
......@@ -132,15 +124,6 @@ class WeightedSmoothL1LocalizationLoss(Loss):
See also Equation (3) in the Fast R-CNN paper by Ross Girshick (ICCV 2015)
"""
def __init__(self, anchorwise_output=False):
"""Constructor.
Args:
anchorwise_output: Outputs loss per anchor. (default False)
"""
self._anchorwise_output = anchorwise_output
def _compute_loss(self, prediction_tensor, target_tensor, weights):
"""Compute loss function.
......@@ -152,7 +135,8 @@ class WeightedSmoothL1LocalizationLoss(Loss):
weights: a float tensor of shape [batch_size, num_anchors]
Returns:
loss: a (scalar) tensor representing the value of the loss function
loss: a float tensor of shape [batch_size, num_anchors] representing the
value of the loss function.
"""
diff = prediction_tensor - target_tensor
abs_diff = tf.abs(diff)
......@@ -160,9 +144,7 @@ class WeightedSmoothL1LocalizationLoss(Loss):
anchorwise_smooth_l1norm = tf.reduce_sum(
tf.where(abs_diff_lt_1, 0.5 * tf.square(abs_diff), abs_diff - 0.5),
2) * weights
if self._anchorwise_output:
return anchorwise_smooth_l1norm
return tf.reduce_sum(anchorwise_smooth_l1norm)
return anchorwise_smooth_l1norm
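# Editor's worked example (assumed residuals): smooth L1 is quadratic for
# |d| < 1 and linear beyond, so d = 0.5 costs 0.5 * 0.5^2 = 0.125 while
# d = 2.0 costs 2.0 - 0.5 = 1.5:
#
#   import numpy as np
#   d = np.array([0.5, 2.0])
#   loss = np.where(np.abs(d) < 1, 0.5 * d ** 2, np.abs(d) - 0.5)
#   assert np.allclose(loss, [0.125, 1.5])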
class WeightedIOULocalizationLoss(Loss):
......@@ -184,27 +166,19 @@ class WeightedIOULocalizationLoss(Loss):
weights: a float tensor of shape [batch_size, num_anchors]
Returns:
loss: a (scalar) tensor representing the value of the loss function
loss: a float tensor of shape [batch_size, num_anchors] representing the
value of the loss function.
"""
predicted_boxes = box_list.BoxList(tf.reshape(prediction_tensor, [-1, 4]))
target_boxes = box_list.BoxList(tf.reshape(target_tensor, [-1, 4]))
per_anchor_iou_loss = 1.0 - box_list_ops.matched_iou(predicted_boxes,
target_boxes)
return tf.reduce_sum(tf.reshape(weights, [-1]) * per_anchor_iou_loss)
return tf.reshape(weights, [-1]) * per_anchor_iou_loss
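# Editor's note (hypothetical IoUs): the per-anchor loss is
# weights * (1 - IoU(pred, target)), so a perfect match contributes 0 and a
# fully disjoint box contributes its full weight; e.g. weights [1.0, .5, 2.0]
# with IoUs [1.0, 1.0, 0.0] give per-anchor losses [0.0, 0.0, 2.0], one
# combination consistent with the 2.0 total expected by the IOU test below.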
class WeightedSigmoidClassificationLoss(Loss):
"""Sigmoid cross entropy classification loss function."""
def __init__(self, anchorwise_output=False):
"""Constructor.
Args:
anchorwise_output: Outputs loss per anchor. (default False)
"""
self._anchorwise_output = anchorwise_output
def _compute_loss(self,
prediction_tensor,
target_tensor,
......@@ -222,8 +196,8 @@ class WeightedSigmoidClassificationLoss(Loss):
If provided, computes loss only for the specified class indices.
Returns:
loss: a (scalar) tensor representing the value of the loss function
or a float tensor of shape [batch_size, num_anchors]
loss: a float tensor of shape [batch_size, num_anchors, num_classes]
representing the value of the loss function.
"""
weights = tf.expand_dims(weights, 2)
if class_indices is not None:
......@@ -233,9 +207,7 @@ class WeightedSigmoidClassificationLoss(Loss):
[1, 1, -1])
per_entry_cross_ent = (tf.nn.sigmoid_cross_entropy_with_logits(
labels=target_tensor, logits=prediction_tensor))
if self._anchorwise_output:
return tf.reduce_sum(per_entry_cross_ent * weights, 2)
return tf.reduce_sum(per_entry_cross_ent * weights)
return per_entry_cross_ent * weights
class SigmoidFocalClassificationLoss(Loss):
......@@ -245,15 +217,13 @@ class SigmoidFocalClassificationLoss(Loss):
examples. See https://arxiv.org/pdf/1708.02002.pdf for the loss definition.
"""
def __init__(self, anchorwise_output=False, gamma=2.0, alpha=0.25):
def __init__(self, gamma=2.0, alpha=0.25):
"""Constructor.
Args:
anchorwise_output: Outputs loss per anchor. (default False)
gamma: exponent of the modulating factor (1 - p_t) ^ gamma.
alpha: optional alpha weighting factor to balance positives vs negatives.
"""
self._anchorwise_output = anchorwise_output
self._alpha = alpha
self._gamma = gamma
......@@ -274,8 +244,8 @@ class SigmoidFocalClassificationLoss(Loss):
If provided, computes loss only for the specified class indices.
Returns:
loss: a (scalar) tensor representing the value of the loss function
or a float tensor of shape [batch_size, num_anchors]
loss: a float tensor of shape [batch_size, num_anchors, num_classes]
representing the value of the loss function.
"""
weights = tf.expand_dims(weights, 2)
if class_indices is not None:
......@@ -297,25 +267,21 @@ class SigmoidFocalClassificationLoss(Loss):
(1 - target_tensor) * (1 - self._alpha))
focal_cross_entropy_loss = (modulating_factor * alpha_weight_factor *
per_entry_cross_ent)
if self._anchorwise_output:
return tf.reduce_sum(focal_cross_entropy_loss * weights, 2)
return tf.reduce_sum(focal_cross_entropy_loss * weights)
return focal_cross_entropy_loss * weights
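# Editor's worked example (assumed probabilities): the modulating factor
# (1 - p_t)^gamma down-weights easy examples, so with gamma = 2 a confident
# p_t = 0.9 keeps only 1% of its cross entropy while a hard p_t = 0.1 keeps
# 81% of it:
#
#   def focal_weight(p_t, gamma=2.0):
#     return (1.0 - p_t) ** gamma
#   assert abs(focal_weight(0.9) - 0.01) < 1e-8
#   assert abs(focal_weight(0.1) - 0.81) < 1e-8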
class WeightedSoftmaxClassificationLoss(Loss):
"""Softmax loss function."""
def __init__(self, anchorwise_output=False, logit_scale=1.0):
def __init__(self, logit_scale=1.0):
"""Constructor.
Args:
anchorwise_output: Whether to output loss per anchor (default False)
logit_scale: When this value is high, the prediction is "diffused" and
when this value is low, the prediction is made peakier.
(default 1.0)
"""
self._anchorwise_output = anchorwise_output
self._logit_scale = logit_scale
def _compute_loss(self, prediction_tensor, target_tensor, weights):
......@@ -329,7 +295,8 @@ class WeightedSoftmaxClassificationLoss(Loss):
weights: a float tensor of shape [batch_size, num_anchors]
Returns:
loss: a (scalar) tensor representing the value of the loss function
loss: a float tensor of shape [batch_size, num_anchors]
representing the value of the loss function.
"""
num_classes = prediction_tensor.get_shape().as_list()[-1]
prediction_tensor = tf.divide(
......@@ -337,9 +304,7 @@ class WeightedSoftmaxClassificationLoss(Loss):
per_row_cross_ent = (tf.nn.softmax_cross_entropy_with_logits(
labels=tf.reshape(target_tensor, [-1, num_classes]),
logits=tf.reshape(prediction_tensor, [-1, num_classes])))
if self._anchorwise_output:
return tf.reshape(per_row_cross_ent, tf.shape(weights)) * weights
return tf.reduce_sum(per_row_cross_ent * tf.reshape(weights, [-1]))
return tf.reshape(per_row_cross_ent, tf.shape(weights)) * weights
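# Editor's note (mirrors the high-logit_scale test further below): dividing
# logits by a large logit_scale flattens the softmax toward uniform, so with
# three classes every probability tends to 1/3 and the per-anchor loss to
# -log(1/3):
#
#   import numpy as np
#   logits = np.array([-100., 100., -100.]) / 10e16
#   probs = np.exp(logits) / np.sum(np.exp(logits))
#   assert np.allclose(probs, 1.0 / 3, atol=1e-6)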
class BootstrappedSigmoidClassificationLoss(Loss):
......@@ -359,14 +324,13 @@ class BootstrappedSigmoidClassificationLoss(Loss):
Reed et al. (ICLR 2015).
"""
def __init__(self, alpha, bootstrap_type='soft', anchorwise_output=False):
def __init__(self, alpha, bootstrap_type='soft'):
"""Constructor.
Args:
alpha: a float32 scalar tensor between 0 and 1 representing interpolation
weight
bootstrap_type: set to either 'hard' or 'soft' (default)
anchorwise_output: Outputs loss per anchor. (default False)
Raises:
ValueError: if bootstrap_type is not either 'hard' or 'soft'
......@@ -376,7 +340,6 @@ class BootstrappedSigmoidClassificationLoss(Loss):
'\'hard\' or \'soft.\'')
self._alpha = alpha
self._bootstrap_type = bootstrap_type
self._anchorwise_output = anchorwise_output
def _compute_loss(self, prediction_tensor, target_tensor, weights):
"""Compute loss function.
......@@ -389,8 +352,8 @@ class BootstrappedSigmoidClassificationLoss(Loss):
weights: a float tensor of shape [batch_size, num_anchors]
Returns:
loss: a (scalar) tensor representing the value of the loss function
or a float tensor of shape [batch_size, num_anchors]
loss: a float tensor of shape [batch_size, num_anchors, num_classes]
representing the value of the loss function.
"""
if self._bootstrap_type == 'soft':
bootstrap_target_tensor = self._alpha * target_tensor + (
......@@ -401,9 +364,7 @@ class BootstrappedSigmoidClassificationLoss(Loss):
tf.sigmoid(prediction_tensor) > 0.5, tf.float32)
per_entry_cross_ent = (tf.nn.sigmoid_cross_entropy_with_logits(
labels=bootstrap_target_tensor, logits=prediction_tensor))
if self._anchorwise_output:
return tf.reduce_sum(per_entry_cross_ent * tf.expand_dims(weights, 2), 2)
return tf.reduce_sum(per_entry_cross_ent * tf.expand_dims(weights, 2))
return per_entry_cross_ent * tf.expand_dims(weights, 2)
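# Editor's worked example (assumed logit): with alpha = 0.5, 'soft'
# bootstrapping mixes the label with the predicted probability while 'hard'
# bootstrapping mixes it with a 0/1 thresholded prediction:
#
#   import math
#   alpha, target = 0.5, 1.0
#   p = 1.0 / (1.0 + math.exp(-2.0))  # sigmoid(logit=2.0), ~0.881
#   soft_target = alpha * target + (1 - alpha) * p              # ~0.940
#   hard_target = alpha * target + (1 - alpha) * float(p > .5)  # == 1.0
#   assert hard_target == 1.0 and 0.93 < soft_target < 0.95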
class HardExampleMiner(object):
......
......@@ -26,7 +26,7 @@ from object_detection.core import matcher
class WeightedL2LocalizationLossTest(tf.test.TestCase):
def testReturnsCorrectLoss(self):
def testReturnsCorrectWeightedLoss(self):
batch_size = 3
num_anchors = 10
code_size = 4
......@@ -36,7 +36,8 @@ class WeightedL2LocalizationLossTest(tf.test.TestCase):
[1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 0, 0, 0, 0, 0]], tf.float32)
loss_op = losses.WeightedL2LocalizationLoss()
loss = loss_op(prediction_tensor, target_tensor, weights=weights)
loss = tf.reduce_sum(loss_op(prediction_tensor, target_tensor,
weights=weights))
expected_loss = (3 * 5 * 4) / 2.0
with self.test_session() as sess:
......@@ -50,7 +51,7 @@ class WeightedL2LocalizationLossTest(tf.test.TestCase):
prediction_tensor = tf.ones([batch_size, num_anchors, code_size])
target_tensor = tf.zeros([batch_size, num_anchors, code_size])
weights = tf.ones([batch_size, num_anchors])
loss_op = losses.WeightedL2LocalizationLoss(anchorwise_output=True)
loss_op = losses.WeightedL2LocalizationLoss()
loss = loss_op(prediction_tensor, target_tensor, weights=weights)
expected_loss = np.ones((batch_size, num_anchors)) * 2
......@@ -58,22 +59,6 @@ class WeightedL2LocalizationLossTest(tf.test.TestCase):
loss_output = sess.run(loss)
self.assertAllClose(loss_output, expected_loss)
def testReturnsCorrectLossSum(self):
batch_size = 3
num_anchors = 16
code_size = 4
prediction_tensor = tf.ones([batch_size, num_anchors, code_size])
target_tensor = tf.zeros([batch_size, num_anchors, code_size])
weights = tf.ones([batch_size, num_anchors])
loss_op = losses.WeightedL2LocalizationLoss(anchorwise_output=False)
loss = loss_op(prediction_tensor, target_tensor, weights=weights)
expected_loss = tf.nn.l2_loss(prediction_tensor - target_tensor)
with self.test_session() as sess:
loss_output = sess.run(loss)
expected_loss_output = sess.run(expected_loss)
self.assertAllClose(loss_output, expected_loss_output)
def testReturnsCorrectNanLoss(self):
batch_size = 3
num_anchors = 10
......@@ -87,6 +72,7 @@ class WeightedL2LocalizationLossTest(tf.test.TestCase):
loss_op = losses.WeightedL2LocalizationLoss()
loss = loss_op(prediction_tensor, target_tensor, weights=weights,
ignore_nan_targets=True)
loss = tf.reduce_sum(loss)
expected_loss = (3 * 5 * 4) / 2.0
with self.test_session() as sess:
......@@ -111,6 +97,7 @@ class WeightedSmoothL1LocalizationLossTest(tf.test.TestCase):
[0, 3, 0]], tf.float32)
loss_op = losses.WeightedSmoothL1LocalizationLoss()
loss = loss_op(prediction_tensor, target_tensor, weights=weights)
loss = tf.reduce_sum(loss)
exp_loss = 7.695
with self.test_session() as sess:
......@@ -130,6 +117,7 @@ class WeightedIOULocalizationLossTest(tf.test.TestCase):
weights = [[1.0, .5, 2.0]]
loss_op = losses.WeightedIOULocalizationLoss()
loss = loss_op(prediction_tensor, target_tensor, weights=weights)
loss = tf.reduce_sum(loss)
exp_loss = 2.0
with self.test_session() as sess:
loss_output = sess.run(loss)
......@@ -159,6 +147,7 @@ class WeightedSigmoidClassificationLossTest(tf.test.TestCase):
[1, 1, 1, 0]], tf.float32)
loss_op = losses.WeightedSigmoidClassificationLoss()
loss = loss_op(prediction_tensor, target_tensor, weights=weights)
loss = tf.reduce_sum(loss)
exp_loss = -2 * math.log(.5)
with self.test_session() as sess:
......@@ -184,8 +173,9 @@ class WeightedSigmoidClassificationLossTest(tf.test.TestCase):
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 0]], tf.float32)
loss_op = losses.WeightedSigmoidClassificationLoss(True)
loss_op = losses.WeightedSigmoidClassificationLoss()
loss = loss_op(prediction_tensor, target_tensor, weights=weights)
loss = tf.reduce_sum(loss, axis=2)
exp_loss = np.matrix([[0, 0, -math.log(.5), 0],
[-math.log(.5), 0, 0, 0]])
......@@ -214,9 +204,10 @@ class WeightedSigmoidClassificationLossTest(tf.test.TestCase):
[1, 1, 1, 0]], tf.float32)
# Ignores the last class.
class_indices = tf.constant([0, 1, 2], tf.int32)
loss_op = losses.WeightedSigmoidClassificationLoss(True)
loss_op = losses.WeightedSigmoidClassificationLoss()
loss = loss_op(prediction_tensor, target_tensor, weights=weights,
class_indices=class_indices)
loss = tf.reduce_sum(loss, axis=2)
exp_loss = np.matrix([[0, 0, -math.log(.5), 0],
[-math.log(.5), 0, 0, 0]])
......@@ -245,14 +236,13 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
[0],
[0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1, 1, 1]], tf.float32)
focal_loss_op = losses.SigmoidFocalClassificationLoss(
anchorwise_output=True, gamma=2.0, alpha=None)
sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss(
anchorwise_output=True)
focal_loss = focal_loss_op(prediction_tensor, target_tensor,
weights=weights)
sigmoid_loss = sigmoid_loss_op(prediction_tensor, target_tensor,
weights=weights)
focal_loss_op = losses.SigmoidFocalClassificationLoss(gamma=2.0, alpha=None)
sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss()
focal_loss = tf.reduce_sum(focal_loss_op(prediction_tensor, target_tensor,
weights=weights), axis=2)
sigmoid_loss = tf.reduce_sum(sigmoid_loss_op(prediction_tensor,
target_tensor,
weights=weights), axis=2)
with self.test_session() as sess:
sigmoid_loss, focal_loss = sess.run([sigmoid_loss, focal_loss])
......@@ -272,14 +262,13 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
[0],
[0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1, 1]], tf.float32)
focal_loss_op = losses.SigmoidFocalClassificationLoss(
anchorwise_output=True, gamma=2.0, alpha=None)
sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss(
anchorwise_output=True)
focal_loss = focal_loss_op(prediction_tensor, target_tensor,
weights=weights)
sigmoid_loss = sigmoid_loss_op(prediction_tensor, target_tensor,
weights=weights)
focal_loss_op = losses.SigmoidFocalClassificationLoss(gamma=2.0, alpha=None)
sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss()
focal_loss = tf.reduce_sum(focal_loss_op(prediction_tensor, target_tensor,
weights=weights), axis=2)
sigmoid_loss = tf.reduce_sum(sigmoid_loss_op(prediction_tensor,
target_tensor,
weights=weights), axis=2)
with self.test_session() as sess:
sigmoid_loss, focal_loss = sess.run([sigmoid_loss, focal_loss])
......@@ -299,14 +288,13 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
[0],
[0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1, 1]], tf.float32)
focal_loss_op = losses.SigmoidFocalClassificationLoss(
anchorwise_output=False, gamma=2.0, alpha=None)
sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss(
anchorwise_output=False)
focal_loss = focal_loss_op(prediction_tensor, target_tensor,
weights=weights)
sigmoid_loss = sigmoid_loss_op(prediction_tensor, target_tensor,
weights=weights)
focal_loss_op = losses.SigmoidFocalClassificationLoss(gamma=2.0, alpha=None)
sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss()
focal_loss = tf.reduce_sum(focal_loss_op(prediction_tensor, target_tensor,
weights=weights))
sigmoid_loss = tf.reduce_sum(sigmoid_loss_op(prediction_tensor,
target_tensor,
weights=weights))
with self.test_session() as sess:
sigmoid_loss, focal_loss = sess.run([sigmoid_loss, focal_loss])
......@@ -326,14 +314,13 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
[0],
[0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1, 1]], tf.float32)
focal_loss_op = losses.SigmoidFocalClassificationLoss(
anchorwise_output=True, gamma=2.0, alpha=1.0)
sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss(
anchorwise_output=True)
focal_loss = focal_loss_op(prediction_tensor, target_tensor,
weights=weights)
sigmoid_loss = sigmoid_loss_op(prediction_tensor, target_tensor,
weights=weights)
focal_loss_op = losses.SigmoidFocalClassificationLoss(gamma=2.0, alpha=1.0)
sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss()
focal_loss = tf.reduce_sum(focal_loss_op(prediction_tensor, target_tensor,
weights=weights), axis=2)
sigmoid_loss = tf.reduce_sum(sigmoid_loss_op(prediction_tensor,
target_tensor,
weights=weights), axis=2)
with self.test_session() as sess:
sigmoid_loss, focal_loss = sess.run([sigmoid_loss, focal_loss])
......@@ -355,14 +342,13 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
[0],
[0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1, 1]], tf.float32)
focal_loss_op = losses.SigmoidFocalClassificationLoss(
anchorwise_output=True, gamma=2.0, alpha=0.0)
sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss(
anchorwise_output=True)
focal_loss = focal_loss_op(prediction_tensor, target_tensor,
weights=weights)
sigmoid_loss = sigmoid_loss_op(prediction_tensor, target_tensor,
weights=weights)
focal_loss_op = losses.SigmoidFocalClassificationLoss(gamma=2.0, alpha=0.0)
sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss()
focal_loss = tf.reduce_sum(focal_loss_op(prediction_tensor, target_tensor,
weights=weights), axis=2)
sigmoid_loss = tf.reduce_sum(sigmoid_loss_op(prediction_tensor,
target_tensor,
weights=weights), axis=2)
with self.test_session() as sess:
sigmoid_loss, focal_loss = sess.run([sigmoid_loss, focal_loss])
......@@ -391,10 +377,8 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 0]], tf.float32)
focal_loss_op = losses.SigmoidFocalClassificationLoss(
anchorwise_output=True, alpha=0.5, gamma=0.0)
sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss(
anchorwise_output=True)
focal_loss_op = losses.SigmoidFocalClassificationLoss(alpha=0.5, gamma=0.0)
sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss()
focal_loss = focal_loss_op(prediction_tensor, target_tensor,
weights=weights)
sigmoid_loss = sigmoid_loss_op(prediction_tensor, target_tensor,
......@@ -423,10 +407,8 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 0]], tf.float32)
focal_loss_op = losses.SigmoidFocalClassificationLoss(
anchorwise_output=True, alpha=None, gamma=0.0)
sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss(
anchorwise_output=True)
focal_loss_op = losses.SigmoidFocalClassificationLoss(alpha=None, gamma=0.0)
sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss()
focal_loss = focal_loss_op(prediction_tensor, target_tensor,
weights=weights)
sigmoid_loss = sigmoid_loss_op(prediction_tensor, target_tensor,
......@@ -456,11 +438,10 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 1]], tf.float32)
focal_loss_op = losses.SigmoidFocalClassificationLoss(
anchorwise_output=False, alpha=1.0, gamma=0.0)
focal_loss_op = losses.SigmoidFocalClassificationLoss(alpha=1.0, gamma=0.0)
focal_loss = focal_loss_op(prediction_tensor, target_tensor,
weights=weights)
focal_loss = tf.reduce_sum(focal_loss_op(prediction_tensor, target_tensor,
weights=weights))
with self.test_session() as sess:
focal_loss = sess.run(focal_loss)
self.assertAllClose(
......@@ -489,11 +470,10 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 1]], tf.float32)
focal_loss_op = losses.SigmoidFocalClassificationLoss(
anchorwise_output=False, alpha=0.75, gamma=0.0)
focal_loss_op = losses.SigmoidFocalClassificationLoss(alpha=0.75, gamma=0.0)
focal_loss = focal_loss_op(prediction_tensor, target_tensor,
weights=weights)
focal_loss = tf.reduce_sum(focal_loss_op(prediction_tensor, target_tensor,
weights=weights))
with self.test_session() as sess:
focal_loss = sess.run(focal_loss)
self.assertAllClose(
......@@ -528,6 +508,7 @@ class WeightedSoftmaxClassificationLossTest(tf.test.TestCase):
[1, 1, 1, 0]], tf.float32)
loss_op = losses.WeightedSoftmaxClassificationLoss()
loss = loss_op(prediction_tensor, target_tensor, weights=weights)
loss = tf.reduce_sum(loss)
exp_loss = - 1.5 * math.log(.5)
with self.test_session() as sess:
......@@ -553,7 +534,7 @@ class WeightedSoftmaxClassificationLossTest(tf.test.TestCase):
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, .5, 1],
[1, 1, 1, 0]], tf.float32)
loss_op = losses.WeightedSoftmaxClassificationLoss(True)
loss_op = losses.WeightedSoftmaxClassificationLoss()
loss = loss_op(prediction_tensor, target_tensor, weights=weights)
exp_loss = np.matrix([[0, 0, - 0.5 * math.log(.5), 0],
......@@ -564,7 +545,7 @@ class WeightedSoftmaxClassificationLossTest(tf.test.TestCase):
def testReturnsCorrectAnchorWiseLossWithHighLogitScaleSetting(self):
"""At very high logit_scale, all predictions will be ~0.33."""
# TODO(yonib): Also test logit_scale with anchorwise=False.
# TODO: Also test logit_scale with anchorwise=False.
logit_scale = 10e16
prediction_tensor = tf.constant([[[-100, 100, -100],
[100, -100, -100],
......@@ -584,8 +565,7 @@ class WeightedSoftmaxClassificationLossTest(tf.test.TestCase):
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 1]], tf.float32)
loss_op = losses.WeightedSoftmaxClassificationLoss(
anchorwise_output=True, logit_scale=logit_scale)
loss_op = losses.WeightedSoftmaxClassificationLoss(logit_scale=logit_scale)
loss = loss_op(prediction_tensor, target_tensor, weights=weights)
uniform_distribution_loss = - math.log(.33333333333)
......@@ -621,6 +601,7 @@ class BootstrappedSigmoidClassificationLossTest(tf.test.TestCase):
loss_op = losses.BootstrappedSigmoidClassificationLoss(
alpha, bootstrap_type='soft')
loss = loss_op(prediction_tensor, target_tensor, weights=weights)
loss = tf.reduce_sum(loss)
exp_loss = -math.log(.5)
with self.test_session() as sess:
loss_output = sess.run(loss)
......@@ -649,6 +630,7 @@ class BootstrappedSigmoidClassificationLossTest(tf.test.TestCase):
loss_op = losses.BootstrappedSigmoidClassificationLoss(
alpha, bootstrap_type='hard')
loss = loss_op(prediction_tensor, target_tensor, weights=weights)
loss = tf.reduce_sum(loss)
exp_loss = -math.log(.5)
with self.test_session() as sess:
loss_output = sess.run(loss)
......@@ -675,9 +657,9 @@ class BootstrappedSigmoidClassificationLossTest(tf.test.TestCase):
[1, 1, 1, 0]], tf.float32)
alpha = tf.constant(.5, tf.float32)
loss_op = losses.BootstrappedSigmoidClassificationLoss(
alpha, bootstrap_type='hard', anchorwise_output=True)
alpha, bootstrap_type='hard')
loss = loss_op(prediction_tensor, target_tensor, weights=weights)
loss = tf.reduce_sum(loss, axis=2)
exp_loss = np.matrix([[0, 0, -math.log(.5), 0],
[-math.log(.5), 0, 0, 0]])
with self.test_session() as sess:
......
......@@ -168,6 +168,34 @@ class Match(object):
def _reshape_and_cast(self, t):
return tf.cast(tf.reshape(t, [-1]), tf.int32)
def gather_based_on_match(self, input_tensor, unmatched_value,
ignored_value):
"""Gathers elements from `input_tensor` based on match results.
For columns that are matched to a row, gathered_tensor[col] is set to
input_tensor[match_results[col]]. For columns that are unmatched,
gathered_tensor[col] is set to unmatched_value. Finally, for columns that
are ignored, gathered_tensor[col] is set to ignored_value.
Note that input_tensor.shape[1:] must match unmatched_value.shape and
ignored_value.shape.
Args:
input_tensor: Tensor to gather values from.
unmatched_value: Constant tensor value for unmatched columns.
ignored_value: Constant tensor value for ignored columns.
Returns:
gathered_tensor: A tensor containing values gathered from input_tensor.
The shape of the gathered tensor is [match_results.shape[0]] +
input_tensor.shape[1:].
"""
input_tensor = tf.concat([tf.stack([ignored_value, unmatched_value]),
input_tensor], axis=0)
gather_indices = tf.maximum(self.match_results + 2, 0)
gathered_tensor = tf.gather(input_tensor, gather_indices)
return gathered_tensor
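# Editor's note: prepending [ignored_value, unmatched_value] and gathering
# with max(match_results + 2, 0) maps -2 (ignored) to row 0, -1 (unmatched)
# to row 1, and a match m >= 0 to row m + 2, i.e. back to input_tensor[m].
# Worked example (mirrors the scalar gather test further below):
#
#   match_results = [3, 1, -1, 0, -1, 5, -2]
#   input_tensor  = [0, 1, 2, 3, 4, 5, 6, 7]
#   padded        = [200, 100, 0, 1, 2, 3, 4, 5, 6, 7]  # ignored=200,
#   indices       = [5, 3, 1, 2, 1, 7, 0]               # unmatched=100
#   gathered      = [3, 1, 100, 0, 100, 5, 200]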
class Matcher(object):
"""Abstract base class for matcher.
......@@ -195,7 +223,7 @@ class Matcher(object):
@abstractmethod
def _match(self, similarity_matrix, **params):
"""Method to be overriden by implementations.
"""Method to be overridden by implementations.
Args:
similarity_matrix: Float tensor of shape [N, M] with pairwise similarity
......
......@@ -20,7 +20,7 @@ import tensorflow as tf
from object_detection.core import matcher
class AnchorMatcherTest(tf.test.TestCase):
class MatchTest(tf.test.TestCase):
def test_get_correct_matched_columnIndices(self):
match_results = tf.constant([3, 1, -1, 0, -1, 5, -2])
......@@ -145,6 +145,32 @@ class AnchorMatcherTest(tf.test.TestCase):
self.assertAllEqual(all_indices_sorted,
np.arange(num_matches, dtype=np.int32))
def test_scalar_gather_based_on_match(self):
match_results = tf.constant([3, 1, -1, 0, -1, 5, -2])
input_tensor = tf.constant([0, 1, 2, 3, 4, 5, 6, 7], dtype=tf.float32)
expected_gathered_tensor = [3, 1, 100, 0, 100, 5, 200]
match = matcher.Match(match_results)
gathered_tensor = match.gather_based_on_match(input_tensor,
unmatched_value=100.,
ignored_value=200.)
self.assertEquals(gathered_tensor.dtype, tf.float32)
with self.test_session():
gathered_tensor_out = gathered_tensor.eval()
self.assertAllEqual(expected_gathered_tensor, gathered_tensor_out)
def test_multidimensional_gather_based_on_match(self):
match_results = tf.constant([1, -1, -2])
input_tensor = tf.constant([[0, 0.5, 0, 0.5], [0, 0, 0.5, 0.5]],
dtype=tf.float32)
expected_gathered_tensor = [[0, 0, 0.5, 0.5], [0, 0, 0, 0], [0, 0, 0, 0]]
match = matcher.Match(match_results)
gathered_tensor = match.gather_based_on_match(input_tensor,
unmatched_value=tf.zeros(4),
ignored_value=tf.zeros(4))
self.assertEquals(gathered_tensor.dtype, tf.float32)
with self.test_session():
gathered_tensor_out = gathered_tensor.eval()
self.assertAllEqual(expected_gathered_tensor, gathered_tensor_out)
if __name__ == '__main__':
tf.test.main()
......@@ -39,6 +39,17 @@ resize/reshaping necessary (see docstring for the preprocess function).
Output classes are always integers in the range [0, num_classes). Any mapping
of these integers to semantic labels is to be handled outside of this class.
Images are resized in the `preprocess` method. All of `preprocess`, `predict`,
and `postprocess` should be stateless.
The `preprocess` method runs `image_resizer_fn`, which returns resized_images
and `true_image_shapes`. Since `image_resizer_fn` can pad images with zeros,
true_image_shapes indicate the slices that contain the image without padding.
This is useful for padding images to be a fixed size for batching.
The `postprocess` method uses the true image shapes to clip predictions that lie
outside of images.
By default, DetectionModels produce bounding box detections; however, we support
a handful of auxiliary annotations associated with each bounding box, namely,
instance masks and keypoints.
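A typical (hypothetical) call sequence under this contract, where `model` is
any DetectionModel implementation:

  images = tf.placeholder(tf.float32, [None, None, None, 3])
  preprocessed_inputs, true_image_shapes = model.preprocess(images)
  prediction_dict = model.predict(preprocessed_inputs, true_image_shapes)
  detections = model.postprocess(prediction_dict, true_image_shapes)

Here `true_image_shapes` carries the unpadded [height, width, channels] of
each image so `postprocess` can clip or discard boxes that fall into the zero
padding.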
......@@ -106,12 +117,12 @@ class DetectionModel(object):
This function is responsible for any scaling/shifting of input values that
is necessary prior to running the detector on an input image.
It is also responsible for any resizing that might be necessary as images
are assumed to arrive in arbitrary sizes. While this function could
conceivably be part of the predict method (below), it is often convenient
to keep these separate --- for example, we may want to preprocess on one
device, place onto a queue, and let another device (e.g., the GPU) handle
prediction.
It is also responsible for any resizing and padding that might be necessary
as images are assumed to arrive in arbitrary sizes. While this function
could conceivably be part of the predict method (below), it is often
convenient to keep these separate --- for example, we may want to preprocess
on one device, place onto a queue, and let another device (e.g., the GPU)
handle prediction.
A few important notes about the preprocess function:
+ We assume that this operation does not have any trainable variables nor
......@@ -134,11 +145,15 @@ class DetectionModel(object):
Returns:
preprocessed_inputs: a [batch, height_out, width_out, channels] float32
tensor representing a batch of images.
true_image_shapes: int32 tensor of shape [batch, 3] where each row is
of the form [height, width, channels] indicating the shapes
of true images in the resized images, as resized images can be padded
with zeros.
"""
pass
@abstractmethod
def predict(self, preprocessed_inputs):
def predict(self, preprocessed_inputs, true_image_shapes):
"""Predict prediction tensors from inputs tensor.
Outputs of this function can be passed to loss or postprocess functions.
......@@ -146,6 +161,10 @@ class DetectionModel(object):
Args:
preprocessed_inputs: a [batch, height, width, channels] float32 tensor
representing a batch of images.
true_image_shapes: int32 tensor of shape [batch, 3] where each row is
of the form [height, width, channels] indicating the shapes
of true images in the resized images, as resized images can be padded
with zeros.
Returns:
prediction_dict: a dictionary holding prediction tensors to be
......@@ -154,7 +173,7 @@ class DetectionModel(object):
pass
@abstractmethod
def postprocess(self, prediction_dict, **params):
def postprocess(self, prediction_dict, true_image_shapes, **params):
"""Convert predicted output tensors to final detections.
Outputs adhere to the following conventions:
......@@ -172,6 +191,10 @@ class DetectionModel(object):
Args:
prediction_dict: a dictionary holding prediction tensors.
true_image_shapes: int32 tensor of shape [batch, 3] where each row is
of the form [height, width, channels] indicating the shapes
of true images in the resized images, as resized images can be padded
with zeros.
**params: Additional keyword arguments for specific implementations of
DetectionModel.
......@@ -190,7 +213,7 @@ class DetectionModel(object):
pass
@abstractmethod
def loss(self, prediction_dict):
def loss(self, prediction_dict, true_image_shapes):
"""Compute scalar loss tensors with respect to provided groundtruth.
Calling this function requires that groundtruth tensors have been
......@@ -198,6 +221,10 @@ class DetectionModel(object):
Args:
prediction_dict: a dictionary holding predicted tensors
true_image_shapes: int32 tensor of shape [batch, 3] where each row is
of the form [height, width, channels] indicating the shapes
of true images in the resized images, as resized images can be padded
with zeros.
Returns:
a dictionary mapping strings (loss names) to scalar tensors representing
......
......@@ -20,6 +20,7 @@ import tensorflow as tf
from object_detection.core import box_list
from object_detection.core import box_list_ops
from object_detection.core import standard_fields as fields
from object_detection.utils import shape_utils
def multiclass_non_max_suppression(boxes,
......@@ -31,6 +32,7 @@ def multiclass_non_max_suppression(boxes,
clip_window=None,
change_coordinate_frame=False,
masks=None,
boundaries=None,
additional_fields=None,
scope=None):
"""Multi-class version of non maximum suppression.
......@@ -66,6 +68,9 @@ def multiclass_non_max_suppression(boxes,
masks: (optional) a [k, q, mask_height, mask_width] float32 tensor
containing box masks. `q` can be either number of classes or 1 depending
on whether a separate mask is predicted per class.
boundaries: (optional) a [k, q, boundary_height, boundary_width] float32
tensor containing box boundaries. `q` can be either number of classes or 1
depending on whether a separate boundary is predicted per class.
additional_fields: (optional) If not None, a dictionary that maps keys to
tensors whose first dimensions are all of size `k`. After non-maximum
suppression, all tensors corresponding to the selected boxes will be
......@@ -114,6 +119,8 @@ def multiclass_non_max_suppression(boxes,
per_class_boxes_list = tf.unstack(boxes, axis=1)
if masks is not None:
per_class_masks_list = tf.unstack(masks, axis=1)
if boundaries is not None:
per_class_boundaries_list = tf.unstack(boundaries, axis=1)
boxes_ids = (range(num_classes) if len(per_class_boxes_list) > 1
else [0] * num_classes)
for class_idx, boxes_idx in zip(range(num_classes), boxes_ids):
......@@ -128,6 +135,10 @@ def multiclass_non_max_suppression(boxes,
per_class_masks = per_class_masks_list[boxes_idx]
boxlist_and_class_scores.add_field(fields.BoxListFields.masks,
per_class_masks)
if boundaries is not None:
per_class_boundaries = per_class_boundaries_list[boxes_idx]
boxlist_and_class_scores.add_field(fields.BoxListFields.boundaries,
per_class_boundaries)
if additional_fields is not None:
for key, tensor in additional_fields.items():
boxlist_and_class_scores.add_field(key, tensor)
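For reference, a hedged usage sketch of the new boundaries argument with toy inputs (k=3 boxes, q=1 shared box/boundary per class, 2 classes); all values below are made up:

import tensorflow as tf
from object_detection.core import post_processing
from object_detection.core import standard_fields as fields

boxes = tf.constant([[[0., 0., 1., 1.]],
                     [[0., 0., 1., 1.]],
                     [[0., 0.1, 1., 1.1]]])
scores = tf.constant([[.9, .1], [.8, .2], [.7, .3]])
boundaries = tf.zeros([3, 1, 28, 28])  # one boundary map per box
nmsed_boxlist = post_processing.multiclass_non_max_suppression(
    boxes, scores, score_thresh=0.1, iou_thresh=0.5,
    max_size_per_class=10, boundaries=boundaries)
# Boundaries of the kept boxes ride along as a BoxList field.
kept_boundaries = nmsed_boxlist.get_field(fields.BoxListFields.boundaries)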
......@@ -194,9 +205,12 @@ def batch_multiclass_non_max_suppression(boxes,
max_size_per_class: maximum number of retained boxes per class.
max_total_size: maximum number of boxes retained over all classes. By
default returns all boxes retained after capping boxes per class.
clip_window: A float32 tensor of the form [y_min, x_min, y_max, x_max]
representing the window to clip boxes to before performing non-max
suppression.
clip_window: A float32 tensor of shape [batch_size, 4] where each entry is
of the form [y_min, x_min, y_max, x_max] representing the window to clip
boxes to before performing non-max suppression. This argument can also be
a tensor of shape [4], in which case the same clip window is applied to
all images in the batch. If clip_window is None, all boxes are used to
perform non-max suppression.
change_coordinate_frame: Whether to normalize coordinates after clipping
relative to clip_window (this can only be set to True if a clip_window
is provided)
......@@ -242,7 +256,9 @@ def batch_multiclass_non_max_suppression(boxes,
if q != 1 and q != num_classes:
raise ValueError('third dimension of boxes must be either 1 or equal '
'to the third dimension of scores')
if change_coordinate_frame and clip_window is None:
raise ValueError('if change_coordinate_frame is True, then a clip_window '
'must be specified.')
original_masks = masks
original_additional_fields = additional_fields
with tf.name_scope(scope, 'BatchMultiClassNonMaxSuppression'):
......@@ -266,6 +282,16 @@ def batch_multiclass_non_max_suppression(boxes,
masks_shape = tf.stack([batch_size, num_anchors, 1, 0, 0])
masks = tf.zeros(masks_shape)
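# If no clip window is provided, default to the tightest window that
# encloses every box in the batch; a rank-1 window is then tiled so that
# each image in the batch gets its own row.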
if clip_window is None:
clip_window = tf.stack([
tf.reduce_min(boxes[:, :, :, 0]),
tf.reduce_min(boxes[:, :, :, 1]),
tf.reduce_max(boxes[:, :, :, 2]),
tf.reduce_max(boxes[:, :, :, 3])
])
if clip_window.shape.ndims == 1:
clip_window = tf.tile(tf.expand_dims(clip_window, 0), [batch_size, 1])
if additional_fields is None:
additional_fields = {}
......@@ -283,6 +309,9 @@ def batch_multiclass_non_max_suppression(boxes,
per_image_masks - A [num_anchors, q, mask_height, mask_width] float32
tensor containing box masks. `q` can be either number of classes
or 1 depending on whether a separate mask is predicted per class.
per_image_clip_window - A 1D float32 tensor of the form
[ymin, xmin, ymax, xmax] representing the window to clip the boxes
to.
per_image_additional_fields - (optional) A variable number of float32
tensors each with size [num_anchors, ...].
per_image_num_valid_boxes - A tensor of type `int32`. A 1-D tensor of
......@@ -311,9 +340,10 @@ def batch_multiclass_non_max_suppression(boxes,
per_image_boxes = args[0]
per_image_scores = args[1]
per_image_masks = args[2]
per_image_clip_window = args[3]
per_image_additional_fields = {
key: value
for key, value in zip(additional_fields, args[3:-1])
for key, value in zip(additional_fields, args[4:-1])
}
per_image_num_valid_boxes = args[-1]
per_image_boxes = tf.reshape(
......@@ -345,7 +375,7 @@ def batch_multiclass_non_max_suppression(boxes,
iou_thresh,
max_size_per_class,
max_total_size,
clip_window=clip_window,
clip_window=per_image_clip_window,
change_coordinate_frame=change_coordinate_frame,
masks=per_image_masks,
additional_fields=per_image_additional_fields)
......@@ -367,10 +397,10 @@ def batch_multiclass_non_max_suppression(boxes,
num_additional_fields = len(additional_fields)
num_nmsed_outputs = 4 + num_additional_fields
batch_outputs = tf.map_fn(
batch_outputs = shape_utils.static_or_dynamic_map_fn(
_single_image_nms_fn,
elems=([boxes, scores, masks] + list(additional_fields.values()) +
[num_valid_boxes]),
elems=([boxes, scores, masks, clip_window] +
list(additional_fields.values()) + [num_valid_boxes]),
dtype=(num_nmsed_outputs * [tf.float32] + [tf.int32]),
parallel_iterations=parallel_iterations)
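The switch from tf.map_fn to shape_utils.static_or_dynamic_map_fn keeps output shapes static when the batch size is known. An illustrative sketch of the idea behind the static path (this is not the library implementation):

import tensorflow as tf

def map_fn_static_sketch(fn, elems):
  # Assumes every tensor in `elems` shares a fully-defined leading
  # dimension; a real helper would fall back to tf.map_fn otherwise.
  per_element_args = zip(*[tf.unstack(elem) for elem in elems])
  outputs = [fn(args) for args in per_element_args]
  return tf.stack(outputs)  # static shapes survive shape inference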
......
......@@ -571,6 +571,125 @@ class MulticlassNonMaxSuppressionTest(tf.test.TestCase):
self.assertAllClose(nmsed_classes, exp_nms_classes)
self.assertAllClose(num_detections, [2, 3])
def test_batch_multiclass_nms_with_per_batch_clip_window(self):
boxes = tf.constant([[[[0, 0, 1, 1], [0, 0, 4, 5]],
[[0, 0.1, 1, 1.1], [0, 0.1, 2, 1.1]],
[[0, -0.1, 1, 0.9], [0, -0.1, 1, 0.9]],
[[0, 10, 1, 11], [0, 10, 1, 11]]],
[[[0, 10.1, 1, 11.1], [0, 10.1, 1, 11.1]],
[[0, 100, 1, 101], [0, 100, 1, 101]],
[[0, 1000, 1, 1002], [0, 999, 2, 1004]],
[[0, 1000, 1, 1002.1], [0, 999, 2, 1002.7]]]],
tf.float32)
scores = tf.constant([[[.9, 0.01], [.75, 0.05],
[.6, 0.01], [.95, 0]],
[[.5, 0.01], [.3, 0.01],
[.01, .85], [.01, .5]]])
clip_window = tf.constant([0., 0., 200., 200.])
score_thresh = 0.1
iou_thresh = .5
max_output_size = 4
exp_nms_corners = np.array([[[0, 10, 1, 11],
[0, 0, 1, 1],
[0, 0, 0, 0],
[0, 0, 0, 0]],
[[0, 10.1, 1, 11.1],
[0, 100, 1, 101],
[0, 0, 0, 0],
[0, 0, 0, 0]]])
exp_nms_scores = np.array([[.95, .9, 0, 0],
[.5, .3, 0, 0]])
exp_nms_classes = np.array([[0, 0, 0, 0],
[0, 0, 0, 0]])
(nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_masks,
nmsed_additional_fields, num_detections
) = post_processing.batch_multiclass_non_max_suppression(
boxes, scores, score_thresh, iou_thresh,
max_size_per_class=max_output_size, max_total_size=max_output_size,
clip_window=clip_window)
self.assertIsNone(nmsed_masks)
self.assertIsNone(nmsed_additional_fields)
# Check static shapes
self.assertAllEqual(nmsed_boxes.shape.as_list(),
exp_nms_corners.shape)
self.assertAllEqual(nmsed_scores.shape.as_list(),
exp_nms_scores.shape)
self.assertAllEqual(nmsed_classes.shape.as_list(),
exp_nms_classes.shape)
self.assertEqual(num_detections.shape.as_list(), [2])
with self.test_session() as sess:
(nmsed_boxes, nmsed_scores, nmsed_classes,
num_detections) = sess.run([nmsed_boxes, nmsed_scores, nmsed_classes,
num_detections])
self.assertAllClose(nmsed_boxes, exp_nms_corners)
self.assertAllClose(nmsed_scores, exp_nms_scores)
self.assertAllClose(nmsed_classes, exp_nms_classes)
self.assertAllClose(num_detections, [2, 2])
def test_batch_multiclass_nms_with_per_image_clip_window(self):
boxes = tf.constant([[[[0, 0, 1, 1], [0, 0, 4, 5]],
[[0, 0.1, 1, 1.1], [0, 0.1, 2, 1.1]],
[[0, -0.1, 1, 0.9], [0, -0.1, 1, 0.9]],
[[0, 10, 1, 11], [0, 10, 1, 11]]],
[[[0, 10.1, 1, 11.1], [0, 10.1, 1, 11.1]],
[[0, 100, 1, 101], [0, 100, 1, 101]],
[[0, 1000, 1, 1002], [0, 999, 2, 1004]],
[[0, 1000, 1, 1002.1], [0, 999, 2, 1002.7]]]],
tf.float32)
scores = tf.constant([[[.9, 0.01], [.75, 0.05],
[.6, 0.01], [.95, 0]],
[[.5, 0.01], [.3, 0.01],
[.01, .85], [.01, .5]]])
clip_window = tf.constant([[0., 0., 5., 5.],
[0., 0., 200., 200.]])
score_thresh = 0.1
iou_thresh = .5
max_output_size = 4
exp_nms_corners = np.array([[[0, 0, 1, 1],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]],
[[0, 10.1, 1, 11.1],
[0, 100, 1, 101],
[0, 0, 0, 0],
[0, 0, 0, 0]]])
exp_nms_scores = np.array([[.9, 0., 0., 0.],
[.5, .3, 0, 0]])
exp_nms_classes = np.array([[0, 0, 0, 0],
[0, 0, 0, 0]])
(nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_masks,
nmsed_additional_fields, num_detections
) = post_processing.batch_multiclass_non_max_suppression(
boxes, scores, score_thresh, iou_thresh,
max_size_per_class=max_output_size, max_total_size=max_output_size,
clip_window=clip_window)
self.assertIsNone(nmsed_masks)
self.assertIsNone(nmsed_additional_fields)
# Check static shapes
self.assertAllEqual(nmsed_boxes.shape.as_list(),
exp_nms_corners.shape)
self.assertAllEqual(nmsed_scores.shape.as_list(),
exp_nms_scores.shape)
self.assertAllEqual(nmsed_classes.shape.as_list(),
exp_nms_classes.shape)
self.assertEqual(num_detections.shape.as_list(), [2])
with self.test_session() as sess:
(nmsed_boxes, nmsed_scores, nmsed_classes,
num_detections) = sess.run([nmsed_boxes, nmsed_scores, nmsed_classes,
num_detections])
self.assertAllClose(nmsed_boxes, exp_nms_corners)
self.assertAllClose(nmsed_scores, exp_nms_scores)
self.assertAllClose(nmsed_classes, exp_nms_classes)
self.assertAllClose(num_detections, [1, 2])
def test_batch_multiclass_nms_with_masks(self):
boxes = tf.constant([[[[0, 0, 1, 1], [0, 0, 4, 5]],
[[0, 0.1, 1, 1.1], [0, 0.1, 2, 1.1]],
......
......@@ -51,6 +51,7 @@ from object_detection.core import box_list
from object_detection.core import box_list_ops
from object_detection.core import keypoint_ops
from object_detection.core import standard_fields as fields
from object_detection.utils import shape_utils
def _apply_with_random_selector(x, func, num_cases):
......@@ -1647,6 +1648,7 @@ def _compute_new_static_size(image, min_dimension, max_dimension):
image_shape = image.get_shape().as_list()
orig_height = image_shape[0]
orig_width = image_shape[1]
num_channels = image_shape[2]
orig_min_dim = min(orig_height, orig_width)
# Calculates the larger of the possible sizes
large_scale_factor = min_dimension / float(orig_min_dim)
......@@ -1674,7 +1676,7 @@ def _compute_new_static_size(image, min_dimension, max_dimension):
new_size = small_size
else:
new_size = large_size
return tf.constant(new_size)
return tf.constant(new_size + [num_channels])
def _compute_new_dynamic_size(image, min_dimension, max_dimension):
......@@ -1682,6 +1684,7 @@ def _compute_new_dynamic_size(image, min_dimension, max_dimension):
image_shape = tf.shape(image)
orig_height = tf.to_float(image_shape[0])
orig_width = tf.to_float(image_shape[1])
num_channels = image_shape[2]
orig_min_dim = tf.minimum(orig_height, orig_width)
# Calculates the larger of the possible sizes
min_dimension = tf.constant(min_dimension, dtype=tf.float32)
......@@ -1711,7 +1714,7 @@ def _compute_new_dynamic_size(image, min_dimension, max_dimension):
lambda: small_size, lambda: large_size)
else:
new_size = large_size
return new_size
return tf.stack(tf.unstack(new_size) + [num_channels])
def resize_to_range(image,
......@@ -1719,7 +1722,8 @@ def resize_to_range(image,
min_dimension=None,
max_dimension=None,
method=tf.image.ResizeMethod.BILINEAR,
align_corners=False):
align_corners=False,
pad_to_max_dimension=False):
"""Resizes an image so its dimensions are within the provided value.
The output size can be described by two cases:
......@@ -1740,15 +1744,22 @@ def resize_to_range(image,
BILINEAR.
align_corners: bool. If true, exactly align all 4 corners of the input
and output. Defaults to False.
pad_to_max_dimension: Whether to pad the resized image with zeros so that
the resulting image has the spatial size
[max_dimension, max_dimension]. If masks are included, they are padded
similarly.
Returns:
A 3D tensor of shape [new_height, new_width, channels],
where the image has been resized (with bilinear interpolation) so that
min(new_height, new_width) == min_dimension or
max(new_height, new_width) == max_dimension.
If masks is not None, also outputs masks:
A 3D tensor of shape [num_instances, new_height, new_width]
Note that the position of the resized_image_shape changes based on whether
masks are present.
resized_image: A 3D tensor of shape [new_height, new_width, channels],
where the image has been resized (with bilinear interpolation) so that
min(new_height, new_width) == min_dimension or
max(new_height, new_width) == max_dimension.
resized_masks: If masks is not None, also outputs masks. A 3D tensor of
shape [num_instances, new_height, new_width].
resized_image_shape: A 1D tensor of shape [3] containing shape of the
resized image.
Raises:
ValueError: if the image is not a 3D tensor.
......@@ -1762,16 +1773,27 @@ def resize_to_range(image,
else:
new_size = _compute_new_dynamic_size(image, min_dimension, max_dimension)
new_image = tf.image.resize_images(
image, new_size, method=method, align_corners=align_corners)
image, new_size[:-1], method=method, align_corners=align_corners)
if pad_to_max_dimension:
new_image = tf.image.pad_to_bounding_box(
new_image, 0, 0, max_dimension, max_dimension)
result = new_image
result = [new_image]
if masks is not None:
new_masks = tf.expand_dims(masks, 3)
new_masks = tf.image.resize_nearest_neighbor(
new_masks, new_size, align_corners=align_corners)
new_masks = tf.image.resize_images(
new_masks,
new_size[:-1],
method=tf.image.ResizeMethod.NEAREST_NEIGHBOR,
align_corners=align_corners)
new_masks = tf.squeeze(new_masks, 3)
result = [new_image, new_masks]
if pad_to_max_dimension:
new_masks = tf.image.pad_to_bounding_box(
new_masks, 0, 0, max_dimension, max_dimension)
result.append(new_masks)
result.append(new_size)
return result
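A hedged usage sketch of the new return convention (the dimension values are assumed): resize_to_range now returns a list whose last element is the shape of the resized, unpadded image.

import tensorflow as tf
from object_detection.core import preprocessor

image = tf.placeholder(tf.float32, shape=(None, None, 3))
resized_image, true_image_shape = preprocessor.resize_to_range(
    image, min_dimension=600, max_dimension=1024, pad_to_max_dimension=True)
# With pad_to_max_dimension=True the image tensor is padded to
# [1024, 1024, 3], while true_image_shape still records the unpadded
# [new_height, new_width, 3] size.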
......@@ -1789,10 +1811,13 @@ def resize_to_min_dimension(image, masks=None, min_dimension=600):
min_dimension: minimum image dimension.
Returns:
a tuple containing the following:
Resized image. A tensor of size [new_height, new_width, channels].
(optional) Resized masks. A tensor of
size [num_instances, new_height, new_width].
Note that the position of the resized_image_shape changes based on whether
masks are present.
resized_image: A tensor of size [new_height, new_width, channels].
resized_masks: If masks is not None, also outputs masks. A 3D tensor of
shape [num_instances, new_height, new_width].
resized_image_shape: A 1D tensor of shape [3] containing the shape of the
resized image.
Raises:
ValueError: if the image is not a 3D tensor.
......@@ -1803,6 +1828,7 @@ def resize_to_min_dimension(image, masks=None, min_dimension=600):
with tf.name_scope('ResizeGivenMinDimension', values=[image, min_dimension]):
image_height = tf.shape(image)[0]
image_width = tf.shape(image)[1]
num_channels = tf.shape(image)[2]
min_image_dimension = tf.minimum(image_height, image_width)
min_target_dimension = tf.maximum(min_image_dimension, min_dimension)
target_ratio = tf.to_float(min_target_dimension) / tf.to_float(
......@@ -1813,13 +1839,16 @@ def resize_to_min_dimension(image, masks=None, min_dimension=600):
tf.expand_dims(image, axis=0),
size=[target_height, target_width],
align_corners=True)
result = tf.squeeze(image, axis=0)
result = [tf.squeeze(image, axis=0)]
if masks is not None:
masks = tf.image.resize_nearest_neighbor(
tf.expand_dims(masks, axis=3),
size=[target_height, target_width],
align_corners=True)
result = (result, tf.squeeze(masks, axis=3))
result.append(tf.squeeze(masks, axis=3))
result.append(tf.stack([target_height, target_width, num_channels]))
return result
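As a quick worked example of the arithmetic above (input size assumed): a 300x400x3 image with min_dimension=600 has a minimum dimension of 300, so target_ratio = 600 / 300 = 2.0 and the function returns a [600, 800, 3] image together with the shape tensor [600, 800, 3].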
......@@ -1854,6 +1883,8 @@ def scale_boxes_to_pixel_coordinates(image, boxes, keypoints=None):
return tuple(result)
# TODO: Investigate if instead the function should return None if
# masks is None.
# pylint: disable=g-doc-return-or-yield
def resize_image(image,
masks=None,
......@@ -1861,7 +1892,28 @@ def resize_image(image,
new_width=1024,
method=tf.image.ResizeMethod.BILINEAR,
align_corners=False):
"""See `tf.image.resize_images` for detailed doc."""
"""Resizes images to the given height and width.
Args:
image: A 3D tensor of shape [height, width, channels]
masks: (optional) rank 3 float32 tensor with shape
[num_instances, height, width] containing instance masks.
new_height: (optional) (scalar) desired height of the image.
new_width: (optional) (scalar) desired width of the image.
method: (optional) interpolation method used in resizing. Defaults to
BILINEAR.
align_corners: bool. If true, exactly align all 4 corners of the input
and output. Defaults to False.
Returns:
Note that the position of the resized_image_shape changes based on whether
masks are present.
resized_image: A tensor of size [new_height, new_width, channels].
resized_masks: If masks is not None, also outputs masks. A 3D tensor of
shape [num_instances, new_height, new_width].
resized_image_shape: A 1D tensor of shape [3] containing the shape of the
resized image.
"""
with tf.name_scope(
'ResizeImage',
values=[image, new_height, new_width, method, align_corners]):
......@@ -1869,7 +1921,8 @@ def resize_image(image,
image, [new_height, new_width],
method=method,
align_corners=align_corners)
result = new_image
image_shape = shape_utils.combined_static_and_dynamic_shape(image)
result = [new_image]
if masks is not None:
num_instances = tf.shape(masks)[0]
new_size = tf.constant([new_height, new_width], dtype=tf.int32)
......@@ -1886,8 +1939,9 @@ def resize_image(image,
masks = tf.cond(num_instances > 0, resize_masks_branch,
reshape_masks_branch)
result = [new_image, masks]
result.append(masks)
result.append(tf.stack([new_height, new_width, image_shape[2]]))
return result
......
......@@ -1853,7 +1853,7 @@ class PreprocessorTest(tf.test.TestCase):
expected_masks_shape_list):
in_image = tf.random_uniform(in_image_shape)
in_masks = tf.random_uniform(in_masks_shape)
out_image, out_masks = preprocessor.resize_image(
out_image, out_masks, _ = preprocessor.resize_image(
in_image, in_masks, new_height=height, new_width=width)
out_image_shape = tf.shape(out_image)
out_masks_shape = tf.shape(out_masks)
......@@ -1880,7 +1880,7 @@ class PreprocessorTest(tf.test.TestCase):
expected_masks_shape_list):
in_image = tf.random_uniform(in_image_shape)
in_masks = tf.random_uniform(in_masks_shape)
out_image, out_masks = preprocessor.resize_image(
out_image, out_masks, _ = preprocessor.resize_image(
in_image, in_masks, new_height=height, new_width=width)
out_image_shape = tf.shape(out_image)
out_masks_shape = tf.shape(out_masks)
......@@ -1900,7 +1900,7 @@ class PreprocessorTest(tf.test.TestCase):
for in_shape, expected_shape in zip(in_shape_list, expected_shape_list):
in_image = tf.random_uniform(in_shape)
out_image = preprocessor.resize_to_range(
out_image, _ = preprocessor.resize_to_range(
in_image, min_dimension=min_dim, max_dimension=max_dim)
self.assertAllEqual(out_image.get_shape().as_list(), expected_shape)
......@@ -1913,7 +1913,7 @@ class PreprocessorTest(tf.test.TestCase):
for in_shape, expected_shape in zip(in_shape_list, expected_shape_list):
in_image = tf.placeholder(tf.float32, shape=(None, None, 3))
out_image = preprocessor.resize_to_range(
out_image, _ = preprocessor.resize_to_range(
in_image, min_dimension=min_dim, max_dimension=max_dim)
out_image_shape = tf.shape(out_image)
with self.test_session() as sess:
......@@ -1938,7 +1938,7 @@ class PreprocessorTest(tf.test.TestCase):
expected_masks_shape_list):
in_image = tf.random_uniform(in_image_shape)
in_masks = tf.random_uniform(in_masks_shape)
out_image, out_masks = preprocessor.resize_to_range(
out_image, out_masks, _ = preprocessor.resize_to_range(
in_image, in_masks, min_dimension=min_dim, max_dimension=max_dim)
self.assertAllEqual(out_masks.get_shape().as_list(), expected_mask_shape)
self.assertAllEqual(out_image.get_shape().as_list(), expected_image_shape)
......@@ -1960,7 +1960,7 @@ class PreprocessorTest(tf.test.TestCase):
in_image = tf.placeholder(tf.float32, shape=(None, None, 3))
in_masks = tf.placeholder(tf.float32, shape=(None, None, None))
in_masks = tf.random_uniform(in_masks_shape)
out_image, out_masks = preprocessor.resize_to_range(
out_image, out_masks, _ = preprocessor.resize_to_range(
in_image, in_masks, min_dimension=min_dim, max_dimension=max_dim)
out_image_shape = tf.shape(out_image)
out_masks_shape = tf.shape(out_masks)
......@@ -1991,7 +1991,7 @@ class PreprocessorTest(tf.test.TestCase):
expected_masks_shape_list):
in_image = tf.random_uniform(in_image_shape)
in_masks = tf.random_uniform(in_masks_shape)
out_image, out_masks = preprocessor.resize_to_range(
out_image, out_masks, _ = preprocessor.resize_to_range(
in_image, in_masks, min_dimension=min_dim, max_dimension=max_dim)
out_image_shape = tf.shape(out_image)
out_masks_shape = tf.shape(out_masks)
......@@ -2016,7 +2016,7 @@ class PreprocessorTest(tf.test.TestCase):
for in_shape, expected_shape in zip(in_shape_list, expected_shape_list):
in_image = tf.random_uniform(in_shape)
out_image = preprocessor.resize_to_range(
out_image, _ = preprocessor.resize_to_range(
in_image, min_dimension=min_dim, max_dimension=max_dim)
out_image_shape = tf.shape(out_image)
......@@ -2039,7 +2039,7 @@ class PreprocessorTest(tf.test.TestCase):
in_image = tf.placeholder(tf.float32, shape=(None, None, 3))
in_masks = tf.placeholder(tf.float32, shape=(None, None, None))
in_masks = tf.random_uniform(in_masks_shape)
out_image, out_masks = preprocessor.resize_to_min_dimension(
out_image, out_masks, _ = preprocessor.resize_to_min_dimension(
in_image, in_masks, min_dimension=min_dim)
out_image_shape = tf.shape(out_image)
out_masks_shape = tf.shape(out_masks)
......@@ -2069,7 +2069,7 @@ class PreprocessorTest(tf.test.TestCase):
expected_masks_shape_list):
in_image = tf.random_uniform(in_image_shape)
in_masks = tf.random_uniform(in_masks_shape)
out_image, out_masks = preprocessor.resize_to_min_dimension(
out_image, out_masks, _ = preprocessor.resize_to_min_dimension(
in_image, in_masks, min_dimension=min_dim)
out_image_shape = tf.shape(out_image)
out_masks_shape = tf.shape(out_masks)
......
......@@ -57,6 +57,7 @@ class InputDataFields(object):
groundtruth_keypoints: ground truth keypoints.
groundtruth_keypoint_visibilities: ground truth keypoint visibilities.
groundtruth_label_scores: groundtruth label scores.
groundtruth_weights: groundtruth weight factor for bounding boxes.
"""
image = 'image'
original_image = 'original_image'
......@@ -79,10 +80,11 @@ class InputDataFields(object):
groundtruth_keypoints = 'groundtruth_keypoints'
groundtruth_keypoint_visibilities = 'groundtruth_keypoint_visibilities'
groundtruth_label_scores = 'groundtruth_label_scores'
groundtruth_weights = 'groundtruth_weights'
class DetectionResultFields(object):
"""Naming converntions for storing the output of the detector.
"""Naming conventions for storing the output of the detector.
Attributes:
source_id: source of the original image.
......@@ -162,6 +164,7 @@ class TfExampleFields(object):
object_is_crowd: [DEPRECATED, use object_group_of instead]
is the object a single object or a crowd
object_segment_area: the area of the segment.
object_weight: a weight factor for the object's bounding box.
instance_masks: instance segmentation masks.
instance_boundaries: instance boundaries.
instance_classes: Classes for each instance segmentation mask.
......@@ -194,6 +197,7 @@ class TfExampleFields(object):
object_depiction = 'image/object/depiction'
object_is_crowd = 'image/object/is_crowd'
object_segment_area = 'image/object/segment/area'
object_weight = 'image/object/weight'
instance_masks = 'image/segmentation/object'
instance_boundaries = 'image/boundaries/object'
instance_classes = 'image/segmentation/object/class'
......
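A hedged sketch of how the new per-object weight could be serialized into a tf.Example using the object_weight key defined above (the weight values are made up):

import tensorflow as tf

example = tf.train.Example(features=tf.train.Features(feature={
    'image/object/weight': tf.train.Feature(
        float_list=tf.train.FloatList(value=[1.0, 0.3])),  # one per box
}))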
......@@ -37,19 +37,19 @@ from object_detection.box_coders import faster_rcnn_box_coder
from object_detection.box_coders import mean_stddev_box_coder
from object_detection.core import box_coder as bcoder
from object_detection.core import box_list
from object_detection.core import box_list_ops
from object_detection.core import matcher as mat
from object_detection.core import region_similarity_calculator as sim_calc
from object_detection.core import standard_fields as fields
from object_detection.matchers import argmax_matcher
from object_detection.matchers import bipartite_matcher
from object_detection.utils import shape_utils
class TargetAssigner(object):
"""Target assigner to compute classification and regression targets."""
def __init__(self, similarity_calc, matcher, box_coder,
positive_class_weight=1.0, negative_class_weight=1.0,
unmatched_cls_target=None):
negative_class_weight=1.0, unmatched_cls_target=None):
"""Construct Object Detection Target Assigner.
Args:
......@@ -58,10 +58,8 @@ class TargetAssigner(object):
anchors.
box_coder: an object_detection.core.BoxCoder used to encode matching
groundtruth boxes with respect to anchors.
positive_class_weight: classification weight to be associated to positive
anchors (default: 1.0)
negative_class_weight: classification weight to be associated to negative
anchors (default: 1.0)
anchors (default: 1.0). The weight must be in [0., 1.].
unmatched_cls_target: a float32 tensor with shape [d_1, d_2, ..., d_k]
which is consistent with the classification target for each
anchor (and can be empty for scalar targets). This shape must thus be
......@@ -82,7 +80,6 @@ class TargetAssigner(object):
self._similarity_calc = similarity_calc
self._matcher = matcher
self._box_coder = box_coder
self._positive_class_weight = positive_class_weight
self._negative_class_weight = negative_class_weight
if unmatched_cls_target is None:
self._unmatched_cls_target = tf.constant([0], tf.float32)
......@@ -94,7 +91,7 @@ class TargetAssigner(object):
return self._box_coder
def assign(self, anchors, groundtruth_boxes, groundtruth_labels=None,
**params):
groundtruth_weights=None, **params):
"""Assign classification and regression targets to each anchor.
For a given set of anchors and groundtruth detections, match anchors
......@@ -113,6 +110,9 @@ class TargetAssigner(object):
[d_1, ... d_k] can be empty (corresponding to scalar inputs). When set
to None, groundtruth_labels assumes a binary problem where all
ground_truth boxes get a positive label (of 1).
groundtruth_weights: a float tensor of shape [M] indicating the weight to
assign to all anchors matched to a particular groundtruth box. The weights
must be in [0., 1.]. If None, all weights are set to 1.
**params: Additional keyword arguments for specific implementations of
the Matcher.
......@@ -140,14 +140,21 @@ class TargetAssigner(object):
groundtruth_labels = tf.ones(tf.expand_dims(groundtruth_boxes.num_boxes(),
0))
groundtruth_labels = tf.expand_dims(groundtruth_labels, -1)
unmatched_shape_assert = tf.assert_equal(
tf.shape(groundtruth_labels)[1:], tf.shape(self._unmatched_cls_target),
message='Unmatched class target shape incompatible '
'with groundtruth labels shape!')
labels_and_box_shapes_assert = tf.assert_equal(
tf.shape(groundtruth_labels)[0], groundtruth_boxes.num_boxes(),
message='Groundtruth boxes and labels have incompatible shapes!')
unmatched_shape_assert = shape_utils.assert_shape_equal(
shape_utils.combined_static_and_dynamic_shape(groundtruth_labels)[1:],
shape_utils.combined_static_and_dynamic_shape(
self._unmatched_cls_target))
labels_and_box_shapes_assert = shape_utils.assert_shape_equal(
shape_utils.combined_static_and_dynamic_shape(
groundtruth_labels)[:1],
shape_utils.combined_static_and_dynamic_shape(
groundtruth_boxes.get())[:1])
if groundtruth_weights is None:
num_gt_boxes = groundtruth_boxes.num_boxes_static()
if not num_gt_boxes:
num_gt_boxes = groundtruth_boxes.num_boxes()
groundtruth_weights = tf.ones([num_gt_boxes], dtype=tf.float32)
with tf.control_dependencies(
[unmatched_shape_assert, labels_and_box_shapes_assert]):
match_quality_matrix = self._similarity_calc.compare(groundtruth_boxes,
......@@ -158,16 +165,16 @@ class TargetAssigner(object):
match)
cls_targets = self._create_classification_targets(groundtruth_labels,
match)
reg_weights = self._create_regression_weights(match)
cls_weights = self._create_classification_weights(
match, self._positive_class_weight, self._negative_class_weight)
reg_weights = self._create_regression_weights(match, groundtruth_weights)
cls_weights = self._create_classification_weights(match,
groundtruth_weights)
num_anchors = anchors.num_boxes_static()
if num_anchors is not None:
reg_targets = self._reset_target_shape(reg_targets, num_anchors)
cls_targets = self._reset_target_shape(cls_targets, num_anchors)
reg_weights = self._reset_target_shape(reg_weights, num_anchors)
cls_weights = self._reset_target_shape(cls_weights, num_anchors)
return cls_targets, cls_weights, reg_targets, reg_weights, match
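A hedged usage sketch of the updated assign() call; the anchors, boxes, and weight values below are made up, and it assumes the factory accepts the 'FasterRCNN' reference with stage 'detection':

import tensorflow as tf
from object_detection.core import box_list
from object_detection.core import target_assigner

anchors = box_list.BoxList(tf.constant([[0., 0., 1., 1.],
                                        [0., 0.5, 1., 1.5]]))
groundtruth_boxes = box_list.BoxList(tf.constant([[0., 0., 1., 1.]]))
assigner = target_assigner.create_target_assigner('FasterRCNN', 'detection')
(cls_targets, cls_weights, reg_targets, reg_weights,
 match) = assigner.assign(
     anchors, groundtruth_boxes,
     groundtruth_weights=tf.constant([0.3]))  # one weight per gt box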
......@@ -198,23 +205,31 @@ class TargetAssigner(object):
Returns:
reg_targets: a float32 tensor with shape [N, box_code_dimension]
"""
matched_anchor_indices = match.matched_column_indices()
unmatched_ignored_anchor_indices = (match.
unmatched_or_ignored_column_indices())
matched_gt_indices = match.matched_row_indices()
matched_anchors = box_list_ops.gather(anchors,
matched_anchor_indices)
matched_gt_boxes = box_list_ops.gather(groundtruth_boxes,
matched_gt_indices)
matched_reg_targets = self._box_coder.encode(matched_gt_boxes,
matched_anchors)
matched_gt_boxes = match.gather_based_on_match(
groundtruth_boxes.get(),
unmatched_value=tf.zeros(4),
ignored_value=tf.zeros(4))
matched_gt_boxlist = box_list.BoxList(matched_gt_boxes)
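# Keypoints, when present, are gathered with the same match so the box
# coder can encode keypoint regression targets alongside the box targets.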
if groundtruth_boxes.has_field(fields.BoxListFields.keypoints):
groundtruth_keypoints = groundtruth_boxes.get_field(
fields.BoxListFields.keypoints)
matched_keypoints = match.gather_based_on_match(
groundtruth_keypoints,
unmatched_value=tf.zeros(groundtruth_keypoints.get_shape()[1:]),
ignored_value=tf.zeros(groundtruth_keypoints.get_shape()[1:]))
matched_gt_boxlist.add_field(fields.BoxListFields.keypoints,
matched_keypoints)
matched_reg_targets = self._box_coder.encode(matched_gt_boxlist, anchors)
match_results_shape = shape_utils.combined_static_and_dynamic_shape(
match.match_results)
# Zero out the unmatched and ignored regression targets.
unmatched_ignored_reg_targets = tf.tile(
self._default_regression_target(),
tf.stack([tf.size(unmatched_ignored_anchor_indices), 1]))
reg_targets = tf.dynamic_stitch(
[matched_anchor_indices, unmatched_ignored_anchor_indices],
[matched_reg_targets, unmatched_ignored_reg_targets])
# TODO: summarize the number of matches on average.
self._default_regression_target(), [match_results_shape[0], 1])
matched_anchors_mask = match.matched_column_indicator()
reg_targets = tf.where(matched_anchors_mask,
matched_reg_targets,
unmatched_ignored_reg_targets)
return reg_targets
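The rewrite above leans on Match.gather_based_on_match. An illustrative sketch of its semantics (not the library code): match results use -1 for unmatched columns and -2 for ignored ones, so prepending the two substitute rows and shifting indices by 2 reduces the lookup to a single tf.gather.

import tensorflow as tf

def gather_based_on_match_sketch(match_results, input_tensor,
                                 unmatched_value, ignored_value):
  # Rows 0 and 1 hold the substitutes for ignored (-2) and unmatched (-1)
  # columns; a column matched to row m indexes input row m at m + 2.
  padded = tf.concat(
      [tf.stack([ignored_value, unmatched_value]), input_tensor], axis=0)
  return tf.gather(padded, match_results + 2)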
def _default_regression_target(self):
......@@ -245,27 +260,16 @@ class TargetAssigner(object):
and groundtruth boxes.
Returns:
cls_targets: a float32 tensor with shape [num_anchors, d_1, d_2 ... d_k],
where the subshape [d_1, ..., d_k] is compatible with groundtruth_labels
which has shape [num_gt_boxes, d_1, d_2, ... d_k].
a float32 tensor with shape [num_anchors, d_1, d_2 ... d_k], where the
subshape [d_1, ..., d_k] is compatible with groundtruth_labels which has
shape [num_gt_boxes, d_1, d_2, ... d_k].
"""
matched_anchor_indices = match.matched_column_indices()
unmatched_ignored_anchor_indices = (match.
unmatched_or_ignored_column_indices())
matched_gt_indices = match.matched_row_indices()
matched_cls_targets = tf.gather(groundtruth_labels, matched_gt_indices)
ones = self._unmatched_cls_target.shape.ndims * [1]
unmatched_ignored_cls_targets = tf.tile(
tf.expand_dims(self._unmatched_cls_target, 0),
tf.stack([tf.size(unmatched_ignored_anchor_indices)] + ones))
cls_targets = tf.dynamic_stitch(
[matched_anchor_indices, unmatched_ignored_anchor_indices],
[matched_cls_targets, unmatched_ignored_cls_targets])
return cls_targets
def _create_regression_weights(self, match):
return match.gather_based_on_match(
groundtruth_labels,
unmatched_value=self._unmatched_cls_target,
ignored_value=self._unmatched_cls_target)
def _create_regression_weights(self, match, groundtruth_weights):
"""Set regression weight for each anchor.
Only positive anchors are set to contribute to the regression loss, so this
......@@ -275,18 +279,18 @@ class TargetAssigner(object):
Args:
match: a matcher.Match object that provides a matching between anchors
and groundtruth boxes.
groundtruth_weights: a float tensor of shape [M] indicating the weight to
assign to all anchors matched to a particular groundtruth box.
Returns:
reg_weights: a float32 tensor with shape [num_anchors] representing
regression weights
a float32 tensor with shape [num_anchors] representing regression weights.
"""
reg_weights = tf.cast(match.matched_column_indicator(), tf.float32)
return reg_weights
return match.gather_based_on_match(
groundtruth_weights, ignored_value=0., unmatched_value=0.)
def _create_classification_weights(self,
match,
positive_class_weight=1.0,
negative_class_weight=1.0):
groundtruth_weights):
"""Create classification weights for each anchor.
Positive (matched) anchors are associated with a weight of
......@@ -299,25 +303,23 @@ class TargetAssigner(object):
Args:
match: a matcher.Match object that provides a matching between anchors
and groundtruth boxes.
positive_class_weight: weight to be associated to positive anchors
negative_class_weight: weight to be associated to negative anchors
groundtruth_weights: a float tensor of shape [M] indicating the weight to
assign to all anchors matched to a particular groundtruth box.
Returns:
cls_weights: a float32 tensor with shape [num_anchors] representing
classification weights.
a float32 tensor with shape [num_anchors] representing classification
weights.
"""
matched_indicator = tf.cast(match.matched_column_indicator(), tf.float32)
ignore_indicator = tf.cast(match.ignored_column_indicator(), tf.float32)
unmatched_indicator = 1.0 - matched_indicator - ignore_indicator
cls_weights = (positive_class_weight * matched_indicator
+ negative_class_weight * unmatched_indicator)
return cls_weights
return match.gather_based_on_match(
groundtruth_weights,
ignored_value=0.,
unmatched_value=self._negative_class_weight)
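A worked toy example of the new weight semantics (all values assumed):

# match_results = [1, -1, 0, -2]: anchor 0 matched to gt 1, anchor 1
# unmatched, anchor 2 matched to gt 0, anchor 3 ignored.
# With groundtruth_weights = [1.0, 0.3] and negative_class_weight = 1.0:
#   reg_weights = [0.3, 0.0, 1.0, 0.0]  (unmatched/ignored contribute nothing)
#   cls_weights = [0.3, 1.0, 1.0, 0.0]  (unmatched anchors act as negatives)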
def get_box_coder(self):
"""Get BoxCoder of this TargetAssigner.
Returns:
BoxCoder: BoxCoder object.
BoxCoder object.
"""
return self._box_coder
......@@ -325,7 +327,6 @@ class TargetAssigner(object):
# TODO: This method pulls in all the implementation dependencies into
# core. Therefore its best to have this factory method outside of core.
def create_target_assigner(reference, stage=None,
positive_class_weight=1.0,
negative_class_weight=1.0,
unmatched_cls_target=None):
"""Factory function for creating standard target assigners.
......@@ -333,8 +334,6 @@ def create_target_assigner(reference, stage=None,
Args:
reference: string referencing the type of TargetAssigner.
stage: string denoting stage: {proposal, detection}.
positive_class_weight: classification weight to be associated to positive
anchors (default: 1.0)
negative_class_weight: classification weight to be associated to negative
anchors (default: 1.0)
unmatched_cls_target: a float32 tensor with shape [d_1, d_2, ..., d_k]
......@@ -383,7 +382,6 @@ def create_target_assigner(reference, stage=None,
raise ValueError('No valid combination of reference and stage.')
return TargetAssigner(similarity_calc, matcher, box_coder,
positive_class_weight=positive_class_weight,
negative_class_weight=negative_class_weight,
unmatched_cls_target=unmatched_cls_target)
......