
Incorrect anchor computations #1765

Closed
@hgaiser

Description

While working on #1697 I was checking the anchor generation and noticed an error in the computation. The anchor centers should be offset by half the stride of their pyramid level, but they are currently placed at the top-left corner of each cell.

An example makes this clearer. I generated anchors for an imaginary image of size (512, 512). The top-left anchor with ratio 1 and scale 1 is:

       [-16., -16.,  16.,  16.],

Note that this anchor extends 16 pixels outside the image and that its center is at (0, 0).
The last anchor, at the bottom right, with ratio 1 and scale 1 is:

       [488., 488., 520., 520.],

Note that this anchor extends only 8 pixels past the bottom-right corner, because the image shape is (512, 512). The anchors are therefore not centered on the image. If we shift all anchors by half the stride (the stride is 8 pixels here, so the shift is 4 pixels), we get the following anchors:

       [-12., -12.,  20.,  20.],
       [492., 492., 524., 524.],

In this case both anchors extend 12 pixels beyond the borders of the image.
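
A minimal 1-D sketch that reproduces the numbers above. It assumes square anchors (so the same computation applies to x and y), a stride of 8, 64 cells per axis and an anchor size of 32 for ratio 1 and scale 1. `anchor_intervals` is a hypothetical helper written for illustration, not the repository's anchor code.

```python
def anchor_intervals(feature_size, stride, anchor_size, offset):
    """Return (start, end) of every anchor along one axis (hypothetical helper).

    offset: position of the first anchor center, in pixels.
    """
    half = anchor_size / 2.0
    intervals = []
    for i in range(feature_size):
        center = offset + i * stride  # anchor centers are spaced one stride apart
        intervals.append((center - half, center + half))
    return intervals


image_size, stride, anchor_size = 512, 8, 32
feature_size = image_size // stride  # 64 cells along each axis

# Current behaviour: the first center sits at pixel 0 (the top-left corner).
current = anchor_intervals(feature_size, stride, anchor_size, offset=0.0)
# Shifted by half the stride: the first center sits at pixel 4.
shifted = anchor_intervals(feature_size, stride, anchor_size, offset=stride / 2.0)

print(current[0], current[-1])  # (-16.0, 16.0) (488.0, 520.0)
print(shifted[0], shifted[-1])  # (-12.0, 20.0) (492.0, 524.0)
```

With the shift, the overhang is 12 pixels on both sides (0 - (-12) and 524 - 512), instead of 16 on the top left and 8 on the bottom right.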

Note that this problem becomes more severe as the stride grows, for example for P7 in RetinaNet, where the stride is 128:

Currently:

       [ 128.,  128.,  640.,  640.],

Correct (offset of 64 pixels):

       [ 192.,  192.,  704.,  704.],
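
The same check in 2-D for the P7 numbers above, as a NumPy sketch. `shifted_anchors` is a hypothetical helper written for illustration (it is not the repository's anchor code); it assumes a (512, 512) image, stride 128 (so a 4x4 feature map) and a 512x512 base anchor with ratio 1 and scale 1, centered on the origin.

```python
import numpy as np


def shifted_anchors(feature_shape, stride, base_anchor, offset):
    """Place base_anchor (x1, y1, x2, y2) at every cell of a pyramid level.

    Hypothetical helper. feature_shape is (rows, cols); offset is
    (offset_y, offset_x), the position of the first anchor center in pixels.
    """
    shift_x = offset[1] + np.arange(feature_shape[1]) * stride
    shift_y = offset[0] + np.arange(feature_shape[0]) * stride
    # Both grids have shape (rows, cols); ravel walks them row by row.
    shift_x, shift_y = np.meshgrid(shift_x, shift_y)
    shifts = np.stack([shift_x.ravel(), shift_y.ravel(),
                       shift_x.ravel(), shift_y.ravel()], axis=1)
    return shifts + np.asarray(base_anchor, dtype=float)


stride = 128
feature_shape = (512 // stride, 512 // stride)  # (4, 4)
base_anchor = [-256.0, -256.0, 256.0, 256.0]    # 512x512 anchor

current = shifted_anchors(feature_shape, stride, base_anchor, offset=(0.0, 0.0))
corrected = shifted_anchors(feature_shape, stride, base_anchor, offset=(64.0, 64.0))

print(current[-1])    # [128. 128. 640. 640.]
print(corrected[-1])  # [192. 192. 704. 704.]
```

With the 64-pixel offset, the top-left anchor becomes [-192, -192, 320, 320], so both corners overhang the image by 192 pixels instead of 256 on the top left and 128 on the bottom right.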

I could make a PR to fix this computation, but I wanted to check first whether that is desired. The fix would invalidate all existing trained networks, because their detections would be offset by roughly half the stride of the pyramid level they come from. Ideally those networks would be retrained, but that may be too big an effort.

EDIT: Actually, it's not exactly half the stride. The offset can be computed like this:

    offset_x = (image_shape[1] - (features_shape[1] - 1) * stride) / 2.0
    offset_y = (image_shape[0] - (features_shape[0] - 1) * stride) / 2.0

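Here features_shape and stride are the values for the current pyramid level. The formula centers the grid of anchor centers, which spans (features_shape - 1) * stride pixels, so that equal margins remain on both sides of the image. It only reduces to half the stride when the image size is an exact multiple of the stride, i.e. when the feature map has image_size / stride cells. For an input shape of (300, 300), for instance, using half the stride would be incorrect.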
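
The formula can be wrapped in a small function to compare it with half the stride. This is a sketch: `anchor_offsets` restates the two lines above, while `features_shape_for` is a hypothetical helper that assumes each pyramid level has ceil(size / stride) cells per axis. A different rounding rule for the feature shape would change the (300, 300) results.

```python
import math


def anchor_offsets(image_shape, features_shape, stride):
    """Return (offset_x, offset_y) of the first anchor center, in pixels."""
    offset_x = (image_shape[1] - (features_shape[1] - 1) * stride) / 2.0
    offset_y = (image_shape[0] - (features_shape[0] - 1) * stride) / 2.0
    return offset_x, offset_y


def features_shape_for(image_shape, stride):
    # Assumption (hypothetical helper): a pyramid level has
    # ceil(size / stride) cells along each axis.
    return tuple(math.ceil(size / stride) for size in image_shape)


for image_shape in [(512, 512), (300, 300)]:
    for stride in (8, 128):
        features_shape = features_shape_for(image_shape, stride)
        offset = anchor_offsets(image_shape, features_shape, stride)
        print(image_shape, stride, offset, "half stride:", stride / 2.0)
```

Under that rounding assumption, (512, 512) gives offsets of 4.0 and 64.0, exactly half the stride, while (300, 300) gives 2.0 for stride 8 and 22.0 for stride 128.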
