Abstract

State-of-the-art image classification approaches
are mainly based on robust image representation, such as
the bag-of-features (BoF) model or the convolutional neural
network (CNN) architecture. In real applications, the orientation
(left/right) of an image or an object might vary from sample to
sample, whereas some handcrafted descriptors (e.g., SIFT) and
network operations (e.g., convolution)
are not reversal-invariant, leading to unsatisfactory stability
of the image features extracted from these models. To deal
with this, a popular solution is to augment the dataset by adding
a left-right reversed copy for each image. This strategy
improves recognition accuracy to some extent, but comes at the
price of almost doubled time and memory consumption in
both the training and testing stages. In this
paper, we present an alternative solution based on designing
reversal-invariant representations of local patterns, so that
we can obtain an identical representation for an image and
its left-right reversed copy. For the BoF model, we design
a reversal-invariant version of the SIFT descriptor named Max-SIFT,
a generalized RIDE algorithm that can be applied to
a large family of local descriptors. For the CNN architecture,
we present a simple idea of generating reversal-invariant deep
features (RI-Deep), and, inspired by it, design reversal-invariant
convolution (RI-Conv) layers to increase the CNN capacity without
increasing the model complexity. Experiments reveal consistent
accuracy gains on various image classification tasks, including
scene understanding, fine-grained object recognition, and
large-scale visual recognition.
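
As a toy illustration of the property targeted above (an image and its left-right reversed copy mapping to the same representation), the sketch below canonicalizes an orientation-sensitive descriptor by describing both orientations and keeping the lexicographically larger result. This is a generic construction under stated assumptions, not the definition of Max-SIFT, RIDE, RI-Deep, or RI-Conv; the names `toy_descriptor` and `reversal_invariant` are hypothetical and introduced only for illustration.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def flip_left_right(image: Image) -> Image:
    """Return the left-right reversed copy of an image."""
    return [row[::-1] for row in image]


def toy_descriptor(image: Image) -> List[float]:
    """Hypothetical orientation-sensitive feature (an assumption, not SIFT):
    mean intensity of the left half and of the right half of the image.
    The two entries swap when the image is reversed."""
    half = len(image[0]) // 2
    left = [v for row in image for v in row[:half]]
    right = [v for row in image for v in row[-half:]]
    return [sum(left) / len(left), sum(right) / len(right)]


def reversal_invariant(
    describe: Callable[[Image], List[float]], image: Image
) -> List[float]:
    """Describe the image and its mirror, then keep one of the two by a rule
    that does not depend on orientation (here: the lexicographically larger)."""
    return max(describe(image), describe(flip_left_right(image)))


image = [[0.0, 1.0, 4.0, 9.0],
         [1.0, 2.0, 5.0, 8.0]]
mirrored = flip_left_right(image)

# The raw descriptor differs between the two orientations ...
assert toy_descriptor(image) != toy_descriptor(mirrored)
# ... while the canonicalized descriptor is identical for both.
assert reversal_invariant(toy_descriptor, image) == reversal_invariant(
    toy_descriptor, mirrored
)
```

Because the selection rule sees the same pair of candidate descriptors for either orientation, the output is identical by construction. In this sketch the descriptor is evaluated twice per image, but no reversed copies are added to the dataset.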