REPRESENTATION LEARNING OF IMAGE RECOGNITION: DIVERSITY, ASPECT RATIO, INVARIANCE, AND COMPOSITION

Doctoral Candidate Name: 
Qiuyu Chen
Program: 
Computing and Information Systems
Abstract: 

Deep neural networks (DNNs) have proved effective and have dramatically improved performance across a wide range of computer vision tasks. End-to-end training of DNNs consistently demonstrates powerful modeling ability and consequently reduces the dedicated effort required for expert feature engineering. On the other hand, it raises the question of how to improve a black-box network with better representation (feature) learning, especially when the learned representations and classifiers are tied together under supervised learning. In this work, representation learning is studied from four perspectives in different fields: diversity in ensemble learning, aspect ratio in image aesthetics assessment, invariance in identification tasks, and composition in color attribute recognition.

By analyzing the bottlenecks of black-box networks and designing better representation learning for the target tasks, we show that: (a) Ensemble learning relies on the diversity of complementary neural networks, in both feature representations and classifier representations. A diverse representation learning method, namely learning-difficulty-aware embedding, is proposed to adaptively reconcile learning attention across categories by sequentially training a series of networks with diversified representations. (b) The data augmentation widely adopted in image recognition distorts aspect ratios, which are an important factor in image aesthetics assessment. An aspect ratio representation learning method, namely adaptive fractional dilated convolution, is proposed to explicitly preserve aspect-ratio-related representations by adjusting the receptive fields adaptively and natively (one possible realization is sketched after this abstract). (c) Identification tasks, e.g. person re-identification, aim to learn representations that are robust to interfering variances such as lighting, viewpoint, and pose. An invariance representation learning method, namely anchor loss, is proposed to train a robust feature extractor that distills identity-related representations while disentangling and removing interfering variances through global supervision under local mini-batch training. (d) Color recognition is entangled with compositional representations in both visual perception and language attention. A compositional learning module with attention to key colors is proposed to learn better color representations. In addition, another compositional learning method, namely classifier as descriptor, is proposed for long-tail color recognition by incorporating the rich knowledge in classifier representations to remove the bias from a bias-trained model.
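To make the aspect-ratio idea in (b) concrete, the following is a minimal sketch, not the dissertation's exact formulation: it assumes a non-integer dilation rate is derived from the input's width-to-height ratio and approximated by linearly blending two convolutions whose integer dilations bracket that rate. The class name FractionalDilatedConv2d and the blending scheme are illustrative assumptions.

# Hypothetical sketch: fractional dilation along the width axis, realized by
# interpolating between two integer-dilated convolutions that share weights.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FractionalDilatedConv2d(nn.Module):
    """Approximates a non-integer dilation rate (driven by the input's
    aspect ratio) by blending two bracketing integer dilations."""

    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.kernel_size = kernel_size
        self.weight = nn.Parameter(
            torch.randn(out_ch, in_ch, kernel_size, kernel_size) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_ch))

    def forward(self, x):
        # Aspect ratio of the (possibly non-square) input feature map.
        h, w = x.shape[-2:]
        ratio = w / h
        low, high = int(ratio), int(ratio) + 1
        frac = ratio - low  # fractional part drives the blend

        def conv(dilation_w):
            # Padding keeps the spatial size unchanged for a given dilation.
            pad = (self.kernel_size // 2, (self.kernel_size // 2) * dilation_w)
            return F.conv2d(x, self.weight, self.bias,
                            padding=pad, dilation=(1, dilation_w))

        # Linear interpolation between the two bracketing integer dilations.
        return (1 - frac) * conv(max(low, 1)) + frac * conv(high)


# Usage: a landscape feature map receives a wider horizontal receptive field.
feat = torch.randn(2, 16, 32, 48)            # batch, channels, H, W
out = FractionalDilatedConv2d(16, 32)(feat)
print(out.shape)                              # torch.Size([2, 32, 32, 48])

One design choice in this sketch is that a single weight tensor is reused at both dilation rates, so the blend changes only the effective receptive field rather than the parameter count; whether the dissertation shares weights in this way is an assumption.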

Through extensive experiments and thorough analysis, we demonstrate novel insights into the impact of four factors, i.e. diversity, receptive field, invariance, and composition. Several methods are proposed to learn better representations with respect to these factors, achieving state-of-the-art results on different tasks.

Defense Date and Time: 
Monday, April 12, 2021 - 9:00am
Defense Location: 
Zoom
Committee Chair's Name: 
Dr. Jianping Fan
Committee Members: 
Dr. Aidong Lu, Dr. Jing Yang, Dr. Min Shin, Dr. Weidong Tian