Multi-Scale Fully Convolutional Neural Networks for Classification of Visual Objects
Recent studies have shown the great potential of Convolutional Neural Networks (CNNs) to yield excellent results on visual classification tasks. While CNNs can achieve translation invariance through spatial convolution and pooling, their ability to achieve scale invariance remains limited. To address this challenge, we propose a multi-scale fully convolutional neural network architecture that comprises three types of multi-scale fusion: (1) multi-size filter fusion; (2) multi-layer feature fusion; and (3) multi-resolution I/O fusion. The architecture is designed to incorporate these fusions so that scale invariance can be achieved. Using the CIFAR-10 and CIFAR-100 datasets as benchmarks, our architecture achieves classification accuracies of 96.6% (CIFAR-10) and 80.36% (CIFAR-100). Compared with published works to date, our multi-scale fully convolutional architecture demonstrates state-of-the-art classification performance.
Index terms: Convolutional Neural Networks, CNN, multi-size kernel fusion, multi-layer feature fusion, multi-resolution I/O fusion.
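
To make the first fusion type named in the abstract more concrete, the following is a minimal pure-Python sketch of multi-size filter fusion. It is an illustration under stated assumptions, not the implementation used in this work: the kernel sizes (1, 3, 5), the averaging kernels standing in for learned filters, the zero padding, and the choice to fuse by stacking feature maps along a channel axis are all assumptions, and the function names are hypothetical.

```python
def conv2d_same(image, kernel):
    """Cross-correlate a 2D image with a square, odd-sized kernel.

    Zero padding is used so the output keeps the input height and width.
    """
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for i in range(k):
                for j in range(k):
                    yy, xx = y + i - pad, x + j - pad
                    # Positions outside the image contribute zero.
                    if 0 <= yy < h and 0 <= xx < w:
                        total += kernel[i][j] * image[yy][xx]
            out[y][x] = total
    return out


def mean_kernel(size):
    """Hypothetical stand-in for a learned filter: a size x size averaging kernel."""
    return [[1.0 / (size * size)] * size for _ in range(size)]


def multi_size_filter_fusion(image, sizes=(1, 3, 5)):
    """Apply one filter per kernel size and stack the resulting maps.

    Assumption: fusion is modeled as concatenation along a channel axis.
    """
    return [conv2d_same(image, mean_kernel(s)) for s in sizes]


# A 4x4 toy image with pixel values 0.0 .. 15.0.
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
fused = multi_size_filter_fusion(image)
print(len(fused), len(fused[0]), len(fused[0][0]))
```

Because every branch preserves the spatial size, the feature maps from different receptive fields can be stacked directly, so later layers see small-scale and large-scale responses at each pixel.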