Optimizing neural network structures: faster speed, smaller size, less tuning

Zhe Li

doi:10.17077/etd.rgwxihal

Back

Optimizing neural network structures: faster speed, smaller size, less tuning

Dissertation

Open access

Optimizing neural network structures: faster speed, smaller size, less tuning

Zhe Li

University of Iowa

Doctor of Philosophy (PhD), University of Iowa

Autumn 2018

DOI: 10.17077/etd.rgwxihal

Files and links (1)

pdf

Li_uiowa_0096D_158761.85 MBDownload View

Free to read and download, Open Access

Abstract

Deep neural networks have achieved tremendous success in many domains (e.g., computer vision~\cite{Alexnet12,vggnet15,fastrcnn15}, speech recognition~\cite{hinton2012deep,dahl2012context}, natural language processing~\cite{dahl2012context,collobert2011natural}, games~\cite{silver2017mastering,silver2016mastering}), however, there are still many challenges in deep learning comunity such as how to speed up training large deep neural networks, how to compress large nerual networks for mobile/embed device without performance loss, how to automatically design the optimal network structures for a certain task, and how to further design the optimal networks with improved performance and certain model size with reduced computation cost. To speed up training large neural networks, we propose to use multinomial sampling for dropout, i.e., sampling features or neurons according to a multinomial distribution with different probabilities for different features/neurons. To exhibit the optimal dropout probabilities, we analyze the shallow learning with multinomial dropout and establish the risk bound for stochastic optimization. By minimizing a sampling dependent factor in the risk bound, we obtain a distribution-dependent dropout with sampling probabilities dependent on the second order statistics of the data distribution. To tackle the issue of evolving distribution of neurons in deep learning, we propose an efficient adaptive dropout (named evolutional dropout) that computes the sampling probabilities on-the-fly from a mini-batch of examples. To compress large neural network structures, we propose a simple yet powerful method for compressing the size of deep Convolutional Neural Networks (CNNs) based on parameter binarization. The striking difference from most previous work on parameter binarization/quantization lies at different treatments of $1\times 1$ convolutions and $k\times k$ convolutions ($k>1$), where we only binarize $k\times k$ convolutions into binary patterns. By doing this, we show that previous deep CNNs such as GoogLeNet and Inception-type Nets can be compressed dramatically with marginal drop in performance. Second, in light of the different functionalities of $1\times 1$ (data projection/transformation) and $k\times k$ convolutions (pattern extraction), we propose a new block structure codenamed the pattern residual block that adds transformed feature maps generated by $1\times 1$ convolutions to the pattern feature maps generated by $k\times k$ convolutions, based on which we design a small network with $\sim 1$ million parameters. Combining with our parameter binarization, we achieve better performance on ImageNet than using similar sized networks including recently released Google MobileNets. To automatically design neural networks, we study how to design a genetic programming approach for optimizing the structure of a CNN for a given task under limited computational resources yet without imposing strong restrictions on the search space. To reduce the computational costs, we propose two general strategies that are observed to be helpful: (i) aggressively selecting strongest individuals for survival and reproduction, and killing weaker individuals at a very early age; (ii) increasing mutation frequency to encourage diversity and faster evolution. The combined strategy with additional optimization techniques allows us to explore a large search space but with affordable computational costs. To further design the optimal networks with improved performance and certain model size under reduced computation cost, we propose an ecologically inspired genetic approach for neural network structure search , that includes two types of succession: primary and secondary succession as well as accelerated extinction. Specifically, we first use primary succession to rapidly evolve a community of poor initialized neural network structures into a more diverse community, followed by a secondary succession stage for fine-grained searching based on the networks from the primary succession. Accelerated extinction is applied in both stages to reduce computational cost. In addition, we also introduce the gene duplication to further utilize the novel block of layers that appeared in the discovered network structure.

Computer Sciences

Computational cost

Dropout

Ecologically-Inspired

Genetic approach

Model size

Neural network

Details

Title: Subtitle: Optimizing neural network structures: faster speed, smaller size, less tuning
Creators: Zhe Li - University of Iowa
Contributors: Tianbao Yang (Advisor)
Qihang Lin (Committee Member)
Suely Oliveira (Committee Member)
Kasturi R. Varadarajan (Committee Member)
Padmini Srinivasan (Committee Member)
Resource Type: Dissertation
Degree Awarded: Doctor of Philosophy (PhD), University of Iowa
Degree in: Computer Science
Date degree season: Autumn 2018
DOI: 10.17077/etd.rgwxihal
Publisher: University of Iowa
Number of pages: xv, 94 pages
Language: English
Date submitted: 11/19/2018
Description illustrations: color illustrations
Description bibliographic: Includes bibliographical references (pages 85-94).
Public Abstract (ETD): Deep neural networks have achieved tremendous success in many domains (e.g., computer vision [48, 79, 19], speech recognition [33, 14], natural language processing [14, 11], games [78, 77]), however, there are still many challenges in deep learning comunity such as how to speed up training large deep neural networks, how to compress the large nerual networks for mobile/embed device without performance loss and how to automatically design the optimal network structures for a certain task.

To speed up training process of large neural network, we propose to use multinomial sampling for dropout, i.e., sampling features or neurons according to a multinomial distribution with different probabilities for different features/neurons. Further we propose an efficient adaptive dropout (named evolutional dropout) that computes the sampling probabilities on-the-y from a mini-batch of examples to tackle the issue of evolving distribution of neurons in deep learning. To compress large neural network structures, we propose a simple yet powerful method for compressing the size of deep CNNs based on parameter binarization, and design a new block structure codenamed the pattern residual block that adds transformed feature maps generated by 1 x 1 convolutions to the pattern feature maps generated by k x k convolutions, based on which we design a small network with ~ 1 million parameters. To automatically design neural networks, we study how to design a genetic programming approach for optimizing the structure of a CNN for a given task under limited computational resources yet without imposing strong restrictions on the search space. To reduce the computational costs, we propose two general strategies that are observed to be helpful: (i) aggressively selecting strongest individuals for survival and reproduction, and killing weaker individuals at a very early age; (ii) increasing mutation frequency to encourage diversity and faster evolution. To further design the optimal networks with improved performance and certain model size with reduced computation cost, we propose an ecologically inspired genetic approach for neural network structure search, that includes two types of succession: primary and secondary succession as well as accelerated extinction.
Academic Unit: Computer Science
Record Identifier: 9983777230602771

Metrics

104 File views/ downloads

382 Record Views