eprintid: 19204
rev_number: 2
eprint_status: archive
userid: 1
dir: disk0/00/01/92/04
datestamp: 2024-06-04 14:11:40
lastmod: 2024-06-04 14:11:40
status_changed: 2024-06-04 14:05:09
type: article
metadata_visibility: show
creators_name: Esguerra, K.
creators_name: Nasir, M.
creators_name: Tang, T.B.
creators_name: Tumian, A.
creators_name: Ho, E.T.W.
title: Sparsity-Aware Orthogonal Initialization of Deep Neural Networks
ispublished: pub
keywords: Budget control; Computation theory; Deep neural networks; Graph theory; Jacobian matrices; Multilayer neural networks; Network layers; Pattern recognition, Biological neural networks; Computational modelling; Dynamical isometry; Expander graphs; Expander neural network; Model pruning; Network topology; Neural-networks; Orthogonal neural networks; Ramanujan expander graph; Sparse matrices; Sparse neural networks, Network topology
note: cited By 0
abstract: Deep neural networks have achieved impressive pattern recognition and generative abilities on complex tasks by developing larger and deeper models, which are increasingly costly to train and implement. There is, in tandem, interest in developing sparse versions of these powerful models by post-processing with weight pruning or dynamic sparse training. However, these processes require expensive train-prune-finetune cycles and compromise the trainability of very deep network configurations. We introduce sparsity-aware orthogonal initialization (SAO), a method to initialize sparse but maximally connected neural networks with orthogonal weights. SAO constructs a sparse network topology leveraging Ramanujan expander graphs to assure connectivity and assigns orthogonal weights to attain approximate dynamical isometry. Sparsity in SAO networks is tunable prior to model training. We compared SAO to fully-connected neural networks and demonstrated that SAO networks outperform magnitude pruning in very deep and sparse networks up to a thousand layers with fewer computations and training iterations. Convolutional neural networks are SAO networks with special constraints, while kernel pruning may be interpreted as tuning the SAO sparsity level. Within the SAO framework, kernels may be pruned prior to model training based on a desired compression factor rather than post-training based on parameter-dependent heuristics. SAO is well-suited for applications with tight energy and computation budgets such as edge computing tasks, because it achieves sparse, trainable neural network models with fewer learnable parameters without requiring special layers, additional training, scaling, or regularization. The advantages of SAO networks are attributed to both their sparse but maximally connected topology and orthogonal weight initialization. © 2013 IEEE.
date: 2023
official_url: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85164768946&doi=10.1109%2fACCESS.2023.3295344&partnerID=40&md5=2adaab0b5880d3ceff6c3c253987515f
id_number: 10.1109/ACCESS.2023.3295344
full_text_status: none
publication: IEEE Access
volume: 11
pagerange: 74165-74181
refereed: TRUE
citation: Esguerra, K. and Nasir, M. and Tang, T.B. and Tumian, A. and Ho, E.T.W. (2023) Sparsity-Aware Orthogonal Initialization of Deep Neural Networks. IEEE Access, 11. pp. 74165-74181.
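
The abstract describes weight matrices that are simultaneously sparse and orthogonal. As a rough illustration only, and not the paper's Ramanujan-expander-graph construction, the NumPy sketch below builds a sparse, exactly orthogonal matrix from permuted block-diagonal orthogonal blocks; the function name sparse_orthogonal_layer, the block_size parameter, and the permutation scheme are hypothetical choices for this sketch, not taken from the source.

    import numpy as np

    def sparse_orthogonal_layer(n, block_size, seed=0):
        """Illustrative sketch: a permuted block-diagonal matrix of small
        orthogonal blocks is exactly orthogonal yet has only `block_size`
        nonzeros per row and column (NOT the SAO construction itself)."""
        assert n % block_size == 0
        rng = np.random.default_rng(seed)
        W = np.zeros((n, n))
        for i in range(0, n, block_size):
            # Random orthogonal block via QR of a Gaussian matrix.
            A = rng.standard_normal((block_size, block_size))
            Q, _ = np.linalg.qr(A)
            W[i:i + block_size, i:i + block_size] = Q
        # Permuting rows and columns preserves orthogonality while
        # spreading the nonzeros across input and output units.
        return W[rng.permutation(n)][:, rng.permutation(n)]

    W = sparse_orthogonal_layer(64, block_size=4)
    print(np.allclose(W.T @ W, np.eye(64)))   # True: W is orthogonal
    print(np.count_nonzero(W, axis=1))        # 4 nonzeros per row

In this sketch block_size plays a role loosely analogous to the tunable sparsity level the abstract attributes to SAO, whereas the paper itself selects the sparsity pattern using Ramanujan expander graphs to guarantee connectivity.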