Deploying deep neural networks (DNNs) on edge devices such as smartphones and autonomous vehicles remains a significant challenge because of their computational cost. Most existing pruning algorithms struggle to balance high compression rates against inference accuracy while remaining compatible with commercial hardware. Unstructured pruning yields irregular sparsity that hardware cannot easily exploit, which limits its practical use. Structured pruning, in contrast, tends to sacrifice accuracy because its granularity is relatively coarse. Semi-structured pruning is intended to balance these trade-offs, but it has so far been applied to only a narrow range of DNN architectures. These limitations call for a unified, efficient pruning framework that works across model types and improves performance in resource-constrained settings.
Current pruning strategies fall into three categories: unstructured, structured, and semi-structured. Unstructured pruning offers maximum flexibility in choosing which weights to eliminate, but the resulting sparsity patterns are hard to accelerate in hardware. Structured pruning removes entire filters or layers, which improves hardware compatibility but costs accuracy because of its coarse granularity. Semi-structured pruning removes weights according to regular patterns within weight matrices and seeks to balance efficiency and accuracy. However, it has mostly been applied to specific types of DNN, such as CNNs, while other architectures such as Vision Transformers remain under-explored. Automated methods based on graph neural networks (GNNs) and reinforcement learning have mainly been used for structured and unstructured pruning, whereas automated pattern-based pruning is underdeveloped. This gap motivates a more robust and general pruning framework.
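The three granularities can be illustrated on a toy weight matrix. The sketch below is only an illustration under stated assumptions: rows stand in for filters, magnitude is used as the importance score, and the 2-out-of-4 pattern is one common semi-structured scheme chosen for the example, not necessarily a pattern used by AutoSculpt. All function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Mask = List[List[int]]


def unstructured_mask(w: Matrix, sparsity: float) -> Mask:
    """Drop the smallest-magnitude weights anywhere in the matrix.

    Ties at the threshold are dropped too, so slightly more than the
    requested fraction may be removed.
    """
    flat = sorted(abs(x) for row in w for x in row)
    k = int(len(flat) * sparsity)  # number of weights to drop
    threshold = flat[k - 1] if k > 0 else -1.0
    return [[1 if abs(x) > threshold else 0 for x in row] for row in w]


def structured_mask(w: Matrix, rows_to_drop: int) -> Mask:
    """Drop whole rows (a stand-in for filters) with the smallest L1 norm."""
    norms = [sum(abs(x) for x in row) for row in w]
    order = sorted(range(len(w)), key=lambda i: norms[i])
    drop = set(order[:rows_to_drop])
    return [[0 if i in drop else 1 for _ in row] for i, row in enumerate(w)]


def n_of_m_mask(w: Matrix, n: int = 2, m: int = 4) -> Mask:
    """Semi-structured: keep the n largest weights in each group of m."""
    mask = []
    for row in w:
        row_mask = [0] * len(row)
        for start in range(0, len(row), m):
            group = range(start, min(start + m, len(row)))
            keep = sorted(group, key=lambda j: abs(row[j]), reverse=True)[:n]
            for j in keep:
                row_mask[j] = 1
        mask.append(row_mask)
    return mask


# Toy 4x4 weight matrix (made-up values for illustration only).
weights = [
    [0.9, -0.1, 0.05, 0.4],
    [0.02, 0.03, -0.01, 0.04],
    [-0.7, 0.6, 0.2, -0.3],
    [0.1, -0.8, 0.5, 0.05],
]

for name, mask in [
    ("unstructured", unstructured_mask(weights, 0.5)),
    ("structured", structured_mask(weights, 1)),
    ("2:4 pattern", n_of_m_mask(weights)),
]:
    print(name, mask)
```

The unstructured mask leaves an irregular layout, the structured mask removes one whole row, and the pattern mask keeps exactly two weights in every group of four, which is the kind of regularity that hardware can exploit.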
Researchers from Ocean University of China propose AutoSculpt, an approach to model pruning that uses Graph Neural Networks (GNNs) and Deep Reinforcement Learning (DRL) to optimize compression strategies. It represents DNNs as graphs that capture their topological structure and parameter dependencies, and it embeds pattern-based pruning strategies into these graph representations so that regular structures can be exploited for hardware compatibility and inference efficiency. Using a Graph Attention Network (GATv2) encoder, the method iteratively refines pruning patterns through reinforcement learning, aiming for a favorable balance between compression and accuracy. This makes pattern-based pruning more flexible and extends it to CNNs, Vision Transformers, and other architectures.
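The idea of representing a network as a graph can be sketched in plain Python. This is a minimal illustration, not the paper's data structure: the class names, the `pattern_id` field, and the three-element feature vector are all assumptions made for the example. It models one residual block, with the skip connection stored as an explicit edge.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class LayerNode:
    name: str
    kind: str            # e.g. "conv", "add", "input"
    num_params: int
    pattern_id: int = 0  # hypothetical: assigned pruning pattern, 0 = dense


@dataclass
class ModelGraph:
    nodes: Dict[str, LayerNode] = field(default_factory=dict)
    edges: List[Tuple[str, str]] = field(default_factory=list)

    def add_layer(self, node: LayerNode, inputs: Sequence[str] = ()) -> None:
        """Register a node and one directed edge per input dependency."""
        self.nodes[node.name] = node
        for src in inputs:
            self.edges.append((src, node.name))

    def predecessors(self, name: str) -> List[str]:
        return [src for src, dst in self.edges if dst == name]

    def node_features(self, name: str) -> List[float]:
        """Toy feature vector that a graph encoder could consume."""
        node = self.nodes[name]
        return [
            float(node.num_params),
            float(len(self.predecessors(name))),
            float(node.pattern_id),
        ]


# One residual block: input -> conv1 -> conv2 -> add, plus a skip edge.
graph = ModelGraph()
graph.add_layer(LayerNode("input", "input", 0))
graph.add_layer(LayerNode("conv1", "conv", 3 * 3 * 64 * 64), inputs=["input"])
graph.add_layer(LayerNode("conv2", "conv", 3 * 3 * 64 * 64), inputs=["conv1"])
graph.add_layer(LayerNode("add", "add", 0), inputs=["conv2", "input"])

print(graph.predecessors("add"))
print(graph.node_features("conv1"))
```

Because the skip connection is an ordinary edge, any pruning decision on `conv2` can be checked against every node that feeds the same `add`, which is the kind of dependency a graph encoder is meant to see.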
In AutoSculpt, DNNs are represented as graphs in which nodes denote weights or layers and edges denote dependencies, including an explicit mapping of residual connections for architectures such as ResNet. A DRL agent evaluates the graph embeddings and proposes pruning patterns that balance objectives such as FLOPs reduction and accuracy retention, while a dynamic reward function adjusts the priority between these goals. The framework was evaluated on the CIFAR-10/100 and ImageNet-1K datasets with architectures including ResNet, MobileNet, VGG, and Vision Transformers. The rich graph representation lets the agent apply pruning patterns efficiently and supports the method's use across a wide range of architectures.
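The article does not give the form of the reward, so the following is only a sketch of how a reward that shifts priority between FLOPs reduction and accuracy retention might look. The weighting scheme, the `target_reduction` parameter, and both function names are assumptions for illustration, not the formulation used in AutoSculpt.

```python
from typing import Tuple


def priority_weights(flops_reduction: float, target_reduction: float) -> Tuple[float, float]:
    """Return (w_flops, w_acc) for the current compression level.

    Assumed scheme: far from the FLOPs target, compression is weighted
    more; as the target is approached, accuracy takes priority.
    """
    if target_reduction <= 0:
        raise ValueError("target_reduction must be positive")
    progress = min(1.0, max(0.0, flops_reduction / target_reduction))
    w_flops = 1.0 - 0.5 * progress
    w_acc = 0.5 + 0.5 * progress
    return w_flops, w_acc


def dynamic_reward(
    acc_base: float,
    acc_pruned: float,
    flops_base: float,
    flops_pruned: float,
    target_reduction: float = 0.5,
) -> float:
    """Reward compression, penalize accuracy loss, with shifting weights."""
    if flops_base <= 0:
        raise ValueError("flops_base must be positive")
    flops_reduction = 1.0 - flops_pruned / flops_base
    acc_drop = max(0.0, acc_base - acc_pruned)
    w_flops, w_acc = priority_weights(flops_reduction, target_reduction)
    return w_flops * flops_reduction - w_acc * acc_drop


# Same compression, different accuracy loss (made-up numbers).
mild = dynamic_reward(0.93, 0.92, flops_base=100.0, flops_pruned=50.0)
harsh = dynamic_reward(0.93, 0.85, flops_base=100.0, flops_pruned=50.0)
print(mild > harsh)
```

In a full pipeline the agent would receive this scalar after each proposed pruning configuration is evaluated; here it only shows how the two objectives can be traded off with weights that change as compression progresses.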
According to the paper, AutoSculpt consistently outperformed state-of-the-art methods in model compression. It reportedly reached pruning rates as high as 90% on simpler architectures such as VGG-19 and reduced FLOPs by up to 18% relative to other state-of-the-art methods. For more complex models such as ResNet and Vision Transformers, it achieved pruning ratios of up to 55% and 45% respectively, with an accuracy loss of no more than 3%.
Inference latency was also reduced significantly, with execution times improving by up to 29%, which makes the pruned models suitable for resource-constrained applications. After fine-tuning, the pruned models often matched or outperformed their original counterparts, which suggests that the method retains critical parameters during compression.
AutoSculpt offers an efficient approach to DNN compression that performs well across diverse architectures. By combining GNNs and reinforcement learning, it addresses longstanding trade-offs among accuracy, compression, and hardware compatibility. Its flexibility and robustness bring practical deployment of DNNs on edge devices closer to reality and open avenues toward more efficient AI applications in resource-constrained environments.
The post AutoSculpt: A Pattern-based Automated Pruning Framework Designed to Enhance Efficiency and Accuracy by Leveraging Graph Learning and Deep Reinforcement Learning appeared first on MarkTechPost.