Radial Networks: Enabling Dynamic Layer Skipping for Efficient Inference in Large Language and Vision Models
Radial Networks exploit the substantial dynamic layer sparsity present in modern large language and vision models: tokens are routed through only a selected subset of layers rather than the full stack, reducing compute and serving cost while preserving model capacity.
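A minimal sketch of token-level layer routing under the stated idea. Everything here is an illustrative assumption, not the paper's architecture: the random layer weights, the sigmoid-gated linear router, and the 0.5 threshold are placeholders showing how each token can invoke only a subset of layers.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N_LAYERS, N_TOKENS = 8, 4, 5

# Hypothetical per-layer weights (random, scaled small for stability).
layers = [rng.standard_normal((D, D)) * 0.1 for _ in range(N_LAYERS)]

# Hypothetical router: one linear probe per layer scoring each token.
router = rng.standard_normal((N_LAYERS, D))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def radial_forward(tokens, threshold=0.5):
    """Route each token through only the layers its router score selects.

    Returns the transformed tokens and the number of (token, layer)
    invocations actually performed, to show the compute saved.
    """
    x = tokens.copy()
    invocations = 0
    for i, W in enumerate(layers):
        # Per-token gate: apply layer i only where the score passes threshold.
        gate = sigmoid(x @ router[i]) > threshold       # shape (N_TOKENS,)
        x[gate] = x[gate] + x[gate] @ W                 # residual update for routed tokens
        invocations += int(gate.sum())
    return x, invocations

tokens = rng.standard_normal((N_TOKENS, D))
out, used = radial_forward(tokens)
print(f"shape={out.shape}, invocations={used}/{N_TOKENS * N_LAYERS}")
```

A dense model would perform `N_TOKENS * N_LAYERS` layer invocations; here the per-token gates skip some of them, which is the source of the compute savings.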