This work presents a highly efficient hardware implementation of boosted decision trees for regression on field-programmable gate arrays (FPGAs) that executes in under 10 nanoseconds, enabling real-time processing of missing transverse momentum at the Large Hadron Collider.
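The key property that makes boosted trees attractive for sub-10 ns inference is their fixed, data-independent depth: every input takes exactly the same number of comparisons, and all trees can evaluate in parallel. The sketch below illustrates this in software with hypothetical trees and thresholds (not taken from the paper); the constant-depth loop corresponds to a fixed number of comparator levels in hardware.

```python
# Sketch: fixed-depth evaluation of a boosted-decision-tree ensemble,
# illustrating why such models map well to low-latency FPGA logic.
# Trees, thresholds, and leaf values below are hypothetical.

# Each tree is a complete binary tree of fixed depth, stored flat:
# node i has children 2i+1 / 2i+2; leaves hold regression values.
DEPTH = 2

def eval_tree(features, feat_idx, thresholds, leaves):
    """Traverse exactly DEPTH comparator levels (constant 'latency')."""
    i = 0
    for _ in range(DEPTH):
        go_right = features[feat_idx[i]] > thresholds[i]
        i = 2 * i + (2 if go_right else 1)
    return leaves[i - (2 ** DEPTH - 1)]

def eval_ensemble(features, trees):
    # In hardware all trees evaluate in parallel and an adder tree
    # sums their outputs; here we just sum sequentially.
    return sum(eval_tree(features, *t) for t in trees)

# Two toy trees predicting a scalar (e.g., missing transverse momentum).
trees = [
    ([0, 1, 1], [0.5, 0.2, 0.8], [1.0, 2.0, 3.0, 4.0]),
    ([1, 0, 0], [0.4, 0.1, 0.9], [0.5, 1.5, 2.5, 3.5]),
]
pred = eval_ensemble([0.7, 0.3], trees)  # 3.0 + 1.5 = 4.5
```

Because the traversal has no data-dependent branching in its depth, the hardware pipeline can be fully unrolled, which is what yields the deterministic nanosecond-scale latency.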
The CARAML benchmark suite provides a systematic, automated, and reproducible framework for evaluating the performance and energy consumption of transformer-based language models and computer vision models on a range of hardware accelerators, including NVIDIA, AMD, and Graphcore systems.
This work presents an energy-efficient 3T1R1C capacitive-RRAM content addressable memory (CAM) that uses RRAM as both the storage and the comparison element, eliminating direct-current paths and enabling low-power parallel search operations.
Embedded FPGA technology enables reconfigurable logic within application-specific integrated circuits (ASICs), combining the low power and efficiency of an ASIC with the ease of FPGA configuration; this is beneficial for machine learning applications in the data pipelines of next-generation collider experiments.
This paper proposes a probabilistic interval analysis technique to statically analyze programs that run on unreliable hardware architectures, where operations can fail with a certain probability.
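The idea can be illustrated by pairing classical interval arithmetic with a reliability bound: each abstract value carries an interval plus a lower bound on the probability that the concrete value lies in it, and each unreliable operation discounts that probability. The combination rule below (probabilities simply multiply) is a simplifying assumption for illustration, not necessarily the paper's exact semantics.

```python
# Sketch of probabilistic interval analysis: each abstract value is an
# interval plus a probability bound that it is correct. Hardware
# operations succeed only with probability p_op.

from dataclasses import dataclass

@dataclass
class PInterval:
    lo: float
    hi: float
    p: float  # lower bound on P(concrete value in [lo, hi])

def padd(a, b, p_op):
    """Abstract '+' on hardware where addition succeeds w.p. p_op."""
    return PInterval(a.lo + b.lo, a.hi + b.hi, a.p * b.p * p_op)

def pmul(a, b, p_op):
    """Abstract '*': interval corners, reliability discounted by p_op."""
    cs = [a.lo * b.lo, a.lo * b.hi, a.hi * b.lo, a.hi * b.hi]
    return PInterval(min(cs), max(cs), a.p * b.p * p_op)

x = PInterval(1.0, 2.0, 1.0)
y = PInterval(3.0, 4.0, 1.0)
z = padd(pmul(x, y, 0.999), y, 0.999)  # (x * y) + y on unreliable ALUs
```

A static analyzer built on such a domain can then report, per program point, both a value range and a guaranteed probability that the range holds despite hardware faults.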
MASE, a novel compiler, automatically explores mixed-precision quantization using custom Microscaling (MX) formats to enable efficient dataflow hardware acceleration for large language models (LLMs) with minimal accuracy degradation.
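Microscaling formats amortize quantization metadata by having a block of values share one power-of-two scale, with each element stored as a low-bit integer. The sketch below shows that block structure with an illustrative block size and bit width; it is a simplified stand-in, not MASE's actual MX implementation.

```python
# Sketch of an MX-style block format: one shared power-of-two scale
# per block, low-bit signed integers per element. Block size and bit
# width here are illustrative choices.

import math

def mx_quantize(block, bits=8):
    """Quantize a block to signed ints sharing one power-of-two scale."""
    qmax = 2 ** (bits - 1) - 1                  # e.g. 127 for 8 bits
    amax = max(abs(v) for v in block) or 1.0
    # Smallest power-of-two scale that keeps every element in range.
    scale = 2.0 ** math.ceil(math.log2(amax / qmax))
    ints = [round(v / scale) for v in block]
    return scale, ints

def mx_dequantize(scale, ints):
    return [scale * q for q in ints]

scale, ints = mx_quantize([0.11, -0.37, 0.92, 0.05])
approx = mx_dequantize(scale, ints)
```

Because the scale is a power of two, dequantization in hardware is just a shift, which is what makes such formats attractive for dataflow accelerators.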
This work proposes a hardware-aware training framework that co-optimizes synaptic weights and delays for deploying high-performing spiking neural network models on digital neuromorphic hardware platforms.
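What it means for a delay to be a trainable parameter can be seen in a toy forward pass: each synapse carries both a weight and an integer delay, and choosing the delays shifts when spikes arrive and whether they coincide. The leaky integrate-and-fire neuron below is a generic illustration with made-up values, not the paper's model or training procedure.

```python
# Toy discrete-time LIF neuron with per-synapse integer delays: the
# (weight, delay) pair per synapse is the kind of parameter pair the
# framework co-optimizes. Threshold, leak, and inputs are illustrative.

def lif_with_delays(input_spikes, weights, delays, threshold=1.0, leak=0.9):
    """input_spikes[i][t] is 0/1; synapse i has weights[i], delays[i] steps."""
    T = len(input_spikes[0])
    v, out = 0.0, []
    for t in range(T):
        v *= leak                                # membrane leak
        for i, train in enumerate(input_spikes):
            td = t - delays[i]                   # delayed spike arrival
            if td >= 0 and train[td]:
                v += weights[i]
        if v >= threshold:
            out.append(t)                        # emit output spike
            v = 0.0                              # reset membrane
    return out

# Two inputs spiking at t = 0; equal delays make them arrive together
# at t = 2, pushing the membrane over threshold.
spikes = [[1, 0, 0, 0, 0], [1, 0, 0, 0, 0]]
out = lif_with_delays(spikes, weights=[0.6, 0.6], delays=[2, 2])  # [2]
```

With mismatched delays the same weights would fail to reach threshold, which is why jointly training delays alongside weights can recover accuracy on hardware that supports programmable axonal or synaptic delays.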