Neural Speed, an open-source framework developed by Intel, accelerates inference of 4-bit large language models (LLMs) on consumer CPUs, with Intel reporting speedups of up to 40x over existing solutions such as llama.cpp.