ALTO optimizes compound AI systems by streaming intermediate outputs, addressing correctness and load balancing challenges, resulting in increased throughput and reduced latency.