ALTO is a network orchestrator for serving compound AI systems efficiently. It raises throughput and lowers latency by streaming intermediate (partial) outputs between stages, so that a downstream stage can begin work before the upstream stage has finished.
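To make the idea of streaming partial outputs between stages concrete, here is a minimal sketch in Python. It is not ALTO's implementation or API: the names `produce`, `consume`, and `run` are hypothetical, and plain Python generators stand in for the pipeline stages. The sketch only contrasts the order of events when a downstream stage reads partial outputs as they appear versus when it waits for the full upstream output.

```python
from typing import Iterable, Iterator, List, Tuple


def produce(words: List[str], log: List[str]) -> Iterator[str]:
    """Hypothetical upstream stage: emits one partial output at a time."""
    for w in words:
        log.append(f"produce:{w}")
        yield w


def consume(stream: Iterable[str], log: List[str]) -> List[str]:
    """Hypothetical downstream stage: processes each item as it is received."""
    out = []
    for w in stream:
        log.append(f"consume:{w}")
        out.append(w.upper())
    return out


def run(words: List[str], streaming: bool) -> Tuple[List[str], List[str]]:
    """Run the two-stage pipeline and return (result, event log)."""
    log: List[str] = []
    upstream: Iterable[str] = produce(words, log)
    if not streaming:
        # Non-streaming baseline: materialize the whole upstream output first.
        upstream = list(upstream)
    result = consume(upstream, log)
    return result, log


if __name__ == "__main__":
    for mode in (True, False):
        result, log = run(["a", "b"], streaming=mode)
        print("streaming" if mode else "batched", result, log)
```

In the streaming run the downstream stage handles the first item before the upstream stage has produced the second, whereas in the batched run all production happens first. The final result is the same in both modes; only the interleaving differs, and that interleaving is what allows stages to overlap in time.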