A unified framework based on diffusion inversion that enables multi-level editing for co-speech gesture generation without retraining the model.