Fine-grained attribute control in text-to-image models is achievable through semantic directions identified in token-level CLIP embeddings.
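Below is a minimal sketch of the general idea, not the original method's exact recipe: it assumes the Hugging Face `diffusers`/`transformers` APIs and a Stable Diffusion v1.5 checkpoint, and the contrasting prompt pair, token-lookup helper, and scale value are illustrative choices. A semantic direction is estimated from per-token CLIP text embeddings of two prompts that differ in one attribute, then added to a token of a new prompt before generation.

```python
# Sketch: steering an attribute via a semantic direction in token-level CLIP
# embeddings. Assumes `diffusers`/`transformers`; checkpoint, prompts, and the
# scale value are illustrative assumptions, not the source's exact setup.
import torch
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)
tokenizer, text_encoder = pipe.tokenizer, pipe.text_encoder


@torch.no_grad()
def token_embeddings(prompt: str) -> torch.Tensor:
    """Per-token CLIP text embeddings of shape (1, seq_len, dim)."""
    ids = tokenizer(prompt, padding="max_length",
                    max_length=tokenizer.model_max_length,
                    truncation=True, return_tensors="pt").input_ids.to(device)
    return text_encoder(ids).last_hidden_state


def token_index(prompt: str, word: str) -> int:
    """Index of `word`'s (first) token in the tokenized prompt."""
    word_id = tokenizer(word, add_special_tokens=False).input_ids[0]
    return tokenizer(prompt).input_ids.index(word_id)


# Estimate a semantic direction at the attribute token position from a
# contrasting prompt pair ("young" vs. "old"); averaging over many pairs
# would give a less noisy direction.
prompt_a, prompt_b = "a photo of a young person", "a photo of an old person"
emb_a, emb_b = token_embeddings(prompt_a), token_embeddings(prompt_b)
direction = emb_b[:, token_index(prompt_b, "old")] - emb_a[:, token_index(prompt_a, "young")]
direction = direction / direction.norm()

# Steer only the relevant token of a new prompt along that direction, then generate.
prompt = "a photo of a person smiling"
steered = token_embeddings(prompt)
steered[:, token_index(prompt, "person")] += 4.0 * direction  # scale sets attribute strength

image = pipe(prompt_embeds=steered, num_inference_steps=30).images[0]
image.save("steered.png")
```

Flipping the sign of the scale should push the attribute in the opposite direction, and directions estimated from many prompt pairs rather than a single pair tend to be less entangled with unrelated content.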