The core message of this paper is that unlabeled data in the target domain can be used to improve zero-shot dialogue state tracking performance through joint training and self-training.