A deep learning model can endow social robots with the ability to estimate the addressee of a speaker's utterance by interpreting the speaker's non-verbal bodily cues.