Pre-training can mitigate poor extrapolation but not dataset biases, so it offers complementary benefits when combined with interventions that prevent models from exploiting those biases.