Building Accurate and Diverse Data Sets for Retrosynthetic Planning