    Fix convergence issues for MLPerf. (#5161) · 64710c05
    Reed authored
    * Fix convergence issues for MLPerf.
    
    Thank you to @robieta for helping me find these issues, and for providing an algorithm for the `get_hit_rate_and_ndcg_mlperf` function.
    
    This change causes every forked process to set a new seed, so that forked processes do not generate the same set of random numbers. This improves evaluation hit rates.
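The reseeding idea can be illustrated with a minimal sketch. This is not the code from this change: `RNG`, `_draw`, and `sample_in_children` are hypothetical names, the module-level `random.Random` instance is a stand-in for any generator state that a fork copies, and seeding from `os.urandom` is an assumption about how a fresh seed might be chosen. It relies on the `fork` start method, so it is POSIX-only.

```python
import multiprocessing as mp
import os
import random

# Stand-in for generator state that a forked child copies from its parent.
RNG = random.Random(0)


def _draw(conn, reseed):
    """Runs in the forked child and sends three random numbers back."""
    if reseed:
        # Assumption: take a fresh seed from OS entropy in each child.
        RNG.seed(int.from_bytes(os.urandom(8), "little"))
    conn.send([RNG.random() for _ in range(3)])
    conn.close()


def sample_in_children(num_children, reseed):
    """Fork num_children processes and collect the numbers each one draws."""
    ctx = mp.get_context("fork")  # POSIX only
    RNG.seed(0)  # known parent state that every fork inherits
    results = []
    for _ in range(num_children):
        recv_conn, send_conn = ctx.Pipe(duplex=False)
        proc = ctx.Process(target=_draw, args=(send_conn, reseed))
        proc.start()
        send_conn.close()  # the parent only reads
        results.append(recv_conn.recv())
        proc.join()
    return results


if __name__ == "__main__":
    # Without reseeding, every child repeats the same sequence.
    print(sample_in_children(3, reseed=False))
    # With reseeding, each child draws its own sequence.
    print(sample_in_children(3, reseed=True))
```

Without reseeding, each child starts from an identical copy of the parent's generator state, so all children produce the same values. Reseeding inside the child breaks that duplication.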
    
    Additionally, it adds a flag, --ml_perf, that makes further changes so that the evaluation hit rate can match the MLPerf reference implementation.
    
    I ran 4 times with --ml_perf and 4 times without. Without --ml_perf, the highest hit rates achieved by each run were 0.6278, 0.6287, 0.6289, and 0.6241. With --ml_perf, the highest hit rates were 0.6353, 0.6356, 0.6367, and 0.6353.
    
    * Fix lint error
    
    * Fix failing test
    
    * Address @robieta's feedback
    
    * Address more feedback
data_preprocessing.py 22.7 KB