[evaluation] improvement on evaluation (#3862)
* fix a bug when the config file contains a category that the answer file doesn't contain
* fix Chinese prompt file
* support gpt-3.5-turbo and gpt-4 evaluation
* polish and update README
* resolve PR comments
---------
Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>