- 08 May, 2024 1 commit
-
-
JuhaoLiang authored
* add AceGPT-MMLUArabic benchmark * update readme and fix lint issue * remove unused package * add MMLUArabic zero-shot settings * rename filename and update readme
-
- 06 May, 2024 5 commits
-
-
Fangyu Lei authored
* s3eval_branch * update s3eval
-
Xu Song authored
* [Fix] Fix AGIEval Chinese sets * Create agieval_gen_617738.py * [Fix] Fix AGIEval Chinese sets * Restore agieval_gen_64afd3.py * Update agieval_gen.py * Create agieval_mixed_0fa998.py * Update agieval_mixed.py
-
Yggdrasill7D6 authored
* add mgsm datasets * fix lint * update mgsm * ease code spell * update --------- Co-authored-by: Leymore <zfz-960727@163.com>
-
klein authored
* [Feature] update drop dataset from openai simple eval * update drop template presentation * update --------- Co-authored-by: Leymore <zfz-960727@163.com>
-
Fengzhe Zhou authored
* add mmlu prompt from simple_evals, openai * return empty str on failure
-
- 30 Apr, 2024 3 commits
-
-
Yang Yong authored
-
Fengzhe Zhou authored
-
Alexander Lam authored
* fixed formatting based on pre-commit tests * fixed typo in comments; reduced the number of models in the eval config * fixed a bug in LLMCompressionDataset, where setting samples=None would result in passing test[:None] to load_dataset * removed unnecessary variable in _format_table_pivot; changed lark_reporter message to English
-
- 29 Apr, 2024 3 commits
-
-
Ikko Eltociear Ashimine authored
requiresments -> requirements
-
bittersweet1999 authored
-
Songyang Zhang authored
* [Update] Update performance of common benchmarks
-
- 28 Apr, 2024 5 commits
-
-
liushz authored
* Add Math Evaluation with Judge Model Evaluator * Fix Llama-3 meta template * Fix MATH with JudgeLM Evaluation --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
-
bittersweet1999 authored
-
Lyu Han authored
* adapt to lmdeploy v0.4.0 * compatible
-
Yggdrasill7D6 authored
* add flames datasets * fix lint * rm quota * add judgemodel info and fix os path * support flames dataset --------- Co-authored-by: bittersweet1999 <1487910649@qq.com>
-
Mo Li authored
* update NeedleInAHaystack Test Docs * update docs
-
- 26 Apr, 2024 8 commits
-
-
dmitrysarov authored
* fix output typing, change mutable list to immutable tuple * import missed type * format --------- Co-authored-by: Leymore <zfz-960727@163.com>
-
binary-husky authored
* fix relative path bug * format --------- Co-authored-by: hmp <505030475@qq.com> Co-authored-by: Leymore <zfz-960727@163.com>
-
Wang Xingjin authored
* add vllm get_ppl * format --------- Co-authored-by: xingjin.wang <xingjin.wang@mihoyo.com> Co-authored-by: Leymore <zfz-960727@163.com>
-
Haodong Duan authored
* Remove MultiModal * update index.rst * update README * remove mmbench codes * update news --------- Co-authored-by: Leymore <zfz-960727@163.com>
-
Francis-llgg authored
* add gpqa_openai_simple_eval * trigger CI build * reorg --------- Co-authored-by: Leymore <zfz-960727@163.com>
-
klein authored
* modify the requirements/runtime.txt: numpy==1.23.4 --> numpy>=1.23.4 * update cibench: dataset and evaluation * cibench summarizer bug * update cibench * move extract_code import --------- Co-authored-by: zhangchuyu@pjlab.org.cn <zhangchuyu@pjlab.org.cn> Co-authored-by: Leymore <zfz-960727@163.com>
-
bittersweet1999 authored
* support arenahard
-
bittersweet1999 authored
* support openai math evaluation * support math llm judge
-
- 24 Apr, 2024 2 commits
-
-
Jingming Zhuo authored
* [Feature] Add IFEval * add humaneval prompt from simple_evals, openai
-
liushz authored
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
-
- 23 Apr, 2024 2 commits
-
-
Ke Bao authored
* add lmdeploy tis python backend model * fix pr check * update
-
Fengzhe Zhou authored
-
- 22 Apr, 2024 3 commits
-
-
Fengzhe Zhou authored
* add TheoremQA with 5-shot * cherry pick from add-huggingface-above-v4.33, good TheoremQA results
-
Fengzhe Zhou authored
* add LLaMA-3 Series configs * update readme
-
bittersweet1999 authored
* fix multiround * fix
-
- 19 Apr, 2024 1 commit
-
-
Fengzhe Zhou authored
-
- 17 Apr, 2024 1 commit
-
-
Robin Chen authored
* [fix] Fixed the issue caused by the repeated loading of VLLM model during task segmentation. * [fix] avoid TypeError: VLLM.__init__() got an unexpected keyword argument 'tokenizer_only' * restore .pre-commit-config.yaml * restore opencompass/tasks/openicl_infer.py --------- Co-authored-by: IcyFeather <mengzhuo.happy@gmail.com> Co-authored-by: Leymore <zfz-960727@163.com>
-
- 16 Apr, 2024 2 commits
-
-
Songyang Zhang authored
* [Update] Update readme
-
Fengzhe Zhou authored
(cherry picked from commit 16ac6306c72fa202173289b55eaefe85e0fcb73c) Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
-
- 15 Apr, 2024 1 commit
-
-
Fengzhe Zhou authored
* logger.error -> logger.info in OpenAI * logger.info -> logger.debug in OpenAI
-
- 12 Apr, 2024 1 commit
-
-
liuwei130 authored
* add ChemBench * update results * molbench -> ChemBench --------- Co-authored-by: Leymore <zfz-960727@163.com>
-
- 11 Apr, 2024 1 commit
-
-
Fengzhe Zhou authored
-
- 09 Apr, 2024 1 commit
-
-
Fengzhe Zhou authored
-