- 13 Mar, 2024 1 commit
-
-
Connor-Shen authored
* [Feat] support apps * [Feat] support apps * [Feat] support apps * update README
-
- 11 Mar, 2024 2 commits
-
-
Fengzhe Zhou authored
-
bittersweet1999 authored
* add temp for mtbench * add document for mtbench * add document for mtbench
-
- 05 Mar, 2024 2 commits
- 04 Mar, 2024 3 commits
-
-
Fengzhe Zhou authored
-
yuantao2108 authored
* add lveval benchmark * add LVEval readme file * update LVEval readme file * Update configs/eval_bluelm_32k_lveval.py * Update configs/eval_llama2_7b_lveval.py --------- Co-authored-by: yuantao <yuantao@infini-ai.com> Co-authored-by: Mo Li <82895469+DseidLi@users.noreply.github.com>
-
Mo Li authored
* add needlebench * simplify needlebench 32k, 128k, 200k for eval * update act prompt * fix bug in needlebench summarizer * add needlebench intro, fix summarizer * lint summarizer * fix linting error * move readme.md * update readme for needlebench * update docs of needlebench * simplify needlebench summarizers
-
- 29 Feb, 2024 1 commit
-
-
Skyfall-xzz authored
* [Feature] Support OpenFinData * add README for OpenFinData * update README
-
- 28 Feb, 2024 1 commit
-
-
bittersweet1999 authored
* add gemini * add gemini * add gemini
-
- 23 Feb, 2024 1 commit
-
-
Jingming authored
-
- 22 Feb, 2024 1 commit
-
-
bittersweet1999 authored
* fix ifeval * fix ifeval * fix ifeval * fix ifeval
-
- 06 Feb, 2024 2 commits
-
-
hailsham authored
* fix bug of gsm8k_postprocess * update postprocess --------- Co-authored-by: Lei Fei <SENSETIME\leifei1@cn3114002087l.domain.sensetime.com> Co-authored-by: Leymore <zfz-960727@163.com>
-
Connor-Shen authored
* [feat] support humaneval_multipl-e * format --------- Co-authored-by: Leymore <zfz-960727@163.com>
-
- 05 Feb, 2024 2 commits
-
-
Fengzhe Zhou authored
-
Skyfall-xzz authored
* support NPHardEval * add .md file and fix minor bugs * refactor and minor fix --------- Co-authored-by: Leymore <zfz-960727@163.com>
-
- 04 Feb, 2024 1 commit
-
-
bittersweet1999 authored
* support alpacaeval_v1 * Update opencompass/summarizers/subjective/__init__.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update opencompass/summarizers/subjective/alpacaeval_v1.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * fix conflict * support alpacaeval v2 * support alpacav2 --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
-
- 30 Jan, 2024 1 commit
-
-
bittersweet1999 authored
-
- 27 Jan, 2024 1 commit
-
-
Jingming authored
* [Feature] Add IFEval * [Fix] Changing the Score Rule.
-
- 26 Jan, 2024 1 commit
-
-
Xiaoming Shi authored
-
- 24 Jan, 2024 3 commits
-
-
bittersweet1999 authored
* fix corev2 * fix corev2
-
Fengzhe Zhou authored
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
-
bittersweet1999 authored
* add mtbench * add mtbench * Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update opencompass/datasets/subjective/__init__.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update opencompass/datasets/subjective/mtbench.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * fix mtbench --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
-
- 23 Jan, 2024 2 commits
-
-
Jingming authored
* [Feature] Add IFEval * [Doc] add introduction of IFEval
-
bittersweet1999 authored
* add compass arena * add compass_arena * add compass arena * Update opencompass/summarizers/subjective/compass_arena.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update opencompass/summarizers/subjective/__init__.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update opencompass/datasets/subjective/compass_arena.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update opencompass/datasets/subjective/__init__.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update configs/eval_subjective_compassarena.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update configs/datasets/subjective/compassarena/compassarena_compare.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update configs/eval_subjective_compassarena.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update configs/datasets/subjective/compassarena/compassarena_compare.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * fix check position bias --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
-
- 19 Jan, 2024 1 commit
-
-
Guo Qipeng authored
* update cdme config and evaluator * fix cdme prompt * move CDME trim post-processor as a separate evaluator --------- Co-authored-by: 郭琦鹏 <guoqipeng@pjlab.org.cn>
-
- 17 Jan, 2024 2 commits
-
-
Fengzhe Zhou authored
Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn>
-
Mo Li authored
* Add NeedleInAHaystack Test * Apply pre-commit formatting * Update configs/eval_hf_internlm_chat_20b_cdme.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * add needle in haystack test * update needle in haystack test * update plot function in tools_needleinahaystack.py * optimizing needleinahaystack dataset generation strategy * modify minor formatting issues * add English version support * change NeedleInAHaystackDataset to dynamic loading * change NeedleInAHaystackDataset to dynamic loading * fix needleinahaystack test eval bug * fix needleinahaystack config bug * Added support for multi-needle testing in needle-in-a-haystack test * Optimize the code for plotting in the needle-in-a-haystack test. * Correct the typo in the dataset parameters. * update needleinahaystack test docs --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
-
- 16 Jan, 2024 1 commit
-
-
bittersweet1999 authored
-
- 12 Jan, 2024 1 commit
-
-
bittersweet1999 authored
* add creationv2_zh * add creationv2_zh * add eng config for creationbench * add eng config for creationbench * add eng config for creationbench
-
- 11 Jan, 2024 1 commit
-
-
Songyang Zhang authored
-
- 09 Jan, 2024 1 commit
-
-
Xiaoming Shi authored
* update medbench * medbench update * format medbench * format * Update * update * update * update suffix --------- Co-authored-by: 施晓明 <PJLAB\shixiaoming@pjnl104220118l.pjlab.org> Co-authored-by: Leymore <zfz-960727@163.com>
-
- 08 Jan, 2024 3 commits
-
-
Fengzhe Zhou authored
-
liyucheng09 authored
* Contamination analysis for ARC_c, mmlu, and Hellaswag * update `eval_contamination.py` * update `contamination.py` summarizer * fix `eval_contamination.py` * add mmlu groups for contamination analysis
-
Yuchen Yan authored
Co-authored-by: yanyuchen04 <yanyuchen04@meituan.com>
-
- 05 Jan, 2024 2 commits
-
-
Connor-Shen authored
* support mbpp+ * support mbpp+ * minor fix * [Feat] minor fix --------- Co-authored-by: yingfhu <yingfhu@gmail.com>
-
bittersweet1999 authored
* add subject ir * Add ir dataset * Add ir dataset
-
- 04 Jan, 2024 1 commit
-
-
bittersweet1999 authored
* multi_round dataset * add multi_round evaluation
-
- 02 Jan, 2024 1 commit
-
-
Mo Li authored
* Add NeedleInAHaystack Test * Apply pre-commit formatting * Update configs/eval_hf_internlm_chat_20b_cdme.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * add needle in haystack test * update needle in haystack test * update plot function in tools_needleinahaystack.py * optimizing needleinahaystack dataset generation strategy * modify minor formatting issues * add English version support * change NeedleInAHaystackDataset to dynamic loading * change NeedleInAHaystackDataset to dynamic loading * fix needleinahaystack test eval bug * fix needleinahaystack config bug --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
-
- 01 Jan, 2024 1 commit
-
-
Francis-llgg authored
* check * message * add * change prompt * change a parameter name * modify name of the file * delete a useless file
-