Commits · 608ff5810dd2fea1161acbd936cfb4d6bf4cfb28 · OpenDAS / opencompass

27 May, 2024 1 commit

support CHARM (https://github.com/opendatalab/CHARM) reasoning tasks (#1190) · 608ff581

jxd authored May 27, 2024

* support CHARM (https://github.com/opendatalab/CHARM) reasoning tasks

* fix lint error

* add dataset card for CHARM

* minor refactor

* add txt

---------
Co-authored-by: wujiang <wujiang@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>

608ff581

24 May, 2024 2 commits
- fix length (#1180) · 07a6dacf
  bittersweet1999 authored May 24, 2024
  
  07a6dacf
- [Fix] Fix drop_gen.py (#1191) · 5eb8f14d
  klein authored May 24, 2024
```
Fix the bug in drop_gen: wrong import
```
  5eb8f14d
21 May, 2024 2 commits

Update MathBench (#1176) · 1448be00

liushz authored May 21, 2024



* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Fix Llama-3 meta template

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Update accelerator

* Update MathBench

---------
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>

1448be00

[Sync] update evaluator (#1175) · 2b3d4150
Fengzhe Zhou authored May 21, 2024

2b3d4150

17 May, 2024 1 commit
- [Sync] add OC16 entry (#1171) · 5de85406
  Fengzhe Zhou authored May 17, 2024
  
  5de85406
14 May, 2024 4 commits

[Sync] update github workflow (#1156) · 62dbf047
Fengzhe Zhou authored May 14, 2024

62dbf047
[Format] Add config lints (#892) · aa2dd2b5
Fengzhe Zhou authored May 14, 2024

aa2dd2b5

[Feat] Support dataset_suffix check for mixed configs (#973) · 3dbba119

Xu Song authored May 14, 2024



* [Feat] Support dataset_suffix check for mixed configs

* update mixed suffix

* update suffix

---------
Co-authored-by: Leymore <zfz-960727@163.com>

3dbba119

[Feature] Add huggingface apply_chat_template (#1098) · 7505b3ca

Fengzhe Zhou authored May 14, 2024

* add TheoremQA with 5-shot

* add huggingface_above_v4_33 classes

* use num_worker partitioner in cli

* update theoremqa

* update TheoremQA

* add TheoremQA

* rename theoremqa -> TheoremQA

* update TheoremQA output path

* rewrite many model configs

* update huggingface

* further update

* refine configs

* update configs

* update configs

* add configs/eval_llama3_instruct.py

* add summarizer multi faceted

* update bbh datasets

* update configs/models/hf_llama/lmdeploy_llama3_8b_instruct.py

* rename class

* update readme

* update hf above v4.33

7505b3ca

13 May, 2024 2 commits
- [Fix] Fix Needlebench Summarizer (#1143) · 6c711cb2
  Mo Li authored May 13, 2024
```
* update few-shot example

* add 128k
```
  6c711cb2
- fix multiround (#1146) · 5432dfc1
  bittersweet1999 authored May 13, 2024
  
  5432dfc1
08 May, 2024 1 commit

[Feature] Add AceGPT-MMLUArabic benchmark (#1099) · d2c40e56

JuhaoLiang authored May 08, 2024

* add AceGPT-MMLUArabic benchmark

* update readme and fix lint issue

* remove unused package

* add MMLUArabic zero-shot settings

* rename filename and update readme

d2c40e56

06 May, 2024 5 commits

[Feature] Add S3Eval Dataset (#916) · 862044fb
Fangyu Lei authored May 06, 2024
```
* s3eval_branch

* update s3eval
```
862044fb

[Fix] Fix AGIEval chinese sets (#972) · d5017101

Xu Song authored May 06, 2024

* [Fix] Fix AGIEval chinese sets

* Create agieval_gen_617738.py

* [Fix] Fix AGIEval chinese sets

* Restore agieval_gen_64afd3.py

* Update agieval_gen.py

* Create agieval_mixed_0fa998.py

* Update agieval_mixed.py

d5017101

add mgsm datasets (#1081) · af10ecc2

Yggdrasill7D6 authored May 06, 2024



* add mgsm datasets

* fix lint

* fix lint

* update mgsm

* update mgsm

* ease code spell

* update

* update

* update

---------
Co-authored-by: Leymore <zfz-960727@163.com>

af10ecc2

[Feature] update drop dataset from openai simple eval (#1092) · 153c4fc9

klein authored May 06, 2024



* [Feature] update drop dataset from openai simple eval

* update drop template presentation

* update

---------
Co-authored-by: Leymore <zfz-960727@163.com>

153c4fc9

[Feature] Add mmlu prompt from simple_evals, openai (#1074) · d43392a3
Fengzhe Zhou authored May 06, 2024
```
* add mmlu prompt from simple_evals, openai

* return empty str on failure
```
d43392a3
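
The "return empty str on failure" item above can be illustrated with a minimal sketch. It assumes the prompt asks the model to finish with a line such as `Answer: B`. The names `ANSWER_PATTERN` and `extract_choice` are hypothetical and are not taken from the opencompass source.

```python
import re

# Hypothetical pattern: matches "Answer: X" where X is one of A-D,
# case-insensitively. An assumption for illustration, not the project's regex.
ANSWER_PATTERN = re.compile(r"(?i)answer\s*:\s*([A-D])\b")


def extract_choice(response: str) -> str:
    """Return the chosen option letter, or an empty string if none is found."""
    match = ANSWER_PATTERN.search(response)
    # Returning "" instead of raising lets a scorer count the sample as wrong.
    return match.group(1).upper() if match else ""


print(extract_choice("The reasoning is above. Answer: B"))
print(repr(extract_choice("I am not sure.")))
```

Returning an empty string keeps a malformed response from aborting the whole evaluation run; the sample simply fails to match any reference answer.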

30 Apr, 2024 1 commit

[Feature] Adding support for LLM Compression Evaluation (#1108) · 35c94d0c

Alexander Lam authored Apr 30, 2024

* fixed formatting based on pre-commit tests

* fixed typo in comments; reduced the number of models in the eval config

* fixed a bug in LLMCompressionDataset, where setting samples=None would result in passing test[:None] to load_dataset

* removed unnecessary variable in _format_table_pivot; changed lark_reporter message to English

35c94d0c
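
The `samples=None` bug described above can be sketched as follows. This assumes the split string is built by formatting `samples` into a slice such as `test[:100]`. The helper `build_split` is hypothetical and is not the actual `LLMCompressionDataset` code.

```python
from typing import Optional


def build_split(samples: Optional[int], base: str = "test") -> str:
    """Return a split spec such as "test" or "test[:100]".

    Hypothetical helper: naive formatting with samples=None would
    produce "test[:None]", which is not a valid slice spec.
    """
    if samples is None:
        # No limit requested: use the whole split, with no slice suffix.
        return base
    if samples < 0:
        raise ValueError("samples must be non-negative")
    return f"{base}[:{samples}]"


print(build_split(None))
print(build_split(100))
```

The resulting string would then be passed as the `split` argument when loading the dataset.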

28 Apr, 2024 2 commits

[Fix] Fix Math Evaluation with Judge Model Evaluator & Add README (#1103) · a6f67e1a

liushz authored Apr 28, 2024



* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Fix Llama-3 meta template

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

---------
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>

a6f67e1a

[Feature] add support for Flames datasets (#1093) · 58a57a4c

Yggdrasill7D6 authored Apr 28, 2024



* add flames datasets

* fix lint

* rm quota

* add judgemodel info and fix os path

* support flames dataset

* support flames dataset

---------
Co-authored-by: bittersweet1999 <1487910649@qq.com>

58a57a4c

26 Apr, 2024 4 commits

[Feature] Add gpqa prompt from simple_evals, openai (#1080) · f1ee11de

Francis-llgg authored Apr 26, 2024



* add gpqa_openai_simple_eval

* trigger CI build

* reorg

---------
Co-authored-by: Leymore <zfz-960727@163.com>

f1ee11de

Update CIBench (#1089) · e4830a69

klein authored Apr 26, 2024



* modify the requirements/runtime.txt: numpy==1.23.4 --> numpy>=1.23.4

* update cibench: dataset and evaluation

* cibench summarizer bug

* update cibench

* move extract_code import

---------
Co-authored-by: zhangchuyu@pjlab.org.cn <zhangchuyu@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>

e4830a69

[Feature] support arenahard evaluation (#1096) · e404b72c
bittersweet1999 authored Apr 26, 2024
```
* support arenahard

* support arenahard

* support arenahard
```
e404b72c

[Feature] Support Math evaluation via judgemodel (#1094) · 6ba1c493

bittersweet1999 authored Apr 26, 2024

* support openai math evaluation

* support openai math evaluation

* support openai math evaluation

* support math llm judge

* support math llm judge

6ba1c493

24 Apr, 2024 1 commit
- Add humaneval prompt from simple_evals, openai (#1076) · 41196c48
  Jingming Zhuo authored Apr 24, 2024
```
* [Feature] Add IFEval

* add humaneval prompt from simple_evals, openai
```
  41196c48
22 Apr, 2024 2 commits
- [Feature] Add TheoremQA with 5-shot (#1048) · 004ed795
  Fengzhe Zhou authored Apr 22, 2024
```
* add TheoremQA with 5-shot

* cherry pick from add-huggingface-above-v4.33, good TheoremQA results
```
  004ed795
- [Fix] Fix MultiRound Subjective Evaluation (#1043) · 6f98c8d9
  bittersweet1999 authored Apr 22, 2024
```
* fix multiround

* fix
```
  6f98c8d9
19 Apr, 2024 1 commit
- [Sync] deprecate old mbpps (#1064) · 8c85edd1
  Fengzhe Zhou authored Apr 19, 2024
  
  8c85edd1
12 Apr, 2024 1 commit

[Feature] Add ChemBench (#1032) · a00e5729

liuwei130 authored Apr 12, 2024



* add ChemBench

* update results

* molbench -> ChemBench

---------
Co-authored-by: Leymore <zfz-960727@163.com>

a00e5729

09 Apr, 2024 1 commit
- [Sync] update taco (#1030) · b39f5015
  Fengzhe Zhou authored Apr 09, 2024
  
  b39f5015
07 Apr, 2024 3 commits

[Fix] Simplify needlebench summarizer (#1024) · 16f29b25
Mo Li authored Apr 07, 2024
```
* Conflicts:
	configs/summarizers/needlebench.py

* fix lint problems
```
16f29b25

[Feature] Add ATC Choice Version (#1019) · f2af4933

Mo Li authored Apr 07, 2024

* Squashed commit of the following:

commit c48ad194c3976dc63d1b60d8c8ab2d5ff9e1cbfe
Author: DseidLi <2568818204@qq.com>
Date:   Tue Apr 2 16:57:43 2024 +0800

    add atc_choice

commit 3ac6efea29619573e6fac8fa3cce464853dcead0
Merge: 2d4e5597 8e3a9c3
Author: DseidLi <2568818204@qq.com>
Date:   Tue Apr 2 16:41:38 2024 +0800

    Merge branch 'atc_choice' into atc_add_choice

commit 8e3a9c396a3e5546d3faf584183f6fd60b974d5e
Merge: 150a036 0a6a03fe
Author: DseidLi <2568818204@qq.com>
Date:   Tue Mar 26 04:47:07 2024 +0800

    Merge branch 'main' into atc_choice

    Conflicts:
    	configs/summarizers/needlebench.py
    	opencompass/datasets/needlebench/multi.py
    	opencompass/datasets/needlebench/origin.py
    	opencompass/datasets/needlebench/parallel.py

commit 150a036d6d990f26a57c974d1af83d88c31a0f9d
Merge: 8d6ac9a 940dd18
Author: DseidLi <2568818204@qq.com>
Date:   Wed Mar 20 03:49:08 2024 +0800

    Merge branch 'needlebench_fix' into atc_choice

commit 8d6ac9a1a43b1c9d0f0ea27e7d58968a203ea898
Author: DseidLi <2568818204@qq.com>
Date:   Wed Mar 20 03:41:49 2024 +0800

    optimize needlebench code

commit 940dd18a4270f24bc69edd2a780182c68918e1a9
Author: DseidLi <2568818204@qq.com>
Date:   Wed Mar 20 03:39:46 2024 +0800

    fix vllm

commit d8be6877bc41051f3edcc0421c462c834c0f1c9a
Merge: ecad78a 2527fda
Author: DseidLi <2568818204@qq.com>
Date:   Tue Mar 19 21:07:08 2024 +0800

    Merge remote-tracking branch 'origin/add_1M_dataset' into atc_choice

commit 2527fda8a546595bcaea1e5261367bc1097faec8
Author: DseidLi <2568818204@qq.com>
Date:   Tue Mar 19 16:03:40 2024 +0800

    add model configs

commit 75425acdf80d6d25ee24bb0aa60ac48539262e76
Author: DseidLi <2568818204@qq.com>
Date:   Tue Mar 19 16:02:15 2024 +0800

    add prompt position args

commit 367ba1ba612a8cec5df1f80d5e5ae4e285baf38b
Author: DseidLi <2568818204@qq.com>
Date:   Wed Feb 28 21:40:00 2024 +0800

    add Needlebench-1000K configs

commit ecad78af14c4bb00fe325779114b384c57ab30bf
Author: DseidLi <2568818204@qq.com>
Date:   Thu Mar 14 22:08:32 2024 +0800

    fix atc

commit 08772c0787b18872abadc9ffec3223941a5ee0c2
Merge: 9f3f8cf caf1cf8a
Author: DseidLi <2568818204@qq.com>
Date:   Thu Mar 14 22:07:28 2024 +0800

    Merge branch 'main' into atc_choice

    Conflicts:
    	configs/datasets/needlebench/readme.md
    	configs/datasets/needlebench/readme_zh-CN.md
    	configs/summarizers/needlebench.py
    	opencompass/datasets/needlebench/atc.py
    	opencompass/summarizers/needlebench.py

commit 9f3f8cfb4452722734d334114ac1d14110e57406
Author: DseidLi <2568818204@qq.com>
Date:   Thu Mar 14 21:35:53 2024 +0800

    add atc-choice test

commit 52be7c1202376b4e09821188b826f1a805328129
Author: DseidLi <2568818204@qq.com>
Date:   Wed Mar 6 02:54:15 2024 +0800

    update needlebench randomseed and add vllm qwen14b

commit fc1effce596ae2e5ece4933e8cd34aef8e64a6f9
Merge: 4e747ed caf1cf8a
Author: DseidLi <2568818204@qq.com>
Date:   Wed Mar 6 02:51:14 2024 +0800

    Merge branch 'main' into add_model_configs

commit 31834f9b23af3354ac3581ec86d693d0f05cdd1c
Merge: 7dabc82 120bf8b3
Author: DseidLi <2568818204@qq.com>
Date:   Sun Mar 3 23:29:42 2024 +0800

    Merge branch 'main' of https://github.com/open-compass/opencompass into atc_choice

commit 4e747ed1988ddbcfcc7fff334601259ade72d363
Author: DseidLi <2568818204@qq.com>
Date:   Sun Mar 3 22:15:25 2024 +0800

    add internlm2-lmdeploy model and gemma configs

commit 7dabc828123d711c8cf834d6aab4137bb55e85ed
Author: DseidLi <2568818204@qq.com>
Date:   Sat Mar 2 17:26:15 2024 +0800

    add atc choice version -ZH

commit 996f8ae43d3f946a052f736717ead139d153e2dd
Author: DseidLi <2568818204@qq.com>
Date:   Wed Feb 28 16:58:56 2024 +0800

    update readme for needlebench

commit f7266e873cb34ccf18a7f20b2c5821af8416a14f
Author: DseidLi <2568818204@qq.com>
Date:   Wed Feb 28 16:44:53 2024 +0800

    move readme.md

commit 1c7375681dea13996802e45b878dc4929ea8fa65
Author: DseidLi <2568818204@qq.com>
Date:   Wed Feb 28 16:38:31 2024 +0800

    fix linting error

commit b6524f3ebfb8a3a12a5ad3e3fa7a8a0921fcb6c1
Author: DseidLi <2568818204@qq.com>
Date:   Wed Feb 28 16:33:51 2024 +0800

    lint summarizer

commit c0d1190e39d3b6724f677346df2572df9af59f25
Author: DseidLi <2568818204@qq.com>
Date:   Wed Feb 28 16:29:03 2024 +0800

    add needlebench intro, fix summarizer

commit 0965baf78588e29d813b61d73f0ebd868a0ce3d0
Author: DseidLi <2568818204@qq.com>
Date:   Mon Feb 26 13:31:26 2024 +0800

    fix bug in needlebench summarizer

commit 5d32b31eb85382026935f356190ad92b103afd98
Author: DseidLi <2568818204@qq.com>
Date:   Sat Feb 24 03:19:08 2024 +0800

    update act prompt

commit af82a7f085e394d83aa84043e2881dd50115942c
Merge: 32bf9fe 53fe788d
Author: DseidLi <2568818204@qq.com>
Date:   Fri Feb 23 17:50:32 2024 +0800

    Merge remote-tracking branch 'upstream/main' into needlebench

commit 32bf9fe802eaf8e8e5b33ff17b2a897058f8b66b
Author: DseidLi <2568818204@qq.com>
Date:   Fri Feb 23 17:31:32 2024 +0800

    simplify needlebench 32k, 128k, 200k for eval

commit a7cb025e05a48449de9839005fada02bd5bff15a
Author: DseidLi <2568818204@qq.com>
Date:   Fri Feb 23 14:48:58 2024 +0800

    add needlebench

* fix summarizer

* remove repeated code

* remove chinese comments

f2af4933

[Fix] Refactor Needlebench Configs for CLI Testing Support (#1020) · b50d1632

Mo Li authored Apr 07, 2024

* add needlebench datasets suffix

* fix import

* update run.py args for summarizer key and dataset suffix

* update utils/run.py

b50d1632

02 Apr, 2024 1 commit

[Feature] Add multi-model judge and fix some problems (#1016) · 2d4e5597

bittersweet1999 authored Apr 02, 2024

* support multi-model judge and moe judge

* test_moe

* test_moe

* test

* add moe judge

* support multi-judge-model

2d4e5597

28 Mar, 2024 1 commit
- [Feature] Support AlpacaEval_V2 (#1006) · 02e7eec9
  bittersweet1999 authored Mar 28, 2024
```
* support alpacaeval_v2

* support alpacaeval

* update docs

* update docs
```
  02e7eec9
25 Mar, 2024 1 commit

[Feature] update needlebench and configs (#986) · 0a6a03fe

Mo Li authored Mar 25, 2024

* add Needlebench-1000K configs

* add prompt position args

* add model configs

* Update parallel.py

* fix lint

0a6a03fe

19 Mar, 2024 3 commits
- [Fix] Update APPS/TACO (#988) · 0221d308
  Connor-Shen authored Mar 19, 2024
```
* [Feature] update apps/taco

* [Feature] update apps/taco
```
  0221d308
- [Feature] Update APPS (#985) · 8a3c6e51
  Connor-Shen authored Mar 19, 2024
```
* update post process

* update post process
```
  8a3c6e51
- [Feat] Support TACO (#966) · d92595b6
  Connor-Shen authored Mar 19, 2024
```
* [Feat] Support TACO

* update README

* update README
```
  d92595b6