Commits · 608ff5810dd2fea1161acbd936cfb4d6bf4cfb28 · OpenDAS / opencompass

27 May, 2024 1 commit

support CHARM (https://github.com/opendatalab/CHARM) reasoning tasks (#1190) · 608ff581

jxd authored May 27, 2024

* support CHARM (https://github.com/opendatalab/CHARM) reasoning tasks

* fix lint error

* add dataset card for CHARM

* minor refactor

* add txt

---------
Co-authored-by: wujiang <wujiang@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>

608ff581

24 May, 2024 4 commits
- fix length (#1180) · 07a6dacf
  bittersweet1999 authored May 24, 2024
  
  07a6dacf
- add support for lmdeploy api judge (#1193) · 88c14d3d
  bittersweet1999 authored May 24, 2024
  
  88c14d3d
- [Fix] temporary files using tempfile (#1186) · 749e4cea
  yaoyingyy authored May 24, 2024
```
Co-authored-by: yaoying <yaoying@kingsoft.com>
```
  749e4cea
- [Fix] Fix drop_gen.py (#1191) · 5eb8f14d
  klein authored May 24, 2024
```
Fix the bug in drop_gen: wrong import
```
  5eb8f14d
21 May, 2024 3 commits

fix yi-chat template (#1178) · 31afe870
bittersweet1999 authored May 21, 2024

31afe870

Update MathBench (#1176) · 1448be00

liushz authored May 21, 2024



* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Fix Llama-3 meta template

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Update accelerator

* Update MathBench

---------
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>

1448be00

[Sync] update evaluator (#1175) · 2b3d4150
Fengzhe Zhou authored May 21, 2024

2b3d4150

20 May, 2024 1 commit
- Update daily-run-test.yml (#1173) · 296ea599
  zhulinJulia24 authored May 20, 2024
  
  296ea599
17 May, 2024 1 commit
- [Sync] add OC16 entry (#1171) · 5de85406
  Fengzhe Zhou authored May 17, 2024
  
  5de85406
16 May, 2024 1 commit

update test workflow (#1167) · 94eb9056

zhulinJulia24 authored May 16, 2024



* Update pr-run-test.yml

* Update daily-run-test.yml

* Update daily-run-test.yml

* Update pr-run-test.yml

* Update daily-run-test.yml

* Update daily-run-test.yml

* Update daily-run-test.yml

* Update daily-run-test.yml

* Update oc_score_baseline.yaml

* Update daily-run-test.yml

* Update oc_score_assert.py

---------
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>

94eb9056

15 May, 2024 5 commits
- [Feat] enable HuggingFacewithChatTemplate with --accelerator via cli (#1163) · 8ea2c404
  Fengzhe Zhou authored May 15, 2024
```
* enable HuggingFacewithChatTemplate with --accelerator via cli

* rm vllm_internlm2_chat_7b
```
  8ea2c404
- Update accelerator (#1152) · e3c0448b
  liushz authored May 15, 2024
```
* Update accelerator

* update run

---------
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
Co-authored-by: Fengzhe Zhou <zfz-960727@163.com>
```
  e3c0448b
- [Fix] Update stop_words in huggingface_above_v4_33 (#1160) · f10dd48f
  Fengzhe Zhou authored May 15, 2024
  
  f10dd48f
- [Fix] use ProcessPoolExecutor during mbpp eval (#1159) · 80f831b4
  Fengzhe Zhou authored May 15, 2024
  
  80f831b4
- fix arenahard summarizer (#1154) · 8a8987be
  bittersweet1999 authored May 15, 2024
```
Co-authored-by: Leymore <zfz-960727@163.com>
```
  8a8987be
14 May, 2024 4 commits

[Sync] update github workflow (#1156) · 62dbf047
Fengzhe Zhou authored May 14, 2024

62dbf047

[Format] Add config lints (#892) · aa2dd2b5
Fengzhe Zhou authored May 14, 2024

aa2dd2b5

[Feat] Support dataset_suffix check for mixed configs (#973) · 3dbba119

Xu Song authored May 14, 2024



* [Feat] Support dataset_suffix check for mixed configs

* update mixed suffix

* update suffix

---------
Co-authored-by: Leymore <zfz-960727@163.com>

3dbba119

[Feature] Add huggingface apply_chat_template (#1098) · 7505b3ca

Fengzhe Zhou authored May 14, 2024

* add TheoremQA with 5-shot

* add huggingface_above_v4_33 classes

* use num_worker partitioner in cli

* update theoremqa

* update TheoremQA

* add TheoremQA

* rename theoremqa -> TheoremQA

* update TheoremQA output path

* rewrite many model configs

* update huggingface

* further update

* refine configs

* update configs

* update configs

* add configs/eval_llama3_instruct.py

* add summarizer multi faceted

* update bbh datasets

* update configs/models/hf_llama/lmdeploy_llama3_8b_instruct.py

* rename class

* update readme

* update hf above v4.33

7505b3ca

13 May, 2024 2 commits
- [Fix] Fix Needlebench Summarizer (#1143) · 6c711cb2
  Mo Li authored May 13, 2024
```
* update few-shot example

* add 128k
```
  6c711cb2
- fix multiround (#1146) · 5432dfc1
  bittersweet1999 authored May 13, 2024
  
  5432dfc1
11 May, 2024 1 commit
- [Fix] fix alpacaeval while add caching path (#1139) · 833a3514
  bittersweet1999 authored May 11, 2024
```
* fix alpacaeval

* fix alpacaeval
```
  833a3514
09 May, 2024 2 commits

[Sync] Update accelerator (#1122) · 19d7e630

Fengzhe Zhou authored May 09, 2024



(cherry picked from commit 4beb6d9ab655d8a626971841b7acfd9fae9d438f)
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>

19d7e630

[Feature] Add Qwen1.5 MoE 7b and Mixtral 8x22b model configs (#1123) · a71122ee
Alexander Lam authored May 09, 2024
```
* added qwen moe and mixtral 8x22 model configs

* updated README files news section
```
a71122ee

08 May, 2024 3 commits

[Fix] Fix NeedleBench Summarizer Typo (#1125) · cb080fa7

Mo Li authored May 08, 2024

* update needleinahaystack eval docs

* update needlebench summarizer

* fix english docs typo

cb080fa7

fix links (#1120) · 826d8307
bittersweet1999 authored May 08, 2024

826d8307

[Feature] Add AceGPT-MMLUArabic benchmark (#1099) · d2c40e56

JuhaoLiang authored May 08, 2024

* add AceGPT-MMLUArabic benchmark

* update readme and fix lint issue

* remove unused package

* add MMLUArabic zero-shot settings

* rename filename and update readme

d2c40e56

06 May, 2024 5 commits

[Feature] Add S3Eval Dataset (#916) · 862044fb
Fangyu Lei authored May 06, 2024
```
* s3eval_branch

* update s3eval
```
862044fb

[Fix] Fix AGIEval chinese sets (#972) · d5017101

Xu Song authored May 06, 2024

* [Fix] Fix AGIEval chinese sets

* Create agieval_gen_617738.py

* [Fix] Fix AGIEval chinese sets

* Restore agieval_gen_64afd3.py

* Update agieval_gen.py

* Create agieval_mixed_0fa998.py

* Update agieval_mixed.py

d5017101

add mgsm datasets (#1081) · af10ecc2

Yggdrasill7D6 authored May 06, 2024



* add mgsm datasets

* fix lint

* fix lint

* update mgsm

* update mgsm

* ease code spell

* update

* update

* update

---------
Co-authored-by: Leymore <zfz-960727@163.com>

af10ecc2

[Feature] update drop dataset from openai simple eval (#1092) · 153c4fc9

klein authored May 06, 2024



* [Feature] update drop dataset from openai simple eval

* update drop template presentation

* update

---------
Co-authored-by: Leymore <zfz-960727@163.com>

153c4fc9

[Feature] Add mmlu prompt from simple_evals, openai (#1074) · d43392a3
Fengzhe Zhou authored May 06, 2024
```
* add mmlu prompt from simple_evals, openai

* return empty str on failure
```
d43392a3

30 Apr, 2024 3 commits

fix LightllmApi workers bug (#1113) · 53fe3904
Yang Yong authored Apr 30, 2024

53fe3904

update pre-commit (#891) · baed2ed9
Fengzhe Zhou authored Apr 30, 2024

baed2ed9

[Feature] Adding support for LLM Compression Evaluation (#1108) · 35c94d0c

Alexander Lam authored Apr 30, 2024

* fixed formatting based on pre-commit tests

* fixed typo in comments; reduced the number of models in the eval config

* fixed a bug in LLMCompressionDataset, where setting samples=None would result in passing test[:None] to load_dataset

* removed unnecessary variable in _format_table_pivot; changed lark_reporter message to English

35c94d0c
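
The `samples=None` bug described in the commit above can be illustrated with a minimal sketch. The helper names `build_split_naive` and `build_split` are hypothetical and are not taken from the opencompass source; the sketch assumes only the Hugging Face `datasets` split-slicing syntax (`test[:N]`) and shows how formatting `None` into the slice produces the string `test[:None]`.

```python
from typing import Optional


def build_split_naive(split: str = "test", samples: Optional[int] = None) -> str:
    """Hypothetical buggy variant: always formats `samples` into the slice."""
    return f"{split}[:{samples}]"


def build_split(split: str = "test", samples: Optional[int] = None) -> str:
    """Hypothetical fixed variant: slice only when a sample count is given."""
    if samples is None:
        # No limit requested, so request the whole split by name.
        return split
    return f"{split}[:{samples}]"


if __name__ == "__main__":
    print(build_split_naive(samples=None))  # the malformed split string
    print(build_split(samples=None))        # the plain split name
    print(build_split(samples=100))         # a sliced split
```

The resulting string would then be passed as the `split` argument of `load_dataset`; how the actual fix is implemented in `LLMCompressionDataset` is not shown in this log.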

29 Apr, 2024 3 commits
- [Docs] Update README.md (#1110) · 9c79224b
  Ikko Eltociear Ashimine authored Apr 30, 2024
```
requiresments -> requirements
```
  9c79224b
- [Bug] Fix CMB dataset (#1106) · 3de48e9b
  bittersweet1999 authored Apr 30, 2024
  
  3de48e9b
- [Update] Update performance of common benchmarks (#1109) · 063f5f5f
  Songyang Zhang authored Apr 30, 2024
```
* [Update] Update performance of common benchmarks

* [Update] Update performance of common benchmarks

* [Update] Update performance of common benchmarks
```
  063f5f5f
28 Apr, 2024 1 commit

[Fix] Fix Math Evaluation with Judge Model Evaluator & Add README (#1103) · a6f67e1a

liushz authored Apr 28, 2024



* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Fix Llama-3 meta template

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

---------
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>

a6f67e1a