Commits · d2c40e564855c3e2a24bfdef04f8b3c9304c5e7b · OpenDAS / opencompass

08 May, 2024 1 commit

[Feature] Add AceGPT-MMLUArabic benchmark (#1099) · d2c40e56

JuhaoLiang authored May 08, 2024

* add AceGPT-MMLUArabic benchmark

* update readme and fix lint issue

* remove unused package

* add MMLUArabic zero-shot settings

* rename filename and update readme

d2c40e56

06 May, 2024 5 commits

[Feature] Add S3Eval Dataset (#916) · 862044fb
Fangyu Lei authored May 06, 2024
```
* s3eval_branch

* update s3eval
```
862044fb

[Fix] Fix AGIEval chinese sets (#972) · d5017101

Xu Song authored May 06, 2024

* [Fix] Fix AGIEval chinese sets

* Create agieval_gen_617738.py

* [Fix] Fix AGIEval chinese sets

* Restore agieval_gen_64afd3.py

* Update agieval_gen.py

* Create agieval_mixed_0fa998.py

* Update agieval_mixed.py

d5017101

add mgsm datasets (#1081) · af10ecc2

Yggdrasill7D6 authored May 06, 2024



* add mgsm datasets

* fix lint

* fix lint

* update mgsm

* update mgsm

* ease code spell

* update

* update

* update

---------
Co-authored-by: Leymore <zfz-960727@163.com>

af10ecc2

[Feature] update drop dataset from openai simple eval (#1092) · 153c4fc9

klein authored May 06, 2024



* [Feature] update drop dataset from openai simple eval

* update drop template presentation

* update

---------
Co-authored-by: Leymore <zfz-960727@163.com>

153c4fc9

[Feature] Add mmlu prompt from simple_evals, openai (#1074) · d43392a3
Fengzhe Zhou authored May 06, 2024
```
* add mmlu prompt from simple_evals, openai

* return empty str on failure
```
d43392a3

30 Apr, 2024 3 commits

fix LightllmApi workers bug (#1113) · 53fe3904
Yang Yong authored Apr 30, 2024

53fe3904

update pre-commit (#891) · baed2ed9
Fengzhe Zhou authored Apr 30, 2024

baed2ed9

[Feature] Adding support for LLM Compression Evaluation (#1108) · 35c94d0c

Alexander Lam authored Apr 30, 2024

* fixed formatting based on pre-commit tests

* fixed typo in comments; reduced the number of models in the eval config

* fixed a bug in LLMCompressionDataset, where setting samples=None would result in passing test[:None] to load_dataset

* removed unnecessary variable in _format_table_pivot; changed lark_reporter message to English

35c94d0c
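
Note on the `samples=None` fix above: the commit message says that `samples=None` led to `test[:None]` being passed to `load_dataset`. The sketch below shows how formatting the value straight into the split string produces that result, and one way to guard against it. The helper names `naive_split` and `build_split` are hypothetical and are not taken from the commit; the actual fix in `LLMCompressionDataset` may differ.

```python
from typing import Optional


def naive_split(samples: Optional[int] = None) -> str:
    # Formats the value directly, so None ends up inside the split string.
    return f"test[:{samples}]"


def build_split(samples: Optional[int] = None, base: str = "test") -> str:
    # Hypothetical guard: only add a slice when a sample count is given.
    if samples is None:
        return base
    return f"{base}[:{samples}]"
```

With this guard, `build_split(None)` yields the plain split name, while `build_split(100)` still requests the first 100 rows.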

29 Apr, 2024 3 commits
- [Docs] Update README.md (#1110) · 9c79224b
  Ikko Eltociear Ashimine authored Apr 30, 2024
```
requiresments -> requirements
```
  9c79224b
- [Bug] Fix CMB dataset (#1106) · 3de48e9b
  bittersweet1999 authored Apr 30, 2024
  
  3de48e9b
- [Update] Update performance of common benchmarks (#1109) · 063f5f5f
  Songyang Zhang authored Apr 30, 2024
```
* [Update] Update performance of common benchmarks

* [Update] Update performance of common benchmarks

* [Update] Update performance of common benchmarks
```
  063f5f5f

28 Apr, 2024 5 commits

[Fix] Fix Math Evaluation with Judge Model Evaluator & Add README (#1103) · a6f67e1a

liushz authored Apr 28, 2024



* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Fix Llama-3 meta template

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

---------
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>

a6f67e1a

fix prompt template (#1104) · 0b7de67c
bittersweet1999 authored Apr 28, 2024

0b7de67c

adapt to lmdeploy v0.4.0 (#1073) · 1013dce6
Lyu Han authored Apr 28, 2024
```
* adapt to lmdeploy v0.4.0

* compatible
```
1013dce6

[Feature] add support for Flames datasets (#1093) · 58a57a4c

Yggdrasill7D6 authored Apr 28, 2024



* add flames datasets

* fix lint

* rm quota

* add judgemodel info and fix os path

* support flames dataset

* support flames dataset

---------
Co-authored-by: bittersweet1999 <1487910649@qq.com>

58a57a4c

[Doc] Update NeedleInAHaystack Docs (#1102) · 76dd814c
Mo Li authored Apr 28, 2024
```
* update NeedleInAHaystack Test Docs

* update docs
```
76dd814c

26 Apr, 2024 8 commits

fix output typing, change mutable list to immutable tuple (#989) · cce5b6fb

dmitrysarov authored Apr 26, 2024



* fix output typing, change mutable list to immutable tuple

* import missed type

* format

---------
Co-authored-by: Leymore <zfz-960727@163.com>

cce5b6fb
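
Note on the list-to-tuple change above: the commit title does not say where the list was used. Assuming it concerns default argument values, the sketch below illustrates why an immutable tuple is the safer choice. Both function names are hypothetical and do not come from the repository.

```python
from typing import List, Tuple


def collect_bad(item: str, seen: List[str] = []) -> List[str]:
    # The default list is created once and shared across calls,
    # so items from earlier calls leak into later ones.
    seen.append(item)
    return seen


def collect_good(item: str, seen: Tuple[str, ...] = ()) -> Tuple[str, ...]:
    # A tuple cannot be mutated in place; each call returns a new tuple.
    return seen + (item,)
```

Calling `collect_bad` twice without a second argument returns the same growing list both times, whereas `collect_good` starts from an empty tuple on every call.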

[Fix] python path bug (#1063) · 701ecbb2

binary-husky authored Apr 26, 2024



* fix relative path bug

* format

---------
Co-authored-by: hmp <505030475@qq.com>
Co-authored-by: Leymore <zfz-960727@163.com>

701ecbb2

add vllm get_ppl (#1003) · 048d41a1

Wang Xingjin authored Apr 26, 2024



* add vllm get_ppl

* add vllm get_ppl

* format

---------
Co-authored-by: xingjin.wang <xingjin.wang@mihoyo.com>
Co-authored-by: Leymore <zfz-960727@163.com>

048d41a1

[Deprecate] Remove multi-modal related stuff (#1072) · 3a232db4

Haodong Duan authored Apr 26, 2024



* Remove MultiModal

* update index.rst

* update README

* remove mmbench codes

* update news

---------
Co-authored-by: Leymore <zfz-960727@163.com>

3a232db4

[Feature] Add gpqa prompt from simple_evals, openai (#1080) · f1ee11de

Francis-llgg authored Apr 26, 2024



* add gpqa_openai_simple_eval

* trigger CI build

* reorg

---------
Co-authored-by: Leymore <zfz-960727@163.com>

f1ee11de

Update CIBench (#1089) · e4830a69

klein authored Apr 26, 2024



* modify the requirements/runtime.txt: numpy==1.23.4 --> numpy>=1.23.4

* update cibench: dataset and evaluation

* cibench summarizer bug

* update cibench

* move extract_code import

---------
Co-authored-by: zhangchuyu@pjlab.org.cn <zhangchuyu@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>

e4830a69

[Feature] support arenahard evaluation (#1096) · e404b72c
bittersweet1999 authored Apr 26, 2024
```
* support arenahard

* support arenahard

* support arenahard
```
e404b72c

[Feature] Support Math evaluation via judgemodel (#1094) · 6ba1c493

bittersweet1999 authored Apr 26, 2024

* support openai math evaluation

* support openai math evaluation

* support openai math evaluation

* support math llm judge

* support math llm judge

6ba1c493

24 Apr, 2024 2 commits
- Add humaneval prompt from simple_evals, openai (#1076) · 41196c48
  Jingming Zhuo authored Apr 24, 2024
```
* [Feature] Add IFEval

* add humaneval prompt from simple_evals, openai
```
  41196c48
- Fix Llama-3 meta template (#1079) · 17735f0c
  liushz authored Apr 24, 2024
```
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
```
  17735f0c

23 Apr, 2024 2 commits
- [Feature] Add lmdeploy tis python backend model (#1014) · 81d0e4d7
  Ke Bao authored Apr 23, 2024
```
* add lmdeploy tis python backend model

* fix pr check

* update
```
  81d0e4d7
- [Fix] Fix sequential runner (#1070) · 8fe7b271
  Fengzhe Zhou authored Apr 23, 2024
  
  8fe7b271

22 Apr, 2024 3 commits
- [Feature] Add TheoremQA with 5-shot (#1048) · 004ed795
  Fengzhe Zhou authored Apr 22, 2024
```
* add TheoremQA with 5-shot

* cherry pick from add-huggingface-above-v4.33, good TheoremQA results
```
  004ed795
- [Feature] Add LLaMA-3 Series Configs (#1065) · a2567532
  Fengzhe Zhou authored Apr 22, 2024
```
* add LLaMA-3 Series configs

* update readme
```
  a2567532
- [Fix] Fix MultiRound Subjective Evaluation (#1043) · 6f98c8d9
  bittersweet1999 authored Apr 22, 2024
```
* fix multiround

* fix
```
  6f98c8d9

19 Apr, 2024 1 commit
- [Sync] deprecate old mbpps (#1064) · 8c85edd1
  Fengzhe Zhou authored Apr 19, 2024
  
  8c85edd1

17 Apr, 2024 1 commit

[Fix] Fixed repeated loading of VLLM (#1051) · c1724013

Robin Chen authored Apr 17, 2024



* [fix]Fixed the issue caused by the repeated loading of VLLM model during task segmentation.

* [fix] avoid TypeError: VLLM.__init__() got an unexpected keyword argument 'tokenizer_only'

* restore .pre-commit-config.yaml

* restore opencompass/tasks/openicl_infer.py

---------
Co-authored-by: IcyFeather <mengzhuo.happy@gmail.com>
Co-authored-by: Leymore <zfz-960727@163.com>

c1724013

16 Apr, 2024 2 commits

[Doc] Update README (#1053) · 62983614
Songyang Zhang authored Apr 16, 2024
```
* [Update] Update readme

* [Update] Update readme

* [Update] Update readme
```
62983614

[Sync] Bump version to 0.2.4 (#1052) · 881bdbf6

Fengzhe Zhou authored Apr 16, 2024



(cherry picked from commit 16ac6306c72fa202173289b55eaefe85e0fcb73c)
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>

881bdbf6

15 Apr, 2024 1 commit
- [Fix] logger.error -> logger.debug in OpenAI wrapper (#1050) · 7a41951d
  Fengzhe Zhou authored Apr 15, 2024
```
* logger.error -> logger.info in OpenAI

* logger.info -> logger.debug in OpenAI
```
  7a41951d

12 Apr, 2024 1 commit

[Feature] Add ChemBench (#1032) · a00e5729

liuwei130 authored Apr 12, 2024



* add ChemBench

* update results

* molbench -> ChemBench

---------
Co-authored-by: Leymore <zfz-960727@163.com>

a00e5729

11 Apr, 2024 1 commit
- [Fix] Update setup.py install_requires (#1036) · bd7c11bb
  Fengzhe Zhou authored Apr 11, 2024
  
  bd7c11bb

09 Apr, 2024 1 commit
- [Sync] update taco (#1030) · b39f5015
  Fengzhe Zhou authored Apr 09, 2024
  
  b39f5015