Commits · 3098d788455dc785e6830f8c69eb9d1010c0cce1 · OpenDAS / opencompass

13 Mar, 2024 1 commit

Connor-Shen authored Mar 13, 2024

* [Feat] support apps

* [Feat] support apps

* [Feat] support apps

* update README

3098d788

11 Mar, 2024 2 commits
- [Sync] update 20240308 (#953) · bdd85358
  Fengzhe Zhou authored Mar 11, 2024
  
- [fix] add different temp for different question in mtbench (#954) · 848e7c8a
  bittersweet1999 authored Mar 11, 2024
```
* add temp for mtbench

* add document for mtbench

* add document for mtbench
```
05 Mar, 2024 2 commits
- [Fix] FinanceIQ_datasets import error (#939) · 2e993989
  Xu Song authored Mar 05, 2024
```
* [Fix] Fix KeyError: 'FinanceIQ_datasets'

* [Fix] Fix KeyError: 'FinanceIQ_datasets'
```
- [Fix] fix a bug of humanevalplus config (#944) · d0550268
  Jingming authored Mar 05, 2024
  
04 Mar, 2024 3 commits

[Sync] Sync Internal (#941) · b03d5dc5
Fengzhe Zhou authored Mar 04, 2024

[Feature] add lveval benchmark (#914) · bbec7d87

yuantao2108 authored Mar 04, 2024

* add lveval benchmark

* add LVEval readme file

* update LVEval readme file

* Update configs/eval_bluelm_32k_lveval.py

* Update configs/eval_llama2_7b_lveval.py

---------
Co-authored-by: yuantao <yuantao@infini-ai.com>
Co-authored-by: Mo Li <82895469+DseidLi@users.noreply.github.com>

[Feature] Upgrade the needle-in-a-haystack experiment to Needlebench (#913) · 8142f399

Mo Li authored Mar 04, 2024

* add needlebench

* simplify needlebench 32k, 128k, 200k for eval

* update act prompt

* fix bug in needlebench summarizer

* add needlebench intro, fix summarizer

* lint summarizer

* fix linting error

* move readme.md

* update readme for needlebench

* update docs of needlebench

* simplify needlebench summarizers

29 Feb, 2024 1 commit
- [Feature] Support OpenFinData (#896) · 4c45a71b
  Skyfall-xzz authored Feb 29, 2024
```
* [Feature] Support OpenFinData

* add README for OpenFinData

* update README
```
28 Feb, 2024 1 commit
- [Feature] add support for gemini (#931) · 001e77fe
  bittersweet1999 authored Feb 28, 2024
```
* add gemini

* add gemini

* add gemini
```
23 Feb, 2024 1 commit
- [Fix] fix ifeval (#909) · 53fe788d
  Jingming authored Feb 23, 2024
  
22 Feb, 2024 1 commit
- [Fix] Fix IFEval (#906) · 45c606bc
  bittersweet1999 authored Feb 22, 2024
```
* fix ifeval

* fix ifeval

* fix ifeval

* fix ifeval
```
06 Feb, 2024 2 commits

fix bug of gsm8k_postprocess (#863) · dd444685

hailsham authored Feb 06, 2024

* fix bug of gsm8k_postprocess

* update postprocess

---------
Co-authored-by: Lei Fei <SENSETIME\leifei1@cn3114002087l.domain.sensetime.com>
Co-authored-by: Leymore <zfz-960727@163.com>

[feat] support multipl-e (#846) · 444d8d95

Connor-Shen authored Feb 06, 2024

* [feat] support humaneval_multipl-e

* format

---------
Co-authored-by: Leymore <zfz-960727@163.com>

05 Feb, 2024 2 commits
- [Sync] Merge branch 'dev' into zfz/update-keyset-demo (#876) · d34ba111
  Fengzhe Zhou authored Feb 05, 2024
  
- Support NPHardEval (#835) · 7ad11680
  Skyfall-xzz authored Feb 05, 2024
```
* support NPHardEval

* add .md file and fix minor bugs

* refactor and minor fix

---------
Co-authored-by: Leymore <zfz-960727@163.com>
```
04 Feb, 2024 1 commit

[Feature] support alpacaeval (#809) · 7806cd0f

bittersweet1999 authored Feb 04, 2024

* support alpacaeval_v1

* Update opencompass/summarizers/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* Update opencompass/summarizers/subjective/alpacaeval_v1.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* fix conflict

* support alpacaeval v2

* support alpacav2

---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

30 Jan, 2024 1 commit
- fix compass arena (#854) · 5c6dc908
  bittersweet1999 authored Jan 30, 2024
  
27 Jan, 2024 1 commit
- [Fix] Fix acc of IFEval (#849) · 28018833
  Jingming authored Jan 27, 2024
```
* [Feature] Add IFEval

* [Fix] Changing the Score Rule.
```
26 Jan, 2024 1 commit
- [Fix] Update MedBench (#845) · 35aace77
  Xiaoming Shi authored Jan 26, 2024
  
24 Jan, 2024 3 commits

[Fix] fix corev2 (#838) · 77be07db
bittersweet1999 authored Jan 24, 2024
```
* fix corev2

* fix corev2
```
[Sync] Updata dataset cfg for internMath (#837) · 0991dd33
Fengzhe Zhou authored Jan 24, 2024
```
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
```

[Feature] add mtbench (#829) · 2ee8e8a1

bittersweet1999 authored Jan 24, 2024

* add mtbench

* add mtbench

* Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* Update opencompass/datasets/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* Update opencompass/datasets/subjective/mtbench.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* fix mtbench

---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

23 Jan, 2024 2 commits

[Feature] Add IFEval (#813) · e059a5c2
Jingming authored Jan 23, 2024
```
* [Feature] Add IFEval

* [Doc] add introduction of IFEval
```

[Feature] Add CompassArena (#828) · 2d4da8dd

bittersweet1999 authored Jan 23, 2024

* add compass arena

* add compass_arena

* add compass arena

* Update opencompass/summarizers/subjective/compass_arena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* Update opencompass/summarizers/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* Update opencompass/datasets/subjective/compass_arena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* Update opencompass/datasets/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* Update configs/eval_subjective_compassarena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* Update configs/datasets/subjective/compassarena/compassarena_compare.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* Update configs/eval_subjective_compassarena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* Update configs/datasets/subjective/compassarena/compassarena_compare.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* fix check position bias

---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

19 Jan, 2024 1 commit

Update cdme config and evaluator (#812) · e975a96f

Guo Qipeng authored Jan 19, 2024

* update cdme config and evaluator

* fix cdme prompt

* move CDME trim post-processor as a separate evaluator

---------
Co-authored-by: 郭琦鹏 <guoqipeng@pjlab.org.cn>

17 Jan, 2024 2 commits

[Sync] Add InternLM2 Keyset Evaluation Demo (#807) · b4afe3e7
Fengzhe Zhou authored Jan 17, 2024
```
Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn>
```

Added support for multi-needle testing in needle-in-a-haystack test (#802) · acae5609

Mo Li authored Jan 17, 2024

* Add NeedleInAHaystack Test

* Apply pre-commit formatting

* Update configs/eval_hf_internlm_chat_20b_cdme.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* add needle in haystack test

* update needle in haystack test

* update plot function in tools_needleinahaystack.py

* optimizing needleinahaystack dataset generation strategy

* modify minor formatting issues

* add English version support

* change NeedleInAHaystackDataset to dynamic loading

* change NeedleInAHaystackDataset to dynamic loading

* fix needleinahaystack test eval bug

* fix needleinahaystack config bug

* Added support for multi-needle testing in needle-in-a-haystack test

* Optimize the code for plotting in the needle-in-a-haystack test.

* Correct the typo in the dataset parameters.

* update needleinahaystack test docs

---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

16 Jan, 2024 1 commit
- reorganize subject files (#801) · 814b3f73
  bittersweet1999 authored Jan 16, 2024
  
12 Jan, 2024 1 commit

[Feature] Add configs for creationbench (#791) · 83d6c483

bittersweet1999 authored Jan 12, 2024

* add creationv2_zh

* add creationv2_zh

* add eng config for creationbench

* add eng config for creationbench

* add eng config for creationbench

11 Jan, 2024 1 commit
- Update gsm8k agent prompt (#788) · 467ad0ac
  Songyang Zhang authored Jan 11, 2024
  
09 Jan, 2024 1 commit

[Feature] Update MedBench (#779) · ad872a5d

Xiaoming Shi authored Jan 09, 2024

* update medbench

* medbench update

* format medbench

* format

* Update

* update

* update

* update suffix

---------
Co-authored-by: 施晓明 <PJLAB\shixiaoming@pjnl104220118l.pjlab.org>
Co-authored-by: Leymore <zfz-960727@163.com>

08 Jan, 2024 3 commits
- [Sync] Sync with internal codes 2023.01.08 (#777) · 32f40a8f
  Fengzhe Zhou authored Jan 08, 2024
  
- [Feature] Contamination analysis for MMLU, Hellaswag, and ARC_c (#699) · 0b286303
  liyucheng09 authored Jan 08, 2024
```
* Contamination analysis for ARC_c, mmlu, and Hellaswag

* update `eval_contamination.py`

* update `contamination.py` summarizer

* fix `eval_contamination.py`

* add mmlu groups for contamination analysis
```
- [Fix] fix typos in drop prompt (#773) · 11f3b91e
  Yuchen Yan authored Jan 08, 2024
```
Co-authored-by: yanyuchen04 <yanyuchen04@meituan.com>
```
05 Jan, 2024 2 commits

Support Mbpp_plus dataset (#770) · 30a90d8d

Connor-Shen authored Jan 05, 2024

* support mbpp+

* support mbpp+

* minor fix

* [Feat] minor fix

---------
Co-authored-by: yingfhu <yingfhu@gmail.com>

[Feature] add subject ir dataset (#755) · 2163f939
bittersweet1999 authored Jan 05, 2024
```
* add subject ir

* Add ir dataset

* Add ir dataset
```

04 Jan, 2024 1 commit
- [Feature] Add multi_round dataset evaluation (#766) · be369c3e
  bittersweet1999 authored Jan 04, 2024
```
* multi_round dataset

* add multi_round evaluation
```
02 Jan, 2024 1 commit

[Update] Change NeedleInAHaystackDataset to dynamic dataset loading (#754) · 33f8df1c

Mo Li authored Jan 02, 2024

* Add NeedleInAHaystack Test

* Apply pre-commit formatting

* Update configs/eval_hf_internlm_chat_20b_cdme.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* add needle in haystack test

* update needle in haystack test

* update plot function in tools_needleinahaystack.py

* optimizing needleinahaystack dataset generation strategy

* modify minor formatting issues

* add English version support

* change NeedleInAHaystackDataset to dynamic loading

* change NeedleInAHaystackDataset to dynamic loading

* fix needleinahaystack test eval bug

* fix needleinahaystack config bug

---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

01 Jan, 2024 1 commit

[Feature] Add GPQA Dataset (#729) · b69fe234

Francis-llgg authored Jan 01, 2024

* check

* message

* add

* change prompt

* change a para nameq

* modify name of the file

* delete an useless file
