- 27 May, 2024 1 commit
-
-
jxd authored
* support CHARM (https://github.com/opendatalab/CHARM) reasoning tasks * fix lint error * add dataset card for CHARM * minor refactor * add txt --------- Co-authored-by: wujiang <wujiang@pjlab.org.cn> Co-authored-by: Leymore <zfz-960727@163.com>
-
- 24 May, 2024 4 commits
-
-
bittersweet1999 authored
-
bittersweet1999 authored
-
yaoyingyy authored
Co-authored-by: yaoying <yaoying@kingsoft.com>
-
klein authored
Fix the bug in drop_gen: wrong import
-
- 21 May, 2024 3 commits
-
-
bittersweet1999 authored
-
liushz authored
* Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Fix Llama-3 meta template * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Update accelerator * Update MathBench --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
-
Fengzhe Zhou authored
-
- 20 May, 2024 1 commit
-
-
zhulinJulia24 authored
-
- 17 May, 2024 1 commit
-
-
Fengzhe Zhou authored
-
- 16 May, 2024 1 commit
-
-
zhulinJulia24 authored
* Update pr-run-test.yml * Update daily-run-test.yml * Update daily-run-test.yml * Update pr-run-test.yml * Update daily-run-test.yml * Update daily-run-test.yml * Update daily-run-test.yml * Update daily-run-test.yml * Update oc_score_baseline.yaml * Update daily-run-test.yml * Update oc_score_assert.py --------- Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
-
- 15 May, 2024 5 commits
-
-
Fengzhe Zhou authored
* enable HuggingFacewithChatTemplate with --accelerator via cli * rm vllm_internlm2_chat_7b
-
liushz authored
* Update accelerator * update run --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn> Co-authored-by: Fengzhe Zhou <zfz-960727@163.com>
-
Fengzhe Zhou authored
-
Fengzhe Zhou authored
-
bittersweet1999 authored
Co-authored-by: Leymore <zfz-960727@163.com>
-
- 14 May, 2024 4 commits
-
-
Fengzhe Zhou authored
-
Fengzhe Zhou authored
-
Xu Song authored
* [Feat] Support dataset_suffix check for mixed configs * update mixed suffix * update suffix --------- Co-authored-by: Leymore <zfz-960727@163.com>
-
Fengzhe Zhou authored
* add TheoremQA with 5-shot * add huggingface_above_v4_33 classes * use num_worker partitioner in cli * update theoremqa * update TheoremQA * add TheoremQA * rename theoremqa -> TheoremQA * update TheoremQA output path * rewrite many model configs * update huggingface * further update * refine configs * update configs * update configs * add configs/eval_llama3_instruct.py * add summarizer multi faceted * update bbh datasets * update configs/models/hf_llama/lmdeploy_llama3_8b_instruct.py * rename class * update readme * update hf above v4.33
-
- 13 May, 2024 2 commits
-
-
Mo Li authored
* update few-shot example * add 128k
-
bittersweet1999 authored
-
- 11 May, 2024 1 commit
-
-
bittersweet1999 authored
* fix alpacaeval * fix alpacaeval
-
- 09 May, 2024 2 commits
-
-
Fengzhe Zhou authored
(cherry picked from commit 4beb6d9ab655d8a626971841b7acfd9fae9d438f) Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
-
Alexander Lam authored
* added qwen moe and mixtral 8x22 model configs * updated README files news section
-
- 08 May, 2024 3 commits
-
-
Mo Li authored
* update needleinahaystack eval docs * update needlebench summarizer * fix english docs typo
-
bittersweet1999 authored
-
JuhaoLiang authored
* add AceGPT-MMLUArabic benchmark * update readme and fix lint issue * remove unused package * add MMLUArabic zero-shot settings * rename filename and update readme
-
- 06 May, 2024 5 commits
-
-
Fangyu Lei authored
* s3eval_branch * update s3eval
-
Xu Song authored
* [Fix] Fix AGIEval chinese sets * Create agieval_gen_617738.py * [Fix] Fix AGIEval chinese sets * Restore agieval_gen_64afd3.py * Update agieval_gen.py * Create agieval_mixed_0fa998.py * Update agieval_mixed.py
-
Yggdrasill7D6 authored
* add mgsm datasets * fix lint * fix lint * update mgsm * update mgsm * ease code spell * update * update * update --------- Co-authored-by: Leymore <zfz-960727@163.com>
-
klein authored
* [Feature] update drop dataset from openai simple eval * update drop template presentation * update --------- Co-authored-by: Leymore <zfz-960727@163.com>
-
Fengzhe Zhou authored
* add mmlu prompt from simple_evals, openai * return empty str on failure
-
- 30 Apr, 2024 3 commits
-
-
Yang Yong authored
-
Fengzhe Zhou authored
-
Alexander Lam authored
* fixed formatting based on pre-commit tests * fixed typo in comments; reduced the number of models in the eval config * fixed a bug in LLMCompressionDataset, where setting samples=None would result in passing test[:None] to load_dataset * removed unnecessary variable in _format_table_pivot; changed lark_reporter message to English
-
- 29 Apr, 2024 3 commits
-
-
Ikko Eltociear Ashimine authored
requiresments -> requirements
-
bittersweet1999 authored
-
Songyang Zhang authored
* [Update] Update performance of common benchmarks * [Update] Update performance of common benchmarks * [Update] Update performance of common benchmarks
-
- 28 Apr, 2024 1 commit
-
-
liushz authored
* Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Fix Llama-3 meta template * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
-