## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>
- **\[2023.08.18\]** We now support evaluation for **multi-modality learning**, including **MMBench, SEED-Bench, COCO-Caption, Flickr-30K, OCR-VQA, ScienceQA**, and more. A leaderboard is on the way. Feel free to try multi-modality evaluation with OpenCompass! 🔥🔥🔥
- **\[2023.08.18\]** [Dataset card](https://opencompass.org.cn/dataset-detail/MMLU) is now online. New evaluation benchmarks are welcome to join OpenCompass! 🔥🔥🔥
- **\[2023.08.11\]** [Model comparison](https://opencompass.org.cn/model-compare/GPT-4,ChatGPT,LLaMA-2-70B,LLaMA-65B) is now online. We hope this feature offers deeper insights! 🔥🔥🔥
- **\[2023.08.11\]** We have supported [LEval](https://github.com/OpenLMLab/LEval). 🔥🔥🔥
- **\[2023.08.10\]** OpenCompass is compatible with [LMDeploy](https://github.com/InternLM/lmdeploy). Follow these [instructions](https://opencompass.readthedocs.io/en/latest/advanced_guides/evaluation_turbomind.html#) to evaluate models accelerated by **TurboMind**.
- **\[2023.08.10\]** We have supported [Qwen-7B](https://github.com/QwenLM/Qwen-7B) and [XVERSE-13B](https://github.com/xverse-ai/XVERSE-13B)! Check our [leaderboard](https://opencompass.org.cn/leaderboard-llm) for the results! More models are welcome to join OpenCompass.
- **\[2023.08.09\]** Several new datasets (**CMMLU, TydiQA, SQuAD2.0, DROP**) have been added to our [leaderboard](https://opencompass.org.cn/leaderboard-llm)! More datasets are welcome to join OpenCompass.
- **\[2023.08.07\]** We have added a [script](tools/eval_mmbench.py) for users to evaluate the inference results of [MMBench](https://opencompass.org.cn/MMBench)-dev.
- **\[2023.08.05\]** We have supported [GPT-4](https://openai.com/gpt-4)! Check our [leaderboard](https://opencompass.org.cn/leaderboard-llm) for the results! More models are welcome to join OpenCompass.
- **\[2023.07.27\]** We have supported [CMMLU](https://github.com/haonan-li/CMMLU)! More datasets are welcome to join OpenCompass.
- **\[2023.07.21\]** Llama-2 results are now available on the [OpenCompass leaderboard](https://opencompass.org.cn/leaderboard-llm)!
- **\[2023.07.13\]** We release [MMBench](https://opencompass.org.cn/MMBench), a meticulously curated dataset that comprehensively evaluates the diverse abilities of multimodal models.
## ✨ Introduction
For more tutorials, please check our [Documentation](https://opencompass.readthedocs.io/en/latest/index.html).
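If you just want a feel for how an evaluation is wired together before diving into the docs, here is a minimal config sketch in the OpenCompass style. The imported module paths and the exported names (`cmmlu_datasets`, `models`) are illustrative assumptions; browse the `configs/` directory of the repository for the actual dataset and model configs.

```python
# minimal_eval_config.py -- a minimal, hypothetical OpenCompass config sketch.
# The relative module paths below are assumptions for illustration only;
# see configs/ in the repository for the real dataset/model config files.
from mmengine.config import read_base

with read_base():
    # Reuse a prepared dataset config and a HuggingFace model config.
    from .datasets.cmmlu.cmmlu_ppl import cmmlu_datasets
    from .models.hf_llama_7b import models

# run.py reads these top-level variables to build the evaluation tasks.
datasets = [*cmmlu_datasets]
```

A config like this is typically launched with something like `python run.py minimal_eval_config.py -w outputs/demo`, where `-w` sets the working directory for results.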
## 🔜 Roadmap
- [ ] Subjective Evaluation
  - [ ] Release CompassArena.
  - [ ] Subjective evaluation dataset.
- [ ] Long-context
  - [ ] Long-context evaluation with extensive datasets.
  - [ ] Long-context leaderboard.
- [ ] Coding
  - [ ] Coding evaluation leaderboard.
  - [ ] Non-Python language evaluation service.
- [ ] Agent
  - [ ] Support various agent frameworks.
  - [ ] Evaluation of LLM tool use.
- [ ] Robustness
  - [ ] Support various attack methods.
## 👷‍♂️ Contributing
We appreciate all contributions to improve OpenCompass. Please refer to the [contributing guideline](https://opencompass.readthedocs.io/en/latest/notes/contribution_guide.html) for best practices.