"...composable_kernel.git" did not exist on "85978e0201bb94bf6e59b325e1f5f19266845d08"
Unverified Commit 1e445417 authored by Tong Gao, committed by GitHub

[Enhancement] Test linting in CI and fix existing linting errors (#69)

* [Enhancement] Test linting in CI

* fix linting
parent 9a164489
......@@ -2,4 +2,4 @@
skip = *.ipynb
count =
quiet-level = 3
-ignore-words-list = nd, ans, ques
+ignore-words-list = nd, ans, ques, rouge
name: lint

on: [push, pull_request]

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python 3.10
        uses: actions/setup-python@v2
        with:
          python-version: 3.10
      - name: Install pre-commit hook
        run: |
          pip install pre-commit
          pre-commit install
      - name: Linting
        run: pre-commit run --all-files
......@@ -28,4 +28,4 @@
</g>
</g>
</g>
-</svg>
\ No newline at end of file
+</svg>
......@@ -28,4 +28,4 @@
</g>
</g>
</g>
-</svg>
\ No newline at end of file
+</svg>
......@@ -3,56 +3,56 @@
Although OpenCompass already includes most commonly used datasets, adding support for a new dataset requires the following steps:

1. Add a dataset script `mydataset.py` to the `opencompass/datasets` folder. The script needs to include:

   - The dataset and its loading method. Define a `MyDataset` class that implements a static `load` method returning data of type `datasets.Dataset`. Here we use the huggingface dataset as the unified interface for datasets to avoid introducing extra logic. For example:

     ```python
     import datasets

     from .base import BaseDataset


     class MyDataset(BaseDataset):

         @staticmethod
         def load(**kwargs) -> datasets.Dataset:
             pass
     ```

   - (Optional) If the evaluators already provided by OpenCompass do not meet your needs, define a `MyDatasetlEvaluator` class that implements a `score` method, which takes the `predictions` and `references` lists and produces the required dictionary. Since a dataset may have multiple metrics, the method should return a dictionary of metrics and their corresponding scores. For example:

     ```python
     from opencompass.openicl.icl_evaluator import BaseEvaluator


     class MyDatasetlEvaluator(BaseEvaluator):

         def score(self, predictions: List, references: List) -> dict:
             pass
     ```

   - (Optional) If the post-processing methods already provided by OpenCompass do not meet your needs, define a `mydataset_postprocess` method, which produces the post-processed result from the input string. For example:

     ```python
     def mydataset_postprocess(text: str) -> str:
         pass
     ```

2. After defining the dataset loading, evaluation, and post-processing methods, add the following configuration to the config file:

   ```python
   from opencompass.datasets import MyDataset, MyDatasetlEvaluator, mydataset_postprocess


   mydataset_eval_cfg = dict(
       evaluator=dict(type=MyDatasetlEvaluator),
       pred_postprocessor=dict(type=mydataset_postprocess))

   mydataset_datasets = [
       dict(
           type=MyDataset,
           ...,
           reader_cfg=...,
           infer_cfg=...,
           eval_cfg=mydataset_eval_cfg)
   ]
   ```

   Once the dataset is configured, simply refer to the [Quick Start](../get_started.md) tutorial for the other required configuration files.
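To make the skeletons above concrete, the following is a minimal sketch of how the three pieces might look for a hypothetical JSON-lines dataset with `question`/`answer` fields and an exact-match metric; the file format, field names, metric, and the `opencompass.datasets.base` import path are illustrative assumptions rather than part of the official guide:

```python
# A hypothetical JSON-lines dataset: each line is {"question": ..., "answer": ...}.
import json
from typing import List

import datasets

from opencompass.datasets.base import BaseDataset
from opencompass.openicl.icl_evaluator import BaseEvaluator


class MyDataset(BaseDataset):

    @staticmethod
    def load(path: str) -> datasets.Dataset:
        with open(path, encoding='utf-8') as f:
            rows = [json.loads(line) for line in f]
        return datasets.Dataset.from_list(rows)


class MyDatasetlEvaluator(BaseEvaluator):

    def score(self, predictions: List, references: List) -> dict:
        correct = sum(p == r for p, r in zip(predictions, references))
        # A single exact-match metric, reported as a percentage.
        return {'accuracy': 100 * correct / max(len(references), 1)}


def mydataset_postprocess(text: str) -> str:
    # Keep only the first non-empty line of the model output.
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return ''
```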
......@@ -29,14 +29,12 @@ author = 'OpenCompass Authors'
# The full version, including alpha/beta/rc tags
# version_file = '../../opencompass/version.py'
# def get_version():
# with open(version_file, 'r') as f:
# exec(compile(f.read(), version_file, 'exec'))
# return locals()['__version__']
release = "0.5.0"
release = '0.5.0'
# -- General configuration ---------------------------------------------------
......@@ -141,14 +139,16 @@ latex_elements = {
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
-    (root_doc, 'opencompass.tex', 'OpenCompass Documentation', author, 'manual'),
+    (root_doc, 'opencompass.tex', 'OpenCompass Documentation', author,
+     'manual'),
]
# -- Options for manual page output ------------------------------------------
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
-man_pages = [(root_doc, 'opencompass', 'OpenCompass Documentation', [author], 1)]
+man_pages = [(root_doc, 'opencompass', 'OpenCompass Documentation', [author],
+              1)]
# -- Options for Texinfo output ----------------------------------------------
......@@ -156,8 +156,9 @@ man_pages = [(root_doc, 'opencompass', 'OpenCompass Documentation', [author], 1)
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
-    (root_doc, 'opencompass', 'OpenCompass Documentation', author, 'OpenCompass Authors',
-     'AGI evaluation toolbox and benchmark.', 'Miscellaneous'),
+    (root_doc, 'opencompass', 'OpenCompass Documentation', author,
+     'OpenCompass Authors', 'AGI evaluation toolbox and benchmark.',
+     'Miscellaneous'),
]
# -- Options for Epub output -------------------------------------------------
......
-# Prompt Overview
\ No newline at end of file
+# Prompt Overview
......@@ -12,20 +12,20 @@ class CEvalDataset(BaseDataset):
@staticmethod
def load(path: str, name: str):
-        dev_dataset = load_dataset(
-            'csv',
-            data_files=osp.join(path, 'dev', f'{name}_dev.csv'),
-            split='train')
-        val_dataset = load_dataset(
-            'csv',
-            data_files=osp.join(path, 'val', f'{name}_val.csv'),
-            split='train')
+        dev_dataset = load_dataset('csv',
+                                   data_files=osp.join(path, 'dev',
+                                                       f'{name}_dev.csv'),
+                                   split='train')
+        val_dataset = load_dataset('csv',
+                                   data_files=osp.join(path, 'val',
+                                                       f'{name}_val.csv'),
+                                   split='train')
val_dataset = val_dataset.add_column('explanation',
[''] * len(val_dataset))
-        test_dataset = load_dataset(
-            'csv',
-            data_files=osp.join(path, 'test', f'{name}_test.csv'),
-            split='train')
+        test_dataset = load_dataset('csv',
+                                    data_files=osp.join(
+                                        path, 'test', f'{name}_test.csv'),
+                                    split='train')
test_dataset = test_dataset.add_column(
'answer',
[''] * len(test_dataset)).add_column('explanation',
......
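The hunk above is purely a re-wrap of the `load_dataset` calls: each C-Eval split is read from a per-subject CSV at `<path>/<split>/<name>_<split>.csv`, and the val/test splits get placeholder columns added. A small sketch of that path convention with illustrative values:

```python
import os.path as osp

# Illustrative values; the actual data root and subject name depend on
# your local C-Eval copy.
path, name = 'data/ceval', 'computer_network'

# The loader above reads one CSV per split from
# <path>/<split>/<name>_<split>.csv.
for split in ('dev', 'val', 'test'):
    print(split, '->', osp.join(path, split, f'{name}_{split}.csv'))
```

The hunk is truncated before the method's return statement, so how the three splits are finally packaged (for example into a `DatasetDict`) is not shown here.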
......@@ -16,8 +16,9 @@ class FloresFirst100Dataset(BaseDataset):
'dev':
load_dataset(path='facebook/flores', name=name, split='dev'),
'devtest':
-            load_dataset(
-                path='facebook/flores', name=name, split='devtest[:100]')
+            load_dataset(path='facebook/flores',
+                         name=name,
+                         split='devtest[:100]')
})
......
......@@ -3,6 +3,7 @@ import json
from datasets import Dataset
from opencompass.registry import LOAD_DATASET
from .base import BaseDataset
......@@ -17,4 +18,4 @@ class SummeditsDataset_V2(BaseDataset):
line = json.loads(line)
line['label'] = 'BA'[line['label']]
dataset.append(line)
-        return Dataset.from_list(dataset)
\ No newline at end of file
+        return Dataset.from_list(dataset)
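The `'BA'[line['label']]` expression kept as context in this hunk can be easy to misread: it indexes a two-character string to turn an integer label into a letter choice, as this tiny sketch shows:

```python
# 'BA'[label] turns an integer label into a letter choice:
#   label 0 -> 'B', label 1 -> 'A'
for label in (0, 1):
    print(label, '->', 'BA'[label])
```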
......@@ -142,7 +142,8 @@ class HuggingFace(BaseModel):
tokens = self.tokenizer.batch_encode_plus(inputs,
padding=True,
truncation=True,
-                                                  max_length=self.max_seq_len - max_out_len)
+                                                  max_length=self.max_seq_len -
+                                                  max_out_len)
tokens = {
k: torch.tensor(np.array(tokens[k]), device=self.model.device)
for k in tokens if k in ['input_ids', 'attention_mask']
......@@ -180,10 +181,10 @@ class HuggingFace(BaseModel):
input_ids = self.tokenizer(inputs,
truncation=True,
-                                   max_length=self.max_seq_len - max_out_len)['input_ids']
+                                   max_length=self.max_seq_len -
+                                   max_out_len)['input_ids']
input_ids = torch.tensor(input_ids, device=self.model.device)
-        outputs = self.model.generate(input_ids,
-                                      max_new_tokens=max_out_len)
+        outputs = self.model.generate(input_ids, max_new_tokens=max_out_len)
if not self.extract_pred_after_decode:
outputs = outputs[:, input_ids.shape[1]:]
......
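Both hunks only re-wrap long lines in the HuggingFace wrapper's generation path: tokenize with room reserved for `max_out_len` new tokens, call `generate`, then drop the prompt portion of the output. Below is a hedged, standalone sketch of that pattern using plain `transformers` directly (the model name and prompts are placeholders), not the OpenCompass wrapper itself:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'gpt2'  # placeholder; any causal LM follows the same flow
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

max_seq_len, max_out_len = 1024, 32
inputs = ['OpenCompass is', 'The capital of France is']

# gpt2 has no pad token; reuse EOS and pad on the left so generation
# starts right after each prompt.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = 'left'

# Reserve room for the continuation, mirroring
# max_length=self.max_seq_len - max_out_len in the wrapper.
tokens = tokenizer(inputs,
                   return_tensors='pt',
                   padding=True,
                   truncation=True,
                   max_length=max_seq_len - max_out_len)

with torch.no_grad():
    outputs = model.generate(**tokens, max_new_tokens=max_out_len)

# Drop the prompt tokens so only the newly generated text remains.
outputs = outputs[:, tokens['input_ids'].shape[1]:]
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```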
-from .icl_dataset_reader import DatasetReader
-from .icl_evaluator import *
-from .icl_prompt_template import PromptTemplate
-from .icl_retriever import *
-from .icl_inferencer import *
+from .icl_dataset_reader import DatasetReader  # noqa
+from .icl_evaluator import *  # noqa
+from .icl_inferencer import *  # noqa
+from .icl_prompt_template import PromptTemplate  # noqa
+from .icl_retriever import *  # noqa
-from .icl_aucroc_evaluator import AUCROCEvaluator
-from .icl_base_evaluator import BaseEvaluator
-from .icl_em_evaluator import EMEvaluator
+from .icl_aucroc_evaluator import AUCROCEvaluator  # noqa
+from .icl_base_evaluator import BaseEvaluator  # noqa
+from .icl_em_evaluator import EMEvaluator  # noqa
from .icl_hf_evaluator import * # noqa
-from .icl_toxic_evaluator import ToxicEvaluator
+from .icl_toxic_evaluator import ToxicEvaluator  # noqa
from typing import List
import numpy as np
from sklearn.metrics import roc_auc_score
......@@ -10,7 +11,7 @@ from .icl_base_evaluator import BaseEvaluator
@ICL_EVALUATORS.register_module()
class AUCROCEvaluator(BaseEvaluator):
"""Calculate AUC-ROC scores and accuracy according the prediction.
For some dataset, the accuracy cannot reveal the difference between
models because of the saturation. AUC-ROC scores can further exam
model abilities to distinguish different labels. More details can refer to
......
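The docstring hunk above is only re-wrapped; the metric itself comes from sklearn's `roc_auc_score` (imported a few lines earlier), which expects ground-truth labels and predicted probabilities for the positive class. A small self-contained sketch with made-up numbers:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# references: binary ground-truth labels.
# predictions: predicted probability of the positive class.
references = np.array([0, 0, 1, 1])
predictions = np.array([0.1, 0.4, 0.35, 0.8])

print(f'AUC-ROC: {roc_auc_score(references, predictions):.3f}')  # 0.750
```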
"""Base Evaluator."""
from typing import List
class BaseEvaluator:
def __init__(self) -> None:
pass
......
from typing import List
import evaluate
from opencompass.registry import ICL_EVALUATORS
......@@ -54,9 +55,12 @@ class HuggingfaceEvaluator(BaseEvaluator):
dict: calculated scores.
"""
if len(predictions) != len(references):
-            return {'error': 'predictions and references have different '
+            return {
+                'error':
+                'predictions and references have different '
f'length. len(predictions): {len(predictions)}, '
-                    f'len(references): {len(references)}'}
+                f'len(references): {len(references)}'
+            }
metric = evaluate.load(self.metric)
scores = metric.compute(**self._preprocess(predictions, references))
return self._postprocess(scores)
......@@ -103,7 +107,7 @@ class AccEvaluator(HuggingfaceEvaluator):
Returns:
dict: postprocessed scores.
"""
scores["accuracy"] *= 100
scores['accuracy'] *= 100
return scores
......@@ -150,7 +154,7 @@ class MccEvaluator(AccEvaluator):
Returns:
dict: postprocessed scores.
"""
scores["matthews_correlation"] *= 100
scores['matthews_correlation'] *= 100
return scores
......
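These evaluators delegate to the `evaluate` library (`evaluate.load(self.metric)` followed by `metric.compute(...)`) and then rescale scores to percentages. A sketch of the same flow using the library directly, with made-up predictions, rather than the evaluator classes themselves:

```python
import evaluate

predictions = ['A', 'B', 'C', 'A']
references = ['A', 'B', 'D', 'A']

# Mirror HuggingfaceEvaluator.score: refuse mismatched lengths, compute the
# metric via the evaluate library, then rescale to a percentage the way
# AccEvaluator's postprocessing does.
assert len(predictions) == len(references)

# The 'accuracy' metric expects label ids, so map the strings to indices.
labels = sorted(set(predictions) | set(references))
metric = evaluate.load('accuracy')
scores = metric.compute(
    predictions=[labels.index(p) for p in predictions],
    references=[labels.index(r) for r in references])
scores['accuracy'] *= 100
print(scores)  # {'accuracy': 75.0}
```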
-from .icl_base_inferencer import BaseInferencer
-from .icl_gen_inferencer import GenInferencer
-from .icl_ppl_inferencer import PPLInferencer
-from .icl_clp_inferencer import CLPInferencer
+from .icl_base_inferencer import BaseInferencer  # noqa
+from .icl_clp_inferencer import CLPInferencer  # noqa
+from .icl_gen_inferencer import GenInferencer  # noqa
+from .icl_ppl_inferencer import PPLInferencer  # noqa
......@@ -98,8 +98,8 @@ class CLPInferencer(BaseInferencer):
# 3. Generate in-context examples for testing inputs
for idx in range(len(ice_idx_list)):
ice.append(
-                retriever.generate_ice(
-                    ice_idx_list[idx], ice_template=ice_template))
+                retriever.generate_ice(ice_idx_list[idx],
+                                       ice_template=ice_template))
output_handler.save_ice(ice)
# 4. Collect prompts and calculate conditional log probs
......@@ -165,11 +165,10 @@ class CLPInferencer(BaseInferencer):
choice_target_ids.append(prompt_token_num - 1)
logger.info('Calculating conditional log probability for prompts.')
-        for idx in trange(
-                0,
-                len(prompt_list),
-                self.batch_size,
-                disable=not self.is_main_process):
+        for idx in trange(0,
+                          len(prompt_list),
+                          self.batch_size,
+                          disable=not self.is_main_process):
sub_prompt_list = prompt_list[idx:idx + self.batch_size]
sub_choice_target_ids = choice_target_ids[idx:idx +
self.batch_size]
......
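The change above only re-wraps the `trange(...)` call; functionally the loop still walks the prompt list in `batch_size` chunks with a tqdm progress bar. A standalone sketch of the same batching pattern with stand-in data:

```python
from tqdm import trange

prompt_list = [f'prompt {i}' for i in range(10)]  # stand-in data
choice_target_ids = list(range(10))
batch_size = 4

for idx in trange(0, len(prompt_list), batch_size, disable=False):
    # One batch of prompts plus the matching target-token positions.
    sub_prompt_list = prompt_list[idx:idx + batch_size]
    sub_choice_target_ids = choice_target_ids[idx:idx + batch_size]
    print(len(sub_prompt_list), len(sub_choice_target_ids))
```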
......@@ -25,7 +25,7 @@ class PPLInferencer(BaseInferencer):
model (:obj:`BaseModel`, optional): The module to inference.
max_seq_len (:obj:`int`): Maximum number of tokenized words allowed by
the LM.
-        batch_size (:obj:`int`, optional): Batch size for the :obj:`DataLoader`.
+        batch_size (:obj:`int`, optional): Batch size for the :obj:`DataLoader`
output_json_filepath (:obj:`str`, optional): File path for output
`JSON` file.
output_json_filename (:obj:`str`, optional): File name for output
......@@ -126,7 +126,7 @@ class PPLInferencer(BaseInferencer):
label,
ice_template=ice_template,
prompt_template=prompt_template)
-                    prompt_token_num = self.model.get_token_len_from_template(
+                    prompt_token_num = self.model.get_token_len_from_template(  # noqa
prompt, mode='ppl') # noqa
if normalizing_str is not None:
......
-from .icl_base_retriever import BaseRetriever
-from .icl_bm25_retriever import BM25Retriever
-from .icl_dpp_retriever import DPPRetriever
-from .icl_fix_k_retriever import FixKRetriever
-from .icl_mdl_retriever import MDLRetriever
-from .icl_random_retriever import RandomRetriever
-from .icl_topk_retriever import TopkRetriever
-from .icl_votek_retriever import VotekRetriever
-from .icl_zero_retriever import ZeroRetriever
+from .icl_base_retriever import BaseRetriever  # noqa
+from .icl_bm25_retriever import BM25Retriever  # noqa
+from .icl_dpp_retriever import DPPRetriever  # noqa
+from .icl_fix_k_retriever import FixKRetriever  # noqa
+from .icl_mdl_retriever import MDLRetriever  # noqa
+from .icl_random_retriever import RandomRetriever  # noqa
+from .icl_topk_retriever import TopkRetriever  # noqa
+from .icl_votek_retriever import VotekRetriever  # noqa
+from .icl_zero_retriever import ZeroRetriever  # noqa