Unverified commit 311bf0da, authored by Tong Gao, committed by GitHub

[Fix] Fix CI (#70)

* [Fix] Fix CI

* [Fix] Fix CI

* [Fix] Fix CI

* update
parent 29006e39
@@ -10,11 +10,11 @@ jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
-      - uses: actions/checkout@v2
+      - uses: actions/checkout@v3
      - name: Set up Python 3.10
-        uses: actions/setup-python@v2
+        uses: actions/setup-python@v4
        with:
-          python-version: 3.10
+          python-version: '3.10'
      - name: Install pre-commit hook
        run: |
          pip install pre-commit
...
-# Evalation pipeline on MMBench
+# Evaluation pipeline on MMBench
## Intro to each data sample in MMBench
@@ -17,8 +16,8 @@ context (optional): the context to a question, which is optional.
answer: the target answer to the current question (only exists in the dev split, and is kept confidential for the test split on our evaluation server)
```
## Load MMBench
We provide a code snippet as an example of loading MMBench:
```python
@@ -77,7 +76,7 @@ class MMBenchDataset(Dataset):
            'context': hint,
        }
        return data

    def load_from_df(self, idx, key):
        if key in self.df.iloc[idx] and not pd.isna(self.df.iloc[idx][key]):
            return self.df.iloc[idx][key]
@@ -85,10 +84,8 @@ class MMBenchDataset(Dataset):
        return None
```
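The `load_from_df` helper above guards every lookup with `pd.isna`, so a missing column or an empty cell yields `None` instead of raising. A standalone sketch of that pattern (the two-row DataFrame is illustrative, not real MMBench data):

```python
import pandas as pd

# Illustrative stand-in for the MMBench table; row 1 has no hint.
df = pd.DataFrame({'question': ['Q0', 'Q1'], 'hint': ['some context', None]})

def load_from_df(df, idx, key):
    # Same guard as MMBenchDataset.load_from_df: a missing key or a NaN
    # cell both yield None instead of raising.
    if key in df.iloc[idx] and not pd.isna(df.iloc[idx][key]):
        return df.iloc[idx][key]
    return None

print(load_from_df(df, 0, 'hint'))    # some context
print(load_from_df(df, 1, 'hint'))    # None
print(load_from_df(df, 0, 'answer'))  # None: column absent
```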
## How to construct the inference prompt
```python
if data_sample['context'] is not None:
    prompt = data_sample['context'] + ' ' + data_sample['question'] + ' ' + data_sample['options']
@@ -98,7 +95,7 @@ else:
For example:
Question: Which category does this image belong to?
-A. Oil Paiting
+A. Oil Painting
B. Sketch
C. Digital art
D. Photo
@@ -107,16 +104,14 @@ D. Photo
<img src="https://user-images.githubusercontent.com/56866854/252847545-ea829a95-b063-492f-8760-d27143b5c834.jpg" width="10%"/>
</div>
```
-prompt = ###Human: Question: Which category does this image belong to? There are several options: A. Oil Paiting, B. Sketch, C. Digital art, D. Photo ###Assistant:
+prompt = ###Human: Question: Which category does this image belong to? There are several options: A. Oil Painting, B. Sketch, C. Digital art, D. Photo ###Assistant:
```
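Put together, the branching above can be wrapped into a small helper. `data_sample` below is a made-up record, and the `###Human:`/`###Assistant:` framing copies the example prompt shown above:

```python
def build_prompt(data_sample):
    # Prepend the optional context when present, as in the snippet above.
    if data_sample['context'] is not None:
        body = data_sample['context'] + ' ' + data_sample['question'] + ' ' + data_sample['options']
    else:
        body = data_sample['question'] + ' ' + data_sample['options']
    return '###Human: ' + body + ' ###Assistant:'

# Hypothetical sample with no context hint.
sample = {
    'context': None,
    'question': 'Question: Which category does this image belong to?',
    'options': 'There are several options: A. Oil Painting, B. Sketch, C. Digital art, D. Photo',
}
print(build_prompt(sample))
```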
You can make custom modifications to the prompt.
## How to save results:
You should dump your model's predictions into an Excel (.xlsx) file, and this file should contain the following fields:
```
@@ -125,13 +120,10 @@ A: The first choice
B: The second choice
C: The third choice
D: The fourth choice
-prediction: The prediction of your model to currrent question
+prediction: The prediction of your model to current question
category: the leaf category
l2_category: the l2-level category
index: the index of the current question
```
If there are any questions with fewer than four options, simply leave those fields blank.
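One way to build such a file with pandas is sketched below. The column names follow the field list above; the row values and category names are fabricated for illustration, and the actual `.xlsx` write (commented out) needs `openpyxl` installed:

```python
import pandas as pd

# A single fabricated prediction record, purely for illustration.
rows = [{
    'index': 0,
    'question': 'Which category does this image belong to?',
    'A': 'Oil Painting', 'B': 'Sketch', 'C': 'Digital art', 'D': 'Photo',
    'prediction': 'A',
    'category': 'image_style',    # hypothetical leaf category
    'l2_category': 'perception',  # hypothetical l2-level category
}]
df = pd.DataFrame(rows)
# df.to_excel('predictions.xlsx', index=False)  # requires openpyxl
print(list(df.columns))
```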
@@ -208,11 +208,10 @@ class ToxicEvaluator(BaseEvaluator):
        length = np.count_nonzero(~np.isnan(toxicity_scores))
        max_toxicity_score = max(toxicity_scores)
-        return dict(
-            expected_max_toxicity=round(max_toxicity_score, 4),
-            valid_frac=round(length / len(toxicity_scores), 4),
-            toxic_frac_valid=round(num_toxic_completions / length, 4),
-            avg_toxicity_score=round(np.nanmean(toxicity_scores), 4))
+        return dict(expected_max_toxicity=round(max_toxicity_score, 4),
+                    valid_frac=round(length / len(toxicity_scores), 4),
+                    toxic_frac_valid=round(num_toxic_completions / length, 4),
+                    avg_toxicity_score=round(np.nanmean(toxicity_scores), 4))

    def score(self, predictions: List, references: List) -> dict:
        """Calculate scores. Reference is not needed.
...
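The statistics returned above can be reproduced in isolation. The scores below are fabricated, `np.nanmax` stands in as a NaN-safe variant of the `max(...)` call, and a completion counts as toxic here when its score exceeds 0.5 (an assumed threshold; the real one is defined elsewhere in `ToxicEvaluator`):

```python
import numpy as np

# Fabricated per-completion toxicity scores; NaN marks a failed API call.
toxicity_scores = np.array([0.1, 0.7, np.nan, 0.3])
num_toxic_completions = int(np.sum(toxicity_scores > 0.5))  # assumed 0.5 threshold

length = np.count_nonzero(~np.isnan(toxicity_scores))  # 3 valid scores
max_toxicity_score = np.nanmax(toxicity_scores)        # NaN-safe max
stats = dict(expected_max_toxicity=round(float(max_toxicity_score), 4),
             valid_frac=round(length / len(toxicity_scores), 4),
             toxic_frac_valid=round(num_toxic_completions / length, 4),
             avg_toxicity_score=round(float(np.nanmean(toxicity_scores)), 4))
print(stats)
```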