"examples/dreambooth/train_dreambooth_lora_sdxl.py" did not exist on "bfdffbea32b33e043d1dcd26ad3c545a8b4a2c5e"
Commit 9ae96cdf authored by ZoneTwelve, committed by GitHub

TMMLU+ implementation (#1394)



* implementation of TMMLU+

* implemented: TMMLU+

**TMMLU+: Large-Scale Traditional Chinese Massive Multitask Language Understanding**

- 4 categories
    - STEM
    - Social Science
    - Humanities
    - Other

The TMMLU+ dataset, encompassing over 67 subjects and 20,160 tasks, is six times larger and more balanced than its predecessor, TMMLU. It includes benchmark results from closed-source models and from 20 open-weight Chinese large language models with 1.8B to 72B parameters. However, Traditional Chinese variants continue to underperform compared to major Simplified Chinese models.

```markdown
Total number of tasks in the 'test' sets: 20160
Total number of tasks in the 'validation' sets: 2247
Total number of tasks in the 'train' sets: 335
```

* Remove print from __init__.py

I mistakenly left a debug print in the code and forgot to remove it.

* update: move TMMLU+ config generation program into default

* fix: use the training set for few-shot examples

* update: README for TMMLU+

* update: small changes to the TMMLU+ README file

* pre-commit run through

* Add README for TMMLU+ dataset

* run precommit

* trigger precommit again

* trigger precommit again

* isort is fussy

* isort is fussy

* format, again

* oops

* oops

---------
Co-authored-by: lintang <lintang@eleuther.ai>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
parent ff24e992
"dataset_name": "three_principles_of_people"
"description": "以下為三民主義的單選題,請提供正確答案的選項。\n\n"
"group": "tmmluplus_social_sciences"
"group_alias": "social sciences"
"include": "_default_template_yaml"
"task": "tmmluplus_three_principles_of_people"
"task_alias": "three principles of people"
"dataset_name": "trade"
"description": "以下為貿易的單選題,請提供正確答案的選項。\n\n"
"group": "tmmluplus_other"
"group_alias": "other"
"include": "_default_template_yaml"
"task": "tmmluplus_trade"
"task_alias": "trade"
"dataset_name": "traditional_chinese_medicine_clinical_medicine"
"description": "以下為中醫臨床醫學的單選題,請提供正確答案的選項。\n\n"
"group": "tmmluplus_other"
"group_alias": "other"
"include": "_default_template_yaml"
"task": "tmmluplus_traditional_chinese_medicine_clinical_medicine"
"task_alias": "traditional chinese medicine clinical medicine"
"dataset_name": "trust_practice"
"description": "以下為信託實務的單選題,請提供正確答案的選項。\n\n"
"group": "tmmluplus_humanities"
"group_alias": "humanities"
"include": "_default_template_yaml"
"task": "tmmluplus_trust_practice"
"task_alias": "trust practice"
"dataset_name": "ttqav2"
"description": "以下為台灣在地用語的單選題,請提供正確答案的選項。\n\n"
"group": "tmmluplus_social_sciences"
"group_alias": "social sciences"
"include": "_default_template_yaml"
"task": "tmmluplus_ttqav2"
"task_alias": "ttqav2"
"dataset_name": "tve_chinese_language"
"description": "以下為統測國文的單選題,請提供正確答案的選項。\n\n"
"group": "tmmluplus_social_sciences"
"group_alias": "social sciences"
"include": "_default_template_yaml"
"task": "tmmluplus_tve_chinese_language"
"task_alias": "tve chinese language"
"dataset_name": "tve_design"
"description": "以下為統測 設計的單選題,請提供正確答案的選項。\n\n"
"group": "tmmluplus_other"
"group_alias": "other"
"include": "_default_template_yaml"
"task": "tmmluplus_tve_design"
"task_alias": "tve design"
"dataset_name": "tve_mathematics"
"description": "以下為統測數學的單選題,請提供正確答案的選項。\n\n"
"group": "tmmluplus_STEM"
"group_alias": "STEM"
"include": "_default_template_yaml"
"task": "tmmluplus_tve_mathematics"
"task_alias": "tve mathematics"
"dataset_name": "tve_natural_sciences"
"description": "以下為統測自然科的單選題,請提供正確答案的選項。\n\n"
"group": "tmmluplus_STEM"
"group_alias": "STEM"
"include": "_default_template_yaml"
"task": "tmmluplus_tve_natural_sciences"
"task_alias": "tve natural sciences"
"dataset_name": "veterinary_pathology"
"description": "以下為獸醫病理學的單選題,請提供正確答案的選項。\n\n"
"group": "tmmluplus_other"
"group_alias": "other"
"include": "_default_template_yaml"
"task": "tmmluplus_veterinary_pathology"
"task_alias": "veterinary pathology"
"dataset_name": "veterinary_pharmacology"
"description": "以下為獸醫藥理學的單選題,請提供正確答案的選項。\n\n"
"group": "tmmluplus_other"
"group_alias": "other"
"include": "_default_template_yaml"
"task": "tmmluplus_veterinary_pharmacology"
"task_alias": "veterinary pharmacology"
import datasets


def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:
    def _helper(doc):
        # modifies the contents of a single
        # document in our dataset.
        answer_list = ["A", "B", "C", "D"]
        out_doc = {
            "questions": doc["question"],
            "choices": [doc["A"], doc["B"], doc["C"], doc["D"]],
            "goal": answer_list.index(doc["answer"]),
        }
        return out_doc

    return dataset.map(_helper)  # returns back a datasets.Dataset object
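To see what `_helper` produces without loading the dataset, the same mapping can be applied to a single hand-written record. This is a minimal sketch: `to_multiple_choice` and the sample row are made up for illustration, and only the field names come from `process_docs` above.

```python
def to_multiple_choice(doc: dict) -> dict:
    """Mirror of `_helper`: collect the four options and index the answer."""
    answer_list = ["A", "B", "C", "D"]
    return {
        "questions": doc["question"],
        "choices": [doc[letter] for letter in answer_list],
        # Position of the gold letter: "A" -> 0, ..., "D" -> 3.
        "goal": answer_list.index(doc["answer"]),
    }


# Made-up record with the same columns the helper reads.
sample = {"question": "1 + 1 = ?", "A": "1", "B": "2", "C": "3", "D": "4", "answer": "B"}
result = to_multiple_choice(sample)
print(result["choices"], result["goal"])
```

Note that `answer_list.index` raises `ValueError` for any answer outside `A` to `D`, so malformed rows fail loudly instead of being silently mislabeled.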
subject name category
dentistry 牙醫學 health
traditional_chinese_medicine_clinical_medicine 中醫臨床醫學 health
clinical_psychology 臨床心理學 psychology
technical 技術工相關 other
culinary_skills 餐旅 other
mechanical 機械與機電概論 other
logic_reasoning 邏輯思維 other
real_estate 房地產 other
general_principles_of_law 法學大意 law
finance_banking 金融與法規 business
anti_money_laundering 洗錢防制 law
ttqav2 台灣在地用語 culture
marketing_management 行銷管理 other
business_management 企業管理 other
organic_chemistry 有機化學 chemistry
advance_chemistry 化學 chemistry
physics 物理 physics
secondary_physics 高中物理 physics
human_behavior 人類行為與社會 psychology
national_protection 軍事 politics
jce_humanities 指考人文科目 philosophy
linear_algebra 線代 math
politic_science 政治 politics
agriculture 農業 other
official_document_management 機關文書 other
financial_analysis 財務分析 business
pharmacy 藥劑學 biology
educational_psychology 教育心理 psychology
statistics_and_machine_learning 統計與機器學習 engineering
management_accounting 管理會計 business
introduction_to_law 法律概論 law
computer_science 資訊工程 computer science
veterinary_pathology 獸醫病理學 health
accounting 會計學 business
fire_science 火災學 other
optometry 視光學 other
insurance_studies 保險學 other
pharmacology 藥理學 health
taxation 稅務 law
education_(profession_level) 教育專業 education
economics 經濟學 economics
veterinary_pharmacology 獸醫藥理學 health
nautical_science 航海 other
occupational_therapy_for_psychological_disorders 心理障礙職能治療學 psychology
trust_practice 信託實務 law
geography_of_taiwan 台灣地理 geography
physical_education 體育 education
auditing 審計學 business
administrative_law 行政法 law
basic_medical_science 基礎醫學 biology
macroeconomics 總經 economics
trade 貿易 business
chinese_language_and_literature 國文 culture
tve_design 統測_設計 other
junior_science_exam 國中會考基測自然科 biology
junior_math_exam 國中會考基測數學科 math
junior_chinese_exam 國中會考基測國文 culture
junior_social_studies 國中會考基測社會科 other
tve_mathematics 統測數學 math
tve_chinese_language 統測國文 culture
tve_natural_sciences 統測自然科 biology
junior_chemistry 國中理化 chemistry
music 音樂科 other
education 教育常識 education
three_principles_of_people 三民主義 culture
taiwanese_hokkien 閩南語 culture
engineering_math 工程數學 math
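Because one category (`computer science`) contains a space, rows of this table are safest split into at most three fields. The sketch below assumes whitespace-separated columns as rendered here; `parse_rows` is a hypothetical name, and the sample rows are copied from the table.

```python
from collections import Counter

# A few rows copied from the table above: subject, Chinese name, category.
ROWS = """\
computer_science 資訊工程 computer science
trade 貿易 business
accounting 會計學 business
tve_mathematics 統測數學 math
"""


def parse_rows(text: str) -> list[tuple[str, str, str]]:
    """Split each non-empty line into (subject, name, category)."""
    parsed = []
    for line in text.splitlines():
        if line.strip():
            # maxsplit=2 keeps a multi-word category in one piece.
            subject, name, category = line.split(maxsplit=2)
            parsed.append((subject, name, category.strip()))
    return parsed


rows = parse_rows(ROWS)
per_category = Counter(category for _, _, category in rows)
print(per_category)
```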