gaoqiong / lm-evaluation-harness
Commit 74344829
authored Jun 19, 2024 by Hojin Lee

update tasks README

parent d26ae30c
Showing 1 changed file with 1 addition and 0 deletions

lm_eval/tasks/README.md (+1 -0)
@@ -45,6 +45,7 @@
 | [hellaswag](hellaswag/README.md) | Tasks to predict the ending of stories or scenarios, testing comprehension and creativity. | English |
 | [hendrycks_ethics](hendrycks_ethics/README.md) | Tasks designed to evaluate the ethical reasoning capabilities of models. | English |
 | [hendrycks_math](hendrycks_math/README.md) | Mathematical problem-solving tasks to test numerical reasoning and problem-solving. | English |
+| [humaneval](humaneval/README.md) | Code generation task that measure functional correctness for synthesizing programs from docstrings. | Python |
 | [ifeval](ifeval/README.md) | Interactive fiction evaluation tasks for narrative understanding and reasoning. | English |
 | [kmmlu](kmmlu/README.md) | Knowledge-based multi-subject multiple choice questions for academic evaluation. | Korean |
 | [kobest](kobest/README.md) | A collection of tasks designed to evaluate understanding in Korean language. | Korean |