gaoqiong / lm-evaluation-harness
Commit fd43d570 authored May 29, 2023 by cardy20
ngram for polyglot-ko
Parent: f740d5a3
Showing 1 changed file with 1 addition and 4 deletions
scripts/clean_training_data/generate_13_grams.py (+1, -4)
@@ -49,10 +49,7 @@ def get_pile(directory):
     reader = Reader()
     # for file in glob.glob(os.path.join(directory, f"*.jsonl.zst*")):
     for dir in os.listdir(directory):
-        print(os.path.join(directory + dir, f".jsonl"))
-        for file in glob.glob(os.path.join(directory + dir, "*.jsonl")):
-            # for document in open(file).read():
+        for file in glob.glob(os.path.join(directory + dir)):
             for document in reader.read(file):
                 yield document
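Here is a standalone sketch of the hunk's net effect on which paths get read. Before the change, `get_pile` read every `*.jsonl` file one level below `directory`. After it, `os.path.join` receives a single argument, so it returns `directory + dir` unchanged. `glob.glob` on a pattern without wildcards returns just that path when it exists, so every top-level entry of `directory` is handed to `reader.read` as a file. The helpers `list_inputs_old` and `list_inputs_new` are hypothetical names that isolate only the path-walking part. `Reader` and document parsing are left out, and the file names are made up.

```python
import glob
import os
import tempfile


def list_inputs_old(directory):
    """Pre-commit behaviour (hypothetical helper): *.jsonl files one level below `directory`."""
    paths = []
    for name in sorted(os.listdir(directory)):
        paths.extend(sorted(glob.glob(os.path.join(directory + name, "*.jsonl"))))
    return paths


def list_inputs_new(directory):
    """Post-commit behaviour (hypothetical helper): each entry of `directory` itself."""
    paths = []
    for name in sorted(os.listdir(directory)):
        # os.path.join with one argument returns it unchanged, and glob.glob on a
        # pattern with no wildcards returns [path] only if that path exists.
        paths.extend(glob.glob(os.path.join(directory + name)))
    return paths


with tempfile.TemporaryDirectory() as tmp:
    # The original code concatenates directory + dir, so a trailing separator is needed.
    root = tmp + os.sep
    os.mkdir(root + "shard_00")
    open(os.path.join(root, "shard_00", "part.jsonl"), "w").close()
    open(root + "ko_corpus.jsonl", "w").close()

    old = [os.path.relpath(p, tmp) for p in list_inputs_old(root)]
    new = [os.path.relpath(p, tmp) for p in list_inputs_new(root)]
    print(old)  # only the .jsonl file inside the subdirectory
    print(new)  # every top-level entry, including the subdirectory itself
```

Because both versions concatenate `directory + dir`, they only work when `directory` ends with a path separator. The new version also yields subdirectories, which a file reader presumably cannot open, so it assumes `directory` holds only data files. If that is the intended layout, building paths with `os.path.join(directory, dir)` and skipping non-files with `os.path.isfile` would remove both constraints.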
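The script's name, generate_13_grams.py, indicates that the documents yielded by `get_pile` are turned into 13-grams. Below is a minimal sketch of that step. It assumes plain whitespace tokenization and treats each document as a string; the tokenizer and normalization the real script uses are not part of this diff. `word_ngrams` and `count_ngrams` are hypothetical names.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

N = 13  # taken from the script name generate_13_grams.py


def word_ngrams(text: str, n: int = N) -> Iterator[Tuple[str, ...]]:
    """Yield overlapping word-level n-grams (assumes whitespace tokenization)."""
    tokens = text.split()
    # Texts shorter than n tokens produce an empty range, hence no n-grams.
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i : i + n])


def count_ngrams(documents: Iterable[str], n: int = N) -> Counter:
    """Count every n-gram across a stream of documents."""
    counts = Counter()
    for doc in documents:
        counts.update(word_ngrams(doc, n))
    return counts


# A 14-token document yields exactly two overlapping 13-grams.
doc = " ".join(f"토큰{i}" for i in range(14))
counts = count_ngrams([doc, "too short"])
print(len(counts))  # 2
```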