ModelZoo / ResNet50_tensorflow
Commit 97070abf, authored Nov 23, 2021 by Aleksey Vlasenko
Fixed preprocessing link and added clarifications about vocabulary size.
Parent: c280c4ee
Showing 2 changed files with 6 additions and 3 deletions:
official/recommendation/ranking/README.md (+3 -2)
official/recommendation/ranking/preprocessing/README.md (+3 -1)
official/recommendation/ranking/README.md
@@ -68,7 +68,7 @@ Note that the dataset is large (~1TB).
### Preprocess the data
-Follow the instructions in [Data Preprocessing](data/preprocessing) to
+Follow the instructions in [Data Preprocessing](./preprocessing) to
preprocess the Criteo Terabyte dataset.
Data preprocessing steps are summarized below.
@@ -87,7 +87,8 @@ Categorical features:
function such as modulus will suffice, i.e. feature_value % MAX_INDEX.
The vocabulary sizes resulting from pre-processing are passed in to the model
-trainer using the model.vocab_sizes config.
+trainer using the model.vocab_sizes config. Note that provided values in sample below
+are only valid for Criteo Terabyte dataset.
The full dataset is composed of 24 directories. Partition the data into training
and eval sets, for example days 1-23 for training and day 24 for evaluation.
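
Review note: the context line in this hunk mentions a modulus mapping, `feature_value % MAX_INDEX`. A minimal sketch of that idea follows. The function name `to_index` and the sample values are hypothetical and do not come from the repository.

```python
def to_index(feature_value: int, max_index: int) -> int:
    """Map a raw categorical value to an index in [0, max_index)."""
    if max_index <= 0:
        raise ValueError("max_index must be positive")
    # Python's % returns a non-negative result for a positive modulus,
    # so negative inputs also land in range.
    return feature_value % max_index


# Hypothetical raw values, bucketed into 5 slots.
indices = [to_index(v, 5) for v in (3, 7, 12, -1)]
print(indices)  # [3, 2, 2, 4]
```

Distinct raw values can collide in the same slot (7 and 12 above), which is the usual trade-off of this kind of mapping.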
official/recommendation/ranking/preprocessing/README.md
@@ -69,7 +69,9 @@ python3 criteo_preprocess.py \
--vocab_gen_mode --runner DataflowRunner --max_vocab_size 5000000 \
--project ${PROJECT} --region ${REGION}
```
Vocabulary for each feature is going to be generated to
${STORAGE_BUCKET}/criteo_vocab/tftransform_tmp/feature_??_vocab files.
Vocabulary size can be found as wc -l <feature_vocab_file>.
Preprocess training and test data:
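
Review note: the `wc -l` hint added in this hunk can also be scripted. The sketch below assumes the `feature_??_vocab` files have already been copied to a local directory. The helper names `count_lines` and `vocab_sizes` are hypothetical, and whether the sorted-by-filename order matches what `model.vocab_sizes` expects is an assumption to verify against the trainer config.

```python
from pathlib import Path


def count_lines(vocab_file) -> int:
    """Count newline characters in a file, which is what `wc -l` reports."""
    total = 0
    with open(vocab_file, "rb") as f:
        # Read in 1 MiB chunks so large vocabulary files are not loaded whole.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            total += chunk.count(b"\n")
    return total


def vocab_sizes(vocab_dir) -> list:
    """Return line counts for all feature_??_vocab files, sorted by filename."""
    files = sorted(Path(vocab_dir).glob("feature_??_vocab"))
    return [count_lines(p) for p in files]
```

Calling `vocab_sizes("<local_vocab_dir>")` returns one integer per vocabulary file; the directory path is a placeholder.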