Project: chenpangpang/transformers

Commit b105f2c6
Authored Aug 21, 2020 by Morgan Funtowicz

Update ONNX doc to match the removal of --optimize argument.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
Parent: e5f45227

1 changed file with 9 additions and 9 deletions (+9 / -9):

docs/source/serialization.rst
@@ -52,15 +52,17 @@ Below are some of the operators which can be enabled to speed up inference throu
* Skip connection LayerNormalization fusing
* FastGeLU approximation

Some of the optimizations performed by ONNX runtime can be hardware specific and thus lead to different performance
if used on another machine with a different hardware configuration than the one used for exporting the model.

For this reason, when using ``convert_graph_to_onnx.py`` optimizations are not enabled, ensuring the model can be
easily exported to various hardware. Optimizations can then be enabled when loading the model through ONNX runtime
for inference.

Fortunately, you can let ONNXRuntime find all the possible optimized operators for you. Simply add ``--optimize`` when
exporting your model through ``convert_graph_to_onnx.py``. Example:

.. code-block:: bash

    python convert_graph_to_onnx.py --framework <pt, tf> --model bert-base-cased --optimize bert-base-cased.onnx

.. note::
    When quantization is enabled (see below), the ``convert_graph_to_onnx.py`` script will enable optimizations on the
    model, because quantization would modify the underlying graph, making it impossible for ONNX runtime to apply the
    optimizations afterwards.

.. note::
    For more information about the optimizations enabled by ONNXRuntime, please have a look at the `ONNXRuntime GitHub <https://github.com/microsoft/onnxruntime/tree/master/onnxruntime/python/tools/transformers>`_ repository.
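As the updated documentation notes, graph optimizations can be applied when the exported model is loaded for inference rather than at export time. Below is a minimal sketch of doing that with the ONNX Runtime Python API; the file names are assumptions carried over from the export example above, not outputs described in this commit.

.. code-block:: python

    # Minimal sketch: enable ONNX Runtime graph optimizations at load time.
    # The file names are assumptions based on the export example above.
    import onnxruntime as ort

    options = ort.SessionOptions()
    # Apply all available graph-level optimizations (operator fusions, etc.).
    options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
    # Optionally persist the optimized graph so the work is only done once.
    options.optimized_model_filepath = "bert-base-cased-optimized.onnx"

    session = ort.InferenceSession("bert-base-cased.onnx", options)
    print([inp.name for inp in session.get_inputs()])

Deferring optimization to load time keeps the exported file portable across hardware, which is the rationale given in the updated paragraph above.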
@@ -112,8 +114,6 @@ Example of quantized BERT model export:
above command will contain the original ONNX model storing `float32` weights. The second one, with the ``-quantized``
suffix, will hold the quantized parameters.

.. note::
    The quantization export gives the best performance when used in combination with ``--optimize``.
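Either exported file can then be consumed with ONNX Runtime. The snippet below is a hedged sketch of running the quantized graph; the ``bert-base-cased-quantized.onnx`` file name and the tokenizer checkpoint are illustrative assumptions following the naming described above.

.. code-block:: python

    # Minimal sketch: run the quantized graph with ONNX Runtime. The file name
    # with the "-quantized" suffix and the tokenizer checkpoint are assumptions
    # matching the examples above.
    import numpy as np
    import onnxruntime as ort
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
    session = ort.InferenceSession("bert-base-cased-quantized.onnx")

    # Feed only the inputs the graph declares, as plain numpy arrays.
    encoded = tokenizer("Hello, ONNX Runtime!", return_tensors="np")
    graph_inputs = {inp.name for inp in session.get_inputs()}
    feed = {name: np.asarray(value) for name, value in encoded.items() if name in graph_inputs}

    outputs = session.run(None, feed)
    print(outputs[0].shape)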
TorchScript
=======================================