ModelZoo / ResNet50_tensorflow

Commit 0d8720c2, authored Sep 08, 2016 by Chris Shallue

Remove inline math from README.md (not supported by GitHub). Replace
with italics and subscript markers.

parent 4f9d1024

Showing 1 changed file with 16 additions and 12 deletions

im2txt/README.md (+16, -12)
...

@@ -67,13 +67,17 @@ The following diagram illustrates the model architecture.
 
 </center>
 
-In this diagram, $$\{s_0, s_1, ..., s_{N-1}\}$$ are the words of the caption
-and $$\{w_e s_0, w_e s_1, ..., w_e s_{N-1}\}$$ are their corresponding word
-embedding vectors. The outputs $$\{p_1, p_2, ..., p_N\}$$ of the LSTM are
-probability distributions generated by the model for the next word in the
-sentence. The terms $$\{\log p_1(s_1), \log p_2(s_2), ..., \log p_N(s_N)\}$$
-are the log-likelihoods of the correct word at each step; the negated sum of
-these terms is the minimization objective of the model.
+In this diagram, \{*s*<sub>0</sub>, *s*<sub>1</sub>, ..., *s*<sub>*N*-1</sub>\}
+are the words of the caption and \{*w*<sub>*e*</sub>*s*<sub>0</sub>,
+*w*<sub>*e*</sub>*s*<sub>1</sub>, ..., *w*<sub>*e*</sub>*s*<sub>*N*-1</sub>\}
+are their corresponding word embedding vectors. The outputs \{*p*<sub>1</sub>,
+*p*<sub>2</sub>, ..., *p*<sub>*N*</sub>\} of the LSTM are probability
+distributions generated by the model for the next word in the sentence. The
+terms \{log *p*<sub>1</sub>(*s*<sub>1</sub>), log *p*<sub>2</sub>(*s*<sub>2</sub>), ...,
+log *p*<sub>*N*</sub>(*s*<sub>*N*</sub>)\}
+are the log-likelihoods of the
+correct word at each step; the negated sum of these terms is the minimization
+objective of the model.
 
 During the first phase of training the parameters of the *Inception v3* model
 are kept fixed: it is simply a static image encoder function. A single trainable
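
For reference, the minimization objective that both versions of this paragraph describe can be written as one display equation. This is only a restatement of the README's own words in LaTeX (the notation GitHub fails to render, which is what this commit works around); no new quantities are introduced:

```latex
% s_1, ..., s_N: words of the ground-truth caption; w_e s_t: the word
% embedding vector fed to the LSTM at step t; p_t: the distribution the
% LSTM outputs over the vocabulary for the next word. Training minimizes
% the negated sum of the per-step log-likelihoods of the correct words:
\[
  \min\; -\sum_{t=1}^{N} \log p_t(s_t)
\]
```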
...

@@ -85,11 +89,11 @@ training, all parameters - including the parameters of *Inception v3* - are
 trained to jointly fine-tune the image encoder and the LSTM.
 
 Given a trained model and an image we use *beam search* to generate captions for
-that image. Captions are generated word-by-word, where at each step $$t$$ we use
-the set of sentences already generated with length $$t-1$$ to generate a new set
-of sentences with length $$t$$. We keep only the top $$k$$ candidates at each step,
-where the hyperparameter $$k$$ is called the *beam size*. We have found the best
-performance with $$k=3$$.
+that image. Captions are generated word-by-word, where at each step *t* we use
+the set of sentences already generated with length *t* - 1 to generate a new set
+of sentences with length *t*. We keep only the top *k* candidates at each step,
+where the hyperparameter *k* is called the *beam size*. We have found the best
+performance with *k* = 3.
 
 ## Getting Started
 
...
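
The *beam search* procedure described in the second hunk is compact enough to sketch as code. Below is a minimal illustrative sketch in Python, not the implementation shipped in this repo: the `log_probs_fn` callback (partial sentence in, per-word log-probabilities out) and the `start_id`/`end_id` token ids are assumptions standing in for the trained LSTM decoder.

```python
import heapq

def beam_search(log_probs_fn, start_id, end_id, beam_size=3, max_len=20):
    """Generate one caption word-by-word, keeping the top-k partial
    sentences (k = beam size) at each step.

    log_probs_fn: assumed callback mapping a partial sentence (list of
        token ids) to a sequence of log-probabilities over the vocabulary.
    """
    # Each candidate is (cumulative log-probability, sentence so far).
    beam = [(0.0, [start_id])]
    complete = []

    for _ in range(max_len):
        # Expand every sentence of length t - 1 into sentences of length t.
        candidates = []
        for score, sent in beam:
            for word_id, lp in enumerate(log_probs_fn(sent)):
                candidates.append((score + lp, sent + [word_id]))

        # Keep only the top-k candidates at this step.
        beam = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])

        # Retire sentences that just produced the end-of-caption token.
        complete += [c for c in beam if c[1][-1] == end_id]
        beam = [c for c in beam if c[1][-1] != end_id]
        if not beam:
            break

    best_score, best_sentence = max(complete or beam, key=lambda c: c[0])
    return best_sentence
```

With `beam_size=3` this corresponds to the *k* = 3 setting the README reports as giving the best performance.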