Commit 0d8720c2 authored by Chris Shallue

Remove inline math from README.md (not supported by GitHub). Replace with
italics and subscript markers.
parent 4f9d1024
@@ -67,13 +67,17 @@ The following diagram illustrates the model architecture.
![Show and Tell Architecture](g3doc/show_and_tell_architecture.png)
</center>
-In this diagram, $$\{ s_0, s_1, ..., s_{N-1} \}$$ are the words of the caption
-and $$\{ w_e s_0, w_e s_1, ..., w_e s_{N-1} \}$$ are their corresponding word
-embedding vectors. The outputs $$\{ p_1, p_2, ..., p_N \}$$ of the LSTM are
-probability distributions generated by the model for the next word in the
-sentence. The terms $$\{ \log p_1(s_1), \log p_2(s_2), ..., \log p_N(s_N) \}$$
-are the log-likelihoods of the correct word at each step; the negated sum of
-these terms is the minimization objective of the model.
+In this diagram, \{*s*<sub>0</sub>, *s*<sub>1</sub>, ..., *s*<sub>*N*-1</sub>\}
+are the words of the caption and \{*w*<sub>*e*</sub>*s*<sub>0</sub>,
+*w*<sub>*e*</sub>*s*<sub>1</sub>, ..., *w*<sub>*e*</sub>*s*<sub>*N*-1</sub>\}
+are their corresponding word embedding vectors. The outputs \{*p*<sub>1</sub>,
+*p*<sub>2</sub>, ..., *p*<sub>*N*</sub>\} of the LSTM are probability
+distributions generated by the model for the next word in the sentence. The
+terms \{log *p*<sub>1</sub>(*s*<sub>1</sub>),
+log *p*<sub>2</sub>(*s*<sub>2</sub>), ...,
+log *p*<sub>*N*</sub>(*s*<sub>*N*</sub>)\} are the log-likelihoods of the
+correct word at each step; the negated sum of these terms is the minimization
+objective of the model.
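
To make the objective concrete, here is a minimal sketch of the loss computed
from the quantities defined above; the names `probs` and `caption` are
illustrative, not taken from the repository.

```python
import numpy as np

def caption_loss(probs, caption):
    """Negated sum of log-likelihoods of the correct words.

    probs:   length-N list of probability distributions over the
             vocabulary, where probs[t - 1] is the LSTM output p_t.
    caption: length-N list of word ids (s_1, ..., s_N).
    """
    return -sum(np.log(p_t[s_t]) for p_t, s_t in zip(probs, caption))
```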
During the first phase of training the parameters of the *Inception v3* model
are kept fixed: it is simply a static image encoder function. A single trainable
@@ -85,11 +89,11 @@ training, all parameters - including the parameters of *Inception v3* - are
trained to jointly fine-tune the image encoder and the LSTM.
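
A minimal TF1-style sketch of these two phases (via `tf.compat.v1` in current
TensorFlow), assuming a scalar loss tensor like the objective above and
Inception v3 variables living under an `InceptionV3/` name scope; the function
and learning rates are illustrative, not the repository's actual training code.

```python
import tensorflow as tf

def build_train_ops(total_loss, lr_phase1=0.01, lr_phase2=0.001):
    """Return one train op per training phase (hypothetical values)."""
    # Phase 1: keep the Inception v3 encoder fixed by optimizing only
    # the variables outside its name scope.
    caption_vars = [v for v in tf.trainable_variables()
                    if not v.name.startswith("InceptionV3/")]
    phase1 = tf.train.GradientDescentOptimizer(lr_phase1).minimize(
        total_loss, var_list=caption_vars)
    # Phase 2: fine-tune all parameters jointly, including Inception v3.
    phase2 = tf.train.GradientDescentOptimizer(lr_phase2).minimize(
        total_loss)
    return phase1, phase2
```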
Given a trained model and an image we use *beam search* to generate captions for
-that image. Captions are generated word-by-word, where at each step $$t$$ we use
-the set of sentences already generated with length $$t-1$$ to generate a new set
-of sentences with length $$t$$. We keep only the top $$k$$ candidates at each
-step, where the hyperparameter $$k$$ is called the *beam size*. We have found
-the best performance with $$k=3$$.
+that image. Captions are generated word-by-word, where at each step *t* we use
+the set of sentences already generated with length *t* - 1 to generate a new set
+of sentences with length *t*. We keep only the top *k* candidates at each step,
+where the hyperparameter *k* is called the *beam size*. We have found the best
+performance with *k* = 3.
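
A minimal beam search sketch under these definitions: `log_prob_next_words` is
a hypothetical function returning the model's log-probability for each
candidate next word, and `end_id` marks caption termination; the names are
illustrative, not from the repository.

```python
import heapq

def beam_search(log_prob_next_words, start_id, end_id, beam_size=3,
                max_steps=20):
    """Generate a caption word by word, keeping only the top-k partial
    sentences (the beams) at each step; k is the beam size.

    log_prob_next_words(sentence) -> iterable of (word_id, log_prob)
    pairs for the next word given the sentence so far.
    """
    beams = [(0.0, [start_id])]  # (cumulative log-probability, sentence)
    complete = []
    for _ in range(max_steps):
        candidates = []
        for score, sentence in beams:
            for word, log_p in log_prob_next_words(sentence):
                entry = (score + log_p, sentence + [word])
                if word == end_id:
                    complete.append(entry)
                else:
                    candidates.append(entry)
        if not candidates:
            break
        # Keep only the top-k candidates at each step.
        beams = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
    return max(complete + beams, key=lambda c: c[0])[1]
```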
## Getting Started