These are instructions for training and evaluating a *wait-k* simultaneous LSTM model on the MuST-C English-German dataset.
[STACL: Simultaneous Translation with Implicit Anticipation and Controllable Latency using Prefix-to-Prefix Framework](https://www.aclweb.org/anthology/P19-1289/)
## Requirements
Install fairseq (make sure to use the correct branch):
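A typical from-source install is sketched below; the branch name is a placeholder for whichever branch the shared task specifies:

```shell
git clone https://github.com/pytorch/fairseq.git
cd fairseq
# Placeholder: check out the branch specified by the shared task
git checkout <shared-task-branch>
pip install -e .
```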
The simultaneous translation models from shared task participants are evaluated under a server-client protocol. Participants are required to plug their own model API into the protocol and submit a Docker file.
## Server-Client Protocol
A server-client protocol will be used in evaluation. For example, when a *wait-k* model (k = 3) translates the English sentence "Alice and Bob are good friends" to the German sentence "Alice und Bob sind gute Freunde.", the evaluation process is shown in the following figure.
Every time the client needs to read a new state (a word or a speech utterance), it sends a "GET" request to the server. Whenever a new token is generated, a "SEND" request with the predicted (detokenized) word is sent to the server immediately. The server can hence calculate both the latency and the BLEU score of the sentence.
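To make the exchange concrete, here is a hypothetical trace for the wait-3 example above, assuming the simplest read-3-then-alternate schedule (the exact segmentation is illustrative):

```
GET  -> "Alice"            (read)
GET  -> "and"              (read)
GET  -> "Bob"              (read)
SEND <- "Alice"            (write)
GET  -> "are"              (read)
SEND <- "und"              (write)
GET  -> "good"             (read)
SEND <- "Bob"              (write)
GET  -> "friends"          (read)
SEND <- "sind"             (write)
GET  -> end of sentence    (read)
SEND <- "gute"             (write)
SEND <- "Freunde."         (write)
SEND <- end of sentence    (write)
```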
### Server
The server code is provided and can be set up locally for development purposes. For example, to evaluate a text simultaneous test set:
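A sketch of the launch command; the script path and flag names here are assumptions, so check the provided server code for the actual interface:

```shell
# Sketch only: the script path and flag names are assumptions,
# not confirmed by this document.
python examples/simultaneous_translation/eval/server.py \
    --hostname localhost \
    --port 12321 \
    --src-file $SRC_FILE \
    --ref-file $REF_FILE
```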
The state that the server sends to the client has the following format:
```json
{
    "sent_id": Int,
    "segment_id": Int,
    "segment": String
}
```
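For instance, the fourth source word of the example sentence above might arrive as the state below (assuming 0-based indexing):

```json
{
    "sent_id": 0,
    "segment_id": 3,
    "segment": "are"
}
```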
### Client
The client handles the evaluation process described above. It should work out of the box as well. The client's protocol is given in the following table:
|Action|Content|
|:---:|:---:|
|Request new word / utterance| ```{key: "GET", value: None}```|
|Predict word "W"| ```{key: "SEND", value: "W"}```|
The core of the client module is the agent, which needs to be adapted to each model accordingly. The abstract class of the agent is as follows; the evaluation process happens in the `decode()` function.
```python
# Protocol constants (the EOS value is assumed here for a self-contained example)
GET = "GET"
SEND = "SEND"
DEFAULT_EOS = "</s>"


class Agent(object):
    "an agent needs to follow this pattern"

    def __init__(self, *args, **kwargs):
        ...

    def init_states(self):
        # Initialize states
        ...

    def update_states(self, states, new_state):
        # Update states with the given new state from the server
        # TODO (describe the states)
        ...

    def finish_eval(self, states, new_state):
        # Check if evaluation is finished
        ...

    def policy(self, states: list) -> dict:
        # Provide an action given the current states.
        # The action can only be either
        # {"key": "GET", "value": None}
        # or
        # {"key": "SEND", "value": W}
        ...

    def reset(self):
        # Reset the agent
        ...

    def decode(self, session):
        states = self.init_states()
        self.reset()

        # The evaluation protocol happens here
        while True:
            # Get an action for the current states according to self.policy()
            action = self.policy(states)

            if action["key"] == GET:
                # Read a new state from the server
                new_state = session.get_src()
                states = self.update_states(states, new_state)

                if self.finish_eval(states, new_state):
                    # End of document
                    break
            elif action["key"] == SEND:
                # Send a new prediction to the server
                session.send_hypo(action["value"])

                # Clean the history and wait for the next sentence
                if action["value"] == DEFAULT_EOS:
                    states = self.init_states()
                    self.reset()
            else:
                raise NotImplementedError
```
Here is an implementation of an agent for the text [*wait-k* model](somelink). Notice that tokenization is not considered.
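As a rough illustration, a minimal text wait-k agent under the interface above might look like the following sketch. The structure of `states`, the end-of-document check, and the `model.predict()` call are all assumptions, not the actual implementation:

```python
# Sketch only: a toy wait-k policy on top of the Agent interface above.
class WaitKTextAgent(Agent):
    def __init__(self, model, k=3, *args, **kwargs):
        self.model = model  # hypothetical wrapper around a trained wait-k model
        self.k = k

    def init_states(self):
        return {"source": [], "target": [], "source_finished": False}

    def update_states(self, states, new_state):
        if new_state["segment"] == DEFAULT_EOS:
            states["source_finished"] = True
        else:
            states["source"].append(new_state["segment"])
        return states

    def finish_eval(self, states, new_state):
        # Assumption: the server returns an empty state at the end of the document
        return new_state is None

    def policy(self, states):
        # Wait-k: read until k source words are ahead of the target,
        # then alternate between writing one word and reading one word.
        lagging = len(states["source"]) - len(states["target"])
        if not states["source_finished"] and lagging < self.k:
            return {"key": GET, "value": None}
        # Hypothetical model call: predict the next target word (or EOS)
        word = self.model.predict(states["source"], states["target"],
                                  states["source_finished"])
        states["target"].append(word)
        return {"key": SEND, "value": word}

    def reset(self):
        pass
```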
## Quality
The quality is measured by detokenized BLEU, so make sure that the predicted words sent to the server are detokenized. An implementation can be found [here](some link).
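As a rough sketch of one common setup (Moses-style tokenization plus "@@ " BPE subword markers, both assumptions about the pipeline), detokenization before sending could look like:

```python
# Sketch only: assumes Moses-style tokenization and "@@ " BPE markers.
from sacremoses import MosesDetokenizer

detok = MosesDetokenizer(lang="de")

def detokenize(tokens):
    # Merge BPE subwords ("Freund@@ e" -> "Freunde"), then undo Moses tokenization
    text = " ".join(tokens).replace("@@ ", "")
    return detok.detokenize(text.split())
```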
## Latency
The latency metrics are
* Average Proportion
* Average Lagging
* Differentiable Average Lagging
Again, they will also be evaluated on detokenized text.
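For reference, here is a minimal sketch of these metrics computed from the per-token delays g(t), i.e. the number of source words read before emitting target word t. The definitions follow the usual formulations (Average Lagging as in the STACL paper above); the function names are ours:

```python
# Sketch only: reference formulas computed from per-token delays, where
# delays[t-1] = g(t) = number of source words read before writing target word t.

def average_proportion(delays, src_len):
    # AP = 1 / (|x| |y|) * sum_t g(t)
    return sum(delays) / (src_len * len(delays))

def average_lagging(delays, src_len):
    # AL = 1 / tau * sum_{t=1..tau} (g(t) - (t - 1) / gamma), with
    # gamma = |y| / |x| and tau the first t such that g(t) = |x|.
    gamma = len(delays) / src_len
    tau = next(t for t, g in enumerate(delays, start=1) if g >= src_len)
    return sum(g - (t - 1) / gamma
               for t, g in enumerate(delays[:tau], start=1)) / tau

def differentiable_average_lagging(delays, src_len):
    # DAL replaces g(t) with g'(t) = max(g(t), g'(t-1) + 1/gamma)
    # and averages g'(t) - (t - 1) / gamma over all target positions.
    gamma = len(delays) / src_len
    g_prime, total = 0.0, 0.0
    for t, g in enumerate(delays, start=1):
        g_prime = g if t == 1 else max(g, g_prime + 1.0 / gamma)
        total += g_prime - (t - 1) / gamma
    return total / len(delays)
```

For the wait-3 example above (6 source words, 6 target words, delays `[3, 4, 5, 6, 6, 6]`), `average_lagging` returns 3.0, i.e. exactly k.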