---

title: Iterative_masking


keywords: fastai
sidebar: home_sidebar

summary: "Use MSA Transformer to generate synthetic protein sequences by iteratively masking the same MSA."
description: "Use MSA Transformer to generate synthetic protein sequences by iteratively masking the same MSA."
nb_path: "00_core.ipynb"
---
<!--

#################################################
### THIS FILE WAS AUTOGENERATED! DO NOT EDIT! ###
#################################################
# file to edit: 00_core.ipynb
# command to build the docs after a change: nbdev_build_docs

-->

<div class="container" id="notebook-container">
        
    {% raw %}
    
<div class="cell border-box-sizing code_cell rendered">

</div>
    {% endraw %}

    {% raw %}
    
<div class="cell border-box-sizing code_cell rendered">

</div>
    {% endraw %}

    {% raw %}
    
<div class="cell border-box-sizing code_cell rendered">

<div class="output_wrapper">
<div class="output">

<div class="output_area">


<div class="output_markdown rendered_html output_subarea ">
<h2 id="IM_MSA_Transformer" class="doc_header"><code>class</code> <code>IM_MSA_Transformer</code><a href="" class="source_link" style="float:right">[source]</a></h2><blockquote><p><code>IM_MSA_Transformer</code>(<strong><code>iterations</code></strong>=<em><code>None</code></em>, <strong><code>p_mask</code></strong>=<em><code>None</code></em>, <strong><code>filename</code></strong>=<em><code>None</code></em>, <strong><code>num</code></strong>=<em><code>None</code></em>, <strong><code>filepath</code></strong>=<em><code>None</code></em>)</p>
</blockquote>
<p>Class that implements the iterative masking algorithm.</p>

</div>

</div>

</div>
</div>

</div>
    {% endraw %}
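<p>The iterative masking idea can be illustrated with a small, framework-free sketch. It is not the class's actual implementation. <code>iterative_masking</code>, <code>predict_logits</code>, <code>mask_token</code> and <code>toy_model</code> are hypothetical names. The MSA is a plain list of integer token rows, and <code>iterations</code> is a sorted Python list standing in for the NumPy array. Only greedy (argmax) replacement of masked tokens is shown. Each iteration masks every token with probability <code>p_mask</code>, predicts logits for the masked MSA, and overwrites only the masked positions. Snapshots are kept at the iterations listed in <code>iterations</code>.</p>

```python
import random
from typing import Callable, Dict, List

MSA = List[List[int]]             # rows = sequences, columns = alignment positions
Logits = List[List[List[float]]]  # logits[row][column][token]


def argmax(scores: List[float]) -> int:
    """Index of the largest score (greedy choice)."""
    return max(range(len(scores)), key=scores.__getitem__)


def iterative_masking(
    msa: MSA,
    predict_logits: Callable[[MSA], Logits],
    iterations: List[int],
    p_mask: float,
    mask_token: int,
    seed: int = 0,
) -> Dict[int, MSA]:
    """Hypothetical sketch: mask, predict, replace; snapshot at the listed iterations.

    `predict_logits` stands in for an MSA Transformer forward pass and
    `mask_token` for its mask index; both are assumptions of this sketch.
    """
    rng = random.Random(seed)
    current = [row[:] for row in msa]  # work on a copy of the input MSA
    save_at = set(iterations)
    saved: Dict[int, MSA] = {}
    for it in range(1, iterations[-1] + 1):  # last element = total iterations
        # 1. Mask each token independently with probability p_mask.
        mask = [[rng.random() < p_mask for _ in row] for row in current]
        masked = [[mask_token if m else tok for tok, m in zip(row, mrow)]
                  for row, mrow in zip(current, mask)]
        # 2. Predict logits for every position of the masked MSA.
        logits = predict_logits(masked)
        # 3. Replace only the masked positions with the greedy (argmax) token.
        for i, row in enumerate(current):
            for j, m in enumerate(mask[i]):
                if m:
                    row[j] = argmax(logits[i][j])
        if it in save_at:
            saved[it] = [row[:] for row in current]
    return saved


def toy_model(msa: MSA) -> Logits:
    """Toy stand-in for the network: always prefers token 3 out of 5."""
    return [[[1.0 if v == 3 else 0.0 for v in range(5)] for _ in row] for row in msa]


snapshots = iterative_masking([[0, 1, 2], [2, 1, 0]], toy_model,
                              iterations=[1, 10], p_mask=1.0, mask_token=4)
print(snapshots[10])  # [[3, 3, 3], [3, 3, 3]]
```

<p>With <code>p_mask</code>=1.0 every position is masked at each iteration, so the toy model's preferred token fills the whole MSA. A smaller masking probability leaves most of the MSA unmasked, so each prediction is conditioned on it.</p>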

    {% raw %}
    
<div class="cell border-box-sizing code_cell rendered">

<div class="output_wrapper">
<div class="output">

<div class="output_area">


<div class="output_markdown rendered_html output_subarea ">
<h4 id="IM_MSA_Transformer.Batch_MSA" class="doc_header"><code>IM_MSA_Transformer.Batch_MSA</code><a href="__main__.py#L303" class="source_link" style="float:right">[source]</a></h4><blockquote><p><code>IM_MSA_Transformer.Batch_MSA</code>(<strong><code>use_pdf</code></strong>=<em><code>False</code></em>, <strong><code>simplified</code></strong>=<em><code>False</code></em>, <strong><code>repetitions</code></strong>=<em><code>2</code></em>, <strong><code>sample_all</code></strong>=<em><code>False</code></em>, <strong><code>T</code></strong>=<em><code>1</code></em>, <strong><code>phylo</code></strong>=<em><code>False</code></em>)</p>
</blockquote>
<p>Generate a full MSA by calling the iterative MSA generator <code>self.NEW_MSA</code> several times, each time with a different input MSA.</p>
<p>---&gt; Use <code>simplified</code>=False only if you need the tokens on CUDA (e.g. to compute embeddings or contacts); otherwise use <code>simplified</code>=True.</p>
<p>The attribute <code>self.iterations</code> must be a NumPy array listing the iterations at which the tokens are saved. Its last element is the total number of iterations performed.</p>
<p><code>repetitions</code>: number of times <code>self.NEW_MSA()</code> is called, each time with a different input MSA.</p>
<p><code>use_pdf</code>: if True, each new token is sampled from the probability distribution given by the output logits; if False, the argmax is taken (greedy sampling).</p>
<p><code>sample_all</code>: if True, all new tokens (masked and unmasked) are obtained from the logits; if False, the unmasked tokens are left untouched and only the masked ones are changed.</p>
<p><code>T</code>: temperature used when sampling from the distribution of the output logits.</p>
<p><code>phylo</code>: if True, the starting sequences are sampled according to phylogeny weights instead of randomly.</p>

</div>

</div>

</div>
</div>

</div>
    {% endraw %}
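<p>The effect of <code>use_pdf</code>, <code>sample_all</code> and <code>T</code> on a single sequence can be sketched as follows. This is a hypothetical helper: <code>choose_tokens</code> and <code>softmax</code> are not part of the library. It assumes one list of logits per position and uses Python's <code>random.choices</code> for sampling.</p>

```python
import math
import random
from typing import List, Optional, Sequence


def softmax(scores: Sequence[float], T: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; the max is subtracted for numerical stability."""
    scaled = [s / T for s in scores]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def choose_tokens(
    tokens: List[int],
    logits: List[List[float]],
    mask: List[bool],
    use_pdf: bool = False,
    sample_all: bool = False,
    T: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Hypothetical sketch of how use_pdf, sample_all and T act on one sequence."""
    rng = rng if rng is not None else random.Random(0)
    new_tokens = []
    for tok, scores, masked in zip(tokens, logits, mask):
        if not (masked or sample_all):
            new_tokens.append(tok)  # sample_all=False: unmasked token is kept
        elif use_pdf:
            probs = softmax(scores, T)  # sample from the tempered distribution
            new_tokens.append(rng.choices(range(len(scores)), weights=probs, k=1)[0])
        else:
            new_tokens.append(max(range(len(scores)), key=scores.__getitem__))  # greedy
    return new_tokens


logits = [[0.0, 5.0, 1.0]] * 3
print(choose_tokens([0, 0, 0], logits, [True, False, True]))                   # [1, 0, 1]
print(choose_tokens([0, 0, 0], logits, [True, False, True], sample_all=True))  # [1, 1, 1]
```

<p>Dividing the logits by <code>T</code> before the softmax means that T &lt; 1 sharpens the distribution toward the greedy choice and T &gt; 1 flattens it.</p>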

    {% raw %}
    
<div class="cell border-box-sizing code_cell rendered">

<div class="output_wrapper">
<div class="output">

<div class="output_area">


<div class="output_markdown rendered_html output_subarea ">
<h4 id="IM_MSA_Transformer.Context_MSA" class="doc_header"><code>IM_MSA_Transformer.Context_MSA</code><a href="__main__.py#L448" class="source_link" style="float:right">[source]</a></h4><blockquote><p><code>IM_MSA_Transformer.Context_MSA</code>(<strong><code>depth</code></strong>=<em><code>None</code></em>, <strong><code>ancestor</code></strong>=<em><code>None</code></em>, <strong><code>context</code></strong>=<em><code>None</code></em>, <strong><code>use_pdf</code></strong>=<em><code>False</code></em>, <strong><code>simplified</code></strong>=<em><code>False</code></em>, <strong><code>sample_all</code></strong>=<em><code>False</code></em>, <strong><code>print_all</code></strong>=<em><code>True</code></em>, <strong><code>T</code></strong>=<em><code>1</code></em>)</p>
</blockquote>
<p>Generate a new MSA with context generation, using <code>self.generate_MSA_context</code>: the masking is iterated on the <code>ancestor</code> (original) sequence, while the sequences in <code>context</code> are used as an unmasked context MSA.</p>
<p>---&gt; Use <code>simplified</code>=False only if you need the tokens on CUDA (e.g. to compute embeddings or contacts); otherwise use <code>simplified</code>=True.</p>
<p>The attribute <code>self.iterations</code> must be a NumPy array listing the iterations at which the tokens are saved. Its last element is the total number of iterations performed.
If <code>print_all</code>=True, the generated sequences are saved at every iteration.</p>
<p><code>ancestor</code>: input sequence to be iteratively masked.</p>
<p><code>context</code>: context MSA (not masked).</p>
<p><code>use_pdf</code>: if True, each new token is sampled from the probability distribution given by the output logits; if False, the argmax is taken (greedy sampling).</p>
<p><code>sample_all</code>: if True, all new tokens (masked and unmasked) are obtained from the logits; if False, the unmasked tokens are left untouched and only the masked ones are changed.</p>
<p><code>T</code>: temperature used when sampling from the distribution of the output logits.</p>
<p><code>depth</code>: number of sequences to generate; if None, it equals the number of ancestor sequences.</p>

</div>

</div>

</div>
</div>

</div>
    {% endraw %}
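<p>Context generation can be sketched in the same framework-free style. The following points are assumptions, not taken from the library. The masked ancestor is placed in row 0 of the model input, followed by the unmasked <code>context</code> rows. Only greedy replacement is shown. <code>depth</code> selects the first <code>depth</code> ancestors. <code>context_generation</code>, <code>predict_logits</code> and <code>toy_model</code> are hypothetical names.</p>

```python
import random
from typing import Callable, List, Optional

Seq = List[int]


def context_generation(
    ancestors: List[Seq],
    context: List[Seq],
    predict_logits: Callable[[List[Seq]], List[List[List[float]]]],
    n_iter: int,
    p_mask: float,
    mask_token: int,
    depth: Optional[int] = None,
    seed: int = 0,
) -> List[Seq]:
    """Hypothetical sketch: iteratively mask one ancestor while the context stays unmasked.

    Assumptions: the ancestor is placed in row 0 of the model input, and the
    first `depth` ancestors are used (all of them when depth is None).
    """
    rng = random.Random(seed)
    depth = len(ancestors) if depth is None else depth
    generated = []
    for ancestor in ancestors[:depth]:
        seq = ancestor[:]
        for _ in range(n_iter):
            mask = [rng.random() < p_mask for _ in seq]
            masked = [mask_token if m else tok for tok, m in zip(seq, mask)]
            # Only row 0 (the ancestor) is masked; context rows are passed as-is.
            logits = predict_logits([masked] + context)
            seq = [max(range(len(s)), key=s.__getitem__) if m else tok
                   for tok, s, m in zip(seq, logits[0], mask)]
        generated.append(seq)
    return generated


def toy_model(msa: List[Seq]) -> List[List[List[float]]]:
    """Toy stand-in for the network: always prefers token 2 out of 4."""
    return [[[1.0 if v == 2 else 0.0 for v in range(4)] for _ in row] for row in msa]


ctx = [[0, 1, 0], [1, 1, 1]]
new_seqs = context_generation([[0, 0, 0], [1, 0, 1]], ctx, toy_model,
                              n_iter=3, p_mask=1.0, mask_token=3)
print(new_seqs)  # [[2, 2, 2], [2, 2, 2]]
```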

    {% raw %}
    
<div class="cell border-box-sizing code_cell rendered">

</div>
    {% endraw %}

    {% raw %}
    
<div class="cell border-box-sizing code_cell rendered">

<div class="output_wrapper">
<div class="output">

<div class="output_area">


<div class="output_markdown rendered_html output_subarea ">
<h4 id="gen_MSAs" class="doc_header"><code>gen_MSAs</code><a href="__main__.py#L6" class="source_link" style="float:right">[source]</a></h4><blockquote><p><code>gen_MSAs</code>(<strong><code>filepath</code></strong>:"Path of the input directory", <strong><code>filename</code></strong>:"Name of the input file(s)", <strong><code>new_dir</code></strong>:"Name of the output directory", <strong><code>pdf</code></strong>:"Should I sample tokens from the pdf? (bool)", <strong><code>T</code></strong>:"Sampling temperature of the pdf (only when <code>pdf</code> is True)", <strong><code>sample_all</code></strong>:"Should I sample all tokens or just the masked ones? (True = sample all tokens)", <strong><code>Iters</code></strong>:"Total number of iterations used to generate the new tokens", <strong><code>pmask</code></strong>:"Masking probability", <strong><code>num</code></strong>:"Size of the batch MSAs that the MSA Transformer receives as input", <strong><code>depth</code></strong>:"Number of batches (of size num) that you want to generate", <strong><code>generate</code></strong>:"How should I generate sequences? False (= batch generation) or linear with context (= linear-ran / linear-tot-ran): <code>-ran</code> means that the context MSA is sampled randomly once, while <code>-tot-ran</code> means that it is resampled randomly each time.", <strong><code>print_all</code></strong>:"Should I print the MSA after each iteration? (bool)", <strong><code>range_vals</code></strong>:"First and last index of the sequences that you want to use as ancestors", <strong><code>phylo_w</code></strong>:"Should I sample the starting sequences from the phylogeny weights? (bool)")</p>
</blockquote>
<p>Generate a new MSA with either batch generation or context generation. The initial MSA is shuffled and different slices of it are used as batch MSAs.</p>

</div>

</div>

</div>
</div>

</div>
    {% endraw %}
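<p>The shuffle-and-slice step described above can be sketched as follows. The sketch assumes the batches are non-overlapping consecutive slices of one shuffled copy of the MSA, with <code>num</code> sequences each and <code>depth</code> batches in total. <code>shuffled_batches</code> is a hypothetical helper, not part of the library.</p>

```python
import random
from typing import List, Sequence, TypeVar

Row = TypeVar("Row")


def shuffled_batches(msa: Sequence[Row], num: int, depth: int, seed: int = 0) -> List[List[Row]]:
    """Hypothetical sketch: shuffle a copy of the MSA and cut `depth` slices of `num` rows."""
    if num * depth > len(msa):
        raise ValueError(f"need {num * depth} sequences, got {len(msa)}")
    rows = list(msa)  # copy, so the caller's MSA is not reordered
    random.Random(seed).shuffle(rows)
    # Non-overlapping consecutive slices (an assumption of this sketch).
    return [rows[k * num:(k + 1) * num] for k in range(depth)]


msa = [f"seq{i}" for i in range(10)]
batches = shuffled_batches(msa, num=3, depth=3)
```

<p>Raising an error when <code>num * depth</code> exceeds the MSA size is a design choice of this sketch; an implementation could instead reshuffle between batches.</p>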

<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="Build-library">Build library<a class="anchor-link" href="#Build-library"> </a></h2>
</div>
</div>
</div>
</div>