{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#hide\n",
    "from Iterative_masking.core import *"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Iterative_masking\n",
    "> Supporting repository for: \"Generative power of a protein language model trained on multiple sequence alignments\" (preprint: https://doi.org/10.1101/2022.04.14.488405). We use MSA Transformer (https://doi.org/10.1101/2021.02.12.430858) to generate synthetic protein sequences by iteratively masking the same MSA."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Getting started\n",
    "\n",
    "Clone this repository on your local machine by running:\n",
    "\n",
    "```bash\n",
    "git clone git@github.com:Bitbol-Lab/Iterative_masking.git\n",
    "```\n",
    "and move into its root folder.\n",
    "One can then use the functions directly from the cloned repository (in the folder `Iterative_masking`) or install the package in editable mode by running:\n",
    "\n",
    "```bash\n",
    "pip install -e .\n",
    "```\n",
    "\n",
    "We recommend creating and activating a dedicated ``conda`` or ``virtualenv`` Python virtual environment.\n",
    "\n",
    "## Requirements\n",
    "To use the functions, the following Python packages are required:\n",
    "\n",
    "- numpy\n",
    "- scipy\n",
    "- numba\n",
    "- fastcore\n",
    "- biopython\n",
    "- esm==0.4.0\n",
    "- pytorch\n",
    "\n",
    "A CUDA-capable GPU is also required."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## How to use"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`IM_MSA_Transformer`: class with several methods used to generate new MSAs with the iterative masking procedure.\n",
    "\n",
    "`gen_MSAs`: example function (with an argument parser) that can be used to generate and save new sequences directly from the terminal.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Example of how to use `gen_MSAs` to replicate the results of the paper\n",
    "\n",
    "gen_MSAs(filepath=\"examples\",\n",
    "         filename=[\"PF00072.fasta\"],\n",
    "         new_dir=\"results\",\n",
    "         pdf=False,\n",
    "         T=1,\n",
    "         sample_all=False,\n",
    "         Iters=200,\n",
    "         pmask=0.1,\n",
    "         num=[600],\n",
    "         depth=1e10,  # use the entire MSA\n",
    "         generate=False,\n",
    "         print_all=False,\n",
    "         range_vals=False,\n",
    "         phylo_w=False)"
   ]
  },
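  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The iterative masking procedure can be illustrated with the minimal NumPy sketch below. It is not the implementation in `IM_MSA_Transformer`: `toy_predict` is a hypothetical stand-in that returns random logits where MSA Transformer would be called, `MASK_ID` is a placeholder mask token, `vocab_size=21` (20 amino acids plus gap) is an illustrative choice, and sampling from a temperature-scaled softmax (greedy when `T=0`) is an assumption about how `T` is used. The arguments `iters`, `pmask` and `T` are meant to mirror `Iters`, `pmask` and `T` in the `gen_MSAs` call above.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "MASK_ID = -1  # hypothetical mask token id, used only in this sketch\n",
    "\n",
    "\n",
    "def toy_predict(tokens, vocab_size, rng):\n",
    "    # Hypothetical stand-in for MSA Transformer: random logits, shape (depth, length, vocab_size)\n",
    "    depth, length = tokens.shape\n",
    "    return rng.normal(size=(depth, length, vocab_size))\n",
    "\n",
    "\n",
    "def iterative_masking(msa, iters=200, pmask=0.1, T=1.0, vocab_size=21, seed=0, predict=None):\n",
    "    # Sketch of iterative masking on an integer-encoded MSA of shape (depth, length)\n",
    "    rng = np.random.default_rng(seed)\n",
    "    if predict is None:\n",
    "        predict = lambda tokens: toy_predict(tokens, vocab_size, rng)\n",
    "    msa = np.array(msa, copy=True)  # never modify the caller's MSA\n",
    "    for _ in range(iters):\n",
    "        # 1. Mask each token independently with probability pmask\n",
    "        mask = rng.random(msa.shape) < pmask\n",
    "        logits = predict(np.where(mask, MASK_ID, msa))\n",
    "        if T == 0:\n",
    "            # Greedy choice: most likely token at each position\n",
    "            new = logits.argmax(axis=-1)\n",
    "        else:\n",
    "            # 2. Temperature-scaled, numerically stable softmax over the vocabulary\n",
    "            z = logits / T\n",
    "            z = z - z.max(axis=-1, keepdims=True)\n",
    "            probs = np.exp(z)\n",
    "            probs /= probs.sum(axis=-1, keepdims=True)\n",
    "            # 3. Inverse-CDF sampling of one token per position\n",
    "            u = rng.random(msa.shape + (1,))\n",
    "            new = np.minimum((u > probs.cumsum(axis=-1)).sum(axis=-1), vocab_size - 1)\n",
    "        # 4. Update only the masked positions; all other tokens are kept\n",
    "        msa[mask] = new[mask]\n",
    "    return msa\n",
    "\n",
    "\n",
    "# Usage on a toy MSA of 4 sequences and 10 columns (all tokens set to 0)\n",
    "msa = np.zeros((4, 10), dtype=int)\n",
    "new_msa = iterative_masking(msa, iters=200, pmask=0.1, T=1.0)\n",
    "print(new_msa.shape)  # (4, 10)\n",
    "```\n",
    "\n",
    "Because only the masked positions are resampled at each step, `pmask` sets the average fraction of the MSA that can change per iteration, while `iters` sets how many such steps are chained. With the real model, `predict` would return MSA Transformer's logits for the masked MSA."
   ]
  },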
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}