1. 06 May, 2024 1 commit
    • Add more efficient script to generate all-seqs FASTA · e2479cb5
      Lukas Jarosch authored
      The previous data_dir_to_fasta.py script is very slow and has to fully reparse the mmCIF files. This new script is much faster and uses the sequence information from the alignment data instead. Note that it will not include chains for which alignments could not be generated, but we can't use those during training anyway.
  2. 20 Mar, 2024 2 commits
  3. 19 Mar, 2024 2 commits
  4. 20 Feb, 2024 2 commits
  5. 14 Feb, 2024 1 commit
  6. 12 Feb, 2024 1 commit
  7. 09 Feb, 2024 1 commit
  8. 08 Feb, 2024 1 commit
  9. 08 Dec, 2023 1 commit
  10. 29 Nov, 2023 1 commit
  11. 13 Nov, 2023 1 commit
  12. 03 Nov, 2023 1 commit
  13. 30 Oct, 2023 1 commit
  14. 27 Oct, 2023 1 commit
  15. 24 Oct, 2023 2 commits
  16. 23 Oct, 2023 2 commits
  17. 21 Oct, 2023 1 commit
  18. 20 Oct, 2023 2 commits
  19. 17 Oct, 2023 2 commits
  20. 16 Oct, 2023 2 commits
  21. 06 Oct, 2023 1 commit
  22. 20 Sep, 2023 1 commit
  23. 13 Sep, 2023 1 commit
  24. 08 Sep, 2023 1 commit
  25. 02 Aug, 2023 1 commit
  26. 02 Jun, 2023 1 commit
  27. 26 Apr, 2023 1 commit
  28. 18 Apr, 2023 1 commit
  29. 17 Apr, 2023 1 commit
  30. 14 Mar, 2023 1 commit
    • Fix check for max_seqlen. · bdbfef1d
      Jonathan King authored
      Previously, the script did not exclude long sequences.
      This commit changes the comparison so that sequences with length
      greater than args.max_seqlen are excluded.
  31. 07 Mar, 2023 1 commit
  32. 08 Oct, 2022 1 commit
    • Process *.tar and *.tar.gz files. · 67f23568
      Jonathan King authored
      Because download_mm_seqs_dbs.sh downloads and gunzips its target file (uniref30_2103.tar.gz), the script's glob did not match the resulting .tar file, so that file was never processed. This fix expands the glob to match *.tar*.
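Commit e2479cb5 builds the all-seqs FASTA from alignment data rather than from mmCIF files. The sketch below illustrates that idea under loudly stated assumptions: it supposes a hypothetical layout with one subdirectory per chain, each holding one or more `.a3m` files whose first record is the query sequence. The function names `read_query_sequence` and `alignment_dir_to_fasta` are hypothetical, and the actual script may read the alignment data differently. Chains with no `.a3m` file are skipped, which mirrors the commit's note that chains without alignments are not included.

```python
from pathlib import Path


def read_query_sequence(a3m_text: str) -> str:
    """Return the first (query) sequence of an A3M/FASTA string, or "" if none."""
    seq_lines = []
    seen_header = False
    for raw in a3m_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and A3M comment lines such as "#<len> 1"
        if line.startswith(">"):
            if seen_header:
                break  # second record reached, so the query is complete
            seen_header = True
            continue
        if seen_header:
            seq_lines.append(line)
    return "".join(seq_lines)


def alignment_dir_to_fasta(alignment_dir: Path) -> str:
    """Build FASTA text with one record per chain subdirectory (hypothetical layout)."""
    records = []
    for chain_dir in sorted(p for p in alignment_dir.iterdir() if p.is_dir()):
        a3m_files = sorted(chain_dir.glob("*.a3m"))
        if not a3m_files:
            continue  # no alignment for this chain, so it is left out of the FASTA
        seq = read_query_sequence(a3m_files[0].read_text())
        if seq:
            records.append(f">{chain_dir.name}\n{seq}\n")
    return "".join(records)
```

Only the first record of one small text file is read per chain, which is why this approach avoids the cost of reparsing full mmCIF structures.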
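Commit bdbfef1d fixes the length check so that sequences longer than `args.max_seqlen` are excluded. A minimal sketch of that comparison follows; the `--max_seqlen` flag spelling, the `filter_by_length` and `parse_args` helpers, and the (name, sequence) record shape are all hypothetical, since the commit only names `args.max_seqlen`.

```python
import argparse
from typing import Iterable, Iterator, List, Optional, Tuple


def filter_by_length(
    records: Iterable[Tuple[str, str]], max_seqlen: int
) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs, dropping sequences longer than max_seqlen."""
    for name, seq in records:
        # The fixed check: exclude a sequence when len(seq) > max_seqlen.
        # Sequences of exactly max_seqlen residues are kept.
        if len(seq) > max_seqlen:
            continue
        yield name, seq


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # Hypothetical flag spelling; the commit only mentions args.max_seqlen.
    parser.add_argument("--max_seqlen", type=int, required=True)
    return parser.parse_args(argv)
```

With `--max_seqlen 3`, a 3-residue sequence is kept and a 4-residue sequence is dropped, matching "length greater than args.max_seqlen".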
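Commit 67f23568 widens the glob to `*.tar*` so that an already-gunzipped `.tar` file is picked up alongside `.tar.gz` files. The sketch below shows the effect of that pattern using the standard-library `fnmatch`, `pathlib` and `tarfile` modules; the helper names are hypothetical, and treating `*.tar.gz` as the older, narrower pattern is an assumption made for comparison. `tarfile.open` with mode `"r:*"` reads both plain and gzip-compressed tar files, so one code path can handle either.

```python
import fnmatch
import tarfile
from pathlib import Path
from typing import Iterable, List

ARCHIVE_PATTERN = "*.tar*"  # matches both "x.tar" and "x.tar.gz"


def select_archives(names: Iterable[str]) -> List[str]:
    """Return the file names that match the widened archive glob, sorted."""
    return sorted(fnmatch.filter(names, ARCHIVE_PATTERN))


def find_archives(directory: Path) -> List[Path]:
    """List archive paths in a directory using the same pattern."""
    return sorted(directory.glob(ARCHIVE_PATTERN))


def list_members(archive: Path) -> List[str]:
    """Open a plain or gzip-compressed tar file and return its member names."""
    # mode "r:*" makes tarfile detect the compression (or its absence) itself
    with tarfile.open(archive, mode="r:*") as tar:
        return tar.getnames()
```

One trade-off of `*.tar*` is that it also matches names such as `x.tar.gz.md5`; listing `*.tar` and `*.tar.gz` as two explicit patterns would be stricter.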