server/text_generation_server/models/__init__.py · 4d38a1c4ad9e262617a3f36e1d01e8c57693b6ef · OpenDAS / text-generation-inference

feat(server): Add Non flash MPT. (#514) · 1da07e85

Nicolas Patry authored Jul 03, 2023

# What does this PR do?


This adds a non-flash version of MPT.
A flash version is harder because it requires a CUDA flash-attention
kernel that supports an attention bias.
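
To illustrate why the bias matters, here is a minimal pure-Python sketch of causal attention with an additive linear bias in the ALiBi style, which is the kind of bias MPT applies to its attention scores. It is not the code from this PR: the function names (`alibi_slopes`, `attention_with_bias`) are hypothetical, the slope formula is the standard ALiBi one for a power-of-two head count, and the exact MPT implementation may differ. The point is only that the bias is added to the scores before the softmax, so a fused attention kernel has to accept it as an input.

```python
import math


def alibi_slopes(n_heads):
    """Standard ALiBi slopes for a power-of-two head count: 2^(-8*i/n), i = 1..n.

    Assumption: this is the textbook formulation, not code taken from the PR.
    """
    return [2.0 ** (-8.0 * (i + 1) / n_heads) for i in range(n_heads)]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_with_bias(q, k, v, slope):
    """Causal single-head attention with an additive linear distance bias.

    q, k, v are lists of seq_len vectors (lists of floats).
    The bias for query i and key j is -slope * (i - j), added before softmax.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        # Only keys 0..i are visible (causal); farther keys get a larger penalty.
        scores = [
            scale * sum(a * b for a, b in zip(qi, k[j])) - slope * (i - j)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        out.append([
            sum(w * v[j][d] for j, w in enumerate(weights))
            for d in range(len(v[0]))
        ])
    return out
```

With `slope = 0.0` this reduces to plain causal attention; a larger slope shifts weight toward nearby tokens.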

Fixes https://github.com/huggingface/text-generation-inference/issues/361
Fixes https://github.com/huggingface/text-generation-inference/issues/491
Fixes https://github.com/huggingface/text-generation-inference/issues/290


__init__.py 10.3 KB
