Commit 3dbdf63e authored by Nicolas Patry, committed by GitHub

Intel ci (#2630)

* Intel CI ?

* Let's try non sharded gemma.

* Snapshot rename

* Apparently container can be gone already.
parent d912f0bf
@@ -75,10 +75,10 @@ jobs:
 export label_extension="-intel-cpu"
 export docker_devices="none"
 export docker_volume="/mnt/cache"
-export runs_on="ubuntu-latest"
-# export runs_on="aws-highmemory-32-plus-priv"
+# export runs_on="ubuntu-latest"
+export runs_on="aws-highmemory-32-plus-priv"
 export platform="cpu"
-export extra_pytest="-k test_flash_llama_load"
+export extra_pytest="-k test_flash_gemma_simple"
 ;;
 esac
 echo $dockerfile
......
@@ -572,7 +572,10 @@ def launcher(event_loop):
         print(container_output, file=sys.stderr)
     finally:
-        container.remove()
+        try:
+            container.remove()
+        except Exception:
+            pass
 if DOCKER_IMAGE is not None:
     return docker_launcher
......
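The "container can be gone already" fix above amounts to best-effort cleanup. A minimal sketch of that pattern follows; `FakeContainer` and `cleanup` are hypothetical names used only for illustration and are not part of the test suite or of any Docker library.

```python
class FakeContainer:
    """Hypothetical stand-in for a container handle (not a real Docker client object)."""

    def __init__(self, already_gone: bool):
        self.already_gone = already_gone
        self.removed = False

    def remove(self) -> None:
        if self.already_gone:
            # Simulates the case where the container no longer exists.
            raise RuntimeError("No such container")
        self.removed = True


def cleanup(container) -> None:
    """Best-effort removal: any error raised by remove() is ignored, as in the diff."""
    try:
        container.remove()
    except Exception:
        pass


present = FakeContainer(already_gone=False)
gone = FakeContainer(already_gone=True)
cleanup(present)  # removes normally
cleanup(gone)     # does not raise even though remove() fails
print(present.removed, gone.removed)
```

Catching `Exception` keeps teardown from masking the real test result, at the cost of also hiding unrelated removal errors. A narrower exception type could be caught instead if the client library exposes one for missing containers.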
@@ -16,7 +16,7 @@ async def flash_gemma(flash_gemma_handle):
 @pytest.mark.release
 @pytest.mark.asyncio
 @pytest.mark.private
-async def test_flash_gemma(flash_gemma, response_snapshot):
+async def test_flash_gemma_simple(flash_gemma, response_snapshot):
     response = await flash_gemma.generate(
         "Test request", max_new_tokens=10, decoder_input_details=True
     )
......
@@ -15,7 +15,7 @@ async def flash_llama(flash_llama_handle):
 @pytest.mark.asyncio
 @pytest.mark.private
-async def test_flash_llama(flash_llama, response_snapshot):
+async def test_flash_llama_simple(flash_llama, response_snapshot):
     response = await flash_llama.generate(
         "Test request", max_new_tokens=10, decoder_input_details=True
     )
......
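The `_simple` renames plausibly exist so that the `extra_pytest="-k ..."` filter can pick out exactly one test. The sketch below is a simplified model of `-k` selection, assuming plain case-insensitive substring matching on function names. Real pytest also supports `and`/`or`/`not` expressions and matches other keywords, so treat `select` as a hypothetical helper rather than pytest's implementation.

```python
def select(names, keyword):
    """Simplified model of `pytest -k KEYWORD`: case-insensitive substring match."""
    needle = keyword.lower()
    return [name for name in names if needle in name.lower()]


# Test names that appear in the diff, before and after the rename.
before = ["test_flash_llama", "test_flash_llama_load"]
after = ["test_flash_llama_simple", "test_flash_llama_load"]

# The old name is a prefix of its sibling, so filtering on it selects both.
print(select(before, "test_flash_llama"))
# The renamed test can be selected on its own.
print(select(after, "test_flash_llama_simple"))
```

Under this model, a test whose name is a prefix of another cannot be selected alone with a bare keyword, which is what the distinct suffix avoids.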