vllm.v1.attention.ops.triton_decode_attention ¶
Memory-efficient attention for decoding. It supports page sizes >= 1 and FP8-quantized KV caches with on-the-fly dequantization.
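To make the "page size >= 1" part concrete: during decoding, the single query token attends over K/V vectors that live in fixed-size pages of a paged KV cache, with a block table mapping a sequence's logical pages to physical ones. The following is a minimal NumPy sketch of that access pattern, not the actual Triton kernel or its API; `paged_decode_attention`, its parameters, and the cache layout are all illustrative assumptions (the real kernel also fuses FP8 dequantization, which is omitted here).

```python
import numpy as np

def paged_decode_attention(q, k_cache, v_cache, block_table, seq_len,
                           page_size, scale):
    """Single-token (decode) attention over a paged KV cache (illustrative).

    q:                (num_heads, head_dim) query for the token being decoded
    k_cache, v_cache: (num_pages, page_size, num_heads, head_dim)
    block_table:      logical page index -> physical page index
    seq_len:          number of valid tokens already in the sequence
    """
    # Gather the valid K/V rows page by page via the block table.
    keys, values = [], []
    num_logical_pages = (seq_len + page_size - 1) // page_size
    for logical_page in range(num_logical_pages):
        phys = block_table[logical_page]
        # The last page may be only partially filled.
        n = min(page_size, seq_len - logical_page * page_size)
        keys.append(k_cache[phys, :n])
        values.append(v_cache[phys, :n])
    k = np.concatenate(keys)    # (seq_len, num_heads, head_dim)
    v = np.concatenate(values)

    # Standard scaled softmax attention, computed per head.
    scores = np.einsum("hd,thd->ht", q, k) * scale   # (num_heads, seq_len)
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    return np.einsum("ht,thd->hd", probs, v)         # (num_heads, head_dim)
```

The real kernel avoids the gather-then-concatenate step entirely: it reads each page directly from GPU memory inside the Triton program and accumulates the softmax online, which is what makes it memory-efficient.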