vllm.v1.attention.ops.triton_decode_attention

Memory-efficient attention kernel for the decode phase. It supports page sizes >= 1 and FP8-quantized KV caches with on-the-fly dequantization.
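To make the behavior concrete, here is a minimal NumPy reference sketch of what paged decode attention computes: a single query token attends over a KV cache stored in fixed-size pages, with the cache dequantized on the fly via a scale factor (emulating an FP8 cache). This is an illustration, not the Triton kernel itself; all names (`paged_decode_attention`, `block_table`, `k_scale`, `v_scale`) are illustrative rather than the module's actual API.

```python
import numpy as np

def paged_decode_attention(q, k_cache, v_cache, block_table, seq_len,
                           k_scale=1.0, v_scale=1.0):
    """Reference single-query (decode) attention over a paged KV cache.

    q:           (num_heads, head_dim) query for the single decode token
    k_cache:     (num_blocks, page_size, num_heads, head_dim) quantized keys
    v_cache:     same shape as k_cache, quantized values
    block_table: 1-D array mapping logical page index -> physical block index
    seq_len:     number of valid tokens in this sequence
    k_scale, v_scale: dequantization scales (stand-in for FP8 dequant)
    """
    num_heads, head_dim = q.shape
    page_size = k_cache.shape[1]
    num_pages = (seq_len + page_size - 1) // page_size

    # Gather the logical K/V sequence from its physical pages.
    k = np.concatenate([k_cache[block_table[p]] for p in range(num_pages)])[:seq_len]
    v = np.concatenate([v_cache[block_table[p]] for p in range(num_pages)])[:seq_len]

    # On-the-fly dequantization: scale back to float32 before use.
    k = k.astype(np.float32) * k_scale
    v = v.astype(np.float32) * v_scale

    scale = 1.0 / np.sqrt(head_dim)
    out = np.empty((num_heads, head_dim), dtype=np.float32)
    for h in range(num_heads):
        logits = (k[:, h, :] @ q[h]) * scale   # (seq_len,) attention scores
        logits -= logits.max()                 # stabilize softmax
        weights = np.exp(logits)
        weights /= weights.sum()
        out[h] = weights @ v[:, h, :]          # weighted sum of values
    return out
```

The real kernel fuses the page gather, dequantization, and softmax into Triton programs so the logical K/V sequence is never materialized contiguously in memory; the loop over heads here corresponds roughly to the kernel's parallel grid.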