# Modular
> Deploy fast and scalable GenAI inference
This file contains links to documentation sections following the llmstxt.org standard.
## Table of Contents
- [Attention mask](https://docs.modular.com/glossary/ai/attention-mask): An attention mask specifies which tokens in a sequence a model can attend to
- [Attention](https://docs.modular.com/glossary/ai/attention): Attention is a mechanism used in AI models such as transformers to weigh the relevance of each token in a sequence when computing an output
- [Autoregression](https://docs.modular.com/glossary/ai/autoregression): Autoregression is a process by which an AI model iteratively predicts future tokens based on the tokens it has already generated
- [Batching](https://docs.modular.com/glossary/ai/batching): Batching is the process of combining multiple inference requests into a single batch that is processed together, improving hardware utilization and throughput
- [Context encoding](https://docs.modular.com/glossary/ai/context-encoding): Context encoding is the first phase of inference in a transformer model, in which the model processes the input prompt
- [Continuous batching](https://docs.modular.com/glossary/ai/continuous-batching): Continuous batching is a [batching](https://docs.modular.com/glossary/ai/batching) technique that can add new requests to a running batch as earlier requests complete, rather than waiting for the entire batch to finish
- [Disaggregated inference](https://docs.modular.com/glossary/ai/disaggregated-inference): Disaggregated inference is a serving architecture pattern for large language models that separates the prefill and decode phases of inference onto different hardware
- [Embedding](https://docs.modular.com/glossary/ai/embedding): An embedding (also known as a "vector embedding") is a numerical representation of data, such as a token, that captures its semantic meaning
- [Flash attention](https://docs.modular.com/glossary/ai/flash-attention): Flash attention is an optimization technique to compute attention blocks in tiles, reducing reads and writes to GPU memory
- [AI terms](https://docs.modular.com/glossary/ai): An index of the AI terms defined in this glossary
- [Inference routing](https://docs.modular.com/glossary/ai/inference-routing): Inference routing is the process of directing incoming inference requests to the appropriate model servers or replicas
- [KV cache](https://docs.modular.com/glossary/ai/kv-cache): KV (key-value) cache is a memory structure used in transformer models to store the keys and values computed for previous tokens, so they don't need to be recomputed for each new token
- [Padding tokens](https://docs.modular.com/glossary/ai/padding-tokens): Padding tokens are extra tokens (usually zeros or special tokens) appended to sequences in a batch so that all sequences have the same length