TransformerDecoderModelSpec
- class ctranslate2.specs.TransformerDecoderModelSpec
Describes a Transformer decoder model (e.g. GPT-2).
Inherits from:
ctranslate2.specs.LanguageModelSpec
Attributes:
Methods:
- __init__(decoder: TransformerDecoderSpec)
Initializes a Transformer decoder model specification.
- Parameters
decoder – The decoder specification.
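Example (a minimal sketch; the TransformerDecoderSpec constructor arguments shown here are assumptions and may vary between versions, so from_config below is usually the more convenient entry point):
    import ctranslate2

    # Assumption: TransformerDecoderSpec takes the number of layers and heads
    # as its first arguments; check the constructor of your CTranslate2 version.
    decoder = ctranslate2.specs.TransformerDecoderSpec(2, 4)
    spec = ctranslate2.specs.TransformerDecoderModelSpec(decoder)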
- classmethod from_config(num_layers: int, num_heads: int, pre_norm: bool = True, activation: Activation = Activation.RELU, layernorm_embedding: bool = False, no_final_norm: bool = False, project_in_out: bool = False, with_relative_position: bool = False, ffn_glu: bool = False, rms_norm: bool = False, alibi: bool = False, alibi_use_positive_positions: bool = False, scale_alibi: bool = False, rotary_dim: Optional[int] = None, rotary_interleave: bool = True, rotary_scaling_type: Optional[RotaryScalingType] = None, rotary_scaling_factor: float = 1, rotary_base: float = 10000, original_max_position_embeddings: int = 0, max_position_embeddings: int = 0, parallel_residual: bool = False, shared_layer_norm: bool = False, pre_post_layer_norm: bool = False, multi_query_attention: bool = False, num_heads_kv: Optional[int] = None, head_dim: Optional[int] = None, sliding_window: Optional[int] = None, quant_type: Optional[Quantization] = None, quant_group_size: Optional[int] = None, quant_bits: Optional[int] = None)
Creates a Transformer decoder model specification.
- Parameters
num_layers – Number of decoder layers.
num_heads – Number of attention heads.
pre_norm – Enable the pre-norm Transformer architecture.
activation – Activation to apply in the feed-forward network.
layernorm_embedding – Apply layer normalization after the embedding layer.
no_final_norm – Do not apply layer normalization after the last decoder block.
project_in_out – Add a linear layer after the embedding layer and another one before the final output projection.
with_relative_position – Enable relative position representation modules.
ffn_glu – Use gated linear units in the FFN layers as described in https://arxiv.org/abs/2002.05202.
rms_norm – Use the root mean square layer normalization.
alibi – Use attention with linear biases.
alibi_use_positive_positions – Use positive positions in the ALiBi definition.
scale_alibi – Apply the dot product scale factor to ALiBi.
rotary_dim – Apply rotary embeddings to the first rotary_dim dimensions. If 0, rotary embeddings are applied to all dimensions.
rotary_interleave – Interleave the head dimensions when rotary embeddings are applied. Otherwise the head dimensions are sliced in half.
rotary_scaling_type – Type of RoPE scaling.
rotary_scaling_factor – Factor used in the RoPE scaling.
rotary_base – The base period of the rotary embeddings.
original_max_position_embeddings – The original maximum position embeddings for Su rotary embeddings.
max_position_embeddings – The maximum position embeddings for Su rotary embeddings.
parallel_residual – Use parallel residual connections in each layer block, as used by the GPT-J and GPT-NeoX models.
shared_layer_norm – When using parallel residual, share the input and post attention layer norms.
pre_post_layer_norm – Add a post layer normalization for each pre-norm layer.
multi_query_attention – Use multi-query attention (alias for num_heads_kv=1).
num_heads_kv – Number of attention heads for the key and value.
head_dim – Number of dimensions per attention head.
sliding_window – Maximum sequence length to retain in the KV cache.
quant_type – Quantization type used for lower-bit quantization (e.g. AWQ).
quant_group_size – Group size of the lower-bit quantization.
quant_bits – Number of bits of the quantization (e.g. 4-bit).
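Example (an illustrative sketch of a small GPT-2 style specification; the hyperparameter values are examples, not those of a specific released model or converter):
    from ctranslate2.specs import Activation, TransformerDecoderModelSpec

    spec = TransformerDecoderModelSpec.from_config(
        num_layers=12,
        num_heads=12,
        pre_norm=True,
        activation=Activation.GELU,
    )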
- get_default_config()
Returns the default configuration used by this model.
- get_vocabulary_size()
Returns the vocabulary size expected by the model.
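Example (continuing with the spec built above; it is assumed that get_vocabulary_size() is only meaningful once a converter has set the embedding weights):
    config = spec.get_default_config()
    print(config)
    # Only meaningful once the weights have been filled in by a converter:
    # print(spec.get_vocabulary_size())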
- optimize(quantization: Optional[str] = None) None
Recursively applies some optimizations to this layer:
Alias variables with the same shape and value.
Quantize weights.
- Parameters
quantization – Weight quantization scheme (possible values are: int8, int8_float32, int8_float16, int8_bfloat16, int16, float16, bfloat16, float32).
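Example (a sketch; optimize() is typically called after all weights have been set and before save()):
    # Quantize the registered weights to 8-bit integers.
    spec.optimize(quantization="int8")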
- register_file(path: str, filename: Optional[str] = None) None
Registers a file to be saved in the model directory.
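Example (a sketch; the path below is hypothetical and stands in for any external resource, such as a tokenizer file, that should be copied into the model directory):
    spec.register_file("/path/to/tokenizer.model", filename="tokenizer.model")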
- register_vocabulary(tokens: List[str]) None
Registers the vocabulary of tokens.
- Parameters
tokens – List of tokens.
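Example (a sketch; the tokens below are placeholders, and in practice the list must follow the same order as the embedding rows):
    spec.register_vocabulary(["<unk>", "<s>", "</s>", "hello", "world"])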
- save(output_dir: str) None
Saves this model on disk.
- Parameters
output_dir – Output directory where the model is saved.
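Example (a sketch, assuming all required weights and the vocabulary have been set beforehand; the output directory name is illustrative):
    spec.save("gpt2_ct2")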
- validate() None
Verify that the required weights are set.
- Raises
ValueError – If a required weight is not set in the specification.
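Example (a sketch showing how the error can be caught when a weight is missing):
    try:
        spec.validate()
    except ValueError as error:
        # Raised when a required weight was never assigned in the spec.
        print("Incomplete specification:", error)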
- variables(prefix: str = '', ordered: bool = False) Dict[str, ndarray]
Recursively returns the weights from this layer and its children.
- Parameters
prefix – Prefix to prepend to all variable names.
ordered – If set, an ordered list is returned instead.
- Returns
Dictionary mapping variable names to values.
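Example (a sketch, assuming the weights have been set; the "model/" prefix is purely illustrative):
    for name, value in spec.variables(prefix="model/").items():
        print(name, value.shape)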
- property config
The model configuration.
- property name
The name of the model specification.
- property revision
The model specification revision.
This value is incremented each time the weights layout of the model is changed (e.g. a weight is renamed).
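Example (a sketch printing the identification attached to the specification, continuing with the spec built above):
    print(spec.name, spec.revision)
    print(spec.config)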