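# Models

The base class [PreTrainedModel](/docs/transformers/main/zh/main_classes/model#transformers.PreTrainedModel) implements the common methods for loading and saving a model, either from a local file or directory, or from a pretrained model configuration provided by the library (downloaded from HuggingFace's AWS S3 repository).

[PreTrainedModel](/docs/transformers/main/zh/main_classes/model#transformers.PreTrainedModel) and `TFPreTrainedModel` also implement a few methods that are common to all models:

- resize the input token embeddings when new tokens are added to the vocabulary
- prune the attention heads of the model.

The other methods common to all models are defined in [ModuleUtilsMixin](/docs/transformers/main/zh/main_classes/model#transformers.modeling_utils.ModuleUtilsMixin) (for PyTorch models); the text generation methods are defined in [GenerationMixin](/docs/transformers/main/zh/main_classes/text_generation#transformers.GenerationMixin) (for PyTorch models).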

## PreTrainedModel[[transformers.PreTrainedModel]]

#### transformers.PreTrainedModel[[transformers.PreTrainedModel]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L1151)

Base class for all models.

[PreTrainedModel](/docs/transformers/main/zh/main_classes/model#transformers.PreTrainedModel) takes care of storing the configuration of the models and handles methods for loading,
downloading and saving models as well as a few methods common to all models to:

- resize the input embeddings

Class attributes (overridden by derived classes):

- **config_class** ([PreTrainedConfig](/docs/transformers/main/zh/main_classes/configuration#transformers.PreTrainedConfig)) -- A subclass of [PreTrainedConfig](/docs/transformers/main/zh/main_classes/configuration#transformers.PreTrainedConfig) to use as configuration class
  for this model architecture.
- **base_model_prefix** (`str`) -- A string indicating the attribute associated to the base model in derived
  classes of the same architecture adding modules on top of the base model.
- **main_input_name** (`str`) -- The name of the principal input to the model (often `input_ids` for NLP
  models, `pixel_values` for vision models and `input_values` for speech models).
- **can_record_outputs** (dict):

#### push_to_hub[[transformers.PreTrainedModel.push_to_hub]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/utils/hub.py#L720)

Upload the model file to the 🤗 Model Hub.

Examples:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("google-bert/bert-base-cased")

# Push the model to your namespace with the name "my-finetuned-bert".
model.push_to_hub("my-finetuned-bert")

# Push the model to an organization with the name "my-finetuned-bert".
model.push_to_hub("huggingface/my-finetuned-bert")
```

**Parameters:**

repo_id (`str`) : The name of the repository you want to push your model to. It should contain your organization name when pushing to a given organization.

commit_message (`str`, *optional*) : Message to commit while pushing. Will default to `"Upload model"`.

commit_description (`str`, *optional*) : The description of the commit that will be created

private (`bool`, *optional*) : Whether to make the repo private. If `None` (default), the repo will be public unless the organization's default is private. This value is ignored if the repo already exists.

token (`bool` or `str`, *optional*) : The token to use as HTTP bearer authorization for remote files. If `True` (default), will use the token generated when running `hf auth login` (stored in `~/.huggingface`).

revision (`str`, *optional*) : Branch to push the uploaded files to.

create_pr (`bool`, *optional*, defaults to `False`) : Whether or not to create a PR with the uploaded files or directly commit.

max_shard_size (`int` or `str`, *optional*, defaults to `"50GB"`) : Only applicable for models. The maximum size for a checkpoint before being sharded. Checkpoints shard will then be each of size lower than this size. If expressed as a string, needs to be digits followed by a unit (like `"5MB"`).

tags (`list[str]`, *optional*) : List of tags to push on the Hub.
#### add_model_tags[[transformers.PreTrainedModel.add_model_tags]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L1481)

Add custom tags into the model that gets pushed to the Hugging Face Hub. Will
not overwrite existing tags in the model.

Examples:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("google-bert/bert-base-cased")

model.add_model_tags(["custom", "custom-bert"])

# Push the model to your namespace with the name "my-custom-bert".
model.push_to_hub("my-custom-bert")
```

**Parameters:**

tags (`Union[list[str], str]`) : The desired tags to inject in the model
#### can_generate[[transformers.PreTrainedModel.can_generate]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L1592)

Returns whether this model can generate sequences with `.generate()` from the `GenerationMixin`.

Under the hood, on classes where this function returns True, some generation-specific changes are triggered:
for instance, the model instance will have a populated `generation_config` attribute.

**Returns:**

``bool``

Whether this model can generate sequences with `.generate()`.
#### dequantize[[transformers.PreTrainedModel.dequantize]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L1463)

Potentially dequantize the model in case it has been quantized by a quantization method that supports
dequantization.
#### disable_input_require_grads[[transformers.PreTrainedModel.disable_input_require_grads]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L2248)

Removes the `_require_grads_hook`.
#### enable_input_require_grads[[transformers.PreTrainedModel.enable_input_require_grads]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L2204)

Enables the gradients for the input embeddings. This is useful for fine-tuning adapter weights while keeping
the model weights fixed.
#### from_pretrained[[transformers.PreTrainedModel.from_pretrained]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L3765)

Instantiate a pretrained PyTorch model from a pretrained model configuration.

The model is set in evaluation mode by default using `model.eval()` (Dropout modules are deactivated). To train
the model, you should first set it back in training mode with `model.train()`.

The warning *Weights from XXX not initialized from pretrained model* means that the weights of XXX do not come
pretrained with the rest of the model. It is up to you to train those weights with a downstream fine-tuning
task.

The warning *Weights from XXX not used in YYY* means that the layer XXX is not used by YYY, therefore those
weights are discarded.
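
As a rough illustration of where these two warnings come from, the sketch below compares the parameter names a model expects with the names stored in a checkpoint. It is a simplified stand-in and not the library's loading code; `summarize_loading` and all parameter names are hypothetical.

```python
def summarize_loading(model_keys, checkpoint_keys):
    """Compare expected parameter names with the names stored in a checkpoint."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    return {
        # Expected by the model but absent from the checkpoint: newly initialized.
        "missing_keys": sorted(model_keys - checkpoint_keys),
        # Present in the checkpoint but unused by the model: discarded.
        "unexpected_keys": sorted(checkpoint_keys - model_keys),
    }


# Hypothetical names: a classifier loaded from a masked-language-model checkpoint.
info = summarize_loading(
    model_keys=["encoder.layer.0.weight", "classifier.weight"],
    checkpoint_keys=["encoder.layer.0.weight", "lm_head.weight"],
)
print(info)
```

Passing `output_loading_info=True` to `from_pretrained()` returns a dictionary of this kind alongside the model.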

Activate the special ["offline-mode"](https://huggingface.co/transformers/installation.html#offline-mode) to
use this method in a firewalled environment.

Examples:

```python
>>> from transformers import BertConfig, BertModel

>>> # Download model and configuration from huggingface.co and cache.
>>> model = BertModel.from_pretrained("google-bert/bert-base-uncased")
>>> # Model was saved using *save_pretrained('./test/saved_model/')* (for example purposes, not runnable).
>>> model = BertModel.from_pretrained("./test/saved_model/")
>>> # Update configuration during loading.
>>> model = BertModel.from_pretrained("google-bert/bert-base-uncased", output_attentions=True)
>>> assert model.config.output_attentions == True
```

**Parameters:**

pretrained_model_name_or_path (`str` or `os.PathLike`, *optional*) : Can be either:  - A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co. - A path to a *directory* containing model weights saved using [save_pretrained()](/docs/transformers/main/zh/main_classes/model#transformers.PreTrainedModel.save_pretrained), e.g., `./my_model_directory/`. - `None` if you are both providing the configuration and state dictionary (resp. with keyword arguments `config` and `state_dict`).

model_args (sequence of positional arguments, *optional*) : All remaining positional arguments will be passed to the underlying model's `__init__` method.

config (`Union[PreTrainedConfig, str, os.PathLike]`, *optional*) : Can be either:  - an instance of a class derived from [PreTrainedConfig](/docs/transformers/main/zh/main_classes/configuration#transformers.PreTrainedConfig), - a string or path valid as input to [from_pretrained()](/docs/transformers/main/zh/main_classes/configuration#transformers.PreTrainedConfig.from_pretrained).  Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:  - The model is a model provided by the library (loaded with the *model id* string of a pretrained model). - The model was saved using [save_pretrained()](/docs/transformers/main/zh/main_classes/model#transformers.PreTrainedModel.save_pretrained) and is reloaded by supplying the save directory. - The model is loaded by supplying a local directory as `pretrained_model_name_or_path` and a configuration JSON file named *config.json* is found in the directory.

state_dict (`dict[str, torch.Tensor]`, *optional*) : A state dictionary to use instead of a state dictionary loaded from saved weights file.  This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using [save_pretrained()](/docs/transformers/main/zh/main_classes/model#transformers.PreTrainedModel.save_pretrained) and [from_pretrained()](/docs/transformers/main/zh/main_classes/model#transformers.PreTrainedModel.from_pretrained) is not a simpler option.

cache_dir (`Union[str, os.PathLike]`, *optional*) : Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.

ignore_mismatched_sizes (`bool`, *optional*, defaults to `False`) : Whether or not to raise an error if some of the weights from the checkpoint do not have the same size as the weights of the model (if for instance, you are instantiating a model with 10 labels from a checkpoint with 3 labels).

force_download (`bool`, *optional*, defaults to `False`) : Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.

proxies (`dict[str, str]`, *optional*) : A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}`. The proxies are used on each request.

output_loading_info (`bool`, *optional*, defaults to `False`) : Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.

local_files_only (`bool`, *optional*, defaults to `False`) : Whether or not to only look at local files (i.e., do not try to download the model).

token (`str` or `bool`, *optional*) : The token to use as HTTP bearer authorization for remote files. If `True`, or not specified, will use the token generated when running `hf auth login` (stored in `~/.huggingface`).

revision (`str`, *optional*, defaults to `"main"`) : The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any identifier allowed by git.    To test a pull request you made on the Hub, you can pass `revision="refs/pr/<pr_number>"`.  

attn_implementation (`str`, *optional*) : The attention implementation to use in the model (if relevant). Can be any of:

- `"eager"` (manual implementation of the attention)
- `"sdpa"` (using [`F.scaled_dot_product_attention`](https://pytorch.org/docs/master/generated/torch.nn.functional.scaled_dot_product_attention.html))
- `"flash_attention_2"` (using [Dao-AILab/flash-attention](https://github.com/Dao-AILab/flash-attention))
- `"flash_attention_3"` (using [Dao-AILab/flash-attention/hopper](https://github.com/Dao-AILab/flash-attention/tree/main/hopper))
- `"flash_attention_4"` (using [Dao-AILab/flash-attention/flash_attn/cute](https://github.com/Dao-AILab/flash-attention/tree/main/flash_attn/cute))

By default, if available, SDPA will be used. The default is otherwise the manual `"eager"` implementation.

HF kernel references are also accepted, in the form `<namespace>/<repo_name>[@<revision>][:<kernel_name>]`:

- `<namespace>` and `<repo_name>` are any non-"/" and non-":" sequences.
- `@<revision>` is optional (branch, tag, or commit-ish), e.g. "@main", "@v1.2.0", "@abc123".
- `:<kernel_name>` is optional and selects a function inside the kernel repo.
- Both options can appear together and in this order only: `@<revision>` first, then `:<kernel_name>`.
- A leading "|" prefix is intentionally allowed (e.g., "flash|...") because the code strips it before loading; '|' is not excluded in the character classes of the matching pattern.

Examples that match: `"org/model"`, `"org/model@main"`, `"org/model:custom_kernel"`, `"org/model@v1.2.3:custom_kernel"`.

experts_implementation (`str`, *optional*) : The experts implementation to use in the model (if relevant). Can be any of:  - `"eager"` (sequential implementation of the experts matrix multiplications). - `"batched_mm"` (using [`torch.bmm`](https://pytorch.org/docs/stable/generated/torch.bmm.html)). - `"grouped_mm"` (using [`torch.nn.functional.grouped_mm`](https://docs.pytorch.org/docs/main/generated/torch.nn.functional.grouped_mm.html)).  By default, if the model supports it, `"grouped_mm"` will be used. The default is otherwise the manual `"eager"` implementation.
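
To make the kernel reference format accepted by `attn_implementation` more concrete (for example `"org/model@v1.2.3:custom_kernel"`), here is a minimal parsing sketch. It only mirrors the rules listed for that parameter; `parse_kernel_reference` and the returned field names are hypothetical, and the library's own matching logic may differ.

```python
def parse_kernel_reference(reference: str) -> dict:
    """Split a kernel reference into repo id, optional revision, optional kernel name."""
    # Drop an optional leading "prefix|" such as "flash|".
    _, _, reference = reference.rpartition("|")
    # The ":kernel_name" part comes last, so split it off first.
    repo_and_revision, _, kernel_name = reference.partition(":")
    # The "@revision" part sits between the repo id and the kernel name.
    repo_id, _, revision = repo_and_revision.partition("@")
    namespace, _, repo_name = repo_id.partition("/")
    if not namespace or not repo_name or "/" in repo_name:
        raise ValueError(f"not a kernel repository reference: {reference!r}")
    return {
        "repo_id": repo_id,
        "revision": revision or None,
        "kernel_name": kernel_name or None,
    }


full = parse_kernel_reference("org/model@v1.2.3:custom_kernel")
bare = parse_kernel_reference("org/model")
prefixed = parse_kernel_reference("flash|org/model@main")
print(full, bare, prefixed, sep="\n")
```

Plain names such as `"eager"` or `"sdpa"` contain no `/`, so this sketch rejects them with a `ValueError`, which is one simple way to tell built-in implementations apart from Hub kernels.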
#### get_compiled_call[[transformers.PreTrainedModel.get_compiled_call]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L4618)

Return a `torch.compile`'d version of `self.__call__`. This is useful to dynamically choose between
non-compiled/compiled `forward` during inference, especially to switch between prefill (where we don't
want to use compiled version to avoid recomputing the graph with new shapes) and iterative decoding
(where we want the speed-ups of compiled version with static shapes).
#### get_decoder[[transformers.PreTrainedModel.get_decoder]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L2321)

Best-effort lookup of the *decoder* module.

Order of attempts (covers ~85 % of current usages):

1. `self.decoder/self.language_model/self.text_model`
2. `self.base_model`                  (many wrappers store the decoder here)
3. `self.base_model.get_decoder()`    (nested wrappers)
4. fallback: raise for the few exotic models that need a bespoke rule
#### get_encoder[[transformers.PreTrainedModel.get_encoder]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L2263)

Best-effort lookup of the *encoder* module. If provided with a `modality` argument,
it looks for a modality-specific encoder in multimodal models (e.g. "image_encoder").
By default the function returns the model's text encoder if any, and otherwise returns `self`.

Possible `modality` values are "image", "video" and "audio".
#### get_expanded_tied_weights_keys[[transformers.PreTrainedModel.get_expanded_tied_weights_keys]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L2474)

Return the expanded tied weight keys (in case they contain modules or regex patterns) for only the current
model, or recursively for all submodels if `all_submodels=True` (i.e. it will re-check the config values for all
submodels).

For almost all models, we only require to tie the embeddings, so the model has an internal property
`_tied_weights_keys = {"lm_head.weight": "model.embed_tokens.weight"}`. In this case, the mapping is already
"expanded", i.e. it already contains full parameters, and this function will simply return a copy of the property.
For more complex patterns, e.g. for `DFineForObjectDetection`, we have the following attribute

```
_tied_weights_keys = {
    r"bbox_embed.(?![0])\d+": "bbox_embed.0",
    r"class_embed.(?![0])\d+": "class_embed.0",
    "model.decoder.class_embed": "class_embed",
    "model.decoder.bbox_embed": "bbox_embed",
}
```

In this case, the function looks up all the model's parameters and buffers, and matches all the params, returning the following:
```
{
    'bbox_embed.1.layers.0.bias': 'bbox_embed.0.layers.0.bias',
    'bbox_embed.1.layers.0.weight': 'bbox_embed.0.layers.0.weight',
    'bbox_embed.1.layers.1.bias': 'bbox_embed.0.layers.1.bias',
    'bbox_embed.1.layers.1.weight': 'bbox_embed.0.layers.1.weight',
    'bbox_embed.1.layers.2.bias': 'bbox_embed.0.layers.2.bias',
    'bbox_embed.1.layers.2.weight': 'bbox_embed.0.layers.2.weight',
    'bbox_embed.2.layers.0.bias': 'bbox_embed.0.layers.0.bias',
    'bbox_embed.2.layers.0.weight': 'bbox_embed.0.layers.0.weight',
    ...
    'class_embed.1.bias': 'class_embed.0.bias',
    'class_embed.1.weight': 'class_embed.0.weight',
    'class_embed.2.bias': 'class_embed.0.bias',
    'class_embed.2.weight': 'class_embed.0.weight',
    ...
    'model.decoder.class_embed.0.bias': 'class_embed.0.bias',
    'model.decoder.class_embed.0.weight': 'class_embed.0.weight',
    'model.decoder.class_embed.1.bias': 'class_embed.0.bias',
    'model.decoder.class_embed.1.weight': 'class_embed.0.weight',
    ...
    'model.decoder.bbox_embed.0.layers.0.bias': 'bbox_embed.0.layers.0.bias',
    'model.decoder.bbox_embed.0.layers.0.weight': 'bbox_embed.0.layers.0.weight',
    'model.decoder.bbox_embed.0.layers.1.bias': 'bbox_embed.0.layers.1.bias',
    'model.decoder.bbox_embed.0.layers.1.weight': 'bbox_embed.0.layers.1.weight',
    ...
}
```

i.e. all the parameters matching the regex and modules patterns in `_tied_weights_keys`
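
The expansion described above can be approximated in a few lines. The following is a simplified sketch and not the library's implementation: `expand_tied_keys` is a hypothetical helper, it works on a plain list of parameter names, and it assumes each key names a module (or a regex over module names) whose parameters follow after a dot.

```python
import re


def expand_tied_keys(tied_keys: dict, parameter_names: list) -> dict:
    """Expand module names and regex patterns into full parameter names."""
    expanded = {}
    for target_pattern, source_prefix in tied_keys.items():
        # Everything after the matched module name is the path inside the module.
        regex = re.compile(rf"(?:{target_pattern})\.(?P<suffix>.+)")
        for name in parameter_names:
            match = regex.fullmatch(name)
            if match:
                expanded[name] = f"{source_prefix}.{match.group('suffix')}"
    # Follow chains so every target points at a parameter that is not itself tied.
    resolved = {}
    for target, source in expanded.items():
        while source in expanded:
            source = expanded[source]
        resolved[target] = source
    return resolved


# Toy parameter names shaped like the `class_embed` entries shown above.
names = [
    "class_embed.0.weight",
    "class_embed.1.weight",
    "class_embed.2.weight",
    "model.decoder.class_embed.0.weight",
    "model.decoder.class_embed.1.weight",
]
tied = {
    r"class_embed.(?![0])\d+": "class_embed.0",
    "model.decoder.class_embed": "class_embed",
}
result = expand_tied_keys(tied, names)
for target, source in result.items():
    print(f"{target!r}: {source!r}")
```
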
#### get_memory_footprint[[transformers.PreTrainedModel.get_memory_footprint]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L3553)

Get the memory footprint of a model. This will return the memory footprint of the current model in bytes.
Useful to benchmark the memory footprint of the current model and design some tests. Solution inspired from the
PyTorch discussions: https://discuss.pytorch.org/t/gpu-memory-that-model-uses/56822/2

**Parameters:**

return_buffers (`bool`, *optional*, defaults to `True`) : Whether to return the size of the buffer tensors in the computation of the memory footprint. Buffers are tensors that do not require gradients and not registered as parameters. E.g. mean and std in batch norm layers. Please see: https://discuss.pytorch.org/t/what-pytorch-means-by-buffers/120266/2
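
The arithmetic behind such a footprint is a sum of element count times bytes per element. The sketch below reproduces it on plain shapes so it runs without PyTorch; `memory_footprint`, the tensor names, shapes and element sizes are all made up for illustration, and the real method operates on the model's actual tensors.

```python
from math import prod


def memory_footprint(parameters: dict, buffers: dict, return_buffers: bool = True) -> int:
    """Sum element_count * bytes_per_element over parameters and, optionally, buffers."""

    def nbytes(tensors: dict) -> int:
        return sum(prod(shape) * itemsize for shape, itemsize in tensors.values())

    total = nbytes(parameters)
    if return_buffers:
        total += nbytes(buffers)
    return total


# name -> (shape, bytes per element); 4 = float32, 2 = float16, 8 = int64.
parameters = {"embed.weight": ((1000, 64), 4), "linear.weight": ((64, 64), 2)}
buffers = {"position_ids": ((1, 512), 8)}

with_buffers = memory_footprint(parameters, buffers)
without_buffers = memory_footprint(parameters, buffers, return_buffers=False)
print(with_buffers, without_buffers)
```
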
#### get_parameter_or_buffer[[transformers.PreTrainedModel.get_parameter_or_buffer]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L4787)

Return the parameter or buffer given by `target` if it exists, otherwise throw an error. This combines
`get_parameter()` and `get_buffer()` in a single handy function. If the target is an `_extra_state` attribute,
it will return the extra state provided by the module. Note that it only works if `target` is a leaf of the model.
#### gradient_checkpointing_disable[[transformers.PreTrainedModel.gradient_checkpointing_disable]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L3224)

Deactivates gradient checkpointing for the current model.
#### gradient_checkpointing_enable[[transformers.PreTrainedModel.gradient_checkpointing_enable]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L3160)

Activates gradient checkpointing for the current model.

We pass the `__call__` method of the modules instead of `forward` because `__call__` attaches all the hooks of
the module. https://discuss.pytorch.org/t/any-different-between-model-input-and-model-forward-input/3690/2

**Parameters:**

gradient_checkpointing_kwargs (dict, *optional*) : Additional keyword arguments passed along to the `torch.utils.checkpoint.checkpoint` function.
#### init_weights[[transformers.PreTrainedModel.init_weights]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L3148)

Initialize and tie the weights if needed. If using a custom `PreTrainedModel`, you need to implement any
initialization logic in `_init_weights`.
#### initialize_weights[[transformers.PreTrainedModel.initialize_weights]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L2444)

This is equivalent to calling `self.apply(self._initialize_weights)`, but correctly handles composite models.
This function dynamically dispatches the correct `init_weights` function to the modules as we advance in the
module graph along the recursion. It can handle an arbitrary number of sub-models. Without it, every composite
model would have to recurse a second time on all sub-models explicitly in the outer-most `_init_weights`, which
is extremely error prone and inefficient.
#### kernelize[[transformers.PreTrainedModel.kernelize]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L4572)

Temporarily register hidden kernel wrappers so `kernelize` can discover and replace them.
#### mark_tied_weights_as_initialized[[transformers.PreTrainedModel.mark_tied_weights_as_initialized]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L4759)

Adds the `_is_hf_initialized` flag on parameters that will be tied, in order to avoid initializing them
later as they will be tied (overwritten) anyway.
This is very important as most embeddings are tied, and they are huge params (vocabularies are often 256k), so
running inits on them is very costly.
#### named_non_persistent_buffers[[transformers.PreTrainedModel.named_non_persistent_buffers]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L4811)

Similar to `named_buffers`, but only yield non-persistent ones. It is handy as it's not perfectly straightforward
to know if they are persistent or not
#### post_init[[transformers.PreTrainedModel.post_init]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L1349)

A method executed at the end of each Transformer model initialization, to execute code that needs the model's
modules properly initialized (such as weight initialization).
It is also used to obtain all correct static properties (parallelism plans, tied_weights_keys, _keep_in_fp32_modules, etc)
correctly in the case of composite models (that is, the top level model should know about those properties from its children).
#### register_for_auto_class[[transformers.PreTrainedModel.register_for_auto_class]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L4457)

Register this class with a given auto class. This should only be used for custom models as the ones in the
library are already mapped with an auto class.

**Parameters:**

auto_class (`str` or `type`, *optional*, defaults to `"AutoModel"`) : The auto class to register this new model with.
#### resize_token_embeddings[[transformers.PreTrainedModel.resize_token_embeddings]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L2683)

Resizes input token embeddings matrix of the model if `new_num_tokens != config.vocab_size`.

Takes care of tying weights embeddings afterwards if the model class has a `tie_weights()` method.

**Parameters:**

new_num_tokens (`int`, *optional*) : The new number of tokens in the embedding matrix. Increasing the size will add newly initialized vectors at the end. Reducing the size will remove vectors from the end. If not provided or `None`, just returns a pointer to the input tokens `torch.nn.Embedding` module of the model without doing anything.

pad_to_multiple_of (`int`, *optional*) : If set, pads the embedding matrix to a multiple of the provided value. If `new_num_tokens` is `None`, only pads the embedding to a multiple of `pad_to_multiple_of`.  This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability `>= 7.5` (Volta), or on TPUs which benefit from having sequence lengths be a multiple of 128. For more details about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc

mean_resizing (`bool`) : Whether to initialize the added embeddings from a multivariate normal distribution that has old embeddings' mean and covariance or to initialize them with a normal distribution that has a mean of zero and std equals `config.initializer_range`.  Setting `mean_resizing` to `True` is useful when increasing the size of the embeddings of causal language models, where the generated tokens' probabilities won't be affected by the added embeddings because initializing the new embeddings with the old embeddings' mean will reduce the kl-divergence between the next token probability before and after adding the new embeddings. Refer to this article for more information: https://nlp.stanford.edu/~johnhew/vocab-expansion.html

**Returns:**

``torch.nn.Embedding``

Pointer to the input tokens Embeddings Module of the model.
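
For `pad_to_multiple_of`, the resulting embedding size can be worked out by hand. The sketch below assumes the size is rounded up to the next multiple; `padded_vocab_size` is a hypothetical helper that only does the arithmetic and does not touch a model.

```python
def padded_vocab_size(new_num_tokens: int, pad_to_multiple_of: int) -> int:
    """Round new_num_tokens up to the nearest multiple of pad_to_multiple_of."""
    blocks = (new_num_tokens + pad_to_multiple_of - 1) // pad_to_multiple_of
    return blocks * pad_to_multiple_of


padded = padded_vocab_size(50257, 64)
print(padded, padded - 50257)  # padded size, number of extra rows
```

A size that is already a multiple of the requested value is left unchanged.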
#### save_pretrained[[transformers.PreTrainedModel.save_pretrained]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L3251)

Save a model and its configuration file to a directory, so that it can be re-loaded using the
[from_pretrained()](/docs/transformers/main/zh/main_classes/model#transformers.PreTrainedModel.from_pretrained) class method.

**Parameters:**

save_directory (`str` or `os.PathLike`) : Directory to which to save. Will be created if it doesn't exist.

is_main_process (`bool`, *optional*, defaults to `True`) : Whether the process calling this is the main process or not. Useful in distributed training (for example on TPUs) when this function needs to be called on all processes. In this case, set `is_main_process=True` only on the main process to avoid race conditions.

state_dict (nested dictionary of `torch.Tensor`) : The state dictionary of the model to save. Will default to `self.state_dict()`, but can be used to only save parts of the model or if special precautions need to be taken when recovering the state dictionary of a model (like when using model parallelism).

push_to_hub (`bool`, *optional*, defaults to `False`) : Whether or not to push your model to the Hugging Face model hub after saving it. You can specify the repository you want to push to with `repo_id` (will default to the name of `save_directory` in your namespace).

max_shard_size (`int` or `str`, *optional*, defaults to `"50GB"`) : The maximum size for a checkpoint before being sharded. Checkpoints shard will then be each of size lower than this size. If expressed as a string, needs to be digits followed by a unit (like `"5MB"`).    If a single weight of the model is bigger than `max_shard_size`, it will be in its own checkpoint shard which will be bigger than `max_shard_size`.   

variant (`str`, *optional*) : If specified, weights are saved in the format `model.<variant>.safetensors`.

token (`str` or `bool`, *optional*) : The token to use as HTTP bearer authorization for remote files. If `True`, or not specified, will use the token generated when running `hf auth login` (stored in `~/.huggingface`).

save_peft_format (`bool`, *optional*, defaults to `True`) : For backward compatibility with the PEFT library, in case adapter weights are attached to the model, all keys of the state dict of adapters need to be prepended with `base_model.model`. Advanced users can disable this behavior by setting `save_peft_format` to `False`.

save_original_format (`bool`, *optional*, defaults to `True`) : For backward compatibility with the previous versions of `transformers` you can save the checkpoint with its reverse mapping. The reverse mapping needs to exist even if the model was loaded from a non-legacy checkpoint.

kwargs (`dict[str, Any]`, *optional*) : Additional key word arguments passed along to the [push_to_hub()](/docs/transformers/main/zh/main_classes/model#transformers.utils.PushToHubMixin.push_to_hub) method.
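
How `max_shard_size` splits a checkpoint can be sketched with a greedy pass over the weights. This is an illustration under stated assumptions, not the library's sharding code: `parse_size` and `split_into_shards` are hypothetical helpers, units are assumed to be decimal (`"5MB"` = 5,000,000 bytes), and the weight names and sizes are invented.

```python
import re

UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9}  # assumption: decimal units


def parse_size(size) -> int:
    """Convert an int or a string such as "5MB" into a number of bytes."""
    if isinstance(size, int):
        return size
    match = re.fullmatch(r"(\d+)(KB|MB|GB)", size.upper())
    if match is None:
        raise ValueError(f"expected digits followed by a unit, got {size!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def split_into_shards(weight_sizes: dict, max_shard_size) -> list:
    """Greedily group weights so each shard stays within the limit when possible."""
    limit = parse_size(max_shard_size)
    shards, current, current_size = [], [], 0
    for name, size in weight_sizes.items():
        # Start a new shard when adding this weight would exceed the limit.
        if current and current_size + size > limit:
            shards.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        shards.append(current)
    return shards


sizes = {"a": 2_000_000, "b": 2_000_000, "c": 6_000_000, "d": 1_000_000, "e": 3_000_000}
shards = split_into_shards(sizes, "5MB")
print(shards)
```

Weight `c` is larger than the limit on its own, so it ends up alone in a shard that exceeds `max_shard_size`, matching the note on oversized weights in the `max_shard_size` description.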
#### set_attn_implementation[[transformers.PreTrainedModel.set_attn_implementation]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L2058)

Set the requested `attn_implementation` for this model.

**Parameters:**

attn_implementation (`str` or `dict`) : The attention implementation to set for this model. It can be either a `str`, in which case it will be dispatched to all submodels if relevant, or a `dict` where keys are the sub_configs name, in which case each submodel will dispatch the corresponding value.

allow_all_kernels (`bool`, *optional*) : Whether to load kernels from unverified hub repos, if `attn_implementation` is a custom kernel outside of the `kernels-community` hub repository.
#### set_decoder[[transformers.PreTrainedModel.set_decoder]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L2344)

Symmetric setter. Mirrors the lookup logic used in `get_decoder`.
#### set_encoder[[transformers.PreTrainedModel.set_encoder]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L2295)

Symmetric setter. Mirrors the lookup logic used in `get_encoder`.
#### set_experts_implementation[[transformers.PreTrainedModel.set_experts_implementation]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L2160)

Set the requested `experts_implementation` for this model.

**Parameters:**

experts_implementation (`str` or `dict`) : The experts implementation to set for this model. It can be either a `str`, in which case it will be dispatched to all submodels if relevant, or a `dict` where keys are the sub_configs name, in which case each submodel will dispatch the corresponding value.
#### set_use_kernels[[transformers.PreTrainedModel.set_use_kernels]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L3726)

Set whether or not to use the `kernels` library to kernelize some layers of the model.

**Parameters:**

use_kernels (`bool`) : Whether or not to use the `kernels` library to kernelize some layers of the model.

kernel_config (`KernelConfig`, *optional*) : The kernel configuration to use to kernelize the model. If `None`, the default kernel mapping will be used.
#### tie_weights[[transformers.PreTrainedModel.tie_weights]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L2588)

Tie the model weights. If `recompute_mapping=False` (default when called internally), it will rely on the
`model.all_tied_weights_keys` attribute, containing the `{target: source}` mapping for the tied params.
If `recompute_mapping=True`, it will re-check all internal submodels and their config to determine the params
that need to be tied. This is the default when `model.tie_weights()` is called on its own, outside of
`__init__` and `from_pretrained`, in case the config values were changed somewhere.

Note that during `from_pretrained`, tying is *symmetric*: if the mapping says "tie target -> source" but
`source` is missing in the checkpoint while `target` exists, we *swap* source and target so we can still
tie everything to the parameter that actually exists.
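
The swap rule can be pictured with a small, self-contained sketch. The helper `resolve_tied_pairs` and the parameter names are hypothetical and only illustrate the behavior described above; this is not the library's implementation.

```python
def resolve_tied_pairs(tied_mapping, checkpoint_keys):
    """Return {target: source} pairs, swapping a pair when only the target was saved.

    Hypothetical helper for illustration; not part of Transformers.
    """
    resolved = {}
    for target, source in tied_mapping.items():
        if source not in checkpoint_keys and target in checkpoint_keys:
            # The declared source is missing but the target exists: tie to the one that exists.
            resolved[source] = target
        else:
            resolved[target] = source
    return resolved


# Hypothetical parameter names.
mapping = {"lm_head.weight": "embed_tokens.weight"}

# Usual case: the source is in the checkpoint, so the mapping is kept as is.
kept = resolve_tied_pairs(mapping, {"embed_tokens.weight"})

# Only the target was saved: source and target are swapped.
swapped = resolve_tied_pairs(mapping, {"lm_head.weight"})
```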
#### warn_if_padding_and_no_attention_mask[[transformers.PreTrainedModel.warn_if_padding_and_no_attention_mask]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L4479)

Shows a one-time warning if the input_ids appear to contain padding and no attention mask was given.

### Large model loading

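In Transformers 4.20.0, the [from_pretrained()](/docs/transformers/main/zh/main_classes/model#transformers.PreTrainedModel.from_pretrained) method was reworked to accommodate large models using [Accelerate](https://huggingface.co/docs/accelerate/big_modeling). This requires Accelerate >= 0.9.0 and PyTorch >= 1.9.0. Instead of creating the full model and then loading the pretrained weights inside it (which takes twice the size of the model in RAM: one copy for the randomly initialized model and one for the pretrained weights), there is an option to create the model as an empty shell and materialize its parameters only when the pretrained weights are loaded.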
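
The "empty shell" idea can be illustrated with PyTorch's `meta` device. This is a minimal sketch, assuming PyTorch 2.0 or later (where `torch.device` can be used as a context manager); it shows the general technique on a single layer and is not a description of what `from_pretrained` does internally.

```python
import torch
from torch import nn

# Parameters created under the "meta" device carry a shape and dtype but no data.
with torch.device("meta"):
    shell = nn.Linear(1024, 1024)
shell_is_meta = shell.weight.is_meta

# Allocate real (uninitialized) storage on the CPU.
# In a loading routine, the checkpoint values would then be copied into it.
shell = shell.to_empty(device="cpu")
```

Because no memory is allocated while the layer is on the `meta` device, only one copy of the weights ever needs to exist in RAM.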

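Moreover, you can place the model directly on different devices if it does not fully fit in RAM (this only works for inference for now). With `device_map="auto"`, Accelerate determines where to put each layer so as to maximize the use of your fastest devices (GPUs), and offloads the rest to the CPU, or even to the hard drive if you do not have enough GPU RAM or CPU RAM. Even if the model is split across several devices, it will run as you would normally expect.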

```python
from transformers import AutoModelForSeq2SeqLM

t0pp = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0pp", device_map="auto")
```

You can inspect how the model was split across devices by looking at its `hf_device_map` attribute:

```python
t0pp.hf_device_map
{'shared': 0,
 'decoder.embed_tokens': 0,
 'encoder': 0,
 'decoder.block.0': 0,
 'decoder.block.1': 1,
 'decoder.block.2': 1,
 'decoder.block.3': 1,
 'decoder.block.4': 1,
 'decoder.block.5': 1,
 'decoder.block.6': 1,
 'decoder.block.7': 1,
 'decoder.block.8': 1,
 'decoder.block.9': 1,
 'decoder.block.10': 1,
 'decoder.block.11': 1,
 'decoder.block.12': 1,
 'decoder.block.13': 1,
 'decoder.block.14': 1,
 'decoder.block.15': 1,
 'decoder.block.16': 1,
 'decoder.block.17': 1,
 'decoder.block.18': 1,
 'decoder.block.19': 1,
 'decoder.block.20': 1,
 'decoder.block.21': 1,
 'decoder.block.22': 'cpu',
 'decoder.block.23': 'cpu',
 'decoder.final_layer_norm': 'cpu',
 'decoder.dropout': 'cpu',
 'lm_head': 'cpu'}
```

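You can also write your own device map following the same format (a dictionary mapping layer names to devices). It should map all parameters of the model to a given device, but you do not have to detail where all the submodules of one layer go if that layer is entirely on the same device. For instance, the following device map would work properly for T0pp (as long as you have the GPU memory):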

```python
device_map = {"shared": 0, "encoder": 0, "decoder": 1, "lm_head": 1}
```
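
To see why a coarse map like this is enough, the sketch below resolves a parameter name to the device of its closest listed parent module. The helper `device_for` is hypothetical, and the parameter names are only illustrative; Accelerate's actual dispatch logic may differ.

```python
def device_for(param_name, device_map):
    """Return the device of the longest device_map key that is a prefix of `param_name`.

    Hypothetical helper for illustration; not part of Transformers or Accelerate.
    """
    best = None
    for module_name in device_map:
        if param_name == module_name or param_name.startswith(module_name + "."):
            if best is None or len(module_name) > len(best):
                best = module_name
    if best is None:
        raise ValueError(f"{param_name} is not covered by the device map")
    return device_map[best]


device_map = {"shared": 0, "encoder": 0, "decoder": 1, "lm_head": 1}

# A parameter nested deep inside the decoder inherits the device of the "decoder" entry.
decoder_device = device_for("decoder.block.3.layer.0.SelfAttention.q.weight", device_map)
encoder_device = device_for("encoder.final_layer_norm.weight", device_map)
```

A name that matches no entry raises an error, which mirrors the requirement that the map cover every parameter of the model.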

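Another way to minimize the memory impact of your model is to instantiate it at a lower-precision dtype (like `torch.float16`) or to use direct quantization techniques as described below.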

### Model instantiation dtype

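Under PyTorch, a model is normally instantiated in `torch.float32`. This can be an issue if you load a model whose weights are in fp16, since it will then require twice as much memory. To overcome this limitation, you can explicitly pass the desired `dtype` using the `dtype` argument: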

```python
model = T5ForConditionalGeneration.from_pretrained("t5", dtype=torch.float16)
```
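Alternatively, if you want the model to always load in the most memory-efficient format, you can use the special value `"auto"`, and the `dtype` will then be derived automatically from the model's weights: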
```python
model = T5ForConditionalGeneration.from_pretrained("t5", dtype="auto")
```
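
The "twice as much memory" figure follows directly from the size of each element. The arithmetic below is a rough sketch for the weights alone; the parameter count is a hypothetical round number, and activations, buffers, and framework overhead are ignored.

```python
# Bytes used by one parameter for common floating-point dtypes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gib(num_params, dtype):
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


num_params = 1_000_000_000  # hypothetical model size
fp32_gib = weight_memory_gib(num_params, "float32")
fp16_gib = weight_memory_gib(num_params, "float16")
print(f"float32: {fp32_gib:.2f} GiB, float16: {fp16_gib:.2f} GiB")
```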

Models instantiated from scratch can also be told which `dtype` to use with:

```python
config = T5Config.from_pretrained("t5")
model = AutoModel.from_config(config, dtype=torch.float16)
```

Due to PyTorch's design, this functionality is only available for floating-point dtypes.

## ModuleUtilsMixin[[transformers.modeling_utils.ModuleUtilsMixin]]

#### transformers.modeling_utils.ModuleUtilsMixin[[transformers.modeling_utils.ModuleUtilsMixin]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L900)

A few utilities for `torch.nn.Modules`, to be used as a mixin.

#### get_extended_attention_mask[[transformers.modeling_utils.ModuleUtilsMixin.get_extended_attention_mask]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L964)

Makes broadcastable attention and causal masks so that future and masked tokens are ignored.

**Parameters:**

attention_mask (`torch.Tensor`) : Mask with ones indicating tokens to attend to, zeros for tokens to ignore.

input_shape (`tuple[int]`) : The shape of the input to the model.

**Returns:**

`torch.Tensor` The extended attention mask, with the same dtype as `attention_mask.dtype`.
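
As an illustration of what "broadcastable" means here, the following simplified sketch builds an additive mask from a 2D padding mask in plain PyTorch. It assumes an encoder-style model and leaves out the causal part used by decoders, so it should be read as an approximation of the idea rather than the method's exact code.

```python
import torch

# Padding mask for a batch of 2 sequences of length 4 (1 = attend, 0 = ignore).
attention_mask = torch.tensor([[1.0, 1.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0]])

# Add two singleton dimensions so the mask broadcasts over heads and query positions:
# (batch, seq_len) -> (batch, 1, 1, seq_len).
extended = attention_mask[:, None, None, :]

# Turn the 0/1 mask into an additive one: 0 where attending,
# the most negative representable value where masked.
extended = (1.0 - extended) * torch.finfo(extended.dtype).min
```

Adding `extended` to the raw attention scores before the softmax drives the weight of masked positions to zero.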
#### invert_attention_mask[[transformers.modeling_utils.ModuleUtilsMixin.invert_attention_mask]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L920)

Invert an attention mask (e.g., switches 0. and 1.).

**Parameters:**

encoder_attention_mask (`torch.Tensor`) : An attention mask.

**Returns:**

``torch.Tensor``

The inverted attention mask.
#### num_parameters[[transformers.modeling_utils.ModuleUtilsMixin.num_parameters]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L1013)

Get number of (optionally, trainable or non-embeddings) parameters in the module.

**Parameters:**

only_trainable (`bool`, *optional*, defaults to `False`) : Whether or not to return only the number of trainable parameters.

exclude_embeddings (`bool`, *optional*, defaults to `False`) : Whether or not to return only the number of non-embedding parameters.

**Returns:**

``int``

The number of parameters.
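
The counting logic behind `only_trainable` can be sketched in plain PyTorch. The helper `count_parameters` is hypothetical and ignores the `exclude_embeddings` option; it is meant as an illustration, not as the library's implementation.

```python
from torch import nn


def count_parameters(module, only_trainable=False):
    """Count parameters of a module, optionally keeping only those that require gradients."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad or not only_trainable)


layer = nn.Linear(4, 3)  # 4 * 3 weights + 3 biases
layer.bias.requires_grad_(False)  # freeze the bias

total = count_parameters(layer)
trainable = count_parameters(layer, only_trainable=True)
```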

## Pushing to the Hub[[transformers.utils.PushToHubMixin]]
#### transformers.utils.PushToHubMixin[[transformers.utils.PushToHubMixin]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/utils/hub.py#L626)

A Mixin containing the functionality to push a model or tokenizer to the hub.

#### push_to_hub[[transformers.utils.PushToHubMixin.push_to_hub]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/utils/hub.py#L720)

Upload the object's files (for example, model or tokenizer files) to the 🤗 Model Hub.

Examples:

```python
from transformers import AutoModel

# AutoModel stands in here for any class that inherits from PushToHubMixin.
model = AutoModel.from_pretrained("google-bert/bert-base-cased")

# Push the model to your namespace with the name "my-finetuned-bert".
model.push_to_hub("my-finetuned-bert")

# Push the model to an organization with the name "my-finetuned-bert".
model.push_to_hub("huggingface/my-finetuned-bert")
```

**Parameters:**

repo_id (`str`) : The name of the repository you want to push your object (for example, a model or tokenizer) to. It should contain your organization name when pushing to a given organization.

commit_message (`str`, *optional*) : Message to commit while pushing. Will default to `"Upload {object}"`, where `{object}` is the kind of object being pushed (for example, `model` or `tokenizer`).

commit_description (`str`, *optional*) : The description of the commit that will be created.

private (`bool`, *optional*) : Whether to make the repo private. If `None` (default), the repo will be public unless the organization's default is private. This value is ignored if the repo already exists.

token (`bool` or `str`, *optional*) : The token to use as HTTP bearer authorization for remote files. If `True` (default), will use the token generated when running `hf auth login` (stored in `~/.huggingface`).

revision (`str`, *optional*) : Branch to push the uploaded files to.

create_pr (`bool`, *optional*, defaults to `False`) : Whether or not to create a PR with the uploaded files or directly commit.

max_shard_size (`int` or `str`, *optional*, defaults to `"50GB"`) : Only applicable for models. The maximum size for a checkpoint before being sharded. Each checkpoint shard will then be smaller than this size. If expressed as a string, needs to be digits followed by a unit (like `"5MB"`).

tags (`list[str]`, *optional*) : List of tags to push on the Hub.
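
The string form of `max_shard_size` ("digits followed by a unit") can be pictured with the sketch below. The helper `size_to_bytes` is hypothetical, and it assumes that integers are byte counts and that units are decimal (1 MB = 10**6 bytes); the library's own parsing may accept more units or differ in detail.

```python
import re

# Decimal units are an assumption made for this illustration.
UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9}


def size_to_bytes(size):
    """Convert an int (assumed to be bytes) or a string such as "5MB" to a number of bytes.

    Hypothetical helper for illustration; not part of Transformers.
    """
    if isinstance(size, int):
        return size
    match = re.fullmatch(r"(\d+)\s*(KB|MB|GB)", size.strip().upper())
    if match is None:
        raise ValueError(f"Expected digits followed by a unit, got {size!r}")
    number, unit = match.groups()
    return int(number) * UNITS[unit]


default_limit = size_to_bytes("50GB")
small_limit = size_to_bytes("5MB")
```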

