Components & Configs

GLiClassModelConfig

The configuration class used to define the architecture and behavior of a GLiClass model. It inherits from PretrainedConfig.

This class controls key architectural aspects, including the encoder, label encoder, projection strategy, scorers, and LSTM layers.

Parameters

`encoder_config`

dict|None, optional

Encoder model configuration

`encoder_model`

str|None, optional

Base encoder model identifier from Hugging Face Hub or local path.

`label_model_config`

dict|None, optional

Label encoder model configuration

note

Required for bi-encoder architecture

`label_model_name`

str|None, optional

Encoder model to be used for embedding label texts. Can be a model ID or local path.

note

Required for bi-encoder architecture

`class_token_index`

int, optional, defaults to -1

Class token index

`text_token_index`

int, optional, defaults to -1

Text token index

`ignore_index`

int, optional, defaults to -100

Index to ignore when calculating loss

`hidden_size`

int|None, optional

Dimensionality of hidden representations in internal layers.

`projector_hidden_act`

str, optional, defaults to gelu

Projector activation function. For the available options, see the transformers repository.

`vocab_size`

int|None, optional

Vocabulary size (taken from encoder_config)

`problem_type`

str|None, optional, defaults to single_label_classification

Defines the type of classification problem and determines the loss function used during training.

Available options:

"regression" - Regression task for predicting continuous values. Uses MSELoss for single or multi-output regression.
"single_label_classification" - Standard multi-class classification where each sample belongs to exactly one class. Uses CrossEntropyLoss. Default option.
"multi_label_classification" - Multi-label classification where each sample can belong to multiple classes simultaneously. Uses focal loss with configurable alpha and gamma parameters.
None - Automatic problem type detection based on label structure: regression for single label, CrossEntropyLoss for 1D labels, and LogSoftmax for multi-dimensional soft labels.

`max_num_classes`

Any, optional, defaults to 25

Maximum number of classes for classification.

`use_lstm`

bool, optional, defaults to False

Whether to use an LSTM layer in GLiClass

`initializer_range`

float, optional, defaults to 0.03

Weights initialization range

`scorer_type`

str, optional, defaults to simple

Defines the scoring mechanism used to compute similarity between text representations and class label representations.

Available options:

"weighted-dot" - Projects both text and labels into 2x hidden size, creates two representation pairs, concatenates them with element-wise product, and applies MLP.
"simple" - Computes direct dot product between text and label embeddings using einsum. Fastest and most straightforward approach, suitable for most tasks.
"mlp" - Concatenates text and label representations, then passes through 3-layer MLP (2×hidden_size → mlp_hidden_size → mlp_hidden_size/2 → 1).
"hopfield" - Uses Hopfield attention mechanism with Q/K/V projections and iterative refinement of label representations.
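As a rough illustration of the "simple" scorer, the sketch below computes one logit per label as the dot product between a text embedding and each label embedding. It uses plain Python lists instead of tensors and einsum, and the function name `simple_scores` is hypothetical rather than part of the GLiClass API.

```python
from typing import List

Vector = List[float]


def simple_scores(text_emb: Vector, label_embs: List[Vector]) -> List[float]:
    """Return one logit per label: the dot product of text and label embeddings."""
    return [sum(t * l for t, l in zip(text_emb, label)) for label in label_embs]


# Toy embeddings with hidden_size = 3 and two candidate labels.
text = [1.0, 0.0, 2.0]
labels = [[0.5, 1.0, 0.5], [-1.0, 0.0, 0.0]]
scores = simple_scores(text, labels)
```

A higher score means the label embedding points in a direction more similar to the text embedding; the other scorers replace this dot product with learned layers.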

`pooling_strategy`

str, optional, defaults to first

Defines the pooling strategy used to aggregate token-level representations into a single sequence representation.

Available options:

"max" - Applies global max pooling across the sequence dimension.
"first" - Takes the first token's embedding.
"last" - Takes the last token's embedding.
"avg" - Applies global average pooling with optional attention mask support.
"sum" - Applies global sum pooling with optional attention mask support.
"rms" - Applies Root Mean Square pooling with attention mask support.
"abs_max" - Applies max pooling on absolute values with attention mask support.
"abs_avg" - Applies average pooling on absolute values with attention mask support.
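The pooling options can be pictured with a small pure-Python sketch. The exact formulas are assumptions based on the option names (for instance, "rms" is taken to be the square root of the mean of squares), the mask is applied to every strategy for simplicity, and `pool` is a hypothetical helper, not a GLiClass function.

```python
import math
from typing import Callable, Dict, List

Tokens = List[List[float]]  # shape: (sequence_length, hidden_size)


def _mean(values) -> float:
    values = list(values)
    return sum(values) / len(values)


# Each pooler receives only unmasked tokens and reduces over the sequence axis.
POOLERS: Dict[str, Callable[[Tokens], List[float]]] = {
    "first": lambda toks: list(toks[0]),
    "last": lambda toks: list(toks[-1]),
    "max": lambda toks: [max(col) for col in zip(*toks)],
    "sum": lambda toks: [sum(col) for col in zip(*toks)],
    "avg": lambda toks: [_mean(col) for col in zip(*toks)],
    "rms": lambda toks: [math.sqrt(_mean(x * x for x in col)) for col in zip(*toks)],
    "abs_max": lambda toks: [max(abs(x) for x in col) for col in zip(*toks)],
    "abs_avg": lambda toks: [_mean(abs(x) for x in col) for col in zip(*toks)],
}


def pool(tokens: Tokens, mask: List[int], strategy: str) -> List[float]:
    """Aggregate token vectors into one vector, skipping masked (padding) tokens."""
    valid = [tok for tok, m in zip(tokens, mask) if m]
    if not valid:
        raise ValueError("attention mask removes every token")
    return POOLERS[strategy](valid)


# Three tokens with hidden_size = 2; the last token is padding.
tokens = [[1.0, -4.0], [3.0, 2.0], [9.0, 9.0]]
mask = [1, 1, 0]
pooled = {name: pool(tokens, mask, name) for name in POOLERS}
```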

`focal_loss_alpha`

float, optional, defaults to 0.5

Alpha parameter for the focal loss.

`focal_loss_gamma`

float, optional, defaults to 2

Gamma parameter for the focal loss.

`focal_loss_reduction`

str|None, optional, defaults to None

Selects reduction method for focal loss calculation.

Available options:

"none" - Returns the loss tensor without any reduction applied.
"mean" - Returns the mean loss normalized by the number of valid (non-ignored) elements.
"sum" - Returns the sum of all loss elements.
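To make the focal loss parameters concrete, here is a minimal sketch of the standard binary focal loss on logits, with `ignore_index` handling and reduction applied at the end. In this sketch "mean" divides by the number of non-ignored elements and "sum" adds the elements up. It is not the GLiClass implementation: the function name is hypothetical, and treating `None` the same as "none" is an assumption.

```python
import math
from typing import List, Optional, Union


def focal_loss(
    logits: List[float],
    targets: List[int],
    alpha: float = 0.5,
    gamma: float = 2.0,
    reduction: Optional[str] = None,
    ignore_index: int = -100,
) -> Union[float, List[float]]:
    """Binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t) per element."""
    losses, valid = [], 0
    for x, y in zip(logits, targets):
        if y == ignore_index:
            losses.append(0.0)  # ignored positions contribute nothing
            continue
        valid += 1
        p = 1.0 / (1.0 + math.exp(-x))            # sigmoid probability of class 1
        p_t = max(p if y == 1 else 1.0 - p, 1e-12)  # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        losses.append(-alpha_t * (1.0 - p_t) ** gamma * math.log(p_t))
    if reduction == "mean":
        return sum(losses) / max(valid, 1)  # normalize by non-ignored elements
    if reduction == "sum":
        return sum(losses)
    return losses  # assumption: None and "none" both skip reduction


per_element = focal_loss([0.0, 0.0, 5.0], [1, 0, -100])
mean_loss = focal_loss([0.0, 0.0, 5.0], [1, 0, -100], reduction="mean")
sum_loss = focal_loss([0.0, 0.0, 5.0], [1, 0, -100], reduction="sum")
```

Larger gamma values shrink the loss of well-classified elements (p_t close to 1), while alpha shifts weight between positive and negative targets.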

`logit_scale_init_value`

float, optional, defaults to 2.6592

Initial value of logit scale

`normalize_features`

bool, optional, defaults to False

Flag for normalizing features

`extract_text_features`

bool, optional, defaults to False

Flag for extracting text features

`contrastive_loss_coef`

float, optional, defaults to 0

Contrastive loss coefficient

`architecture_type`

str, optional, defaults to uni-encoder

Defines the architectural approach used for encoding text and class representations in the GLiClass model.

Available options:

"uni-encoder" - Single encoder processes both text and class tokens in the same sequence. Classes are embedded as special tokens within the input.
"bi-encoder" - Separate encoders for text and class labels. Text encoder processes input text, while label encoder processes class descriptions independently. Allows for different model architectures for each component.
"bi-encoder-fused" - Extension of bi-encoder where class embeddings from the label encoder are fused back into the text encoder at class token positions.
"encoder-decoder" - Uses encoder-decoder architecture where text is processed by the encoder and class information by the decoder. Requires models with is_encoder_decoder=True configuration.

`prompt_first`

bool, optional, defaults to False

Whether to place the labels at the beginning of the input, before the text
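The effect of prompt_first on a uni-encoder input can be sketched as simple string assembly. The `<<LABEL>>` token is the class token named in the embed_class_token description; the `<<SEP>>` separator and the `build_prompt` helper are assumptions made for illustration, not confirmed details of the library.

```python
from typing import List

LABEL_TOKEN = "<<LABEL>>"  # class token mentioned under embed_class_token
SEP_TOKEN = "<<SEP>>"      # assumption: separator between the label block and the text


def build_prompt(text: str, labels: List[str], prompt_first: bool = False) -> str:
    """Join labels into one block and place it before or after the text."""
    label_block = "".join(f"{LABEL_TOKEN}{label}" for label in labels) + SEP_TOKEN
    return label_block + text if prompt_first else text + label_block


labels_first = build_prompt("Great phone", ["positive", "negative"], prompt_first=True)
labels_last = build_prompt("Great phone", ["positive", "negative"], prompt_first=False)
```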

`squeeze_layers`

bool, optional, defaults to False

Flag to choose whether to compress layers

`embed_class_token`

Any, optional, defaults to True

If True, the <<LABEL>> token is pooled for each label; if False, the first token of each label is pooled as the label embedding.
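Because several parameters accept only a fixed set of strings, it can help to check keyword arguments before building a config. The sketch below collects the option lists documented above into a hypothetical `check_config_kwargs` helper; it is not part of GLiClass, and it simplifies the bi-encoder requirement to a check on label_model_name only.

```python
from typing import Any, Dict, List

# Option lists copied from the parameter descriptions above.
ALLOWED: Dict[str, set] = {
    "problem_type": {"regression", "single_label_classification",
                     "multi_label_classification", None},
    "scorer_type": {"weighted-dot", "simple", "mlp", "hopfield"},
    "pooling_strategy": {"max", "first", "last", "avg", "sum",
                         "rms", "abs_max", "abs_avg"},
    "focal_loss_reduction": {"none", "sum", "mean", None},
    "architecture_type": {"uni-encoder", "bi-encoder",
                          "bi-encoder-fused", "encoder-decoder"},
}


def check_config_kwargs(kwargs: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    errors = []
    for key, allowed in ALLOWED.items():
        if key in kwargs and kwargs[key] not in allowed:
            errors.append(f"{key}={kwargs[key]!r} is not a documented option")
    # Simplification: only label_model_name is checked for the bi-encoder case.
    if kwargs.get("architecture_type") == "bi-encoder" and not kwargs.get("label_model_name"):
        errors.append("label_model_name is required for bi-encoder architecture")
    return errors


ok = check_config_kwargs({"scorer_type": "simple", "pooling_strategy": "avg"})
bad = check_config_kwargs({"scorer_type": "cosine", "architecture_type": "bi-encoder"})
```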

TrainingArguments

The configuration class used to define the behavior of a GLiClass model during training. It inherits from transformers.TrainingArguments.

Parameters

`cache_dir`

str|None, optional, defaults to None

Directory to store cache files. If specified, the model and tokenizer are loaded from the local cache_dir.

`others_lr`

float, optional

Separate learning rate for all model parameters except the encoder (e.g. for projectors, classification heads and other components).

`optim`

str, optional, defaults to adamw_torch

Optimizer to be used during training.

`others_weight_decay`

float, optional, defaults to 0.0

Weight decay factor for all model parameters except the encoder. Only applies to parameters that use others_lr.
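One way to picture others_lr and others_weight_decay is as a second optimizer parameter group. The sketch below splits named parameters into encoder and non-encoder groups in the list-of-dicts format that PyTorch optimizers accept. Identifying encoder parameters by an "encoder." name prefix is an assumption, and `build_param_groups` is a hypothetical helper.

```python
from typing import Any, Dict, Iterable, List, Tuple


def build_param_groups(
    named_params: Iterable[Tuple[str, Any]],
    lr: float,
    weight_decay: float,
    others_lr: float,
    others_weight_decay: float = 0.0,
    encoder_prefix: str = "encoder.",  # assumption: how encoder weights are named
) -> List[Dict[str, Any]]:
    """Split parameters into an encoder group and an 'others' group."""
    encoder, others = [], []
    for name, param in named_params:
        (encoder if name.startswith(encoder_prefix) else others).append(param)
    return [
        {"params": encoder, "lr": lr, "weight_decay": weight_decay},
        {"params": others, "lr": others_lr, "weight_decay": others_weight_decay},
    ]


# Strings stand in for real parameter tensors in this toy example.
toy_params = [("encoder.layer0.weight", "w0"), ("scorer.weight", "w1"), ("projector.bias", "b1")]
groups = build_param_groups(toy_params, lr=1e-5, weight_decay=0.01, others_lr=1e-3)
```

A higher learning rate for the freshly initialized heads than for the pretrained encoder is a common fine-tuning choice, which is the usual motivation for a setting like this.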

RLTrainerConfig

The configuration class used to define the behavior of a GLiClass model during reinforcement learning. It inherits from gliclass.TrainingArguments.

Parameters

`cliprange`

float, optional, defaults to 0.2

Clipping range for PPO policy gradient updates to prevent large policy changes.
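The role of cliprange can be seen in the standard PPO clipped surrogate objective, sketched here for a single action. This is the textbook formulation, shown under the assumption that the trainer follows it; the function name is hypothetical.

```python
import math


def clipped_policy_loss(
    logp_new: float, logp_old: float, advantage: float, cliprange: float = 0.2
) -> float:
    """PPO loss for one action: -min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    ratio = math.exp(logp_new - logp_old)  # probability ratio new / old policy
    clipped = min(max(ratio, 1.0 - cliprange), 1.0 + cliprange)
    return -min(ratio * advantage, clipped * advantage)


# The new policy makes the action 1.5x more likely than the old policy did.
loss_positive_adv = clipped_policy_loss(math.log(1.5), 0.0, advantage=1.0)
loss_negative_adv = clipped_policy_loss(math.log(1.5), 0.0, advantage=-1.0)
```

With a positive advantage the ratio is capped at 1 + cliprange, so the update gains nothing from moving the policy further; with a negative advantage the unclipped term is kept, so the penalty is not softened.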

`num_rl_iters`

int, optional, defaults to 3

Number of reinforcement learning iterations per training step.

`gamma`

float, optional, defaults to -1

Focal loss gamma parameter for hard example mining. If set to -1, focal loss is disabled.

`alpha`

float, optional, defaults to -1

Focal loss alpha parameter for class balancing. If set to -1, focal loss is disabled.

`labels_smoothing`

float, optional, defaults to -1

Label smoothing factor for predicted actions. If set to -1, label smoothing is disabled.

`entropy_beta`

float, optional, defaults to -1

Coefficient for entropy regularization term in the loss function. If set to -1, entropy regularization is disabled.

`kl_beta`

float, optional, defaults to -1

Coefficient for KL-divergence regularization between current and reference model predictions. If set to -1, KL regularization is disabled.
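The two regularizers can be illustrated for independent per-label Bernoulli predictions. The sketch below is an assumption about the general shape of the terms, not the trainer's code: it subtracts the mean entropy (to encourage exploration), adds the mean KL divergence from the current to the reference probabilities, and skips a term when its coefficient is -1. All function names are hypothetical.

```python
import math
from typing import List


def _clamp(p: float, eps: float = 1e-7) -> float:
    return min(max(p, eps), 1.0 - eps)


def bernoulli_entropy(p: float) -> float:
    p = _clamp(p)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def bernoulli_kl(p: float, q: float) -> float:
    """KL(Bernoulli(p) || Bernoulli(q)); the direction is an assumption."""
    p, q = _clamp(p), _clamp(q)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def regularized_loss(
    policy_loss: float,
    probs: List[float],
    ref_probs: List[float],
    entropy_beta: float = -1.0,
    kl_beta: float = -1.0,
) -> float:
    loss = policy_loss
    if entropy_beta != -1:  # -1 disables entropy regularization
        loss -= entropy_beta * sum(map(bernoulli_entropy, probs)) / len(probs)
    if kl_beta != -1:       # -1 disables KL regularization
        kl = sum(bernoulli_kl(p, q) for p, q in zip(probs, ref_probs)) / len(probs)
        loss += kl_beta * kl
    return loss


plain = regularized_loss(1.0, [0.5], [0.5])
with_entropy = regularized_loss(1.0, [0.5], [0.5], entropy_beta=0.1)
with_kl = regularized_loss(1.0, [0.9], [0.5], kl_beta=0.5)
```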

`get_actions`

str, optional, defaults to bernoulli

Method for sampling actions from model predictions.

Available options:

"bernoulli" - Probabilistic sampling.
"threshold" - Deterministic thresholding.

`threshold`

float, optional, defaults to 0.5

Threshold value for converting probabilities to binary predictions when get_actions is set to "threshold".
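The difference between the two get_actions modes can be sketched as follows. The helper name `sample_actions`, the strict `>` comparison, and the optional `rng` argument are assumptions made for illustration.

```python
import random
from typing import List, Optional


def sample_actions(
    probs: List[float],
    method: str = "bernoulli",
    threshold: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Turn per-label probabilities into binary actions (1 = label predicted)."""
    if method == "threshold":
        # Deterministic: the same probabilities always give the same actions.
        return [1 if p > threshold else 0 for p in probs]
    if method == "bernoulli":
        # Stochastic: each label is selected with probability p.
        rng = rng or random.Random()
        return [1 if rng.random() < p else 0 for p in probs]
    raise ValueError(f"unknown get_actions method: {method!r}")


deterministic = sample_actions([0.2, 0.7, 0.5], method="threshold")
sampled = sample_actions([0.0, 1.0, 0.6], method="bernoulli", rng=random.Random(0))
```

Bernoulli sampling lets the policy explore label combinations it would not pick greedily, whereas thresholding always exploits the current predictions.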

RLTrainer

The trainer class used to train a GLiClass model with reinforcement learning. It inherits from gliclass.Trainer.

Parameters

`value_model`

torch.nn.Module|None, optional, defaults to None

Optional value function model for advantage estimation in reinforcement learning.

`reference_model`

ZeroShotClassificationPipeline|TransformersClassificationPipeline|None, optional, defaults to None

Reference model for computing KL-divergence regularization during training.

`reward_components`

list|None, optional, defaults to None

List of reward functions as (name, function) tuples. If None, defaults to F1-score reward.
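A list of (name, function) tuples might look like the sketch below, which defines an F1-style reward over binary actions and targets. The two-argument signature `(actions, targets)` is an assumption chosen for illustration; the signature that RLTrainer actually passes to reward functions should be checked against the source.

```python
from typing import Callable, List, Tuple


def f1_reward(actions: List[int], targets: List[int]) -> float:
    """F1 score between predicted actions and gold targets for one sample."""
    tp = sum(1 for a, t in zip(actions, targets) if a == 1 and t == 1)
    fp = sum(1 for a, t in zip(actions, targets) if a == 1 and t == 0)
    fn = sum(1 for a, t in zip(actions, targets) if a == 0 and t == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical reward list in the (name, function) format described above.
reward_components: List[Tuple[str, Callable[[List[int], List[int]], float]]] = [
    ("f1", f1_reward),
]


def total_reward(actions: List[int], targets: List[int]) -> float:
    """Sum every configured reward component for one sample."""
    return sum(fn(actions, targets) for _, fn in reward_components)


reward = total_reward([1, 0, 1], [1, 1, 0])
```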
