Components & Configs
GLiClassModelConfig
The configuration class used to define the architecture and behavior of a GLiClass model. It inherits from PretrainedConfig.
This class controls key architectural aspects, including the encoder, label encoder, projection strategy, scorers, and LSTM layers.
Parameters
encoder_config
dict|None, optional
Encoder model configuration
encoder_model
str|None, optional
Base encoder model identifier from Hugging Face Hub or local path.
label_model_config
dict|None, optional
Label encoder model configuration
Required for the bi-encoder architecture.
label_model_name
str|None, optional
Encoder model to be used for embedding label texts. Can be a model ID or local path.
Required for the bi-encoder architecture.
class_token_index
int, optional, defaults to -1
Class token index
text_token_index
int, optional, defaults to -1
Text token index
ignore_index
int, optional, defaults to -100
Index to ignore when calculating loss
hidden_size
int|None, optional
Dimensionality of hidden representations in internal layers.
projector_hidden_act
str, optional, defaults to gelu
Activation function used in the projector. For the available options, see the transformers repository.
vocab_size
int|None, optional
Dictionary size (taken from encoder_config)
problem_type
str|None, optional, defaults to single_label_classification
Defines the type of classification problem and determines the loss function used during training.
Available options:
"regression" - Regression task for predicting continuous values. Uses MSELoss for single or multi-output regression.
"single_label_classification" - Standard multi-class classification where each sample belongs to exactly one class. Uses CrossEntropyLoss. Default option.
"multi_label_classification" - Multi-label classification where each sample can belong to multiple classes simultaneously. Uses focal loss with configurable alpha and gamma parameters.
None - Automatic problem type detection based on label structure: regression for a single label, CrossEntropyLoss for 1D labels, and LogSoftmax for multi-dimensional soft labels.
max_num_classes
Any, optional, defaults to 25
Max classes for classification.
use_lstm
bool, optional, defaults to False
Flag to choose LSTM usage in GLiClass
initializer_range
float, optional, defaults to 0.03
Weights initialization range
scorer_type
str, optional, defaults to simple
Defines the scoring mechanism used to compute similarity between text representations and class label representations.
Available options:
"weighted-dot" - Projects both text and labels into 2x hidden size, creates two representation pairs, concatenates them with their element-wise product, and applies an MLP.
"simple" - Computes a direct dot product between text and label embeddings using einsum. Fastest and most straightforward approach, suitable for most tasks.
"mlp" - Concatenates text and label representations, then passes them through a 3-layer MLP (2×hidden_size → mlp_hidden_size → mlp_hidden_size/2 → 1).
"hopfield" - Uses a Hopfield attention mechanism with Q/K/V projections and iterative refinement of label representations.
pooling_strategy
str, optional, defaults to first
Defines the pooling strategy used to aggregate token-level representations into a single sequence representation.
Available options:
"max" - Applies global max pooling across the sequence dimension.
"first" - Takes the first token's embedding.
"last" - Takes the last token's embedding.
"avg" - Applies global average pooling with optional attention mask support.
"sum" - Applies global sum pooling with optional attention mask support.
"rms" - Applies Root Mean Square pooling with attention mask support.
"abs_max" - Applies max pooling on absolute values with attention mask support.
"abs_avg" - Applies average pooling on absolute values with attention mask support.
focal_loss_alpha
float, optional, defaults to 0.5
Alpha parameter for the focal loss.
focal_loss_gamma
float, optional, defaults to 2
Gamma parameter for the focal loss.
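To make the roles of focal_loss_alpha, focal_loss_gamma, and ignore_index concrete, here is a minimal pure-Python sketch of the standard binary focal loss. It is an illustration under stated assumptions, not the library's implementation: the function name `focal_loss` and its list-based signature are hypothetical, and the reduction handling assumes that "sum" adds all elements while "mean" divides that sum by the number of non-ignored elements.

```python
import math
from typing import List, Union


def focal_loss(
    probs: List[float],
    targets: List[int],
    alpha: float = 0.5,
    gamma: float = 2.0,
    reduction: str = "none",
    ignore_index: int = -100,
) -> Union[List[float], float]:
    """Binary focal loss over per-class probabilities (hypothetical helper)."""
    losses = []
    valid = 0
    for p, y in zip(probs, targets):
        if y == ignore_index:
            # Ignored positions contribute nothing to the loss.
            losses.append(0.0)
            continue
        valid += 1
        p_t = p if y == 1 else 1.0 - p               # probability of the true outcome
        alpha_t = alpha if y == 1 else 1.0 - alpha   # class-balancing weight
        # (1 - p_t) ** gamma down-weights examples the model already gets right.
        losses.append(-alpha_t * (1.0 - p_t) ** gamma * math.log(p_t))
    if reduction == "none":
        return losses
    if reduction == "sum":
        return sum(losses)
    if reduction == "mean":
        return sum(losses) / max(valid, 1)
    raise ValueError(f"unknown reduction: {reduction}")


probs = [0.5, 0.5, 0.9]
targets = [1, 0, -100]  # the last position is ignored
per_element = focal_loss(probs, targets)
total = focal_loss(probs, targets, reduction="sum")
mean = focal_loss(probs, targets, reduction="mean")
```

With gamma set to 0 the modulating factor disappears and the expression reduces to alpha-weighted binary cross-entropy, which is a quick way to sanity-check a configuration.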
focal_loss_reduction
str|None, optional, defaults to None
Selects reduction method for focal loss calculation.
Available options:
"none" - Returns the loss tensor without any reduction applied.
"sum" - Returns the sum of all loss elements.
"mean" - Returns the mean loss normalized by the number of valid (non-ignored) elements.
logit_scale_init_value
float, optional, defaults to 2.6592
Initial value of logit scale
normalize_features
bool, optional, defaults to False
Flag for normalizing features
extract_text_features
bool, optional, defaults to False
Flag for extracting text features
contrastive_loss_coef
float, optional, defaults to 0
Contrastive loss coefficient
architecture_type
str, optional, defaults to uni-encoder
Defines the architectural approach used for encoding text and class representations in the GLiClass model.
Available options:
"uni-encoder" - Single encoder processes both text and class tokens in the same sequence. Classes are embedded as special tokens within the input.
"bi-encoder" - Separate encoders for text and class labels. The text encoder processes input text, while the label encoder processes class descriptions independently. Allows for different model architectures for each component.
"bi-encoder-fused" - Extension of bi-encoder where class embeddings from the label encoder are fused back into the text encoder at class token positions.
"encoder-decoder" - Uses an encoder-decoder architecture where text is processed by the encoder and class information by the decoder. Requires models with is_encoder_decoder=True configuration.
prompt_first
bool, optional, defaults to False
Flag to choose whether to place labels at the beginning
squeeze_layers
bool, optional, defaults to False
Flag to choose whether to compress layers
embed_class_token
Any, optional, defaults to True
If True, the <<LABEL>> tokens are pooled for each label; if False, the first token of each label is pooled as the label embedding.
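Several of the parameters above accept only a fixed set of string values, and the bi-encoder architecture needs a label model. The sketch below checks a dictionary of keyword arguments against those documented constraints before it would be passed to the config class. The helper `check_config_kwargs` is hypothetical and not part of the library, the model identifiers are illustrative, and treating label_model_name alone as sufficient for the bi-encoder check is an assumption.

```python
ARCHITECTURES = {"uni-encoder", "bi-encoder", "bi-encoder-fused", "encoder-decoder"}
SCORERS = {"weighted-dot", "simple", "mlp", "hopfield"}
POOLING = {"max", "first", "last", "avg", "sum", "rms", "abs_max", "abs_avg"}


def check_config_kwargs(kwargs: dict) -> list:
    """Return a list of problems found in config keyword arguments (hypothetical helper)."""
    problems = []
    # Fall back to the documented defaults when a key is absent.
    arch = kwargs.get("architecture_type", "uni-encoder")
    scorer = kwargs.get("scorer_type", "simple")
    pooling = kwargs.get("pooling_strategy", "first")
    if arch not in ARCHITECTURES:
        problems.append(f"unknown architecture_type: {arch}")
    if scorer not in SCORERS:
        problems.append(f"unknown scorer_type: {scorer}")
    if pooling not in POOLING:
        problems.append(f"unknown pooling_strategy: {pooling}")
    # Assumption: a label model name is enough to satisfy the bi-encoder requirement.
    if arch == "bi-encoder" and not kwargs.get("label_model_name"):
        problems.append("bi-encoder requires label_model_name")
    return problems


good = {
    "encoder_model": "microsoft/deberta-v3-small",   # illustrative model ID
    "architecture_type": "bi-encoder",
    "label_model_name": "BAAI/bge-small-en-v1.5",    # illustrative model ID
    "scorer_type": "simple",
    "pooling_strategy": "avg",
}
bad = {"architecture_type": "bi-encoder", "scorer_type": "cosine"}

good_problems = check_config_kwargs(good)
bad_problems = check_config_kwargs(bad)
```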
TrainingArguments
The configuration class used to define the behavior of a GLiClass model during training. It inherits from transformers.TrainingArguments.
Parameters
cache_dir
str|None, optional, defaults to None
Directory to store cache files. If specified, the model and tokenizer are loaded from the local cache_dir.
others_lr
float, optional
Separate learning rate for all model parameters except the encoder (e.g. for projectors, classification heads and other components).
optim
str, optional, defaults to adamw_torch
Optimizer to be used during training.
others_weight_decay
float, optional, defaults to 0.0
Weight decay factor for all model parameters except the encoder. Only applies to parameters that use others_lr.
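The split between encoder and non-encoder hyperparameters can be pictured as two optimizer parameter groups. The sketch below is a rough illustration only: the helper `build_param_groups`, the `"encoder."` name prefix used to identify encoder parameters, and the fallback to the base learning rate when others_lr is unset are all assumptions, and it works on parameter names rather than tensors so that it runs without any deep learning framework.

```python
from typing import Dict, List, Optional


def build_param_groups(
    param_names: List[str],
    learning_rate: float,
    weight_decay: float,
    others_lr: Optional[float] = None,
    others_weight_decay: float = 0.0,
    encoder_prefix: str = "encoder.",  # hypothetical naming convention
) -> List[Dict]:
    """Split parameters into an encoder group and an 'others' group."""
    encoder = [n for n in param_names if n.startswith(encoder_prefix)]
    others = [n for n in param_names if not n.startswith(encoder_prefix)]
    groups = [{"params": encoder, "lr": learning_rate, "weight_decay": weight_decay}]
    if others:
        groups.append({
            "params": others,
            # Assumption: fall back to the base learning rate when others_lr is not set.
            "lr": others_lr if others_lr is not None else learning_rate,
            "weight_decay": others_weight_decay,
        })
    return groups


names = ["encoder.layer.0.weight", "encoder.layer.0.bias", "scorer.weight", "text_projector.bias"]
groups = build_param_groups(names, learning_rate=1e-5, weight_decay=0.01, others_lr=3e-5)
```

A list of dictionaries in this shape (with tensors in place of names) is the per-parameter-group format that PyTorch optimizers accept.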
RLTrainerConfig
The configuration class used to define the behavior of a GLiClass model during reinforcement learning. It inherits from gliclass.TrainingArguments.
Parameters
cliprange
float, optional, defaults to 0.2
Clipping range for PPO policy gradient updates to prevent large policy changes.
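As an illustration of what cliprange bounds, here is the standard PPO clipped surrogate objective for a single action, written in plain Python. This is the textbook formulation, shown as a sketch; the function name `clipped_policy_loss` is hypothetical and the library's training loop may compute the term differently (for example, batched over tensors).

```python
import math


def clipped_policy_loss(logp_new: float, logp_old: float, advantage: float,
                        cliprange: float = 0.2) -> float:
    """PPO clipped surrogate loss for one action (hypothetical helper)."""
    # Probability ratio between the current and the old policy.
    ratio = math.exp(logp_new - logp_old)
    # Keep the ratio inside [1 - cliprange, 1 + cliprange].
    clipped = min(max(ratio, 1.0 - cliprange), 1.0 + cliprange)
    # Take the more pessimistic of the two objectives, negated to form a loss.
    return -min(ratio * advantage, clipped * advantage)


# The policy made the action 1.5x more likely; with a positive advantage
# the gain is capped at a ratio of 1.2.
loss_positive = clipped_policy_loss(math.log(1.5), 0.0, advantage=1.0)
# With a negative advantage the unclipped (larger) penalty is kept.
loss_negative = clipped_policy_loss(math.log(1.5), 0.0, advantage=-1.0)
```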
num_rl_iters
int, optional, defaults to 3
Number of reinforcement learning iterations per training step.
gamma
float, optional, defaults to -1
Focal loss gamma parameter for hard example mining. If set to -1, focal loss is disabled.
alpha
float, optional, defaults to -1
Focal loss alpha parameter for class balancing. If set to -1, focal loss is disabled.
labels_smoothing
float, optional, defaults to -1
Label smoothing factor for predicted actions. If set to -1, label smoothing is disabled.
entropy_beta
float, optional, defaults to -1
Coefficient for entropy regularization term in the loss function. If set to -1, entropy regularization is disabled.
kl_beta
float, optional, defaults to -1
Coefficient for KL-divergence regularization between current and reference model predictions. If set to -1, KL regularization is disabled.
get_actions
str, optional, defaults to bernoulli
Method for sampling actions from model predictions.
Available options:
"bernoulli" - Probabilistic sampling.
"threshold" - Deterministic thresholding.
threshold
float, optional, defaults to 0.5
Threshold value for converting probabilities to binary predictions when get_actions is set to "threshold".
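The two get_actions modes can be sketched in a few lines of plain Python. This is an illustrative stand-in, not the library's code: the function `sample_actions` and its `rng` argument are hypothetical, and whether a probability exactly equal to the threshold counts as positive is an assumption (here it does not).

```python
import random
from typing import List, Optional


def sample_actions(probs: List[float], method: str = "bernoulli",
                   threshold: float = 0.5,
                   rng: Optional[random.Random] = None) -> List[int]:
    """Turn per-class probabilities into binary actions (hypothetical helper)."""
    if method == "threshold":
        # Deterministic: a class is selected when its probability exceeds the threshold.
        return [1 if p > threshold else 0 for p in probs]
    if method == "bernoulli":
        # Stochastic: each class is selected with probability p, which adds exploration.
        rng = rng or random.Random()
        return [1 if rng.random() < p else 0 for p in probs]
    raise ValueError(f"unknown method: {method}")


probs = [0.2, 0.5, 0.9]
deterministic = sample_actions(probs, method="threshold", threshold=0.5)
stochastic = sample_actions(probs, method="bernoulli", rng=random.Random(0))
```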
RLTrainer
The trainer class used to train a GLiClass model with reinforcement learning. It inherits from gliclass.Trainer.
Parameters
value_model
torch.nn.Module|None, optional, defaults to None
Optional value function model for advantage estimation in reinforcement learning.
reference_model
ZeroShotClassificationPipeline|TransformersClassificationPipeline|None, optional, defaults to None
Reference model for computing KL-divergence regularization during training.
reward_components
list|None, optional, defaults to None
List of reward functions as (name, function) tuples. If None, defaults to F1-score reward.
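The (name, function) structure of reward_components might look like the sketch below. The reward functions here take plain lists of binary actions and targets for one sample, which is a simplifying assumption: the signature the library actually expects (for example, batched tensors or extra arguments such as probabilities and masks) is not stated above, so treat `f1_reward` and `exact_match_reward` as hypothetical.

```python
from typing import Callable, List, Tuple


def f1_reward(actions: List[int], targets: List[int]) -> float:
    """F1 score between predicted actions and targets for one sample."""
    tp = sum(1 for a, t in zip(actions, targets) if a == 1 and t == 1)
    fp = sum(1 for a, t in zip(actions, targets) if a == 1 and t == 0)
    fn = sum(1 for a, t in zip(actions, targets) if a == 0 and t == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def exact_match_reward(actions: List[int], targets: List[int]) -> float:
    """1.0 only when every label decision is correct."""
    return 1.0 if list(actions) == list(targets) else 0.0


# A list of (name, function) tuples, as described for reward_components.
reward_components: List[Tuple[str, Callable[[List[int], List[int]], float]]] = [
    ("f1", f1_reward),
    ("exact_match", exact_match_reward),
]

actions = [1, 0, 1, 0]
targets = [1, 1, 0, 0]
rewards = {name: fn(actions, targets) for name, fn in reward_components}
```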