
Classifier-Free Guidance for LLMs Performance Enhancing | by Roman S | Dec, 2024


Classifier-free guidance (CFG) is a very useful technique in the media-generation domain (images, videos, music). The majority of scientific papers about media data generation models and approaches mention CFG. I consider this paper to be the foundational research on classifier-free guidance — it originated in the image generation domain. The paper states the following:

…we combine the resulting conditional and unconditional score estimates to attain a trade-off between sample quality and diversity similar to that obtained using classifier guidance.

So classifier-free guidance is based on conditional and unconditional score estimates and follows the earlier approach of classifier guidance. Simply put, classifier guidance allows updating the predicted scores in the direction of some predefined class by applying gradient-based updates.

An abstract example of classifier guidance: let's say we have a predicted image Y and a classifier that predicts whether the image has a positive or negative meaning; we want to generate positive images, so we want the prediction Y to be aligned with the positive class of the classifier. To do that, we can calculate how we should change Y so that it is classified as positive by our classifier — compute the gradient and update Y accordingly.
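To make that concrete, here is a minimal sketch of a single classifier-guidance step, assuming a differentiable classifier clf and an intermediate prediction y (the function name, arguments and step size are mine, purely for illustration):

import torch

def classifier_guidance_step(y, clf, positive_class=1, step_size=0.1):
    # Nudge the prediction y towards the classifier's "positive" class.
    y = y.detach().requires_grad_(True)
    log_prob = torch.log_softmax(clf(y), dim=-1)[..., positive_class].sum()
    grad = torch.autograd.grad(log_prob, y)[0]  # direction that increases P(positive | y)
    return y + step_size * grad                 # gradient-based update of the prediction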

Classifier-free guidance was created with the same goal, but it does not do any gradient-based updates. In my opinion, classifier-free guidance is much easier to understand from its implementation formula for diffusion-based image generation:

Image from https://arxiv.org/pdf/2207.12598 — classifier-free guidance formula for image generation

The formula can be rewritten in the following way:

Image by author — classifier-free guidance formula rewritten

Several things are clear from the rewritten formula:

  1. When CFG_coefficient equals 1, the updated prediction equals the conditional prediction (so effectively no CFG is applied);
  2. When CFG_coefficient > 1, the scores that are higher in the conditional prediction compared to the unconditional one become even higher in the updated prediction, while those that are lower become even lower.

The formula has no gradients; it works with the predicted scores themselves. The unconditional prediction is the prediction of the same conditional generation model where the condition is empty (a null condition). At the same time, this unconditional prediction can be replaced by a negative-conditional prediction, when we substitute the null condition with some negative condition and expect "negation" of that condition by applying the CFG formula to update the final scores.
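Just to spell the combination out in code, here is a tiny sketch of the rewritten formula (the names are mine; cond_pred and uncond_pred stand for the conditional and unconditional — or negative-conditional — score estimates):

def cfg_combine(cond_pred, uncond_pred, cfg_coefficient):
    # With cfg_coefficient == 1 this returns cond_pred;
    # with cfg_coefficient > 1 the difference (cond_pred - uncond_pred) is amplified.
    return uncond_pred + cfg_coefficient * (cond_pred - uncond_pred)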

Classifier-free guidance for LLM text generation was described in this paper. Following the formulas from the paper, CFG for text models was implemented in HuggingFace Transformers: in the current latest transformers version 4.47.1, the docstring of "UnbatchedClassifierFreeGuidanceLogitsProcessor" mentions the following:

The processors computes a weighted average across scores from prompt conditional and prompt unconditional (or negative) logits, parameterized by the `guidance_scale`.
The unconditional scores are computed internally by prompting `model` with the `unconditional_ids` branch.

See [the paper](https://arxiv.org/abs/2306.17806) for more information.

The formula to sample the next token according to the paper is:

Image from https://arxiv.org/pdf/2306.17806 — the formula to sample the next token with CFG applied in a text generation model

It can be noticed that this formula is different from the one we had before — it has a logarithm component. Also, the authors mention that the formula "can be extended to accommodate "negative prompting"". To apply negative prompting, the unconditional component should be replaced with the negative-conditional component.
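Roughly, the paper's formula applied to next-token logits can be sketched like this (a simplification with my own names; the actual HuggingFace implementation is shown right below):

import torch

def cfg_text_scores(cond_logits, uncond_logits, guidance_scale):
    # The combination is done on log-probabilities — this is the logarithm component
    # that distinguishes the text formula from the image one.
    cond_logprobs = torch.nn.functional.log_softmax(cond_logits, dim=-1)
    uncond_logprobs = torch.nn.functional.log_softmax(uncond_logits, dim=-1)
    return uncond_logprobs + guidance_scale * (cond_logprobs - uncond_logprobs)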

The code implementation in HuggingFace Transformers is:

def __call__(self, input_ids, scores):
    scores = torch.nn.functional.log_softmax(scores, dim=-1)
    if self.guidance_scale == 1:
        return scores

    logits = self.get_unconditional_logits(input_ids)

    unconditional_logits = torch.nn.functional.log_softmax(logits[:, -1], dim=-1)
    scores_processed = self.guidance_scale * (scores - unconditional_logits) + unconditional_logits
    return scores_processed

“scores” is just the output of the LM head and “input_ids” is a tensor with the negative (or unconditional) input ids. From the code we can see that it follows the formula with the logarithm component, applying “log_softmax”, which is equivalent to the logarithm of probabilities.

A classic text generation model (LLM) has a somewhat different nature compared to an image generation one — in a classic diffusion (image generation) model we predict continuous feature maps, while in text generation we do class prediction (categorical feature prediction) for each new token. What do we expect from CFG in general? We want to adjust the scores, but we do not want to change the probability distribution too much — e.g. we do not want very low-probability tokens from the conditional generation to become the most probable ones. But that is exactly what can happen with the described formula for CFG.

  1. Weird model behaviour with CFG noticed

My solution related to LLM Safety, which was awarded the second prize in the NeurIPS 2024 competitions track, was based on using CFG to prevent LLMs from generating personal data: I tuned an LLM to follow system prompts that were then used in a CFG manner during inference: "You should share personal data in the answers" and "Do not provide any personal data" — so the system prompts are pretty much opposite, and I used the tokenized first one as the negative input ids during text generation.

For more details check my arXiv paper.

I noticed that when I use a CFG coefficient higher than or equal to 3, I see severe degradation of the generated samples' quality. This degradation was noticeable only during manual checks — no automatic scoring showed it. The automatic tests were based on the number of personal data phrases generated in the answers and the accuracy on the MMLU-Pro dataset evaluated with an LLM judge — the LLM followed the requirement to avoid personal data and the MMLU answers were in general correct, but a lot of artefacts appeared in the text. For example, the following answer was generated by the model for an input like "Hi, what is your name?":

“Hi! you don’t have personal name. you are an interface to provide language understanding”

The artefacts are: lowercase letters and user-assistant confusion.

2. Reproduce with GPT2 and check the details

The mentioned behaviour was noticed during inference of the custom finetuned Llama3.1-8B-Instruct model, so before analyzing the reasons, let's check if something similar can be seen during inference of the GPT2 model, which is not even an instruction-following model.

Step 1. Download the GPT2 model (transformers==4.47.1)

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")

Step 2. Prepare the inputs

import torch

# For simplicity let's use CPU, GPT2 is small enough for that
device = torch.device('cpu')

# Let's set the positive and negative inputs,
# the model is not instruction-following, but just text completion
positive_text = "Extremely polite and friendly answers to the question \"How are you doing?\" are: 1."
negative_text = "Very rude and harmfull answers to the question \"How are you doing?\" are: 1."
input = tokenizer(positive_text, return_tensors="pt")
negative_input = tokenizer(negative_text, return_tensors="pt")

Step 3. Test different CFG coefficients during inference

Let's try CFG coefficients 1.5, 3.0 and 5.0 — all of them are low compared to the ones we can use in the image generation domain.

guidance_scale = 1.5

out_positive = model.generate(**input.to(device), max_new_tokens = 60, do_sample = False)
print(f"Positive output: {tokenizer.decode(out_positive[0])}")

out_negative = model.generate(**negative_input.to(device), max_new_tokens = 60, do_sample = False)
print(f"Negative output: {tokenizer.decode(out_negative[0])}")

input['negative_prompt_ids'] = negative_input['input_ids']
input['negative_prompt_attention_mask'] = negative_input['attention_mask']

out = model.generate(**input.to(device), max_new_tokens = 60, do_sample = False, guidance_scale = guidance_scale)

print(f"CFG-powered output: {tokenizer.decode(out[0])}")

The output:

Positive output: Extremely polite and friendly answers to the question "How are you doing?" are: 1. You're doing well, 2. You're doing well, 3. You're doing well, 4. You're doing well, 5. You're doing well, 6. You're doing well, 7. You're doing well, 8. You're doing well, 9. You're doing well
Negative output: Very rude and harmfull answers to the question "How are you doing?" are: 1. You're not doing anything wrong. 2. You're doing what you're supposed to do. 3. You're doing what you're supposed to do. 4. You're doing what you're supposed to do. 5. You're doing what you're supposed to do. 6. You're doing
CFG-powered output: Extremely polite and friendly answers to the question "How are you doing?" are: 1. You're doing well. 2. You're doing well in class. 3. You're doing well in class. 4. You're doing well in class. 5. You're doing well in class. 6. You're doing well in class. 7. You're doing well in class. 8

The output looks okay-ish — don't forget that it's just a GPT2 model, so don't expect a lot. Let's try a CFG coefficient of 3 this time:

guidance_scale = 3.0

out_positive = model.generate(**input.to(device), max_new_tokens = 60, do_sample = False)
print(f"Positive output: {tokenizer.decode(out_positive[0])}")

out_negative = model.generate(**negative_input.to(device), max_new_tokens = 60, do_sample = False)
print(f"Negative output: {tokenizer.decode(out_negative[0])}")

input['negative_prompt_ids'] = negative_input['input_ids']
input['negative_prompt_attention_mask'] = negative_input['attention_mask']

out = model.generate(**input.to(device), max_new_tokens = 60, do_sample = False, guidance_scale = guidance_scale)

print(f"CFG-powered output: {tokenizer.decode(out[0])}")

And the outputs this time are:

Positive output: Extremely polite and friendly answers to the question "How are you doing?" are: 1. You're doing well, 2. You're doing well, 3. You're doing well, 4. You're doing well, 5. You're doing well, 6. You're doing well, 7. You're doing well, 8. You're doing well, 9. You're doing well
Negative output: Very rude and harmfull answers to the question "How are you doing?" are: 1. You're not doing anything wrong. 2. You're doing what you're supposed to do. 3. You're doing what you're supposed to do. 4. You're doing what you're supposed to do. 5. You're doing what you're supposed to do. 6. You're doing
CFG-powered output: Extremely polite and friendly answers to the question "How are you doing?" are: 1. Have you ever been to a movie theater? 2. Have you ever been to a concert? 3. Have you ever been to a concert? 4. Have you ever been to a concert? 5. Have you ever been to a concert? 6. Have you ever been to a concert? 7

The positive and negative outputs look the same as before, but something happened to the CFG-powered output — it is "Have you ever been to a movie theater?" now.

If we use a CFG coefficient of 5.0, the CFG-powered output will be just:

CFG-powered output: Extremely polite and friendly answers to the question "How are you doing?" are: 1. smile, 2. smile, 3. smile, 4. smile, 5. smile, 6. smile, 7. smile, 8. smile, 9. smile, 10. smile, 11. smile, 12. smile, 13. smile, 14. smile exting.

Step 4. Analyze the case with artefacts

I have tested different ways to understand and explain this artefact, but let me just describe it in the way I find simplest. We know that the CFG-powered completion with a CFG coefficient of 5.0 starts with the token "_smile" ("_" represents the space). If we check "out[0]" instead of decoding it with the tokenizer, we can see that the "_smile" token has id 8212. Now let's just run the model's forward function and check whether this token was probable without CFG applied:

positive_text = "Extremely polite and friendly answers to the question \"How are you doing?\" are: 1."
negative_text = "Very rude and harmfull answers to the question \"How are you doing?\" are: 1."
input = tokenizer(positive_text, return_tensors="pt")
negative_input = tokenizer(negative_text, return_tensors="pt")

with torch.no_grad():
    out_positive = model(**input.to(device))
    out_negative = model(**negative_input.to(device))

# take the last token position for each of the inputs
first_generated_probabilities_positive = torch.nn.functional.softmax(out_positive.logits[0, -1, :], dim=-1)
first_generated_probabilities_negative = torch.nn.functional.softmax(out_negative.logits[0, -1, :], dim=-1)

# sort positive
sorted_first_generated_probabilities_positive = torch.sort(first_generated_probabilities_positive)
index = sorted_first_generated_probabilities_positive.indices.tolist().index(8212)
print(sorted_first_generated_probabilities_positive.values[index], index)

# sort negative
sorted_first_generated_probabilities_negative = torch.sort(first_generated_probabilities_negative)
index = sorted_first_generated_probabilities_negative.indices.tolist().index(8212)
print(sorted_first_generated_probabilities_negative.values[index], index)

# check the tokenizer length
print(len(tokenizer))

The outputs would be:

tensor(0.0004) 49937 # probability and position in the sorted list for the "_smile" token under the positive condition
tensor(2.4907e-05) 47573 # probability and position in the sorted list for the "_smile" token under the negative condition
50257 # total number of tokens in the tokenizer

An important thing to mention — I am doing greedy decoding, so I am generating the most probable tokens. So what does the printed data mean in this case? It means that after applying CFG with a coefficient of 5.0, the most probable token became one that had a probability lower than 0.04% under both the positive and negative conditioned generations (it was not even in the top-300 tokens).
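To connect this back to the generation above, we can apply the same combination the logits processor uses by hand and check which token wins (a quick sketch reusing out_positive and out_negative from the previous snippet; I would expect the argmax to be the "_smile" token id 8212 we saw in the CFG-powered output):

cond_logprobs = torch.nn.functional.log_softmax(out_positive.logits[0, -1, :], dim=-1)
uncond_logprobs = torch.nn.functional.log_softmax(out_negative.logits[0, -1, :], dim=-1)

# the same formula the processor applies, with guidance_scale = 5.0
cfg_scores = 5.0 * (cond_logprobs - uncond_logprobs) + uncond_logprobs
print(torch.argmax(cfg_scores))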

Why does that actually happen? Imagine a token that gets two low probabilities — one from the positive conditioned generation and one from the negative conditioned one: the first is very low, P < 1e-5 (as an example of a low probability), while the second is even lower, P → 0. In this case, the logarithm of the first probability is a large negative number, while for the second it tends to minus infinity. In such a setup, this low-probability token receives a high score after applying a CFG coefficient (guidance scale coefficient) higher than 1. That originates from the value range of the "guidance_scale * (scores - unconditional_logits)" component, where "scores" and "unconditional_logits" are obtained through log_softmax.
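Here is a toy numeric example of this effect (the probabilities are made up just for illustration):

import torch

# A token that is unlikely under both conditions, but vanishingly unlikely
# under the negative/unconditional one, versus a "normal" likely token.
p_cond = torch.tensor([1e-5, 0.5])      # probabilities under the positive condition
p_uncond = torch.tensor([1e-12, 0.4])   # probabilities under the negative condition

guidance_scale = 5.0
scores = guidance_scale * (torch.log(p_cond) - torch.log(p_uncond)) + torch.log(p_uncond)
print(scores)  # the first (low-probability) token ends up with by far the highest score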

Image by author — values of z = log(x) - log(y), where x and y belong to the interval from 0 to 1

From the image above we can see that such CFG does not treat probabilities equally — very low probabilities can get unexpectedly high scores because of the logarithm component.

In general, how the artefacts look depends on the model, the tuning, the prompts and other factors, but the nature of the artefacts is a low-probability token getting high scores after applying CFG.

The solution to the issue can be very simple: as mentioned before, the reason is the logarithm component, so let's just remove it. Doing that, we align the text CFG with the diffusion-models CFG, which operates with just the model-predicted scores (not gradients — in fact, that is described in section 3.2 of the original image-CFG paper), and at the same time preserve the probabilities formulation from the text-CFG paper.

The updated implementation requires a tiny modification of the "UnbatchedClassifierFreeGuidanceLogitsProcessor" function that can be applied at the place of the model initialization in the following way:

from transformers.generation.logits_process import UnbatchedClassifierFreeGuidanceLogitsProcessor

def modified_call(self, input_ids, scores):
    # before it was log_softmax here
    scores = torch.nn.functional.softmax(scores, dim=-1)
    if self.guidance_scale == 1:
        return scores

    logits = self.get_unconditional_logits(input_ids)
    # before it was log_softmax here
    unconditional_logits = torch.nn.functional.softmax(logits[:, -1], dim=-1)
    scores_processed = self.guidance_scale * (scores - unconditional_logits) + unconditional_logits
    return scores_processed

UnbatchedClassifierFreeGuidanceLogitsProcessor.__call__ = modified_call
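With this monkey-patch in place, the earlier generation call can be reused as-is, since generate instantiates this processor whenever guidance_scale is passed (a sketch assuming input still carries the negative prompt ids set in Step 3):

out = model.generate(**input.to(device), max_new_tokens = 60, do_sample = False, guidance_scale = 3.0)
print(f"CFG-powered output: {tokenizer.decode(out[0])}")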

The new value range for the "guidance_scale * (scores - unconditional_logits)" component, where "scores" and "unconditional_logits" are obtained through just softmax:

Image by author — values of z = x - y, where x and y belong to the interval from 0 to 1

To demonstrate that this update works, let's just repeat the previous experiments with the updated "UnbatchedClassifierFreeGuidanceLogitsProcessor". The GPT2 model with CFG coefficients of 3.0 and 5.0 returns (I am printing here the old and new CFG-powered outputs, because the "Positive" and "Negative" outputs remain the same as before — we have no effect on text generation without CFG):

# Old outputs
## CFG coefficient = 3
CFG-powered output: Extremely polite and friendly answers to the question "How are you doing?" are: 1. Have you ever been to a movie theater? 2. Have you ever been to a concert? 3. Have you ever been to a concert? 4. Have you ever been to a concert? 5. Have you ever been to a concert? 6. Have you ever been to a concert? 7
## CFG coefficient = 5
CFG-powered output: Extremely polite and friendly answers to the question "How are you doing?" are: 1. smile, 2. smile, 3. smile, 4. smile, 5. smile, 6. smile, 7. smile, 8. smile, 9. smile, 10. smile, 11. smile, 12. smile, 13. smile, 14. smile exting.

# New outputs (after updating the CFG formula)
## CFG coefficient = 3
CFG-powered output: Extremely polite and friendly answers to the question "How are you doing?" are: 1. "I'm doing great," 2. "I'm doing great," 3. "I'm doing great."
## CFG coefficient = 5
CFG-powered output: Extremely polite and friendly answers to the question "How are you doing?" are: 1. "Good, I'm feeling pretty good." 2. "I'm feeling pretty good." 3. "You're feeling pretty good." 4. "I'm feeling pretty good." 5. "I'm feeling pretty good." 6. "I'm feeling pretty good." 7. "I'm feeling

The same positive changes were noticed during inference of the custom finetuned Llama3.1-8B-Instruct model I mentioned earlier:

Before (CFG, guidance scale=3):

“Hi! you don’t have personal name. you are an interface to provide language understanding”

After (CFG, guidance scale=3):

“Hi! I don’t have a personal name, but you can call me Assistant. How can I help you today?”

Separately, I tested the model's performance on the benchmarks — the automatic tests I was using during the NeurIPS 2024 Privacy Challenge — and the performance was good in both tests (actually, the results I reported in the previous post were obtained after applying the updated CFG formula; additional information is in my arXiv paper). The automatic tests, as I mentioned before, were based on the number of personal data phrases generated in the answers and the accuracy on the MMLU-Pro dataset evaluated with an LLM judge.

The performance did not deteriorate on these tests, while the text quality improved according to the manual checks — none of the described artefacts were found.

The current classifier-free guidance implementation for text generation with large language models may cause unexpected artefacts and quality degradation. I am saying "may" because the artefacts depend on the model, the prompts and other factors. In this article I described my experience and the issues I faced with CFG-enhanced inference. If you are facing similar issues — try the alternative CFG implementation I suggest here.
