Monitoring Token Consumption
What is a Token?
In the context of large language models (LLMs), a token is the unit used to measure the amount of text a model processes. A token can be a whole word, part of a word, or a single character, depending on the language and the tokenizer the model uses. Tokens are essential for understanding and managing a model's capabilities, because they determine how much information the model can handle at one time.
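To make this concrete, here is a minimal sketch using OpenAI's open-source tiktoken library to split a sentence into tokens. This is only an illustration: the exact tokenizer, and therefore the counts, vary by model, and GPT Mates does not necessarily use this encoding.

```python
# pip install tiktoken
import tiktoken

# cl100k_base is the encoding used by GPT-4-era OpenAI models;
# other models use different tokenizers, so counts will differ.
enc = tiktoken.get_encoding("cl100k_base")

text = "Tokenization splits text into subword units."
tokens = enc.encode(text)

print(len(tokens))                        # number of tokens in the sentence
print([enc.decode([t]) for t in tokens])  # the individual pieces
```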
Tokens In and Tokens Out
- Tokens In: The tokens the model receives as input when generating a response, including the conversation history, instructions, and any other context provided to it.
- Tokens Out: The tokens the model produces as output, i.e., the response it generates from the input tokens.
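Most chat APIs report both counts per request. As an illustration, here is a minimal sketch with the OpenAI Python SDK (an assumption: GPT Mates may surface these numbers differently, and this requires an `OPENAI_API_KEY` in the environment):

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize tokenization in one line."}],
)

usage = response.usage
print("Tokens in: ", usage.prompt_tokens)      # input: history + instructions
print("Tokens out:", usage.completion_tokens)  # the generated response
print("Total:     ", usage.total_tokens)
```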
Model Capacities in Terms of Tokens
Each language model has a maximum number of tokens it can handle at once, known as its context window (or maximum context). Here are the capacities of some commonly used models:
- GPT-4 Turbo and GPT-4o (OpenAI): 128k tokens
- Claude 3.x (Anthropic): 200k tokens
- Mistral: 32k or 128k tokens depending on the model
- Gemini (Google): Over 1 million tokens
- Llama 3.1: 128k tokens
Impact of Maximum Token Capacity
When a conversation reaches the maximum number of tokens a model allows, the Mate using that model can no longer respond: the model cannot process new information or generate new output until the token count is reduced. Note that both input and output tokens count toward this limit.
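A rough way to reason about the limit: the tokens already in the conversation, plus the tokens reserved for the model's reply, must fit inside the context window. A minimal sketch, assuming tiktoken's cl100k_base encoding as an approximation and ignoring per-message formatting overhead:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def remaining_budget(messages, context_window=128_000, reserved_for_reply=1_000):
    """Approximate the input tokens still available in the conversation."""
    used = sum(len(enc.encode(m)) for m in messages)
    return context_window - reserved_for_reply - used

history = ["You are a helpful assistant.", "Explain context windows."]
print(remaining_budget(history))  # once this hits 0, the model can't respond
```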
Monitoring Token Consumption in GPT Mates
In the GPT Mates application, you can monitor the number of tokens used in a conversation via the indicator at the top right of the conversation. It tracks token consumption in real time, so you can act before the model's maximum capacity is reached.
Actions to Take When Token Limit is Reached
If the maximum number of tokens is reached, several options are available to manage the situation:
- Delete or Hide Messages: Deleting or hiding messages reduces the number of tokens in the conversation history, freeing up space for new interactions (see the sketch after this list for an automated version of this idea).
- Generate a Summary: You can generate a summary from a message in the conversation. This allows you to condense the information while retaining the essentials, thus reducing the number of tokens needed.
- Start a New Chat or Collaboration: By starting a new conversation or collaboration, you can carry over the essential information from the previous conversation as its initial messages. This lets you begin with a lighter token load while retaining the necessary context.
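The first option can be automated. Here is a hypothetical sketch (the function name and message format are illustrative, not part of GPT Mates) that drops the oldest messages until the history fits under a token budget:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def trim_history(messages, budget=120_000):
    """Drop messages from the front until the total token count fits.

    `messages` is a list of strings, oldest first; recent context is
    usually the most valuable, so we discard from the beginning.
    """
    counts = [len(enc.encode(m)) for m in messages]
    total = sum(counts)
    start = 0
    while total > budget and start < len(messages):
        total -= counts[start]
        start += 1
    return messages[start:]

trimmed = trim_history(["old message", "newer message", "latest question"])
```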
Conclusion
Managing tokens is crucial for optimizing interactions with generative language models. By monitoring token consumption and taking appropriate actions when the maximum capacity is reached, you can ensure smooth and efficient interactions with Mates in GPT Mates.