
What to Do When You Hit a TPM Limit
What is TPM?
TPM stands for Tokens Per Minute. It’s a measure of how many tokens (chunks of text) your project can process per minute. If your project sends too much data too quickly, it will hit its TPM limit and you’ll see the following error:
TPM Limit reached – Retry in: 60 sec
What's Changed?
To make things easier for beta users, we’ve introduced an automated TPM increase feature on the Builder side.
Here’s how it works:
- If you hit your TPM limit, you’ll see an "Increase TPM limit" button in the error message.
- Press the button to request an automatic bump to your TPM.
- Most requests are approved instantly.
Is there a maximum TPM I can request?
Yes! There's a hard cap at 300,000 tokens per minute. If you hit this ceiling and still need more capacity, contact our team to discuss your use case.
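For example, at the 300,000 TPM cap, a workload averaging 3,000 tokens per request tops out at roughly 100 requests per minute.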
What about the TPM limit for viewers?
We know this error can also show up for users viewing bots. Improvements to the viewer-side experience are planned for an upcoming sprint to reduce friction and improve clarity.
What affects token usage?
Several things contribute to how fast you hit your TPM limit:
- Long prompts or responses (more words = more tokens)
- Multiple rapid requests
- Heavy API usage by background actions or chains
Click on "What affects token usage?" in the error message for more detail.
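If you want a rough sense of how quickly long prompts eat into your budget, you can estimate token counts locally before sending a request. The sketch below uses the open-source tiktoken tokenizer as an illustrative assumption; the platform's own tokenizer may count differently, so treat the numbers as estimates.

```python
# Minimal sketch: estimating how many tokens a prompt will consume.
# Assumes the open-source `tiktoken` tokenizer (pip install tiktoken);
# the platform's tokenizer may differ, so treat counts as estimates.
import tiktoken

ENCODING = tiktoken.get_encoding("cl100k_base")

def estimate_tokens(text: str) -> int:
    """Approximate token count for `text`."""
    return len(ENCODING.encode(text))

prompt = "Summarize the attached document in three bullet points."
tokens = estimate_tokens(prompt)
print(f"~{tokens} tokens")

# At a 300,000 TPM cap, this prompt alone could be sent roughly
# 300_000 // tokens times per minute, ignoring response tokens.
```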
What should I do if the button doesn't work or TPM still feels too low?
If the TPM increase button doesn’t appear or you continue running into limits:
- Wait a minute and retry (see the retry sketch after this list).
- Check if your use case can be optimized (e.g., reduce prompt length).
- Contact our team or your support contact for assistance.
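If you're calling the service programmatically, a simple client-side pattern is to catch the rate-limit error and wait out the advertised window before retrying. The sketch below is illustrative only: call_builder_api and TPMLimitError are hypothetical placeholders, since the actual client and error type depend on your integration.

```python
# Minimal retry sketch. `call_builder_api` and `TPMLimitError` are
# hypothetical placeholders, not part of a real client library.
import time

class TPMLimitError(Exception):
    """Hypothetical error raised when the TPM limit is hit."""
    def __init__(self, retry_after: int = 60):
        super().__init__(f"TPM Limit reached - Retry in: {retry_after} sec")
        self.retry_after = retry_after

def call_builder_api(prompt: str) -> str:
    """Stand-in for the real request; replace with your client call."""
    raise TPMLimitError(retry_after=60)

def call_with_retry(prompt: str, max_attempts: int = 3) -> str:
    """Retry after the advertised wait whenever the TPM limit is hit."""
    for attempt in range(max_attempts):
        try:
            return call_builder_api(prompt)
        except TPMLimitError as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            time.sleep(err.retry_after)  # wait out the TPM window
```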
Example of TPM Error Message
TPM Limit reached – Retry in: 60 sec

Summary
- New "Increase TPM limit" button lets you scale up quickly.
- There's a 300,000 TPM hard cap; contact our team if you need more.
- Viewer-side improvements are coming soon!
Keep Reading
Breakdown of RAG Model Parameters, Settings and Their Impact
Retrieval-Augmented Generation (RAG) is an advanced approach in natural language processing that integrates information retrieval and generative language modeling. Unlike traditional language models that generate responses solely based on their pre-trained knowledge, RAG combines retrieval mechanisms with generative models to enhance the relevance and accuracy of its responses. This hybrid framework works by first retrieving relevant documents or information from a predefined knowledge base (e.g., databases, documents, or PDFs) and then using a generative model (such as a transformer-based model) to synthesize a response that incorporates the retrieved context.