Understanding Rate Limits on CreateAI Builder


What is a Token?

When your AI Project reads or generates text, that text is split into small units called tokens. A token can be a whole word, part of a word, or a single character or punctuation mark.

Tokens can represent:

  • Individual words (e.g., “cat”, “tree”).
  • Subword fragments (e.g., “un”, “##able” in “unable”).
  • Characters or punctuation marks.

For example:

  • Sentence: MyAI Builder is helpful
  • Tokens: [MyAI, Builder, is, helpful]

Tokens generally correspond to words or word fragments; roughly 750 words ≈ 1,000 tokens.
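
The 750-words-to-1,000-tokens rule of thumb can be turned into a quick estimator. This is a heuristic sketch only, not the platform's actual tokenizer; real counts depend on the model's vocabulary.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate from the ~750 words per 1,000 tokens rule of thumb.

    A heuristic only: the real tokenizer may split complex words or
    punctuation into several tokens.
    """
    words = len(text.split())
    return round(words * 1000 / 750)

print(estimate_tokens("MyAI Builder is helpful"))  # 4 words -> about 5 tokens
```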

What is a ‘Tokens per Minute limit’?

Think of tokens as Lego bricks for building words and sentences. Your AI Project has a limit on how many Lego bricks (tokens) it can use per minute: the Tokens per Minute (TPM) limit.

This means your bot can only pick and place a certain number of bricks every minute: if your limit is X tokens per minute, your bot can read or build at most X tokens' worth of text each minute. In MyAI Builder, both requests and responses consume tokens.

Reasons you might exceed the token rate limit:

  • Ultra-long prompts or instructions
  • High retrieval settings pulling loads of data from the knowledge base
  • Sending a barrage of messages in rapid-fire mode
  • Having a large number of files/URLs uploaded in the knowledge base
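
Conceptually, a TPM limit works like a token budget that refills each minute. The sketch below is an illustration of that idea, assuming a simple fixed 60-second window; the platform's actual accounting may differ.

```python
import time

class TokensPerMinuteLimiter:
    """Minimal sketch of a per-minute token budget.

    Assumes the full budget refills every 60 seconds; a real platform may
    use a sliding window or other accounting.
    """

    def __init__(self, tpm_limit: int):
        self.tpm_limit = tpm_limit
        self.used = 0
        self.window_start = time.monotonic()

    def try_spend(self, tokens: int) -> bool:
        now = time.monotonic()
        if now - self.window_start >= 60:        # new minute: budget refills
            self.used = 0
            self.window_start = now
        if self.used + tokens > self.tpm_limit:  # would exceed the limit
            return False                         # "Tokens per Minute limit reached"
        self.used += tokens
        return True

limiter = TokensPerMinuteLimiter(tpm_limit=1000)
print(limiter.try_spend(800))  # True  -- within budget
print(limiter.try_spend(300))  # False -- 800 + 300 exceeds 1000
```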

Why Does It Happen?

Every time you or your users hold a long conversation with the bot or send very frequent requests, the platform tallies all the tokens used. If you exceed the limit within a minute, the system waves its little red flag: “Tokens per Minute limit reached.”

Managing this limit helps ensure all your bots run smoothly without grabbing too many bricks at once.

Which features in MyAI Builder affect token usage?

Let’s break down the actions of the project:

  • A user enters the prompt: Tokens are used to read the prompt
  • The project refers to the custom instructions: Tokens are used to read the system prompt
  • The project applies your settings: Settings such as retrieval type and Top K control how much data is processed, and therefore how many tokens are used
  • The project refers to your knowledge base: Tokens are used to read uploaded files
  • The project responds to you: Tokens are used in the response it displays

Settings That Affect Your Token Usage Based on the Stages

A user enters the prompt

The model converts your input into tokens (smaller units of text). For example, “chatbot” might be one token, while complex words or punctuation split into multiple tokens. The longer your prompt, the more tokens are used.

💡 Tips:

  • Be concise: Use brief, clear language. Omit unnecessary details.
  • Focus on essentials: Include only core elements; provide just enough context.
  • Avoid over-explaining: Skip extra instructions if the task is straightforward.
  • Use abbreviations carefully: Ensure they’re understandable.
  • Iterative refinement: Start minimal and expand only as needed.
  • Step-by-step questions: Ask one question at a time to avoid confusion.
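
One practical way to apply the "be concise" tip is to check a prompt against a token budget before sending it. The helper below is a hypothetical sketch: the 200-token budget and the words-to-tokens heuristic are illustrative assumptions, not platform values.

```python
def check_prompt_budget(prompt: str, max_tokens: int = 200) -> str:
    """Warn when a prompt's estimated token count exceeds a budget.

    Uses the ~4/3 tokens-per-word heuristic; max_tokens is an
    illustrative assumption, not a real platform limit.
    """
    estimated = round(len(prompt.split()) * 4 / 3)
    if estimated > max_tokens:
        return f"Trim your prompt: ~{estimated} tokens (budget {max_tokens})"
    return f"OK: ~{estimated} tokens"

print(check_prompt_budget("Summarize this article in three bullet points"))
```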

The project refers to the custom instructions

After processing your prompt, the project reads the custom (system) instructions that guide its behavior; those instructions consume tokens on every request.

💡 Tips:

  • Be concise: Keep instructions short and clear.
  • Focus on essentials: List only critical details.
  • Prioritize clarity: Order directives logically.
  • Avoid redundancy: Eliminate overlapping instructions.
  • Use structured format: Bullet points or numbered lists help.
  • Iterative refinement: Continuously shorten and clarify.

The project refers to your settings

After reading prompts and instructions, the model applies your project settings (e.g., retrieval type, Top K, time zones), which limit how much data is processed.

💡 Tips:

  • Retrieval & Top K: Use a practical Top K of 3–5; smaller chunks reduce token load.
  • Conversation settings: Enable chat memory and response caching to reuse context without re-reading everything.
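
To see why Top K matters, consider how many tokens retrieval adds to each request. The sketch below assumes retrieved chunks are injected into the prompt verbatim and estimates their token cost with the words-to-tokens heuristic; both are simplifying assumptions.

```python
def retrieval_token_cost(chunks: list[str], top_k: int) -> int:
    """Estimate tokens added to a request by injecting the top_k
    highest-ranked chunks (chunks assumed pre-sorted by relevance)."""
    selected = chunks[:top_k]
    return sum(round(len(c.split()) * 4 / 3) for c in selected)

chunks = ["word " * 100] * 6            # six chunks of ~100 words each
print(retrieval_token_cost(chunks, 3))  # Top K = 3
print(retrieval_token_cost(chunks, 5))  # Top K = 5 costs ~two more chunks' worth
```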

The project refers to your knowledge base 

It selectively retrieves only the most relevant files or data segments, using indexes or summaries to minimize unnecessary tokens.

💡 Tips:

  • Be selective: Pull only pertinent documents or sections.
  • Use summaries: Shorten large documents to key points.
  • Index your data: Tag and organize files for quick access.
  • Monitor volume: Prune outdated or redundant information.
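
The "index your data" and "monitor volume" tips can be combined: tag documents so retrieval pulls only pertinent sections, and prune outdated files. The in-memory index below is entirely hypothetical (the file names and tags are made up for illustration).

```python
# Hypothetical tagged knowledge base: file names, tags, and summaries
# are illustrative examples, not real platform data.
knowledge_base = {
    "billing-faq.md": {"tags": {"billing"}, "summary": "Refunds and invoices."},
    "setup-guide.md": {"tags": {"setup"},   "summary": "Installing the bot."},
    "old-pricing.md": {"tags": {"billing"}, "summary": "2022 pricing (outdated)."},
}

def lookup(tag: str, exclude: set[str] = frozenset()) -> list[str]:
    """Return summaries for documents matching a tag, skipping pruned files."""
    return [doc["summary"]
            for name, doc in knowledge_base.items()
            if tag in doc["tags"] and name not in exclude]

print(lookup("billing", exclude={"old-pricing.md"}))  # only current billing info
```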

The project responds to you

Temperature: Controls randomness and creativity. It does not directly change token usage; it shapes the style of the text.

Indirect effects:

  • Higher temperature can yield more varied, lengthier responses (more tokens).
  • Lower temperature often produces concise outputs (fewer tokens).

💡 Tips:

  • Lower temperature for focused, less verbose outputs.
  • Verbosity: Longer responses use more tokens. Choose Brief, Succinct, or Detailed to control length.
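
One way to picture the verbosity setting is as a cap on response length. The mapping below reuses the Brief/Succinct/Detailed names from the tip above, but the token caps are made-up assumptions for illustration only.

```python
# Illustrative mapping: the option names come from the document, but the
# token caps are assumed values, not the platform's actual limits.
VERBOSITY_TOKEN_CAPS = {"Brief": 150, "Succinct": 400, "Detailed": 1000}

def response_cap(verbosity: str) -> int:
    """Pick a max-response-token cap for a verbosity setting."""
    return VERBOSITY_TOKEN_CAPS.get(verbosity, 400)  # default to Succinct

print(response_cap("Brief"))     # 150
print(response_cap("Detailed"))  # 1000
```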

What to Do If You Exceed the Tokens per Minute Limit?

  • Pause: Wait a minute—your tokens refill automatically.
  • Simplify prompts: Shorten instructions or reduce Top K.
  • New chat: Start fresh if the conversation is very long.
  • Contact support: Reach out if you need help increasing your limit.
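
The "pause and wait a minute" advice can be automated with a simple retry loop. This is a generic sketch: `send` stands in for whatever call your client makes, and it is assumed to raise `RuntimeError` on a TPM-limit error (your client's actual error type may differ).

```python
import time

def send_with_retry(send, message, retries: int = 3, wait_seconds: int = 60):
    """Retry a request after the per-minute token budget refills.

    `send` is a placeholder for your client's request function; it is
    assumed to raise RuntimeError when the TPM limit is hit.
    """
    for attempt in range(retries):
        try:
            return send(message)
        except RuntimeError:
            if attempt == retries - 1:
                raise                     # out of retries: surface the error
            time.sleep(wait_seconds)      # pause: tokens refill each minute

# Demo with a fake sender that fails once, then succeeds.
calls = []
def fake_send(msg):
    calls.append(msg)
    if len(calls) < 2:
        raise RuntimeError("Tokens per Minute limit reached")
    return "ok"

print(send_with_retry(fake_send, "hello", wait_seconds=0))  # "ok" on second try
```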

Where in MyAI Builder Can You See Tokens Used?

Each response shows a token icon you can click to view how many tokens were used.

Conclusion

Tokens might sound technical, but by managing prompt length, retrieval settings, and context depth, you can avoid rate-limit barriers. Think of tokens as Lego blocks—use them wisely to build your perfect AI project without running out of pieces!

Stay awesome, and happy bot building!


Understanding Rate Limits on CreateAI Builder

Kofi Wood and Shailee Shah

Ever run into a message that says, “Your project has reached its Tokens per Minute limit (TPM)”? Well, just think of it as a friendly traffic signal reminding us not to zoom too fast. We’ll walk through what a token is, what reaching the TPM limit means, which settings affect token usage, and how to optimize your AI Project to avoid hitting the limit.
