OpenAI’s API has attracted many developers with its powerful language processing capabilities, yet its pricing structure often causes confusion. Although the official documentation states that pricing is “per token”, the concept of tokens is not intuitive for most developers. A common question arises: “How much does using the OpenAI Realtime API cost per minute?”
To shed light on this, I conducted in-depth tests on two mainstream models using OpenAI Playground. With real-world data, I uncover the actual costs and provide practical optimization strategies.
1. Why Do Developers Care More About “Per Minute Call Costs”?
1.1 The Intrinsic Complexity of Token-Based Billing
OpenAI’s pricing formula seems straightforward: (input tokens × input price) + (output tokens × output price). In real-world development, however, developers often struggle with the following issues:
- Difficult cost estimation: How many tokens are needed for a 1,000-word text? How much is consumed in a 5-minute voice conversation?
- Opaque model differences: What is the token price difference between Realtime-gpt-4o-mini-preview-2024-12-17 and Realtime-gpt-4o-preview-2024-12-17?
- Hidden cost traps: How do system prompts and other fixed costs affect long-term expenses?
1.2 Real-World Business Scenarios
In applications like real-time voice customer service and intelligent assistants, per-minute costs are the primary concern:
- Budgeting: If the per-minute cost is $0.50, a 10-minute call costs only $5; if it rises to $2, the cost jumps to $20.
- Performance trade-offs: Higher-performance models offer a smoother experience but may increase costs roughly ninefold (as demonstrated later).
- Scalability challenges: For 1,000 calls per day, cost differences can reach $1,500 per day, directly affecting business viability.
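The scaling arithmetic above is easy to sanity-check in a few lines of Python. The $1.50 per-minute gap and the one-minute call profile are illustrative assumptions, not measured values:

```python
# Back-of-envelope projection: how a per-minute price gap scales with volume.
# The $1.50/min gap and the one-minute call length are illustrative assumptions.
def daily_cost_gap(gap_per_min, calls_per_day, minutes_per_call):
    """Daily cost difference between two models at a given call volume."""
    return gap_per_min * calls_per_day * minutes_per_call

print(daily_cost_gap(1.50, 1000, 1))  # 1500.0
```

At 1,000 one-minute calls per day, even a modest per-minute gap compounds into a line item large enough to decide business viability.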
2. Token Mechanism: From Technical Definition to Cost Mapping
2.1 What Is a Token?
Technical definition: A token is the smallest unit of text processing, with OpenAI using the Byte Pair Encoding (BPE) algorithm for tokenization.
Example: “Hello, world!” → [“Hello”, “,”, “ world”, “!”] (4 tokens; note that the leading space is attached to “ world”)
2.2 Cost Calculation Formula
Single-call cost = (Input tokens × Input price) + (Output tokens × Output price)
For Realtime-gpt-4o-preview-2024-12-17, the pricing is:
Input price: $0.00004 per token ($40 per million tokens)
Output price: $0.00008 per token ($80 per million tokens)
Example: Processing 1,000 input tokens + 500 output tokens, the cost would be: (1000×0.00004) + (500×0.00008) = $0.08
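Plugging the formula into a few lines of Python makes the mapping concrete; the rates are the Realtime-gpt-4o-preview prices listed above:

```python
# Single-call cost at the assumed Realtime-gpt-4o-preview rates:
# $40 per million input tokens, $80 per million output tokens.
INPUT_PRICE = 40 / 1_000_000   # dollars per input token
OUTPUT_PRICE = 80 / 1_000_000  # dollars per output token

def call_cost(input_tokens, output_tokens):
    """Cost in dollars for one API call."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

print(round(call_cost(1000, 500), 4))  # 0.08
```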
2.3 How to Calculate Token Count?
OpenAI provides the tiktoken library to help developers precisely calculate token counts. Example:
import tiktoken
# Initialize encoder
encoder = tiktoken.get_encoding("cl100k_base")
# Calculate token count
text = "Hello, World!"
tokens = encoder.encode(text)
print(len(tokens)) # Output: 4
3. Real-World Testing: Per-Minute Costs Across Models & Configurations
3.1 Testing Methodology
To better understand OpenAI Realtime API costs, we conducted real-world tests on the OpenAI Playground. The test involved two OpenAI models:
- Realtime-gpt-4o-mini-preview-2024-12-17
- Realtime-gpt-4o-preview-2024-12-17
Each model was tested under two configurations:
- Without system prompts
- With a system prompt containing about 1,000 words (e.g., menus, conversation flows)
We extracted actual cost and token consumption data from OpenAI Playground’s Usage panel, obtaining four test records.
3.2 Cost Analysis of the Economical Model: Realtime-gpt-4o-mini-preview-2024-12-17

| Configuration | Per-Minute Cost | Cost Increase | Notes |
| --- | --- | --- | --- |
| No System Prompt | $0.16 | — | Basic Q&A scenario, low token consumption |
| With 1,000-Word Prompt | $0.33 | +106% | System prompt doubled the cost |
Key insights:
- Without system prompts: Cost remains at $0.16 per minute.
- With system prompts: Fixed token cost (about 1,300 tokens) drives total cost up to $0.33 per minute.
3.3 Cost Analysis of the High-Performance Model: Realtime-gpt-4o-preview-2024-12-17

| Configuration | Per-Minute Cost | Cost Increase | Notes |
| --- | --- | --- | --- |
| No System Prompt | $0.18 | — | Moderate token consumption for complex conversations |
| With 1,000-Word Prompt | $1.63 | +805% | Cost rose roughly ninefold due to the system prompt |
Key insights:
- Without system prompts: Cost remains at $0.18 per minute.
- With system prompts: Fixed token cost (about 1,300 tokens) drives total cost up to $1.63 per minute.
3.4 Core Findings
- Fixed costs accumulate: System prompts consume input token quotas and are billed every call.
- Model differences: The Mini model offers better cost-effectiveness; even with an approximately 1,300-token prompt, it costs just $0.33 per minute. The Preview model is more powerful but carries a higher token price and consumes more tokens when processing long text, so prompts must be used carefully.
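As a rough sanity check on these findings, the fixed prompt overhead can be estimated directly from the input rate quoted earlier. This is a sketch; the assumption that the full ~1,300-token prompt is re-billed on every conversational turn, and the turn counts themselves, are illustrative:

```python
# Sketch: how a fixed system prompt inflates per-minute cost.
# Assumes ~1,300 prompt tokens are re-billed on every conversational turn
# at the $40/M input-token rate used earlier; turn counts are illustrative.
PROMPT_TOKENS = 1300
INPUT_PRICE = 40 / 1_000_000  # dollars per input token

def prompt_overhead_per_minute(turns_per_minute):
    """Dollars per minute spent just on re-sending the system prompt."""
    return PROMPT_TOKENS * INPUT_PRICE * turns_per_minute

for turns in (1, 3, 5):
    print(turns, round(prompt_overhead_per_minute(turns), 3))
```

Even at a few turns per minute, the prompt alone contributes tens of cents per minute, which is consistent with the jump we measured once a system prompt was attached.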
4. Cost Optimization Strategies: From Code to Architecture
4.1 Three Principles for Streamlining Prompts
1) Remove redundancy:
# Before optimization: excessive description
prompt = "Welcome! We are a restaurant chain established in 2010, specializing in burgers, pizzas, and salads..."
# After optimization: concise core message
prompt = "Menu: Burger ($10), Pizza ($12), Salad ($8)"
Effect: Token count reduced from 200 to 40, cutting costs by 80%.
2) Use structured JSON instead of natural language
{
"menu": [
{"name": "Burger", "price": 10},
{"name": "Pizza", "price": 12}
]
}
Effect: Token count reduced by 65%, improving parsing efficiency.
3) Dynamically load prompts: load prompt modules on demand
if user_intent == "menu_inquiry":
    load_prompt("menu_prompt.json")
elif user_intent == "customer_support":
    load_prompt("service_prompt.json")
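The dynamic-loading pseudocode in principle 3 can be fleshed out into a minimal runnable sketch. The file names, JSON shape, and intent-to-file mapping here are hypothetical, not part of any OpenAI API:

```python
import json
from pathlib import Path

# Hypothetical mapping from detected intent to prompt-module file.
PROMPT_FILES = {
    "menu_inquiry": "menu_prompt.json",
    "customer_support": "service_prompt.json",
}

def load_prompt(user_intent, prompt_dir="prompts"):
    """Return only the prompt module relevant to the detected intent.

    Unknown intents fall back to an empty prompt, so the model never
    pays for tokens it does not need.
    """
    filename = PROMPT_FILES.get(user_intent)
    if filename is None:
        return ""  # no extra prompt tokens for unrecognized intents
    path = Path(prompt_dir) / filename
    return json.loads(path.read_text(encoding="utf-8"))["prompt"]
```

The design choice here is that each prompt module is billed only when its intent actually occurs, instead of concatenating every module into one large system prompt that is re-billed on every call.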
4.2 Golden Rules for Model Selection
- Simple scenarios (FAQs, order tracking)
→ Use Realtime-gpt-4o-mini to keep costs under $0.33 per minute.
- Complex scenarios (medical consultations, multi-turn conversations)
→ Use Realtime-gpt-4o-preview, but implement staged prompt loading to keep costs under $1 per minute.
4.3 Cost-Control Strategies through Technical Means
- Enforce token limits:
response = openai.ChatCompletion.create(
    model="gpt-4o-mini",
    messages=messages,
    max_tokens=150  # Limit response length
)
- Implement cost threshold switching:
if cost_per_minute > 1.0:  # threshold: $1.00 per minute
    switch_model("gpt-4o-mini")  # Auto-downgrade
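A threshold switch like the one sketched above could be wrapped in a small session-level monitor. The model names and the default $1.00/min threshold are assumptions for illustration, not values prescribed by OpenAI:

```python
# Illustrative cost monitor with threshold-based auto-downgrade.
# Model names and the $1.00/min default threshold are assumptions.
class CostMonitor:
    def __init__(self, threshold_per_min=1.0,
                 primary="gpt-4o-realtime-preview",
                 fallback="gpt-4o-mini-realtime-preview"):
        self.threshold = threshold_per_min
        self.primary = primary
        self.fallback = fallback
        self.spent = 0.0  # running session cost in dollars

    def record(self, call_cost):
        """Add the cost of a completed call to the running total."""
        self.spent += call_cost

    def pick_model(self, elapsed_minutes):
        """Choose the model for the next call based on $/min so far."""
        rate = self.spent / max(elapsed_minutes, 1e-9)
        return self.fallback if rate > self.threshold else self.primary
```

Calling record() after each response and pick_model() before the next request keeps the downgrade decision local to the session, with no external billing lookups.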
5. Conclusion: Striking a Balance Between Performance and Cost

| Model | Configuration | Realtime API Cost ($/min) | Cost Increase (with System Prompt) |
| --- | --- | --- | --- |
| Realtime-gpt-4o-mini-preview-2024-12-17 | Without System Prompt | 0.16 | — |
| Realtime-gpt-4o-mini-preview-2024-12-17 | With System Prompt (Menu) | 0.33 | +106% |
| Realtime-gpt-4o-preview-2024-12-17 | Without System Prompt | 0.18 | — |
| Realtime-gpt-4o-preview-2024-12-17 | With System Prompt (Menu) | 1.63 | +805% |
Based on our test data, the key factors in controlling OpenAI Realtime API costs are:
1) Optimizing and dynamically loading system prompts
2) Choosing the right model for the right scenario
3) Implementing real-time monitoring and cost threshold switching mechanisms
Recommendations for Developers:
- Use Mini for 80% of simple scenarios: Keep costs below $0.3/min.
- Use Preview for 20% of complex scenarios: Optimize architecture to prevent cost surges.
Through this testing, we can clearly see the significant impact of system prompts and model selection on overall costs. Developers should make informed choices based on business needs to achieve the best balance between performance and cost.
If you need further technical support or detailed test data, feel free to leave a comment!
Thanks for reading! A video version of this blog is also available below; stay tuned and enjoy watching!
And welcome to explore my YouTube channel https://www.youtube.com/@frankfu007 for more exciting content. If you enjoy my videos, don’t forget to like and subscribe for more insights!
Interested in more applications of the Realtime API in practical projects? Explore my articles linked below!
- Education Nano 01 – Modular Wheel-Leg Robot for STEM
- Audio-Visual Synchronization Algorithms in Digital Humans and the TIME_WAIT Challenge in WebSocket Communication
- Building a Voice-Controlled Robot Using OpenAI Realtime API: A Full-Link Implementation from RDK X5 to ES02
- Desktop Balancing Bot(ES02)-Dual-Wheel Legged Robot with High-Performance Algorithm
- Wheeled-Legged Robot ES01: Building with ESP32 & SimpleFOC
One response
Hi Frank,
I’m using gpt-4o-Realtime-preview and mini, and I’m really struggling to calculate the effective cost. I’m checking the tokens I get from response.done directly from OpenAI, so I have every kind of token with the prices from OpenAI directly:
GPT4O_TEXT_INPUT_COST=0.000010
GPT4O_AUDIO_INPUT_COST=0.000080
GPT4O_TEXT_CACHED_INPUT_COST=0.000005
GPT4O_AUDIO_CACHED_INPUT_COST=0.000005
GPT4O_TEXT_OUTPUT_COST=0.000040
GPT4O_AUDIO_OUTPUT_COST=0.000160
The thing is, I’m not sure if in a session of, let’s say, 10 response.done events I have to sum all the values to get the total cost, or if response.done provides the cumulative value for the session.
Looking at the tokens, it seems these accumulate somehow:
GPT4O_TEXT_INPUT
GPT4O_AUDIO_INPUT
GPT4O_TEXT_CACHED_INPUT
GPT4O_AUDIO_CACHED_INPUT
While these always reflect the real usage for that response:
GPT4O_TEXT_OUTPUT
GPT4O_AUDIO_OUTPUT
If I sum every token from response.done I get exorbitant costs that are not reflected in my https://platform.openai.com/usage
So I’m really struggling to understand whether the dashboard is not reflecting the real cost (I doubt it, but who knows) or my way of calculating is faulty.
I feel like I should just sum the outputs for every response.done but only check the input values from the last response.done.
If I do this, the values start to get close to the ones I see in my dashboard… Any ideas?