Every API call to a Large Language Model comes with a price tag directly tied to the number of tokens processed. For developers working with OpenAI, Anthropic, or other LLM providers, reducing token count while maintaining semantic meaning isn’t just an optimization—it’s a business imperative.
Enter Brevit, a lightweight semantic compression library designed specifically for LLM prompts. With an average token reduction of 40-60%, Brevit helps developers slash API costs, improve response latency, and maximize context window utilization—all while preserving the essential meaning and structure of your data.
The Token Economics Problem
Before diving into Brevit, let’s understand the problem it solves.
Modern LLMs charge based on token consumption. Consider these real costs:
- GPT-4: $0.03 per 1K input tokens
- Claude 3.5 Sonnet: $0.003 per 1K input tokens
- GPT-4o: $0.005 per 1K input tokens
For applications processing thousands of prompts daily, even small reductions compound into substantial savings. A 50% reduction on 10 million tokens per day saves 5 million tokens daily—at the rates above, that's roughly $450 per month on Claude 3.5 Sonnet, $750 on GPT-4o, and $4,500 on GPT-4.
What is Brevit?
Brevit is a cross-platform semantic compression library available for JavaScript/Node.js, Python, and .NET that intelligently compresses structured data for LLM consumption. Unlike traditional compression algorithms (like gzip or Brotli), Brevit focuses on semantic compression—reducing verbosity while preserving meaning that LLMs can understand.
Key Features
✅ 40-60% token reduction on average
✅ Auto mode that intelligently selects compression strategies
✅ Multi-format support: JSON, YAML, text, tabular data, PDFs, images
✅ Zero-config defaults with customization options
✅ Cross-platform: npm, pip, NuGet
✅ Preserves semantic meaning that LLMs need
Installation
JavaScript/Node.js
npm install brevit
Python
pip install brevit
.NET
dotnet add package Brevit
Core Functions: brevity() vs optimize()
Brevit provides two main approaches to compression:
1. Auto Mode: brevity()
The brevity() function automatically analyzes your input and selects the optimal compression strategy:
import { brevity } from 'brevit';
const data = {
  "friends": ["ana", "luis", "sam"],
  "items": [
    { "sku": "A-88", "qty": 1, "price": 29.99 },
    { "sku": "T-22", "qty": 2, "price": 39.99 }
  ],
  "customer": { "name": "John Doe", "email": "john@example.com" }
};
const compressed = brevity(data);
console.log(compressed);
Output:
friends:ana,luis,sam
items:sku|qty|price
A-88|1|29.99
T-22|2|39.99
customer.name:John Doe,email:john@example.com
2. Explicit Mode: optimize()
For more control, use optimize() with an intent parameter:
import { optimize } from 'brevit';
const compressed = optimize(data, "summarize customer order");
The intent helps Brevit prioritize what information to preserve most carefully.
Format Comparison: JSON vs YAML vs Brevit
Let’s compare how different formats handle the same data structure:
Example Dataset
{
  "user": {
    "id": 12345,
    "name": "Sarah Johnson",
    "email": "sarah.j@company.com",
    "role": "Senior Developer",
    "department": "Engineering"
  },
  "projects": [
    {
      "id": "PROJ-001",
      "name": "API Redesign",
      "status": "active",
      "budget": 50000,
      "team_size": 5
    },
    {
      "id": "PROJ-002",
      "name": "Mobile App",
      "status": "planning",
      "budget": 75000,
      "team_size": 8
    }
  ]
}
Token Count Comparison
| Format | Token Count | Savings | Pros | Cons |
|---|---|---|---|---|
| JSON | ~180 tokens | Baseline | Human-readable, widely supported | Verbose, lots of syntax overhead |
| YAML | ~140 tokens | 22% | More concise than JSON, readable | Still has significant whitespace |
| Brevit | ~75 tokens | 58% | Optimized for LLMs, minimal overhead | Requires parsing library |
Why Brevit Wins for LLMs
JSON Format (180 tokens):
{
  "user": {
    "id": 12345,
    "name": "Sarah Johnson",
    ...
  }
}
YAML Format (140 tokens):
user:
  id: 12345
  name: Sarah Johnson
  ...
Brevit Format (75 tokens):
user.id:12345,name:Sarah Johnson,email:sarah.j@company.com,role:Senior Developer,dept:Engineering
projects:id|name|status|budget|team_size
PROJ-001|API Redesign|active|50000|5
PROJ-002|Mobile App|planning|75000|8
Key Differences:
- Syntax Overhead: JSON and YAML require brackets, quotes, and indentation. Brevit minimizes structural characters.
- Tabular Optimization: For arrays of objects with consistent structure, Brevit uses a header-row format similar to CSV, dramatically reducing repetition.
- Nested Flattening: Brevit intelligently flattens nested structures using dot notation where appropriate.
- Semantic Preservation: Unlike simple minification, Brevit maintains the semantic relationships LLMs need to understand context.
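These token counts will vary slightly by tokenizer, but they are easy to verify yourself. Here's a minimal sketch using the tiktoken npm package, assuming the example dataset above has been loaded into a variable named data:

import { get_encoding } from "tiktoken";
import { brevity } from "brevit";

// cl100k_base is the encoding used by GPT-4 and GPT-3.5-turbo
const enc = get_encoding("cl100k_base");
const countTokens = (text) => enc.encode(text).length;

const asJson = JSON.stringify(data, null, 2); // the user/projects dataset above
const asBrevit = brevity(data);

console.log(`JSON:   ${countTokens(asJson)} tokens`);
console.log(`Brevit: ${countTokens(asBrevit)} tokens`);

enc.free(); // tiktoken's WASM encoder must be freed explicitly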
Supported Data Types
Brevit handles multiple input formats seamlessly:
1. JSON Objects and Arrays
Perfect for API responses, configuration data, and structured records.
const apiResponse = {
  "status": 200,
  "data": [...],
  "metadata": {...}
};
const compressed = brevity(apiResponse);
2. Plain Text and Structured Text
Compresses verbose text while maintaining readability.
const text = `
Dear Customer,
Thank you for your inquiry regarding our premium
services. We would be delighted to assist you...
`;
const compressed = brevity(text);
3. Tabular Data
Highly efficient for arrays of objects with consistent schemas.
const sales = [
  { date: "2024-01-01", product: "Widget", qty: 100, revenue: 5000 },
  { date: "2024-01-02", product: "Gadget", qty: 75, revenue: 3750 },
  // ... hundreds more rows
];
const compressed = brevity(sales);
// Converts to efficient column-based format
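Based on the header-row format shown earlier, the output should look something like this (exact formatting may vary):

date|product|qty|revenue
2024-01-01|Widget|100|5000
2024-01-02|Gadget|75|3750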
4. Nested Structures
Intelligently handles deeply nested objects.
const complex = {
  company: {
    departments: {
      engineering: {
        teams: [...],
        projects: [...]
      }
    }
  }
};
5. Mixed Content Types
Handles documents with multiple data structures.
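For example, a single input that mixes free text, a record, and a table (a hypothetical document, but representative of the shapes Brevit accepts):

const report = {
  summary: "Q1 revenue exceeded targets across all regions.",
  owner: { name: "Dana Park", team: "Finance" },
  figures: [
    { region: "NA", revenue: 120000 },
    { region: "EU", revenue: 95000 }
  ]
};
const compressed = brevity(report);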
Real-World Use Cases
1. RAG (Retrieval-Augmented Generation) Systems
When retrieving context from vector databases, you often hit token limits. Brevit lets you include more context:
// Before: only 3 documents fit in the context window
const contexts = await vectorDB.query(query, { limit: 3 });

// After: 7-8 compressed documents fit in the same window
const moreContexts = await vectorDB.query(query, { limit: 8 });
const compressed = moreContexts.map(doc => brevity(doc.content));
const context = compressed.join("\n"); // pack all retrieved documents into the prompt
2. Chatbot Context Management
Maintain longer conversation histories without exceeding limits:
const conversationHistory = [
  { role: "user", content: "..." },
  { role: "assistant", content: "..." },
  // ... 20 more exchanges
];
const compressedHistory = brevity(conversationHistory);
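Note that brevity() returns a compressed string, not the messages array the chat API expects. One approach (a sketch, not something the library prescribes) is to inject the compressed transcript as system context and send only the newest turn verbatim; latestUserMessage here is a hypothetical variable:

const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [
    // compressed transcript rides along as system context
    { role: "system", content: `Conversation so far:\n${compressedHistory}` },
    // only the newest user turn goes through uncompressed
    { role: "user", content: latestUserMessage }
  ]
});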
3. API Request Payload Optimization
Reduce costs when sending large datasets to LLMs for analysis:
// Analyzing 500 customer reviews
const reviews = await db.reviews.findMany({ limit: 500 });
const compressed = brevity(reviews);
const analysis = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [{
    role: "user",
    content: `Analyze these reviews: ${compressed}`
  }]
});
4. Data Preprocessing for AI Pipelines
Integrate Brevit into data preprocessing before feeding to LLMs:
const pipeline = [
  loadData,
  cleanData,
  brevity, // <-- Add compression step
  sendToLLM
];
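The array above is conceptual; a minimal runner that threads data through each (possibly async) step could look like the sketch below. runPipeline and rawInput are hypothetical names, not part of Brevit:

const runPipeline = async (steps, input) => {
  let result = input;
  for (const step of steps) {
    result = await step(result); // each step receives the previous step's output
  }
  return result;
};

const output = await runPipeline(pipeline, rawInput);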
5. Cost-Sensitive Production Applications
For high-volume applications where every token matters:
// Processing 10,000 documents daily
const documents = await fetchDailyDocuments();
const processedDocs = await Promise.all(
  documents.map(async doc => {
    const compressed = brevity(doc);
    return await llm.process(compressed);
  })
);
// Monthly savings: $500-5000 depending on model
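One caveat: Promise.all over 10,000 documents fires every request at once and will likely hit provider rate limits. A simple batching wrapper keeps concurrency bounded (a sketch; the batch size of 20 is an arbitrary starting point to tune):

const processInBatches = async (docs, batchSize = 20) => {
  const results = [];
  for (let i = 0; i < docs.length; i += batchSize) {
    const batch = docs.slice(i, i + batchSize);
    // each batch runs concurrently; batches themselves run sequentially
    results.push(...await Promise.all(
      batch.map(doc => llm.process(brevity(doc)))
    ));
  }
  return results;
};

const processedDocs = await processInBatches(documents);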
Brevit vs Other Compression Approaches
| Approach | Token Reduction | Semantic Loss | Setup Complexity | Cost |
|---|---|---|---|---|
| Brevit | 40-60% | Minimal | Zero-config | Free |
| LLMLingua | 50-80% | Low-Medium | Requires separate LLM | Additional API costs |
| Manual Summarization | 30-70% | Variable | High (manual work) | Development time |
| JSON Minification | 5-10% | None | Low | Free |
| YAML Conversion | 15-25% | None | Low | Free |
| Custom Rules | 20-50% | Medium | Very High | Development time |
When to Use What
Use Brevit when:
- You need consistent, predictable compression
- Semantic meaning must be preserved
- You want zero-config operation
- You’re working with structured data (JSON, tables, etc.)
Use LLMLingua when:
- You can tolerate higher semantic loss
- You have budget for additional LLM calls
- You’re working with natural language prompts
- You need extreme compression (80%+)
Use Manual Summarization when:
- You have very specific domain requirements
- Content is highly nuanced
- You can afford human review
Live Demo and Playground
Want to try Brevit before integrating? Check out the interactive playground at:
👉 https://www.javianpicardo.com/brevit
The playground includes:
- Real-time compression preview
- Token count comparison (JSON vs YAML vs Brevit)
- Side-by-side format visualization
- OpenAI API testing (via Puter integration)
- Support for text, JSON, PDF, and image inputs
Advanced Usage: Intent-Based Optimization
Brevit’s optimize() function accepts an optional intent parameter that helps prioritize what to preserve:
// Preserve customer details for support queries
optimize(data, "customer support ticket analysis");
// Focus on financial data
optimize(data, "calculate quarterly revenue");
// Preserve technical details
optimize(data, "debug application error");
The intent guides compression decisions when trade-offs must be made.
Integration Examples
Express.js API Middleware
import express from 'express';
import OpenAI from 'openai';
import { brevity } from 'brevit';

const app = express();
app.use(express.json()); // parse JSON bodies so req.body is populated
const openai = new OpenAI();

app.post('/api/analyze', async (req, res) => {
  const compressed = brevity(req.body);
  const result = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [{
      role: "user",
      content: `Analyze: ${compressed}`
    }]
  });
  res.json(result);
});
Python Flask Application
from flask import Flask, request, jsonify
from brevit import brevity
from openai import OpenAI

app = Flask(__name__)
client = OpenAI()

@app.route('/analyze', methods=['POST'])
def analyze():
    data = request.json
    compressed = brevity(data)
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": f"Analyze: {compressed}"
        }]
    )
    return jsonify({"analysis": response.choices[0].message.content})
.NET Web API
using Brevit;
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("api/[controller]")]
public class AnalyzeController : ControllerBase
{
    // openAIClient is assumed to be injected or configured elsewhere
    [HttpPost]
    public async Task<IActionResult> Analyze([FromBody] dynamic data)
    {
        var compressed = Brevit.Brevity(data);
        var response = await openAIClient.GetChatCompletion(
            new ChatMessage("user", $"Analyze: {compressed}")
        );
        return Ok(response);
    }
}
Performance Benchmarks
Based on testing with various data structures:
| Data Type | Avg Token Reduction | Processing Time | Memory Overhead |
|---|---|---|---|
| JSON Objects | 45-55% | <1ms | Negligible |
| Tabular Arrays | 55-65% | <2ms | Negligible |
| Nested Structures | 40-50% | <3ms | Negligible |
| Plain Text | 30-40% | <1ms | Negligible |
| Mixed Content | 42-52% | <2ms | Negligible |
Benchmarks performed on typical web application datasets (1-100KB)
Best Practices
✅ Do’s
- Use brevity() for general cases – The auto mode handles most scenarios well
- Provide intent for domain-specific data – Helps optimize() make better decisions
- Test with your actual data – Compression ratios vary by structure
- Monitor LLM output quality – Ensure compression doesn’t hurt response accuracy
- Combine with other optimizations – Use alongside prompt engineering best practices
❌ Don’ts
- Don’t over-compress critical data – Some information loss is acceptable, but not for critical fields
- Don’t skip testing – Validate that LLMs still understand compressed data
- Don’t compress already minimal data – Sub-50 token payloads may not benefit (see the guard sketch after this list)
- Don’t ignore schema changes – Update compression strategies when data structures evolve
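A cheap way to honor the last two points is to gate compression on an estimated payload size. A minimal sketch, using the rough four-characters-per-token heuristic rather than a real tokenizer:

const maybeCompress = (data) => {
  const raw = typeof data === "string" ? data : JSON.stringify(data);
  const estimatedTokens = raw.length / 4; // ~4 chars per token: a heuristic, not an exact count
  return estimatedTokens < 50 ? raw : brevity(data); // skip payloads that are already tiny
};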
Cost Savings Calculator
Let’s calculate potential savings for a typical application:
Assumptions:
- 10,000 API calls per day
- Average 800 tokens per call
- Using GPT-4 ($0.03 per 1K tokens)
- 50% token reduction with Brevit
Without Brevit:
- Daily tokens: 10,000 × 800 = 8,000,000 tokens
- Daily cost: $240
- Monthly cost: $7,200
- Annual cost: $86,400
With Brevit:
- Daily tokens: 10,000 × 400 = 4,000,000 tokens
- Daily cost: $120
- Monthly cost: $3,600
- Annual cost: $43,200
Annual Savings: $43,200 💰
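To reproduce these figures for your own workload, the arithmetic is only a few lines:

const monthlySavings = ({ callsPerDay, tokensPerCall, pricePer1K, reduction }) => {
  const dailyTokens = callsPerDay * tokensPerCall;
  const dailyDollarsSaved = dailyTokens * reduction * (pricePer1K / 1000);
  return dailyDollarsSaved * 30; // 30-day month, matching the example above
};

// The example above: 10,000 calls × 800 tokens on GPT-4 with a 50% reduction
console.log(monthlySavings({
  callsPerDay: 10000,
  tokensPerCall: 800,
  pricePer1K: 0.03,
  reduction: 0.5
})); // → 3600, i.e. $3,600 saved per month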
Conclusion
Brevit represents a new category of tools designed specifically for the LLM era—semantic compression libraries that understand both data structure and AI context requirements. With zero-config operation, cross-platform support, and impressive 40-60% token reductions, it’s a must-have tool for any production LLM application.
Whether you’re building chatbots, RAG systems, data analysis pipelines, or any application that sends structured data to LLMs, Brevit can significantly reduce your API costs while maintaining the semantic fidelity your models need.
Getting Started
- Install: npm install brevit (or pip/NuGet)
- Try the playground: javianpicardo.com/brevit
- Integrate: Add brevity() to your LLM pipeline
- Monitor: Track token savings and output quality
- Optimize: Use optimize() with intent for domain-specific needs
Resources
- 📦 npm Package: npmjs.com/package/brevit
- 🐍 PyPI Package: pypi.org/project/brevit
- 📘 NuGet Package: nuget.org/packages/Brevit
- 🎮 Playground: javianpicardo.com/brevit
Have you tried Brevit? Share your token savings and use cases in the comments below!
Start saving on your LLM costs today
