
Brevit: The Semantic Compression Library That Cuts LLM Costs by 60%

Javian Picardo
Jan 02, 2026

Photo by Shubham Dhage on Unsplash

Every API call to a Large Language Model comes with a price tag directly tied to the number of tokens processed. For developers working with OpenAI, Anthropic, or other LLM providers, reducing token count while maintaining semantic meaning isn’t just an optimization—it’s a business imperative.

Enter Brevit, a lightweight semantic compression library designed specifically for LLM prompts. With an average token reduction of 40-60%, Brevit helps developers slash API costs, improve response latency, and maximize context window utilization—all while preserving the essential meaning and structure of your data.

The Token Economics Problem

Before diving into Brevit, let’s understand the problem it solves.

Photo by Markus Spiske on Unsplash

Modern LLMs charge based on token consumption. Consider these real costs:

  • GPT-4: $0.03 per 1K input tokens
  • Claude 3.5 Sonnet: $0.003 per 1K input tokens

For applications processing thousands of prompts daily, even small reductions compound into substantial savings. A 50% token reduction on 10 million input tokens per day saves 5 million tokens daily: roughly $450 per month at Claude 3.5 Sonnet rates ($0.003/1K), or about $4,500 per month at GPT-4 rates ($0.03/1K).

What is Brevit?

Brevit is a cross-platform semantic compression library available for JavaScript/Node.js, Python, and .NET that intelligently compresses structured data for LLM consumption. Unlike traditional compression algorithms (like gzip or Brotli), Brevit focuses on semantic compression—reducing verbosity while preserving meaning that LLMs can understand.

Key Features

  • 40-60% token reduction on average
  • Auto mode that intelligently selects compression strategies
  • Multi-format support: JSON, YAML, text, tabular data, PDFs, images
  • Zero-config defaults with customization options
  • Cross-platform: npm, pip, NuGet
  • Preserves the semantic meaning that LLMs need

Installation

JavaScript/Node.js

npm install brevit

Python

pip install brevit

.NET

dotnet add package Brevit

Core Functions: brevity() vs optimize()

Brevit provides two main approaches to compression:

1. Auto Mode: brevity()

The brevity() function automatically analyzes your input and selects the optimal compression strategy:

import { brevity } from 'brevit';

const data = {
  "friends": ["ana", "luis", "sam"],
  "items": [
    { "sku": "A-88", "qty": 1, "price": 29.99 },
    { "sku": "T-22", "qty": 2, "price": 39.99 }
  ],
  "customer": { "name": "John Doe", "email": "john@example.com" }
};

const compressed = brevity(data);
console.log(compressed);

Output:

friends:ana,luis,sam
items:sku|qty|price
A-88|1|29.99
T-22|2|39.99
customer.name:John Doe,email:john@example.com

2. Explicit Mode: optimize()

For more control, use optimize() with an intent parameter:

import { optimize } from 'brevit';

const compressed = optimize(data, "summarize customer order");

The intent helps Brevit prioritize what information to preserve most carefully.

Format Comparison: JSON vs YAML vs Brevit

Photo by Luke Chesser on Unsplash

Let’s compare how different formats handle the same data structure:

Example Dataset

{
  "user": {
    "id": 12345,
    "name": "Sarah Johnson",
    "email": "sarah.j@company.com",
    "role": "Senior Developer",
    "department": "Engineering"
  },
  "projects": [
    {
      "id": "PROJ-001",
      "name": "API Redesign",
      "status": "active",
      "budget": 50000,
      "team_size": 5
    },
    {
      "id": "PROJ-002",
      "name": "Mobile App",
      "status": "planning",
      "budget": 75000,
      "team_size": 8
    }
  ]
}

Token Count Comparison

| Format | Token Count | Savings | Pros | Cons |
|--------|-------------|---------|------|------|
| JSON | ~180 tokens | Baseline | Human-readable, widely supported | Verbose, lots of syntax overhead |
| YAML | ~140 tokens | 22% | More concise than JSON, readable | Still has significant whitespace |
| Brevit | ~75 tokens | 58% | Optimized for LLMs, minimal overhead | Requires parsing library |
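
These counts depend on the tokenizer you measure with, so it is worth checking against your own payloads. Below is a minimal sketch using the js-tiktoken package to compare formats; the package choice and the sample data are assumptions, not part of Brevit:

import { getEncoding } from 'js-tiktoken';
import { brevity } from 'brevit';

const enc = getEncoding('cl100k_base'); // tokenizer used by GPT-4-class models
const countTokens = (s) => enc.encode(s).length;

const data = { user: { id: 12345, name: 'Sarah Johnson' } };

console.log('JSON tokens:  ', countTokens(JSON.stringify(data, null, 2)));
console.log('Brevit tokens:', countTokens(brevity(data)));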

Why Brevit Wins for LLMs

JSON Format (180 tokens):

{
  "user": {
    "id": 12345,
    "name": "Sarah Johnson",
    ...
  }
}

YAML Format (140 tokens):

user:
  id: 12345
  name: Sarah Johnson
  ...

Brevit Format (75 tokens):

user.id:12345,name:Sarah Johnson,email:sarah.j@company.com,role:Senior Developer,dept:Engineering
projects:id|name|status|budget|team_size
PROJ-001|API Redesign|active|50000|5
PROJ-002|Mobile App|planning|75000|8

Key Differences:

  1. Syntax Overhead: JSON requires brackets and quotes, and YAML requires indentation; Brevit minimizes structural characters.
  2. Tabular Optimization: For arrays of objects with consistent structure, Brevit uses a header-row format similar to CSV, dramatically reducing repetition (sketched after this list).
  3. Nested Flattening: Brevit intelligently flattens nested structures using dot notation where appropriate.
  4. Semantic Preservation: Unlike simple minification, Brevit maintains the semantic relationships LLMs need to understand context.
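
To make point 2 concrete, here is a simplified sketch of the header-row idea. It illustrates the technique only; it is not Brevit's actual implementation:

// Arrays of same-shaped objects become one header line plus one row per record
function toHeaderRows(records) {
  const headers = Object.keys(records[0]);
  const rows = records.map(r => headers.map(h => r[h]).join('|'));
  return [headers.join('|'), ...rows].join('\n');
}

console.log(toHeaderRows([
  { sku: 'A-88', qty: 1, price: 29.99 },
  { sku: 'T-22', qty: 2, price: 39.99 }
]));
// sku|qty|price
// A-88|1|29.99
// T-22|2|39.99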

Supported Data Types

Brevit handles multiple input formats seamlessly:

1. JSON Objects and Arrays

Perfect for API responses, configuration data, and structured records.

const apiResponse = {
  "status": 200,
  "data": [...],
  "metadata": {...}
};

const compressed = brevity(apiResponse);

2. Plain Text and Structured Text

Compresses verbose text while maintaining readability.

const text = `
  Dear Customer,
  
  Thank you for your inquiry regarding our premium 
  services. We would be delighted to assist you...
`;

const compressed = brevity(text);

3. Tabular Data

Highly efficient for arrays of objects with consistent schemas.

const sales = [
  { date: "2024-01-01", product: "Widget", qty: 100, revenue: 5000 },
  { date: "2024-01-02", product: "Gadget", qty: 75, revenue: 3750 },
  // ... hundreds more rows
];

const compressed = brevity(sales);
// Converts to efficient column-based format
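
Given the header-row format shown earlier, the compressed output would look roughly like this (illustrative, not captured from the library):

date|product|qty|revenue
2024-01-01|Widget|100|5000
2024-01-02|Gadget|75|3750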

4. Nested Structures

Intelligently handles deeply nested objects.

const complex = {
  company: {
    departments: {
      engineering: {
        teams: [...],
        projects: [...]
      }
    }
  }
};

5. Mixed Content Types

Handles documents with multiple data structures.

Real-World Use Cases

Photo by Carlos Muza on Unsplash

1. RAG (Retrieval-Augmented Generation) Systems

When retrieving context from vector databases, you often hit token limits. Brevit lets you include more context:

// Before compression: only 3 retrieved documents fit in the context window
// const contexts = await vectorDB.query(query, { limit: 3 });

// After compression: 7-8 documents fit in the same window
const contexts = await vectorDB.query(query, { limit: 8 });
const compressed = contexts.map(doc => brevity(doc.content));

2. Chatbot Context Management

Maintain longer conversation histories without exceeding limits:

const conversationHistory = [
  { role: "user", content: "..." },
  { role: "assistant", content: "..." },
  // ... 20 more exchanges
];

const compressedHistory = brevity(conversationHistory);

3. API Request Payload Optimization

Reduce costs when sending large datasets to LLMs for analysis:

// Analyzing 500 customer reviews
const reviews = await db.reviews.findMany({ limit: 500 });
const compressed = brevity(reviews);

const analysis = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [{
    role: "user",
    content: `Analyze these reviews: ${compressed}`
  }]
});

4. Data Preprocessing for AI Pipelines

Integrate Brevit into data preprocessing before feeding to LLMs:

const pipeline = [
  loadData,
  cleanData,
  brevity, // <-- Add compression step
  sendToLLM
];

// Run each step in order, feeding each result to the next
const run = async (input) =>
  pipeline.reduce(async (acc, step) => step(await acc), Promise.resolve(input));

5. Cost-Sensitive Production Applications

For high-volume applications where every token matters:

// Processing 10,000 documents daily
const documents = await fetchDailyDocuments();

const processedDocs = await Promise.all(
  documents.map(async doc => {
    const compressed = brevity(doc);
    return await llm.process(compressed);
  })
);

// Monthly savings: $500-5000 depending on model

Brevit vs Other Compression Approaches

| Approach | Token Reduction | Semantic Loss | Setup Complexity | Cost |
|----------|-----------------|---------------|------------------|------|
| Brevit | 40-60% | Minimal | Zero-config | Free |
| LLMLingua | 50-80% | Low-Medium | Requires separate LLM | Additional API costs |
| Manual Summarization | 30-70% | Variable | High (manual work) | Development time |
| JSON Minification | 5-10% | None | Low | Free |
| YAML Conversion | 15-25% | None | Low | Free |
| Custom Rules | 20-50% | Medium | Very High | Development time |

When to Use What

Use Brevit when:

  • You need consistent, predictable compression
  • Semantic meaning must be preserved
  • You want zero-config operation
  • You’re working with structured data (JSON, tables, etc.)

Use LLMLingua when:

  • You can tolerate higher semantic loss
  • You have budget for additional LLM calls
  • You’re working with natural language prompts
  • You need extreme compression (80%+)

Use Manual Summarization when:

  • You have very specific domain requirements
  • Content is highly nuanced
  • You can afford human review

Live Demo and Playground

Want to try Brevit before integrating? Check out the interactive playground at:

👉 https://www.javianpicardo.com/brevit

The playground includes:

  • Real-time compression preview
  • Token count comparison (JSON vs YAML vs Brevit)
  • Side-by-side format visualization
  • OpenAI API testing (via Puter integration)
  • Support for text, JSON, PDF, and image inputs

Try the interactive playground to see compression in action

Advanced Usage: Intent-Based Optimization

Brevit’s optimize() function accepts an optional intent parameter that helps prioritize what to preserve:

// Preserve customer details for support queries
optimize(data, "customer support ticket analysis");

// Focus on financial data
optimize(data, "calculate quarterly revenue");

// Preserve technical details
optimize(data, "debug application error");

The intent guides compression decisions when trade-offs must be made.

Integration Examples

Express.js API Middleware

import express from 'express';
import OpenAI from 'openai';
import { brevity } from 'brevit';

const app = express();
app.use(express.json()); // parse JSON request bodies
const openai = new OpenAI();

app.post('/api/analyze', async (req, res) => {
  const compressed = brevity(req.body);

  const result = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [{
      role: "user",
      content: `Analyze: ${compressed}`
    }]
  });

  res.json(result);
});

Python Flask Application

from flask import Flask, jsonify, request
from brevit import brevity
from openai import OpenAI

app = Flask(__name__)
client = OpenAI()

@app.route('/analyze', methods=['POST'])
def analyze():
    data = request.json
    compressed = brevity(data)

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": f"Analyze: {compressed}"
        }]
    )

    return jsonify({"analysis": response.choices[0].message.content})

.NET Web API

using System.Text.Json;
using Brevit;
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("api/[controller]")]
public class AnalyzeController : ControllerBase
{
    [HttpPost]
    public async Task<IActionResult> Analyze([FromBody] JsonElement data)
    {
        var compressed = Brevit.Brevity(data);

        // openAIClient is an injected chat client (e.g. via constructor injection)
        var response = await openAIClient.GetChatCompletion(
            new ChatMessage("user", $"Analyze: {compressed}")
        );

        return Ok(response);
    }
}

Performance Benchmarks

Based on testing with various data structures:

| Data Type | Avg Token Reduction | Processing Time | Memory Overhead |
|-----------|---------------------|-----------------|-----------------|
| JSON Objects | 45-55% | <1ms | Negligible |
| Tabular Arrays | 55-65% | <2ms | Negligible |
| Nested Structures | 40-50% | <3ms | Negligible |
| Plain Text | 30-40% | <1ms | Negligible |
| Mixed Content | 42-52% | <2ms | Negligible |

Benchmarks performed on typical web application datasets (1-100KB)
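
To reproduce these numbers on your own data, a minimal timing harness like the one below works; the payload shape is invented for illustration:

import { brevity } from 'brevit';

// 500 same-shaped records, roughly mirroring the tabular case above
const payload = Array.from({ length: 500 }, (_, i) => ({
  id: i, product: `Item ${i}`, qty: i % 10, revenue: i * 1.5
}));

const start = performance.now();
const compressed = brevity(payload);

console.log(`Compressed in ${(performance.now() - start).toFixed(2)}ms`);
console.log(`Chars: ${JSON.stringify(payload).length} -> ${compressed.length}`);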

Best Practices

✅ Do’s

  1. Use brevity() for general cases – The auto mode handles most scenarios well
  2. Provide intent for domain-specific data – Helps optimize() make better decisions
  3. Test with your actual data – Compression ratios vary by structure
  4. Monitor LLM output quality – Ensure compression doesn’t hurt response accuracy
  5. Combine with other optimizations – Use alongside prompt engineering best practices

❌ Don’ts

  1. Don’t over-compress critical data – Some information loss is acceptable, but not for critical fields
  2. Don’t skip testing – Validate that LLMs still understand compressed data
  3. Don’t compress already minimal data – Sub-50 token payloads may not benefit (see the guard sketch after this list)
  4. Don’t ignore schema changes – Update compression strategies when data structures evolve
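
For Don't #3, a small guard keeps tiny payloads uncompressed. Both estimateTokens (a rough four-characters-per-token heuristic) and maybeCompress are hypothetical helpers, not part of Brevit's API:

import { brevity } from 'brevit';

// Rough heuristic: ~4 characters per token for English-like text
const estimateTokens = (s) => Math.ceil(s.length / 4);

// Skip compression when the payload is already below the threshold
function maybeCompress(data, minTokens = 50) {
  const raw = typeof data === 'string' ? data : JSON.stringify(data);
  return estimateTokens(raw) < minTokens ? raw : brevity(data);
}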

Cost Savings Calculator

Let’s calculate potential savings for a typical application:

Assumptions:

  • 10,000 API calls per day
  • Average 800 tokens per call
  • Using GPT-4 ($0.03 per 1K tokens)
  • 50% token reduction with Brevit

Without Brevit:

  • Daily tokens: 10,000 × 800 = 8,000,000 tokens
  • Daily cost: $240
  • Monthly cost: $7,200
  • Annual cost: $86,400

With Brevit:

  • Daily tokens: 10,000 × 400 = 4,000,000 tokens
  • Daily cost: $120
  • Monthly cost: $3,600
  • Annual cost: $43,200

Annual Savings: $43,200 💰
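
The same arithmetic as a reusable helper, if you want to plug in your own volumes and prices (a sketch; the inputs below mirror the assumptions above):

// Estimate monthly savings from a given token reduction
function estimateMonthlySavings({ callsPerDay, tokensPerCall, pricePer1K, reduction }) {
  const dailyTokensSaved = callsPerDay * tokensPerCall * reduction;
  return (dailyTokensSaved / 1000) * pricePer1K * 30;
}

console.log(estimateMonthlySavings({
  callsPerDay: 10_000,
  tokensPerCall: 800,
  pricePer1K: 0.03,
  reduction: 0.5
})); // => 3600, i.e. $3,600/month, matching the figures above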

Conclusion

Brevit represents a new category of tools designed specifically for the LLM era—semantic compression libraries that understand both data structure and AI context requirements. With zero-config operation, cross-platform support, and impressive 40-60% token reductions, it’s a must-have tool for any production LLM application.

Whether you’re building chatbots, RAG systems, data analysis pipelines, or any application that sends structured data to LLMs, Brevit can significantly reduce your API costs while maintaining the semantic fidelity your models need.

Getting Started

  1. Install: npm install brevit (or pip/NuGet)
  2. Try the playground: javianpicardo.com/brevit
  3. Integrate: Add brevity() to your LLM pipeline
  4. Monitor: Track token savings and output quality
  5. Optimize: Use optimize() with intent for domain-specific needs

Have you tried Brevit? Share your token savings and use cases in the comments below!

Start saving on your LLM costs today