
Brevit: The Semantic Compression Library That Cuts LLM Costs by 60%

Javian Picardo
Jan 02, 2026

Photo by Shubham Dhage on Unsplash

Every API call to a Large Language Model comes with a price tag directly tied to the number of tokens processed. For developers working with OpenAI, Anthropic, or other LLM providers, reducing token count while maintaining semantic meaning isn’t just an optimization—it’s a business imperative.

Enter Brevit, a lightweight semantic compression library designed specifically for LLM prompts. With an average token reduction of 40-60%, Brevit helps developers slash API costs, improve response latency, and maximize context window utilization—all while preserving the essential meaning and structure of your data.

The Token Economics Problem

Before diving into Brevit, let’s understand the problem it solves.

Photo by Markus Spiske on Unsplash

Modern LLMs charge based on token consumption. Consider these real costs:

  • GPT-4: $0.03 per 1K input tokens
  • Claude 3.5 Sonnet: $0.003 per 1K input tokens

For applications processing thousands of prompts daily, even small reductions compound into substantial savings. A 50% token reduction on 10 million input tokens per day saves 5 million tokens daily: roughly $450 per month at Claude 3.5 Sonnet rates ($0.003/1K), or about $4,500 per month at GPT-4 rates ($0.03/1K).

What is Brevit?

Brevit is a cross-platform semantic compression library available for JavaScript/Node.js, Python, and .NET that intelligently compresses structured data for LLM consumption. Unlike traditional compression algorithms (like gzip or Brotli), Brevit focuses on semantic compression—reducing verbosity while preserving meaning that LLMs can understand.

Key Features

  • 40-60% token reduction on average
  • Auto mode that intelligently selects compression strategies
  • Multi-format support: JSON, YAML, text, tabular data, PDFs, images
  • Zero-config defaults with customization options
  • Cross-platform: npm, pip, NuGet
  • Preserves the semantic meaning that LLMs need

Installation

JavaScript/Node.js

npm install brevit

Python

pip install brevit

.NET

dotnet add package Brevit

Core Functions: brevity() vs optimize()

Brevit provides two main approaches to compression:

1. Auto Mode: brevity()

The brevity() function automatically analyzes your input and selects the optimal compression strategy:

import { brevity } from 'brevit';

const data = {
  "friends": ["ana", "luis", "sam"],
  "items": [
    { "sku": "A-88", "qty": 1, "price": 29.99 },
    { "sku": "T-22", "qty": 2, "price": 39.99 }
  ],
  "customer": { "name": "John Doe", "email": "john@example.com" }
};

const compressed = brevity(data);
console.log(compressed);

Output:

friends:ana,luis,sam
items:sku|qty|price
A-88|1|29.99
T-22|2|39.99
customer.name:John Doe,email:john@example.com

2. Explicit Mode: optimize()

For more control, use optimize() with an intent parameter:

import { optimize } from 'brevit';

const compressed = optimize(data, "summarize customer order");

The intent helps Brevit prioritize what information to preserve most carefully.

Format Comparison: JSON vs YAML vs Brevit

Photo by Luke Chesser on Unsplash

Let’s compare how different formats handle the same data structure:

Example Dataset

{
  "user": {
    "id": 12345,
    "name": "Sarah Johnson",
    "email": "sarah.j@company.com",
    "role": "Senior Developer",
    "department": "Engineering"
  },
  "projects": [
    {
      "id": "PROJ-001",
      "name": "API Redesign",
      "status": "active",
      "budget": 50000,
      "team_size": 5
    },
    {
      "id": "PROJ-002",
      "name": "Mobile App",
      "status": "planning",
      "budget": 75000,
      "team_size": 8
    }
  ]
}

Token Count Comparison

| Format | Token Count | Savings | Pros | Cons |
|--------|-------------|---------|------|------|
| JSON | ~180 tokens | Baseline | Human-readable, widely supported | Verbose, lots of syntax overhead |
| YAML | ~140 tokens | 22% | More concise than JSON, readable | Still has significant whitespace |
| Brevit | ~75 tokens | 58% | Optimized for LLMs, minimal overhead | Requires parsing library |
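
These counts depend on the tokenizer you measure with, so it is worth checking against your own payloads. Below is a minimal sketch using the js-tiktoken package to compare formats; the package choice and the sample data are assumptions, not part of Brevit:

import { getEncoding } from 'js-tiktoken';
import { brevity } from 'brevit';

const enc = getEncoding('cl100k_base'); // tokenizer used by GPT-4-class models
const countTokens = (s) => enc.encode(s).length;

const data = { user: { id: 12345, name: 'Sarah Johnson' } };

console.log('JSON tokens:  ', countTokens(JSON.stringify(data, null, 2)));
console.log('Brevit tokens:', countTokens(brevity(data)));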

Why Brevit Wins for LLMs

JSON Format (180 tokens):

{
  "user": {
    "id": 12345,
    "name": "Sarah Johnson",
    ...
  }
}

YAML Format (140 tokens):

user:
  id: 12345
  name: Sarah Johnson
  ...

Brevit Format (75 tokens):

user.id:12345,name:Sarah Johnson,email:sarah.j@company.com,role:Senior Developer,dept:Engineering
projects:id|name|status|budget|team_size
PROJ-001|API Redesign|active|50000|5
PROJ-002|Mobile App|planning|75000|8

Key Differences:

  1. Syntax Overhead: JSON requires brackets and quotes, and YAML requires indentation; Brevit minimizes structural characters.
  2. Tabular Optimization: For arrays of objects with consistent structure, Brevit uses a header-row format similar to CSV, dramatically reducing repetition (sketched after this list).
  3. Nested Flattening: Brevit intelligently flattens nested structures using dot notation where appropriate.
  4. Semantic Preservation: Unlike simple minification, Brevit maintains the semantic relationships LLMs need to understand context.
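
To make point 2 concrete, here is a simplified sketch of the header-row idea. It illustrates the technique only; it is not Brevit's actual implementation:

// Arrays of same-shaped objects become one header line plus one row per record
function toHeaderRows(records) {
  const headers = Object.keys(records[0]);
  const rows = records.map(r => headers.map(h => r[h]).join('|'));
  return [headers.join('|'), ...rows].join('\n');
}

console.log(toHeaderRows([
  { sku: 'A-88', qty: 1, price: 29.99 },
  { sku: 'T-22', qty: 2, price: 39.99 }
]));
// sku|qty|price
// A-88|1|29.99
// T-22|2|39.99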

Supported Data Types

Brevit handles multiple input formats seamlessly:

1. JSON Objects and Arrays

Perfect for API responses, configuration data, and structured records.

const apiResponse = {
  "status": 200,
  "data": [...],
  "metadata": {...}
};

const compressed = brevity(apiResponse);

2. Plain Text and Structured Text

Compresses verbose text while maintaining readability.

const text = `
  Dear Customer,
  
  Thank you for your inquiry regarding our premium 
  services. We would be delighted to assist you...
`;

const compressed = brevity(text);

3. Tabular Data

Highly efficient for arrays of objects with consistent schemas.

const sales = [
  { date: "2024-01-01", product: "Widget", qty: 100, revenue: 5000 },
  { date: "2024-01-02", product: "Gadget", qty: 75, revenue: 3750 },
  // ... hundreds more rows
];

const compressed = brevity(sales);
// Converts to efficient column-based format
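
Given the header-row format shown earlier, the compressed output would look roughly like this (illustrative, not captured from the library):

date|product|qty|revenue
2024-01-01|Widget|100|5000
2024-01-02|Gadget|75|3750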

4. Nested Structures

Intelligently handles deeply nested objects.

const complex = {
  company: {
    departments: {
      engineering: {
        teams: [...],
        projects: [...]
      }
    }
  }
};

5. Mixed Content Types

Handles documents with multiple data structures.

Real-World Use Cases

Photo by Carlos Muza on Unsplash

1. RAG (Retrieval-Augmented Generation) Systems

When retrieving context from vector databases, you often hit token limits. Brevit lets you include more context:

// Before compression: only 3 retrieved documents fit in the context window
// const contexts = await vectorDB.query(query, { limit: 3 });

// After compression: 7-8 documents fit in the same window
const contexts = await vectorDB.query(query, { limit: 8 });
const compressed = contexts.map(doc => brevity(doc.content));

2. Chatbot Context Management

Maintain longer conversation histories without exceeding limits:

const conversationHistory = [
  { role: "user", content: "..." },
  { role: "assistant", content: "..." },
  // ... 20 more exchanges
];

const compressedHistory = brevity(conversationHistory);

3. API Request Payload Optimization

Reduce costs when sending large datasets to LLMs for analysis:

// Analyzing 500 customer reviews
const reviews = await db.reviews.findMany({ limit: 500 });
const compressed = brevity(reviews);

const analysis = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [{
    role: "user",
    content: `Analyze these reviews: ${compressed}`
  }]
});

4. Data Preprocessing for AI Pipelines

Integrate Brevit into data preprocessing before feeding to LLMs:

const pipeline = [
  loadData,
  cleanData,
  brevity, // <-- Add compression step
  sendToLLM
];

// Run each step in order, feeding each result to the next
const run = async (input) =>
  pipeline.reduce(async (acc, step) => step(await acc), Promise.resolve(input));

5. Cost-Sensitive Production Applications

For high-volume applications where every token matters:

// Processing 10,000 documents daily
const documents = await fetchDailyDocuments();

const processedDocs = await Promise.all(
  documents.map(async doc => {
    const compressed = brevity(doc);
    return await llm.process(compressed);
  })
);

// Monthly savings: $500-5000 depending on model

Brevit vs Other Compression Approaches

| Approach | Token Reduction | Semantic Loss | Setup Complexity | Cost |
|----------|-----------------|---------------|------------------|------|
| Brevit | 40-60% | Minimal | Zero-config | Free |
| LLMLingua | 50-80% | Low-Medium | Requires separate LLM | Additional API costs |
| Manual Summarization | 30-70% | Variable | High (manual work) | Development time |
| JSON Minification | 5-10% | None | Low | Free |
| YAML Conversion | 15-25% | None | Low | Free |
| Custom Rules | 20-50% | Medium | Very High | Development time |

When to Use What

Use Brevit when:

  • You need consistent, predictable compression
  • Semantic meaning must be preserved
  • You want zero-config operation
  • You’re working with structured data (JSON, tables, etc.)

Use LLMLingua when:

  • You can tolerate higher semantic loss
  • You have budget for additional LLM calls
  • You’re working with natural language prompts
  • You need extreme compression (80%+)

Use Manual Summarization when:

  • You have very specific domain requirements
  • Content is highly nuanced
  • You can afford human review

Live Demo and Playground

Want to try Brevit before integrating? Check out the interactive playground at:

👉 https://www.javianpicardo.com/brevit

The playground includes:

  • Real-time compression preview
  • Token count comparison (JSON vs YAML vs Brevit)
  • Side-by-side format visualization
  • OpenAI API testing (via Puter integration)
  • Support for text, JSON, PDF, and image inputs

Try the interactive playground to see compression in action

Advanced Usage: Intent-Based Optimization

Brevit’s optimize() function accepts an optional intent parameter that helps prioritize what to preserve:

// Preserve customer details for support queries
optimize(data, "customer support ticket analysis");

// Focus on financial data
optimize(data, "calculate quarterly revenue");

// Preserve technical details
optimize(data, "debug application error");

The intent guides compression decisions when trade-offs must be made.

Integration Examples

Express.js API Middleware

import express from 'express';
import OpenAI from 'openai';
import { brevity } from 'brevit';

const app = express();
app.use(express.json()); // parse JSON request bodies
const openai = new OpenAI();

app.post('/api/analyze', async (req, res) => {
  const compressed = brevity(req.body);

  const result = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [{
      role: "user",
      content: `Analyze: ${compressed}`
    }]
  });

  res.json(result);
});

Python Flask Application

from flask import Flask, jsonify, request
from brevit import brevity
from openai import OpenAI

app = Flask(__name__)
client = OpenAI()

@app.route('/analyze', methods=['POST'])
def analyze():
    data = request.json
    compressed = brevity(data)

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": f"Analyze: {compressed}"
        }]
    )

    return jsonify({"analysis": response.choices[0].message.content})

.NET Web API

using System.Text.Json;
using Brevit;
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("api/[controller]")]
public class AnalyzeController : ControllerBase
{
    [HttpPost]
    public async Task<IActionResult> Analyze([FromBody] JsonElement data)
    {
        var compressed = Brevit.Brevity(data);

        // openAIClient is an injected chat client (e.g. via constructor injection)
        var response = await openAIClient.GetChatCompletion(
            new ChatMessage("user", $"Analyze: {compressed}")
        );

        return Ok(response);
    }
}

Performance Benchmarks

Based on testing with various data structures:

| Data Type | Avg Token Reduction | Processing Time | Memory Overhead |
|-----------|---------------------|-----------------|-----------------|
| JSON Objects | 45-55% | <1ms | Negligible |
| Tabular Arrays | 55-65% | <2ms | Negligible |
| Nested Structures | 40-50% | <3ms | Negligible |
| Plain Text | 30-40% | <1ms | Negligible |
| Mixed Content | 42-52% | <2ms | Negligible |

Benchmarks performed on typical web application datasets (1-100KB)
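
To reproduce these numbers on your own data, a minimal timing harness like the one below works; the payload shape is invented for illustration:

import { brevity } from 'brevit';

// 500 same-shaped records, roughly mirroring the tabular case above
const payload = Array.from({ length: 500 }, (_, i) => ({
  id: i, product: `Item ${i}`, qty: i % 10, revenue: i * 1.5
}));

const start = performance.now();
const compressed = brevity(payload);

console.log(`Compressed in ${(performance.now() - start).toFixed(2)}ms`);
console.log(`Chars: ${JSON.stringify(payload).length} -> ${compressed.length}`);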

Best Practices

✅ Do’s

  1. Use brevity() for general cases – The auto mode handles most scenarios well
  2. Provide intent for domain-specific data – Helps optimize() make better decisions
  3. Test with your actual data – Compression ratios vary by structure
  4. Monitor LLM output quality – Ensure compression doesn’t hurt response accuracy
  5. Combine with other optimizations – Use alongside prompt engineering best practices

❌ Don’ts

  1. Don’t over-compress critical data – Some information loss is acceptable, but not for critical fields
  2. Don’t skip testing – Validate that LLMs still understand compressed data
  3. Don’t compress already minimal data – Sub-50 token payloads may not benefit (see the guard sketch after this list)
  4. Don’t ignore schema changes – Update compression strategies when data structures evolve
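
For Don't #3, a small guard keeps tiny payloads uncompressed. Both estimateTokens (a rough four-characters-per-token heuristic) and maybeCompress are hypothetical helpers, not part of Brevit's API:

import { brevity } from 'brevit';

// Rough heuristic: ~4 characters per token for English-like text
const estimateTokens = (s) => Math.ceil(s.length / 4);

// Skip compression when the payload is already below the threshold
function maybeCompress(data, minTokens = 50) {
  const raw = typeof data === 'string' ? data : JSON.stringify(data);
  return estimateTokens(raw) < minTokens ? raw : brevity(data);
}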

Cost Savings Calculator

Let’s calculate potential savings for a typical application:

Assumptions:

  • 10,000 API calls per day
  • Average 800 tokens per call
  • Using GPT-4 ($0.03 per 1K tokens)
  • 50% token reduction with Brevit

Without Brevit:

  • Daily tokens: 10,000 × 800 = 8,000,000 tokens
  • Daily cost: $240
  • Monthly cost: $7,200
  • Annual cost: $86,400

With Brevit:

  • Daily tokens: 10,000 × 400 = 4,000,000 tokens
  • Daily cost: $120
  • Monthly cost: $3,600
  • Annual cost: $43,200

Annual Savings: $43,200 💰
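
The same arithmetic as a reusable helper, if you want to plug in your own volumes and prices (a sketch; the inputs below mirror the assumptions above):

// Estimate monthly savings from a given token reduction
function estimateMonthlySavings({ callsPerDay, tokensPerCall, pricePer1K, reduction }) {
  const dailyTokensSaved = callsPerDay * tokensPerCall * reduction;
  return (dailyTokensSaved / 1000) * pricePer1K * 30;
}

console.log(estimateMonthlySavings({
  callsPerDay: 10_000,
  tokensPerCall: 800,
  pricePer1K: 0.03,
  reduction: 0.5
})); // => 3600, i.e. $3,600/month, matching the figures above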

Conclusion

Brevit represents a new category of tools designed specifically for the LLM era—semantic compression libraries that understand both data structure and AI context requirements. With zero-config operation, cross-platform support, and impressive 40-60% token reductions, it’s a must-have tool for any production LLM application.

Whether you’re building chatbots, RAG systems, data analysis pipelines, or any application that sends structured data to LLMs, Brevit can significantly reduce your API costs while maintaining the semantic fidelity your models need.

Getting Started

  1. Install: npm install brevit (or pip/NuGet)
  2. Try the playground: javianpicardo.com/brevit
  3. Integrate: Add brevity() to your LLM pipeline
  4. Monitor: Track token savings and output quality
  5. Optimize: Use optimize() with intent for domain-specific needs

Have you tried Brevit? Share your token savings and use cases in the comments below!

Start saving on your LLM costs today