Optimal AI models – for Devices
Savings & Opportunities – for Business
Compressa.ai accelerates the transfer of
high-accuracy Transformer-based AI models
to devices where speed matters:
Novel models that work within vehicle resource constraints
Faster decision making based on computer vision
On-board data processing that mitigates throughput restrictions
Autonomous on-board decisions for a wide range of applications
Your Business will benefit:
💰 Savings
Reduced cloud OPEX and a smaller hardware carbon footprint
⛰️ Opportunities
On-device: new AI products and the best quality for existing ones
💯 User Satisfaction
Better UX for AI products through their speed and quality
💹 Faster Growth
Models run on-device, and production deployment is fast
Your Device will get optimal AI models:
Lower Latency: faster processing per object means faster results.
Speed-ups of x1.5 – x20
Higher Throughput: more data processed with fixed computational resources.
50 – 300% more data per computational unit
Energy Efficiency: lower energy consumption on devices during model computation.
10 – 30% decrease in energy consumption
Lighter Models: less memory is needed to store the model on the device.
The compressed model takes just 5 – 25% of the initial memory
Cost Optimization: maximizing the overall effect for a lower price per processed object.
10 – 50% cost reduction
Low-bit Device Transfer: the ability to run the model on low-bit processors.
Model quality drop in low-bit mode is under 5%
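As a rough illustration of why low-bit transfer can preserve quality, here is a minimal NumPy sketch (not Compressa's method) of symmetric int8 quantization and the small reconstruction error it introduces:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor int8 quantization: map floats to [-127, 127]."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float values."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)  # stand-in for a weight tensor
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
rel_err = np.abs(w - w_hat).mean() / np.abs(w).mean()
print(f"mean relative error: {rel_err:.4f}")  # small: int8 keeps most of the signal
```

The tensor shrinks 4x (8 bits instead of 32) while the per-weight rounding error stays bounded by half a quantization step, which is why quality drops remain in the low single digits for many models.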
How it works
State-of-the-art compression methods optimize AI models for transfer to your device

NAS: Search for neural network architectures
We study methods for automatically searching and iterating over optimal neural network architectures that belong to a given class, satisfy the constraints defined by the task, and solve the target task with the best quality.
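A toy sketch of the idea (with a hypothetical search space and quality proxy, not the actual Compressa search): sample architectures from a given class, discard those that violate the device's resource constraint, and keep the best-scoring candidate:

```python
import random

# Hypothetical search space: depth and width of a small network.
SPACE = {"depth": [2, 4, 6, 8], "width": [16, 32, 64, 128]}

def param_count(arch):
    # Crude proxy for model size: layers x width^2.
    return arch["depth"] * arch["width"] ** 2

def proxy_quality(arch):
    # Stand-in for an accuracy estimate: bigger is better, with
    # diminishing returns (a real system would train or predict quality).
    return 1.0 - 1.0 / (1.0 + 0.001 * param_count(arch))

def random_search(budget, max_params, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(budget):
        arch = {k: rng.choice(v) for k, v in SPACE.items()}
        if param_count(arch) > max_params:  # constraint from the target device
            continue
        score = proxy_quality(arch)
        if best is None or score > best[0]:
            best = (score, arch)
    return best

score, arch = random_search(budget=50, max_params=40_000)
print(arch, f"score={score:.3f}")
```

Real NAS replaces the random sampler with smarter strategies (evolutionary search, gradient-based relaxations), but the constrain-then-score loop is the same.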
Pruning: Thinning weights and connections
We create methods for pruning model weights and connections in order to reduce the resources the model consumes.
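A minimal NumPy sketch of the simplest variant of this idea, magnitude pruning (illustrative only, not Compressa's algorithm): zero out the smallest-magnitude weights until a target sparsity is reached:

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    mask = np.abs(w) > threshold
    return w * mask

rng = np.random.default_rng(1)
w = rng.normal(size=(64, 64))
w_pruned = magnitude_prune(w, sparsity=0.9)
print(f"nonzero fraction: {(w_pruned != 0).mean():.2f}")  # ~0.10
```

Sparse weights can then be stored and executed more cheaply; in practice pruning is interleaved with fine-tuning to recover accuracy.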
Distillation: Training light models from heavy ones
We explore ways to train light models to reproduce the outputs of heavy counterparts without loss of quality on the final task.
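The standard recipe here is knowledge distillation: the small "student" model is trained to match the softened output distribution of the large "teacher". A NumPy sketch of the temperature-scaled KL-divergence loss (illustrative, not the production training loop):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T gives softer distributions."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, T)  # soft teacher targets
    q = softmax(student_logits, T)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)

teacher = np.array([[5.0, 1.0, -2.0]])
good_student = np.array([[4.0, 0.5, -1.5]])  # close to the teacher
bad_student = np.array([[-2.0, 1.0, 5.0]])   # far from the teacher
print(distill_loss(good_student, teacher) < distill_loss(bad_student, teacher))  # True
```

During training this loss (often mixed with the ordinary task loss) is minimized by gradient descent on the student's parameters.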
Evaluation of the potential quality of the model
We create methods for predicting the expected quality of a model on specific samples to automate the selection of the best candidates.
Effective model training methods
We apply algorithms for automated initialization and optimization, and adjust the training approach, to accelerate convergence to the best model configuration.
Before & After
Core team focus
Before Compressa.ai: team focus is spent on compression
After Compressa.ai: focus is spent only on product creation
Compression & transfer tasks
Before: require deep in-house expertise
After: done in a few clicks
Compression rates
Before: x1.5 – 2.0
After: x2.0 – 10.0
Quality decrease
Before: 10 – 30% drop
After: 1 – 8% drop
Use-Cases & Portfolio
Speech Recognition & Noise Reduction
Voice Assistants, Call Center Automation, Voice Interfaces
Optimizing the pipeline for speech recognition models, which are mostly based on the Transformer architecture
Object Detection / Segmentation / Recognition
Drones, Automotive, Smart City, Biometry, Medicine
Many of these architectures are CNN-based img2img models
Large NLP Models
Voice Assistants, RPA, Chatbots
Large NLP models such as BERT are based on the Transformer architecture
Signal Processing
Predictive Maintenance, Biomed, Navigation
These models are mostly based on RNN and CNN architectures, which can be compressed effectively
🔊 Compression of Noise Reduction NN
Tech Company
Challenge: The client needed to optimize a noise reduction model using int8 and float16 quantization while ensuring a minimal drop in SDR, the noise reduction quality metric. The physical size of the model and the number of BOPs (bit operations) were chosen as the target performance metrics.

Solution: Quantization (post-training, quantization-aware training, learned step size) with different initialization strategies (min-max, 99th percentile), knowledge distillation (KL loss, STFT loss, blockwise, etc.), and their combinations let us beat the target metrics.
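The two initialization strategies mentioned (min-max vs. 99th percentile) differ in how the quantization range is chosen. A NumPy sketch of the trade-off on an outlier-heavy tensor (illustrative only; the hypothetical data mimics real activation distributions):

```python
import numpy as np

def quantize_with_clip(x, clip):
    """Symmetric int8 quantization with range [-clip, clip]; returns reconstruction."""
    scale = clip / 127.0
    q = np.clip(np.round(x / scale), -127, 127)
    return q * scale

rng = np.random.default_rng(2)
x = rng.normal(size=10_000)
x[:10] *= 50.0  # a few large outliers, as often seen in real activations

minmax_clip = np.abs(x).max()            # min-max: range covers every outlier
pct_clip = np.percentile(np.abs(x), 99)  # 99th percentile: clips rare outliers

err_minmax = np.abs(x - quantize_with_clip(x, minmax_clip)).mean()
err_pct = np.abs(x - quantize_with_clip(x, pct_clip)).mean()
print(err_pct < err_minmax)  # True: percentile init gives finer steps for the bulk
```

Min-max wastes most of the 256 int8 levels on rare outliers; clipping at the 99th percentile sacrifices the outliers but quantizes the bulk of the distribution much more finely, usually reducing overall error.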

Results of float32→int8 quantization
✔ Model size reduction: x25 – x60
✔ BOPs reduction: x150 – x250
✔ SDR drop: −0.2 to 1.6 dB
Results of float32→float16 quantization
✔ Model size reduction: x10 – x35
✔ BOPs reduction: ~x30 – x80
✔ SDR drop: −0.1 to 1.4 dB
🎙️ Low-bit Quantization of ASR NN
Tech Company
Challenge: The client needed post-training quantized Transformers. The target was to reduce RAM and ROM consumption and inference time while minimizing the growth of the Word Error Rate. In addition, low-bit quantization was needed: int4 and int2 for the Fairseq framework.

Solution: We developed quantization schemes that perform calculations at low bit widths while maintaining quality. The methods included LSQ and APoT quantization, a residual quantization approach, and distillation using KL divergence as the loss function.
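LSQ (Learned Step Size Quantization) treats the quantization step size `s` as a trainable parameter: the forward pass rounds values to the low-bit grid, and gradients flow to `s` via a straight-through estimator. A NumPy sketch of the forward pass for int4 with a symmetric grid (illustrative only; training of `s` is omitted):

```python
import numpy as np

def lsq_forward(x, s, bits=4):
    """LSQ-style forward pass: quantize x with step size s onto a low-bit grid."""
    qmax = 2 ** (bits - 1) - 1  # int4 -> symmetric grid [-7, 7]
    q = np.clip(np.round(x / s), -qmax, qmax)
    return q * s  # dequantized value used by downstream layers

rng = np.random.default_rng(3)
x = rng.normal(size=10_000)

# During training, s is updated by gradient descent; here we just compare
# candidate step sizes by reconstruction error to show why learning s matters.
for s in (0.05, 0.2, 1.0):
    err = np.abs(x - lsq_forward(x, s)).mean()
    print(f"s={s}: mean abs error {err:.3f}")
```

Too small a step clips most values, too large a step wastes resolution; LSQ learns the step that balances clipping against rounding error for each layer.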

We chose a Fairseq model with target quantization:
For the int4 model:
✔ model size reduction of >30% (in MB)
✔ Word Error Rate increase of <0.5%
For the int2 model:
✔ Word Error Rate increase of <4%
Our compression library reduces delivery risk and accelerates the delivery of solutions:
our methods outperform the off-the-shelf PyTorch and TensorFlow methods on complex architectures
the results of our methods are validated with fair comparisons of the resources consumed
when adapting the methods to the customer's tasks, we provide the solution architecture and research approach
the compression team's results are confirmed by a successful project track record
Our Team
Core team of Compressa.ai
Alex Goncharov
CEO, Founder
Specialisation: DeepTech Start-ups
Konstantin Chernyak
Tech Lead
Specialisation: Pruning & Distillation
Alexander Prutko
Tech Lead
Specialisation: Quantization
Yuri Sverdlov
Product Manager
Specialisation: Management
Scientific advisors of Compressa.ai
Konstantin Vorontsov
Prof, MSU
Specialisation: NLP
Vadim Strijov
Prof, Grenoble
Specialisation: Sensors
Mikhail Burtsev
Prof, AIRI
Specialisation: Deep Learning
Ilya Zharikov
Specialisation: Compression
Schedule a Demo!