
Compressa.ai

Automotive

Novel models that work within a vehicle's resource constraints

Robotics

Faster operations based on computer vision decisions

Aerospace

On-board data processing that mitigates throughput restrictions

Drones

Autonomous on-board decisions for a wide range of applications

Your business will benefit:

💰 Savings

Lower cloud OPEX and a smaller hardware carbon footprint

⛰️ Opportunities

On-device AI: new products and better quality for existing ones

💯 User Satisfaction

Better UX for AI products through higher speed and quality

💹 Faster Growth

Models run on-device, and deployment to production is fast

Lower Latency: faster processing per object means faster results.
Speed-up of x1.5 to x20
Higher Throughput: more data can be processed with fixed computational resources.
50-300% more data per computational unit
Energy Efficiency: lower energy consumption on the device during model computation.
Energy consumption reduced by 10-30%
Lighter Models: less memory is needed to store the model on the device.
Target memory footprint is just 5-25% of the original
Cost Optimization: maximizing the overall effect for a lower price per processed object.
Cost reduction of 10-50%
Low-bit Device Transfer: the ability to run the model on low-bit processors.
Model quality drops by less than 5% in low-bit mode
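As a rough illustration of where a memory target like this can come from, the sketch below estimates weight storage at different bit widths. It assumes that only the bit width of the weights changes and ignores scales, metadata and activations; `NUM_PARAMS` and both function names are hypothetical.

```python
def model_size_bytes(num_params, bits_per_weight):
    """Storage needed for the weights alone, in bytes."""
    return num_params * bits_per_weight / 8


def relative_size(bits_after, bits_before=32):
    """Fraction of the original weight storage left after lowering the bit width."""
    return bits_after / bits_before


NUM_PARAMS = 10_000_000  # hypothetical model size, not a figure from this page

for bits in (32, 16, 8, 4, 2):
    size_mb = model_size_bytes(NUM_PARAMS, bits) / 1e6
    print(f"{bits:>2}-bit: {size_mb:6.2f} MB ({relative_size(bits):.2%} of float32)")
```

Under these assumptions, int8, int4 and int2 weights take 25%, 12.5% and 6.25% of the float32 size. Real footprints also depend on pruning, architecture changes and storage overhead.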


How does it work?

State-of-the-art compression methods help optimize AI models for deployment on your device

NAS: Search for neural network architectures

We study methods for automatically searching and iterating over neural network architectures within a given class to find those that satisfy the constraints defined by the task and solve the target task with the best quality.
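A minimal sketch of constrained architecture search, assuming a tiny search space of MLPs described by depth and width and a parameter budget as the only constraint. The `proxy_quality` score is a hypothetical placeholder; a real search would train and evaluate each candidate or predict its quality.

```python
import math
from itertools import product


def mlp_param_count(depth, width, in_dim=16, out_dim=4):
    """Weights plus biases of an MLP with `depth` hidden layers of size `width`."""
    dims = [in_dim] + [width] * depth + [out_dim]
    return sum(a * b + b for a, b in zip(dims, dims[1:]))


def proxy_quality(depth, width):
    """Hypothetical stand-in for model quality (higher is better)."""
    return depth * math.log2(width)


def search(depths, widths, max_params):
    """Return the best-scoring (depth, width) that fits the parameter budget."""
    feasible = [
        (d, w)
        for d, w in product(depths, widths)
        if mlp_param_count(d, w) <= max_params
    ]
    if not feasible:
        raise ValueError("no architecture satisfies the constraint")
    return max(feasible, key=lambda arch: proxy_quality(*arch))


best = search(depths=(1, 2, 3), widths=(8, 16, 32), max_params=1000)
print(best, mlp_param_count(*best))
```

Exhaustive enumeration only works for very small search spaces; it is used here to keep the example short.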

Pruning

We develop methods for pruning model weights and connections to reduce the resources the model consumes.
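One common baseline for this is magnitude pruning, which zeroes the weights with the smallest absolute values. The sketch below is illustrative only and is not necessarily the method used by the team; the function name and the sample weights are assumptions.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    return [0.0 if i in pruned_idx else w for i, w in enumerate(weights)]


weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]  # hypothetical layer weights
pruned = magnitude_prune(weights, sparsity=0.5)
print(pruned)
```

In practice, pruning is usually followed by fine-tuning so the remaining weights can compensate for the removed ones.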

Distillation

We explore ways to train lightweight models on the outputs of their heavier counterparts without losing quality on the target task.
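A minimal sketch of a typical distillation objective, assuming a classification setting: the student is trained to match the teacher's temperature-softened output distribution using KL divergence. The temperature value and function names are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


teacher = [1.0, 2.0, 3.0]   # hypothetical teacher logits
student = [3.0, 2.0, 1.0]   # hypothetical student logits
print(distillation_loss(student, teacher))
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge. It is often combined with the ordinary task loss on labelled data.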

Quantization

We reduce the bit width of the operations and weights of neural network models so that they can run on low-bit processors and compute faster.
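To make the idea concrete, here is a minimal sketch of symmetric uniform quantization with a single per-tensor scale. This is one simple scheme among many; the function names and sample weights are assumptions.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers in [-qmax, qmax] using one shared scale."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]  # hypothetical float32 weights
q, scale = quantize_symmetric(weights, num_bits=8)
restored = dequantize(q, scale)
print(q, scale, restored)
```

Each restored value differs from the original by at most half a quantization step, which is why fewer bits (a larger step) cost more accuracy.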

Evaluation of the potential quality of the model

We create methods for predicting the expected quality of the model on specific samples to automate the selection of the best candidates.

Effective methods of training models

We apply algorithms for automated initialization and optimization, and adapt the training approach, to accelerate convergence to the best model configuration.

Before & After

Core team focus
Before: Team focus is spent on compression
After: Team focus is spent only on product creation

Compression & transfer tasks
Before: Require deep expertise
After: Can be done in a few clicks

Compression rates
Before: x1.5 to x2.0
After: x2.0 to x10.0

Quality decrease
Before: 10-30% drop
After: 1-8% drop


Use-Cases & Portfolio

Speech Recognition & Noise Reduction

Voice Assistants, Call Center Automation, Voice Interfaces

Pipeline optimization for speech recognition models, which are mostly based on the Transformer architecture

Object detection / segmentation / recognition

Drones, Automotive, Smart City, Biometrics, Medicine

Many of these models are based on CNN image-to-image architectures

Large NLP models

Voice Assistants, RPA, Chatbots

Large NLP models such as BERT are based on the Transformer architecture

Signal processing

Predictive Maintenance, Biomed, Navigation

Such models are mostly based on RNN and CNN architectures, which can be compressed effectively

🔊 Compression of Noise Reduction NN

Tech Company

Challenge: The client needed to optimize a noise reduction model using int8 and float16 quantization while keeping the drop in SDR noise reduction quality to a minimum. The physical size of the model and the value of BOPS (basic operations per second) were chosen as the target performance metrics.

Solution: Quantization (post-training, quantization-aware training, learned step size) with different initialization strategies (min-max, 99th percentile), knowledge distillation (KL loss, STFT loss, blockwise, etc.) and their combinations allowed us to beat the target metrics.

Result:
Results of float32→int8 quantization
✔ Model size reduction: x25-x60
✔ BOPS reduction: x150-x250
✔ SDR drop, dB: -0.2 - 1.6
Results of float32→float16 quantization
✔ Model size reduction: x10-x35
✔ BOPS reduction: ~x30-x80
✔ SDR drop, dB: -0.1 - 1.4
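The two initialization strategies named in the solution, min-max and 99th percentile, differ in how they choose the clipping range of the quantizer. The sketch below shows the difference on hypothetical activations containing one outlier. It uses a nearest-rank percentile and is an illustration, not the project's actual code.

```python
import math


def minmax_clip(values):
    """Min-max initialization: clip at the largest observed magnitude."""
    return max(abs(v) for v in values)


def percentile_clip(values, percentile=99.0):
    """Percentile initialization: clip at the given nearest-rank percentile."""
    mags = sorted(abs(v) for v in values)
    rank = max(1, math.ceil(percentile * len(mags) / 100))
    return mags[rank - 1]


def scale_for(clip, num_bits=8):
    """Quantization step for a symmetric signed range of `num_bits`."""
    return clip / (2 ** (num_bits - 1) - 1)


# Hypothetical activations: 99 small values plus a single large outlier.
activations = [i / 100 for i in range(-49, 50)] + [50.0]

clip_minmax = minmax_clip(activations)
clip_p99 = percentile_clip(activations, 99.0)
print(clip_minmax, scale_for(clip_minmax))
print(clip_p99, scale_for(clip_p99))
```

With min-max, the outlier stretches the range and leaves a coarse step for the bulk of the values. The percentile strategy clips the outlier and keeps a much finer step for everything else.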

🎙️ Low-bit Quantization of ASR NN

Tech Company

Challenge: The client needed post-training quantization of Transformer models. The goal was to reduce RAM and ROM consumption and inference time while minimizing the increase in Word Error Rate. In addition, low-bit quantization (int4 and int2) was required for the Fairseq framework.

Solution: We developed quantization schemes that perform calculations at low bit widths while maintaining quality. The methods used were LSQ and APoT quantization, a residual quantization approach, and distillation with KL divergence as the loss function.

Result:
We chose a Fairseq model as the quantization target.
For the int4 model:
✔ Model size in MB reduced by >30%
✔ Word Error Rate increased by <0.5%
For the int2 model:
✔ Word Error Rate increased by <4%
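For readers unfamiliar with the metric, Word Error Rate is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal implementation; the transcripts are hypothetical and not taken from the project.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1] / len(ref)


reference = "turn on the kitchen light"       # hypothetical ground truth
full_precision_output = "turn on the kitchen light"
quantized_output = "turn on kitchen lights"   # hypothetical degraded output

wer_fp = word_error_rate(reference, full_precision_output)
wer_q = word_error_rate(reference, quantized_output)
print(wer_fp, wer_q)
```

Comparing the metric before and after quantization on the same test set gives the kind of WER increase reported above.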

Quality

For complex architectures, our methods deliver better quality than the ready-made PyTorch or TensorFlow methods

Measurability

Our results are backed by fair comparisons of the resources consumed

Flexibility

We adapt our methods to the customer's tasks and provide both the solution architecture and the research approach

Guarantees

The compression team's results are confirmed by a track record of successful projects

Our Team

Core team of Compressa.ai

Alex Goncharov
CEO, Founder
Specialisation: DeepTech Start-ups

Konstantin Chernyak
Tech Lead
Specialisation: Pruning & Distillation

Alexander Prutko
Tech Lead
Specialisation: Quantization

Yuri Sverdlov
Product Manager
Specialisation: Management

Scientific advisors of Compressa.ai

Konstantin Vorontsov
Prof, MSU
Specialisation: NLP

Vadim Strijov
Prof, Grenoble
Specialisation: Sensors

Mikhail Burtsev
Prof, AIRI
Specialisation: Deep Learning

Ilya Zharikov
Consult
Specialisation: Compression

