Optimal AI models – for Devices
Savings & Opportunities – for Business
Compressa.ai accelerates the transfer of high-accuracy Transformer-based AI models to devices where speed matters
Automotive
Novel models that work within vehicle resource constraints
Robotics
Faster operations driven by computer-vision decisions
Aerospace
On-board data processing that mitigates throughput restrictions
Drones
Autonomous on-board decisions for a wide range of applications
Your Business will benefit:
  • 💰 Savings
    Lower cloud OPEX and a smaller hardware carbon footprint
  • ⛰️ Opportunities
    On-device AI enables new products and improves the quality of current ones
  • 💯 User Satisfaction
    Better UX for AI products thanks to their speed and quality
  • 💹 Faster Growth
    Models run on-device, and deployment to production is fast
Your Device will have optimal AI models:
  • Lower Latency: faster processing per object means faster results.
    Speed-ups of x1.5-20
  • Higher Throughput: more data can be processed with fixed computational resources.
    Up to 50-300% more data per computational unit
  • Energy Efficiency: lower on-device energy consumption during model computation.
    Energy consumption reduced by 10-30%
  • Lighter Models: less memory is needed to keep the model on the device.
    The compressed model needs just 5-25% of the original memory
  • Cost Optimization: maximizing the overall effect at a lower price per processed object.
    Costs reduced by 10-50%
  • Low-bit Device Transfer: the ability to run the model on low-bit processors.
    Model quality drops by less than 5% in low-bit mode
How does it work?

State-of-the-art compression methods optimize AI models for transfer to your device.

NAS: Search for neural network architectures
We study methods for automatically searching and iterating over optimal neural network architectures that belong to a given class, satisfy the constraints defined by the task, and solve the target task with the best quality.
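As a toy illustration only (not Compressa.ai's actual search), the sketch below enumerates a tiny grid of MLP depths and widths, discards candidates that exceed an assumed parameter budget, and keeps the best-scoring one; a real search would train each candidate briefly instead of scoring it untrained.

```python
import itertools
import torch
import torch.nn as nn

# Hypothetical constrained architecture search over a tiny grid.
SEARCH_SPACE = {"depth": [2, 3, 4], "width": [64, 128, 256]}
PARAM_BUDGET = 200_000  # assumed device constraint

def build(depth, width, in_dim=32, out_dim=10):
    layers, d = [], in_dim
    for _ in range(depth):
        layers += [nn.Linear(d, width), nn.ReLU()]
        d = width
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)

def n_params(model):
    return sum(p.numel() for p in model.parameters())

def score(model, x, y):
    # placeholder for a short training run plus a validation metric
    with torch.no_grad():
        return -nn.functional.cross_entropy(model(x), y).item()

x, y = torch.randn(256, 32), torch.randint(0, 10, (256,))
best = None
for depth, width in itertools.product(SEARCH_SPACE["depth"], SEARCH_SPACE["width"]):
    model = build(depth, width)
    if n_params(model) > PARAM_BUDGET:
        continue  # violates the device constraint
    s = score(model, x, y)
    if best is None or s > best[0]:
        best = (s, depth, width)
print("best (score, depth, width):", best)
```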
Pruning
We create methods for thinning out model weights and connections in order to reduce the resources the model consumes.
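As a hedged example of the general idea, PyTorch's built-in pruning utilities can zero out a fraction of the smallest weights; the model, layers and sparsity level below are placeholders, not Compressa.ai's pipeline.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Illustrative unstructured magnitude pruning on a toy model.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)  # zero the 50% smallest weights
        prune.remove(module, "weight")  # bake the sparsity into the weight tensor

zeros = sum((m.weight == 0).sum().item()
            for m in model.modules() if isinstance(m, nn.Linear))
print("zeroed weights:", zeros)
```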
Distillation
We explore ways to train lightweight models on the outputs of their heavier counterparts without losing quality on the end task.
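A minimal sketch of response-based knowledge distillation, assuming the common KL-divergence formulation: the student matches the teacher's softened output distribution in addition to the usual task loss. The temperature and weighting are illustrative.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # soft-target term: KL divergence between temperature-softened distributions
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # hard-target term: ordinary cross-entropy on the labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# toy usage with random logits
student = torch.randn(8, 10, requires_grad=True)
teacher = torch.randn(8, 10)
loss = distillation_loss(student, teacher, torch.randint(0, 10, (8,)))
loss.backward()
```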
Quantization
We reduce the bit-width of the operations and weights of neural network models so they can run on low-bit processors and compute faster.
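For a flavour of the simplest variant, the sketch below applies stock PyTorch post-training dynamic quantization, converting Linear layers to int8 kernels; the model is a placeholder and this is not Compressa.ai's own scheme.

```python
import torch
import torch.nn as nn

# Illustrative post-training dynamic quantization of a toy model.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).eval()

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8  # layer types to convert, target dtype
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # same interface, int8 weights under the hood
```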
Evaluation of potential model quality
We create methods for predicting the expected quality of a model on specific samples in order to automate the selection of the best candidates.
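One simple stand-in for this idea (purely illustrative, not the actual predictor): rank compressed candidates by their score on a small probe set and keep only the most promising ones for full evaluation.

```python
import torch
import torch.nn as nn

def probe_accuracy(model, x, y):
    # cheap proxy metric on a small held-out sample
    with torch.no_grad():
        return (model(x).argmax(dim=-1) == y).float().mean().item()

# placeholder candidates and probe data
candidates = {f"cand_{i}": nn.Linear(32, 10) for i in range(5)}
x, y = torch.randn(128, 32), torch.randint(0, 10, (128,))

ranked = sorted(candidates.items(),
                key=lambda kv: probe_accuracy(kv[1], x, y),
                reverse=True)
print("shortlist:", [name for name, _ in ranked[:2]])
```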
Efficient model training methods
We apply automated initialization and optimization algorithms and adapt the training procedure to accelerate convergence to the best model configuration.
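As one standard example from this family (not necessarily what Compressa.ai uses), a one-cycle learning-rate schedule often lets a model reach a good configuration in fewer steps than a fixed rate; the model, data and hyperparameters below are placeholders.

```python
import torch
import torch.nn as nn

model = nn.Linear(32, 10)
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# learning rate ramps up and then anneals over the whole run
sched = torch.optim.lr_scheduler.OneCycleLR(opt, max_lr=0.1,
                                            steps_per_epoch=100, epochs=5)

for _ in range(5 * 100):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(torch.randn(16, 32)),
                                       torch.randint(0, 10, (16,)))
    loss.backward()
    opt.step()
    sched.step()  # stepped once per batch
```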
Before & After
Features                      | Before Compressa.ai             | After Compressa.ai
Core team focus               | Spent on compression            | Spent only on product creation
Compression & transfer tasks  | Require deep in-house expertise | Done in a few clicks
Compression rates             | x 1.5 – 2.0                     | x 2.0 – 10.0
Quality decrease              | 10 – 30 % drop                  | 1 – 8 % drop
Use-Cases & Portfolio
Speech Recognition & Noise Reduction
Voice Assistants, Call Center Automation, Voice Interfaces
We optimize pipelines for speech recognition models, which are mostly based on the Transformer architecture
Object detection / segmentation / recognition
Drones, Automotive, Smart City, Biometrics, Medicine
Most of these models are based on CNN image-to-image architectures
Large NLP models
Voice Assistants, RPA, Chatbots
Large NLP models such as BERT are based on the Transformer architecture
Signal processing
Predictive Maintenance, Biomedicine, Navigation
These models are mostly based on RNN and CNN architectures, which can be compressed effectively
🔊 Compression of Noise Reduction NN
Tech Company
Challenge: The client needed to optimize a noise reduction model using int8 and float16 quantization while ensuring a minimal drop in SDR, the noise reduction quality metric. The physical size of the model and its BOPS (basic operations per second) were chosen as the target performance metrics.

Solution: Quantization (post-training, quantization-aware training, learned step size) with different initialization strategies (min-max, 99th percentile), knowledge distillation (KL loss, STFT loss, block-wise, etc.) and their combinations let us beat the target metrics (see the calibration sketch after the results below).

Result:
Results of float32→int8 quantization
✔ Model size reduction: x25 – x60
✔ BOPS reduction: x150 – x250
✔ SDR drop, dB: -0.2 to 1.6
Results of float32→float16 quantization
✔ Model size reduction: x10 – x35
✔ BOPS reduction: ~x30 – x80
✔ SDR drop, dB: -0.1 to 1.4
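The calibration strategies mentioned in the solution can be illustrated with a small, hedged sketch: an int8 scale computed from the full min-max range versus one computed from the 99th percentile of absolute values, which is more robust to outliers. The tensor and numbers are made up.

```python
import torch

def int8_scale_minmax(t):
    # scale so that the largest absolute value maps to 127
    return t.abs().max().item() / 127.0

def int8_scale_percentile(t, q=0.99):
    # ignore the extreme tail when choosing the scale
    return torch.quantile(t.abs().flatten(), q).item() / 127.0

w = torch.randn(1000) * 0.1
w[0] = 5.0  # a single outlier inflates the min-max range
print("min-max scale:   ", int8_scale_minmax(w))
print("percentile scale:", int8_scale_percentile(w))
```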
🎙️ Low-bit Quantization of ASR NN
Tech Company
Challenge: The client needed post-training quantized Transformers. The goal was to reduce RAM and ROM consumption as well as inference time, while minimizing the growth in Word Error Rate. In addition, low-bit quantization was required: int4 and int2 for the Fairseq framework.

Solution: We developed quantization schemes that perform the calculations at low bit-width while maintaining quality. The methods included LSQ and APoT quantization, a residual quantization approach, and distillation with KL divergence as the loss function (a minimal LSQ sketch follows the results below).

Result:
We chose a Fairseq model with the target quantization:
For the int4 model:
✔ model size (MB) reduced by >30%
✔ Word Error Rate increased by <0.5%
For the int2 model:
✔ Word Error Rate increased by <4%
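For context on one of the methods named above, here is a hedged, minimal sketch of Learned Step Size Quantization (LSQ): the quantization step is a trainable parameter, and rounding is bypassed with a straight-through estimator. The bit-width, initialization and wrapped tensor are illustrative, not the project's actual configuration.

```python
import math
import torch
import torch.nn as nn

class LSQQuantizer(nn.Module):
    def __init__(self, num_bits=4):
        super().__init__()
        self.qn = -(2 ** (num_bits - 1))       # e.g. -8 for int4
        self.qp = 2 ** (num_bits - 1) - 1      # e.g. +7 for int4
        self.step = nn.Parameter(torch.tensor(0.1))  # learned step size

    def forward(self, w):
        # gradient scaling for the step size, as suggested in the LSQ paper
        g = 1.0 / math.sqrt(w.numel() * self.qp)
        step = (self.step - self.step * g).detach() + self.step * g
        q = torch.clamp(w / step, self.qn, self.qp)
        q = (q.round() - q).detach() + q       # straight-through estimator for round
        return q * step                        # dequantized (fake-quant) weights

quant = LSQQuantizer(num_bits=4)
w = torch.randn(256, 256)
w_q = quant(w)
print("unique levels:", torch.unique(w_q).numel())  # at most 2**4 levels
```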
Our compression library reduces the risk of missing the target results and accelerates the delivery of solutions
Quality
Our methods outperform off-the-shelf PyTorch and TensorFlow methods on complex architectures
Measurability
The results of our methods are reported using fair, like-for-like comparisons of the resources consumed
Flexibility
We adapt our methods to the customer's tasks and provide the solution architecture and research approach
Guarantees
The compression team's results are backed by a successful project track record
Our Team
Core team of Compressa.ai
  • Alex Goncharov
    CEO, Founder
    Specialisation: DeepTech Start-ups
  • Konstantin Chernyak
    Tech Lead
    Specialisation: Pruning & Distillation
  • Alexander Prutko
    Tech Lead
    Specialisation: Quantization
  • Yuri Sverdlov
    Product Manager
    Specialisation: Management
Scientific advisors of Compressa.ai
  • Konstantin Vorontsov
    Prof, MSU
    Specialisation: NLP
  • Vadim Strijov
    Prof, Grenoble
    Specialisation: Sensors
  • Mikhail Burtsev
    Prof, AIRI
    Specialisation: Deep Learning
  • Ilya Zharikov
    Consultant
    Specialisation: Compression
Schedule a Demo!