# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2022, Microsoft
# This file is distributed under the same license as the NNI package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2022.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: NNI \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2022-04-13 03:14+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.9.1\n"

#: ../../source/compression/overview.rst:2
msgid "Overview of NNI Model Compression"
msgstr ""

#: ../../source/compression/overview.rst:4
msgid ""
"Deep neural networks (DNNs) have achieved great success in many tasks "
"like computer vision, natural language processing, and speech "
"processing. However, typical neural networks are both computationally "
"expensive and energy-intensive, which makes them difficult to deploy on "
"devices with low computation resources or with strict latency "
"requirements. Therefore, a natural thought is to perform model "
"compression to reduce model size and accelerate model training/inference "
"without losing performance significantly. Model compression techniques "
"can be divided into two categories: pruning and quantization. Pruning "
"methods explore the redundancy in the model weights and try to "
"remove/prune the redundant and uncritical weights. Quantization refers "
"to compressing models by reducing the number of bits required to "
"represent weights or activations. We further elaborate on the two "
"methods, pruning and quantization, in the following chapters. Besides, "
"the figure below visualizes the difference between these two methods."
msgstr ""

#: ../../source/compression/overview.rst:19
msgid ""
"NNI provides an easy-to-use toolkit to help users design and use model "
"pruning and quantization algorithms. To compress a model, users only "
"need to add several lines to their code. Many popular model compression "
"algorithms are built into NNI. Users can also easily customize new "
"compression algorithms using NNI’s interface."
msgstr ""

#: ../../source/compression/overview.rst:24
msgid "There are several core features supported by NNI model compression:"
msgstr ""

#: ../../source/compression/overview.rst:26
msgid "Support many popular pruning and quantization algorithms."
msgstr ""

#: ../../source/compression/overview.rst:27
msgid ""
"Automate the model pruning and quantization process with "
"state-of-the-art strategies and NNI's auto-tuning power."
msgstr ""

#: ../../source/compression/overview.rst:28
msgid ""
"Speed up a compressed model to reduce its inference latency and model "
"size."
msgstr ""

#: ../../source/compression/overview.rst:29
msgid ""
"Provide friendly and easy-to-use compression utilities for users to dive "
"into the compression process and results."
msgstr ""

#: ../../source/compression/overview.rst:30
msgid "Concise interface for users to customize their own compression algorithms."
msgstr ""

#: ../../source/compression/overview.rst:34
msgid "Compression Pipeline"
msgstr ""

#: ../../source/compression/overview.rst:42
msgid ""
"The overall compression pipeline in NNI is shown above. To compress a "
"pretrained model, pruning and quantization can be used alone or in "
"combination. If users want to apply both, a sequential mode is "
"recommended as common practice."
msgstr "" #: ../../source/compression/overview.rst:46 msgid "" "Note that NNI pruners or quantizers are not meant to physically compact " "the model but for simulating the compression effect. Whereas NNI speedup " "tool can truly compress model by changing the network architecture and " "therefore reduce latency. To obtain a truly compact model, users should " "conduct :doc:`pruning speedup <../tutorials/pruning_speedup>` or " ":doc:`quantizaiton speedup <../tutorials/quantization_speedup>`. The " "interface and APIs are unified for both PyTorch and TensorFlow. Currently" " only PyTorch version has been supported, and TensorFlow version will be " "supported in future." msgstr "" #: ../../source/compression/overview.rst:52 msgid "Model Speedup" msgstr "" #: ../../source/compression/overview.rst:54 msgid "" "The final goal of model compression is to reduce inference latency and " "model size. However, existing model compression algorithms mainly use " "simulation to check the performance (e.g., accuracy) of compressed model." " For example, using masks for pruning algorithms, and storing quantized " "values still in float32 for quantization algorithms. Given the output " "masks and quantization bits produced by those algorithms, NNI can really " "speedup the model." msgstr "" #: ../../source/compression/overview.rst:59 msgid "The following figure shows how NNI prunes and speeds up your models." msgstr "" #: ../../source/compression/overview.rst:67 msgid "" "The detailed tutorial of Speedup Model with Mask can be found :doc:`here " "<../tutorials/pruning_speedup>`. The detailed tutorial of Speedup Model " "with Calibration Config can be found :doc:`here " "<../tutorials/quantization_speedup>`." msgstr "" #: ../../source/compression/overview.rst:72 msgid "" "NNI's model pruning framework has been upgraded to a more powerful " "version (named pruning v2 before nni v2.6). The old version (`named " "pruning before nni v2.6 " "`_) will be " "out of maintenance. If for some reason you have to use the old pruning, " "v2.6 is the last nni version to support old pruning version." msgstr ""