Unverified commit fe3df9d5 authored by Klaus Hipp, committed by GitHub

[Docs] Add language identifiers to fenced code blocks (#28955)

Add language identifiers to code blocks
parent c617f988
......@@ -8,7 +8,7 @@ The model is loaded with the pre-trained weights for the abstractive summarizati
## Setup
-```
+```bash
git clone https://github.com/huggingface/transformers && cd transformers
pip install .
pip install nltk py-rouge
......
......@@ -34,7 +34,7 @@ This is for evaluating fine-tuned DeeBERT models, given a number of different ea
## Citation
Please cite our paper if you find the resource useful:
-```
+```bibtex
@inproceedings{xin-etal-2020-deebert,
title = "{D}ee{BERT}: Dynamic Early Exiting for Accelerating {BERT} Inference",
author = "Xin, Ji and
......
......@@ -183,7 +183,7 @@ Happy distillation!
If you find the resource useful, you should cite the following paper:
-```
+```bibtex
@inproceedings{sanh2019distilbert,
title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},
......
......@@ -84,7 +84,7 @@ python run_clm_igf.py\
If you find the resource useful, please cite the following paper:
-```
+```bibtex
@inproceedings{antonello-etal-2021-selecting,
title = "Selecting Informative Contexts Improves Language Model Fine-tuning",
author = "Antonello, Richard and Beckage, Nicole and Turek, Javier and Huth, Alexander",
......
......@@ -311,7 +311,7 @@ library from source to profit from the most current additions during the communi
Simply run the following steps:
-```
+```bash
$ cd ~/
$ git clone https://github.com/huggingface/datasets.git
$ cd datasets
......@@ -389,13 +389,13 @@ source ~/<your-venv-name>/bin/activate
Next, you should install JAX's TPU version by running the following command:
-```
+```bash
$ pip install requests
```
and then:
-```
+```bash
$ pip install "jax[tpu]>=0.2.16" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
```
......@@ -468,7 +468,7 @@ library from source to profit from the most current additions during the communi
Simply run the following steps:
-```
+```bash
$ cd ~/
$ git clone https://github.com/huggingface/datasets.git
$ cd datasets
......@@ -568,7 +568,7 @@ class ModelPyTorch:
Instantiating an object `model_pytorch` of the class `ModelPyTorch` would actually allocate memory for the model weights and attach them to the attributes `self.key_proj`, `self.value_proj`, `self.query_proj`, and `self.logits_proj`. We could access the weights via:
-```
+```python
key_projection_matrix = model_pytorch.key_proj.weight.data
```
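For context, here is a minimal sketch of what `ModelPyTorch` could look like; the `torch.nn.Linear` layers and the hidden/vocab sizes are assumptions for illustration, since the actual class definition sits just above this excerpt:

```python
import torch.nn as nn


class ModelPyTorch(nn.Module):
    """Illustrative sketch only; layer types and sizes are assumed."""

    def __init__(self, hidden_size=768, vocab_size=50265):
        super().__init__()
        # Instantiating the class allocates memory for each weight matrix
        self.key_proj = nn.Linear(hidden_size, hidden_size)
        self.value_proj = nn.Linear(hidden_size, hidden_size)
        self.query_proj = nn.Linear(hidden_size, hidden_size)
        self.logits_proj = nn.Linear(hidden_size, vocab_size)


model_pytorch = ModelPyTorch()
key_projection_matrix = model_pytorch.key_proj.weight.data
```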
......@@ -1224,25 +1224,25 @@ Sometimes you might be using different libraries or a very specific application
A common use case is loading files from your model repository on the Hub into your Streamlit demo. The `huggingface_hub` library is here to help you!
-```
+```bash
pip install huggingface_hub
```
Here is an example downloading (and caching!) a specific file directly from the Hub:
-```
+```python
from huggingface_hub import hf_hub_download
filepath = hf_hub_download("flax-community/roberta-base-als", "flax_model.msgpack")
```
In many cases you will want to download the full repository. Here is an example downloading all the files from a repo; you can even pin a specific revision (see the sketch below).
-```
+```python
from huggingface_hub import snapshot_download
local_path = snapshot_download("flax-community/roberta-base-als")
```
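As a quick sketch of the revision pinning mentioned above, `snapshot_download` also accepts a `revision` argument (a branch name, tag, or commit hash; `"main"` here is just an illustrative default):

```python
from huggingface_hub import snapshot_download

# Pin the download to a specific revision instead of the latest files
local_path = snapshot_download("flax-community/roberta-base-als", revision="main")
```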
Note that if you're using the 🤗 Transformers library, you can quickly load the model and tokenizer as follows:
-```
+```python
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("REPO_ID")
......
......@@ -42,20 +42,20 @@ Here we call the model `"english-roberta-base-dummy"`, but you can change the mo
You can do this either directly on [huggingface.co](https://huggingface.co/new) (assuming that
you are logged in) or via the command line:
-```
+```bash
huggingface-cli repo create english-roberta-base-dummy
```
Next we clone the model repository to add the tokenizer and model files.
-```
+```bash
git clone https://huggingface.co/<your-username>/english-roberta-base-dummy
```
To ensure that all tensorboard traces will be uploaded correctly, we need to
track them. You can run the following command inside your model repo to do so.
-```
+```bash
cd english-roberta-base-dummy
git lfs track "*tfevents*"
```
......
......@@ -43,17 +43,17 @@ Here we call the model `"clip-roberta-base"`, but you can change the model name
You can do this either directly on [huggingface.co](https://huggingface.co/new) (assuming that
you are logged in) or via the command line:
-```
+```bash
huggingface-cli repo create clip-roberta-base
```
Next we clone the model repository to add the tokenizer and model files.
-```
+```bash
git clone https://huggingface.co/<your-username>/clip-roberta-base
```
To ensure that all tensorboard traces will be uploaded correctly, we need to
track them. You can run the following command inside your model repo to do so.
-```
+```bash
cd clip-roberta-base
git lfs track "*tfevents*"
```
......
......@@ -18,20 +18,20 @@ Here we call the model `"wav2vec2-base-robust"`, but you can change the model na
You can do this either directly on [huggingface.co](https://huggingface.co/new) (assuming that
you are logged in) or via the command line:
-```
+```bash
huggingface-cli repo create wav2vec2-base-robust
```
Next we clone the model repository to add the tokenizer and model files.
-```
+```bash
git clone https://huggingface.co/<your-username>/wav2vec2-base-robust
```
To ensure that all tensorboard traces will be uploaded correctly, we need to
track them. You can run the following command inside your model repo to do so.
-```
+```bash
cd wav2vec2-base-robust
git lfs track "*tfevents*"
```
......
......@@ -6,7 +6,7 @@ Based on the script [`run_mmimdb.py`](https://github.com/huggingface/transformer
### Training on MM-IMDb
-```
+```bash
python run_mmimdb.py \
--data_dir /path/to/mmimdb/dataset/ \
--model_type bert \
......
......@@ -173,7 +173,7 @@ In particular, hardware manufacturers are announcing devices that will speedup i
If you find this resource useful, please consider citing the following paper:
-```
+```bibtex
@article{sanh2020movement,
title={Movement Pruning: Adaptive Sparsity by Fine-Tuning},
author={Victor Sanh and Thomas Wolf and Alexander M. Rush},
......
......@@ -30,17 +30,17 @@ Required:
## Set up the environment with the Dockerfile
Under the `transformers/` directory, build the Docker image:
-```
+```bash
docker build . -f examples/research_projects/quantization-qdqbert/Dockerfile -t bert_quantization:latest
```
Run the Docker container:
-```
+```bash
docker run --gpus all --privileged --rm -it --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 bert_quantization:latest
```
In the container:
-```
+```bash
cd transformers/examples/research_projects/quantization-qdqbert/
```
......@@ -48,7 +48,7 @@ cd transformers/examples/research_projects/quantization-qdqbert/
Calibrate the pretrained model and finetune it with quantization-aware training:
-```
+```bash
python3 run_quant_qa.py \
--model_name_or_path bert-base-uncased \
--dataset_name squad \
......@@ -60,7 +60,7 @@ python3 run_quant_qa.py \
--percentile 99.99
```
-```
+```bash
python3 run_quant_qa.py \
--model_name_or_path calib/bert-base-uncased \
--dataset_name squad \
......@@ -80,7 +80,7 @@ python3 run_quant_qa.py \
To export the QAT model finetuned above:
-```
+```bash
python3 run_quant_qa.py \
--model_name_or_path finetuned_int8/bert-base-uncased \
--output_dir ./ \
......@@ -97,19 +97,19 @@ Recalibrating will affect the accuracy of the model, but the change should be mi
### Benchmark the INT8 QAT ONNX model inference with TensorRT using dummy input
-```
+```bash
trtexec --onnx=model.onnx --explicitBatch --workspace=16384 --int8 --shapes=input_ids:64x128,attention_mask:64x128,token_type_ids:64x128 --verbose
```
### Benchmark the INT8 QAT ONNX model inference with [ONNX Runtime-TRT](https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html) using dummy input
-```
+```bash
python3 ort-infer-benchmark.py
```
### Evaluate the INT8 QAT ONNX model inference with TensorRT
-```
+```bash
python3 evaluate-hf-trt-qa.py \
--onnx_model_path=./model.onnx \
--output_dir ./ \
......@@ -126,7 +126,7 @@ python3 evaluate-hf-trt-qa.py \
Finetune an FP32-precision model with [transformers/examples/pytorch/question-answering/](../../pytorch/question-answering/):
-```
+```bash
python3 ../../pytorch/question-answering/run_qa.py \
--model_name_or_path bert-base-uncased \
--dataset_name squad \
......@@ -145,7 +145,7 @@ python3 ../../pytorch/question-answering/run_qa.py \
### PTQ by calibrating and evaluating the finetuned FP32 model above:
-```
+```bash
python3 run_quant_qa.py \
--model_name_or_path ./finetuned_fp32/bert-base-uncased \
--dataset_name squad \
......@@ -161,7 +161,7 @@ python3 run_quant_qa.py \
### Export the INT8 PTQ model to ONNX
-```
+```bash
python3 run_quant_qa.py \
--model_name_or_path ./calib/bert-base-uncased \
--output_dir ./ \
......@@ -175,7 +175,7 @@ python3 run_quant_qa.py \
### Evaluate the INT8 PTQ ONNX model inference with TensorRT
-```
+```bash
python3 evaluate-hf-trt-qa.py \
--onnx_model_path=./model.onnx \
--output_dir ./ \
......
......@@ -45,7 +45,7 @@ We publish two `base` models which can serve as a starting point for finetuning
The `base` models initialize the question encoder with [`facebook/dpr-question_encoder-single-nq-base`](https://huggingface.co/facebook/dpr-question_encoder-single-nq-base) and the generator with [`facebook/bart-large`](https://huggingface.co/facebook/bart-large).
If you would like to initialize finetuning with a base model using different question encoder and generator architectures, you can build it with a consolidation script, e.g.:
-```
+```bash
python examples/research_projects/rag/consolidate_rag_checkpoint.py \
--model_type rag_sequence \
--generator_name_or_path facebook/bart-large-cnn \
......
......@@ -216,7 +216,7 @@ library from source to profit from the most current additions during the communi
Simply run the following steps:
-```
+```bash
$ cd ~/
$ git clone https://github.com/huggingface/datasets.git
$ cd datasets
......
......@@ -21,7 +21,7 @@ To install locally:
In the root of the repo run:
-```
+```bash
conda create -n vqganclip python=3.8
conda activate vqganclip
git-lfs install
......@@ -30,7 +30,7 @@ pip install -r requirements.txt
```
### Generate new images
-```
+```python
from VQGAN_CLIP import VQGAN_CLIP
vqgan_clip = VQGAN_CLIP()
vqgan_clip.generate("a picture of a smiling woman")
......@@ -41,7 +41,7 @@ To get a test image, run
`git clone https://huggingface.co/datasets/erwann/vqgan-clip-pic test_images`
To edit:
-```
+```python
from VQGAN_CLIP import VQGAN_CLIP
vqgan_clip = VQGAN_CLIP()
......
......@@ -138,20 +138,20 @@ For bigger datasets, we recommend to train Wav2Vec2 locally instead of in a goog
First, you need to clone the `transformers` repo with:
-```
+```bash
$ git clone https://github.com/huggingface/transformers.git
```
Second, head over to the `examples/research_projects/wav2vec2` directory, where the `run_common_voice.py` script is located.
-```
+```bash
$ cd transformers/examples/research_projects/wav2vec2
```
Third, install the required packages. The
packages are listed in the `requirements.txt` file and can be installed with
-```
+```bash
$ pip install -r requirements.txt
```
......@@ -259,7 +259,7 @@ Then and add the following files that fully define a XLSR-Wav2Vec2 checkpoint in
- `pytorch_model.bin`
Having added the above files, you should run the following to push files to your model repository.
-```
+```bash
git add . && git commit -m "Add model files" && git push
```
......
......@@ -134,7 +134,7 @@ which helps with capping GPU memory usage.
To learn how to deploy the DeepSpeed integration, please refer to [this guide](https://huggingface.co/transformers/main/main_classes/deepspeed.html#deepspeed-trainer-integration).
But to get started quickly, all you need to do is install:
-```
+```bash
pip install deepspeed
```
and then use the default configuration files in this directory:
......@@ -148,7 +148,7 @@ Here are examples of how you can use DeepSpeed:
ZeRO-2:
-```
+```bash
PYTHONPATH=../../../src deepspeed --num_gpus 2 \
run_asr.py \
--output_dir=output_dir --num_train_epochs=2 --per_device_train_batch_size=2 \
......@@ -162,7 +162,7 @@ run_asr.py \
```
For ZeRO-2 with more than one GPU you need the following setting (which is already in the example configuration file):
-```
+```json
"zero_optimization": {
...
"find_unused_parameters": true,
......@@ -172,7 +172,7 @@ For ZeRO-2 with more than 1 gpu you need to use (which is already in the example
ZeRO-3:
-```
+```bash
PYTHONPATH=../../../src deepspeed --num_gpus 2 \
run_asr.py \
--output_dir=output_dir --num_train_epochs=2 --per_device_train_batch_size=2 \
......@@ -192,7 +192,7 @@ It is recommended to pre-train Wav2Vec2 with Trainer + Deepspeed (please refer t
Here is an example of how you can use DeepSpeed ZeRO-2 to pretrain a small Wav2Vec2 model:
-```
+```bash
PYTHONPATH=../../../src deepspeed --num_gpus 4 run_pretrain.py \
--output_dir="./wav2vec2-base-libri-100h" \
--num_train_epochs="3" \
......@@ -238,7 +238,7 @@ Output directory will contain 0000.txt and 0001.txt. Each file will have format
#### Run command
-```
+```bash
python alignment.py \
--model_name="arijitx/wav2vec2-xls-r-300m-bengali" \
--wav_dir="./wavs"
......
......@@ -21,7 +21,7 @@ classification performance to the original zero-shot model
A teacher NLI model can be distilled to a more efficient student model by running [`distill_classifier.py`](https://github.com/huggingface/transformers/blob/main/examples/research_projects/zero-shot-distillation/distill_classifier.py):
-```
+```bash
python distill_classifier.py \
--data_file <unlabeled_data.txt> \
--class_names_file <class_names.txt> \
......
......@@ -41,7 +41,7 @@ can also be used by passing the name of the TPU resource with the `--tpu` argume
This script trains a masked language model.
### Example command
-```
+```bash
python run_mlm.py \
--model_name_or_path distilbert-base-cased \
--output_dir output \
......@@ -50,7 +50,7 @@ python run_mlm.py \
```
When using a custom dataset, the validation file can be separately passed as an input argument. Otherwise, a (customizable) split of the training data is used for validation.
-```
+```bash
python run_mlm.py \
--model_name_or_path distilbert-base-cased \
--output_dir output \
......@@ -62,7 +62,7 @@ python run_mlm.py \
This script trains a causal language model.
### Example command
-```
+```bash
python run_clm.py \
--model_name_or_path distilgpt2 \
--output_dir output \
......@@ -72,7 +72,7 @@ python run_clm.py \
When using a custom dataset, the validation file can be separately passed as an input argument. Otherwise, a (customizable) split of the training data is used for validation.
-```
+```bash
python run_clm.py \
--model_name_or_path distilgpt2 \
--output_dir output \
......
......@@ -45,7 +45,7 @@ README, but for more information you can see the 'Input Datasets' section of
[this document](https://www.tensorflow.org/guide/tpu).
### Example command
-```
+```bash
python run_qa.py \
--model_name_or_path distilbert-base-cased \
--output_dir output \
......
......@@ -36,7 +36,7 @@ may not always be what you want, especially if you have more than two fields!
Here is a snippet of a valid input JSON file, though note that your texts can be much longer than these, and are not constrained
(despite the field name) to being single grammatical sentences:
-```
+```json
{"sentence1": "COVID-19 vaccine updates: How is the rollout proceeding?", "label": "news"}
{"sentence1": "Manchester United celebrates Europa League success", "label": "sports"}
```
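As a rough sketch, such a file can also be produced programmatically. The filename `training_data.json` simply matches the example command further down, and the two records are the ones shown above:

```python
import json

# One JSON object per line: a single text field plus its label
records = [
    {"sentence1": "COVID-19 vaccine updates: How is the rollout proceeding?", "label": "news"},
    {"sentence1": "Manchester United celebrates Europa League success", "label": "sports"},
]

with open("training_data.json", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```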
......@@ -69,7 +69,7 @@ README, but for more information you can see the 'Input Datasets' section of
[this document](https://www.tensorflow.org/guide/tpu).
### Example command
-```
+```bash
python run_text_classification.py \
--model_name_or_path distilbert-base-cased \
--train_file training_data.json \
......@@ -101,7 +101,7 @@ README, but for more information you can see the 'Input Datasets' section of
[this document](https://www.tensorflow.org/guide/tpu).
### Example command
-```
+```bash
python run_glue.py \
--model_name_or_path distilbert-base-cased \
--task_name mnli \
......