Unverified commit 8dcfaea0 authored by Qbiwan, committed by GitHub

Update run_xnli.py to use Datasets library (#9829)

* remove xnli_compute_metrics; add load_dataset, load_metric, set_seed, metric.compute

* fix

* fix

* fix

* push

* fix

* everything works

* fix init

* fix

* special treatment for sepconv1d

* style

* 🙏🏽

* add doc and cleanup


* fix doc

* fix doc again

* fix doc again

* Apply suggestions from code review

* make style

* Proposal that should work

* Remove needless code

* Fix test

* Apply suggestions from code review

* remove xnli_compute_metrics; add load_dataset, load_metric, set_seed, metric.compute

* amend README

* removed data_args.task_name and replaced it with task_name = "xnli"; use the `split` argument to load the train and validation datasets separately; remove __post_init__; remove the --task_name flag from the README.

* removed the task_to_keys dict and use the string "xnli" instead of the task_name variable; changed preprocess_function to use examples["premise"] and examples["hypothesis"] directly, removing sentence1_key and sentence2_key; changed compute_metrics to handle only the accuracy metric; added a condition for when train_language is None in datasets.load_dataset() (sketched below)

* removed `torch.distributed.barrier()` and `import torch`, as `from_pretrained` already handles this; amend README
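
Taken together, the messages above amount to the following flow. A minimal sketch, assuming illustrative values for `language`, `train_language`, and the checkpoint name (the real script reads these from its command-line arguments):

```python
# Sketch of the Datasets-based flow described in the commit messages above;
# the hard-coded values here are illustrative, not the exact script.
import numpy as np
from datasets import load_dataset, load_metric
from transformers import AutoTokenizer, set_seed

set_seed(42)

# Train and validation splits are loaded separately via the `split` argument;
# when no train_language is given, the evaluation language is used for both.
language = "de"
train_language = None
train_dataset = load_dataset(
    "xnli", language if train_language is None else train_language, split="train"
)
eval_dataset = load_dataset("xnli", language, split="validation")

# `from_pretrained` handles downloading and caching itself, so no
# torch.distributed.barrier() is needed around it.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

# XNLI columns are read directly; no task_to_keys / sentence1_key mapping.
def preprocess_function(examples):
    return tokenizer(
        examples["premise"],
        examples["hypothesis"],
        padding="max_length",
        truncation=True,
    )

train_dataset = train_dataset.map(preprocess_function, batched=True)
eval_dataset = eval_dataset.map(preprocess_function, batched=True)

# xnli_compute_metrics is gone; accuracy comes from a Datasets metric instead.
metric = load_metric("xnli")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return metric.compute(predictions=predictions, references=labels)
```
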
parent 77b86284
@@ -143,23 +143,15 @@ Based on the script [`run_xnli.py`](https://github.com/huggingface/transformers/
 
 #### Fine-tuning on XNLI
 
-This example code fine-tunes mBERT (multi-lingual BERT) on the XNLI dataset. It runs in 106 mins
-on a single tesla V100 16GB. The data for XNLI can be downloaded with the following links and should be both saved (and un-zipped) in a
-`$XNLI_DIR` directory.
-
-* [XNLI 1.0](https://cims.nyu.edu/~sbowman/xnli/XNLI-1.0.zip)
-* [XNLI-MT 1.0](https://dl.fbaipublicfiles.com/XNLI/XNLI-MT-1.0.zip)
+This example code fine-tunes mBERT (multi-lingual BERT) on the XNLI dataset. It runs in 106 mins on a single tesla V100 16GB.
 
 ```bash
-export XNLI_DIR=/path/to/XNLI
-
 python run_xnli.py \
   --model_name_or_path bert-base-multilingual-cased \
   --language de \
   --train_language en \
   --do_train \
   --do_eval \
-  --data_dir $XNLI_DIR \
   --per_device_train_batch_size 32 \
   --learning_rate 5e-5 \
   --num_train_epochs 2.0 \
...
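
With this change the manual download step, the `$XNLI_DIR` environment variable, and the `--data_dir` flag are gone: the script fetches and caches XNLI automatically through the Datasets library on first run.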