Unverified Commit 204ebc25 authored by Sylvain Gugger, committed by GitHub

Update installation page and add contributing to the doc (#5084)

* Update installation page and add contributing to the doc

* Remove mention of symlinks
parent 043f9f51
@@ -65,7 +65,8 @@ Awesome! Please provide the following information:
If you are willing to contribute the model yourself, let us know so we can best
guide you.

We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them
in the [`templates`](https://github.com/huggingface/transformers/templates) folder.
### Do you want a new feature (that is not a model)?
@@ -86,7 +87,9 @@ A world-class feature request addresses the following points:
If your issue is well written we're already 80% of the way there by the time you
post it.

We have added **templates** to guide you in the process of adding a new example script for training or testing the
models in the library. You can find them in the [`templates`](https://github.com/huggingface/transformers/templates)
folder.
## Start contributing! (Pull Requests)
@@ -206,15 +209,21 @@ Follow these steps to start contributing:
   to be merged;
4. Make sure existing tests pass;
5. Add high-coverage tests. No quality testing = no merge.
   - If you are adding a new model, make sure that you use
     `ModelTester.all_model_classes = (MyModel, MyModelWithLMHead,...)`, which triggers the common tests (see the
     sketch right after this list).
   - If you are adding new `@slow` tests, make sure they pass using
     `RUN_SLOW=1 python -m pytest tests/test_my_new_model.py`.
   - If you are adding a new tokenizer, write tests, and make sure
     `RUN_SLOW=1 python -m pytest tests/test_tokenization_{your_model_name}.py` passes.
     CircleCI does not run the slow tests.
6. All public methods must have informative docstrings that work nicely with sphinx. See `modeling_ctrl.py` for an
   example.
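As a loose illustration of that hook (`MyModel`, `MyModelWithLMHead`, and the exact import path of the common-test
mixin are hypothetical stand-ins, not necessarily the library's actual layout), a new model's test file wires its
classes into the shared tests roughly like this:

```python
# Hedged sketch of a new model's test file; the model classes are placeholders.
import unittest

from transformers import MyModel, MyModelWithLMHead  # hypothetical model classes

from .test_modeling_common import ModelTesterMixin  # shared common-test machinery


class MyModelTest(ModelTesterMixin, unittest.TestCase):
    # Every class listed here gets exercised by the common tests.
    all_model_classes = (MyModel, MyModelWithLMHead)
```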
### Tests
An extensive test suite is included to test the library behavior and several examples. Library tests can be found in
the [tests folder](https://github.com/huggingface/transformers/tree/master/tests) and examples tests in the
[examples folder](https://github.com/huggingface/transformers/tree/master/examples).
We like `pytest` and `pytest-xdist` because they're faster. From the root of the
repository, here's how to run tests with `pytest` for the library:
@@ -261,7 +270,8 @@ $ python -m unittest discover -s examples -t examples -v
### Style guide
For documentation strings, `transformers` follows the [Google style](https://google.github.io/styleguide/pyguide.html).
Check our [documentation writing guide](https://github.com/huggingface/transformers/tree/master/docs#writing-documentation---specification)
for more information.
#### This guide was heavily inspired by the awesome [scikit-learn guide to contributing](https://github.com/scikit-learn/scikit-learn/blob/master/CONTRIBUTING.md)
@@ -42,20 +42,14 @@ pip install recommonmark

## Building the documentation

Once you have set up `sphinx`, you can build the documentation by running the following command in the `/docs` folder:

```bash
make html
```

A folder called ``_build/html`` should have been created. You can now open the file ``_build/html/index.html`` in your
browser.
---

**NOTE**
@@ -132,8 +126,8 @@ XXXConfig

    :members:
```

This will include every public method of the configuration. If for some reason you wish for a method not to be
displayed in the documentation, you can do so by specifying which methods should be in the docs:

```
XXXTokenizer

@@ -147,8 +141,8 @@ XXXTokenizer
### Writing source documentation
Values that should be put in `code` should either be surrounded by double backticks: \`\`like so\`\` or be written as
an object using the :obj: syntax: :obj:\`like so\`.

When mentioning a class, it is recommended to use the :class: syntax as the mentioned class will be automatically
linked by Sphinx: :class:\`transformers.XXXClass\`
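For instance, a Google-style docstring combining both markers could look like this sketch (the function and its
arguments are invented for illustration; only `transformers.PreTrainedModel` is a real class):

```python
def resize_embeddings(model, new_num_tokens=None):
    """Resizes the token embeddings of a :class:`transformers.PreTrainedModel`.

    Args:
        model (:class:`transformers.PreTrainedModel`): The model whose embeddings will be resized.
        new_num_tokens (:obj:`int`, `optional`): The new vocabulary size. If :obj:`None`, the
            embeddings are returned unchanged.
    """
    ...
```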
...
../../CONTRIBUTING.md
\ No newline at end of file
@@ -142,6 +142,7 @@ conversion utilities for the following models:

   converting_tensorflow_models
   migration
   torchscript
   contributing

.. toctree::
   :maxdepth: 2
...
# Installation

🤗 Transformers is tested on Python 3.6+, and PyTorch 1.1.0+ or TensorFlow 2.0+.

You should install 🤗 Transformers in a [virtual environment](https://docs.python.org/3/library/venv.html). If you're
unfamiliar with Python virtual environments, check out the [user guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).
Create a virtual environment with the version of Python you're going to use and activate it.

Now, if you want to use 🤗 Transformers, you can install it with pip. If you'd like to play with the examples, you
must install it from source.

## Installation with pip
First you need to install one of, or both, TensorFlow 2.0 and PyTorch.
Please refer to the [TensorFlow installation page](https://www.tensorflow.org/install/pip#tensorflow-2.0-rc-is-available)
and/or the [PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) regarding the specific
install command for your platform.

When TensorFlow 2.0 and/or PyTorch has been installed, 🤗 Transformers can be installed using pip as follows:
```bash
pip install transformers
```
Alternatively, for CPU support only, you can install 🤗 Transformers and PyTorch in one line with
```bash
pip install transformers[torch]
```
or 🤗 Transformers and TensorFlow 2.0 in one line with
```bash
pip install transformers[tf-cpu]
```
To check 🤗 Transformers is properly installed, run the following command:
```bash
python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('I hate you'))"
```
It should download a pretrained model, then print something like:
```bash
[{'label': 'NEGATIVE', 'score': 0.9991129040718079}]
```
(Note that TensorFlow will print additional log messages before that last statement.)
## Installing from source
To install from source, clone the repository and install with the following commands:

```bash
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install -e .
```

Again, you can run

```bash
python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('I hate you'))"
```

to check 🤗 Transformers is properly installed.
## Caching models
This library provides pretrained models that will be downloaded and cached locally. Unless you specify a location with
`cache_dir=...` when you use methods like `from_pretrained`, these models will automatically be downloaded in the
folder given by the shell environment variable ``TRANSFORMERS_CACHE``. The default value for it will be the PyTorch
cache home followed by ``/transformers/`` (even if you don't have PyTorch installed). This is (by order of priority):
@@ -38,32 +84,19 @@ So if you don't have any specific environment variable set, the cache directory
(``PYTORCH_TRANSFORMERS_CACHE`` or ``PYTORCH_PRETRAINED_BERT_CACHE``), those will be used if there is no shell
environment variable for ``TRANSFORMERS_CACHE``.
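As a minimal sketch of the per-call override (the model name is the standard `bert-base-uncased` checkpoint; the cache
path is just an example), passing `cache_dir` redirects the download for that call:

```python
from transformers import BertModel

# Downloads (or reuses) the weights under the given folder instead of TRANSFORMERS_CACHE.
model = BertModel.from_pretrained("bert-base-uncased", cache_dir="/data/hf_cache")
```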
### Note on model downloads (Continuous Integration or large-scale deployments)

If you expect to be downloading large volumes of models (more than 1,000) from our hosted bucket (for instance through
your CI setup, or a large-scale production deployment), please cache the model files on your end. It will be way
faster, and cheaper. Feel free to contact us privately if you need any help.
## Do you want to run a Transformer model on a mobile device?
You should check out our [swift-coreml-transformers](https://github.com/huggingface/swift-coreml-transformers) repo.
It contains a set of tools to convert PyTorch or TensorFlow 2.0 trained Transformer models (currently contains `GPT-2`,
`DistilGPT-2`, `BERT`, and `DistilBERT`) to CoreML models that run on iOS devices.
At some point in the future, you'll be able to seamlessly move from pre-training or fine-tuning models in PyTorch or
TensorFlow 2.0 to productizing them in CoreML, or prototype a model or an app in CoreML then research its
hyperparameters or architecture from PyTorch or TensorFlow 2.0. Super exciting!
@@ -38,6 +38,17 @@ Hugging Face showcasing the generative capabilities of several models. GPT is on
The original code can be found `here <https://github.com/openai/finetune-transformer-lm>`_.
Note:
If you want to reproduce the original tokenization process of the `OpenAI GPT` paper, you will need to install
``ftfy`` and ``SpaCy``::

    pip install spacy ftfy==4.4.3
    python -m spacy download en

If you don't install ``ftfy`` and ``SpaCy``, the :class:`transformers.OpenAIGPTTokenizer` will default to tokenize
using BERT's :obj:`BasicTokenizer` followed by Byte-Pair Encoding (which should be fine for most usage, don't worry).
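As a small usage sketch (assuming the standard ``openai-gpt`` checkpoint name), the fallback is transparent: the
tokenizer is called the same way whichever backend is active::

    from transformers import OpenAIGPTTokenizer

    tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
    print(tokenizer.tokenize("What a lovely day!"))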
OpenAIGPTConfig
~~~~~~~~~~~~~~~~~~~~~
...