1. 06 Apr, 2023 1 commit
    • Adding Llama FastTokenizer support. (#22264) · 1670be4b
      Nicolas Patry authored
      * Adding Llama FastTokenizer support.
      
      - Requires https://github.com/huggingface/tokenizers/pull/1183 version
      - Only support byte_fallback for llama, raise otherwise (safety net).
      - Lots of open questions remain around special tokens
      
      How to test:
      
      ```python
      
      from transformers.convert_slow_tokenizer import convert_slow_tokenizer
      from transformers import AutoTokenizer
      from tokenizers import Tokenizer
      
      tokenizer = AutoTokenizer.from_pretrained("huggingface/llama-7b")
      
      if False:
          new_tokenizer = Tokenizer.from_file("tok.json")
      else:
          new_tokenizer = convert_slow_tokenizer(tokenizer)
          new_tokenizer.save("tok.json")
      
      strings = [
          "This is a test",
          "生活的真谛是",
          "生活的真谛是[MASK]。",
          # XXX: This one is problematic because of special tokens
          # "<s> Something something",
      ]
      
      for string in strings:
          encoded = tokenizer(string)["input_ids"]
          encoded2 = new_tokenizer.encode(string).ids
      
          assert encoded == encoded2, f"{encoded} != {encoded2}"
      
          decoded = tokenizer.decode(encoded)
          decoded2 = new_tokenizer.decode(encoded2)
      
          assert decoded.strip() == decoded2, f"{repr(decoded)} != {repr(decoded2)}"
      ```
      
      The converter + some test script.
      
      The test script.
      
      Tmp save.
      
      Adding Fast tokenizer + tests.
      
      Adding the tokenization tests.
      
      Correct combination.
      
      Small fix.
      
      Fixing tests.
      
      Fixing with latest update.
      
      Rebased.
      
      fix copies + normalized added tokens  + copies.
      
      Adding doc.
      
      TMP.
      
      Doc + split files.
      
      Doc.
      
      Versions + try import.
      
      Fix Camembert + warnings -> Error.
      
      Fix by ArthurZucker.
      
      Not a decorator.
      
      * Fixing comments.
      
      * Adding more to docstring.
      
      * Doc rewriting.
  2. 29 Mar, 2023 1 commit
  3. 23 Mar, 2023 1 commit
  4. 02 Mar, 2023 1 commit
  5. 28 Feb, 2023 2 commits
    • Fix flaky test for log level (#21776) · b29e2dca
      Sylvain Gugger authored
      * Fix flaky test for log level
      
      * Fix other flaky test
    • 🔥Rework pipeline testing by removing `PipelineTestCaseMeta` 🚀 (#21516) · 871c31a6
      Yih-Dar authored
      
      
      * Add PipelineTesterMixin
      
      * remove class PipelineTestCaseMeta
      
      * move validate_test_components
      
      * Add for ViT
      
      * Add to SPECIAL_MODULE_TO_TEST_MAP
      
      * style and quality
      
      * Add feature-extraction
      
      * update
      
      * raise instead of skip
      
      * add tiny_model_summary.json
      
      * more explicit
      
      * skip tasks not in mapping
      
      * add availability check
      
      * Add Copyright
      
      * A way to disable irrelevant tests
      
      * update with main
      
      * remove disable_irrelevant_tests
      
      * skip tests
      
      * better skip message
      
      * better skip message
      
      * Add all pipeline task tests
      
      * revert
      
      * Import PipelineTesterMixin
      
      * subclass test classes with PipelineTesterMixin
      
      * Add pipeline_model_mapping

      * Fix import after adding pipeline_model_mapping

      * Fix style and quality after adding pipeline_model_mapping

      * Fix one more import after adding pipeline_model_mapping

      * Fix style and quality after adding pipeline_model_mapping
      
      * Fix test issues
      
      * Fix import requirements
      
      * Fix mapping for MobileViTModelTest
      
      * Update
      
      * Better skip message
      
      * pipeline_model_mapping could not be None
      
      * Remove some PipelineTesterMixin
      
      * Fix typo
      
      * revert tests_fetcher.py
      
      * update
      
      * rename
      
      * revert
      
      * Remove PipelineTestCaseMeta from ZeroShotAudioClassificationPipelineTests
      
      * style and quality
      
      * test fetcher for all pipeline/model tests
      
      ---------
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
  6. 27 Feb, 2023 1 commit
  7. 22 Feb, 2023 1 commit
  8. 06 Feb, 2023 1 commit
    • Update quality tooling for formatting (#21480) · 6f79d264
      Sylvain Gugger authored
      * Result of black 23.1
      
      * Update target to Python 3.7
      
      * Switch flake8 to ruff
      
      * Configure isort
      
      * Configure isort
      
      * Apply isort with line limit
      
      * Put the right black version
      
      * adapt black in check copies
      
      * Fix copies
  9. 02 Feb, 2023 1 commit
  10. 26 Jan, 2023 1 commit
  11. 17 Jan, 2023 2 commits
  12. 30 Nov, 2022 1 commit
  13. 28 Nov, 2022 2 commits
    • amyeroberts's avatar
      321ef388
    • More TF int dtype fixes (#20384) · de4159a3
      Matt authored
      * Add a test to ensure int dummy inputs are int64
      
      * Move the test into the existing int64 test and update a lot of existing dummies
      
      * Fix remaining dummies
      
      * Fix remaining dummies
      
      * Test for int64 serving sigs as well
      
      * Update core tests to use tf.int64
      
      * Add better messages to the assertions
      
      * Update all serving sigs to int64
      
      * More sneaky hiding tf.int32s
      
      * Add an optional int32 signature in save_pretrained
      
      * make fixup
      
      * Add Amy's suggestions
      
      * Switch all serving sigs back to tf.int32
      
      * Switch all dummies to tf.int32
      
      * Adjust tests to check for tf.int32 instead of tf.int64
      
      * Fix base dummy_inputs dtype
      
      * Start casting to tf.int32 in input_processing
      
      * Change dtype for unpack_inputs test
      
      * Add proper tf.int32 test
      
      * Make the alternate serving signature int64
  14. 23 Nov, 2022 1 commit
  15. 21 Nov, 2022 1 commit
  16. 02 Nov, 2022 1 commit
    • Add Image Processors (#19796) · a6b77598
      amyeroberts authored
      
      
      * Add CLIP image processor
      
      * Crop size as dict too
      
      * Update warning
      
      * Actually use logger this time
      
      * Normalize doesn't change dtype of input
      
      * Add perceiver image processor
      
      * Tidy up
      
      * Add DPT image processor
      
      * Add Vilt image processor
      
      * Tidy up
      
      * Add poolformer image processor
      
      * Tidy up
      
      * Add LayoutLM v2 and v3 image processors
      
      * Tidy up
      
      * Add Flava image processor
      
      * Tidy up
      
      * Add deit image processor
      
      * Tidy up
      
      * Add ConvNext image processor
      
      * Tidy up
      
      * Add levit image processor
      
      * Add segformer image processor
      
      * Add in post processing
      
      * Fix up
      
      * Add ImageGPT image processor
      
      * Fixup
      
      * Add mobilevit image processor
      
      * Tidy up
      
      * Add postprocessing
      
      * Fixup
      
      * Add VideoMAE image processor
      
      * Tidy up
      
      * Add ImageGPT image processor
      
      * Fixup
      
      * Add ViT image processor
      
      * Tidy up
      
      * Add beit image processor
      
      * Add mobilevit image processor
      
      * Tidy up
      
      * Add postprocessing
      
      * Fixup
      
      * Fix up
      
      * Fix flava and remove tree module
      
      * Fix image classification pipeline failing tests
      
      * Update feature extractor in trainer scripts
      
      * Update pad_if_smaller to accept tuple and int size
      
      * Update for image segmentation pipeline
      
      * Update src/transformers/models/perceiver/image_processing_perceiver.py
      Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
      
      * Update src/transformers/image_processing_utils.py
      Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
      
      * Update src/transformers/models/beit/image_processing_beit.py
      Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
      
      * PR comments - docstrings; remove accidentally added resize; var names
      
      * Update docstrings
      
      * Add exception if size is not in the right format
      
      * Fix exception check
      
      * Fix up
      
      * Use shortest_edge in tuple in script
      Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
      Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
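
Several bullets in this commit mention accepting sizes as an int, a tuple, or a dict, and raising when the format is wrong. The sketch below illustrates that kind of normalization. It is not the code from the commit: `to_size_dict`, its `default_to_square` parameter, and the set of accepted key combinations are all assumptions made for illustration.

```python
# Hypothetical helper, not the library's implementation.
# Assumed valid key sets for a size dict.
VALID_KEY_SETS = (
    {"height", "width"},
    {"shortest_edge"},
    {"shortest_edge", "longest_edge"},
)


def to_size_dict(size, default_to_square=True):
    """Normalize an int, a (height, width) tuple, or a dict into a size dict."""
    if isinstance(size, dict):
        if set(size) not in VALID_KEY_SETS:
            raise ValueError(f"Unsupported size keys: {sorted(size)}")
        return dict(size)
    if isinstance(size, int):
        # An int is read either as a square or as the shortest edge.
        if default_to_square:
            return {"height": size, "width": size}
        return {"shortest_edge": size}
    if isinstance(size, (tuple, list)) and len(size) == 2:
        height, width = size
        return {"height": height, "width": width}
    raise ValueError(f"Size is not in a supported format: {size!r}")


square = to_size_dict(224)
shortest = to_size_dict(224, default_to_square=False)
rectangle = to_size_dict((10, 20))
```

Raising on an unrecognized format, rather than guessing, keeps a misconfigured size from silently producing wrongly shaped inputs.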
  17. 24 Oct, 2022 1 commit
  18. 18 Oct, 2022 1 commit
  19. 17 Oct, 2022 1 commit
  20. 12 Oct, 2022 1 commit
  21. 27 Sep, 2022 2 commits
  22. 26 Sep, 2022 1 commit
  23. 21 Sep, 2022 1 commit
  24. 02 Sep, 2022 1 commit
  25. 31 Aug, 2022 1 commit
  26. 17 Aug, 2022 1 commit
    • Update feature extractor methods to enable type cast before normalize (#18499) · 49e44b21
      amyeroberts authored
      * Update methods to optionally rescale
      This is necessary to allow casting our images / videos to numpy arrays within the feature extractors' call. We want to do this to make sure the behaviour is as expected when certain flags are False. Even if some transformations aren't applied, the output type should not be unexpected, e.g. a list of PIL images instead of numpy arrays.
      
      * Cast images to numpy arrays in call to enable consistent behaviour with different configs
      
      * Remove accidental clip changes
      
      * Update tests to reflect the scaling logic
      We write a generic function to handle rescaling of our arrays. For the API to be intuitive, we take some factor c and rescale the image values by it. This means the rescaling done in normalize and to_numpy_array is now done with array * (1/255) instead of array / 255. This leads to small differences in the resulting image. In testing, these were on the order of 1e-8, and so were deemed OK.
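
The size of the discrepancy between the two rescaling forms can be checked with a small NumPy sketch. It is not the code from the commit: the `rescale` helper shown here and the choice of float32 test values are assumptions made for illustration.

```python
import numpy as np


def rescale(image, scale):
    """Hypothetical helper: multiply every pixel value by a scale factor."""
    return image * scale


# All 8-bit pixel values, stored as float32 (an assumed dtype).
pixels = np.arange(256, dtype=np.float32)

by_division = pixels / np.float32(255)
by_multiplication = rescale(pixels, np.float32(1 / 255))

# Largest absolute difference between the two ways of rescaling.
max_diff = float(np.abs(by_division - by_multiplication).max())
print(f"max abs difference: {max_diff:.3e}")
```

Multiplying by a precomputed reciprocal rounds twice (once for `1 / 255`, once for the product), which is why the results can differ from a single division in the last bits.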
  27. 08 Aug, 2022 1 commit
  28. 22 Jul, 2022 1 commit
    • Update serving code to enable `saved_model=True` (#18153) · 8e838466
      amyeroberts authored
      
      
      * Add serving_output and serving methods to some vision models
      
      * Add serving outputs for DeiT
      
      * Don't convert hidden states - differing shapes
      
      * Make saveable
      
      * Fix up
      
      * Make swin saveable
      
      * Add in tests
      
      * Fix funnel tests (can't convert to tensor)
      
      * Fix numpy call
      
      * Tidy up a bit
      
      * Add in hidden states - resnet
      
      * Remove numpy
      
      * Fix failing tests - tensor shape and skipping tests
      
      * Remove duplicated function
      
      * PR comments - formatting and var names
      
      * PR comments
      Add suggestions made by Joao Gante:
      * Use tf.shape instead of shape_list
      * Use @tooslow decorator on tests
      * Simplify some of the logic
      
      * PR comments
      Address Yih-Dar Shieh's comments: make tensor names consistent and make types float
      
      * Types consistent with docs; disable test on swin (slow)
      
      * CI trigger
      
      * Change input_features to float32
      
      * Add serving_output for segformer
      
      * Fixup
      Co-authored-by: Amy Roberts <amyeroberts@users.noreply.github.com>
  29. 19 Jul, 2022 1 commit
  30. 13 Jul, 2022 1 commit
  31. 12 Jul, 2022 1 commit
  32. 06 Jul, 2022 1 commit
  33. 01 Jul, 2022 1 commit
    • XLA train step fixes (#17973) · d6cec458
      Matt authored
      * Copy inputs to train and test step before modifying them, as this breaks things
      
      * Add XLA tests, fix our loss functions to be XLA-compatible
      
      * make fixup
      
      * Update loss computation test to expect vector of per-sample losses
      
      * Patch loss for TFLED
      
      * Patch loss for TFAlbert
      
      * Add a tf_legacy_loss config flag that enables old loss functions
      
      * Stop using config.get() because it's not a dict
      
      * Skip loss computation test for RAG because its loss is very strange and I'm afraid to rewrite it
      
      * make fixup
      
      * Add XLA-compatible RAG loss
      
      * Fix dtype of loss mask for TFAlbert
      
      * Fix test for XLNet too because it overrides the default one
      
      * make fixup
      
      * Fix config test
      
      * No more depending on GPU NaN behaviour
      
      * Add test, avoid potential zero division
      
      * Fix test item assignment
      
      * Fix loss computation masking test
      
      * make fixup
      
      * Fix dtype bugs
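
Two of the bullets above mention returning a vector of per-sample losses and avoiding a potential zero division. A minimal NumPy sketch of that pattern follows. It is not the code from the commit: the function name `masked_per_sample_loss` and the `ignore_index=-100` convention are assumptions made for illustration.

```python
import numpy as np


def masked_per_sample_loss(per_token_loss, labels, ignore_index=-100):
    """Average token losses per sample, skipping ignored label positions.

    Hypothetical helper. The denominator is clamped to at least 1, so a
    sample whose labels are all ignored yields 0.0 instead of NaN.
    """
    mask = (labels != ignore_index).astype(per_token_loss.dtype)
    total = (per_token_loss * mask).sum(axis=-1)
    count = np.maximum(mask.sum(axis=-1), 1.0)
    return total / count


token_losses = np.array([[1.0, 3.0], [5.0, 7.0]])
labels = np.array([[0, -100], [-100, -100]])

# One loss value per sample; the second sample has no valid labels.
losses = masked_per_sample_loss(token_losses, labels)
print(losses)
```

Clamping the count keeps the result finite without branching on the data, which fits the bullet about no longer depending on NaN behaviour.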
  34. 10 Jun, 2022 2 commits
  35. 01 Jun, 2022 1 commit