    Example pipeline with wav2letter (#632) · 9c274228
    Vincent QB authored
    * example pipeline, initial commit.
    
    * removing notebook conversion artifacts.
    
    * remove extra comments. lint.
    
    * addressing some feedback.
    
    * main function.
    
    * defining args in function.
    
    * refactor.
    
    * lint.
    
    * checkpoint.
    
    * clean version to start with.
    
    * adding more parameters.
    
    * lint.
    
    * cleaning full version.
    
    * check for not None.
    
    * cleaning.
    
    * black -l 160
    
    * black.
    
    * fix runtime error.
    
    * removing some print statements.
    
    * add help to command line. add progress bar option.
    
    * grouping librispeech-specific transform in subclass.
    
    * typo.
    
    * fix concatenation.
    
    * typo.
    
    * black. tqdm.
    
    * missing transpose.
    
    * renaming variables.
    
    * sum cer and wer
    
    * clip norm.
    
    * second signal handler removed.
    
    * cosmetic.
    
    * default to no checkpoint.
    
    * remove non_blocking.
    
    * adadelta works better than sgd.
    
    * anomaly detection.
    
    * moving dataset to separate file.
    
    * lint.
    
    * move to separate module: languagemodel, decoder, metric.
    
    * flush=True.
    
    * renaming decoder.
    
    * CTC Decoders.
    
    * flush=True.
    
    * pass length for viterbi decoder.
    
    * progress bar. relative path.
    
    * generalize transition matrix to n-gram. progress bar.
    
    * choice of decoder.
    
    * collate func.
    
    * remove signal handling.
    
    * adding distributed.
    
    * lint.
    
    * normalize with respect to length of dataset, and with respect to total number of characters.
    
    * relative cer/wer.
    
    * clip grad parameter. momentum back but not yet used.
    
    * Switch to SGD.
    
    * choice of optimizer.
    
    * scheduler.
    
    * move to utils file.
    
    * metric log, and utils file.
    
    * rename metric_logger.
    
    * stderr and stdout. simpler metric logger.
    
    * replace by logging.
    
    * adding time measurement in metric logger.
    
    * fix duplicate name. remove tqdm. keep track of epoch and iteration instead.
    
    * rename main file. and add readme.
    
    * refactor distributed.
    
    * swap example and output in readme.
    
    * remove time from logger.
    
    * check non-empty tensor input.
    
    * typo in variable name and log update.
    
    * typo.
    
    * compute cer/wer in training too.
    
    * typo.
    
    * add back slurm signal capture to resubmit job.
    
    * update Levenshtein distance.
    
    * adding tests for Levenshtein distance.
    
    * record error rate during iteration.
    
    * metric logger using setitem.
    
    * moving signal break to end of loop and return loss so far.
    
    * typo.
    
    * add citation.
    
    * change default to best run.
    
    * adding other experiment with decoders.
    
    * remove other decoders than greedy.
    
    * Revert "remove other decoders than greedy."
    
    This reverts commit fb114372e89e317bf48d0b1f846c60bca8efe1ac.
    
    * changing name of folder.
    
    * remove other decoders, and unused dataset class.
    
    * rename functions to align with other pipeline.
    
    * pick which parts to train with.
    
    * adding specaugment to validation. note that caching prevents randomization from happening in validation.
    
    * updating readme.
    
    * typo in metric logging.
    
    * Revert "typo in metric logging."
    
    This reverts commit acac245eec250f61d2039a67933d3c01f1975ce9.
    
    * Revert "Revert "typo in metric logging.""
    
    This reverts commit 2c80d9691ed401044da734c40df3715dba92d0db.
    
    * update metric logger.
    
    * simplify metric logger implementation.
    
    * use json dumps instead.
    
    * group metric together.
    
    * move function.
    
    * lint.
    
    * quick summary of files in folder.
    
    * pass clip_grad explicitly.
    
    * typo in default dataset name.
    
    * option to disable logger.
    
    * ergonomics for distributed.
    
    * reminder about signal handler.
    
    * minor refactor of main in main.
    
    * replace by not_main_rank.
    
    * raising error if parameter not supported.
    
    * move model before invoking DDP.
    
    * changing log level. using python 2 style string for logging.
    
    * dynamic augmentations.
    
    * update metric log.
    
    batch cer/wer metric. correct typo in time. adding other dimensions in metric.
    
    * save learning rate even if function not available.
    
    * add type option to model.
    
    * add adamw.
    
    * reduce lr on validation step or training step.
    
    * specify hop-length and win-length.
    
    * normalize option.
    
    * rename parameter.
    
    * add dropout and tweak to number of channels.
    
    * copy model in pipeline folder for experimentation.
    
    * fix scheduler stepping.
    
    * fix input_type and num_features.
    
    * waveform mode changes shape more.
    
    * adding best character error rate with current implementation of model with mfcc.
    
    * comment update.
    
    * remove signal. remove custom wav2letter model.
    
    * remove comment.
    
    * simpler import with pandas.
transforms.py 274 Bytes