"Uses Langkit toolkit to calculate text statistics for evaluating structural complexity and readability. Extracts multiple linguistic features including sentence length, lexical diversity, and sentiment.\n\n"
"Output Parameters:\n"
"- LangkitNumSentencesScore: Number of sentences\n"
"- LangkitNumWordsScore: Number of words\n"
"- LangkitAvgWordLengthScore: Average word length\n"
"Evaluates text redundancy by calculating n-gram repetition ratio. Measures text originality by comparing the ratio of unique n-grams to total n-grams.\n\n"
"Initialization Parameters:\n"
"- ngrams: Length of n-grams, default is 5\n\n"
"Output Parameters:\n"
"- NgramScore: N-gram repetition ratio score (0-1, higher = less repetition)"
"Detect personally identifiable information (PII) in text using the Microsoft Presidio model and return the count of detected PII entities. "
"Supports various entity types such as names, emails, phone numbers, etc., implemented based on the dslim/bert-base-NER model. Suitable for assessing text privacy and security risks.\n"
num_examples = len(list(data_loader))  # not ideal, but it's quicker in dev time; usually we won't feed the entire dataset to task2vec, so this should be fine
print(f'\nfinal layer loss {loss.item()} at {step=} {epoch=} (note: the loss is not recomputed after the step, so the printed value is one step stale and larger than the true final loss)')
num_examples = len(list(data_loader))  # not ideal, but it's quicker in dev time; usually we won't feed the entire dataset to task2vec, so this should be fine
# - double-check that the mean was computed correctly. Since the matrix is symmetric, the mean after removing the diagonal should equal the mean of just one triangle
if remove_diagonal:
    # from uutils.torch_uu import approx_equal
    # assert approx_equal(triu.sum(), tril.sum(), tolerance=1e-4), f'Distance matrix is not symmetric, are you sure this is correct?'
    # assert approx_equal(distance_matrix.mean(), triu[triu != 0.0].mean(), tolerance=1e-4), f'Mean should be equal to triangular matrix'
    print('Lower tri sum', tril.sum(), ' / Upper tri sum', triu.sum(), '| These should be approx equal!!')
    print('Total mean', distance_matrix.mean(), ' / Upper mean', triu[triu != 0.0].mean(), ' / Lower mean', tril[tril != 0.0].mean(), '| These should all be approx equal!!')
    print('mu (div coefficient)', mu, ' / Upper mean', triu[triu != 0.0].mean(), '| These should be approx equal!!')
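The sanity check above relies on a property of symmetric matrices: once the diagonal is excluded, the overall mean equals the mean of either triangle. The following is a minimal pure-Python sketch of that property. The helper `triangle_means` and the sample matrix are hypothetical, and unlike the code above it selects entries by index rather than by masking zeros with `triu != 0.0`.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def triangle_means(matrix):
    """Return (off_diagonal_mean, upper_mean, lower_mean) for a square matrix.

    Assumes the matrix has at least two rows, so each triangle is non-empty.
    """
    n = len(matrix)
    upper = [matrix[i][j] for i in range(n) for j in range(i + 1, n)]
    lower = [matrix[i][j] for i in range(n) for j in range(i)]
    return _mean(upper + lower), _mean(upper), _mean(lower)


# Hypothetical symmetric distance matrix with a zero diagonal.
distances = [
    [0.0, 0.2, 0.4],
    [0.2, 0.0, 0.6],
    [0.4, 0.6, 0.0],
]

total_mean, upper_mean, lower_mean = triangle_means(distances)
# For a symmetric matrix, all three values should agree up to rounding.
print(math.isclose(total_mean, upper_mean), math.isclose(upper_mean, lower_mean))
```

Selecting by index avoids one pitfall of the zero mask: a genuine off-diagonal distance of exactly 0.0 would be dropped by `triu != 0.0` and would inflate the mean.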