    Update pregenerate_training_data.py · 60005f46
    jeonsworld authored
    If randint returns rand_end, the sampled_doc_index returned by searchsorted equals current_idx, i.e. the current document itself is sampled.
    
    example:
    cumsum_max = {int64} 30
    doc_cumsum = {ndarray} [ 5  7 11 19 30]
    doc_lengths = {list} <class 'list'>: [5, 2, 4, 8, 11]
    If current_idx = 1:
    rand_start = 7
    rand_end = 35
    sentence_index = randint(7, 35) % cumsum_max
    If randint returns 35, sentence_index becomes 5.
    If sentence_index is 5, np.searchsorted returns 1, which is equal to current_idx.
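
The off-by-one in the example above can be checked with a minimal sketch. It assumes that rand_end is computed as rand_start + cumsum_max - doc_lengths[current_idx] (which reproduces the value 35), that np.searchsorted is called with side="right" (which reproduces the returned index 1), and that randint is Python's random.randint, which is inclusive at both ends. The helper doc_for is hypothetical and is not a name taken from the script.

```python
import numpy as np

# Values from the example in the commit message.
doc_lengths = [5, 2, 4, 8, 11]
doc_cumsum = np.cumsum(doc_lengths)   # [ 5  7 11 19 30]
cumsum_max = int(doc_cumsum[-1])      # 30
current_idx = 1

rand_start = int(doc_cumsum[current_idx])                      # 7
# Assumed formula for rand_end; it yields 35 as in the example.
rand_end = rand_start + cumsum_max - doc_lengths[current_idx]


def doc_for(draw):
    """Hypothetical helper: map one randint draw to a document index."""
    sentence_index = draw % cumsum_max
    # side="right" is an assumption consistent with the example.
    return int(np.searchsorted(doc_cumsum, sentence_index, side="right"))


# Enumerate every possible draw instead of sampling, so the check is deterministic.
# random.randint(a, b) can return b, hence the "+ 1" for the inclusive case.
inclusive = {doc_for(d) for d in range(rand_start, rand_end + 1)}
exclusive = {doc_for(d) for d in range(rand_start, rand_end)}

print("draw == rand_end maps to doc", doc_for(rand_end))
print("current_idx reachable with inclusive upper bound:", current_idx in inclusive)
print("current_idx reachable with rand_end excluded:", current_idx in exclusive)
```

Under these assumptions, excluding rand_end from the range of draws is one way to keep current_idx from being sampled; every other document index remains reachable.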
pregenerate_training_data.py 12.7 KB