- 30 Mar, 2019 1 commit
-
jeonsworld authored
If randint returns the value of rand_end, searchsorted returns a sampled_doc_index equal to current_idx.

example:
cumsum_max = {int64} 30
doc_cumsum = {ndarray} [ 5 7 11 19 30]
doc_lengths = {list} <class 'list'>: [5, 2, 4, 8, 11]

if current_idx = 1,
rand_start = 7
rand_end = 35
sentence_index = randint(7, 35) % cumsum_max

If randint returns 35, sentence_index becomes 5.
If sentence_index is 5, np.searchsorted returns 1, which is equal to current_idx.
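The collision described in this message can be reproduced with a minimal sketch. It uses the values from the example and rests on two assumptions not stated in the message: that np.searchsorted is called with side="right" (the setting under which index 5 maps to document 1), and that the helper name sampled_doc_for is hypothetical. It enumerates every possible draw instead of sampling, so the result is deterministic.

```python
import numpy as np

# Values taken from the example in the commit message.
doc_lengths = [5, 2, 4, 8, 11]
doc_cumsum = np.cumsum(doc_lengths)   # [ 5  7 11 19 30]
cumsum_max = int(doc_cumsum[-1])      # 30
current_idx = 1
rand_start = 7
rand_end = 35


def sampled_doc_for(draw):
    """Map a raw draw to a document index (hypothetical helper)."""
    sentence_index = draw % cumsum_max
    # side="right" is an assumption; the message does not state it.
    return int(np.searchsorted(doc_cumsum, sentence_index, side="right"))


# random.randint(a, b) includes b, so 35 is a possible draw.
collisions_inclusive = [
    d for d in range(rand_start, rand_end + 1)
    if sampled_doc_for(d) == current_idx
]

# A half-open range (as random.randrange(a, b) would draw from) excludes 35.
collisions_exclusive = [
    d for d in range(rand_start, rand_end)
    if sampled_doc_for(d) == current_idx
]

print(collisions_inclusive)
print(collisions_exclusive)
```

Under these assumptions, the only draw that maps back to current_idx is rand_end itself, so drawing from a half-open range is one possible way to avoid sampling the current document. The message does not say which fix the commit applied.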
-
- 21 Mar, 2019 7 commits
-
Matthew Carrigan authored
order.
-
Matthew Carrigan authored
data on disc as a memmap rather than in memory
-
Matthew Carrigan authored
-
Matthew Carrigan authored
-
Matthew Carrigan authored
-
Matthew Carrigan authored
out on the fly without shuffling - the Sampler in the finetuning script will shuffle for us.
-
Matthew Carrigan authored
-
- 20 Mar, 2019 4 commits
-
Matthew Carrigan authored
-
Matthew Carrigan authored
-
Matthew Carrigan authored
-
Matthew Carrigan authored
-