Code in data/dpr_wikitext103_1024/test_retrieve.py
def search_one_job(worker_id):
    # encode the test prefix
    # with open(f'../{args["dataset"]}/new_test.txt') as f:
    with open(f'../{args["dataset"]}/test.txt') as f:
        datasets = [line.strip() for line in tqdm(f.readlines())]
    test_set = []
    for line in datasets:
        words = nltk.word_tokenize(line)
        if len(words) >= 32:
            # prefix = clean_data(words[:32])
            prefix = clean_data(words)  # !!! the whole line is used as the prefix
            # reference = clean_data(words[32:32+128])
            reference = clean_data(words)
            test_set.append((prefix, reference))
    print(f'[!] collect {len(test_set)} samples from the test set')
I think the prefix should not be the whole line, because in actual generation the text that follows the prefix is unknown.
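For reference, here is a minimal sketch of the split I would expect, based on the two commented-out lines in the snippet (`words[:32]` and `words[32:32+128]`). The helper name `split_prefix_reference` and its parameters are hypothetical, not part of the repository, and the repo's `clean_data` step is left out so the example runs on its own:

```python
def split_prefix_reference(words, prefix_len=32, ref_len=128):
    """Split a token list into (prefix, reference) with no overlap.

    Hypothetical helper: the defaults 32 and 128 are taken from the
    commented-out lines in test_retrieve.py.
    """
    # Same length check as the quoted code: skip lines that are too short.
    if len(words) < prefix_len:
        return None
    # The prefix is only the first prefix_len tokens ...
    prefix = words[:prefix_len]
    # ... and the reference is the continuation that follows it.
    reference = words[prefix_len:prefix_len + ref_len]
    return prefix, reference


if __name__ == "__main__":
    # Toy token list standing in for nltk.word_tokenize(line).
    words = [f"w{i}" for i in range(200)]
    prefix, reference = split_prefix_reference(words)
    print(len(prefix), len(reference))
```

With a split like this, the retriever only sees tokens that would be available at generation time, and the reference never overlaps the prefix. Note that a line with exactly 32 tokens passes the check but yields an empty reference.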