Hi @zatchwu I want to use your trained model to sample/score signal peptides.
The following is what I came up with by going through the provided notebooks and trying to get a more straightforward sequence -> model -> prediction workflow that is independent of the datasets you were using. It would be great to get some feedback on whether what I'm doing here is correct.
def load_spgen_model():
    # the weights were extracted from the .chkpt file with the same name
    state_dict = torch.load('../../SPGen/remote_generation/signal_peptide/outputs/SIM99_550_12500_64_6_5_0.1_64_100_0.0001_-0.03_99_weightsonly.pt')
    model = Models.Transformer(
        27,
        27,
        107,
        proj_share_weight=True,
        embs_share_weight=True,
        d_k=64,
        d_v=64,
        d_model=550,
        d_word_vec=550,
        d_inner_hid=1100,
        n_layers=6,
        n_head=5,
        dropout=0.1)
    model.load_state_dict(state_dict)
    model.eval()
    return model
Making predictions (logits) and scoring the perplexity. I encode the data as shown in step 1 and build prot_positions and sp_positions masks that are 0 at true positions and 1 at masked positions.
def get_perplexity_batch(transformer, src_seq, src_positions, tgt_seq, tgt_positions):
    '''Adapted from Translator()._epoch().'''
    ppls = []
    loss_fn = torch.nn.CrossEntropyLoss()
    pred = transformer((src_seq, src_positions), (tgt_seq, tgt_positions))
    # process each seq in batch
    for idx in range(len(src_seq)):
        loss = loss_fn(pred[idx].view(-1, 27), tgt_seq[idx,1:].view(-1))
        ppls.append(torch.exp(loss).item())
    return ppls

def predict_spgen(model, loader):
    with torch.no_grad():
        ppl = []
        for idx, batch in tqdm(enumerate(loader), total=len(loader)):
            proteins, prot_positions, sps, sp_positions = batch
            proteins, prot_positions, sps, sp_positions = proteins.to(device), prot_positions.to(device), sps.to(device), sp_positions.to(device)
            aa_logits = model((proteins,prot_positions), (sps, sp_positions))
            ppls = get_perplexity_batch(model, proteins, prot_positions, sps, sp_positions)
            ppl.extend(ppls)
        return np.array(ppl)
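To make the mask convention I described concrete (0 at true positions, 1 at masked positions), here is a minimal sketch of how I build such a mask from a padded batch. The padding index `PAD_IDX = 0` and the helper name `make_padding_mask` are my own assumptions for illustration, not something taken from the SPGen code, and whether the model actually expects this convention is exactly what I would like confirmed.

```python
import numpy as np

PAD_IDX = 0  # assumption: token index 0 is the padding symbol


def make_padding_mask(token_ids, pad_idx=PAD_IDX):
    """Return an integer array: 0 at real tokens, 1 at padded positions.

    Hypothetical helper illustrating the convention described in this issue.
    """
    token_ids = np.asarray(token_ids)
    return (token_ids == pad_idx).astype(np.int64)


# Two encoded sequences, the first padded to the length of the second.
batch = np.array([
    [5, 9, 3, 0, 0],
    [7, 2, 4, 8, 6],
])
mask = make_padding_mask(batch)
print(mask)
```

The same array can be converted with `torch.from_numpy(mask)` before being passed to the model.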
My code is running, but it is a bit hard to tell whether everything is in place or there's an error somewhere. Would be great to get some feedback - also open to any other way to make the model run on new data.
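One check I can think of that does not depend on the trained weights: a per-sequence perplexity function should return the vocabulary size (27 here) for uniform logits and a value close to 1 for logits sharply peaked on the targets. The sketch below does this in NumPy and excludes padded target positions from the average. Note that `torch.nn.CrossEntropyLoss()` with default arguments does not skip index 0, so padded positions would count towards the loss unless `ignore_index` is set. The padding index and the function names are assumptions for illustration only.

```python
import numpy as np

VOCAB_SIZE = 27  # same vocabulary size as in the snippets above
PAD_IDX = 0      # assumption: token index 0 is the padding symbol


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def per_sequence_perplexity(logits, targets, pad_idx=PAD_IDX):
    """Perplexity per sequence, averaging only over non-padded targets.

    logits:  (batch, length, vocab) unnormalised scores
    targets: (batch, length) integer token indices
    """
    logp = log_softmax(np.asarray(logits, dtype=np.float64))
    targets = np.asarray(targets)
    # Log-probability assigned to each target token.
    token_logp = np.take_along_axis(logp, targets[..., None], axis=-1).squeeze(-1)
    keep = targets != pad_idx  # True at real tokens
    mean_nll = -(token_logp * keep).sum(axis=1) / keep.sum(axis=1)
    return np.exp(mean_nll)


targets = np.array([
    [5, 9, 3, 0, 0],
    [7, 2, 4, 8, 6],
])

# Uniform logits: every token is equally likely, so perplexity == vocab size.
uniform_ppl = per_sequence_perplexity(np.zeros((2, 5, VOCAB_SIZE)), targets)

# Logits sharply peaked on the targets: perplexity approaches 1.
peaked_ppl = per_sequence_perplexity(50.0 * np.eye(VOCAB_SIZE)[targets], targets)

print(uniform_ppl, peaked_ppl)
```

If the values from `get_perplexity_batch` differ from this function on the same logits and targets, padding handling or the target shift would be the first things I would look at.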
Thanks!