length RNA #10

mtinti · 2022-04-11T08:18:59Z

Hi,
I'm getting an error when I try to predict RNA sequences longer than 600 bases.

Here is the error when I input a sequence of 700 bases:

Welcome using UFold prediction tool!!!
Traceback (most recent call last):
  File "/cluster/majf_lab/mtinti/UFold/ufold_predict.py", line 328, in <module>
    main()
  File "/cluster/majf_lab/mtinti/UFold/ufold_predict.py", line 302, in main
    test_data = RNASSDataGenerator_input('data/', 'input')
  File "/cluster/majf_lab/mtinti/UFold/ufold/data_generator.py", line 217, in __init__
    self.load_data()
  File "/cluster/majf_lab/mtinti/UFold/ufold/data_generator.py", line 229, in load_data
    self.data_x = np.array([self.one_hot_600(item) for item in self.seq])
  File "/cluster/majf_lab/mtinti/UFold/ufold/data_generator.py", line 229, in <listcomp>
    self.data_x = np.array([self.one_hot_600(item) for item in self.seq])
  File "/cluster/majf_lab/mtinti/UFold/ufold/data_generator.py", line 244, in one_hot_600
    one_hot_matrix_600[:len(seq_item),] = feat
ValueError: could not broadcast input array from shape (700,4) into shape (600,4)

Is this expected? I thought I could go up to 1600bp...

Cheers
Michele

sperfu · 2022-04-11T08:50:02Z

Hi Michele,

Thanks for reaching out. UFold can handle sequences of up to 1600bp. However, as sequences get longer, memory usage and computation time grow substantially during training and testing, and long inputs can cause severe out-of-memory errors, especially on our backend server. To keep the backend from crashing, we have deliberately limited the sequence length to 600bp for the best efficiency and accuracy. We appreciate your understanding.

Nevertheless, we have also added a commented-out line to the data_generator.py file (line 244), as shown here:

UFold/ufold/data_generator.py

Line 244 in 174437f

# one_hot_matrix_600 = np.zeros((len(seq_item),4))

You can use this line in place of line 243 to build the feature matrix for the full sequence length. As I mentioned earlier, though, this may result in a high computational cost. We therefore still recommend keeping sequences within 900~1000nt (ideally within 600bp); you can cut a longer sequence into multiple shorter ones for prediction.
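For readers following along, here is a minimal sketch of the two options described above: sizing the one-hot matrix to the sequence instead of a fixed 600 rows, and cutting a long sequence into shorter windows. It is not UFold's actual code. The names `BASE_TO_VEC`, `one_hot_fixed`, `one_hot_full`, and `split_sequence`, as well as the base-to-vector mapping, are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical base-to-vector table; UFold's real mapping may differ.
BASE_TO_VEC = {
    "A": [1, 0, 0, 0],
    "U": [0, 1, 0, 0],
    "C": [0, 0, 1, 0],
    "G": [0, 0, 0, 1],
}


def _encode(seq):
    """Return a (len(seq), 4) one-hot array; unknown bases become all zeros."""
    rows = [BASE_TO_VEC.get(base, [0, 0, 0, 0]) for base in seq.upper()]
    return np.array(rows, dtype=float).reshape(-1, 4)


def one_hot_fixed(seq, max_len=600):
    """Pad into a fixed (max_len, 4) matrix.

    Raises ValueError when len(seq) > max_len, which is the
    broadcast error reported in this thread.
    """
    matrix = np.zeros((max_len, 4))
    matrix[:len(seq)] = _encode(seq)
    return matrix


def one_hot_full(seq):
    """Size the matrix to the sequence, so any length fits (at higher cost)."""
    matrix = np.zeros((len(seq), 4))
    matrix[:len(seq)] = _encode(seq)
    return matrix


def split_sequence(seq, window=600):
    """Cut a sequence into consecutive, non-overlapping windows."""
    return [seq[i:i + window] for i in range(0, len(seq), window)]
```

Note that if each window is predicted independently, base pairs whose two partners fall in different windows cannot be recovered, so splitting trades accuracy on long-range pairs for lower memory use.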

Thanks

mtinti · 2022-04-11T10:51:41Z

Thanks for the speedy response!
I'll try your suggestions.

cheers
Michele
