Problems in Sentence Matching Task #320

Open
zhzou2020 opened this issue Aug 10, 2016 · 10 comments
@zhzou2020 commented Aug 10, 2016

I want to use an LSTM as a sentence encoder and then compute the similarity of two sentences on top of it.
But when I train the model, its parameters don't seem to change at all.
I've tried other models before and they work properly, so I suspect there is something wrong in my implementation of this one.

My model is as follows:

function ModelBuilder:make_net(w2v)
  require 'rnn'

  if opt.cudnn == 1 then
    require 'cudnn'
    require 'cunn'
  end

  -- Word embeddings; row 1 is reserved for padding and zeroed so that
  -- maskzero in the LSTM layers can skip padded timesteps.
  local lookup = nn.LookupTable(opt.vocab_size, opt.vec_size) -- input: batch_size x seq_len
  lookup.weight:uniform(-0.25, 0.25)
  lookup.weight[1]:zero()

  -- Sentence encoder: embedding lookup followed by a stack of SeqLSTMs.
  local rnn = nn.Sequential()
  rnn:add(lookup)

  local input_size = opt.vec_size
  local lstm_hidden_sizes = loadstring(" return " .. opt.lstm_hidden_sizes)()
  for i, lstm_hidden_size in ipairs(lstm_hidden_sizes) do
    local r = nn.SeqLSTM(input_size, lstm_hidden_size)
    r.maskzero = true   -- ignore zero-padded timesteps
    r.batchfirst = true -- accept batch_size x seq_len x vec_size input
    rnn:add(r)
    input_size = lstm_hidden_size
  end

  -- Keep only the last timestep of the top LSTM as the sentence encoding.
  rnn:add(nn.Select(2, -1)) -- batch_size x lstm_hidden_size

  -- Siamese encoder: the second branch shares parameters and gradients
  -- with the first via a shared clone.
  local siamese_encoder = nn.ParallelTable()
  siamese_encoder:add(rnn)
  siamese_encoder:add(rnn:clone('weight', 'bias', 'gradWeight', 'gradBias'))

  -- Classifier on the concatenated pair of sentence encodings.
  local model = nn.Sequential()
  model:add(siamese_encoder)
  model:add(nn.JoinTable(1, 1))
  model:add(nn.Dropout(opt.dropout_p))
  model:add(nn.Linear(lstm_hidden_sizes[#lstm_hidden_sizes] * 2, opt.hidden_size))
  model:add(nn.Dropout(opt.dropout_p))
  model:add(nn.Linear(opt.hidden_size, 2))
  model:add(nn.LogSoftMax())

  if opt.cudnn == 1 then
    cudnn.convert(model, cudnn)
  end

  return model
end
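
For reference, here is a rough, hypothetical sketch of how this builder might be exercised, assuming the ModelBuilder class above is in scope; the opt values and batch shapes are made up, and index 1 is used as padding to match the zeroed lookup row.

require 'nn'
require 'rnn'

-- Hypothetical options, only to exercise the builder above.
opt = {
  cudnn = 0,
  vocab_size = 1000,
  vec_size = 50,
  lstm_hidden_sizes = '{100}',
  dropout_p = 0.5,
  hidden_size = 64,
}

local model = ModelBuilder:make_net(nil) -- w2v is not used in the snippet above

-- Two sides of 8 sentences, 20 tokens each; index 1 is the zeroed padding row.
local left  = torch.LongTensor(8, 20):random(2, opt.vocab_size)
local right = torch.LongTensor(8, 20):random(2, opt.vocab_size)

local logprobs = model:forward({left, right}) -- 8 x 2 log-probabilities
print(logprobs:size())
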
@zhzou2020 (Author) commented Aug 10, 2016

@nicholas-leonard Could you please help me find what is wrong in my code? Thanks!

@JoostvDoorn (Contributor) commented:

From the information you provide, nothing seems to be wrong per se, but the example is not complete. If you need help, first isolate your problem in as little code as possible with (fake) data, and give us a minimal runnable example that reproduces the issue. Alternatively, adapt one of the examples to use your data and see whether you still hit the problem.
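
For example, here is a rough, self-contained sketch of such a check (a shrunken stand-in for the model, fake data, and one manual SGD step; all names and sizes are made up):

require 'nn'
require 'rnn'

-- Tiny stand-in for the real model: shared LSTM encoder plus a classifier.
local vocab, dim, hid = 50, 16, 24
local enc = nn.Sequential()
  :add(nn.LookupTable(vocab, dim))
  :add(nn.SeqLSTM(dim, hid)) -- default layout: seq_len x batch_size x dim
  :add(nn.Select(1, -1))     -- keep the last timestep: batch_size x hid
local model = nn.Sequential()
  :add(nn.ParallelTable()
    :add(enc)
    :add(enc:clone('weight', 'bias', 'gradWeight', 'gradBias')))
  :add(nn.JoinTable(1, 1))
  :add(nn.Linear(hid * 2, 2))
  :add(nn.LogSoftMax())
local crit = nn.ClassNLLCriterion()

local params, grads = model:getParameters()
local before = params:clone()

-- Fake batch: 10 timesteps x 4 sentences per side, random binary labels.
local left   = torch.LongTensor(10, 4):random(1, vocab)
local right  = torch.LongTensor(10, 4):random(1, vocab)
local target = torch.LongTensor(4):random(1, 2)

grads:zero()
local out = model:forward({left, right})
local loss = crit:forward(out, target)
model:backward({left, right}, crit:backward(out, target))
params:add(-0.1, grads) -- one SGD step

-- If the model trains at all, the parameters should have moved.
print(loss, (params - before):abs():max())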

@zhuang-li commented:

Hi. Have you solved this problem? I am trying to do sentence similarity too and am running into the same issue, except that where you use clone to create the second encoder, I just reuse the same LSTM. The results are absurdly bad: the accuracy never improves and the F-measure only reaches about 0.20–0.25.

@zhzou2020 (Author) commented:

My model converges, but it still does not perform well. I assume the model is overfitting, so I am now training it on a larger dataset instead.

@zhuang-li commented:

Yes, I have exactly the same issue! The model converges but the results are pretty bad. The model is, in fact, a common baseline, so I don't believe it should perform this badly. I don't know how large your dataset is; I am using the dataset at http://alt.qcri.org/semeval2015/task1/, which contains 13,000 training instances. But whether I use 100, 8,000, or 13,000 of them, the performance is similarly bad.

@zhzou2020 (Author) commented:

Maybe there is something wrong with the implementation of SeqLSTM; I'll try it with Theano later on.

@zhuang-li commented Aug 17, 2016

Probably, but I implemented the LSTM myself before and got the same problem. Then I switched to this module and saw no improvement. I am quite confused at this point; maybe the problem is the model itself, or just the way I coded it.

@JoostvDoorn (Contributor) commented:

It is probably something other than the SeqLSTM implementation, but you could try the cudnn implementation if you are unsure. It would be very helpful if you could give us something that we can run, though. @deathlee you should definitely clone, otherwise the gradients are not stored.
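
To illustrate the clone point with a minimal, made-up example (a plain Linear stands in for the LSTM encoder):

require 'nn'

-- Illustrative encoder; in this thread it would be the SeqLSTM stack.
local encoder = nn.Linear(10, 5)

-- Siamese setup: the clone shares parameters *and* gradient buffers, so
-- backward() through either branch accumulates into the same gradWeight.
local siamese = nn.ParallelTable()
  :add(encoder)
  :add(encoder:clone('weight', 'bias', 'gradWeight', 'gradBias'))

-- What to avoid: adding the very same instance twice, e.g.
--   nn.ParallelTable():add(encoder):add(encoder)
-- because the second branch's forward/backward overwrites the buffers
-- (output, gradInput) of the first branch.

local out = siamese:forward({torch.randn(3, 10), torch.randn(3, 10)})
print(out[1]:size(), out[2]:size())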

@JoostvDoorn (Contributor) commented Aug 17, 2016

Are you following Mueller et al.? You should probably use CSubTable instead of JoinTable.
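
For what it's worth, a rough sketch of that kind of head (the Manhattan-distance similarity used by Mueller et al.; the hidden size is illustrative, and this would replace the JoinTable/Linear/LogSoftMax part of the model above):

require 'nn'

local hid = 100 -- must match the top LSTM hidden size

-- exp(-||h_left - h_right||_1), in (0, 1], computed per example in the batch.
local head = nn.Sequential()
  :add(nn.CSubTable())   -- element-wise h_left - h_right
  :add(nn.Abs())
  :add(nn.Sum(1, 1))     -- L1 norm per example (nInputDims = 1 for batch mode)
  :add(nn.MulConstant(-1))
  :add(nn.Exp())

-- Takes the {left, right} encodings produced by the siamese encoder:
local sim = head:forward({torch.randn(4, hid), torch.randn(4, hid)})
print(sim) -- 4 similarity scores

If I recall the paper correctly, that score is then regressed against the gold similarity (e.g. with nn.MSECriterion) rather than fed to a two-way softmax.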

@zhuang-li commented:

Hi. I saw that Mueller et al. use a "tied weights" LSTM, so basically I just run the same LSTM twice, once for the left sequence and once for the right. If we clone here, wouldn't the two branches be two separate LSTMs rather than sharing weights?
I also tried to use these two functions (copied from the encoder-decoder example)

-- Pass the final hidden/cell state of the left LSTM into the right LSTM.
function LSTMSim:forwardConnect(llstm, rlstm)
    rlstm.layer.userPrevOutput = llstm.layer.output[self.seq_length]
    rlstm.layer.userPrevCell = llstm.layer.cell[self.seq_length]
end

-- Pass the gradients flowing into the right LSTM's initial state back to the left LSTM.
function LSTMSim:backwardConnect(llstm, rlstm)
    llstm.layer.gradPrevOutput = rlstm.layer.userGradPrevOutput
    llstm.layer.userNextGradCell = rlstm.layer.userGradPrevCell
end

to preserve the state and gradients. Since I use the same LSTM for both sequences, I believe this just passes the previous state and gradient along the same LSTM from head to tail.

I also tried creating two separate LSTMs; that didn't work either.

I'd really like to provide something you can run, if that's not too much trouble; I am currently adding some comments to it.
Thank you.
