Problems in Sentence Matching Task #320

Open
zhzou2020 opened this issue Aug 10, 2016 · 10 comments
@zhzou2020 commented Aug 10, 2016

I want to use an LSTM as a sentence encoder and then compute the similarity of two sentences on top of it.
But when I train the model, its parameters don't seem to change at all.
I've tried other models before and they work properly, so I suspect there is something wrong in my implementation of this one.

My model is as follows:

function ModelBuilder:make_net(w2v)
  require 'rnn'

  if opt.cudnn == 1 then
    require 'cudnn'
    require 'cunn'
  end

  -- Word embeddings; row 1 is reserved for padding and zeroed so that
  -- maskzero in the LSTM layers can skip padded timesteps.
  local lookup = nn.LookupTable(opt.vocab_size, opt.vec_size) -- input: batch_size x seq_len
  lookup.weight:uniform(-0.25, 0.25)
  lookup.weight[1]:zero()

  -- Sentence encoder: embedding lookup followed by a stack of SeqLSTMs.
  local rnn = nn.Sequential()
  rnn:add(lookup)

  local input_size = opt.vec_size
  local lstm_hidden_sizes = loadstring(" return " .. opt.lstm_hidden_sizes)()
  for i, lstm_hidden_size in ipairs(lstm_hidden_sizes) do
    local r = nn.SeqLSTM(input_size, lstm_hidden_size)
    r.maskzero = true   -- ignore zero-padded timesteps
    r.batchfirst = true -- accept batch_size x seq_len x vec_size input
    rnn:add(r)
    input_size = lstm_hidden_size
  end

  -- Keep only the last timestep of the top LSTM as the sentence encoding.
  rnn:add(nn.Select(2, -1)) -- batch_size x lstm_hidden_size

  -- Siamese encoder: the second branch shares parameters and gradients
  -- with the first via a shared clone.
  local siamese_encoder = nn.ParallelTable()
  siamese_encoder:add(rnn)
  siamese_encoder:add(rnn:clone('weight', 'bias', 'gradWeight', 'gradBias'))

  -- Classifier on the concatenated pair of sentence encodings.
  local model = nn.Sequential()
  model:add(siamese_encoder)
  model:add(nn.JoinTable(1, 1))
  model:add(nn.Dropout(opt.dropout_p))
  model:add(nn.Linear(lstm_hidden_sizes[#lstm_hidden_sizes] * 2, opt.hidden_size))
  model:add(nn.Dropout(opt.dropout_p))
  model:add(nn.Linear(opt.hidden_size, 2))
  model:add(nn.LogSoftMax())

  if opt.cudnn == 1 then
    cudnn.convert(model, cudnn)
  end

  return model
end
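
For reference, here is a rough, hypothetical sketch of how this builder might be exercised, assuming the ModelBuilder class above is in scope; the opt values and batch shapes are made up, and index 1 is used as padding to match the zeroed lookup row.

require 'nn'
require 'rnn'

-- Hypothetical options, only to exercise the builder above.
opt = {
  cudnn = 0,
  vocab_size = 1000,
  vec_size = 50,
  lstm_hidden_sizes = '{100}',
  dropout_p = 0.5,
  hidden_size = 64,
}

local model = ModelBuilder:make_net(nil) -- w2v is not used in the snippet above

-- Two sides of 8 sentences, 20 tokens each; index 1 is the zeroed padding row.
local left  = torch.LongTensor(8, 20):random(2, opt.vocab_size)
local right = torch.LongTensor(8, 20):random(2, opt.vocab_size)

local logprobs = model:forward({left, right}) -- 8 x 2 log-probabilities
print(logprobs:size())
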
@zhzou2020 (Author) commented Aug 10, 2016

@nicholas-leonard Could you please help me find what is wrong in my code? Thanks!

@JoostvDoorn (Contributor) commented:

From the information you provide, nothing seems to be wrong per se, but the example is not complete. If you need help, first isolate your problem in as little code as possible with (fake) data, and give us a minimal runnable example that reproduces the issue. Alternatively, adapt one of the examples to use your data and see whether you still hit the problem.
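
For example, here is a rough, self-contained sketch of such a check (a shrunken stand-in for the model, fake data, and one manual SGD step; all names and sizes are made up):

require 'nn'
require 'rnn'

-- Tiny stand-in for the real model: shared LSTM encoder plus a classifier.
local vocab, dim, hid = 50, 16, 24
local enc = nn.Sequential()
  :add(nn.LookupTable(vocab, dim))
  :add(nn.SeqLSTM(dim, hid)) -- default layout: seq_len x batch_size x dim
  :add(nn.Select(1, -1))     -- keep the last timestep: batch_size x hid
local model = nn.Sequential()
  :add(nn.ParallelTable()
    :add(enc)
    :add(enc:clone('weight', 'bias', 'gradWeight', 'gradBias')))
  :add(nn.JoinTable(1, 1))
  :add(nn.Linear(hid * 2, 2))
  :add(nn.LogSoftMax())
local crit = nn.ClassNLLCriterion()

local params, grads = model:getParameters()
local before = params:clone()

-- Fake batch: 10 timesteps x 4 sentences per side, random binary labels.
local left   = torch.LongTensor(10, 4):random(1, vocab)
local right  = torch.LongTensor(10, 4):random(1, vocab)
local target = torch.LongTensor(4):random(1, 2)

grads:zero()
local out = model:forward({left, right})
local loss = crit:forward(out, target)
model:backward({left, right}, crit:backward(out, target))
params:add(-0.1, grads) -- one SGD step

-- If the model trains at all, the parameters should have moved.
print(loss, (params - before):abs():max())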

@zhuang-li commented:

Hi. Have you solved this problem? I am trying to do sentence similarity too and am running into the same issue, except that where you use clone to create the second encoder, I just reuse the same LSTM. The results are absurdly bad: the accuracy never improves and the F-measure only reaches about 0.20–0.25.

@zhzou2020 (Author) commented:

My model converges, but it still does not perform well. I assume the model is overfitting, so I am now training it on a larger dataset instead.

@zhuang-li commented:

Yes, I have exactly the same issue! The model converges but the results are pretty bad. The model is, in fact, a common baseline, so I don't believe it should perform this badly. I don't know how large your dataset is; I am using the dataset at http://alt.qcri.org/semeval2015/task1/, which contains 13,000 training instances. But whether I use 100, 8,000, or 13,000 of them, the performance is similarly bad.

@zhzou2020 (Author) commented:

Maybe there is something wrong with the implementation of SeqLSTM; I'll try it with Theano later on.

@zhuang-li commented Aug 17, 2016

Probably, but I implemented the LSTM myself before and got the same problem. Then I switched to this module and saw no improvement. I am quite confused at this point; maybe the problem is the model itself, or just the way I coded it.

@JoostvDoorn (Contributor) commented:

It is probably something other than the SeqLSTM implementation, but you could try the cudnn implementation if you are unsure. It would be very helpful if you could give us something that we can run, though. @deathlee you should definitely clone, otherwise the gradients are not stored.
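
To illustrate the clone point with a minimal, made-up example (a plain Linear stands in for the LSTM encoder):

require 'nn'

-- Illustrative encoder; in this thread it would be the SeqLSTM stack.
local encoder = nn.Linear(10, 5)

-- Siamese setup: the clone shares parameters *and* gradient buffers, so
-- backward() through either branch accumulates into the same gradWeight.
local siamese = nn.ParallelTable()
  :add(encoder)
  :add(encoder:clone('weight', 'bias', 'gradWeight', 'gradBias'))

-- What to avoid: adding the very same instance twice, e.g.
--   nn.ParallelTable():add(encoder):add(encoder)
-- because the second branch's forward/backward overwrites the buffers
-- (output, gradInput) of the first branch.

local out = siamese:forward({torch.randn(3, 10), torch.randn(3, 10)})
print(out[1]:size(), out[2]:size())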

@JoostvDoorn (Contributor) commented Aug 17, 2016

Are you following Mueller et al.? You should probably use CSubTable instead of JoinTable.
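
For what it's worth, a rough sketch of that kind of head (the Manhattan-distance similarity used by Mueller et al.; the hidden size is illustrative, and this would replace the JoinTable/Linear/LogSoftMax part of the model above):

require 'nn'

local hid = 100 -- must match the top LSTM hidden size

-- exp(-||h_left - h_right||_1), in (0, 1], computed per example in the batch.
local head = nn.Sequential()
  :add(nn.CSubTable())   -- element-wise h_left - h_right
  :add(nn.Abs())
  :add(nn.Sum(1, 1))     -- L1 norm per example (nInputDims = 1 for batch mode)
  :add(nn.MulConstant(-1))
  :add(nn.Exp())

-- Takes the {left, right} encodings produced by the siamese encoder:
local sim = head:forward({torch.randn(4, hid), torch.randn(4, hid)})
print(sim) -- 4 similarity scores

If I recall the paper correctly, that score is then regressed against the gold similarity (e.g. with nn.MSECriterion) rather than fed to a two-way softmax.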

@zhuang-li commented:

Hi. I saw that Mueller et al. use a "tied weights" LSTM, so basically I just run the same LSTM twice, once for the left sequence and once for the right. If we clone here, wouldn't the two branches be two separate LSTMs rather than sharing weights?
I also tried to use these two functions (copied from the encoder-decoder example)

-- Pass the final hidden/cell state of the left LSTM into the right LSTM.
function LSTMSim:forwardConnect(llstm, rlstm)
    rlstm.layer.userPrevOutput = llstm.layer.output[self.seq_length]
    rlstm.layer.userPrevCell = llstm.layer.cell[self.seq_length]
end

-- Pass the gradients flowing into the right LSTM's initial state back to the left LSTM.
function LSTMSim:backwardConnect(llstm, rlstm)
    llstm.layer.gradPrevOutput = rlstm.layer.userGradPrevOutput
    llstm.layer.userNextGradCell = rlstm.layer.userGradPrevCell
end

to preserve the state and gradients. Since I use the same LSTM for both sequences, I believe this just passes the previous state and gradient along the same LSTM from head to tail.

I also tried creating two separate LSTMs; that didn't work either.

I'd really like to provide something you can run, if that's not too much trouble; I am currently adding some comments to it.
Thank you.
