
Multiple object recognition with visual attention #134

Open
pefi9 opened this issue Feb 8, 2016 · 20 comments

@pefi9

pefi9 commented Feb 8, 2016

Hi,

I am trying to use the recurrent attention model for multiple object recognition (http://arxiv.org/pdf/1412.7755v2.pdf). Would you have any suggestions on how to do it?

@pefi9 pefi9 changed the title MULTIPLE OBJECT RECOGNITION WITH VISUAL ATTENTION Multiple object recognition with visual attention Feb 8, 2016
@nicholas-leonard
Member

@pefi9 Use the recurrent-visual-attention.lua script as a starting point. Build a dataset (without dp) where the input is an image and the output is a sequence of targets (with locations?). You should still be able to use most of the modules from the original script, but you will need to assemble them differently and create a different one for the first time-step. If you need help, fork this repo and make a multiple-object-recognition.lua script. Create a branch and open a pull request here with "(work in progress)" in the title. Or you could create your own repository. In any case, by making it open source, we can work on it together.
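For illustration, a minimal sketch of what such a dataset could look like without dp (all names and shapes here are hypothetical, not from the repo):

require 'torch'

-- toy data: 8 images of 1 x 32 x 64, 2 digits per image, 10 classes
local inputs = torch.randn(8, 1, 32, 64)
local targets = torch.LongTensor(8, 2):random(1, 10)

local dataset = {inputs = inputs, targets = targets}

-- return a mini-batch given a LongTensor of sample indices
function dataset:batch(indices)
   return self.inputs:index(1, indices), self.targets:index(1, indices)
end

local input, target = dataset:batch(torch.LongTensor{1, 2, 3})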

@pefi9
Author

pefi9 commented Feb 10, 2016

Thanks @nicholas-leonard.
I created a GitHub repo with the code: https://github.com/pefi9/mo_va
(This version has padding between the digits, and the glimpse size is intentionally small, so I can validate whether the network can learn to move from the left part to the right part.)

I tried two approaches:

  1. Comment out lines 38-39 in RecurrentAttention.lua so that the attention module does not forget after the first digit. However, I was not able to make it run, as the number of steps of the rnn and attention modules inside the recurrent attention did not match. Even though I set rho = 5 for the rnn, after analyzing the second digit the rnn's step count was 10 while the attention's was 5.

  2. Set rho = (# of glimpses for one digit) * (# of digits), so that the recurrent attention model remembers the whole history of one image. For this solution I removed line 154 in recurrent-visual-attention.lua (nn.SelectTable(-1)), as I want to output more than just one table. To be specific, I want to forward- and backward-propagate only every x-th (e.g. 5th) output of the recurrent attention module. In addition, according to the paper, I want to back-propagate only for digits where the previous digit was correctly classified. This is handled on lines 102-129 in 4_train.lua.

It seems to be learning, but the performance is not excellent. I'm sure I still have some mistakes in there.
Is it possible to adjust it for a variable number of digits? I can't think of a solution at the moment.

@nicholas-leonard
Member

@pefi9 I don't think you should need to modify RecurrentAttention. Say you want to detect n objects per image; then formulate the problem as giving rho/n steps per object. So for 2 objects, you could assign rho = 10 such that an object is identified every 5 time-steps.

You should build a MultiObjectReward criterion for doing https://github.com/pefi9/mo_va/blob/multi_digit_development/4_train.lua#L102-L129 (of course, you will still need a loop over the n objects to update the ConfusionMatrix). Why build a criterion? So you can unit test it. Also, the current implementation only allows one call to reinforce() per batch, as a single reward is expected; calling reinforce(reward) n times per batch (once per object) would only use the last reward.

So yeah, I think you should build a MultiObjectReward criterion and include some unit tests so that it behaves as expected.
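A rough skeleton of what such a criterion could look like (just a sketch: it assumes the input is a table with one classifier output per object, omits the variance-reducing baseline of VRClassReward, and leaves open how the rewards are routed to the right time-steps):

require 'nn'
require 'dpnn'

-- input  : table of n classifier outputs, each batchSize x nClasses
-- target : batchSize x n LongTensor of class indices
local MultiObjectReward, parent = torch.class("nn.MultiObjectReward", "nn.Criterion")

function MultiObjectReward:__init(module, scale)
   parent.__init(self)
   self.module = module -- module whose stochastic units receive the reward
   self.scale = scale or 1
end

function MultiObjectReward:updateOutput(input, target)
   self.rewards = {}
   for i = 1, #input do
      local _, pred = input[i]:max(2) -- argmax over classes
      local correct = pred:squeeze(2):eq(target:select(2, i))
      self.rewards[i] = correct:double():mul(self.scale) -- 1 if correct, else 0
   end
   self.output = 0 -- the reward itself is not a differentiable loss
   return self.output
end

function MultiObjectReward:updateGradInput(input, target)
   -- passing a table of rewards needs the per-step reward support discussed below
   self.module:reinforce(self.rewards)
   -- classification gradients come from a separate criterion (e.g. ClassNLLCriterion)
   self.gradInput = {}
   for i = 1, #input do
      self.gradInput[i] = input[i].new():resizeAs(input[i]):zero()
   end
   return self.gradInput
end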

Also, you should be able to use the original RecurrentAttention without modification, with rho = (# of glimpses for one digit) * (# of digits) as you said. To select only the n (# of digits) outputs, use something like:

concat = nn.ConcatTable():add(nn.SelectTable(n)):add(nn.SelectTable(n*2))...:add(nn.SelectTable(-1))
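Assuming rho = nDigits * stepsPerDigit, the same selection can be built in a loop, e.g. (sketch):

require 'nn'

local nDigits, stepsPerDigit = 3, 5 -- example values

-- pick the output at the last glimpse of each digit out of the
-- table of rho outputs produced by RecurrentAttention
local concat = nn.ConcatTable()
for i = 1, nDigits do
   concat:add(nn.SelectTable(i * stepsPerDigit))
end
-- forwarding a table of rho entries now yields a table of nDigits entries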

@pefi9
Author

pefi9 commented Feb 17, 2016

@nicholas-leonard, I had time to look at it today. I tried to handle the step-wise reward by implementing https://github.com/Element-Research/rnn/blob/master/Sequencer.lua#L144-L146, but as RecurrentAttention wraps the locator into a Recursor, there is an issue: "Sequencer.lua:37: expecting input table". So I created MOReinforce and MOReinforceNormal, where the first returns the reward for a specific step and the second keeps track of the current step.
There is a MORewardCriterion as well, which should replace VRClassReward, but putting the gradInputs into the correct form is ... perhaps it would be easier to not use ParallelCriterion at all and use only something like the MOReward.
Or would you have some other (more elegant) idea how to solve it?

@nicholas-leonard
Member

@pefi9 Sorry, I had a bad cold these past days. So I think we should modify AbstractSequencer to accept tables of rewards (one per time-step).

@nicholas-leonard
Member

@pefi9 I have modified AbstractRecurrent to handle tables of rewards: 417f8df. Basically, you shouldn't need MOReinforce and MOReinforceNormal anymore. Instead, make sure that your MORewardCriterion calls module:reinforce(rewards), where rewards is a table of the same length as its input, so that it provides one reward per time-step.
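For example, the criterion could assemble the table like this (a sketch; rewardForDigit is a hypothetical stand-in for whatever per-object reward you compute, and model is the RecurrentAttention-based network):

-- example values; one reward tensor per time-step, zero on glimpse-only steps
local batchSize, nDigits, stepsPerDigit = 32, 2, 5
local rho = nDigits * stepsPerDigit

local rewards = {}
for t = 1, rho do
   rewards[t] = torch.zeros(batchSize)
end
for i = 1, nDigits do
   rewards[i * stepsPerDigit] = rewardForDigit(i) -- hypothetical helper
end
model:reinforce(rewards)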

@pefi9
Author

pefi9 commented Feb 25, 2016

@nicholas-leonard No worries, hope you are well now. I had a couple of errors in the code; I'll update the GitHub version tomorrow. It works fine for a single object, but it takes a lot of time to train for multiple digits.

A modification I have not tackled yet is enabling recognition of sequences of variable length. I'm not sure whether that is even possible with the current version of RecurrentAttention?

@pefi9
Author

pefi9 commented Feb 25, 2016

Thanks for the update.

@nicholas-leonard
Member

@pefi9 For variable length sequences, you could add a terminate class. When this class is predicted, regardless of position, it means that the model has found all instances. If your longest sequence has length n, then you should let your model detect n+1 objects. The +1 is so it can always learn to detect the terminate class/object at the end of the sequence.
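At test time the predictions can then be decoded per sample by stopping at the terminate class, e.g. (sketch):

local TERMINATE = 11 -- e.g. 10 digit classes + 1 terminate class

-- outputs: table of per-object predictions for one sample,
-- each a 1 x nClasses tensor of class scores
local function decode(outputs)
   local sequence = {}
   for i = 1, #outputs do
      local _, pred = outputs[i]:max(2)
      if pred[1][1] == TERMINATE then break end
      table.insert(sequence, pred[1][1])
   end
   return sequence
end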

@pefi9
Author

pefi9 commented Feb 26, 2016

@nicholas-leonard With the new AbstractRecurrent I've got this error:

...orch/install/share/lua/5.1/rnn/AbstractSequencer.lua:4: DEPRECATED 27 Oct 2015. Wrap your internal modules into a Recursor instead
stack traceback:
        ...petrfiala/torch/install/share/lua/5.1/trepl/init.lua:500: in function <...petrfiala/torch/install/share/lua/5.1/trepl/init.lua:493>
        [C]: in function 'error'
        ...orch/install/share/lua/5.1/rnn/AbstractSequencer.lua:4: in function 'getStepModule'
        ...orch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:162: in function 'reinforce'
        ...etrfiala/torch/install/share/lua/5.1/dpnn/Module.lua:598: in function 'reinforce'
        MORewardCriterion_table.lua:111: in function 'updateGradInput'

I assume it's caused by RecurrentAttention. By changing its parent to nn.Container I got a different error:

...rfiala/torch/install/share/lua/5.1/rnn/Sequencer.lua:145: Sequencer Error : step-wise rewards not yet supported

Would it be sufficient to replace
https://github.com/Element-Research/rnn/blob/master/Sequencer.lua#L143-L148
with

function Sequencer:reinforce(reward)
    return parent.reinforce(self, reward)
end

?

@nicholas-leonard
Member

@pefi9 I just removed that check in the latest commit. As for the first error, I'm not sure how that is happening.

@pefi9
Author

pefi9 commented Feb 29, 2016

@nicholas-leonard I've got 2 findings:

  1. When Sequencer is used (to wrap a non-recurrent module), it gives the error I mentioned in the previous comment. The reason is https://github.com/Element-Research/rnn/blob/master/AbstractRecurrent.lua#L162: it calls AbstractSequencer:getStepModule, which is deprecated. Which decorator shall I use for classifier and concat2 (https://github.com/pefi9/mo_va/blob/multi_digit_development/2_model_VA.lua#L128-L132)?

  2. When model:reinforce(reward) goes through the selection tables, the reward is not filled with zero tables for the other indexes, as happens in updateGradInput. I'll adjust the backward method in MOCriterion accordingly, or would you rather make changes in nn.SelectTable?

@vyouman

vyouman commented Mar 21, 2016

@pefi9 Hi, I'm also going to implement the DRAM model to apply it to some real-world images. Have you got the problems solved? Do you think it is possible to use ReinforceNormal, Reinforce and RecurrentAttention without any modifications and just write a new Criterion to get the time-step reward now? Thanks.

@pefi9
Author

pefi9 commented Mar 22, 2016

Hi @vyouman, yes, it should be possible. However, we did not solve the first point in my previous comment. The workaround I used is to change the parent class of RecurrentAttention from "nn.AbstractSequencer" to "nn.Container".
I was only able to train it for 2 digits (objects), not more, so we decided to use just a simple CNN with multiple classifiers on the output, and the MOCriterion has stayed in the development phase.

@vyouman

vyouman commented Mar 23, 2016

@pefi9 Thanks for your patient reply. :p I wonder if you have any idea how to handle sequences of variable length. To be clear, say the longest sequence in the dataset is D, and there are samples of different lengths in one batch, but the longest sequence in a single batch may be shorter than D. Does it help to write a terminate class? I'm kind of confused about the solution for variable-length sequences.

@pefi9
Author

pefi9 commented Mar 23, 2016

@vyouman, I had the same question (Nicholas' answer from Feb 12). It's not possible at the moment. You have to define the maximum number of objects (the length of the sequence) and the number of glimpses taken in advance (I did: https://github.com/pefi9/mo_va/blob/multi_digit_development/2_model_VA.lua#L122-L126, where opt.digits is the max length and opt.steps is the # of glimpses taken per object/digit). It would be a nice feature to have, but I can't think of any easy extension of the current code that would enable it.

@nicholas-leonard
Member

You could add padding. Specifically, you add dummy classes at the end of the target sequence that mean "END OF SEQUENCE".
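For example, assuming the raw targets are variable-length LongTensors (sketch):

local EOS = 11 -- dummy "END OF SEQUENCE" class, e.g. 10 digit classes + 1

-- pad a table of variable-length target tensors to a fixed length
local function padTargets(targets, maxLen)
   local padded = torch.LongTensor(#targets, maxLen):fill(EOS)
   for i, t in ipairs(targets) do
      padded[i]:narrow(1, 1, t:size(1)):copy(t)
   end
   return padded
end

-- {3,7} and {1} both become length-3 sequences ending in EOS
local padded = padTargets({torch.LongTensor{3, 7}, torch.LongTensor{1}}, 3)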

@ssampang

@nicholas-leonard I've come across the same problem @pefi9 faced: RecurrentAttention's error when getStepModule is called. Shall I change the parent class as they did?

Until now I was using a custom reinforce method for the Recursor module that essentially did the same thing, but I think it'd be better to delete my code and use what's built into this library.

@vyouman

vyouman commented Apr 14, 2016

@nicholas-leonard Yeah, I've also encountered the problem @pefi9 and @ssampang came across, because of the deprecated getStepModule of the AbstractSequencer. Changing the parent class just doesn't work. I'm trying to implement the Deep Recurrent Attention Model, and my reward is a table.

/home/vyouman/torch/install/bin/luajit: ...an/torch/install/share/lua/5.1/rnn/AbstractSequencer.lua:4: DEPRECATED 27 Oct 2015. Wrap your internal modules into a Recursor instead
stack traceback:
    [C]: in function 'error'
    ...an/torch/install/share/lua/5.1/rnn/AbstractSequencer.lua:4: in function 'getStepModule'
    ...an/torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:177: in function 'reinforce'
    /home/vyouman/torch/install/share/lua/5.1/dpnn/Module.lua:586: in function 'reinforce'
    ...-linux.gtk.x86_64/workspace/DRAM/src/VRCaptionReward.lua:53: in function 'backward'
    ...product-linux.gtk.x86_64/workspace/DRAM/src/testRAEx.lua:171: in main chunk
    [C]: at 0x00406670

@nicholas-leonard
Member

@pefi9 @ssampang As mentioned in #210, I think @vyouman identified the problem. The latest commit should fix it. Let me know if there are any further issues.
