Conversation

mehtamansi29 (Collaborator)
The Falcon model converter is missing; this PR adds it. Fixes #1988

import pytest
from unittest import TestCase
from keras_hub.models import FalconCausalLM

class TestTask(TestCase):
    @pytest.mark.large
    def test_convert_tiny_preset(self):
        model = FalconCausalLM.from_preset("hf://tiiuae/falcon-7b")
Member
I don't think we can afford to download this ~15 GB file in our testing setup. You could try the 1B model, or create a small test model on HF, as was done for Llama and others.

Collaborator Author
@mattdangerw - I'll create a small test with the 1B Falcon model and commit again.

@mattdangerw (Member)
@SamanehSaadat can you take a look at the Falcon conversion options here? I remember there were some annoying gotchas (e.g. different tokenizer types) that this might not cover.


    @pytest.mark.large
    def test_class_detection(self):
        model = FalconCausalLM.from_preset("hf://tiiuae/falcon-7b")
Member
Does this work? I think we only have Falcon-1B support! The 7B model has a different attention mechanism which hasn't been added!

Member
We should probably also attach a colab verifying that the outputs from the Hugging Face and KerasHub versions align. And it sounds like we might actually run into differences here due to what @SamanehSaadat is saying.

@SamanehSaadat how much work is needed on the architecture code to support the 7B and other variants? Is it something that could be added here, or a ton to do?

Member
@mattdangerw I think adding support for the 7B is non-trivial. There are some major architectural differences, like alibi, GQA vs. MHA, and rotary embeddings (to me, it's almost like adding a new architecture!).
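For context, the attention variants differ mainly in how many key/value heads the fused query/key/value projection carries. A small illustrative helper (hypothetical, not KerasHub code; the Falcon-7B sizes below are from its published config):

```python
def fused_qkv_width(hidden_dim, num_heads, num_kv_heads):
    """Output width of a fused query/key/value projection.

    MHA: num_kv_heads == num_heads
    MQA: num_kv_heads == 1 (Falcon-7B)
    GQA: 1 < num_kv_heads < num_heads (Falcon-40B style)
    """
    head_dim = hidden_dim // num_heads
    return (num_heads + 2 * num_kv_heads) * head_dim


# Falcon-7B: hidden_dim=4544, 71 query heads, multi-query attention.
print(fused_qkv_width(4544, 71, 1))   # 4672
# The same sizes under full multi-head attention:
print(fused_qkv_width(4544, 71, 71))  # 13632
```

A converter that assumes one variant will produce mismatched weight shapes for the others, which is why these checkpoints need separate handling.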

Member
Thanks! Sounds like we will need to either throw from the converter when we encounter Falcon Hugging Face options we don't currently support, or add them in (in a separate PR?).

@mehtamansi29 we'd probably need a colab verifying that the output matches for some subset of Falcon checkpoints on Hugging Face, and ideally that we throw for Falcon checkpoints that need arch options we don't yet support.
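Such a guard could key off the Hugging Face Falcon config fields (`alibi`, `new_decoder_architecture`, and `multi_query` are real FalconConfig attributes, but this validator itself is only a sketch of the idea, not the converter's actual code):

```python
def validate_falcon_config(hf_config: dict):
    """Reject HF Falcon checkpoints that use options the converter can't map."""
    unsupported = []
    if hf_config.get("alibi", False):
        unsupported.append("alibi positional biases")
    if hf_config.get("new_decoder_architecture", False):
        unsupported.append("new decoder architecture (Falcon-40B-style GQA)")
    if unsupported:
        raise ValueError(
            "Falcon checkpoint uses options not yet supported by the "
            f"converter: {', '.join(unsupported)}"
        )


validate_falcon_config({"multi_query": True})  # supported: passes silently
try:
    validate_falcon_config({"alibi": True})
except ValueError as e:
    print(e)
```

Failing fast with a clear message is friendlier than silently converting weights into a model with the wrong attention layout.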

@mehtamansi29 (Collaborator Author) · Jan 23, 2025
Okay. @mattdangerw - I'll create a colab for verifying that the output matches for some subset of falcon checkpoints on huggingface and share it with you.

@divyashreepathihalli added the kokoro:force-run (Runs Tests on GPU) label Jan 22, 2025
@kokoro-team removed the kokoro:force-run (Runs Tests on GPU) label Jan 22, 2025
@JyotinderSingh (Collaborator)
Hi @mehtamansi29, just checking in on this PR. It looks like we need to add a numerics-verification notebook and swap out the 7B preset for the 1B (along with a test checkpoint for the unit tests).

@mehtamansi29 (Collaborator Author)
Hi @mattdangerw and @JyotinderSingh -

Here is a notebook covering the 7B numerics for the Falcon model; the outputs differ between the Hugging Face and keras_hub models. I'll take another look at the converter to get the numerics right.
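A minimal shape such a numerics check could take (a sketch only: `tiiuae/falcon-rw-1b` is assumed to be the 1B checkpoint, and the exact KerasHub call for raw logits is a guess, so the guarded section is illustrative and only runs when explicitly requested, e.g. in a colab):

```python
import os

import numpy as np


def max_logit_diff(hf_logits, kh_logits):
    """Largest absolute elementwise difference between two logit arrays."""
    return float(np.max(np.abs(np.asarray(hf_logits) - np.asarray(kh_logits))))


# The comparison downloads full checkpoints, so gate it behind an env var.
if os.environ.get("RUN_FALCON_NUMERICS_CHECK"):
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    import keras_hub

    repo = "tiiuae/falcon-rw-1b"  # assumed 1B checkpoint
    tok = AutoTokenizer.from_pretrained(repo)
    ids = tok("The quick brown fox", return_tensors="pt").input_ids
    hf = AutoModelForCausalLM.from_pretrained(repo)
    with torch.no_grad():
        hf_logits = hf(ids).logits[0].numpy()

    kh = keras_hub.models.FalconCausalLM.from_preset(f"hf://{repo}")
    # Assumption: calling the task on preprocessed inputs returns logits.
    kh_logits = kh(
        {"token_ids": ids.numpy(), "padding_mask": np.ones_like(ids.numpy())}
    )[0]
    print("max |logit diff|:", max_logit_diff(hf_logits, kh_logits))
```

Reporting the max absolute logit difference for a few prompts is usually enough to tell a real conversion bug (large, structured differences) from numeric noise.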

@sachinprasadhs sachinprasadhs added the WIP Pull requests which are work in progress and not ready yet for review. label May 1, 2025
@mehtamansi29 (Collaborator Author)
Hi @JyotinderSingh and @mattdangerw - I’ve updated the Falcon converter to include support for both GQA (Grouped Query Attention) and MQA (Multi-Query Attention). With these changes, the converter can now handle weights for both the Falcon 1B and 7B models.

Here is the notebook where both models (Falcon 1B and 7B) load correctly. The total parameters are nearly identical, and the numerics line up as expected.

@sachinprasadhs (Collaborator)

Hi @mehtamansi29, I ran the parameter verification colab. The 1B model matches the numerics, but there is a trainable-parameter mismatch for the 7B model.
Could you please check it? Here is the updated Gist.

Torch trainable parameters: 6,921,720,704
Keras trainable parameters: 6,923,033,472
Difference: 1,312,768
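One way to localize a mismatch like the 1,312,768 extra parameters above is to diff per-tensor element counts instead of totals. A hypothetical helper over `{name: count}` dicts (names must already be aligned between the two models, e.g. via the converter's weight-name mapping):

```python
def diff_param_counts(torch_counts, keras_counts):
    """Return tensors whose element counts disagree between two models.

    Both arguments map a tensor name to its number of elements; a name
    missing on one side is treated as having 0 parameters there.
    """
    mismatches = {}
    for name in torch_counts.keys() | keras_counts.keys():
        t = torch_counts.get(name, 0)
        k = keras_counts.get(name, 0)
        if t != k:
            mismatches[name] = (t, k)
    return mismatches


# Toy example: a bias present in one model but not the other.
print(diff_param_counts(
    {"attn.qkv": 21229568, "attn.out": 20648704},
    {"attn.qkv": 21229568, "attn.out": 20648704, "attn.out_bias": 4544},
))  # {'attn.out_bias': (0, 4544)}
```

For Falcon-7B, a per-tensor residue like this would quickly show whether the extra Keras parameters are, say, biases or layernorm weights that the torch checkpoint doesn't have.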

Labels: WIP (pull requests which are work in progress and not ready yet for review)
Linked issue: Missing Transformers initializer for Falcon models
7 participants