Conversation

mehtamansi29 (Collaborator)
The Falcon model converter is missing; this PR adds it. Fixes #1988

import pytest
from unittest import TestCase
from keras_hub.models import FalconCausalLM

class TestTask(TestCase):
    @pytest.mark.large
    def test_convert_tiny_preset(self):
        model = FalconCausalLM.from_preset("hf://tiiuae/falcon-7b")
Member
I don't think we can afford to download this ~15 GB file in our testing setup. You could try the 1B model, or create a small test model on HF, as was done for Llama and others.

Collaborator Author
@mattdangerw - I'll create a small test with the 1B Falcon model and commit again.

@mattdangerw (Member)
@SamanehSaadat can you take a look at the Falcon conversion options here? I remember there were some annoying gotchas (e.g. different tokenizer types) that this might not cover.


    @pytest.mark.large
    def test_class_detection(self):
        model = FalconCausalLM.from_preset("hf://tiiuae/falcon-7b")
Member
Does this work? I think we only have Falcon-1B support! The 7B model has a different attention mechanism which hasn't been added!

Member
We should probably also attach a colab verifying that the outputs from the Hugging Face and KerasHub versions align. And it sounds like we might actually run into differences here due to what @SamanehSaadat is saying.

@SamanehSaadat how much work is needed on the architecture code to support the 7B and other variants? Is it something that could be added here, or a ton to do?

Member
@mattdangerw I think adding support for the 7B is non-trivial. There are some major architectural differences, like alibi, GQA vs. MHA, and rotary embeddings (to me, it's almost like adding a new architecture!).
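For context, the attention variants differ mainly in how many key/value heads the fused query/key/value projection carries. A small illustrative helper (hypothetical, not KerasHub code; the Falcon-7B sizes below are from its published config):

```python
def fused_qkv_width(hidden_dim, num_heads, num_kv_heads):
    """Output width of a fused query/key/value projection.

    MHA: num_kv_heads == num_heads
    MQA: num_kv_heads == 1 (Falcon-7B)
    GQA: 1 < num_kv_heads < num_heads (Falcon-40B style)
    """
    head_dim = hidden_dim // num_heads
    return (num_heads + 2 * num_kv_heads) * head_dim


# Falcon-7B: hidden_dim=4544, 71 query heads, multi-query attention.
print(fused_qkv_width(4544, 71, 1))   # 4672
# The same sizes under full multi-head attention:
print(fused_qkv_width(4544, 71, 71))  # 13632
```

A converter that assumes one variant will produce mismatched weight shapes for the others, which is why these checkpoints need separate handling.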

Member
Thanks! Sounds like we will need to either throw from the converter when we encounter Falcon Hugging Face options we don't currently support, or add them in (in a separate PR?).

@mehtamansi29 we'd probably need a colab verifying that the output matches for some subset of Falcon checkpoints on Hugging Face, and ideally that we throw for Falcon checkpoints that need arch options we don't yet support.
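Such a guard could key off the Hugging Face Falcon config fields (`alibi`, `new_decoder_architecture`, and `multi_query` are real FalconConfig attributes, but this validator itself is only a sketch of the idea, not the converter's actual code):

```python
def validate_falcon_config(hf_config: dict):
    """Reject HF Falcon checkpoints that use options the converter can't map."""
    unsupported = []
    if hf_config.get("alibi", False):
        unsupported.append("alibi positional biases")
    if hf_config.get("new_decoder_architecture", False):
        unsupported.append("new decoder architecture (Falcon-40B-style GQA)")
    if unsupported:
        raise ValueError(
            "Falcon checkpoint uses options not yet supported by the "
            f"converter: {', '.join(unsupported)}"
        )


validate_falcon_config({"multi_query": True})  # supported: passes silently
try:
    validate_falcon_config({"alibi": True})
except ValueError as e:
    print(e)
```

Failing fast with a clear message is friendlier than silently converting weights into a model with the wrong attention layout.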

@mehtamansi29 (Collaborator Author) · Jan 23, 2025
Okay. @mattdangerw - I'll create a colab for verifying that the output matches for some subset of falcon checkpoints on huggingface and share it with you.

@divyashreepathihalli added the kokoro:force-run (Runs Tests on GPU) label Jan 22, 2025
@kokoro-team removed the kokoro:force-run (Runs Tests on GPU) label Jan 22, 2025
@JyotinderSingh (Collaborator)
Hi @mehtamansi29, just checking in on this PR. It looks like we need to add a numerics-verification notebook and swap out the 7B preset for the 1B (along with a test checkpoint for the unit tests).

@mehtamansi29 (Collaborator Author)
Hi @mattdangerw and @JyotinderSingh -

Here is a notebook covering the 7B numerics for the Falcon model; the outputs differ between the Hugging Face and keras_hub models. I'll take another look at the converter to get the numerics right.
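A minimal shape such a numerics check could take (a sketch only: `tiiuae/falcon-rw-1b` is assumed to be the 1B checkpoint, and the exact KerasHub call for raw logits is a guess, so the guarded section is illustrative and only runs when explicitly requested, e.g. in a colab):

```python
import os

import numpy as np


def max_logit_diff(hf_logits, kh_logits):
    """Largest absolute elementwise difference between two logit arrays."""
    return float(np.max(np.abs(np.asarray(hf_logits) - np.asarray(kh_logits))))


# The comparison downloads full checkpoints, so gate it behind an env var.
if os.environ.get("RUN_FALCON_NUMERICS_CHECK"):
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    import keras_hub

    repo = "tiiuae/falcon-rw-1b"  # assumed 1B checkpoint
    tok = AutoTokenizer.from_pretrained(repo)
    ids = tok("The quick brown fox", return_tensors="pt").input_ids
    hf = AutoModelForCausalLM.from_pretrained(repo)
    with torch.no_grad():
        hf_logits = hf(ids).logits[0].numpy()

    kh = keras_hub.models.FalconCausalLM.from_preset(f"hf://{repo}")
    # Assumption: calling the task on preprocessed inputs returns logits.
    kh_logits = kh(
        {"token_ids": ids.numpy(), "padding_mask": np.ones_like(ids.numpy())}
    )[0]
    print("max |logit diff|:", max_logit_diff(hf_logits, kh_logits))
```

Reporting the max absolute logit difference for a few prompts is usually enough to tell a real conversion bug (large, structured differences) from numeric noise.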

@sachinprasadhs sachinprasadhs added the WIP Pull requests which are work in progress and not ready yet for review. label May 1, 2025
@mehtamansi29 (Collaborator Author)
Hi @JyotinderSingh and @mattdangerw - I’ve updated the Falcon converter to include support for both GQA (Grouped Query Attention) and MQA (Multi-Query Attention). With these changes, the converter can now handle weights for both the Falcon 1B and 7B models.

Here is the notebook where both models (Falcon 1B and 7B) load correctly. The total parameters are nearly identical, and the numerics line up as expected.

@sachinprasadhs (Collaborator)

Hi @mehtamansi29, I ran the parameter verification colab. The 1B model matches the numerics, but there is a trainable-parameter mismatch for the 7B model.
Could you please check it? Here is the updated Gist.

Torch trainable parameters: 6,921,720,704
Keras trainable parameters: 6,923,033,472
Difference: 1,312,768
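One way to localize a mismatch like the 1,312,768 extra parameters above is to diff per-tensor element counts instead of totals. A hypothetical helper over `{name: count}` dicts (names must already be aligned between the two models, e.g. via the converter's weight-name mapping):

```python
def diff_param_counts(torch_counts, keras_counts):
    """Return tensors whose element counts disagree between two models.

    Both arguments map a tensor name to its number of elements; a name
    missing on one side is treated as having 0 parameters there.
    """
    mismatches = {}
    for name in torch_counts.keys() | keras_counts.keys():
        t = torch_counts.get(name, 0)
        k = keras_counts.get(name, 0)
        if t != k:
            mismatches[name] = (t, k)
    return mismatches


# Toy example: a bias present in one model but not the other.
print(diff_param_counts(
    {"attn.qkv": 21229568, "attn.out": 20648704},
    {"attn.qkv": 21229568, "attn.out": 20648704, "attn.out_bias": 4544},
))  # {'attn.out_bias': (0, 4544)}
```

For Falcon-7B, a per-tensor residue like this would quickly show whether the extra Keras parameters are, say, biases or layernorm weights that the torch checkpoint doesn't have.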

Labels: WIP (pull requests which are work in progress and not ready yet for review)
Linked issue: Missing Transformers initializer for Falcon models
7 participants