{
  "title": "InfiX-AI",
  "description": "InfiX.ai Team Blog - GenAI for all, Intelligence in every task",
  "baseURL": "http://infixai.github.io/",
  "pages": [
    {
      "title": "InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization",
      "url": "http://infixai.github.io/research/infigui-g1/",
      "date": "2025-08-05",
      "description": "We introduce InfiGUI-G1, a multimodal GUI agent that employs Adaptive Exploration Policy Optimization (AEPO) to improve semantic alignment in GUI grounding, achieving up to 8.3% relative improvement over baseline methods.",
      "tags": ["GUI Agent","Multimodal LLM","Reinforcement Learning","Computer Vision","AI Research"],
      "categories": ["Research","AI","Machine Learning"]
    },
    {
      "title": "InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners",
      "url": "http://infixai.github.io/research/infigui-r1/",
      "date": "2025-04-19",
      "description": "We present InfiGUI-R1, a novel GUI agent that combines spatial reasoning with reinforcement learning to achieve superior performance in GUI automation tasks across desktop, mobile, and web platforms.",
      "tags": ["GUI Agent","Multimodal LLM","Reinforcement Learning","Computer Vision","AI Research"],
      "categories": ["Research","AI","Machine Learning"]
    },
    {
      "title": "InfiFPO: Implicit Model Fusion via Preference Optimization in Large Language Models",
      "url": "http://infixai.github.io/research/infifpo/",
      "date": "2025-05-20",
      "description": "\u003cp\u003eWe propose \u003cstrong\u003eInfiFPO\u003c/strong\u003e, a principled and efficient framework for performing model fusion during the preference alignment phase. Our key insight is that the reference model in preference optimization (e.g., in DPO) can be replaced with a fused source model, thereby enabling the pivot model to learn not only from preference data but also from the probabilistic behaviors of multiple source models.\u003c/p\u003e\n\u003cp\u003e\u003cimg src=\"assets/exp.png\" alt=\"InfiFPO\"\u003e\u003c/p\u003e\n\u003cp\u003eComprehensive experiments on 11 widely-used benchmarks demonstrate that \u003cstrong\u003eInfiFPO\u003c/strong\u003e consistently outperforms existing model fusion and preference optimization methods. When using Phi-4 as the pivot model, \u003cstrong\u003eInfiFPO\u003c/strong\u003e improve its average performance from 79.95 to 83.33 on 11 benchmarks, significantly improving its capabilities in mathematics, coding, and reasoning tasks.\u003c/p\u003e",
      "tags": ["Model Fusion","Preference Optimization","Direct Preference Optimization","Large Language Models","Alignment"],
      "categories": null
    },
    {
      "title": "InfiGFusion: Graph-on-Logits Distillation via Efficient Gromov-Wasserstein for Model Fusion",
      "url": "http://infixai.github.io/research/infigfusion/",
      "date": "2025-05-20",
      "description": "\u003cp\u003e\u003cstrong\u003eInfiGFusion\u003c/strong\u003e is the first structure-aware fusion framework for large language models that models semantic dependencies among logits using feature-level graphs. We introduce a novel Graph-on-Logits Distillation (GLD) loss that captures cross-dimension interactions via co-activation graphs and aligns them using an efficient, provable approximation of Gromov-Wasserstein distance (reducing complexity from O(n^4) to O(nlogn)). Our released \u003cstrong\u003eInfiGFusion-14B\u003c/strong\u003e model consistently shows better performance, achieving +35.6 on Multistep Arithmetic and +37.06 on Causal Judgement over SFT, demonstrating superior multi-step and complex logic inference.\u003c/p\u003e",
      "tags": ["Graph Neural Networks","Knowledge Distillation","Gromov-Wasserstein","Model Fusion","Large Language Models"],
      "categories": null
    },
    {
      "title": "InfiFusion: A Unified Framework for Enhanced Cross-Model Reasoning via LLM Fusion",
      "url": "http://infixai.github.io/research/infifusion/",
      "date": "2025-02-17",
      "description": "\u003cp\u003e\u003cstrong\u003eInfiFusion\u003c/strong\u003e is the first fusion framework for large language models that fuse up to 4 models with 14B~24B parameters. We introduce a unified framework which can fuse many heterogeneous models\nin one distillation stage. InfiFusion outperforms the state-of-the-art models, such as Qwen-2.5-14B-Instruct and Phi-4, across 11 widely applied benchmarks covering reasoning, coding, mathematics, and instruction-following tasks. Notably, InfiFusion achieves this superior performance while significantly reduces\ncomputational costs, completing full training with only 160 H800 GPU hours compared to the millions typically required for traditional LLM training.\u003c/p\u003e",
      "tags": ["Unified Fusion","Knowledge Distillation","Model Fusion","Large Language Models"],
      "categories": null
    },
    {
      "title": "InfiGUI-G1-3B",
      "url": "http://infixai.github.io/huggingface.co/InfiX-ai/InfiGUI-G1-3B",
      "date": "2025-08-11",
      "description": "A novel policy optimization framework for multimodal large language models that addresses semantic alignment challenges in GUI grounding.",
      "tags": ["GUI agent","multimodal","planning","error recovery"],
      "categories": null
    },
    {
      "title": "InfiGUI-G1-7B",
      "url": "http://infixai.github.io/huggingface.co/InfiX-ai/InfiGUI-G1-7B",
      "date": "2025-08-11",
      "description": "A novel policy optimization framework for multimodal large language models that addresses semantic alignment challenges in GUI grounding.",
      "tags": ["GUI agent","multimodal","planning","error recovery"],
      "categories": null
    },
    {
      "title": "android_control_test",
      "url": "http://infixai.github.io/huggingface.co/datasets/InfiX-ai/android_control_test",
      "date": "2025-07-31",
      "description": "Test dataset for evaluating Android control models, complementing the training set with unseen scenarios to assess generalization and performance of automation agents on Android interfaces.",
      "tags": ["android","control","test data","evaluation"],
      "categories": null
    },
    {
      "title": "android_control_train",
      "url": "http://infixai.github.io/huggingface.co/datasets/InfiX-ai/android_control_train",
      "date": "2025-07-31",
      "description": "Training dataset for Android control tasks, likely containing interaction data, command sequences, or GUI operation logs to support the development of Android-based automation agents.",
      "tags": ["android","control","training data","GUI interaction"],
      "categories": null
    },
    {
      "title": "InfiGUIAgent-Data",
      "url": "http://infixai.github.io/huggingface.co/datasets/InfiX-ai/InfiGUIAgent-Data",
      "date": "2025-07-31",
      "description": "Specialized dataset for training InfiGUIAgent models, containing multimodal data (e.g., GUI screenshots, user actions, task descriptions) to enable robust GUI task automation and reasoning.",
      "tags": ["GUI agent","multimodal","task automation","agent training"],
      "categories": null
    },
    {
      "title": "s1K-1.1-850",
      "url": "http://infixai.github.io/huggingface.co/datasets/InfiX-ai/s1K-1.1-850",
      "date": "2025-07-31",
      "description": "An updated version (1.1) of a small dataset, likely containing 850 samples (inferred from '850'). May extend or refine the content of 's1K-QwQ' for more targeted model training.",
      "tags": ["small dataset","updated version","refined data"],
      "categories": null
    },
    {
      "title": "s1K-QwQ",
      "url": "http://infixai.github.io/huggingface.co/datasets/InfiX-ai/s1K-QwQ",
      "date": "2025-07-31",
      "description": "A dataset with 's1K' (likely 1,000 samples) in its name, potentially focused on question-answer pairs, dialogue, or reasoning tasks, supporting model training in conversational or logical reasoning abilities.",
      "tags": ["small dataset","question-answer","reasoning"],
      "categories": null
    },
    {
      "title": "Infi-MMR-3B",
      "url": "http://infixai.github.io/huggingface.co/InfiX-ai/Infi-MMR-3B",
      "date": "2025-07-31",
      "description": "A multimodal model developed via the InfiMMR three-phase curriculum framework, enhancing multimodal reasoning capabilities in small language models.",
      "tags": ["multimodal","reasoning","small language model"],
      "categories": null
    },
    {
      "title": "InfiFPO-14B",
      "url": "http://infixai.github.io/huggingface.co/InfiX-ai/InfiFPO-14B",
      "date": "2025-07-31",
      "description": "A lightweight fusion method during the preference alignment phase that injects fused model behavior into preference learning.",
      "tags": ["preference optimization","DPO","model fusion"],
      "categories": null
    },
    {
      "title": "InfiFusion-14B",
      "url": "http://infixai.github.io/huggingface.co/InfiX-ai/InfiFusion-14B",
      "date": "2025-07-31",
      "description": "A logit-level fusion pipeline based on Universal Logit Distillation, enhanced with Top-K filtering and logits standardization. Supports both pairwise and unified fusion strategies to balance performance and efficiency.",
      "tags": ["model fusion","logit distillation","cross-model reasoning"],
      "categories": null
    },
    {
      "title": "InfiGFusion-14B",
      "url": "http://infixai.github.io/huggingface.co/InfiX-ai/InfiGFusion-14B",
      "date": "2025-07-31",
      "description": "A structure-aware extension that builds co-activation graphs from logits and aligns them via an efficient Gromov-Wasserstein loss approximation.",
      "tags": ["model fusion","graph-on-logits","Gromov-Wasserstein loss"],
      "categories": null
    },
    {
      "title": "InfiGUI-R1-3B",
      "url": "http://infixai.github.io/huggingface.co/InfiX-ai/InfiGUI-R1-3B",
      "date": "2025-07-31",
      "description": "A GUI agent developed via the Actor2Reasoner framework, evolving a reactive model into a deliberative reasoner through spatial reasoning distillation and reinforcement learning.",
      "tags": ["GUI agent","multimodal","planning","error recovery"],
      "categories": null
    },
    {
      "title": "InfiGUIAgent-2B-Stage1",
      "url": "http://infixai.github.io/huggingface.co/InfiX-ai/InfiGUIAgent-2B-Stage1",
      "date": "2025-07-31",
      "description": "A multimodal generalist GUI agent with native hierarchical and expectation-reflection reasoning through a unique two-stage supervised pipeline.",
      "tags": ["GUI agent","multimodal","task automation"],
      "categories": null
    },
    {
      "title": "InfiR-1B-Base",
      "url": "http://infixai.github.io/huggingface.co/InfiX-ai/InfiR-1B-Base",
      "date": "2025-07-31",
      "description": "Part of the InfiR reasoning-enhanced low-resource training pipeline, crafted to be an effective small language model with improved reasoning.",
      "tags": ["small language model","reasoning","low-resource training"],
      "categories": null
    },
    {
      "title": "InfiR-1B-Instruct",
      "url": "http://infixai.github.io/huggingface.co/InfiX-ai/InfiR-1B-Instruct",
      "date": "2025-07-31",
      "description": "An instructed version of the InfiR small language model, part of the reasoning-enhanced low-resource training pipeline.",
      "tags": ["small language model","reasoning","instruct"],
      "categories": null
    },
    {
      "title": "InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection",
      "url": "http://infixai.github.io/research/infiguiagent/",
      "date": "2025-01-09",
      "description": "A multimodal large language model-based GUI agent that enables enhanced task automation on computing devices through hierarchical reasoning and expectation-reflection reasoning.",
      "tags": ["GUI Agent","Multimodal","Large Language Model","Computer Vision","Automation"],
      "categories": ["Research","AI Agents","Human-Computer Interaction"]
    }
  ]
}