feat: support audio modality input, add voice input and voice attachment bubbles #453
Draft
luosc wants to merge 8 commits into Chevey339:master
luosc (Contributor, Author):
893cef7 is stable; cherry-picking it is welcome.
luosc (Contributor, Author):
0b5ccc8 adds iOS support (confirmed). Tested on iPadOS; not tested on iPhone.
luosc (Contributor, Author):
d2bec61 adds the desktop logic; tested on macOS. On macOS, press-and-hold Cmd+Shift+R is bound to recording by default.
luosc (Contributor, Author):
0d1bd3d adds Linux support; parecord must be installed on the system.
luosc (Contributor, Author):
895926d adds recording playback with a UI that shows playback progress; tested on Android, macOS, and iPadOS.
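luosc (Contributor, Author):
@Chevey339 Stage 2 is complete, so this PR can be merged: Gemini support is now feature-complete. The remaining work, determining which models support the audio modality and extending audio input to other APIs, can be done in a second PR.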
luosc (Contributor, Author):
While waiting, I'll keep pushing this forward a bit.
Add Gemini-native mobile voice input with press-and-hold recording, haptic feedback, drag-to-cancel keep zone, and localized voice attachment bubbles that show user voice duration instead of raw file names.

Files changed:
- android/app/src/main/AndroidManifest.xml: add RECORD_AUDIO permission for mobile voice input.
- lib/core/utils/multimodal_input_utils.dart: add Gemini native audio input model detection helpers.
- lib/features/chat/widgets/chat_message_widget.dart: render recorded voice attachments as audio bubbles labeled with localized duration instead of raw file names.
- lib/features/home/controllers/home_page_controller.dart: wire start/stop/cancel voice recording into the shared home controller and expose audio capability checks to the input UI.
- lib/features/home/pages/home_page.dart: pass voice recording state and callbacks into the shared input section.
- lib/features/home/services/message_builder_service.dart: append Gemini audio-input system guidance only when a request contains audio media.
- lib/features/home/services/message_generation_service.dart: gate audio attachments on Gemini native audio support and detect audio media paths for prompt injection.
- lib/features/home/services/voice_input_service.dart: add mobile WAV recording, raise the max recording duration to 1 minute, and rename recorded files with embedded duration metadata.
- lib/features/home/widgets/chat_input_bar.dart: add the press-and-hold mic button between plus/send, haptic feedback, cancel keep zone overlay, animated mic scaling, recording keep zone visuals, and localized voice attachment chips.
- lib/features/home/widgets/chat_input_section.dart: plumb voice-input gating and recording callbacks into the input bar.
- lib/icons/lucide_adapter.dart: expose Lucide.Mic and Lucide.AudioLines for the new voice UI.
- lib/l10n/app_en.arb: add localized voice-input and voice duration display strings.
- lib/l10n/app_localizations.dart: regenerate localization interface after adding voice-input strings.
- lib/l10n/app_localizations_en.dart: regenerate English localization output.
- lib/l10n/app_localizations_zh.dart: regenerate Chinese localization output.
- lib/l10n/app_zh.arb: add matching Simplified Chinese voice-input strings.
- lib/l10n/app_zh_Hans.arb: add matching zh_Hans voice-input strings.
- lib/l10n/app_zh_Hant.arb: add matching Traditional Chinese voice-input strings.
- lib/utils/voice_attachment_utils.dart: add helpers to build and parse recorded voice file names and format mm:ss labels.
- pubspec.yaml: add the record dependency for mobile voice capture.
- test/gemini_audio_input_support_test.dart: cover Gemini audio-capability gating behavior.
- test/voice_attachment_utils_test.dart: cover recorded voice filename metadata parsing and duration formatting.

Signed-off-by: Shuchen Luo <nemo0806@gmail.com>
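The idea behind lib/utils/voice_attachment_utils.dart (build and parse recorded voice file names, format mm:ss labels) can be illustrated roughly as follows. This is a Python sketch, not the project's Dart code: the `voice_<timestamp>_<duration>ms.wav` naming scheme and all function names are assumptions made for illustration, since the commit message does not state the actual file name format.

```python
import re
from typing import Optional

# Hypothetical naming scheme (not taken from the PR):
#   voice_<unix-ms timestamp>_<duration in ms>ms.wav
_VOICE_NAME = re.compile(r"^voice_(\d+)_(\d+)ms\.wav$")


def build_voice_file_name(timestamp_ms: int, duration_ms: int) -> str:
    """Embed the recording duration in the file name."""
    return f"voice_{timestamp_ms}_{duration_ms}ms.wav"


def parse_voice_duration_ms(file_name: str) -> Optional[int]:
    """Return the embedded duration, or None for ordinary attachments."""
    match = _VOICE_NAME.match(file_name)
    return int(match.group(2)) if match else None


def format_mm_ss(duration_ms: int) -> str:
    """Format a duration as mm:ss, truncating partial seconds."""
    total_seconds = max(0, duration_ms) // 1000
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes:02d}:{seconds:02d}"
```

With a scheme like this, a chat bubble can decide whether an attachment is a recorded voice note (the parse succeeds) and label it with a duration such as `format_mm_ss(4200)` instead of the raw file name.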
… cap keep-zone radius to a fixed value

Enable the iOS microphone permission required for voice recording and keep the recording keep-zone consistently sized on wide layouts. This preserves the existing voice input interaction while fixing iOS permission handling and preventing the solid keep-zone circle from becoming excessively large.

Signed-off-by: Shuchen Luo <nemo0806@gmail.com>
…nd hotkey flow

Add macOS desktop voice input with a popover flow, including countdown start, stop-to-confirm send, and press-and-hold Cmd+Shift+R recording. This adds macOS-native microphone permission handling, the required audio-input entitlements, and proper shortcut event consumption so desktop recording starts reliably without invalid-key system beeps.

Files changed:
- lib/desktop/desktop_home_page.dart: cancel active desktop voice sessions when leaving the chat tab so the IndexedStack-kept page does not retain stale recording UI.
- lib/desktop/hotkeys/chat_action_bus.dart: add a cancelTransientUi chat action for desktop voice popover cleanup.
- lib/features/home/controllers/home_page_controller.dart: cancel desktop voice UI on desktop lifecycle changes and chat action bus cleanup events.
- lib/features/home/pages/home_page.dart: add macOS in-app Cmd+Shift+R press-and-hold handling and consume repeated shortcut events to avoid system invalid-key feedback.
- lib/features/home/services/voice_input_service.dart: enable macOS recording, switch macOS microphone permission checks to the record plugin, and open the macOS microphone privacy settings when permission is denied.
- lib/features/home/widgets/chat_input_bar.dart: add the desktop voice popover flow with countdown, recording, confirmation, Enter-to-send, Esc-to-cancel, and temporary file cleanup while preserving the existing mobile press-and-hold behavior.
- lib/features/home/widgets/chat_input_section.dart: expose the voice input entry on macOS desktop while keeping existing capability gating intact.
- lib/l10n/app_en.arb: add desktop voice popover strings for countdown, recording, confirmation, and actions.
- lib/l10n/app_localizations.dart: regenerate localization interface after adding macOS desktop voice strings.
- lib/l10n/app_localizations_en.dart: regenerate English localization output.
- lib/l10n/app_localizations_zh.dart: regenerate Chinese localization output.
- lib/l10n/app_zh.arb: add matching Chinese desktop voice popover strings.
- lib/l10n/app_zh_Hans.arb: add matching zh_Hans desktop voice popover strings.
- lib/l10n/app_zh_Hant.arb: add matching Traditional Chinese desktop voice popover strings.
- macos/Flutter/GeneratedPluginRegistrant.swift: register the macOS record plugin required for desktop voice capture.
- macos/Runner/DebugProfile.entitlements: enable the audio-input entitlement for debug/profile builds.
- macos/Runner/Info.plist: add the macOS microphone usage description required for desktop recording permission prompts.
- macos/Runner/Release.entitlements: enable the audio-input entitlement for release builds.
- test/chat_action_bus_test.dart: cover desktop transient UI cleanup event delivery.

Signed-off-by: Shuchen Luo <nemo0806@gmail.com>
…nd hotkey flow

Add Linux desktop voice input to the existing desktop popover recording flow, reusing the shared countdown, stop-to-confirm send, and attachment pipeline. This enables Linux recording in the shared input layer, adds explicit parecord/ffmpeg dependency checks with localized errors, and keeps Windows excluded while preserving existing mobile and macOS behavior.

Files changed:
- lib/features/home/pages/home_page.dart: extend the desktop voice hotkey handler to Linux and use Ctrl+Shift+R outside macOS.
- lib/features/home/services/voice_input_service.dart: allow Linux recording, skip unsupported Linux permission requests, and surface explicit missing-dependency errors for parecord/ffmpeg before recording starts.
- lib/features/home/utils/desktop_voice_input_utils.dart: centralize desktop voice platform support and shortcut matching for macOS and Linux.
- lib/features/home/widgets/chat_input_bar.dart: enable the desktop voice popover flow on Linux and show the correct platform-specific shortcut label.
- lib/features/home/widgets/chat_input_section.dart: expose desktop voice input on Linux while keeping the existing capability gating intact.
- lib/l10n/app_en.arb: add localized Linux voice dependency error text.
- lib/l10n/app_localizations.dart: regenerate localization interface after adding the Linux dependency error string.
- lib/l10n/app_localizations_en.dart: regenerate English localization output.
- lib/l10n/app_localizations_zh.dart: regenerate Chinese localization output.
- lib/l10n/app_zh.arb: add matching Chinese Linux voice dependency error text.
- lib/l10n/app_zh_Hans.arb: add matching zh_Hans Linux voice dependency error text.
- lib/l10n/app_zh_Hant.arb: add matching Traditional Chinese Linux voice dependency error text.
- test/linux_voice_input_support_test.dart: cover Linux desktop voice support, hotkey matching, and missing dependency detection.

Signed-off-by: Shuchen Luo <nemo0806@gmail.com>
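The dependency check and shortcut matching described in this commit might look roughly like the following. This is a Python sketch, not the project's Dart code: the function names are hypothetical, it assumes both parecord and ffmpeg are required (the commit message only says "parecord/ffmpeg"), and it assumes the shortcut must use exactly one of Cmd or Ctrl depending on platform.

```python
import shutil
from typing import Callable, List, Optional, Sequence


def missing_voice_dependencies(
    required: Sequence[str] = ("parecord", "ffmpeg"),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the required tools that cannot be found on PATH.

    `which` is injectable so the check can be unit-tested without
    touching the real system.
    """
    return [tool for tool in required if which(tool) is None]


def matches_voice_hotkey(
    platform: str, key: str, *, ctrl: bool, meta: bool, shift: bool
) -> bool:
    """Cmd+Shift+R on macOS, Ctrl+Shift+R elsewhere (assumed exclusive)."""
    if key.lower() != "r" or not shift:
        return False
    if platform == "macos":
        return meta and not ctrl
    return ctrl and not meta
```

Running the dependency check before recording starts lets the UI show a localized error that names the missing tool, rather than failing partway through a recording.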
Add in-app replay for recorded voice message bubbles so users can review sent voice notes without leaving chat. Show playback progress directly inside the bubble, update only the duration label inside the existing localized voice bubble text to a countdown while playing, stop active TTS before replay, and keep playback synchronized through a shared single-player controller across mobile and desktop.

Files changed:
- lib/core/providers/voice_message_playback_provider.dart: add shared playback state for sent voice bubbles, including stop-on-retap, progress tracking, remaining time, and completion cleanup.
- lib/features/chat/widgets/chat_message_widget.dart: route recorded voice attachments to in-app playback, render the in-bubble progress overlay, update only the duration portion of the localized voice bubble label to a countdown while playing, and keep normal files on the existing open-file path.
- lib/main.dart: register the shared voice message playback provider.
- lib/utils/voice_attachment_utils.dart: add a helper to format voice bubble labels from explicit durations so playback countdown can reuse the existing localized label structure.
- lib/l10n/app_en.arb: add localized voice playback failure text.
- lib/l10n/app_localizations.dart: regenerate localization interface after adding the voice playback failure string.
- lib/l10n/app_localizations_en.dart: regenerate English localization output.
- lib/l10n/app_localizations_zh.dart: regenerate Chinese localization output.
- lib/l10n/app_zh.arb: add matching Chinese voice playback failure text.
- lib/l10n/app_zh_Hans.arb: add matching zh_Hans voice playback failure text.
- lib/l10n/app_zh_Hant.arb: add matching Traditional Chinese voice playback failure text.
- test/voice_message_playback_provider_test.dart: cover shared playback activation, stop-on-retap, progress updates, playback switching, and failure cleanup.

Signed-off-by: Shuchen Luo <nemo0806@gmail.com>
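The shared single-player behavior described above (stop-on-retap, playback switching, progress tracking, remaining time, completion cleanup) could be modeled along these lines. This is a Python sketch of the state logic only, with hypothetical class and method names; it does not reproduce the project's Dart provider and leaves out actual audio playback and TTS handling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoicePlaybackState:
    active_id: Optional[str] = None  # attachment currently playing, if any
    position_ms: int = 0
    duration_ms: int = 0

    @property
    def remaining_ms(self) -> int:
        return max(0, self.duration_ms - self.position_ms)

    @property
    def progress(self) -> float:
        if self.duration_ms <= 0:
            return 0.0
        return self.position_ms / self.duration_ms


class VoicePlaybackController:
    """At most one voice bubble plays at a time."""

    def __init__(self) -> None:
        self.state = VoicePlaybackState()

    def toggle(self, attachment_id: str, duration_ms: int) -> bool:
        """Start playback; tapping the active bubble again stops it."""
        if self.state.active_id == attachment_id:
            self.stop()
            return False
        # Starting a different bubble replaces the previous one.
        self.state = VoicePlaybackState(attachment_id, 0, duration_ms)
        return True

    def update_position(self, attachment_id: str, position_ms: int) -> None:
        if attachment_id != self.state.active_id:
            return  # ignore stale updates from a replaced player
        clamped = min(max(0, position_ms), self.state.duration_ms)
        self.state.position_ms = clamped
        if clamped >= self.state.duration_ms:
            self.stop()  # completion cleanup

    def stop(self) -> None:
        self.state = VoicePlaybackState()
```

In this sketch a bubble would render `state.progress` as the overlay and `state.remaining_ms` as the countdown label whenever `state.active_id` matches its own attachment.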
Allow Windows desktop sessions to enter the existing voice recording flow so audio-capable models expose the same in-app controls and shortcut path as Linux. This keeps the desktop voice UI aligned across supported platforms without changing the audio payload flow.

Files changed:
- lib/features/home/utils/desktop_voice_input_utils.dart: include Windows in desktop voice support, shortcut labels, and hotkey matching.
- lib/features/home/services/voice_input_service.dart: allow Windows to enter the existing recorder start flow.
- test/linux_voice_input_support_test.dart: cover Windows desktop support and hotkey behavior.

Signed-off-by: Shuchen Luo <nemo0806@gmail.com>
Mark supported audio-capable models with an explicit audio input modality so follow-up routing can rely on one manifest source of truth. Preserve that modality across override parsing, model tags, and both model editors so audio capability is displayed and saved correctly.

Files changed:
- lib/core/models/model_types.dart: add Modality.audio and a shared storage serializer for persisted modality values.
- lib/core/providers/model_provider.dart: infer audio input capability for supported Gemini, LongCat Omni, Whisper, and transcribe model ids.
- lib/core/services/model_override_resolver.dart: parse audio modality values from model override payloads.
- lib/shared/widgets/model_tag_wrap.dart: render audio modality with dedicated labels and icons instead of collapsing it into image tags.
- lib/features/model/widgets/model_detail_sheet.dart: expose audio as an input mode in the mobile model editor and serialize audio modalities correctly when saving overrides.
- lib/desktop/model_edit_dialog.dart: expose audio as an input mode in the desktop model editor and serialize audio modalities correctly when saving overrides.
- lib/l10n/app_en.arb: add the localized audio mode label.
- lib/l10n/app_localizations.dart: regenerate the localization interface for the new audio mode label.
- lib/l10n/app_localizations_en.dart: regenerate English localization output.
- lib/l10n/app_localizations_zh.dart: regenerate Chinese localization output, including zh_Hans and zh_Hant variants.
- lib/l10n/app_zh.arb: add the matching Chinese audio mode label.
- lib/l10n/app_zh_Hans.arb: add the matching zh_Hans audio mode label.
- lib/l10n/app_zh_Hant.arb: add the matching Traditional Chinese audio mode label.
- test/model_manifest_audio_support_test.dart: cover audio modality inference and override preservation for supported model manifests.

Signed-off-by: Shuchen Luo <nemo0806@gmail.com>
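Inferring an audio input modality from a model id, as described for lib/core/providers/model_provider.dart, might be sketched like this. It is a Python illustration with a hypothetical function name; the substring rules are assumptions based only on the family names in the commit message (Gemini, LongCat Omni, Whisper, transcribe), and the real code may restrict Gemini to specific supported versions.

```python
from typing import List


def infer_input_modalities(model_id: str) -> List[str]:
    """Return input modalities for a model id; text is always included.

    The matching rules below are illustrative guesses, not the PR's
    actual patterns.
    """
    lowered = model_id.lower()
    supports_audio = (
        "gemini" in lowered
        or ("longcat" in lowered and "omni" in lowered)
        or "whisper" in lowered
        or "transcribe" in lowered
    )
    modalities = ["text"]
    if supports_audio:
        modalities.append("audio")
    return modalities
```

Keeping this inference in one place means the voice input UI and the attachment gating can both ask the same question about whether the current model accepts audio, instead of each re-deriving it from the model name.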
a6758fc to 6353467
Fix the mobile voice overlay parent-data structure and block message-list scrolling while a hold-to-record gesture is active so drag movement is reserved for cancel detection.

Files changed:
- lib/features/home/widgets/chat_input_bar.dart: fix the mobile voice overlay parent-data structure for the hold-to-record keep-zone.
- lib/features/home/widgets/message_list_view.dart: allow temporarily disabling user scrolling during hold-to-record.
- lib/features/home/pages/home_page.dart: disable mobile message-list scrolling while voice recording is active.

Signed-off-by: Shuchen Luo <nemo0806@gmail.com>
Scope
Stage 1: Proof of Concept
Stage 2
Stage 3
Backlog