diff --git a/examples/ios_calendar_assistant/README.md b/examples/ios_calendar_assistant/README.md
index fd0043e7f..f175f36a6 100644
--- a/examples/ios_calendar_assistant/README.md
+++ b/examples/ios_calendar_assistant/README.md
@@ -1,16 +1,18 @@
 # iOSCalendarAssistant
 
-iOSCalendarAssistant is a demo app ([video](https://drive.google.com/file/d/1xjdYVm3zDnlxZGi40X_D4IgvmASfG5QZ/view?usp=sharing)) that takes a meeting transcript, summarizes it, extracts action items, and calls tools to book any followup meetings.
+iOSCalendarAssistant is a demo app ([video](https://drive.google.com/file/d/1xjdYVm3zDnlxZGi40X_D4IgvmASfG5QZ/view?usp=sharing)) that uses the Llama Stack Swift SDK's remote inference and agent APIs to take a meeting transcript, summarize it, extract action items, and call tools to book any follow-up meetings.
 
 You can also test creating a calendar event with a direct ask instead of a detailed meeting note.
 
-We also have a demo project for running on-device inference. Checkout the instructions in the section below.
+## Installation
 
-# Installation
+We also have a demo project for running on-device inference. Check out the instructions in the section `iOSCalendarAssistantWithLocalInf` below.
 
 We recommend you try the [iOS Quick Demo](../ios_quick_demo) first to confirm the prerequisites and installation - both demos have the same prerequisites and the first two installation steps.
 
-## Prerequisite
+The quickest way to try out the demo for remote inference is to use Together.ai's Llama Stack distro at https://llama-stack.together.ai - you can skip the next section and go directly to the Build and Run the iOS demo section.
+
+## (Optional) Build and Run Own Llama Stack Distro
 
 You need to set up a remote Llama Stack distribution to run this demo. Assuming you have a [Fireworks](https://fireworks.ai/account/api-keys) or [Together](https://api.together.ai/) API key, which you can get easily by clicking the links above:
@@ -41,11 +43,12 @@ The default port is 5000 for `llama stack run` and you can specify a different p
 
 2. Under the iOSCalendarAssistant project - Package Dependencies, click the + sign, then add `https://github.com/meta-llama/llama-stack-client-swift` at the top right and 0.1.0 in the Dependency Rule, then click Add Package.
 
-3. Replace the `RemoteInference` url string in `ContentView.swift` below with the host IP and port of the remote Llama Stack distro started in Prerequisite:
+3. (Optional) Replace the `RemoteAgents` URL string in `ContentView.swift` below with the host IP and port of the remote Llama Stack distro from Build and Run Own Llama Stack Distro:
 
 ```
-private let agent = RemoteAgents(url: URL(string: "http://127.0.0.1:5000")!)
+private let agent = RemoteAgents(url: URL(string: "https://llama-stack.together.ai")!)
 ```
 
+**Note:** In order for the app to access the remote URL, the app's `Info.plist` needs to have the entry `App Transport Security Settings` with `Allow Arbitrary Loads` set to YES.
+
 Also, to allow the app to add events to the Calendar app, the `Info.plist` needs to have an entry `Privacy - Calendars Usage Description`, and when running the app for the first time, you need to accept the Calendar access request.
@@ -53,12 +56,12 @@ Also, to allow the app to add event to the Calendar app, the `Info.plist` needs
 
 4. Build and run the app on an iOS simulator or your device. First you may try a simple request:
 
 ```
-Create a calendar event with a meeting title as Llama Stack update for 2-3pm January 27, 2025.
+Create a calendar event with a meeting title as Llama Stack update for 2-3pm February 3, 2025.
 ```
 
 Then, a detailed meeting note:
 
 ```
-Date: January 20, 2025
+Date: February 4, 2025
 Time: 10:00 AM - 11:00 AM
 Location: Zoom
 Attendees:
@@ -82,84 +85,37 @@ Sarah: Good. Jane, any updates from operations?
 Jane: Yes, logistics are sorted, and we’ve confirmed the warehouse availability. The only pending item is training customer support for the new product.
 Sarah: Let’s coordinate with the training team to expedite that. Anything else?
 Mike: Quick note—can we get feedback on the beta version by Friday?
-Sarah: Yes, let’s make that a priority. Anything else? No? Great. Thanks, everyone. Let’s meet again next week from 4-5pm on January 27, 2025 to review progress.
+Sarah: Yes, let’s make that a priority. Anything else? No? Great. Thanks, everyone. Let’s meet again next week from 4-5pm on February 11, 2025 to review progress.
 ```
 
 You'll see a summary, action items, and a Calendar event created, made possible by Llama Stack's custom tool calling API support and Llama 3.1's tool calling capability.
 
 # iOSCalendarAssistantWithLocalInf
 
-This project shows you how to run local inference on-device using ExecuTorch in conjunction with Llama Stack Swift SDK.
-
-1. git clone `https://github.com/meta-llama/llama-stack-apps/tree/main/examples/ios_calendar_assistant`
-
-2. Double click `ios_calendar_assistant/iOSCalendarAssistantWithLocalInf.xcodeproj` to open it in Xcode.
-
-3. If there are already Frameworks in the General section of the TARGETS, remove them.
-4. In Package Dependencies, delete all dependencies there and clean the dependencies cache.
+iOSCalendarAssistantWithLocalInf is a demo app that uses the Llama Stack Swift SDK's local inference and agent APIs, together with ExecuTorch, to run inference on-device.
 
-5. In Package Dependencies, click the + sign, then add `https://github.com/meta-llama/llama-stack-client-swift`. Select Branch and input `v0.1.0`. This should resolve the package and add necessary dependencies in your project panel. (This should add a LlamaStackClient in your Frameworks)
-
-6. In the same place, add `https://github.com/pytorch/executorch`. Select Branch and input `latest`. This will add ExecuTorch as your dependencies.
-
-7. In the Frameworks for TARGETS, add all ExecuTorch kernels (including debug ones), but not `executorch` one. For example:
+1. On a Mac terminal, in your top-level directory, run the commands:
 
 ```
-backend_coreml
-backend_mps
-backend_xnnpack
-kernels_custom
-kernels_optimized
-kernels_portable
-kernels_quantized
+git clone https://github.com/meta-llama/llama-stack-apps
+cd llama-stack-apps
+git submodule update --init --recursive
 ```
 
-8. In your project panel, if there is already a xcode project called `LocalInferenceImpl.xcodeproj`, remove it completely.
-
-9. Then git clone `https://github.com/meta-llama/llama-stack/tree/adecb2a2d3bc5b5fb12280c54096706974e58201/llama_stack/providers/impls/ios/inference/LocalInferenceImpl`
-
-10. In the repo, run `git submodule update --init --recursive` to sync the executorch submodules.
+2. Go back to your top-level directory and run the commands:
 
-11. Install [Cmake](https://cmake.org/) for the executorch build. Additional [guidance](https://github.com/pytorch/executorch/blob/main/examples/demo-apps/apple_ios/LLaMA/docs/delegates/xnnpack_README.md#1-install-cmake) to install and link cmake
-
-12. Drag `LocalInferenceImpl.xcodeproj` into your `iOSCalendarAssistantWithLocalInf` project. Import it as a reference
-
-13. In LocalInferenceImpl’s Package Dependencies, change `LlamaStackClient package` version to `v0.1.0` matching iOSCalendarAssistantWithLocalInf’s package version. This is important to resolve Stencil dependencies.
-
-14. Add LocalInferenceImpl.framework into the Framework section for TARGETS.
-
-15. In "Build Settings" > "Other Linker Flags" > For both Debug and Release > "Any iOS Simulator SDK", add:
 
 ```
--force_load
-$(BUILT_PRODUCTS_DIR)/libkernels_optimized-simulator-release.a
--force_load
-$(BUILT_PRODUCTS_DIR)/libkernels_custom-simulator-release.a
--force_load
-$(BUILT_PRODUCTS_DIR)/libkernels_quantized-simulator-release.a
--force_load
-$(BUILT_PRODUCTS_DIR)/libbackend_xnnpack-simulator-release.a
--force_load
-$(BUILT_PRODUCTS_DIR)/libbackend_coreml-simulator-release.a
--force_load
-$(BUILT_PRODUCTS_DIR)/libbackend_mps-simulator-release.a
+git clone https://github.com/meta-llama/llama-stack
+cd llama-stack
+git submodule update --init --recursive
 ```
 
-16. For "Any iOS SDK", add:
-```
--force_load
-$(BUILT_PRODUCTS_DIR)/libkernels_optimized-ios-release.a
--force_load
-$(BUILT_PRODUCTS_DIR)/libkernels_custom-ios-release.a
--force_load
-$(BUILT_PRODUCTS_DIR)/libkernels_quantized-ios-release.a
--force_load
-$(BUILT_PRODUCTS_DIR)/libbackend_xnnpack-ios-release.a
--force_load
-$(BUILT_PRODUCTS_DIR)/libbackend_coreml-ios-release.a
--force_load
-$(BUILT_PRODUCTS_DIR)/libbackend_mps-ios-release.a
-```
+3. Double-click `llama-stack-apps/examples/ios_calendar_assistant/iOSCalendarAssistantWithLocalInf.xcodeproj` to open it in Xcode.
+
+4. In the `iOSCalendarAssistantWithLocalInf` project panel, remove `LocalInferenceImpl.xcodeproj`, then drag and drop `LocalInferenceImpl.xcodeproj` from `llama-stack/llama_stack/providers/inline/ios/inference` into the `iOSCalendarAssistantWithLocalInf` project.
+
+5. Prepare a Llama model file named `llama3_2_spinquant_oct23.pte` by following the steps [here](https://github.com/pytorch/executorch/blob/main/examples/models/llama/README.md#step-2-prepare-model) - you'll also download the `tokenizer.model` file there. Then drag and drop both files into the `iOSCalendarAssistantWithLocalInf` project.
 
-17. Lastly prepare the model: prepare a .pte file following the executorch [docs](https://github.com/pytorch/executorch/blob/main/examples/models/llama/README.md#step-2-prepare-model). Bundle the .pte and tokenizer.model file into Build Phases -> Copy Bundle Resources
+6. Build and run the app on an iOS simulator or a real device.
 
-18. Build the app for simulator or real device
+**Note:** If you see a build error about cmake not being found, install cmake by following the instructions [here](https://github.com/pytorch/executorch/blob/main/examples/demo-apps/apple_ios/LLaMA/docs/delegates/xnnpack_README.md#1-install-cmake).
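Note for readers: the `client_tools` change in the next diff relies on a custom `create_event` tool definition supplied by the app. As a rough illustration only - the real definition lives in `CustomTools.getCreateEventToolForAgent()` and uses the SDK's own tool-definition types - here is the information such a definition carries, using the argument names (`event_name`, `start`, `end`) and date format that `callTools` expects:

```swift
import Foundation

// Illustrative only: the shape of the custom create_event tool definition,
// expressed as plain data. The app's real definition is built in
// CustomTools.getCreateEventToolForAgent() with the SDK's own types.
let createEventTool: [String: Any] = [
    "tool_name": "create_event",
    "description": "Create a calendar event with a title and a start and end time",
    "parameters": [
        // Argument names and the date format must match what callTools parses.
        "event_name": ["param_type": "string", "description": "Title of the event"],
        "start": ["param_type": "string", "description": "Start time, yyyy-MM-dd HH:mm"],
        "end": ["param_type": "string", "description": "End time, yyyy-MM-dd HH:mm"],
    ],
]
```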
diff --git a/examples/ios_calendar_assistant/iOSCalendarAssistantWithLocalInf/ContentView.swift b/examples/ios_calendar_assistant/iOSCalendarAssistantWithLocalInf/ContentView.swift
index de525e714..1a8c6b5e0 100644
--- a/examples/ios_calendar_assistant/iOSCalendarAssistantWithLocalInf/ContentView.swift
+++ b/examples/ios_calendar_assistant/iOSCalendarAssistantWithLocalInf/ContentView.swift
@@ -39,7 +39,9 @@ struct ContentView: View {
   public init () {
     self.inference = LocalInference(queue: runnerQueue)
     self.localAgents = LocalAgents(inference: self.inference)
-    self.remoteAgents = RemoteAgents(url: URL(string: "http://localhost:5000")!)
+
+    // Replace the URL string if you build and run your own Llama Stack distro, as shown in https://github.com/meta-llama/llama-stack-apps/tree/main/examples/ios_calendar_assistant#optional-build-and-run-own-llama-stack-distro
+    self.remoteAgents = RemoteAgents(url: URL(string: "https://llama-stack.together.ai")!)
   }
 
   var agents: Agents {
@@ -130,39 +132,39 @@ struct ContentView: View {
   func summarizeConversation(prompt: String) async {
     do {
       let request = Components.Schemas.CreateAgentTurnRequest(
-        agent_id: self.agentId,
         messages: [
           .UserMessage(Components.Schemas.UserMessage(
             content: .case1("Summarize the following conversation in 1-2 sentences:\n\n \(prompt)"),
             role: .user
           ))
         ],
-        session_id: self.agenticSystemSessionId,
         stream: true
       )
 
-      for try await chunk in try await self.agents.createTurn(request: request) {
+      for try await chunk in try await self.agents.createTurn(agent_id: self.agentId, session_id: self.agenticSystemSessionId, request: request) {
         let payload = chunk.event.payload
         switch (payload) {
-        case .AgentTurnResponseStepStartPayload(_):
+        case .step_start(_):
           break
-        case .AgentTurnResponseStepProgressPayload(let step):
-          if (step.model_response_text_delta != nil) {
+        case .step_progress(let step):
+          if (step.delta != nil) {
             DispatchQueue.main.async {
               withAnimation {
                 var message = messages.removeLast()
-                message.text += step.model_response_text_delta!
+                if case .text(let delta) = step.delta {
+                  message.text += "\(delta.text)"
+                }
                 message.tokenCount += 2
                 message.dateUpdated = Date()
                 messages.append(message)
               }
             }
           }
-        case .AgentTurnResponseStepCompletePayload(_):
+        case .step_complete(_):
           break
-        case .AgentTurnResponseTurnStartPayload(_):
+        case .turn_start(_):
           break
-        case .AgentTurnResponseTurnCompletePayload(_):
+        case .turn_complete(_):
           break
         }
@@ -175,41 +177,39 @@ func actionItems(prompt: String) async throws {
     let request = Components.Schemas.CreateAgentTurnRequest(
-      agent_id: self.agentId,
       messages: [
         .UserMessage(Components.Schemas.UserMessage(
           content: .case1("List out any action items based on this text:\n\n \(prompt)"),
           role: .user
         ))
       ],
-      session_id: self.agenticSystemSessionId,
       stream: true
     )
 
-    for try await chunk in try await self.agents.createTurn(request: request) {
+    for try await chunk in try await self.agents.createTurn(agent_id: self.agentId, session_id: self.agenticSystemSessionId, request: request) {
       let payload = chunk.event.payload
       switch (payload) {
-      case .AgentTurnResponseStepStartPayload(_):
+      case .step_start(_):
         break
-      case .AgentTurnResponseStepProgressPayload(let step):
-        if (step.model_response_text_delta != nil) {
-          DispatchQueue.main.async {
-            withAnimation {
-              var message = messages.removeLast()
-              message.text += step.model_response_text_delta!
-              message.tokenCount += 2
-              message.dateUpdated = Date()
-              messages.append(message)
-
-              self.actionItems += step.model_response_text_delta!
+      case .step_progress(let step):
+        DispatchQueue.main.async(execute: DispatchWorkItem {
+          withAnimation {
+            var message = messages.removeLast()
+
+            if case .text(let delta) = step.delta {
+              message.text += "\(delta.text)"
+              self.actionItems += "\(delta.text)"
+            }
+            message.tokenCount += 2
+            message.dateUpdated = Date()
+            messages.append(message)
           }
-        }
-      case .AgentTurnResponseStepCompletePayload(_):
+      })
+      case .step_complete(_):
         break
-      case .AgentTurnResponseTurnStartPayload(_):
+      case .turn_start(_):
         break
-      case .AgentTurnResponseTurnCompletePayload(_):
+      case .turn_complete(_):
         break
       }
     }
@@ -217,61 +217,60 @@ func callTools(prompt: String) async throws {
     let request = Components.Schemas.CreateAgentTurnRequest(
-      agent_id: self.agentId,
       messages: [
         .UserMessage(Components.Schemas.UserMessage(
          content: .case1("Call functions as needed to handle any actions in the following text:\n\n" + prompt),
           role: .user
         ))
       ],
-      session_id: self.agenticSystemSessionId,
       stream: true
     )
 
-    for try await chunk in try await self.agents.createTurn(request: request) {
+    for try await chunk in try await self.agents.createTurn(agent_id: self.agentId, session_id: self.agenticSystemSessionId, request: request) {
       let payload = chunk.event.payload
       switch (payload) {
-      case .AgentTurnResponseStepStartPayload(_):
+      case .step_start(_):
         break
-      case .AgentTurnResponseStepProgressPayload(let step):
-        if (step.tool_call_delta != nil) {
-          switch (step.tool_call_delta!.content) {
-          case .case1(_):
-            break
-          case .ToolCall(let call):
-            switch (call.tool_name) {
-            case .BuiltinTool(_):
-              break
-            case .case2(let toolName):
-              if (toolName == "create_event") {
-                var args: [String : String] = [:]
-                for (arg_name, arg) in call.arguments.additionalProperties {
-                  switch (arg) {
-                  case .case1(let s): // type string
-                    args[arg_name] = s
-                  case .case2(_), .case3(_), .case4(_), .case5(_), .case6(_):
-                    break
+      case .step_progress(let step):
+        switch (step.delta) {
+        case .tool_call(let call):
+          if call.parse_status == .succeeded {
+            switch (call.tool_call) {
+            case .ToolCall(let toolCall):
+              var args: [String : String] = [:]
+              for (arg_name, arg) in toolCall.arguments.additionalProperties {
+                switch (arg) {
+                case .case1(let s): // type string
+                  args[arg_name] = s
+                case .case2(_), .case3(_), .case4(_), .case5(_), .case6(_):
+                  break
+                }
              }
-            }
-            let formatter = DateFormatter()
-            formatter.dateFormat = "yyyy-MM-dd HH:mm"
-            formatter.timeZone = TimeZone.current
-            formatter.locale = Locale.current
-            self.triggerAddEventToCalendar(
-              title: args["event_name"]!,
-              startDate: formatter.date(from: args["start"]!) ?? Date(),
-              endDate: formatter.date(from: args["end"]!) ?? Date()
-            )
+              let formatter = DateFormatter()
+              formatter.dateFormat = "yyyy-MM-dd HH:mm"
+              formatter.timeZone = TimeZone.current
+              formatter.locale = Locale.current
+              self.triggerAddEventToCalendar(
+                title: args["event_name"]!,
+                startDate: formatter.date(from: args["start"]!) ?? Date(),
+                endDate: formatter.date(from: args["end"]!) ?? Date()
+              )
+            case .case1(_):
+              break
            }
          }
+        case .text(_):
+          break
+        case .image(_):
+          break
        }
-      }
-      case .AgentTurnResponseStepCompletePayload(_):
+      case .step_complete(_):
+        break
+      case .turn_start(_):
        break
-      case .AgentTurnResponseTurnStartPayload(_):
-        break
-      case .AgentTurnResponseTurnCompletePayload(_):
+      case .turn_complete(_):
        break
      }
    }
@@ -308,22 +307,17 @@
     let createSystemResponse = try await self.agents.create(
       request: Components.Schemas.CreateAgentRequest(
         agent_config: Components.Schemas.AgentConfig(
+          client_tools: [ CustomTools.getCreateEventToolForAgent() ],
           enable_session_persistence: false,
           instructions: "You are a helpful assistant",
           max_infer_iters: 1,
-          model: "Llama3.1-8B-Instruct",
-          tools: [
-            Components.Schemas.AgentConfig.toolsPayloadPayload.FunctionCallToolDefinition(
-              CustomTools.getCreateEventTool()
-            )
-          ]
+          model: "meta-llama/Llama-3.1-8B-Instruct"
         )
       )
     )
 
     self.agentId = createSystemResponse.agent_id
 
-    let createSessionResponse = try await self.agents.createSession(
-      request: Components.Schemas.CreateAgentSessionRequest(agent_id: self.agentId, session_name: "llama-assistant")
+    let createSessionResponse = try await self.agents.createSession(agent_id: self.agentId, request: Components.Schemas.CreateAgentSessionRequest(session_name: "llama-assistant")
     )
 
     self.agenticSystemSessionId = createSessionResponse.session_id
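Note for readers: `triggerAddEventToCalendar(title:startDate:endDate:)` is called in the hunk above but is not part of this diff. A minimal sketch of what such a helper can look like with EventKit, assuming the `Privacy - Calendars Usage Description` entry from the README is in place - the app's actual implementation may differ:

```swift
import EventKit

// Minimal sketch of a calendar helper matching the call site above.
// Requires the Info.plist entry `Privacy - Calendars Usage Description`.
func triggerAddEventToCalendar(title: String, startDate: Date, endDate: Date) {
    let store = EKEventStore()
    store.requestAccess(to: .event) { granted, error in
        guard granted, error == nil else { return }
        // Build the event and save it to the user's default calendar.
        let event = EKEvent(eventStore: store)
        event.title = title
        event.startDate = startDate
        event.endDate = endDate
        event.calendar = store.defaultCalendarForNewEvents
        do {
            try store.save(event, span: .thisEvent)
        } catch {
            print("Failed to save event: \(error)")
        }
    }
}
```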
diff --git a/examples/ios_quick_demo/README.md b/examples/ios_quick_demo/README.md
index 980ab7014..194171b3c 100644
--- a/examples/ios_quick_demo/README.md
+++ b/examples/ios_quick_demo/README.md
@@ -2,9 +2,11 @@
 
 iOSQuickDemo is a demo app ([video](https://drive.google.com/file/d/1HnME3VmsYlyeFgsIOMlxZy5c8S2xP4r4/view?usp=sharing)) that shows how to use the Llama Stack Swift SDK ([repo](https://github.com/meta-llama/llama-stack-client-swift)) and its `ChatCompletionRequest` API with a remote Llama Stack server to perform remote inference with Llama 3.1.
 
-# Installation
+## Installation
 
-## Prerequisite
+The quickest way to try out the demo for remote inference is to use Together.ai's Llama Stack distro at https://llama-stack.together.ai - you can skip the next section and go directly to the Build and Run the iOS demo section.
+
+## (Optional) Build and Run Own Llama Stack Distro
 
 You need to set up a remote Llama Stack distribution to run this demo. Assuming you have a [Fireworks](https://fireworks.ai/account/api-keys) or [Together](https://api.together.ai/) API key, which you can get easily by clicking the links above:
@@ -38,7 +40,7 @@ The default port is 5000 for `llama stack run` and you can specify a different p
 
 ![](quick1.png)
 ![](quick2.png)
 
-3. Replace the `RemoteInference` url string in `ContentView.swift` below with the host IP and port of the remote Llama Stack distro started in Prerequisite:
+3. (Optional) Replace the `RemoteInference` URL string in `ContentView.swift` below with the host IP and port of the remote Llama Stack distro from Build and Run Own Llama Stack Distro:
 
 ```
 let inference = RemoteInference(url: URL(string: "http://127.0.0.1:5000")!)
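Note for readers: the `ChatCompletionRequest` flow this README refers to appears only partially in the next diff. A condensed sketch of the streaming call under stated assumptions - the `model_id` field name and the `.text` delta shape are assumptions based on the 0.1.0 schema used elsewhere in this PR; the actual `ContentView.swift` below is authoritative:

```swift
import Foundation
import LlamaStackClient

// Condensed sketch, not part of the diff: streaming chat completion with the
// Llama Stack Swift SDK against the Together.ai distro.
func runQuickDemo(userInput: String) async throws {
    let inference = RemoteInference(url: URL(string: "https://llama-stack.together.ai")!)
    for await chunk in try await inference.chatCompletion(
        request: Components.Schemas.ChatCompletionRequest(
            messages: [
                .UserMessage(Components.Schemas.UserMessage(
                    content: .case1(userInput),
                    role: .user
                ))
            ],
            model_id: "meta-llama/Llama-3.1-8B-Instruct",  // assumed field name
            stream: true
        )
    ) {
        // Assumed delta shape, mirroring the agent .text delta used above.
        if case .text(let delta) = chunk.event.delta {
            print(delta.text, terminator: "")
        }
    }
}
```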
diff --git a/examples/ios_quick_demo/iOSQuickDemo/iOSQuickDemo/ContentView.swift b/examples/ios_quick_demo/iOSQuickDemo/iOSQuickDemo/ContentView.swift
index e356f1c79..c4b93f075 100644
--- a/examples/ios_quick_demo/iOSQuickDemo/iOSQuickDemo/ContentView.swift
+++ b/examples/ios_quick_demo/iOSQuickDemo/iOSQuickDemo/ContentView.swift
@@ -12,9 +12,9 @@ import LlamaStackClient
 struct ContentView: View {
   @State private var message: String = ""
   @State private var userInput: String = "Best quotes in Godfather"
-  
+
   private let runnerQueue = DispatchQueue(label: "org.llamastack.iosquickdemo")
-  
+
   var body: some View {
     VStack(spacing: 20) {
       Text(message.isEmpty ? "Click Inference to see Llama's answer" : message)
@@ -24,11 +24,11 @@ struct ContentView: View {
         .frame(maxWidth: .infinity)
         .background(Color.gray.opacity(0.2))
         .cornerRadius(8)
-      
+
       VStack(alignment: .leading, spacing: 10) {
         Text("Question")
           .font(.headline)
-        
+
         TextField("Enter your question here", text: $userInput)
           .textFieldStyle(RoundedBorderTextFieldStyle())
           .padding()
@@ -56,17 +56,19 @@ struct ContentView: View {
       message = "Please enter a question before clicking 'Inference'."
       return
     }
-    
+
     message = ""
-    
+
     let workItem = DispatchWorkItem {
       defer {
        DispatchQueue.main.async {
        }
      }
-      
+
      Task {
-        let inference = RemoteInference(url: URL(string: "http://127.0.0.1:5000")!)
+
+        // Replace the URL string if you build and run your own Llama Stack distro, as shown in https://github.com/meta-llama/llama-stack-apps/tree/main/examples/ios_quick_demo#optional-build-and-run-own-llama-stack-distro
+        let inference = RemoteInference(url: URL(string: "https://llama-stack.together.ai")!)
 
        do {
          for await chunk in try await inference.chatCompletion(
@@ -108,7 +110,7 @@ struct ContentView: View {
         }
       }
     }
-    
+
     runnerQueue.async(execute: workItem)
   }
 }
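Taken together, the agent-API migration in this PR amounts to the following call sequence. A condensed sketch assembled from the diffs above (same types and signatures; error handling and UI updates elided), not a drop-in file:

```swift
import LlamaStackClient

// Condensed sketch of the 0.1.x agent flow as migrated in this PR.
// CustomTools.getCreateEventToolForAgent() is the app's own helper.
func runTurn(agents: RemoteAgents, prompt: String) async throws {
    // 1. Create an agent configured with the custom create_event client tool.
    let agent = try await agents.create(
        request: Components.Schemas.CreateAgentRequest(
            agent_config: Components.Schemas.AgentConfig(
                client_tools: [ CustomTools.getCreateEventToolForAgent() ],
                enable_session_persistence: false,
                instructions: "You are a helpful assistant",
                max_infer_iters: 1,
                model: "meta-llama/Llama-3.1-8B-Instruct"
            )
        )
    )

    // 2. Open a session; agent_id is now a parameter, not a request field.
    let session = try await agents.createSession(
        agent_id: agent.agent_id,
        request: Components.Schemas.CreateAgentSessionRequest(session_name: "llama-assistant")
    )

    // 3. Stream a turn; text arrives via the .step_progress payload's .text delta.
    let request = Components.Schemas.CreateAgentTurnRequest(
        messages: [
            .UserMessage(Components.Schemas.UserMessage(content: .case1(prompt), role: .user))
        ],
        stream: true
    )
    for try await chunk in try await agents.createTurn(
        agent_id: agent.agent_id, session_id: session.session_id, request: request
    ) {
        if case .step_progress(let step) = chunk.event.payload,
           case .text(let delta) = step.delta {
            print(delta.text, terminator: "")
        }
    }
}
```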