update for local inference demo for LS 0.1 (#163)
# What does this PR do?

Updates the local inference demo to work with the updated `LocalInferenceImpl`
[here](meta-llama/llama-stack#911).

Closes # (issue)

## Feature/Issue validation/testing/test plan

Please describe the tests that you ran to verify your changes and
relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration or
test plan.

- [ ] Test A
Logs for Test A

- [ ] Test B
Logs for Test B


## Sources

Please link relevant resources if necessary.


## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Did you read the [contributor
guideline](https://github.com/meta-llama/llama-stack-apps/blob/main/CONTRIBUTING.md#pull-requests),
      Pull Request section?
- [ ] Was this discussed/approved via a Github issue? Please add a link
      to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes?
- [ ] Did you write any new necessary tests?

Thanks for contributing 🎉!
jeffxtang authored Feb 6, 2025
1 parent ee7ae4d commit a6136af
Showing 4 changed files with 112 additions and 158 deletions.
100 changes: 28 additions & 72 deletions examples/ios_calendar_assistant/README.md
@@ -1,16 +1,18 @@
# iOSCalendarAssistant

iOSCalendarAssistant is a demo app ([video](https://drive.google.com/file/d/1xjdYVm3zDnlxZGi40X_D4IgvmASfG5QZ/view?usp=sharing)) that takes a meeting transcript, summarizes it, extracts action items, and calls tools to book any followup meetings.
iOSCalendarAssistant is a demo app ([video](https://drive.google.com/file/d/1xjdYVm3zDnlxZGi40X_D4IgvmASfG5QZ/view?usp=sharing)) that uses Llama Stack Swift SDK's remote inference and agent APIs to take a meeting transcript, summarize it, extract action items, and call tools to book any follow-up meetings.

You can also test creating a calendar event with a direct ask instead of a detailed meeting note.

We also have a demo project for running on-device inference. Checkout the instructions in the section below.
## Installation

# Installation
We also have a demo project for running on-device inference. Check out the instructions in the section `iOSCalendarAssistantWithLocalInf` below.

We recommend you try the [iOS Quick Demo](../ios_quick_demo) first to confirm the prerequisite and installation; both demos share the same prerequisite and the first two installation steps.

## Prerequisite
The quickest way to try out the demo for remote inference is to use Together.ai's Llama Stack distro at https://llama-stack.together.ai - you can skip the next section and go directly to the Build and Run the iOS demo section.

## (Optional) Build and Run Own Llama Stack Distro

You need to set up a remote Llama Stack distribution to run this demo. Assuming you have a [Fireworks](https://fireworks.ai/account/api-keys) or [Together](https://api.together.ai/) API key, which you can get easily by clicking the links above:

@@ -41,24 +43,25 @@ The default port is 5000 for `llama stack run` and you can specify a different p

2. Under the iOSCalendarAssistant project - Package Dependencies, click the + sign, then add `https://github.com/meta-llama/llama-stack-client-swift` at the top right and 0.1.0 in the Dependency Rule, then click Add Package.

3. Replace the `RemoteInference` url string in `ContentView.swift` below with the host IP and port of the remote Llama Stack distro started in Prerequisite:
3. (Optional) Replace the `RemoteInference` url string in `ContentView.swift` below with the host IP and port of the remote Llama Stack distro in Build and Run Own Llama Stack Distro:

```
private let agent = RemoteAgents(url: URL(string: "http://127.0.0.1:5000")!)
private let agent = RemoteAgents(url: URL(string: "https://llama-stack.together.ai")!)
```
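The force-unwrapped `URL(string:)!` above will crash at launch on a malformed string. A small helper (hypothetical, not part of the demo) makes the failure explicit instead:

```swift
import Foundation

// Hypothetical helper, not part of the demo: build the remote endpoint
// URL with an explicit failure message instead of a bare force-unwrap.
func remoteEndpoint(_ string: String) -> URL {
    guard let url = URL(string: string) else {
        preconditionFailure("Invalid Llama Stack endpoint: \(string)")
    }
    return url
}

// Usage would mirror the line above:
// private let agent = RemoteAgents(url: remoteEndpoint("https://llama-stack.together.ai"))
```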

**Note:** In order for the app to access the remote URL, the app's `Info.plist` needs to have the entry `App Transport Security Settings` with `Allow Arbitrary Loads` set to YES.

Also, to allow the app to add events to the Calendar app, the `Info.plist` needs an entry `Privacy - Calendars Usage Description`, and when running the app for the first time, you need to accept the Calendar access request.

4. Build and run the app on an iOS simulator or your device. First you may try a simple request:

```
Create a calendar event with a meeting title as Llama Stack update for 2-3pm January 27, 2025.
Create a calendar event with a meeting title as Llama Stack update for 2-3pm February 3, 2025.
```

Then, try a detailed meeting note:
```
Date: January 20, 2025
Date: February 4, 2025
Time: 10:00 AM - 11:00 AM
Location: Zoom
Attendees:
@@ -82,84 +85,37 @@ Sarah: Good. Jane, any updates from operations?
Jane: Yes, logistics are sorted, and we’ve confirmed the warehouse availability. The only pending item is training customer support for the new product.
Sarah: Let’s coordinate with the training team to expedite that. Anything else?
Mike: Quick note—can we get feedback on the beta version by Friday?
Sarah: Yes, let’s make that a priority. Anything else? No? Great. Thanks, everyone. Let’s meet again next week from 4-5pm on January 27, 2025 to review progress.
Sarah: Yes, let’s make that a priority. Anything else? No? Great. Thanks, everyone. Let’s meet again next week from 4-5pm on February 11, 2025 to review progress.
```

You'll see a summary, action items, and a Calendar event created, made possible by Llama Stack's custom tool calling API support and Llama 3.1's tool calling capability.
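The event-creation step can be sketched as below; the argument names (`event_name`, `start`, `end`) and the `yyyy-MM-dd HH:mm` date format mirror the tool-call handler in `ContentView.swift`, while `CalendarEvent` and `makeEvent` are simplified stand-ins, not SDK types:

```swift
import Foundation

// Simplified stand-in for the event the demo adds to the Calendar app.
struct CalendarEvent {
    let title: String
    let start: Date
    let end: Date
}

// Parse the string arguments of a create_event tool call into an event,
// using the same date format the handler in ContentView.swift expects.
func makeEvent(from args: [String: String]) -> CalendarEvent? {
    let formatter = DateFormatter()
    formatter.dateFormat = "yyyy-MM-dd HH:mm"
    formatter.timeZone = TimeZone.current
    formatter.locale = Locale.current
    guard let title = args["event_name"],
          let start = args["start"].flatMap(formatter.date(from:)),
          let end = args["end"].flatMap(formatter.date(from:)) else {
        return nil
    }
    return CalendarEvent(title: title, start: start, end: end)
}
```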


# iOSCalendarAssistantWithLocalInf
This project shows you how to run local inference on-device using ExecuTorch in conjunction with Llama Stack Swift SDK.

1. git clone `https://github.com/meta-llama/llama-stack-apps/tree/main/examples/ios_calendar_assistant`

2. Double click `ios_calendar_assistant/iOSCalendarAssistantWithLocalInf.xcodeproj` to open it in Xcode.

3. If there are already Frameworks in the General section of the TARGETS, remove them.

4. In Package Dependencies, delete all dependencies there and clean the dependencies cache.
iOSCalendarAssistantWithLocalInf is a demo app that uses Llama Stack Swift SDK's local inference and agent APIs, together with ExecuTorch, to run inference on-device.

5. In Package Dependencies, click the + sign, then add `https://github.com/meta-llama/llama-stack-client-swift`. Select Branch and input `v0.1.0`. This should resolve the package and add necessary dependencies in your project panel. (This should add a LlamaStackClient in your Frameworks)

6. In the same place, add `https://github.com/pytorch/executorch`. Select Branch and input `latest`. This will add ExecuTorch as your dependencies.

7. In the Frameworks for TARGETS, add all ExecuTorch kernels (including debug ones), but not `executorch` one. For example:
1. In a Mac terminal, from your top-level directory, run:
```
backend_coreml
backend_mps
backend_xnnpack
kernels_custom
kernels_optimized
kernels_portable
kernels_quantized
git clone https://github.com/meta-llama/llama-stack-apps
cd llama-stack-apps
git submodule update --init --recursive
```

8. In your project panel, if there is already a xcode project called `LocalInferenceImpl.xcodeproj`, remove it completely.

9. Then git clone `https://github.com/meta-llama/llama-stack/tree/adecb2a2d3bc5b5fb12280c54096706974e58201/llama_stack/providers/impls/ios/inference/LocalInferenceImpl`

10. In the repo, run `git submodule update --init --recursive` to sync the executorch submodules.
2. Go back to your top-level directory and run:

11. Install [Cmake](https://cmake.org/) for the executorch build. Additional [guidance](https://github.com/pytorch/executorch/blob/main/examples/demo-apps/apple_ios/LLaMA/docs/delegates/xnnpack_README.md#1-install-cmake) to install and link cmake

12. Drag `LocalInferenceImpl.xcodeproj` into your `iOSCalendarAssistantWithLocalInf` project. Import it as a reference

13. In LocalInferenceImpl’s Package Dependencies, change `LlamaStackClient package` version to `v0.1.0` matching iOSCalendarAssistantWithLocalInf’s package version. This is important to resolve Stencil dependencies.

14. Add LocalInferenceImpl.framework into the Framework section for TARGETS.

15. In "Build Settings" > "Other Linker Flags" > For both Debug and Release > "Any iOS Simulator SDK", add:
```
-force_load
$(BUILT_PRODUCTS_DIR)/libkernels_optimized-simulator-release.a
-force_load
$(BUILT_PRODUCTS_DIR)/libkernels_custom-simulator-release.a
-force_load
$(BUILT_PRODUCTS_DIR)/libkernels_quantized-simulator-release.a
-force_load
$(BUILT_PRODUCTS_DIR)/libbackend_xnnpack-simulator-release.a
-force_load
$(BUILT_PRODUCTS_DIR)/libbackend_coreml-simulator-release.a
-force_load
$(BUILT_PRODUCTS_DIR)/libbackend_mps-simulator-release.a
git clone https://github.com/meta-llama/llama-stack
cd llama-stack
git submodule update --init --recursive
```

16. For "Any iOS SDK", add:
```
-force_load
$(BUILT_PRODUCTS_DIR)/libkernels_optimized-ios-release.a
-force_load
$(BUILT_PRODUCTS_DIR)/libkernels_custom-ios-release.a
-force_load
$(BUILT_PRODUCTS_DIR)/libkernels_quantized-ios-release.a
-force_load
$(BUILT_PRODUCTS_DIR)/libbackend_xnnpack-ios-release.a
-force_load
$(BUILT_PRODUCTS_DIR)/libbackend_coreml-ios-release.a
-force_load
$(BUILT_PRODUCTS_DIR)/libbackend_mps-ios-release.a
```
3. Double click `llama-stack-apps/examples/ios_calendar_assistant/iOSCalendarAssistantWithLocalInf.xcodeproj` to open it in Xcode.

4. In the `iOSCalendarAssistantWithLocalInf` project panel, remove the existing `LocalInferenceImpl.xcodeproj`, then drag and drop `LocalInferenceImpl.xcodeproj` from `llama-stack/llama_stack/providers/inline/ios/inference` into the `iOSCalendarAssistantWithLocalInf` project.

5. Prepare a Llama model file named `llama3_2_spinquant_oct23.pte` by following the steps [here](https://github.com/pytorch/executorch/blob/main/examples/models/llama/README.md#step-2-prepare-model) - you'll also download the `tokenizer.model` file there. Then drag and drop both files to the project `iOSCalendarAssistantWithLocalInf`.

17. Lastly prepare the model: prepare a .pte file following the executorch [docs](https://github.com/pytorch/executorch/blob/main/examples/models/llama/README.md#step-2-prepare-model). Bundle the .pte and tokenizer.model file into Build Phases -> Copy Bundle Resources
6. Build and run the app on an iOS simulator or a real device.

18. Build the app for simulator or real device
**Note:** If you see a build error about cmake not being found, install CMake by following the instructions [here](https://github.com/pytorch/executorch/blob/main/examples/demo-apps/apple_ios/LLaMA/docs/delegates/xnnpack_README.md#1-install-cmake).
@@ -39,7 +39,9 @@ struct ContentView: View {
public init () {
self.inference = LocalInference(queue: runnerQueue)
self.localAgents = LocalAgents(inference: self.inference)
self.remoteAgents = RemoteAgents(url: URL(string: "http://localhost:5000")!)

// replace the URL string if you build and run your own Llama Stack distro as shown in https://github.com/meta-llama/llama-stack-apps/tree/main/examples/ios_calendar_assistant#optional-build-and-run-own-llama-stack-distro
self.remoteAgents = RemoteAgents(url: URL(string: "https://llama-stack.together.ai")!)
}

var agents: Agents {
@@ -130,39 +132,39 @@ struct ContentView: View {
func summarizeConversation(prompt: String) async {
do {
let request = Components.Schemas.CreateAgentTurnRequest(
agent_id: self.agentId,
messages: [
.UserMessage(Components.Schemas.UserMessage(
content: .case1("Summarize the following conversation in 1-2 sentences:\n\n \(prompt)"),
role: .user
))
],
session_id: self.agenticSystemSessionId,
stream: true
)

for try await chunk in try await self.agents.createTurn(request: request) {
for try await chunk in try await self.agents.createTurn(agent_id: self.agentId, session_id: self.agenticSystemSessionId, request: request) {
let payload = chunk.event.payload
switch (payload) {
case .AgentTurnResponseStepStartPayload(_):
case .step_start(_):
break
case .AgentTurnResponseStepProgressPayload(let step):
if (step.model_response_text_delta != nil) {
case .step_progress(let step):
if (step.delta != nil) {
DispatchQueue.main.async {
withAnimation {
var message = messages.removeLast()
message.text += step.model_response_text_delta!
if case .text(let delta) = step.delta {
message.text += "\(delta.text)"
}
message.tokenCount += 2
message.dateUpdated = Date()
messages.append(message)
}
}
}
case .AgentTurnResponseStepCompletePayload(_):
case .step_complete(_):
break
case .AgentTurnResponseTurnStartPayload(_):
case .turn_start(_):
break
case .AgentTurnResponseTurnCompletePayload(_):
case .turn_complete(_):
break

}
@@ -175,103 +177,100 @@

func actionItems(prompt: String) async throws {
let request = Components.Schemas.CreateAgentTurnRequest(
agent_id: self.agentId,
messages: [
.UserMessage(Components.Schemas.UserMessage(
content: .case1("List out any action items based on this text:\n\n \(prompt)"),
role: .user
))
],
session_id: self.agenticSystemSessionId,
stream: true
)

for try await chunk in try await self.agents.createTurn(request: request) {
for try await chunk in try await self.agents.createTurn(agent_id: self.agentId, session_id: self.agenticSystemSessionId, request: request) {
let payload = chunk.event.payload
switch (payload) {
case .AgentTurnResponseStepStartPayload(_):
case .step_start(_):
break
case .AgentTurnResponseStepProgressPayload(let step):
if (step.model_response_text_delta != nil) {
DispatchQueue.main.async {
withAnimation {
var message = messages.removeLast()
message.text += step.model_response_text_delta!
message.tokenCount += 2
message.dateUpdated = Date()
messages.append(message)

self.actionItems += step.model_response_text_delta!
case .step_progress(let step):
DispatchQueue.main.async(execute: DispatchWorkItem {
withAnimation {
var message = messages.removeLast()

if case .text(let delta) = step.delta {
message.text += "\(delta.text)"
self.actionItems += "\(delta.text)"
}
message.tokenCount += 2
message.dateUpdated = Date()
messages.append(message)
}
}
case .AgentTurnResponseStepCompletePayload(_):
})
case .step_complete(_):
break
case .AgentTurnResponseTurnStartPayload(_):
case .turn_start(_):
break
case .AgentTurnResponseTurnCompletePayload(_):
case .turn_complete(_):
break
}
}
}

func callTools(prompt: String) async throws {
let request = Components.Schemas.CreateAgentTurnRequest(
agent_id: self.agentId,
messages: [
.UserMessage(Components.Schemas.UserMessage(
content: .case1("Call functions as needed to handle any actions in the following text:\n\n" + prompt),
role: .user
))
],
session_id: self.agenticSystemSessionId,
stream: true
)

for try await chunk in try await self.agents.createTurn(request: request) {
for try await chunk in try await self.agents.createTurn(agent_id: self.agentId, session_id: self.agenticSystemSessionId, request: request) {
let payload = chunk.event.payload
switch (payload) {
case .AgentTurnResponseStepStartPayload(_):
case .step_start(_):
break
case .AgentTurnResponseStepProgressPayload(let step):
if (step.tool_call_delta != nil) {
switch (step.tool_call_delta!.content) {
case .case1(_):
break
case .ToolCall(let call):
switch (call.tool_name) {
case .BuiltinTool(_):
break
case .case2(let toolName):
if (toolName == "create_event") {
var args: [String : String] = [:]
for (arg_name, arg) in call.arguments.additionalProperties {
switch (arg) {
case .case1(let s): // type string
args[arg_name] = s
case .case2(_), .case3(_), .case4(_), .case5(_), .case6(_):
break
case .step_progress(let step):
switch (step.delta) {
case .tool_call(let call):
if call.parse_status == .succeeded {
switch (call.tool_call) {
case .ToolCall(let toolCall):
var args: [String : String] = [:]
for (arg_name, arg) in toolCall.arguments.additionalProperties {
switch (arg) {
case .case1(let s):
args[arg_name] = s
case .case2(_), .case3(_), .case4(_), .case5(_), .case6(_):
break
}
}
}

let formatter = DateFormatter()
formatter.dateFormat = "yyyy-MM-dd HH:mm"
formatter.timeZone = TimeZone.current
formatter.locale = Locale.current
self.triggerAddEventToCalendar(
title: args["event_name"]!,
startDate: formatter.date(from: args["start"]!) ?? Date(),
endDate: formatter.date(from: args["end"]!) ?? Date()
)
let formatter = DateFormatter()
formatter.dateFormat = "yyyy-MM-dd HH:mm"
formatter.timeZone = TimeZone.current
formatter.locale = Locale.current
self.triggerAddEventToCalendar(
title: args["event_name"]!,
startDate: formatter.date(from: args["start"]!) ?? Date(),
endDate: formatter.date(from: args["end"]!) ?? Date()
)
case .case1(_):
break
}
}
case .text(let text):
break
case .image(_):
break
}
}
case .AgentTurnResponseStepCompletePayload(_):
break
case .AgentTurnResponseTurnStartPayload(_):
case .step_complete(_):
break
case .turn_start(_):
break
case .AgentTurnResponseTurnCompletePayload(_):
case .turn_complete(_):
break
}
}
@@ -308,22 +307,17 @@ struct ContentView: View {
let createSystemResponse = try await self.agents.create(
request: Components.Schemas.CreateAgentRequest(
agent_config: Components.Schemas.AgentConfig(
client_tools: [ CustomTools.getCreateEventToolForAgent() ],
enable_session_persistence: false,
instructions: "You are a helpful assistant",
max_infer_iters: 1,
model: "Llama3.1-8B-Instruct",
tools: [
Components.Schemas.AgentConfig.toolsPayloadPayload.FunctionCallToolDefinition(
CustomTools.getCreateEventTool()
)
]
model: "meta-llama/Llama-3.1-8B-Instruct"
)
)
)
self.agentId = createSystemResponse.agent_id

let createSessionResponse = try await self.agents.createSession(
request: Components.Schemas.CreateAgentSessionRequest(agent_id: self.agentId, session_name: "llama-assistant")
let createSessionResponse = try await self.agents.createSession(agent_id: self.agentId, request: Components.Schemas.CreateAgentSessionRequest(session_name: "llama-assistant")
)
self.agenticSystemSessionId = createSessionResponse.session_id
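The streaming handlers in `ContentView.swift` above all follow one pattern: each `step_progress` chunk carries a text delta that is appended to the last message. A simplified, self-contained sketch of that accumulation (the `Message` type here is a stand-in, not the app's model):

```swift
import Foundation

// Stand-in for the app's message model; not the SDK type.
struct Message {
    var text = ""
    var tokenCount = 0
}

// Append each streamed text delta to the last message, as the
// step_progress cases in ContentView.swift do inside withAnimation.
func apply(deltas: [String], to messages: inout [Message]) {
    for delta in deltas {
        var message = messages.removeLast()
        message.text += delta
        message.tokenCount += 2   // rough per-chunk token estimate, as in the demo
        messages.append(message)
    }
}
```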
