diff --git a/examples/ios_calendar_assistant/README.md b/examples/ios_calendar_assistant/README.md
index fd0043e7f..f175f36a6 100644
--- a/examples/ios_calendar_assistant/README.md
+++ b/examples/ios_calendar_assistant/README.md
@@ -1,16 +1,18 @@
 # iOSCalendarAssistant
 
-iOSCalendarAssistant is a demo app ([video](https://drive.google.com/file/d/1xjdYVm3zDnlxZGi40X_D4IgvmASfG5QZ/view?usp=sharing)) that takes a meeting transcript, summarizes it, extracts action items, and calls tools to book any followup meetings.
+iOSCalendarAssistant is a demo app ([video](https://drive.google.com/file/d/1xjdYVm3zDnlxZGi40X_D4IgvmASfG5QZ/view?usp=sharing)) that uses the Llama Stack Swift SDK's remote inference and agent APIs to take a meeting transcript, summarize it, extract action items, and call tools to book any follow-up meetings.
 
 You can also test creating a calendar event with a direct ask instead of a detailed meeting note.
 
-We also have a demo project for running on-device inference. Checkout the instructions in the section below.
+## Installation
 
-# Installation
+We also have a demo project for running on-device inference. Check out the instructions in the section `iOSCalendarAssistantWithLocalInf` below.
 
 We recommend you try the [iOS Quick Demo](../ios_quick_demo) first to confirm the prerequisites and installation - both demos have the same prerequisites and the first two installation steps.
 
-## Prerequisite
+The quickest way to try out the demo for remote inference is to use Together.ai's Llama Stack distro at https://llama-stack.together.ai - you can skip the next section and go directly to the Build and Run the iOS demo section.
+
+## (Optional) Build and Run Own Llama Stack Distro
 
 You need to set up a remote Llama Stack distribution to run this demo. Assuming you have a [Fireworks](https://fireworks.ai/account/api-keys) or [Together](https://api.together.ai/) API key, which you can get easily by clicking the links above:
@@ -41,11 +43,12 @@ The default port is 5000 for `llama stack run` and you can specify a different p
 
 2. Under the iOSCalendarAssistant project - Package Dependencies, click the + sign, then add `https://github.com/meta-llama/llama-stack-client-swift` at the top right and 0.1.0 in the Dependency Rule, then click Add Package.
 
-3. Replace the `RemoteInference` url string in `ContentView.swift` below with the host IP and port of the remote Llama Stack distro started in Prerequisite:
+3. (Optional) Replace the `RemoteAgents` URL string in `ContentView.swift` below with the host IP and port of the remote Llama Stack distro from Build and Run Own Llama Stack Distro:
 
 ```
-private let agent = RemoteAgents(url: URL(string: "http://127.0.0.1:5000")!)
+private let agent = RemoteAgents(url: URL(string: "https://llama-stack.together.ai")!)
 ```
 
+**Note:** In order for the app to access the remote URL, the app's `Info.plist` needs to have the entry `App Transport Security Settings` with `Allow Arbitrary Loads` set to YES.
+
 Also, to allow the app to add events to the Calendar app, the `Info.plist` needs to have an entry `Privacy - Calendars Usage Description`, and when running the app for the first time, you need to accept the Calendar access request.
@@ -53,12 +56,12 @@ Also, to allow the app to add event to the Calendar app, the `Info.plist` needs
 
 4. Build and run the app on an iOS simulator or your device. First you may try a simple request:
 
 ```
-Create a calendar event with a meeting title as Llama Stack update for 2-3pm January 27, 2025.
+Create a calendar event with a meeting title as Llama Stack update for 2-3pm February 3, 2025.
 ```
 
 Then, a detailed meeting note:
 
 ```
-Date: January 20, 2025
+Date: February 4, 2025
 Time: 10:00 AM - 11:00 AM
 Location: Zoom
 Attendees:
@@ -82,84 +85,37 @@ Sarah: Good. Jane, any updates from operations?
 Jane: Yes, logistics are sorted, and we’ve confirmed the warehouse availability. The only pending item is training customer support for the new product.
 Sarah: Let’s coordinate with the training team to expedite that. Anything else?
 Mike: Quick note—can we get feedback on the beta version by Friday?
-Sarah: Yes, let’s make that a priority. Anything else? No? Great. Thanks, everyone. Let’s meet again next week from 4-5pm on January 27, 2025 to review progress.
+Sarah: Yes, let’s make that a priority. Anything else? No? Great. Thanks, everyone. Let’s meet again next week from 4-5pm on February 11, 2025 to review progress.
 ```
 
 You'll see a summary, action items, and a Calendar event created, made possible by Llama Stack's custom tool calling API support and Llama 3.1's tool calling capability.
 
 # iOSCalendarAssistantWithLocalInf
 
-This project shows you how to run local inference on-device using ExecuTorch in conjunction with Llama Stack Swift SDK.
-
-1. git clone `https://github.com/meta-llama/llama-stack-apps/tree/main/examples/ios_calendar_assistant`
-
-2. Double click `ios_calendar_assistant/iOSCalendarAssistantWithLocalInf.xcodeproj` to open it in Xcode.
-
-3. If there are already Frameworks in the General section of the TARGETS, remove them.
-4. In Package Dependencies, delete all dependencies there and clean the dependencies cache.
+iOSCalendarAssistantWithLocalInf is a demo app that uses the Llama Stack Swift SDK's local inference and agent APIs, together with ExecuTorch, to run inference on-device.
 
-5. In Package Dependencies, click the + sign, then add `https://github.com/meta-llama/llama-stack-client-swift`. Select Branch and input `v0.1.0`. This should resolve the package and add necessary dependencies in your project panel. (This should add a LlamaStackClient in your Frameworks)
-
-6. In the same place, add `https://github.com/pytorch/executorch`. Select Branch and input `latest`. This will add ExecuTorch as your dependencies.
-
-7. In the Frameworks for TARGETS, add all ExecuTorch kernels (including debug ones), but not `executorch` one. For example:
+1. On a Mac terminal, in your top-level directory, run the commands:
 
 ```
-backend_coreml
-backend_mps
-backend_xnnpack
-kernels_custom
-kernels_optimized
-kernels_portable
-kernels_quantized
+git clone https://github.com/meta-llama/llama-stack-apps
+cd llama-stack-apps
+git submodule update --init --recursive
 ```
 
-8. In your project panel, if there is already a xcode project called `LocalInferenceImpl.xcodeproj`, remove it completely.
-
-9. Then git clone `https://github.com/meta-llama/llama-stack/tree/adecb2a2d3bc5b5fb12280c54096706974e58201/llama_stack/providers/impls/ios/inference/LocalInferenceImpl`
-
-10. In the repo, run `git submodule update --init --recursive` to sync the executorch submodules.
+2. Go back to your top-level directory and run the commands:
 
-11. Install [Cmake](https://cmake.org/) for the executorch build. Additional [guidance](https://github.com/pytorch/executorch/blob/main/examples/demo-apps/apple_ios/LLaMA/docs/delegates/xnnpack_README.md#1-install-cmake) to install and link cmake
-
-12. Drag `LocalInferenceImpl.xcodeproj` into your `iOSCalendarAssistantWithLocalInf` project. Import it as a reference
-
-13. In LocalInferenceImpl’s Package Dependencies, change `LlamaStackClient package` version to `v0.1.0` matching iOSCalendarAssistantWithLocalInf’s package version. This is important to resolve Stencil dependencies.
-
-14. Add LocalInferenceImpl.framework into the Framework section for TARGETS.
-
-15. In "Build Settings" > "Other Linker Flags" > For both Debug and Release > "Any iOS Simulator SDK", add:
 
 ```
--force_load
-$(BUILT_PRODUCTS_DIR)/libkernels_optimized-simulator-release.a
--force_load
-$(BUILT_PRODUCTS_DIR)/libkernels_custom-simulator-release.a
--force_load
-$(BUILT_PRODUCTS_DIR)/libkernels_quantized-simulator-release.a
--force_load
-$(BUILT_PRODUCTS_DIR)/libbackend_xnnpack-simulator-release.a
--force_load
-$(BUILT_PRODUCTS_DIR)/libbackend_coreml-simulator-release.a
--force_load
-$(BUILT_PRODUCTS_DIR)/libbackend_mps-simulator-release.a
+git clone https://github.com/meta-llama/llama-stack
+cd llama-stack
+git submodule update --init --recursive
 ```
 
-16. For "Any iOS SDK", add:
-```
--force_load
-$(BUILT_PRODUCTS_DIR)/libkernels_optimized-ios-release.a
--force_load
-$(BUILT_PRODUCTS_DIR)/libkernels_custom-ios-release.a
--force_load
-$(BUILT_PRODUCTS_DIR)/libkernels_quantized-ios-release.a
--force_load
-$(BUILT_PRODUCTS_DIR)/libbackend_xnnpack-ios-release.a
--force_load
-$(BUILT_PRODUCTS_DIR)/libbackend_coreml-ios-release.a
--force_load
-$(BUILT_PRODUCTS_DIR)/libbackend_mps-ios-release.a
-```
+3. Double-click `llama-stack-apps/examples/ios_calendar_assistant/iOSCalendarAssistantWithLocalInf.xcodeproj` to open it in Xcode.
+
+4. In the `iOSCalendarAssistantWithLocalInf` project panel, remove `LocalInferenceImpl.xcodeproj`, then drag and drop `LocalInferenceImpl.xcodeproj` from `llama-stack/llama_stack/providers/inline/ios/inference` into the `iOSCalendarAssistantWithLocalInf` project.
+
+5. Prepare a Llama model file named `llama3_2_spinquant_oct23.pte` by following the steps [here](https://github.com/pytorch/executorch/blob/main/examples/models/llama/README.md#step-2-prepare-model) - you'll also download the `tokenizer.model` file there. Then drag and drop both files into the `iOSCalendarAssistantWithLocalInf` project.
 
-17. Lastly prepare the model: prepare a .pte file following the executorch [docs](https://github.com/pytorch/executorch/blob/main/examples/models/llama/README.md#step-2-prepare-model). Bundle the .pte and tokenizer.model file into Build Phases -> Copy Bundle Resources
+6. Build and run the app on an iOS simulator or a real device.
 
-18. Build the app for simulator or real device
+**Note:** If you see a build error about cmake not being found, install cmake by following the instructions [here](https://github.com/pytorch/executorch/blob/main/examples/demo-apps/apple_ios/LLaMA/docs/delegates/xnnpack_README.md#1-install-cmake).
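Note for readers: the `client_tools` change in the next diff relies on a custom `create_event` tool definition supplied by the app. As a rough illustration only - the real definition lives in `CustomTools.getCreateEventToolForAgent()` and uses the SDK's own tool-definition types - here is the information such a definition carries, using the argument names (`event_name`, `start`, `end`) and date format that `callTools` expects:

```swift
import Foundation

// Illustrative only: the shape of the custom create_event tool definition,
// expressed as plain data. The app's real definition is built in
// CustomTools.getCreateEventToolForAgent() with the SDK's own types.
let createEventTool: [String: Any] = [
    "tool_name": "create_event",
    "description": "Create a calendar event with a title and a start and end time",
    "parameters": [
        // Argument names and the date format must match what callTools parses.
        "event_name": ["param_type": "string", "description": "Title of the event"],
        "start": ["param_type": "string", "description": "Start time, yyyy-MM-dd HH:mm"],
        "end": ["param_type": "string", "description": "End time, yyyy-MM-dd HH:mm"],
    ],
]
```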
diff --git a/examples/ios_calendar_assistant/iOSCalendarAssistantWithLocalInf/ContentView.swift b/examples/ios_calendar_assistant/iOSCalendarAssistantWithLocalInf/ContentView.swift
index de525e714..1a8c6b5e0 100644
--- a/examples/ios_calendar_assistant/iOSCalendarAssistantWithLocalInf/ContentView.swift
+++ b/examples/ios_calendar_assistant/iOSCalendarAssistantWithLocalInf/ContentView.swift
@@ -39,7 +39,9 @@ struct ContentView: View {
   public init () {
     self.inference = LocalInference(queue: runnerQueue)
     self.localAgents = LocalAgents(inference: self.inference)
-    self.remoteAgents = RemoteAgents(url: URL(string: "http://localhost:5000")!)
+
+    // Replace the URL string if you build and run your own Llama Stack distro, as shown in https://github.com/meta-llama/llama-stack-apps/tree/main/examples/ios_calendar_assistant#optional-build-and-run-own-llama-stack-distro
+    self.remoteAgents = RemoteAgents(url: URL(string: "https://llama-stack.together.ai")!)
   }
 
   var agents: Agents {
@@ -130,39 +132,39 @@ struct ContentView: View {
   func summarizeConversation(prompt: String) async {
     do {
       let request = Components.Schemas.CreateAgentTurnRequest(
-        agent_id: self.agentId,
         messages: [
           .UserMessage(Components.Schemas.UserMessage(
             content: .case1("Summarize the following conversation in 1-2 sentences:\n\n \(prompt)"),
             role: .user
           ))
         ],
-        session_id: self.agenticSystemSessionId,
         stream: true
       )
 
-      for try await chunk in try await self.agents.createTurn(request: request) {
+      for try await chunk in try await self.agents.createTurn(agent_id: self.agentId, session_id: self.agenticSystemSessionId, request: request) {
         let payload = chunk.event.payload
         switch (payload) {
-        case .AgentTurnResponseStepStartPayload(_):
+        case .step_start(_):
           break
-        case .AgentTurnResponseStepProgressPayload(let step):
-          if (step.model_response_text_delta != nil) {
+        case .step_progress(let step):
+          if (step.delta != nil) {
             DispatchQueue.main.async {
               withAnimation {
                 var message = messages.removeLast()
-                message.text += step.model_response_text_delta!
+                if case .text(let delta) = step.delta {
+                  message.text += "\(delta.text)"
+                }
                 message.tokenCount += 2
                 message.dateUpdated = Date()
                 messages.append(message)
               }
             }
           }
-        case .AgentTurnResponseStepCompletePayload(_):
+        case .step_complete(_):
           break
-        case .AgentTurnResponseTurnStartPayload(_):
+        case .turn_start(_):
           break
-        case .AgentTurnResponseTurnCompletePayload(_):
+        case .turn_complete(_):
           break
         }
@@ -175,41 +177,39 @@ func actionItems(prompt: String) async throws {
     let request = Components.Schemas.CreateAgentTurnRequest(
-      agent_id: self.agentId,
       messages: [
         .UserMessage(Components.Schemas.UserMessage(
           content: .case1("List out any action items based on this text:\n\n \(prompt)"),
           role: .user
         ))
       ],
-      session_id: self.agenticSystemSessionId,
       stream: true
     )
 
-    for try await chunk in try await self.agents.createTurn(request: request) {
+    for try await chunk in try await self.agents.createTurn(agent_id: self.agentId, session_id: self.agenticSystemSessionId, request: request) {
       let payload = chunk.event.payload
       switch (payload) {
-      case .AgentTurnResponseStepStartPayload(_):
+      case .step_start(_):
         break
-      case .AgentTurnResponseStepProgressPayload(let step):
-        if (step.model_response_text_delta != nil) {
-          DispatchQueue.main.async {
-            withAnimation {
-              var message = messages.removeLast()
-              message.text += step.model_response_text_delta!
-              message.tokenCount += 2
-              message.dateUpdated = Date()
-              messages.append(message)
-
-              self.actionItems += step.model_response_text_delta!
+      case .step_progress(let step):
+        DispatchQueue.main.async(execute: DispatchWorkItem {
+          withAnimation {
+            var message = messages.removeLast()
+
+            if case .text(let delta) = step.delta {
+              message.text += "\(delta.text)"
+              self.actionItems += "\(delta.text)"
+            }
+            message.tokenCount += 2
+            message.dateUpdated = Date()
+            messages.append(message)
           }
-        }
-      case .AgentTurnResponseStepCompletePayload(_):
+      })
+      case .step_complete(_):
         break
-      case .AgentTurnResponseTurnStartPayload(_):
+      case .turn_start(_):
         break
-      case .AgentTurnResponseTurnCompletePayload(_):
+      case .turn_complete(_):
         break
       }
     }
@@ -217,61 +217,60 @@ func callTools(prompt: String) async throws {
     let request = Components.Schemas.CreateAgentTurnRequest(
-      agent_id: self.agentId,
       messages: [
         .UserMessage(Components.Schemas.UserMessage(
          content: .case1("Call functions as needed to handle any actions in the following text:\n\n" + prompt),
           role: .user
         ))
       ],
-      session_id: self.agenticSystemSessionId,
       stream: true
     )
 
-    for try await chunk in try await self.agents.createTurn(request: request) {
+    for try await chunk in try await self.agents.createTurn(agent_id: self.agentId, session_id: self.agenticSystemSessionId, request: request) {
       let payload = chunk.event.payload
       switch (payload) {
-      case .AgentTurnResponseStepStartPayload(_):
+      case .step_start(_):
         break
-      case .AgentTurnResponseStepProgressPayload(let step):
-        if (step.tool_call_delta != nil) {
-          switch (step.tool_call_delta!.content) {
-          case .case1(_):
-            break
-          case .ToolCall(let call):
-            switch (call.tool_name) {
-            case .BuiltinTool(_):
-              break
-            case .case2(let toolName):
-              if (toolName == "create_event") {
-                var args: [String : String] = [:]
-                for (arg_name, arg) in call.arguments.additionalProperties {
-                  switch (arg) {
-                  case .case1(let s): // type string
-                    args[arg_name] = s
-                  case .case2(_), .case3(_), .case4(_), .case5(_), .case6(_):
-                    break
+      case .step_progress(let step):
+        switch (step.delta) {
+        case .tool_call(let call):
+          if call.parse_status == .succeeded {
+            switch (call.tool_call) {
+            case .ToolCall(let toolCall):
+              var args: [String : String] = [:]
+              for (arg_name, arg) in toolCall.arguments.additionalProperties {
+                switch (arg) {
+                case .case1(let s): // type string
+                  args[arg_name] = s
+                case .case2(_), .case3(_), .case4(_), .case5(_), .case6(_):
+                  break
+                }
              }
-            }
-            let formatter = DateFormatter()
-            formatter.dateFormat = "yyyy-MM-dd HH:mm"
-            formatter.timeZone = TimeZone.current
-            formatter.locale = Locale.current
-            self.triggerAddEventToCalendar(
-              title: args["event_name"]!,
-              startDate: formatter.date(from: args["start"]!) ?? Date(),
-              endDate: formatter.date(from: args["end"]!) ?? Date()
-            )
+              let formatter = DateFormatter()
+              formatter.dateFormat = "yyyy-MM-dd HH:mm"
+              formatter.timeZone = TimeZone.current
+              formatter.locale = Locale.current
+              self.triggerAddEventToCalendar(
+                title: args["event_name"]!,
+                startDate: formatter.date(from: args["start"]!) ?? Date(),
+                endDate: formatter.date(from: args["end"]!) ?? Date()
+              )
+            case .case1(_):
+              break
            }
          }
+        case .text(_):
+          break
+        case .image(_):
+          break
        }
-      }
-      case .AgentTurnResponseStepCompletePayload(_):
+      case .step_complete(_):
+        break
+      case .turn_start(_):
        break
-      case .AgentTurnResponseTurnStartPayload(_):
-        break
-      case .AgentTurnResponseTurnCompletePayload(_):
+      case .turn_complete(_):
        break
      }
    }
@@ -308,22 +307,17 @@
     let createSystemResponse = try await self.agents.create(
       request: Components.Schemas.CreateAgentRequest(
         agent_config: Components.Schemas.AgentConfig(
+          client_tools: [ CustomTools.getCreateEventToolForAgent() ],
           enable_session_persistence: false,
           instructions: "You are a helpful assistant",
           max_infer_iters: 1,
-          model: "Llama3.1-8B-Instruct",
-          tools: [
-            Components.Schemas.AgentConfig.toolsPayloadPayload.FunctionCallToolDefinition(
-              CustomTools.getCreateEventTool()
-            )
-          ]
+          model: "meta-llama/Llama-3.1-8B-Instruct"
         )
       )
     )
 
     self.agentId = createSystemResponse.agent_id
 
-    let createSessionResponse = try await self.agents.createSession(
-      request: Components.Schemas.CreateAgentSessionRequest(agent_id: self.agentId, session_name: "llama-assistant")
+    let createSessionResponse = try await self.agents.createSession(agent_id: self.agentId, request: Components.Schemas.CreateAgentSessionRequest(session_name: "llama-assistant")
     )
 
     self.agenticSystemSessionId = createSessionResponse.session_id
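Note for readers: `triggerAddEventToCalendar(title:startDate:endDate:)` is called in the hunk above but is not part of this diff. A minimal sketch of what such a helper can look like with EventKit, assuming the `Privacy - Calendars Usage Description` entry from the README is in place - the app's actual implementation may differ:

```swift
import EventKit

// Minimal sketch of a calendar helper matching the call site above.
// Requires the Info.plist entry `Privacy - Calendars Usage Description`.
func triggerAddEventToCalendar(title: String, startDate: Date, endDate: Date) {
    let store = EKEventStore()
    store.requestAccess(to: .event) { granted, error in
        guard granted, error == nil else { return }
        // Build the event and save it to the user's default calendar.
        let event = EKEvent(eventStore: store)
        event.title = title
        event.startDate = startDate
        event.endDate = endDate
        event.calendar = store.defaultCalendarForNewEvents
        do {
            try store.save(event, span: .thisEvent)
        } catch {
            print("Failed to save event: \(error)")
        }
    }
}
```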
diff --git a/examples/ios_quick_demo/README.md b/examples/ios_quick_demo/README.md
index 980ab7014..194171b3c 100644
--- a/examples/ios_quick_demo/README.md
+++ b/examples/ios_quick_demo/README.md
@@ -2,9 +2,11 @@
 
 iOSQuickDemo is a demo app ([video](https://drive.google.com/file/d/1HnME3VmsYlyeFgsIOMlxZy5c8S2xP4r4/view?usp=sharing)) that shows how to use the Llama Stack Swift SDK ([repo](https://github.com/meta-llama/llama-stack-client-swift)) and its `ChatCompletionRequest` API with a remote Llama Stack server to perform remote inference with Llama 3.1.
 
-# Installation
+## Installation
 
-## Prerequisite
+The quickest way to try out the demo for remote inference is to use Together.ai's Llama Stack distro at https://llama-stack.together.ai - you can skip the next section and go directly to the Build and Run the iOS demo section.
+
+## (Optional) Build and Run Own Llama Stack Distro
 
 You need to set up a remote Llama Stack distribution to run this demo. Assuming you have a [Fireworks](https://fireworks.ai/account/api-keys) or [Together](https://api.together.ai/) API key, which you can get easily by clicking the links above:
@@ -38,7 +40,7 @@ The default port is 5000 for `llama stack run` and you can specify a different p
 
 ![](quick1.png)
 ![](quick2.png)
 
-3. Replace the `RemoteInference` url string in `ContentView.swift` below with the host IP and port of the remote Llama Stack distro started in Prerequisite:
+3. (Optional) Replace the `RemoteInference` URL string in `ContentView.swift` below with the host IP and port of the remote Llama Stack distro from Build and Run Own Llama Stack Distro:
 
 ```
 let inference = RemoteInference(url: URL(string: "http://127.0.0.1:5000")!)
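Note for readers: the `ChatCompletionRequest` flow this README refers to appears only partially in the next diff. A condensed sketch of the streaming call under stated assumptions - the `model_id` field name and the `.text` delta shape are assumptions based on the 0.1.0 schema used elsewhere in this PR; the actual `ContentView.swift` below is authoritative:

```swift
import Foundation
import LlamaStackClient

// Condensed sketch, not part of the diff: streaming chat completion with the
// Llama Stack Swift SDK against the Together.ai distro.
func runQuickDemo(userInput: String) async throws {
    let inference = RemoteInference(url: URL(string: "https://llama-stack.together.ai")!)
    for await chunk in try await inference.chatCompletion(
        request: Components.Schemas.ChatCompletionRequest(
            messages: [
                .UserMessage(Components.Schemas.UserMessage(
                    content: .case1(userInput),
                    role: .user
                ))
            ],
            model_id: "meta-llama/Llama-3.1-8B-Instruct",  // assumed field name
            stream: true
        )
    ) {
        // Assumed delta shape, mirroring the agent .text delta used above.
        if case .text(let delta) = chunk.event.delta {
            print(delta.text, terminator: "")
        }
    }
}
```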
diff --git a/examples/ios_quick_demo/iOSQuickDemo/iOSQuickDemo/ContentView.swift b/examples/ios_quick_demo/iOSQuickDemo/iOSQuickDemo/ContentView.swift
index e356f1c79..c4b93f075 100644
--- a/examples/ios_quick_demo/iOSQuickDemo/iOSQuickDemo/ContentView.swift
+++ b/examples/ios_quick_demo/iOSQuickDemo/iOSQuickDemo/ContentView.swift
@@ -12,9 +12,9 @@ import LlamaStackClient
 struct ContentView: View {
   @State private var message: String = ""
   @State private var userInput: String = "Best quotes in Godfather"
-  
+
   private let runnerQueue = DispatchQueue(label: "org.llamastack.iosquickdemo")
-  
+
   var body: some View {
     VStack(spacing: 20) {
       Text(message.isEmpty ? "Click Inference to see Llama's answer" : message)
@@ -24,11 +24,11 @@ struct ContentView: View {
         .frame(maxWidth: .infinity)
         .background(Color.gray.opacity(0.2))
         .cornerRadius(8)
-      
+
       VStack(alignment: .leading, spacing: 10) {
         Text("Question")
           .font(.headline)
-        
+
         TextField("Enter your question here", text: $userInput)
           .textFieldStyle(RoundedBorderTextFieldStyle())
           .padding()
@@ -56,17 +56,19 @@ struct ContentView: View {
       message = "Please enter a question before clicking 'Inference'."
       return
     }
-    
+
     message = ""
-    
+
     let workItem = DispatchWorkItem {
       defer {
        DispatchQueue.main.async {
        }
      }
-      
+
      Task {
-        let inference = RemoteInference(url: URL(string: "http://127.0.0.1:5000")!)
+
+        // Replace the URL string if you build and run your own Llama Stack distro, as shown in https://github.com/meta-llama/llama-stack-apps/tree/main/examples/ios_quick_demo#optional-build-and-run-own-llama-stack-distro
+        let inference = RemoteInference(url: URL(string: "https://llama-stack.together.ai")!)
 
        do {
          for await chunk in try await inference.chatCompletion(
@@ -108,7 +110,7 @@ struct ContentView: View {
         }
       }
     }
-    
+
     runnerQueue.async(execute: workItem)
   }
 }
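Taken together, the agent-API migration in this PR amounts to the following call sequence. A condensed sketch assembled from the diffs above (same types and signatures; error handling and UI updates elided), not a drop-in file:

```swift
import LlamaStackClient

// Condensed sketch of the 0.1.x agent flow as migrated in this PR.
// CustomTools.getCreateEventToolForAgent() is the app's own helper.
func runTurn(agents: RemoteAgents, prompt: String) async throws {
    // 1. Create an agent configured with the custom create_event client tool.
    let agent = try await agents.create(
        request: Components.Schemas.CreateAgentRequest(
            agent_config: Components.Schemas.AgentConfig(
                client_tools: [ CustomTools.getCreateEventToolForAgent() ],
                enable_session_persistence: false,
                instructions: "You are a helpful assistant",
                max_infer_iters: 1,
                model: "meta-llama/Llama-3.1-8B-Instruct"
            )
        )
    )

    // 2. Open a session; agent_id is now a parameter, not a request field.
    let session = try await agents.createSession(
        agent_id: agent.agent_id,
        request: Components.Schemas.CreateAgentSessionRequest(session_name: "llama-assistant")
    )

    // 3. Stream a turn; text arrives via the .step_progress payload's .text delta.
    let request = Components.Schemas.CreateAgentTurnRequest(
        messages: [
            .UserMessage(Components.Schemas.UserMessage(content: .case1(prompt), role: .user))
        ],
        stream: true
    )
    for try await chunk in try await agents.createTurn(
        agent_id: agent.agent_id, session_id: session.session_id, request: request
    ) {
        if case .step_progress(let step) = chunk.event.payload,
           case .text(let delta) = step.delta {
            print(delta.text, terminator: "")
        }
    }
}
```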