
update for local inference demo for LS 0.1 #163

Merged: 11 commits merged on Feb 6, 2025
49 changes: 41 additions & 8 deletions examples/ios_calendar_assistant/README.md
@@ -1,14 +1,16 @@
# iOSCalendarAssistant

iOSCalendarAssistant is a demo app ([video](https://drive.google.com/file/d/1xjdYVm3zDnlxZGi40X_D4IgvmASfG5QZ/view?usp=sharing)) that takes a meeting transcript, summarizes it, extracts action items, and calls tools to book any followup meetings.
iOSCalendarAssistant is a demo app ([video](https://drive.google.com/file/d/1xjdYVm3zDnlxZGi40X_D4IgvmASfG5QZ/view?usp=sharing)) that uses the Llama Stack Swift SDK's remote inference and agent APIs to take a meeting transcript, summarize it, extract action items, and call tools to book any follow-up meetings.

You can also test creating a calendar event with a direct ask instead of a detailed meeting note.

# Installation
## Installation

We recommend you try the [iOS Quick Demo](../ios_quick_demo) first to confirm the prerequisites and installation; both demos share the same prerequisites and the first two installation steps.

## Prerequisite
The quickest way to try the demo with remote inference is to use Together.ai's Llama Stack distro at https://llama-stack.together.ai; if you do, you can skip the next section and go directly to the Build and Run the iOS Demo section.

## (Optional) Build and Run Own Llama Stack Distro

You need to set up a remote Llama Stack distribution to run this demo. Assuming you have a [Fireworks](https://fireworks.ai/account/api-keys) or [Together](https://api.together.ai/) API key, which you can get by clicking the links above:

@@ -39,24 +41,25 @@ The default port is 5000 for `llama stack run` and you can specify a different p

2. Under the iOSCalendarAssistant project - Package Dependencies, click the + sign, then add `https://github.com/meta-llama/llama-stack-client-swift` at the top right and 0.1.0 in the Dependency Rule, then click Add Package.

3. Replace the `RemoteInference` url string in `ContentView.swift` below with the host IP and port of the remote Llama Stack distro started in Prerequisite:
3. (Optional) If you built and ran your own Llama Stack distro, replace the `RemoteInference` url string in `ContentView.swift` below with the host IP and port of the distro started in Build and Run Own Llama Stack Distro:

```
private let agent = RemoteAgents(url: URL(string: "http://127.0.0.1:5000")!)
private let agent = RemoteAgents(url: URL(string: "https://llama-stack.together.ai")!)
```

**Note:** In order for the app to access the remote URL, the app's `Info.plist` needs to have the entry `App Transport Security Settings` with `Allow Arbitrary Loads` set to YES.

Also, to allow the app to add an event to the Calendar app, the `Info.plist` needs an entry `Privacy - Calendars Usage Description`, and when running the app for the first time, you need to accept the Calendar access request.
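The two `Info.plist` entries above correspond to the following raw-key fragment (a sketch: the key names are the standard ones Xcode writes for these settings, and the usage-description string is just an example):

```
<key>NSAppTransportSecurity</key>
<dict>
    <!-- Allow Arbitrary Loads = YES, so the app can reach the remote distro URL -->
    <key>NSAllowsArbitraryLoads</key>
    <true/>
</dict>
<!-- Privacy - Calendars Usage Description -->
<key>NSCalendarsUsageDescription</key>
<string>This app creates calendar events from meeting action items.</string>
```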

4. Build and run the app on an iOS simulator or your device. First, you may try a simple request:

```
Create a calendar event with a meeting title as Llama Stack update for 2-3pm January 27, 2025.
Create a calendar event with a meeting title as Llama Stack update for 2-3pm February 3, 2025.
```

Then, a detailed meeting note:
```
Date: January 20, 2025
Date: February 4, 2025
Time: 10:00 AM - 11:00 AM
Location: Zoom
Attendees:
@@ -80,7 +83,37 @@ Sarah: Good. Jane, any updates from operations?
Jane: Yes, logistics are sorted, and we’ve confirmed the warehouse availability. The only pending item is training customer support for the new product.
Sarah: Let’s coordinate with the training team to expedite that. Anything else?
Mike: Quick note—can we get feedback on the beta version by Friday?
Sarah: Yes, let’s make that a priority. Anything else? No? Great. Thanks, everyone. Let’s meet again next week from 4-5pm on January 27, 2025 to review progress.
Sarah: Yes, let’s make that a priority. Anything else? No? Great. Thanks, everyone. Let’s meet again next week from 4-5pm on February 11, 2025 to review progress.
```

You'll see a summary, action items, and a Calendar event created, made possible by Llama Stack's custom tool calling API support and Llama 3.1's tool calling capability.
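Before the app can create the Calendar event, it converts the tool call's `start`/`end` string arguments into `Date` values with a `DateFormatter` (the `"yyyy-MM-dd HH:mm"` format matches what `ContentView.swift` uses; the sample strings below are hypothetical model output for the prompt above):

```
import Foundation

// Mirror the formatter setup in ContentView.swift.
let formatter = DateFormatter()
formatter.dateFormat = "yyyy-MM-dd HH:mm"
formatter.timeZone = TimeZone.current
formatter.locale = Locale.current

// Hypothetical "start"/"end" arguments from a create_event tool call.
let start = formatter.date(from: "2025-02-11 16:00") ?? Date()
let end = formatter.date(from: "2025-02-11 17:00") ?? Date()
print(start <= end) // prints "true"
```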


# iOSCalendarAssistantWithLocalInf

iOSCalendarAssistantWithLocalInf is a demo app that uses Llama Stack Swift SDK's local inference and agent APIs and ExecuTorch to run local inference on device.

1. In your work folder, run the following commands:
```
git clone https://github.com/meta-llama/llama-stack-apps
cd llama-stack-apps
git submodule update --init --recursive
```

2. Go back to your work folder, then run:

```
git clone https://github.com/meta-llama/llama-stack
cd llama-stack
git submodule update --init --recursive
```

3. Double-click `llama-stack-apps/examples/ios_calendar_assistant/iOSCalendarAssistantWithLocalInf.xcodeproj` to open it in Xcode.

4. In the `iOSCalendarAssistantWithLocalInf` project panel, remove the existing `LocalInferenceImpl.xcodeproj` reference, then drag and drop `LocalInferenceImpl.xcodeproj` from `llama-stack/llama_stack/providers/inline/ios/inference` into the `iOSCalendarAssistantWithLocalInf` project.

5. Prepare a Llama model file named `llama3_2_spinquant_oct23.pte` by following the steps [here](https://github.com/pytorch/executorch/blob/main/examples/models/llama/README.md#step-2-prepare-model) - you'll also download the `tokenizer.model` file there. Then drag and drop both files into the `iOSCalendarAssistantWithLocalInf` project.

6. Build and run the app on an iOS simulator or a real device.

**Note:** If you see a build error about cmake not being found, install cmake by following the instructions [here](https://github.com/pytorch/executorch/blob/main/examples/demo-apps/apple_ios/LLaMA/docs/delegates/xnnpack_README.md#1-install-cmake).
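The local setup boils down to the wiring shown in `ContentView.swift`'s initializer: a dedicated dispatch queue for the ExecuTorch runner, a `LocalInference` instance on that queue, and `LocalAgents` on top of it. A minimal sketch (the queue label is hypothetical, and the `LocalInference`/`LocalAgents` types come from the Llama Stack Swift SDK and the `LocalInferenceImpl` project, so this will not compile outside the app):

```
import Foundation

// Dedicated queue for the on-device ExecuTorch runner (label is an example).
let runnerQueue = DispatchQueue(label: "org.llamastack.localinference")

// Local inference backed by the bundled .pte model and tokenizer,
// with the agent APIs layered on top, as in ContentView.swift.
let inference = LocalInference(queue: runnerQueue)
let localAgents = LocalAgents(inference: inference)
```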
@@ -39,7 +39,9 @@ struct ContentView: View {
public init () {
self.inference = LocalInference(queue: runnerQueue)
self.localAgents = LocalAgents(inference: self.inference)
self.remoteAgents = RemoteAgents(url: URL(string: "http://localhost:5000")!)

// replace the URL string if you build and run your own Llama Stack distro as shown in https://github.com/meta-llama/llama-stack-apps/tree/main/examples/ios_calendar_assistant#optional-build-and-run-own-llama-stack-distro
self.remoteAgents = RemoteAgents(url: URL(string: "https://llama-stack.together.ai")!)
}

var agents: Agents {
@@ -130,39 +132,39 @@ struct ContentView: View {
func summarizeConversation(prompt: String) async {
do {
let request = Components.Schemas.CreateAgentTurnRequest(
agent_id: self.agentId,
messages: [
.UserMessage(Components.Schemas.UserMessage(
content: .case1("Summarize the following conversation in 1-2 sentences:\n\n \(prompt)"),
role: .user
))
],
session_id: self.agenticSystemSessionId,
stream: true
)

for try await chunk in try await self.agents.createTurn(request: request) {
for try await chunk in try await self.agents.createTurn(agent_id: self.agentId, session_id: self.agenticSystemSessionId, request: request) {
let payload = chunk.event.payload
switch (payload) {
case .AgentTurnResponseStepStartPayload(_):
case .step_start(_):
break
case .AgentTurnResponseStepProgressPayload(let step):
if (step.model_response_text_delta != nil) {
case .step_progress(let step):
if (step.delta != nil) {
DispatchQueue.main.async {
withAnimation {
var message = messages.removeLast()
message.text += step.model_response_text_delta!
if case .text(let delta) = step.delta {
message.text += "\(delta.text)"
}
message.tokenCount += 2
message.dateUpdated = Date()
messages.append(message)
}
}
}
case .AgentTurnResponseStepCompletePayload(_):
case .step_complete(_):
break
case .AgentTurnResponseTurnStartPayload(_):
case .turn_start(_):
break
case .AgentTurnResponseTurnCompletePayload(_):
case .turn_complete(_):
break

}
@@ -175,103 +177,100 @@ struct ContentView: View {

func actionItems(prompt: String) async throws {
let request = Components.Schemas.CreateAgentTurnRequest(
agent_id: self.agentId,
messages: [
.UserMessage(Components.Schemas.UserMessage(
content: .case1("List out any action items based on this text:\n\n \(prompt)"),
role: .user
))
],
session_id: self.agenticSystemSessionId,
stream: true
)

for try await chunk in try await self.agents.createTurn(request: request) {
for try await chunk in try await self.agents.createTurn(agent_id: self.agentId, session_id: self.agenticSystemSessionId, request: request) {
let payload = chunk.event.payload
switch (payload) {
case .AgentTurnResponseStepStartPayload(_):
case .step_start(_):
break
case .AgentTurnResponseStepProgressPayload(let step):
if (step.model_response_text_delta != nil) {
DispatchQueue.main.async {
withAnimation {
var message = messages.removeLast()
message.text += step.model_response_text_delta!
message.tokenCount += 2
message.dateUpdated = Date()
messages.append(message)

self.actionItems += step.model_response_text_delta!
case .step_progress(let step):
DispatchQueue.main.async(execute: DispatchWorkItem {
withAnimation {
var message = messages.removeLast()

if case .text(let delta) = step.delta {
message.text += "\(delta.text)"
self.actionItems += "\(delta.text)"
}
message.tokenCount += 2
message.dateUpdated = Date()
messages.append(message)
}
}
case .AgentTurnResponseStepCompletePayload(_):
})
case .step_complete(_):
break
case .AgentTurnResponseTurnStartPayload(_):
case .turn_start(_):
break
case .AgentTurnResponseTurnCompletePayload(_):
case .turn_complete(_):
break
}
}
}

func callTools(prompt: String) async throws {
let request = Components.Schemas.CreateAgentTurnRequest(
agent_id: self.agentId,
messages: [
.UserMessage(Components.Schemas.UserMessage(
content: .case1("Call functions as needed to handle any actions in the following text:\n\n" + prompt),
role: .user
))
],
session_id: self.agenticSystemSessionId,
stream: true
)

for try await chunk in try await self.agents.createTurn(request: request) {
for try await chunk in try await self.agents.createTurn(agent_id: self.agentId, session_id: self.agenticSystemSessionId, request: request) {
let payload = chunk.event.payload
switch (payload) {
case .AgentTurnResponseStepStartPayload(_):
case .step_start(_):
break
case .AgentTurnResponseStepProgressPayload(let step):
if (step.tool_call_delta != nil) {
switch (step.tool_call_delta!.content) {
case .case1(_):
break
case .ToolCall(let call):
switch (call.tool_name) {
case .BuiltinTool(_):
break
case .case2(let toolName):
if (toolName == "create_event") {
var args: [String : String] = [:]
for (arg_name, arg) in call.arguments.additionalProperties {
switch (arg) {
case .case1(let s): // type string
args[arg_name] = s
case .case2(_), .case3(_), .case4(_), .case5(_), .case6(_):
break
case .step_progress(let step):
switch (step.delta) {
case .tool_call(let call):
if call.parse_status == .succeeded {
switch (call.tool_call) {
case .ToolCall(let toolCall):
var args: [String : String] = [:]
for (arg_name, arg) in toolCall.arguments.additionalProperties {
switch (arg) {
case .case1(let s):
args[arg_name] = s
case .case2(_), .case3(_), .case4(_), .case5(_), .case6(_):
break
}
}
}

let formatter = DateFormatter()
formatter.dateFormat = "yyyy-MM-dd HH:mm"
formatter.timeZone = TimeZone.current
formatter.locale = Locale.current
self.triggerAddEventToCalendar(
title: args["event_name"]!,
startDate: formatter.date(from: args["start"]!) ?? Date(),
endDate: formatter.date(from: args["end"]!) ?? Date()
)
let formatter = DateFormatter()
formatter.dateFormat = "yyyy-MM-dd HH:mm"
formatter.timeZone = TimeZone.current
formatter.locale = Locale.current
self.triggerAddEventToCalendar(
title: args["event_name"]!,
startDate: formatter.date(from: args["start"]!) ?? Date(),
endDate: formatter.date(from: args["end"]!) ?? Date()
)
case .case1(_):
break
}
}
case .text(_):
break
case .image(_):
break
}
}
case .AgentTurnResponseStepCompletePayload(_):
break
case .AgentTurnResponseTurnStartPayload(_):
case .step_complete(_):
break
case .turn_start(_):
break
case .AgentTurnResponseTurnCompletePayload(_):
case .turn_complete(_):
break
}
}
@@ -308,22 +307,17 @@ struct ContentView: View {
let createSystemResponse = try await self.agents.create(
request: Components.Schemas.CreateAgentRequest(
agent_config: Components.Schemas.AgentConfig(
client_tools: [ CustomTools.getCreateEventToolForAgent() ],
enable_session_persistence: false,
instructions: "You are a helpful assistant",
max_infer_iters: 1,
model: "Llama3.1-8B-Instruct",
tools: [
Components.Schemas.AgentConfig.toolsPayloadPayload.FunctionCallToolDefinition(
CustomTools.getCreateEventTool()
)
]
model: "meta-llama/Llama-3.1-8B-Instruct"
)
)
)
self.agentId = createSystemResponse.agent_id

let createSessionResponse = try await self.agents.createSession(
request: Components.Schemas.CreateAgentSessionRequest(agent_id: self.agentId, session_name: "llama-assistant")
let createSessionResponse = try await self.agents.createSession(agent_id: self.agentId, request: Components.Schemas.CreateAgentSessionRequest(session_name: "llama-assistant")
)
self.agenticSystemSessionId = createSessionResponse.session_id

8 changes: 5 additions & 3 deletions examples/ios_quick_demo/README.md
@@ -2,9 +2,11 @@

iOSQuickDemo is a demo app ([video](https://drive.google.com/file/d/1HnME3VmsYlyeFgsIOMlxZy5c8S2xP4r4/view?usp=sharing)) that shows how to use the Llama Stack Swift SDK ([repo](https://github.com/meta-llama/llama-stack-client-swift)) and its `ChatCompletionRequest` API with a remote Llama Stack server to perform remote inference with Llama 3.1.

# Installation
## Installation

## Prerequisite
The quickest way to try the demo with remote inference is to use Together.ai's Llama Stack distro at https://llama-stack.together.ai; if you do, you can skip the next section and go directly to the Build and Run the iOS Demo section.

## (Optional) Build and Run Own Llama Stack Distro

You need to set up a remote Llama Stack distribution to run this demo. Assuming you have a [Fireworks](https://fireworks.ai/account/api-keys) or [Together](https://api.together.ai/) API key, which you can get by clicking the links above:

@@ -38,7 +40,7 @@ The default port is 5000 for `llama stack run` and you can specify a different p
![](quick1.png)
![](quick2.png)

3. Replace the `RemoteInference` url string in `ContentView.swift` below with the host IP and port of the remote Llama Stack distro started in Prerequisite:
3. (Optional) If you built and ran your own Llama Stack distro, replace the `RemoteInference` url string in `ContentView.swift` below with the host IP and port of the distro started in Build and Run Own Llama Stack Distro:

```
let inference = RemoteInference(url: URL(string: "http://127.0.0.1:5000")!)