Reuse Ollama cached image when available #782

Merged · 5 commits · Feb 11, 2025
41 changes: 33 additions & 8 deletions ramalama/ollama.py
@@ -1,7 +1,7 @@
 import os
 import urllib.request
 import json
-from ramalama.common import run_cmd, verify_checksum, download_file
+from ramalama.common import run_cmd, verify_checksum, download_file, available
 from ramalama.model import Model, rm_until_substring


@@ -30,15 +30,18 @@ def pull_blob(repos, layer_digest, accept, registry_head, models, model_name, mo
     layer_blob_path = os.path.join(repos, "blobs", layer_digest)
     url = f"{registry_head}/blobs/{layer_digest}"
     headers = {"Accept": accept}
-    download_file(url, layer_blob_path, headers=headers, show_progress=True)
-
-    # Verify checksum after downloading the blob
-    if not verify_checksum(layer_blob_path):
-        print(f"Checksum mismatch for blob {layer_blob_path}, retrying download...")
-        os.remove(layer_blob_path)
-        download_file(url, layer_blob_path, headers=headers, show_progress=True)
-        if not verify_checksum(layer_blob_path):
-            raise ValueError(f"Checksum verification failed for blob {layer_blob_path}")
+    local_blob = in_existing_cache(model_name, model_tag)
+    if local_blob is not None:
ericcurtin (Collaborator) commented on Feb 11, 2025:

In the:

    if local_blob is not None:

case, we seem to call ln -sf twice; should we call it once?

In the rm case of an Ollama-stored object, I propose we just remove the symlink part:

    local_blob

Member commented:

I agree; I think we should only remove the ramalama content on ramalama rm. But on ollama rm, ramalama can end up in a bad state as well, which I think we should take care of too. ramalama ls should either ignore dangling symlinks or remove them.

BTW, I have seen the database get futzed up by broken removal of images, and we don't handle it gracefully. I had to find the symlink and rm -f it.
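
A minimal sketch of the dangling-symlink handling suggested here (an illustration, not ramalama's actual code), assuming a flat directory of model symlinks:

    import os

    def list_models(store_dir, prune=False):
        models = []
        for name in os.listdir(store_dir):
            path = os.path.join(store_dir, name)
            # A dangling symlink: islink() is True, but exists() follows the
            # link and returns False once the target has been removed.
            if os.path.islink(path) and not os.path.exists(path):
                if prune:
                    os.unlink(path)  # drop the stale link
                continue  # either way, don't list it
            models.append(name)
        return models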

ericcurtin (Collaborator) commented on Feb 11, 2025:

After thinking this through, I understand the double symlink now: once for the sha256 and a second time for the named version. I think rm should handle this without touching the Ollama store, but we should test to be sure.
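
For illustration, a sketch of the rm behavior being proposed, assuming one link under the model's name and one under its digest; remove_model_links, the store layout, and the digest spelling are hypothetical, not this PR's code:

    import os

    def remove_model_links(store, model_name, layer_digest):
        # Hypothetical layout: one symlink under the model's name, one under
        # the (Windows-safe) digest name; both point at the Ollama-owned blob.
        named_link = os.path.join(store, "models", model_name)
        blob_link = os.path.join(store, "repos", "blobs", layer_digest.replace(":", "-"))
        for link in (named_link, blob_link):
            # islink() is True even when the target is gone, so this also
            # cleans up links left dangling by an earlier ollama rm.
            if os.path.islink(link):
                os.unlink(link)  # removes only the symlink; the blob is untouched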

+        run_cmd(["ln", "-sf", local_blob, layer_blob_path])
+    else:
+        download_file(url, layer_blob_path, headers=headers, show_progress=True)
+        # Verify checksum after downloading the blob
+        if not verify_checksum(layer_blob_path):
+            print(f"Checksum mismatch for blob {layer_blob_path}, retrying download...")
+            os.remove(layer_blob_path)
+            download_file(url, layer_blob_path, headers=headers, show_progress=True)
+            if not verify_checksum(layer_blob_path):
+                raise ValueError(f"Checksum verification failed for blob {layer_blob_path}")

     os.makedirs(models, exist_ok=True)
     relative_target_path = os.path.relpath(layer_blob_path, start=os.path.dirname(model_path))
@@ -58,6 +61,28 @@ def init_pull(repos, accept, registry_head, model_name, model_tag, models, model
     return model_path


+def in_existing_cache(model_name, model_tag):
+    if not available("ollama"):
+        return None
+    default_ollama_caches = [
+        os.path.join(os.environ['HOME'], '.ollama/models'),
+        '/usr/share/ollama/.ollama/models',
+        f'C:\\Users\\{os.getlogin()}\\.ollama\\models'
+    ]
+
+    for cache_dir in default_ollama_caches:
+        manifest_path = os.path.join(cache_dir, 'manifests', 'registry.ollama.ai', model_name, model_tag)
+        if os.path.exists(manifest_path):
+            with open(manifest_path, 'r') as file:
+                manifest_data = json.load(file)
+            for layer in manifest_data["layers"]:
+                if layer["mediaType"] == "application/vnd.ollama.image.model":
+                    layer_digest = layer["digest"]
+                    ollama_digest_path = os.path.join(cache_dir, 'blobs', layer_digest)
+                    if os.path.exists(str(ollama_digest_path).replace(':','-')):
ericcurtin (Collaborator) commented on Feb 11, 2025:

If we ever make breaking changes to the model store, which I suspect we will, we should change ':' to '-' on disk. ':' doesn't work on Windows filesystems, for one. But that should be a hidden detail to the user; they should use ':'.

Member commented:

Let's make the change now, while we have not released 1.0. For now we can handle both : and -.
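
A minimal sketch of handling both separators during the transition, assuming digests of the form sha256:&lt;hex&gt;; resolve_blob_path is a hypothetical helper, not code from this PR:

    import os

    def resolve_blob_path(blobs_dir, digest):
        # Prefer the Windows-safe '-' spelling, then fall back to the legacy ':'.
        for candidate in (digest.replace(":", "-"), digest):
            path = os.path.join(blobs_dir, candidate)
            if os.path.exists(path):
                return path
        return None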

+                        return str(ollama_digest_path).replace(':','-')
+    return None

 class Ollama(Model):
     def __init__(self, model):
         model = rm_until_substring(model, "ollama.com/library/")