Tweaks to improve cleanliness of code; removes warnings that appear in Eclipse #11

Open · wants to merge 5 commits into main
Changes from 1 commit
10 changes: 10 additions & 0 deletions .classpath
@@ -0,0 +1,10 @@
<?xml version="1.0" encoding="UTF-8"?>
<classpath>
<classpathentry kind="src" path="src"/>
<classpathentry exported="true" kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER">
<attributes>
<attribute name="module" value="true"/>
</attributes>
</classpathentry>
<classpathentry kind="output" path="bin"/>
</classpath>
17 changes: 17 additions & 0 deletions .project
Owner:

Please do not include any project files; having the whole project as a single file at the root is deliberate.

@@ -0,0 +1,17 @@
<?xml version="1.0" encoding="UTF-8"?>
<projectDescription>
<name>llama3.java</name>
<comment></comment>
<projects>
</projects>
<buildSpec>
<buildCommand>
<name>org.eclipse.jdt.core.javabuilder</name>
<arguments>
</arguments>
</buildCommand>
</buildSpec>
<natures>
<nature>org.eclipse.jdt.core.javanature</nature>
</natures>
</projectDescription>
1 change: 1 addition & 0 deletions bin/.gitignore
Owner:

Please remove; there's already a .gitignore at the root.

@@ -0,0 +1 @@
/com/
92 changes: 49 additions & 43 deletions Llama3.java → src/com/llama4j/Llama3.java
100755 → 100644
Owner:

Do not move the Llama3.java file outside the root. I understand that this is the way IDEs work, but the project is meant to be cloned and executed with the minimum friction possible.

@@ -85,44 +85,45 @@ static void runInteractive(Llama model, Sampler sampler, Options options) {
conversationTokens.addAll(chatFormat.encodeMessage(new ChatFormat.Message(ChatFormat.Role.SYSTEM, options.systemPrompt())));
}
int startPosition = 0;
Scanner in = new Scanner(System.in);
while (true) {
System.out.print("> ");
System.out.flush();
String userText = in.nextLine();
if (List.of("quit", "exit").contains(userText)) {
break;
}
if (state == null) {
state = model.createNewState();
}
conversationTokens.addAll(chatFormat.encodeMessage(new ChatFormat.Message(ChatFormat.Role.USER, userText)));
conversationTokens.addAll(chatFormat.encodeHeader(new ChatFormat.Message(ChatFormat.Role.ASSISTANT, "")));
Set<Integer> stopTokens = chatFormat.getStopTokens();
List<Integer> responseTokens = Llama.generateTokens(model, state, startPosition, conversationTokens.subList(startPosition, conversationTokens.size()), stopTokens, options.maxTokens(), sampler, options.echo(), token -> {
if (options.stream()) {
if (!model.tokenizer().isSpecialToken(token)) {
System.out.print(model.tokenizer().decode(List.of(token)));
}
}
});
// Include stop token in the prompt history, but not in the response displayed to the user.
conversationTokens.addAll(responseTokens);
startPosition = conversationTokens.size();
Integer stopToken = null;
if (!responseTokens.isEmpty() && stopTokens.contains(responseTokens.getLast())) {
stopToken = responseTokens.getLast();
responseTokens.removeLast();
}
if (!options.stream()) {
String responseText = model.tokenizer().decode(responseTokens);
System.out.println(responseText);
}
if (stopToken == null) {
System.err.println("Ran out of context length...");
break;
}
}
try (Scanner in = new Scanner(System.in)) {
while (true) {
Owner:

The indentation here is too deep; please follow the same style as the rest of the file (4 spaces).

Author:

It looks a bit long in this snippet, but I'm looking at the code right now and the indentation seems to be fine. It was formatted this way by my IDE automatically. You may need to tweak this on your end to get whatever specific formatting you'd prefer, since I tried the two spaces thing and it made the code look very strange IMO.


Owner:

You are using tabs instead of spaces; GitHub renders tabs as 8 spaces. Your IDE should have a setting to change this. Please keep 4-space indentation, matching the rest of the file.

System.out.print("> ");
System.out.flush();
String userText = in.nextLine();
if (List.of("quit", "exit").contains(userText)) {
break;
}
if (state == null) {
state = model.createNewState();
}
conversationTokens.addAll(chatFormat.encodeMessage(new ChatFormat.Message(ChatFormat.Role.USER, userText)));
conversationTokens.addAll(chatFormat.encodeHeader(new ChatFormat.Message(ChatFormat.Role.ASSISTANT, "")));
Set<Integer> stopTokens = chatFormat.getStopTokens();
List<Integer> responseTokens = Llama.generateTokens(model, state, startPosition, conversationTokens.subList(startPosition, conversationTokens.size()), stopTokens, options.maxTokens(), sampler, options.echo(), token -> {
if (options.stream()) {
if (!model.tokenizer().isSpecialToken(token)) {
System.out.print(model.tokenizer().decode(List.of(token)));
}
}
});
// Include stop token in the prompt history, but not in the response displayed to the user.
conversationTokens.addAll(responseTokens);
startPosition = conversationTokens.size();
Integer stopToken = null;
if (!responseTokens.isEmpty() && stopTokens.contains(responseTokens.getLast())) {
stopToken = responseTokens.getLast();
responseTokens.removeLast();
}
if (!options.stream()) {
String responseText = model.tokenizer().decode(responseTokens);
System.out.println(responseText);
}
if (stopToken == null) {
System.err.println("Ran out of context length...");
break;
}
}
}
}
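The hunk above wraps the interactive loop's Scanner in try-with-resources; that deterministic close is what clears Eclipse's resource-leak warning. Here is a minimal sketch of the same pattern — the class and method names are made up for illustration, and the Scanner reads from a String so the example is self-contained:

```java
import java.util.Scanner;
import java.util.StringJoiner;

public class ScannerDemo {
    // Echoes lines back until "exit" or end of input.
    static String echoUntilExit(Scanner in) {
        StringJoiner out = new StringJoiner("\n");
        while (in.hasNextLine()) {
            String line = in.nextLine();
            if (line.equals("exit")) {
                break;
            }
            out.add("> " + line);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // try-with-resources closes the Scanner automatically when the
        // block exits, which is what silences the resource-leak warning.
        try (Scanner in = new Scanner("hello\nworld\nexit\nignored")) {
            System.out.println(echoUntilExit(in));
        }
    }
}
```

One caveat when the source is System.in rather than a String: Scanner.close() also closes the underlying stream, so nothing else in the process can read from stdin afterwards. That is fine in runInteractive, which only returns once the program is done taking input.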

static void runInstructOnce(Llama model, Sampler sampler, Options options) {
@@ -346,7 +347,8 @@ public int byteSize() {
}
}

private void loadModelImpl(FileChannel fileChannel) throws IOException {
@SuppressWarnings("preview")
private void loadModelImpl(FileChannel fileChannel) throws IOException {
Owner:

Too much indentation.

Author:

Same as above. I'm just using the default formatting and it appears to work on my end; I'm matching my formatting to the rest of the document.


// The header of the file.
readHeader(fileChannel); // gguf_header_t header;
// Tensor infos, which can be used to locate the tensor data.
@@ -726,7 +728,7 @@ private static Tokenizer createTokenizer(Map<String, Object> metadata, Vocabular

int allTokens = vocabulary.size();
int baseTokens = 128000; // assume all tokens after the base ones are special.
int reservedSpecialTokens = allTokens - baseTokens;
//int reservedSpecialTokens = allTokens - baseTokens;
Owner:

This is indeed unused; do not comment it out, remove it instead.

Author:

Gotcha.

List<String> specialTokensList = Arrays.stream(vocabulary.tokens(), baseTokens, allTokens).toList();

assert specialTokensList.stream().allMatch(token -> vocabulary.getIndex(token).isPresent());
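For context on the slice above: the three-argument Arrays.stream overload streams only the index range [from, to), so the special tokens are taken from the tail of the vocabulary without copying the array first. A toy sketch with made-up tokens (the class name is hypothetical; the real baseTokens is 128000):

```java
import java.util.Arrays;
import java.util.List;

public class SpecialTokensDemo {
    // Returns the "special" tail of a vocabulary, i.e. everything at or
    // after index baseTokens, mirroring the slice in createTokenizer.
    static List<String> specialTokens(String[] vocabulary, int baseTokens) {
        return Arrays.stream(vocabulary, baseTokens, vocabulary.length).toList();
    }

    public static void main(String[] args) {
        // Toy vocabulary: pretend the last two entries are special tokens.
        String[] vocabulary = {"the", "cat", "sat", "<|eot|>", "<|pad|>"};
        System.out.println(specialTokens(vocabulary, 3)); // [<|eot|>, <|pad|>]
    }
}
```

Stream.toList() requires Java 16+, which is not a constraint here since the project already relies on much newer preview APIs.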
@@ -1866,11 +1868,15 @@ public static Pair<float[], float[]> precomputeFreqsCis(int contextLength, int h
if (ropeScaling) {
// Llama 3.1 scaling
float loFreqWavelen = oldContextLength / loFreqFactor;
float hiFreqWavelen = oldContextLength / hiFreqFactor;
//float hiFreqWavelen = oldContextLength / hiFreqFactor;
float wavelen = (float) (2.0 * Math.PI / freq);
if (wavelen < hiFreqWavelen) {

Owner:

I think this changes the original behavior. We must keep in sync with the RoPE scaling added in Llama 3.1.

Here's the reference (RoPE scaling section): [screenshot]

Author (@SkyAphid, Aug 5, 2024):

I'm not arguing that, it's just that you're doing this after that check:

freq = freq

This doesn't do anything, technically. You basically comment it out with no change to functionality. That said, I agree that having it there (even if commented out) is crucial to understanding the code. I'll reset the code to how you had it though.

I'm curious: what program are you writing this in?

Owner:

I ported it from the Python snippet. You can refactor it into something equivalent, e.g. replace the freq = freq; with a comment stating that no modification is done in that case, although you may get a warning about an empty code block.

It must remain functionally equivalent to the original (this is a hard constraint); and ideally close to the original as well, so it can be easily understood and compared by others.

I'm curious: what program are you writing this in?

I use IntelliJ and, casually, (neo)vim, and run it from the command line via jbang, which supports a very convenient --debug flag.

//This doesn't do anything, so it triggers a warning.
/*if (wavelen < hiFreqWavelen) {
freq = freq;
} else if (wavelen > loFreqWavelen) {
} else */

if (wavelen > loFreqWavelen) {
freq = freq / scaleFactor;
} else {
float smooth = (oldContextLength / wavelen - loFreqFactor) / (hiFreqFactor - loFreqFactor);
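To make the owner's equivalence constraint concrete, here is a sketch of the Llama 3.1 frequency scaling that the branch above implements, written as a standalone method. The parameter names mirror the diff; the interpolation formula in the middle band follows the reference snippet as discussed above, so treat this as an illustration under those assumptions, not the project's authoritative code:

```java
public class RopeScalingSketch {
    // Llama 3.1 RoPE frequency scaling. Frequencies with a short wavelength
    // (high frequency) pass through unchanged; long wavelengths are divided
    // by scaleFactor; the band in between is smoothly interpolated.
    static double scaleFreq(double freq, double scaleFactor,
                            double loFreqFactor, double hiFreqFactor,
                            double oldContextLength) {
        double loFreqWavelen = oldContextLength / loFreqFactor;
        double hiFreqWavelen = oldContextLength / hiFreqFactor;
        double wavelen = 2.0 * Math.PI / freq;
        if (wavelen < hiFreqWavelen) {
            return freq; // high-frequency band: no modification
        } else if (wavelen > loFreqWavelen) {
            return freq / scaleFactor; // low-frequency band: fully scaled
        } else {
            // Blend between the scaled and unscaled frequency.
            double smooth = (oldContextLength / wavelen - loFreqFactor)
                    / (hiFreqFactor - loFreqFactor);
            return (1.0 - smooth) * freq / scaleFactor + smooth * freq;
        }
    }

    public static void main(String[] args) {
        // Values commonly quoted for Llama 3.1: scaleFactor 8,
        // loFreqFactor 1, hiFreqFactor 4, old context length 8192.
        System.out.println(scaleFreq(1.0, 8, 1, 4, 8192));  // unchanged
        System.out.println(scaleFreq(1e-4, 8, 1, 4, 8192)); // divided by 8
    }
}
```

Written this way, the empty "no modification" branch becomes an explicit return, which keeps the structure comparable to the Python reference while avoiding both the self-assignment warning and the empty-block warning.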