Currently, everything assumes patch size p = 1.
Previously, the literature suggested two things:
- p = 1 is significantly better than larger patch sizes
- pixel-space diffusion is impossible
I've personally found that neither of these is necessarily true; both were likely a consequence of poor learned positional encodings. With RoPE applied properly, you can train with larger patch sizes and minimal issues. To this end, I want us to do 16x16 latents with c32 at p = 2.
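For concreteness, here is a minimal patchify sketch of what p = 2 means for our shapes (assuming PyTorch + einops; the function name and tensor layout are illustrative, not our actual code): each 2x2 patch of the 16x16, 32-channel latent folds into one token, giving an 8x8 grid of 64 tokens of dim 32 * 2 * 2 = 128.

```python
import torch
from einops import rearrange

def patchify(latents: torch.Tensor, p: int = 2) -> torch.Tensor:
    """Fold each p x p latent patch into the channel dim, one token per patch."""
    # (B, C, H, W) -> (B, (H/p) * (W/p), C * p * p)
    return rearrange(latents, "b c (h p1) (w p2) -> b (h w) (c p1 p2)", p1=p, p2=p)

# 16x16 latents with c32 at p = 2 -> 8x8 = 64 tokens of dim 128
latents = torch.randn(1, 32, 16, 16)
tokens = patchify(latents)
print(tokens.shape)  # torch.Size([1, 64, 128])
```

Each token then keeps an explicit (row, col) index on the 8x8 grid, so 2D RoPE can be applied per axis instead of relying on learned positional embeddings.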