Add patch size to configs #63

@shahbuland

Currently everything assumes patch size = 1.
Previously, the literature suggested two things:

  1. p = 1 is significantly better
  2. pixel-space diffusion is impossible

I've personally found that neither of these is necessarily true; both were likely a consequence of learned positional encodings being bad. When using RoPE properly, you can train with larger patch sizes with minimal issues. To this end, I want us to use 16x16 latents with c32 and p = 2.
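
As a rough illustration of what this could look like, here is a minimal sketch of a config with an explicit patch size. `ModelConfig` and its field names are hypothetical and not taken from the repo, and `c32` is assumed to mean 32 latent channels. The derived properties show how the token count and per-token dimension change when moving from p = 1 to p = 2.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    # Hypothetical config for illustration; names are not the repo's.
    latent_size: int = 16  # latent height and width (16x16)
    channels: int = 32     # assumed reading of "c32"
    patch_size: int = 1    # what everything implicitly assumes today

    def __post_init__(self):
        # Patches must tile the latent grid exactly.
        if self.latent_size % self.patch_size != 0:
            raise ValueError("latent_size must be divisible by patch_size")

    @property
    def tokens_per_side(self) -> int:
        return self.latent_size // self.patch_size

    @property
    def num_tokens(self) -> int:
        # Sequence length seen by the transformer.
        return self.tokens_per_side ** 2

    @property
    def patch_dim(self) -> int:
        # Flattened size of one patch: channels * p * p.
        return self.channels * self.patch_size ** 2


if __name__ == "__main__":
    for p in (1, 2):
        cfg = ModelConfig(patch_size=p)
        print(f"p={p}: tokens={cfg.num_tokens}, patch_dim={cfg.patch_dim}")
```

Under these assumptions, p = 1 gives 256 tokens of dimension 32, while p = 2 gives 64 tokens of dimension 128. Keeping `patch_size` defaulted to 1 would leave existing configs unchanged.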
