Claudiio Beck
This is my personal comparison chart for benchmarking the model’s features, and my attempt to understand what each node does and how each setting affects the generated result. Follow along and check it for yourself too.
Seed = fixed
Steps = see comparison below
CFG = 1 (classifier-free guidance is effectively off, so the negative prompt has no effect)
Sampler = res_multistep (suggested by the model’s author)
Scheduler = simple (suggested by the model’s author)
Denoise = 1
No LoRAs, no extra nodes
Timings are based on a GeForce RTX 3080 Ti with 12 GB of VRAM
Positive Prompt below / Negative Prompt = none
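The baseline above can be written down as a small Python structure, which makes the method explicit: hold everything fixed and change one setting at a time. This is only a sketch. The class name, the `sweep` helper, the seed value, and the list of step counts are hypothetical placeholders, not values taken from the actual workflow.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RunSettings:
    seed: int = 42                  # hypothetical; the article only says "fixed"
    steps: int = 9
    cfg: float = 1.0
    sampler: str = "res_multistep"
    scheduler: str = "simple"
    denoise: float = 1.0
    shift: float = 3.0


def sweep(base: RunSettings, field: str, values: list) -> list[RunSettings]:
    """Return one settings object per value, changing only `field`."""
    return [replace(base, **{field: value}) for value in values]


baseline = RunSettings()
# Placeholder step counts for illustration only.
steps_runs = sweep(baseline, "steps", [4, 8, 9, 12, 20])
for run in steps_runs:
    print(run)
```

Because the seed and every other field stay identical across the runs, any visible difference between the images can be attributed to the one setting being varied.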
Steps is the number of denoising passes the sampler runs on the latent before delivering the final output. Z-Image is a fast model and was developed to use fewer steps. Here, I found the best results are between 8 and 12. Fewer steps produce too little detail; more steps produce extra noise and some inconsistencies, mainly in the hair.
Latina female with thick wavy hair, blinking to the camera, there are some out of focus tree leaves in front of her. Breezy seaside light, warm tones, cinematic close-up.
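To make the step count concrete, here is a minimal Python sketch. It assumes a simplified, evenly spaced noise schedule running from 1.0 (pure noise) down to 0.0 (clean image). The real scheduler and the shift setting change the spacing, so `uniform_sigmas` is an illustrative name, not part of any library.

```python
def uniform_sigmas(steps: int) -> list[float]:
    """Evenly spaced noise levels from 1.0 down to 0.0 (steps + 1 values)."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return [1.0 - i / steps for i in range(steps + 1)]


for steps in (4, 9, 12):
    sigmas = uniform_sigmas(steps)
    jump = sigmas[0] - sigmas[1]  # how much noise one step has to remove
    print(f"{steps:>2} steps -> {len(sigmas) - 1} updates, jump per step = {jump:.3f}")
```

With fewer steps each update has to cover a larger jump in noise level, which is one plausible reason very low step counts leave out fine detail.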
The Model Shift node controls how noisy or how polished the output will be. The lower the value, the more artifacts appear; the higher the value, the waxier the result. The model’s author recommends a Model Shift value of 3. All generations below used 9 steps.
Blonde female with thick wavy hair, is sending a cute kiss to the camera, there are some out of focus tree leaves in front of her. Breezy seaside light, warm tones, cinematic close-up.
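One way to see what the shift value does is to look at how it moves the noise levels. The sketch below assumes the shift is applied as sigma' = shift * sigma / (1 + (shift - 1) * sigma), the form commonly used for flow-type models. Whether this workflow's Model Shift node uses exactly this formula is an assumption, and the function names are illustrative.

```python
def shift_sigma(sigma: float, shift: float) -> float:
    """Warp a noise level in [0, 1]; shift = 1 leaves it unchanged."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def shifted_schedule(steps: int, shift: float) -> list[float]:
    """Evenly spaced noise levels from 1.0 to 0.0, warped by the shift."""
    return [shift_sigma(1.0 - i / steps, shift) for i in range(steps + 1)]


for shift in (1.0, 3.0, 6.0):
    row = ", ".join(f"{s:.2f}" for s in shifted_schedule(9, shift))
    print(f"shift={shift}: {row}")
```

Under this assumption, a higher shift keeps the noise level high for more of the 9 steps and leaves fewer steps for the low-noise end where fine texture is resolved, which may be consistent with the waxier look at high values.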
Denoise (the noise value on the sampler) controls how much of the denoising schedule is run on the latent before the final output is produced. The model’s author recommends Denoise = 1. All generations below used 9 steps and shift 3. It is very interesting to see how strongly this setting affects the output.
Black female with thick wavy hair, is spontaneously smiling to the camera, there are some out of focus tree leaves in front of her. Breezy seaside light, warm tones, cinematic close-up.
The sampler is the algorithm that works through the latent according to the scheduler’s noise schedule and turns it into a visual output. Each sampler has its own algorithm, which mainly influences the time needed to generate the final output, with some subtle changes in appearance. The model’s author recommends sampler = res_multistep. The most noticeable differences are the light streaks on her arm and the out-of-focus leaves. All generations below used 9 steps and shift 3.
Brazilian female with thick wavy hair, is holding her surf board, there are some out of focus tree leaves in front of her. Breezy seaside light, warm tones, cinematic close-up.
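To illustrate why different samplers differ in both speed and result, here is a toy Python comparison of two classic update rules, Euler and Heun, over the same 9-step schedule. It is not res_multistep and not Z-Image: the `velocity` function is a stand-in with a known exact answer, chosen so the error of each method can be measured.

```python
import math


def velocity(x: float, sigma: float) -> float:
    """Toy stand-in for the model: dx/dsigma = x (not Z-Image)."""
    return x


def euler(x: float, sigmas: list[float]) -> tuple[float, int]:
    """One model call per step."""
    calls = 0
    for s, s_next in zip(sigmas, sigmas[1:]):
        x = x + (s_next - s) * velocity(x, s)
        calls += 1
    return x, calls


def heun(x: float, sigmas: list[float]) -> tuple[float, int]:
    """Two model calls per step: predict, then correct with the average slope."""
    calls = 0
    for s, s_next in zip(sigmas, sigmas[1:]):
        h = s_next - s
        v1 = velocity(x, s)
        v2 = velocity(x + h * v1, s_next)
        x = x + h * 0.5 * (v1 + v2)
        calls += 2
    return x, calls


sigmas = [1.0 - i / 9 for i in range(10)]  # 9 steps from 1.0 down to 0.0
exact = math.exp(-1.0)                     # exact x(0) when x(1) = 1
for name, fn in (("euler", euler), ("heun", heun)):
    x, calls = fn(1.0, sigmas)
    print(f"{name}: result={x:.4f} error={abs(x - exact):.4f} model calls={calls}")
```

In this toy case Heun lands closer to the exact answer but needs twice as many model calls. That is the same kind of trade-off between generation time and small visual differences seen when switching samplers.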