First, a fair warning to the curious minds:
This entire endeavor is for educational purposes only. I'm trying to unravel the mysteries behind this technology and gain a deeper understanding of it. I chose "Kiwi" from "Cyberpunk Edgerunners" as my subject because she's a complex character, perfect for diving into uncharted waters. I don't think I will be sharing this iteration of the LoRA, as I consider it a failure, and I haven't decided whether to share future iterations.
And before you ask, no, I'm not an expert on this, so don't treat this as a guide. This is just me experimenting and sharing my results and thoughts. Maybe one day, I'll look back at this and laugh at how little I knew.
You might be wondering why this is part 4. Well, it's because this is my latest round of experimentation and is fresh in my mind. I'll get around to creating parts 1-3 when I have time.
Continuing our journey through trial and error, this time I think I failed as well, but the result is 70% satisfactory. Not good enough for me, but maybe it's a step in the right direction. Let's dive into this round of experimentation and see what we can learn.
This time around, I went back to using my own machine. (Yes, brace yourself for a future article on another method I've tried. It was an absolute disaster, but I'll write about it when I have the time.)
So, for this experiment, I started from scratch and made several minor adjustments. First, I added two new images to the training dataset, bringing the total up to 35 images (Part 1 will explain the dataset and my thought process). I also changed the keyword from "CyberpunkKiwi" to "kiwi(cyberpunk)".
I opted for a different checkpoint to train from this time: the AnyLoRA checkpoint with a clip skip of 2. I reduced the number of repeats from 95 to 75 and chose a network rank setting of 36 x 16, which produced compact model files of 37 MB per epoch. I'm not entirely sold on the idea that smaller model sizes are better just yet, so next time I'll likely try around 64 to see if it makes a difference, especially since, surprise, this experiment was unsatisfactory and will need to be redone. I made some minor tweaks here and there and set the training batch size to 4 (I'll discuss training batch sizes and my thoughts on them in Part 3). Additionally, I fiddled with the color settings again. Training seemed much quicker this time, with 24 epochs completed in about 9 hours. I was probably getting around 2s/it or less, which is a huge improvement considering all I did was tweak settings without knowing whether they'd help or hurt.
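As a sanity check on that speed estimate, here is a minimal sketch of the step arithmetic. It assumes the trainer counts one epoch as every image seen once per repeat, grouped into batches and rounded up; that counting rule and the function name are assumptions, not something read off the training log.

```python
import math

def estimate_seconds_per_step(images, repeats, epochs, batch_size, hours):
    """Rough seconds-per-iteration estimate from the run's headline numbers."""
    # Assumption: one epoch = every image seen `repeats` times, grouped into batches.
    steps_per_epoch = math.ceil(images * repeats / batch_size)
    total_steps = steps_per_epoch * epochs
    seconds_per_step = hours * 3600 / total_steps
    return steps_per_epoch, total_steps, seconds_per_step

# Numbers from this run: 35 images, 75 repeats, 24 epochs, batch size 4, about 9 hours.
steps_per_epoch, total_steps, sec_per_step = estimate_seconds_per_step(
    images=35, repeats=75, epochs=24, batch_size=4, hours=9
)
print(steps_per_epoch, total_steps, round(sec_per_step, 2))
```

Under those assumptions the run works out to a little over 15,000 steps at roughly 2 seconds each, which lines up with the "around 2s/it" guess.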
Testing the LoRA Models
I initially planned to do the basic calibration check at 768 x 768, but that would've taken around 2 hours for 24 epochs! So I canceled that and dropped it to 512 x 512, which took about an hour instead. I busied myself with other tasks while waiting impatiently for it to finish. Once it was complete and I inspected the results, I found myself both impressed and disappointed. I could see glimmers of my final goal as well as the shortcomings! I'll post some of the images I used for observation after the explanations, just in case some people don't want to see them.
As I explained in Part 1, the point of this experiment is to ultimately produce four states of a single character. Most importantly, I need to see if I can get her tattoos working properly. From this point on, however, I will add an additional state: hairstyle. This will be one of the last things to try and test. She has a few scenes where her hair is wet, and it might be interesting to see what happens.
I was absolutely blown away by her tattoo design this time! It was perfect! When I prompted it to draw Kiwi in a bikini, I could see her tattoo, and when I prompted it to draw her normally with clothes on, it didn't try to render the tattoo over the clothes. I'm also very happy with her head composition and overall design. She has a long neck, and the model is able to reproduce that, something I noticed in Part 2 or 3. I've been able to produce a few images with various settings that impressed me enough to start doing some manual work on fixing them up. Then again, I can usually find a few like that in every batch I train, so I guess they serve as a progress timestamp.
Well, this time around, there's quite a lot of artifacting, which indicates I overdid the training. I could barely get any of the later epochs to do anything, as they were too stiff. She also has a weird default pose when she's showing her tattoos. I think it's a cool pose, but it's hard to get her to do anything else, which is another sign of overtraining. Overall, I'm quite unhappy with the outcome this time around, but I wouldn't say it's worse than before; there was definitely progress made.
What I've Learned This Time:
Firstly, those who might know about this sort of thing were probably already screaming at something I mentioned near the beginning. That's right, I changed the keyword to "kiwi(cyberpunk)"! This was a huge mistake! While it still acknowledges the character as Kiwi, parentheses are emphasis syntax in many Stable Diffusion front ends, so "cyberpunk" gets picked up as an emphasized keyword of its own. Since it's a common word, that pushes the image toward generic cyberpunk, unrelated to the game or Edgerunners. This could explain why I can't change the scenes and the other problems I'm having! So, in the next batch of training, this keyword is going away.
Even though it's overtrained throughout the epochs, a strength of 0.6 produces something decent. I like the results at strengths 0.7 and 0.8, but there's artifacting, and it becomes increasingly hard to manipulate. From epoch 13 onward, a strength of 1 is a complete failure. I don't feel comfortable sharing all my test data, so I won't, but I will cherry-pick some things.
Even though I don't have a single image of her in a bikini in my dataset, I'm beyond happy with the outcome so far. Since Part 1 isn't published yet, here's the short version: I only have three real image states of her. The majority are close-up face shots, a few are body shots with her iconic red outfit on, and finally there are 3 or 4 images of her nude. So reproducing her tattoos from just those 3 or 4 images is extremely impressive, considering they don't clip through what she's wearing.
Overall, between this and the previous experiment, the end results seem to be producing two separate versions of Kiwi, even though the dataset is almost exactly the same, with only two images added (Part 3 is a must!). But I believe this is due to the tag I used; it's really messing things up this time. Previously, the tag was unique enough for the model to match what it was seeing to that tag's name.
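To make the bracket problem concrete: in front ends that follow the AUTOMATIC1111-style prompt syntax, parentheses mark emphasis, and a backslash in front of a parenthesis makes it literal. The helpers below are a hypothetical sketch under that assumption (the function names are made up for illustration and are not part of any tool).

```python
import re

def has_unescaped_parens(tag: str) -> bool:
    """True if the tag contains parentheses an emphasis-aware parser would act on."""
    return re.search(r"(?<!\\)[()]", tag) is not None

def escape_prompt_parens(tag: str) -> str:
    """Backslash-escape parentheses so they are read as literal characters."""
    # Assumption: the front end treats "\(" and "\)" as literal parentheses.
    return re.sub(r"(?<!\\)([()])", r"\\\1", tag)

for tag in ["CyberpunkKiwi", "kiwi(cyberpunk)"]:
    print(tag, has_unescaped_parens(tag), escape_prompt_parens(tag))
```

Escaping only changes how a prompt is parsed at generation time. Whether a trainer treats parentheses in captions specially depends on the trainer, and "cyberpunk" is a common word either way, so a unique tag without brackets is still the safer choice.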
Next Steps and Lessons:
Moving forward, I'll definitely be more careful with the tags and prompts I use during training. The impact they have on the results is more significant than I initially realized. Furthermore, I'll continue to experiment with the number of epochs and strength settings to find the sweet spot that produces the best results without overtraining. It's a delicate balance, and I'm still learning.
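Hunting for that sweet spot means checking every epoch against several strengths, which is easier to keep track of as a generated grid. This is a minimal sketch, assuming the AUTOMATIC1111-style "<lora:file:weight>" prompt syntax; the per-epoch file naming, the base prompt, and the function name are placeholders, not the actual test setup.

```python
from itertools import product

def build_test_prompts(base_prompt, lora_name, epochs, strengths):
    """Return one (epoch, strength, prompt) entry per epoch/strength combination."""
    prompts = []
    for epoch, strength in product(epochs, strengths):
        # Hypothetical naming scheme: one file per epoch, e.g. "kiwi-000013".
        lora_file = f"{lora_name}-{epoch:06d}"
        # Assumption: "<lora:file:weight>" loads the LoRA at the given strength.
        prompt = f"{base_prompt}, <lora:{lora_file}:{strength}>"
        prompts.append((epoch, strength, prompt))
    return prompts

grid = build_test_prompts(
    base_prompt="kiwi, portrait",  # placeholder prompt
    lora_name="kiwi",              # placeholder file prefix
    epochs=[6, 12, 13, 24],
    strengths=[0.6, 0.7, 0.8, 1.0],
)
for epoch, strength, prompt in grid:
    print(epoch, strength, prompt)
```

Keeping the base prompt fixed across the grid means any difference between two images comes from the epoch or the strength, not from the wording.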
In conclusion, while this round of experimentation hasn't been an outright success, it has provided me with valuable insights and a better understanding of the LoRA model. As frustrating as it can be, failure is just another stepping stone to success.
What I plan on doing next time: it's a failure, so I'll retrain from zero, but don't you worry, I'm not giving up just yet. Next time, I want to try a network size of 64, dropping the repeats to 10 with a maximum of 30 epochs. I want to keep the same base model I used this time. Changing the network size and the repeats is already two major changes at once, which isn't a good way to debug. But in this case, the time it takes to train a LoRA makes it prohibitive to test every setting one at a time. If it fails, obviously, we'll go back and retrain with different settings. Learning from your mistakes is the key in computers, after all!
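One way to stay honest about how many things change between runs is to diff the settings before training. This is only a sketch: the values come from this post, but the key names are illustrative and do not correspond to real trainer options, and "network_size" uses the first number of the 36 x 16 setting.

```python
def changed_settings(previous, planned):
    """Return {setting: (old, new)} for every setting whose value differs."""
    keys = sorted(set(previous) | set(planned))
    return {
        key: (previous.get(key), planned.get(key))
        for key in keys
        if previous.get(key) != planned.get(key)
    }

# Illustrative key names; values as described for this run and the next one.
this_run = {"base_model": "anylora", "network_size": 36, "repeats": 75, "epochs": 24}
next_run = {"base_model": "anylora", "network_size": 64, "repeats": 10, "epochs": 30}

diff = changed_settings(this_run, next_run)
for name, (old, new) in diff.items():
    print(f"{name}: {old} -> {new}")
print(f"{len(diff)} settings change at once")
```

The more entries the diff lists, the harder it is to tell which change caused a better or worse result.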
Alright, time to look at the images and share my thoughts.
First, some artifacted face close-ups!