At long last, I've managed to achieve a result that's not entirely terrible. But, as always, there's room for improvement, so brace yourselves for an onslaught of images as I dissect my thought process. Note, however, that the editing on here is an absolute shambles; I will deal with it in the future. Sigh. I just wanted to use software where I did not have to customize and code everything myself. Why is there no column control here?
Time for a quick rundown of the changes I made this time around. I switched the main tag to "kiwi-cyberpunk," reduced the repeats from 75 to a modest 10, and enabled the shuffle images feature. Despite my lack of faith in the batch size, I begrudgingly set it to 1, saving further tweaks for later. As for the network rank and alpha, I settled on 64 and 32, respectively. After running this training for 30 epochs, it took less than 3 hours to complete. My ever-reliable intuition insisted on adding "v1" to the file name, and wouldn't you know it, it turned out to be decent enough for a v1/MVP release. So here I am, roughly 80% pleased with the outcome.
Alright, let's dive into the good, the bad, and the lessons learned.
This time, I managed to dodge the bullet of overfitting and artifacting, which is a pleasant surprise. The results brought a rare smile to my face, with the ability to manipulate tags and produce more normal-looking images.
Naturally, there's always room for failure, and identifying those moments can be educational. I have the sneaking suspicion that I skimped on the epochs; even at 30, it could've benefited from more, maybe 35 or 40. Ultimately, this round felt overly cautious with just 10 repeats, and I could've pushed it to 15. Also, the shocked pose from the previous training set is gone, which is a shame. Reproducing it will be tricky, but I'll make use of ControlNet to get that pose since it's grown on me. Another issue: most generated images have a lime green tint, which indicates the model is still "cooking." If I don't add "1girl, solo" to the prompt, it deviates from the desired slender body type. My conclusion? It's slightly undertrained and needs about 30% more time, roughly 39 epochs (my intuition says 40).
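The "30% more time" estimate is simple arithmetic, sketched below in Python. Rounding up to the next multiple of 5 is only an assumed stand-in for the "intuition says 40" step; the function and variable names are made up for illustration.

```python
import math


def extended_epochs(current_epochs: int, extra_percent: int) -> float:
    """Scale an epoch count by a percentage of extra training time."""
    return current_epochs * (100 + extra_percent) / 100


# 30 epochs plus roughly 30% more training, as estimated in the post.
estimate = extended_epochs(30, 30)

# Assumption: round up to the next multiple of 5 to get a tidy target.
rounded_target = math.ceil(estimate / 5) * 5

print(estimate)        # 39.0
print(rounded_target)  # 40
```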
Lessons Learned & Next Steps:
Ten repeats were too low, and the character's strength needs toning down. Avoiding artifacting due to undertraining was a plus, but I struggled with getting the tattoos just right. The model, however, is flexible with prompts and adjustments.
For the future, I have two routes in mind. First, retrain from epoch 0 all the way to epoch 40. At my constant training rate this should push the training time past 3 hours, but that's not an issue. Second, and the route I'm more intrigued by: adding regularization images, increasing repeats to 20, setting the max epoch to 42, and upping the training batch size to 2 or 4. Regularization images are uncharted territory for me, but after reading up on them, I think I understand how they work. I'll cover that in my next blog post once I confirm my grasp on the concept.
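To get a feel for how different the two routes are, here is a rough Python sketch that compares them per training image. It assumes the usual accounting where each image is seen repeats x epochs times and the optimizer step count is that number divided by the batch size. It also assumes route one keeps 10 repeats and batch size 1, and it ignores regularization images entirely, since how they affect the step count depends on the trainer. The `Run` class and its method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Run:
    name: str
    repeats: int
    epochs: int
    batch_size: int

    def passes_per_image(self) -> int:
        # How many times each training image is shown to the model.
        return self.repeats * self.epochs

    def steps_per_image(self) -> float:
        # Optimizer steps contributed by each image (larger batches mean fewer steps).
        return self.passes_per_image() / self.batch_size


runs = [
    Run("v1 (this post)", repeats=10, epochs=30, batch_size=1),
    Run("route 1 (assumed same repeats/batch)", repeats=10, epochs=40, batch_size=1),
    Run("route 2, batch 2", repeats=20, epochs=42, batch_size=2),
    Run("route 2, batch 4", repeats=20, epochs=42, batch_size=4),
]

for run in runs:
    print(f"{run.name}: {run.passes_per_image()} passes, "
          f"{run.steps_per_image():.0f} steps per image")
```

Under these assumptions route two shows each image 840 times against 400 for route one, so it is a much bigger jump than the repeat and epoch numbers suggest at first glance, even though the larger batch keeps the step count in a similar range.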
Surprisingly, I'm several iterations in with the same LoRA, which I thought would be simpler. But the process has been a fun learning experience! Honestly, I enjoy tweaking settings more than the actual outputs, and I'm not even sure if I want to post them online – it's the creation process that truly excites me.
Alright, let's dissect the images and observations.
First, I generated some images to examine. Typically, I start with a grid of images, but constructing a 30-epoch grid at 768x768 pixels would be time-consuming, so I opted for random epochs instead.
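For a sense of scale, the sketch below counts the images in a full grid versus a random sample of epochs. The strength values are the ones discussed later in this post; the number of sampled epochs (3) and the seed are arbitrary picks for illustration, not what was actually used.

```python
import random

EPOCHS = list(range(1, 31))             # 30 saved epochs
STRENGTHS = [0.65, 0.7, 0.8, 0.9, 1.0]  # LoRA strengths compared in this post

# Every epoch at every strength, each one a 768x768 render.
full_grid_size = len(EPOCHS) * len(STRENGTHS)


def pick_random_epochs(count: int, seed: int) -> list[int]:
    """Pick a few distinct epochs to test instead of all of them."""
    rng = random.Random(seed)
    return sorted(rng.sample(EPOCHS, count))


picked = pick_random_epochs(3, seed=0)  # hypothetical sample size and seed
sampled_grid_size = len(picked) * len(STRENGTHS)

print(full_grid_size)     # 150
print(sampled_grid_size)  # 15
```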
For a baseline test, I remembered there's a LoRA of Lucy from Cyberpunk, which should have a similar art style and make comparisons easier. However, the Lucy LoRA wasn't entirely cooperative. The quality was superior, but it struggled with too few tags. Nonetheless, it's a helpful reference.
The Lucy baseline revealed that her style significantly intensified at 0.8 strength and broke at 0.9 and 1. It's not recommended to run it that high, but it provides an idea of what numbers to aim for.
Next, I prompted "bikini" to see the results, which weren't great. Still, it proved the LoRA could generate interesting images.
With the baselines established, I tested my own LoRA. I simply switched Lucy for Kiwi and adjusted the LoRA, prompt, and settings accordingly.
The results were impressive, but there's still work to be done. The difference between 0.65 and 0.7 strength was minimal, the body type felt generic, and the tattoos were missing. However, the face and hair worked well, and she appeared to be grasping objects.
At 0.9 strength, details like the tattoo color, face, neck, body type, and crop top improved. She even held a cigarette, staying true to her character. But the style shouldn't be kicking in at such a high strength; it should've started fixing proportions and other aspects at a lower strength. We're about 30% or more off our target strength. To make matters worse, I realized I made a mistake by using epoch 27, not the final epoch (30).
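One way to read the "30% or more" figure is as the relative gap between the strength where the character actually shows up (0.9) and the strength where it ideally would. The post does not state the target strength, so the sketch below assumes it sits somewhere in the 0.65 to 0.7 range tested earlier; both the target values and the function name are assumptions.

```python
def relative_gap_percent(observed: float, target: float) -> float:
    """How far the observed strength sits above the target, in percent of the target."""
    return (observed - target) / target * 100


observed_strength = 0.9
assumed_targets = [0.65, 0.7]  # assumption: the post does not name an exact target

gaps = {t: round(relative_gap_percent(observed_strength, t), 1) for t in assumed_targets}
print(gaps)  # {0.65: 38.5, 0.7: 28.6}
```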
Alright, let's see how changing the epoch affects the strength.
By simply adjusting the epoch, I hoped that the strength might shift from 0.9 to 0.8. So, let's see what happened:
From 0.65 to 0.70, we have a generic, unremarkable character. At 0.8, Kiwi begins to emerge, but the tattoos are unclear, the arms too thick, and the neck too short – still too generic. At 0.9, her tattoos start to appear, but the overall image is odd, and at a strength of 1, it's a complete failure. Also, her clothing takes on a green hue, suggesting that "kiwi" is being used as a color prompt. I made some changes to the prompt and reran the test at the same epoch, confirming that randomness played a role in the results. At strength 0.9, her tattoos turned pink, and weirdness ensued at strength 1.
Next, I tried a standard prompt to see how well the model would render Kiwi in her outfit.
The same observations came through, but strength 0.8 fared better this time, resulting in a slightly modified, yet appealing, outfit. Another happy accident! Also note that these Y grids are from my initial run, so they are not cherry-picked. I did re-run some of them to see whether the same observations held, and they did. Unless someone is really interested in seeing the other tests, I won't paste them here, as they hardly deviate from this outcome, which suggests the results are reproducible.
Perplexed by the Lucy baseline test's failure, I generated a couple of Lucy images with more tags, then swapped in the Kiwi LoRA at epochs 27 and 30, using the strengths observed above. The images turned out quite decent.
Before I leave you with the intriguing image of Kiwi, please be advised that the following illustration, generated by our text-to-image AI, may be considered suggestive or risqué to some viewers. While one could argue about whether AI-generated content constitutes "art," the fact that it evokes a reaction in me makes it worth appreciating. If you're easily offended by such content, I suggest you proceed with caution or skip it altogether. But for those of you who appreciate the captivating side of “visuals”, enjoy this alluring piece.
Here we are, the grand finale. Let's describe the image of Kiwi in a tactful manner.
At last, I have an image of Kiwi that I'm genuinely pleased with. The shot captures her looking down and to the side, her gaze meeting the viewer's. Her face and hair are well-rendered, and she sports a form-fitting bikini that showcases her ample assets (although admittedly this does not match her character 100%). The pose, with her hand on her hip, subtly hints at her tattoos. Her default clothes are there too. While it's a suggestive image, it's also undeniably captivating.
However, there are a few minor issues with the clothing and the jacket's odd "folding in" effect. These imperfections can be easily fixed with inpainting or Photoshop, which I might tackle as a personal project.
In conclusion, I hope you've enjoyed this journey of creating and refining the Kiwi character LoRA as much as I have. It's been an exciting process, filled with trial and error, and I appreciate you joining me on this adventure. Keep an eye out for future blog posts as we continue to explore and experiment.