I trained a keypoint-based anime generation model on top of the danbooru 2021 dataset, more specifically on a filtered subset, and got some satisfying results.

But after everything is done, the whole process needs to be reviewed: I need to do backpropagation towards my own mind and do better next time.

Addressing the known problems discussed over the years

- "The danbooru dataset is way too noisy" (reddit user comments)

So here comes the question: which problems are dataset related, and how do they affect the later training process?

My approach to acquire a cleaner subset

To train a pose-keypoint-based model, a pose keypoints dataset is required, but not all danbooru images are suitable for training. Let's take a look at the official grid sample image.

Please note that this grid comes from a SFW subset (around 3m+ images), already down-scaled to 512x512.

For the scenario of "keypoints based" anime generation, it's easy to tell that most of the samples are not suitable for training. Naming a few:

- a girl making a weird pose, where the feet are too big and there are no arms at all
- a girl holding a doll face, with the background completely full of doll faces

Among the 10x10 = 100 samples, basic counting tells that:

- a keypoint coordinate of exactly 512 (clamped to the image border) is totally understandable
- when something goes wrong with the image or the pose-estimation model, random points are understandable too; some weird four-legged creature may have its hip anywhere

Wait, I have a fun quiz about the fore-mentioned figure: the dense area of the main distribution seems normal, but judging by a single "hip position" alone, are those good samples for training? Under what circumstances should an anime character have hips at the top of the image, like y < 100?

Finally, applying several data analysis techniques, I got a 363k subset, ~50% smaller than the previous intermediate 600k subset, making sure no shoulder, wrist, and so on is placed too oddly.
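To make that concrete, here is a minimal sketch of the kind of rule-based keypoint sanity filter described above. It is illustrative only: the COCO-style (x, y, confidence) layout, the confidence cutoff, and the exact thresholds are my assumptions, not the precise rules behind the 363k subset.

```python
import numpy as np

IMG_SIZE = 512        # images are already down-scaled to 512x512
HIP_MIN_Y = 100       # a hip above this line is the "y < 100" quiz case
BORDER_MARGIN = 2     # coordinates stuck at 0 or 512 mean a clamped/failed point
CONF_THRESHOLD = 0.3  # ignore keypoints the estimator itself is unsure about

def is_sane_pose(keypoints: np.ndarray) -> bool:
    """keypoints: (17, 3) array of COCO-style (x, y, confidence) rows."""
    xy, conf = keypoints[:, :2], keypoints[:, 2]
    visible = conf > CONF_THRESHOLD

    # Too few confident keypoints: likely not a usable human figure at all.
    if visible.sum() < 8:
        return False

    # Reject poses where many visible points are clamped to the image border.
    on_border = (xy <= BORDER_MARGIN) | (xy >= IMG_SIZE - BORDER_MARGIN)
    if on_border.any(axis=1)[visible].mean() > 0.2:
        return False

    # COCO indices 11 and 12 are the left/right hips: a hip near the top of
    # a 512x512 image is anatomically implausible for a standing figure.
    for x, y, c in keypoints[[11, 12]]:
        if c > CONF_THRESHOLD and y < HIP_MIN_Y:
            return False

    return True

# Usage: keep = [kp for kp in pose_estimates if is_sane_pose(kp)]
```

Rules like these are cheap to run over millions of pose estimates, which is what makes this kind of pass practical before any model-based filtering.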
Rethink: A cleaner subset is not clean enough

Maybe this filtering is a little bit of overhead: sometimes I feel this type of filtering does not eliminate most of the abnormal samples, but it does hurt the total available image count directly.

Here are 20 random samples from the 363k subset:

- top row: 8/20 (40%) of the images seem to be near-unified portraits suitable for training.
- mid row: 6/20 (30%) of the images seem questionable; I am not sure the model could refine details from such stylized, visually complex images.
- bottom row: 6/20 (30%) of the images are totally unacceptable; they shall make the training unstable and semantically confused.

Now to recap the problems we mentioned earlier: hands have long been a weak point. If only ~40% of your dataset contains standard-looking hands, while in the other ~60% of images the hand is holding some item or there are no hands at all, your model is not going to generate hands well.

By intuition, the next step is to further clean up the dataset, selecting only the appropriate 40% (top row as an example), making it 140k in total, and finally getting better results.

Well, I tried making a 101k subset out of the 364k subset, but I could not make it "select only the appropriate 40%": by statistics the good and bad samples look alike. The best way I can come up with is to train another resnet model to label them, but this dataset is different from the Leonid Afremov dataset; there I could hand-craft segmentation for 25% of the 600 paintings, while there is no way I can tag a sufficient percentage of this 363k dataset all by myself (see the sketch at the end of this post).

So I finally made a 101k subset towards "the most usual poses" by statistics, and it does not do well: too little data spread over too many poses.

The danbooru dataset is way too noisy

Even with all the efforts to clean the dataset, it is easy to spot totally undesirable outputs in the final sampling stage, such as the one below.

There must be a cool white-haired knight wearing leather armor, so cool and so dark, somewhere in the dataset, totally unlike any of the anime cute girls wearing dresses. However, the pose is correct, at least, and a cool white-haired leather-armor knight is still anime, I guess.
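Looking forward, the resnet-labeling idea above doesn't have to mean hand-tagging everything: a small hand-labeled seed set can bootstrap labels for the rest. Below is a minimal sketch of that idea; resnet18, the good_hands/bad_hands folder layout, and all hyperparameters are illustrative assumptions, not something from the original pipeline.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import models, transforms
from torchvision.datasets import ImageFolder

# Fine-tune a small classifier on a hand-labeled seed set (for example a few
# thousand images sorted into seed_labels/good_hands/ and seed_labels/bad_hands/),
# then let it score the remaining ~360k images automatically.
tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
seed = ImageFolder("seed_labels/", transform=tf)  # hypothetical folder layout
loader = DataLoader(seed, batch_size=64, shuffle=True)

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)     # good vs. bad
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

model.train()
for epoch in range(3):                            # a short fine-tune as a start
    for x, y in loader:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

# During filtering, keep only images the classifier marks "good" with high
# confidence; anything uncertain stays out of the training subset.
```

Even a noisy classifier like this turns "I cannot tag 363k images by myself" into "I only need to tag a few thousand", which is the kind of trade this retrospective is asking for.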