🐦 PartCraft: Crafting Creative Objects by Parts (ECCV 2024) 🐦

  • CVSSP, University of Surrey
  • iFlyTek-Surrey Joint Research Centre
  • Surrey Institute for People-Centred AI
[Figure: overview]

Abstract

This paper propels creative control in generative visual AI by allowing users to "select". Departing from traditional text- or sketch-based methods, we for the first time allow users to choose visual concepts by parts for their creative endeavors. The outcome is fine-grained generation that precisely captures the selected visual concepts, ensuring a holistically faithful and plausible result. To achieve this, we first parse objects into parts through unsupervised feature clustering. Then, we encode the parts into text tokens and introduce an entropy-based normalized attention loss that operates on them. This loss design enables our model to learn generic prior knowledge about an object's part topology, and to generalize to novel part compositions so that the generation remains holistically faithful. Lastly, we employ a bottleneck encoder to project the part tokens. This not only enhances fidelity but also accelerates learning by leveraging shared knowledge and facilitating information exchange among instances. Visual results in the paper and supplementary material showcase the compelling power of PartCraft in crafting highly customized, innovative creations, exemplified by the "charming" and creative birds.
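As a concrete illustration of the first step, here is a minimal sketch of part parsing via unsupervised feature clustering, assuming self-supervised ViT patch features (e.g., DINO) and plain k-means. The function name, shapes, and hyperparameters are our own assumptions; the paper's hierarchical clustering pipeline may differ.

```python
# Minimal sketch of part discovery via unsupervised feature clustering.
# Assumes DINO-style patch features; all names/shapes are illustrative.
import torch
from sklearn.cluster import KMeans

def discover_parts(patch_feats: torch.Tensor, num_parts: int = 4) -> torch.Tensor:
    """Cluster ViT patch features into part assignments.

    patch_feats: (N, H*W, D) patch embeddings from a self-supervised
    backbone for N images. Returns (N, H*W) part labels.
    """
    n, hw, d = patch_feats.shape
    flat = patch_feats.detach().reshape(-1, d).cpu().numpy()
    # Cluster patches across the whole dataset so that the same cluster
    # id maps to the same semantic part (head, wing, ...) in every image.
    kmeans = KMeans(n_clusters=num_parts, n_init=10).fit(flat)
    return torch.from_numpy(kmeans.labels_).reshape(n, hw)
```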

Methodology


[Figure: overview]

Overview of our PartCraft. (Left) Part discovery within a semantic hierarchy involves partitioning each image into distinct parts and forming semantic clusters across unlabeled training data. (Right) All parts are organized into a dictionary, and their semantic embeddings are learned through a textual inversion approach. For instance, a text description like "a photo of a [Head,42] [Wing,87] ..." guides the optimization of the corresponding textual embeddings by reconstructing the associated image. To improve generation fidelity, we incorporate a bottleneck encoder $f$ (an MLP) to compute the embedding $y$ as input to the text encoder. To promote disentanglement among the learned parts, we minimize a specially designed attention loss, denoted $\mathcal{L}_{attn}$.
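The two training-side pieces named in the caption can be sketched as follows, under our own assumptions about shapes and normalization: a small MLP bottleneck $f$ that projects each part token, and an attention loss that normalizes each part token's cross-attention map into a spatial distribution and pulls it toward that part's discovered region via cross-entropy. The exact entropy-based formulation in the paper may differ.

```python
import torch
import torch.nn as nn

class BottleneckEncoder(nn.Module):
    """MLP f projecting learnable part tokens to text-encoder inputs."""
    def __init__(self, token_dim: int = 768, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(token_dim, hidden_dim),  # bottleneck
            nn.GELU(),
            nn.Linear(hidden_dim, token_dim),
        )

    def forward(self, part_tokens):   # (num_parts, token_dim)
        return self.net(part_tokens)  # embedding y

def attention_loss(attn_maps, part_masks, eps: float = 1e-8):
    """Normalized attention loss (illustrative reading).

    attn_maps:  (P, H, W) cross-attention map of each part token.
    part_masks: (P, H, W) binary masks from part discovery.
    Each attention map is normalized into a spatial distribution, then
    we minimize the cross-entropy against the normalized mask so each
    token's attention concentrates on its own part region.
    """
    p = attn_maps.flatten(1)
    p = p / (p.sum(dim=1, keepdim=True) + eps)       # normalize attention
    q = part_masks.flatten(1).float()
    q = q / (q.sum(dim=1, keepdim=True) + eps)       # normalize mask
    return -(q * (p + eps).log()).sum(dim=1).mean()  # cross-entropy
```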

Unseen part composition


[Figure: overview]

Integrating a specific part (e.g., body, head, or even background) of a source concept B into a target concept A.
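At the prompt level, this kind of transfer amounts to swapping a single part token. Below is a toy sketch using the [Part,id] naming from the methodology figure; all token ids here are made up.

```python
# Toy sketch of part transfer at the prompt level, using the [Part,id]
# naming from the figure above. All token ids are made up.
concept_a = {"Head": 42, "Wing": 87, "Body": 13, "Leg": 5}   # target A
concept_b = {"Head": 91, "Wing": 64, "Body": 28, "Leg": 7}   # source B

# Transfer B's head onto A: swap a single part token.
hybrid = dict(concept_a, Head=concept_b["Head"])

prompt = "a photo of a " + " ".join(
    f"[{part},{tok}]" for part, tok in hybrid.items()
)
print(prompt)  # a photo of a [Head,91] [Wing,87] [Body,13] [Leg,5]
```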

Comparison


[Figure: overview]

Visual comparison on 4-species hybrid generation (species specified in the top row). The last column shows generated images in different styles (i.e., DSLR, Van Gogh, Oil Painting, Pencil Drawing). While all images appear realistic, most methods struggle to assemble all 4 sub-concepts. In contrast, our method successfully combines 4 different parts from 4 different species, demonstrating its superior part-composition ability.


More examples


[Figure: overview]

Creative Generation


[Figure: overview]

We demonstrate that PartCraft can not only compose parts within the domain of the target concepts (e.g., birds) but also transfer the learned parts to other domains (e.g., cats) and combine them there. This enables the creation of unique combinations, such as a cat with a dog's ears. Leveraging the prior knowledge embedded in Stable Diffusion, PartCraft can also repurpose learned parts to design innovative digital assets. These examples showcase PartCraft's immense potential for diverse and limitless creative applications.
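For illustration, generation with learned part tokens might look like the following diffusers sketch. The checkpoint, embedding file, and token name are placeholders of our own, not an official release.

```python
# Hypothetical usage sketch with Hugging Face diffusers. The checkpoint,
# embedding file, and token name are placeholders, not an official release.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load learned part embeddings into the text encoder
# (textual-inversion style; path is a placeholder).
pipe.load_textual_inversion("partcraft_learned_embeds.bin")

# Cross-domain composition: reuse a learned part token in a cat prompt.
prompt = "a photo of a cat with <dog-ear>"  # hypothetical token name
image = pipe(prompt, num_inference_steps=50).images[0]
image.save("hybrid_cat.png")
```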

Playground

These images are generated automatically, so occasionally an NSFW result (shown as a black image by the safety filter) may appear that the authors have overlooked.

Trained on CUB-200-2011


[Interactive demo: four generated images with selectable options for Prompt, Head, Body, Wings, and Legs]

Trained on Stanford Dogs


[Interactive demo: four generated images with selectable options for Prompt, Upper Face (Forehead/Eye), Ears, Lower Face (Nose/Mouth/Neck), and Body]

Citation

@inproceedings{ng2024partcraft,
  title={PartCraft: Crafting Creative Objects by Parts},
  author={Kam Woh Ng and Xiatian Zhu and Yi-Zhe Song and Tao Xiang},
  booktitle={ECCV},
  year={2024}
}