🐦DreamCreature: Crafting Photorealistic Virtual Creatures from Imagination 🐦

  • ¹ CVSSP, University of Surrey
  • ² iFlyTek-Surrey Joint Research Centre
  • ³ Surrey Institute for People-Centred AI

Abstract

Recent text-to-image (T2I) generative models allow for high-quality synthesis following either text instructions or visual examples. Despite their capabilities, these models face limitations in creating new, detailed creatures within specific categories (e.g., virtual dog or bird species), which are valuable in digital asset creation and biodiversity analysis. To bridge this gap, we introduce a novel task, Virtual Creatures Generation: given a set of unlabeled images of the target concepts (e.g., 200 bird species), we aim to train a T2I model capable of creating new, hybrid concepts within diverse backgrounds and contexts. We propose a new method called DreamCreature, which identifies and extracts the underlying sub-concepts (e.g., body parts of a specific species) in an unsupervised manner. The T2I model is thus adapted to generate novel concepts (e.g., new bird species) with faithful structures and photorealistic appearance by seamlessly and flexibly composing the learned sub-concepts. To enhance sub-concept fidelity and disentanglement, we extend the textual inversion technique by incorporating an additional projector and a tailored attention loss regularization. Extensive experiments on two fine-grained image benchmarks demonstrate the superiority of DreamCreature over prior-art alternatives in both qualitative and quantitative evaluation. Ultimately, the learned sub-concepts facilitate diverse creative applications, including innovative consumer product designs and nuanced property modifications.

Methodology



Overview of our DreamCreature. (Left) Discovering sub-concepts within a semantic hierarchy involves partitioning each image into distinct parts and forming semantic clusters across unlabeled training data. (Right) These clusters are organized into a dictionary, and their semantic embeddings are learned through a textual inversion approach. For instance, a text description like "a photo of a [Head,42] [Wing,87]..." guides the optimization of the corresponding textual embedding by reconstructing the associated image. To promote disentanglement among learned concepts, we minimize a specially designed attention loss, denoted as \( \mathcal{L}_{attn} \).

Mixing Sub-concepts



Integrating a specific sub-concept (e.g., body, head, or even background) of a source concept B into the target concept A.
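Because each concept is represented as a list of per-part cluster ids, mixing reduces to swapping entries before building the prompt. The helper below is a hypothetical sketch of that bookkeeping, not code from the paper:

```python
def mix_sub_concepts(parts, target_ids, source_ids, swap):
    """Copy the cluster ids of the parts named in `swap` from source
    concept B into target concept A, leaving all other parts intact.

    parts:      ordered part names, e.g. ["Head", "Wing", "Body"]
    target_ids: per-part cluster ids of concept A
    source_ids: per-part cluster ids of concept B
    swap:       part names to take from B instead of A
    """
    mixed = dict(zip(parts, target_ids))
    src = dict(zip(parts, source_ids))
    for part in swap:
        mixed[part] = src[part]  # overwrite A's id with B's for this part
    return [mixed[part] for part in parts]
```

The mixed id list is then rendered into a prompt (e.g., "a photo of a [Head,9] [Wing,2] ...") and fed to the adapted T2I model as usual.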

Comparison



Visual comparison on 4-species (specified in the top row) hybrid generation. The last column shows generated images in different styles (i.e., DSLR, Van Gogh, Oil Painting, Pencil Drawing). While all images appear realistic, most methods struggle to assemble all 4 sub-concepts. In contrast, our method successfully combines 4 different sub-concepts from 4 different species, demonstrating its superior ability at sub-concept composition.


More examples



Creative Generation



We demonstrate that DreamCreature can not only compose sub-concepts within the domain of the target concepts (e.g., birds), but also transfer the learned sub-concepts to other domains (e.g., cats) and combine them there. This enables the creation of unique combinations, such as a cat with a dog's ears. Leveraging the prior knowledge embedded in Stable Diffusion, DreamCreature can also repurpose learned sub-concepts to design innovative digital assets. These examples showcase DreamCreature's immense potential for diverse and limitless creative applications.

Playground

These images are generated automatically, so there may occasionally be NSFW-filtered (blacked-out) images that the authors have overlooked.

Trained on CUB-200-2011


[Generated images 1–4]
Prompt:

[...]

Head:

[...]

Body:

[...]

Wings:

[...]

Legs:

[...]

Trained on Stanford Dogs


[Generated images 1–4]
Prompt:

[...]

Upper Face (Forehead/Eye):

[...]

Ears:

[...]

Lower Face (Nose/Mouth/Neck):

[...]

Body:

[...]

Citation

@misc{ng2023dreamcreature,
      title={DreamCreature: Crafting Photorealistic Virtual Creatures from Imagination},
      author={Kam Woh Ng and Xiatian Zhu and Yi-Zhe Song and Tao Xiang},
      year={2023},
      eprint={2311.15477},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}