🚀 ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance

Shuwei Shi¹, Wenbo Li², Yuechen Zhang², Jingwen He², Biao Gong³, Yinqiang Zheng‡¹

¹The University of Tokyo
²The Chinese University of Hong Kong
³Ant Group
‡Corresponding author

🎏 Abstract

Diffusion models excel at producing high-quality images; however, scaling to higher resolutions, such as 4K, often results in over-smoothed content, structural distortions, and repetitive patterns. To address this, we introduce ResMaster, a novel, training-free method that empowers resolution-limited diffusion models to generate high-quality images beyond their resolution restrictions. Specifically, ResMaster leverages a low-resolution reference image created by a pre-trained diffusion model to provide structural and fine-grained guidance for crafting high-resolution images on a patch-by-patch basis. To ensure a coherent global structure, ResMaster aligns the low-frequency components of high-resolution patches with the low-resolution reference at each denoising step. For fine-grained guidance, it incorporates tailored image prompts based on the low-resolution reference and enriched textual prompts produced by a vision-language model. This approach can significantly mitigate local pattern distortions and improve detail refinement. Extensive experiments validate that ResMaster sets a new benchmark for high-resolution image generation and demonstrates promising efficiency.

💻 Overview

ResMaster employs Structural and Fine-Grained Guidance to ensure structural integrity and enhance detail generation. Specifically, ResMaster implements low-frequency component swapping using the low-resolution image generated at each sampling step to maintain global structural coherence in higher-resolution outputs. Additionally, to mitigate repetitive patterns and increase detail accuracy, we employ localized fine-grained guidance using condensed image prompts and enriched textual descriptions. The image prompts, derived from the generated low-resolution counterparts, contain critical semantic and structural information. Simultaneously, the detailed textual prompts produced by a pre-trained vision-language model (VLM) help generate more complex and accurate patterns.
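The low-frequency component swapping described above can be illustrated with a small NumPy sketch. This is not the ResMaster implementation: it assumes a hard circular low-pass mask in the Fourier domain, a single-channel 2-D array per patch, and a reference that has already been upsampled and cropped to the patch's shape. The names `low_pass_mask`, `swap_low_frequencies`, and the `cutoff` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def low_pass_mask(height, width, cutoff):
    """Boolean mask that is True where the frequency radius is <= cutoff.

    Frequencies are in cycles per pixel, as returned by np.fft.fftfreq,
    so the usable range of `cutoff` is roughly 0.0 to 0.5.
    """
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.fftfreq(width)[None, :]
    return np.sqrt(fy**2 + fx**2) <= cutoff


def swap_low_frequencies(patch, reference, cutoff=0.1):
    """Replace the low-frequency band of `patch` with that of `reference`.

    Assumption: both inputs are real 2-D arrays of identical shape, with
    `reference` already upsampled and cropped to match the patch.
    """
    if patch.shape != reference.shape:
        raise ValueError("patch and reference must have the same shape")
    mask = low_pass_mask(patch.shape[0], patch.shape[1], cutoff)
    patch_spectrum = np.fft.fft2(patch)
    reference_spectrum = np.fft.fft2(reference)
    # Low frequencies come from the reference, high frequencies from the patch.
    mixed = np.where(mask, reference_spectrum, patch_spectrum)
    # The mask is symmetric under frequency negation, so the result is real
    # up to floating-point rounding; drop the residual imaginary part.
    return np.fft.ifft2(mixed).real


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patch = rng.standard_normal((64, 64))      # stand-in for a high-resolution patch
    reference = rng.standard_normal((64, 64))  # stand-in for the upsampled reference crop
    aligned = swap_low_frequencies(patch, reference, cutoff=0.1)
    print(aligned.shape)
```

In a sampling loop, a step like this would be applied to each patch at each denoising step, so that coarse layout follows the reference while fine detail stays with the patch. The cutoff value and the choice of a hard mask over a smooth filter are illustrative choices, not values taken from the project.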

🌰 More Examples

🔥 Update

2024.6.25 - 🛳️ This repo is released.

🎓 Citation

@misc{shi2024resmaster,
  title={ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance},
  author={Shuwei Shi and Wenbo Li and Yuechen Zhang and Jingwen He and Biao Gong and Yinqiang Zheng},
  year={2024},
  eprint={2406.16476},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

Repository: Shuweis/ResMaster

License: Apache-2.0
