CannyEdit: Selective Canny Control and Dual-Prompt Guidance for Training-free Image Editing

arXiv | Code (coming soon) | Demo (coming soon)

✨ Highlights ✨

1. High-Quality Region-Specific Image Edits

Our method enables high-quality region-specific image edits and is especially useful in cases where SOTA free-form image editing methods fail to ground edits accurately.

2. Multiple Edits in a Single Pass

Our method supports edits on multiple user-specified regions in a single generation pass when multiple masks are given.

3. Precise Local Control

a. By specifying the mask size, our method effectively controls the size of the generated subject (see the sketch after the examples below).

Add a sofa + Add a painting on the wall.

[Figure: Input | CannyEdit (small mask) | CannyEdit (medium mask) | CannyEdit (large mask)]

Add a slide + Add a swimming pool with children playing on them.

[Figure: Input | CannyEdit (small mask) | CannyEdit (medium mask) | CannyEdit (large mask)]
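A minimal sketch, assuming NumPy and a 512x512 image, of how an editable-region mask of a chosen size might be specified: the mask's footprint is what determines the generated subject's size, so enlarging the mask at the same location yields a larger subject. The helper rect_mask and the concrete sizes are illustrative, not part of the released code.

# Sketch of building editable-region masks of different sizes.
import numpy as np

def rect_mask(height, width, center, size):
    """Binary mask with a size-by-size editable square centered at `center`."""
    cy, cx = center
    half = size // 2
    mask = np.zeros((height, width), dtype=np.uint8)
    mask[max(cy - half, 0):cy + half, max(cx - half, 0):cx + half] = 1
    return mask

# Same edit location, three mask sizes -> small / medium / large generated subject.
masks = {name: rect_mask(512, 512, (256, 256), size)
         for name, size in [("small", 96), ("medium", 176), ("large", 256)]}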

b. By varying the local details in the text prompt, our method generates subjects with different visual characteristics.

Add a sofa + Add a painting on the wall.

[Figure: Input | CannyEdit (output 1) | CannyEdit (output 2) | CannyEdit (output 3)]

Add a woman customer reading a menu + Add a waiter ready to serve.

[Figure: Input | CannyEdit (output 1) | CannyEdit (output 2) | CannyEdit (output 3)]

Abstract

Recent advances in text-to-image (T2I) models have enabled training-free regional image editing by leveraging the generative priors of foundation models. However, existing methods struggle to balance text adherence in edited regions, context fidelity in unedited areas, and seamless integration of edits. We introduce CannyEdit, a novel training-free framework that addresses these challenges through two key innovations: (1) Selective Canny Control, which masks the structural guidance of Canny ControlNet in user-specified editable regions while strictly preserving the source image's details in unedited areas via inversion-phase ControlNet information retention. This enables precise, text-driven edits without compromising contextual integrity. (2) Dual-Prompt Guidance, which combines local prompts for object-specific edits with a global target prompt to maintain coherent scene interactions. On real-world image editing tasks (addition, replacement, removal), CannyEdit outperforms prior methods like KV-Edit, achieving a 2.93%–10.49% improvement in the balance of text adherence and context fidelity. In terms of editing seamlessness, user studies reveal that only 49.2% of general users and 42.0% of AIGC experts identified CannyEdit's results as AI-edited when they were shown alongside real, unedited images, versus 76.08%–89.09% for competing methods.
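To make innovation (1) concrete, below is a minimal sketch of the selective-Canny idea at the edge-map level, assuming OpenCV and NumPy: edges are kept outside the user-specified editable regions to preserve the source structure and suppressed inside them so the edit is driven by text. CannyEdit applies this selectivity to the Canny ControlNet guidance itself; masking the raw edge map, and the helper name selective_canny, are simplifications for illustration.

# Sketch: keep Canny edges outside the editable regions, drop them inside,
# so structural guidance constrains only the unedited context.
import cv2
import numpy as np

def selective_canny(image_bgr, edit_masks, low=100, high=200):
    """Canny edge map with edges suppressed inside the editable regions."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, low, high)                 # full-image structure
    editable = np.zeros_like(edges)
    for m in edit_masks:                               # one binary mask per edit region
        editable = np.maximum(editable, (m > 0).astype(np.uint8) * 255)
    edges[editable > 0] = 0                            # no structural guidance inside edits
    return edges

# Usage: suppress edges in a rectangular edit region before building the control image.
img = cv2.imread("source.jpg")                         # replace with your source image
mask = np.zeros(img.shape[:2], dtype=np.uint8)
mask[120:320, 200:400] = 255
control_edges = selective_canny(img, [mask])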

Method


The inversion-denoising process of CannyEdit. 1) Inversion: starting from the source image, its Canny edge map, and a source prompt (P_source), we use FireFlow [1] to obtain the inverted noise (x_{t_N}) and the corresponding Canny ControlNet outputs. The intermediate noisy latents from the inversion process, {x_{t_1}, ..., x_{t_{N-1}}}, are also cached to enhance context fidelity. 2) Denoising: starting from the inverted noise (x_{t_N}), we perform guided generation with selective Canny control (via a mask E) and dual prompts, (P_local) and (P_target), which provide multi-level text guidance.
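The caption notes that the intermediate inversion latents {x_{t_1}, ..., x_{t_{N-1}}} are cached to enhance context fidelity. One common way to use such cached latents during denoising, which may differ from CannyEdit's exact mechanism, is to blend them back outside the editable region at each matching step so that unedited areas stay anchored to the source image. A toy NumPy sketch of that blending:

# Toy sketch: inside the editable region E keep the current denoised latent,
# outside E fall back to the cached inversion latent for the same timestep.
import numpy as np

def blend_with_cached(x_t, cached_x_t, E):
    """E is 1 inside editable regions and 0 elsewhere (broadcast over channels)."""
    E = E.astype(x_t.dtype)
    return E * x_t + (1.0 - E) * cached_x_t

# Dummy 4-channel 64x64 latents and a square editable region.
rng = np.random.default_rng(0)
x_t = rng.normal(size=(4, 64, 64))
cached_x_t = rng.normal(size=(4, 64, 64))
E = np.zeros((1, 64, 64))
E[:, 16:48, 16:48] = 1.0
blended = blend_with_cached(x_t, cached_x_t, E)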

Typical Editing Tasks

[Figure: details for typical editing tasks]

[Figure: more examples of typical editing]

Generalized Editing Tasks

[Figure: multi-edit examples]


BibTeX


@article{xie2025canny,
  title={CannyEdit: Selective Canny Control and Dual-Prompt Guidance for Training-free Image Editing},
  author={Xie, Weiyan and Gao, Han and Deng, Didan and Li, Kaican and Liu, April Hua and Huang, Yongxiang and Zhang, Nevin L.},
  journal={arXiv preprint arXiv:2508.06937},
  year={2025}
}

Contact

Weiyan Xie (wxieai@cse.ust.hk)