CannyEdit: Selective Canny Control and Dual-Prompt Guidance for Training-free Image Editing

arXiv | Code (coming soon) | Demo (coming soon)

✨ Highlights ✨

1. High-Quality Region-Specific Image Edits

Our method enables high-quality region-specific image edits and is especially useful in cases where SOTA free-form image editing methods fail to ground edits accurately.

2. Multiple Edits in a Single Pass

Our method supports edits on multiple user-specified regions in a single generation pass when multiple masks are given.

3. Precise Local Control

a. By specifying the mask size, our method effectively controls the size of the generated subject (see the sketch after the examples below).

Add a sofa + Add a painting on the wall.

[Figure: Input | CannyEdit (small mask) | CannyEdit (medium mask) | CannyEdit (large mask)]

Add a slide + Add a swimming pool with children playing on them.

[Figure: Input | CannyEdit (small mask) | CannyEdit (medium mask) | CannyEdit (large mask)]
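A minimal sketch, assuming NumPy and a 512x512 image, of how an editable-region mask of a chosen size might be specified: the mask's footprint is what determines the generated subject's size, so enlarging the mask at the same location yields a larger subject. The helper rect_mask and the concrete sizes are illustrative, not part of the released code.

# Sketch of building editable-region masks of different sizes.
import numpy as np

def rect_mask(height, width, center, size):
    """Binary mask with a size-by-size editable square centered at `center`."""
    cy, cx = center
    half = size // 2
    mask = np.zeros((height, width), dtype=np.uint8)
    mask[max(cy - half, 0):cy + half, max(cx - half, 0):cx + half] = 1
    return mask

# Same edit location, three mask sizes -> small / medium / large generated subject.
masks = {name: rect_mask(512, 512, (256, 256), size)
         for name, size in [("small", 96), ("medium", 176), ("large", 256)]}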

b. By varying the local details in the text prompt, our method generates subjects with different visual characteristics.

Add a sofa + Add a painting on the wall.

[Figure: Input | CannyEdit (output 1) | CannyEdit (output 2) | CannyEdit (output 3)]

Add a woman customer reading a menu + Add a waiter ready to serve.

[Figure: Input | CannyEdit (output 1) | CannyEdit (output 2) | CannyEdit (output 3)]

Abstract

Recent advances in text-to-image (T2I) models have enabled training-free regional image editing by leveraging the generative priors of foundation models. However, existing methods struggle to balance text adherence in edited regions, context fidelity in unedited areas, and seamless integration of edits. We introduce CannyEdit, a novel training-free framework that addresses these challenges through two key innovations: (1) Selective Canny Control, which masks the structural guidance of Canny ControlNet in user-specified editable regions while strictly preserving the source image's details in unedited areas via inversion-phase ControlNet information retention. This enables precise, text-driven edits without compromising contextual integrity. (2) Dual-Prompt Guidance, which combines local prompts for object-specific edits with a global target prompt to maintain coherent scene interactions. On real-world image editing tasks (addition, replacement, removal), CannyEdit outperforms prior methods like KV-Edit, achieving a 2.93%–10.49% improvement in the balance of text adherence and context fidelity. In terms of editing seamlessness, user studies reveal that only 49.2% of general users and 42.0% of AIGC experts identified CannyEdit's results as AI-edited when they were shown alongside real, unedited images, versus 76.08%–89.09% for competing methods.
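To make innovation (1) concrete, below is a minimal sketch of the selective-Canny idea at the edge-map level, assuming OpenCV and NumPy: edges are kept outside the user-specified editable regions to preserve the source structure and suppressed inside them so the edit is driven by text. CannyEdit applies this selectivity to the Canny ControlNet guidance itself; masking the raw edge map, and the helper name selective_canny, are simplifications for illustration.

# Sketch: keep Canny edges outside the editable regions, drop them inside,
# so structural guidance constrains only the unedited context.
import cv2
import numpy as np

def selective_canny(image_bgr, edit_masks, low=100, high=200):
    """Canny edge map with edges suppressed inside the editable regions."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, low, high)                 # full-image structure
    editable = np.zeros_like(edges)
    for m in edit_masks:                               # one binary mask per edit region
        editable = np.maximum(editable, (m > 0).astype(np.uint8) * 255)
    edges[editable > 0] = 0                            # no structural guidance inside edits
    return edges

# Usage: suppress edges in a rectangular edit region before building the control image.
img = cv2.imread("source.jpg")                         # replace with your source image
mask = np.zeros(img.shape[:2], dtype=np.uint8)
mask[120:320, 200:400] = 255
control_edges = selective_canny(img, [mask])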

Method


The inversion-denoising process of CannyEdit. 1) Inversion: starting from the source image, its Canny edge map, and a source prompt (P_source), we use FireFlow [1] to obtain the inverted noise (x_{t_N}) and the corresponding Canny ControlNet outputs. The intermediate noisy latents from the inversion process, {x_{t_1}, ..., x_{t_{N-1}}}, are also cached to enhance context fidelity. 2) Denoising: starting from the inverted noise (x_{t_N}), we perform guided generation with selective Canny control (via a mask E) and dual prompts, (P_local) and (P_target), which provide multi-level text guidance.
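The caption notes that the intermediate inversion latents {x_{t_1}, ..., x_{t_{N-1}}} are cached to enhance context fidelity. One common way to use such cached latents during denoising, which may differ from CannyEdit's exact mechanism, is to blend them back outside the editable region at each matching step so that unedited areas stay anchored to the source image. A toy NumPy sketch of that blending:

# Toy sketch: inside the editable region E keep the current denoised latent,
# outside E fall back to the cached inversion latent for the same timestep.
import numpy as np

def blend_with_cached(x_t, cached_x_t, E):
    """E is 1 inside editable regions and 0 elsewhere (broadcast over channels)."""
    E = E.astype(x_t.dtype)
    return E * x_t + (1.0 - E) * cached_x_t

# Dummy 4-channel 64x64 latents and a square editable region.
rng = np.random.default_rng(0)
x_t = rng.normal(size=(4, 64, 64))
cached_x_t = rng.normal(size=(4, 64, 64))
E = np.zeros((1, 64, 64))
E[:, 16:48, 16:48] = 1.0
blended = blend_with_cached(x_t, cached_x_t, E)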

Typical Editing Tasks

[Figure: details for typical editing tasks]

[Figure: more examples of typical editing]

Generalized Editing Tasks

[Figure: multi-edit examples]


BibTeX


@article{xie2025canny,
  title={CannyEdit: Selective Canny Control and Dual-Prompt Guidance for Training-free Image Editing},
  author={Xie, Weiyan and Gao, Han and Deng, Didan and Li, Kaican and Liu, April Hua and Huang, Yongxiang and Zhang, Nevin L.},
  journal={arXiv preprint arXiv:2508.06937},
  year={2025}
}

Contact

Weiyan Xie (wxieai@cse.ust.hk)