2026

InSight-doc: Agentic Visual Perception for Long-Document Understanding

Kaican Li*, Weiyan Xie*, Lewei Yao, Jiannan Wu, Lanqing Hong, Yongxiang Huang, Nevin L. Zhang (* Equal contribution, listed in alphabetical order)

In submission 2026

Replaces fixed-resolution, single-pass pipelines with iterative perception that selectively acquires high-resolution crops on demand, advancing the accuracy–efficiency Pareto frontier.

CannyEdit: Selective Canny Control and Dual-Prompt Guidance for Training-Free Image Editing

Weiyan Xie, Han Gao, Didan Deng, Kaican Li, April Hua Liu, Yongxiang Huang, Nevin L. Zhang

IEEE Conference on Artificial Intelligence (CAI) 2026

Enables selective edge-based structural control with dual-prompt guidance for training-free, controllable image editing.

2024

Dual Risk Minimization: Towards Next-Level Robustness in Fine-Tuning Zero-Shot Models

Kaican Li*, Weiyan Xie*, Yongxiang Huang, Didan Deng, Lanqing Hong, Zhenguo Li, Ricardo Silva, Nevin L. Zhang (* Equal contribution, listed in alphabetical order)

Advances in Neural Information Processing Systems (NeurIPS) 2024

Combats robustness vanishing during foundation-model fine-tuning by jointly optimizing empirical risk with worst-case risk estimated via CLIP and LLM-generated visual descriptions.

Consistency Regularization for Domain Generalization with Logit Attribution Matching

Han Gao*, Kaican Li*, Weiyan Xie*, Zhi Lin, Yongxiang Huang, Luning Wang, Caleb Chen Cao, Nevin L. Zhang (* Equal contribution, listed in alphabetical order)

Conference on Uncertainty in Artificial Intelligence (UAI) 2024

Logit Attribution Matching (LAM) anchors predictions to domain-invariant causal features by matching logit attributions across semantic-sharing pairs.

2023

ViT-CX: Causal Explanation of Vision Transformers

Weiyan Xie, Xiao-Hui Li, Caleb Chen Cao, Nevin L. Zhang

International Joint Conference on Artificial Intelligence (IJCAI) 2023

Estimates the causal effect of semantic patches on Vision Transformer predictions, moving beyond correlational saliency maps.

Two-Stage Holistic and Contrastive Explanation of Image Classification

Weiyan Xie, Xiao-Hui Li, Zhi Lin, Leonard K. M. Poon, Caleb Chen Cao, Nevin L. Zhang

Conference on Uncertainty in Artificial Intelligence (UAI) 2023

Introduces Contrastive Whole-Output Explanation (CWOX), which explains a model's top-K labels by systematically contrasting visually confusable competitors.

A Causal Framework to Unify Common Domain Generalization Approaches

Nevin L. Zhang, Kaican Li, Han Gao, Weiyan Xie, Zhi Lin, Zhenguo Li, Luning Wang, Yongxiang Huang

arXiv:2307.06825 2023 Preprint

2022

Example Perplexity

Nevin L. Zhang, Weiyan Xie, Zhi Lin, Guanfang Dong, Xiao-Hui Li, Caleb Chen Cao, Yunpeng Wang

arXiv:2203.08813 2022 Preprint

Proposes a diagnostic measure for assessing how well a model captures the structure of training examples.
