Homepage - Weiyan Xie

Weiyan Xie (Vayne)

Ph.D. candidate, Dept. of Computer Science & Engineering
Hong Kong University of Science and Technology (HKUST)

wxieai(at)cse.ust.hk

About Me

I plan to graduate in Oct. 2026 and am open to industry opportunities in multimodal AI, agentic systems, and LLM/MLLM training and inference.

I am a Ph.D. candidate in the Dept. of Computer Science & Engineering at HKUST, advised by Prof. Nevin L. Zhang, and a recipient of the Huawei PhD Fellowship (HKUST).

My research focuses on the real-world application of deep vision and vision-language models, with emphasis on explainability, generalization, MLLM-based agentic visual perception, and controllability in image editing.

More broadly, I aim to develop diagnostic tools that reveal what models currently depend on, and targeted mechanisms that guide them toward causally relevant, trustworthy, and efficient behavior.

Research Interests & Selected Work

Organized by theme with representative papers and code.

Theme 1

Vision-Language Models, MLLMs, and Agentic Visual Perception

Multimodal LLMs may rely on language priors rather than pertinent visual evidence, especially on long documents. I explore agentic perception frameworks that gather evidence iteratively to improve accuracy and efficiency.

InSight-doc: Agentic Visual Perception for Long-Document Understanding

In submission, 2026

Replaces fixed-resolution, single-pass pipelines with iterative perception that selectively acquires high-resolution crops on demand.

Paper, code, and data will be publicly available soon.

Theme 2

Controllable Image Editing and Generation

Controllable editing requires precise spatial and semantic guidance without costly retraining. I develop training-free methods that combine structural control with flexible prompt guidance.

CannyEdit: Selective Canny Control and Dual-Prompt Guidance for Training-Free Image Editing

IEEE CAI, 2026

Enables selective edge-based structural control with dual-prompt guidance for training-free, controllable image editing.

[Paper] [Code]

Theme 3

Robustness, Domain Generalization, and Adaptation

Foundation models can lose robustness during fine-tuning and fail under distribution shift. I design training objectives that anchor decisions to invariant, generalizable features.

Consistency Regularization for Domain Generalization with Logit Attribution Matching

UAI, 2024

Logit Attribution Matching (LAM) anchors predictions to domain-invariant causal features.

[Paper] [Code]

Dual Risk Minimization: Towards Next-Level Robustness in Fine-Tuning Zero-Shot Models

NeurIPS, 2024

Counters the loss of robustness during foundation-model fine-tuning via dual risk minimization.

[Paper] [Code]

Theme 4

Trustworthy and Explainable AI (XAI)

Deep classifiers often rely on spurious correlations rather than causally relevant visual evidence. My work develops explanation methods that diagnose misaligned dependencies and surface discriminative rationales.

ViT-CX: Causal Explanation of Vision Transformers

IJCAI, 2023

Estimates the causal effect of semantic patches on Vision Transformer predictions.

[Paper] [Code]

Two-Stage Holistic and Contrastive Explanation of Image Classification

UAI, 2023

Introduces CWOX, which explains top-K labels by contrasting visually confusable competitors.

[Paper] [Code]

Example Perplexity

arXiv:2203.08813, 2022

Proposes a diagnostic measure for assessing how well a model captures training-example structure.

[Paper]

Education

Hong Kong University of Science and Technology

Ph.D. in Computer Science

Sep. 2022 - Oct. 2026

Hong Kong University of Science and Technology

M.Sc. in Big Data Technology (CGPA 4.11/4.3, Rank 5/120)

Sep. 2019 - Dec. 2020

Hong Kong Baptist University (HKBU)

B.S. in Statistics (CGPA 3.51/4.0, Rank 3/70)

Sep. 2015 - Jun. 2019

Honors & Awards

Huawei PhD Fellowship (HKUST)

2022–2026

MSc Big Data Technology Top Students Award (HKUST)

2020

School of Engineering Excellent Student Scholarship (HKUST)

2020

Teaching Experience

Teaching Assistant, Postgraduate Machine Learning (MSBD5012 / CSIT5910 / COMP5212)

2022 Fall, 2023 Spring, 2023 Fall, 2024 Fall, 2025 Fall

Teaching Assistant, COMP2011 Programming with C++ (UG core course)

2024 Spring

Co-supervised 15+ Master's independent projects

Most students received A-level grades

Professional Service

Conference reviewer / PC member

UAI 2024 Top Reviewer
NeurIPS (2023–2026)
ICML (2023–2026)
ICLR (2024–2026)
UAI (2024–2026)
AAAI (2025–2026)