Weiyan Xie (Vayne)
Ph.D. candidate, Dept. of Computer Science & Engineering
Hong Kong University of Science and Technology (HKUST)
About Me

I plan to graduate in Oct. 2026 and am open to discussing industry opportunities in multimodal AI, agentic systems, and LLM/MLLM training and inference.

I am a Ph.D. candidate in the Dept. of Computer Science & Engineering at HKUST, advised by Prof. Nevin L. Zhang, and a recipient of the Huawei PhD Fellowship (HKUST).

My research focuses on the real-world application of deep vision and vision-language models, with emphasis on explainability, generalization, MLLM-based agentic visual perception, and controllability in image editing.

More broadly, I aim to develop diagnostic tools that reveal what models currently depend on, and targeted mechanisms that guide them toward causally relevant, trustworthy, and efficient behavior.

Research Interests & Selected Work
Organized by theme with representative papers and code.
Theme 1
Vision-Language Models, MLLMs, and Agentic Visual Perception

Multimodal LLMs may rely on language priors rather than pertinent visual evidence, especially on long documents. I explore agentic perception frameworks that gather evidence iteratively to improve accuracy and efficiency.

  • InSight-doc: Agentic Visual Perception for Long-Document Understanding
    In submission, 2026
    Replaces fixed-resolution, single-pass pipelines with iterative perception that selectively acquires high-resolution crops on demand.
    Paper, code, and data will be publicly available soon.
Theme 2
Controllable Image Editing and Generation

Controllable editing requires precise spatial and semantic guidance without costly retraining. I develop training-free methods that combine structural control with flexible prompt guidance.

  • CannyEdit: Selective Canny Control and Dual-Prompt Guidance for Training-Free Image Editing
    IEEE CAI, 2026
    Enables selective edge-based structural control with dual-prompt guidance for training-free, controllable image editing.
Theme 3
Robustness, Domain Generalization, and Adaptation

Foundation models can lose robustness during fine-tuning and fail under distribution shift. I design training objectives that anchor decisions to invariant, generalizable features.

  • Consistency Regularization for Domain Generalization with Logit Attribution Matching
    UAI, 2024
    Logit Attribution Matching (LAM) anchors predictions to domain-invariant causal features.
  • Dual Risk Minimization: Towards Next-Level Robustness in Fine-Tuning Zero-Shot Models
    NeurIPS, 2024
    Mitigates the loss of robustness during foundation-model fine-tuning via dual risk minimization.
Theme 4
Trustworthy and Explainable AI (XAI)

Deep classifiers often rely on spurious correlations rather than causally relevant visual evidence. My work develops explanation methods that diagnose misaligned dependencies and surface discriminative rationales.

  • ViT-CX: Causal Explanation of Vision Transformers
    IJCAI, 2023
    Estimates the causal effect of semantic patches on Vision Transformer predictions.
  • Two-Stage Holistic and Contrastive Explanation of Image Classification
    UAI, 2023
    Introduces CWOX, which explains top-K labels by contrasting visually confusable competitors.
  • Example Perplexity
    arXiv:2203.08813, 2022
    Proposes a diagnostic measure for assessing how well a model captures training-example structure.
Education
  • Hong Kong University of Science and Technology
    Ph.D. in Computer Science
    Sep. 2022 - Oct. 2026 (expected)
  • Hong Kong University of Science and Technology
    M.Sc. in Big Data Technology (CGPA 4.11/4.3, Rank 5/120)
    Sep. 2019 - Dec. 2020
  • Hong Kong Baptist University (HKBU)
    B.S. in Statistics (CGPA 3.51/4.0, Rank 3/70)
    Sep. 2015 - Jun. 2019
Honors & Awards
  • Huawei PhD Fellowship (HKUST)
    2022 - 2026
  • MSc Big Data Technology Top Students Award (HKUST)
    2020
  • School of Engineering Excellent Student Scholarship (HKUST)
    2020
Teaching Experience
  • Teaching Assistant, Postgraduate Machine Learning (MSBD5012 / CSIT5910 / COMP5212)
    2022 Fall, 2023 Spring, 2023 Fall, 2024 Fall, 2025 Fall
  • Teaching Assistant, COMP2011 Programming with C++ (UG core course)
    2024 Spring
  • Co-supervised 15+ Master's independent projects
    Most students received A-level grades
Professional Service
Conference reviewer / PC member
  • UAI 2024 Top Reviewer
  • NeurIPS (2023–2026)
  • ICML (2023–2026)
  • ICLR (2024–2026)
  • UAI (2024–2026)
  • AAAI (2025–2026)