2 papers at top NLP conferences
Zhihui Chen 陈致晖
Ph.D. Student in Artificial Intelligence, National University of Singapore

I study trustworthy and multimodal AI for high-stakes healthcare, spanning medical deepfake detection, LLM-generated text detection, and medical image editing, with main-conference papers at ACL 2026 and EMNLP 2025.

Trustworthy and Agentic LLM • Multimodal Intelligence in Healthcare
ACL 2026 Main • EMNLP 2025 Main • ACL 2026 Reviewer • 10+ reviews completed
TikTok 2026 Multimodal Agent PhD Intern

I am currently a second-year PhD student at the National University of Singapore, supervised by Prof. Mengling Feng.

Academic Service: Reviewer for ACL 2026, NeurIPS, IJCAI, and ACM TIST, with 10+ papers reviewed.

Research Interests: Trustworthy and Agentic LLM • Multimodal Intelligence in Healthcare

Publication Highlights
  • 2 top NLP conference papers, published at ACL 2026 and EMNLP 2025
  • 100k+ Hugging Face downloads for the Med-Banana-50K dataset
  • 100+ GitHub stars across the DivScore and Med-Banana repositories
  • ACL / NeurIPS / IJCAI reviewer service, with 10+ papers reviewed
2 Top NLP Conference Papers
MedForge: Interpretable Medical Deepfake Detection via Forgery-aware Reasoning

Zhihui Chen, Kai He, Qingyuan Lei, Bin Pu, Jian Zhang, Yuling Xu, Mengling Feng# (# corresponding author)

Annual Meeting of the Association for Computational Linguistics (ACL) 2026, Main Conference

As generative models improve, medical deepfakes that implant or remove lesions while staying visually plausible pose growing risks to clinical safety and the integrity of medical evidence. Most prior work reduces detection to binary real-vs-fake scoring with little insight into where manipulation occurs or why. We present MedForge, an interpretable framework that introduces MedForge-90K—the first large-scale explainable medical deepfake dataset spanning CT, MRI, and X-ray, covering 19 lesion types with forgeries from 10 state-of-the-art deepfake models, each paired with expert-guided localization and clinical-grade explanations—and MedForge-Reasoner, a detector trained with a Localize-then-Analyze chain-of-thought paradigm and Forgery-aware GSPO reinforcement learning. MedForge-Reasoner achieves state-of-the-art detection while producing localized, verifiable medical rationales.

DivScore: Zero-Shot Detection of LLM-Generated Text in Specialized Domains

Zhihui Chen, Kai He, Yucheng Huang, Yunxiao Zhu, Mengling Feng

Conference on Empirical Methods in Natural Language Processing (EMNLP) 2025, Main Conference

Detecting LLM-generated text in specialized and high-stakes domains like medicine and law is crucial for combating misinformation and ensuring authenticity. We propose DivScore, a zero-shot detection framework using normalized entropy-based scoring and domain knowledge distillation to robustly identify LLM-generated text in specialized domains. Experiments show that DivScore consistently outperforms state-of-the-art detectors, with 14.4% higher AUROC and 64.0% higher recall at a 0.1% false-positive-rate threshold.
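
The recall figure above is reported at a fixed false-positive budget. As a rough illustration of how such a metric is commonly computed, the sketch below picks the loosest threshold that keeps the false-positive rate on human-written text within the budget, then measures recall on LLM-generated text. This is not the DivScore implementation: the function name `recall_at_fpr`, the two score lists, and the convention that a higher score means "more likely LLM-generated" are all assumptions made for illustration.

```python
from typing import Sequence


def recall_at_fpr(
    human_scores: Sequence[float],
    machine_scores: Sequence[float],
    max_fpr: float = 0.001,
) -> float:
    """Recall on machine text at the loosest threshold with FPR <= max_fpr.

    Assumption: a higher score means "more likely LLM-generated".
    """
    if not human_scores or not machine_scores:
        raise ValueError("both score lists must be non-empty")
    if not 0.0 <= max_fpr < 1.0:
        raise ValueError("max_fpr must be in [0, 1)")

    # How many human-written texts may be misflagged within the budget.
    allowed_fp = int(max_fpr * len(human_scores))

    # A text is flagged only if its score is strictly above the threshold,
    # so using the (allowed_fp + 1)-th highest human score as the threshold
    # guarantees at most allowed_fp false positives, even with ties.
    ranked = sorted(human_scores, reverse=True)
    threshold = ranked[allowed_fp]

    detected = sum(1 for s in machine_scores if s > threshold)
    return detected / len(machine_scores)
```

With only a few hundred human-written samples, a 0.1% budget allows zero false positives, so the threshold becomes the highest human score. Estimates at such strict budgets are only meaningful on large evaluation sets.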

Med-Banana-50K: A Large-Scale Cross-Modality Dataset for Medical Image Editing

Zhihui Chen, et al.

arXiv preprint 2025

Recent advances in multimodal large language models have enabled remarkable medical image editing capabilities. However, the research community's progress remains constrained by the absence of large-scale, high-quality, and openly accessible datasets built specifically for medical image editing with strict anatomical and clinical constraints. We introduce Med-Banana-50K, a comprehensive 50K-image dataset for instruction-based medical image editing spanning three modalities (chest X-ray, brain MRI, fundus photography) and 23 disease types. Our dataset is constructed by leveraging Gemini-2.5-Flash-Image to generate bidirectional edits (lesion addition and removal) from real medical images. What distinguishes Med-Banana-50K from general-domain editing datasets is our systematic approach to medical quality control: we employ LLM-as-Judge with a medically grounded rubric and history-aware iterative refinement up to five rounds.
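
The quality-control procedure described above can be pictured as a bounded retry loop. The following is a minimal sketch under stated assumptions, not the authors' pipeline: `edit_fn` and `judge_fn` are hypothetical callables standing in for the image editor and the LLM judge, images are represented as opaque strings, and passing the full attempt history back to the editor is one plausible reading of "history-aware".

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Attempt:
    """One editing round: the candidate image and the judge's verdict."""
    candidate: str
    passed: bool
    feedback: str


# Hypothetical signatures (assumptions for this sketch):
#   edit_fn(source, instruction, history) -> candidate image
#   judge_fn(source, candidate, instruction) -> (passed, feedback)
EditFn = Callable[[str, str, List[Attempt]], str]
JudgeFn = Callable[[str, str, str], Tuple[bool, str]]


def refine_edit(
    source: str,
    instruction: str,
    edit_fn: EditFn,
    judge_fn: JudgeFn,
    max_rounds: int = 5,
) -> Tuple[Optional[str], List[Attempt]]:
    """Retry an edit until the judge accepts it or the round limit is hit."""
    history: List[Attempt] = []
    for _ in range(max_rounds):
        # The editor sees all earlier attempts and the judge's feedback.
        candidate = edit_fn(source, instruction, history)
        passed, feedback = judge_fn(source, candidate, instruction)
        history.append(Attempt(candidate, passed, feedback))
        if passed:
            return candidate, history
    # No attempt met the rubric within the round limit.
    return None, history
```

Returning the history alongside the result keeps rejected attempts and their feedback available for inspection, whether or not an edit is accepted.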

Featured Work
MedForge
Interpretable medical deepfake detection

Detects medically plausible image forgeries with localized reasoning, using the MedForge-90K benchmark and a Localize-then-Analyze detector.

ACL 2026 Main • 19 lesion types • 10 deepfake models
DivScore
Zero-shot detection of LLM-generated text in specialized domains

A robust detector for medicine and law that uses normalized entropy scoring and domain knowledge distillation without retraining on new domains.

EMNLP 2025 Main • +14.4% AUROC • +64.0% recall at 0.1% FPR
Background
Education
  • National University of Singapore
    Ph.D. in Artificial Intelligence
    Jan. 2025 - present
  • The University of Hong Kong
    M.Sc. in Artificial Intelligence
    Sep. 2022 - Jul. 2024
  • The Chinese University of Hong Kong, Shenzhen
    B.Sc. in Statistics, Data Science Stream
    Sep. 2018 - May 2022
Experience
  • TikTok
    Multimodal Agent PhD Intern
    Summer 2026
  • StepFun AI Intelligent Technology
    LLM Research Intern
    Jan. 2025 - Jun. 2025
  • The Chinese University of Hong Kong
    Research Staff
    Sep. 2024 - Jan. 2025
Honors & Awards
  • Full Ph.D. Scholarship, National University of Singapore
    2025
  • Outstanding College Graduate, CUHK-Shenzhen Harmonia College
    2022
  • Undergraduate Research Excellence Award, CUHK-Shenzhen
    2021
Recent Updates
News
2026
[TikTok] Joining TikTok in 2026 as a Multimodal Agent PhD Intern.
Apr 17
[ACL 2026] Our paper MedForge was accepted to the ACL 2026 main conference (CCF-A). This year was extremely competitive: only the top ~19% of submissions were accepted.
Apr 10
[ACL 2026] Honored to serve as a reviewer and glad to contribute to the research community.
Apr 09