
Zhihui Chen, Kai He, Qingyuan Lei, Bin Pu, Jian Zhang, Yuling Xu, Mengling Feng# (# corresponding author)
Annual Meeting of the Association for Computational Linguistics (ACL) 2026 Main Conference
As generative models improve, medical deepfakes that implant or remove lesions while staying visually plausible pose growing risks to clinical safety and the integrity of medical evidence. Most prior work reduces detection to binary real-vs-fake scoring with little insight into where manipulation occurs or why. We present MedForge, an interpretable framework that introduces MedForge-90K—the first large-scale explainable medical deepfake dataset spanning CT, MRI, and X-ray, covering 19 lesion types with forgeries from 10 state-of-the-art deepfake models, each paired with expert-guided localization and clinical-grade explanations—and MedForge-Reasoner, a detector trained with a Localize-then-Analyze chain-of-thought paradigm and Forgery-aware GSPO reinforcement learning. MedForge-Reasoner achieves state-of-the-art detection while producing localized, verifiable medical rationales.

Zhihui Chen, Kai He, Yucheng Huang, Yunxiao Zhu, Mengling Feng
Conference on Empirical Methods in Natural Language Processing (EMNLP) 2025 Main Conference
Detecting LLM-generated text in specialized and high-stakes domains like medicine and law is crucial for combating misinformation and ensuring authenticity. We propose DivScore, a zero-shot detection framework using normalized entropy-based scoring and domain knowledge distillation to robustly identify LLM-generated text in specialized domains. Experiments show that DivScore consistently outperforms state-of-the-art detectors, with 14.4% higher AUROC and 64.0% higher recall at a 0.1% false positive rate threshold.

Zhihui Chen, et al.
arXiv preprint 2025
Recent advances in multimodal large language models have enabled remarkable medical image editing capabilities. However, the research community's progress remains constrained by the absence of large-scale, high-quality, and openly accessible datasets built specifically for medical image editing with strict anatomical and clinical constraints. We introduce Med-Banana-50K, a comprehensive dataset for instruction-based medical image editing spanning chest X-ray, brain MRI, and fundus photography across 23 disease types. The public release includes 50,635 successful bidirectional edits and 37,822 failed attempts with full conversation logs, enabling evaluation, preference learning, and alignment research for medically grounded image editing systems.
Zhihui Chen
52nd NAMRC (North American Manufacturing Research Conference) 2024
This paper presents a production efficiency analysis framework based on RFID-collected manufacturing big data, enabling real-time monitoring and optimization of manufacturing processes.
Qiyuan Wang, Zhihui Chen
IEEE iSPEC (International Smart Power and Energy Conference) 2021
We propose a clustering-enabled few-shot learning approach for load forecasting, which significantly improves prediction accuracy with limited training data. This work also led to a patent (CN113887812B).