Projects

Med-Banana-50K: Large-Scale Medical Image Editing Dataset


A comprehensive 50K-image dataset for instruction-based medical image editing spanning three modalities and 23 disease types.

Key Features

  • 50,635 successful edits across 3 medical imaging modalities
  • Chest X-ray: 12 pathology types (Pneumothorax, Pleural Effusion, etc.)
  • Brain MRI: 4 tumor types (Glioma, Meningioma, Pituitary, etc.)
  • Fundus photography: 7 disease types (Diabetic Retinopathy, Glaucoma, etc.)
  • Bidirectional editing: lesion addition and removal
  • LLM-as-Judge quality control with a medically grounded rubric
  • 37K failed attempts with full conversation logs for preference learning

Dataset Statistics

Modality | Task | Diseases | Successful edits | Failed attempts
Chest X-ray | Add | 12 | 9,854 | 7,971
Chest X-ray | Remove | 12 | 10,667 | 4,750
Brain MRI | Add | 4 | 4,536 | 8,630
Brain MRI | Remove | 4 | 4,355 | 6,949
Fundus | Add | 7 | 18,505 | 3,162
Fundus | Remove | 7 | 2,718 | 6,360
Total | | 23+ | 50,635 | 37,822
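The totals row can be checked against the per-row counts. The Python sketch below copies the numbers from the table and also derives a per-row success rate, defined here as successful edits divided by successful plus failed attempts; that ratio is an illustrative assumption, not a figure reported for the dataset.

```python
# Counts copied from the Med-Banana-50K statistics table.
# Each row: (modality, task, diseases, successful_edits, failed_attempts)
ROWS = [
    ("Chest X-ray", "Add", 12, 9_854, 7_971),
    ("Chest X-ray", "Remove", 12, 10_667, 4_750),
    ("Brain MRI", "Add", 4, 4_536, 8_630),
    ("Brain MRI", "Remove", 4, 4_355, 6_949),
    ("Fundus", "Add", 7, 18_505, 3_162),
    ("Fundus", "Remove", 7, 2_718, 6_360),
]


def totals(rows):
    """Return (total successful edits, total failed attempts)."""
    return sum(r[3] for r in rows), sum(r[4] for r in rows)


def success_rates(rows):
    """Map (modality, task) to successful / (successful + failed)."""
    return {(m, t): s / (s + f) for m, t, _, s, f in rows}


if __name__ == "__main__":
    success, failed = totals(ROWS)
    print(f"successful={success:,} failed={failed:,}")  # successful=50,635 failed=37,822
    for (modality, task), rate in success_rates(ROWS).items():
        print(f"{modality:<12} {task:<7} {rate:.1%}")
```

The sums reproduce the table's totals row, so the per-row figures and the headline counts are consistent.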

Note: The full dataset will be released on Hugging Face upon paper acceptance.

DivScore: Zero-Shot LLM Detection in Specialized Domains


A zero-shot detection framework for identifying LLM-generated text in specialized domains like medicine and law, using normalized entropy-based scoring and domain knowledge distillation.

Key Innovations

  • Zero-shot detection: No training data required for new domains
  • Normalized entropy scoring: Robust metric for specialized text
  • Domain knowledge distillation: Leverages domain-specific patterns
  • Cross-domain robustness: Tested on medical, legal, and financial texts
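The exact DivScore formula is not given on this page. For orientation only, the sketch below computes the textbook normalized Shannon entropy of a next-token distribution, H(p) / log |V|, which lies in [0, 1] regardless of vocabulary size |V|. Treat it as a generic illustration of the term "normalized entropy", not as the DivScore score; the toy distributions and helper names are hypothetical.

```python
import math


def normalized_entropy(probs):
    """Shannon entropy of one distribution divided by log(len(probs)).

    Returns a value in [0, 1]: 0 for a one-hot distribution, 1 for uniform.
    """
    n = len(probs)
    if n < 2:
        return 0.0  # a single outcome carries no uncertainty
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(n)


def mean_normalized_entropy(distributions):
    """Average normalized entropy over a sequence of next-token distributions."""
    if not distributions:
        raise ValueError("need at least one distribution")
    return sum(normalized_entropy(d) for d in distributions) / len(distributions)


if __name__ == "__main__":
    # Hypothetical next-token distributions over a 4-token vocabulary.
    confident = [0.97, 0.01, 0.01, 0.01]
    uncertain = [0.25, 0.25, 0.25, 0.25]
    print(round(normalized_entropy(confident), 3))  # low: the model is nearly certain
    print(round(normalized_entropy(uncertain), 3))  # 1.0: maximally uncertain
    print(round(mean_normalized_entropy([confident, uncertain]), 3))
```

Dividing by log |V| is what makes values comparable across vocabularies of different sizes; how DivScore combines such quantities with domain knowledge distillation is described in the paper, not here.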

Performance Highlights

Metric | Result
AUROC | +14.4% vs. SOTA
Recall @ 0.1% FPR | +64.0% vs. SOTA
Zero-shot capability | No training needed
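Recall @ 0.1% FPR is the fraction of LLM-generated texts a detector flags when its threshold is set so that at most 0.1% of human-written texts are wrongly flagged. A minimal sketch of that standard metric follows. It assumes higher scores mean "more likely machine-generated" and that a text is flagged only when its score is strictly above the threshold; the scores are hypothetical, and this is not the paper's evaluation code.

```python
def recall_at_fpr(human_scores, machine_scores, max_fpr=0.001):
    """Recall on machine-generated texts at a threshold with FPR <= max_fpr.

    Assumed convention: a higher score means "more likely machine-generated",
    and a text is flagged when its score is strictly above the threshold.
    """
    if not human_scores or not machine_scores:
        raise ValueError("need both human and machine scores")
    human = sorted(human_scores, reverse=True)
    # How many human-written texts may be flagged without exceeding max_fpr.
    allowed = int(max_fpr * len(human))
    # Only human[0:allowed] can lie strictly above human[allowed].
    threshold = human[allowed] if allowed < len(human) else float("-inf")
    flagged = sum(1 for s in machine_scores if s > threshold)
    return flagged / len(machine_scores)


if __name__ == "__main__":
    # Hypothetical detector scores.
    human = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    machine = [0.95, 1.2, 0.5, 2.0]
    print(recall_at_fpr(human, machine, max_fpr=0.1))    # 0.75
    print(recall_at_fpr(human, machine, max_fpr=0.001))  # 0.5
```

With only ten human texts, a 0.1% budget rounds down to zero false positives, so the threshold becomes the highest human score. A 0.1% FPR only becomes meaningful with thousands of human-written samples, which is why this metric is much stricter than AUROC.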

Applications

  • Detecting AI-generated medical content to combat misinformation
  • Verifying authenticity of legal documents and contracts
  • Ensuring integrity in academic and scientific publishing
  • Quality control for financial reports and analysis

[Figure: DivScore framework]

Published at

EMNLP 2025 Main Conference

Welcome to Showcase!


Showcase is a page where you can show off almost anything: a photo of your pets, your favorite books, your favorite projects, or anything else you want to share with the world.

To create a new showcase item, add a new file to the _showcase folder. Each item can be customized with arbitrary HTML, which gives you maximum flexibility.

Cards are sorted by the date field in their front matter, newest first. The width field sets each card's width on a scale from 1 to 12, and the layout is handled by the Masonry library.

For a tidy layout, use either multiples of 3 or multiples of 4 consistently for all card widths. The exception is small badges that take up little space (width=1).
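The site applies these rules on its own; the Python sketch below merely restates them as checks you could run by hand: newest-first ordering by date, widths between 1 and 12, and one consistent 3- or 4-based grid apart from width-1 badges. The card dates and widths, the "Math badge" title, and the helper names order_cards and check_widths are all hypothetical.

```python
from datetime import date

# Hypothetical showcase cards; "date" and "width" mirror the front matter fields.
CARDS = [
    {"title": "Cats", "date": date(2023, 1, 10), "width": 4},
    {"title": "Welcome to Showcase!", "date": date(2023, 3, 1), "width": 8},
    {"title": "GitHub Star History", "date": date(2023, 2, 15), "width": 4},
    {"title": "Math badge", "date": date(2022, 12, 1), "width": 1},
]


def order_cards(cards):
    """Newest first, matching the descending order by the date field."""
    return sorted(cards, key=lambda c: c["date"], reverse=True)


def check_widths(cards):
    """Return warnings for widths outside 1-12 or an inconsistent grid.

    Width-1 badges are exempt from the multiple-of-3/4 recommendation.
    """
    warnings = []
    widths = []
    for c in cards:
        w = c["width"]
        if not 1 <= w <= 12:
            warnings.append(f"{c['title']}: width {w} is outside 1-12")
        elif w != 1:
            widths.append(w)
    if widths and not (all(w % 3 == 0 for w in widths) or all(w % 4 == 0 for w in widths)):
        warnings.append("non-badge widths are not all multiples of 3 or all multiples of 4")
    return warnings


if __name__ == "__main__":
    for card in order_cards(CARDS):
        print(card["date"], card["width"], card["title"])
    print(check_widths(CARDS))  # []
```

Mixing, say, a width-6 card with width-4 cards would trigger the grid warning, since 12 columns divide evenly into 3s or 4s but rows of mixed widths leave gaps.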

GitHub Star History

This image shows the star history of the GitHub repository of this website.

Give a star!

Image Lazyload

Lazy loading is highly recommended for images because it speeds up page loading, especially on pages with many images. Example snippet:

<img data-src="[Image URL]" class="lazy w-100 rounded-xl" src="{{ '/assets/images/empty_300x200.png' | relative_url }}">

Disable Showcase Page?

To disable the showcase page, hide it from the navigation bar by removing the showcase entry from _data/navigation.yml.

$a^2 + b^2 = c^2$

Cats

Meow! I am a cat. Unsplash