Projects

MiniMax Cowork Team Fellowship: Medical Foundation Model Development and Clinically Verifiable Agent Workflows


A project supported by a MiniMax compute grant, focused on applying long-context, multimodal, and Agent capabilities to medical foundation model development and clinically verifiable workflows.

Project focus

  • Medical foundation model development: long-context, multimodal, and Agent capabilities for healthcare workflows
  • Data flywheel: evaluation findings are converted into targeted cases, feedback signals, and iterative improvement loops
  • Clinically verifiable Agent workflows: outputs are structured around traceable evidence, review checkpoints, and reproducible decision paths

Grant support

Program: MiniMax Cowork Team Fellowship
Support: USD 4,500 compute grant
Direction: Medical foundation model development and clinically verifiable Agent workflows

Highlights

MiniMax Cowork Team | Compute Grant | Medical Foundation Models

MedForge: Interpretable Medical Deepfake Detection via Forgery-aware Reasoning


A data-and-model framework for trustworthy medical deepfake detection, built around evidence-grounded reasoning, forgery localization, and a public demo stack.

What is included

  • ACL 2026 main conference paper on interpretable medical forgery detection
  • MedForge-90K dataset: 30K real images, 30K lesion implant forgeries, and 30K lesion removal forgeries
  • MedForge-Reasoner: a Qwen3-VL based detector using a Localize-then-Analyze reasoning pipeline
  • Interactive demo for medical image deepfake detection and reasoning visualization

Why it matters

  • Moves beyond black-box real/fake prediction to localized, evidence-grounded explanations
  • Targets realistic lesion implantation and removal risks in chest X-ray, brain MRI, and fundus images
  • Combines dataset, model, and demo into a single research artifact instead of a paper-only release

Public resources

Asset    Details
Paper    ACL 2026 Main Conference
Dataset  MedForge-90K, covering CT, MRI, and X-ray with 19 lesion types
Model    MedForge-Reasoner on Hugging Face
Demo     Online detector Space for interactive testing

Highlights

ACL 2026 | Demo + Model + Dataset

Med-Banana-50K: Large-Scale Medical Image Editing Dataset


An open medical image editing dataset with 50,635 successful edits and 37,822 failed attempts across three modalities and 23 disease types.

Key Features

  • 50,635 successful edits across 3 medical imaging modalities
  • Chest X-ray: 12 pathology types (Pneumothorax, Pleural Effusion, etc.)
  • Brain MRI: 4 tumor types (Glioma, Meningioma, Pituitary, etc.)
  • Fundus photography: 7 disease types (Diabetic Retinopathy, Glaucoma, etc.)
  • Bidirectional editing: lesion addition and removal
  • LLM-as-Judge quality control with medically grounded rubric
  • 37,822 failed attempts with full conversation logs for preference learning and alignment research

Dataset Statistics

Modality     Task    Diseases  Success  Failed
Chest X-ray  Add     12        9,854    7,971
Chest X-ray  Remove  12        10,667   4,750
Brain MRI    Add     4         4,536    8,630
Brain MRI    Remove  4         4,355    6,949
Fundus       Add     7         18,505   3,162
Fundus       Remove  7         2,718    6,360
Total                23+       50,635   37,822
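
The totals and per-row success rates in the table can be checked with a few lines of Python. This is a small sketch that only reuses the counts listed above; the names `STATS`, `totals`, and `success_rate` are illustrative and are not taken from the released code.

```python
# Counts copied from the Dataset Statistics table:
# (modality, task) -> (successful edits, failed attempts)
STATS = {
    ("Chest X-ray", "Add"): (9854, 7971),
    ("Chest X-ray", "Remove"): (10667, 4750),
    ("Brain MRI", "Add"): (4536, 8630),
    ("Brain MRI", "Remove"): (4355, 6949),
    ("Fundus", "Add"): (18505, 3162),
    ("Fundus", "Remove"): (2718, 6360),
}


def totals(stats):
    """Return (total successes, total failures) summed over all rows."""
    success = sum(s for s, _ in stats.values())
    failed = sum(f for _, f in stats.values())
    return success, failed


def success_rate(success, failed):
    """Fraction of attempts that ended in a successful edit."""
    return success / (success + failed)


if __name__ == "__main__":
    print("totals:", totals(STATS))
    for (modality, task), (s, f) in STATS.items():
        print(f"{modality:12s} {task:7s} success rate = {success_rate(s, f):.3f}")
```

Per-row rates computed this way make the imbalance between tasks visible, which may matter when sampling failed attempts for preference learning.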

Open asset: Dataset, code, and paper are publicly available for medically grounded image editing research.

DivScore: Zero-Shot LLM Detection in Specialized Domains


A zero-shot detection framework for identifying LLM-generated text in specialized domains like medicine and law, using normalized entropy-based scoring and domain knowledge distillation.

Key Innovations

  • Zero-shot detection: No training data required for new domains
  • Normalized entropy scoring: Robust metric for specialized text
  • Domain knowledge distillation: Leverages domain-specific patterns
  • Cross-domain robustness: Tested on medical, legal, and financial texts
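
To make the phrase "normalized entropy" concrete, the sketch below shows one generic way to normalize Shannon entropy so that it lies in [0, 1]. It is not DivScore's actual scoring rule, which this page does not spell out. It assumes that per-token next-token probability distributions are already available from some scoring model, and the function names are hypothetical.

```python
import math


def normalized_entropy(probs):
    """Shannon entropy of one distribution divided by its maximum, log(len(probs)).

    Returns a value in [0, 1]: 0 for a one-hot distribution, 1 for a uniform one.
    """
    if len(probs) < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(probs))


def mean_normalized_entropy(token_distributions):
    """Average the normalized entropy over every token position of a text."""
    if not token_distributions:
        raise ValueError("need at least one token distribution")
    values = [normalized_entropy(d) for d in token_distributions]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Toy distributions over a 4-token vocabulary (assumed, for illustration only).
    confident = [[0.97, 0.01, 0.01, 0.01], [0.90, 0.05, 0.03, 0.02]]
    uncertain = [[0.25, 0.25, 0.25, 0.25], [0.40, 0.30, 0.20, 0.10]]
    print(mean_normalized_entropy(confident))
    print(mean_normalized_entropy(uncertain))
```

Dividing by the maximum entropy keeps scores comparable across models with different vocabulary sizes, which is one plausible reason to normalize in the first place.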

Performance Highlights

Metric                Improvement
AUROC                 +14.4% vs. SOTA
Recall @ 0.1% FPR     +64.0% vs. SOTA
Zero-shot Capability  No training needed
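
For readers unfamiliar with the "Recall @ 0.1% FPR" metric, the following sketch shows how recall at a fixed false-positive rate can be computed from detector scores. It assumes that a higher score means "more likely LLM-generated" and that a sample is flagged when its score is strictly above the threshold; the function name and the toy scores are illustrative and do not come from the DivScore evaluation.

```python
def recall_at_fpr(scores_pos, scores_neg, max_fpr):
    """Highest recall reachable by a threshold whose FPR does not exceed max_fpr.

    scores_pos: scores of LLM-generated (positive) samples.
    scores_neg: scores of human-written (negative) samples.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both score lists must be non-empty")
    # Number of negatives we are allowed to flag by mistake.
    allowed = int(max_fpr * len(scores_neg))
    neg_sorted = sorted(scores_neg, reverse=True)
    # Using the (allowed + 1)-th highest negative score as the threshold means
    # at most `allowed` negatives lie strictly above it.
    threshold = neg_sorted[allowed] if allowed < len(neg_sorted) else float("-inf")
    flagged = sum(1 for s in scores_pos if s > threshold)
    return flagged / len(scores_pos)


if __name__ == "__main__":
    positives = [0.9, 0.8, 0.4, 0.2]
    negatives = [0.7, 0.3, 0.1, 0.05]
    print(recall_at_fpr(positives, negatives, max_fpr=0.25))
```

At a budget as strict as 0.1%, the threshold is set by the very highest-scoring human-written samples, so the metric rewards detectors that rarely assign extreme scores to genuine text.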

Applications

  • Detecting AI-generated medical content to combat misinformation
  • Verifying authenticity of legal documents and contracts
  • Ensuring integrity in academic and scientific publishing
  • Quality control for financial reports and analysis

Published at

EMNLP 2025 Main Conference

Legal ASR Service: Whisper Large-v2 Deployment with Docker + FastAPI


A GPU-accelerated legal-domain speech-to-text service delivered for Haiwen & Partners LLP (HK), built on Whisper Large-v2 and packaged as a production serving stack.

Serving stack

  • Whisper Large-v2 with GPU acceleration for legal-domain audio
  • Docker + FastAPI serving pipeline for reproducible deployment
  • 7.8% average WER on the delivered legal transcription workload
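
As a reminder of what the WER figure measures, here is a minimal sketch of word error rate as word-level edit distance divided by the number of reference words. How the 7.8% average was aggregated across recordings is not stated here, so the function below covers a single reference/hypothesis pair only, and the example sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the court finds for the plaintiff"
    hyp = "the court find for plaintiff"
    print(f"WER = {word_error_rate(ref, hyp):.3f}")
```

In practice, both strings are usually normalized first (casing, punctuation, number formatting), and that choice can move the reported WER noticeably.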

Highlights

Whisper Large-v2 | Docker + FastAPI | WER 7.8%

Quant Trading Agent: Autonomous LangChain Agent for HK Equities


An autonomous trading agent built at AQUMON on a LangChain architecture, orchestrating market analysis, signal generation, decision-making, and execution monitoring into a single closed loop for programmatic Hong Kong equity trading.

What it does

  • LangChain orchestration connecting market understanding, signal generation, strategy decision, and execution monitoring
  • Futu OpenAPI integration for real-time market-data streaming and automated order execution
  • Strategy validation loop to speed up iteration and deployment of programmatic strategies

Why it matters

  • Demonstrates agent orchestration and tool-calling against a real, latency-sensitive external API
  • Closes the loop from perception to action, mirroring the structure of agentic post-training environments

Highlights

LangChain Agent | Futu OpenAPI | AQUMON (HK)

Smart Word Agent: End-to-End ReAct Agent Harness for Document Workflows


A production agent harness that parses, edits, and re-formats documents from natural-language instructions, built on the ReAct paradigm with Kimi-K2 as the core LLM. Designed as a self-contained, shippable agent rather than a notebook demo.

Harness design

  • ReAct loop with tool-calling over document APIs: parsing, bulk formatting, table manipulation, attachment reasoning
  • Multi-document context with Kimi-K2 long-context handling for complex multi-section documents
  • Stateless agent loop with streaming responses for low-latency interaction
  • Zero-dependency shipping: packaged as a single-file portable executable for end users
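
The general shape of a ReAct loop with tool-calling can be sketched as follows. This is not the Smart Word Agent code: the `policy` callable is a hypothetical stand-in for the LLM call (Kimi-K2 in the project), `count_words` is a made-up tool, and the transcript format is an assumption chosen for illustration.

```python
from typing import Callable, Dict, List, Tuple

# The policy returns (thought, action, argument). The special action "finish"
# ends the loop and treats the argument as the final answer.
Step = Tuple[str, str, str]


def react_loop(
    task: str,
    policy: Callable[[List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 8,
) -> str:
    """Alternate reasoning and tool calls until the policy finishes."""
    transcript = [f"Task: {task}"]
    for _ in range(max_steps):
        thought, action, arg = policy(transcript)
        transcript.append(f"Thought: {thought}")
        if action == "finish":
            return arg
        if action in tools:
            observation = tools[action](arg)
        else:
            observation = f"error: unknown tool {action!r}"
        transcript.append(f"Action: {action}[{arg}]")
        transcript.append(f"Observation: {observation}")
    raise RuntimeError("step budget exhausted without a final answer")


def count_words(text: str) -> str:
    """Hypothetical document tool: count whitespace-separated words."""
    return str(len(text.split()))


def scripted_policy(transcript: List[str]) -> Step:
    """Scripted stand-in for an LLM: call one tool, then report its result."""
    observations = [t for t in transcript if t.startswith("Observation: ")]
    if not observations:
        return ("I should count the words first.", "count_words", "hello agent world")
    count = observations[-1].split(": ", 1)[1]
    return ("I have the count.", "finish", f"The text has {count} words.")


if __name__ == "__main__":
    print(react_loop("Count the words", scripted_policy, {"count_words": count_words}))
```

Because the loop keeps all state in the transcript it is handed, each call is independent of earlier runs, which is one simple way to read "stateless agent loop"; streaming and real document APIs are left out of this sketch.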

Traction

  • 1,000 downloads in the first week after release
  • 100+ GitHub stars as an open-source project

Highlights

ReAct Harness | Kimi-K2 Core LLM | 1K downloads / week 1