Hollywood Stars Found in Dataset Training Medical AI for Stroke Detection

A medical image dataset described as "bad to the point of being comical" has led to the retraction of several scientific papers after researchers discovered it contained photos of Hollywood actors instead of actual clinical cases. Two whistleblowers investigated the Kaggle-hosted dataset, which was used to train AI models for stroke detection and featured recognizable faces such as Sylvester Stallone (as Rambo), George Clooney, Angelina Jolie, and Daniel Craig.

The discovery highlights a critical failure in data provenance within medical research. Kaggle maintains that such datasets can be used for benchmarking and software development, and emphasizes that the platform relies on community moderation for metadata accuracy. The core issue remains that these images were improperly used as primary evidence for clinical research, leading to unreliable results in a field where accuracy is life-critical.

  • Retracted Papers: Multiple studies relying on the flawed dataset have been pulled.
  • Data Source: The images originated from a public repository without rigorous clinical verification.
  • Risk Factor: Using entertainment imagery to train diagnostic models undermines the safety of clinical AI.