Goldman Sachs Warns of AI Data Crisis

Goldman Sachs has raised a critical alarm for the AI industry: the supply of natural, human-generated data for training language models has run out. This scarcity could slow AI progress and the development of better tools.

The Synthetic Data Problem

According to Neema Raphael, Goldman Sachs' Chief Data Officer and Head of Data Engineering, the gap is currently being filled with synthetic data: content generated by earlier AI models. Although its supply is technically unlimited, this approach carries serious risks:

  • Lower-quality information
  • Loss of human elements in training data
  • Potential degradation of future AI models

A Possible Solution: Corporate Data Vaults

Raphael believes there's still untapped potential locked away in proprietary corporate databases, beyond the reach of the public internet. This includes:

  • Trading flows
  • Customer interactions
  • Internal business records

Goldman Sachs itself holds vast amounts of such data. However, there's a catch: "The challenge is understanding the data, understanding the business context, and then being able to normalize it in a way that makes sense for the company to consume it," Raphael explained.

What This Means for AI's Future

The AI industry now faces a critical choice: rely on potentially lower-quality synthetic data, or unlock the corporate data that is currently kept under lock and key.