The Atlantic Releases Searchable Database of Songs Used in AI Training

Datasets identified by the outlet contain millions of tracks used to train artificial intelligence models.

The Atlantic has made a search tool available to the public that allows users to identify which songs were used to train artificial intelligence systems. The initiative exposes the scale and composition of the data driving the development of these models in the music industry.

The database draws on four distinct datasets identified by the outlet. Two of them are massive, containing 12 million and 9 million tracks, respectively. The other two are smaller but still encompass a significant amount of audio data used for training.

The tool was developed by Atlantic reporter Alex Reisner following an investigation into the materials used by tech companies to create AI models. Because the records are searchable, artists, record labels, and researchers can more easily verify whether specific works are included in these datasets.

The revelation comes amid growing scrutiny over data acquisition for AI training. The music industry is closely monitoring debates over copyright and compensation, issues that have already prompted lawsuits from creators and the industry against AI developers in various markets.

With the publication of the searchable database, The Atlantic provides a practical resource for transparency within the AI ecosystem. The tool allows for more detailed tracking of how the global musical catalog has been appropriated to advance generative technologies.