Datasets identified by the outlet contain millions of tracks used to train artificial intelligence models.
The Atlantic has made a search tool available to the public that allows users to identify which songs were used to train artificial intelligence systems. The initiative exposes the scale and composition of the data driving the development of these models in the music industry.
The database was compiled from four distinct datasets. Two of them are massive, containing 12 million and 9 million tracks, respectively. The other two are smaller but still encompass a significant amount of audio data used for training.
The tool was developed by Atlantic reporter Alex Reisner following an investigation into the materials used by tech companies to create AI models. By making the records searchable, the outlet makes it easier for artists, record labels, and researchers to verify whether specific works are included in these databases.
The revelation comes amid growing scrutiny of how data is acquired for AI training. The music industry is closely monitoring debates over copyright and compensation, issues that have already prompted lawsuits by creators and the industry against AI developers in various markets.
With the publication of the searchable database, The Atlantic provides a practical resource for transparency within the AI ecosystem. The tool allows for more detailed tracking of how the global musical catalog has been appropriated to advance generative technologies.