New AI models trained on YouTube transcriptions raise copyright concerns

OpenAI and Google have come under scrutiny for training their AI models on transcriptions of YouTube videos, potentially violating the copyrights of the creators. A New York Times report details the practices of these tech giants and their efforts to maximize the data available to their AI systems. Although the companies have used various techniques to obtain large amounts of data, questions have been raised about the legality of their methods.

According to the NYT report, OpenAI used its speech recognition tool Whisper to transcribe over a million hours of YouTube videos, and the resulting text was then used to train GPT-4. This follows previous claims by The Information that OpenAI used YouTube videos and podcasts to train its AI systems. OpenAI President Greg Brockman was reportedly involved in this project.

Google's own practices have also drawn attention. Unauthorized scraping or downloading of YouTube content is prohibited, and Google spokesperson Matt Bryant said the company was unaware of OpenAI's use of YouTube videos and does not condone such actions. However, the NYT report suggests that some people at Google were aware of OpenAI's practices but took no action, possibly because Google itself used YouTube videos to train its AI models.

Google says it only uses videos from creators who have agreed to participate in its experimental program. Engadget has reached out to both Google and OpenAI for comment.

Additionally, the New York Times report says that Google revised its privacy policy in June 2022 so that a wider range of publicly available content, such as Google Docs and Google Sheets, could be used to train its AI models and products. However, Bryant emphasized that this only happens with the express permission of users who opt in to Google's experimental features. He also said that the policy change did not lead Google to train its AI models on additional types of data.

FAQ

1. Are OpenAI and Google violating copyright laws by training their AI models on YouTube transcriptions?

There are concerns that OpenAI's and Google's use of YouTube videos to train their AI models could violate the copyrights of the videos' creators. The New York Times report highlights these potential violations and notes that unauthorized scraping or downloading of YouTube content is not allowed. However, Google says it only uses videos from creators who have agreed to participate in an experimental program.

2. What approach did OpenAI take when training its AI model?

OpenAI reportedly used its speech recognition tool Whisper to transcribe more than a million hours of YouTube videos, and the resulting text was then used to train GPT-4. The approach aimed to provide a large amount of data to improve the model's performance.

3. Has Google acknowledged OpenAI's use of YouTube videos for training?

Google said it was not aware of OpenAI's use of YouTube videos to train its AI models and that it does not support unauthorized scraping or downloading of content. However, the report suggests that some people at Google were aware of OpenAI's practices but took no action, possibly because Google itself used YouTube videos to train its AI models.

4. How has Google expanded its privacy policy as mentioned in the report?

The NYT report reveals that Google updated its privacy policy in June 2022 to include a wider range of publicly available content, such as Google Docs and Google Sheets, in training its AI models and products. However, Google emphasizes that this data will only be used with the express consent of users who opt in to the experimental features.

5. Have OpenAI and Google made any official statements regarding these allegations?

Engadget has reached out to both OpenAI and Google for comment. Apart from the remarks by Google spokesperson Matt Bryant, neither company is reported to have made an official statement regarding the allegations in the New York Times report.

In addition to the information provided in the article, here are some additional details about the industry, market forecasts, and topics related to the AI industry and training models using YouTube transcriptions:

According to a report by MarketsandMarkets, the AI industry has seen significant growth in recent years, with the market size expected to reach $190.61 billion by 2025. This growth is driven by increasing demand for AI-powered solutions across sectors such as healthcare, finance, retail, and manufacturing.

One of the biggest challenges in the AI industry is the need for large amounts of high-quality data to effectively train AI models. Companies like OpenAI and Google are constantly exploring various data sources, including publicly available content like YouTube videos, to improve the performance of their AI systems.

However, the use of YouTube videos to train AI models raises concerns about copyright infringement. Creators hold exclusive rights to their content, including the right to reproduce and distribute it. Scraping or downloading YouTube videos without the creators' permission may violate these rights.

The problem of copyright infringement in the AI industry is not new. There have been cases in the past where companies have been sued for using copyrighted material in their AI training datasets. For example, in 2019, a photographer filed a lawsuit against a major AI company for using his copyrighted images without permission.

To address these copyright concerns, Google says it only uses videos from creators who have agreed to participate in its experimental program. The stated aim is to comply with copyright law and respect the rights of creators.

However, using YouTube videos to train AI models is not the only controversial practice in the industry. Other issues include bias in AI algorithms, privacy concerns, and the ethical implications of AI decision-making.

As the AI industry continues to evolve, it is critical for companies to weigh these legal and ethical issues to ensure responsible and lawful use of data when training AI models.

For more information about the AI industry and related topics, visit the following websites:

MarketsandMarkets: Provides market research reports and industry analysis for various sectors, including the AI industry.

Electronic Frontier Foundation: A nonprofit organization focused on civil liberties, privacy, and digital rights issues. It provides resources and articles on AI ethics and legal considerations.
