For ByteDance, whose overseas AI business relies on APIs from foreign vendors, Microsoft's stance will be crucial.

Text | Zhou Xinyu

Editor | Deng Yongyi

On the morning of December 16, 2023, Beijing time, an article by Alex Heath, author of the tech newsletter Command Line, thrust OpenAI's accusations against ByteDance into the spotlight.

In this "indictment", ByteDance is accused of secretly using OpenAI's model API to train and evaluate its models at almost every stage of Project Seed, its large language model development project.

"The employees involved were well aware of this." Alex Heath said he had seen firsthand, on Feishu, ByteDance's internal communication platform, employees discussing how to "whitewash" the evidence through data desensitization. "The abuse is so common that Project Seed employees frequently hit the cap on API access."

The upshot of the complaint was that OpenAI suspended ByteDance's account. OpenAI spokesperson Niko Felix issued a statement via Alex Heath:

All API customers are required to comply with our usage policy to ensure that our technology is used well. While ByteDance's use of our API is minimal, we have suspended their account while we investigate further. If we find that their use does not comply with these policies, we will require them to make necessary changes or terminate their account.

Statement from OpenAI spokesperson Niko Felix.

The so-called "Seed" is a foundational large language model development project that ByteDance launched at the end of 2022. The project has two main products: the chatbot "Doubao", which has already launched in China, and a bot platform that is still under development and is planned to be offered to external customers through Volcano Engine.

An industry insider told 36Kr that it is not uncommon for domestic vendors to first use the APIs of mainstream foreign models to test their business and train their own models: "Use advanced models to get the business running first, then replace them once your own model training capabilities are up to standard."

Several people familiar with the matter told 36Kr that ByteDance's current model business, whether the product project Flow or the large-model project Seed, plans to pursue both domestic and overseas markets. Because of regulatory requirements, the domestic business will use models developed in-house by ByteDance, while the overseas business will initially use model API services from foreign vendors.

OpenAI's terms of service do contain provisions on competition protection. To prevent customers from using its services to develop competing products, OpenAI strictly limits the permitted scope of use: customers may only develop non-commercial AI models for data governance, or fine-tune models provided through OpenAI's services.

OpenAI's Service Terms.

After the "blacklisting" incident, ByteDance spokesperson Jodi Seth also responded quickly that day. She said that GPT-generated data was used for annotation in the early days of Project Seed and was removed from ByteDance's training data around the middle of this year:

ByteDance obtained a license from Microsoft to use the GPT API. We use GPT to power products and features for non-China markets, but use our self-developed model to power Doubao, which is only available in China.

The statement acknowledges that ByteDance used GPT-generated data to train models, but places this before OpenAI had set out its service terms. The earliest version of OpenAI's service terms was published on August 28, 2023, whereas ByteDance says it had stopped using GPT-generated data in training by the middle of the year.

The first version of OpenAI's service terms, updated in August 2023.

The other focus of ByteDance's response is that its GPT API access was obtained through Microsoft's cloud service Azure rather than directly from OpenAI. In other words, OpenAI's "blacklisting" looks more like OpenAI stepping in on Microsoft's behalf.

However, Microsoft Azure also has a competition-protection clause similar to OpenAI's: "Customers shall not use, and shall not allow third parties to use, Microsoft's generative artificial intelligence services to create, train, or improve (directly or indirectly) similar or competitive products or services."

Microsoft Azure Generative Artificial Intelligence Terms of Service

Many are now waiting for Microsoft Azure's response. For ByteDance, whose overseas AI business relies on APIs from foreign vendors, Microsoft's stance will be crucial.
