Cressy from Aofeisi
Qubits | Official account QbitAI
It's here, it's here: Apple Intelligence is finally meeting Apple fans!
With the release of the iOS 18.1 beta, registered developers can now try out some of Apple Intelligence's features.
The most obvious change is that Siri has been completely revamped, reborn as the Apple Intelligence Siri.
Another major update is the writing feature, which can polish your tweet replies and dress them up with fancier wording in no time.
Even profanity can be made elegant and mild in minutes:
Once Apple Intelligence is enabled, Apple's self-developed on-device large model is downloaded to the device.
According to quick-fingered netizens who have already tried it, it isn't like other AIs that constantly refuse to answer.
At the same time, Apple released a technical report on its own large models, revealing plenty of technical details.
The report shows that Apple's cloud-side large model beats GPT-4 on tasks such as instruction following and text summarization.
Ruoming Pang, head of Apple's foundation model team, also said the models are competitive with some of the best models in their class.
Pang holds a Ph.D. in computer science from Princeton, with a bachelor's degree from Shanghai Jiao Tong University and a master's from the University of Southern California. He joined Apple in 2021, after 15 years as an engineer at Google.
The main conversational features of Apple Intelligence are powered by the models his team developed.
This time he also emphasized that these foundation models "are not chatbots" but support a wide range of features, including summarization, writing assistance, tool use, and code.
Apple also developed a number of in-house algorithms to boost model performance; the specifics are disclosed in the report as well.
Some sharp-eyed netizens spotted an interesting detail:
Apple's large models were trained on Google TPU clusters; the NVIDIA content is exactly zero.
Siri upgraded, but ChatGPT not yet connected
If you want to experience Apple Intelligence, there are quite a few conditions to meet.
First, the iOS 18.1 beta that carries it is currently limited to registered developers, a status that costs 99 US dollars a year, so ordinary users will have to wait.
And as mentioned before, it only supports M-series and A17 Pro chips, which means that among iPhones, only the 15 Pro and 15 Pro Max in some regions can use it.
Beyond the hardware and account requirements, system settings must be changed too: the region has to be set to the United States, and the device and Siri languages switched to English.
After meeting all these requirements, you get to... join the waitlist.
The Apple Intelligence rolled out this time covers only part of the feature set, mainly the text generation, Siri, and Photos modules.
Let's start with text generation. As an important part of Apple AI, this feature is not limited to Apple's own applications.
Any third-party app that uses the standard text-input system can also use it for summarization, proofreading, and rewriting.
In addition, combined with the audio transcription already available in Voice Memos in the iOS 18 beta, the text system can generate summaries of recordings, too.
The second major update is Siri.
Visually, the new Siri is no longer a circular icon; while it is running, a band of color glows around the edge of the screen.
There is also a text conversation mode for users who would rather not speak: double-tap the bottom of the screen to bring up the keyboard and type to Siri.
In terms of content, the new Siri can answer questions about Apple products and help users troubleshoot.
It can also carry context from one query to the next: you can ask Siri to create a calendar event and then ask for a reminder without repeating yourself.
However, the on-screen awareness feature shown earlier is not part of this Siri update.
As for Photos, the update lets users search in natural language for specific photos, and even for specific moments within videos.
That is the gist of the AI in this developer beta. Note that it is only part of what was shown at the launch event; plenty has yet to go live.
In particular, the previously announced ChatGPT integration is not included in this update.
Decrypting Apple's large models
Apple has said that ChatGPT is not a required part of Apple AI; the main features are driven by its own large models.
Apple released a comprehensive technical report on these models at the same time they went live.
The naming is plain and blunt: Apple Foundation Model (AFM), which comes in two versions, on-device and server (cloud-side).
The on-device model has around 3B parameters, while the cloud model's size is not disclosed beyond being larger than the on-device one. Both have a 32k context window.
Training: the NVIDIA content is zero
The models were trained with Apple's own JAX-based AXLearn framework, using strategies such as tensor parallelism and pipeline parallelism.
The hardware was Google TPUs: 8,192 TPUv4 chips for the cloud model and 2,048 TPUv5p chips for the on-device model. In short, the NVIDIA content is zero.
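The report doesn't spell out AXLearn's internals, but the core idea of tensor parallelism is easy to illustrate with the sharding API of JAX, which AXLearn is built on. In this minimal sketch, the mesh layout, axis name, and shapes are illustrative choices, not Apple's configuration:

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Arrange all available devices (TPU/GPU/CPU) on a 1-D mesh
# with a single "model" axis.
mesh = Mesh(np.array(jax.devices()), axis_names=("model",))

# Tensor parallelism for a linear layer: shard the weight matrix
# column-wise, so each device holds a slice of the output features.
w = jax.device_put(jnp.zeros((4096, 4096)),
                   NamedSharding(mesh, P(None, "model")))
x = jnp.ones((8, 4096))  # activations, replicated on every device

@jax.jit
def linear(x, w):
    # XLA inserts whatever cross-device collectives the sharding implies.
    return x @ w

y = linear(x, w)
print(y.sharding)  # output columns remain sharded along the "model" axis
```

Pipeline parallelism, which the report also mentions, would additionally split the layers themselves across groups of devices.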
The data mainly comes from web pages crawled by Applebot, along with publicly licensed code and math datasets.
Notably, none of the code datasets Apple selected use the GPL; all of them use more permissive open-source licenses such as MIT, Apache, and CC0.
As for the process, AFM's pre-training is divided into three stages: core training, continued training, and context extension.
In the core training stage, the cloud version consumes 6.3T tokens with a window length of 4,096; the on-device version is distilled from it.
In continued training, low-quality data is down-weighted, and math, code, and licensed high-quality data are used to strengthen the model's abilities.
This stage uses 1T tokens, and the window length grows from 4,096 to 8,192.
In the final stage, the window is extended further to 32k, using long-sequence text and synthetic data, 100B tokens in total.
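Putting the three stages side by side (the token counts and window lengths are from the report; the layout and key names below are just for illustration):

```python
# AFM cloud-model pre-training schedule as described above;
# the dict layout and key names are illustrative, not Apple's format.
PRETRAIN_STAGES = [
    {"stage": "core",              "tokens": 6.3e12, "context_window": 4_096},
    {"stage": "continued",         "tokens": 1.0e12, "context_window": 8_192},
    {"stage": "context-extension", "tokens": 1.0e11, "context_window": 32_768},
]

total = sum(s["tokens"] for s in PRETRAIN_STAGES)
print(f"total pre-training tokens: {total:.1e}")  # ~7.4e12, i.e. about 7.4T
```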
Home-grown reinforcement learning algorithms
AFM's post-training includes supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), and related work.
The SFT stage uses synthetic data plus human-annotated data; the synthetic data mainly covers math, tool use, and code.
For the RLHF stage, Apple devised two reinforcement learning algorithms of its own: iTeC and MDLOO.
iTeC, short for Iterative Teaching Committee, is an algorithm for post-training reinforcement learning that aims to improve the model over multiple rounds of iteration.
Its core idea is to combine different preference-optimization algorithms, including rejection sampling and direct preference optimization (DPO), so that the model benefits from multiple optimization strategies and adapts better to specific tasks.
In each iteration, iTeC selects a set of the best-performing recent models to form a "model committee"; these models come from different training methods such as SFT, RS, DPO/IPO, and RL.
After each batch of human preference feedback on the committee's responses is collected, iTeC refreshes its reward model and trains a new generation of models. The cycle repeats for multiple rounds, gradually improving performance, as sketched below.
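The report doesn't publish iTeC's code, so here is a deliberately toy sketch of the loop just described: "models" are scalars, human preference is simulated by a hidden target, and optimize() stands in for whichever method (rejection sampling, DPO/IPO, online RL) produced a candidate. Every name below is a placeholder:

```python
import random

TARGET = 0.7  # hidden stand-in for "what human raters prefer"

def preference_score(model: float) -> float:
    # Simulated human judgment: higher when closer to the hidden target.
    return -abs(model - TARGET)

def train_reward_model(committee):
    # Toy "reward model" refreshed from the latest feedback: score
    # candidates by proximity to the currently preferred committee member.
    best = max(committee, key=preference_score)
    return lambda m: -abs(m - best)

def optimize(model: float, reward) -> float:
    # Stand-in for one optimizer (RS / DPO / IPO / online RL): propose
    # perturbations and keep the one the reward model likes most.
    proposals = [model + random.gauss(0, 0.1) for _ in range(8)]
    return max(proposals, key=reward)

def itec(committee, rounds: int = 5, committee_size: int = 4):
    for _ in range(rounds):
        reward = train_reward_model(committee)  # refresh the reward model
        candidates = [optimize(m, reward) for m in committee]
        # Keep the best performers, old or new, as the next committee.
        committee = sorted(candidates + committee,
                           key=preference_score, reverse=True)[:committee_size]
    return committee

print(itec([random.random() for _ in range(4)]))
```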
MDLOO is an online reinforcement learning algorithm designed specifically to optimize the quality of the model's responses.
Being online means responses are decoded in real time during training, and the RL algorithm is applied to maximize their reward.
That is, the model keeps learning and adjusting its policy as training proceeds, generating responses more consistent with human preferences.
In terms of implementation, it combines a leave-one-out (LOO) advantage estimator with mirror descent policy optimization (MDPO) for more stable and effective policy updates.
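The mirror-descent half is involved, but the leave-one-out advantage estimator itself is simple and standard: sample K responses for a prompt, and use the mean reward of the other K-1 samples as each response's baseline. A minimal sketch with made-up rewards:

```python
def loo_advantages(rewards):
    # Advantage of sample i = its reward minus the mean reward
    # of the other k-1 samples (the leave-one-out baseline).
    k, total = len(rewards), sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]

# Four sampled responses to one prompt, with made-up reward scores.
print(loo_advantages([1.0, 0.2, 0.5, 0.9]))
```

Each advantage says how much better a response was than its peers; the policy update then pushes probability toward the above-baseline responses.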
On-device mixed-precision quantization
To make the on-device model run efficiently without eating too much memory, Apple quantized the on-device version of AFM.
Specifically, Apple uses mixed-precision quantization, applying different precisions to different parts of the model.
The approach is called "palettization": instead of quantizing each weight individually, weights are grouped, and the weights within a group share the same quantization constants.
For projection weights, every 16 columns/rows share the same quantization constants and are quantized to 4 bits using the k-means algorithm.
For the embedding layer, since the input and output embeddings are shared, 8-bit per-channel integer quantization is used. On top of that, some less important layers are compressed further, down to 2 bits.
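A toy version of the k-means palette idea as described: take one group of weights, cluster its values into 2^4 = 16 centroids, and store only the small palette plus a 4-bit index per weight. The group shape, initialization, and iteration count here are illustrative, not Apple's kernel:

```python
import numpy as np

def kmeans_palette(group: np.ndarray, bits: int = 4, iters: int = 10):
    # Cluster all values in the group into 2**bits centroids (the "palette").
    k = 2 ** bits
    flat = group.ravel()
    palette = np.quantile(flat, np.linspace(0, 1, k))  # quantile init
    for _ in range(iters):
        idx = np.abs(flat[:, None] - palette[None, :]).argmin(axis=1)
        for c in range(k):
            members = flat[idx == c]
            if members.size:
                palette[c] = members.mean()  # move centroid to its members
    idx = np.abs(flat[:, None] - palette[None, :]).argmin(axis=1)
    return palette, idx.reshape(group.shape)

w = np.random.randn(16, 64).astype(np.float32)  # one weight group
palette, indices = kmeans_palette(w)
w_dequant = palette[indices]  # reconstruct weights from palette + indices
print("max abs error:", np.abs(w - w_dequant).max())
```

For a rough sense of scale: a ~3B-parameter model takes about 6 GB at float16 but only about 1.5 GB at 4 bits per weight, which is what makes running it on a phone plausible.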
To recover the accuracy lost to quantization and preserve the model's output quality, Apple also introduced accuracy-recovery adapters.
An adapter is a small neural-network module inserted into specific layers of the pre-trained model; it is trained on top of the quantized model and learns, through fine-tuning, to compensate for the effects of quantization.
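The report frames these as small modules trained on top of the frozen quantized weights; a LoRA-style low-rank correction is a natural way to picture one. In this sketch, the rank, shapes, zero initialization, and the crude stand-in "quantizer" are all illustrative assumptions, with only A and B being trainable:

```python
import numpy as np

d, rank = 64, 16
rng = np.random.default_rng(0)

w_fp = rng.standard_normal((d, d)).astype(np.float32)  # original weights
w_q = np.round(w_fp * 4) / 4  # crude stand-in for low-bit quantization
A = rng.standard_normal((d, rank)).astype(np.float32) * 0.01  # trainable
B = np.zeros((rank, d), dtype=np.float32)  # trainable, zero-init so the
                                           # adapter starts as a no-op

def forward(x: np.ndarray) -> np.ndarray:
    # Frozen quantized path plus the small trainable low-rank correction;
    # fine-tuning A and B teaches the adapter to offset quantization error.
    return x @ w_q + (x @ A) @ B

x = rng.standard_normal((8, d)).astype(np.float32)
print(forward(x).shape)  # (8, 64)
```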
With this series of optimizations applied, it was time to put the model's performance to the test. Here Apple combined human evaluation with automated evaluation.
For the human evaluation, evaluators designed questions across multiple categories, covering analytical reasoning, brainstorming, chatbots, and more, and had the model generate responses.
The same questions were also posed to other models for comparison, and the evaluators judged which model's output was better.
The result: whether cloud-side or on-device, the model had at least a 60% probability of not losing to comparison models such as Llama 3 and GPT-4.
The remaining tests were mostly run on datasets.
For instruction-following ability, Apple ran IFEval; at both the instruction and prompt levels, cloud-side AFM surpassed GPT-4 to become the new SOTA.
The on-device model likewise beat similarly sized models such as Llama 3-8B and Mistral-7B.
On AlpacaEval, both the on-device and cloud-side AFM took second place.
Looking at specific tasks, AFM reached SOTA on the summarization task of the writing benchmark and came close to first place on the writing task.
For math, Apple evaluated on two datasets, GSM8K and MATH.
On GSM8K, the on-device model trails Llama 3-8B and Microsoft's Phi-3 mini, and the cloud model is beaten by GPT-4 and Llama 3-70B, though it does better than GPT-3.5.
Results on MATH are relatively better: the on-device version leads models of the same size, and the cloud version surpasses Llama 3-70B.
Beyond performance, safety matters too, so Apple manually evaluated AFM's resistance to adversarial attacks.
The results show that, when faced with adversarial prompts, AFM achieves a significantly lower violation rate than other open-source and commercial models.
Those are the noteworthy parts of Apple's large-model technical report; for more details, see the original report.
One more thing
Although Apple Intelligence has been handed to developers for testing, Bloomberg reports that the official release may be delayed.
Indeed, going by Apple's usual release conventions, the 18.1 version number implies these features will not ship alongside the new iPhones in September.
Analyst Gene Munster even suggested that Apple consider postponing the iPhone 16 launch date to line it up with Apple Intelligence.
Whether Cook takes that suggestion, we will have to wait and see.
Report address:
https://machinelearning.apple.com/research/apple-intelligence-foundation-language-models
Reference link:
[1]https://x.com/reach_vb/status/181801436655586611
[2]https://www.cnbc.com/2024/07/29/apple-releases-apple-intelligence-its-long-awaited-ai-features.html
[3]https://www.tomsguide.com/phones/iphones/ios-181-developer-beta-is-live-with-apple-intelligence-heres-all-the-new-iphone-ai-features
[4]https://www.businessinsider.com/apple-intelligence-delay-wont-hurt-new-iphone-sales-analysts-2024-7