June 30, 2024

Apple says ReALM outperforms GPT-4. What is it?


Apple researchers have published a new paper asserting that their ReALM language model surpasses OpenAI’s GPT-4 in “reference resolution.”

On Friday, Apple researchers published a preprint paper on their ReALM language model, claiming that it can “significantly outperform” OpenAI’s GPT-4 on specific benchmarks. ReALM is purportedly able to understand several kinds of context at once. In practice, that would let users refer to something on the screen, or to something happening in the background, and ask the language model about it.

Reference resolution is a linguistic challenge involving understanding the specific object or idea to which a particular expression refers. For instance, in conversation, we often use pronouns like “they” or “that.” While the referent might be clear to humans who can interpret based on context, a chatbot like ChatGPT may find it challenging to grasp precisely what is being referred to.
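To make the idea concrete, here is a deliberately simple, hypothetical sketch of what a reference-resolution task looks like: given an utterance with an ambiguous mention and a list of candidate entities, the system must decide which entity is meant. The entities, the word-overlap scoring, and the names used here are invented for illustration; a real system such as ReALM or GPT-4 would use a language model rather than this toy heuristic.

```python
# Hypothetical illustration of a reference-resolution task (not from Apple's paper).
# Given an utterance containing an ambiguous mention, pick the referent from a set
# of candidate entities gathered from the conversation and the screen.
from dataclasses import dataclass

@dataclass
class Entity:
    id: str
    description: str  # e.g. text of a button, a contact name, a playing podcast

candidates = [
    Entity("contact_1", "Mom phone contact"),
    Entity("podcast_1", "Tech Weekly podcast playing in the background"),
    Entity("button_1", "Call 555-0123 button shown on screen"),
]

utterance = "Call that number"

def resolve_reference(utterance: str, candidates: list[Entity]) -> Entity:
    """Toy resolver: score each candidate by word overlap with the utterance.
    A real resolver would use a language model instead of this heuristic."""
    words = set(utterance.lower().split())
    def score(entity: Entity) -> int:
        return len(words & set(entity.description.lower().split()))
    return max(candidates, key=score)

print(resolve_reference(utterance, candidates).id)  # -> "button_1"
```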

Being able to pin down a referent accurately matters a great deal for chatbots. According to Apple, letting users refer to an on-screen item as “that” or “it,” and having the assistant resolve that reference reliably, is essential to a genuinely hands-free screen experience.

Apple’s most recent paper marks the third publication on AI from the company in recent months. While it is premature to make definitive predictions, these papers could be viewed as an initial preview of features that the company intends to integrate into its software offerings such as iOS and macOS.

In the paper, researchers outlined their intention to utilize ReALM for the comprehension and identification of three types of entities: onscreen entities, conversational entities, and background entities. Onscreen entities refer to objects displayed on the user’s screen. Conversational entities are those relevant to the ongoing conversation. For instance, if a user asks a chatbot, “what workouts am I supposed to do today?” the chatbot should be able to deduce from prior interactions that the user is following a 3-day workout schedule and provide the day’s schedule accordingly.

Background entities encompass items that do not fit into the previous categories but remain pertinent. For instance, a podcast might be playing in the background, or a notification might have just arrived. Apple wants ReALM to recognize when a user refers to these elements.
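As a rough illustration of how those three categories fit together, the sketch below numbers a handful of made-up on-screen, conversational, and background entities and flattens them into a single text prompt for a language model. This is purely illustrative; it is not the encoding Apple describes in the paper.

```python
# Hypothetical sketch: the three entity categories described in the article,
# flattened into one text prompt so a language model can pick a referent.
onscreen_entities = [
    {"id": 1, "text": "Order #4821 - Arriving tomorrow"},
    {"id": 2, "text": "Track package button"},
]
conversational_entities = [
    {"id": 3, "text": "User's 3-day workout schedule (discussed earlier)"},
]
background_entities = [
    {"id": 4, "text": "Podcast currently playing"},
    {"id": 5, "text": "Notification: meeting moved to 3 pm"},
]

def build_prompt(user_request: str) -> str:
    """Number every candidate entity and ask the model which one is meant."""
    lines = ["Candidate entities:"]
    for group, entities in [
        ("on-screen", onscreen_entities),
        ("conversational", conversational_entities),
        ("background", background_entities),
    ]:
        for e in entities:
            lines.append(f"  [{e['id']}] ({group}) {e['text']}")
    lines.append(f"User request: {user_request}")
    lines.append("Which entity id does the request refer to?")
    return "\n".join(lines)

print(build_prompt("Pause that"))
```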

“In our paper, we demonstrate significant improvements over an existing system with similar functionality across various types of references. Our smallest model shows absolute gains of over 5 percent for on-screen references. We also compare our model against GPT-3.5 and GPT-4, with our smallest model achieving performance comparable to GPT-4 and our larger models significantly outperforming it,” the researchers wrote in the paper.

It’s worth noting that GPT-3.5 accepts only text, so the researchers gave it the prompt alone. For GPT-4, they also supplied a screenshot of the task, which significantly improved performance.
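For readers who want a sense of what that difference looks like in practice, the sketch below contrasts a text-only request with a text-plus-screenshot request using OpenAI’s public chat API image-input format. The model names, the prompt, and the screenshot file are assumptions for illustration; this is not the researchers’ actual evaluation harness.

```python
# Illustrative only: text-only vs. text+screenshot prompting via the OpenAI chat API.
# Model names and file paths are placeholders, not the setup used in the paper.
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
prompt = "Which on-screen element does 'call that number' refer to?"

# GPT-3.5: text prompt only (the model cannot accept images).
text_only = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)

# A GPT-4-class vision model: the same prompt plus a screenshot of the screen.
screenshot_b64 = base64.b64encode(open("screen.png", "rb").read()).decode()
with_image = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{screenshot_b64}"}},
        ],
    }],
)
```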

“Our ChatGPT prompt and prompt+image formulation are, to the best of our knowledge, innovative in themselves. While we believe that further improvements are possible, such as sampling semantically similar utterances until reaching the prompt length, this more intricate approach requires additional focused exploration, which we defer to future research,” the researchers added in the paper.

While ReALM beats GPT-4 on this specific benchmark, it would be misleading to conclude that ReALM is the better model overall: it outperformed GPT-4 on a task it was purpose-built for. It also remains unclear when, or how, Apple intends to bring ReALM into its products.
