Microsoft Research and Peking University scientists are trying to enable ChatGPT, powered by GPT-4, to autonomously navigate and operate within an operating system.
This development tackles the challenge of equipping large language models (LLMs) to effectively manipulate and interact within digital environments, beyond their established generative text capabilities, a task that has proven difficult.
Did you know?
Want to get smarter & wealthier with crypto?
Subscribe - We publish new crypto explainer videos every week!
What is Cardano in Crypto? (Easily Explained!)
Unlike the success seen in simulated environments like video games, where AI models have demonstrated proficiency through reinforcement learning, operating systems present a multifaceted challenge.
This complexity stems from the need for AI to engage with various components, applications, and programs, coupled with the risk of data loss through errors.
Researchers identified the vast and dynamic action space, the necessity for inter-application cooperation, and the alignment with user constraints such as security and preferences as the three primary obstacles of LLMs.
They also added:
We identify a lack of four key capabilities, i.e., understanding, reasoning, exploration, and reflection, as primary reasons for the failure of LLM agents.
To overcome these challenges, the team introduced AndroidArena, a novel training platform mimicking the Android operating system and allowing LLMs to learn through exploration.
This environment was created to facilitate the learning and exploration of LLMs in a more representative setting, focusing on the unique requirements of operating system manipulation.
By integrating automated feedback about the model's previous attempts and actions into the prompts, a form of "reflection" or memory capability was introduced, which was an effective method to enhance model accuracy by 27%.
This research not only sheds light on the challenges of AI interaction with operating systems but also paves the way for the development of more sophisticated and capable AI assistants.
In November 2023, Microsoft hired former OpenAI CEO Sam Altman and president Greg Brockman to lead a new advanced AI research team.