Multiple AI models help robots execute complex plans

Basic chores come naturally to humans, but robots need intricate planning to carry them out. Now, MIT scientists have developed a new AI system called "HiP" that allows robots to make detailed plans to accomplish complex goals.

HiP works by combining three different foundation models, which are large AI models trained on massive datasets of images, text, or video. Each model specializes in a different capability: language reasoning, visual perception, or action planning. By dividing planning tasks among specialized models rather than relying on a single monolithic model, HiP generates more nuanced step-by-step plans for robots to follow.
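
To make the division of labor concrete, here is a minimal Python sketch of a three-stage planner in the spirit of that description. It is not HiP's actual code: the names `plan_hierarchically`, `Step`, and the three `toy_*` functions are hypothetical, and the toy functions are hard-coded stand-ins for what would in practice be large pretrained models.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One planned step (hypothetical structure, for illustration only)."""
    subgoal: str        # text subgoal from the language stage
    scene: str          # scene description from the perception stage
    actions: List[str]  # low-level commands from the action stage


def plan_hierarchically(
    goal: str,
    propose_subgoals: Callable[[str], List[str]],
    describe_scene: Callable[[str], str],
    choose_actions: Callable[[str, str], List[str]],
) -> List[Step]:
    """Pass a goal through three specialized stages, one subgoal at a time."""
    steps = []
    for subgoal in propose_subgoals(goal):          # language reasoning
        scene = describe_scene(subgoal)             # visual perception
        actions = choose_actions(subgoal, scene)    # action planning
        steps.append(Step(subgoal, scene, actions))
    return steps


# Toy stand-ins (assumptions): real stages would be foundation models.
def toy_language_model(goal: str) -> List[str]:
    # Split the instruction into ordered subgoals.
    return [part.strip() for part in goal.split(", then ")]


def toy_perception_model(subgoal: str) -> str:
    # Pretend to describe the part of the scene relevant to the subgoal.
    return f"scene relevant to '{subgoal}'"


def toy_action_model(subgoal: str, scene: str) -> List[str]:
    # Pretend to turn the subgoal and scene into motor commands.
    return [f"move_to({subgoal!r})", f"execute({subgoal!r})"]


plan = plan_hierarchically(
    "pick up the red block, then place it on the blue block",
    toy_language_model,
    toy_perception_model,
    toy_action_model,
)
for step in plan:
    print(step.subgoal, "->", step.actions)
```

Because each stage is passed in as a separate function, any one of them could be swapped for a better model without touching the other two, which is the practical appeal of splitting the work this way.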

In tests, HiP directed a robot through multi-phase tasks like stacking blocks, arranging objects, and simulated meal preparation. Unlike other systems, HiP could dynamically adjust its plans to account for changes in the environment and tasks. For example, when instructed to stack certain colored blocks that weren't available, HiP planned for the robot to first paint white blocks the needed colors before stacking them.

Researchers say HiP represents an evolution in robotic planning toward more adaptable systems. Where today's robots require meticulous coding of each sub-task, HiP relies on pretrained AI models to make those decisions autonomously. Its hierarchical approach also makes the reasoning process more transparent than in end-to-end learning models.

"Instead of pushing for one model to do everything, we combine multiple ones that leverage different modalities of internet data," says PhD student Anurag Ajay. "When used in tandem, they help with robotic decision-making and can potentially aid with tasks in homes, factories, and construction sites."

Looking ahead, improved video and multisensory foundation models could further enhance HiP's contextual understanding and lead to more seamless human-robot collaboration. MIT researchers plan to test the system on additional real-world tasks like manufacturing and construction projects requiring long-horizon planning.

As foundation models continue to advance, HiP offers a framework for giving robots more intelligence and foresight to handle open-ended goals. Such a system could one day lend a hand with chores around the house or enable adaptive automation across many industries.
