2024-04-03
CMU researchers expand the capabilities of robots that learn from video
New work by researchers at Carnegie Mellon University (CMU) has enabled robots to learn household chores by watching videos of people performing them. The research could help bring robots into everyday life, allowing them to assist people with tasks such as cooking and cleaning.
During the study, two robots successfully learned 12 tasks, including opening a kitchen-cabinet drawer and an oven door, taking a pot off the stove and lifting its lid, and picking up a telephone handset, a vegetable, or a can of food.
"The robot can find out where and how people interact with various objects by watching videos," said Deepak Pathak, associate professor at the Institute of Robotics at the CMU School of Computer Science. "Based on this data, we can train a model that will allow two robots to perform the same tasks in different conditions."
Existing methods of teaching robots require either a human manually demonstrating the task or training in a simulated environment. Both are time-consuming and prone to failure. Previously, Pathak and his students demonstrated WHIRL (In-the-Wild Human Imitating Robot Learning), a method in which robots learned by observing humans performing tasks. But that method required the human to perform the task in the same environment as the robot.
Pathak's latest work, VRB (Vision-Robotics Bridge), builds on WHIRL. The new model removes the need for human demonstrations and for the robot to operate in an identical environment. As with WHIRL, the robot still requires practice to master a task; the team's research showed it can learn a new task in about 25 minutes.
"We were able to take robots around the campus and perform a variety of tasks," says Shikhar Bahl, a graduate student in the Department of Robotics. - Robots can use this model to explore the world around them. Instead of just waving his hands, the robot can be more direct in its interaction."
VRB determines where and how a robot can interact with an object by observing human behavior. For example, by watching a person open a box, the robot identifies the contact points and the direction in which the box moves. According to the university's press release, after watching several such videos of people opening boxes, the robot can determine how to open any box.
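To make that idea concrete, here is a minimal, hypothetical sketch of the kind of affordance signal described above: a contact point plus a post-contact direction of motion. The names (Affordance, summarize_affordance) and the toy heatmap are illustrative assumptions, not the VRB paper's actual model or API.

# Illustrative sketch only; in the real system the heatmap and motion
# would be predicted by a model trained on human videos, not hand-specified.
from dataclasses import dataclass
import numpy as np

@dataclass
class Affordance:
    contact_point: tuple   # pixel (row, col) where contact with the object is likely
    direction: np.ndarray  # unit vector for the motion that follows contact

def summarize_affordance(contact_heatmap, post_contact_motion):
    """Collapse a per-pixel contact-likelihood map and an observed hand motion
    into a compact 'where to touch, which way to move' signal a robot could act on."""
    row, col = np.unravel_index(np.argmax(contact_heatmap), contact_heatmap.shape)
    direction = post_contact_motion / (np.linalg.norm(post_contact_motion) + 1e-8)
    return Affordance(contact_point=(int(row), int(col)), direction=direction)

# Toy usage: a fake 64x64 heatmap peaked near a box lid, and an upward pull.
heatmap = np.zeros((64, 64))
heatmap[20, 32] = 1.0
print(summarize_affordance(heatmap, np.array([0.0, -1.0])))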
The team used videos from large datasets such as Ego4D and Epic Kitchens. Ego4D contains nearly 4,000 hours of video of daily activities recorded around the world. Epic Kitchens contains similar videos of cooking, cleaning, and other kitchen chores. Both datasets are intended for training computer vision models.
More detailed information is available on the project's website and in the paper presented in June at the Conference on Computer Vision and Pattern Recognition.