If we are to develop robust robot-based automation, we need solid visual and tactile perception, equipping robots with better perception and learning capabilities than they possess today.
Robots currently lack such visual and tactile perception skills, a shortcoming that is especially evident when manipulating 3D compliant objects, which pose additional challenges compared to rigid objects.
The BIFROST project addresses these challenges by developing a visual-tactile perception, control, and learning framework that enables robots to manipulate 3D compliant objects in a set of specific scenarios. In doing so, the project aims to generate new knowledge as a foundation for future robot technology capable of addressing challenging real-world robotic manipulation in domains regarded as critical societal functions, such as food and seafood processing, which involves the robotic manipulation of compliant food objects.
In terms of image-based tactile perception, we have developed a soft robotic hand featuring a structurally compliant palm and high-resolution tactile sensing. The design incorporates underactuated fingers with a new low-cost illumination system, enabling greater surface contact with objects. The resulting hand is compact, only slightly larger than a baseball, and is among the first to combine soft actuated fingers with a passively compliant palm, both equipped with tactile sensing. This approach could pave the way for further exploration of soft-rigid tactile robotic hands with enhanced capabilities.
Regarding grasping, we have developed a vision-based 6-DoF grasping framework using Deep Reinforcement Learning (DRL) to directly synthesize continuous actions in Cartesian space. Our approach leverages visual data from an eye-in-hand RGB-D camera and addresses the sim-to-real gap through domain randomization, image augmentation, and segmentation. The method utilizes an off-policy, maximum-entropy Actor-Critic algorithm that learns from binary rewards and simulated grasps, requiring no real-world examples or fine-tuning. The framework, validated in both simulation and real-world tasks, achieves state-of-the-art results with an 86% mean zero-shot success rate on new objects, 85% on adversarial objects, and 74.3% on challenging 6-DoF objects.
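To make the action-synthesis idea concrete, the following is a minimal PyTorch sketch of the kind of squashed-Gaussian actor used by off-policy, maximum-entropy Actor-Critic methods such as SAC. All class names, layer sizes, and the assumption of a 256-dimensional encoded observation are illustrative, not the project's actual implementation.

```python
# Hypothetical sketch: an actor mapping an encoded RGB-D observation to a
# squashed Gaussian over continuous 6-DoF Cartesian actions
# (dx, dy, dz, droll, dpitch, dyaw). Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class GraspActor(nn.Module):
    def __init__(self, obs_dim=256, act_dim=6, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, act_dim)       # mean of the Gaussian
        self.log_std = nn.Linear(hidden, act_dim)  # state-dependent std

    def forward(self, obs):
        h = self.trunk(obs)
        mu = self.mu(h)
        std = self.log_std(h).clamp(-5, 2).exp()
        dist = torch.distributions.Normal(mu, std)
        raw = dist.rsample()              # reparameterized sample
        action = torch.tanh(raw)          # squash into [-1, 1]^6
        # log-prob with the tanh change-of-variables correction (SAC-style)
        log_prob = dist.log_prob(raw) - torch.log1p(-action.pow(2) + 1e-6)
        return action, log_prob.sum(-1)

# Usage: `obs` would come from a CNN encoder over the segmented eye-in-hand
# RGB-D image; the 6-vector is then scaled to the robot's Cartesian limits.
actor = GraspActor()
a, logp = actor(torch.randn(1, 256))
```

The tanh squashing keeps each of the six action components bounded, which is what allows the policy to emit continuous end-effector motions directly rather than selecting from a discrete grasp set.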
Concerning advanced manipulation, we have developed a model-free DRL framework for the long-horizon task of actively manipulating an object into a target shape. Our approach uses segmented image observations to bridge the sim-to-real gap. We address the task with a robot arm and an RGB-D camera in eye-in-hand configuration, manipulating an elongated elastoplastic object of the kind found in the food, marine, and manufacturing domains. Using Proximal Policy Optimization (PPO) with 768 parallel actors and 1.2 million environment interactions, we trained a DRL agent and tested it on 200 unseen target shapes. The agent achieved over 90% overlap with 82% of the targets within three attempts. The method, which requires no real-world examples or fine-tuning, was validated with a 94.2% mean zero-shot overlap success rate on new target shapes, demonstrating its robustness in both simulation and real-world tasks.
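For reference, the clipped surrogate objective that PPO optimizes over rollouts gathered from many parallel actors can be sketched in a few lines. The function below is a generic, self-contained illustration with assumed batch shapes, not the project's training code.

```python
# A minimal sketch of the PPO clipped surrogate loss that a parallel-actor
# setup (e.g., 768 actors) would optimize; names and the batch size of 1024
# are illustrative assumptions.
import torch

def ppo_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped policy loss over a batch collected from many parallel actors."""
    ratio = torch.exp(logp_new - logp_old)          # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()    # ascent -> minimize negative

# Usage with dummy data standing in for one rollout batch:
logp_old = torch.randn(1024)
logp_new = logp_old + 0.01 * torch.randn(1024)
adv = torch.randn(1024)
loss = ppo_loss(logp_new, logp_old, adv)
```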
Regarding deformable object manipulation, we have developed a visual control framework for positioning feature points on a 3D deformable object's surface at target 3D locations using a robotic manipulator. Our approach accounts for the dynamic behavior of the object during manipulation, and we derive the analytical relationship between feature-point motion and the six degrees of freedom (6-DoF) of the robot gripper. On this basis, we design a novel closed-loop deformation controller. To ensure robustness against model inaccuracies, the object's shape is tracked in real time with an RGB-D camera, allowing corrections on the fly.
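The closed-loop idea can be illustrated schematically: if a deformation Jacobian J relates the gripper's 6-DoF twist to the motion of the tracked feature points, a damped least-squares step drives the feature error toward zero. The sketch below, including the damping scheme and all names, is an illustrative assumption rather than the derived controller.

```python
# Schematic sketch of a closed-loop deformation control step: tracked feature
# points are driven toward 3D targets by commanding a 6-DoF gripper twist
# through a deformation Jacobian. Names and gains are illustrative.
import numpy as np

def servo_step(features, targets, J, gain=0.5, damping=1e-2):
    """One control step: map feature-point error to a 6-DoF gripper twist.

    features, targets: (3k,) stacked 3D positions of k feature points
    J: (3k, 6) deformation Jacobian relating gripper twist to feature motion
    """
    error = targets - features
    # Damped least-squares (pseudo-inverse) for robustness near singularities
    JtJ = J.T @ J + damping * np.eye(6)
    twist = gain * np.linalg.solve(JtJ, J.T @ error)
    return twist  # (vx, vy, vz, wx, wy, wz) sent to the manipulator

# Example with k = 4 tracked points and a random placeholder Jacobian:
J = np.random.randn(12, 6)
twist = servo_step(np.zeros(12), np.ones(12), J)
```

In such a loop, the RGB-D tracker would refresh the feature positions (and optionally re-estimate J) every cycle, which is what provides the on-the-fly correction described above.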
BIFROST involves the development of a novel visual-tactile perception and control framework for the advanced robotic manipulation of 3D compliant objects. The ability to manipulate such objects remains key to the advancement of robotic manipulation technologies: despite tremendous progress, modern robots are generally capable of manipulating only rigid objects. Like humans, robots need to "see" and "feel by touch" the objects they manipulate in order to plan and perform complex manipulation tasks on 3D compliant objects, and to understand their shape, compliance, environment, and context. Current visual-only robotic manipulation cannot perceive complete 3D information because of real-world physical occlusions, including intra-object and self-occlusions.
BIFROST will enable the simultaneous perception of 3D object shapes and their compliance by combining visual perception from an RGB-D sensor with touch, acquired through active exploration with an image-based tactile sensor for physically occluded regions that are inaccessible to the visual sensor. Building on this visual and tactile perception, BIFROST will achieve active manipulation and shape servoing of 3D compliant objects, in which robot control tasks are generated in sensor space by mapping raw sensor observations directly to actions. To enable the learning of complex manipulation tasks and active deformation, we will develop a high-level multi-step reasoning framework for the automatic selection of action types designed to achieve desired states and shapes.
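As a purely conceptual illustration of such a reasoning layer, a high-level selector might score a small set of manipulation primitives at each step and pick the most promising one. The primitive names and the greedy scheme below are hypothetical, intended only to convey the structure of automatic action-type selection.

```python
# Conceptual sketch of a high-level action-type selector: a discrete chooser
# over manipulation primitives, each realized by a lower-level sensor-space
# controller. Primitive names and the interface are hypothetical assumptions.
import numpy as np

PRIMITIVES = ("grasp", "push", "regrasp", "deform")

def select_action_type(q_values):
    """Pick the primitive whose estimated value toward the target shape is highest."""
    return PRIMITIVES[int(np.argmax(q_values))]

def multi_step_plan(step_values, horizon=3):
    """Greedy multi-step selection over a short horizon of value estimates."""
    return [select_action_type(q) for q in step_values[:horizon]]

# Usage with dummy value estimates for a 3-step plan:
plan = multi_step_plan(np.random.rand(3, len(PRIMITIVES)))
```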
Like the rainbow bridge of Norse mythology, BIFROST connects two worlds: inspired by humans' innate perception, understanding, manipulation, and learning, the project aims to develop similar capabilities in robots.