BestMan: A Modular Mobile Manipulator Platform for Embodied AI with Unified Simulation-Hardware APIs

Kui Yang1,†, Nieqing Cao2,†, Yan Ding3,*, Chao Chen1,*
1Chongqing University, 2Xi'an Jiaotong-Liverpool University, 3Shanghai Artificial Intelligence Laboratory. †Equal contribution. *Corresponding authors: Yan Ding [yding25 (at) binghamton.edu] and Chao Chen

Abstract

Embodied Artificial Intelligence (Embodied AI) emphasizes agents' ability to perceive, understand, and act in physical environments. Simulation platforms play a crucial role in advancing this field by enabling the validation and optimization of algorithms. However, existing platforms face challenges such as multilevel technical integration complexity, insufficient modularity, interface heterogeneity, and adaptation to diverse hardware. We present BestMan, a simulation platform based on PyBullet, designed to address these issues. BestMan introduces an integrated multilevel skill chain for seamless coordination across perception, planning, and control; a highly modular architecture for flexible algorithm integration; unified interfaces for smooth simulation-to-reality transfer; and a hardware-agnostic approach for adapting to various mobile manipulator configurations. These features collectively simplify development and enhance platform expandability, making BestMan a valuable tool for Embodied AI research.

Framework of BestMan

The platform comprises ten major components (highlighted in blue and red): Perception, Task Planning, Navigation, Manipulation, Configuration, Asset, Visualization, Controller, Sensor, and Robotics API. Each component contains modules (highlighted in yellow) in which various algorithms, represented by rounded rectangles, can be implemented; default methods are highlighted in green. The ellipsis ('...') indicates customizable modules or algorithms that users can extend. The unified simulation-hardware robotics APIs are built on the control and sensing components, while the other components are independent of these APIs. The right panel illustrates the platform's applicability across various real and simulated mobile manipulators and environments.
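To make the role of the unified robotics APIs concrete, below is a minimal sketch of what a backend-agnostic interface could look like in Python. All class and method names here are hypothetical illustrations, not BestMan's actual API.

from abc import ABC, abstractmethod

class RobotAPI(ABC):
    """Hypothetical unified interface: perception, planning, and control
    code targets this abstraction, independent of the backend."""

    @abstractmethod
    def get_joint_positions(self) -> list[float]:
        """Read the current arm joint configuration."""

    @abstractmethod
    def move_to_joint_positions(self, q: list[float]) -> None:
        """Command the arm toward a target joint configuration."""

class SimRobotAPI(RobotAPI):
    """Stub simulation backend; a real one would wrap PyBullet calls."""

    def __init__(self) -> None:
        self._q = [0.0] * 6  # six-joint arm, all at zero

    def get_joint_positions(self) -> list[float]:
        return list(self._q)

    def move_to_joint_positions(self, q: list[float]) -> None:
        self._q = list(q)  # a PyBullet backend would step the physics here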

Partial Asset

In the simulation environment of the platform, we provide an extensive collection of rigid and articulated interactive objects, simulating a wide range of household scenarios. The platform supports diverse configurations of mobile manipulators, which consist of modular components such as customizable bases, robotic arms, and interchangeable end-effectors, offering flexibility for various task executions. Furthermore, we have extended the capabilities of the platform to include advanced robotic systems, such as quadrupedal robots (robotic dogs) and humanoid robots. This extension facilitates the exploration of more complex embodied AI tasks, including dexterous manipulation, dynamic locomotion, and human-robot interaction. These enhancements enable the simulation of real-world environments where embodied agents must perceive, plan, and act within both structured and unstructured settings, fostering advancements in areas such as household robotics, assistive technologies, and autonomous systems.
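As a hedged illustration of this modular composition, a mobile manipulator might be assembled from a declarative specification; the part names and loader below are illustrative stand-ins, not BestMan's actual configuration schema.

# Hypothetical configuration: mix and match modular components.
robot_config = {
    "base": "differential_drive_base",    # customizable mobile base
    "arm": "six_dof_arm",                 # robotic arm
    "end_effector": "parallel_gripper",   # interchangeable end-effector
}

def assemble_robot(config: dict) -> str:
    """Stub assembler; a real implementation would load and attach URDFs."""
    return " + ".join(config[part] for part in ("base", "arm", "end_effector"))

print(assemble_robot(robot_config))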

Basic Demos


The video showcases basic demonstrations within the platform's simulation environment, featuring tasks such as autonomous navigation, object manipulation, and interaction with articulated objects in a household scenario. These demonstrations emulate common real-world household interactions, including mobile manipulator path planning in indoor environments, object grasping and placement, as well as operations on articulated objects such as doors and microwave ovens. The entire simulation process is rendered through Blender, delivering high-quality visual effects that enhance the realism and detail of the simulated environment.

The platform supports a diverse range of mobile manipulators, including modular bases, robotic arms, and interchangeable end-effectors, enabling the execution of more complex and varied tasks. Moreover, the platform's unified simulation-to-real interface resolves interface heterogeneity and cleanly decouples software from hardware, so solutions validated in simulation can be rapidly and efficiently transferred to real-world devices, as sketched below. This seamless transition from simulation to reality provides a robust validation framework and deployment pathway for mobile manipulator development, accelerating real-world applications.
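Continuing the hypothetical interface sketched in the framework section, sim-to-real transfer then reduces to swapping the backend object while the task code stays unchanged:

def joint_motion_demo(robot: RobotAPI) -> None:
    # Task logic written once against the unified interface.
    home = robot.get_joint_positions()
    robot.move_to_joint_positions([0.0, -0.5, 0.8, 0.0, 0.3, 0.0])
    robot.move_to_joint_positions(home)

robot = SimRobotAPI()          # validate in simulation first
# robot = RealRobotAPI(...)    # then swap in a hardware backend (hypothetical)
joint_motion_demo(robot)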

BibTeX


@inproceedings{Yang2024BestManAM,
    title={BestMan: A Modular Mobile Manipulator Platform for Embodied AI with Unified Simulation-Hardware APIs},
    author={Kui Yang and Nieqing Cao and Yan Ding and Chao Chen},
    year={2024},
    url={https://api.semanticscholar.org/CorpusID:273403368}
}

Prompt Design for Plan Monitor and Knowledge Acquirer

Our plan monitor is realized by repeatedly querying GPT-3 for each action using the following prompt.
Prompt 1: Is it suitable for a robot to [Perform-Action], if [Situation]?

The following templates are used to query an LLM to acquire commonsense knowledge about action effects.
Prompt 2: Is it suitable for a robot to [Perform-Action-with-Object]?
Prompt 3: There are some objects, such as [Object-1], [Object-2], ..., and [Object-N]. Which is the most suitable for [Current-Task], if [Situation]?
Please note that these prompts are zero-shot.
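As a minimal sketch, the plan monitor query in Prompt 1 could be issued zero-shot as follows, assuming the legacy OpenAI completions API; the function name, model choice, and yes/no parsing are illustrative rather than the paper's exact implementation.

import openai

def is_action_suitable(action: str, situation: str) -> bool:
    """Instantiate Prompt 1 and interpret the completion as a yes/no answer."""
    prompt = f"Is it suitable for a robot to {action}, if {situation}?"
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=8,
        temperature=0.0,  # deterministic judgment for monitoring
    )
    return response["choices"][0]["text"].strip().lower().startswith("yes")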

Situation Dataset

Users can download both the MTurk questionnaire and the situation dataset at the top of the website. The dataset is provided as a spreadsheet comprising 12 sheets, each representing the situations for a distinct task; the sheet name corresponds to the task name. Each sheet consists of five columns.

Column A contains the situation descriptions provided by the MTurkers. Column B details the step at which each described situation occurs. Column C gives the index of distinguishable situations, and Column D their descriptions. Finally, Column E indicates the number of distinguishable situations.
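For reference, a minimal loading sketch with pandas, assuming the dataset is saved as an Excel workbook; the file name is a hypothetical placeholder.

import pandas as pd

# One sheet per task; sheet names correspond to task names.
sheets = pd.read_excel("situation_dataset.xlsx", sheet_name=None)

for task_name, df in sheets.items():
    situation_descriptions = df.iloc[:, 0]  # Column A: MTurker descriptions
    occurrence_steps = df.iloc[:, 1]        # Column B: step where each situation occurs
    distinct_index = df.iloc[:, 2]          # Column C: index of distinguishable situations
    distinct_descriptions = df.iloc[:, 3]   # Column D: descriptions of these situations
    distinct_count = df.iloc[:, 4]          # Column E: number of distinguishable situations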

Experiments

Task completion percentage of COWP (ours) and five baseline methods on 12 different tasks. The x-axis represents the task name, and the y-axis the task completion percentage; each value is an average over 150 trials. Tasks are sorted by the performance of COWP, with its best performance at the far left.

Below are the prompts and hyperparameters of the baselines used in the evaluation, where [] denotes a placeholder.

from actions import walk <obj>, run <obj>, grab <obj>, switchon <obj>, switchoff <obj>, open <obj>, close <obj>, find <obj>, putin <obj> <obj>, fill <obj> <obj>, clean <obj>, wash <obj>
objects = ['wine', 'bucket', 'dish bowl', 'chips', 'sponge', 'snack', 'kitchencabinet', 'wastecontainer', 'cleaning bottle', 'drinking glass', 'kitchen cabinet', 'dish', 'coffee table', 'blender', 'dining table', 'mug', 'coffee maker', 'dehumidifier', 'air fryer', 'water filter', 'tea', 'dining', 'coffee filter', 'colander', 'orange juice', 'condiment bottle', 'watermelon juice', 'mat', 'closet', 'beer', 'garbagecan', 'cutlery knife', 'ice cream', 'sauce', 'table_1', 'oven tray', 'refrigerator', 'table cloth', 'steak', 'cupboard', 'wineglass', 'kitchen', 'cutting board', 'noodles', 'kitchen table', 'wooden chopstick', 'frying pan', 'cloth napkin', 'piano bench', 'toaster']
def put_the_wine_glass_in_the_kitchen_cabinet():
      # 0: walk to kitchen
      walk('kitchen')
      # 1: find wine glass
      find('wineglass')
      # 2: grab wine glass
      assert('close' to 'wineglass')
      else: find('wineglass')
      grab('wineglass')
      # 3: find kitchen cabinet
      find('kitchencabinet')
      # 4: open kitchen cabinet
      assert('close' to 'kitchencabinet' )
      else: find('kitchencabinet')
      assert('kitchencabinet' is 'closed' )
      else: close('kitchencabinet')
      open('kitchencabinet')
      # 5: put wine glass in kitchen cabinet
      assert('wineglass' in 'hands' )
      else: find('wineglass')
      else: grab('wineglass')
      assert('close' to 'kitchencabinet' )
      else: find('kitchencabinet')
      assert('kitchencabinet' is 'opened' )
      else: open('kitchencabinet')
      putin('wineglass', 'kitchencabinet')
      # 6: close kitchen cabinet
      assert('close' to 'kitchencabinet' )
      else: find('kitchencabinet')
      assert('kitchencabinet' is 'opened' )
      else: open('kitchencabinet')
      close('kitchencabinet')
      # 7: Done

from actions import walk <obj>, run <obj>, grab <obj>, switchon <obj>, switchoff <obj>, open <obj>, close <obj>, find <obj>, putin <obj> <obj>, fill <obj> <obj>, clean <obj>, wash <obj>
objects = ['wine', 'bucket', 'dish bowl', 'chips', 'sponge', 'snack', 'kitchencabinet', 'cleaning bottle', 'drinking glass', 'kitchen cabinet', 'dish', 'coffee table', 'blender', 'dining table', 'mug', 'coffee maker', 'dehumidifier', 'air fryer', 'water filter', 'tea', 'dining', 'coffee filter', 'colander', 'orange juice', 'condiment bottle', 'watermelon juice', 'mat', 'closet', 'beer', 'garbagecan_1', 'cutlery knife', 'ice cream', 'sauce', 'table_1', 'oven tray', 'refrigerator', 'table cloth', 'steak', 'cupboard', 'wineglass', 'kitchen', 'cutting board', 'noodles', 'kitchen table', 'wooden chopstick', 'frying pan', 'cloth napkin', 'garbagecan_2', 'piano bench', 'toaster']
def throw_away_the_lime, where garbagecan_1 is broken():
      # 0: find lime
      find('lime')
      # 1: grab lime
      assert('close' to 'lime')
      else: find('lime')
      grab('lime')
      # 2: find garbage can
      find('garbagecan_1')
      assert('broken' to 'garbagecan_1')
      else: find('garbagecan_2')
      # 3: open garbage can
      assert('close' to 'garbagecan_2' )
      else: find('garbagecan_2')
      assert('garbagecan_2' is 'closed' )
      else: close('garbagecan_2')
      open('garbagecan_2')
      # 4: put lime in garbage can
      assert('lime' in 'hands' )
      else: find('lime')
      else: grab('lime')
      assert('close' to 'garbagecan_2' )
      else: find('garbagecan_2')
      assert('garbagecan_2' is 'opened' )
      else: open('garbagecan_2')
      putin('lime', 'garbagecan_2')
      # 5: close garbage can
      assert('close' to 'garbagecan_2' )
      else: find('garbagecan_2')
      assert('garbagecan_2' is 'opened' )
      else: open('garbagecan_2')
      close('garbagecan_2')
      # 6: Done


from actions import walk <obj>, run <obj>, grab <obj>, switchon <obj>, switchoff <obj>, open <obj>, close <obj>, find <obj>, putin <obj> <obj>, fill <obj> <obj>, clean <obj>, wash <obj>
objects = ['wine', 'bucket', 'dish bowl', 'chips', 'sponge', 'snack', 'kitchencabinet', 'wastecontainer', 'cleaning bottle', 'drinking glass', 'kitchen cabinet', 'dish', 'coffee table', 'blender', 'dining table', 'mug', 'coffee maker', 'dehumidifier', 'air fryer', 'water filter', 'tea', 'dining', 'coffee filter', 'colander', 'orange juice', 'condiment bottle', 'watermelon juice', 'mat', 'closet', 'beer', 'garbagecan', 'cutlery knife', 'ice cream', 'sauce', 'table_1', 'oven tray', 'refrigerator', 'washingsponge', 'table cloth', 'steak', 'cupboard', 'wineglass', 'kitchen', 'cutting board', 'noodles', 'kitchen table', 'wooden chopstick', 'frying pan', 'cloth napkin', 'piano bench', 'toaster', 'dishwashingliquid', 'washingcloth']
def wash_mug, where washingsponge is missing():
      # 0: walk to kitchen
      walk('kitchen')
      # 1: find sink
      find('sink')
      # 2: turn on faucet
      find('faucet')
      assert('close' to 'faucet' )
      else: find('faucet')
      assert('faucet' is 'switchoff' )
      else: switchoff('faucet')
      switchon('faucet')
      # 3: put mug under water
      find('mug')
      assert('close' to 'mug')
      else: find('mug')
      grab('mug')
      find('sink')
      assert('mug' in 'hands' )
      else: find('mug')
      else: grab('mug')
      assert('close' to 'sink' )
      else: find('sink')
      putin('mug', 'sink')
      # 4: grab dishwashing liquid
      find('dishwashingliquid')
      assert('close' to 'dishwashingliquid')
      else: find('dishwashingliquid')
      grab('dishwashingliquid')
      # 5: put dishwashing liquid on mug
      find('sink')
      assert('dishwashingliquid' in 'hands' )
      else: find('dishwashingliquid')
      else: grab('dishwashingliquid')
      assert('close' to 'sink' )
      else: find('sink')
      putin('dishwashingliquid', 'sink')
      # 6: grab washingsponge
      find('washingsponge')
      assert('missing' to 'washingsponge')
      else: find('washingcloth')
      grab('washingcloth')
      # 7: start scrubbing mug
      find('sink')
      assert('washingcloth' in 'hands' )
      else: find('washingcloth')
      else: grab('washingcloth')
      assert('close' to 'sink' )
      else: find('sink')
      putin('washingcloth', 'sink')
      # 8: rinse mug off with water
      # 9: dry mug with towel
      # 10: Done

from actions import walk <obj>, run <obj>, grab <obj>, switchon <obj>, switchoff <obj>, open <obj>, close <obj>, find <obj>, putin <obj> <obj>, fill <obj> <obj>, clean <obj>, wash <obj>
objects = [objects in the environment]
def [task description], where [situation] ():

Here are actions that can be executed by the robot: walk, run, grab, switch on, switch off, open, close, find, put, fill, clean, wash

Human: put the wine glass in the kitchen cabinet
Scene: ['orchid', 'sink', 'peach', 'mouse', 'oven tray', 'hanger', 'clothes pants', 'cupcake', 'power socket', 'bell pepper', 'slippers', 'toaster', 'closet', 'floor', 'pillow', 'door jamb', 'light switch', 'faucet', 'pie', 'bookshelf', 'cutlery fork', 'condiment shaker', 'bathroom counter', 'keyboard', 'cutlery knife', 'bananas', 'washing machine', 'box', 'ceiling', 'creamy buns', 'bed', 'crackers', 'bathroom', 'stove', 'paper', 'condiment bottle', 'lime', 'stove fan', 'washing sponge', 'deodorant', 'radio', 'kitchen', 'toilet', 'fridge', 'bedroom', 'dishwashing liquid', 'kitchen cabinet', 'remote control', 'folder', 'bar soap', 'bench', 'coffee pot', 'frying pan', 'curtains', 'desk', 'door', 'toothpaste', 'computer', 'painkillers', 'towel rack', 'cereal', 'wall', 'wall picture frame', 'bathtub', 'dish bowl', 'living room', 'cabinet', 'ceiling lamp', 'clothes pile', 'cpu screen', 'plum', 'photo frame', 'stall', 'table lamp', 'rug', 'toothbrush', 'coffee table', 'plate', 'water glass', 'chocolate syrup', 'window', 'bathroom cabinet', 'face cream', 'whipped cream', 'closet drawer', 'kitchen counter', 'tv', 'microwave', 'mug', 'perfume', 'salmon', 'candy bar', 'kitchen table', 'coffee maker', 'wall lamp', 'bread slice', 'towel', 'mouse mat', 'apple', 'cellphone', 'wall shelf', 'book', 'sofa', 'chips', 'wall phone', 'kitchen counter drawer', 'clothes shirt', 'candle', 'hair product', 'wine glass', 'garbage can', 'nightstand', 'clock', 'tv stand', 'chair']
Robot:
  0: walk to kitchen
  1: find wine glass
  2: grab wine glass
  3: find kitchen cabinet
  4: open kitchen cabinet
  5: put wine glass in kitchen cabinet
  6: close kitchen cabinet
  7: Done

Human: throw away the lime
Scene: ['garbage can_1 is broken', 'orchid', 'sink', 'peach', 'mouse', 'garbage can_1', 'oven tray', 'hanger', 'clothes pants', 'cupcake', 'power socket', 'bell pepper', 'slippers', 'toaster', 'closet', 'floor', 'pillow', 'door jamb', 'light switch', 'faucet', 'pie', 'bookshelf', 'cutlery fork', 'condiment shaker', 'bathroom counter', 'keyboard', 'cutlery knife', 'bananas', 'washing machine', 'box', 'ceiling', 'creamy buns', 'bed', 'crackers', 'bathroom', 'stove', 'paper', 'condiment bottle', 'lime', 'stove fan', 'washing sponge', 'deodorant', 'radio', 'kitchen', 'toilet', 'fridge', 'bedroom', 'dishwashing liquid', 'kitchen cabinet', 'remote control', 'folder', 'bar soap', 'bench', 'coffee pot', 'frying pan', 'curtains', 'desk', 'door', 'toothpaste', 'computer', 'painkillers', 'towel rack', 'cereal', 'wall', 'wall picture frame', 'bathtub', 'dish bowl', 'living room', 'cabinet', 'ceiling lamp', 'clothes pile', 'cpu screen', 'plum', 'photo frame', 'stall', 'table lamp', 'rug', 'toothbrush', 'coffee table', 'plate', 'water glass', 'chocolate syrup', 'window', 'bathroom cabinet', 'face cream', 'whipped cream', 'closet drawer', 'kitchen counter', 'tv', 'microwave', 'mug', 'perfume', 'salmon', 'candy bar', 'kitchen table', 'coffee maker', 'wall lamp', 'bread slice', 'towel', 'mouse mat', 'apple', 'cellphone', 'wall shelf', 'book', 'sofa', 'chips', 'wall phone', 'kitchen counter drawer', 'clothes shirt', 'candle', 'hair product', 'wine glass', 'garbage can_2', 'nightstand', 'clock', 'tv stand', 'chair']
Robot:
  0: find lime
  1: grab lime
  2: find garbage can_2
  3: open garbage can_2
  4: put lime in garbage can_2
  5: close garbage can_2
  6: Done

Human: wash mug
Scene: ['washing sponge is missing', 'orchid', 'sink', 'peach', 'mouse', 'oven tray', 'hanger', 'clothes pants', 'cupcake', 'power socket', 'bell pepper', 'slippers', 'toaster', 'closet', 'floor', 'pillow', 'door jamb', 'light switch', 'faucet', 'pie', 'bookshelf', 'cutlery fork', 'condiment shaker', 'bathroom counter', 'keyboard', 'cutlery knife', 'bananas', 'washing machine', 'box', 'ceiling', 'creamy buns', 'bed', 'crackers', 'bathroom', 'stove', 'paper', 'condiment bottle', 'lime', 'stove fan', 'washing sponge', 'deodorant', 'radio', 'kitchen', 'toilet', 'fridge', 'bedroom', 'dishwashing liquid', 'kitchen cabinet', 'remote control', 'folder', 'bar soap', 'bench', 'coffee pot', 'frying pan', 'curtains', 'desk', 'door', 'toothpaste', 'computer', 'painkillers', 'towel rack', 'cereal', 'wall', 'wall picture frame', 'bathtub', 'dish bowl', 'living room', 'cabinet', 'ceiling lamp', 'clothes pile', 'cpu screen', 'plum', 'photo frame', 'stall', 'table lamp', 'rug', 'toothbrush', 'coffee table', 'plate', 'water glass', 'chocolate syrup', 'window', 'bathroom cabinet', 'face cream', 'whipped cream', 'closet drawer', 'kitchen counter', 'tv', 'microwave', 'mug', 'perfume', 'salmon', 'candy bar', 'kitchen table', 'coffee maker', 'wall lamp', 'bread slice', 'towel', 'mouse mat', 'apple', 'cellphone', 'wall shelf', 'book', 'sofa', 'chips', 'wall phone', 'kitchen counter drawer', 'clothes shirt', 'candle', 'hair product', 'wine glass', 'garbage can', 'nightstand', 'clock', 'tv stand', 'chair']
Robot:
  0: walk to kitchen
  1: find sink
  2: switch on faucet
  3: put mug in sink
  4: grab dishwashing liquid
  5: put dishwashing liquid in sink
  6: grab washing cloth
  7: put washing cloth in sink
  8: wash mug
  9: Done


Scene: [situation, objects in the environment]
Human: [task description]
Robot:

planning_lm_id = 'text-davinci-003'
translation_lm_id = 'stsb-roberta-large'
MAX_STEPS = 12  # maximum number of steps to be generated
CUTOFF_THRESHOLD = 0.5  # early stopping threshold based on matching score and likelihood score
P = 0.5  # hyperparameter for early stopping heuristic to detect whether Planning LM believes the plan is finished
BETA = 0.3  # weighting coefficient used to rank generated samples
sampling_params = {
    "max_tokens": 256,
    "temperature": 0.9,
    "top_p": 0.9,
    "n": 10,
    "logprobs": 1,
    "presence_penalty": 0.5,
    "frequency_penalty": 0.3,
    "stop": '\n',
}
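For context, a minimal sketch of how these settings might be wired together, assuming the legacy OpenAI completions API and the sentence-transformers library; the helper names are illustrative, not the baseline's exact code.

import openai
from sentence_transformers import SentenceTransformer, util

translation_lm = SentenceTransformer(translation_lm_id)

def sample_next_steps(prompt: str) -> list[str]:
    # Draw n candidate continuations from the planning LM.
    response = openai.Completion.create(model=planning_lm_id, prompt=prompt, **sampling_params)
    return [choice["text"].strip() for choice in response["choices"]]

def translate_step(generated_step: str, admissible_actions: list[str]) -> str:
    # Map a free-form generated step to the closest admissible action by cosine similarity.
    step_emb = translation_lm.encode(generated_step, convert_to_tensor=True)
    action_embs = translation_lm.encode(admissible_actions, convert_to_tensor=True)
    return admissible_actions[int(util.cos_sim(step_emb, action_embs)[0].argmax())]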

Conclusion

In this paper, we developed COWP, a Large Language Model-based open-world task planning system for robots, aimed at robust task planning and situation handling in open worlds. The novelty of COWP lies in integrating a classical, knowledge-based task planning system with a pretrained language model for commonsense knowledge acquisition. Combining the two enables COWP to ground domain-independent commonsense knowledge in specific task planning problems. To evaluate COWP systematically, we collected a situation dataset comprising 1085 situations in a dining domain. Experimental results show that COWP outperformed existing task planners developed for closed-world and open-world scenarios. We also demonstrated COWP on a mobile manipulator performing delivery tasks, providing a reference for practitioners applying COWP in real-world settings.

BibTeX


@article{ding2023integrating,
    title={Integrating Action Knowledge and LLMs for Task Planning and Situation Handling in Open Worlds},
    author={Ding, Yan and Zhang, Xiaohan and Amiri, Saeid and Cao, Nieqing and Yang, Hao and Kaminski, Andy and Esselink, Chad and Zhang, Shiqi},
    journal={Autonomous Robots},
    year={2023}
}