BestMan: A Modular Mobile Manipulator Platform for Embodied AI with Unified Simulation-Hardware APIs

1Chongqing University, 2Xi'an Jiaotong-Liverpool University, 3Shanghai Artificial Intelligence Laboratory. Equal contribution; *Corresponding authors: Yan Ding & Chao Chen

Abstract

Embodied AI emphasizes agents' ability to perceive, understand, and act in physical environments. Simulation platforms play a crucial role in advancing this field by enabling the validation and optimization of algorithms. However, existing platforms face challenges such as multilevel technical integration complexity, insufficient modularity, interface heterogeneity, and adaptation to diverse hardware.

To address these challenges, we develop the BestMan platform based on the PyBullet simulator, with the following key contributions:

  • Integrated Multilevel Skill Chain: Addresses multilevel technical integration complexity.
  • Highly Modular Design: Enables expandability and algorithm integration.
  • Unified Interfaces: Bridges simulation and real devices to address interface heterogeneity.
  • Decoupled Software and Hardware: Tackles the challenge of hardware diversity.

Architecture

  • The platform comprises ten major components (highlighted in blue and red). Each component contains modules (highlighted in yellow), in which various algorithms, represented by rounded rectangles, can be implemented; the default methods are highlighted in green.
  • The ellipsis (...) indicates customizable modules or algorithms that users can extend.
  • The unified simulation-hardware robotics APIs are constructed based on the control and sensing components, while other components are independent of these robotics APIs.
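The decoupling described above can be illustrated with a minimal sketch. All class and method names below are hypothetical, not the platform's actual API; the point is that task code written against a shared interface runs unchanged whether the backend is a simulator or a physical robot.

```python
from abc import ABC, abstractmethod

class ArmInterface(ABC):
    """Hypothetical unified arm API: the same calls drive simulation or hardware."""
    @abstractmethod
    def move_to(self, pose): ...
    @abstractmethod
    def get_joint_angles(self): ...

class SimArm(ArmInterface):
    """Toy simulation backend (the real platform would back this with PyBullet)."""
    def __init__(self):
        self._joints = [0.0] * 6
    def move_to(self, pose):
        self._joints = list(pose)  # teleport instantly in this toy sketch
    def get_joint_angles(self):
        return self._joints

def run_skill(arm: ArmInterface):
    # Task code depends only on the interface, never on the backend,
    # so swapping SimArm for a hardware driver requires no changes here.
    arm.move_to([0.1, 0.2, 0.0, 0.0, 0.0, 0.0])
    return arm.get_joint_angles()
```

A hardware backend would implement the same two methods with real driver calls, leaving `run_skill` untouched.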

Asset

  • We provide an extensive collection of rigid and articulated interactive objects, simulating a wide range of household scenarios.
  • The platform supports diverse configurations of mobile manipulators, which consist of modular components such as customizable bases, robotic arms, and interchangeable end-effectors, offering flexibility for various task executions.
  • We integrate state-of-the-art digital twin algorithms, including URDFormer, enabling users to create highly customized assets.
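The modular composition of mobile manipulators described above can be sketched as follows. All module names and the `build_robot` helper are invented for illustration and are not the platform's real configuration API.

```python
def build_robot(base: str, arm: str, end_effector: str) -> dict:
    """Assemble a mobile manipulator description from interchangeable modules.
    The supported-arm set here is a placeholder, not the platform's actual list."""
    supported_arms = {"ur5e", "panda", "xarm7"}
    if arm not in supported_arms:
        raise ValueError(f"unsupported arm: {arm}")
    return {"base": base, "arm": arm, "end_effector": end_effector}

# Swap any module name to obtain a different robot configuration.
robot = build_robot("mobile_base", "ur5e", "two_finger_gripper")
```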

Demos


  • The video showcases basic demonstrations within the platform's simulation environment, featuring tasks such as autonomous navigation, object manipulation, and interaction with articulated objects in a household scenario.
  • The entire simulation process is rendered through Blender, delivering high-quality visual effects that enhance the realism and detail of the simulated environment.

Contact

Dr. Yan Ding

Researcher at Shanghai AI Lab

yding25@binghamton.edu

Kui Yang

Master Student, Chongqing University

202414021045T@stu.cqu.edu.cn

BibTeX


@inproceedings{Yang2024BestManAM,
    title={BestMan: A Modular Mobile Manipulator Platform for Embodied AI with Unified Simulation-Hardware APIs},
    author={Kui Yang and Nieqing Cao and Yan Ding and Chao Chen},
    year={2024},
    url={https://api.semanticscholar.org/CorpusID:273403368}
}

Prompt Design for Plan Monitor and Knowledge Acquirer

The realization of our plan monitor relies on repeatedly querying GPT-3 for each action using the following prompt.
Prompt 1: Is it suitable for a robot to [Perform-Action], if [Situation]?

The following templates are used to query an LLM to acquire commonsense knowledge about action effects and suitable substitute objects.
Prompt 2: Is it suitable for a robot to [Perform-Action-with-Object]?
Prompt 3: There are some objects, such as [Object-1], [Object-2], ..., and [Object-N]. Which is the most suitable for [Current-Task], if [Situation]?
Please note that these prompts are zero-shot; no in-context examples are provided.
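The templates above can be instantiated mechanically before each query. The helper functions below are an illustrative sketch; the function names are ours, not from the COWP codebase.

```python
def plan_monitor_prompt(action: str, situation: str) -> str:
    """Instantiate Prompt 1, used by the plan monitor for each action."""
    return f"Is it suitable for a robot to {action}, if {situation}?"

def object_selection_prompt(objects: list, task: str, situation: str) -> str:
    """Instantiate Prompt 3, used to pick a substitute object."""
    listed = ", ".join(objects[:-1]) + f", and {objects[-1]}"
    return (f"There are some objects, such as {listed}. "
            f"Which is the most suitable for {task}, if {situation}?")
```

Each filled-in template would then be sent to the LLM as a plain zero-shot completion query.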

Situation Dataset

Users can download both the MTurk questionnaire and the situation dataset at the top of the website. The dataset is provided as a CSV file comprising 12 separate sheets, each representing situations for a distinct task. The name of the task corresponds to the sheet name. Each sheet consists of five columns.

The situations' descriptions provided by the MTurkers are in Column A. Column B details the corresponding steps where the described situation occurs. Column C is the index of distinguishable situations, while Column D provides descriptions of these situations. Finally, Column E indicates the number of distinguishable situations.
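A sheet with this five-column layout can be loaded with standard tooling. The sketch below uses invented header names and toy rows, since the dataset's actual headers and contents may differ.

```python
import csv
import io

# A toy excerpt standing in for one task sheet (columns A-E as described above).
# Both the header names and the rows are illustrative, not from the real dataset.
sheet = io.StringIO(
    "situation_description,step,situation_index,distinguishable_description,num_distinguishable\n"
    "the cup is cracked,2,1,broken container,3\n"
    "the faucet has no water,1,2,missing resource,3\n"
)

def load_situations(f):
    """Parse one task sheet into a list of per-situation records."""
    return list(csv.DictReader(f))

records = load_situations(sheet)
```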

Experiments

Figure: Task completion percentage of COWP (ours) and five baseline methods on 12 different tasks. The x-axis shows the task name and the y-axis the task completion percentage; each value is averaged over 150 trials. Tasks are sorted by COWP's performance, with its best performance at the far left.

The following lists the prompts and hyperparameters of the baselines used in the evaluation, where [] denotes a placeholder.

from actions import walk <obj>, run <obj>, grab <obj>, switchon <obj>, switchoff <obj>, open <obj>, close <obj>, find <obj>, putin <obj> <obj>, fill <obj> <obj>, clean <obj>, wash <obj>
objects = ['wine', 'bucket', 'dish bowl', 'chips', 'sponge', 'snack', 'kitchencabinet', 'wastecontainer', 'cleaning bottle', 'drinking glass', 'kitchen cabinet', 'dish', 'coffee table', 'blender', 'dining table', 'mug', 'coffee maker', 'dehumidifier', 'air fryer', 'water filter', 'tea', 'dining', 'coffee filter', 'colander', 'orange juice', 'condiment bottle', 'watermelon juice', 'mat', 'closet', 'beer', 'garbagecan', 'cutlery knife', 'ice cream', 'sauce', 'table_1', 'oven tray', 'refrigerator', 'table cloth', 'steak', 'cupboard', 'wineglass', 'kitchen', 'cutting board', 'noodles', 'kitchen table', 'wooden chopstick', 'frying pan', 'cloth napkin', 'piano bench', 'toaster']
def put_the_wine_glass_in_the_kitchen_cabinet():
      # 0: walk to kitchen
      walk('kitchen')
      # 1: find wine glass
      find('wineglass')
      # 2: grab wine glass
      assert('close' to 'wineglass')
      else: find('wineglass')
      grab('wineglass')
      # 3: find kitchen cabinet
      find('kitchencabinet')
      # 4: open kitchen cabinet
      assert('close' to 'kitchencabinet' )
      else: find('kitchencabinet')
      assert('kitchencabinet' is 'closed' )
      else: close('kitchencabinet')
      open('kitchencabinet')
      # 5: put wine glass in kitchen cabinet
      assert('wineglass' in 'hands' )
      else: find('wineglass')
      else: grab('wineglass')
      assert('close' to 'kitchencabinet' )
      else: find('kitchencabinet')
      assert('kitchencabinet' is 'opened' )
      else: open('kitchencabinet')
      putin('wineglass', 'kitchencabinet')
      # 6: close kitchen cabinet
      assert('close' to 'kitchencabinet' )
      else: find('kitchencabinet')
      assert('kitchencabinet' is 'opened' )
      else: open('kitchencabinet')
      close('kitchencabinet')
      # 7: Done

from actions import walk <obj>, run <obj>, grab <obj>, switchon <obj>, switchoff <obj>, open <obj>, close <obj>, find <obj>, putin <obj> <obj>, fill <obj> <obj>, clean <obj>, wash <obj>
objects = ['wine', 'bucket', 'dish bowl', 'chips', 'sponge', 'snack', 'kitchencabinet', 'cleaning bottle', 'drinking glass', 'kitchen cabinet', 'dish', 'coffee table', 'blender', 'dining table', 'mug', 'coffee maker', 'dehumidifier', 'air fryer', 'water filter', 'tea', 'dining', 'coffee filter', 'colander', 'orange juice', 'condiment bottle', 'watermelon juice', 'mat', 'closet', 'beer', 'garbagecan_1', 'cutlery knife', 'ice cream', 'sauce', 'table_1', 'oven tray', 'refrigerator', 'table cloth', 'steak', 'cupboard', 'wineglass', 'kitchen', 'cutting board', 'noodles', 'kitchen table', 'wooden chopstick', 'frying pan', 'cloth napkin', 'garbagecan_2', 'piano bench', 'toaster']
def throw_away_the_lime, where garbagecan_1 is broken():
      # 0: find lime
      find('lime')
      # 1: grab lime
      assert('close' to 'lime')
      else: find('lime')
      grab('lime')
      # 2: find garbage can
      find('garbagecan_1')
      assert('broken' to 'garbagecan_1')
      else: find('garbagecan_2')
      # 3: open garbage can
      assert('close' to 'garbagecan_2' )
      else: find('garbagecan_2')
      assert('garbagecan_2' is 'closed' )
      else: close('garbagecan_2')
      open('garbagecan_2')
      # 4: put lime in garbage can
      assert('lime' in 'hands' )
      else: find('lime')
      else: grab('lime')
      assert('close' to 'garbagecan_2' )
      else: find('garbagecan_2')
      assert('garbagecan_2' is 'opened' )
      else: open('garbagecan_2')
      putin('lime', 'garbagecan_2')
      # 5: close garbage can
      assert('close' to 'garbagecan_2' )
      else: find('garbagecan_2')
      assert('garbagecan_2' is 'opened' )
      else: open('garbagecan_2')
      close('garbagecan_2')
      # 6: Done


from actions import walk <obj>, run <obj>, grab <obj>, switchon <obj>, switchoff <obj>, open <obj>, close <obj>, find <obj>, putin <obj> <obj>, fill <obj> <obj>, clean <obj>, wash <obj>
objects = ['wine', 'bucket', 'dish bowl', 'chips', 'sponge', 'snack', 'kitchencabinet', 'wastecontainer', 'cleaning bottle', 'drinking glass', 'kitchen cabinet', 'dish', 'coffee table', 'blender', 'dining table', 'mug', 'coffee maker', 'dehumidifier', 'air fryer', 'water filter', 'tea', 'dining', 'coffee filter', 'colander', 'orange juice', 'condiment bottle', 'watermelon juice', 'mat', 'closet', 'beer', 'garbagecan', 'cutlery knife', 'ice cream', 'sauce', 'table_1', 'oven tray', 'refrigerator', 'washingsponge', 'table cloth', 'steak', 'cupboard', 'wineglass', 'kitchen', 'cutting board', 'noodles', 'kitchen table', 'wooden chopstick', 'frying pan', 'cloth napkin', 'piano bench', 'toaster', 'dishwashingliquid', 'washingcloth']
def wash_mug, where washingsponge is missing():
      # 0: walk to kitchen
      walk('kitchen')
      # 1: find sink
      find('sink')
      # 2: turn on faucet
      find('faucet')
      assert('close' to 'faucet' )
      else: find('faucet')
      assert('faucet' is 'switchoff' )
      else: switchoff('faucet')
      switchon('faucet')
      # 3: put mug under water
      find('mug')
      assert('close' to 'mug')
      else: find('mug')
      grab('mug')
      find('sink')
      assert('mug' in 'hands' )
      else: find('mug')
      else: grab('mug')
      assert('close' to 'sink' )
      else: find('sink')
      putin('mug', 'sink')
      # 4: grab dishwashing liquid
      find('dishwashingliquid')
      assert('close' to 'dishwashingliquid')
      else: find('dishwashingliquid')
      grab('dishwashingliquid')
      # 5: put dishwashing liquid on mug
      find('sink')
      assert('dishwashingliquid' in 'hands' )
      else: find('dishwashingliquid')
      else: grab('dishwashingliquid')
      assert('close' to 'sink' )
      else: find('sink')
      putin('dishwashingliquid', 'sink')
      # 6: grab washingsponge
      find('washingsponge')
      assert('missing' to 'washingsponge')
      else: find('washingcloth')
      grab('washingcloth')
      # 7: start scrubbing mug
      find('sink')
      assert('washingcloth' in 'hands' )
      else: find('washingcloth')
      else: grab('washingcloth')
      assert('close' to 'sink' )
      else: find('sink')
      putin('washingcloth', 'sink')
      # 8: rinse mug off with water
      # 9: dry mug with towel
      # 10: Done

from actions import walk <obj>, run <obj>, grab <obj>, switchon <obj>, switchoff <obj>, open <obj>, close <obj>, find <obj>, putin <obj> <obj>, fill <obj> <obj>, clean <obj>, wash <obj>
objects = [objects in the environment]
def [task description], where [situation] ():

Here are actions that can be executed by the robot: walk, run, grab, switch on, switch off, open, close, find, put, fill, clean, wash

Human: put the wine glass in the kitchen cabinet
Scene: ['orchid', 'sink', 'peach', 'mouse', 'oven tray', 'hanger', 'clothes pants', 'cupcake', 'power socket', 'bell pepper', 'slippers', 'toaster', 'closet', 'floor', 'pillow', 'door jamb', 'light switch', 'faucet', 'pie', 'bookshelf', 'cutlery fork', 'condiment shaker', 'bathroom counter', 'keyboard', 'cutlery knife', 'bananas', 'washing machine', 'box', 'ceiling', 'creamy buns', 'bed', 'crackers', 'bathroom', 'stove', 'paper', 'condiment bottle', 'lime', 'stove fan', 'washing sponge', 'deodorant', 'radio', 'kitchen', 'toilet', 'fridge', 'bedroom', 'dishwashing liquid', 'kitchen cabinet', 'remote control', 'folder', 'bar soap', 'bench', 'coffee pot', 'frying pan', 'curtains', 'desk', 'door', 'toothpaste', 'computer', 'painkillers', 'towel rack', 'cereal', 'wall', 'wall picture frame', 'bathtub', 'dish bowl', 'living room', 'cabinet', 'ceiling lamp', 'clothes pile', 'cpu screen', 'plum', 'photo frame', 'stall', 'table lamp', 'rug', 'toothbrush', 'coffee table', 'plate', 'water glass', 'chocolate syrup', 'window', 'bathroom cabinet', 'face cream', 'whipped cream', 'closet drawer', 'kitchen counter', 'tv', 'microwave', 'mug', 'perfume', 'salmon', 'candy bar', 'kitchen table', 'coffee maker', 'wall lamp', 'bread slice', 'towel', 'mouse mat', 'apple', 'cellphone', 'wall shelf', 'book', 'sofa', 'chips', 'wall phone', 'kitchen counter drawer', 'clothes shirt', 'candle', 'hair product', 'wine glass', 'garbage can', 'nightstand', 'clock', 'tv stand', 'chair']
Robot:
  0: walk to kitchen
  1: find wine glass
  2: grab wine glass
  3: find kitchen cabinet
  4: open kitchen cabinet
  5: put wine glass in kitchen cabinet
  6: close kitchen cabinet
  7: Done

Human: throw away the lime
Scene: ['garbage can_1 is broken', 'orchid', 'sink', 'peach', 'mouse', 'garbage can_1', 'oven tray', 'hanger', 'clothes pants', 'cupcake', 'power socket', 'bell pepper', 'slippers', 'toaster', 'closet', 'floor', 'pillow', 'door jamb', 'light switch', 'faucet', 'pie', 'bookshelf', 'cutlery fork', 'condiment shaker', 'bathroom counter', 'keyboard', 'cutlery knife', 'bananas', 'washing machine', 'box', 'ceiling', 'creamy buns', 'bed', 'crackers', 'bathroom', 'stove', 'paper', 'condiment bottle', 'lime', 'stove fan', 'washing sponge', 'deodorant', 'radio', 'kitchen', 'toilet', 'fridge', 'bedroom', 'dishwashing liquid', 'kitchen cabinet', 'remote control', 'folder', 'bar soap', 'bench', 'coffee pot', 'frying pan', 'curtains', 'desk', 'door', 'toothpaste', 'computer', 'painkillers', 'towel rack', 'cereal', 'wall', 'wall picture frame', 'bathtub', 'dish bowl', 'living room', 'cabinet', 'ceiling lamp', 'clothes pile', 'cpu screen', 'plum', 'photo frame', 'stall', 'table lamp', 'rug', 'toothbrush', 'coffee table', 'plate', 'water glass', 'chocolate syrup', 'window', 'bathroom cabinet', 'face cream', 'whipped cream', 'closet drawer', 'kitchen counter', 'tv', 'microwave', 'mug', 'perfume', 'salmon', 'candy bar', 'kitchen table', 'coffee maker', 'wall lamp', 'bread slice', 'towel', 'mouse mat', 'apple', 'cellphone', 'wall shelf', 'book', 'sofa', 'chips', 'wall phone', 'kitchen counter drawer', 'clothes shirt', 'candle', 'hair product', 'wine glass', 'garbage can_2', 'nightstand', 'clock', 'tv stand', 'chair']
Robot:
  0: find lime
  1: grab lime
  2: find garbage can_2
  3: open garbage can_2
  4: put lime in garbage can_2
  5: close garbage can_2
  6: Done

Human: wash mug
Scene: ['washing sponge is missing', 'orchid', 'sink', 'peach', 'mouse', 'oven tray', 'hanger', 'clothes pants', 'cupcake', 'power socket', 'bell pepper', 'slippers', 'toaster', 'closet', 'floor', 'pillow', 'door jamb', 'light switch', 'faucet', 'pie', 'bookshelf', 'cutlery fork', 'condiment shaker', 'bathroom counter', 'keyboard', 'cutlery knife', 'bananas', 'washing machine', 'box', 'ceiling', 'creamy buns', 'bed', 'crackers', 'bathroom', 'stove', 'paper', 'condiment bottle', 'lime', 'stove fan', 'washing sponge', 'deodorant', 'radio', 'kitchen', 'toilet', 'fridge', 'bedroom', 'dishwashing liquid', 'kitchen cabinet', 'remote control', 'folder', 'bar soap', 'bench', 'coffee pot', 'frying pan', 'curtains', 'desk', 'door', 'toothpaste', 'computer', 'painkillers', 'towel rack', 'cereal', 'wall', 'wall picture frame', 'bathtub', 'dish bowl', 'living room', 'cabinet', 'ceiling lamp', 'clothes pile', 'cpu screen', 'plum', 'photo frame', 'stall', 'table lamp', 'rug', 'toothbrush', 'coffee table', 'plate', 'water glass', 'chocolate syrup', 'window', 'bathroom cabinet', 'face cream', 'whipped cream', 'closet drawer', 'kitchen counter', 'tv', 'microwave', 'mug', 'perfume', 'salmon', 'candy bar', 'kitchen table', 'coffee maker', 'wall lamp', 'bread slice', 'towel', 'mouse mat', 'apple', 'cellphone', 'wall shelf', 'book', 'sofa', 'chips', 'wall phone', 'kitchen counter drawer', 'clothes shirt', 'candle', 'hair product', 'wine glass', 'garbage can', 'nightstand', 'clock', 'tv stand', 'chair']
Robot:
  0: walk to kitchen
  1: find sink
  2: switch on faucet
  3: put mug in sink
  4: grab dishwashing liquid
  5: put dishwashing liquid in sink
  6: grab washing cloth
  7: put washing cloth in sink
  8: wash mug
  9: Done


Scene: [situation, objects in the environment]
Human: [task description]
Robot:

planning_lm_id = 'text-davinci-003'
translation_lm_id = 'stsb-roberta-large'
MAX_STEPS = 12  # maximum number of steps to be generated
CUTOFF_THRESHOLD = 0.5  # early stopping threshold based on matching score and likelihood score
P = 0.5  # hyperparameter for early stopping heuristic to detect whether Planning LM believes the plan is finished
BETA = 0.3  # weighting coefficient used to rank generated samples
sampling_params = {
    "max_tokens": 256,
    "temperature": 0.9,
    "top_p": 0.9,
    "n": 10,
    "logprobs": 1,
    "presence_penalty": 0.5,
    "frequency_penalty": 0.3,
    "stop": '\n'
}
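For context, the hyperparameters above follow the zero-shot planner baseline's sample-ranking scheme: each generated action is scored by a matching score plus BETA times its log-probability, and planning stops early when the best score falls below CUTOFF_THRESHOLD. The function below is our simplified sketch of that scoring, not the baseline's exact code.

```python
def rank_samples(samples):
    """Pick the best of several generated actions, or None to stop early.
    Each sample is (action_text, matching_score, mean_log_prob)."""
    BETA = 0.3              # weighting coefficient for the likelihood term
    CUTOFF_THRESHOLD = 0.5  # early-stopping threshold on the combined score
    scored = [(match + BETA * logprob, action)
              for action, match, logprob in samples]
    best_score, best_action = max(scored)
    if best_score < CUTOFF_THRESHOLD:
        return None  # no sample is good enough; treat the plan as finished
    return best_action

# Toy candidates: (text, matching score, mean log-prob), values invented.
candidates = [("find mug", 0.9, -0.2), ("walk to kitchen", 0.6, -1.0)]
```

With these toy numbers, "find mug" scores 0.9 + 0.3 × (−0.2) = 0.84 and is selected, while a lone low-scoring candidate would trigger early stopping.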

Conclusion

In this paper, we developed COWP, a Large Language Model-based open-world task planning system for robots, aimed at robust task planning and situation handling in open worlds. The novelty of COWP lies in integrating a classical, knowledge-based task planning system with a pretrained language model for commonsense knowledge acquisition. This combination enables COWP to ground domain-independent commonsense knowledge in specific task planning problems. To evaluate COWP systematically, we collected a situation dataset comprising 1085 situations in a dining domain. Experimental results show that COWP outperformed existing task planners developed for both closed-world and open-world scenarios. We also demonstrated COWP on a mobile manipulator performing delivery tasks, providing a reference for practitioners deploying COWP in real-world applications.

BibTeX


    @article{ding2023integrating,
      title={Integrating Action Knowledge and LLMs for Task Planning and Situation Handling in Open Worlds},
      author={Ding, Yan and Zhang, Xiaohan and Amiri, Saeid and Cao, Nieqing and Yang, Hao and Kaminski, Andy and Esselink, Chad and Zhang, Shiqi},
      journal={Autonomous Robots},
      year={2023}
    }