[Robotics] Open X-Embodiment (RT-X) Dataset: Download, Usage, and Analysis

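# 0 Dataset overview
- Dataset list (spreadsheet): https://docs.google.com/spreadsheets/d/1rPBD77tk60AEIGZrGSODwyyzs5FgCU9Uz3h-3_t2A9g/edit?gid=0#gid=0
- Download: the datasets are fetched with `gsutil`, the Google Cloud Storage command-line tool. If it is not installed, install it with `pip install gsutil`, not `apt install`: the apt package of the same name is a different, unrelated CLI. Once installed, download a dataset with `gsutil -m cp -r gs://gresearch/robotics/[dataset_name] .`. The dataset names are listed in column S ("Registered Dataset Name") of the spreadsheet above.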
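- Dataset scale: the collection is very broad and, according to the article linked here, is currently the largest dataset in robotics. It contains 2,419,193 countable episodes (trajectory segments) and totals 8964.94 GB. https://blog.csdn.net/OpenDataLab/article/details/134399456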
- Storage format: the datasets are stored in the RLDS standard (https://github.com/google-research/rlds , https://zhuanlan.zhihu.com/p/1892304095701873850 , https://research.google/blog/rlds-an-ecosystem-to-generate-share-and-use-datasets-in-reinforcement-learning/ ), which is TensorFlow-compatible. To use the data with PyTorch, convert it through NumPy first. The visualization code is in Annexe 1.
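- Data schema: constructing a builder and printing `builder.info` shows that every dataset exposes a standard format description (see Annexe 2). Using the spreadsheet columns (joint position || EEF position || EEF velocity, number of cameras, and so on), you can locate the matching entries in the `FeaturesDict`, then analyze and use them. Note that all images have been cropped and compressed, so the resolution is modest and roughly matches the input size of the image encoder in use, for example 224*224*3 or 256*320*3.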
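The download step above can be scripted. The following is a minimal sketch that only assembles the `gsutil` command quoted in the text and, by default, prints it without running it. The helper names `gsutil_copy_command` and `download` are hypothetical, and `jaco_play` is used as the example dataset name because it is the one loaded in Annexe 1.

```python
import shlex
import subprocess

# Bucket prefix taken from the gsutil command quoted in the text.
GCS_ROOT = "gs://gresearch/robotics"


def gsutil_copy_command(dataset_name, dest="."):
    """Build (but do not run) the gsutil command for one dataset."""
    return ["gsutil", "-m", "cp", "-r", f"{GCS_ROOT}/{dataset_name}", dest]


def download(dataset_name, dest=".", dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    cmd = gsutil_copy_command(dataset_name, dest)
    print(shlex.join(cmd))
    if not dry_run:
        # Requires gsutil (installed via pip) to be on PATH.
        subprocess.run(cmd, check=True)
    return cmd


# Dry run: shows the command that would be executed.
download("jaco_play", dry_run=True)
```

Keeping `dry_run=True` as the default is a deliberate choice here: some of the datasets are hundreds of GiB, so it seems safer to inspect the command before starting a transfer.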

![](/media/202602/37915c7a-31b5-4517-a82c-f8e8162ca46a_20260228115439070845.gif)

![](/media/202602/a70a194f-ae6b-4867-bf3c-4ff88375724a_20260228115451010633.gif)

# 1 Using the dataset
- All datasets follow the RLDS standard. The RLDS data structure is defined as follows:

![](/media/202602/1280X1280_20260228115717464610.PNG)
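As a rough textual companion to the diagram, the sketch below mimics the episode/step nesting with plain Python dicts and walks it once. The per-step field names (`observation`, `action`, `reward`, `is_first`, `is_last`, `is_terminal`) are the ones printed under `steps` in Annexe 2. The helper `summarize_episode` and the toy values are assumptions for illustration; a real episode holds a `tf.data.Dataset` under `steps` rather than a list.

```python
def summarize_episode(episode):
    """Walk the steps of one RLDS-style episode and report basic facts."""
    steps = list(episode["steps"])
    if not steps:
        raise ValueError("episode has no steps")
    last = steps[-1]
    return {
        "num_steps": len(steps),
        "episode_return": sum(float(s["reward"]) for s in steps),
        "starts_with_is_first": bool(steps[0]["is_first"]),
        # is_last without is_terminal means the episode was cut off
        # (truncated) instead of reaching a terminal state.
        "truncated": bool(last["is_last"]) and not bool(last["is_terminal"]),
    }


# Toy episode: three steps, reward only on the final, terminal step.
toy_episode = {
    "steps": [
        {"observation": {"image": None}, "action": [0.0], "reward": 0.0,
         "is_first": True, "is_last": False, "is_terminal": False},
        {"observation": {"image": None}, "action": [0.1], "reward": 0.0,
         "is_first": False, "is_last": False, "is_terminal": False},
        {"observation": {"image": None}, "action": [0.0], "reward": 1.0,
         "is_first": False, "is_last": True, "is_terminal": True},
    ]
}

summary = summarize_episode(toy_episode)
print(summary)
```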

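# 2 How to define and use your own robot dataset
- The overall data pipeline for a self-built dataset, and the problems that need to be solved along the way, are shown in the figure below: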

![](/media/202602/whiteboard_exported_image_20260228115727842818.png)
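One step that any such pipeline likely needs is packing a recorded trajectory into RLDS-style steps. The sketch below is an assumption about how that could look, not a description of the pipeline in the figure: `trajectory_to_episode` and the `episode_metadata` key are hypothetical, and observations and actions are treated as opaque per-timestep objects.

```python
def trajectory_to_episode(observations, actions, rewards=None,
                          terminated=True, metadata=None):
    """Pack parallel per-timestep lists into an RLDS-style episode dict.

    terminated=False marks the episode as truncated: the final step gets
    is_last=True but is_terminal=False.
    """
    n = len(observations)
    if n == 0:
        raise ValueError("trajectory is empty")
    if len(actions) != n:
        raise ValueError("observations and actions differ in length")
    rewards = [0.0] * n if rewards is None else list(rewards)
    if len(rewards) != n:
        raise ValueError("rewards and observations differ in length")

    steps = []
    for i in range(n):
        is_last = i == n - 1
        steps.append({
            "observation": observations[i],
            "action": actions[i],
            "reward": float(rewards[i]),
            "is_first": i == 0,
            "is_last": is_last,
            "is_terminal": is_last and terminated,
        })
    # "episode_metadata" is a hypothetical container for per-episode fields.
    return {"episode_metadata": dict(metadata or {}), "steps": steps}


# Toy trajectory with two timesteps; the values are placeholders.
episode = trajectory_to_episode(
    observations=[{"image": "frame0"}, {"image": "frame1"}],
    actions=[[0.0, 0.1], [0.0, 0.0]],
    rewards=[0.0, 1.0],
    metadata={"robot": "my_robot"},
)
print(len(episode["steps"]), episode["steps"][-1]["is_terminal"])
```

Writing the result to disk in a form that `tfds.builder_from_directory` can load is a separate step and is not covered by this sketch.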

# Annexe
1. Data visualization and analysis

``` python
import numpy as np
import tensorflow_datasets as tfds
from PIL import Image
from IPython import display
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

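# Save a list of PIL images as a looping GIF and return the file contents.
# `duration` is the display time of each frame in milliseconds.
def as_gif(images, path='temp.gif', duration=10):
    images[0].save(path, save_all=True, append_images=images[1:], loop=0, duration=duration)
    with open(path, 'rb') as f:
        return f.read()


# Fail early, listing the available observation features, if the key is missing.
def check_key_exist(builder, display_key):
    observation_features = builder.info.features['steps']['observation']
    if display_key not in observation_features:
        raise ValueError(f'Key {display_key!r} not found in: {observation_features}')


# Collect one camera stream from an episode, write it to `path` as a GIF,
# and render it inline when running in a notebook.
def save_image(builder, episode, display_key, path):
    check_key_exist(builder, display_key)
    images1 = [step['observation'][display_key] for step in episode['steps']]
    print('image len: ', display_key, len(images1))
    images2 = [Image.fromarray(image.numpy()) for image in images1]
    # A bare display.Image(...) inside a function is discarded; display() renders it.
    display.display(display.Image(as_gif(images2, path)))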

builder = tfds.builder_from_directory(builder_dir='/home/hihonor/Datas/jaco_play/0.1.0')

print(builder.info.features['steps']['observation'])

print(builder.info)
  
#ds = builder.as_dataset(split='train[:10]').shuffle(10)
ds = builder.as_dataset(split='train')

print('nb_episode: ', len(ds))

# episode = next(iter(ds))

episode_index = 5
episode = next(iter(ds.skip(episode_index).take(1)))

save_image(builder, episode, 'image', 'image.gif')
save_image(builder, episode, 'image_wrist', 'image_wrist.gif')

# builder
# builder.info
# builder.info.features
# builder.info.features['steps']
# builder.info.features['steps']['observation']
# len(builder.as_dataset(split='train[0:]'))
# episode
# episode['steps']
# len(images1), images1
```
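For the PyTorch route mentioned in section 0, the sketch below shows one way to go through NumPy. `episode_to_arrays` works on any iterable of step dicts whose leaves are NumPy-convertible. `load_episode_as_torch` is a hypothetical helper that is only defined here, not called; it assumes TensorFlow Datasets and PyTorch are installed and uses `Dataset.as_numpy_iterator()` to obtain NumPy values for each step.

```python
import numpy as np


def episode_to_arrays(steps, image_key="image"):
    """Stack per-step images and rewards of one episode into NumPy arrays."""
    images, rewards = [], []
    for step in steps:
        images.append(np.asarray(step["observation"][image_key]))
        rewards.append(np.asarray(step["reward"], dtype=np.float32))
    return {
        "images": np.stack(images),    # shape (T, H, W, C)
        "rewards": np.stack(rewards),  # shape (T,)
    }


def load_episode_as_torch(builder_dir, episode_index=0, image_key="image"):
    """Hypothetical helper: read one episode and convert it to torch tensors."""
    import tensorflow_datasets as tfds  # imported lazily; needs TensorFlow
    import torch

    builder = tfds.builder_from_directory(builder_dir=builder_dir)
    ds = builder.as_dataset(split="train").skip(episode_index).take(1)
    episode = next(iter(ds))
    arrays = episode_to_arrays(episode["steps"].as_numpy_iterator(), image_key)
    # Many PyTorch vision models expect float tensors laid out as (T, C, H, W).
    images = torch.from_numpy(arrays["images"]).permute(0, 3, 1, 2).float() / 255.0
    return images, torch.from_numpy(arrays["rewards"])


# Self-contained check with fake steps (no TensorFlow needed).
fake_steps = [
    {"observation": {"image": np.zeros((4, 5, 3), dtype=np.uint8)}, "reward": 0.0}
    for _ in range(3)
]
arrays = episode_to_arrays(fake_steps)
print(arrays["images"].shape, arrays["rewards"].shape)
```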

2. Data format (using fractal20220817_data as an example)

``` 
FeaturesDict({
    'base_pose_tool_reached': Tensor(shape=(7,), dtype=float32),
    'gripper_closed': Tensor(shape=(1,), dtype=float32),
    'gripper_closedness_commanded': Tensor(shape=(1,), dtype=float32),
    'height_to_bottom': Tensor(shape=(1,), dtype=float32),
    'image': Image(shape=(256, 320, 3), dtype=uint8),
    'natural_language_embedding': Tensor(shape=(512,), dtype=float32),
    'natural_language_instruction': string,
    'orientation_box': Tensor(shape=(2, 3), dtype=float32),
    'orientation_start': Tensor(shape=(4,), dtype=float32),
    'robot_orientation_positions_box': Tensor(shape=(3, 3), dtype=float32),
    'rotation_delta_to_go': Tensor(shape=(3,), dtype=float32),
    'src_rotation': Tensor(shape=(4,), dtype=float32),
    'vector_to_go': Tensor(shape=(3,), dtype=float32),
    'workspace_bounds': Tensor(shape=(3, 3), dtype=float32),
})
tfds.core.DatasetInfo(
    name='fractal20220817_data',
    full_name='fractal20220817_data/0.1.0',
    description="""
    Table-top manipulation with 17 objects
    """,
    homepage='https://ai.googleblog.com/2022/12/rt-1-robotics-transformer-for-real.html',
    data_path='/home/hihonor/Datas/fractal20220817_data/0.1.0',
    file_format=tfrecord,
    download_size=Unknown size,
    dataset_size=111.07 GiB,
    features=FeaturesDict({
        'aspects': FeaturesDict({
            'already_success': bool,
            'feasible': bool,
            'has_aspects': bool,
            'success': bool,
            'undesirable': bool,
        }),
        'attributes': FeaturesDict({
            'collection_mode': int64,
            'collection_mode_name': string,
            'data_type': int64,
            'data_type_name': string,
            'env': int64,
            'env_name': string,
            'location': int64,
            'location_name': string,
            'objects_family': int64,
            'objects_family_name': string,
            'task_family': int64,
            'task_family_name': string,
        }),
        'steps': Dataset({
            'action': FeaturesDict({
                'base_displacement_vector': Tensor(shape=(2,), dtype=float32),
                'base_displacement_vertical_rotation': Tensor(shape=(1,), dtype=float32),
                'gripper_closedness_action': Tensor(shape=(1,), dtype=float32),
                'rotation_delta': Tensor(shape=(3,), dtype=float32),
                'terminate_episode': Tensor(shape=(3,), dtype=int32),
                'world_vector': Tensor(shape=(3,), dtype=float32),
            }),
            'is_first': bool,
            'is_last': bool,
            'is_terminal': bool,
            'observation': FeaturesDict({
                'base_pose_tool_reached': Tensor(shape=(7,), dtype=float32),
                'gripper_closed': Tensor(shape=(1,), dtype=float32),
                'gripper_closedness_commanded': Tensor(shape=(1,), dtype=float32),
                'height_to_bottom': Tensor(shape=(1,), dtype=float32),
                'image': Image(shape=(256, 320, 3), dtype=uint8),
                'natural_language_embedding': Tensor(shape=(512,), dtype=float32),
                'natural_language_instruction': string,
                'orientation_box': Tensor(shape=(2, 3), dtype=float32),
                'orientation_start': Tensor(shape=(4,), dtype=float32),
                'robot_orientation_positions_box': Tensor(shape=(3, 3), dtype=float32),
                'rotation_delta_to_go': Tensor(shape=(3,), dtype=float32),
                'src_rotation': Tensor(shape=(4,), dtype=float32),
                'vector_to_go': Tensor(shape=(3,), dtype=float32),
                'workspace_bounds': Tensor(shape=(3, 3), dtype=float32),
            }),
            'reward': Scalar(shape=(), dtype=float32),
        }),
    }),
    supervised_keys=None,
    disable_shuffling=False,
    splits={
        'train': <SplitInfo num_examples=87212, num_shards=1024>,
    },
    citation="""@article{brohan2022rt,
      title={Rt-1: Robotics transformer for real-world control at scale},
      author={Brohan, Anthony and Brown, Noah and Carbajal, Justice and Chebotar, Yevgen and Dabis, Joseph and Finn, Chelsea and Gopalakrishnan, Keerthana and Hausman, Karol and Herzog, Alex and Hsu, Jasmine and others},
      journal={arXiv preprint arXiv:2212.06817},
      year={2022}
    }""",
)
nb_episode:  87212
```
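Once the schema is known, individual fields can be pulled out by name. As an illustration, the sketch below concatenates three of the `action` entries listed above into a single vector. The choice of which keys form an "arm action" (`ARM_ACTION_KEYS`) and the helper `flatten_action` are assumptions made for this example; the numeric values are placeholders, not data from the dataset.

```python
import numpy as np

# Assumed selection: translation (3) + rotation (3) + gripper (1).
ARM_ACTION_KEYS = ("world_vector", "rotation_delta", "gripper_closedness_action")


def flatten_action(action, keys=ARM_ACTION_KEYS):
    """Concatenate the selected action fields into one float32 vector."""
    parts = [np.asarray(action[k], dtype=np.float32).reshape(-1) for k in keys]
    return np.concatenate(parts)


# Placeholder action with the same keys and shapes as the schema above.
example_action = {
    "world_vector": [0.01, 0.0, -0.02],
    "rotation_delta": [0.0, 0.0, 0.1],
    "gripper_closedness_action": [1.0],
    "base_displacement_vector": [0.0, 0.0],
    "base_displacement_vertical_rotation": [0.0],
    "terminate_episode": [0, 0, 1],
}

vec = flatten_action(example_action)
print(vec.shape)  # (7,)
```

Other datasets in the collection use different key names and shapes, so the key tuple would need to be adapted after checking each dataset's own `builder.info`.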