Since an LMDB environment cannot be pickled, the error TypeError: can't pickle Environment object occurs when we naively open LMDB inside data.Dataset and then wrap the data.DataLoader with Distributed Data Parallel (DDP), because the dataset object must be pickled and sent to each worker process. To resolve the error, we need to delay opening the LMDB environment until after the dataset object has been constructed:
# This code is modified from https://raw.githubusercontent.com/rmccorm4/PyTorch-LMDB/master
import os

import lmdb
import numpy as np
from torch.utils import data


class my_dataset_LMDB(data.Dataset):
    def __init__(self, db_path, file_paths):
        self.db_path = db_path
        self.file_paths = file_paths  # list of LMDB keys, one per sample
        # Delay opening LMDB until after initialization to avoid the
        # "can't pickle Environment object" error
        self.env = None
        self.txn = None

    def _init_db(self):
        self.env = lmdb.open(self.db_path, subdir=os.path.isdir(self.db_path),
                             readonly=True, lock=False,
                             readahead=False, meminit=False)
        self.txn = self.env.begin()

    def read_lmdb(self, key):
        lmdb_data = self.txn.get(key.encode())
        lmdb_data = np.frombuffer(lmdb_data)  # dtype defaults to float64
        return lmdb_data

    def __getitem__(self, index):
        # Open the LMDB environment lazily, once per worker process
        if self.env is None:
            self._init_db()
        file_name = self.file_paths[index]
        data = self.read_lmdb(file_name)
        ...  # further processing (reshape, transforms, labels) goes here
        return data

    def __len__(self):
        # Required by map-style datasets and by DistributedSampler
        return len(self.file_paths)
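With lazy initialization, the dataset object holds only picklable attributes at the moment the DataLoader workers are created (env and txn are still None), so each worker opens its own environment on its first __getitem__ call. Below is a minimal usage sketch under DDP; the database path, key list, and batch size are placeholder values for illustration, not from the original code:

    from torch.utils.data import DataLoader
    from torch.utils.data.distributed import DistributedSampler

    # Hypothetical path and keys, for illustration only
    dataset = my_dataset_LMDB(db_path='train.lmdb',
                              file_paths=['sample_000', 'sample_001'])
    sampler = DistributedSampler(dataset)  # requires torch.distributed to be initialized
    loader = DataLoader(dataset, batch_size=32, sampler=sampler, num_workers=4)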
Tip. The option lock=False for lmdb.open(...) also fixes the error MDB_READERS_FULL: Environment maxreaders limit reached: it disables LMDB's reader-lock table, so the reader slots that many DataLoader workers would otherwise register are never consumed.
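If you would rather keep locking enabled, py-lmdb also accepts a max_readers argument on lmdb.open that raises the reader-table limit instead; a sketch, where 512 is an arbitrary value:

    env = lmdb.open(db_path, subdir=os.path.isdir(db_path),
                    readonly=True, max_readers=512)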