

Datadynamics contains several metrics for comparing similarity between simulations.

Optimal Transport Dataset Distance (OTDD)

otdd(d1_collections_filename, d2_collections_filename, include_timestamps=True)

Optimal transport dataset distance between two collections over time.

The collections must be of equal length and created using datadynamics.utils.post_processing.save_collections. We use Microsoft's OTDD library to compute the distance between the collections for each timestamp to see how the distance changes over time during the simulation.


This function requires the OTDD library, which is not installed by default with datadynamics.


Any timestamp for which the distance cannot be computed is skipped, so the returned lists may be shorter than the collections. The OTDD values will likely not be affected by whether timestamps are included in the input features.
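The skipping behavior described above can be sketched without the OTDD dependency. This is only an illustration under stated assumptions: `prefix_distances` and `mean_gap` are hypothetical names, and `mean_gap` is a stand-in distance (not OTDD) chosen so that the example runs with the standard library alone. Like the source listing further down, it evaluates the distance on growing prefixes and records a timestamp only when the computation succeeds.

```python
from statistics import mean


def mean_gap(xs, ys):
    """Stand-in distance (NOT OTDD): absolute difference of the means."""
    if len(xs) < 2:
        # Mimic a distance that is undefined for very small samples.
        raise ValueError("need at least 2 samples")
    return abs(mean(xs) - mean(ys))


def prefix_distances(timestamps, xs, ys, distance=mean_gap):
    """Compute a distance on growing prefixes, skipping any step that fails."""
    completed, distances = [], []
    for i in range(1, len(timestamps) + 1):
        try:
            d = distance(xs[:i], ys[:i])
        except Exception as e:
            print(f"Skipping {i} due to {e}...")
            continue
        # Only record the timestamp when a distance was actually produced,
        # so both lists always have the same length.
        completed.append(timestamps[i - 1])
        distances.append(d)
    return completed, distances


completed, distances = prefix_distances(
    [0, 1, 2], [1.0, 2.0, 3.0], [1.0, 4.0, 7.0]
)
```

The first step fails because the stand-in distance needs two samples, so the timestamp `0` is absent from `completed` and the two returned lists stay aligned.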


Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `d1_collections_filename` | `str` | The filename of the first collection. | required |
| `d2_collections_filename` | `str` | The filename of the second collection. | required |
| `include_timestamps` | `bool` | Whether to include timestamps in the input features. | `True` |



Returns:

| Name | Type | Description |
| --- | --- | --- |
|  | `tuple` | A tuple of two lists. The first list contains the timestamps for which the distance was computed. The second list contains the distances for each timestamp. |
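Because timestamps can be skipped, the two returned lists are best consumed together. The following is a minimal sketch of one way to do that; `summarize` is a hypothetical helper, the input values are placeholders rather than real OTDD results, and the gap detection assumes timestamps are consecutive integers starting at 0, which the documentation above does not state.

```python
def summarize(timestamps, distances):
    """Pair timestamps with distances and report the peak and any gaps."""
    if len(timestamps) != len(distances):
        raise ValueError("timestamps and distances must have equal length")
    by_timestamp = dict(zip(timestamps, distances))
    # Timestamp at which the distance is largest.
    peak = max(by_timestamp, key=by_timestamp.get)
    # Assumption: timestamps are consecutive integers starting at 0.
    expected = set(range(max(timestamps) + 1))
    skipped = sorted(expected - set(timestamps))
    return by_timestamp, peak, skipped


# Placeholder values standing in for the output of otdd(...).
timestamps = [0, 1, 3, 4]
distances = [0.0, 0.5, 1.25, 1.0]
by_timestamp, peak, skipped = summarize(timestamps, distances)
```

Keeping the pairing in a dictionary avoids indexing `distances` by timestamp, which would be wrong once a timestamp has been skipped.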

Source code in datadynamics/utils/metrics/
def otdd(
    d1_collections_filename, d2_collections_filename, include_timestamps=True
):
    """Optimal transport dataset distance between two collections over time.

    The collections must be of equal length and created using
    datadynamics.utils.post_processing.save_collections. We use Microsoft's
    OTDD library to compute the distance between the collections for each
    timestamp to see how the distance changes over time during the simulation.

    Note:
        This function requires the OTDD library to be installed which is not
        included by default in datadynamics.

    Note:
        We skip any timestamps for which the distance cannot be computed.
        Also, the OTDD values will likely not be affected by whether or not
        timestamps are included in the input features.

    Args:
        d1_collections_filename (str): The filename of the first collection.
        d2_collections_filename (str): The filename of the second collection.
        include_timestamps (bool, optional): Whether to include timestamps in
            the input features. Defaults to True.

    Returns:
        tuple: A tuple of two lists. The first list contains the timestamps
            for which the distance was computed. The second list contains the
            distances for each timestamp.
    """
    with open(d1_collections_filename, "rb") as f:
        d1_collections = pickle.load(f)
    with open(d2_collections_filename, "rb") as f:
        d2_collections = pickle.load(f)

    d1_timestamps, d1_feats, d1_targets = extract.feats_targets_timestamps(
        d1_collections, include_timestamps
    )
    d2_timestamps, d2_feats, d2_targets = extract.feats_targets_timestamps(
        d2_collections, include_timestamps
    )
    n1, n2 = len(d1_timestamps), len(d2_timestamps)
    assert n1 == n2, "The collections must be of equal length."

    completed_timestamps = []
    distances = []

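    for i in tqdm.tqdm(range(1, n1 + 1), desc="Computing OTDD"):
        try:
            # Compare the first i samples of each collection.
            d1 = dataset_from_numpy(d1_feats[:i], d1_targets[:i])
            d2 = dataset_from_numpy(d2_feats[:i], d2_targets[:i])
            dist = DatasetDistance(d1, d2, inner_ot_method="exact")
            d = dist.distance(maxsamples=1000)
            # Record the distance alongside its timestamp so that the two
            # returned lists stay aligned.
            distances.append(d)
            completed_timestamps.append(d1_timestamps[i - 1])
        except Exception as e:
            # Skip timestamps for which the distance cannot be computed.
            print(f"Skipping {i} due to {e}...")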

    return completed_timestamps, distances