Simulating Data Collection Dynamics in a Plane

In this tutorial, we will explore how to use Datadynamics to simulate data collection tasks in a plane-based environment.

Plane-Based Environments

The plane-based environment provides a simple space for quick iterations and testing of new strategies. The environment is based on a two-dimensional Euclidean space (using (x, y) coordinates), where agents can navigate the environment with a cost proportional to the Euclidean distance travelled. Although this environment does not allow for complex terrain and obstacle representations, it is optimized for performance.

Example of Plane-Based Collector Environment

In the following example, we will define a simple a Euclidean plane consisting of 300 points sampled uniformly at random in the range [0, 10] for each dimension. We also define two agents that start at coordinates (0, 0) and (1, 1), respectively. The agents can collect a maximum of 120 and 180 points. The behavior of the agents are specified by a dummy policy that simply cycles through the available actions (i.e. collecting a point) in a round-robin fashion.

import numpy as np

from datadynamics.environments import collector_v0
from datadynamics.policies import dummy_policy_v0

env = collector_v0.env(
    point_positions=np.random.uniform(0, 10, (300, 2)),
    init_agent_positions=np.array([[0, 0], [1, 1]]),
    max_collect=[120, 180],
    render_mode="human",
)
policy = dummy_policy_v0.policy(env=env)
env.reset()
for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    action = policy.action(observation, agent)
    env.step(action)

Example of Wrapper Environment

Datadynamics also provides a wrapper environment that can be used to randomly generate points based on a given sampler and a given number of points. In the following example, we will again define a simple Euclidean plane consisting of 300 points, but this time sampled randomly from a standard Normal. Agents can again collect a maximum of 120 and 180 points. The behavior of the agents are, however, instead specified by a greedy policy that always collects the point with the highest expected reward.

from datadynamics.environments import collector_v0
from datadynamics.policies import greedy_policy_v0

env = collector_v0.env(
    n_points=300, n_agents=2, max_collect=[120, 180], render_mode="human"
)
policy = greedy_policy_v0.policy(env=env)
env.reset()
for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    action = policy.action(observation, agent)
    env.step(action)