Reference Manual

Experiment

The experiment tag is the parent of all other tags in the specification. Its only required attribute is boot, which must reference a valid Kollaps Docker image.

Example:

<?xml version="1.0" encoding="UTF-8" ?>
<experiment boot="kollaps:2.0">
</experiment>
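
Because experiment is the parent of all other tags, a complete specification nests the remaining sections of this manual inside it. The outline below is only a sketch: the links wrapper tag is assumed by analogy with services and bridges, and the order of the sections is not prescribed by this manual.

<?xml version="1.0" encoding="UTF-8" ?>
<experiment boot="kollaps:2.0">
    <services>
        <!-- service declarations go here -->
    </services>
    <bridges>
        <!-- bridge declarations go here -->
    </bridges>
    <links>
        <!-- link declarations go here (wrapper tag name is an assumption) -->
    </links>
</experiment>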

The Kollaps image specified in the boot attribute contains the emulation core program that will execute in parallel with every service. There will be one emulation core instance for every service replica declared in the experiment.

The emulation core is responsible for performing the emulation in a decentralized fashion, and will enforce the network characteristics at every service container using Linux tc (see man 8 tc).

To bootstrap an experiment, a service called bootstrapper will be created, using the Kollaps image, with a replica at each physical node in the Swarm cluster.

This service will then create a higher-privileged container, called “god”, at each node in the Swarm cluster, using the same Kollaps image. The god container has enough privileges to inject the emulation core instances into the network namespaces of other containers, allowing them to monitor and adjust the network at each container while executing inside the god container.

The god containers will be assigned random names, and are not services in the Swarm, but rather standalone containers. God containers will however terminate once the experiment stack is removed.

You can see the logs of the locally executed emulation cores by checking the logs of the local god container.

Config

In the config tag you can specify three parameters. If a parameter is not specified, its default value is used.

shortest_path determines how the shortest paths between services are calculated. Two options exist: hop and latency. The first uses the number of hops (links) and the second uses latency as the metric in Dijkstra's algorithm.

pool_period is the time, in seconds, that each emulation core sleeps before making new network calculations.

max_age is the number of pool_period intervals for which previously received metadata remains in the network calculations.

The last two parameters should be adjusted to the needs of the deployment. If there are many physical machines (5+), or if the physical machines in the experiment have slow CPUs, use a longer pool_period. If the deployment has a large number of services (50+), use a longer max_age.

Example:

<config>
<property shortest_path="hop" />
<property pool_period="0.05" />
<property max_age="2" />
</config>
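
As an illustration of the tuning advice above, a deployment spread over many machines with many services might route by latency and relax both timing parameters. The values below are illustrative assumptions, not recommended settings or defaults:

<config>
<!-- use link latency rather than hop count as the path metric -->
<property shortest_path="latency" />
<!-- longer sleep between network calculations (hypothetical value) -->
<property pool_period="0.1" />
<!-- keep received metadata for more periods (hypothetical value) -->
<property max_age="4" />
</config>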

Bridges

Bridges correspond to generic networking devices that can bridge together multiple links. Incoming connections on a link can be forwarded to any other link attached to the same bridge.

Bridges are specified with the bridge tag, and must have a unique name attribute.

Bridge declaration example:

<bridges>
    <bridge name="s1"/>
    <bridge name="s2"/>
</bridges>

Services

A service is a group of containers running the same Docker image, and corresponds to the same term (“service”) in Docker terminology.

Services are declared with the service tag and must specify a unique name and a valid Docker image. Optionally, the following attributes can also be specified: replicas, command, share, supervisor and port.

The replicas attribute specifies how many replicas of the service should be created. When absent, it defaults to 1.

The command attribute can be used to override the command (in Docker terminology) that is passed to the container. When absent the containers use their default command.

The share attribute is a boolean value that only makes sense to use when there are multiple replicas. When set to true, it specifies that the replicas should share the link that is attached to them. This means that the link attached to the service is instead connected to an invisible bridge and each replica is then connected to this new bridge with an “infinite capacity”, 0 latency link. This forces all replicas to share the specified link. When share is set to false, the link attached to the service is duplicated for each replica, meaning each replica gets an identical but separate link. This attribute defaults to false.

Finally, the supervisor attribute is a boolean value that marks the service as a supervisor. Supervisors are a plugin architecture for Kollaps that allows experiment logic to be extended; the provided dashboard and logger images are examples. The port attribute indicates which port the supervisor should expose to the outside network, so that users can interact with a given supervisor even if the experiment runs on an isolated network. (Note that not all supervisors expose ports.)

Kollaps 2.0 adds the activepaths attribute, which specifies the other services to which traffic needs to be shaped. This helps at deployment time in topologies with thousands of containers where not all paths need to be shaped.

Services declaration example:

<services>
      <service name="dashboard" image="kollaps/dashboard:1.0" supervisor="true" port="8088"/>
      <service name="client1" image="kollaps/alpineclient:1.0" command="['server', '0', '0']"/>
      <service name="client2" image="kollaps/alpineclient:1.0" command="['server', '0', '0']"/>
      <service name="client3" image="kollaps/alpineclient:1.0" command="['server', '0', '0']"/>
      <service name="server" image="kollaps/alpineserver:1.0" replicas="3" share="false"/>
</services>

In this example (topology5.xml in the examples folder), we have two supervisors: the dashboard and the logger. Notice that only the dashboard has a port attribute. The logger does not communicate with the outside, so that attribute is unnecessary.

Also, server is a replicated service, so the share attribute may apply. In this case, however, we want every replica to have a dedicated link attached to it, so we set share="false" (which is the default value).
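
For contrast, a sketch of the shared case, reusing the server image from the example above. With share="true", the three replicas sit behind an invisible bridge and compete for the single link attached to the service:

<services>
      <!-- all three replicas share the one link attached to "server" -->
      <service name="server" image="kollaps/alpineserver:1.0" replicas="3" share="true"/>
</services>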

Links

Edges in the network graph should be specified using the link tag. By default, links are unidirectional. A link must always have the following attributes: origin, dest, latency, upload and network.

The origin and dest attributes must be filled with valid names that correspond to previously declared services or bridges.

The upload attribute must be filled with a value for bandwidth capacity. The following units are accepted: bps, Kbps, Mbps and Gbps. This bandwidth capacity applies to connections from the origin to the destination.

The latency attribute must be filled with an integer latency value specified in milliseconds.

The network attribute must be filled with the name of an already existing Docker network with Swarm scope, that must be available at all nodes in the Swarm cluster. Services attached to this link will be attached to the network it specifies.

It’s important to note that in the current version of Kollaps, support for multiple networks is not yet implemented. This link attribute is present for supporting future developments.

Optionally, a link can also be specified with the following additional attributes: drop, jitter and download.

The drop attribute should be filled with a float value in the range 0.0 to 1.0 that specifies a packet loss rate for the current link.

The jitter attribute should be filled with a float value indicating a standard deviation. This will cause the latency of that link to follow a normal distribution around the specified latency attribute and with the indicated standard deviation.

Finally, the download attribute indicates that the current link should be bidirectional, and should be filled with a bandwidth capacity in the same way as the upload attribute. The indicated capacity is enforced on connections from dest to origin. Note that internally in Kollaps all links are unidirectional, so declaring a link with the download attribute creates two links in opposite directions that share the same attributes except for the bandwidth capacity.
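
Link declaration example, given as a sketch rather than taken from the examples folder. It reuses the service and bridge names declared earlier. The links wrapper tag is assumed by analogy with bridges and services, and kollaps_network is a placeholder for an existing Swarm-scoped Docker network:

<links>
    <!-- unidirectional: 100Mbps from client1 to s1, 10 ms latency -->
    <link origin="client1" dest="s1" latency="10" upload="100Mbps" network="kollaps_network"/>
    <!-- bidirectional, with 1% packet loss and a jitter standard deviation of 0.5 -->
    <link origin="s1" dest="s2" latency="20" upload="50Mbps" download="25Mbps" drop="0.01" jitter="0.5" network="kollaps_network"/>
    <!-- bidirectional with the same capacity in both directions -->
    <link origin="s2" dest="server" latency="5" upload="1Gbps" download="1Gbps" network="kollaps_network"/>
</links>

Following the description of download above, the second link behaves like two unidirectional links: one from s1 to s2 with a capacity of 50Mbps, and one from s2 to s1 with a capacity of 25Mbps. Both have the same latency, drop and jitter.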

Dynamic experiments

Some aspects of the topology can be changed dynamically during the course of an experiment. For a reference on how to do so, see Dynamics.
