Usage
Replace each <placeholder> with the corresponding value:
- <workspace_name>: Name of the folder in /opt/hpe/swarm-learning-hpe/workspace where the configuration files and model code are stored, e.g. odelia-breast-mri
- <sentinel_ip>: IP address of the sentinel host. In our case, the sentinel host initiates the network and operates the license server, e.g. 172.24.4.67
- <host_index>: Name of the institution or partner participating in the training, e.g. TUD
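Because the same three values recur in every launch command, one option is to define them once as shell variables. The sketch below is only an illustration: the variable names and the swarm_args helper are hypothetical and not part of the repository, and the values are the examples given above. The -w, -s and -d flags are the ones passed to the launch scripts in this guide.

```shell
# Hypothetical convenience variables (names are illustrative, not from the repository).
WORKSPACE_NAME="odelia-breast-mri"   # <workspace_name>
SENTINEL_IP="172.24.4.67"            # <sentinel_ip>
HOST_INDEX="TUD"                     # <host_index>

# Print the argument string shared by the launch scripts.
swarm_args() {
  printf '%s\n' "-w $WORKSPACE_NAME -s $SENTINEL_IP -d $HOST_INDEX"
}
```

With these definitions, ./workspace/automate_scripts/launch_sl/run_swop.sh $(swarm_args) would expand to the SWOP launch command with the values filled in.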
Data Preparation
Place the data in the data folder before you start training!
- Create the folder structure for storing the data.
- Copy your data to /opt/hpe/swarm-learning-hpe/workspace/<workspace_name>/user/data-and-scratch/data
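The two data preparation steps could be scripted roughly as follows. This is a minimal sketch, not a script shipped with the repository: the prepare_data function and its arguments are hypothetical, and the default root assumes the workspace lives under /opt/hpe/swarm-learning-hpe/workspace as described in the Usage section.

```shell
# prepare_data SRC WORKSPACE_NAME [ROOT]
# Hypothetical helper: creates the data folder and copies a dataset into it.
prepare_data() {
  src="$1"
  workspace_name="$2"
  # Assumed default location of the workspace root.
  root="${3:-/opt/hpe/swarm-learning-hpe/workspace}"
  dest="$root/$workspace_name/user/data-and-scratch/data"

  mkdir -p "$dest" || return 1        # create the folder structure
  cp -R "$src"/. "$dest"/ || return 1 # copy the contents of SRC, not SRC itself
  printf '%s\n' "$dest"               # report where the data ended up
}
```

A call such as prepare_data /path/to/my-dataset odelia-breast-mri would then fill the data folder of that workspace; the source path here is a placeholder.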
Running Swarm Learning Nodes
1. Run a Swarm Network (or Sentinel) node:
SN node
SN nodes form the blockchain network. The current version of Swarm Learning uses an open-source version of Ethereum as the underlying blockchain platform. The SN nodes interact with each other through this platform to maintain and track the state and progress of the training, and they use this information to coordinate the other Swarm Learning components. The Sentinel node is a special SN node: it initializes the blockchain network and is therefore the first node to start.
NOTE: Only metadata is written to the blockchain. The model itself is not stored in the blockchain.
2. Run a Swarm Operator (SWOP) node:
SWOP
SWOP is an agent that manages Swarm Learning operations. It is responsible for executing the tasks assigned to it, such as starting and stopping Swarm runs, building and upgrading ML containers, and sharing models for training. A SWOP node can execute only one task at a time.
./workspace/automate_scripts/launch_sl/run_swop.sh -w <workspace_name> -s <sentinel_ip> -d <host_index>
3. Run a SWCI node:
SWCI
The Swarm Learning Command Line Interface (SWCI) is the command-line tool for the Swarm Learning framework. It is used to view the status of the framework and to control and manage it. SWCI manages the Swarm Learning framework using contexts and contracts.
Warning
The SWCI node generates the training task runners. In principle any host could initiate it, but we currently recommend that ONLY THE SENTINEL HOST initiates the SWCI node.
./workspace/automate_scripts/launch_sl/run_swci.sh -w <workspace_name> -s <sentinel_ip> -d <host_index>
Results
View the results under workspace/<workspace_name>/user/data-and-scratch/scratch
Stop Swarm Learning
Stop and remove all Swarm Learning containers and volumes that are no longer needed with:
Info
--[node_type] is optional. If it is not specified, all nodes are stopped. Otherwise, specify one of --sn, --swop, --swci, or --sl.
Manually Remove Containers and Volumes
- List all Docker containers:
- Remove all containers listed with docker ps -a
- List all Docker volumes:
- Remove all volumes except sl-cli-lib
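The four manual steps above can be sketched with standard Docker CLI commands. The function names below are hypothetical, and the sketch assumes that every container on the host belongs to Swarm Learning, since it removes all containers reported by docker ps -a. Running containers must be stopped first (or removed with docker rm -f).

```shell
# Print the names of all Docker volumes except sl-cli-lib, which must be kept.
volumes_to_remove() {
  docker volume ls -q | grep -vxF 'sl-cli-lib' || true
}

# Hypothetical helper: list and remove all containers, then all volumes
# except sl-cli-lib. Destructive, so review the listings before using it.
remove_swarm_leftovers() {
  docker ps -a                      # list all Docker containers
  ids=$(docker ps -aq)              # container IDs only
  if [ -n "$ids" ]; then
    docker rm $ids                  # remove all listed containers
  fi

  docker volume ls                  # list all Docker volumes
  vols=$(volumes_to_remove)
  if [ -n "$vols" ]; then
    docker volume rm $vols          # remove all volumes except sl-cli-lib
  fi
}
```

The block only defines the functions; nothing is removed until remove_swarm_leftovers is called explicitly.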
Bugs and Problems
If you find a bug in the code or run into other problems, please raise an issue in our GitHub repository: https://github.com/KatherLab/swarm-learning-hpe/issues
In case of problems or requests for improvement of the documentation, please raise an issue at: https://github.com/odelia-ai/odelia-ai.github.io/issues