Getting started
HPE Swarm Learning
This project uses HPE Swarm Learning. For general documentation see: https://github.com/HewlettPackard/swarm-learning
Prerequisites
HPE Swarm Learning can run on any hardware that supports executing container software (Docker).
Hardware
- Any x86-64 hardware
- System memory of 32 GB or more
- Hard disk space of 200 GB or more
- Qualified with HPE Edgeline, ProLiant DL380, and Apollo 6500
Note
HPE Swarm Learning can be deployed with NVIDIA or AMD GPUs (accelerator cards).
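As a rough illustration, the memory and disk minimums listed above can be checked from Python before anything is installed. This is only a sketch under assumptions: the helper names are made up for this example, free space is read for `/`, and total memory is read through `os.sysconf`, which works on Linux.

```python
import os
import shutil

# Minimums taken from the hardware list above.
MIN_MEMORY_GB = 32
MIN_DISK_GB = 200

GIB = 1024 ** 3


def total_memory_gb() -> float:
    """Physical memory in GiB (Linux; relies on os.sysconf)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / GIB


def free_disk_gb(path: str = "/") -> float:
    """Free space in GiB on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / GIB


def meets_minimums(memory_gb: float, disk_gb: float) -> bool:
    """True when both values reach the documented minimums."""
    return memory_gb >= MIN_MEMORY_GB and disk_gb >= MIN_DISK_GB


if __name__ == "__main__":
    mem, disk = total_memory_gb(), free_disk_gb("/")
    print(f"memory: {mem:.1f} GiB, free disk: {disk:.1f} GiB")
    print("meets minimums:", meets_minimums(mem, disk))
```

A machine sold with 32 GB of RAM usually reports slightly less than 32 GiB here, because firmware and the kernel reserve some memory, so a near miss is a prompt to look closer rather than a hard failure.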
Supported Operating Systems and Platforms
HPE recommends running each Swarm Network node and each Swarm Learning node on a dedicated system to get the best performance from the platform. The recommended requirements for each system are as follows:
Note
The requirements of the system running the user ML node are driven by the complexity of the ML algorithm. GPUs may also be needed.
Network
- Between one and four open TCP/IP ports on each node. Every Swarm node must be able to reach these ports on every other node.
- Stable internet connectivity to download the Swarm Learning package and Docker images.
Exposed ports
Depending on the type of Swarm Learning components running on a host, some or all of these ports must be opened so that the Swarm Learning containers can communicate with each other:
- A Swarm Network peer-to-peer port on the hosts running Swarm Network nodes. By default, port 30303 is used.
- A Swarm Network API server port on the hosts running Swarm Network nodes. By default, port 30304 is used.
- A Swarm Learning file server port on the hosts running Swarm Learning nodes. By default, port 30305 is used.
- A License Server API port on the host running the License Server. By default, port 5814 is used.
- (Optional) An SWCI API server port that is used by the SWCI node to run a REST-based API service. By default, port 30306 is used.
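To verify that the ports listed above are reachable between hosts, a small TCP connect test can help. The following is a minimal sketch and not part of HPE Swarm Learning: the function names and the dictionary labels are chosen for this example, and only the port numbers come from the list above.

```python
import socket

# Default ports from the list above; the keys are illustrative labels.
DEFAULT_PORTS = {
    "Swarm Network peer-to-peer": 30303,
    "Swarm Network API server": 30304,
    "Swarm Learning file server": 30305,
    "License Server API": 5814,
    "SWCI API server (optional)": 30306,
}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name resolution failed.
        return False


def check_host(host: str, ports=DEFAULT_PORTS, timeout: float = 2.0) -> dict:
    """Map each label in `ports` to whether its port accepts connections."""
    return {name: port_is_open(host, port, timeout) for name, port in ports.items()}


if __name__ == "__main__":
    for name, reachable in check_host("127.0.0.1").items():
        print(f"{name}: {'open' if reachable else 'closed'}")
```

A port only shows as open while something is listening on it, so a "closed" result before the corresponding container is started is expected. Run the check from another node against the host's address to also cover firewall rules between the machines.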
Operating Systems
- Linux: qualified on Ubuntu 22.04, RHEL 8.5, and SLES 15.0
- For the Swarm Web UI installer, any x86-64 hardware running Linux, Windows, or macOS.
Container Hosting Platform
- HPE Swarm Learning is qualified with Docker 20.10.5.
- Configure Docker to run as a non-root user.
- Configure network proxy settings for Docker.
- Configure Docker to use IPv4.
Machine Learning Framework
- Qualified with Keras (TensorFlow 2 backend) and PyTorch 1.5 based machine learning models implemented in Python 3.
Multi System Cluster Requirements
- Synchronized time across all systems using NTP.
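One way to confirm time synchronization on a systemd-based host such as Ubuntu is to read the `NTPSynchronized` property printed by `timedatectl show`. The sketch below assumes that command is available; the function names are made up for this example.

```python
import subprocess


def parse_properties(text: str) -> dict:
    """Parse KEY=VALUE lines, as printed by `timedatectl show`."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines without an '=' sign
            props[key.strip()] = value.strip()
    return props


def clock_is_synchronized(text: str) -> bool:
    """True if the parsed output reports NTPSynchronized=yes."""
    return parse_properties(text).get("NTPSynchronized") == "yes"


def read_timedatectl() -> str:
    """Run `timedatectl show` and return its output (needs systemd)."""
    result = subprocess.run(
        ["timedatectl", "show"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `clock_is_synchronized(read_timedatectl())` on each system gives a quick yes/no answer. It says nothing about which time source is used, so all systems should still point at the same NTP servers.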
System preparation
Operating system
The following instructions have been tested only on Ubuntu.
- Update the system with Software Updater or from a terminal.
- Install the NVIDIA driver via Software Updater → Settings → Additional Drivers → NVIDIA driver metapackage from nvidia-driver-525 (proprietary) → Apply Changes.
- Restart the system.
- Test that the driver was installed successfully.
- Install SSH.
- Install Git.
- Install curl.
- Install Docker.
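Once the steps above are done, a short script can confirm that the expected command-line tools are on `PATH`. This is a sketch only. The executable names are assumptions for this example: `nvidia-smi` is the utility that ships with the NVIDIA driver, and `ssh` is the OpenSSH client.

```python
import shutil

# Executable names are assumptions for this sketch, not taken from the guide.
REQUIRED_TOOLS = ("nvidia-smi", "ssh", "git", "curl", "docker")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names from `tools` that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("not found on PATH:", ", ".join(missing))
    else:
        print("all expected tools found")
```

Finding an executable on `PATH` does not prove it works. Running `nvidia-smi` and a test container by hand is still worthwhile.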
Creation of swarm user and Download of the repository
- Create a user named swarm and give it sudo rights.
- Log in as swarm.
- Download the repository.
Installing the License Server
Warning
The license server must be provided by only one network member (Sentinel Node).
- Go to MY HPE SOFTWARE CENTER.
- If you have an HPE Passport account, enter your credentials and sign in. If you do not have one, create an HPE Passport account and sign in.
- After signing in, click Software in the left pane.
- Locate Search, select Product Info from the Search Type dropdown, and search for Swarm Learning. The search results list the available Swarm Learning products.
- "HPE Swarm Learning Community edition" is the evaluation version that you need to download. Click Action and select Product Details to view the Swarm Learning product details.
- Click the Installation tab to view the APLS download link. Click the link to view the APLS software downloads, then scroll down to the search results.
- Download AutoPass License Server: click Action and select Get downloads.
- Click Download to copy the APLS software (apls-xx.xx.xx.zip) to your system.
- Extract the zip file in Downloads.
- Execute the setup script.
Info
When it is successfully executed the following appears:
Pre-Installation Summary
------------------------
Please Review the Following Before Continuing:
Product Name:
HPE AutoPass License Server
Install Folder:
/opt/HP/HP AutoPass License Server
Link Folder:
/usr/bin
Java VM Installation Folder:
/opt/HP/HP AutoPass License Server/jre
Data Folder Directory
/var/opt/HP/HP AutoPass License Server
Product Version
9.12.0.0
Disk Space Information (for Installation Target):
Required: 304.12 MegaBytes
Available: 420,910.62 MegaBytes
Congratulations HPE AutoPass License Server 9.12.0.0 has been successfully
installed to:
/opt/HP/HP AutoPass License Server
HPE AutoPass License Server GUI can be accessed at :
https://<Host/IP address>:5814/autopass
HPE AutoPass License Server Service Usage:
hpLicenseServer {start|stop|restart|status}
- visit: https://localhost:5000
Info
username: admin
password: password
If the service is not working
cd "/opt/HP/HP AutoPass License Server/HP AutoPass License Server/HP AutoPass License Server/conf"
sudo nano server.xml
Press Ctrl + W to search for “5814”, replace it with “5000”, and save.
cd "/opt/HP/HP AutoPass License Server/HP AutoPass License Server/HP AutoPass License Server/bin"
sudo cp hpLicenseServer /etc/init.d/hpLicenseServer
sudo chmod 755 /etc/init.d/hpLicenseServer
cd /etc/init.d
sudo update-rc.d hpLicenseServer defaults 97 03
service hpLicenseServer start
service hpLicenseServer status
- In the APLS web GUI, go to License Management -> Install License and note down the lock code.
Info
Lock Code = Serial Number
- Navigate to the MY HPE SOFTWARE CENTER home page, sign in with your HPE Passport credentials, and perform the following actions:
Click Software (left pane) -> under Search, select "Product Info" -> enter the string "Swarm Learning".
In the search results, for the product "HPE-SWARM-CMT x.x.x", click Action -> Get License.
- Enter the lock code (serial number) from the Install License page in the HPE Serial Number field and click Activate.
- Once you activate the licenses, you will see the Download Files page.
- Select and download the keys and all the listed software files (7 files).
- Install and manage the Swarm Learning license:
- Open the APLS management console.
- Select License Management -> Install License.
- Select Choose file to upload the license file that you downloaded and click Next.
- Select the required feature IDs and click Install Licenses.
Bugs and Problems
Did you find a bug in the code or run into another problem? Please raise an issue in our GitHub repository: https://github.com/KatherLab/swarm-learning-hpe/issues
In case of problems or requests for improvement of the documentation, please raise an issue at: https://github.com/odelia-ai/odelia-ai.github.io/issues