Advanced Storage Concepts

In this workshop, you will learn about advanced storage concepts in Azure Kubernetes Service (AKS): the storage options available in Azure, the orchestration options for using them in AKS (CSI drivers and Azure Container Storage), and how to use Azure Container Storage to manage local NVMe disks and replicate them across multiple nodes.

Prerequisites

Before you begin, you will need an Azure subscription with Owner permissions and a GitHub account.

In addition, you will need the following tools installed on your local machine:

Visual Studio Code with the following extensions:
Azure CLI
GitHub CLI
Git
kubectl
POSIX-compliant shell (bash, zsh, Azure Cloud Shell)

Setup Azure CLI

Start by logging in to Azure. Run the following command and follow the prompts:

az login --use-device-code

Tip

You can log into a different tenant by passing in the --tenant flag to specify your tenant domain or tenant ID.

Run the following command to install the aks-preview extension, which adds support for AKS preview features to the Azure CLI.

az extension add --name aks-preview

Setup Resource Group

In this workshop, we will set environment variables for the resource group name and location.

Important

The following commands will set the environment variables for your current terminal session. If you close the current terminal session, you will need to set the environment variables again.

To keep the resource names unique, we will use a random number as a suffix for the resource names. This will also help you to avoid naming conflicts with other resources in your Azure subscription.

Run the following command to generate a random number.

RAND=$RANDOM
export RAND
echo "Random resource identifier will be: ${RAND}"

Set the location to a region of your choice, for example eastus or westeurope. Make sure the region you choose supports availability zones.

export LOCATION=eastus

Create a resource group name using the random number.

export RG_NAME=myresourcegroup$RAND
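
Because these variables exist only in the current terminal session, it can help to save them to a file and reload them later. The following is a minimal sketch of that idea; the helper name save_workshop_env and the .env file name are assumptions made for illustration and are not part of the workshop tooling.

```shell
# Hypothetical helper (not part of the workshop): append the workshop
# variables to a file so they can be reloaded in a new terminal session.
save_workshop_env() {
  env_file="${1:-.env}"
  # Each line is written as an export statement holding the current value.
  cat >> "$env_file" <<EOF
export RAND="${RAND}"
export LOCATION="${LOCATION}"
export RG_NAME="${RG_NAME}"
EOF
}
```

After setting the variables, run save_workshop_env once. In a new terminal, running source .env from the same directory should restore them.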

Tip

You can list the regions that support availability zones with the following command:

az account list-locations \
--query "[?metadata.regionType=='Physical' && metadata.supportsAvailabilityZones==true].{Region:name}" \
--output table
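
If you would rather check a single region than scan the whole table, the same query can be wrapped in a small helper. This is a sketch only: the function name region_supports_zones is hypothetical, and it reuses the query fields from the command above.

```shell
# Hypothetical helper: succeeds if the given region appears in the list of
# physical regions that report availability zone support.
region_supports_zones() {
  az account list-locations \
    --query "[?metadata.regionType=='Physical' && metadata.supportsAvailabilityZones==true].name" \
    --output tsv | grep -qx "$1"
}

# Usage (requires a logged-in Azure CLI):
#   region_supports_zones "${LOCATION}" && echo "${LOCATION} supports availability zones"
```

The -x option makes grep match whole lines only, so a partial name such as east does not match eastus.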

Run the following command to create a resource group using the environment variables you just created.

az group create \
--name ${RG_NAME} \
--location ${LOCATION}

Setup AKS Cluster

Set the AKS cluster name.

export AKS_NAME=myakscluster$RAND

Run the following command to create an AKS cluster with some best practices in place.

az aks create \
--resource-group ${RG_NAME} \
--name ${AKS_NAME} \
--location ${LOCATION} \
--network-plugin azure \
--network-plugin-mode overlay \
--network-dataplane cilium \
--network-policy cilium \
--enable-managed-identity \
--enable-workload-identity \
--enable-oidc-issuer \
--generate-ssh-keys

Tip

The AKS cluster created for this lab includes only a few best practices, such as enabling Workload Identity and setting the cluster networking to Azure CNI powered by Cilium. For complete guidance on implementing AKS best practices, check out the best practices and baseline architecture for an AKS cluster guides on Microsoft Learn.

Once the AKS cluster has been created, run the following command to connect to it.

az aks get-credentials \
--resource-group ${RG_NAME} \
--name ${AKS_NAME}
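
To confirm that kubectl can reach the cluster, you can count the nodes that report a Ready status. The helper below is a small sketch, and the name count_ready_nodes is an assumption for illustration.

```shell
# Hypothetical helper: print the number of nodes whose STATUS column is
# exactly "Ready" (cordoned or NotReady nodes are not counted).
count_ready_nodes() {
  kubectl get nodes --no-headers | awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Usage:
#   echo "Ready nodes: $(count_ready_nodes)"
```

A result greater than zero suggests that the credentials were merged into your kubeconfig and the API server is reachable.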

Once the AKS cluster is deployed, you can proceed with the workshop.

Storage Options

Azure offers a rich set of storage options that fall into two categories: Block Storage and Shared File Storage. You can choose the option that best matches your workload requirements.

The following guidance can help with your evaluation:

Select a storage category based on the attach mode.
Block Storage can be attached to a single node at a time (RWO: ReadWriteOnce), while Shared File Storage can be attached to multiple nodes at the same time (RWX: ReadWriteMany). If you need to access the same file from different nodes, you need Shared File Storage.

Select a storage option in each category based on its characteristics and use cases.

Block storage category:

Storage option	Characteristics	Use Cases
Azure Disks	Rich SKUs from low-cost HDD disks to high-performance Ultra Disks.	General-purpose option for use cases ranging from backup to databases to SAP HANA.
Elastic SAN	Scalability up to millions of IOPS, cost efficiency at scale.	Tier 1 & 2 workloads, databases, VDI hosted on any compute option (VM, Containers, AVS).
Local Disks	Priced in VM, high IOPS/throughput and extremely low latency.	Applications with no data durability requirement or with built-in data replication support (e.g., Cassandra), AI training.

Shared File Storage category:

Storage option	Characteristics	Use Cases
Azure Files	Fully managed, multiple redundancy options.	General-purpose file shares, LOB apps, shared app or config data for CI/CD, AI/ML.
Azure NetApp Files	Fully managed ONTAP with high performance and low latency.	Analytics, HPC, CMS, CI/CD, custom apps currently using NetApp.
Azure Blobs	Unlimited amounts of unstructured data, data lifecycle management, rich redundancy options.	Large-scale object data handling, backup.

Select the performance tier and redundancy type for the storage option. See the product page for each option in the tables above to evaluate performance tiers, redundancy types, and other requirements.
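
In Kubernetes, the attach mode is expressed as the access mode of a PersistentVolumeClaim. The sketch below writes two illustrative claims to a local file, one block (RWO) and one shared (RWX). The claim names, the sizes, and the file name are placeholders, and managed-csi and azurefile-csi are assumed to be available as built-in storage classes on the AKS cluster.

```shell
# Write two illustrative PersistentVolumeClaims to a local file.
# Claim names and sizes are placeholders; the storage class names are
# assumed to be the built-in AKS classes for Azure Disks and Azure Files.
cat > pvc-access-modes.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: block-pvc
spec:
  accessModes:
    - ReadWriteOnce        # RWO: mounted read-write by a single node
  storageClassName: managed-csi
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-pvc
spec:
  accessModes:
    - ReadWriteMany        # RWX: mounted read-write by many nodes
  storageClassName: azurefile-csi
  resources:
    requests:
      storage: 10Gi
EOF
```

You could apply the file with kubectl apply -f pvc-access-modes.yaml. Depending on the storage class binding mode, a claim may stay Pending until a pod uses it.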

Orchestration Options

Besides calling a storage service's REST API directly to consume remote storage resources, there are two major ways to use storage options in AKS workloads: CSI (Container Storage Interface) drivers and Azure Container Storage.

CSI Drivers

Container Storage Interface (CSI) is an industry standard that enables storage providers (SPs) to develop a plugin once and have it work across a number of container orchestration systems. It is widely adopted by both the open-source community and major cloud storage vendors. If you already manage and operate storage with CSI drivers, or you plan to build a cloud-independent Kubernetes cluster setup, CSI drivers are the preferred option.

Azure Container Storage

Azure Container Storage is built on top of CSI drivers. It supports greater scaling through storage pools and provides a unified management experience across local and remote storage. If you want to simplify the use of local NVMe disks, or reach a higher pod scaling target, Azure Container Storage is the preferred option.

Storage option support on CSI drivers and Azure Container Storage:

Storage option	CSI drivers	Azure Container Storage
Azure Disks	Supported (CSI disks driver)	Supported
Elastic SAN	N/A	Supported
Local Disks	N/A (Host Path + Static Provisioner)	Supported
Azure Files	Supported (CSI files driver)	N/A
Azure NetApp Files	Supported (CSI NetApp driver)	N/A
Azure Blobs	Supported (CSI Blobs driver)	N/A

Use Azure Container Storage for Replicated Ephemeral NVMe Disk

In this section, you will deploy a MySQL server that mounts volumes backed by local NVMe storage through Azure Container Storage, and then demonstrate replication and failover of that replicated local NVMe storage.

Setup Azure Container Storage

Follow the steps below to enable Azure Container Storage in an existing AKS cluster.

Run the following command to save the new node pool name to a .env file and load it into your current session.

cat <<EOF >> .env
ACSTOR_NODEPOOL_NAME="acstorpool"
EOF
source .env

Run the following command to create a new node pool with Standard_L8s_v3 VMs.

az aks nodepool add \
--cluster-name ${AKS_NAME} \
--resource-group ${RG_NAME} \
--name ${ACSTOR_NODEPOOL_NAME} \
--node-vm-size Standard_L8s_v3 \
--node-count 3

Here, the environment variable RG_NAME is set to the name of the resource group in your lab environment, and the environment variable AKS_NAME is set to the name of the AKS cluster in your lab environment.

Info

You might not have enough quota to deploy Standard_L8s_v3 VMs. If you encounter an error, try a different VM size within the L-family or request additional quota by following the instructions here.
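
Three Standard_L8s_v3 nodes need 24 vCPUs of quota (8 vCPUs each). One way to check what is available is to filter the compute usage table for the VM family. This is a sketch under assumptions: the helper name show_family_quota is hypothetical, and the LSv3 label used in the usage example is an assumption about how the family appears in the output for your subscription.

```shell
# Hypothetical helper: print the compute quota rows for a region whose
# text mentions the given VM family label (case-insensitive match).
show_family_quota() {
  region="$1"
  family="$2"
  az vm list-usage --location "$region" --output table | grep -i "$family"
}

# Usage (the "LSv3" label is an assumption; adjust it to match your output):
#   show_family_quota "${LOCATION}" "LSv3"
```

If no row is printed, run az vm list-usage without the filter and look for the matching family name manually.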

Update the cluster to enable Azure Container Storage.

az aks update \
--resource-group ${RG_NAME} \
--name ${AKS_NAME} \
--enable-azure-container-storage ephemeralDisk \
--azure-container-storage-nodepools ${ACSTOR_NODEPOOL_NAME} \
--storage-pool-option NVMe \
--ephemeral-disk-volume-type PersistentVolumeWithAnnotation

Note

This command can take up to 20 minutes to complete.

Run the following command and wait until all the pods reach the Running state.

kubectl get pods -n acstor --watch

You will see a lot of activity with pods being created, completed, and terminated. This is expected as the Azure Container Storage is being enabled.
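
As an alternative to watching the list by eye, you can count the pods that have not yet settled. The following is a minimal sketch; the helper name count_unsettled_pods is an assumption for illustration.

```shell
# Hypothetical helper: print the number of pods in the acstor namespace
# whose phase is neither Running nor Succeeded (completed).
count_unsettled_pods() {
  kubectl get pods -n acstor --no-headers \
    --field-selector=status.phase!=Running,status.phase!=Succeeded 2>/dev/null \
    | wc -l | tr -d ' '
}

# Usage: poll every 10 seconds until nothing is left to settle.
#   while [ "$(count_unsettled_pods)" -gt 0 ]; do sleep 10; done
```

Note that the Running phase does not guarantee that every container is ready, so a final look at kubectl get pods -n acstor is still worthwhile.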

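Delete the default storage pool that was created when Azure Container Storage was enabled, so that it can be replaced with a replicated pool.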

kubectl delete sp -n acstor ephemeraldisk-nvme

Create a replicated ephemeral storage pool

With Azure Container Storage enabled, storage pools can also be created using Kubernetes CRDs. Run the following command to deploy a new StoragePool custom resource. This will create a new storage class using the storage pool name prefixed with acstor-.

kubectl apply -f - <<EOF
apiVersion: containerstorage.azure.com/v1
kind: StoragePool
metadata:
  name: ephemeraldisk-nvme
  namespace: acstor
spec:
  poolType:
    ephemeralDisk:
      diskType: nvme
      replicas: 3
EOF

Run the following command to confirm that a new storage class named acstor-ephemeraldisk-nvme has been created.

kubectl get sc
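
If you prefer a scripted check over reading the list, the lookup can be wrapped in a helper that succeeds only when the class is present. This is a sketch, and the function name storage_class_exists is hypothetical.

```shell
# Hypothetical helper: succeeds if a storage class with the given name exists.
# "kubectl get sc -o name" prints one "storageclass.storage.k8s.io/<name>" per line.
storage_class_exists() {
  kubectl get sc -o name | grep -qx "storageclass.storage.k8s.io/$1"
}

# Usage:
#   storage_class_exists acstor-ephemeraldisk-nvme && echo "storage class is ready"
```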

Deploy a MySQL server using new storage class

This setup is a modified version of this guide.

Run the following command to download the MySQL manifest file.

curl -o acstor-mysql-config-services.yaml https://gist.githubusercontent.com/pauldotyu/f459c834558fd83a6254fae0eb23b1e6/raw/ad1b5db804060b18b3ea123db9189f1a2d56414b/acstor-mysql-config-services.yaml

Optionally, run the following command to take a look at the MySQL manifest file.

cat acstor-mysql-config-services.yaml

Run the following command to deploy the config map and services for the MySQL server.

kubectl apply -f acstor-mysql-config-services.yaml

Next, we'll deploy the MySQL server using the new storage class.

Run the following command to download the MySQL statefulset manifest file.

curl -o acstor-mysql-statefulset.yaml https://gist.githubusercontent.com/pauldotyu/f7539f4fc991cf5fc3ecb22383cb227c/raw/274b0747f1094db53869bcb0eb25faccf0f37a6a/acstor-mysql-statefulset.yaml

Optionally, run the following command to take a look at the MySQL statefulset manifest file.

cat acstor-mysql-statefulset.yaml

Run the following command to deploy the statefulset for MySQL server.

kubectl apply -f acstor-mysql-statefulset.yaml

Verify that all the MySQL server's components are available

Run the following command to verify that both mysql services were created (headless one for the statefulset and mysql-read for the reads).

kubectl get svc -l app=mysql

You should see output similar to the following:

NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
mysql        ClusterIP   None           <none>        3306/TCP   5h43m
mysql-read   ClusterIP   10.0.205.191   <none>        3306/TCP   5h43m

Run the following command to verify that the MySQL server pod is running. The --watch flag keeps the command running so you can watch the pod go from the Init state to the Running state.

kubectl get pods -l app=mysql -o wide --watch

You should see output similar to the following:

NAME      READY   STATUS    RESTARTS   AGE   IP             NODE                                NOMINATED NODE   READINESS GATES
mysql-0   2/2     Running   0          1m34s  10.244.3.16   aks-nodepool1-28567125-vmss000003   <none>           <none>

Note

Keep a note of the node on which the mysql-0 pod is running.

Insert data into the MySQL database

Run the following command to start a MySQL client pod and use it to create a database named school and a table named students. The command also inserts a few rows into the table so that persistence can be verified later.

kubectl run mysql-client --image=mysql:5.7 -i --rm --restart=Never -- \
mysql -h mysql-0.mysql <<EOF
CREATE DATABASE school;
CREATE TABLE school.students (RollNumber INT, Name VARCHAR(250));
INSERT INTO school.students VALUES (1, 'Student1');
INSERT INTO school.students VALUES (2, 'Student2');
EOF

Verify the entries in the MySQL server

Run the following command to verify that the database, the table, and the entries were created.

kubectl run mysql-client --image=mysql:5.7 -i -t --rm --restart=Never -- \
mysql -h mysql-read -e "SELECT * FROM school.students"

You should see output similar to the following:

+------------+----------+
| RollNumber | Name     |
+------------+----------+
|          1 | Student1 |
|          2 | Student2 |
+------------+----------+

Initiate the node failover

Now we will simulate a failover scenario by deleting the node on which the mysql-0 pod is running.

Run the following command to get the current node count in the Azure Container Storage node pool.

NODE_COUNT=$(az aks nodepool show \
--resource-group ${RG_NAME} \
--cluster-name ${AKS_NAME} \
--name ${ACSTOR_NODEPOOL_NAME} \
--query count \
--output tsv)

Run the following command to scale up the Azure Container Storage node pool by 1 node.

az aks nodepool scale \
--resource-group ${RG_NAME} \
--cluster-name ${AKS_NAME} \
--name ${ACSTOR_NODEPOOL_NAME} \
--node-count $((NODE_COUNT+1)) \
--no-wait

Now we want to force the failover by deleting the node on which the mysql-0 pod is running.

Run the following commands to get the name of the node on which the mysql-0 pod is running.

POD_NAME=$(kubectl get pods -l app=mysql -o custom-columns=":metadata.name" --no-headers)
NODE_NAME=$(kubectl get pods $POD_NAME -o jsonpath='{.spec.nodeName}')

Run the following command to delete the node on which the mysql-0 pod is running.

kubectl delete node $NODE_NAME

Observe that the mysql pods are running

Run the following command to get the pods and observe that the mysql-0 pod is running on a different node.

kubectl get pods -l app=mysql -o wide --watch

Eventually you should see output similar to the following:

NAME      READY   STATUS    RESTARTS   AGE   IP             NODE                                NOMINATED NODE   READINESS GATES
mysql-0   2/2     Running   0          3m25s  10.244.3.16   aks-nodepool1-28567125-vmss000002   <none>           <none>

Note

You should see that the mysql-0 pod is now running on a different node than you noted before the failover.
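
The comparison can also be scripted. The sketch below assumes that NODE_NAME still holds the node name captured before the node was deleted; the helper name pod_moved_from is hypothetical.

```shell
# Hypothetical helper: succeeds if the pod is scheduled on a node other than
# the one passed as the second argument.
pod_moved_from() {
  pod="$1"
  old_node="$2"
  new_node=$(kubectl get pod "$pod" -o jsonpath='{.spec.nodeName}')
  # Fail if the pod is not scheduled yet (empty node name) or has not moved.
  [ -n "$new_node" ] && [ "$new_node" != "$old_node" ]
}

# Usage:
#   pod_moved_from mysql-0 "$NODE_NAME" && echo "mysql-0 moved to a new node"
```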

Verify successful data replication and persistence for MySQL Server

Run the following command to verify that the volume is mounted and writable by inserting new data.

kubectl run mysql-client --image=mysql:5.7 -i --rm --restart=Never -- \
mysql -h mysql-0.mysql <<EOF
INSERT INTO school.students VALUES (3, 'Student3');
INSERT INTO school.students VALUES (4, 'Student4');
EOF

Run the following command to fetch all entries, including those inserted before the failover.

kubectl run mysql-client --image=mysql:5.7 -i -t --rm --restart=Never -- \
mysql -h mysql-read -e "SELECT * FROM school.students"

You should see output similar to the following:

+------------+----------+
| RollNumber | Name     |
+------------+----------+
|          1 | Student1 |
|          2 | Student2 |
|          3 | Student3 |
|          4 | Student4 |
+------------+----------+

The output contains the values entered before the failover. This shows that the database and table entries in the MySQL server were replicated and persisted across the failover of the mysql-0 pod. The output also shows that the newer entries were successfully added on the newly started MySQL server pod.

Congratulations! You successfully created a replicated local NVMe storage pool using Azure Container Storage. You deployed a MySQL server with the storage pool's storage class and added entries to the database. You then scaled up the node pool by one node to maintain three active nodes and triggered a failover by deleting the node hosting the workload pod. Finally, you verified that the pre-failover data were successfully replicated and persisted, with new data added on top of the replicated data.

Summary

In this workshop, you explored advanced storage options in Azure Kubernetes Service (AKS), covering both Block Storage (Azure Disks, Elastic SAN, Local Disks) and Shared File Storage solutions (Azure Files, NetApp Files, Blobs). You learned how different orchestration methods like CSI drivers and Azure Container Storage can be leveraged depending on your requirements.

Through hands-on exercises, you successfully deployed a MySQL database using replicated local NVMe storage via Azure Container Storage. You demonstrated real-world resilience by simulating node failure and verifying data persistence across failover events - a critical capability for production workloads.
