Redis + PVC: A Match Made in Data Persistence Heaven
Introduction
Recently, I developed several standalone applications and deployed them in the cloud. Each of these applications needed some form of data storage, and choosing a suitable storage solution involved weighing multiple factors.
Considerations
When selecting a database, there are several factors to take into account, and the most suitable database varies from case to case. But in general, these were the key factors I considered:
- The structure of the data: Is the data separated into multiple models with relations between them, requiring a more complex form of retrieval? In my case the applications didn't need anything like that, so a simple key/value store would suffice.
- Ease of use: The cognitive cost of learning and setting up a working database that fulfils the requirements.
- Cost: The price of running a database can vary, and minimising costs without impacting performance was important.
- Community: The documentation, support and number of users. Using a database with frequent breaking changes was not an option.
Selecting a database
In theory, a simple key-value object like a JavaScript dictionary would fulfil the minimum requirements of the application. But performance was important, and the applications could have periods of high load requiring multiple replicas, which means they would need some form of shared memory. The choice fell on Redis, an in-memory database. There were a number of reasons why, but the main ones were:
- The applications required no complex query language and only needed simple set and get operations (see the client sketch after the compose example below).
- Redis is very fast and has low latency.
- Redis has a large community and is an industry standard when it comes to in-memory databases.
- Developing with Redis locally is also very simple and can be done through a docker-compose.yml file. An example compose file that runs a Node application and a local Redis instance:
docker-compose.yml
version: '3.9'
services:
  cache:
    image: redis:alpine
    ports:
      - '6379:6379'
  app:
    env_file:
      - .env
    image: node:18
    working_dir: /app
    environment:
      - REDIS_URI=redis://cache:6379
    volumes:
      - ./:/app
    command: npm run dev
    ports:
      - '8080:8080'
    depends_on:
      - cache
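To show just how little client code the set/get use case needs, here is a minimal sketch of how the app service above could talk to Redis. It assumes the node-redis client (the redis npm package) and reads the REDIS_URI variable from the compose file; the key name is made up for the example, not taken from my applications.

import { createClient } from 'redis';

async function main() {
  // REDIS_URI is injected by docker-compose (redis://cache:6379).
  const client = createClient({ url: process.env.REDIS_URI });
  client.on('error', (err) => console.error('Redis error', err));
  await client.connect();

  // Simple set and get is all the applications needed.
  await client.set('last-deploy', new Date().toISOString());
  console.log(await client.get('last-deploy'));

  await client.quit();
}

main();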
Deployment and persistence
For deployment, Kubernetes was chosen due to its powerful ecosystem and community (and the fact that a cluster was readily available to deploy more artifacts to).
As this article focuses on Redis and data persistence, I will not include the manifests related to the application here. As mentioned briefly earlier, Redis is an in-memory database, which means that the default behaviour is to wipe all data upon system restarts or crashes. Luckily, there is a way around this tradeoff by making use of a brilliant Kubernetes feature called Persistent Volume Claims (or PVC for short).
What are Persistent Volume Claims (PVC)?
A PVC is a way to request specific storage resources, such as disk space or IOPS, and have the cluster automatically provision the storage and make it available to the pod(s) the PVC is bound to. When a pod is deleted, the PVC is not deleted automatically; the storage resources still exist in the cluster. This makes it possible to keep the data even after the pod is gone and to reuse it later.
Creating the manifest
The manifest I used to persist data is included below. It is a fairly simple manifest that uses most of the default configuration without any key eviction strategy (due to the relatively small amount of data to be stored). Another option for persisting the data would be a managed Redis instance, although that would cost a lot more 💵!
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-service-config
  namespace: casper-prod
  labels:
    app: redis-service
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: redis-pv-claim
  namespace: casper-prod
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-service
  namespace: casper-prod
  labels:
    app: redis-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-service
  template:
    metadata:
      labels:
        app: redis-service
    spec:
      containers:
        - name: redis-service
          image: redis:4.0.11-alpine
          args: ['/usr/local/etc/redis/redis.conf']
          volumeMounts:
            - name: config
              mountPath: /usr/local/etc/redis/redis.conf
              readOnly: true
              subPath: redis.conf
            - name: redis-persistent-storage
              mountPath: /data/redis
          ports:
            - containerPort: 6379
          resources:
            limits:
              cpu: '1000m'
              memory: '2000Mi'
            requests:
              cpu: '1000m'
              memory: '2000Mi'
      volumes:
        - name: config
          configMap:
            defaultMode: 0666
            name: redis-service-config
        - name: redis-persistent-storage
          persistentVolumeClaim:
            claimName: redis-pv-claim
---
kind: Service
apiVersion: v1
metadata:
  name: redis-service
  namespace: casper-prod
  labels:
    app: redis-service
spec:
  selector:
    app: redis-service
  ports:
    - name: redis-service
      port: 6379
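One thing to note: the ConfigMap above is shown without its data, and the PVC is mounted at /data/redis rather than /data, which is where the official Redis image writes by default. For the mounted redis.conf to actually persist data onto the volume, it needs to point Redis at that directory and enable some form of persistence. Below is a minimal sketch of what such a config could look like; the exact contents of my redis.conf are not reproduced in this article, so treat the dir and appendonly settings as illustrative assumptions rather than the original configuration.

apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-service-config
  namespace: casper-prod
  labels:
    app: redis-service
data:
  redis.conf: |
    # Write data to the directory backed by the PVC mount
    # (assumption, matching the mountPath in the Deployment above).
    dir /data/redis
    # Enable append-only-file persistence so writes survive restarts
    # (illustrative; RDB snapshots via `save` rules would also work).
    appendonly yes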
With this setup, data is persisted even when the application crashes, Kubernetes updates or the pod shuts down. It is also possible to mount the PVC to multiple Redis instances, although this is not recommended: writing to the same data leads to locking and data consistency problems, and with the ReadWriteOnce access mode the volume can only be mounted by pods on a single node anyway. If the application requires sharing data between multiple Redis instances, check out Redis Cluster instead.
Conclusion
In conclusion, when creating standalone applications and deploying them in the cloud, data storage is an important factor to consider. Applications have different needs, and selecting the right storage can make or break an application. When selecting Redis, a fast, low-latency in-memory database, one has to be aware that it does not persist data across restarts by default. This can be addressed with PVCs, which provide dedicated, persistent storage to pods, making them a great tool to know about and something I will consider as an alternative to managed database instances in the future.