Justin

Tech nerd from Sweden

Matrix: @jlh:jlh.name

  • 12 Posts
  • 2.11K Comments
Joined 2 years ago
Cake day: June 10th, 2023

  • This conversation is about ssds vs hdds in a server environment, but I’m not sure if those claims are true in either environment.

    From the host’s point of view, sata ssds are identical to sata hdds; the drive’s controller is just able to read and write faster.

    I could see some argument about nvme interrupts/polling being slower than sata at scale, but you’re not going to see a difference on a modern CPU with less than 10 nvme drives.

    Sequential performance is meaningless these days; workstation and server performance are both limited by iops and latency. Raid increases latency slightly, but iops scale linearly with the number of drives until you run out of CPU or memory bandwidth.
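
    As a rough illustration of that scaling claim (a toy model, not a benchmark): aggregate iops grow linearly with drive count until a host-side ceiling takes over. The per-drive figure and the ceiling below are made-up numbers, and `aggregate_iops` is a hypothetical helper.

```python
def aggregate_iops(drives: int, iops_per_drive: int, host_ceiling: int) -> int:
    """Toy model: linear scaling with drive count, capped by a host-side
    ceiling standing in for CPU or memory bandwidth limits."""
    if drives < 0:
        raise ValueError("drives must be non-negative")
    return min(drives * iops_per_drive, host_ceiling)


if __name__ == "__main__":
    # Illustrative numbers only: 500k iops per drive, 3M iops host ceiling.
    for n in (1, 2, 4, 8, 16):
        print(n, aggregate_iops(n, 500_000, 3_000_000))
```

    With these assumed numbers, the array stops scaling at six drives; real ceilings depend on the CPU, the raid implementation, and the workload.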

    Any file system will always be faster on an ssd than on an hdd. xfs/ext4/btrfs don’t have any hdd specific optimizations as far as I know. ZFS does, but it’s not going to make ssds slower than hdds, it just causes some write amplification.

    Used enterprise ssds are often cheaper and faster than consumer ssds; you can buy them super cheap on eBay, e.g. 2TB with PLP (power-loss protection) for $100. However, you need to make sure you can fit a 22110 m.2 drive or have an adapter cable for u.2.

    You’re always going to be better off building raid on ssd than hdd as long as you have the budget for it.




  • Yeah, I think you pick up things from all over the place as a consultant. I see lots of different environments and learn from them.

    Ah yeah, external-dns operator is great! It’s maybe a bit basic at times, but it’s super convenient to just have A/AAAA records appear for all your loadbalancer svcs and HTTPRoutes. Saves a ton of time.

    That’s super unfortunate that the certs are siloed off. Maybe they can give you an NS record for a subdomain for you to use ACME on? I’ve seen that at some customers. Super important that all engineers have access to self-service certs, imo.

    Rook is great! It definitely can be quite picky about hardware and balancing, as I’ve learned from trying to set it up with two nodes at home with spare hdds and ssds 😅 Very automated once it’s all set up and you understand its needs, though. NFS provisioner is also a good option for a storageclass as a first step, that’s what I used in my homelab from 2021 to 2023.

    Here’s my rook config:
    https://codeberg.org/jlh/h5b/src/branch/main/argo/external_applications/rook-ceph-helm.yaml
    https://codeberg.org/jlh/h5b/src/branch/main/argo/custom_applications/rook-ceph

    Up to 3 nodes and 120TiB now and I’m about to add 4 more nodes. I probably would recommend just automatically adding disks instead of manually adding them, I’m just a bit more cautious and manual with my homelab “pets”.

    I’m not very far on my RHCE yet tbh 😅 Red hat courses are a bit hard to follow 😅 But hopefully will make some progress before the summer.

    The CKA and CKS certs are great! Some really good courses for those on udemy and acloudguru, there’s a good lab environment on killer.sh, and the practice exams are super useful. I definitely recommend those certs, you learn a lot and it’s a good way to demonstrate your expertise.



  • Well, my point was to explain how Kubernetes simplifies devops to the point of being simpler than most proxmox or Ansible setups. That’s especially true if you have a platform/operations team managing the cluster for you.

    Some more details missed here would be that external-dns and cert-manager operators usually handle the DNS records and certs for you in k8s, you just have to specify the hostname in the HTTPRoute/VirtualService and in the Certificate. For storage, ansible probably simplifies some of this away, but LVM is likely more manual to set up and manage than pointing a PVC at a storageclass and saying “100Gi”.
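
    To make the “point a PVC at a storageclass and say 100Gi” part concrete, here is a minimal sketch that builds the claim as a plain dict and prints it as JSON, which `kubectl apply -f -` accepts. The claim name `es-data` and the storage class `ceph-block` are placeholders, not names from any real cluster.

```python
import json


def pvc_manifest(name: str, storage_class: str, size: str = "100Gi") -> dict:
    """Build a PersistentVolumeClaim that requests `size` from `storage_class`."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # Placeholder names; substitute the storage class your cluster provides.
    print(json.dumps(pvc_manifest("es-data", "ceph-block"), indent=2))
```

    The provisioner behind the storage class creates and attaches the volume, which is the part that would otherwise be manual LVM work.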

    Either way, I appreciate the discussion, it’s always good to compare notes on production setups. No hard feelings even in the case that we disagree on things. I’m a Red Hat Openshift consultant myself these days, working on my RHCE, so maybe we’ll cross paths some day in a Red Hat environment!


  • You’re not using a reverse proxy on rhel, so you’ll need to also make sure that the ports you want are available, and set up a dns record for it, and set up certbot.

    On k8s, I believe istio gateways are meant to be reused across services. You’re using a reverse proxy so the ports will already be open, so no need to use firewall-cmd. What would be wrong with the Service included in the elasticsearch chart?

    It’s also worth looking at the day 2 implications.

    For backups you’re looking at bespoke cronjobs to either rsync your database or clone your entire 100gb disk image, compared to either using velero or backing up your underlying storage.

    For updates, you need to run system updates manually on rhel, likely requiring a full reboot of the node, while in kubernetes, renovate can handle rolling updates in the background with minimal downtime. Not to mention the process required to find a new repo when rhel 11 comes out.






  • Yeah I’m not saying everybody has to go and delete their infra, I just think that all new production environments should be k8s by default.

    The production-scale Grafana LGTM stack only runs on Kubernetes fwiw. Docker and VMs are not supported. I’m a bit surprised that Kubernetes wouldn’t have enough availability to be able to co-locate your general workloads and your observability stack, but that’s totally fair to segment those workloads.

    I’ve heard the argument that “kubernetes has more moving parts” a lot, and I think that is a misunderstanding. At a base level, all computers have infinite moving parts. QEMU has a lot of moving parts, containerd has a lot of moving parts. The reason why people use kubernetes is that all of those moving parts are automated and abstracted away to reduce the daily cognitive load for us operations folk. As an example, I don’t run manual updates for minor versions in my homelab. I have a k8s CronJob that runs renovate, which goes and updates my Deployments in git, and ArgoCD automatically deploys the changes. Technically that’s a lot of moving parts to use, but it saves me a lot of manual work and thinking, and turns my whole homelab into a sort of automated cloud service that I can go a month without thinking about.
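
    For illustration only, a CronJob of that kind could look roughly like the sketch below. This is not the actual manifest from my repo: the schedule, the Secret name `renovate-env`, and the use of the `renovate/renovate` image with its settings supplied through that Secret are all assumptions.

```python
import json


def renovate_cronjob(schedule: str = "0 3 * * *",
                     image: str = "renovate/renovate",
                     secret_name: str = "renovate-env") -> dict:
    """Build a CronJob that runs one renovate container per schedule tick.

    All defaults are assumptions: the schedule is arbitrary, and the Secret
    is expected to hold whatever environment settings renovate needs.
    """
    container = {
        "name": "renovate",
        "image": image,
        "envFrom": [{"secretRef": {"name": secret_name}}],
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": "renovate"},
        "spec": {
            "schedule": schedule,
            # Skip a run if the previous one is still going.
            "concurrencyPolicy": "Forbid",
            "jobTemplate": {"spec": {"template": {"spec": {
                "restartPolicy": "Never",
                "containers": [container],
            }}}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(renovate_cronjob(), indent=2))
```

    Renovate then opens or merges version bumps in git, and ArgoCD picks up the resulting commits; neither of those steps lives in this manifest.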

    I’m not sure if container break-out attacks are a reasonable concern for homelabs. See the relatively minor concern in the announcement I made as an Unraid employee last year when Leaky Vessels happened. Keep in mind that containerd uses cgroups under the hood.

    Yeah, apparmor/selinux aren’t very popular in the k8s space. I think it’s easy enough to use them, and there’s plenty of documentation out there, but Openshift/okd is the only distribution that enables them out of the box.



  • Sure!

    I haven’t used quadlets yet, but I did set up a few systemd services for containers back in the day before quadlets came out. I also used to use docker compose back in 2017/2018.

    From a homelab admin’s perspective, Docker compose and Kubernetes are very similar. Docker compose syntax is a little less verbose, and it has some shortcuts for storage and networking. But that also means it’s less flexible if you are doing more complex things. Docker compose doesn’t start containers on boot by default I think(?), which is pretty bad for application hosting. Docker compose also has no way of automatically deploying from git like ArgoCD does.

    Kubernetes also has a lot of self-healing automation, like health checks that can either disable the load balancer and/or restart the container if an app is failing, automatic killing of containers when resources are low, preventing the scheduling of new containers when resources are low, gradual roll-out of containers so that the old version of a container doesn’t get killed until the new version is up and healthy (helpful in case the new config is broken), mounting secrets as files in a container, and automatic retry on failed containers.
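
    Several of those behaviours map onto a handful of Deployment fields. A minimal sketch, assuming an app that answers HTTP health checks on a `/healthz` path (the app name, image, port, path, and memory figures are all placeholders):

```python
import json


def http_probe(path: str, port: int) -> dict:
    """HTTP GET health check shared by the readiness and liveness probes."""
    return {"httpGet": {"path": path, "port": port}, "periodSeconds": 10}


def deployment_manifest(name: str, image: str, port: int,
                        health_path: str = "/healthz") -> dict:
    labels = {"app": name}
    container = {
        "name": name,
        "image": image,
        "ports": [{"containerPort": port}],
        # Failing readiness removes the pod from Service endpoints.
        "readinessProbe": http_probe(health_path, port),
        # Failing liveness makes the kubelet restart the container.
        "livenessProbe": http_probe(health_path, port),
        # Requests guide scheduling; exceeding the memory limit gets the
        # container killed.
        "resources": {"requests": {"memory": "128Mi"},
                      "limits": {"memory": "256Mi"}},
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 2,
            "selector": {"matchLabels": labels},
            # Gradual roll-out: never remove an old pod before a new one
            # is ready.
            "strategy": {"type": "RollingUpdate",
                         "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1}},
            "template": {"metadata": {"labels": labels},
                         "spec": {"containers": [container]}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(deployment_manifest("demo", "example.invalid/demo:1.0", 8080),
                     indent=2))
```

    If the new version never passes its readiness probe, the roll-out stalls and the old pods keep serving traffic.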

    There’s also a lot of ubiquitous automation tools in the Kubernetes space, like cert-manager for setting up certificates (both ACME and local CA), Ingress for setting up reverse proxy, CNPG for setting up postgres clusters with automated backups, and first-class instrumentation/integration with prometheus and loki (both were designed for kubernetes first).

    The main downsides with Kubernetes in a homelab are that there is about a 1-2GiB RAM overhead for small clusters, and that most documentation and examples are written for docker-compose, so you have to convert apps into a Deployment (you get used to writing deployments for new apps though). I would say installing things like Ingress or CNPG is probably easier than installing similar reverse-proxy automations on Docker-compose, though.





  • Yes, it’s fine to still have VMs, but you shouldn’t be building out new applications and new environments on VMs or LXC.

    The only VMs I’ve seen in production at my customers recently are application test environments for applications that require kernel access. Those test environments are managed by software running in containers, and often even use something like Openshift Virtualization so that the entire VM runs inside a container.


  • I’m a DevOps/Platform Engineering consultant, so I’ve worked with about a dozen different customers on all different sorts of environments.

    I have seen some of my customers use nested VMs, but that was because they were still using VMware or similar for all of their compute. My coworkers say they’re working on shutting down their VMware environments now.

    Otherwise, most of my customers are running Kubernetes directly on bare metal or directly on cloud instances. Typically the distributions they’re using are Openshift, AKS, or EKS.

    My homelab is all bare metal. If a node goes down, all the containers get restarted on a different node.

    My homelab is fully gitops, you can see all of my kubernetes manifests and nixos configs here:

    https://codeberg.org/jlh/h5b