Sudip Sengupta

DZone Core

Technical Writer at Javelynn

Norwich, GB

Joined Apr 2020

https://www.javelynn.com

Stats

Reputation: 6497
Pageviews: 1.0M
Articles: 65
Comments: 0

Expertise

Maintenance


Articles

How To Build Docker Images in Docker Hub Using Jenkins Pipeline
Learn how to set up your environment, create and define your first Jenkins Pipeline, and run it to build images.
Updated March 7, 2024
· 48,203 Views · 2 Likes
A Comprehensive Approach to Performance Monitoring and Observability: Enhancing APM With Observability Insights at Scale
Delve into core concepts of observability and monitoring and how the modern observability approach differs from/complements traditional monitoring practices.
November 17, 2023
· 5,066 Views · 4 Likes
Optimizing Kubernetes Costs With FinOps Best Practices
Delve into the multifaceted complexities of a distributed Kubernetes ecosystem and cost implications; discuss the recommended FinOps practices for Kubernetes.
October 24, 2023
· 4,451 Views · 2 Likes
Breaking Down the Monolith: The Containerization Journey of Transforming Monolithic Applications Into Microservices
This article explores monolithic architecture's limitations and demonstrates how containers and microservices can support modern application delivery.
June 8, 2023
· 10,192 Views · 8 Likes
Building an Open-Source Observability Toolchain
Read this article to explore the benefits of building an open-source toolchain for the observability of distributed systems and more.
November 19, 2022
· 8,456 Views · 1 Like
An Overview of CI/CD Pipelines With Kubernetes
Take a look at CI/CD approaches in a Kubernetes ecosystem, best practices for implementing an efficient CI/CD framework, and popular open-source CI/CD tools.
October 25, 2022
· 7,128 Views · 4 Likes
What Is a CSRF Token?
A CSRF token is a secret value that must be handled securely so it remains valid throughout a cookie-based session.
Updated June 3, 2022
· 10,776 Views · 6 Likes
How to Find a Vulnerability in a Website
Explore how to find a vulnerability in a website and how adopting security practices can prevent application issues.
May 25, 2022
· 6,324 Views · 2 Likes
An Overview of Key Components of a Data Pipeline
Dive into how a data pipeline helps process enormous amounts of data, key components, various architecture options, and best practices for maximum benefits.
May 21, 2022
· 8,836 Views · 3 Likes
Securing Your CI/CD Pipeline
In this article, learn about common CI/CD security challenges and advanced strategies to mitigate threats.
March 31, 2022
· 16,354 Views · 6 Likes
Best Practices to Secure Your API
Explore various risks associated with API vulnerabilities while learning common API security best practices to implement robust security mechanisms.
March 15, 2022
· 11,257 Views · 7 Likes
Advanced Kubernetes Deployment Strategies
This article reviews concepts in Kubernetes deployment, as well as delves into various advanced Kubernetes deployment strategies, pros and cons, and use cases.
January 30, 2022
· 7,717 Views · 5 Likes
Understanding Persistent Volumes and PVCs in Kubernetes and OpenEBS
Persistent Volumes expose physical storage implementations to Kubernetes clusters so Pods can store and share data.
November 21, 2021
· 10,056 Views · 9 Likes
Code Injection – Examples and Prevention
This article explores how a code injection attack is performed, the types of attacks, and how software teams can protect their web applications from injection flaws.
November 21, 2021
· 7,021 Views · 5 Likes
Broken Access Control and How to Prevent It
This post explores broken access control vulnerabilities and what firms can do to prevent access control flaws. Read below to find out more.
September 30, 2021
· 3,793 Views · 2 Likes
Application Security Checklist
Evaluating security controls with OWASP’s top 10 security tests
September 24, 2021
· 20,132 Views · 8 Likes
Your Guide to Automated Vulnerability Scanners: Types, Benefits, and More
This article delves into why organizations should embrace automated vulnerability scanning, various scanning mechanisms, and more.
Updated September 21, 2021
· 13,866 Views · 2 Likes
Provisioning High-Performance Storage for NoSQL Databases With OpenEBS
Modern applications rely on multiple data models that generate different data types. To support such use cases, NoSQL (Not only SQL) databases allow for the storage and processing of different data types in a non-tabular fashion. NoSQL databases process unstructured data using flexible schemas to enable efficient storage and analysis for distributed, data-driven applications. By relaxing the data consistency restrictions of SQL-based databases, NoSQL databases enable low latency, scalability and high performance for data access. The performance of a NoSQL database varies chiefly with cluster size, the application's configuration and network latency, which means that developers don't have to worry about optimizing data structures, indexes and queries to achieve peak performance from the storage subsystem. NoSQL databases are popular with modern cloud applications since they use APIs for the storage and retrieval of data structures. This makes it easy to interface NoSQL databases with a host of microservice-based applications for agile, cloud-native deployment.

OpenEBS is a leading Container Attached Storage (CAS) solution that helps developers deploy workloads efficiently by turning storage available on worker nodes into dynamically provisioned volumes. With OpenEBS, developers can implement granular policies per workload, reduce storage costs, avoid vendor lock-in and provision persistent storage for applications with high availability. This post delves into how OpenEBS can be used to provision high-performance storage for NoSQL databases.

A Deep Dive Into NoSQL Databases

Before the mid-2000s, relational databases dominated application development. The Structured Query Language (SQL) was used to store and retrieve data structures, with popular databases being MySQL, Oracle, PostgreSQL, DB2 and SQL Server. The decrease in the cost of storage options eliminated the need for strict, complex data models, allowing data applications to store and query more types of data. NoSQL databases evolved to handle the flexible schemas needed to process different combinations of structured, semi-structured and polymorphic data. This section explores various features, types and popular distributions of NoSQL databases.

Features of NoSQL Databases

Though there are various flavours of NoSQL databases, here is a list of common features that distinguish them from SQL databases:

Multi-Model

NoSQL databases are built for flexibility to handle large amounts of data. Unlike SQL-based databases that access and analyze data in tables and columns, NoSQL databases ingest all types of data with relative ease. NoSQL databases enable the creation of specific data models for each application, enhancing agility in handling different data types without having to manage separate databases.

Schema Agnostic

Schemas are used to instruct a relational database on what data to expect and how to store it. Any change in data structures or data paths requires extensive re-architecting of the database using a modified schema. On the contrary, NoSQL databases require no upfront design work before data is stored. These databases shorten development time by allowing developers to start coding and accessing data without having to understand the internal implementation of the database.

Non-Relational

Relational databases have strict restrictions on how tables associate with each other while relying on a traditional master-slave architecture.
NoSQL databases, on the other hand, do not rely on tabularized data with fixed row and column records, and they run on peer-to-peer networks with no concept of relationships between their records. The databases therefore facilitate easy storage and retrieval and fast query speeds for data-driven applications.

Distributable

NoSQL database systems are designed to use multiple locations involving different data centres/regions for global scalability. Thanks to their masterless architecture, NoSQL databases can maintain continuous availability through the replication of data in multiple read/write locations.

Easily Scalable

While relational databases are also scalable, their scalability is costly and complex since their architecture requires the addition of larger, more powerful hardware for scaling. NoSQL databases allow for linear scalability both vertically and horizontally, so developers can add either more processors or more powerful ones for increased workloads.

Types of NoSQL Databases

Document Databases

In this database, data is represented as a JSON-type document or object for efficient and intuitive data modelling. Document-type NoSQL databases simplify application development since developers can query and store data with the same document model format used in application source code. Document databases are flexible, hierarchical and semi-structured, so they scale to match an application's evolving needs.

Graph NoSQL Databases

These databases store data in edges and nodes. A node stores information about entities, while an edge stores information about the relationship between those entities. These databases are mostly used in applications that analyze relationships between users and other entities to identify patterns, including social networking, knowledge graphs, recommendation engines and fraud detection.

Key-Value Databases

These store data in simple key-value pairs. Values are retrieved by referencing their keys, which simplifies how developers query for specific data entries. These databases are highly applicable in applications that need to store large amounts of data without needing complex query operations for retrieval. The most common use case for key-value databases is the storage of user preferences.

Wide-Column Stores

These databases use dynamic columns, rows and tables to store data. They are more flexible than relational databases since each row can have a different number of columns. Wide-column stores are used for large amounts of data with predictable query patterns.

Popular NoSQL Databases

Some popular NoSQL database management solutions include:

Cassandra

A free, open-source, distributed, wide-column store that powers mission-critical deployments with fault-tolerant replication, hybrid cloud support and audit logging. Apache Cassandra is trusted by major brands, such as Facebook, Netflix, Macy's and MobilePay, for scalability, high availability and improved performance. Follow this guide to explore the steps to deploy Cassandra StatefulSets with OpenEBS storage.

MongoDB

MongoDB is popular with modern application development as it relies on the document data model to simplify database management for developers. MongoDB includes functionality such as data geolocation, horizontal scaling and automatic failover to accelerate development and adapt to application changes. Read this guide to discover how to provision Persistent Volumes for MongoDB StatefulSets in Kubernetes with OpenEBS.
Redis

An open-source, distributed, in-memory key-value data store that can be used in application development as a message broker, cache and database. Redis supports multiple types of abstract data structures and provides high availability through automatic disk partitioning and the Redis Sentinel framework. This guide demonstrates how OpenEBS can be used to provide persistent storage for Redis StatefulSets.

OpenEBS for NoSQL Databases

OpenEBS simplifies the deployment of stateful Kubernetes workloads using a collection of data engines to implement persistent volumes. The OpenEBS control plane is deeply integrated with Kubernetes and uses Kubernetes-friendly constructs to manage how volumes are provisioned, scheduled and maintained. With OpenEBS, cluster administrators can take advantage of dynamic provisioning for local and distributed volumes, depending on workload and cluster size. These features make OpenEBS a popular choice to orchestrate storage for stateful applications. The following section explores the steps taken to provision OpenEBS storage for NoSQL databases.

Why Adopt OpenEBS for NoSQL?

Many organizations and users have adopted OpenEBS to deploy and provision storage for their stateful workloads, including those running NoSQL. Here are some reasons to adopt OpenEBS for NoSQL databases:

Open-Source, Kubernetes-Centric, Cloud-Native Storage

OpenEBS follows a loosely coupled Container Attached Storage (CAS) architecture. OpenEBS itself is deployed as a workload on Kubernetes nodes, bringing the DevOps benefits of container orchestration from the application layer to the data layer. This allows developers to leverage the cloud-native benefits of Kubernetes, such as agility and scalability, for developing reliable, effective data-driven applications.

No Cloud Lock-In

With OpenEBS, application data is written into storage engines, creating a data abstraction layer. This allows developers to move data easily between multiple Kubernetes environments. OpenEBS can be deployed on premises, on local storage or on managed cloud services, thereby allowing NoSQL applications to simultaneously access data stored on different deployment platforms.

OpenEBS Enables Granular Deployment and Management of Workloads

The cloud-native, loosely coupled architecture of OpenEBS clusters enables multiple teams to deliver faster since they are free of cross-functional dependencies. OpenEBS also makes it easy to declare policies and settings on a per-workload or per-volume basis, with constant monitoring to ensure workloads achieve desired results. Granularity makes it easier for developers to segregate large amounts of data based on data type, structure or use case.

Reduced Storage Costs

The dynamic nature of NoSQL data often necessitates the over-provisioning of cloud storage resources to achieve higher performance and a lower risk of disruption. To help with this, OpenEBS relies on thin provisioning mechanisms to pool storage and grow data volumes when the NoSQL database needs it. By adjusting storage on the fly without disrupting volumes attached to workloads, OpenEBS enables cost savings of up to 60% through thin, dynamic provisioning.
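To make the per-workload granularity concrete, below is a minimal sketch of a StorageClass carrying OpenEBS policy annotations, modelled on the legacy cStor provisioner; the class name, the pool name cstor-disk-pool and the replica count are placeholder values rather than settings prescribed by this article.

YAML
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-cstor-nosql        # hypothetical class for a NoSQL workload
  annotations:
    openebs.io/cas-type: cstor
    cas.openebs.io/config: |
      - name: StoragePoolClaim
        value: "cstor-disk-pool"   # placeholder pool built on node disks
      - name: ReplicaCount
        value: "3"                 # per-workload replication policy
provisioner: openebs.io/provisioner-iscsi

A NoSQL StatefulSet can then request replicated, thin-provisioned storage simply by naming this class in its volume claim template, without any change to the database itself.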
Configuration Workflow

To provision high-performance storage for NoSQL databases using OpenEBS, the following steps are undertaken:

1. Installing OpenEBS - OpenEBS integrates seamlessly into the Kubernetes workflow and is installed into the cluster using the installation modes available to Kubernetes applications, such as Helm or YAML manifests applied via kubectl. This activates a declarative storage control plane that can be managed from within the cluster.
2. Selecting the OpenEBS storage engine - The storage engine represents the OpenEBS data plane components. OpenEBS ships several storage engines (cStor, Jiva, Local PV and Mayastor) and may automatically choose one that suits the storage available on nodes and the application's requirements.
3. Creating a StorageClass - The StorageClass is used to provision volumes on physical storage devices. These classes consume storage pools created on the disks attached to cluster nodes.
4. Launching the NoSQL database - The database is then deployed into the cluster using either operators or Helm charts.

Figure: A typical architecture of a NoSQL database with OpenEBS Persistent Volumes

Monitoring Deployment Metrics

OpenEBS includes post-deployment recommendations to ensure a successful deployment for NoSQL workloads. While developers are advised to allocate sufficient volume size during initial configuration, the volume size should be constantly monitored to ensure seamless operation. Additionally, developers and administrators should watch pool capacity and add more physical disks once the workload hits an 80% threshold. OpenEBS integrates with Kubernetes-centric monitoring and logging tools such as Prometheus and Grafana for easier metric collection, analysis and visualization.

NoSQL databases enable data-driven application development since they facilitate global scalability, flexibility and delivery agility. Cloud-native application architectures are considered perfect for NoSQL databases since they deliver on-demand infrastructure for dynamic workloads. As an essential enabler, OpenEBS can orchestrate stateful Kubernetes workloads, including NoSQL databases, offering multiple benefits such as reduced storage costs and zero lock-in. To learn more about how OpenEBS can help manage your organization's stateful workloads, contact us here.

This article was originally published at https://blog.mayadata.io/provisioning-high-performance-storage-for-nosql-databases-with-openebs and is republished with MayaData's authorization.
September 10, 2021
· 3,813 Views · 2 Likes
Why Use LocalPV with NVMe for Your Workload?
Containerized applications are ephemeral, which means any data created by a container is lost as soon as the process terminates. This calls for a pragmatic approach to data persistence and management when orchestrating containers using Kubernetes. To deal with this, the Kubernetes orchestration platform uses volume plugins to isolate storage consumption from provisioned hardware.

A Persistent Volume (PV) is a Kubernetes API resource that provisions persistent storage for Pods. Cluster resources can use a PV construct to mount any storage unit, whether file system folders or block storage options, to Kubernetes nodes. Pods request a PV using Persistent Volume Claims (PVCs). These storage integrations and other features make it possible for containerized applications to share data with other containers and preserve container state. PVs can be provisioned statically by the cluster administrator or dynamically using StorageClasses. Important features that distinguish different StorageClasses include capacity, volume mode, access modes, performance and resiliency. When a local disk is attached directly to a single Kubernetes node, it can be exposed as a Local PV, which provides the best performance but is accessible only from the node to which it is attached. This post explores why LocalPV and NVMe storage should be used for Kubernetes workloads.

Non-Volatile Memory Express (NVMe) for Kubernetes

NVMe is a high-speed access protocol that delivers low latency and high throughput for SSD storage devices by connecting them to the processor through a PCIe interface. Early SSDs connected to the CPU through SATA or Serial Attached SCSI (SAS). These relied on legacy standards tailored to hard-disk speeds, which proved inefficient since each connection to the processor remained limited by synchronized locking or the SAS Host Bus Adapter (HBA). To overcome this challenge, NVMe unlocks the true potential of flash storage using the Peripheral Component Interconnect Express (PCIe) bus, which supports high-performance, Non-Uniform Memory Access (NUMA). NVMe also supports parallel processing, with up to 64K input/output queues and each queue holding up to 64K entries. This high-bandwidth, low-latency storage hosts applications that can create as many I/O queues as the system configuration, workload and NVMe controller allow. Following a NUMA-based storage protocol, NVMe allows different CPUs to manage I/O queues, using various arbitration mechanisms.

Modern enterprises are data-driven, with users and devices generating huge amounts of data that may overwhelm companies. By enhancing the capabilities of multi-core CPUs, NVMe provides low latency and fast transfer rates for better access and processing of large data sets. NVMe devices typically rely on NAND flash memory that can be hosted on various SSD form factors, including standard SSDs, U.2 cards, M.2 cards and PCIe add-in cards. NVMe over Fabrics (NVMe-oF) extends the advantages of NVMe storage access by implementing the NVMe protocol for remotely connected devices. The architecture allows one node to directly access a storage device of another computer over several transport protocols.

NVMe Architecture

In the NVMe architecture, the host computer is connected to SSD storage devices via a high-throughput host-controller interface.
The storage service is composed of three main elements: SSD controllers, the PCIe host interface and non-volatile memory (e.g., NAND flash). To submit I/O queues, the NVMe controller utilizes memory-mapped controller registers and the host system's DRAM. The number of mapped registers determines the number of parallel I/O operations the protocol can support.

Figure: A typical NVMe storage architecture

Advantages of Using NVMe for Kubernetes Clusters

PCIe reduces the need for various abstract implementation layers, allowing for faster, more efficient storage. Some benefits of using NVMe for storage include:

• Efficient memory transfer - The NVMe protocol requires only one ring per CPU to communicate directly with non-volatile memory, thereby reducing locking overhead for I/O controllers. NVMe also enables parallelism by combining Message Signalled Interrupts with multi-core CPUs to further reduce latency.
• Secured cluster data - NVMe-oF enables secure tunnelling protocols developed and managed by reputable data security bodies such as the Trusted Computing Group (TCG). This enables enterprise-grade security features such as encryption at rest, access control and crypto-erase for cluster nodes and SSD storage devices.
• Support for multi-core computing - The NVMe protocol utilizes a private queueing strategy to support up to 64K commands per queue over 64K queues. Since every controller has its own set of queues, throughput increases linearly with the number of CPU cores available.
• Fewer instructions to process I/O requests - NVMe relies on an efficient set of commands to halve the number of CPU instructions required to implement input/output operations. This reduces latency while enabling advanced features like reservations and power management for cluster administrators.

Why Use LocalPV With NVMe Storage for Kubernetes Clusters?

While most storage systems used to persist data for Kubernetes clusters are remote and independent of the source nodes, it is possible to attach a local disk directly to a single node. Locally attached storage typically guarantees higher performance and tighter security than remote storage. A Kubernetes LocalPV represents a portion of local disk storage that can be used for data persistence in StatefulSets. With LocalPV, the local disk is specified as a persistent volume that can be consumed with the same PVC and StorageClass abstractions used for remote storage. This results in low-latency storage that is suitable for fault-tolerant use cases such as:

• Distributed data stores that share replicated data across multiple nodes
• Caching data sets that require faster processing close to where the data resides

LocalPV vs. hostPath Volumes

Before the introduction of LocalPV volumes, hostPath volumes were used for accessing local storage. There were certain challenges in orchestrating local storage with hostPath, as it didn't support important Kubernetes features, such as StatefulSets. Additionally, hostPath volumes required separate operators for disk management, Pod scheduling and topology, making them difficult to use in production environments. LocalPV volumes were designed in response to issues with the scheduling, disk accounting and portability of hostPath volumes. One of the major distinctions is that the Kubernetes control plane knows which node owns a LocalPV. With hostPath, data is lost when a Pod referencing the volume is scheduled to a different node.
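For contrast, here is a minimal sketch of a hostPath volume referenced directly in a Pod spec; the Pod name hostpath-demo and the path /mnt/disks/ssd1 are illustrative placeholders. Anything written to /data stays on whichever node the Pod happens to land on, which is exactly the portability problem described above.

YAML
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-demo              # hypothetical example Pod
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - mountPath: /data
      name: local-vol
  volumes:
  - name: local-vol
    hostPath:
      path: /mnt/disks/ssd1        # a directory on the node's own filesystem
      type: Directory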
LocalPV volumes can only be referenced through a Persistent Volume Claim (PVC), while hostPath volumes can be referenced both directly in the Pod definition file and via a PVC.

How to Configure a Kubernetes Cluster With LocalPV NVMe Storage

Workloads can be configured to access NVMe SSDs on a local machine using LocalPV and a Persistent Volume Claim, or a StatefulSet with volume claim attributes. This section explores how to attach a local disk to a Kubernetes cluster with NVMe storage configured.

The first step is to create a storage class that enables volume topology-aware scheduling. This instructs the Kubernetes API not to bind a PVC until a Pod consuming the PVC is scheduled. The configuration file for the storage class will be similar to:

YAML
$ cat sc.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-device-sc
allowVolumeExpansion: true
parameters:
  devname: "test-device"
provisioner: device.csi.openebs.io
volumeBindingMode: WaitForFirstConsumer

Check the documentation on storage classes for all the supported parameters for Device LocalPV. If the device with a meta partition is available only on certain nodes, use topology to specify the list of nodes where the devices are available. As shown in the storage class below, we can use allowedTopologies to describe device availability on nodes:

YAML
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-device-sc
allowVolumeExpansion: true
parameters:
  devname: "test-device"
provisioner: device.csi.openebs.io
allowedTopologies:
- matchLabelExpressions:
  - key: kubernetes.io/hostname
    values:
    - device-node1
    - device-node2

The above storage class states that the device with meta partition test-device is available only on nodes device-node1 and device-node2, so the Device CSI driver will create volumes on those nodes only.

The OpenEBS Device driver has its own scheduler, which tries to distribute PVs across nodes so that no single node is loaded with all the volumes. Currently, the driver supports two scheduling algorithms, VolumeWeighted and CapacityWeighted, which pick the device with the fewest provisioned volumes or the least provisioned capacity, respectively, from all the nodes where the devices are available. To learn how to select a scheduler via storage class, refer to this link. Once the driver finds the node, it creates a PV for that node and also creates a DeviceVolume custom resource for the volume with the node information. The watcher for this DeviceVolume CR picks up all the information for the object and creates a partition of the given size on the designated node.

The scheduling algorithm currently accounts only for the number of volumes or the total capacity occupied from a device; it does not consider other factors such as available CPU or memory when making scheduling decisions. So, if you want to use node selector/affinity rules on the application pod, or have CPU/memory constraints, the Kubernetes scheduler should be used instead. To use the Kubernetes scheduler, set volumeBindingMode to WaitForFirstConsumer in the storage class. This causes delayed binding: the Kubernetes scheduler schedules the application pod first and then asks the Device driver to create the PV, which the driver then creates on the node where the pod is scheduled.
YAML
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-device-sc
allowVolumeExpansion: true
parameters:
  devname: "test-device"
provisioner: device.csi.openebs.io
volumeBindingMode: WaitForFirstConsumer

Please note that once a PV is created for a node, any application using that PV will always be scheduled to that particular node, as the PV is sticky to it. The scheduling algorithm, whether the Device driver's or Kubernetes', comes into the picture only at deployment time. Once the PV is created, the application cannot move elsewhere, as its data resides on the node hosting the PV.

Next, create a PVC using the storage class created for the device driver:

YAML
$ cat pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: csi-devicepv
spec:
  storageClassName: openebs-device-sc
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi

Then create the deployment YAML using the PVC backed by device driver storage:

YAML
$ cat fio.yaml
apiVersion: v1
kind: Pod
metadata:
  name: fio
spec:
  restartPolicy: Never
  containers:
  - name: perfrunner
    image: openebs/tests-fio
    command: ["/bin/bash"]
    args: ["-c", "while true ;do sleep 50; done"]
    volumeMounts:
    - mountPath: /datadir
      name: fio-vol
    tty: true
  volumes:
  - name: fio-vol
    persistentVolumeClaim:
      claimName: csi-devicepv

After deploying the application, we can go to the node and verify that the partition has been created and is being used by the application as a volume for reading and writing data.

Advantages of Using LocalPV With NVMe for Kubernetes Operators

Some benefits of integrating LocalPV into clusters using NVMe for storage include:

• Compared to remotely connected storage systems, Local Persistent Volumes support more Input/Output Operations Per Second (IOPS) and higher throughput since the volume directory is mounted directly on the node. With LocalPV volumes, organizations can therefore take full advantage of the high performance offered by NVMe SSDs.
• LocalPV enables the dynamic reservation of storage resources needed for stateful services, making it easy to relaunch a process on the same node using the same SSD volume.
• LocalPV volume configuration pins tasks to the nodes where their data resides, eliminating the need for scheduling constraints and thereby enabling quicker access to SSDs through NVMe.
• Destroying a LocalPV is as easy as deleting the PVC consuming it, allowing for simpler storage management.

Summary

Non-Volatile Memory Express (NVMe) enhances data storage and access by leveraging the performance benefits of flash memory for SSD-based storage. By connecting storage devices to the CPU directly via the PCIe interface, data companies eliminate the bottlenecks associated with SATA- or SAS-based access. LocalPV shortens the data path between storage and Kubernetes nodes by mounting a volume directly on a Kubernetes node. This results in higher throughput and IOPS, suitable for fault-tolerant stateful applications. OpenEBS by MayaData is one of the most popular open-source, agile storage stacks for performance-sensitive databases orchestrated by Kubernetes. Mayastor, OpenEBS's latest storage engine, delivers very low overhead relative to the performance capabilities of the underlying devices. OpenEBS Mayastor does not require NVMe devices, nor does it require that workloads consume NVMe, although in both cases performance will increase. OpenEBS Mayastor is currently unique among open-source storage projects in utilizing NVMe internally to communicate with OpenEBS replicas.
To learn more about how OpenEBS Mayastor, leveraging NVMe as a protocol, performs on some of the fastest NVMe devices currently available on the market, visit this article. OpenEBS Mayastor builds a foundational layer that enables workloads to coalesce and control storage as needed in a declarative, Kubernetes-native way. While doing so, the user can focus on what's important: deploying and operating stateful workloads. If you're interested in trying out Mayastor for yourself, instructions for setting up your own cluster and running a benchmark like `fio` may be found at https://docs.openebs.io/docs/next/mayastor.html.

Related blogs:
• https://blog.mayadata.io/the-benefits-of-using-nvme-for-kubernetes
• https://blog.mayadata.io/mayastor-nvme-of-tcp-performance
August 12, 2021
· 4,222 Views · 2 Likes
Container Attached Storage (CAS) vs. Shared Storage: Which One to Choose?
An Overview of Storage in Kubernetes

Kubernetes supports a powerful storage architecture that is often complex to implement unless done right. The Kubernetes orchestrator relies on volumes, abstracted storage resources that help to save and share data between ephemeral containers. Since these storage resources abstract the underlying infrastructure, volumes enable dynamic provisioning of storage for containerized workloads. In Kubernetes, shared storage is typically achieved by mounting volumes and connecting to an external filesystem or block storage solution. Container Attached Storage (CAS) is a relatively newer solution that allows Kubernetes administrators to deploy storage as containerized microservices in a cluster. The CAS architecture makes workloads more portable and makes it simpler to modify storage based on application needs. Because CAS is deployed per workload or per cluster, it also eliminates the cross-workload and cross-cluster blast radius of traditional shared storage. This article compares CAS with traditional shared storage to explore their architectures, similarities and differences.

Container Attached Storage

Container Attached Storage (CAS) is a solution for stateful workloads that deploys storage as a cluster running in the cloud or on-premises. Unlike traditional storage options, where storage is a shared filesystem or block storage running externally, CAS enables storage controllers that can be managed by Kubernetes. These storage controllers can run anywhere with a Kubernetes distribution, whether on top of traditional shared storage systems or managed storage services like Amazon EBS. Data stored in CAS is accessed directly from containers within the cluster, thereby significantly reducing read/write times.

Architecture Overview

CAS leverages the container orchestrator's environment to enable persistent storage. The CAS software runs storage targets in containers that run as services. If desired, these services are replicated as microservice-based storage replicas that can easily be scheduled and scaled independently of each other. CAS services can be orchestrated using Kubernetes or any other orchestration platform as containerized workloads, ensuring the autonomy and agility of software development teams. For any CAS solution, the cluster is typically divided into two layers:

• The control plane consists of the storage controllers, storage policies and instructions on how to configure the data plane. Control plane components are responsible for provisioning volumes and other storage-associated tasks.
• The data plane components receive and execute instructions from the control plane on how to save and access container information. The main element of the data plane is the storage engine, which implements pooled storage and is essentially responsible for the input/output volume path. Some popular storage engines of OpenEBS include Mayastor, cStor, Jiva and OpenEBS LocalPV.

Some prominent users of OpenEBS include the CNCF, ByteDance (TikTok), Optro, Flipkart, Bloomberg and others.

Features

Container Attached Storage is built to run primarily on Kubernetes and other cloud-native container orchestrators. This makes the solution inherently platform-agnostic and portable, thereby making it an efficient storage solution that can be deployed on any platform without the inconvenience of vendor lock-in. CAS decomposes storage controllers into constituent units that can be scaled and run independently:
• Every storage controller is attached to a Persistent Volume and typically runs within user space, achieving storage granularity and independence from the underlying operating system.
• Control plane entities are deployed as Custom Resource Definitions that deal with physical storage entities such as disks.
• Data plane entities are deployed as a collection of Pods running in the same cluster as the workload.
• The CAS architecture can offer synchronous replication to add additional availability.

When to Use

Container Attached Storage is steadily becoming the de facto standard for persistent storage of stateful Kubernetes workloads. CAS is most like the Direct Attached Storage that many current workloads expect, such as NoSQL, logging, machine learning pipelines, Kafka and Pulsar. Many workload communities and users have embraced CAS, and it also allows small teams to retain control over their workloads. In short, CAS may be preferred where:

• Workloads expect local storage
• Teams want to efficiently turn local storage, including disks or cloud volumes, into volumes on demand for Kubernetes workloads
• Performance is a concern
• The loose coupling of the architecture is to be maintained at the storage layer
• Increased density of workloads on hosts is desired
• Small-team autonomy is to be maintained

Traditional Shared Storage

Shared storage was designed to allow multiple users/machines to access and store data in a pool of devices. Shared storage provided additional availability to workloads that were themselves unable to provide for their own availability. Additionally, shared storage was able to work around the poor performance of the underlying disks, which at the time were able to deliver no more than 150 I/O operations per second. Today's underlying drives can be 10,000 times more performant, massively faster than the performance requirements of most workloads. A shared storage infrastructure typically consists of block storage systems in Storage Area Networks (SANs) or file-system-based storage devices in Network Attached Storage (NAS) configurations.

Adoption

The storage industry was once rapidly growing, with growth rates in excess of 30%-50% YoY in the late 1990s and early 2000s. In the 2010s, this growth rate moderated and in certain years stopped entirely. In the 2020s, growth started again, however at a rate much slower than the exponential growth in the amount of data stored. Meanwhile, Direct Attached Storage and cloud storage each grew more quickly in terms of capacity shipped and overall spending.

Architecture Overview

In traditional shared storage, all nodes in a network share the same physical storage resources but have their own private memory and processing devices. Files and other data can be accessed by any machine connected to the central storage. For a Kubernetes application, traditional shared storage is first implemented by using monolithic storage software to virtualize physical storage resources, which could be bare-metal servers, SAN/NAS networks or block storage solutions. The software then connects to Persistent Volumes that store cluster data. Each Persistent Volume (PV) is bound to a Persistent Volume Claim (PVC), which application Pods use to request a portion of the shared storage. Both CAS and shared storage can utilize the Container Storage Interface (CSI), which is used to issue commands to the underlying storage, such as the need to provision a PV or to expand or snapshot capacity.
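To illustrate the traditional model from the Kubernetes side, here is a minimal sketch of a statically provisioned PV backed by an NFS share, consumed through an ordinary PVC; the server address 10.0.0.5 and the export path are placeholders for a hypothetical NAS endpoint.

YAML
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-nfs-pv
spec:
  capacity:
    storage: 50Gi
  accessModes:
  - ReadWriteMany                  # many nodes can mount the share at once
  nfs:
    server: 10.0.0.5               # placeholder shared-storage endpoint
    path: /exports/data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-nfs-pvc
spec:
  storageClassName: ""             # bind to a statically provisioned PV
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 50Gi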
Features

• Embraces centralized, consolidated storage for block and file storage systems, allowing administration from a single interface.
• Traditional storage is distinctly divided into three layers: the hosts tier, which has client machines; the fabric layer, which includes switches and other networking devices; and the storage layer, which includes the controllers used to read/write data onto physical disks.
• Shared storage integrates redundancy into the design of storage devices, allowing systems to withstand failure to a sizable degree.
• To scale up traditional shared storage, additional storage devices are deployed and configured into the existing array.

When to Use

Shared storage is used to manage large amounts of data generated and accessed by a number of different machines. This is because traditional shared storage enables high performance for large files with no bottlenecks or downtimes. Shared storage is also the go-to storage solution for organizations that depend on collaboration between teams; as data and files are managed centrally, shared storage allows efficient version control and consolidated information management. Traditional shared storage is also used to eliminate the need for multiple drives containing the same information, which helps reduce redundancies, thus increasing storage capacity.

CAS vs. Shared Storage

The two storage options vary greatly in how they persist application data. While traditional shared storage relies on an external array of storage devices to persist data, CAS uses containers within an orchestrated environment. Following are a few similarities and differences between CAS and traditional shared storage.

Similarities

• Both CAS and traditional shared storage offer highly available storage for applications. CAS allows high availability using data Pod replicas that ensure storage is always available for the CAS cluster, while traditional shared storage uses a redundant design to ensure that the storage system can withstand failure.
• Both options provide quick storage for critical applications. CAS uses agile microservices to ensure quick I/O times, while shared storage allows multiple machines to quickly read and write data on a shared pool of storage devices, reducing the need to create connections between individual machines.
• Both solutions accommodate software-defined storage, which combines the performance of physical devices with the agility of software.
• Both can utilize the Container Storage Interface (CSI) to issue commands to the underlying storage.
• Both can be open source, extending the openness of Kubernetes to the data layer. It appears that Container Attached Storage is somewhat more likely to be open source; however, that is yet to be determined conclusively.

Differences

• CAS follows a container-based microservice framework for storage, which means teams can take advantage of the agility and portability of containers to ensure faster, more efficient storage. On the contrary, traditional shared storage involves different virtual or physical machines reading/writing into a shared pool of storage devices, thereby increasing latency and reducing access speeds.
• CAS is platform-agnostic. This means CAS-based storage solutions can run either on-premises or in the cloud without requiring extensive configuration changes. Shared storage, by contrast, relies on kernel modifications, making it inefficient to deploy for workloads across different environments.
• While traditional shared storage relies on consolidated, monolithic storage software, CAS runs in user space, enabling independent management capabilities for efficient storage administration at a granular level.
• CAS allows linear scalability since storage containers can be brought up as required, while in traditional shared storage, scaling involves adding newer devices to an existing storage array.

Summary

Designed for Kubernetes, CAS enables agility, granularity and linear scalability, making it a favourite for cloud-native applications. Traditional shared storage offers a mature stack of storage technology that mainly falls short in persisting storage for stateful applications due to its inherent lack of linear scalability. CAS is a novel solution that enables storage controllers to run in user space, allowing maximum scalability. OpenEBS, a popular CAS-based storage solution, has helped several enterprises run stateful workloads. Originally developed by MayaData, OpenEBS is now a CNCF project with a vibrant community of organizations and individuals alike. This was also evident from CNCF's 2020 Survey Report, which highlighted MayaData (OpenEBS) in the top-5 list of popular storage solutions.

Resources:
• Canonical definition of Container Attached Storage: https://www.cncf.io/blog/2020/09/22/container-attached-storage-is-cloud-native-storage-cas/
• To read adopter use cases or contribute your own, visit: https://github.com/openebs/openebs/blob/master/ADOPTERS.md
• CNCF 2020 Survey Report: https://www.cncf.io/wp-content/uploads/2020/11/CNCF_Survey_Report_2020.pdf
• OpenEBS LocalPV Quick Start Guide: https://docs.openebs.io/docs/next/localpv.html

This article was originally published at https://blog.mayadata.io/container-attached-storage-cas-vs.-shared-storage-which-one-to-choose and is republished with MayaData's authorization.
August 10, 2021
· 3,627 Views · 3 Likes
Container Attached Storage (CAS) vs. Software-Defined Storage - Which One to Choose?
Hardware abstraction involves the creation of a programming layer that allows the computer operating system to interact with hardware devices at a general rather than detailed level. This layer involves logical code implementation that avails the hardware to any software program. For storage devices, abstraction provides a uniform interface for users accessing shared storage, concealing the hardware's implementation from the operating system. This allows software running on user machines to get the highest possible performance from the storage devices. It also allows for device-independent programs, since storage hardware abstraction enables device drivers to access each storage device directly.

Kubernetes is, by nature, infrastructure-agnostic: it relies on plugins and volume abstractions to decouple storage hardware from applications and services. Containers, on the other hand, are ephemeral and immediately lose data when they terminate. Kubernetes persists data created and processed by containerized applications on physical storage devices using Volumes and Persistent Volumes. These abstractions connect to storage hardware through various types of Hardware Abstraction Layer (HAL) implementations. Two commonly used HAL storage implementations for Kubernetes clusters are Container Attached Storage (CAS) and Software-Defined Storage (SDS). This blog delves into the fundamental differences between CAS and SDS, the benefits of each, and the most appropriate use cases for typical HAL storage implementations.

Container Attached Storage vs. Software-Defined Storage

Kubernetes employs abstracted storage for portable, highly available and distributed storage. The Kubernetes API supports various CAS and SDS storage solutions connecting through the CSI interface. Let us take a closer look at how both abstraction models function and the purpose each serves for storage in a Kubernetes cluster.

Container Attached Storage

Container Attached Storage (CAS) introduces a novel approach to persisting data for stateful workloads in Kubernetes clusters. With CAS, storage controllers are managed and run in containers as part of the Kubernetes cluster. This allows storage portability, since these controllers can run on any Kubernetes platform, whether on personal machines, in on-premises data centres or on public cloud offerings. Since CAS leverages a microservice architecture, the storage solution remains closely associated with the application that binds to physical storage devices, reducing I/O times.

Container Attached Storage Architecture

CAS leverages the Kubernetes environment to enable the persistence of cluster data. The storage solution runs storage targets in containers. These targets are microservices that can be replicated for independent scaling and management. For enhanced autonomy and agility, these microservice-based storage targets can then be orchestrated using a platform like Kubernetes. A CAS cluster uses the control plane layer for storage management, while the data plane layer is used to run storage targets/workloads. Storage controllers in the control plane provision volumes, spin up storage target replicas and perform other management-associated tasks. Data plane components execute storage policies and instructions from control plane elements; these instructions typically include file paths, storage and access methods. The data plane additionally contains the storage engine, which is responsible for implementing the actual input/output path for file storage.
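From the workload's point of view, all of this machinery hides behind standard Kubernetes objects. As a minimal sketch, assuming an OpenEBS-provided StorageClass named openebs-hostpath is installed in the cluster, a claim against CAS-managed storage looks like any other PVC:

YAML
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: cas-demo-claim                  # hypothetical claim name
spec:
  storageClassName: openebs-hostpath    # assumes this CAS class exists in the cluster
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

The CAS control plane then provisions the volume and schedules the backing storage target; the application never interacts with the storage engine directly.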
Benefits of Container Attached Storage

Container Attached Storage enables agile storage for stateful containerized applications. This is because it follows a microservice-based pattern that allows the storage controller and target replicas to be upgraded seamlessly. Containerization of storage software means that administrative teams can dynamically allocate and update storage policies for each volume. With CAS, low-level storage resources are represented using Kubernetes Custom Resource Definitions. This allows for seamless integration between storage and cloud-native tooling, which enables easier management and monitoring. CAS also ensures storage is vendor-agnostic, since stateful workloads can be moved from one Kubernetes deployment environment to another without disrupting services.

Container Attached Storage Use Cases

CAS uses storage target replication to ensure high availability, avoiding the blast radius limitations of traditional distributed storage architecture. This makes CAS the top storage choice for cloud-native applications. CAS is also appropriate for organizations looking to orchestrate their storage across multiple clouds, since CAS can be deployed on any Kubernetes platform. Container Attached Storage enables simple storage backup and replication, making it well suited for applications that require scale-out storage. It is also a good fit for development teams that want to improve read/write times in their Continuous Integration and Continuous Delivery (CI/CD) pipelines. Popular CAS solution providers for Kubernetes include:

• OpenEBS
• StorageOS
• Portworx
• Longhorn

Software-Defined Storage

The Software-Defined Storage architecture relies on software to decouple running applications from storage hardware. This simplifies the management of storage devices by abstracting them into virtual partitions. Management is then enabled on a Data Management Interface (DMI) that hosts command and control functions.

Features of Software-Defined Storage

With Software-Defined Storage, the data/service management interface is hosted on a master server that controls storage layers consisting of shared storage pools. This makes the provisioning and allocation of storage easy and flexible. The following are some of the key features of Software-Defined Storage:

• Device abstraction - Data I/O services should be delivered uniformly to users regardless of the underlying hardware. Through SDS, storage abstraction constructs, such as repositories, file shares, volumes and Logical Unit Numbers (LUNs), are used to create a clear divide between physical hardware and the logical aspects of data storage.
• Automation - The SDS solution implements workflows and algorithms that reduce the amount of manual work performed by administrators. To enable efficient automation, SDS storage systems adapt to varying performance and data needs with little human intervention.
• Disaggregated, pooled storage - Physical storage devices are part of a shared pool from which the software can carve out storage for services and applications. This allows SDS to use available storage efficiently when required, resulting in optimum usage of resources.

Advantages of Software-Defined Storage

Some benefits of using SDS include:

• Enhanced scalability - Decoupling hardware resources allows administrators to allocate physical storage dynamically depending on workload. Pooled, disaggregated storage enabled by SDS allows for both vertical and horizontal scaling of physical volumes, supporting larger capacity and higher performance.
• Improved I/O performance - SDS enables input/output parallelism to process host requests dynamically across multiple CPUs. SDS also supports large caching memory of up to 8 TB while enabling automatic data tiering. This allows faster input/output operations for quicker data processing.
• Interoperability - SDS uses the Data Management Interface as a translator that allows storage solutions running on different platforms to interact with each other. It also groups physically isolated storage hardware into logical pools, allowing organizations to host shared storage from different vendors.
• Reduced costs - SDS storage solutions typically run on existing commodity hardware while optimizing the consumption of storage. SDS also enables automation that reduces the number of administrators required to manage storage infrastructure. These factors lead to lower upfront and operational expenses for managing workloads.

When to Use Software-Defined Storage

SDS offers several benefits for teams looking to enhance storage flexibility at reduced costs. Some common use cases for SDS include:

• Modernizing data centre infrastructure
• Creating robust systems for mobile and challenging environments
• Creating hybrid cloud implementations to be managed on the same platform
• Leveraging existing infrastructure for remote and branch offices

Comparing Container Attached Storage With Software-Defined Storage

Similarities

Both CAS and SDS enable isolation between physical storage hardware and running applications. While doing so, both technologies abstract data management from data storage resources. The two HAL implementations share several features in common, including:

• Vendor-agnostic - Both CAS and SDS architectures allow multiple workloads to run on a single host. This allows administrators to maintain a separation between storage devices and the access software. As a result, organizations can choose either CAS or SDS to implement a storage solution that can run on any platform, regardless of who develops or manages the tooling.
• Dynamic storage allocation - SDS and CAS allow for the dynamic attachment and detachment of storage resources, thereby enabling automatic provisioning of data backups and replicas for high-availability applications. Both SDS and CAS allow for the automatic deployment of storage infrastructure, which allows for storage technology diversity and heterogeneity.
• Efficient infrastructure scaling - CAS and SDS allow horizontal and vertical infrastructure scaling to automate data workflows. The two HAL approaches enable the creation of a composable, disaggregated infrastructure that enhances the creation of versatile, distributed environments.

Differences

While SDS enables distributed storage management and reduced hardware dependencies, CAS allows for disaggregated storage that can be run using any container orchestration platform. This introduces various differences between CAS and SDS, including:

• Software-Defined Storage relies on traditional shared software with limitations on blast radius, while Container Attached Storage (CAS) allows the replication of storage software, allowing for independent management and scaling.
• CAS allows for scaling up/sideways in both storage and volume performance, while SDS enables the scaling up of storage nodes to improve storage capacity.
• SDS enables a Hyper-Converged Infrastructure (HCI), while CAS enables a highly disaggregated storage infrastructure.
Container Attached Storage and Software-Defined Storage both allow cluster administrators to leverage the benefits of hardware abstraction to persist data for stateful applications in Kubernetes. CAS allows the flexible management of storage controllers through microservices-based storage orchestration using Kubernetes. Software-Defined Storage, on the other hand, allows the abstraction of storage hardware using a programmable data control plane. CAS has all the features that a typical SDS provides, albeit tailored for container workloads and built with the latest software and hardware primitives.

OpenEBS, a popular CAS-based storage solution, has helped several enterprises run stateful workloads. Originally developed by MayaData, OpenEBS is now a CNCF project with a vibrant community of organizations and individuals alike. This was also evident from CNCF's 2020 Survey Report, which highlighted MayaData (OpenEBS) in the top-5 list of most popular storage solutions. To learn more about how OpenEBS can help your organization run stateful workloads, contact us here.

This article was originally published at https://blog.mayadata.io/container-attached-storage-cas-vs.-software-defined-storage-which-one-to-choose and is republished with MayaData's authorization.
July 30, 2021
· 6,593 Views · 1 Like
article thumbnail
The Importance of Persistent Storage in Kubernetes - OpenEBS
Containers are not built to persist data. When a container is created, it only runs the process it hosts and then terminates. Any data it creates or processes is also discarded when the container exits. Containers are designed to be ephemeral: this makes them lightweight and keeps containerized workloads independent of a host's filesystem, resulting in flexible, portable, and platform-agnostic applications. These benefits, however, create a few challenges when orchestrating storage for containerized workloads:
- The need for appropriate tools that enable data sharing across immutable containers
- Options for backup and recovery in the event of application failure
- Means to remove stored data once it is no longer needed so that hosts can efficiently handle newer workloads

In Kubernetes, PODs are also ephemeral. Kubernetes supports various options to persist data for containerized workloads in different formats. This article explores the tools and strategies that facilitate persistent data storage in Kubernetes.

How Kubernetes Handles Persistent Storage
Kubernetes supports multiple options for requesting and consuming storage resources. The basic building block of the Kubernetes storage architecture is the volume. This section explores central Kubernetes storage concepts and other integrations that allow for the provisioning of highly available storage for containerized applications.

Kubernetes Storage Primitives: Volume
A volume is a directory containing data that can be consumed by containers running in a POD. To attach a volume to a specific POD, it is declared in .spec.volumes and is mounted into containers via .spec.containers[*].volumeMounts. Kubernetes supports different types of volumes depending on the medium hosting them and their contents. There are two main classes of volumes:
- Ephemeral volumes - share the life of a POD and are destroyed as soon as the POD ceases to exist.
- Persistent volumes - exist beyond a POD's lifetime.

Persistent Volumes and Persistent Volume Claims
A Persistent Volume (PV) is a Kubernetes resource that represents a unit of storage available to the cluster. The lifecycle of a PV is independent of the POD consuming it, so data stored in the PV remains available even after containers restart. A PV is an actual Kubernetes object that captures the details of how a volume implements storage and is configured using a YAML file with specifications similar to:

YAML
apiVersion: v1
kind: PersistentVolume
metadata:
  name: darwin-volume
  labels:
    type: local
spec:
  storageClassName: dev
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  hostPath:
    path: "/mnt/data"

A Persistent Volume Claim (PVC) is the Kubernetes object that PODs use to request a specific portion of storage. The PVC is also a Kubernetes resource and can have specifications similar to:

YAML
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: darwin-claim
spec:
  storageClassName: dev
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 3Gi

A PVC can be attached to a POD with a specification similar to:

YAML
spec:
  volumes:
    - name: darwin-storage
      persistentVolumeClaim:
        claimName: darwin-claim
  containers:
    - name: darwin-container
      image: nginx
      ports:
        - containerPort: 80

Once the PVC is created in the cluster via the kubectl apply command, the Kubernetes control plane searches for a PV that meets the requirements listed by the claim. If an appropriate PV with the same storageClassName exists, the PV is bound to the claim.
Supported Persistent Volumes in Kubernetes
Kubernetes supports many kinds of PVs for containerized applications. These include:
- awsElasticBlockStore (EBS)
- azureDisk
- azureFile
- cephfs
- cinder
- gcePersistentDisk
- glusterfs
- hostPath
- iscsi
- portworxVolume
- storageos
- vsphereVolume

Storage Classes
Every application typically requires storage with different properties to run different workloads. PVCs allow for the static provisioning of abstracted storage, which restricts a volume's properties to size and access modes. The StorageClass resource allows cluster administrators to offer volumes with different properties, such as performance, access modes, or size, without exposing the implementation of the abstracted storage to users. Using the StorageClass resource, cluster administrators can describe the different flavors of storage on offer, mapping them to different quality-of-service levels or security policies. The StorageClass resource is defined using three main specifications:
- Provisioner - determines the volume plugin used to provision Persistent Volumes
- Reclaim policy - tells the cluster how to handle a volume after a PVC releases the PV it is attached to
- Parameters - properties of the volumes accepted by the storage class

Since the StorageClass lets PVCs access volume resources with minimal human intervention, it enables the dynamic provisioning of storage resources, as sketched below.
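For illustration, here is a minimal StorageClass sketch. It assumes the in-tree AWS EBS provisioner (kubernetes.io/aws-ebs) is available in the cluster; the class name and parameter values are illustrative and would vary with the storage backend.

YAML
# Minimal StorageClass sketch; names and values are illustrative.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd                          # hypothetical class name
provisioner: kubernetes.io/aws-ebs        # assumes the in-tree EBS plugin
parameters:
  type: gp2                               # EBS volume type for this class
reclaimPolicy: Delete                     # remove the backing volume with the PVC
volumeBindingMode: WaitForFirstConsumer   # provision only when a POD is scheduled

A PVC that sets storageClassName: fast-ssd would then trigger dynamic provisioning of a matching volume.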
Storage Architecture
Kubernetes applications run in containers hosted in PODs. Each POD uses a PVC to request a specific portion of a PV. PVs are managed by the control plane, which calls volume plugin APIs that implement the storage logic. The volume plugins, therefore, provide access to the physical storage (figure: a typical Kubernetes storage cluster).

Storage Plugins
Kubernetes supports different storage plugins: managed solutions built with a focus on enabling persistent storage for Kubernetes applications. With third-party plugins, Kubernetes developers and administrators can focus on an enhanced user experience while vendors configure the storage systems. Kubernetes supports all three types of storage systems:
- File storage - stores data as single pieces of information organized in folders accessible through paths. It uses a logical hierarchy to store large arrays of files, making access and navigation simpler.
- Block storage - segregates data into distinct chunks (blocks). Each block has a unique identifier, so storage drivers can store information in convenient chunks without needing file structures. Block storage offers granular control, which is desirable for use cases such as mail servers, virtual machines, and databases.
- Object storage - isolates data in encapsulated containers known as objects. Each object gets a unique ID stored in a flat address space, which makes data easier to find in a large pool and allows data to be stored across different deployment environments. Object storage is most appropriate for highly flexible and scalable workloads such as big data, web applications, and backup archives.

Container Storage Interface
The Container Storage Interface (CSI) standardizes the management of persistent storage for containerized applications so that storage plugins can be developed for any container runtime or orchestrator. The CSI is a standard interface that exposes storage systems to containerized workloads; with it, storage vendors can develop plugins and drivers that work across different orchestration platforms. CSI also defines Remote Procedure Calls (RPCs) that enable various storage-related tasks, including:
- Dynamic volume provisioning
- Attaching and detaching volumes from nodes
- Mounting and unmounting volumes on nodes
- Consumption of volumes
- Identification of local storage providers
- Creating and deleting volume snapshots

With the CSI providing a standard approach to implementing and consuming storage in containerized applications, a number of solutions have been developed to enable persistent storage. Some top cloud-native storage solutions include:

OpenEBS
Developed by MayaData, OpenEBS is an open-source storage solution that runs entirely in user space as Container Attached Storage. It allows for automated provisioning and high availability through replicated, dynamically provisioned Persistent Volumes. Some key features of OpenEBS include:
- Open source and vendor-agnostic
- Utilizes hyperconverged infrastructure
- Supports both local and replicated volumes
- Built using a microservices-based CAS architecture

Portworx
Portworx is an end-to-end storage solution for Kubernetes that offers granular storage, data security, migration across multiple cloud platforms, and disaster recovery options. Portworx is built for containers from the ground up, making it a popular choice for cloud-native storage. Some features include:
- Elastic scaling with container-optimized volumes
- Multi-writer shared volumes
- Storage-aware and application-aware I/O tuning
- Data encryption at the volume, storage class, and cluster levels

Ceph
Ceph is founded on the Reliable Autonomic Distributed Object Store (RADOS) to provide pooled storage in a single unified cluster that is highly available, flexible, and easy to manage. Ceph relies on the RADOS block storage system to decouple the namespace from the underlying hardware, enabling the creation of extensive, flexible storage clusters. Ceph features include:
- Uses the CRUSH algorithm for high availability
- Supports file, block, and object storage systems
- Open source

StorageOS
StorageOS is a complete cloud-native, software-defined storage platform for running stateful Kubernetes applications. The solution is orchestrated as a cluster of containers that monitor and maintain the state of volumes and cluster nodes. Some features of StorageOS include:
- Reduces latency by enforcing data locality
- Uses in-memory caching to speed up volume access
- Enforces high availability using synchronous replication
- Utilizes standard AES encryption

Longhorn
Longhorn is a distributed, lightweight, and reliable block storage solution for Kubernetes. It is built using container constructs and orchestrated by Kubernetes, making it a popular cloud-native storage solution. Features of Longhorn include:
- Distributed, enterprise-grade storage with no single point of failure
- Changed-block detection for backups
- Automated, non-disruptive upgrades
- Incremental snapshots of storage for recovery

Directory Mounts
Kubernetes uses a hostPath volume to mount a directory from the host's filesystem directly into a POD. This is mostly applicable for development and testing on single-node clusters. hostPath volumes are referenced via static provisioning, as sketched below.
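A minimal hostPath sketch follows. The POD name, image, and paths are illustrative assumptions, and the manifest presumes the host directory already exists on the node.

YAML
# Hypothetical POD mounting a host directory via hostPath
# (suitable only for single-node development and testing).
apiVersion: v1
kind: Pod
metadata:
  name: darwin-hostpath-test    # hypothetical name
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: host-data
          mountPath: /usr/share/nginx/html
  volumes:
    - name: host-data
      hostPath:
        path: /mnt/data         # assumed to exist on the node
        type: Directory         # fail fast if the path does not exist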
While not suitable for a production environment, this method of persistence is beneficial for several use cases, including:
- Running the container advisor (cAdvisor) inside a container
- Running a container that requires access to Docker internals
- Allowing a POD to specify whether a given volume path should exist before the POD starts running

Containers are ephemeral by design, necessitating orchestration mechanisms that persist the data they generate and process. Kubernetes uses volume primitives to enable the storage of cluster data: Volumes, Persistent Volumes, and Persistent Volume Claims. Kubernetes also supports third-party storage vendors through the CSI. For single-node clusters, the hostPath volume attaches PODs directly to a node's filesystem, facilitating development and testing. This article has already been published on https://blog.mayadata.io/the-importance-of-persistent-storage-in-kubernetes-mayadata-openebs and has been authorized by MayaData for a republish.
July 17, 2021
· 6,289 Views · 1 Like
article thumbnail
The Benefits of Using NVMe for Kubernetes
Introduction: The NVMe Protocol
Non-Volatile Memory Express (NVMe) is a storage access protocol that lets the CPU access SSD memory through Peripheral Component Interconnect Express (PCIe). Through a set of protocols and technologies, NVMe dramatically accelerates the way data is transmitted, stored, and retrieved. With NVMe, the CPU accesses data on SSDs directly, enabling maximum SSD utilization and flexible scalability. NVMe allows for storage disaggregation and can be combined with Kubernetes for scale-out applications. This blog explores how NVMe redefines storage orchestration in Kubernetes.

Advantages of NVMe for Distributed Storage
By using the PCIe interface to connect CPUs to SSDs, NVMe removes the layers connecting compute to storage, allowing efficient storage abstraction and disaggregation. This offers various benefits for modern data centers, including:
- Efficient memory transfer - NVMe uses one ring per CPU to communicate directly with SSD storage, reducing internal locking overhead in I/O controllers. NVMe also supports message-signaled interrupts to prevent CPU bottlenecks, making storage efficient and scalable. By combining message-signaled interrupts with the large number of cores in modern CPUs, NVMe enables I/O parallelism and reduces latency.
- Massive queue parallelism - Unlike SATA, which supports a maximum of 32 commands per queue, NVMe provides up to 64 thousand commands per queue across 64 thousand queues. Each I/O controller gets its own set of queues, so throughput increases linearly with the number of available CPU cores.
- Improved security - The NVMe-over-Fabrics specification supports secure tunneling protocols produced by reputable security bodies such as the Trusted Computing Group (TCG). NVMe thereby enables enterprise-grade security features such as access control, data encryption at rest, purge-level erase, and crypto-erase, among others.
- An efficient command set - The protocol relies on a simple, streamlined command set that halves the number of CPU instructions needed to process I/O requests. Besides offering lower latencies, this scheme enables advanced features such as power management and reservations, extending the benefits beyond basic I/O operations.

NVMe-oF
Non-Volatile Memory Express over Fabrics (NVMe-oF) is a specification that allows CPUs to connect to SSD storage devices across a network fabric, harnessing the benefits of the NVMe protocol over a Storage Area Network (SAN). The host computer can target an SSD storage device using MSI-X-based commands, while the network can be implemented over various protocols, including Fibre Channel, Ethernet, or InfiniBand. NVMe-oF has gained popularity in modern networks since it allows organizations to implement scaled-out storage for highly distributed, highly available applications. By extending the NVMe protocol to SAN devices, NVMe-oF makes CPU usage efficient while improving connection speeds between applications on servers and storage. NVMe-oF supports various data transfer mechanisms, such as:
- RDMA
- Fibre Channel
- TCP/IP

NVMe-oF interfaces networked flash storage with compute servers, enabling applications to run on shared network storage and providing additional network consolidation for data centers.
The SSD targets can be shared dynamically among application workloads, allowing for efficient consumption of resources, flexibility, and scalability.

Kubernetes Orchestration and Storage Persistence
While containers are transient, Kubernetes enables stateful applications by providing abstractions that reference physical storage devices. A containerized application is virtually isolated from other processes and applications running in other containers. This makes the Kubernetes environment highly flexible and scalable, as it allows applications to run in virtual machines, on bare metal, on supported cloud systems, or across a combination of deployments. While this approach has clear benefits, it also presents a challenge when data must be stored and shared between containers. Kubernetes offers various abstractions and options for attaching container PODs to physical storage, such as:
- Volumes
- Persistent Volumes and Persistent Volume Claims
- Storage Classes
- The Container Storage Interface (CSI) and storage plugins

Challenges of Orchestration Using Direct Attached Storage (DAS)
While Direct Attached Storage (DAS) offers simple, highly available, and fast storage, DAS alone is not sufficient to run Kubernetes clusters. DAS devices have a limited storage capacity that cannot be dynamically provisioned to match stateful Kubernetes workloads. Additionally, DAS does not incorporate networking capabilities or facilitate data access by different user groups, since its storage is directly accessible only to individual servers, while Kubernetes orchestrates across distributed clusters.

NVMe for Kubernetes
NVMe-oF extends the low latency of DAS to network-attached storage devices by connecting servers to SSDs over a high-speed network fabric. This makes NVMe an efficient option for providing storage to dynamic, extensible, and flexible stateful applications running on Kubernetes. The Container Storage Interface (CSI) standard connects these pooled NVMe devices to Kubernetes clusters running stateful applications. By combining the low-latency networked storage offered by NVMe-oF with the flexibility of CSI plugins, organizations can provide an efficient, agile, and demand-driven storage solution for Kubernetes applications.

NVMe-oF Persistent Volumes
To avoid the bottlenecks of running NVMe SSDs on a single, local server, several organizations are working to enable NVMe-oF plugins for Kubernetes storage. Kubernetes exposes REST APIs that allow control of the storage provisioner through the NVMe-oF protocol. The storage provisioner then creates standard Volume API objects that can be used to attach a portion of pooled NVMe SSDs to a POD. Kubernetes PODs and other resources can then read and write data on this pooled storage like any persistent volume, as sketched below.
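As a hedged illustration, a workload could consume NVMe-oF-backed storage through an ordinary PVC. The StorageClass name below is an assumption for illustration; the real name and provisioner depend on the NVMe-oF plugin deployed in the cluster.

YAML
# Hypothetical PVC consuming an NVMe-oF-backed StorageClass.
# "nvme-fabric" is an assumed class name, not a standard one.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nvme-claim
spec:
  storageClassName: nvme-fabric
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi

From the POD's perspective, the claim behaves like any other persistent volume; the NVMe-oF transport is handled entirely by the provisioner and the node drivers.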
OpenEBS, created by MayaData, is a popular, agile storage stack for stateful Kubernetes applications that need minimal latency. The software infrastructure and plugins from OpenEBS integrate well with the rapid, disaggregated physical storage offered by NVMe-oF. Integrating NVMe SSDs with OpenEBS plugins allows simpler storage configurations for loosely coupled applications with stateful workloads. OpenEBS is one of the popular open-source, agile storage stacks for performance-sensitive databases orchestrated by Kubernetes. Mayastor, OpenEBS's latest storage engine, adds very little overhead relative to the performance capabilities of the underlying devices.

While OpenEBS Mayastor does not require NVMe devices, nor require workloads to access data via NVMe, an end-to-end deployment, from a workload running in a container that supports NVMe over TCP, through the low-overhead OpenEBS Mayastor, and ultimately to NVMe devices, will understandably perform as close as possible to the theoretical maximum performance of the underlying hardware. To learn more about how OpenEBS Mayastor, leveraging NVMe as a protocol, performs on some of the fastest NVMe devices currently available on the market, visit this article. OpenEBS Mayastor builds a foundational layer that enables workloads to coalesce and control storage as needed in a declarative, Kubernetes-native way. While doing so, the user can focus on what's important: deploying and operating stateful workloads. If you're interested in trying out Mayastor for yourself, instructions on how to set up your own cluster and run a benchmark like `fio` can be found at https://docs.openebs.io/docs/next/mayastor.html.

References
- Mayastor NVMe-oF TCP performance: https://openebs.io/blog/mayastor-nvme-of-tcp-performance/
- Lightning-fast storage solutions with OpenEBS Mayastor and Intel Optane: https://mayadata.io/assets/pdf/product/intel-and-mayadata-benchmarking-of-openEBS-mayastor.pdf

This article has already been published on https://blog.mayadata.io/the-benefits-of-using-nvme-for-kubernetes and has been authorized by MayaData for a republish.
July 15, 2021
· 4,289 Views · 2 Likes
article thumbnail
A Detailed and Comprehensive Guide to Disaggregated Storage
Learn more about disaggregated storage, including the architecture, the adoption, the requirements for optimum performance, and more!
June 18, 2021
· 8,460 Views · 2 Likes
article thumbnail
Deploying CockroachDB on Kubernetes using OpenEBS LocalPV
CockroachDB is a cloud-native SQL database that offers both scalability and consistency. The database is designed to withstand data center failures by deploying multiple instances of symmetric nodes in a cluster spanning several machines, disks, and data centers. Kubernetes' built-in capabilities to scale and survive node failures make it well suited to orchestrating CockroachDB, particularly because Kubernetes simplifies cluster management and helps maintain high availability by replicating data across independent nodes. This guide focuses on how OpenEBS LocalPV devices can be used to provide persistent storage for Kubernetes-hosted CockroachDB clusters.

Introduction to Distributed, Scaled-Out Databases
Ever-growing demands for resilience, performance, scalability, and ease of use have led to an explosion of choices for developers and data scientists in search of an open-source database to address their needs. Databases are often characterized as either SQL databases, noted for their consistency guarantees, with PostgreSQL and MariaDB considered ACID-compliant (Atomic, Consistent, Isolated, Durable), or NoSQL databases, noted for their scalability and flexibility but not considered ACID-compliant or completely compatible with SQL. More recently, distributed, scaled-out databases were introduced that promise to avoid the trade-offs between SQL and NoSQL databases, combining the scalability of NoSQL DBs with the ACID transactions, strong consistency, and relational schemas of SQL DBs. CockroachDB is a distributed database built on top of RocksDB as its transactional key-value store. CockroachDB supports both ACID transactions and vertical and horizontal scalability. With extensive geographical distribution, CockroachDB can maintain availability with controlled latency in the event of a disk, machine, or even data center failure.

How CockroachDB Works
CockroachDB is deployed in clusters consisting of multiple nodes. Each node is divided into five layers:
- The SQL layer converts client queries into key-value operations by first parsing them against a YACC grammar and converting them into an abstract syntax tree. From this tree, the database generates a network of plan nodes containing key-value code. When the plan nodes are executed, they initiate communication with the transaction layer.
- The transaction layer uses two-phase commits to implement the semantics of ACID transactions. These commits are executed across all nodes in the cluster and involve posting write intents and transaction records, then executing read operations.
- Once a commit has been made at the transaction layer, a request is made to the respective node's distribution layer. This layer identifies the destination node for the request and forwards it to that node's replication layer.
- The replication layer's primary responsibility is creating multiple copies of data across cluster nodes. It uses the Raft consensus algorithm to ensure consensus between the different nodes holding copies of the same data.
- The storage layer uses RocksDB to store data as key-value pairs.

Although CockroachDB can run on macOS, Linux, and Windows, production instances of CockroachDB are typically run on Linux virtual machines or containers. The database can be orchestrated on cloud or on-premises setups, and orchestration tools like Kubernetes are well suited to running such stateful applications.
Orchestrating CockroachDB With Kubernetes Clusters: Before We Begin
To understand how CockroachDB is orchestrated on Kubernetes, here is some Kubernetes terminology applicable to storage and stateful applications:
- A StatefulSet is a collection of Kubernetes PODs treated as a single stateful unit with its own network identity. A StatefulSet is a stable Kubernetes object that always binds to the same persistent storage when it restarts.
- A Persistent Volume is block-storage-backed storage bound to a POD. A volume's lifecycle is not tied to the POD to which it is attached, so every CockroachDB node can attach to the same persistent volume every time it restarts.
- A Certificate Signing Request is a request by a client to have its TLS certificate signed by the Certificate Authority built into Kubernetes by default.
- Role-Based Access Control (RBAC) is the system Kubernetes uses to administer access permissions in the cluster. Roles allow users to access certain resources within the cluster.

To use the most up-to-date files, Kubernetes version 1.15 or higher is required to run CockroachDB clusters. The database can be deployed on any Kubernetes distribution, including a local cluster (such as Minikube), Amazon AWS EKS, Google GKE and GCE, among others. For persistence and replication, CockroachDB relies on external persistent volumes such as OpenEBS LocalPV.

Installing CockroachDB Operators on OpenEBS LocalPV Devices
When using OpenEBS with CockroachDB, a LocalPV is provisioned on the node where a CockroachDB POD is scheduled. The volume uses an unclaimed block device to store data. The OpenEBS dynamic LocalPV provisioner can create Kubernetes Local Persistent Volumes from block devices available on the node, hereafter referred to as OpenEBS LocalPV device volumes. Compared to native Kubernetes Local Persistent Volumes, OpenEBS LocalPV device volumes have the following advantages:
- A dynamic volume provisioner, as opposed to a static provisioner.
- Better management of the block devices used for creating LocalPVs through OpenEBS NDM. NDM provides capabilities like discovering block device properties, setting up device filters, collecting metrics, and detecting when block devices move across nodes.
- Once a volume claims a block device, no other application can use the device for storage. If block devices are limited on other nodes, nodeSelectors can be used to provision storage for applications on particular cluster nodes.

The recommended configuration for CockroachDB clusters is at least three nodes with one unclaimed local SSD per node. This solution guide takes you through installing the CockroachDB Kubernetes operators and then configuring the cluster to use local OpenEBS devices as the storage engine. The guide also highlights how to access the database for SQL queries and, finally, demonstrates how to monitor the database using Prometheus and Grafana. Let us know how you use CockroachDB in production and if you have an interesting use case to share.
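As a hedged illustration of the configuration described above, the following is a minimal StorageClass sketch. It assumes the OpenEBS LocalPV provisioner is installed; the annotation keys follow the OpenEBS convention, but consult the OpenEBS documentation for the exact values in your release.

YAML
# Minimal sketch of a StorageClass for OpenEBS LocalPV device volumes.
# Assumes the OpenEBS LocalPV provisioner is installed in the cluster.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-device
  annotations:
    openebs.io/cas-type: local
    cas.openebs.io/config: |
      - name: StorageType
        value: device
provisioner: openebs.io/local
volumeBindingMode: WaitForFirstConsumer   # bind only when a POD is scheduled

CockroachDB's StatefulSet volumeClaimTemplates would then reference openebs-device as the storageClassName.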
Also, please check out other OpenEBS deployment guides on common Kubernetes stateful workloads:
- Deploying Kafka on Kubernetes
- Deploying Elasticsearch on Kubernetes
- Deploying WordPress on DigitalOcean Kubernetes
- Deploying Magento on Kubernetes
- Deploying Percona on Kubernetes
- Deploying Cassandra on Kubernetes
- Deploying MinIO on Kubernetes
- Deploying Prometheus on Kubernetes

This article has already been published on https://blog.mayadata.io/deploying-cockroachdb-on-kubernetes-using-openebs-localpv and has been authorized by MayaData for a republish.
May 31, 2021
· 12,877 Views · 3 Likes
article thumbnail
Deploy Elasticsearch on Kubernetes Using OpenEBS LocalPV
Overview
Elastic Stack is a group of open-source tools that includes Elasticsearch and supports data ingestion, storage, enrichment, visualization, and analysis for containerized applications. As a distributed search and analytics engine, Elasticsearch is an open-source tool that ingests application data, indexes it, and then stores it for analytics. Since it gathers large volumes of data while indexing different data types, Elasticsearch is often considered write-heavy. To manage such dynamic volumes of data, Kubernetes makes it easy to configure, manage, and scale Elasticsearch clusters. Kubernetes also simplifies the provisioning of resources for Elasticsearch using Infrastructure-as-Code configurations, abstracting cluster management. While Kubernetes alone cannot store the data generated by a cluster, persistent volumes can be used to retain it for future use. To help with this, OpenEBS provisions local persistent volumes (LocalPV), allowing data to be stored on physical disks. Many users have shared their experience of using OpenEBS for local storage management of Elasticsearch in Kubernetes, including the Cloud Native Computing Foundation, ByteDance (TikTok), and Zeta Associates (Lockheed Martin), on the Adopters list in the OpenEBS community, available here. In this guide, we explore how OpenEBS LocalPV can provision data storage for Elasticsearch clusters. This guide also covers:
- The primary functions of Elastic Stack operators in a Kubernetes cluster
- Integrating Elasticsearch operators with Fluentd and Kibana to form the EFK stack
- Monitoring Elasticsearch cluster metrics with Prometheus and Grafana

Getting Started With Elasticsearch Analytics
Elasticsearch provides the ability to store and search large amounts of textual, graphical, or numerical data efficiently. Kubernetes makes it easy to manage the connections between Elasticsearch nodes, simplifying Elasticsearch deployments on-premises or in hosted cloud environments. It must be noted that Elasticsearch nodes are different from the Kubernetes nodes of a cluster: an Elasticsearch node runs a single instance of Elasticsearch, while a Kubernetes node is a physical or virtual machine that the orchestrator runs on.

Elasticsearch Cluster Topology
From Kubernetes' point of view, an Elasticsearch node can be considered a POD. Whenever an Elasticsearch cluster is deployed, three types of Elasticsearch PODs are created:
- Master - manages the Elasticsearch cluster
- Client - directs incoming traffic to the appropriate PODs
- Data - responsible for storing and serving cluster data

A typical seven-POD Elasticsearch cluster consists of three master, two client, and two data nodes (see the topology diagram in the original post). Deploying Elasticsearch involves creating manifest files for each of the cluster's PODs. By connecting to the cluster, OpenEBS creates a visibility tier that enables cluster monitoring, logging, and topology checks for LocalPV storage. Additionally, to enable cluster-wide analytics, the following tools are deployed:

Fluentd - an open-source data collection agent that integrates with Elasticsearch to collect log data, transform it, and ship it to the Elastic backend. Fluentd is set up on cluster nodes to collect and convert POD information and send it to the Elasticsearch data PODs for storage and indexing. It is typically deployed as a DaemonSet so that it runs on each Kubernetes worker node, as sketched below.
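For reference, here is a minimal Fluentd DaemonSet sketch. The image tag and the Elasticsearch client service name are illustrative assumptions; a production setup would also mount the container log directories and configure parsers.

YAML
# Minimal sketch of a Fluentd DaemonSet shipping node logs to Elasticsearch.
# Image tag and service name are assumptions for illustration.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
        - name: fluentd
          image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
          env:
            - name: FLUENT_ELASTICSEARCH_HOST
              value: "elasticsearch-client"   # assumed client service name
            - name: FLUENT_ELASTICSEARCH_PORT
              value: "9200"
          volumeMounts:
            - name: varlog
              mountPath: /var/log
      volumes:
        - name: varlog
          hostPath:
            path: /var/log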
Kibana - once the Elasticsearch cluster is deployed on Kubernetes, it needs to be monitored and managed. To help with this, Kibana is used as a visualization tool for cluster data; the Elasticsearch client service is provided as an environment variable to the PODs Kibana should connect to.

Solution Guide
The following solution guide explains the steps and important considerations for deploying Elasticsearch clusters on Kubernetes using OpenEBS Persistent Volumes. By following the guide, you can create persistent storage for the EFK stack on a Kubernetes cluster to which OpenEBS is deployed. The guide includes steps for performing metric checks and performance monitoring of the Elasticsearch cluster using Prometheus and Grafana. Let us know how you use Elasticsearch in production and if you have an interesting use case to share. Also, please check out other OpenEBS deployment guides on common Kubernetes stateful workloads on our website:
- Deploying Kafka on Kubernetes
- Deploying WordPress on DigitalOcean Kubernetes
- Deploying Magento on Kubernetes
- Deploying Percona on Kubernetes
- Deploying Cassandra on Kubernetes
- Deploying MinIO on Kubernetes
- Deploying Prometheus on Kubernetes

This article has already been published on https://blog.mayadata.io/deploy-elasticsearch-on-kubernetes-using-openebs-localpv and has been authorized by MayaData for a republish.
May 12, 2021
· 7,056 Views · 3 Likes
article thumbnail
AppOps with Kubernetes and Devtron - The Perfect Fit
Kubernetes needs no introduction in this cloud-native world. It was born when I was a middle-aged man. Years later, I am still as young as ever (take that with a pinch of salt), while Kubernetes has grown into a fine tool that outperforms other platforms in enabling operational efficiency and application resilience. In the past, I have written several articles and guides on Kubernetes and supported platforms. But then, in the pursuit of appyness, there is no end to innovation. In this article, I take up another interesting use case: Devtron, an open-source platform that refactors the Kubernetes ecosystem into an easy-to-use AppOps model. First, a few basics.

What Are We Trying to Achieve?
Exploring the Devtron platform to validate its capabilities to build, deploy, and manage apps on a K8s cluster.

What Is Devtron?
Devtron is an open-source, GUI-based platform to deploy and manage Kubernetes applications. The platform allows you to manage an entire application ecosystem through a single control pane that spans multiple cloud service providers and varying environments. While doing so, efficient collaboration and security sit at its core, naturally enabling a DevOps model. Let's get hands-on.

Stage 1: Creating an Amazon EKS Cluster
We will start by configuring an AWS EKS cluster (you may also use a GKE or AKS cluster). If you already have a cluster set up, you may skip ahead to Stage 2.

Part One: Installing the AWS CLI
To work with AWS on the command line, you need to install the AWS CLI, which enables the creation and management of AWS services from a Linux, macOS, or Windows console. If you're logged into the AWS console, you can spin up AWS CloudShell, which comes preinstalled with the AWS CLI. To install the AWS CLI on Linux, log into your Linux operating system and access the console.

1. Fetch the latest AWS CLI version for the Linux x86 (64-bit) platform:

$ curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"

Quick tip: The URL above is different for ARM-based Linux. More details can be found here.

2. Unzip the fetched archive:

$ unzip awscliv2.zip

3. Run the install script with admin privileges:

$ sudo ./aws/install

4. Finally, validate the version of the AWS CLI installed:

$ aws --version

The output of the command above shows the current version of the AWS CLI. Once confirmed, we are ready to proceed with the next steps.

Part Two: Installing the eksctl Command-Line Tool
To create the Amazon EKS cluster from the command line, install the eksctl command-line tool on your Linux operating system. Note that this is also possible on macOS and Windows; the following commands cover 64-bit Linux.

1. Fetch the latest version of eksctl:

$ curl --silent --location "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp

2. Move the extracted files to /usr/local/bin:

$ sudo mv /tmp/eksctl /usr/local/bin

3. Then confirm your eksctl version. In my case, it was 0.40.0; it might be different for you.

$ eksctl version
0.40.0

Part Three: Deploying an EKS Cluster With Active Nodes
The next step is to deploy an Amazon EKS cluster. To do so, use the create cluster command, which sets up all the required resources.
1. Create the cluster:

$ eksctl create cluster \
  --name devtron-darwin-cluster \
  --region us-east-2 \
  --nodegroup-name devtron-darwin-nodegroup \
  --node-type t2.micro \
  --nodes 4 \
  --nodes-min 1 \
  --nodes-max 5 \
  --managed

The command above creates an EKS cluster with the given name, region, node group name, node type, node count, and the minimum and maximum nodes of the node group, and deploys the cluster on a managed node group. You may change these values to suit your environment.

2. When the command completes, your EKS cluster should be ready, with a confirmation message. Don't worry about the message "kubectl not found"; that is what we will install in the next part. You can then confirm your created EKS cluster and node configuration on the AWS portal.

Part Four: Installing kubectl
To work with our new EKS cluster, we need to install kubectl, an open-source command-line tool used to interact with Kubernetes clusters.

1. Fetch the kubectl binary for Kubernetes version 1.18. Note that there are different links for lower or higher versions; to use Devtron, version 1.18 or above is recommended.

$ curl -o kubectl https://amazon-eks.s3.us-west-2.amazonaws.com/1.18.9/2020-11-02/bin/linux/amd64/kubectl

2. Change the permissions to allow execution of the kubectl binary:

$ chmod +x ./kubectl

3. Copy the binary to your home directory and add it to your PATH:

$ mkdir -p $HOME/bin && cp ./kubectl $HOME/bin/kubectl && export PATH=$PATH:$HOME/bin

4. Check the version of kubectl to confirm the installation was successful:

$ kubectl version --short --client
Client Version: v1.18.9-eks-d1db3c

5. Next, you should be able to run the command below to see your active Kubernetes cluster:

$ kubectl get service
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.100.0.1   <none>        443/TCP   33d

Stage 2: Installing Devtron
Devtron supports three installation methods: kubectl, Helm 2, or Helm 3. For the purpose of this article, we will install Devtron using Helm 3 with default configurations, which use MinIO to store build cache and logs.

1. Start by switching the Kubernetes context to the EKS cluster you want to use; change the region and cluster name to those of your environment:

$ aws eks --region us-east-2 update-kubeconfig --name darwin-devtron-cluster02

2. Create a namespace for Devtron named devtroncd:

$ kubectl create namespace devtroncd

3. Fetch Helm 3 from the official Helm repo using curl:

$ curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3

4. Change the permissions on the Helm script to allow execution:

$ chmod 700 get_helm.sh

5. Execute the Helm script:

$ ./get_helm.sh

Quick tip: For Helm to install, you need the OpenSSL library on your Linux operating system. This article uses a RedHat-based Linux distribution, so we use the yum package manager to install OpenSSL:

$ sudo yum -y install openssl

6. To confirm the Helm installation, run the command below:

$ helm version
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /home/cloudshell-user/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure.
Location: /home/cloudshell-user/.kube/config
version.BuildInfo{Version:"v3.5.3", GitCommit:"041ce5a2c17a58be0fcd5f5e16fb3e7e95fea622", GitTreeState:"dirty", GoVersion:"go1.15.8"}

7. After installing Helm, add the Devtron Helm chart repo (source: https://docs.devtron.ai/):

$ helm repo add devtron https://helm.devtron.ai

8. Next, install the Devtron operator into the devtroncd namespace, setting the password for PostgreSQL (source: https://docs.devtron.ai/):

$ helm install devtron devtron/devtron-operator --namespace devtroncd --set secrets.POSTGRESQL_PASSWORD=new-password-here

9. To monitor the Devtron installation's progress, list the pods within the devtroncd namespace and take note of the pod whose name begins with inception-849d647c4d:

$ kubectl get pod -n devtroncd

10. Follow the logs that show the installation progress:

$ kubectl logs -f inception-849d647c4d-ldncs -n devtroncd

The installation takes about 15 to 30 minutes to complete. When it is done, use the following command to verify that the installation completed successfully; the command should echo Applied to the console:

$ kubectl -n devtroncd get installers installer-devtron -o jsonpath='{.status.sync.status}'
Applied

11. You can also verify the running pods in the devtroncd namespace:

$ kubectl get pods -n devtroncd

12. Before accessing the dashboard, obtain the admin password that was created during the Devtron installation:

$ kubectl -n devtroncd get secret devtron-secret -o jsonpath='{.data.ACD_PASSWORD}' | base64 -d
argocd-server-59XXXXX-XXXX

13. After that, list all the services within the devtroncd namespace and choose the LoadBalancer service, which has a public address you can paste into your browser:

$ kubectl get svc -n devtroncd
devtron-grafana                       ClusterIP      10.100.159.169
devtron-kubernetes-external-secrets   ClusterIP      10.100.171.60
devtron-minio                         ClusterIP      10.100.230.236
devtron-minio-svc                     ClusterIP      None
devtron-nats                          ClusterIP      10.100.195.145
devtron-nats-mgmt                     ClusterIP      None
devtron-service                       LoadBalancer   10.100.204.202   a4777dd57858a4dc59afb0a4517b1748-XXXXXXXXXX.us-east-2.elb.amazonaws.com

PS: The dashboard URL above has been edited for security purposes.

14. Paste the load balancer URL into a browser to load the Devtron dashboard, then sign in with the username "admin" and the password obtained earlier.

Quick tip: To retrieve the automatically generated password for the default admin user account at any time, run:

$ kubectl -n devtroncd get secret devtron-secret -o jsonpath='{.data.ACD_PASSWORD}' | base64 -d

That's it: you have successfully installed Devtron on your Amazon EKS cluster and accessed the dashboard. You are now ready to start creating applications using Devtron.

Quick tip: Once the Devtron dashboard is set up, you can also enable Single Sign-On for the dashboard using third-party identity providers. To do so, open the Devtron Global Configurations page and apply settings under the SSO Login Services tab.
Stage 3: Deploying an Application Using Devtron
The following steps cover how to deploy an application using Devtron. To get started, we need to set up a few prerequisites for a successful deployment.

Part One: Setting the Host URL
1. In the Devtron dashboard, click Global Configurations at the bottom left and select Host URL.
2. Update the URL with that of the Devtron service load balancer.

Quick tip: You can use a custom domain within AWS Route 53 that points to the Devtron service load balancer to avoid using the long URL.

Part Two: Adding a GitOps Account
For successful Continuous Integration and Continuous Deployment (CI/CD) with Devtron, the tool needs access to a GitOps account that holds the parameters and configurations (Helm charts) for your Devtron applications. A repository will be created automatically once CI is initiated, as we shall see in the following steps. To add a GitOps account, click the GitOps tab on the left of your Global Configurations screen, then add the Git host, the GitHub organization name, and the Git access credentials, and click Save.

Part Three: Adding a Git Account
After adding a GitOps account, the next step is adding a Git account. Click Add git account, give a name to your Git provider, set the URL, the authentication type, the Git username and password, and then click Save.

Part Four: Adding a Docker Registry
The next step is adding a Docker registry that we will use to build, push, and pull our images to the designated Docker repo through the Devtron CI/CD pipeline. Click Add docker registry, set the registry name and the registry type (in our case, ECR), add the registry URL, the AWS region, the Amazon IAM role access ID and secret access key, and then click Save.

Part Five: Adding a New App
1. On the Devtron dashboard, click the Applications tab; this brings up a display with an Add New App button at the top right of the screen. Click the button.
2. Give the application a name, select a project for the application, and set the application template. In this demo, we are using a Blank App template.

Quick tip: You can create multiple projects and group applications across different projects, which eases the management of deployments. To create new projects, access the Projects tab under the Global Configurations menu.

3. After clicking Create App, a new screen asks for the Git provider, Git repo URL, and checkout path. In this demo, we use the GitHub public provider, add the Git repo URL of our application, leave the checkout path at its default, and click Save.
4. Next, the Git materials configuration appears, and you are ready to move on. Click Next at the bottom right of the screen, which takes you to the Docker build configuration page.
5. Set the Dockerfile's relative path for the application, select the container registry, set the Docker repository, click Save Configuration, and then click Next.
6. The next page allows us to set the deployment template parameters. For this demo, we leave all settings at their defaults and click Save.
7. Once you click Save, you should see the Devtron application's Helm chart repo automatically created in the specified GitHub organization. Then click Next to proceed to the workflow editor page.
Stage 4: Triggering Continuous Integration and Delivery (CI/CD)
In this section, we will build and deploy our containerized application to a Kubernetes cluster using Devtron.

Part One: Triggering Continuous Integration
1. To enable Continuous Integration, we need to add a new workflow to our application. Click Add New Workflow, give the workflow a name, and click Save.
2. Once that is done, click Add CI Pipeline to start setting up Continuous Integration. Then select the Continuous Integration option to build a Docker image from the Git repo that contains the Dockerfile.
3. Next, give the CI pipeline a name, set the Git branch from which to build the image, click Create Pipeline, and leave the other settings at their defaults for demo purposes.

Quick tip: You may select the manual pipeline execution method in situations where you want to maintain complete control over the builds and deployments of your workload.

4. After creating the pipeline, click the + button and select the Deploy to Environment option.
5. Next, give the deployment pipeline a name, click Create Pipeline, and leave the other settings at their defaults unless specifically required.

Quick tip: You can select rolling, canary, or blue-green deployment strategies for your applications within the deployment pipeline and change them at any time with a single click.

After setting these configurations, the new workflow should appear in the workflow editor.

6. Next, click the Trigger tab and select the build material step to initiate the Continuous Integration pipeline.
7. Once you click Select Material, a popup shows a list of previous commits to build the application from.
8. Select the commit and click Start Build. You can monitor progress by selecting the Build History tab and opening the Logs section, which displays the build logs and the build-and-push process of the container image to the container registry.

Part Two: Triggering Continuous Deployment
1. The final step is to deploy the application to the EKS cluster using the deployment pipeline step. Select the Trigger tab to view your available workflows. Click Select Image on the deploy step to trigger the deployment using the deployment strategy we defined; the pipeline will display as Progressing.
2. To confirm that the deployment is complete, click the Deployment Metrics tab and select your environment.

And we are done! The Devtron platform is now fully set up with your workload. Below, I have also captured a few troubleshooting steps in case you face any issues.

Stage 5: Troubleshooting Typical Setup and Deployment Issues
Issue 1: Reinstallation of Devtron freezes or returns errors
If an earlier installation of Devtron failed on the cluster, you might receive errors when you try to reinstall Devtron.

1. Make sure to look at the errors being thrown by the inception pod. You can quickly check the logs using the command below:

$ kubectl logs -f inception-d95bc9478-7blw6 -n devtroncd

2. Change the name of the inception pod to that of your environment. The command below produces similar output on any cluster:

$ pod=$(kubectl -n devtroncd get po -l app=inception -o jsonpath='{.items[0].metadata.name}') && kubectl -n devtroncd logs -f $pod

3. Then run the following commands to clean up the previous installation.
$ cd devtron-installation-script/
$ kubectl delete -n devtroncd -f yamls/
$ kubectl delete -n devtroncd -f charts/devtron/templates/devtron-installer.yaml
$ kubectl delete -n devtroncd -f charts/devtron/templates/install.yaml
$ kubectl delete -n devtroncd -f charts/devtron/crds
$ kubectl delete ns devtroncd

4. Additionally, you can run the commands below to make sure that all components of the past installation are removed:

$ cd devtron-installation-script/
$ kubectl delete -n devtroncd -f yamls/
$ kubectl -n devtroncd patch installer installer-devtron --type json -p '[{"op": "remove", "path": "/status"}]'

Issue 2: Initial Devtron application build error using the CI pipeline
You may get a build error just after logging in to the Devtron dashboard, creating your first application, and then triggering Continuous Integration. This may also occur when the build cache and log storage was initially set to AWS S3 and later changed to the default storage configuration.

1. To resolve the issue, run the commands below to upgrade your Devtron installation:

$ helm repo update
$ helm upgrade devtron devtron/devtron-operator -n devtroncd

2. Then monitor the status of the upgrade and confirm completion by running:

$ kubectl -n devtroncd get installers installer-devtron -o jsonpath='{.status.sync.status}'

Issue 3: Devtron fails to connect to GitHub for GitOps
Often, this is not a Devtron issue; the connection to the GitHub API might be broken. The solution is to make sure that you can call GitHub to fetch your organization's repositories. (The original post includes screenshots of a failed call and of a successful API call to the organization's (Brollyca) repo.) Additionally, you can monitor the logs for all your Git operations by checking the activity within the git-sensor pod, which runs all the Git operations:

$ kubectl logs -f git-sensor-0 -n devtroncd

Verdict? Straightforward and simple! The concept of an AppOps model has been around in theory for a long time. Several other platforms, including GitLab, Azure DevOps, and Harness, tried this in the past but had their own limitations that prevented wider adoption. It is interesting to note that Devtron has taken the bull by the horns by making itself platform-agnostic, with collaboration as the key. In a set of forthcoming articles, I plan to dig deeper into Devtron's claimed features that aren't covered in this article, such as scanning Docker images for vulnerabilities, writing config maps, storing application secrets, and defining application deployment strategies. I am sure the platform has much more in store to explore, make use of, and review. Till then, happy coding with masks on.
May 8, 2021
· 8,720 Views · 2 Likes
article thumbnail
Kubernetes — Replication and Self-Healing
Learn about the benefits of using replication for your microservices and how a Kubernetes cluster can automatically recover from a service failure.
October 15, 2020
· 6,044 Views · 13 Likes
article thumbnail
Trusted Repositories and Container Registries in Kubernetes
We will cover secure authentication, scanning, and signing of content as necessary practices that ensure a secure Kubernetes environment.
October 6, 2020
· 3,561 Views · 2 Likes
article thumbnail
Configure Kubernetes Network With Flannel
In this article, see how to configure Kubernetes network with Flannel.
October 2, 2020
· 19,627 Views · 5 Likes

Refcards

Refcard #389

Threat Detection


Refcard #301

Kubernetes Monitoring Essentials


Refcard #387

Getting Started With CI/CD Pipeline Security


Refcard #380

Continuous Delivery Pipeline Security Essentials


Refcard #254

Apache Kafka Essentials


Refcard #371

Data Pipeline Essentials


Refcard #370

Data Orchestration on Cloud Essentials


Refcard #359

Event Stream Processing Essentials


Refcard #350

Getting Started With Data Lakes


Trend Reports

Trend Report

Observability and Application Performance

Making data-driven decisions, as well as business-critical and technical considerations, first comes down to the accuracy, depth, and usability of the data itself. To build the most performant and resilient applications, teams must stretch beyond monitoring into the world of data, telemetry, and observability. And as a result, you'll gain a far deeper understanding of system performance, enabling you to tackle key challenges that arise from the distributed, modular, and complex nature of modern technical environments. Today, and moving into the future, it's no longer about monitoring logs, metrics, and traces alone — instead, it's more deeply rooted in a performance-centric team culture, end-to-end monitoring and observability, and the thoughtful usage of data analytics. In DZone's 2023 Observability and Application Performance Trend Report, we delve into emerging trends, covering everything from site reliability and app performance monitoring to observability maturity and AIOps, in our original research. Readers will also find insights from members of the DZone Community, who cover a selection of hand-picked topics, including the benefits and challenges of managing modern application performance, distributed cloud architecture considerations and design patterns for resiliency, observability vs. monitoring and how to practice both effectively, SRE team scalability, and more.


Trend Report

Kubernetes in the Enterprise

Kubernetes: it’s everywhere. To fully capture or articulate the prevalence and far-reaching impacts of this monumental platform is no small task — from its initial aims to manage and orchestrate containers to the more nuanced techniques to scale deployments, leverage data and AI/ML capabilities, and manage observability and performance — it’s no wonder we, DZone, research and cover the Kubernetes ecosystem at great lengths each year. In our 2023 Kubernetes in the Enterprise Trend Report, we further dive into Kubernetes over the last year, its core usages as well as emerging trends (and challenges), and what these all mean for our developer and tech community. Featured in this report are actionable observations from our original research, expert content written by members of the DZone Community, and other helpful resources to help you go forth in your organizations, projects, and repos with deeper knowledge of and skills for using Kubernetes.


Trend Report

Containers

The proliferation of containers in recent years has increased the speed, portability, and scalability of software infrastructure and deployments across all kinds of application architectures and cloud-native environments. Now, with more and more organizations migrated to the cloud, what's next? The subsequent need to efficiently manage and monitor containerized environments remains a crucial task for teams. With organizations looking to better leverage their containers — and some still working to migrate out of their own monolithic environments — the path to containerization and architectural modernization remains a perpetual climb. In DZone's 2023 Containers Trend Report, we will explore the current state of containers, key trends and advancements in global containerization strategies, and constructive content for modernizing your software architecture. This will be examined through DZone-led research, expert community articles, and other helpful resources for designing and building containerized applications.

Performance and Site Reliability

The concept of observability was first put to use over 110 years ago. Initially known as telemetry, it was used in 1912 to transmit data from Chicago's electric power plants to a central control station over the city's telephone lines. Today, modern observability is still very much focused on the interplay of data to yield informed inputs and outputs of systems. Sprinkle in site reliability engineering (SRE), and there should be little to no performance issues in distributed systems, right? In an ideal world, yes; in reality, there is still work to be done.

DZone's 2022 Trend Report, Performance and Site Reliability: Observability for Distributed Systems, takes a holistic view of where developers stand in their observability practices. Through original research and expert-contributed articles, it offers a primer on distributed systems observability (including how to build an open-source observability toolchain), dives into distributed tracing, and examines prospective performance degradation patterns. It also provides insight into how to create an SRE practice, along with tactics for conducting an effective incident retrospective. The goal of this Trend Report is to offer a developer-focused assessment of the current state of observability and how it fits in with modern performance practices.

Kubernetes in the Enterprise

In 2022, Kubernetes has become a central component for containerized applications, and it is nowhere near its peak. In fact, based on our research, 94 percent of survey respondents believe that Kubernetes will be a bigger part of their system design over the next two to three years. With Kubernetes becoming more entrenched in systems, what do adoption and deployment methods look like compared to previous years?

DZone's Kubernetes in the Enterprise Trend Report provides insights into how developers are leveraging Kubernetes in their organizations. It focuses on the evolution of Kubernetes beyond container orchestration, advancements in Kubernetes observability, Kubernetes in AI and ML, and more. Our goal for this Trend Report is to inspire developers to leverage Kubernetes in their own organizations.

Data Pipelines

Data is at the center of everything we do, and more of it is collected with each passing day. With that comes the need to improve how we accept, store, and interpret it. What role do data pipelines play in the software profession? How are data pipelines designed? What are some common data pipeline challenges? These are just a few of the questions we address in our research.

In DZone's 2022 Trend Report, "Data Pipelines: Ingestion, Warehousing, and Processing," we review the key components of a data pipeline, explore the differences between ETL, ELT, and reverse ETL, propose solutions to common data pipeline design challenges, dive into engineered decision intelligence, and provide an assessment of the best way to modernize testing with data synthesis. The goal of this Trend Report is to provide insights into, and recommendations for, the best ways to accept, store, and interpret data.

DevOps

With companies needing to deliver capabilities faster, it has become increasingly clear that DevOps is a practice many enterprises must adopt (if they haven't already). A strong CI/CD pipeline leads to a smoother release process, and a smoother release process decreases time to market.

In DZone's DevOps: CI/CD and Application Release Orchestration Trend Report, we provide insight into how CI/CD has revolutionized automated testing, offer advice on why an SRE is important to CI/CD, explore the differences between managed and self-hosted CI/CD, and much more. The goal of this Trend Report is to offer guidance to our global audience of DevOps engineers, automation architects, and all those in between on how best to adopt DevOps practices to help scale the productivity of their teams.

Kubernetes and the Enterprise

In DZone's 2020 Kubernetes and the Enterprise Trend Report, we found that over 90% of survey respondents reported leveraging containerized applications in a production environment, nearly double the share reported when we asked the same question in 2018. As containerization approaches peak saturation, Kubernetes has become an indispensable tool for enterprises managing large, complex, container-based architectures, with 77% of respondents reporting Kubernetes usage in their organizations. Building upon findings from previous years that indicate the technical maturity of containers and container orchestration, DZone's 2021 Kubernetes and the Enterprise Trend Report explores more closely the growing ecosystem and tooling, use cases, and advanced strategies for Kubernetes adoption in the enterprise.

Application Security

In the era of high-profile data breaches, rampant ransomware, and a constantly shifting government regulatory environment, development teams are increasingly taking on the responsibility of integrating security design and practices into all stages of the software development lifecycle (SDLC).

In DZone's 2021 Application Security Trend Report, readers will discover how this shift in security focus across the SDLC is impacting development teams, from addressing the most common threat agents and attack vectors to exploring the best practices and tools employed to develop secure applications.

CI/CD

In 2020, DevOps became more crucial than ever as companies moved to distributed work and accelerated their push toward cloud-native and hybrid infrastructures. In this Trend Report, we examine what this acceleration looked like for development teams across the globe and dive deeper into the latest DevOps practices advancing continuous integration, continuous delivery, and release automation.

Kubernetes and the Enterprise

Want to know how the average Kubernetes user thinks? Wondering how modern infrastructure and application architectures interact? Interested in container orchestration trends? Look no further than DZone's latest Trend Report, "Kubernetes and the Enterprise." This report explores key developments in the myriad technical areas surrounding the omnipresent container management platform, alongside expert contributor articles on topics such as scaling a microservices architecture, cluster management, deployment strategies, and much more!
