Gautam Goswami

CORE

Founder at DataView

Bangalore, IN

Joined Sep 2020

https://dataview.in/

About

Enthusiastic about learning and sharing knowledge on Big Data, Data Science, and related advances, including data streaming platforms, through the knowledge-sharing platform Dataview.in. Presently serving as Head of Engineering & Data Streaming at Irisidea TechSolutions, Bangalore, India. https://www.irisidea.com/gautam-goswami/

Stats

Reputation:	1174
Pageviews:	232.6K
Articles:	30
Comments:	2

Expertise

Big Data

Articles

Streaming Real-Time Data From Kafka 3.7.0 to Flink 1.18.1 for Processing

Flink seamlessly integrates with Kafka and offers robust support for exactly-once semantics, ensuring each event is processed precisely once. Learn more here.

March 10, 2024

· 6,991 Views · 1 Like

Why Apache Kafka and Apache Flink Work Well Together to Boost Real-Time Data Analytics

Use Flink and Kafka to create reliable, scalable, low-latency real-time data processing pipelines with fault tolerance and exactly-once processing guarantees.

February 13, 2024

· 3,765 Views · 1 Like

Integrating Rate-Limiting and Backpressure Strategies Synergistically To Handle and Alleviate Consumer Lag in Apache Kafka

Kafka consumer lag refers to the difference (in offsets) between the most recent message in a Kafka topic and the last message processed by a consumer. This lag may arise when the consumer struggles to keep pace with the rate at which new messages are produced and appended to the topic.

January 23, 2024

· 2,165 Views · 2 Likes

Leveraging Apache Kafka for the Distribution of Large Messages

In this article, we will explore the architectural approach for separating the actual payload (the large video file) from the message intended to be circulated via Kafka.

December 19, 2023

· 3,748 Views · 3 Likes

The Zero Copy Principle With Apache Kafka

In computer processing, the zero-copy technique is used to avoid having the CPU copy data from one memory region to another.

November 17, 2023

· 2,414 Views · 1 Like

Understanding Supervisor in Apache Druid

A supervisor is a built-in part of Druid that makes it easier to ingest, analyze, and monitor data in real time. Learn more!

October 16, 2023

· 2,493 Views · 3 Likes

Causes and Remedies of Poison Pill in Apache Kafka

A poison pill is a message in a Kafka topic that consistently fails when consumed, regardless of the number of consumption attempts.

September 25, 2023

· 3,319 Views · 3 Likes

Apache Kafka’s Built-In Command Line Tools

I want to highlight the five scripts/tools that I believe will have the biggest influence on your development work, mostly related to real-time data stream processing.

August 21, 2023

· 2,306 Views · 2 Likes

The Significance of Deep Storage in Apache Druid

Druid’s deep storage guarantees long-term data persistence even if data is deleted from the live cluster after compaction.

July 7, 2023

· 3,132 Views · 2 Likes

Forging Druid With Apache Kafka for Real-Time Streaming Analytics

Apache Druid, a real-time analytics database, can be leveraged very effectively where real-time ingestion, fast query performance, and high uptime are crucial.

June 16, 2023

· 4,203 Views · 1 Like

Knowing and Valuing Apache Kafka’s ISR (In-Sync Replicas)

To get more clarity about ISR in Apache Kafka, we should first carefully examine the replication process in the Kafka broker.

June 1, 2023

· 3,721 Views · 1 Like

Handling Bad Messages via DLQ by Configuring JDBC Kafka Sink Connector

When the JDBC Kafka sink connector encounters an error or bad data, the unprocessed messages are forwarded to the DLQ (dead letter queue).

April 11, 2023

· 4,743 Views · 1 Like

Streaming Data to RDBMS via Kafka JDBC Sink Connector Without Leveraging Schema Registry

This article covers the biggest difficulty with the JDBC sink connector: it requires knowledge of the schema of data that has already landed on the Kafka topic.

February 22, 2023

· 6,872 Views · 2 Likes

Intrinsic Aspects of Apache ZooKeeper and Their Importance

This article explores ZNodes, sessions, watches, quorum, transactions, and local storage and snapshots, all aspects of Apache ZooKeeper.

January 23, 2023

· 1,671 Views · 1 Like

Internal Components of Apache ZooKeeper and Their Importance

In this article, readers will learn about the internal components of Apache ZooKeeper. The key concept is the zNode, which can act as either a file or a directory.

January 20, 2023

· 4,447 Views · 2 Likes

Resolve Apache Kafka Starting Issue Installed on Single/Multi-Node Cluster

In a ZooKeeper-based deployment, Kafka cannot form a complete cluster on its own without integrating Apache ZooKeeper.

January 12, 2023

· 3,217 Views · 1 Like

Processing of Streaming Data: Kappa vs Lambda Architectures

In today’s Big Data landscape, Lambda architecture is a new archetype for handling a vast amount of data. How does it compare to Kappa architecture?

August 19, 2022

· 5,676 Views · 1 Like

The Lakehouse: An Uplift of Data Warehouse Architecture

This article highlights how the traditional data warehouse architecture is enhanced and transformed, eventually turning it into a data lakehouse.

April 5, 2022

· 5,238 Views · 5 Likes

A Short Introduction to Apache Iceberg

This tutorial shows how to use Apache Iceberg in order to address data consistency and performance issues. Read on to see how it can help you!

August 20, 2021

· 8,717 Views · 3 Likes

Confluent’s Kafka REST Proxy, The Silk Route for Data Movement to Operational Kafka Cluster

In this article, I detail the steps to integrate the prebuilt version of Confluent REST Proxy with a running multi-broker Apache Kafka cluster.

June 13, 2021

· 18,070 Views · 3 Likes

Resolving Permission Issue in Multi-node Hadoop Cluster

It has been observed that when we configure and deploy a multi-node Hadoop cluster, or add new DataNodes, an SSH permission issue arises in communication between the Hadoop daemons.

April 22, 2021

· 6,672 Views · 2 Likes

Data Ingestion From RDBMS: Leveraging Confluent's JDBC Kafka Connector

Kafka Connect plays a significant role in streaming data between Apache Kafka and other data systems, for example, importing data from a database into a Kafka topic.

April 17, 2021

· 6,927 Views · 3 Likes

How Checksum Smartly Manages Data Integrity in HDFS

In short, data integrity can be defined as the assurance of the accuracy and consistency of data throughout its entire life cycle.

March 16, 2021

· 5,316 Views · 2 Likes

Resolving a Common Error in Apache Zookeeper

Explains how to resolve “Error: Could not find or load main class org.apache.zookeeper.server.quorum.QuorumPeerMain” when starting Apache ZooKeeper.

Updated January 28, 2021

· 10,287 Views · 4 Likes

Streaming Data From Files Into Multi-Broker Kafka Clusters

The FileSource and FileSink connectors can be leveraged to stream data from a text file to a topic on a multi-broker Apache Kafka cluster and subsequently sink it to another file.

January 16, 2021

· 9,031 Views · 3 Likes

Coupling Schema Registry (Confluent) With Multi-Broker Apache Kafka Cluster

We explain the steps for coupling Confluent Schema Registry with an existing, operational multi-broker Apache Kafka cluster (local deployment).

December 15, 2020

· 4,605 Views · 1 Like

Install and Configuration of Apache Hive-3.1.2 on Multi-Node

Apache Hive is a data warehouse system built on top of Apache Hadoop. Hive can be used for easy data summarization, and more!

December 2, 2020

· 14,514 Views · 2 Likes

Setup Zookeeper Cluster – A Minute Chore

Apache ZooKeeper’s functionality is not directly visible to end clients; however, it remains the backbone that coordinates popular components like Hadoop.

November 24, 2020

· 9,995 Views · 3 Likes

Importance of Schema Registry on Kafka Based Data Streaming Pipelines

Schema Registry acts as a service layer for metadata. It stores a versioned history of all the schemas of registered data streams, including their schema change history.

November 11, 2020

· 6,466 Views · 2 Likes

Crafting a Multi-Node Multi-Broker Kafka Cluster- A Weekend Project

This article explains how to install and configure a multi-node, multi-broker Kafka cluster with Ubuntu 14.04 LTS as the OS on all nodes in the cluster.

September 24, 2020

· 13,735 Views · 4 Likes

Comments

Apache Kafka in a Smart City Architecture

Mar 15, 2021 · Kai Wähner

Nice read.

Install and Configure Confluent Platform (Kafka) in AWS EC2 Instance RHEL 8

Dec 01, 2020 · Enrico Rafols Dela Cruz

Nicely explained.