Skip to main content

Posts

Showing posts with the label HPC

Navigating the Data Lake: Deploying Hadoop/HDFS and Spark on Kubernetes

Introduction: In the modern era of data-driven decision-making, organizations are constantly seeking innovative solutions to harness the power of big data. One such solution is the deployment of Hadoop/HDFS and Spark on Kubernetes—a strategic endeavor that requires careful planning and execution. In this blog post, I will explore the process of deploying these technologies on Kubernetes, unlocking new possibilities for data management and analysis. This is the same solution that we used when we started to build our no code/low code data pipeline and visualization platform DataSetu. For Datasetu we needed to create a Data lake for processing huge dataset with high performance. This how we went about it, Chapter 1: Setting the Foundation with Kubernetes Before embarking on our journey to the data lake, it's essential to establish a solid foundation. Kubernetes serves as the cornerstone of our infrastructure, providing the orchestration and scalability needed to manage our distributed...

What is big data ? and what role Hadoop has to play?

In recent times we have been hearing , reading (in many blogs) quite often about cloud computing ,big data and HPC ie. high performance computing. Undoubtedly, these are the buzz words for this decade. If you have been in the industry for quite some time then you can recall last decade was web 2.0 decade. So what is big data? Is it a new framework or new data modelling technique or part of NO SQL movement? I have seen many times people relate big data to one of these or they are confused about it. Same is with HADOOP, it is thought as a NO SQL database framework. Big data is nothing but just huge amount of data which requires mining. Per wiki , Big Data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target currently ranging from a few dozen terabytes to many petabytes of data in a single data set.So big data is a hovering p...