August 27th, 2014 06:00

Data Structures of Big Data: How They Scale

Rapid data growth has driven significant innovation across a range of technologies. These innovations all revolve around how the growing volume of data can be captured, stored, and processed to extract meaningful insights that help organizations make better decisions faster, predict outcomes, and more.

Big Data technology innovations can be broadly categorized into the following areas:

• Technologies for batch processing of Big Data (Hadoop, Hive, Pig, etc.)

• Technologies for real-time processing of Big Data (Storm, Spark, etc.)

• Big Data messaging infrastructure (Kafka)

• Big Data databases: NoSQL technologies (HBase, MongoDB, etc.)

• Big Data search technologies (Elasticsearch, SolrCloud, etc.)

• Massively Parallel Processing (MPP) technologies (HAWQ, Impala, Drill, etc.)

Products and solutions, both established and still evolving, exist in each of these categories. In this Knowledge Sharing article, Dibyendu Bhattacharya and Manidipa Mitra select a few of the more popular open-source solutions in each area and explain how efficiently each one applies data structures and fundamental Computer Science concepts to solve a very complex problem, focusing only on the most prominent aspects of each solution.

This article will help readers develop a deep understanding of the technology spectrum, the challenges in the Big Data space, and how various solutions try to address those challenges.

Read the full article.
