Real Time Searching of Big Data using Hadoop, Lucene, and Solr

There are various approaches to solving Big Data problems. Two of the most prominent are Hadoop and Solr, popular open source projects widely used in large-scale distributed systems.

Apache Hadoop, a software framework that supports data-intensive distributed applications, enables applications to work with thousands of nodes and petabytes of data. Technically, Hadoop consists of two key services: reliable data storage using the Hadoop Distributed File System (HDFS) and high-performance parallel data processing using a technique called MapReduce.
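
To make the MapReduce data flow concrete, here is a minimal single-process sketch of the classic word-count job in plain Python. It imitates only the three stages a MapReduce job goes through: map, shuffle (group by key) and reduce. It does not use Hadoop's API, and the function names are illustrative assumptions. On a real cluster each map task would run in parallel over its own HDFS block.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group every emitted value by its key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Here the "mappers" run one after another in a single process;
    # Hadoop would distribute them across the nodes holding the data.
    emitted = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(emitted)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["Hadoop stores data in HDFS", "Hadoop processes data with MapReduce"]
    print(word_count(docs))
```

The shuffle step is what lets the reduce step scale: because all values for a key end up together, each key can be reduced independently of every other key.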

Solr, on the other hand, is a popular open source enterprise search platform built on the Apache Lucene Java search library. Solr runs as a standalone server and uses Lucene for full-text indexing and search. Its REST-like HTTP/XML and JSON APIs make it easy to use from virtually any programming language.
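
As an illustration of that HTTP API, the sketch below builds a query URL for Solr's `/select` request handler and reads the standard JSON response (`response.numFound` and `response.docs`). It uses only Python's standard library. Several details are assumptions for the example: a Solr server on `localhost` at the default port 8983, a hypothetical core named `articles`, and a hypothetical `content` field. The `search` function needs a running Solr instance; the other two functions do not.

```python
import json
from typing import List, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR_BASE = "http://localhost:8983/solr"  # assumed local Solr (8983 is Solr's default port)
CORE = "articles"                          # hypothetical core name


def build_select_url(query: str, rows: int = 10) -> str:
    """Build a /select URL: q is the query, rows caps the hits, wt=json asks for JSON."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{SOLR_BASE}/{CORE}/select?{urlencode(params)}"


def extract_docs(payload: dict) -> Tuple[int, List[dict]]:
    """Pull the total hit count and the returned documents out of a Solr JSON response."""
    response = payload["response"]
    return response["numFound"], response["docs"]


def search(query: str, rows: int = 10) -> Tuple[int, List[dict]]:
    """Send the query over plain HTTP and decode the JSON body (requires a running Solr)."""
    with urlopen(build_select_url(query, rows)) as resp:
        return extract_docs(json.load(resp))


if __name__ == "__main__":
    # Only builds the URL, so it runs without a Solr server.
    print(build_select_url("content:hadoop", rows=5))
```

Because the interface is plain HTTP plus JSON, the same request could be issued from a browser, `curl`, or any language with an HTTP client.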

In this Knowledge Sharing article, awarded Best of Big Data in the 2012 Knowledge Sharing Competition, author Dibyendu Bhattacharya covers:

• how the Hadoop framework works

• the concept of MapReduce

• how to perform large-scale distributed indexing using Lucene

• how to query and search the indexed artifacts using Solr
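
The core idea behind distributed indexing can be shown with a toy example. A search index such as Lucene's is an inverted index, which maps each term to the documents that contain it. To index at scale, the documents are split into shards, each shard is indexed independently (for example by separate map tasks), and a query is fanned out to every shard, with the hits merged afterwards. The pure-Python sketch below shows that pattern under simplifying assumptions: whitespace tokenization, no scoring, and shard assignment by a CRC32 hash of the document id. It is not Lucene's API and not necessarily the author's exact method, and all names are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Set


def partition(docs: Dict[str, str], num_shards: int) -> List[Dict[str, str]]:
    """Assign each document to a shard using a stable hash of its id."""
    shards: List[Dict[str, str]] = [{} for _ in range(num_shards)]
    for doc_id, text in docs.items():
        shards[zlib.crc32(doc_id.encode("utf-8")) % num_shards][doc_id] = text
    return shards


def build_shard_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Build an inverted index (term -> set of doc ids) for one shard."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


def search_all(shard_indexes: List[Dict[str, Set[str]]], term: str) -> Set[str]:
    """Fan a single-term query out to every shard and merge (union) the hits."""
    hits: Set[str] = set()
    for index in shard_indexes:
        hits |= index.get(term.lower(), set())
    return hits


if __name__ == "__main__":
    corpus = {
        "d1": "Hadoop splits big data across nodes",
        "d2": "Lucene builds an inverted index",
        "d3": "Solr serves Lucene indexes over HTTP",
    }
    # Each shard could be indexed by a separate worker in parallel.
    indexes = [build_shard_index(shard) for shard in partition(corpus, num_shards=2)]
    print(sorted(search_all(indexes, "lucene")))
```

Because shards never share documents, indexing needs no coordination between workers. The cost moves to query time, when every shard must be consulted and the results combined.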

