Why HBase?

Why and when do we need Apache HBase?
  • When the volume of data is very large, in the range of petabytes or exabytes, we use a column-oriented approach, because the data of a single column is stored together and can be read faster. (HBase groups columns into column families, and each family is stored separately.)
  • Row-oriented databases suit smaller volumes of data stored in a structured format. When we need to store and analyze a large set of semi-structured or unstructured data, we use a column-oriented approach.
  • Quick access to data: if you need random, real-time read/write access to your data, HBase is a suitable candidate. It is also a good fit for storing large tables with multi-structured data. It supports 'flashback' queries: each cell can keep multiple timestamped versions, so you can fetch data as it was at a particular point in time.
  • HBase clusters scale out by adding RegionServers. For example, growing a cluster from 10 to 20 RegionServers roughly doubles both its storage and its processing capacity.
  • HBase provides fast record lookups (and updates) for large tables. It keeps rows sorted by row key, provides random access, and stores the data in indexed HDFS files (HFiles) for faster lookups.
  • If you have hundreds of millions or billions of rows, HBase is a good choice to store and process them. If you only have a few thousand or a few million rows, a traditional RDBMS might be a better choice, because all of your data might fit on a single node (or two) while the rest of the cluster sits idle.
  • No requirement for relational features: your application should not need RDBMS features such as multi-row transactions, triggers, complex queries or complex joins (HBase guarantees atomicity only within a single row). If you can build your application without these features, then go for HBase.
  • Make sure you have enough hardware: at least 5 DataNodes plus a NameNode. Even HDFS does not do well with fewer than 5 DataNodes, partly because HDFS block replication defaults to 3. So if you have good hardware support, HBase can be a good choice.
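To make the row-oriented vs column-oriented point concrete, here is a minimal Python sketch that stores the same records in both layouts. The records, field names and helper functions are hypothetical and only illustrate the idea; real column stores (and HBase, with its column families) add on-disk files, compression and much more.

```python
"""Toy comparison of row-oriented vs column-oriented layouts (illustration only)."""

# Hypothetical sample records; not HBase data and not an HBase API.
records = [
    {"id": 1, "name": "alice", "country": "IN", "clicks": 10},
    {"id": 2, "name": "bob", "country": "US", "clicks": 7},
    {"id": 3, "name": "carol", "country": "IN", "clicks": 3},
]


def to_row_layout(rows):
    """Row layout: all fields of one record are stored next to each other."""
    return [tuple(r.values()) for r in rows]


def to_column_layout(rows):
    """Column layout: all values of one column are stored next to each other."""
    columns = {}
    for r in rows:
        for key, value in r.items():
            columns.setdefault(key, []).append(value)
    return columns


def total_clicks_row(row_store, clicks_index=3):
    # Has to walk every full record, touching unrelated fields as well.
    return sum(record[clicks_index] for record in row_store)


def total_clicks_column(column_store):
    # Reads a single contiguous list; the other columns are never touched.
    return sum(column_store["clicks"])


row_store = to_row_layout(records)
column_store = to_column_layout(records)
print(total_clicks_row(row_store))        # 20
print(total_clicks_column(column_store))  # 20
print(column_store["country"])            # ['IN', 'US', 'IN']
```

Both functions return the same total, but the column-oriented version only reads the one column it needs, which is what makes analytical reads over huge tables cheaper.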
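The 'flashback' and fast-lookup points both come from HBase's data model: a map of row keys, kept sorted, to columns, to timestamped cell versions. The Python sketch below is a toy in-memory model of that idea, not the HBase client API. The `ToyTable` class, its methods and the `max_versions=3` default are hypothetical; real HBase persists data in HFiles on HDFS and serves reads through its own `Get` and `Scan` operations.

```python
import bisect
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory model of HBase's data model (not the HBase API).

    Row keys are kept sorted, and each cell keeps several
    timestamped versions, newest first.
    """

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self.row_keys = []  # sorted list of row keys
        # row key -> "family:qualifier" -> list of (timestamp, value), newest first
        self.rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, ts):
        if row not in self.rows:
            bisect.insort(self.row_keys, row)  # keep row keys sorted
        versions = self.rows[row][column]
        versions.append((ts, value))
        versions.sort(key=lambda v: v[0], reverse=True)
        del versions[self.max_versions:]  # drop versions beyond the limit

    def get(self, row, column, as_of=None):
        """Newest value with timestamp <= as_of (latest value if as_of is None)."""
        for ts, value in self.rows.get(row, {}).get(column, []):
            if as_of is None or ts <= as_of:
                return value
        return None

    def scan(self, start_row, stop_row):
        """Row keys in [start_row, stop_row), located by binary search."""
        lo = bisect.bisect_left(self.row_keys, start_row)
        hi = bisect.bisect_left(self.row_keys, stop_row)
        return self.row_keys[lo:hi]


t = ToyTable()
t.put("user#002", "info:city", "Pune", ts=100)
t.put("user#002", "info:city", "Delhi", ts=200)
t.put("user#001", "info:city", "Mumbai", ts=150)
t.put("user#003", "info:city", "Chennai", ts=120)

print(t.get("user#002", "info:city"))             # Delhi (latest version)
print(t.get("user#002", "info:city", as_of=150))  # Pune (value as of ts=150)
print(t.scan("user#001", "user#003"))             # ['user#001', 'user#002']
```

Keeping row keys sorted is what lets a lookup or range scan jump straight to the right rows, and keeping older versions is what makes a point-in-time ('flashback') read possible.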
Why do we need Apache HBase when we have Hive?

