Hive Data Types

 All the data types in Hive are classified into four types, given as follows:

  • Column Types
  • Literals
  • Null Values
  • Misc Types
  • Complex Types

Column Types

Integral Types

  • Integer type data can be specified using integral data types, INT.
  • When the data range exceeds the range of INT, you need to use BIGINT and if the data range is smaller than the INT, you use SMALLINT.
  • TINYINT is smaller than SMALLINT.
  • The data ranges for TINYINT, SMALLINT, INT, BIGINT is below,
      • TINYINT (1-byte signed integer, from -128 to 127)
      • SMALLINT (2-byte signed integer, from -32,768 to 32,767)
      • INT (4-byte signed integer, from -2,147,483,648 to 2,147,483,647)
      • BIGINT (8-byte signed integer, from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807)
Example

create table test1(id tinyint) row format delimited fields terminated by ',';
insert into test1 values(1);
insert into test1 values(127);
insert into test1 values(128);

hive> select * from test1;
OK
1
127
NULL
Time taken: 0.055 seconds, Fetched: 3 row(s)

STRING

  • String type data types can be specified using single quotes (' ') or double quotes (" "). 
  • It contains two data types: VARCHAR and CHAR. 
  • Hive follows C-types escape characters.
                Data Type                Length
                STRING
                VARCHAR            1 to 65355
                CHAR                    255

Example
drop table test1;
create table test1(id tinyint, name string);
create table test1(id tinyint, name varchar(30));
create table test1(id tinyint, name char(10));
insert into test1 values(1, 'name1');
  • We need to mention size for varchar and char. we dont want to mention size for string.

TIMESTAMP

  • It supports traditional UNIX timestamp with optional nanosecond precision.
  • It supports java.sql.Timestamp format "YYYY-MM-DD HH:MM:SS.fffffffff" and format "yyyy-mm-dd hh:mm:ss.ffffffffff".
Example

drop table test1;
create table test1(id tinyint, name string, time_info timestamp);
insert into test1 values(1, 'name1', current_timestamp); -- this technique wont work for inserting timestamp
insert into test1 values(1, 'name1', '2023-08-03 09:56:05.064');
insert into test1 select 1, 'name1', current_timestamp;

DATE

  • DATE values are described in year/month/day format in the form {{YYYY-MM-DD}}.
            SELECT current_date; 
            SELECT current_timestamp;

Example

drop table test1;
create table test1(id tinyint, name string, date_info date);
insert into test1 values(1, 'name1', current_date); -- this technique wont work for inserting date
insert into test1 values(1, 'name1', '2023-08-03');
insert into test1 select 1, 'name1', current_date;

Decimal

  • The DECIMAL type in Hive is as same as Big Decimal format of Java. 
  • It is used for representing immutable arbitrary precision.
  • DECIMAL (Hive 0.13.0 introduced user definable precision and scale)
            DECIMAL(precision, scale)
            decimal(10,0)

Union Types

  • Union is a collection of heterogeneous data types.
  • You can create an instance using create union.
            UNIONTYPE<int, double, array<string>, struct<a:int,b:string>>
  • Exmple data for union column will be like below,
            {0:1} 
            {1:2.0} 
            {2:["three","four"]} 
            {3:{"a":5,"b":"five"}} 
            {2:["six","seven"]} 
            {3:{"a":8,"b":"eight"}} 
            {0:9} 
            {1:10.0}

Literals

  • Literals support both floating-point types and decimal types.
  • Floating Point Types - These are assumed to be DOUBLE data types in the Hive.
  • Decimal Types - These are assumed to be higher value than float point value with a range more than DOUBLE data type. It ranges between -10-308 to 10308.
  • Decimal literals provide precise values and greater range for floating point numbers than the DOUBLE type. Decimal data types store exact representations of numeric values, while DOUBLE data types store very close approximations of numeric values.
  • Decimal types are needed for use cases in which the (very close) approximation of a DOUBLE is insufficient, such as financial applications, equality and inequality checks, and rounding operations. They are also needed for use cases that deal with numbers outside the DOUBLE range (approximately -10308 to 10308) or very close to zero (-10-308 to 10-308). 
            1. Floating Point Types
    • Floating point types are nothing but numbers with decimal points.
    • Generally, this type of data is composed of DOUBLE data type
    • FLOAT (4-byte single precision floating point number)
            2. Decimal Type
    • Decimal type data is nothing but floating point value with higher range than DOUBLE data type.
    • DOUBLE (8-byte double precision floating point number)
    • if it is under 8 chat then it will display as it is like 1036214 - 1036214.
      • 10362145 - 1.0362145E7
      • 12000138625024 - 1.20001386E13
Example

drop table test1;
create table test1(id float);
insert into test1 values(1);
insert into test1 values(1.1);
insert into test1 values(1.5);
hive> select * from test1;
OK
1.0
1.1
1.5

drop table test1;
create table test1(id decimal);
insert into test1 values(1);
insert into test1 values(1.1);
insert into test1 values(1.5);
hive> select * from test1;
OK
1
1
2

drop table test1;
create table test1(id double);
insert into test1 values(1);
insert into test1 values(1.1);
insert into test1 values(1.5);
hive> select * from test1;
OK
1.0
1.1
1.5

Null Value

  • Missing values are represented by the special value NULL.
  • NULL is not a datatype.

Complex Types

  • Arrays : Arrays in Hive are used the same way they are used in Java.
            Syntax: ARRAY<data_type>
  • Maps : Maps in Hive are similar to Java Maps.
            Syntax: MAP<primitive_type, data_type>
  • Structs : Structs in Hive is similar to using complex data with comment.
            Syntax: STRUCT<col_name : data_type [COMMENT col_comment], ...>
  • we will discuss about this datatype on separate session.

Misc Types

  • BOOLEAN - used to store only true or false values.
  • BINARY - Store the photo/image as binary in the hive table. Store images in binary format see below but retrieval is another process altogether.
Example

drop table image;
create table image(picture binary);
load data local inpath '/home/cloudera/photo/photo.jpg' into table image;
select count(*) from image;
  • When we try to retrieve this binary data when we running query, then that time it will open the browser and it will fetch whatever image it is having in that respective columns/table.

Comments

Popular posts from this blog

Hive File Formats

HDFS Infographic

Why We Need Hadoop?