Tag Archives: Hive

Apache hive Installation with ACID

Follow The list of steps to install Hive with Transaction

Video Reference Click here 

Set this in your Command line

export HADOOP_USER_CLASSPATH_FIRST=true

Set the environment path and variable in .bashrc

Set HADOOP_HOME in .bashrc

Download and move the mysqlconnector.jar in apache-hive-1.2.1  lib folder(here we are using 1.2.1) this steps is applicable for all version of Hive

1) Extract the tar file of apache-hive
2) hadoop dfs -chmod 700 /tmp
3) Set or add the bellow properties in hive-site.xml inside conf directory (by default conf folder doesn’t have this xml file)

set hive.support.concurrency = true;
set hive.enforce.bucketing = true;
set hive.exec.dynamic.partition.mode = nonstrict;
set hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.compactor.initiator.on = true;
set hive.compactor.worker.threads = 1;

4) bin/hive

5) if you are using mysql means (run the metastore table)
Enter in to mysql
source /home/hadoop/apache-hive-1.2.1-bin/scripts/metastore/upgrade/mysql/hive-txn-schema-0.14.0.mysql.sql

6) create table with transactional = true (your table must need to be bucked + ORC)

create table acid_table (id INT, name STRING, country STRING,salary INT) clustered by (id) into 4 buckets
stored as orc TBLPROPERTIES (‘transactional’=’true’) ;

7) Insert

insert into table acid_table values(1,’john’,’IND’,50000);

8) Update

UPDATE acid_table SET salary = 300 WHERE id = 1;

9) Delete

delete from acid_table where id=1;

Click stream analysis using hive and pig an POC

Click stream analysis suing hive and pig an POC

clickstream

All the e-commerce portals store the user activities on their site as clickstream activity and later they analyze it to identify what the user has browsed and show the appropriate recommendations when the user visits the site again.

Continue reading Click stream analysis using hive and pig an POC