Tag Archives: apache hive

Apache Hive Installation with ACID

Follow the steps below to install Hive with transaction (ACID) support.


Set this in your command line:

Set the environment variables and PATH in .bashrc, including HADOOP_HOME.
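Below is a minimal sketch of the .bashrc entries. It assumes Hive was extracted to /home/hadoop/apache-hive-1.2.1-bin (the path used in step 5) and Hadoop to /home/hadoop/hadoop. The Hadoop path is a hypothetical placeholder, so adjust both paths to match your machine.

```shell
# Lines to append to ~/.bashrc (sketch; both paths are assumptions).
# HIVE_HOME matches the path used in step 5; HADOOP_HOME is a placeholder.
export HADOOP_HOME=/home/hadoop/hadoop
export HIVE_HOME=/home/hadoop/apache-hive-1.2.1-bin
# Put the hadoop and hive launchers on the PATH.
export PATH="$PATH:$HADOOP_HOME/bin:$HIVE_HOME/bin"
```

After editing, run source ~/.bashrc (or open a new terminal) so the current shell picks up the variables.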

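Download the MySQL connector JAR (mysqlconnector.jar) and move it into the lib folder of apache-hive-1.2.1. We are using Hive 1.2.1 here, but this step applies to all versions of Hive. You need it only when the metastore runs on MySQL (see step 5).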

1) Extract the tar file of apache-hive
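2) Set the permissions on the HDFS /tmp directory: hadoop dfs -chmod 700 /tmp (on Hadoop 2.x, hdfs dfs -chmod 700 /tmp does the same without the deprecation warning)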
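3) Add the properties below to hive-site.xml in the conf directory. The conf folder does not contain this file by default, so create it. The properties are shown here in Hive CLI set syntax. In hive-site.xml, each one becomes a <property> element with a <name> and a <value>. The two hive.compactor.* settings are read by the metastore service, so put them in hive-site.xml rather than setting them per session.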

set hive.support.concurrency = true;
set hive.enforce.bucketing = true;
set hive.exec.dynamic.partition.mode = nonstrict;
set hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.compactor.initiator.on = true;
set hive.compactor.worker.threads = 1;
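Because the conf folder has no hive-site.xml by default, here is a minimal sketch that writes one containing the six properties above as <property> elements. The helper name write_hive_site is hypothetical. The helper overwrites any existing hive-site.xml, so merge by hand if yours already holds other settings, such as the MySQL metastore connection.

```shell
#!/usr/bin/env bash
# Sketch: write a hive-site.xml containing the ACID-related properties.
# write_hive_site is a hypothetical helper; pass it the Hive conf directory.
write_hive_site() {
  local conf_dir="$1"
  mkdir -p "$conf_dir" || return 1
  # Warning: overwrites any existing hive-site.xml in conf_dir.
  cat > "$conf_dir/hive-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hive.support.concurrency</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.enforce.bucketing</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.exec.dynamic.partition.mode</name>
    <value>nonstrict</value>
  </property>
  <property>
    <name>hive.txn.manager</name>
    <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
  </property>
  <property>
    <name>hive.compactor.initiator.on</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.compactor.worker.threads</name>
    <value>1</value>
  </property>
</configuration>
EOF
}

# Usage (assuming HIVE_HOME is set as in .bashrc):
#   write_hive_site "$HIVE_HOME/conf"
```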

4) Start the Hive CLI: bin/hive

5) If you are using MySQL for the metastore, create the transaction tables. Log in to MySQL, select the metastore database with USE, and run:
source /home/hadoop/apache-hive-1.2.1-bin/scripts/metastore/upgrade/mysql/hive-txn-schema-0.14.0.mysql.sql
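You can also run this step non-interactively from the shell. The sketch below assumes a metastore database named metastore and a MySQL user named hiveuser. Both names are hypothetical, so replace them with the database and user your metastore is actually configured with. The helper name load_txn_schema is also hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: load the Hive transaction tables into the MySQL metastore database.
# METASTORE_DB and MYSQL_USER are hypothetical defaults; override as needed.
HIVE_HOME="${HIVE_HOME:-/home/hadoop/apache-hive-1.2.1-bin}"
TXN_SCHEMA="$HIVE_HOME/scripts/metastore/upgrade/mysql/hive-txn-schema-0.14.0.mysql.sql"
METASTORE_DB="${METASTORE_DB:-metastore}"
MYSQL_USER="${MYSQL_USER:-hiveuser}"

load_txn_schema() {
  if [ ! -f "$TXN_SCHEMA" ]; then
    echo "schema file not found: $TXN_SCHEMA" >&2
    return 1
  fi
  # Feeding the file to mysql with the database named on the command line is
  # equivalent to "USE <db>;" followed by "source <file>" at the mysql prompt.
  # -p makes the client prompt for the password.
  mysql -u "$MYSQL_USER" -p "$METASTORE_DB" < "$TXN_SCHEMA"
}

# Usage: load_txn_schema
```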

6) Create a table with 'transactional'='true'. The table must be bucketed and stored as ORC.

create table acid_table (id INT, name STRING, country STRING, salary INT)
clustered by (id) into 4 buckets
stored as orc TBLPROPERTIES ('transactional'='true');

7) Insert

insert into table acid_table values (1, 'john', 'IND', 50000);

8) Update

UPDATE acid_table SET salary = 300 WHERE id = 1;

9) Delete

delete from acid_table where id=1;
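Steps 6 to 9 can also be run as a single script with the Hive CLI's -f option. The sketch below assumes that hive is on your PATH and that the step 3 properties are already in hive-site.xml. The file location ${TMPDIR:-/tmp}/acid_demo.hql is a hypothetical choice.

```shell
#!/usr/bin/env bash
# Sketch: write steps 6-9 to a HiveQL file and run it with "hive -f".
# acid_demo.hql is a hypothetical file name; override with HQL_FILE.
HQL_FILE="${HQL_FILE:-${TMPDIR:-/tmp}/acid_demo.hql}"

cat > "$HQL_FILE" <<'EOF'
create table acid_table (id INT, name STRING, country STRING, salary INT)
clustered by (id) into 4 buckets
stored as orc TBLPROPERTIES ('transactional'='true');

insert into table acid_table values (1, 'john', 'IND', 50000);
UPDATE acid_table SET salary = 300 WHERE id = 1;
delete from acid_table where id = 1;
EOF

# Only run the script if the Hive CLI is available on this machine.
if command -v hive >/dev/null 2>&1; then
  hive -f "$HQL_FILE"
else
  echo "hive not found on PATH; wrote $HQL_FILE without running it" >&2
fi
```

The file uses plain ASCII quotes. Hive rejects typographic quotes such as ‘ and ’, which often appear when statements are copied from a web page.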

SparkR installation


I am excited to announce that the upcoming Apache Spark 1.4 release will include SparkR, an R package that allows data scientists to analyze large datasets and interactively run jobs on them from the R shell.

Clickstream analysis using Hive and Pig: a POC



E-commerce portals typically record user activity on their sites as clickstream data. They later analyze it to identify what each user browsed, so they can show relevant recommendations when that user returns to the site.
