Tag Archives: hadoop PoC

Clickstream analysis using Hive and Pig: a POC

E-commerce portals typically record user activity on their site as clickstream data and later analyze it to identify what each user browsed, so that they can show relevant recommendations when the user visits the site again.

Web analytics using Storm, Kafka and Cassandra

Apache Kafka is a highly available, high-throughput, distributed message broker that handles real-time data feeds. Kafka was originally developed by LinkedIn and open sourced in January 2011. Since then, it has found use at Yahoo, Twitter, Spotify and many others.

Apache Storm is a distributed real-time computation system. It is often compared with Apache Hadoop, a similar system that is batch-oriented, whereas Storm processes a continuous stream of data. Storm was initially developed by BackType; after Twitter acquired BackType, Storm was open sourced in September 2011. Like Kafka, Storm is used by Yahoo, Twitter, Spotify and many others.

Apache Cassandra is a distributed, decentralized, high-throughput and highly available database with no single point of failure. It was initially developed by Facebook and open sourced in July 2008. In addition to its Thrift interface, Cassandra has its own query language, CQL (Cassandra Query Language), which is similar to SQL. With this SQL-like interface, Cassandra is well suited to storing and managing huge amounts of data. According to the project's own benchmarks, performance increases as nodes are added to the cluster.

The goal of this POC is to find the number of unique and repeat users who visit the website. Based on the result, the client will provide different offers to different users.
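
To make the goal concrete, here is a minimal sketch of the classification itself, independent of any streaming system. It assumes one user ID per visit and treats a user seen exactly once as "unique" and a user seen more than once as "repeated"; the function name `split_visitors` and the sample IDs are hypothetical and not taken from the POC code.

```python
from collections import Counter


def split_visitors(user_ids):
    """Split visitors into (unique, repeated) lists of user IDs.

    `user_ids` is an iterable with one entry per visit.
    A user with exactly one visit is "unique"; more than one is "repeated".
    """
    visits = Counter(user_ids)  # user ID -> number of visits
    unique = sorted(u for u, n in visits.items() if n == 1)
    repeated = sorted(u for u, n in visits.items() if n > 1)
    return unique, repeated


if __name__ == "__main__":
    unique, repeated = split_visitors(["u1", "u2", "u1", "u3", "u2", "u1"])
    print("unique users:", len(unique))
    print("repeated users:", len(repeated))
```

The two lists could then drive different offers, for example a first-visit offer for the unique group and a loyalty offer for the repeated group.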

Steps:

1. The input is JSON data from the website.

2. Created a Kafka topic to which the JSON data is produced.

3. Wrote Storm code to consume the JSON data from Kafka.

4. Wrote Storm code to parse the JSON data and insert it into Cassandra.

5. In Cassandra, created a keyspace and a table with a counter-type column.
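
The parsing and counting in steps 4 and 5 can be sketched roughly as follows. This is not the POC's Storm code: it is a plain-Python stand-in in which a dict plays the role of the Cassandra counter table and a list of strings plays the role of messages consumed from Kafka. The JSON field `user_id`, the keyspace `webanalytics`, and the table `visit_counts` are all assumptions made for illustration; the CQL strings only show the general shape of a counter table and a counter increment.

```python
import json

# Hypothetical CQL for step 5; keyspace, table, and column names are assumptions.
CREATE_KEYSPACE = (
    "CREATE KEYSPACE IF NOT EXISTS webanalytics "
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}"
)
CREATE_TABLE = (
    "CREATE TABLE IF NOT EXISTS webanalytics.visit_counts ("
    "user_id text PRIMARY KEY, visits counter)"
)
# Counter columns are changed with UPDATE, not INSERT.
INCREMENT = (
    "UPDATE webanalytics.visit_counts SET visits = visits + 1 WHERE user_id = %s"
)


def parse_event(raw):
    """Return the user ID from one JSON message, or None if it is unusable."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(event, dict):
        return None
    user_id = event.get("user_id")  # assumed field name
    return user_id if isinstance(user_id, str) and user_id else None


def process(messages, counters):
    """Parse each message and bump that user's counter (in-memory stand-in)."""
    for raw in messages:
        user_id = parse_event(raw)
        if user_id is None:
            continue  # skip malformed events instead of failing the stream
        counters[user_id] = counters.get(user_id, 0) + 1
    return counters


if __name__ == "__main__":
    sample = ['{"user_id": "u1"}', "not json", '{"user_id": "u1"}', '{"page": "/"}']
    print(process(sample, {}))
```

In a real topology the body of `process` would live in a Storm bolt, and the dict update would be replaced by executing the `INCREMENT` statement through a Cassandra driver. A counter column fits this workload because each event only needs an increment, with no read of the current value first.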

Click here for POC video about the same

Click here for CODE