Web analytics using Storm, Kafka and Cassandra
Apache Kafka is a highly available, high-throughput, distributed message broker that handles real-time data feeds. Kafka was originally developed by LinkedIn and open sourced in January 2011. Since then, it has found use at Yahoo, Twitter, Spotify and many others.
Apache Storm is a distributed real-time computation system. It is often compared with Apache Hadoop, a similar system that is batch-oriented, whereas Storm processes a continuous stream of data. Storm was initially developed by BackType, which was acquired by Twitter, and it was open sourced in September 2011. Like Kafka, Storm is used by Yahoo, Twitter, Spotify and many others.
Apache Cassandra is a distributed, decentralized, high-throughput and highly available database with no single point of failure. It was initially developed by Facebook and open sourced in July 2008. Besides its Thrift interface, Cassandra has its own query language, CQL (Cassandra Query Language), which is similar to SQL. With this SQL-like interface, Cassandra is well suited to storing and managing huge amounts of data. According to the project's own benchmarks, performance increases as nodes are added to the cluster.
The goal is to count the unique and repeat users who visit the website. Based on the result, the client will provide different offers to different users.
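As a minimal sketch of what this count could look like, the snippet below splits users by their visit totals. It assumes that a "unique" user is one with exactly one visit and a "repeated" user is one with more than one; the function name `split_visitors` and the sample user ids are hypothetical and only serve the illustration.

```python
from collections import Counter


def split_visitors(visit_counts):
    """Split users into one-time ("unique") and repeat visitors.

    visit_counts maps a user id to the number of visits recorded for it.
    """
    unique = sorted(u for u, n in visit_counts.items() if n == 1)
    repeated = sorted(u for u, n in visit_counts.items() if n > 1)
    return unique, repeated


# Hypothetical visit log: one entry per page visit, keyed by user id.
visits = Counter(["alice", "bob", "alice", "carol", "alice"])
unique, repeated = split_visitors(visits)
print(len(unique), "unique user(s):", unique)
print(len(repeated), "repeated user(s):", repeated)
```

The two lists are what a client could use to decide which offer goes to which group of users.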
1. The input data comes from the website as JSON.
2. Created a Kafka topic to which the JSON data is produced.
3. Wrote Storm code to consume the JSON data from Kafka.
4. Wrote Storm code to parse the JSON data and insert it into Cassandra.
5. In Cassandra, created a keyspace and a table with a counter-type column.
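The parse-and-count part of these steps can be sketched in plain Python without a running cluster. This is not the Storm code from the POC: the Kafka topic is replaced by a list of strings, the Cassandra table by an in-memory stand-in, and the keyspace `web_analytics`, the table `user_visits`, and the JSON field `user_id` are all assumed names. The CQL strings show one plausible shape for the counter table from step 5.

```python
import json
from collections import defaultdict

# CQL for step 5. Keyspace, table, and column names are hypothetical.
CREATE_KEYSPACE = (
    "CREATE KEYSPACE IF NOT EXISTS web_analytics "
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}"
)
CREATE_TABLE = (
    "CREATE TABLE IF NOT EXISTS web_analytics.user_visits ("
    "user_id text PRIMARY KEY, visits counter)"
)
# Counter columns are only ever changed with an UPDATE that adds to them.
INCREMENT = (
    "UPDATE web_analytics.user_visits SET visits = visits + 1 WHERE user_id = ?"
)


class InMemoryCounterTable:
    """Stand-in for the Cassandra counter table, so the sketch runs locally."""

    def __init__(self):
        self.rows = defaultdict(int)

    def increment(self, user_id):
        # Mirrors the effect of the INCREMENT statement above.
        self.rows[user_id] += 1


def process_message(raw, table):
    """Parse one JSON message and count it; return False if it is skipped."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        return False  # malformed input is dropped rather than counted
    if not isinstance(event, dict):
        return False
    user_id = event.get("user_id")  # assumed field name
    if not isinstance(user_id, str) or not user_id:
        return False
    table.increment(user_id)
    return True


# Hypothetical messages as they might arrive from the Kafka topic.
messages = [
    '{"user_id": "u1", "page": "/home"}',
    '{"user_id": "u2", "page": "/home"}',
    '{"user_id": "u1", "page": "/offers"}',
    "not json",
]
table = InMemoryCounterTable()
counted = sum(process_message(m, table) for m in messages)
print(counted, "messages counted:", dict(table.rows))
```

In a Storm topology, `process_message` would roughly correspond to the work of a parsing bolt followed by a bolt that executes the counter update against Cassandra.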
Click here for POC video about the same
Click here for CODE