
Streaming with Spark Kafka



Apache Kafka is an open-source stream processing platform developed by the Apache Software Foundation and written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.

Apache Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams. Spark Streaming provides a high-level abstraction called a discretized stream, or DStream, which represents a continuous stream of data.
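The "discretized stream" idea can be illustrated without Spark itself: the engine chops a continuous stream into fixed-width micro-batches and processes each batch as one small dataset. A toy sketch in plain Scala (the names here are illustrative, not Spark API):

```scala
object MicroBatchSketch {
  // Group timestamped events into fixed-width batches, the way a DStream
  // discretizes a continuous stream into one RDD per batch interval.
  def discretize(events: Seq[(Int, String)], batchSeconds: Int): Map[Int, Seq[String]] =
    events.groupBy { case (ts, _) => ts / batchSeconds }
          .map { case (batch, batchEvents) => batch -> batchEvents.map(_._2) }

  def main(args: Array[String]): Unit = {
    // Events arriving at seconds 0..5 fall into 2-second batches 0, 1 and 2.
    val events = Seq((0, "a"), (1, "b"), (2, "c"), (3, "d"), (5, "e"))
    println(discretize(events, 2))
  }
}
```

With a 2-second batch interval, events at seconds 0–1, 2–3 and 4–5 land in three separate batches, which is exactly the role `Seconds(2)` plays in the job below.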



import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object SparkKafka {

  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("KafkaWordCount").setMaster("spark://big-virtual-machine:7077")
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // Receiver-based stream: ZooKeeper quorum, consumer group, and a map of topic -> receiver threads.
    // Each record is a (key, message) pair; keep only the message text.
    val lines = KafkaUtils.createStream(ssc, "localhost:2181", "spark-streaming-consumer-group", Map("customer" -> 5)).map(_._2)

    // Count words in each 2-second batch and print the counts.
    val wordCounts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    wordCounts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}



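The streaming job above is typically built with Maven, with the Spark Streaming Kafka integration on the classpath. A minimal pom.xml sketch (the coordinates and versions shown are assumptions for a Spark 1.x / Scala 2.10 build):

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>                <!-- hypothetical coordinates -->
  <artifactId>spark-kafka-demo</artifactId>
  <version>1.0</version>
  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.10</artifactId>
      <version>1.6.0</version>                  <!-- assumed Spark 1.x version -->
    </dependency>
    <dependency>
      <!-- provides KafkaUtils.createStream -->
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-kafka_2.10</artifactId>
      <version>1.6.0</version>
    </dependency>
  </dependencies>
</project>
```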


Apache Kafka single node installation


Apache Kafka is a highly available, high-throughput, distributed message broker that handles real-time data feeds. Kafka was originally developed at LinkedIn and open-sourced in January 2011. Since then, it has found use at Yahoo, Twitter, Spotify, and many others. Now we can start the Apache Kafka single node installation.

Step 1: Download ZooKeeper, Kafka (any recent version) and JDK 1.7.

Step 2: Untar ZooKeeper.

Step 3: Start ZooKeeper, then run jps to check that the daemon is up.
bin/zkServer.sh start

Step 4: Download and untar Kafka.

Step 5: Start Kafka.
bin/kafka-server-start.sh config/server.properties

Step 6: Run jps to check that both the ZooKeeper and Kafka daemons are up and running.

Step 7: Create a new topic “demo” and list all the topics.

Step 8: Start the console producer to produce (send) some messages to Kafka.

Step 9: Start the console consumer to consume (receive) those messages from Kafka.
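Steps 7–9 can be run with the scripts that ship in Kafka’s bin/ directory. A sketch of the commands, assuming an older ZooKeeper-based Kafka release (0.8/0.9 era, matching the JDK 1.7 requirement above) with ZooKeeper on localhost:2181 and the broker on localhost:9092; newer releases replace --zookeeper with --bootstrap-server:

```shell
# Step 7: create the "demo" topic, then list all topics
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
    --replication-factor 1 --partitions 1 --topic demo
bin/kafka-topics.sh --list --zookeeper localhost:2181

# Step 8: console producer -- type messages, one per line; Ctrl-C to stop
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic demo

# Step 9: console consumer -- prints messages from the beginning of the topic
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic demo --from-beginning
```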

Apache Kafka Online Test


Check your big data knowledge with this online test.

Top quiz on Apache Kafka, a publish-subscribe messaging system!

1. What happens if the preferred replica is not in the ISR?


2. Do we need ZooKeeper to run Kafka?


3. What is the ISR in Kafka?


4. What is the maximum message size a Kafka server can receive?


5. What is Apache Kafka?


6. What is ZooKeeper in Kafka?