Streaming with Spark Kafka

Apache Kafka is an open-source stream processing platform developed by the Apache Software Foundation and written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.

Apache Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams. Spark Streaming provides a high-level abstraction called a discretized stream, or DStream, which represents a continuous stream of data.

Code


import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object sparkkafka {

  def main(args: Array[String]) {
    // Configure the application and point it at the standalone Spark master.
    val sparkConf = new SparkConf()
      .setAppName("KafkaWordCount")
      .setMaster("spark://big-virtual-machine:7077")

    // Create a streaming context with a 2-second batch interval.
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // Connect through ZooKeeper at localhost:2181 as consumer group
    // "spark-streaming-consumer-group", reading the "customer" topic
    // with 5 receiver threads. Each record is a (key, message) pair.
    val lines = KafkaUtils.createStream(ssc, "localhost:2181",
      "spark-streaming-consumer-group", Map("customer" -> 5))

    // Print the first few records of every batch.
    lines.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
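The job above subscribes to the "customer" topic through ZooKeeper at localhost:2181. A quick way to exercise it is to create that topic and push a few test messages from the Kafka console producer. This is a sketch that assumes a single-node broker on the default port 9092 and a Kafka release of that era, whose command-line tools still talk to ZooKeeper directly:

```shell
# Create the "customer" topic the Spark job subscribes to
# (single partition, single replica on a one-node cluster)
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
  --replication-factor 1 --partitions 1 --topic customer

# Type a few lines; each line becomes one message the job will print
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic customer
```

With the streaming job running, each 2-second batch should print the messages typed into the producer.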

pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>sparkkafka</groupId>
  <artifactId>sparkkafka</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <build>
    <sourceDirectory>src</sourceDirectory>
    <plugins>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.1</version>
        <configuration>
          <source>1.7</source>
          <target>1.7</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
   <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.5.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.10</artifactId>
      <version>1.5.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-kafka_2.10</artifactId>
      <version>1.5.1</version>
    </dependency>

    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>
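One way to package and launch the example is Maven followed by spark-submit. The commands below are a sketch: the jar name follows from the artifactId and version in the POM above, the --class name matches the Scala object, and the master URL is the one hard-coded in the code. Note that the POM as shown configures only the Java compiler plugin, so a Scala build plugin (or an IDE-managed build) is assumed for compiling the Scala source.

```shell
# Build the project jar
mvn clean package

# Submit the job to the standalone Spark master
spark-submit --class sparkkafka \
  --master spark://big-virtual-machine:7077 \
  target/sparkkafka-0.0.1-SNAPSHOT.jar
```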

Apache Kafka single node installation

Apache Kafka is a highly available, high-throughput, distributed message broker that handles real-time data feeds. Kafka was originally developed at LinkedIn and open-sourced in January 2011; since then it has found use at Yahoo, Twitter, Spotify, and many others. We can now walk through a single-node Kafka installation.

Step 1: Download ZooKeeper, Kafka (any recent release), and JDK 1.7.

Step 2: Untar the ZooKeeper archive.

Step 3: Start ZooKeeper, then run jps to check that its daemon is up.
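The screenshot that showed the command is not reproduced here; for a standalone ZooKeeper install the commands are typically:

```shell
# Start standalone ZooKeeper (expects conf/zoo.cfg; copy
# conf/zoo_sample.cfg to conf/zoo.cfg if it does not exist)
bin/zkServer.sh start

# jps should list the ZooKeeper daemon, QuorumPeerMain
jps
```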


Step 4: Untar and install the Kafka archive downloaded in Step 1.

Step 5: Start the Kafka broker:

bin/kafka-server-start.sh config/server.properties
Step 6: Run jps to confirm that both the ZooKeeper and Kafka daemons are up and running.
Step 7: Create a new topic “demo” and list all the topics.
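The topic commands shown in the original screenshot follow this shape, assuming a pre-2.2 Kafka release whose tools address ZooKeeper directly:

```shell
# Create the "demo" topic (single partition and replica on one node)
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
  --replication-factor 1 --partitions 1 --topic demo

# List every topic registered in ZooKeeper
bin/kafka-topics.sh --list --zookeeper localhost:2181
```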
Step 8: Start the console producer to produce (send) messages to Kafka.
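A sketch of the producer invocation, assuming the broker listens on the default port 9092:

```shell
# Each line typed here is sent to the "demo" topic as one message
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic demo
```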
Step 9: Start the console consumer to consume (receive) messages from Kafka.
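The matching consumer, again assuming the old console consumer of that era, which reads offsets via ZooKeeper:

```shell
# Replay the "demo" topic from the beginning and print each message
bin/kafka-console-consumer.sh --zookeeper localhost:2181 \
  --topic demo --from-beginning
```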