Tag Archives: apache spark

SparkR installation

I am excited to announce that the upcoming Apache Spark 1.4 release will include SparkR, an R package that allows data scientists to analyze large datasets and interactively run jobs on them from the R shell.

Apache Spark single node installation

Apache Spark is an open-source cluster computing framework originally developed in the AMPLab at UC Berkeley. In contrast to Hadoop’s two-stage, disk-based MapReduce paradigm, Spark’s in-memory primitives can make certain applications run up to 100 times faster.


Apache Spark online test

Apache Spark™ is a fast and general engine for large-scale data processing.

Apache Spark is a leading, fast-growing technology, so get comfortable with it by taking the test.

1. What is the relationship between Apache Spark and MapReduce?


2. What is the status of Apache Spark as an Apache Software Foundation project?


3. When running Spark on YARN, do I need to install Spark on all nodes of the YARN cluster?


4. What are the types of transformations in Apache Spark?


5. What is MLlib in Apache Spark?


6. What is an RDD?


7. Which of the following is not an associated component of Spark?


8. What is Apache Spark used for?


9. What are Actions?


10. Who among the following offers commercial distribution of Apache Spark?