
Apache Hadoop Online Test

Check your big data knowledge with this online test.

This Hadoop online test includes questions on HDFS, MapReduce, Hive, HBase, Pig, Sqoop, and more.

1. When is the earliest point at which the reduce method of a given Reducer can be called?

 
 
 
 

2. You have user profile records in your OLTP database that you want to join with web logs you have already ingested into the Hadoop file system. How will you obtain these user records?

 
 
 
 

3. You have the following key-value pairs as output from your Map task:
(the, 1)
(fox, 1)
(faster, 1)
(than, 1)
(the, 1)
(dog, 1)
How many keys will be passed to the Reducer’s reduce method?
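As background for questions like this: between map and reduce, the framework shuffles and groups the emitted pairs by key, and each distinct key triggers exactly one call to reduce. Below is a toy, framework-free Java sketch of that grouping step (the class name is illustrative, not part of Hadoop):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy illustration of the shuffle's grouping step: map output pairs are
// grouped by key, and each entry of the grouped map corresponds to one
// reduce() invocation.
public class ShuffleGroupingDemo {
    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("the", 1), Map.entry("fox", 1), Map.entry("faster", 1),
                Map.entry("than", 1), Map.entry("the", 1), Map.entry("dog", 1));

        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapOutput) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }

        // Each line printed here stands for one reduce() call.
        grouped.forEach((key, values) -> System.out.println(key + " -> " + values));
    }
}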

 
 
 
 

4. Assuming default settings, which best describes the order of data provided to a reducer's reduce method?

 
 
 
 

5. You are developing a combiner that takes as input Text keys, IntWritable values, and emits Text keys, IntWritable values. Which interface should your class implement?
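For context, here is a minimal sketch of such a combiner, assuming the older org.apache.hadoop.mapred API; the class name and the summing logic are illustrative. In that API a combiner is written against the same contract as a reducer and would typically be registered with conf.setCombinerClass(SumCombiner.class).

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Illustrative combiner: takes Text keys and IntWritable values and emits
// the same types, pre-summing counts on the map side.
public class SumCombiner extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();   // partial sum before the shuffle
        }
        output.collect(key, new IntWritable(sum));
    }
}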

 
 
 
 

6. MapReduce v2 (MRv2/YARN) is designed to address which two issues?

 
 
 
 

7. You want to understand more about how users browse your public website, such as which pages they visit prior to placing an order. You have a farm of 200 web servers hosting your website. How will you gather this data for your analysis?

 
 
 
 

8. Identify the number of failed task attempts you can expect when you run a job with mapred.max.map.attempts set to 4 and a total of 5 map tasks.
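For reference, a minimal sketch (hypothetical class name, old JobConf API) of where this property is set; it caps how many times each map task is attempted before the framework marks the task as failed:

import org.apache.hadoop.mapred.JobConf;

// Sketch only: setting the per-map-task attempt limit named in the question.
public class AttemptLimitExample {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        conf.setInt("mapred.max.map.attempts", 4);   // up to 4 attempts per map task
        System.out.println(conf.getInt("mapred.max.map.attempts", -1));
    }
}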

 
 
 
 

9. You need to create a job that does frequency analysis on input data. You will do this by writing a Mapper that uses TextInputFormat and splits each value (a line of text from an input file) into individual characters. For each of these characters, you will emit the character as a key and an IntWritable as the value. As this will produce proportionally more intermediate data than input data, which two resources should you expect to be bottlenecks?
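A rough sketch of the mapper this question describes, assuming the older org.apache.hadoop.mapred API; the class name is illustrative:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// TextInputFormat supplies a byte offset and a line of text; this mapper
// emits one (character, 1) pair per character, so the intermediate data
// volume grows in proportion to the total input size.
public class CharFrequencyMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text character = new Text();

    @Override
    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
        String s = line.toString();
        for (int i = 0; i < s.length(); i++) {
            character.set(String.valueOf(s.charAt(i)));   // one record per character
            output.collect(character, ONE);
        }
    }
}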

 
 
 
 

10. Can you use MapReduce to perform a relational join on two large tables sharing a key? Assume that the two tables are formatted as comma-separated files in HDFS.
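As background, one common way to do this is a reduce-side join. Below is a rough sketch of the mapper for one of the two tables, assuming the old mapred API, a join key in the first comma-separated column, and a "users" source tag (all illustrative assumptions); the reducer would then receive, for each key, the tagged rows from both tables and pair them up.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// One half of a reduce-side join: emit the join key as the output key and
// the rest of the row, tagged with its source table, as the value.
public class UserTableJoinMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    @Override
    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, Text> output,
                    Reporter reporter) throws IOException {
        String[] fields = line.toString().split(",", 2);
        if (fields.length == 2) {
            output.collect(new Text(fields[0]), new Text("users\t" + fields[1]));
        }
    }
}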

 
 
 
 

11. All keys used for intermediate output from mappers must:
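For context: intermediate keys have to be serialized by the framework and sorted during the shuffle. Here is a sketch of a custom key type that satisfies that contract; the userId field is purely illustrative.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

// Illustrative custom intermediate key: serializable (write/readFields) and
// sortable (compareTo) for the shuffle; hashCode() feeds the default
// HashPartitioner.
public class UserIdKey implements WritableComparable<UserIdKey> {

    private long userId;

    public UserIdKey() { }                       // required no-arg constructor

    public UserIdKey(long userId) { this.userId = userId; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(userId);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        userId = in.readLong();
    }

    @Override
    public int compareTo(UserIdKey other) {
        return Long.compare(userId, other.userId);
    }

    @Override
    public int hashCode() {
        return Long.hashCode(userId);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof UserIdKey && ((UserIdKey) o).userId == userId;
    }
}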

 
 
 
 

12. What is the disadvantage of using multiple reducers with the default HashPartitioner and distributing your workload across your cluster?

 
 
 
 

13. Identify the utility that allows you to create and run MapReduce jobs with any executable or script as the mapper and/or the reducer.

 
 
 
 

14. Which best describes how TextInputFormat processes input files and line breaks?

 
 
 
 

15. The Hadoop framework provides a mechanism for coping with machine issues such as faulty configuration or impending hardware failure. MapReduce detects that one or more machines are performing poorly and starts additional copies of a map or reduce task. All the copies run simultaneously, and the one that finishes first is used. This is called:
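For reference, a small sketch of how this behaviour can be toggled per job with the old JobConf API (both flags default to true):

import org.apache.hadoop.mapred.JobConf;

// Sketch only: per-job switches for launching duplicate attempts of slow tasks.
public class SpeculativeExecutionConfig {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        conf.setMapSpeculativeExecution(true);      // duplicate attempts for slow map tasks
        conf.setReduceSpeculativeExecution(true);   // duplicate attempts for slow reduce tasks
    }
}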

 
 
 
 

16. In a MapReduce job, you want each of your input files processed by a single map task. How do you configure a MapReduce job so that a single map task processes each input file regardless of how many blocks the input file occupies?
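One common way to get this behaviour, assuming the old mapred API, is to subclass the input format so that no file is ever split, which makes each file one input split and therefore one map task. The class name below is illustrative; it would be registered with conf.setInputFormat(NonSplittableTextInputFormat.class).

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

// Report every file as non-splittable, regardless of how many HDFS blocks it spans.
public class NonSplittableTextInputFormat extends TextInputFormat {

    @Override
    protected boolean isSplitable(FileSystem fs, Path file) {
        return false;
    }
}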

 
 
 
 

17. What data does a Reducer's reduce method process?

 
 
 
 

18. Identify which best defines a SequenceFile.
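For context, a small sketch of writing one, assuming Text keys, IntWritable values, and an illustrative output path:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Write a few binary key-value pairs to a SequenceFile.
public class SequenceFileWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/tmp/example.seq");   // illustrative path

        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, path, Text.class, IntWritable.class);
        try {
            writer.append(new Text("fox"), new IntWritable(1));
            writer.append(new Text("dog"), new IntWritable(1));
        } finally {
            writer.close();
        }
    }
}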

 
 
 
 

19. Given a directory of files with the following structure: line number, tab character, string.
Example:
1	abialkjfjkaoasdfjksdlkjhqweroij
2	kadfjhuwqounahagtnbvaswslmnbfgy
3	kjfteiomndscxeqalkzhtopedkfsikj
You want to send each line as one record to your Mapper. Which InputFormat should you use to complete the line: conf.setInputFormat(____.class); ?

 
 
 
 

20. A client application creates an HDFS file named foo.txt with a replication factor of 3. Which best describes the file access rules in HDFS if the file has a single block that is stored on data nodes A, B, and C?
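For context, a minimal sketch of the client side of this scenario: creating foo.txt with a replication factor of 3 through the FileSystem API (the content written is illustrative); the NameNode then chooses the data nodes that hold the block's three replicas.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Create a small file with replication factor 3.
public class CreateReplicatedFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        FSDataOutputStream out = fs.create(new Path("foo.txt"), (short) 3);
        try {
            out.writeUTF("hello hdfs");
        } finally {
            out.close();
        }
    }
}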