
Apache Hadoop online test

Check your big data knowledge with this online test

Hadoop online test with questions on HDFS, MapReduce, Hive, HBase, Pig, Sqoop, etc.

1. Identify the utility that allows you to create and run MapReduce jobs with any executable or script as the mapper and/or the reducer.

 
 
 
 

2. What data does a Reducer's reduce method process?

 
 
 
 

3. The Hadoop framework provides a mechanism for coping with machine issues such as faulty configuration or impending hardware failure. MapReduce detects that one or more machines are performing poorly and starts additional copies of a map or reduce task. All the copies run simultaneously, and the output of whichever copy finishes first is used. This is called:

 
 
 
 

4. You are developing a combiner that takes as input Text keys, IntWritable values, and emits Text keys, IntWritable values. Which interface should your class implement?

 
 
 
 

5. When is the earliest point at which the reduce method of a given Reducer can be called?

 
 
 
 

6. Identify which best defines a SequenceFile.

 
 
 
 

7. You want to understand more about how users browse your public website, such as which pages they visit prior to placing an order. You have a farm of 200 web servers hosting your website. How will you gather this data for your analysis?

 
 
 
 

8. Which best describes how TextInputFormat processes input files and line breaks?

 
 
 
 

9. In a MapReduce job, you want each of your input files processed by a single map task. How do you configure a MapReduce job so that a single map task processes each input file regardless of how many blocks the input file occupies?

 
 
 
 

10. You have user profile records in your OLTP database that you want to join with web logs you have already ingested into the Hadoop file system. How will you obtain these user records?

 
 
 
 

11. Given a directory of files with the following structure: line number, tab character, string:
Example:
1    abialkjfjkaoasdfjksdlkjhqweroij
2   kadfjhuwqounahagtnbvaswslmnbfgy
3   kjfteiomndscxeqalkzhtopedkfsikj
You want to send each line as one record to your Mapper. Which InputFormat should you use to complete the line: conf.setInputFormat(____.class); ?

 
 
 
 

12. MapReduce v2 (MRv2/YARN) is designed to address which two issues?

 
 
 
 

13. You have the following key-value pairs as output from your Map task:
(the, 1)
(fox, 1)
(faster, 1)
(than, 1)
(the, 1)
(dog, 1)
How many keys will be passed to the Reducer’s reduce method?
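
To reason about this question, it can help to simulate the grouping that happens between map and reduce. The following is a minimal sketch in plain Java, not Hadoop code: the class and method names are hypothetical, and it assumes a single reducer, so every key reaches the same place. Each entry in the resulting map stands for one call to reduce, which receives a key together with all of the values emitted for it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of the Hadoop API.
public class ShuffleGroupingSketch {

    // Groups (key, value) pairs by key in sorted key order, roughly the way
    // the shuffle-and-sort phase presents them to a single reducer.
    static Map<String, List<Integer>> groupByKey(String[] keys, int[] values) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (int i = 0; i < keys.length; i++) {
            grouped.computeIfAbsent(keys[i], k -> new ArrayList<>()).add(values[i]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // The map output listed in the question.
        String[] keys = {"the", "fox", "faster", "than", "the", "dog"};
        int[] values = {1, 1, 1, 1, 1, 1};

        // Each printed line corresponds to one reduce() call:
        // a key plus the list of all values emitted for that key.
        groupByKey(keys, values).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Counting the printed lines gives the number of reduce calls under the single-reducer assumption.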

 
 
 
 

14. Identify the number of failed task attempts you can expect when you run a job with mapred.max.map.attempts set to 4 and a total of 5 map tasks.

 
 
 
 

15. Assuming default settings, which best describes the order of data provided to a reducer’s reduce method?

 
 
 
 

16. You need to create a job that does frequency analysis on input data. You will do this by writing a Mapper that uses TextInputFormat and splits each value (a line of text from an input file) into individual characters. For each one of these characters, you will emit the character as a key and an IntWritable as the value. As this will produce proportionally more intermediate data than input data, which two resources should you expect to be bottlenecks?

 
 
 
 

17. What is the disadvantage of using multiple reducers with the default HashPartitioner and distributing your workload across your cluster?
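
As background for this question, here is a rough sketch of how hash partitioning assigns keys to reducers. It mirrors the formula used by Hadoop's default HashPartitioner (clear the sign bit of the key's hash code, then take the remainder by the number of reduce tasks), but it is plain Java with a hypothetical class name, and it uses String keys for simplicity. A real job would typically use Text keys, whose hash codes differ from String's, so the actual assignments would not match.

```java
// Hypothetical stand-alone illustration; not the Hadoop class itself.
public class HashPartitionSketch {

    // Same shape as the default partitioning formula: clear the sign bit
    // so the result is non-negative, then take the remainder.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] keys = {"the", "fox", "faster", "than", "dog"};
        int numReduceTasks = 3; // assumed reducer count, chosen for illustration

        for (String key : keys) {
            System.out.println(key + " -> reducer " + partitionFor(key, numReduceTasks));
        }
    }
}
```

The point to notice is that the reducer index depends only on the key's hash value, not on where the key falls in sort order.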

 
 
 
 

18. Can you use MapReduce to perform a relational join on two large tables sharing a key? Assume that the two tables are formatted as comma-separated files in HDFS.
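
For readers unfamiliar with the idea, one common pattern here is a reduce-side join: the map phase keys every record by the join column and tags it with its source table, and the reduce phase pairs up the records that share a key. The sketch below imitates that flow in plain Java on in-memory lists. The class name, method name, and sample rows are hypothetical, and it assumes each CSV line has the join key as its first field followed by at least one more field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory imitation of a reduce-side join; not Hadoop code.
public class ReduceSideJoinSketch {

    // "Map" step: split each line into (join key, rest of row), grouped by key.
    static Map<String, List<String>> keyByFirstField(List<String> csvLines) {
        Map<String, List<String>> byKey = new HashMap<>();
        for (String line : csvLines) {
            String[] parts = line.split(",", 2); // assumes at least two fields
            byKey.computeIfAbsent(parts[0], k -> new ArrayList<>()).add(parts[1]);
        }
        return byKey;
    }

    // "Reduce" step: for each key present in both tables, pair every left
    // row with every right row (an inner join).
    static List<String> join(List<String> leftCsv, List<String> rightCsv) {
        Map<String, List<String>> left = keyByFirstField(leftCsv);
        Map<String, List<String>> right = keyByFirstField(rightCsv);
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : left.entrySet()) {
            List<String> matches = right.get(entry.getKey());
            if (matches == null) {
                continue; // key has no partner in the right table
            }
            for (String l : entry.getValue()) {
                for (String r : matches) {
                    joined.add(entry.getKey() + "," + l + "," + r);
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String> users = List.of("1,alice", "2,bob");           // made-up rows
        List<String> visits = List.of("1,/home", "1,/cart", "3,/faq"); // made-up rows
        join(users, visits).forEach(System.out::println);
    }
}
```

In an actual job the grouping by key would be done by the framework's shuffle rather than by in-memory maps, which is what lets the same idea scale to large tables.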

 
 
 
 

19. A client application creates an HDFS file named foo.txt with a replication factor of 3. Identify which best describes the file access rules in HDFS if the file has a single block that is stored on data nodes A, B and C?

 
 
 
 

20. All keys used for intermediate output from mappers must: