If you run a Hadoop MapReduce job and check its status page on the JobTracker (or the job history server) for its job ID, you will see performance counters like the ones below.
File System Counters
        FILE: Number of bytes read=4386096368
        FILE: Number of bytes written=8805370803
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=54583718086
        HDFS: Number of bytes written=4382090874
        HDFS: Number of read operations=1479
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
Job Counters
        Launched map tasks=369
        Launched reduce tasks=1
        Data-local map tasks=369
        Total time spent by all maps in occupied slots (ms)=34288552
        Total time spent by all reduces in occupied slots (ms)=232084
        Total time spent by all map tasks (ms)=8572138
        Total time spent by all reduce tasks (ms)=58021
        Total vcore-seconds taken by all map tasks=8572138
        Total vcore-seconds taken by all reduce tasks=58021
        Total megabyte-seconds taken by all map tasks=35111477248
        Total megabyte-seconds taken by all reduce tasks=237654016
Map-Reduce Framework
        Map input records=14753874
        Map output records=666776
        Map output bytes=4383426830
        Map output materialized bytes=4386098552
        Input split bytes=47970
        Combine input records=0
        Combine output records=0
        Reduce input groups=1
        Reduce shuffle bytes=4386098552
        Reduce input records=666776
        Reduce output records=666776
        Spilled Records=1333552
        Shuffled Maps=369
        Failed Shuffles=0
        Merged Map outputs=369
        GC time elapsed (ms)=1121584
        CPU time spent (ms)=23707900
        Physical memory (bytes) snapshot=152915259392
        Virtual memory (bytes) snapshot=2370755190784
        Total committed heap usage (bytes)=126644912128
Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
File Input Format Counters
        Bytes Read=49449743227
File Output Format Counters
        Bytes Written=4382090874
Counters are used to determine whether and how often a particular event occurred during job execution. A task's counters are sent in full every time rather than as deltas since the last transmission, which guards against errors due to lost messages. Note that, during a job run, counters may go down if a task fails. Counter values are definitive only once a job has completed successfully.
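Counters of a finished job can also be read programmatically. A minimal sketch using the org.apache.hadoop.mapreduce API (the job setup itself is assumed to exist elsewhere):

```java
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.TaskCounter;

public class CounterReport {
    // Print two built-in framework counters once the job has finished;
    // values are definitive only after successful completion.
    static void report(Job job) throws Exception {
        if (!job.waitForCompletion(true)) {
            System.err.println("Job failed; counter values are not definitive.");
            return;
        }
        Counter in = job.getCounters().findCounter(TaskCounter.MAP_INPUT_RECORDS);
        Counter out = job.getCounters().findCounter(TaskCounter.MAP_OUTPUT_RECORDS);
        System.out.println(in.getDisplayName() + " = " + in.getValue());
        System.out.println(out.getDisplayName() + " = " + out.getValue());
    }
}
```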
Performance Monitoring: Tracking counter values provides vital statistics about a MapReduce job. In addition to statistics on the job and its tasks, counters also report on the operations performed against the underlying filesystems.
Performance Testing: Knowing the expected values of counters provides a first line of performance validation for MapReduce jobs. It helps ascertain whether a job behaves as the ideal MapReduce paradigm expects, for example the simple check that the number of records produced by the mappers should equal the number of records consumed by the reducers.
Performance Optimization: This forms the second line of performance validation for MapReduce jobs. Counter values give hints on what can be done to optimize a MapReduce job further.
To describe in detail how these objectives of performance monitoring, testing, and optimization can be achieved using counters, I have listed below all MapReduce counters with their descriptions, their ideal expected values, and hints on how to optimize them.
There are two categories of counters in Hadoop: built-in (file system, job, framework) and custom.
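Custom counters are defined by the job author, usually with an enum; a minimal sketch (the class, enum, and field names here are hypothetical):

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ParsingMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    // Each enum value becomes a counter; the enum class name is the group.
    enum ParseError { MALFORMED_RECORD }

    private static final LongWritable ONE = new LongWritable(1);
    private final Text outKey = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        if (fields.length < 2) {
            // Count the bad record instead of failing the task.
            context.getCounter(ParseError.MALFORMED_RECORD).increment(1);
            return;
        }
        outKey.set(fields[0]);
        context.write(outKey, ONE);
    }
}
```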
In-built Counters
(Source code: hadoop-mapreduce-client-core/2.6.0/org/apache/hadoop/mapreduce)
For each counter below, I list its name, its display string, a detailed description, the expected value for ideally optimized MapReduce performance, and how to optimize in case of an unexpected value or bad performance.

Counter Group: File System Counters
(Source file in Hadoop code: FileSystemCounter.properties; CounterGroupName: File System Counters)

For each filesystem (HDFS, MAPRFS, S3):
<FS>_BYTES_READ ("Number of bytes read from filesystem <FS>")
Description: The total number of bytes read from a specific filesystem.
Expected value: No ideal expected value, as it depends on the volume of data being read.
Optimization: Can be reduced by using efficient serialization formats (Avro, Parquet, etc.) and compression codecs (LZO, Snappy, bzip2, etc.). The choice depends purely on the use case; for example, Parquet is useful only if the read pattern is limited to some columns. Also weigh the time taken to serialize/deserialize and compress/decompress against the I/O saved.
<FS>_BYTES_WRITTEN ("Number of bytes written to filesystem <FS>")
Description: The total number of bytes written to a specific filesystem.
Expected value: No ideal expected value, as it depends on the volume of data generated by the MapReduce algorithm; compressing the job output, as in the sketch below, reduces it.
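As a hedged sketch of the compression advice above, here is one way to enable compressed job output, which lowers <FS>_BYTES_WRITTEN for this job and <FS>_BYTES_READ for downstream jobs consuming it. Snappy is an illustrative choice and needs its native libraries on the cluster:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class OutputCompression {
    static Job newJob(Configuration conf) throws Exception {
        Job job = Job.getInstance(conf, "compressed-output");
        // Compress the final output files written by the reducers.
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);
        return job;
    }
}
```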
For example, for MAPRFS:

MAPRFS_BYTES_READ: the total number of bytes read from the MapR filesystem; the MAPRFS counterpart of HDFS_BYTES_READ.

MAPRFS_BYTES_WRITTEN: the total number of bytes written to the MapR filesystem; the MAPRFS counterpart of HDFS_BYTES_WRITTEN.

FILE_BYTES_READ ("total number of bytes read from local file system")
Description: Incremented for each byte read from the local filesystem. These reads occur 1) during the shuffle phase, when map-phase intermediate data is read by HTTP worker threads ("tasktracker.http.threads") for the reduce tasks, and 2) during the reduce phase, when reducers read grouped and sorted data from the local filesystem.
Expected value: No ideal expected value, as it depends on the shuffle phase and the reducer logic.
Optimization: Can be reduced by using efficient compression codecs (LZO, Snappy, bzip2, etc.); weigh the time taken to compress/decompress against the I/O saved.
FILE_BYTES_WRITTEN ("total number of bytes written to local file system")
Description: Incremented for each byte written to the local filesystem. These writes occur 1) during the map phase, when the mappers write their intermediate results to the local filesystem, and 2) during the shuffle phase, when the reducers spill intermediate results to their local disks while sorting.
Expected value: No ideal expected value, as it depends on the mapper logic and the shuffle phase.
Optimization: Can be reduced by using efficient compression codecs (LZO, Snappy, bzip2, etc.), as in the sketch below; weigh the time taken to compress/decompress against the I/O saved.
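A sketch of enabling intermediate (map output) compression, the most direct lever on the local-filesystem byte counters and on REDUCE_SHUFFLE_BYTES further down. The property names are the Hadoop 2 ones; Snappy is an illustrative codec choice:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;

public class IntermediateCompression {
    static void enable(Configuration conf) {
        // Compress map output before it is spilled and shuffled; this cuts
        // FILE_BYTES_WRITTEN, FILE_BYTES_READ, and REDUCE_SHUFFLE_BYTES.
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setClass("mapreduce.map.output.compress.codec",
                      SnappyCodec.class, CompressionCodec.class);
    }
}
```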
 
READ_OPS ("Number of read operations")
Description: Accumulated at the client per filesystem; the number of read operations such as listStatus, getFileBlockLocations, open, etc.
Expected value: No fixed target; it is useful in the interim to identify jobs that heavily load HDFS.
LARGE_READ_OPS ("Number of large read operations")
Description: Accumulated at the client per filesystem. Most filesystem operations are small, except listing a large directory. Iterative listFiles was introduced in HDFS to break a single large operation into smaller steps; this counter is incremented for every iteration of listFiles when listing files under a large directory.
Expected value: No fixed target; it is useful in the interim to identify jobs that heavily load HDFS.
WRITE_OPS ("Number of write operations")
Description: Accumulated at the client per filesystem; the number of write operations such as create, append, setPermission, etc.
Expected value: No fixed target; it is useful in the interim to identify jobs that heavily load HDFS.
Counter Group: Job Counters
(Source file in Hadoop code: JobCounter.properties; CounterGroupName: Job Counters)
SLOTS_MILLIS_MAPS ("Total time spent by all maps in occupied slots (ms)")
Description: Wall-clock time for the map tasks; equals FALLOW_SLOTS_MILLIS_MAPS + MILLIS_MAPS.
Expected value: Optimally, equal to MILLIS_MAPS.
Optimization: Reduce the time spent in FALLOW_SLOTS_MILLIS_MAPS as much as possible.
FALLOW_SLOTS_MILLIS_MAPS ("Total time spent by all maps waiting after reserving slots (ms)")
Description: Indicates how long map tasks wait in the queue after their slots are reserved but before they execute. Slot reservation is a capacity-scheduler feature for memory-intensive jobs. Not used by YARN-based MapReduce.
Expected value: Optimally zero.
Optimization: A high number indicates a possible mismatch between the number of slots configured for a tasktracker and the resources actually available. It can be reduced by either 1) increasing the number of nodes in the cluster, which distributes data and compute further and lowers the load on any particular node, or 2) reducing the map-task capacity of tasktrackers with scarce resources by lowering "mapred.tasktracker.map.tasks.maximum" in that tasktracker's mapred-site.xml.
MILLIS_MAPS ("Total time spent by all map tasks (ms)")
Description: The total time taken in running all the map tasks, in ms, including speculative tasks.
Expected value: No ideal value, as it depends on the mapper logic.
Optimization: Optimize the mapper code as much as possible by profiling it for memory leaks and following good coding practices.
SLOTS_MILLIS_REDUCES ("Total time spent by all reduces in occupied slots (ms)")
Description: Wall-clock time for the reduce tasks; equals FALLOW_SLOTS_MILLIS_REDUCES + MILLIS_REDUCES.
Expected value: Optimally, equal to MILLIS_REDUCES.
Optimization: Reduce the time spent in FALLOW_SLOTS_MILLIS_REDUCES as much as possible.
FALLOW_SLOTS_MILLIS_REDUCES ("Total time spent by all reduces waiting after reserving slots (ms)")
Description: Indicates how long reduce tasks wait in the queue after their slots are reserved but before they execute. Slot reservation is a capacity-scheduler feature for memory-intensive jobs. Not used by YARN-based MapReduce.
Expected value: Optimally zero.
Optimization: A high number indicates a possible mismatch between the number of slots configured for a tasktracker and the resources actually available. It can be reduced by either 1) increasing the number of nodes in the cluster, which distributes data and compute further and lowers the load on any particular node, or 2) reducing the reduce-task capacity of tasktrackers with scarce resources by lowering "mapred.tasktracker.reduce.tasks.maximum" in that tasktracker's mapred-site.xml.
MILLIS_REDUCES ("Total time spent by all reduce tasks (ms)")
Description: The total time taken in running all the reduce tasks, in ms, including speculative tasks.
Expected value: No ideal value, as it depends on the reducer logic.
Optimization: Optimize the reducer code as much as possible by profiling it for memory leaks and following good coding practices.
VCORES_MILLIS_MAPS ("Total vcore-seconds taken by all map tasks")
Description: A vcore is a virtual CPU core. This counter measures the CPU resources used by all the mappers: the number of vcores allocated to each mapper multiplied by the time that mapper ran. Vcores per mapper are controlled by "mapreduce.map.cpu.vcores" (default 1), and a node's vcore capacity in YARN by "yarn.nodemanager.resource.cpu-vcores" (default 8). In simple terms, it equals "mapreduce.map.cpu.vcores" x MILLIS_MAPS.
Expected value: It gives the CPU resources held by the mappers, so the lower the better.
Optimization: Reduce either the number of vcores allocated to each mapper or the time taken to execute a mapper instance. In other words, make the mapper as optimized and resource-lean as possible, then lower "mapreduce.map.cpu.vcores" to fit its actual requirement.
VCORES_MILLIS_REDUCES ("Total vcore-seconds taken by all reduce tasks")
Description: This counter measures the CPU resources used by all the reducers: the number of vcores allocated to each reducer multiplied by the time that reducer ran. Vcores per reducer are controlled by "mapreduce.reduce.cpu.vcores" (default 1), and a node's vcore capacity in YARN by "yarn.nodemanager.resource.cpu-vcores" (default 8). In simple terms, it equals "mapreduce.reduce.cpu.vcores" x MILLIS_REDUCES.
Expected value: It gives the CPU resources held by the reducers, so the lower the better.
Optimization: Reduce either the number of vcores allocated to each reducer or the time taken to execute a reducer instance. In other words, make the reducer as optimized and resource-lean as possible, then lower "mapreduce.reduce.cpu.vcores" to fit its actual requirement.
MB_MILLIS_MAPS ("Total megabyte-seconds taken by all map tasks")
Description: This counter measures the memory resources used by all the mappers: the amount of memory (in megabytes) allocated to each mapper multiplied by the time that mapper ran. Memory per mapper is controlled by "mapreduce.map.memory.mb" (default 1024), and a node's memory capacity in YARN by "yarn.nodemanager.resource.memory-mb" (default 8192). In simple terms, it equals "mapreduce.map.memory.mb" x MILLIS_MAPS.
Expected value: It gives the memory resources held by the mappers, so the lower the better.
Optimization: Reduce either the memory allocated to each mapper or the time taken to execute a mapper instance. In other words, make the mapper as optimized and resource-lean as possible, then lower "mapreduce.map.memory.mb" to fit its actual requirement.
MB_MILLIS_REDUCES ("Total megabyte-seconds taken by all reduce tasks")
Description: This counter measures the memory resources used by all the reducers: the amount of memory (in megabytes) allocated to each reducer multiplied by the time that reducer ran. Memory per reducer is controlled by "mapreduce.reduce.memory.mb" (default 1024), and a node's memory capacity in YARN by "yarn.nodemanager.resource.memory-mb" (default 8192). In simple terms, it equals "mapreduce.reduce.memory.mb" x MILLIS_REDUCES.
Expected value: It gives the memory resources held by the reducers, so the lower the better.
Optimization: Reduce either the memory allocated to each reducer or the time taken to execute a reducer instance. In other words, make the reducer as optimized and resource-lean as possible, then lower "mapreduce.reduce.memory.mb" to fit its actual requirement; see the sizing sketch below.
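A sketch of right-sizing these per-task resources from job code; the numbers are illustrative assumptions and should come from profiling the actual job. VCORES_MILLIS_* and MB_MILLIS_* shrink when the allocations match real need:

```java
import org.apache.hadoop.conf.Configuration;

public class TaskResourceSizing {
    static void rightSize(Configuration conf) {
        // Per-task container sizing (Hadoop 2 property names).
        conf.setInt("mapreduce.map.cpu.vcores", 1);
        conf.setInt("mapreduce.reduce.cpu.vcores", 1);
        conf.setInt("mapreduce.map.memory.mb", 1024);
        conf.setInt("mapreduce.reduce.memory.mb", 2048);
        // Keep the JVM heap inside the container, roughly 80% of memory.mb.
        conf.set("mapreduce.map.java.opts", "-Xmx819m");
        conf.set("mapreduce.reduce.java.opts", "-Xmx1638m");
    }
}
```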
 
DATA_LOCAL_MAPS ("Data-local map tasks")
Description: Total number of map tasks run on the nodes holding their data blocks (data locality).
Expected value: As close to the total number of maps as possible. Optimally, all map tasks execute on local data to exploit locality of reference.
Optimization: The only solution is to ensure maximum data locality, by either 1) increasing the number of nodes in the cluster, which distributes data and compute further, lowers the load on any particular node, and raises the probability that a map task finds its data locally, or 2) increasing the map-task capacity of tasktrackers with ample resources by raising "mapred.tasktracker.map.tasks.maximum" in that tasktracker's mapred-site.xml.
RACK_LOCAL_MAPS ("Rack-local map tasks")
Description: Total number of map tasks whose data blocks are on the same rack as the node executing the task.
Expected value: Ideally zero.
OTHER_LOCAL_MAPS ("Other local map tasks")
Description: Total number of map tasks whose data blocks are on a different rack from the node executing the task.
Expected value: Ideally zero.
TOTAL_LAUNCHED_MAPS ("Launched map tasks")
Description: How many map tasks were launched for the job, including failed tasks, killed tasks, and speculatively started tasks.
Expected value: Optimally, the same as the number of input splits for the job.
Optimization: If the value does not match the number of input splits, verify the InputFormat used.
TOTAL_LAUNCHED_REDUCES ("Launched reduce tasks")
Description: How many reduce tasks were launched for the job, including failed tasks, killed tasks, and speculatively started tasks.
Expected value: Optimally, equal to the number of reducers configured for the MapReduce job (the default is a single reducer); a sketch of setting it follows.
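The configured reducer count is what this counter should track; a minimal sketch, with 4 as an illustrative value:

```java
import org.apache.hadoop.mapreduce.Job;

public class ReducerCount {
    static void configure(Job job) {
        // TOTAL_LAUNCHED_REDUCES should equal this number in the
        // absence of failed, killed, or speculative attempts.
        job.setNumReduceTasks(4);
    }
}
```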
NUM_FAILED_MAPS ("Failed map tasks")
Description: The number of map attempts that failed. The reason could be a runtime exception in the code or errors such as out-of-memory or I/O errors.
Expected value: Ideally zero.
NUM_FAILED_REDUCES ("Failed reduce tasks")
Description: The number of reduce attempts that failed. The reason could be a runtime exception in the code or errors such as out-of-memory or I/O errors.
Expected value: Ideally zero.
NUM_KILLED_MAPS ("Killed map tasks")
Description: The number of map attempts that were killed. The reason could be speculative execution (in which the slower task is killed) or the failure of the node a task was running on.
Expected value: Ideally zero.
Optimization: A high number indicates too many tasks getting killed by speculative execution or node failures. The instances of speculative execution can be reduced by either 1) increasing the number of nodes in the cluster, which distributes data and compute further and lowers the load on any particular node, or 2) removing or minimizing the number of nodes with scarce resources in the cluster.
NUM_KILLED_REDUCES ("Killed reduce tasks")
Description: The number of reduce attempts that were killed. The reason could be speculative execution (in which the slower task is killed) or the failure of the node a task was running on.
Expected value: Ideally zero.
Counter Group: Map-Reduce Framework Counters
(Source file in Hadoop code: TaskCounter.properties; CounterGroupName: Map-Reduce Framework)
SPLIT_RAW_BYTES ("Input split bytes")
Description: The total number of bytes of input-split objects read by the maps. These objects represent the split metadata (offset and length within a file) rather than the split data itself.
Expected value: No ideal value.
MAP_INPUT_BYTES ("Map input bytes")
Description: The number of bytes of uncompressed input read by all the maps in the job.
Expected value: Must equal the number of bytes in the input data.
MAP_INPUT_RECORDS ("Map input records")
Description: The number of input records consumed by all the maps in the job. Incremented for every record successfully read by the RecordReader and passed to the mapper's map() method; records that the map tasks failed to read are not counted.
Expected value: Must equal the number of records in the input data.
Optimization: If the value does not match, check the RecordReader logic.
MAP_SKIPPED_RECORDS ("Map skipped records")
Description: The number of input records skipped by all the maps in the job. Skipped records are bad/corrupt records that make map task attempts fail, for example due to runtime exceptions in third-party libraries used in the mapper. Skipping mode is turned on for a task only after it has failed twice. For a task consistently failing on a bad record, the tasktracker runs the subsequent attempts with these outcomes: 1) the task fails; 2) the task fails again; 3) skipping mode is enabled, and the task fails, but the failing record is stored by the tasktracker; 4) skipping mode is still enabled, and the task succeeds by skipping the bad record that failed in the previous attempt. Skipping mode can detect only one bad record per task attempt, so it is appropriate only for occasional bad records. To give it enough attempts to detect and skip all the bad records in an input split, increase "mapred.map.max.attempts". Bad records are saved as sequence files in the job's output directory under the _logs/skip subdirectory. A sketch of the configuration follows.
Expected value: Ideally zero.
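Skipping mode belongs to the old org.apache.hadoop.mapred API; a hedged sketch, assuming that API is in use:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.SkipBadRecords;

public class SkippingMode {
    static void enable(Configuration conf) {
        // Start skipping after two failed attempts, as described above.
        SkipBadRecords.setAttemptsToStartSkipping(conf, 2);
        // Tolerate skipping at most one record around each bad record.
        SkipBadRecords.setMapperMaxSkipRecords(conf, 1);
        // Give skipping mode enough attempts to locate every bad record.
        conf.setInt("mapred.map.max.attempts", 10);
    }
}
```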
MAP_OUTPUT_RECORDS ("Map output records")
Description: The number of output records produced by all the maps in the job. Incremented for every record successfully written by the mappers; records that the map tasks failed to write are not counted.
Expected value: No ideal value, as it depends on the mapper logic.
MAP_OUTPUT_BYTES ("Map output bytes")
Description: The number of bytes of uncompressed output produced by all the maps in the job. Incremented every time collect() is called on a map's OutputCollector.
Expected value: No ideal value, as it depends on the mapper logic.
Optimization: Use compact serialization (Hadoop Writables or Avro) for the mapper output.
MAP_OUTPUT_MATERIALIZED_BYTES ("Map output materialized bytes")
Description: The number of bytes of map output actually written to disk. Visible and relevant mainly when map output compression is enabled, in which case it is the number of bytes of compressed output produced by all the maps in the job.
Expected value: No ideal value, as it depends on the mapper logic.
COMBINE_INPUT_RECORDS ("Combine input records")
Description: The number of records read by the optional combiner. Incremented every time a value is read from the combiner's iterator over values.
Expected value: Zero if no combiner is specified; otherwise it should equal MAP_OUTPUT_RECORDS.
Optimization: In case of an unexpected value, check the combiner logic; a sketch of registering a combiner follows.
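A minimal sketch of registering a combiner; reusing the reducer as the combiner is valid only when the reduce function is commutative and associative, as summation is:

```java
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.reduce.LongSumReducer;

public class WithCombiner {
    static void configure(Job job) {
        // The combiner pre-aggregates map output, raising
        // COMBINE_INPUT_RECORDS and shrinking the shuffle.
        job.setCombinerClass(LongSumReducer.class);
        job.setReducerClass(LongSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
    }
}
```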
 
COMBINE_OUTPUT_RECORDS ("Combine output records")
Description: The number of records written by the optional combiner. Incremented every time collect() is called on a combiner's OutputCollector.
Expected value: Zero if no combiner is specified; otherwise it should equal REDUCE_INPUT_RECORDS.
Optimization: In case of an unexpected value, check the combiner logic.
REDUCE_INPUT_RECORDS ("Reduce input records")
Description: How many records were successfully read by the reducers. Incremented every time a value is read from the reducer's iterator over values.
Expected value: Should equal MAP_OUTPUT_RECORDS when there is no combiner; otherwise it should equal COMBINE_OUTPUT_RECORDS.
REDUCE_SKIPPED_RECORDS ("Reduce skipped records")
Description: The number of input records skipped by all the reducers in the job. Skipped records are bad/corrupt records that make reduce task attempts fail, for example due to runtime exceptions in third-party libraries used in the reducer. Skipping mode works exactly as described for MAP_SKIPPED_RECORDS above; to give it enough attempts to detect and skip all the bad records, increase "mapred.reduce.max.attempts". Bad records are saved as sequence files in the job's output directory under the _logs/skip subdirectory.
Expected value: Ideally zero.
REDUCE_INPUT_GROUPS ("Reduce input groups")
Description: The total number of distinct key groups consumed by all the reducers in the job; incremented every time the reducer's reduce() method is called by the framework.
Expected value: Should equal the total number of distinct keys in the intermediate results from the mappers.
REDUCE_SKIPPED_GROUPS ("Reduce skipped groups")
Description: The number of distinct key groups skipped by all the reducers in the job.
Expected value: Should be zero.
REDUCE_OUTPUT_RECORDS ("Reduce output records")
Description: How many records were successfully written by all the reducers in the job. Incremented every time collect() is called on a reducer's OutputCollector.
Expected value: Ideally, the ratio REDUCE_OUTPUT_RECORDS / MAP_INPUT_RECORDS should be less than 1 and as close to 0 as possible.
Optimization: Even a ratio greater than 1 is not invalid; it only violates the basic MapReduce assumption that the reduce phase aggregates the data considerably.
REDUCE_SHUFFLE_BYTES ("Reduce shuffle bytes")
Description: How many bytes of map output were copied to the reducers by the shuffle.
Expected value: As low as possible. Higher numbers make the job slower, since the shuffle is the primary network consumer in a MapReduce job.
Optimization: Compress and serialize the mapper output (see the map-output compression sketch under FILE_BYTES_WRITTEN). Efficient serialization formats (Avro, Parquet, etc.) and compression codecs (LZO, Snappy, bzip2, etc.) help; the choice depends purely on the use case, e.g., Parquet is useful only if the read pattern is limited to some columns, and the serialization and compression overhead should be weighed against the network I/O saved.
SPILLED_RECORDS ("Spilled Records")
Description: How many records the map and reduce tasks wrote (spilled) to disk while processing.
Expected value: Equal to the sum of REDUCE_INPUT_RECORDS and MAP_OUTPUT_RECORDS, or, when a combiner is invoked, the sum of REDUCE_INPUT_RECORDS and COMBINE_OUTPUT_RECORDS. Beyond that it should be as low as possible, since spilling is costly disk I/O; the sort-buffer sketch below is the usual lever.
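A sketch of the spill-related knobs (Hadoop 2 property names; the values are illustrative, and the sort buffer is carved out of the task heap):

```java
import org.apache.hadoop.conf.Configuration;

public class SpillTuning {
    static void tune(Configuration conf) {
        // A larger in-memory sort buffer means fewer spills to disk.
        conf.setInt("mapreduce.task.io.sort.mb", 256);            // default 100
        // Fraction of the buffer filled before a background spill starts.
        conf.setFloat("mapreduce.map.sort.spill.percent", 0.80f); // default 0.80
        // How many spill segments are merged at once on disk.
        conf.setInt("mapreduce.task.io.sort.factor", 64);         // default 10
    }
}
```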
SHUFFLED_MAPS ("Shuffled Maps")
Description: The number of map output files transferred to reducers by the shuffle. In the shuffle phase, each mapper's output is partitioned and sorted into files, one for each reducer; when keys are not uniformly distributed across mappers, some mappers may not generate partitions for all reducers.
Expected value: Must be less than or equal to the product of the number of successful mappers and successful reducers; mathematically, (TOTAL_LAUNCHED_MAPS - NUM_KILLED_MAPS - NUM_FAILED_MAPS) x (TOTAL_LAUNCHED_REDUCES - NUM_KILLED_REDUCES - NUM_FAILED_REDUCES).
FAILED_SHUFFLE ("Failed Shuffles")
Description: The number of map output copy failures during the shuffle.
Expected value: Should be zero.
MERGED_MAP_OUTPUTS ("Merged Map outputs")
Description: The number of map outputs merged on the reduce side of the shuffle.
Expected value: Must equal SHUFFLED_MAPS.
CPU_MILLISECONDS ("CPU time spent (ms)")
Description: Total time spent by all tasks on the CPU, gathered from the /proc filesystem; indicates how much total CPU time was spent executing the map and reduce tasks of the job.
Expected value: As low as possible.
GC_TIME_MILLIS ("GC time elapsed (ms)")
Description: The total time, in milliseconds, spent doing garbage collection, as reported by GarbageCollectorMXBean.getCollectionTime(). Garbage collection in Java reclaims objects that are no longer referenced; if it is frequent and accounts for a lot of time, the job may be allocating unnecessary objects.
Expected value: As close to 0 as possible.
Optimization: If GC time amounts to a considerable proportion of task run time, the following steps can help: 1) add -verbose:gc -XX:+PrintGCDetails to "mapred.child.java.opts", then inspect the logs of a few tasks (mappers or reducers); if new objects (like "new Text" or "new IntWritable") are being created unnecessarily, inside a loop, or inside a function, optimize that code, since Hadoop Writable objects are mutable (see the sketch below); 2) if heavy object creation is a necessity, increase the heap allocated to the task-attempt JVM by adding an -Xmx parameter to "mapred.map.child.java.opts" for map tasks and "mapred.reduce.child.java.opts" for reduce tasks; for example, to allow a 4 GB heap for map tasks, set "-Xmx4096m" in "mapred.map.child.java.opts".
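A sketch of the Writable-reuse pattern mentioned above, allocating the output objects once per task instead of once per record:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Created once per task and mutated per record, instead of
    // "new Text(...)" and "new IntWritable(1)" inside the loop.
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            word.set(token);
            context.write(word, ONE);
        }
    }
}
```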
 
VIRTUAL_MEMORY_BYTES ("Virtual memory (bytes) snapshot")
Description: Total virtual memory (RAM + swap) consumed by all the tasks; shows how much physical memory plus swap space the tasks of a job consumed.
Expected value: (VIRTUAL_MEMORY_BYTES - PHYSICAL_MEMORY_BYTES) indicates how much swap space or paged memory the job used, and should be as close to 0 as possible.
Optimization: A high swap value means the memory committed to the tasks on a node far exceeds the memory actually available there, possibly because other system processes or applications consume memory; the result is paging, which shows up as a high swap value. To keep swap low, either 1) increase the number of nodes in the cluster, which distributes data and compute further and lowers the load on any particular node, or 2) decrease the map-task and reduce-task capacity of tasktrackers with scarce resources by lowering "mapred.tasktracker.map.tasks.maximum" and "mapred.tasktracker.reduce.tasks.maximum" in that tasktracker's mapred-site.xml.
PHYSICAL_MEMORY_BYTES ("Physical memory (bytes) snapshot")
Description: Total physical memory consumed by all the tasks, gathered from the /proc filesystem; indicates how much RAM (not including swap space) the tasks consumed.
COMMITTED_HEAP_BYTES ("Total committed heap usage (bytes)")
Description: The total amount of memory available to the JVM, in bytes, as reported by Runtime.getRuntime().totalMemory().
Expected value: Should be close to the total task requirement.
References:
- https://www.mapr.com/blog/managing-monitoring-and-testing-mapreduce-jobs-how-work-counters
- https://www.mapr.com/blog/how-to-avoid-java-heap-space-errors-understanding-and-managing-task-attempt-memory
- http://stackoverflow.com/questions/18264735/virtual-core-vs-physical-core
- http://stackoverflow.com/questions/26522967/how-to-set-the-vcores-in-hadoop-mapreduce-yarn
- http://jason4zhu.blogspot.in/2014/10/vcore-configuration-in-hadoop.html