Hadoop Source Code Analysis: the MapReduce Part
2009-02-21
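Everyone is familiar with file systems, so the HDFS analysis did not spend much time on HDFS background. Before analyzing the MapReduce part of Hadoop, however, it helps to understand how the system works. The figure referred to below comes from http://horicky.blogspot.com/2008/11/hadoop-mapreduce-implementation.html, which illustrates the MapReduce flow.

Take the wordcount example that ships with Hadoop, started with:

hadoop jar hadoop-0.19.0-examples.jar wordcount /usr/input /usr/output

After the user submits the job, the JobTracker coordinates it: the Map phase runs first (M1, M2 and M3 in the figure), followed by the Reduce phase (R1 and R2). Map and Reduce tasks are monitored by the TaskTracker and run in a Java virtual machine separate from the TaskTracker's.

Input and output are both directories on HDFS. The input is described by the InputFormat interface, with implementations for ASCII files, JDBC databases and so on, each handling its own kind of data source. An InputFormat yields InputSplit objects that partition the data (splite1 to splite5 in the figure) and a RecordReader that turns a split into <k,v> pairs. The map operation consumes these pairs and writes its results to the context via context.collect (ultimately OutputCollector.collect).

Once the Mapper output is collected, the Partitioner decides which output partition each pair is written to. A Combiner can be supplied for the Mapper: the <k,v> pairs are then not written out immediately but gathered into lists, one list per key, and once a certain number of pairs has been buffered the Combiner merges them before they reach the Partitioner (in the figure, part of M1 corresponds to the Combiner and Partitioner).

After the Map phase comes the Reduce phase, which has 3 steps: shuffle, sort and reduce. In the shuffle step the Hadoop MapReduce framework uses the keys in the Map output to send related results to one Reducer. Intermediate results for the same key produced by several Mappers sit on different machines; after this step they are all on the machine of the Reducer responsible for that key. The transfer uses HTTP. Sorting runs together with the shuffle and merges <key,value> pairs that have the same key but come from different Mappers. In the reduce step, each <key,(list of values)> produced by shuffle and sort is passed to Reducer.reduce, and the result is written to the DFS through the OutputFormat.
2009-02-25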
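Hadoop source code analysis (package org.apache.hadoop.mapreduce)
With the overview above in hand, we turn to the concrete interfaces, all of which live in the package org.apache.hadoop.mapreduce.
The classes in this package fall into 4 groups. The first group derives from Writable: the classes related to Counter (together with CounterGroup and Counters, which are in the same package) and to IDs; they hold the counters and identifiers needed during a MapReduce run. The second group is the *Context classes, which give Mapper and Reducer their context. The third group is Mapper and Reducer themselves, plus Job, which describes them (in Hadoop one computation is called a job, and its units of work are tasks). The remaining classes are helpers that work alongside Mapper and Reducer. If you know HTTPServlet, the structure is easy to grasp: think of Hadoop as the container, Mapper and Reducer as components inside it, and *Context as both the holder of component configuration and the channel to the container.
The ID classes are not discussed further here. JobContext sits at the top of the *Context inheritance tree and gives read-only access to information about a Job, such as its JobID. The settings below are the key customization points of a MapReduce job (table taken from http://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop2/index.html):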
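InputFormat: splits the input data set into InputSplits, each handled by one Mapper, and provides a RecordReader that parses an InputSplit into <key,value> pairs for map. Default: TextInputFormat (splits text files into InputSplits and uses LineRecordReader to parse each InputSplit into <key,value> pairs, where key is the position of a line in the file and value is the line itself). Other implementations: SequenceFileInputFormat.
OutputFormat: provides the RecordWriter that writes the final results. Default: TextOutputFormat (uses LineRecordWriter to write one <key,value> per line, key and value separated by a tab). Other implementations: SequenceFileOutputFormat.
OutputKeyClass: type of the key in the final output. Default: LongWritable.
OutputValueClass: type of the value in the final output. Default: Text.
MapperClass: the Mapper class implementing the map function. Default: IdentityMapper (passes the input <key,value> through unchanged). Other implementations: LongSumReducer, LogRegexMapper, InverseMapper.
CombinerClass: implements the combine function, merging duplicate keys in the intermediate results. Default: null (duplicate keys are not merged).
ReducerClass: the Reducer class implementing the reduce function. Default: IdentityReducer (writes the intermediate results out as the final results). Other implementations: AccumulatingReducer, LongSumReducer.
InputPath: the input directory of the job; at run time the job processes all files under it. Default: null.
OutputPath: the output directory of the job, where the final results are written. Default: null.
MapOutputKeyClass: type of the key in the intermediate results produced by map. Default: OutputKeyClass.
MapOutputValueClass: type of the value in the intermediate results produced by map. Default: OutputValueClass.
OutputKeyComparator: the comparator used to sort result keys. Default: WritableComparable.
PartitionerClass: after the intermediate keys are sorted, this Partition function divides them into R parts, each handled by one Reducer. Default: HashPartitioner (partitions with a hash function). Other implementations: KeyFieldBasedPartitioner, PipesPartitioner.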
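To make the default partitioning rule concrete, here is a minimal stand-alone sketch of hash partitioning. It does not use the Hadoop classes; the class name HashPartitionSketch and the method partitionFor are hypothetical, and the formula (mask the sign bit of the hash, then take the remainder by the number of Reducers) is written from my understanding of HashPartitioner rather than copied from the source.

```java
/** Hypothetical stand-alone sketch of hash partitioning; not the Hadoop class itself. */
final class HashPartitionSketch {

    /**
     * Picks a Reducer index in [0, numReduceTasks) for a key.
     * Masking with Integer.MAX_VALUE clears the sign bit so the remainder is never negative.
     */
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // Equal keys always land on the same Reducer, which is what the shuffle relies on.
        for (String word : new String[] {"hadoop", "map", "reduce", "hadoop"}) {
            System.out.println(word + " -> R" + partitionFor(word, 2));
        }
    }
}
```

The only property the framework needs is that equal keys map to the same index, so any deterministic function of the key would do; hashing simply spreads keys fairly evenly.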
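Job extends JobContext and adds a series of set methods for the properties of a Job (Job updates the properties, JobContext reads them). Job also provides methods to monitor and control a running Job:
mapProgress: progress of the map phase (0 to 1.0);
reduceProgress: progress of the reduce phase (0 to 1.0);
isComplete: whether the job has finished;
isSuccessful: whether the job finished successfully;
killJob: kills the running job;
getTaskCompletionEvents: returns the task completion events (success/failure);
killTask: kills a single task.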
2009-02-25
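Hadoop source code analysis (package mapreduce.lib.input)
Next we follow the data as it flows through MapReduce and go through org.apache.hadoop.mapreduce.lib.*, introducing the base classes along the way. We start with input, which implements the data input side of MapReduce.
InputFormat describes the input of a MapReduce Job. Through InputFormat, Hadoop MapReduce can:
validate the input of the job;
split the input into logical InputSplits, which are assigned to Mappers;
provide a RecordReader, which a Mapper uses to read <K,V> pairs from its InputSplit.
In org.apache.hadoop.mapreduce.lib.input Hadoop provides FileInputFormat, an abstract base class for all file-based InputFormats. The following parameters configure FileInputFormat: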
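mapred.input.pathFilter.class: the input path filter; only files that pass it are handed to the InputFormat;
mapred.min.split.size: the minimum split size;
mapred.max.split.size: the maximum split size;
mapred.input.dir: the input paths, separated by commas.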
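The important methods of the class are:
protected List<FileStatus> listStatus(Configuration job): collects all files (with their status information) under the input directories. The job argument is the Configuration of the run and holds the parameters listed above.
public List<InputSplit> getSplits(JobContext context): divides the input into InputSplits. It contains two loops: the outer one goes over all files, and for each file the inner one produces the splits of that file according to the configured maximum/minimum split size. Note that a split never spans files.
FileInputFormat does not implement the createRecordReader method of InputFormat.
FileInputFormat has two subclasses. SequenceFileInputFormat handles SequenceFile, Hadoop's binary key/value file format with its own file layout (see http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/io/SequenceFile.html). It overrides listStatus, and its createRecordReader returns a SequenceFileRecordReader. TextInputFormat handles text files, and its createRecordReader returns a LineRecordReader. Neither class overrides FileInputFormat's getSplits, so their RecordReaders must take into account how FileInputFormat splits the input.
The getSplits method of FileInputFormat returns FileSplit objects. FileSplit is a very simple class whose attributes (file name, start offset, length of the split and the hosts that may hold the data) describe it completely.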
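The split arithmetic described above can be sketched without any Hadoop dependency. The following is a hedged illustration, not the FileInputFormat code: the class FileSplitSketch and its method names are hypothetical, the block size is passed in as a plain number, and the 1.1 slack factor (which lets the last split be up to 10% larger instead of producing a tiny tail) is my recollection of the Hadoop source and should be treated as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of per-file split computation; not the Hadoop implementation. */
final class FileSplitSketch {

    /** Assumed slack factor: the final split may be up to 10% larger than splitSize. */
    static final double SPLIT_SLOP = 1.1;

    /** Clamps the block size between the configured minimum and maximum split size. */
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Returns {startOffset, length} pairs covering one file; splits never span files. */
    static List<long[]> splitsFor(long fileLength, long splitSize) {
        List<long[]> splits = new ArrayList<>();
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(new long[] {fileLength - remaining, splitSize});
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits.add(new long[] {fileLength - remaining, remaining});
        }
        return splits;
    }

    public static void main(String[] args) {
        long splitSize = computeSplitSize(100, 1, Long.MAX_VALUE);
        for (long[] s : splitsFor(250, splitSize)) {
            System.out.println("offset=" + s[0] + " length=" + s[1]);
        }
    }
}
```

Running the inner loop once per input file gives the two-loop structure of getSplits: the outer loop walks the files, the inner loop cuts each file into splits.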
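RecordReader reads <Key,Value> pairs from a split. It has five abstract methods:
initialize: initialization; the arguments are the InputSplit this reader works on and the context of the Job;
nextKey: returns the next Key of the input, or null if the split has no more records;
nextValue: returns the Value that belongs to the Key; it must be called after nextKey;
getProgress: returns the current progress;
close: from the Closeable interface in java.io; cleans up the RecordReader.
We use LineRecordReader as an example of how a RecordReader is built. As described above, FileInputFormat produces a Split that consists of a file name, a start offset and a length. Because the file is a text file, LineRecordReader's initialize creates a line-based reader, LineReader (defined in org.apache.hadoop.util and not analyzed here), and then skips the beginning of the input. The skip happens only when the start offset of the Split is not 0, because in that case the first bytes may be the tail of the last line of the previous Split. nextKey is simple: it uses the current offset as the Key. nextValue is the line that starts at that offset. getProgress and close are straightforward.
2009-02-25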
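Hadoop source code analysis (package mapreduce.lib.map)
In the Hadoop MapReduce framework the Map action is abstracted by the Mapper class. Normally we implement our own Mapper and register it with the system; at run time the MapReduce framework calls our Mapper.
The Mapper class is simple: one inner class and four methods. The inner class Context extends MapContext and adds no new methods. The four methods are setup, map, cleanup and run. setup and cleanup manage resources over the lifetime of the Mapper: setup is called after the Mapper has been constructed and before the first map call, and cleanup is called after all map calls have finished. map processes one input key/value pair. run drives the whole sequence: it calls setup, iterates over all key/value pairs calling map, and finally calls cleanup.
org.apache.hadoop.mapreduce.lib.map implements three subclasses of Mapper: InverseMapper (maps the input <key,value> to <value,key>), MultithreadedMapper (runs map on several threads) and TokenCounterMapper (splits the input value into tokens and counts them). MultithreadedMapper is the most complex, so we use it as the example.
MultithreadedMapper starts several threads that execute the map method of another Mapper. The number of threads is given by mapred.map.multithreadedrunner.threads and the Mapper to run by mapred.map.multithreadedrunner.class. MultithreadedMapper overrides the run method of Mapper and starts N threads (class MapRunner), each of which calls run on an instance of the mapred.map.multithreadedrunner.class Mapper, including that Mapper's setup and cleanup. All threads read from the same InputSplit. MultithreadedMapper therefore defines SubMapRecordReader, SubMapRecordWriter and SubMapStatusReporter, which extend RecordReader, RecordWriter and StatusReporter. They delegate to the Mapper.Context of MultithreadedMapper itself and are used to build the Context handed to each inner Mapper.
2009-02-26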
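Hadoop source code analysis (package mapreduce.lib.partition/reduce/output)
The Map results are distributed to the Reducers by partition; after a Reducer has finished its Reduce work, the results are written out through an OutputFormat. This part looks at the classes involved.
The Mapper output may first go to a Combiner. The Combiner has no base class of its own and uses Reducer as its base class: the two do the same thing and differ only in where they are used and in the context they run in.
The <key,value> pairs finally produced by the Mapper are sent to Reducers for merging, and pairs with the same key must go to the same Reducer. Which key goes to which Reducer is decided by the Partitioner. It has a single method: the input is a Map result <key,value> and the number of Reducers, and the output is the (integer) index of the assigned Reducer. The default Partitioner is HashPartitioner, which takes the hash of the key modulo the number of Reducers.
Reducer is the base class of all user-defined Reducers. Like Mapper it has setup, reduce, cleanup and run; setup and cleanup mean the same as in Mapper. reduce is where Mapper results are actually merged: its input is a key, an iterator over all values for that key, and the Reducer context. The system defines two very simple Reducers, IntSumReducer and LongSumReducer, which sum int and long values respectively.
The Reduce results are written out through Reducer.Context. As on the input side, Hadoop introduces OutputFormat for this. OutputFormat relies on two helper types, RecordWriter and OutputCommitter. RecordWriter provides write, which outputs a <key,value>, and close, which closes the output. OutputCommitter provides a set of methods that let the user hook special actions into certain stages of the output lifecycle (TaskInputOutputContext is the bridge between OutputFormat and Reducer).
OutputFormat and RecordWriter mirror InputFormat and RecordReader. The system provides NullOutputFormat (outputs nothing, through NullOutputFormat.RecordWriter), LazyOutputFormat and FilterOutputFormat (neither analyzed here), and the FileOutputFormat-based SequenceFileOutputFormat and TextOutputFormat.
The file-based FileOutputFormat uses these configuration entries:
mapred.output.compress: whether to compress the output;
mapred.output.compression.codec: the compression codec;
mapred.output.dir: the output path;
mapred.work.output.dir: the working output path.
FileOutputFormat also depends on FileOutputCommitter, which manages temporary files for Job and Task. For example, FileOutputCommitter's setupJob creates a temporary directory named _temporary under the output path, and cleanupJob deletes it.
SequenceFileOutputFormat and TextOutputFormat are the output counterparts of SequenceFileInputFormat and TextInputFormat and are not analyzed in detail.
2009-03-06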
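Hadoop source code analysis (the MapReduce interfaces in package hadoop.mapred)
The analysis of org.apache.hadoop.mapreduce is complete. That package is the application API of Hadoop MapReduce, with which users implement their own MapReduce applications. It is meant for future MapReduce applications, though; the MapReduce framework itself still uses the old system (see HADOOP-1230). So we now turn to org.apache.hadoop.mapred, starting with the MapReduce framework classes in mapred, some of which are already marked @Deprecated.
Compared with the mapreduce package, the MapReduce API in org.apache.hadoop.mapred is simpler, mainly because the Context classes are missing. Much of what mapreduce does through a context must therefore be passed as parameters. Take the output of Map. In the old version:
output.collect(key, result); // output's type is: OutputCollector
In the new version:
context.write(key, result); // context's type is: Context
They use OutputCollector and Mapper.Context respectively to emit the map result, so OutputCollector is no longer needed in the new API. Overall the old API is simple and has all the key objects of the MapReduce process, but it is not very extensible. The old version also ships many helper classes, including a counterpart of the FileOutputFormat analyzed earlier; these are not discussed again.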
2009-03-10
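Hadoop source code analysis (the *ID classes and the *Context classes)
We now look at how Hadoop MapReduce works internally. A user submits a Job to Hadoop, and the Job runs under the control of the JobTracker. The Job is broken into Tasks, which are distributed over the cluster and run under the control of TaskTrackers. A Task is either a MapTask or a ReduceTask; this is where the Map and Reduce operations of MapReduce execute. The division of labor resembles that between NameNode and DataNode in HDFS: the JobTracker corresponds to the NameNode and the TaskTracker to the DataNode. JobTracker, TaskTracker and the MapReduce client communicate by RPC, as described in the HDFS part. We start with the helper classes, first those related to IDs.
The ID classes show some of the problems caused by the migration from org.apache.hadoop.mapred to org.apache.hadoop.mapreduce: the mapred versions are marked @Deprecated. An ID carries an integer and implements WritableComparable, so it can be compared and can be serialized/deserialized by the Hadoop io mechanism (it must implement compareTo/readFields/write).
JobID is the unique identifier the system assigns to a job. Its toString has the form job_<jobtrackerID>_<jobNumber>. For example, job_200707121733_0003 is job number 3 of jobtracker 200707121733 (the start time of the jobtracker serves as its ID).
A job is executed as tasks. A TaskID contains the ID of the job it belongs to, its own task number, and whether it is a Map task (member isMap). Its string form is task_<jobtrackerID>_<jobNumber>_[m|r]_<taskNumber>; task_200707121733_0003_m_000005 is task 000005 of job 200707121733_0003, and it is a Map task.
A task may be executed more than once (error recovery, eliminating stragglers and so on), so the executions must be told apart. This is done by TaskAttemptID, which appends an attempt number to the task ID. For example, attempt_200707121733_0003_m_000005_0 is attempt 0 of task_200707121733_0003_m_000005.
JVMId is used to manage the Java virtual machines in which tasks run. For Job and Task, Hadoop also defines context classes, described next.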
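The three string formats above are easy to reproduce. The sketch below is a stand-alone illustration: the class HadoopIdSketch and its methods are hypothetical and are not the Hadoop JobID/TaskID/TaskAttemptID classes. The zero-padding widths (4 digits for the job number, 6 for the task number) are inferred from the examples in the text.

```java
import java.util.Locale;

/** Hypothetical formatter reproducing the ID strings quoted in the text. */
final class HadoopIdSketch {

    static String jobId(String jobtrackerId, int jobNumber) {
        return String.format(Locale.ROOT, "job_%s_%04d", jobtrackerId, jobNumber);
    }

    static String taskId(String jobtrackerId, int jobNumber, boolean isMap, int taskNumber) {
        return String.format(Locale.ROOT, "task_%s_%04d_%s_%06d",
                jobtrackerId, jobNumber, isMap ? "m" : "r", taskNumber);
    }

    static String attemptId(String jobtrackerId, int jobNumber, boolean isMap,
                            int taskNumber, int attempt) {
        return String.format(Locale.ROOT, "attempt_%s_%04d_%s_%06d_%d",
                jobtrackerId, jobNumber, isMap ? "m" : "r", taskNumber, attempt);
    }

    public static void main(String[] args) {
        System.out.println(jobId("200707121733", 3));
        System.out.println(taskId("200707121733", 3, true, 5));
        System.out.println(attemptId("200707121733", 3, true, 5, 0));
    }
}
```

Each ID embeds the one above it, which is why a task attempt string alone is enough to recover the task, the job and the jobtracker it belongs to.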
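org.apache.hadoop.mapreduce.JobContext holds the JobID and the JobConf of a Job. Through the JobConf, JobContext exposes the classes configured for the job:
mapreduce.inputformat.class: the InputFormat implementation;
mapreduce.map.class: the Mapper implementation;
mapreduce.combine.class: the Reducer implementation used as combiner;
mapreduce.reduce.class: the Reducer implementation;
mapreduce.outputformat.class: the OutputFormat implementation;
mapreduce.partitioner.class: the Partitioner implementation.
Its accessor methods turn these property values into the corresponding Class objects using Java's Class.forName. The JobContext in org.apache.hadoop.mapred builds on org.apache.hadoop.mapreduce.JobContext and adds a progress member. The JobConf held by mapreduce.JobContext extends Configuration and carries the configuration a MapReduce run needs. Note that the property names differ between the two APIs: where mapreduce uses mapreduce.map.class, the old API uses mapred.mapper.class.
Besides Job, org.apache.hadoop.mapreduce.JobContext is extended by TaskAttemptContext, which adds the TaskAttemptID and a status. As with JobContext, the TaskAttemptContext in org.apache.hadoop.mapred adds progress to the mapreduce version. TaskInputOutputContext and the classes below it belong to org.apache.hadoop.mapreduce, which was covered earlier.
2009-03-10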
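Hadoop source code analysis (the status classes in package hadoop.mapred)
TaskStatus describes the state of a Task. A Task is either a MapTask or a ReduceTask, corresponding to the Map and Reduce sides of MapReduce.
Beyond the IDs analyzed above (TaskID, JobID), the state of a Task is captured by TaskStatus. Over the Map and Reduce sides together, a task can be in one of 6 phases, given by TaskStatus.Phase:
STARTING: the task is starting;
MAP: the Map phase;
SHUFFLE: the shuffle phase;
SORT: the sort phase;
REDUCE: the Reduce phase;
CLEANUP: the cleanup phase.
TaskStatus also records the run state of the task, which is what the JobTracker tracks. The states include COMMIT_PENDING, SUCCEEDED and FAILED.
The members of TaskStatus are: taskid, progress, runState, diagnosticInfo, stateString, taskTracker (the taskTracker on which the task runs), startTime, finishTime, outputSize, phase, counters, includeCounters (whether counters are included) and nextRecordRange.
TaskStatus has two subclasses, MapTaskStatus and ReduceTaskStatus. ReduceTaskStatus adds the Reduce-specific members shuffleFinishTime (when the shuffle finished), sortFinishTime (when the sort finished) and failedFetchTasks (the TaskAttemptIDs of the Map tasks whose output could not be fetched).
Counters, Counters.Counter and Counters.Group implement the counters of MapReduce: a Counters.Counter is a single counter and a Counters.Group is a group of Counters.Counter.
2009-05-24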
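Hadoop source code analysis (IFile)
The output of a Mapper is stored on the local file system before it is sent to the Reducers, and IFile manages this Mapper output. The Mapper output consists of <Key,Value> pairs, which IFile stores as records of the form <key-len, value-len, key, value>. Storing key-len and value-len is the natural way to keep the boundaries of each pair.
Among the classes related to IFile, stream-style input and output are abstracted by IFileInputStream and IFileOutputStream. Record-style read/write is provided by IFile.Reader/IFile.Writer. IFile.InMemoryReader reads data in IFile format that is held in memory.
We analyze the output side. First comes Serialization/Deserializer, whose code is in the package org.apache.hadoop.io.serializer. Serialization is abstracted by Serializer: with a Serializer implementation, serialize writes an object to the output stream opened with open. Deserializer provides the reverse process through deserialize. hadoop.io.serializer also contains Serialization and its factory SerializationFactory. The two concrete implementations are WritableSerialization and JavaSerialization, for Writable serialization and Java's built-in serialization respectively.
With Serializer/Deserializer covered, we turn to IFile.Writer. The constructor of Writer is:
public Writer(Configuration conf, FSDataOutputStream out, Class<K> keyClass, Class<V> valueClass, CompressionCodec codec, Counters.Counter writesCounter)
conf is the configuration, out is the output of the Writer, keyClass and valueClass are the classes of the Key and Value being written, codec is the compression codec applied to the output, and writesCounter is the Counters.Counter that tracks the writes. From these the Writer builds the output stream with compression support (member out; member rawOut keeps the out that was passed in), the counter, and the Serializers for Key and Value.
The main method of Writer is append (not write), which comes in two forms:
public void append(K key, V value) throws IOException
public void append(DataInputBuffer key, DataInputBuffer value)
append(K key, V value) checks its arguments, serializes key and value into a DataOutputBuffer and obtains their serialized lengths, then writes the 2 lengths followed by the contents of the DataOutputBuffer to the output, and finally resets the DataOutputBuffer and updates the counts. append(DataInputBuffer key, DataInputBuffer value) works similarly.
close has to mark the end of the file or stream. This is done by writing 2 lengths with the value EOF_MARKER.
IFileOutputStream is the output stream that works with Writer; it appends checksum data at the end of the IFile. When Writer calls write on IFileOutputStream, the stream computes and keeps the checksum; when the stream is closed, the checksum is written at the end of the file, which protects the integrity of the data.
The Reader side is not analyzed here.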
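The record layout <key-len, value-len, key, value> and the two-length end marker can be illustrated with plain java.io streams. This is a simplified sketch under stated assumptions, not IFile itself: the class IFileLayoutSketch is hypothetical, lengths are written as fixed 4-byte ints for readability (my understanding is that the real IFile uses variable-length integers), -1 is assumed as the EOF_MARKER value, and compression and checksums are left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a length-prefixed record file with an end marker. */
final class IFileLayoutSketch {

    /** Assumed marker value; written twice, in place of key-len and value-len. */
    static final int EOF_MARKER = -1;

    /** Writes each {key, value} pair as key-len, value-len, key bytes, value bytes. */
    static byte[] write(String[][] records) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String[] kv : records) {
                byte[] key = kv[0].getBytes(StandardCharsets.UTF_8);
                byte[] value = kv[1].getBytes(StandardCharsets.UTF_8);
                out.writeInt(key.length);
                out.writeInt(value.length);
                out.write(key);
                out.write(value);
            }
            // close(): mark the end of the stream with two EOF_MARKER lengths.
            out.writeInt(EOF_MARKER);
            out.writeInt(EOF_MARKER);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads records back as "key=value" strings until the end marker is seen. */
    static List<String> read(byte[] data) {
        List<String> records = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (true) {
                int keyLen = in.readInt();
                int valueLen = in.readInt();
                if (keyLen == EOF_MARKER && valueLen == EOF_MARKER) {
                    break;
                }
                byte[] key = new byte[keyLen];
                in.readFully(key);
                byte[] value = new byte[valueLen];
                in.readFully(value);
                records.add(new String(key, StandardCharsets.UTF_8) + "="
                        + new String(value, StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = write(new String[][] {{"hadoop", "1"}, {"map", "2"}});
        System.out.println(read(data));
    }
}
```

Because no real record can have a negative length, the marker is unambiguous and the reader needs no separate record count.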
2009-05-24
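Hadoop source code analysis (inner classes and helper classes of Task)
Task has many inner classes and a large number of member variables; these classes work together with Task to get its job done.
MapOutputFile manages the output files of a Mapper. It provides a series of get methods that return the various files a Mapper needs, all of which are kept under one directory.
Suppose the JobID passed to MapOutputFile is job_200707121733_0003 and the TaskID is task_200707121733_0003_m_000005. The root of MapOutputFile is
{mapred.local.dir}/taskTracker/jobcache/{jobid}/{taskid}/output
which we write as {MapOutputFileRoot} below. With the JobID and TaskID above this gives
{mapred.local.dir}/taskTracker/jobcache/job_200707121733_0003/task_200707121733_0003_m_000005/output
Note that {mapred.local.dir} may contain a list of paths; Hadoop then looks for a suitable directory under these roots and creates the files there. The methods of MapOutputFile come in two flavors, with and without the suffix ForWrite. The ForWrite versions create a file and take the file size as a parameter so that disk space can be checked. The versions without ForWrite return a file that already exists.
getOutputFile: {MapOutputFileRoot}/file.out
getOutputIndexFile: {MapOutputFileRoot}/file.out.index
getSpillFile: {MapOutputFileRoot}/spill{spillNumber}.out
getSpillIndexFile: {MapOutputFileRoot}/spill{spillNumber}.out.index
The four methods above are used by the Task subclass MapTask.
getInputFile: {MapOutputFileRoot}/map_{mapId}.out
This one is used by ReduceTask.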
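The naming scheme above is just string composition, which the following sketch spells out. It is a hypothetical stand-in for MapOutputFile, not the Hadoop class: the class MapOutputPathsSketch and its method names are invented for illustration, a single local directory is assumed (the real {mapred.local.dir} may list several), and no files are created or checked.

```java
/** Hypothetical path builder mirroring the file names listed in the text. */
final class MapOutputPathsSketch {

    private final String root;

    /** localDir stands in for one entry of {mapred.local.dir}. */
    MapOutputPathsSketch(String localDir, String jobId, String taskId) {
        this.root = localDir + "/taskTracker/jobcache/" + jobId + "/" + taskId + "/output";
    }

    String root() { return root; }

    // Files used on the MapTask side.
    String outputFile() { return root + "/file.out"; }
    String outputIndexFile() { return root + "/file.out.index"; }
    String spillFile(int spillNumber) { return root + "/spill" + spillNumber + ".out"; }
    String spillIndexFile(int spillNumber) { return root + "/spill" + spillNumber + ".out.index"; }

    // File used on the ReduceTask side.
    String inputFile(int mapId) { return root + "/map_" + mapId + ".out"; }

    public static void main(String[] args) {
        // "/tmp/mapred/local" is an assumed example value for the local directory.
        MapOutputPathsSketch files = new MapOutputPathsSketch(
                "/tmp/mapred/local", "job_200707121733_0003", "task_200707121733_0003_m_000005");
        System.out.println(files.outputFile());
        System.out.println(files.spillFile(0));
        System.out.println(files.inputFile(5));
    }
}
```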
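Task.CombineOutputCollector extends org.apache.hadoop.mapred.OutputCollector. It is very simple: an Adapter from OutputCollector to IFile.Writer, with IFile.Writer doing all the work.
ValuesIterator produces, from a RawKeyValueIterator (whose Key and Value are DataInputBuffers, and which ValuesIterator requires to be sorted), an iterator over the values that match according to the RawComparator comparator. It has one simple subclass in Task, CombineValuesIterator.
Task.TaskReporter submits counter reports and status reports on behalf of the task, so that they reach the JobTracker. It implements both Reporter and StatusReporter. So as not to disturb the main thread, TaskReporter runs its own thread, which reports the progress of the Task over the TaskUmbilicalProtocol interface using Hadoop RPC.
FileSystemStatisticUpdater records the number of bytes read from/written to the file system; it is a simple utility class.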
2009-05-25
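Hadoop source code analysis (Task)
We continue with Task itself. Task is an abstract class with two subclasses, MapTask and ReduceTask, which carry out the Map and Reduce work.
Its member variables fall into several groups:
jobFile, taskId, partition, the JobID, taskStatus and the flags jobCleanup, jobSetup and taskCleanup: the identity and state of the task, and whether it is a job setup, job cleanup or task cleanup task;
skipRanges, skipping and writeSkipRecs: Task's support for skipping records;
currentRecStartIndex and currentRecIndexIterator: bookkeeping for the record currently being processed;
conf (the JobConf), MapOutputFile, lDirAlloc (a LocalDirAllocator), jobContext and taskContext (the contexts of the Job and the Task) and committer (the output committer of the Task);
outputFormat: the output format;
spilledRecordsCounter, taskProgress and counters: counters and progress.
Task has 3 abstract methods:
public abstract void run(JobConf job, TaskUmbilicalProtocol umbilical) throws IOException, ClassNotFoundException, InterruptedException;
runs the Task.
public abstract TaskRunner createRunner(TaskTracker tracker, TaskTracker.TaskInProgress tip) throws IOException;
creates the TaskRunner for the Task.
public abstract boolean isMapTask();
tells whether this is a Map task. All 3 are implemented by MapTask and ReduceTask.
initialize initializes the Task. Besides setters and getters, Task implements Writable through write and readFields.
localizeConfiguration writes task-specific settings into the JobConf of the Task so that the rest of Hadoop MapReduce can read them.
A Task finishes through:
public void done(TaskUmbilicalProtocol umbilical, TaskReporter reporter)
The steps of done are:
call updateCounters();
if the Task needs a commit, set its state to COMMIT_PENDING, report this over TaskUmbilicalProtocol, and wait until the Task may commit;
stop the Reporter;
call sendLastUpdate;
tell the other side of TaskUmbilicalProtocol that the task is finished, through sendDone.
The commit inside done works as follows: the Task asks the TaskTracker whether it may commit and waits until the TaskTracker agrees; the commit itself is performed by commitTask of org.apache.hadoop.mapreduce.OutputCommitter.
runJobCleanupTask, runJobSetupTask and runTaskCleanupTask are called from the run methods of MapTask and ReduceTask.
runJobSetupTask sets up the Job: it calls setupJob of org.apache.hadoop.mapreduce.OutputCommitter and then done, which notifies the TaskTracker.
runJobCleanupTask cleans up the Job: it uses org.apache.hadoop.mapreduce.OutputCommitter for the cleanup and then calls done, which notifies the TaskTracker.
runTaskCleanupTask cleans up a Task and is similar to runJobCleanupTask.
org.apache.hadoop.mapreduce.OutputCommitter was introduced with the output classes above.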
2009-05-29
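Hadoop source code analysis (MapTask)
Task has two subclasses, MapTask and ReduceTask. We start with MapTask.
MapTask has few member variables of its own. The input of a Map is a split, either an org.apache.hadoop.mapred.InputSplit or an org.apache.hadoop.mapreduce.InputSplit (the InputSplit of the new API). splitClass is the Java class name of the InputSplit, and split is a BytesWritable holding the serialized InputSplit; the actual InputSplit is restored by creating an instance of splitClass and calling its readFields.
The central method of MapTask is run. run creates the TaskReporter and handles the special cases runJobCleanupTask, runJobSetupTask and runTaskCleanupTask. Because there are two MapReduce APIs for Mapper, MapTask has to support both: it runs the Mapper through runNewMapper or runOldMapper. After run*Mapper returns, MapTask calls done.
runOldMapper handles Mappers written against the old API. It first reconstructs the InputSplit of the Task and then creates the RecordReader for the Mapper: the raw reader rawIn is wrapped in a TrackedRecordReader or a SkippingRecordReader, giving in.
Next comes the output side of the Map. The output of the Mapper goes to a MapOutputCollector. If the number of Reducers is 0 this is a DirectMapOutputCollector, otherwise a MapOutputBuffer.
The Mapper itself is executed through a MapRunnable. There are two implementations, MapRunner and MultithreadedMapRunner.
In the old API MapRunnable plays the role that Mapper.run plays in the new API. MapTask creates the runner as follows:
MapRunnable runner = ReflectionUtils.newInstance(job.getMapRunnerClass(), job);
newInstance calls configure on the MapRunner, and configure creates the Mapper.
The run method of MapRunner first creates the key and value objects and then, for every <key,value> pair in the InputSplit, calls the map method of the Mapper. The key and value objects are created once in run and reused for every call to map, so a Mapper that wants to keep a key or value beyond one map call has to clone it.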
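The run loop just described (one runner feeding every input pair to the map method and gathering what map emits) can be sketched in a few lines. The types below are hypothetical simplifications, not the Hadoop MapRunner or Mapper interfaces: keys are byte offsets computed from line lengths, values are lines, and the output collector is a plain list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a runner that drives a mapper over all input records. */
final class MapRunnerSketch {

    /** Simplified stand-in for a Mapper: receives one record and appends output. */
    interface LineMapper {
        void map(long offset, String line, List<String> output);
    }

    /** Calls mapper.map once per input line, in order, and returns everything emitted. */
    static List<String> run(List<String> lines, LineMapper mapper) {
        List<String> output = new ArrayList<>();
        long offset = 0;
        for (String line : lines) {
            mapper.map(offset, line, output);
            offset += line.length() + 1; // +1 for the assumed '\n' line terminator
        }
        return output;
    }

    /** Word-count style mapper: emits "word<TAB>1" for every token in the line. */
    static final LineMapper TOKENIZE = (offset, line, output) -> {
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                output.add(word + "\t1");
            }
        }
    };

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("hello hadoop", "hello map"), TOKENIZE));
    }
}
```

Keeping the loop in the runner rather than in the mapper is what lets a different runner, such as a multithreaded one, change how map is invoked without touching the mapper code.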
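As in the new API, the old API has a multithreaded variant: MultithreadedMapRunner uses several Java threads to call the map method of the Mapper.
runNewMapper handles Mappers written against the new API and is similar to runOldMapper.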
2009-06-03
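Hadoop source code analysis (MapTask helper classes, I)
The helper classes of MapTask deal with the input and the output of the Mapper. We start with the Mapper input used in MapTask.
MapTask.TrackedRecordReader is a Wrapper that adds the collection and reporting of statistics to an existing RecordReader.
MapTask.SkippingRecordReader is also a Wrapper; on top of MapTask.TrackedRecordReader it adds the ability to skip part of the input. Before analyzing MapTask.SkippingRecordReader we look at SortedRanges and its related classes.
SortedRanges.Range represents a range by its start position and its length (so a range of length 0 can be expressed) and offers a set of range operations. Note that the right end returned by getEndIndex is not part of the range. SortedRanges holds a list of ranges sorted by position and provides add (adds a range, merging where necessary) and remove (removes a range). SkipRangeIterator is the iterator over the indices covered by the ranges of a SortedRanges.
MapTask.SkippingRecordReader is simple to implement. The input to be skipped is held as ranges in a SortedRanges, so its next method only has to check whether the current record falls inside one of those ranges; if it does, the record is skipped and, if so configured, written to a file.
NewTrackingRecordReader and NewOutputCollector are used by the new API and are not analyzed here.
The output helper classes of MapTask all derive from MapOutputCollector, which adds close and flush to OutputCollector.
DirectMapOutputCollector is used when the number of Reducers is 0, that is, when no Reduce phase is needed. It obtains a RecordWriter directly with
out = job.getOutputFormat().getRecordWriter(fs, job, finalName, reporter);
and collect writes straight to that RecordWriter.
If the Mapper is followed by reduce tasks, the system uses MapOutputBuffer as the output. This is a fairly complex class of about 1k lines of code.
The Mapper emits the Map results through an OutputCollector, and the volume can be large. Hadoop collects the Mapper output in a circular buffer and, once the amount reaches io.sort.mb * percent, spills it to disk. The structure consists of two arrays and a buffer. kvindices holds, for each record, the (Reduce) partition it belongs to, the position where its key starts in the buffer and the position where its value starts; through kvindices a record can be located in the buffer. kvoffsets is used to sort the kvindices entries by partition when the buffer has to be spilled. The sorted result is written to local disk: the index is kept in spill{spill number}.out.index and the data in spill{spill number}.out.
When the Mapper task ends there may be several spill files. They are merge-sorted into a single Mapper output (spill.out and spill.out.index).
This output is sorted by partition, so the Mapper output is divided into segments and a Reducer fetches exactly one segment of spill.out. (Note that the index structures in memory and on disk are not the same.)
Thanks to the flow analysis of the Hadoop Map Stage at http://www.cnblogs.com/OnlyXP/archive/2009/05/25/1488811.html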
2009-06-04
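Hadoop source code analysis (MapTask helper classes, II)
With the in-memory and on-disk layout of the Mapper output covered, we can go through MapOutputBuffer in detail.
First the member variables. The job configuration job and the statistics object reporter are initialized first. From the configuration, MapOutputBuffer obtains the local file system (localFs and rfs), the number of Reducers and the Partitioner.
SpillRecord is the in-memory counterpart of the file spill{spill number}.out.index (the in-memory data and the file differ only by the checksum at the end). The file holds a series of IndexRecords.
IndexRecord has 3 fields: startOffset, the offset of the segment; rawLength, its original length; and partLength, its actual length (it may be compressed). SpillRecord holds a series of IndexRecords and provides methods to add a record (there is no delete, since none is needed), get a record, write the file and read the file (through a constructor).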
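A spill index of this shape, one fixed-size record of three long fields per partition, can be sketched with a ByteBuffer. The class SpillIndexSketch and its methods are hypothetical and only mirror the field names given above; file I/O and the trailing checksum are omitted.

```java
import java.nio.ByteBuffer;

/** Hypothetical in-memory spill index: one {startOffset, rawLength, partLength} per partition. */
final class SpillIndexSketch {

    /** Three 8-byte fields per partition. */
    static final int RECORD_BYTES = 3 * Long.BYTES;

    private final ByteBuffer buf;

    SpillIndexSketch(int numPartitions) {
        this.buf = ByteBuffer.allocate(numPartitions * RECORD_BYTES);
    }

    /** Stores the index record of one partition at its fixed position. */
    void put(int partition, long startOffset, long rawLength, long partLength) {
        int base = partition * RECORD_BYTES;
        buf.putLong(base, startOffset);
        buf.putLong(base + Long.BYTES, rawLength);
        buf.putLong(base + 2 * Long.BYTES, partLength);
    }

    /** Returns {startOffset, rawLength, partLength} for one partition. */
    long[] get(int partition) {
        int base = partition * RECORD_BYTES;
        return new long[] {
            buf.getLong(base),
            buf.getLong(base + Long.BYTES),
            buf.getLong(base + 2 * Long.BYTES)
        };
    }

    int sizeInBytes() {
        return buf.capacity();
    }

    public static void main(String[] args) {
        SpillIndexSketch index = new SpillIndexSketch(2);
        index.put(0, 0L, 120L, 60L);   // partition 0: 120 raw bytes stored as 60
        index.put(1, 60L, 80L, 40L);   // partition 1 starts right after partition 0
        long[] r = index.get(1);
        System.out.println("offset=" + r[0] + " raw=" + r[1] + " part=" + r[2]);
    }
}
```

Because every record has the same size, a Reducer that wants partition p can find its segment with a single seek to p * RECORD_BYTES in the index.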
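Next comes the setup of the output buffer kvbuffer, the record index kvindices, and kvoffsets, the working array used to sort kvindices.
This part depends on 3 configuration parameters. io.sort.mb is the total size of kvbuffer, kvindices and kvoffsets in M (the default is 100, that is 100M); this is where MapOutputBuffer uses most of its memory. io.sort.record.percent is the share of that space taken by kvindices and kvoffsets (the default is 0.05). As shown above, for N records kvindices and kvoffsets together occupy 4N*4 bytes. From this relation and the value of io.sort.record.percent we can compute the maximum number of records in kvindices and kvoffsets, and from that the number of bytes left for kvbuffer. io.sort.spill.percent is the fill level at which a spill starts: when either the output buffer or the record count in kvindices and kvoffsets reaches this share, the records in memory are written to disk. softBufferLimit and softRecordLimit are the corresponding thresholds.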
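The sizing arithmetic above can be checked with a short calculation. The sketch below is a hedged reconstruction rather than the MapOutputBuffer code: the class SortBufferSizingSketch is hypothetical, the constants encode the 4 ints (16 bytes) of accounting per record described in the text, and the spill fraction of 0.8 used in main is an assumed example value, since the text does not state a default.

```java
/** Hypothetical sketch of how the sort buffer could be divided up. */
final class SortBufferSizingSketch {

    /** Ints per record in kvindices: partition, key start, value start. */
    static final int ACCTSIZE = 3;
    /** Bytes of accounting per record: ACCTSIZE ints in kvindices plus 1 int in kvoffsets. */
    static final int RECSIZE = (ACCTSIZE + 1) * 4;

    final int kvbufferBytes;    // bytes available for serialized keys and values
    final int recordCapacity;   // maximum number of records tracked at once
    final int softBufferLimit;  // kvbuffer fill level (bytes) that triggers a spill
    final int softRecordLimit;  // record count that triggers a spill

    SortBufferSizingSketch(int sortMb, double recordPercent, double spillPercent) {
        int maxMemUsage = sortMb << 20;                       // megabytes to bytes
        int recordBytes = (int) (maxMemUsage * recordPercent);
        recordBytes -= recordBytes % RECSIZE;                 // whole records only
        this.kvbufferBytes = maxMemUsage - recordBytes;
        this.recordCapacity = recordBytes / RECSIZE;
        this.softBufferLimit = (int) (kvbufferBytes * spillPercent);
        this.softRecordLimit = (int) (recordCapacity * spillPercent);
    }

    public static void main(String[] args) {
        // 100 MB total, 5% for record accounting, spill at an assumed 80% fill level.
        SortBufferSizingSketch s = new SortBufferSizingSketch(100, 0.05, 0.8);
        System.out.println("kvbuffer bytes   = " + s.kvbufferBytes);
        System.out.println("record capacity  = " + s.recordCapacity);
        System.out.println("softBufferLimit  = " + s.softBufferLimit);
        System.out.println("softRecordLimit  = " + s.softRecordLimit);
    }
}
```

With these inputs roughly 95 MB holds data and 5 MB holds accounting for about 330 thousand records, so a job with many tiny records hits the record limit long before the byte limit.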
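<key,value> pairs are serialized into the buffer through a Serializer, and its initialization follows the setup of the output buffer. After that come the counters, the optional compression codec, the optional Combiner and the settings that control how the combiner works.
Finally spillThread is started. This Thread watches the in-memory output buffer and, when certain conditions are met, spills its contents to disk. It is a standard producer-consumer model: the collect method of MapTask is the producer and spillThread is the consumer. They synchronize through spillLock (a ReentrantLock) and two condition variables on spillLock, spillDone and spillReady.
The producer, MapOutputBuffer.collect, works as follows:
report progress and check the arguments (<K,V> must match the output types declared for the Mapper);
spillLock.lock(), entering the critical section;
if the spill condition is met, set the relevant variables and wake spillThread with spillReady.signal(), then wait for the spill to finish with spillDone.await();
spillLock.unlock();
write key and value to the buffer and update kvindices and kvoffsets (note that collect is synchronized; key and value are written one after the other and occupy consecutive space in the output buffer).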
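The lock-and-two-conditions handshake between collect and the spill thread can be sketched with java.util.concurrent. This is a deliberately simplified illustration, not the MapOutputBuffer logic: the class SpillHandshakeSketch is hypothetical, records are plain strings, a "spill" just moves a batch into a list instead of sorting and writing a file, and the producer hands over a whole batch rather than sharing a circular buffer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical producer-consumer sketch using one lock and two conditions. */
final class SpillHandshakeSketch {

    private final ReentrantLock spillLock = new ReentrantLock();
    private final Condition spillReady = spillLock.newCondition(); // producer -> spill thread
    private final Condition spillDone = spillLock.newCondition();  // spill thread -> producer
    private final List<List<String>> spills = new ArrayList<>();
    private final int capacity;
    private List<String> buffer = new ArrayList<>();
    private List<String> pending;   // batch waiting to be spilled, or null
    private boolean closed;
    private final Thread spillThread = new Thread(this::spillLoop, "spill-sketch");

    SpillHandshakeSketch(int capacity) {
        this.capacity = capacity;
        spillThread.setDaemon(true);
        spillThread.start();
    }

    /** Consumer: waits for a batch, "spills" it, then tells the producer it is done. */
    private void spillLoop() {
        spillLock.lock();
        try {
            while (true) {
                while (pending == null && !closed) {
                    spillReady.awaitUninterruptibly();
                }
                if (pending == null) {
                    return; // closed and nothing left to spill
                }
                spills.add(pending); // stand-in for sort + write to a spill file
                pending = null;
                spillDone.signalAll();
            }
        } finally {
            spillLock.unlock();
        }
    }

    /** Producer: buffers a record and hands the buffer over once it is full. */
    void collect(String record) {
        spillLock.lock();
        try {
            buffer.add(record);
            if (buffer.size() >= capacity) {
                handOff();
            }
        } finally {
            spillLock.unlock();
        }
    }

    /** Caller must hold spillLock. Blocks while a previous batch is still being spilled. */
    private void handOff() {
        while (pending != null) {
            spillDone.awaitUninterruptibly();
        }
        pending = buffer;
        buffer = new ArrayList<>();
        spillReady.signal();
    }

    /** Flushes the remaining records, stops the spill thread and returns all spills. */
    List<List<String>> close() {
        spillLock.lock();
        try {
            if (!buffer.isEmpty()) {
                handOff();
            }
            while (pending != null) {
                spillDone.awaitUninterruptibly();
            }
            closed = true;
            spillReady.signal();
        } finally {
            spillLock.unlock();
        }
        try {
            spillThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return spills;
    }

    public static void main(String[] args) {
        SpillHandshakeSketch out = new SpillHandshakeSketch(2);
        for (String r : new String[] {"a", "b", "c", "d", "e"}) {
            out.collect(r);
        }
        System.out.println(out.close());
    }
}
```

Both waits sit inside while loops that re-check the shared state under the lock, so a signal sent before the other thread starts waiting is never lost and spurious wake-ups are harmless.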
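The three variables kvstart, kvend and kvindex are central to deciding whether a spill is needed and whether a spill has finished. kvstart is the index of the first valid record and kvindex is the next free record position. kvend is special: normally kvstart==kvend, but when a spill starts kvend is set to the value of kvindex, and when the spill ends its value is assigned to kvstart, so that kvstart==kvend holds again. In other words, if kvstart is not equal to kvend the system is spilling; if kvstart==kvend it is in its normal working state. The code contains many checks of kvstart==kvend for this reason.
We now look at how kvstart, kvend and kvindex work together, case by case. At initialization all three are set to 0.
First, adding a record without a spill:
note the relation between kvindex and kvnext: kvnext is the position after kvindex, and taking it modulo the array length is what makes the buffer circular.
Second, adding a record when a spill occurs (several conditions can trigger it):
kvnext is computed first, and at this point kvend==kvstart. If the spill condition holds, the value of kvindex is assigned to kvend, so kvend is no longer equal to kvstart, and the records between kvstart and kvend are written to a file (the spill). When the spill has finished, kvstart is set to kvend.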