This article covers the question "which methods do you override in Hadoop?". If you have been unsure which methods a Hadoop program overrides, the notes below collect a simple, practical set of answers; follow along to work through them.
1. Download (omitted)
2. Build (omitted)
3. Configuration (pseudo-distributed and fully distributed setup omitted)
4. HDFS
1. Web interface: http://namenode-name:50070/ (shows the DataNode list and cluster statistics)
2. Shell commands and the DFSAdmin command (e.g., hdfs dfsadmin -report)
3. Checkpoint node and Backup node
1. How the fsimage and edits files are merged
2. (Likely a feature of earlier releases) manually recovering a crashed cluster: import checkpoint;
3. Backup Node: the Backup Node maintains an in-memory copy of the fsimage synchronized from the NameNode, and it also receives the stream of edits from the NameNode and persists them to disk. It merges those edits with its in-memory fsimage to create a backup of the metadata. This is the secret of the Backup Node's efficiency: it does not need to download fsimage and edits from the NameNode; it only has to persist its in-memory metadata to disk and merge.
4. Balancer: rebalances data that is unevenly distributed across racks and DataNodes
5. Rack awareness
6. Safemode: while data blocks are incomplete, or after safemode is entered manually (hdfs dfsadmin -safemode enter), HDFS is read-only; once the block checks reach the configured threshold, or safemode is left manually (hdfs dfsadmin -safemode leave), the cluster becomes readable and writable again.
7. fsck: file and block checking command (e.g., hdfs fsck /)
8. fetchdt: fetches a delegation token (security)
9. Recovery mode
10. Upgrade and rollback
11. File Permissions and Security
12. Scalability
5. MapReduce
1. WordCount example, overriding the map and reduce methods:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Each public class below goes in its own .java file.
public class MyMapper extends Mapper<Object, Text, Text, IntWritable> {
    private Text word = new Text();
    private IntWritable one = new IntWritable(1);

    // Override the map method
    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer stringTokenizer = new StringTokenizer(value.toString());
        while (stringTokenizer.hasMoreTokens()) {
            word.set(stringTokenizer.nextToken());
            // emit (word, 1)
            context.write(word, one);
        }
    }
}

public class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable(0);

    // Override the reduce method
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable i : values) {
            sum += i.get();
        }
        result.set(sum);
        // write the reduce output
        context.write(key, result);
    }
}

public class WordCountDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDemo.class);
        // set the map and reduce classes
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);
        job.setCombinerClass(MyReducer.class);
        // set the final output key/value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // set the FileInputFormat / FileOutputFormat paths
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
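To run the example, package the classes into a jar (here hypothetically named wordcount.jar) and submit it with: hadoop jar wordcount.jar WordCountDemo <input-dir> <output-dir>. Reusing MyReducer as the combiner is only safe because word count's reduce is commutative and associative.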
2. Job.setGroupingComparatorClass(Class).
3. Job.setCombinerClass(Class).
4. CompressionCodec
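A minimal sketch of where CompressionCodec plugs in, assuming the driver above and the Gzip codec: mapreduce.map.output.compress compresses the intermediate map output, while FileOutputFormat controls the final job output.
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Compress intermediate map output (cuts shuffle traffic).
job.getConfiguration().setBoolean("mapreduce.map.output.compress", true);
job.getConfiguration().setClass("mapreduce.map.output.compress.codec",
        GzipCodec.class, CompressionCodec.class);
// Compress the final job output.
FileOutputFormat.setCompressOutput(job, true);
FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);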
5. Number of maps: driven by the number of input splits, roughly dataSize/blockSize; Configuration.set(MRJobConfig.NUM_MAPS, int) provides only a hint.
6. Number of reducers: Job.setNumReduceTasks(int). Common heuristics are 0.95 or 1.75 × (number of nodes × maximum containers per node).
With 0.95, all of the reduces can launch immediately and start transferring map outputs as the maps finish. With 1.75, the faster nodes finish their first round of reduces and launch a second wave, giving much better load balancing. For example, a 10-node cluster with 8 containers per node would use 0.95 × 10 × 8 ≈ 76 reduces.
7. Reduce – shuffle: the input to the Reducer is the sorted output of the mappers. In this phase the framework fetches the relevant partition of every mapper's output via HTTP.
8. Reduce – sort: in this stage the framework groups Reducer inputs by key (since different mappers may have output the same key). The shuffle and sort phases occur simultaneously: map outputs are merged as they are fetched.
9. Reduce – reduce: the reduce(key, values, context) method is called once for each <key, (list of values)> pair in the grouped input; its output is written via Context.write and is not re-sorted.
10. Secondary sort
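A minimal sketch of the grouping half of a secondary sort, registered via Job.setGroupingComparatorClass from item 2 above. It assumes composite Text keys of the hypothetical form natural#secondary: keys still sort on the full string, but the reducer groups on the part before '#'.
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class NaturalKeyGroupingComparator extends WritableComparator {
    protected NaturalKeyGroupingComparator() {
        super(Text.class, true); // true: instantiate keys so compare() gets real objects
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        // Group only on the natural key, i.e. the part before '#'.
        String ka = a.toString().split("#", 2)[0];
        String kb = b.toString().split("#", 2)[0];
        return ka.compareTo(kb);
    }
}
// In the driver: job.setGroupingComparatorClass(NaturalKeyGroupingComparator.class);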
11. Partitioner
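A minimal custom Partitioner sketch using the WordCount types above; it mirrors the default HashPartitioner's logic. Any implementation must send equal keys to the same partition.
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class MyPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // Mask the sign bit so the result is non-negative, then bucket by hash.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
// In the driver: job.setPartitionerClass(MyPartitioner.class);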
12. Counter: Mapper and Reducer implementations can use a Counter to report statistics.
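For example, a sketch inside MyMapper's map method (the group and counter names are arbitrary):
// Count empty input lines alongside the normal output.
if (value.toString().trim().isEmpty()) {
    context.getCounter("WordCountStats", "EMPTY_LINES").increment(1);
}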
13. Job configuration – speculative execution (setMapSpeculativeExecution(boolean)/setReduceSpeculativeExecution(boolean)), maximum number of attempts per task (setMaxMapAttempts(int)/setMaxReduceAttempts(int)), etc.
Or use Configuration.set(String, String) / Configuration.get(String) directly.
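A small sketch combining both styles in a driver; the values and the custom key are illustrative only.
Configuration conf = new Configuration();
conf.set("my.app.param", "42");               // hypothetical custom parameter
Job job = Job.getInstance(conf, "conf demo");
job.setMapSpeculativeExecution(false);        // disable speculative map attempts
job.setMaxMapAttempts(4);                     // retry a failed map task up to 4 times
String v = job.getConfiguration().get("my.app.param"); // read it back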
14. Task executor environment – the user can pass additional options to the child JVM via the mapreduce.{map|reduce}.java.opts configuration parameters in the Job, e.g. non-standard paths for the run-time linker to search for shared libraries via -Djava.library.path=. If the mapreduce.{map|reduce}.java.opts parameter contains the symbol @taskid@, it is interpolated with the task id of the MapReduce task.
15. Memory management – users/admins can specify the maximum virtual memory of the launched child task, and of any sub-process it launches recursively, using mapreduce.{map|reduce}.memory.mb. Note that this is a per-process limit, specified in megabytes (MB), and the value must be greater than or equal to the -Xmx passed to the JVM, or the VM might not start.
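A hedged example of the constraint above (values illustrative): the container limit mapreduce.map.memory.mb must be at least the -Xmx heap passed via mapreduce.map.java.opts.
Configuration conf = new Configuration();
// Per-process container limit for each map task, in MB.
conf.set("mapreduce.map.memory.mb", "2048");
// JVM heap must fit inside that limit, so keep -Xmx below 2048 MB.
conf.set("mapreduce.map.java.opts", "-Xmx1638m");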
16. Map Parameters …… (http://hadoop.apache.org/docs/r2.6.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#MapReduce_Tutorial)
17. Parameters ()
18. Job submission and monitoring:
1. Job provides facilities to submit jobs, track their progress, access component-task reports and logs, get the MapReduce cluster's status information, and so on.
2. The job submission process involves:
1. Checking the input and output specifications of the job.
2. Computing the InputSplit values for the job.
3. Setting up the requisite accounting information for the DistributedCache of the job, if necessary.
4. Copying the job's jar and configuration to the MapReduce system directory on the FileSystem.
5. Submitting the job to the ResourceManager and optionally monitoring its status.
3. Job history
19. Job controller
1. Job.submit() || Job.waitForCompletion(boolean)
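A sketch of the two submission styles, assuming a configured job as in WordCountDemo:
// Fire-and-forget: returns right after submission.
job.submit();
while (!job.isComplete()) {   // poll until the job finishes
    Thread.sleep(5000);
}
// Or block until completion, printing progress to stdout:
boolean ok = job.waitForCompletion(true);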
2. Multiple MapReduce jobs
1. Iterative MapReduce (the previous MR job's output feeds the next MR job's input; drawbacks: the overhead of creating Job objects, plus heavy local-disk and network I/O).
2. MapReduce-JobControl: each job is wrapped together with its dependencies on other jobs, and a JobControl thread manages the state of every job.
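A minimal JobControl sketch of the above, assuming two configured Job objects job1 and job2 where job2 consumes job1's output:
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

ControlledJob cj1 = new ControlledJob(job1.getConfiguration());
cj1.setJob(job1);
ControlledJob cj2 = new ControlledJob(job2.getConfiguration());
cj2.setJob(job2);
cj2.addDependingJob(cj1);               // job2 starts only after job1 succeeds

JobControl control = new JobControl("pipeline");
control.addJob(cj1);
control.addJob(cj2);
new Thread(control).start();            // JobControl implements Runnable
while (!control.allFinished()) {
    Thread.sleep(1000);                 // poll the group's state
}
control.stop();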
3. MapReduce-ChainMapper/ChainReducer: ChainMapper.addMapper(...) can chain multiple Mapper tasks within one job; it cannot be used for jobs with multiple reducers. A sketch follows.
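A minimal ChainMapper/ChainReducer sketch; TokenizerMapper and StopWordFilterMapper are hypothetical mapper classes, and each addMapper call declares that stage's input/output key and value types:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.chain.ChainMapper;
import org.apache.hadoop.mapreduce.lib.chain.ChainReducer;

// Stage 1: raw line -> (word, 1); Stage 2: drop stop words.
ChainMapper.addMapper(job, TokenizerMapper.class,
        LongWritable.class, Text.class, Text.class, IntWritable.class,
        new Configuration(false));
ChainMapper.addMapper(job, StopWordFilterMapper.class,
        Text.class, IntWritable.class, Text.class, IntWritable.class,
        new Configuration(false));
// A single reducer may follow the chained mappers.
ChainReducer.setReducer(job, MyReducer.class,
        Text.class, IntWritable.class, Text.class, IntWritable.class,
        new Configuration(false));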
20. Job input and output
1. InputFormat / TextInputFormat / FileInputFormat
2. InputSplit / FileSplit
3. RecordReader
4. OutputFormat / OutputCommitter
This concludes the study of "which methods do you override in Hadoop"; hopefully it has cleared up the common questions. Pairing theory with practice is the best way to make it stick, so go try it out!