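Notes from a test run following section 14.6 of Mahout in Action:
1: Convert the data in each directory of 20news-bydate-train and 20news-bydate-test into simple text files that start with the directory name and contain all of the words. The mahout commands used are:
mahout prepare20newsgroups -p 20news-bydate-train/ -o 20news-train/ -a org.apache.lucene.analysis.standard.StandardAnalyzer -c UTF-8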
mahout prepare20newsgroups -p 20news-bydate-test/ -o 20news-test/ -a org.apache.lucene.analysis.standard.StandardAnalyzer -c UTF-8
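Before uploading anything, it can help to sanity-check the converted files. The following is a minimal sketch, not part of the original run. It assumes that each output file is named <label>.txt, holds one document per line, and that every line starts with the label as its first whitespace-separated token. The function name check_labels is made up for illustration.

```shell
# For every <label>.txt file in a directory, print: label, number of lines,
# and number of lines that do NOT start with the label from the file name.
# Assumption: one document per line, label as the first token on the line.
check_labels() {
  dir=$1
  for f in "$dir"/*.txt; do
    [ -e "$f" ] || continue          # directory contains no .txt files
    label=$(basename "$f" .txt)
    awk -v label="$label" '
      $1 != label { bad++ }
      END { printf "%s %d %d\n", label, NR, bad }
    ' "$f"
  done
}

# Usage (20news-train/ is the output directory of the first command above):
# check_labels 20news-train/
```

If the one-document-per-line assumption holds, the line counts should agree with the "Counts of documents in Each Label" entry in the training log further down (for example rec.motorcycles=598.0), and the third column should be 0 for every file.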
2: Start the cluster:
start-all.sh
3: Copy the 20news-train directory generated in step 1 to HDFS:
hadoop fs -put 20news-train /user/root/
4: Train on the samples with the naive Bayes algorithm to generate 20news-model. The command and its output are as follows:
mahout trainclassifier -i 20news-train -o 20news-model -type cbayes -ng 1 -source hdfs
Running on hadoop, using HADOOP_HOME=/usr/Hadoop/hadoop-0.20.2
No HADOOP_CONF_DIR set, using /usr/Hadoop/hadoop-0.20.2/conf
13/06/19 09:52:11 INFO bayes.TrainClassifier: Training Bayes Classifier
13/06/19 09:52:12 INFO bayes.BayesDriver: Reading features...
13/06/19 09:52:12 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/06/19 09:52:12 INFO mapred.FileInputFormat: Total input paths to process : 20
13/06/19 09:52:13 INFO mapred.JobClient: Running job: job_201306190949_0001
13/06/19 09:52:14 INFO mapred.JobClient: map 0% reduce 0%
13/06/19 09:52:27 INFO mapred.JobClient: map 7% reduce 0%
13/06/19 09:52:30 INFO mapred.JobClient: map 9% reduce 0%
13/06/19 09:52:33 INFO mapred.JobClient: map 10% reduce 0%
13/06/19 09:52:42 INFO mapred.JobClient: map 18% reduce 3%
13/06/19 09:52:45 INFO mapred.JobClient: map 20% reduce 3%
13/06/19 09:52:54 INFO mapred.JobClient: map 24% reduce 3%
13/06/19 09:52:57 INFO mapred.JobClient: map 29% reduce 6%
13/06/19 09:53:00 INFO mapred.JobClient: map 30% reduce 6%
13/06/19 09:53:06 INFO mapred.JobClient: map 35% reduce 8%
13/06/19 09:53:09 INFO mapred.JobClient: map 39% reduce 8%
13/06/19 09:53:12 INFO mapred.JobClient: map 40% reduce 11%
13/06/19 09:53:15 INFO mapred.JobClient: map 44% reduce 11%
13/06/19 09:53:18 INFO mapred.JobClient: map 45% reduce 11%
13/06/19 09:53:21 INFO mapred.JobClient: map 49% reduce 13%
13/06/19 09:53:24 INFO mapred.JobClient: map 50% reduce 13%
13/06/19 09:53:27 INFO mapred.JobClient: map 55% reduce 15%
13/06/19 09:53:33 INFO mapred.JobClient: map 60% reduce 15%
13/06/19 09:53:36 INFO mapred.JobClient: map 60% reduce 16%
13/06/19 09:53:39 INFO mapred.JobClient: map 65% reduce 16%
13/06/19 09:53:42 INFO mapred.JobClient: map 70% reduce 20%
13/06/19 09:53:51 INFO mapred.JobClient: map 80% reduce 23%
13/06/19 09:53:57 INFO mapred.JobClient: map 80% reduce 25%
13/06/19 09:54:00 INFO mapred.JobClient: map 90% reduce 26%
13/06/19 09:54:09 INFO mapred.JobClient: map 100% reduce 26%
13/06/19 09:54:12 INFO mapred.JobClient: map 100% reduce 30%
13/06/19 09:54:18 INFO mapred.JobClient: map 100% reduce 33%
13/06/19 09:54:24 INFO mapred.JobClient: map 100% reduce 67%
13/06/19 09:54:30 INFO mapred.JobClient: map 100% reduce 100%
13/06/19 09:54:32 INFO mapred.JobClient: Job complete: job_201306190949_0001
13/06/19 09:54:32 INFO mapred.JobClient: Counters: 18
13/06/19 09:54:32 INFO mapred.JobClient: Job Counters
13/06/19 09:54:32 INFO mapred.JobClient: Launched reduce tasks=1
13/06/19 09:54:32 INFO mapred.JobClient: Launched map tasks=20
13/06/19 09:54:32 INFO mapred.JobClient: Data-local map tasks=20
13/06/19 09:54:32 INFO mapred.JobClient: FileSystemCounters
13/06/19 09:54:32 INFO mapred.JobClient: FILE_BYTES_READ=95754881
13/06/19 09:54:32 INFO mapred.JobClient: HDFS_BYTES_READ=16537368
13/06/19 09:54:32 INFO mapred.JobClient: FILE_BYTES_WRITTEN=148988140
13/06/19 09:54:32 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=36447002
13/06/19 09:54:32 INFO mapred.JobClient: Map-Reduce Framework
13/06/19 09:54:32 INFO mapred.JobClient: Reduce input groups=901416
13/06/19 09:54:32 INFO mapred.JobClient: Combine output records=1473727
13/06/19 09:54:32 INFO mapred.JobClient: Map input records=11314
13/06/19 09:54:32 INFO mapred.JobClient: Reduce shuffle bytes=51554535
13/06/19 09:54:32 INFO mapred.JobClient: Reduce output records=754846
13/06/19 09:54:32 INFO mapred.JobClient: Spilled Records=4131595
13/06/19 09:54:32 INFO mapred.JobClient: Map output bytes=205586582
13/06/19 09:54:32 INFO mapred.JobClient: Map input bytes=16537368
13/06/19 09:54:32 INFO mapred.JobClient: Combine input records=6337086
13/06/19 09:54:32 INFO mapred.JobClient: Map output records=6337086
13/06/19 09:54:32 INFO mapred.JobClient: Reduce input records=1473727
13/06/19 09:54:32 INFO bayes.BayesDriver: Calculating Tf-Idf...
13/06/19 09:54:32 INFO common.BayesTfIdfDriver: Counts of documents in Each Label
13/06/19 09:54:32 INFO common.BayesTfIdfDriver: {rec.motorcycles=598.0, comp.windows.x=593.0, talk.politics.guns=546.0, talk.politics.mideast=564.0, talk.religion.misc=377.0, rec.sport.baseball=597.0, rec.autos=594.0, rec.sport.hockey=600.0, comp.sys.mac.hardware=578.0, comp.sys.ibm.pc.hardware=590.0, sci.space=593.0, talk.politics.misc=465.0, sci.electronics=591.0, comp.graphics=584.0, sci.crypt=595.0, sci.med=594.0, soc.religion.christian=599.0, alt.atheism=480.0, misc.forsale=585.0, comp.os.ms-windows.misc=591.0}
13/06/19 09:54:32 INFO common.BayesTfIdfDriver: {dataSource=hdfs, alpha_i=1.0, minDf=1, gramSize=1}
13/06/19 09:54:32 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/06/19 09:54:32 INFO mapred.FileInputFormat: Total input paths to process : 3
13/06/19 09:54:33 INFO mapred.JobClient: Running job: job_201306190949_0002
13/06/19 09:54:34 INFO mapred.JobClient: map 0% reduce 0%
13/06/19 09:54:49 INFO mapred.JobClient: map 66% reduce 0%
13/06/19 09:55:01 INFO mapred.JobClient: map 100% reduce 0%
13/06/19 09:55:07 INFO mapred.JobClient: map 100% reduce 22%
13/06/19 09:55:13 INFO mapred.JobClient: map 100% reduce 100%
13/06/19 09:55:15 INFO mapred.JobClient: Job complete: job_201306190949_0002
13/06/19 09:55:15 INFO mapred.JobClient: Counters: 18
13/06/19 09:55:15 INFO mapred.JobClient: Job Counters
13/06/19 09:55:15 INFO mapred.JobClient: Launched reduce tasks=1
13/06/19 09:55:15 INFO mapred.JobClient: Launched map tasks=3
13/06/19 09:55:15 INFO mapred.JobClient: Data-local map tasks=3
13/06/19 09:55:15 INFO mapred.JobClient: FileSystemCounters
13/06/19 09:55:15 INFO mapred.JobClient: FILE_BYTES_READ=54669917
13/06/19 09:55:15 INFO mapred.JobClient: HDFS_BYTES_READ=36446070
13/06/19 09:55:15 INFO mapred.JobClient: FILE_BYTES_WRITTEN=82004984
13/06/19 09:55:15 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=15645417
13/06/19 09:55:15 INFO mapred.JobClient: Map-Reduce Framework
13/06/19 09:55:15 INFO mapred.JobClient: Reduce input groups=304129
13/06/19 09:55:15 INFO mapred.JobClient: Combine output records=608257
13/06/19 09:55:15 INFO mapred.JobClient: Map input records=754826
13/06/19 09:55:15 INFO mapred.JobClient: Reduce shuffle bytes=27334946
13/06/19 09:55:15 INFO mapred.JobClient: Reduce output records=304129
13/06/19 09:55:15 INFO mapred.JobClient: Spilled Records=1824770
13/06/19 09:55:15 INFO mapred.JobClient: Map output bytes=28610110
13/06/19 09:55:15 INFO mapred.JobClient: Map input bytes=36445773
13/06/19 09:55:15 INFO mapred.JobClient: Combine input records=754826
13/06/19 09:55:15 INFO mapred.JobClient: Map output records=754826
13/06/19 09:55:15 INFO mapred.JobClient: Reduce input records=608257
13/06/19 09:55:15 INFO bayes.BayesDriver: Calculating weight sums for labels and features...
13/06/19 09:55:15 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/06/19 09:55:15 INFO mapred.FileInputFormat: Total input paths to process : 1
13/06/19 09:55:16 INFO mapred.JobClient: Running job: job_201306190949_0003
13/06/19 09:55:17 INFO mapred.JobClient: map 0% reduce 0%
13/06/19 09:55:31 INFO mapred.JobClient: map 100% reduce 0%
13/06/19 09:55:43 INFO mapred.JobClient: map 100% reduce 100%
13/06/19 09:55:45 INFO mapred.JobClient: Job complete: job_201306190949_0003
13/06/19 09:55:45 INFO mapred.JobClient: Counters: 18
13/06/19 09:55:45 INFO mapred.JobClient: Job Counters
13/06/19 09:55:45 INFO mapred.JobClient: Launched reduce tasks=1
13/06/19 09:55:45 INFO mapred.JobClient: Launched map tasks=2
13/06/19 09:55:45 INFO mapred.JobClient: Data-local map tasks=2
13/06/19 09:55:45 INFO mapred.JobClient: FileSystemCounters
13/06/19 09:55:45 INFO mapred.JobClient: FILE_BYTES_READ=11395006
13/06/19 09:55:45 INFO mapred.JobClient: HDFS_BYTES_READ=15646192
13/06/19 09:55:45 INFO mapred.JobClient: FILE_BYTES_WRITTEN=17092570
13/06/19 09:55:45 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=5156501
13/06/19 09:55:45 INFO mapred.JobClient: Map-Reduce Framework
13/06/19 09:55:45 INFO mapred.JobClient: Reduce input groups=146591
13/06/19 09:55:45 INFO mapred.JobClient: Combine output records=201494
13/06/19 09:55:45 INFO mapred.JobClient: Map input records=304128
13/06/19 09:55:45 INFO mapred.JobClient: Reduce shuffle bytes=5697500
13/06/19 09:55:45 INFO mapred.JobClient: Reduce output records=146591
13/06/19 09:55:45 INFO mapred.JobClient: Spilled Records=604482
13/06/19 09:55:45 INFO mapred.JobClient: Map output bytes=23703690
13/06/19 09:55:45 INFO mapred.JobClient: Map input bytes=15645194
13/06/19 09:55:45 INFO mapred.JobClient: Combine input records=912384
13/06/19 09:55:45 INFO mapred.JobClient: Map output records=912384
13/06/19 09:55:45 INFO mapred.JobClient: Reduce input records=201494
13/06/19 09:55:45 INFO bayes.BayesDriver: Calculating the weight Normalisation factor for each class...
13/06/19 09:55:45 INFO bayes.BayesThetaNormalizerDriver: Sigma_k for Each Label
13/06/19 09:55:45 INFO bayes.BayesThetaNormalizerDriver: {rec.motorcycles=10950.08247078713, comp.windows.x=9140.40229191363, talk.politics.guns=9717.884898541553, talk.politics.mideast=9774.792829912312, talk.religion.misc=6253.280625101324, rec.sport.baseball=9964.975295683822, rec.autos=10318.471983615944, rec.sport.hockey=9689.106187278217, comp.sys.mac.hardware=9294.329591214286, comp.sys.ibm.pc.hardware=9261.965098786126, sci.space=10877.81456432966, talk.politics.misc=8292.138753814019, sci.electronics=10382.850213940757, comp.graphics=9327.325741885199, sci.crypt=10401.387454343632, sci.med=10654.852600564873, soc.religion.christian=9581.585347264707, alt.atheism=7503.494393077384, misc.forsale=10119.779786780977, comp.os.ms-windows.misc=9063.881127401353}
13/06/19 09:55:45 INFO bayes.BayesThetaNormalizerDriver: Sigma_kSigma_j for each Label and for each Features
13/06/19 09:55:45 INFO bayes.BayesThetaNormalizerDriver: 190570.40125624838
13/06/19 09:55:45 INFO bayes.BayesThetaNormalizerDriver: Vocabulary Count
13/06/19 09:55:45 INFO bayes.BayesThetaNormalizerDriver: 146570.0
13/06/19 09:55:45 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/06/19 09:55:46 INFO mapred.FileInputFormat: Total input paths to process : 1
13/06/19 09:55:46 INFO mapred.JobClient: Running job: job_201306190949_0004
13/06/19 09:55:47 INFO mapred.JobClient: map 0% reduce 0%
13/06/19 09:55:58 INFO mapred.JobClient: map 100% reduce 0%
13/06/19 09:56:10 INFO mapred.JobClient: map 100% reduce 100%
13/06/19 09:56:12 INFO mapred.JobClient: Job complete: job_201306190949_0004
13/06/19 09:56:12 INFO mapred.JobClient: Counters: 18
13/06/19 09:56:12 INFO mapred.JobClient: Job Counters
13/06/19 09:56:12 INFO mapred.JobClient: Launched reduce tasks=1
13/06/19 09:56:12 INFO mapred.JobClient: Launched map tasks=2
13/06/19 09:56:12 INFO mapred.JobClient: Data-local map tasks=2
13/06/19 09:56:12 INFO mapred.JobClient: FileSystemCounters
13/06/19 09:56:12 INFO mapred.JobClient: FILE_BYTES_READ=757
13/06/19 09:56:12 INFO mapred.JobClient: HDFS_BYTES_READ=15646192
13/06/19 09:56:12 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1584
13/06/19 09:56:12 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=932
13/06/19 09:56:12 INFO mapred.JobClient: Map-Reduce Framework
13/06/19 09:56:12 INFO mapred.JobClient: Reduce input groups=20
13/06/19 09:56:12 INFO mapred.JobClient: Combine output records=21
13/06/19 09:56:12 INFO mapred.JobClient: Map input records=304128
13/06/19 09:56:12 INFO mapred.JobClient: Reduce shuffle bytes=397
13/06/19 09:56:12 INFO mapred.JobClient: Reduce output records=20
13/06/19 09:56:12 INFO mapred.JobClient: Spilled Records=42
13/06/19 09:56:12 INFO mapred.JobClient: Map output bytes=10423028
13/06/19 09:56:12 INFO mapred.JobClient: Map input bytes=15645194
13/06/19 09:56:12 INFO mapred.JobClient: Combine input records=304128
13/06/19 09:56:12 INFO mapred.JobClient: Map output records=304128
13/06/19 09:56:12 INFO mapred.JobClient: Reduce input records=21
13/06/19 09:56:12 INFO common.HadoopUtil: Deleting 20news-model/trainer-docCount
13/06/19 09:56:12 INFO common.HadoopUtil: Deleting 20news-model/trainer-termDocCount
13/06/19 09:56:12 INFO common.HadoopUtil: Deleting 20news-model/trainer-featureCount
13/06/19 09:56:12 INFO common.HadoopUtil: Deleting 20news-model/trainer-wordFreq
13/06/19 09:56:12 INFO common.HadoopUtil: Deleting 20news-model/trainer-tfIdf/trainer-vocabCount
13/06/19 09:56:12 INFO driver.MahoutDriver: Program took 240921 ms
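The log above reports a total of 240921 ms, spread over four MapReduce jobs. To see how long each job took, the "Running job" and "Job complete" timestamps can be paired up. This is a rough sketch that assumes the console output was saved to a file (the name train.log is hypothetical) and that a job does not run across midnight. The function name job_times is also made up.

```shell
# Print the wall-clock seconds between "Running job:" and "Job complete:"
# for each job ID found in a Hadoop JobClient log.
# Assumption: field 2 of every log line is a HH:MM:SS timestamp.
job_times() {
  awk '
    function secs(t,  p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
    /Running job:/  { start[$NF] = secs($2) }
    /Job complete:/ { printf "%s %ds\n", $NF, secs($2) - start[$NF] }
  ' "$@"
}

# Usage:
# job_times train.log
```

Applied to the timestamps in the log above, this gives 139 s, 42 s, 29 s, and 26 s for jobs 0001 through 0004, so the first job (reading features) accounts for more than half of the run.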
5: The 20news-model generated above resides on HDFS; copy it to the local file system:
hadoop fs -get /user/root/20news-model /usr/Mahout/dataset/
6: Test the model. The command and its output are as follows:
mahout testclassifier -d 20news-test -m 20news-model -type cbayes -ng 1 -source hdfs -method sequential
Running on hadoop, using HADOOP_HOME=/usr/Hadoop/hadoop-0.20.2
No HADOOP_CONF_DIR set, using /usr/Hadoop/hadoop-0.20.2/conf
13/06/19 10:11:54 INFO bayes.TestClassifier: Loading model from: {basePath=20news-model, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=20news-test}
13/06/19 10:11:54 INFO bayes.TestClassifier: Testing Bayes Classifier
13/06/19 10:11:55 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/root/20news-model/trainer-weights/Sigma_j/part-00000
13/06/19 10:11:55 INFO io.SequenceFileModelReader: Read 50000 feature weights
13/06/19 10:11:55 INFO io.SequenceFileModelReader: Read 100000 feature weights
13/06/19 10:11:55 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/root/20news-model/trainer-weights/Sigma_k/part-00000
13/06/19 10:11:55 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/root/20news-model/trainer-weights/Sigma_kSigma_j/part-00000
13/06/19 10:11:55 INFO io.SequenceFileModelReader: 190570.40125624838
13/06/19 10:11:55 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/root/20news-model/trainer-thetaNormalizer/part-00000
13/06/19 10:11:55 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/root/20news-model/trainer-tfIdf/trainer-tfIdf/part-00000
13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: rec.sport.baseball -127395.14399316712 547567.2698760114 -0.23265660860630674
13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: sci.crypt -189010.62350617294 547567.2698760114 -0.3451824714595736
13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: rec.sport.hockey -166203.2548335905 547567.2698760114 -0.3035302947731423
13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: talk.politics.guns -198793.14260997035 547567.2698760114 -0.3630478911841903
13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: soc.religion.christian -158106.48187003663 547567.2698760114 -0.2887434851718539
13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: sci.electronics -138650.82033374818 547567.2698760114 -0.25321239592195427
13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: comp.os.ms-windows.misc -547567.2698760114 547567.2698760114 -1.0
13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: misc.forsale -141981.48005545404 547567.2698760114 -0.2592950453148956
13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: talk.religion.misc -134885.60852883724 547567.2698760114 -0.2463361416020722
13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: alt.atheism -134262.4272892253 547567.2698760114 -0.24519805086163582
13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: comp.windows.x -172513.19965389522 547567.2698760114 -0.3150538922696353
13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: talk.politics.mideast -189368.63272082788 547567.2698760114 -0.3458362892356726
13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: comp.sys.ibm.pc.hardware -134535.56471897085 547567.2698760114 -0.24569687072317975
13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: comp.sys.mac.hardware -121323.62827571077 547567.2698760114 -0.22156844455510047
13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: sci.space -189203.04544769705 547567.2698760114 -0.3455338838834164
13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: rec.motorcycles -138625.26282429774 547567.2698760114 -0.2531657212741868
13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: rec.autos -136935.18434679657 547567.2698760114 -0.25007919917821886
13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: comp.graphics -161979.38306986375 547567.2698760114 -0.29581640828631267
13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: talk.politics.misc -159579.70032298338 547567.2698760114 -0.29143396455949216
13/06/19 10:11:57 INFO datastore.InMemoryBayesDatastore: sci.med -183835.5334355675 547567.2698760114 -0.3357314133790253
13/06/19 10:11:58 INFO bayes.TestClassifier: Classified instances from talk.politics.mideast.txt
13/06/19 10:11:58 INFO bayes.TestClassifier: Classified instances from comp.sys.mac.hardware.txt
13/06/19 10:11:58 INFO bayes.TestClassifier: Classified instances from rec.sport.baseball.txt
13/06/19 10:11:59 INFO bayes.TestClassifier: Classified instances from misc.forsale.txt
13/06/19 10:11:59 INFO bayes.TestClassifier: Classified instances from talk.religion.misc.txt
13/06/19 10:11:59 INFO bayes.TestClassifier: Classified instances from rec.motorcycles.txt
13/06/19 10:12:00 INFO bayes.TestClassifier: Classified instances from sci.electronics.txt
13/06/19 10:12:00 INFO bayes.TestClassifier: Classified instances from sci.space.txt
13/06/19 10:12:01 INFO bayes.TestClassifier: Classified instances from talk.politics.guns.txt
13/06/19 10:12:01 INFO bayes.TestClassifier: Classified instances from rec.sport.hockey.txt
13/06/19 10:12:02 INFO bayes.TestClassifier: Classified instances from alt.atheism.txt
13/06/19 10:12:02 INFO bayes.TestClassifier: Classified instances from comp.graphics.txt
13/06/19 10:12:03 INFO bayes.TestClassifier: Classified instances from comp.sys.ibm.pc.hardware.txt
13/06/19 10:12:03 INFO bayes.TestClassifier: Classified instances from comp.windows.x.txt
13/06/19 10:12:04 INFO bayes.TestClassifier: Classified instances from talk.politics.misc.txt
13/06/19 10:12:04 INFO bayes.TestClassifier: Classified instances from rec.autos.txt
13/06/19 10:12:05 INFO bayes.TestClassifier: Classified instances from sci.crypt.txt
13/06/19 10:12:05 INFO bayes.TestClassifier: Classified instances from sci.med.txt
13/06/19 10:12:06 INFO bayes.TestClassifier: Classified instances from comp.os.ms-windows.misc.txt
13/06/19 10:12:07 INFO bayes.TestClassifier: Classified instances from soc.religion.christian.txt
13/06/19 10:12:07 INFO bayes.TestClassifier: =======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances : 5997 79.6203%
Incorrectly Classified Instances : 1535 20.3797%
Total Classified Instances : 7532
=======================================================
Confusion Matrix
-------------------------------------------------------
  a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t   <--Classified as
385   0   7   0   0   0   0   3   0   0   0   0   0   1   0   0   1   0   0   0 | 397 a = rec.sport.baseball
  3 372   1   1   0   3   1   1   0   0   1   0   0   3   0   0   1   7   0   2 | 396 b = sci.crypt
  7   2 384   0   1   2   0   1   0   0   0   0   0   0   0   2   0   0   0   0 | 399 c = rec.sport.hockey
  3  12   0 327   1   4   1   3   0   0   0   1   0   0   1   5   2   1   0   3 | 364 d = talk.politics.guns
  5   0   1   0 368   2   2   2   0   2   0   0   1   0   2   3   0   3   0   7 | 398 e = soc.religion.christian
  1  14   0   0   0 321   7   6   0   0   0   0  11   5   3   7   4  11   0   3 | 393 f = sci.electronics
  4   9   0   0   0   2 258   4   0   0   4   0  51   7   5   7   1  41   0   1 | 394 g = comp.os.ms-windows.misc
  1   0   1   0   0   5   2 343   0   0   0   0  11   8   1   6   9   2   0   1 | 390 h = misc.forsale
  9   9   2  33 102   0   0   0  16  24   0   5   1   3  13  11   6   5   0  12 | 251 i = talk.religion.misc
  4  13   2  10  85   7   1   0   1 134   0   5   2   1  13  18   4   2   0  17 | 319 j = alt.atheism
  1   5   0   0   0   4  11   4   0   0 287   0  11   6   2   3   0  60   0   1 | 395 k = comp.windows.x
  5   7   0   3  11   1   0   2   0   0   0 337   0   0   0   5   2   1   0   2 | 376 l = talk.politics.mideast
  0   1   0   0   0  24  20  12   0   0   1   0 292  29   1   2   0  10   0   0 | 392 m = comp.sys.ibm.pc.hardware
  3   1   0   0   0  14   7  10   0   0   0   0   6 329   4   2   3   6   0   0 | 385 n = comp.sys.mac.hardware
  1   2   0   1   1   4   0   1   0   0   1   0   0   0 370   0   0   9   0   4 | 394 o = sci.space
  1   0   0   0   0   2   0   3   0   0   0   0   1   0   0 384   6   0   0   1 | 398 p = rec.motorcycles
  1   0   2   0   0   6   0  11   0   0   0   0   2   0   1   8 364   1   0   0 | 396 q = rec.autos
  5  10   0   0   0  10   7   6   0   0  14   1  11  11   8   1   2 301   0   2 | 389 r = comp.graphics
  8  32   1 109   9   2   0   1   0   0   0   1   0   3  19  20   5   1  87  12 | 310 s = talk.politics.misc
  3   0   2   1   4  13   0   7   0   0   0   0   1   1   5  10   3   8   0 338 | 396 t = sci.med
Default Category: unknown: 20
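The summary gives only the overall accuracy, while the confusion matrix shows that the errors are far from evenly spread. A small sketch for turning the matrix into per-class recall (diagonal entry divided by row total) is given below. It assumes the test output was saved to a file (the name test.log is hypothetical) and that matrix rows have the layout "counts | total letter = label" as printed above. The function name per_class_recall is made up.

```shell
# Print recall (diagonal / row total) for each class of a Mahout confusion
# matrix, followed by the overall accuracy computed from the same rows.
per_class_recall() {
  awk '
    # Matrix rows look like: c1 c2 ... cN | total letter = label
    NF > 5 && $(NF-4) == "|" {
      row++                        # i-th matrix row: the diagonal is field i
      total = $(NF-3)
      correct += $row
      all += total
      if (total > 0) printf "%-26s %6.2f%%\n", $NF, 100 * $row / total
    }
    END { if (all > 0) printf "%-26s %6.2f%%\n", "overall", 100 * correct / all }
  ' "$@"
}

# Usage:
# per_class_recall test.log
```

On the matrix above, the overall figure is 5997 / 7532 = 79.62%, matching the summary. The weakest rows are talk.religion.misc at 16 / 251 = 6.37% (102 of its documents were classified as soc.religion.christian) and talk.politics.misc at 87 / 310 = 28.06% (109 were classified as talk.politics.guns).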