当前位置: 开发笔记 > 后端 > 正文

tensorflowtutorials（八）：手写数字数据集MNIST介绍

作者：O依楼观雪O | 来源：互联网 | 2023-09-02 18:54

在做机器学习相关实验的时候，首先我们就是需要一份通用的数据集，以便与其他的算法得到的实验结果进行比较。在图像分类领域MNIST数据集就是这样一个通用的数据集，前面几篇博文都用到了MNIST数据集，本文对其进行一些简单的介绍！

MNIST

In [1]:

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.examples.tutorials.mnist import input_data

%matplotlib inline  
print ("packs loaded")

packs loaded

Download and Extract MNIST dataset

In [2]:

print ("Download and Extract MNIST dataset")
mnist = input_data.read_data_sets('/tmp/data/', one_hot=True)
print
print (" tpye of 'mnist' is %s" % (type(mnist)))
print (" number of trian data is %d" % (mnist.train.num_examples))
print (" number of test data is %d" % (mnist.test.num_examples))

Download and Extract MNIST dataset
Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz

 tpye of 'mnist' is 
 number of trian data is 55000
 number of test data is 10000

In [3]:

# What does the data of MNIST look like? 
print ("What does the data of MNIST look like?")
trainimg   = mnist.train.images
trainlabel = mnist.train.labels
testimg    = mnist.test.images
testlabel  = mnist.test.labels
print
print (" type of 'trainimg' is %s"    % (type(trainimg)))
print (" type of 'trainlabel' is %s"  % (type(trainlabel)))
print (" type of 'testimg' is %s"     % (type(testimg)))
print (" type of 'testlabel' is %s"   % (type(testlabel)))
print (" shape of 'trainimg' is %s"   % (trainimg.shape,))
print (" shape of 'trainlabel' is %s" % (trainlabel.shape,))
print (" shape of 'testimg' is %s"    % (testimg.shape,))
print (" shape of 'testlabel' is %s"  % (testlabel.shape,))

What does the data of MNIST look like?

 type of 'trainimg' is 
 type of 'trainlabel' is 
 type of 'testimg' is 
 type of 'testlabel' is 
 shape of 'trainimg' is (55000, 784)
 shape of 'trainlabel' is (55000, 10)
 shape of 'testimg' is (10000, 784)
 shape of 'testlabel' is (10000, 10)

In [4]:

# How does the training data look like?
print ("How does the training data look like?")
nsample = 5
randidx = np.random.randint(trainimg.shape[0], size=nsample)

for i in randidx:
    curr_img   = np.reshape(trainimg[i, :], (28, 28)) # 28 by 28 matrix 
    curr_label = np.argmax(trainlabel[i, :] ) # Label
    plt.matshow(curr_img, cmap=plt.get_cmap('gray'))
    plt.title("" + str(i) + "th Training Data " 
              + "Label is " + str(curr_label))
    print ("" + str(i) + "th Training Data " 
           + "Label is " + str(curr_label))

How does the training data look like?
12118th Training Data Label is 5
46324th Training Data Label is 8
33th Training Data Label is 4
36491th Training Data Label is 3
6910th Training Data Label is 3

In [5]:

# Batch Learning? 
print ("Batch Learning? ")
batch_size = 100
batch_xs, batch_ys = mnist.train.next_batch(batch_size)
print ("type of 'batch_xs' is %s" % (type(batch_xs)))
print ("type of 'batch_ys' is %s" % (type(batch_ys)))
print ("shape of 'batch_xs' is %s" % (batch_xs.shape,))
print ("shape of 'batch_ys' is %s" % (batch_ys.shape,))

Batch Learning? 
type of 'batch_xs' is 
type of 'batch_ys' is 
shape of 'batch_xs' is (100, 784)
shape of 'batch_ys' is (100, 10)

In [6]:

# Get Random Batch with 'np.random.randint'
print ("5. Get Random Batch with 'np.random.randint'")
randidx   = np.random.randint(trainimg.shape[0], size=batch_size)
batch_xs2 = trainimg[randidx, :]
batch_ys2 = trainlabel[randidx, :]
print ("type of 'batch_xs2' is %s" % (type(batch_xs2)))
print ("type of 'batch_ys2' is %s" % (type(batch_ys2)))
print ("shape of 'batch_xs2' is %s" % (batch_xs2.shape,))
print ("shape of 'batch_ys2' is %s" % (batch_ys2.shape,))

5. Get Random Batch with 'np.random.randint'
type of 'batch_xs2' is 
type of 'batch_ys2' is 
shape of 'batch_xs2' is (100, 784)
shape of 'batch_ys2' is (100, 10)

In [7]:

randidx

Out[7]:

array([51472, 13751, 33562, 23281,  8489, 48481,  7799, 30307, 37366,
       25312, 46149, 49712,  5083, 52853, 29819, 36444, 34829,  8769,
       39518, 54911,  6720, 43675, 41703, 35594,  9300, 14474, 33318,
       14808, 53456, 41978,  8047, 34524, 30978, 53455, 42119, 22660,
       30329, 27169, 53798,  2125, 41759, 38951,  1438, 33511, 38784,
       15822, 16785,  9229,  1216, 19569,  3116, 22172, 14766, 16153,
        1707, 20899,  9087, 21263, 24853, 27784, 38324, 29287, 21828,
       34511, 26340, 39194, 38272, 34238, 28050, 29294, 42672, 18696,
       17796, 48147, 41841, 47077,  5925, 48237, 30605,  9169, 11260,
        9155, 39346, 41049, 11342,   536,  5927, 11155, 40424, 33583,
       38991, 16569, 34801,   870, 20546, 25061, 17601,  4521, 24359,  4613])

推荐阅读

api
优化深度神经网络在低性能硬件上的运行

尽管深度学习带来了广泛的应用前景，其训练通常需要强大的计算资源。然而，并非所有开发者都能负担得起高性能服务器或专用硬件。本文探讨了如何在有限的硬件条件下（如ARM CPU）高效运行深度神经网络，特别是通过选择合适的工具和框架来加速模型推理。 ... [详细]

蜡笔小新 2024-12-24 08:48:32
api
帝国CMS多图上传插件详解及使用指南

本文介绍了一款用于帝国CMS的多图上传插件，该插件通过Flash技术实现批量图片上传功能，显著提升了多图上传效率。文章详细说明了插件的安装、配置和使用方法。 ... [详细]

蜡笔小新 2024-12-26 13:30:01
ci
Coursera ML 机器学习

2019独角兽企业重金招聘Python工程师标准线性回归算法计算过程CostFunction梯度下降算法多变量回归![选择特征](https:static.oschina.n ... [详细]

蜡笔小新 2024-12-22 16:09:09
api
深入浅出TensorFlow数据读写机制

本文详细介绍TensorFlow中的数据读写操作，包括TFRecord文件的创建与读取，以及数据集（dataset）的相关概念和使用方法。 ... [详细]

蜡笔小新 2024-12-19 16:23:17
api
美团推荐系统：机器学习优化重排序模型

在互联网信息爆炸的时代，当用户需求模糊或难以通过精确查询表达时，推荐系统成为解决信息过载的有效手段。美团作为国内领先的O2O平台，通过深入分析用户行为，运用先进的机器学习技术优化推荐算法，提升用户体验。 ... [详细]

蜡笔小新 2024-12-17 17:56:15
正则
【度量学习】Siamese Network

基于2-channelnetwork的图片相似度判别一、相关理论本篇博文主要讲解2015年CVPR的一篇关于图像相似度计算的文章：《LearningtoCompar ... [详细]

蜡笔小新 2024-12-12 19:11:33
queue
Java并发编程：LinkedBlockingQueue的实际应用

本文介绍了Java并发库中的阻塞队列（BlockingQueue）及其典型应用场景。通过具体实例，展示了如何利用LinkedBlockingQueue实现线程间高效、安全的数据传递，并结合线程池和原子类优化性能。 ... [详细]

蜡笔小新 2024-12-27 18:51:49
queue
Java面试题解析

本文详细介绍了Java编程语言中的核心概念和常见面试问题，包括集合类、数据结构、线程处理、Java虚拟机（JVM）、HTTP协议以及Git操作等方面的内容。通过深入分析每个主题，帮助读者更好地理解Java的关键特性和最佳实践。 ... [详细]

蜡笔小新 2024-12-27 13:55:14
queue
Android LED 数字字体的应用与实现

本文介绍了一种适用于 Android 应用的 LED 数字字体（digital font），并详细描述了其在 UI 设计中的应用场景及其实现方法。这种字体常用于视频、广告倒计时等场景，能够增强视觉效果。 ... [详细]

蜡笔小新 2024-12-27 10:34:22
server
如何顺利使用Eclipse进行Struts开发

作为一名新手，您可能会在初次尝试使用Eclipse进行Struts开发时遇到一些挑战。本文将为您提供详细的指导和解决方案，帮助您克服常见的配置和操作难题。 ... [详细]

蜡笔小新 2024-12-27 09:57:58
server
RecyclerView初步学习(一)

RecyclerView初步学习(一)ReCyclerView提供了一种插件式的编程模式，除了提供ViewHolder缓存模式，还可以自定义动画，分割符，布局样式，相比于传统的ListVi ... [详细]

蜡笔小新 2024-12-26 20:24:01
多线程
2023年京东Android面试真题解析与经验分享

本文由一位拥有6年Android开发经验的工程师撰写，详细解析了京东面试中常见的技术问题。涵盖引用传递、Handler机制、ListView优化、多线程控制及ANR处理等核心知识点。 ... [详细]

蜡笔小新 2024-12-26 17:45:48
多线程
计算机图形学实训：OpenGL入门与直线光栅化算法

本教程涵盖OpenGL基础操作及直线光栅化技术，包括点的绘制、简单图形绘制、直线绘制以及DDA和中点画线算法。通过逐步实践，帮助读者掌握OpenGL的基本使用方法。 ... [详细]

蜡笔小新 2024-12-26 12:24:25
api
中央电视台电影频道节目预告及优化分析

本文详细介绍了中央电视台电影频道的节目预告，并通过专业工具分析了其加载方式，确保用户能够获取最准确的电视节目信息。 ... [详细]

蜡笔小新 2024-12-25 21:01:14
ci
基于KVM的SRIOV直通配置及性能测试

SRIOV介绍、VF直通配置，以及包转发率性能测试小慢哥的原创文章，欢迎转载目录?1.SRIOV介绍?2.环境说明?3.开启SRIOV?4.生成VF?5.VF ... [详细]

蜡笔小新 2024-12-25 19:26:39

O依楼观雪O

这个家伙很懒，什么也没留下！

Tags | 热门标签

RankList | 热门文章