How to Calculate NAND Flash Read and Write Speeds
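In this section we use a Micron NAND Flash chip as an example to show how NAND Flash access speed (write / read) is calculated. The approach is to add up, term by term, the timings shown in the Read / Program / Erase operation timing diagrams in the datasheet, and then derive the bandwidth with a simple formula.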
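Take the figure below as an example; it is a fairly complex diagram. The device contains two parts (targets), and each target has two LUNs (Logical Units). Each target is completely independent, while the LUNs within a target can perform interleaved operations. As the figure shows, LUN1 and LUN2 are in the same target. The benefit of this arrangement is that it maximizes bandwidth and reduces interference.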
The parameters of the device above are as follows:
The following calculations of NAND Flash access speed use the Synchronous Interface as the example:
1. Read operation
<1> Read a single page
The time consumed is calculated as follows:
7 * tCAD (Send address and command) + (tWB + tR) (Read data from the NAND Flash Array into the data register) + tdqs * 4320 (Transfer a page of data out)
tCAD = 25ns
tWB = 100ns
tR = 25us
tdqs = 0.5 tCK (minimum)
tCK = 12ns (minimum)
Total Time: 7 * 25ns + 100ns + 25000ns + 0.5 * 12ns * 4320 = 51195ns
Data Transferred: 4320 bytes
Bandwidth: 4320 bytes / 51.195us = 84.4MB/s
Key characteristics:
1) The page size is 4K + 224 bytes.
2) Data is captured on both the rising and falling edges of DQS during transfer.
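The single-page read calculation above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the constant and function names are made up for illustration, and the timing values are the ones quoted in the example, not values read from a device.

```python
# Timing parameters in nanoseconds, copied from the worked example above.
T_CAD = 25.0          # one command/address cycle
T_WB = 100.0
T_R = 25_000.0        # array-to-register read time
T_CK = 12.0
T_DQS = 0.5 * T_CK    # one byte per DQS edge
PAGE_BYTES = 4320     # 4K + 224 bytes


def single_page_read_ns():
    """Sum the three phases of a single-page read."""
    command_address = 7 * T_CAD
    array_to_register = T_WB + T_R
    data_out = T_DQS * PAGE_BYTES
    return command_address + array_to_register + data_out


def bandwidth_mb_per_s(num_bytes, time_ns):
    """bytes/ns * 1000 = bytes/us, i.e. MB/s with 1 MB = 10**6 bytes."""
    return num_bytes / time_ns * 1000.0


total_ns = single_page_read_ns()
print(total_ns)                                            # 51195.0
print(round(bandwidth_mb_per_s(PAGE_BYTES, total_ns), 1))  # 84.4
```

Roughly half of the total is the data-out phase (25920ns), and most of the rest is tR.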
<2> 2 LUN Four-plane page read
The time needed:
(7 * tCAD + tWB + tDBSY) * 3 + (7 * tCAD + tWB + tDBSY) * 3 + (7 * tCAD + tWB) + (7 * tCAD + tWB + tR) + (7 * tCAD + tCCS + tDQSCK + tdqs * 4320) * 8
Note:
tCAD = 25ns
tWB = 100ns
tDBSY = 0.5us = 500ns
tR = 30us = 30000ns (for multi-plane read)
tdqs = 0.5tCK
tCK = 12ns
tCCS = 200ns
tDQSCK = 20ns
tTime = (175ns + 100ns + 500ns) * 3 + (175ns + 100ns + 500ns) * 3 + (175ns + 100ns) + (175ns + 100ns + 30000ns) + (175ns + 200ns + 20ns + 0.5 * 12ns * 4320) * 8
= 2325ns + 2325ns + 30550ns + 210520ns
= 245720ns
Data transferred: 4320 * 4 * 2 = 34560 bytes
Bandwidth: 34560 bytes / 245.720us = 140.6MB/s
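As a cross-check, the 2-LUN four-plane read formula can be evaluated in Python. The sketch below follows the formula term by term; the names are illustrative assumptions, and the comments on which term belongs to which LUN are an interpretation of the formula rather than something stated in the datasheet excerpt.

```python
# Timing parameters in nanoseconds, copied from the worked example above.
T_CAD = 25.0
T_WB = 100.0
T_DBSY = 500.0
T_R = 30_000.0        # multi-plane read
T_CK = 12.0
T_DQS = 0.5 * T_CK
T_CCS = 200.0
T_DQSCK = 20.0
PAGE_BYTES = 4320
LUNS = 2
PLANES = 4


def two_lun_four_plane_read_ns():
    cmd = 7 * T_CAD
    # Three queued plane reads per LUN, each followed by tDBSY.
    queued = LUNS * (PLANES - 1) * (cmd + T_WB + T_DBSY)
    # Final plane command of each LUN; the formula counts tR only once.
    finals = (cmd + T_WB) + (cmd + T_WB + T_R)
    # Each of the 8 pages needs a column-select sequence plus data out.
    page_out = cmd + T_CCS + T_DQSCK + T_DQS * PAGE_BYTES
    return queued + finals + LUNS * PLANES * page_out


total_ns = two_lun_four_plane_read_ns()
data_bytes = PAGE_BYTES * PLANES * LUNS
print(total_ns)                                 # 245720.0
print(round(data_bytes / total_ns * 1000, 1))   # 140.6
```

The data-out term (210520ns) dominates, which is why multi-plane reads gain far less than the 8x that the page count alone would suggest.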
<3> Device that has 2 independent targets
Each target is completely independent, so in theory the speed scales with the number of targets.
In this case the access speed is: 2 * 140.6MB/s = 281.2MB/s.
2. Program operation
<1> Single program operation
The time consumed by a program (write) operation is:
6 * tCAD (Send address and command) + tADL + tDQSS + tdqs * 4320 (Transfer the data into the Flash) + tCAD (Program confirm command) + tWB + tPROG (Program the Flash Array time)
tCAD = 25ns
tADL = 70ns (minimum)
tDQSS = 0.75tCK (minimum)
tdqs = 0.2tCK (minimum)
tWB = 100ns
tPROG = 160us
tCK = 12ns
tTime = 150ns + 70ns + 0.75tCK + 0.2tCK * 4320 + 25ns + 100ns + 160us
= 150ns + 70ns + 9ns + 10368ns + 25ns + 100ns + 160000ns = 170722ns
= 170.722us
Data transferred: 4320 bytes
Bandwidth = 4320 bytes / 170.722us = 25.3MB/s
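To see which term dominates a single-page program, the sum can be broken down term by term. The sketch below uses the parameter values quoted in the example (including tdqs = 0.2 tCK); the dictionary keys and variable names are illustrative only.

```python
T_CK = 12.0           # ns
PAGE_BYTES = 4320

# Each term of the single-page program time, in nanoseconds.
TERMS_NS = {
    "command and address (6 * tCAD)": 6 * 25.0,
    "tADL": 70.0,
    "tDQSS (0.75 * tCK)": 0.75 * T_CK,
    "data in (tdqs * 4320)": 0.2 * T_CK * PAGE_BYTES,
    "confirm command (tCAD)": 25.0,
    "tWB": 100.0,
    "tPROG": 160_000.0,
}

total_ns = sum(TERMS_NS.values())
bandwidth = PAGE_BYTES / total_ns * 1000.0   # MB/s, 1 MB = 10**6 bytes

# Print each term with its share of the total time.
for name, ns in TERMS_NS.items():
    print(f"{name:32s} {ns:10.0f} ns  {100.0 * ns / total_ns:5.1f} %")
print(f"total {total_ns:.0f} ns, bandwidth {bandwidth:.1f} MB/s")
# last line: total 170722 ns, bandwidth 25.3 MB/s
```

With these values tPROG accounts for well over 90% of the total, so single-page program bandwidth is limited by array programming rather than by the interface.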
<2> 2 LUN Four-plane program operation
The commands and data are first sent to the 4 planes, and then the program operation is executed.
The total time consumed by the write is:
[tCAD + 4 * tCAD + tADL + tDQSS + tdqs * 4320 + tCAD + tWB + tDBSY] * 3 +
[tCAD + 4 * tCAD + tADL + tDQSS + tdqs * 4320 + tCAD + tWB] +
[tCAD + 4 * tCAD + tADL + tDQSS + tdqs * 4320 + tCAD + tWB + tDBSY] * 3 +
[tCAD + 4 * tCAD + tADL + tDQSS + tdqs * 4320 + tCAD + tWB + tPROG]
= [tCAD + 4 * tCAD + tADL + tDQSS + tdqs * 4320 + tCAD + tWB + tDBSY] * 6 +
[tCAD + 4 * tCAD + tADL + tDQSS + tdqs * 4320 + tCAD + tWB] * 2 +
tPROG
tCAD = 25ns
tADL = 70ns (minimum)
tDQSS = 0.75tCK
tdqs = 0.2tCK
tCK = 12ns
tPROG = 160000ns
tDBSY = 500ns
tWB = 100ns
Total time needed:
[125ns + 70ns + 0.75 * 12ns + 0.2 * 12ns * 4320 + 25ns + 100ns + 500ns] * 6 +
[125ns + 70ns + 0.75 * 12ns + 0.2 * 12ns * 4320 + 25ns + 100ns] * 2 + 160000ns
= 67182ns + 21394ns + 160000ns = 248576ns
Data transferred: 4320 bytes * 4 * 2 = 34560 bytes
Bandwidth: 34560 bytes / 248.576us = 139.0MB/s
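The multi-plane program sum is long enough that evaluating it in code is a useful check. The sketch below implements the simplified form of the formula (6 loads followed by tDBSY, 2 loads without it, plus one tPROG). The `luns` and `planes` parameters are a generalization assumed here for illustration; the text itself only covers the 2-LUN, 4-plane case.

```python
# Timing parameters in nanoseconds, copied from the worked example above.
T_CAD = 25.0
T_ADL = 70.0
T_CK = 12.0
T_DQSS = 0.75 * T_CK
T_DQS = 0.2 * T_CK
T_WB = 100.0
T_DBSY = 500.0
T_PROG = 160_000.0
PAGE_BYTES = 4320


def multi_plane_program_ns(luns=2, planes=4):
    # Command, address and data load for one plane, as in the formula.
    load = (T_CAD + 4 * T_CAD + T_ADL + T_DQSS
            + T_DQS * PAGE_BYTES + T_CAD + T_WB)
    queued = luns * (planes - 1) * (load + T_DBSY)  # loads followed by tDBSY
    finals = luns * load                            # last plane of each LUN
    return queued + finals + T_PROG                 # tPROG counted once


total_ns = multi_plane_program_ns()
data_bytes = PAGE_BYTES * 4 * 2
print(round(total_ns))
print(round(data_bytes / total_ns * 1000, 1))       # 139.0
```

Because tPROG is paid once for eight pages, the bandwidth is about 5.5 times that of a single-page program under the same parameter values.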
<3> Device that has 2 targets
Each target is completely independent, so in theory the speed scales with the number of targets.
That is: 139.0MB/s * 2 = 278.0MB/s
3. Erase operation
<1> Erase a single block (See Figure 78 at page 99)
The time consumed by an erase is:
5 * tCAD (Send command and block address) + tWB + tBERS (Block erase time)
tCAD = 25ns
tWB = 100ns
tBERS = 3ms
tTime = 5 * 25ns + 100ns + 3000000ns = 3000225ns = 3000.225us
Data erased: 128 pages * 4320 bytes/page = 552960 bytes
Bandwidth = 552960 bytes / 3000.225us = 184.3MB/s
<2> 2 LUN Erase 4-plane block operation
The total time consumed by the erase operation is:
(5 * tCAD + tWB + tDBSY) * 3 + (5 * tCAD + tWB) +
(5 * tCAD + tWB + tDBSY) * 3 + (5 * tCAD + tWB + tBERS)
tCAD = 25ns
tWB = 100ns
tDBSY = 0.5us = 500ns
tBERS = 3ms = 3000000ns
tTime = 6 * (125ns + 100ns + 500ns) + (125ns + 100ns) * 2 + 3000000ns
= 4350ns + 450ns + 3000000ns
= 3004800ns = 3004.800us
Data erased: 2 * 128 pages * 4320 bytes/page * 4 planes = 4423680 bytes
Bandwidth = 4423680 bytes / 3004.8us = 1472.2MB/s
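Both erase cases (single block, and 2 LUNs with 4 planes each) follow the same pattern, so one function can cover them. The sketch below is a generalization assumed for illustration; it reduces to the two formulas in this section for (1 LUN, 1 plane) and (2 LUNs, 4 planes).

```python
# Timing parameters in nanoseconds, copied from the worked examples above.
T_CAD = 25.0
T_WB = 100.0
T_DBSY = 500.0
T_BERS = 3_000_000.0
PAGES_PER_BLOCK = 128
PAGE_BYTES = 4320


def erase_ns(luns=1, planes=1):
    cmd = 5 * T_CAD + T_WB
    queued = luns * (planes - 1) * (cmd + T_DBSY)  # commands followed by tDBSY
    finals = luns * cmd                            # last command of each LUN
    return queued + finals + T_BERS                # tBERS counted once


def erase_bandwidth_mb_per_s(luns=1, planes=1):
    erased = luns * planes * PAGES_PER_BLOCK * PAGE_BYTES
    return erased / erase_ns(luns, planes) * 1000.0


print(erase_ns(1, 1), round(erase_bandwidth_mb_per_s(1, 1), 1))  # 3000225.0 184.3
print(erase_ns(2, 4), round(erase_bandwidth_mb_per_s(2, 4), 1))  # 3004800.0 1472.2
```

Since tBERS dwarfs the command overhead, erase bandwidth scales almost linearly with the number of blocks erased in parallel.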
<3> Two target 4-plane erase operation
Each target is completely independent, so in theory the speed scales with the number of targets.
That is: 2 * 1472.2MB/s = 2944.4MB/s
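Write state:
First, the addresses held by the bad-block manager are compared with the current block address so that invalid blocks are skipped.
The invalid-block read clock is set to 0, and the output address signal is driven on the output address port.
The invalid-block read clock is set to 1, the output address is incremented by 1, and the data at the current address is output (this data is a stored invalid-block address). If the bad-block address output by the bad-block manager is greater than the block address currently being accessed, the current address is a valid block. (Bad blocks are always a minority; if the bad block is not the first block it lies further on, so its address must be greater than the current valid block address.)
Command 80H or 81H.
Then write the five address cycles.
Then wait for one tADL period.
Then transfer one page of 4K data from the FIFO into the flash.
The FIFO read clock is set to 1. The FIFO write address must be greater than the read address (the FIFO read address here is the current flash write address). Flash write is asserted and the flash data port outputs the current FIFO data. Once one byte has been written, the FIFO read clock is set to 0; in other words, data can be written from the FIFO into the flash only while the FIFO read clock is 1. The FIFO read address is incremented by 1, the 4K byte counter is incremented by 1, the flash write latch is closed, and the FIFO read clock is set to 1 again. Then check whether all 4K bytes have been written: if not, keep writing; if so, issue:
Command 11H or 10H.
Write is deasserted and command is deasserted. For plane 0, wait tDBSY and then increment the plane by 1; for plane 1, increment the plane by 1 directly. If the increment has not reached layer 111 (binary, i.e. layer 7), continue by writing the address of the same page in that layer and reading the data for that page. If layer 111 has been reached, increment the page and repeat the page writes for layers 0 to 7. When all 64 pages have been written, increment the block address by 1; after the block address is incremented, check whether the new block is an invalid block before continuing with page writes.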
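The address sequencing in the write state (layers 0 to 7 within a page, then the next page, then the next block, skipping invalid blocks) can be modelled in software. The sketch below is only a behavioural model with hypothetical names. It uses a Python set for the bad-block lookup, which is a substitute for the address-comparison scheme described above, and it does not model the command cycles, tADL, tDBSY, or the FIFO handshake.

```python
def program_order(first_block, block_count, bad_blocks,
                  layers=8, pages_per_block=64):
    """Yield (block, page, layer) in the order the pages are written."""
    bad = set(bad_blocks)
    for block in range(first_block, first_block + block_count):
        if block in bad:
            continue                       # invalid block: skip it entirely
        for page in range(pages_per_block):
            for layer in range(layers):    # layers 0..7 before the next page
                yield block, page, layer


# Small example: 3 blocks, block 1 is bad, 2 layers, 2 pages per block.
order = list(program_order(0, 3, bad_blocks=[1], layers=2, pages_per_block=2))
print(order)
# [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1),
#  (2, 0, 0), (2, 0, 1), (2, 1, 0), (2, 1, 1)]
```

With the defaults (8 layers, 64 pages per block) each valid block produces 512 page writes before the block address advances.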