I promised a fuller answer. Sorry it is later than I expected:
- I got tied up with another problem,
- Technique 1, which I was expecting to recommend, did not work as I expected so I added some other techniques which are much more satisfactory.
As I said in my first comment:
Public Function ReDimPreserve(aArrayToPreserve, nNewFirstUBound, nNewLastUBound)
causes aArrayToPreserve to have the default type of Variant. This does not match:
Dim Years() As String
As you discovered, redefining Years as a Variant fixes the problem. An alternative approach would be to amend the declaration of ReDimPreserve so aArrayToPreserve is an array of type String. I would not recommend that approach since you are storing both strings and numbers in the array. A Variant array will handle either strings or numbers, while a String array can only handle numbers by converting them to strings for storage and back to numbers for processing.
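The difference between the two element types can be seen in a small sketch. The function name ElementTypeAfterStore is hypothetical and not part of your macro; it simply stores a number in either a String array or a Variant array and reports what VarType finds there afterwards.

```vba
Option Explicit

' Hypothetical helper: stores the number 1990 in a one-element array and
' returns the VarType of what is actually held afterwards.
Public Function ElementTypeAfterStore(ByVal UseStringArray As Boolean) As VbVarType

  Dim AsString(1 To 1) As String
  Dim AsVariant(1 To 1) As Variant

  If UseStringArray Then
    AsString(1) = 1990          ' Silently converted to the string "1990"
    ElementTypeAfterStore = VarType(AsString(1))
  Else
    AsVariant(1) = 1990         ' Kept as a number (Integer for this literal)
    ElementTypeAfterStore = VarType(AsVariant(1))
  End If

End Function
```

With the String array the result is vbString; with the Variant array it is vbInteger, so no conversion is needed before using the value as a row number.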
I tried your macro with different quantities of data and different amendments and timed the runs:
Rows of data   Amendment                             Duration of run
       3,500   Years() changed to Variant               4.99 seconds
      35,000   Years() changed to Variant                502 seconds
      35,000   aArrayToPreserve changed to String        656 seconds
As I said in my second comment, ReDim Preserve is slow for both the inbuilt method and the VBA routine you found. For every call it must:
- find space for the new larger array
- copy the data from the old array to the new
- release the old array for garbage collection.
ReDim Preserve is a very useful method but it must be used with extreme care. Sometimes I find that sizing an array to the maximum at the beginning and using ReDim Preserve to cut the array down to the used size at the end is a better technique. The best techniques shown below determine the number of entries required before sizing the array.
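As a minimal sketch of the "size to the maximum, then cut down once" idea, the hypothetical function below keeps the even numbers from 1 to MaxValue. The function name and the even-number test are assumptions chosen only to keep the example self-contained; the point is that ReDim Preserve is called once rather than once per entry.

```vba
Option Explicit

' Hypothetical example: keep the even numbers from 1 to MaxValue.
' The array is sized for the worst case once and trimmed once at the end.
Public Function EvenNumbersUpTo(ByVal MaxValue As Long) As Long()

  Dim Found() As Long
  Dim NumFound As Long
  Dim ValueCrnt As Long

  If MaxValue < 2 Then
    Exit Function                 ' Nothing to keep; result stays unallocated
  End If

  ReDim Found(1 To MaxValue)      ' Worst case: every value is kept

  For ValueCrnt = 1 To MaxValue
    If ValueCrnt Mod 2 = 0 Then
      NumFound = NumFound + 1
      Found(NumFound) = ValueCrnt
    End If
  Next

  ReDim Preserve Found(1 To NumFound)   ' Single resize down to the used size
  EvenNumbersUpTo = Found

End Function
```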
At the bottom of your routine, I added:
For i = LBound(Years, 1) To LBound(Years, 1) + 9
  Debug.Print Years(i, 0) & "|" & Years(i, 1) & "|" & Years(i, 2) & "|" & Years(i, 3)
Next
For i = UBound(Years, 1) - 9 To UBound(Years, 1)
  Debug.Print Years(i, 0) & "|" & Years(i, 1) & "|" & Years(i, 2) & "|" & Years(i, 3)
Next
This resulted in the following being output to the Immediate Window:
|||
|AAAA|2|2
|AAAB|3|4
|AAAC|5|7
|AAAD|8|11
|AAAE|12|16
|AAAF|17|22
|AAAG|23|23
|AAAH|24|25
|AAAI|26|28
|AOUJ|34973|34976
|AOUK|34977|34981
|AOUL|34982|34987
|AOUM|34988|34988
|AOUN|34989|34990
|AOUO|34991|34993
|AOUP|34994|34997
|AOUQ|34998|35002
|AOUR|35003|
|||
Since you have called the array Years, I doubt my string values are anything like yours. This does not matter. What matters is that I doubt this output is exactly what you wanted.
If you write:
ReDim Years(1, 3)
The lower bounds are set to the value specified by the Option Base statement or zero if there is no Option Base statement. Both of your dimensions have a lower bound of zero which you do not use. This is the reason for the “|||” at the top. There is another “|||” at the end which means you are creating a final row which you are not using. The final used row does not have an end row which I assume is a mistake.
When I can divide a routine into steps, I always validate the result of one step before advancing to the next. That way, I know any problems are within the current step and not the result of an error in an earlier step. I use Debug.Print to output to the Immediate Window most of the time. Only if I want to output a lot of diagnostic information will I write to a text file. Either way, blocks of code like mine are a significant aid to rapid debugging of a macro.
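If you find yourself repeating diagnostic blocks like these, one option is a small helper. This is only a sketch: RowAsText is a hypothetical name, and it assumes rows are in the first dimension as in your Years array.

```vba
Option Explicit

' Hypothetical helper: joins every column of one row of a two-dimensional
' array (rows in the first dimension) with "|" ready for Debug.Print.
Public Function RowAsText(ByRef Arr As Variant, ByVal RowCrnt As Long) As String

  Dim ColCrnt As Long
  Dim Joined As String

  For ColCrnt = LBound(Arr, 2) To UBound(Arr, 2)
    If ColCrnt > LBound(Arr, 2) Then
      Joined = Joined & "|"
    End If
    Joined = Joined & Arr(RowCrnt, ColCrnt)   ' Unused (Empty) cells add nothing
  Next

  RowAsText = Joined

End Function
```

A diagnostic loop then reduces to Debug.Print RowAsText(Years, i), and an unused row shows up as "|||" exactly as in the output above.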
I would never write ReDim Years(1, 3). I always specify the lower bound so as to be absolutely clear. VBA is the only language I know where you can specify any value for the lower bound (providing it is not greater than the upper bound) so I will specify non-standard values if it is helpful for a particular problem. In this case, I see no advantage to a lower bound other than one so that is what I have used.
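The effect of omitting the lower bound can be checked with a short sketch. The function name LowerBoundsDemo and the two array names are hypothetical, and the sketch assumes the module has no Option Base statement.

```vba
Option Explicit

' Assumes there is no Option Base statement in this module, so the
' default lower bound is 0.
Public Function LowerBoundsDemo() As String

  Dim BoundsDefault() As Variant
  Dim BoundsStated() As Variant

  ReDim BoundsDefault(1, 3)            ' Really 0 To 1, 0 To 3: 2 x 4 elements
  ReDim BoundsStated(1 To 1, 1 To 3)   ' Exactly 1 x 3 elements

  LowerBoundsDemo = LBound(BoundsDefault, 1) & "," & LBound(BoundsDefault, 2) & "|" & _
                    LBound(BoundsStated, 1) & "," & LBound(BoundsStated, 2)

End Function
```

The function returns "0,0|1,1": the first array silently gains a row 0 and a column 0, which is where unused entries come from.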
With two-dimensional arrays it is conventional to have columns as the first dimension and rows as the second. One exception is for arrays read from or to be written to a worksheet, for which the dimensions are the other way round. You have rows as the first dimension. ReDim Preserve can only adjust the last dimension, so if you had used the conventional sequence you could have used the ReDim Preserve method, thereby avoiding the ReDimPreserve function and the problem of non-matching types.
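A minimal sketch of that point, with a hypothetical function name and made-up values: because rows are the last dimension, the built-in ReDim Preserve can add a row and the existing entries survive.

```vba
Option Explicit

' Hypothetical example: columns first, rows second, so the built-in
' ReDim Preserve can add a row because rows are the last dimension.
Public Function GrowByOneRow() As Long

  Dim Table() As Variant

  ReDim Table(1 To 3, 1 To 1)
  Table(1, 1) = "AAAA"
  Table(2, 1) = 2
  Table(3, 1) = 2

  ReDim Preserve Table(1 To 3, 1 To 2)   ' Only the last upper bound changes
  Table(1, 2) = "AAAB"
  Table(2, 2) = 3
  Table(3, 2) = 4

  ' Return the new row count only if row 1 survived the resize
  If Table(1, 1) = "AAAA" Then
    GrowByOneRow = UBound(Table, 2)
  End If

End Function
```

Trying to change the first dimension in the same way, for example ReDim Preserve Table(1 To 4, 1 To 2), fails with a run-time error, which is why a rows-first array needs a routine like the one you found.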
Technique 1
I expected this to be the fastest technique. Experts advise us to avoid “re-inventing the wheel”. That is, if Excel has a routine that will do what you want, don’t code an alternative in VBA. However, I have found a number of examples where this is not true and I discovered this technique was one of them.
The obvious technique here is to use Filter, then create a range of the visible rows using SpecialCells and finally process each row in this range. I have used this technique very successfully to meet other requirements but not here.
I did not know the VBA to select unique rows so started the macro recorder and filtered my test data from the keyboard to get:
Range("A1:A35000").AdvancedFilter Action:=xlFilterInPlace, Unique:=True
My past uses of Filter have all been converted by the macro recorder to AutoFilter, which I have found to give acceptable performance. This one was converted to AdvancedFilter, which took 20 seconds both from the keyboard and from VBA. I do not know why it is so slow.
The second problem was that:
Set RngUnique = .Range(.Cells(1, 1), .Cells(RowLast, 1)) _
.SpecialCells(xlCellTypeVisible)
was rejected as “too complicated”.
Not being able to get the visible rows as a range means the benefits of Filter are not really available. I have counted the visible rows to simulate having RngUnique.Rows.Count. This shows the technique which has always worked with AutoFilter. If AdvancedFilter had reported the unique rows in an acceptable time I might have investigated this problem but under the circumstances it does not seem worth the effort.
The macro demonstrating this technique is:
Option Explicit
Sub Technique1()

  ' * Avoid using meaningless names like i.  Giving every variable a meaningful
  '   name is helpful during development and even more helpful when you return
  '   to the macro in six months for maintenance.
  ' * My naming convention is to use a sequence of keywords.  The first keyword
  '   identifies what type of data the variable holds.  So "Row" means it holds
  '   a row number.  Each subsequent keyword narrows the scope.  "RowSb" is a
  '   row of the worksheet "Simple Boundary" and "RowYears" is a row of the Years
  '   array.  "RowSbCrnt" is the current row of the worksheet "Simple Boundary".
  ' * I can look at macros I wrote years ago and know what all the variables are.
  '   You may not like my convention.  Fine, develop your own but do not
  '   try programming with random names.
  ' * Avoid data type Integer which specifies a 16-bit whole number and requires
  '   special processing on 32 and 64-bit computers.  Long is now the recommended
  '   data type for whole numbers.

  Dim NumRowsVisible As Long
  Dim RowSbCrnt As Long
  Dim RowSbLast As Long
  Dim RowYearsCrnt As Long
  Dim TimeStart As Double
  Dim Years() As Variant

  TimeStart = Timer   ' Get the time as seconds since midnight to nearest .001
                      ' of a second

  ' This can save significant amounts of time if the macro amends the
  ' screen or switches between workbooks.
  Application.ScreenUpdating = False

  With Worksheets("Simple Boundary")

    ' Rows.Count avoids having to guess how many rows will be used
    RowSbLast = .Cells(Rows.Count, "A").End(xlUp).Row

    ' Hide non-unique rows
    With .Range(.Cells(1, 1), .Cells(RowSbLast, 1))
      .AdvancedFilter Action:=xlFilterInPlace, Unique:=True
    End With

    ' Count number of unique rows.
    ' It is difficult to time small pieces of code because OS routines
    ' can execute at any time.  However, this count takes less than .5
    ' of a second with 35,000 rows.
    NumRowsVisible = 0
    For RowSbCrnt = 2 To RowSbLast
      If Not .Rows(RowSbCrnt).Hidden Then
        NumRowsVisible = NumRowsVisible + 1
      End If
    Next

    ' Use count to ReDim array to final size.
    ReDim Years(1 To 3, 1 To NumRowsVisible)

    RowYearsCrnt = 1
    Years(1, RowYearsCrnt) = .Cells(2, 1).Value
    Years(2, RowYearsCrnt) = 2

    For RowSbCrnt = 3 To RowSbLast
      If Not .Rows(RowSbCrnt).Hidden Then
        Years(3, RowYearsCrnt) = RowSbCrnt - 1
        RowYearsCrnt = RowYearsCrnt + 1
        Years(1, RowYearsCrnt) = .Cells(RowSbCrnt, 1).Value
        Years(2, RowYearsCrnt) = RowSbCrnt
      End If
    Next

    ' Record final row for final string
    Years(3, RowYearsCrnt) = RowSbLast

    .ShowAllData    ' Clear AdvancedFilter

  End With

  Application.ScreenUpdating = True

  Debug.Print "Duration: " & Format(Timer - TimeStart, "#,##0.000")

  ' Output diagnostics
  For RowYearsCrnt = 1 To 9
    Debug.Print Years(1, RowYearsCrnt) & "|" & _
                Years(2, RowYearsCrnt) & "|" & _
                Years(3, RowYearsCrnt) & "|"
  Next
  ' Note that rows are now in the second dimension hence the 2 in UBound(Years, 2)
  For RowYearsCrnt = UBound(Years, 2) - 9 To UBound(Years, 2)
    Debug.Print Years(1, RowYearsCrnt) & "|" & _
                Years(2, RowYearsCrnt) & "|" & _
                Years(3, RowYearsCrnt) & "|"
  Next

End Sub
The output to the Immediate Window is:
Duration: 20.570
AAAA|2|2|
AAAB|3|4|
AAAC|5|7|
AAAD|8|11|
AAAE|12|16|
AAAF|17|22|
AAAG|23|23|
AAAH|24|25|
AAAI|26|28|
AOUI|34970|34972|
AOUJ|34973|34976|
AOUK|34977|34981|
AOUL|34982|34987|
AOUM|34988|34988|
AOUN|34989|34990|
AOUO|34991|34993|
AOUP|34994|34997|
AOUQ|34998|35002|
AOUR|35003|35008|
As you can see, the last row is correct. A duration of 20 seconds is better than the 8 minutes of your technique but I am sure we can do better.
Technique 2
The next macro is similar to the last one but it counts the unique rows rather than using AdvancedFilter to hide the non-unique rows. This macro has a duration of 1.5 seconds with 35,000 rows. This demonstrates that counting how many rows are required for an array in a first pass of the data is a viable approach. The diagnostic output from this macro is the same as above.
Sub Technique2()

  Dim NumRowsUnique As Long
  Dim RowSbCrnt As Long
  Dim RowSbLast As Long
  Dim RowYearsCrnt As Long
  Dim TimeStart As Double
  Dim Years() As Variant

  TimeStart = Timer   ' Get the time as seconds since midnight to nearest .001
                      ' of a second

  With Worksheets("Simple Boundary")

    RowSbLast = .Cells(Rows.Count, "A").End(xlUp).Row

    ' Count number of unique rows.
    ' Assume all data rows are unique until find otherwise
    NumRowsUnique = RowSbLast - 1
    For RowSbCrnt = 3 To RowSbLast
      If .Cells(RowSbCrnt, 1).Value = .Cells(RowSbCrnt - 1, 1).Value Then
        NumRowsUnique = NumRowsUnique - 1
      End If
    Next

    ' * Use count to ReDim array to final size.
    ' * Note that I have defined the columns as the first dimension and rows
    '   as the second dimension to match convention.  Had I wished, this would
    '   have allowed me to use the standard ReDim Preserve which can only
    '   adjust the last dimension.  However, this does not match the
    '   syntax of Cells which has the row first.  It may have been better to
    '   maintain your sequence so the two sequences were the same.
    ReDim Years(1 To 3, 1 To NumRowsUnique)

    RowYearsCrnt = 1
    Years(1, RowYearsCrnt) = .Cells(2, 1).Value
    Years(2, RowYearsCrnt) = 2

    For RowSbCrnt = 3 To RowSbLast
      If .Cells(RowSbCrnt, 1).Value <> .Cells(RowSbCrnt - 1, 1).Value Then
        Years(3, RowYearsCrnt) = RowSbCrnt - 1
        RowYearsCrnt = RowYearsCrnt + 1
        Years(1, RowYearsCrnt) = .Cells(RowSbCrnt, 1).Value
        Years(2, RowYearsCrnt) = RowSbCrnt
      End If
    Next

    ' Record final row for final string
    Years(3, RowYearsCrnt) = RowSbLast

  End With

  Debug.Print "Duration: " & Format(Timer - TimeStart, "#,##0.000")

  ' Output diagnostics
  For RowYearsCrnt = 1 To 9
    Debug.Print Years(1, RowYearsCrnt) & "|" & _
                Years(2, RowYearsCrnt) & "|" & _
                Years(3, RowYearsCrnt) & "|"
  Next
  ' Note that rows are now in the second dimension hence the 2 in UBound(Years, 2)
  For RowYearsCrnt = UBound(Years, 2) - 9 To UBound(Years, 2)
    Debug.Print Years(1, RowYearsCrnt) & "|" & _
                Years(2, RowYearsCrnt) & "|" & _
                Years(3, RowYearsCrnt) & "|"
  Next

End Sub
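The count-then-fill approach does not depend on the worksheet. As a minimal sketch, assuming a non-empty one-dimensional array in which equal values are adjacent, the hypothetical function RunsOf below makes the same two passes over an in-memory array; the function and parameter names are mine and not part of your macro.

```vba
Option Explicit

' Hypothetical helper: Items is a non-empty one-dimensional array in which
' equal values are adjacent.  Returns a 3 x N Variant array holding, for
' each run, the value, its first index and its last index.
Public Function RunsOf(ByRef Items As Variant) As Variant

  Dim IdxCrnt As Long
  Dim NumRuns As Long
  Dim RunCrnt As Long
  Dim Runs() As Variant

  ' Pass 1: count the runs so the array can be sized once.
  NumRuns = 1
  For IdxCrnt = LBound(Items) + 1 To UBound(Items)
    If Items(IdxCrnt) <> Items(IdxCrnt - 1) Then
      NumRuns = NumRuns + 1
    End If
  Next

  ReDim Runs(1 To 3, 1 To NumRuns)

  ' Pass 2: record where each run starts and ends.
  RunCrnt = 1
  Runs(1, RunCrnt) = Items(LBound(Items))
  Runs(2, RunCrnt) = LBound(Items)
  For IdxCrnt = LBound(Items) + 1 To UBound(Items)
    If Items(IdxCrnt) <> Items(IdxCrnt - 1) Then
      Runs(3, RunCrnt) = IdxCrnt - 1
      RunCrnt = RunCrnt + 1
      Runs(1, RunCrnt) = Items(IdxCrnt)
      Runs(2, RunCrnt) = IdxCrnt
    End If
  Next
  Runs(3, RunCrnt) = UBound(Items)   ' Close the final run

  RunsOf = Runs

End Function
```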
Technique 3
The next macro is only slightly changed from the last.
Firstly, I have replaced the literals used to identify the column numbers in worksheets and arrays with constants such as:
Const ColYrEnd As Long = 3
Under my naming convention, ColYrEnd = Column of Years array holding range End, hence:
Years(ColYrEnd, RowYearsCrnt) = RowCvCrnt - 1
instead of Years(3, RowYearsCrnt) = RowCvCrnt - 1
This makes no difference to the compiled code but makes the source code easier to understand because you do not have to remember what columns 1, 2 and 3 hold. More importantly, if you ever have to rearrange the columns, updating the constants is the only change required. If you ever have to search through a long macro replacing every use of 2 as a column number (while ignoring any other use of 2) with 5, you will know why this is important.
Secondly, I have used:
ColValues = .Range(.Cells(1, ColSbYear), _
.Cells(RowSbLast, ColSbYear)).Value
to import column 1 to an array. The code that read the values from the worksheet now reads them from this array. Array access is much faster than worksheet access so this reduces the runtime from 1.5 seconds to .07 seconds.
The revised code is:
Sub Technique3()

  Const ColCvYear As Long = 1
  Const ColSbYear As Long = 1
  Const ColYrYear As Long = 1
  Const ColYrStart As Long = 2
  Const ColYrEnd As Long = 3
  Const RowSbDataFirst As Long = 2
  Const RowCvDataFirst As Long = 2

  Dim ColValues As Variant
  Dim NumRowsUnique As Long
  Dim RowCvCrnt As Long
  Dim RowSbCrnt As Long
  Dim RowSbLast As Long
  Dim RowYearsCrnt As Long
  Dim TimeStart As Double
  Dim Years() As Variant

  TimeStart = Timer   ' Get the time as seconds since midnight to nearest .001
                      ' of a second

  With Worksheets("Simple Boundary")

    RowSbLast = .Cells(Rows.Count, ColSbYear).End(xlUp).Row

    ColValues = .Range(.Cells(1, ColSbYear), _
                       .Cells(RowSbLast, ColSbYear)).Value
    ' * The above statement imports all the data from column 1 as a two dimensional
    '   array into a Variant.  The Variant is then accessed as though it is an array.
    ' * The first dimension has one entry per row, the second dimension has one entry
    '   per column which is one in this case.  Both dimensions will have a lower bound
    '   of one even if the first row or column loaded is not one.

  End With

  ' Count number of unique rows.
  ' Assume all data rows are unique until find otherwise
  NumRowsUnique = UBound(ColValues, 1) - 1
  For RowCvCrnt = RowCvDataFirst + 1 To UBound(ColValues, 1)
    If ColValues(RowCvCrnt, ColCvYear) = ColValues(RowCvCrnt - 1, ColCvYear) Then
      NumRowsUnique = NumRowsUnique - 1
    End If
  Next

  ' I mentioned earlier that I was unsure if having rows and columns in the
  ' conventional sequence was correct.  I am even less sure here where array
  ' ColValues has been loaded from a worksheet and the rows and columns are
  ' not in the conventional sequence.
  ReDim Years(1 To 3, 1 To NumRowsUnique)

  RowYearsCrnt = 1
  Years(ColYrYear, RowYearsCrnt) = ColValues(RowCvDataFirst, ColCvYear)
  Years(ColYrStart, RowYearsCrnt) = RowCvDataFirst

  For RowCvCrnt = RowCvDataFirst + 1 To UBound(ColValues, 1)
    If ColValues(RowCvCrnt, ColCvYear) <> ColValues(RowCvCrnt - 1, ColCvYear) Then
      Years(ColYrEnd, RowYearsCrnt) = RowCvCrnt - 1
      RowYearsCrnt = RowYearsCrnt + 1
      Years(ColYrYear, RowYearsCrnt) = ColValues(RowCvCrnt, ColCvYear)
      Years(ColYrStart, RowYearsCrnt) = RowCvCrnt
    End If
  Next

  ' Record final row for final string
  Years(ColYrEnd, RowYearsCrnt) = UBound(ColValues, 1)

  Debug.Print "Duration: " & Format(Timer - TimeStart, "#,##0.000")

  ' Output diagnostics
  For RowYearsCrnt = 1 To 9
    Debug.Print Years(ColYrYear, RowYearsCrnt) & "|" & _
                Years(ColYrStart, RowYearsCrnt) & "|" & _
                Years(ColYrEnd, RowYearsCrnt) & "|"
  Next
  ' Note that rows are now in the second dimension hence the 2 in UBound(Years, 2)
  For RowYearsCrnt = UBound(Years, 2) - 9 To UBound(Years, 2)
    Debug.Print Years(ColYrYear, RowYearsCrnt) & "|" & _
                Years(ColYrStart, RowYearsCrnt) & "|" & _
                Years(ColYrEnd, RowYearsCrnt) & "|"
  Next

End Sub
Other techniques
I considered introducing other techniques but I decided they were not useful for this requirement. Also, this answer is already long enough. I have provided much for you to think about and more would just be overload. As stated above I have reduced the run time for 35,000 rows from 8 minutes to 20 seconds to 1.5 seconds to .07 seconds.
Work slowly through my macros. I hope I have provided adequate explanation of what each is doing. Once you know a statement exists, it is generally easy to look it up, so I have not explained the individual statements in much detail. Come back with questions as necessary.