|
Working with Excel and VBA |
An incredible amount can be achieved in Excel using native Excel technology only. Often though to get that extra bit of functionality or usability, VBA (or some other programming) is needed.
This section provides a brief overview, and some advice for working with VBA.
Simons VBA Best Practice Slides are here. |
VBA Performance
There seems generally to be some confusion about what VBA is and what its uses and limitations are.
VBA is based on the old (no longer fully supported) Visual Basic 6.0. It is powerful, fast and flexible. It is partly compiled rather than purely interpreted as some people think.
All our experience suggests there is NO significant performance gain in moving from VBA to VB6, for example for COM/Automation add-ins, but it is sure to depend on exactly what you are doing. If anyone has an example of VB6 being faster than VBA that we can test contact us. |
There are 2 aspects to performance in programming Excel
- The cost of interacting with Excel (eg Range.Value)
- The language performance itself (eg someValue = someValue + 1)
Performance in both of these areas is very similar between VBA and VB6 COM Add-ins. COM add-ins (DLLs) run in-process with Excel, just like VBA so the overhead of working with Excel objects is similar. VBA and VB6 are basically the same product and execution speed of the actual language seems to be similar. It is also possible to create an add-in that runs out of process using VB6. Stability is improved in this case as Excel is protected from problems in the add-in, but communicating between processes is expensive. |
COM (Component Object Model), also known as Activex, is a powerful and flexible (and fairly safe) way for pieces of software to communicate with each other. Unfortunately its quite expensive performance wise. Excel, VBA, VB6 etc use this approach to communicate.
Excel has another interface. Its at a lower level and more complex to work with, but it is dramatically faster - its called the Excel C API (Application Programming Interface). Unfortunately VB cannot work with this interface, and generally C or C++ is used instead. C languages also execute many tasks significantly faster than VB languages, so this route has a double benefit. The main costs are increased development time, and increased (?) developer skills, certainly different skills anyway. One drawback of the C API approach is it that it is not a fault tolerant as the VB/COM approach, the lack of checking that improves performance, also means errors are likely to crash Excel. |
COM VBA performance
There are 2 ways to use COM components from VBA, Early Bound and Late Bound.
Early binding (dim x as Areferenced.ObjectType) is generally considered better than late binding (dim x as object; x= CreateObject("AnUnreferenced.ObjectType")
the most common reasons given being performance, error/syntax checking and intellisense. Or testing suggests the language chosen to write the COM component may well have more impact than the binding choice. To test we built a simple COM server in VB and in C++ using ATL (the Active Template Library).
There are 2 aspects to COM performance, the time to create objects and time to call their methods.
This was an attempt to measure the performance of using different COM calling approaches rather than the execution speed of any particular language. As such we chose a very simple function:
Cpp code:
STDMETHODIMP Cautomtest::SpeedTest1(long theValue, long* theResult)
{
*theResult = theValue;
return S_OK;
}
VB code:
Public Function SpeedTest1(ByVal theValue As Long, theResult As Long) As Long
theResult = theValue
End Function
This takes a long as the first parameter and returns it in the second. This approach was used as the COM standard in C++ is to use the return value to indicate success/failure. There were 5 versions of the function (SpeedTest1-SpeedTest5) to create a worthwhile function table. As a matter of interest calling the first function in the table will be fractionally faster than calling the last one. And the longer the function list the worse the performance of the later functions. Think of it as a long Select Case statement, the sooner it finds the right one the less cases it has to try.
|
We have done 2 sets of tests. The first, original findings are included below with conclusions drawn at that time, but it became apparent that this test, whilst not unreasonable, could be made more meaningful.
To do that we split out the creation aspect and the calling aspect and measured them separately, here are the results. |
| Results for 500,000 iterations |
Relative function call performance |
|
|
Creation (Seconds) |
5 function calls (Seconds) |
% of Early bound C++ |
Factor v VBA |
Factor v win 32 |
| Early Bound |
|
|
|
|
|
| |
VB |
9 |
0.3 |
1.8 |
0.6 |
1.5 |
| |
C++ |
2 |
0.17 |
1.0 |
0.3 |
0.9 |
| Late Bound |
|
|
|
|
|
| |
VB |
11 |
9 |
52.9 |
18.0 |
45.0 |
| |
C++ |
4 |
3 |
17.6 |
6.0 |
15.0 |
| Others |
|
|
|
|
|
| Win32 C XLL |
n/a |
0.2 |
1.2 |
0.4 |
1.0 |
| Win32 C DLL |
n/a |
0.2 |
1.2 |
0.4 |
1.0 |
| Excel VBA |
n/a |
0.5 |
2.9 |
1.0 |
2.5 |
|
This shows how expensive it is to create COM objects, in all cases it took longer to create the object than to call the 5 functions. In the case of early bound objects, up to 10 times as long. It took longer to create the late bound objects than the early ones, in the case of C++ twice as long.
The calling times are interesting with late bound being in the order of 20 times slower than early. The message is clear if you go late bound you pay twice - at creation time and at use time.
The initial tests below seemed to suggest the late bound C++ was faster than early bound VB, it can be seen that this is to do with the better creation performance of ATL objects over VB, because actually early bound VB function call performance is 10x better than late bound C++.
Interestingly the COM calls to an early bound C++ object would seem to be faster than win32, assuming the object was not created many times. Not really sure why that would be, possibly registry optimisations compared to the file system? Also no real idea why the early bound creation performance for the VB component was almost 5x slower than the ATL equivalent.
The biggest surprise (for us) was that VB6 function performance was actually better than VBA. We have never seen this is the field, even though it is almost twice as fast. Of course the function call benefit is offset by the object creation cost. It would seem though, that there could be some benefit in moving complex routines to VB6 in certain cases. Well except that VB6 will be retired next year, and for that reason we probably won't do further testing of this option.
In some further testing not covered here, the relative speed of the individual functions SpeedTest1-5 was compared. Well actually a different dll with 10 functions in (imaginatively called SpeedTest1 - SpeedTest10) was used to increase the function table position effect. For late bound calls, 500k calls to SpeedTest10 took twice as long as calls to SpeedTest1. I.e if you are using late binding, get your most frequently used functions to the the top of the function table. It also seems to take slightly longer to create bigger objects, no surprise there, but worth remembering.
The difference in performance of the two languages is still a surprise, with C++ significantly better in all areas. The xll version has the same performance as the win 32 dll, but is easier to manage as it registers itself with Excel on opening. This means you don't need to mess with the VBA to try it out.
A second set of workbooks with different test VBA are here. For fun try messing with the function calls (change all to SpeedTest1 and then SpeedTest5) it makes a significant difference with the late bound versions. As below all the activex dlls will need regsvr32-ing. This set also includes an xll version.
|
| Test Version 1 results These are the original findings and conclusions. Valid, but maybe potentially misleading - very dependent on create/call mix. ie if you are not creating a component calling 5 functions and destroying it, your performance will vary. |
| We then wrote a simple VBA client to call it multiple times using early binding and late binding. For completeness we also included a pure VBA solution and a win32 dll version. Note this is the complete cycle - create object, call some (5) methods and destroy it. The results were quite surprising: |
| Results for creating object and calling 5 functions 500,000 times |
|
|
Seconds |
% of Early bound C++ |
Factor v VBA |
Factor v win 32 |
| Early Bound |
|
|
|
|
| |
VB |
10 |
500% |
20.0 |
50.0 |
| |
C++ |
2 |
100% |
4.0 |
10.0 |
| Late Bound |
|
|
|
|
| |
VB |
21 |
1050% |
42.0 |
105.0 |
| |
C++ |
7 |
350% |
14.0 |
35.0 |
| Others |
|
|
|
|
| Win32 C DLL |
0.2 |
10% |
0.4 |
1.0 |
| Excel VBA |
0.5 |
25% |
1.0 |
2.5 |
|
So COM is slower than win32, we knew that, its safer and more flexible too. Pure VBA is 20 times faster than using an early bound VB6 COM dll, this means your code must be doing something pretty intensive to recover the time cost of calling out of VBA to VB6.
The difference between early and late binding was no real surprise, but its nice to be able to put a figure on it. Its roughly twice as slow to use late binding over early binding. The VBA loop overhead in all cases was approx 0.006 seconds, so a fairly insignificant proportion.
The big surprise was the C++ version, calling a C++ COM component late bound is 3 times faster than calling VB the same way, and 30% faster than calling VB early bound. I.e. C++'s worst is 30% better than VB's best! And remember this test had very minimal calculation. Also note this testing was just VBA to external code, there was no Excel element, and no worksheet formula element.
To keep these results in perspective, it important to remember these timing differences are over 2.5 million function calls. If the system only makes a few thousand calls then for all but the most time critical applications this interface performance is probably not the most important factor. If the app is that time critical then something other than Excel/VBA may be more appropriate.
The test workbooks and the dlls are here, note the COM ones will need Regsvr32'ing, and the path to the win32 version will need changing in the VBA.
Of course, with any performance testing you should test for yourself in your own environment, to be sure the results are valid for your specific project and application. |
| If you'd like to discuss C or C++ based Excel work, please get in touch. |
VBA Security
There are two main aspects to security
- Potential loss of intellectual property if people access source code without permission.
- Potential danger from a malicious user subverting the expected behaviour for their own (possibly disastrous) purposes.
VBA security is fairly poor, the code is not compiled, and the source is available in the excel file. The password protection is pretty easy to circumvent. This reflects Excel/VBA target as an end user application. Code security can be enhanced by moving to COM components or C, if there is a real need. |
.Net for Excel
Hmmm. Maybe one day. You can program Excel with .net, but generally would probably struggle to justify it on any business needs basis. Our current general recommendation is use VBA if possible for its simple development and deployment, if performance or security are an issue, use C and create an XLL add-in. Revisit in 2008/2009 to see if the then current versions of Excel and Visual Studio offer a better .net (or its replacement) story. There is some fairly in-depth information on the .net and Excel page. |
Codematic VBA Code Convention for Excel VBA
(Note this advice is specific to Excel VBA in business applications. Other languages and other applications have very different requirements).
Pretty simple really, easy to remember and easy to implement:
- Always use Option Explicit, usually use Option Private Module.
- Give Module level variables a scope qualifier prefix g_, p_ or m_ (Global (all open workbooks), Public (this workbook), or Module). Scope everything as tightly as possible. Use procedure level scope where possible and pass values by parameters.
- Use meaningful names for all procedures and variables. Do not bother with convoluted data type prefixes, that is inappropriate for business level applications, makes the code harder to read and adds very little of value. The compiler will pick up any obvious data type errors. Used mixed case descriptive names - if the name is too long then probably the procedure is too.
- Procedures should fit on one screen - ie be 40-50 lines long maximum.
- Avoid most comments - make the executable code meaningful and simple instead.
- Avoid magic numbers and strings - use constants.
- Never comment what the code does - that should be crystal clear from the code, comment WHY something is done, especially if it is unusual. Add a couple of sentences to provide an overview of a module or class.
- Pass parameters ByVal (ByRef is the default) - only use ByRef where you intend to modify the parameter and pass the change back to the caller.
- Avoid Application.Run where possible as it breaks the error handling stack.
- Use additional tools. See links page for some suggestions.
- Be aware of other options if VBA appears inappropriate for certain aspects of the project.
- Vary any rules you the developer feel do not promote clarity, simplicity and safety.
- If you really want to write high quality code read Code Complete 2 by Steve McConnell - and then apply it.
- Fuller info is in Simons VBA Best Practice slides
|
Please contact us with any questions.
|
| |
|