While checking out Grant Skinner’s new tweening engine, gTween, I was bothered by one small phrase…
gTween is a small (4.5kb), fast (1500 instances, 0.5s duration, ~25fps), instance based tweening class, with a huge number of options and capabilities.
The definition of ‘fast’ in terms of Flash Player performance is something of a mystery. We’re looking for a high frame rate, I guess? Lots of things on stage? Total time of operations? But frame rate and number of instances don’t really tell the whole story. There are a number of factors that make Flash Player performance a very difficult thing to measure.
- Flash Player performance varies based on the speed of the viewer’s computer.
That’s nothing new. All apps deal with this. However, Flash Player has these added complications.
- Flash Player performance varies based on what version of the player is being used.
- Flash Player performance varies based on the browser in which it is embedded.
- The browsers’ Flash Player runs at a different speed than the desktop versions (browsers seem to have a speed cap around 50 or 60 fps, while standalone versions do not).
- Loading times for external assets must sometimes be taken into account.
- Framerates can vary based on the set framerate of the Flash app. Rumored ‘magic framerates’ may affect this as well.
- Flash Player can sometimes hang, crash, or self-destruct if too many processes are going on at once.
- Flash Player 10’s support for video hardware should complicate things further (although it will probably make our lives easier in the long run).
As far as I know, there is no single standard for testing the performance of Flash applications (unless you count the Flex Profiler, which we probably should). I’ve definitely seen benchmark tests for comparing specific operations, like tweening and 3D rendering, but they are usually very specific to the task at hand, and the results are difficult to compare because of all the variables.
Any good scientific experiment measures a control group: a test run in a relatively variable-free environment and used as a baseline measure. In the case of Flash Player performance, this would be some fixed set of functions that may produce different results on different platforms but should reflect the overall performance of each platform. For example, a suite of math and visual computations similar to the ones found here. This would allow us to describe test operations in terms of a multiplier of the baseline. In other words, if the baseline took 623ms with an average framerate of 45fps and a Tweener test took 2430ms @ 37fps, you might say that the results were 3.9*base time @ 82% frame rate (2430/623 ≈ 3.9 and 37/45 ≈ 0.82). Ideally, this number would stay more or less the same regardless of which computer or player it was running on.
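To make the arithmetic concrete, here is a minimal sketch of that baseline normalization. It is written in Python only because it is easy to run anywhere; the names `RunResult`, `relative_to_baseline` and `describe` are made up for this illustration and are not part of gTween, Tweener, or any existing benchmark suite. It assumes you have already measured the total time and average frame rate of both the baseline suite and the test on the same machine and player.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RunResult:
    """One benchmark run: total elapsed time and average frame rate.

    Hypothetical container for this sketch; how the numbers are
    collected is outside its scope.
    """
    elapsed_ms: float
    avg_fps: float


def relative_to_baseline(test: RunResult, baseline: RunResult) -> Tuple[float, float]:
    """Return (time multiplier, frame rate as a fraction of the baseline)."""
    if baseline.elapsed_ms <= 0 or baseline.avg_fps <= 0:
        raise ValueError("baseline measurements must be positive")
    time_multiplier = test.elapsed_ms / baseline.elapsed_ms
    fps_fraction = test.avg_fps / baseline.avg_fps
    return time_multiplier, fps_fraction


def describe(test: RunResult, baseline: RunResult) -> str:
    """Format a result the way the post words it."""
    time_multiplier, fps_fraction = relative_to_baseline(test, baseline)
    return f"{time_multiplier:.1f}*base time @ {fps_fraction:.0%} frame rate"


if __name__ == "__main__":
    # The figures from the paragraph above.
    baseline = RunResult(elapsed_ms=623, avg_fps=45)
    tweener = RunResult(elapsed_ms=2430, avg_fps=37)
    print(describe(tweener, baseline))  # 3.9*base time @ 82% frame rate
```

The point of reporting the two ratios rather than the raw numbers is that, if the baseline really does track overall platform speed, the ratios should be roughly comparable between machines even when the raw milliseconds are not.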
I had considered what might happen if there were a public forum for posting swf files. There, a developer could post a file and gather averaged data from the other users of the site. The price of this data would be that the developer would participate by viewing, say, 5 other projects, thus ensuring that there will be plenty of eyes on the work that’s posted.
But in my experience, the average Flash developer has a very different setup from the intended audience. So another possible solution is to create a testing environment based on the most popular machine spec. As this post at Draw Logic points out, we don’t really know that much about the hardware specs for our audiences, although we could probably assume that the average user has Flash Player 9.0 and is using a single-core, Windows-based machine and Internet Explorer.
Caleb Johnston of this blog notes:
I think that the machine-standard is a bad idea because the reality is that performance is relative. That’s how Flash works… it’s an important thing that everybody neglects. I think that people should be more aware that their program may run faster/slower on different machines.
In the end, the best practices for benchmarking in Flash Player remain elusive. What do you think about this? My questions to you, the reader:
- Broadly, what is the best way to benchmark in Flash? What techniques have you used successfully?
- What are the factors we’re interested in when measuring success? Framerate? Total time of operation? Something else? What are the best practices for measuring these factors?
- How can we overcome some of the pitfalls pointed out in this article?
I’m looking forward to reading your comments!

It’s a cool topic but one that can be way overthought, I think. In the end what matters is user experience, and I’m not sure any amount of counting objects and frames can quantify that.
Of course, I don’t mean to dismiss your whole post. It’s always good to know that library X is “faster” than library Y, or this method of doing something is faster than another method. :)
I think Keith is correct. However, there are plenty of developers out there who will determine which library others must use based on their precedence at a company or on a project. I would rather that they make an informed and appropriate decision on the matter. So valid benchmarking should be a factor in deciding which library to use. And even more importantly, understanding framerate and performance is a critical thing for any Flash developer. Perhaps I will give that topic more attention in future writings.
I might be a bit offtopic here but there are some ways to benchmark the capabilities of the programming language that powers the player.
Check out this interesting post http://www.victordramba.com/?p=16
I agree, benchmarking is almost meaningless without context. 1500 tweens could be great on a slower system, or mediocre on a quad core Mac Pro. However, I think there is a bigger consideration here, which is the fixation on performance benchmarks as an indicator of usefulness.
Take tweening – in most cases, performance only has to be at an acceptable level. Most people do not utilize anywhere near 1500 simultaneous tweens in their projects. Once a tweening engine exceeds acceptable speed (which most engines beyond Adobe’s Tween do), decisions should be based on whether the capabilities and API of a library fit your needs. Unfortunately, many developers (particularly junior developers) tend to latch on to measurable qualities, because they provide a solid metric to justify their choice. IMHO, this mentality is also what leads to the abuse of design patterns and frameworks.
The main reason I put that very vague metric on the gTween page was to indicate an acceptable level of performance, not to provide a direct benchmark against other engines.
Benchmarking is really something that needs to be performed by developers on performance critical code on an as-needed basis. The best performance benchmark will always be whether it runs acceptably on the minimum target system.
Hi everyone, thanks for the comments.
I agree. The final measure is always whether it looks acceptable on the target machine. I guess the main reason I am interested in this is to test changes to performance-dependent projects like tween engines internally and not so much to compare different ones or get detailed numbers. It’s difficult to tell whether your optimization changes are having any effect sometimes.
So far great feedback!
Also, I wasn’t trying to pick on Grant with that statement! ;-D