Archive for the ‘performance’ Category
Fast 2D Arrays/Vectors (and the Game Of Life)
Introduction
I was recently looking at an implementation of Conway’s Game Of Life, and was struck by the way in which the developer had constructed his grid. The Game of Life requires that you construct a two-dimensional array, but Flash does not support 2D arrays natively, so work arounds have to be found. There are slow workarounds, fast workarounds, and very fast workarounds. Of course, for very fast workarounds, we use Vector and target the Flash Player 10.
An example of Conway’s Game Of Life can be found below.
I should point out at the start that in these I am using for (var x:int = 0; x < 100; x++) loops. Using while (x--) or even while(--x > -1) may well be quicker, but here I am trying to highlight the relative speeds of the data structures that are being looped around, not the loops themselves.
Slow
public function generate():void
{
grid = [];
for (var x:int = 0; x < 100; x++)
{
grid[x] = [];
for (var y:int = 0; y < 100; y++)
{
grid[x][y] = Math.random() < 0.5;
}
}
}
public function retrieve(x:int, y:int):Boolean
{
return grid[x][y];
}
This is the slowest good method around. It's undoubtedly good, in that it is readable and concise. It's not so good in terms of speed though, because of the double Array reference that is required to get to the relevant reference. Array lookups are slow, and should be avoided in performance-critical areas.
Fast
public function generate():void
{
grid = [];
for (var x:int = 0; x < 100; x++)
{
for (var y:int = 0; y < 100; y++)
{
grid[x * 100 + y] = Math.random() < 0.5;
}
}
}
public function retrieve(x:int, y:int):Boolean
{
return grid[x * 100 + y];
}
Instead of putting your 2D grid in a nested array, we can just use a one-dimensional array, and use an expression x * 100 + y within the lookup to differentiate between cells.
If you're targetting the version 9 player still, then this is probably the fastest mechanism you're going to get. (there is probably no need to, the latest Flash player penetration statistics suggests about 95% adoption in the western hemisphere, and above 90% adoption elsewhere)
Fast (again, but with Vector)
public function generate():void
{
grid = new Vector.<Boolean>(10000, true);
for (var x:int = 0; x < 100; x++)
{
for (var y:int = 0; y < 100; y++)
{
grid[x * 100 + y] = Math.random() < 0.5;
}
}
}
public function retrieve(x:int, y:int):Boolean
{
return grid[x * 100 + y];
}
Moving to use the FP10 Vector class gives you some great benefits. The Flash Player can be told up-front how big the array is going to be, and it knows the type of element. It is all-round quicker to do it this way. However, this is not the quickest structure we can use!
Faster
public function generate():void
{
shift = 7;
grid = new Vector.<Boolean>(100 << shift, true);
for (var x:int = 0; x < 100; x++)
{
for (var y:int = 0; y < 100; y++)
{
grid[(x << shift) | y] = Math.random() < 0.5;
}
}
}
public function retrieve(x:int, y:int):Boolean
{
return grid[(x << shift) | y];
}
By referencing the cell using [x * 100 + y] we get the computer to do a multiplication and an addition. That's pretty quick, but while we consider these two of the most primitive mathematical operations, they are not two primitive calculations for a computer! Instead, we can use [(x << shift) | y] which replaces the multiplication and addition with a bit-shift and a bitwise disjunction. They are about as simple calculations as a computer can make, and the time it gains us is small, but significant if we are working with big grids.
I do not intend to go into the details of bitwise operations here, but for further reading, try this Wikipedia article.
Interestingly, if you try this approach with Array, it does not run as quickly as the example labelled 'Fast'. This is because, as Jackson Dunstan and JP Auclair noted, it will leave 'gaps' in the Array (for example no values will be entered in the above Vector for values, 100 - 127. For a Vector of a known size, that is not a performance problem, but Arrays work differently.
Results Summary

The results on the horizontal access are milliseconds that it took to iterate through the grid 1000 times. The difference in speed by adopting Vector and the bitwise-shift approach is to cut the calculation time by nearly a quarter. The bigger the grid is, the more this technique becomes useful.
After-Thought - Game Of Life
The Game Of Life is a grid structure, in which each cell is a Boolean switch; either 'on' or 'off'. Iteratively, the grid changes, according to a very simple instruction set:
- Each grid cell's value is determined by the values of the eight adjacent grid cells;
- For each cell, If exactly three of those adjacent cells are 'on', then in the next iteration, that cell will be 'on';
- For each cell, if two adjacent cells are 'on' then in the next iteration that cell will remain in it's current state;
- Otherwise, that cell's value in the next iteration will be set to 'off'.
At the edges, either those cells just have fewer neighbours, or the neighbours are thought to 'wrap around' the 2D space, so that the cells' universe resembles a torus in 3D space.
The result of these simple rules is an interesting pattern which is very interesting for a great number of reasons. The example below should show you what I mean. It will run only when the mouse is over the application. Click on it to break the iteration cycle and populate the cells with random data.
Introducing RacetrackStats v.0.0.1
Introduction
RacetrackStats is an in-Flash performance profiler that gives real-time feedback about how the code it is contained within is performing. It can be found at GitHub:
http://github.com/alecmce/RacetrackStats
RacetrackStats is inspired by and seeks to move forward from Mr Doob’s Stats package (http://code.google.com/p/mrdoob/wiki/stats)
In Action
Below is the first alpha of RacetrackStats in action.
The Flash plugin is required to view this object.
source: http://github.com/alecmce/RacetrackStats/blob/master/test/demo/RacetrackStatsDemo.as
The SWF comprises three small animations (from left to right):
- A mixture of computation and rendering: A simple BitmapData-driven particle system attempting to model smoke using 100000 pixels and a heavy glow filter and blur filter;
- Heavy computation: A prime-factorisation algorithm, which computes 24 * 100 prime factor decompositions per frame. As it continues to run, the numbers get larger which leads to heavier calculation cycles;
- Extremely heavy rendering: A cycadelic circle pattern that has hundreds of semi-transparent circular sprites moving around and overlapping with each other.
The ‘Elastic Racetrack’
RacetrackStats attempts to go beyond Mr Doob’s Stats by measuring how the elastic racetrack is being stretched. The Elastic Racetrack was a phrase coined to describe how the original AVM1 Flash Player segmented each second into frames and attempted to balance the requirements of code, refreshing the screen and doing work like garbage collection. The AVM2 player’s model is sufficiently similar, that the name remains appropriate.
Code Execution
It does this by hooking into the Event.ENTER_FRAME and Event.RENDER events to record when portions of the code start and end. By recording when the ENTER_FRAME starts and when the RENDER starts, the intervening time is taken to be code execution.
Pre-Render Code Execution
Commonly this part of the code cycle is almost unused. If you exploit stage.invalidate and Event.RENDER in order for your classes to execute code after the main code execution has completed, then it will be captured here, between the top-priority Event.RENDER and the bottom-priority Event.RENDER calls.
Rendering
A problem is, the time between the RENDER ending and the ENTER_FRAME starting is used to draw the screen but if time is left over in which the Flash Player has nothing to do, it will idle during this time also. To counteract this, RacetrackStats attempts to throttle the FlashPlayer so that the application is always running as fast as it can, minimising, or even eliminating that idle time. I call this the “render” portion – the “Event.RENDER” event is badly named – screen rendering happens after the RENDER event has finished.
Work To Do
As soon as you add weight like this to an application, the speed at which it runs changes. By measuring the speed, you change the speed. It is the classic scientific paradox. It is mitigated by shrinking down the weight of the stats package as far as possible. At the moment, functionality rather than size is the chief concern, though I have tried to minimize the package by avoiding embedding fonts, and so forth.
I would love to hear what you think about the package, whether positive or negative. I’d also really appreciate it if you use it in your applications. If you have any problems, questions, queries or comments, please leave a comment below, or open an issue on GitHub.
Events and Signals – Performance Tests
The new as3signals library devised by Robert Penner offers a replacement to the native AS3 event model. One benefit is that signals are more lightweight, and therefore faster. Grant Skinner’s Performance Harness is a widely accepted library for comparing the speeds of code execution.
I found the following results on Mac OS X, Flash Player 10.0.42.34, with a release build.
When an event (signal) is dispatched but nothing is listening for it:

When an event (signal) is dispatched and handled by one method listener:

The source code for my test methods follows:
If you browse these tests, you will notice that more tests were run than I have displayed above. The graphs above reference results from running the dataEventOptimised and dataSignal methods in each context.
These tests are not scientific. No attempt has been to control for player version, computer version, computer specifications, so on and so forth. You are invited to download and run these tests for yourself. If you do, remember that testing on a release build not a debug build is appropriate, and then please post your results as a comment, particularly if your results differ from my own.
Updates
For a comparison with good old fashioned callbacks, please put on your sarcasm snorkel and visit va.lent.in’s follow up post.
From the as3signals Google Group there was a suggestion that the Flash Player 10.1 might produce different results. Here are my results; they appear to corroborate the earlier results:
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––– performancetests.EventsTest (5 iterations) Player version: MAC 10,1,51,66 (regular) –––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––– method...................................................ttl ms...avg ms dataEvent 2344 468.80 dataEventOptimised 2270 454.00 dataSignal 313 62.60 simpleEvent 2755 551.00 simpleEventOptimised 2285 457.00 simpleSignal 174 34.80 –––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––– –––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––– performancetests.CapturedEventsTest (5 iterations) Player version: MAC 10,1,51,66 (regular) –––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––– method...................................................ttl ms...avg ms dataEvent 2708 541.60 dataEventOptimised 2703 540.60 dataSignal 1182 236.40 simpleEvent 2724 544.80 simpleEventOptimised 2717 543.40 simpleSignal 1042 208.40 ––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
Replacing ENTER_FRAME
Today an interesting discussion took place on Twitter, in which I was involved. It caught my attention for a variety of reasons, and I wanted to document the conversation, the problem and the solution here. It started when it was identified that Flash’s event dispatcher is sub-optimal in performance terms, and revolved around trying to find a solution.
Why replace the ENTER_FRAME?
Every event that is dispatched creates an object. An ENTER_FRAME creates as many objects per second as there are frames. If you have mutliple objects dispatching ENTER_FRAME events, that’s many objects creating event objects many times per second. Computationally, chains of “many” are one of those things you want to avoid. So, can we do better?
Doing Better
Robert Penner proposed creating an as3signal and using a MovieClip with two frames to dispatch that signal on each frame. He posted his idea to this GitHub gist.
One drawback to this approach is that it requires a MovieClip symbol with two empty frames, to which the two event dispatcher methods could be attached. I remembered reading in this excellent article by Bit 101 how using a [Frame] metadata tag would force your application to have two frames. This seemed to be what Penner needed for his functionality to work correctly, so I came up with my own GitHub gist which extends the concept. The code is also added ‘below the fold‘, at the bottom of this article.
[Update, after Jackson's comment, below] The idea behind the approach is that a frame-loop is an internal Flash player structure. It sits underneath the AS3 functionality, and so it doesn’t produce ENTER_FRAME events. By hooking into them we can produce our own iterative ‘event’ (Penner proposes using an as3signal), which doesn’t construct an object each time it is dispatched. It is a lovely idea, and quite distinct from invoking one ENTER_FRAME dispatcher on the root or stage of the application.
[Update, after Joa Ebert tweeted test results] Whether this method is superior to the ‘one-event ticker’ is open to question. Joa Ebert applied the frameloop concept to AudioBox and found no memory nor performance improvements.
Background
It all started today with a statement on Twitter from Robert Penner:
Listening to a single #AS3 enterFrame will create over 1000 event instances per minute. @robpenner
In fact, at a decent frame rate, you’re looking at 1800 event instances per minute. Across the blogosphere and on Twitter, there has been a lot of talk about the cost of object construction in AS3. People are pushing the limitations of the player, and are looking for performance optimisation wherever they can find it. Strategies such as object pooling are becoming mainstream in order to avoid the creation and destruction of objects. In the meanwhile, the internal Flash event model is creating and destroying huge numbers of objects to furnish us with an event model.
At the same time however, there has been a murmur of discomfiture about the state of the event model. (Not that I have much of a voice, but I chimed in to this debate as well!). The most notable result of this murmur was Penner’s decision to create as3signals. It is an on-going project to which I have contributed in a small way, and which I whole-heartedly support.
After Penner’s tweet, the #as3 twitterati (please tweet if there are glaring omissions from this list) became very excited indeed.
My initial thoughts led to the idea that maybe the setInterval/setTimeout functions, which are hangovers from AS2 might offer a solution – perhaps they shortcut the event system? Not only ain’t it so, they’re even worse as this Action Script Viewer deconstruction of the setInterval methods demonstrate.
So what else? Mike Chambers (aka @Mesh) in conversation with @TheFlashBum pointed towards a solution using a ‘ticker’; a single ENTER_FRAME dispatcher, and @bengarney confirmed that he also uses this technique.
It seems that @scottjanousek and @robpenner had the idea of using the player frameloop (which the solution offered here exploits) at pretty much the same time. (@scottjanousek’s tweets are protected, but it was reported by this tweet.)
Whether and how this vein of thought will develop is really interesting. It is fascinating how a tweet from one of the best-known ActionScript developers can provoke such a lot of attention from the general community. It is interesting too to see the envelope of what is possible being pushed.
Constants and Speed
A common scenario in coding is to have a constant that you would like to be shared across classes. Defining the constant separately is bad because if you change the constant value in one class it does not get changed in another, and you’re likely to have a bad dose of debugging. However, defining a constant in an external class and referencing it when you need it is bad because it is really slow.
How Slow?
Using Grant Skinner’s Performance Harness I did a quick test to determine exactly how slow various mechanisms for referencing variables and constants actually are. The results are as follows (based on accessing the variable/constant 10,000,000 times!):
| referencing a local variable | 46.80 ms |
| referencing a member variable | 44.00 ms |
| referencing a member constant | 48.40 ms |
| referencing a static constant in the same class file | 52.80 ms |
| referencing a package constant | 526.00 ms |
| referencing an external static constant | 550.00 ms |
There isn’t a significant difference between the speed of access of member variables, local variables, or member constants. Perhaps a static constant in the same package is a little slower, but then it would only be defined as public static if it was referenced elsewhere as an external static constant, so it is inevitably a slow choice at some stage.
I tested package constants on the basis that this blog I came across suggested that I could get a performance saving by using package constants. This does not appear to be the case; or at least not significantly.
Yes, you probably know this already, perhaps with the expception of the package constant, which I had never considered before. However, I’d never done this test definitively and having recently come across Skinner’s test harness I thought I’d be thorough.
Suggested Solution
My technique for solving the dilemma of wanting a single source for constants but accessing them quickly is to create a static constant repository, but reference them into member constants in your speed critical classes, as follows:
A constants class:
package example
{
public class Constants
{
public static const MY_CONSTANT:int = 100;
}
}
A class that uses the constant, in time-critical code:
package another
{
import example.Constants;
public class TimeCritialCode
{
private const MY_CONSTANT:int = Constants.MY_CONSTANT;
public function timeCriticalLoop():void
{
... important stuff referencing MY_CONSTANT ...
}
}
}
