AlecMcE.com

Coding on the Flash Platform

Archive for the ‘performance’ Category

Fast 2D Arrays/Vectors (and the Game Of Life)

View Comments

Introduction

I was recently looking at an implementation of Conway’s Game Of Life, and was struck by the way in which the developer had constructed his grid. The Game of Life requires that you construct a two-dimensional array, but Flash does not support 2D arrays natively, so work arounds have to be found. There are slow workarounds, fast workarounds, and very fast workarounds. Of course, for very fast workarounds, we use Vector and target the Flash Player 10.

An example of Conway’s Game Of Life can be found below.

I should point out at the start that in these I am using for (var x:int = 0; x < 100; x++) loops. Using while (x--) or even while(--x > -1) may well be quicker, but here I am trying to highlight the relative speeds of the data structures that are being looped around, not the loops themselves.

Slow

public function generate():void
{
	grid = [];
	for (var x:int = 0; x < 100; x++)
	{
		grid[x] = [];
		for (var y:int = 0; y < 100; y++)
		{
			grid[x][y] = Math.random() < 0.5;
		}
	}
}

public function retrieve(x:int, y:int):Boolean
{
	return grid[x][y];
}

This is the slowest good method around. It's undoubtedly good, in that it is readable and concise. It's not so good in terms of speed though, because of the double Array reference that is required to get to the relevant reference. Array lookups are slow, and should be avoided in performance-critical areas.

Fast

public function generate():void
{
	grid = [];
	for (var x:int = 0; x < 100; x++)
	{
		for (var y:int = 0; y < 100; y++)
		{
			grid[x * 100 + y] = Math.random() < 0.5;
		}
	}
}

public function retrieve(x:int, y:int):Boolean
{
	return grid[x * 100 + y];
}

Instead of putting your 2D grid in a nested array, we can just use a one-dimensional array, and use an expression x * 100 + y within the lookup to differentiate between cells.

If you're targetting the version 9 player still, then this is probably the fastest mechanism you're going to get. (there is probably no need to, the latest Flash player penetration statistics suggests about 95% adoption in the western hemisphere, and above 90% adoption elsewhere)

Fast (again, but with Vector)

public function generate():void
{
	grid = new Vector.<Boolean>(10000, true);
	for (var x:int = 0; x < 100; x++)
	{
		for (var y:int = 0; y < 100; y++)
		{
			grid[x * 100 + y] = Math.random() < 0.5;
		}
	}
}

public function retrieve(x:int, y:int):Boolean
{
	return grid[x * 100 + y];
}

Moving to use the FP10 Vector class gives you some great benefits. The Flash Player can be told up-front how big the array is going to be, and it knows the type of element. It is all-round quicker to do it this way. However, this is not the quickest structure we can use!

Faster

public function generate():void
{
	shift = 7;
	grid = new Vector.<Boolean>(100 << shift, true);

	for (var x:int = 0; x < 100; x++)
	{
		for (var y:int = 0; y < 100; y++)
		{
			grid[(x << shift) | y] = Math.random() < 0.5;
		}
	}
}

public function retrieve(x:int, y:int):Boolean
{
	return grid[(x << shift) | y];
}

By referencing the cell using [x * 100 + y] we get the computer to do a multiplication and an addition. That's pretty quick, but while we consider these two of the most primitive mathematical operations, they are not two primitive calculations for a computer! Instead, we can use [(x << shift) | y] which replaces the multiplication and addition with a bit-shift and a bitwise disjunction. They are about as simple calculations as a computer can make, and the time it gains us is small, but significant if we are working with big grids.

I do not intend to go into the details of bitwise operations here, but for further reading, try this Wikipedia article.

Interestingly, if you try this approach with Array, it does not run as quickly as the example labelled 'Fast'. This is because, as Jackson Dunstan and JP Auclair noted, it will leave 'gaps' in the Array (for example no values will be entered in the above Vector for values, 100 - 127. For a Vector of a known size, that is not a performance problem, but Arrays work differently.

Results Summary

The results on the horizontal access are milliseconds that it took to iterate through the grid 1000 times. The difference in speed by adopting Vector and the bitwise-shift approach is to cut the calculation time by nearly a quarter. The bigger the grid is, the more this technique becomes useful.

After-Thought - Game Of Life

The Game Of Life is a grid structure, in which each cell is a Boolean switch; either 'on' or 'off'. Iteratively, the grid changes, according to a very simple instruction set:

  1. Each grid cell's value is determined by the values of the eight adjacent grid cells;
  2. For each cell, If exactly three of those adjacent cells are 'on', then in the next iteration, that cell will be 'on';
  3. For each cell, if two adjacent cells are 'on' then in the next iteration that cell will remain in it's current state;
  4. Otherwise, that cell's value in the next iteration will be set to 'off'.

At the edges, either those cells just have fewer neighbours, or the neighbours are thought to 'wrap around' the 2D space, so that the cells' universe resembles a torus in 3D space.

The result of these simple rules is an interesting pattern which is very interesting for a great number of reasons. The example below should show you what I mean. It will run only when the mouse is over the application. Click on it to break the iteration cycle and populate the cells with random data.

The Flash plugin is required to view this object.

source code

Written by alec

March 6th, 2010 at 10:01 pm

Posted in as3, performance

Introducing RacetrackStats v.0.0.1

View Comments

Introduction

RacetrackStats is an in-Flash performance profiler that gives real-time feedback about how the code it is contained within is performing. It can be found at GitHub:

http://github.com/alecmce/RacetrackStats

RacetrackStats is inspired by and seeks to move forward from Mr Doob’s Stats package (http://code.google.com/p/mrdoob/wiki/stats)

In Action

Below is the first alpha of RacetrackStats in action.

The Flash plugin is required to view this object.

source: http://github.com/alecmce/RacetrackStats/blob/master/test/demo/RacetrackStatsDemo.as

The SWF comprises three small animations (from left to right):

  • A mixture of computation and rendering: A simple BitmapData-driven particle system attempting to model smoke using 100000 pixels and a heavy glow filter and blur filter;
  • Heavy computation: A prime-factorisation algorithm, which computes 24 * 100 prime factor decompositions per frame. As it continues to run, the numbers get larger which leads to heavier calculation cycles;
  • Extremely heavy rendering: A cycadelic circle pattern that has hundreds of semi-transparent circular sprites moving around and overlapping with each other.

The ‘Elastic Racetrack’

RacetrackStats attempts to go beyond Mr Doob’s Stats by measuring how the elastic racetrack is being stretched. The Elastic Racetrack was a phrase coined to describe how the original AVM1 Flash Player segmented each second into frames and attempted to balance the requirements of code, refreshing the screen and doing work like garbage collection. The AVM2 player’s model is sufficiently similar, that the name remains appropriate.

Code Execution

It does this by hooking into the Event.ENTER_FRAME and Event.RENDER events to record when portions of the code start and end. By recording when the ENTER_FRAME starts and when the RENDER starts, the intervening time is taken to be code execution.

Pre-Render Code Execution

Commonly this part of the code cycle is almost unused. If you exploit stage.invalidate and Event.RENDER in order for your classes to execute code after the main code execution has completed, then it will be captured here, between the top-priority Event.RENDER and the bottom-priority Event.RENDER calls.

Rendering

A problem is, the time between the RENDER ending and the ENTER_FRAME starting is used to draw the screen but if time is left over in which the Flash Player has nothing to do, it will idle during this time also. To counteract this, RacetrackStats attempts to throttle the FlashPlayer so that the application is always running as fast as it can, minimising, or even eliminating that idle time. I call this the “render” portion – the “Event.RENDER” event is badly named – screen rendering happens after the RENDER event has finished.

Work To Do

As soon as you add weight like this to an application, the speed at which it runs changes. By measuring the speed, you change the speed. It is the classic scientific paradox. It is mitigated by shrinking down the weight of the stats package as far as possible. At the moment, functionality rather than size is the chief concern, though I have tried to minimize the package by avoiding embedding fonts, and so forth.

I would love to hear what you think about the package, whether positive or negative. I’d also really appreciate it if you use it in your applications. If you have any problems, questions, queries or comments, please leave a comment below, or open an issue on GitHub.

Written by alec

March 2nd, 2010 at 12:08 am

Posted in as3, performance, tools

Events and Signals – Performance Tests

View Comments

The new as3signals library devised by Robert Penner offers a replacement to the native AS3 event model. One benefit is that signals are more lightweight, and therefore faster. Grant Skinner’s Performance Harness is a widely accepted library for comparing the speeds of code execution.

I found the following results on Mac OS X, Flash Player 10.0.42.34, with a release build.

When an event (signal) is dispatched but nothing is listening for it:

A graph indicating that as3signals events dispatch roughly 5 times faster than native as3 events.

When an event (signal) is dispatched and handled by one method listener:

A graph indicating that as3signals signals are dispatched and captured at twice the speed of native as3 events.

The source code for my test methods follows:

If you browse these tests, you will notice that more tests were run than I have displayed above. The graphs above reference results from running the dataEventOptimised and dataSignal methods in each context.

These tests are not scientific. No attempt has been to control for player version, computer version, computer specifications, so on and so forth. You are invited to download and run these tests for yourself. If you do, remember that testing on a release build not a debug build is appropriate, and then please post your results as a comment, particularly if your results differ from my own.

Updates

For a comparison with good old fashioned callbacks, please put on your sarcasm snorkel and visit va.lent.in’s follow up post.

From the as3signals Google Group there was a suggestion that the Flash Player 10.1 might produce different results. Here are my results; they appear to corroborate the earlier results:

––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
performancetests.EventsTest (5 iterations)
Player version: MAC 10,1,51,66 (regular)
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
method...................................................ttl ms...avg ms
dataEvent                                                  2344   468.80
dataEventOptimised                                         2270   454.00
dataSignal                                                  313    62.60
simpleEvent                                                2755   551.00
simpleEventOptimised                                       2285   457.00
simpleSignal                                                174    34.80
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––

––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
performancetests.CapturedEventsTest (5 iterations)
Player version: MAC 10,1,51,66 (regular)
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
method...................................................ttl ms...avg ms
dataEvent                                                  2708   541.60
dataEventOptimised                                         2703   540.60
dataSignal                                                 1182   236.40
simpleEvent                                                2724   544.80
simpleEventOptimised                                       2717   543.40
simpleSignal                                               1042   208.40
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––

Written by alec

January 26th, 2010 at 8:20 am

Posted in as3, performance

Tagged with

Replacing ENTER_FRAME

View Comments

Today an interesting discussion took place on Twitter, in which I was involved. It caught my attention for a variety of reasons, and I wanted to document the conversation, the problem and the solution here. It started when it was identified that Flash’s event dispatcher is sub-optimal in performance terms, and revolved around trying to find a solution.

Why replace the ENTER_FRAME?

Every event that is dispatched creates an object. An ENTER_FRAME creates as many objects per second as there are frames. If you have mutliple objects dispatching ENTER_FRAME events, that’s many objects creating event objects many times per second. Computationally, chains of “many” are one of those things you want to avoid. So, can we do better?

Doing Better

Robert Penner proposed creating an as3signal and using a MovieClip with two frames to dispatch that signal on each frame. He posted his idea to this GitHub gist.

One drawback to this approach is that it requires a MovieClip symbol with two empty frames, to which the two event dispatcher methods could be attached. I remembered reading in this excellent article by Bit 101 how using a [Frame] metadata tag would force your application to have two frames. This seemed to be what Penner needed for his functionality to work correctly, so I came up with my own GitHub gist which extends the concept. The code is also added ‘below the fold‘, at the bottom of this article.

[Update, after Jackson's comment, below] The idea behind the approach is that a frame-loop is an internal Flash player structure. It sits underneath the AS3 functionality, and so it doesn’t produce ENTER_FRAME events. By hooking into them we can produce our own iterative ‘event’ (Penner proposes using an as3signal), which doesn’t construct an object each time it is dispatched. It is a lovely idea, and quite distinct from invoking one ENTER_FRAME dispatcher on the root or stage of the application.

[Update, after Joa Ebert tweeted test results] Whether this method is superior to the ‘one-event ticker’ is open to question. Joa Ebert applied the frameloop concept to AudioBox and found no memory nor performance improvements.

Background

It all started today with a statement on Twitter from Robert Penner:

Listening to a single #AS3 enterFrame will create over 1000 event instances per minute. @robpenner

In fact, at a decent frame rate, you’re looking at 1800 event instances per minute. Across the blogosphere and on Twitter, there has been a lot of talk about the cost of object construction in AS3. People are pushing the limitations of the player, and are looking for performance optimisation wherever they can find it. Strategies such as object pooling are becoming mainstream in order to avoid the creation and destruction of objects. In the meanwhile, the internal Flash event model is creating and destroying huge numbers of objects to furnish us with an event model.

At the same time however, there has been a murmur of discomfiture about the state of the event model. (Not that I have much of a voice, but I chimed in to this debate as well!). The most notable result of this murmur was Penner’s decision to create as3signals. It is an on-going project to which I have contributed in a small way, and which I whole-heartedly support.

After Penner’s tweet, the #as3 twitterati (please tweet if there are glaring omissions from this list) became very excited indeed.

My initial thoughts led to the idea that maybe the setInterval/setTimeout functions, which are hangovers from AS2 might offer a solution – perhaps they shortcut the event system? Not only ain’t it so, they’re even worse as this Action Script Viewer deconstruction of the setInterval methods demonstrate.

So what else? Mike Chambers (aka @Mesh) in conversation with @TheFlashBum pointed towards a solution using a ‘ticker’; a single ENTER_FRAME dispatcher, and @bengarney confirmed that he also uses this technique.

It seems that @scottjanousek and @robpenner had the idea of using the player frameloop (which the solution offered here exploits) at pretty much the same time. (@scottjanousek’s tweets are protected, but it was reported by this tweet.)

Whether and how this vein of thought will develop is really interesting. It is fascinating how a tweet from one of the best-known ActionScript developers can provoke such a lot of attention from the general community. It is interesting too to see the envelope of what is possible being pushed.

Read the rest of this entry »

Written by alec

November 26th, 2009 at 11:37 pm

Constants and Speed

View Comments

A common scenario in coding is to have a constant that you would like to be shared across classes. Defining the constant separately is bad because if you change the constant value in one class it does not get changed in another, and you’re likely to have a bad dose of debugging. However, defining a constant in an external class and referencing it when you need it is bad because it is really slow.

How Slow?

Using Grant Skinner’s Performance Harness I did a quick test to determine exactly how slow various mechanisms for referencing variables and constants actually are. The results are as follows (based on accessing the variable/constant 10,000,000 times!):

referencing a local variable 46.80 ms
referencing a member variable 44.00 ms
referencing a member constant 48.40 ms
referencing a static constant in the same class file 52.80 ms
referencing a package constant 526.00 ms
referencing an external static constant 550.00 ms

There isn’t a significant difference between the speed of access of member variables, local variables, or member constants. Perhaps a static constant in the same package is a little slower, but then it would only be defined as public static if it was referenced elsewhere as an external static constant, so it is inevitably a slow choice at some stage.

I tested package constants on the basis that this blog I came across suggested that I could get a performance saving by using package constants. This does not appear to be the case; or at least not significantly.

download the test source code

Yes, you probably know this already, perhaps with the expception of the package constant, which I had never considered before. However, I’d never done this test definitively and having recently come across Skinner’s test harness I thought I’d be thorough.

Suggested Solution

My technique for solving the dilemma of wanting a single source for constants but accessing them quickly is to create a static constant repository, but reference them into member constants in your speed critical classes, as follows:

A constants class:

package example
{
	public class Constants
	{
		public static const MY_CONSTANT:int = 100;
	}
}

A class that uses the constant, in time-critical code:

package another
{
	import example.Constants;

	public class TimeCritialCode
	{
		private const MY_CONSTANT:int = Constants.MY_CONSTANT;

		public function timeCriticalLoop():void
		{
			... important stuff referencing MY_CONSTANT ...
		}
	}
}

Written by alec

November 17th, 2009 at 11:48 am

Posted in as3, performance