This entry was posted on Wednesday, March 26th, 2008 at 10:47 pm and is filed under Announcements.
12 Responses to “What is the fastest way to draw pixels in AS3?”
March 27th, 2008 at 8:48 am
Very useful information. Kind of surprised actually, but I see why it makes sense. Well played!
March 28th, 2008 at 1:34 am
LOL, I ran a similar test yesterday, after reading about your raytracing demo :) My case was simpler: I was only testing setPixel32 vs. setPixels, but the results were disappointing in the same way. Before that, I would have bet that setPixels with a ByteArray would be much faster…
The main problem is probably in the generation step: both approaches write a single pixel at a time (writeInt vs. setPixel).
March 28th, 2008 at 2:20 am
Hi Forrest,
I managed to add a couple of fps to Test 1 by getting rid of the dt:Number variable and changing t to an int, then changing the r calculation from:
r = (t*100 + 255 * x / STAGE_WIDTH)%255;
to:
r = (t + 255 * x * 0.00125)%255;
This is perhaps cheating, since it hardcodes the width, but multiplication instead of division is a definite speed increase.
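A minimal sketch of the division-to-multiplication swap described above, written in TypeScript rather than AS3 so it can be run outside Flash (the double-precision arithmetic is the same). It assumes STAGE_WIDTH is 800, which is what the 0.00125 constant implies but the thread never states; the function names are made up for illustration.

```typescript
// Assumption: the stage is 800 pixels wide, so 1 / 800 = 0.00125.
const STAGE_WIDTH = 800;
const INV_STAGE_WIDTH = 1 / STAGE_WIDTH;

// Original form: one division per pixel.
function scaleByDivision(x: number): number {
  return 255 * x / STAGE_WIDTH;
}

// Rewritten form: the division becomes a multiplication by a constant.
function scaleByMultiplication(x: number): number {
  return 255 * x * INV_STAGE_WIDTH;
}

// Largest gap between the two forms across one scanline.
function maxDifference(): number {
  let worst = 0;
  for (let x = 0; x < STAGE_WIDTH; x++) {
    const d = Math.abs(scaleByDivision(x) - scaleByMultiplication(x));
    if (d > worst) worst = d;
  }
  return worst;
}
```

The two forms agree to within floating-point rounding, but not necessarily bit for bit, so a value sitting right on an integer boundary could truncate differently. For a test gradient that should be harmless.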
I also managed to add maybe half an fps by changing your fps calculation variable from a Number to an int, like so:
var fps:int = 1/((getTimer() - timer) / 1000);
frameTimeTxt.text = "fps: " + fps;
int does seem to be handled slightly faster than Number.
These are very enlightening tests though, thank you!
March 28th, 2008 at 11:20 am
noah: Nicely done! I don’t think it is cheating to multiply by 0.00125 if you know that the width is fixed. In fact, if you want to take it one step further and you are using ints, you could replace a lot of the multiplications and divisions with very fast bit-shifts. For example, x*640 = (x<<9)+(x<<7).
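The shift identity above works because 640 = 512 + 128 = 2^9 + 2^7. Here is a small TypeScript sketch of it (the shift operators behave the same way on 32-bit ints in AS3); `pixelIndex` is a hypothetical helper, not something from the original tests.

```typescript
// x * 640 decomposed as x * 512 + x * 128.
// Only valid while the result fits in a signed 32-bit integer.
function times640(x: number): number {
  return (x << 9) + (x << 7);
}

// Hypothetical helper: offset of pixel (x, y) in a flat, 640-wide buffer.
function pixelIndex(x: number, y: number): number {
  return times640(y) + x;
}
```

Whether this actually beats a plain multiplication depends on the runtime, so it is worth measuring before relying on it.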
One thing to be very careful of is implicit conversions between int and Number. I’m not sure about this in AS3, but in C++, that kind of conversion costs a lot of time. The takeaway here is to use the same datatype for variables that will be used in the same calculation, wherever possible (even if it seems slightly inappropriate). For example, I hypothesize that it might be possible to increase the speed of test 1 by changing the x and y loop indices to Numbers (I haven’t tried this yet).
To be slightly nit-picky, I might point out that most of your optimizations do not actually make it faster to get pixels onto the screen; they speed up the calculation of each pixel’s RGB values. In all of these tests, the time spent calculating the RGB values is small compared to the time spent copying them into memory. However, in a real-world situation, it will take more effort to compute the RGB values. The same kind of tricks you used to speed up calculating RGB could probably be used in other situations, so these are definitely valuable observations.
>>I also managed to add maybe half an fps by changing your actual fps calculation variable from a Number to an int also, like so:
I’m not sure if you are actually getting an improvement in FPS or just a rounding error.
You are absolutely right about int being faster than Number, but the difference is not huge. See Grant Skinner’s post on int vs unsigned int vs. Number.
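On the rounding question: storing the fps value in an int truncates it, so the readout only moves in whole steps. A rough TypeScript sketch, where `| 0` stands in for assigning a Number to an AS3 int, and the frame times are invented for illustration:

```typescript
// Frames per second as a floating-point value.
function fpsAsNumber(frameMs: number): number {
  return 1 / (frameMs / 1000);
}

// Same calculation truncated toward zero, as an int variable would hold it.
function fpsAsInt(frameMs: number): number {
  return (1 / (frameMs / 1000)) | 0;
}
```

A 39 ms frame is about 25.6 fps and a 38 ms frame about 26.3 fps, but they display as 25 and 26. A change of half an fps is below what the truncated readout can resolve.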
March 28th, 2008 at 11:18 pm
Changing your ‘r’ line to this…
r = (t*100 + 255 * x / STAGE_WIDTH)&0xFF;
…to use an AND operator instead of the MOD operator makes the setPixel test case go from 25 to 40 FPS on my machine. I do think setPixel is the fastest approach, but a lot of CPU time is being spent on the above calculation, which requires some int->Number->int round-trip conversions.
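One caveat on the swap above: `& 0xFF` and `% 255` are not the same operation. The AND wraps at 256 and truncates to an integer, while the MOD wraps at 255 and keeps any fraction. A short TypeScript sketch (both operators behave the same way in AS3; the function names are only for illustration):

```typescript
// Wraps at 255 and keeps the fractional part.
function wrapWithMod(v: number): number {
  return v % 255;
}

// Converts to a 32-bit int first, then keeps the low 8 bits (wraps at 256).
function wrapWithAnd(v: number): number {
  return v & 0xFF;
}
```

For example, 300 becomes 45 with MOD but 44 with AND, and 255 becomes 0 with MOD but stays 255 with AND. For a test gradient the difference is cosmetic, but the two versions do not draw identical images.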
March 29th, 2008 at 9:47 am
Werner: Thanks for your comments. As you point out, Number-to-int conversions, and vice versa, are very slow.
March 29th, 2008 at 4:19 pm
I couldn’t understand some parts of this article, but I guess I just need to check some more resources regarding this, because it sounds interesting.
March 31st, 2008 at 3:40 am
Thanks for the pointer to Grant’s article on ints and Numbers. I hadn’t seen that one! You’re right, of course, that my optimizations were not really affecting the actual meat of the test itself, but like you say, some of these optimizations could be applied closer to that actual meat too!
It’s become almost second nature for me now to always see if I can swap in a multiplier whenever I need or see a divisor, at least in AS3 anyway.
Great blog BTW.
April 24th, 2008 at 1:30 am
hey
Really nice tips!
will try to keep it in my mind
thanks!
April 27th, 2008 at 6:38 pm
Seems you are doing an awful lot of computation per pixel just to get a test image. Since it’s repeated vertically, you could precompute one row and then write it out to the bitmap buffer one integer at a time in a very tight loop. To benchmark actual “rendering speed” you should have the simplest case possible for your graphical content.
I also doubt setPixel is the fastest approach, since it implies some internal computations (likely a multiplication and an addition) to arrive at the correct buffer position for a given set of coordinates.
So I’d say try test 5 but write one row of pixel data to a BUFFER_WIDTH-sized integer array before entering the X/Y loop, then only do the equivalent of buffer.writeInt(row_array[pos++]) inside that. Also, you could make it a flat one-dimensional loop by precomputing buffersize=BUFFER_WIDTH*BUFFER_HEIGHT and looping for 0
April 27th, 2008 at 6:40 pm
Whooops. I should have known better than to use the “less than”/tag character in my post…
After the last one cuts off I wrote:
…looping for 0 (less than) buffersize.
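A rough sketch of the precomputed-row idea, in TypeScript with a `Uint32Array` standing in for the ByteArray and writeInt calls. The buffer size, the gradient formula, and the function name are all assumptions for illustration, not the code from test 5.

```typescript
// Builds one row of ARGB pixels, then copies it down the whole buffer
// in a single flat loop with no per-pixel color math.
function fillFromRow(width: number, height: number, t: number): Uint32Array {
  // Precompute a single row: opaque alpha plus a horizontal red gradient.
  const row = new Uint32Array(width);
  for (let x = 0; x < width; x++) {
    const r = (t + ((255 * x / width) | 0)) & 0xFF;
    row[x] = (0xFF000000 | (r << 16)) >>> 0;
  }

  // Flat one-dimensional loop over the whole buffer.
  const bufferSize = width * height;
  const buffer = new Uint32Array(bufferSize);
  let pos = 0;
  for (let i = 0; i < bufferSize; i++) {
    buffer[i] = row[pos++];
    if (pos === width) pos = 0; // wrap back to the start of the row
  }
  return buffer;
}
```

The inner loop is then just a read, a write, and a wrap check, which is about as close to measuring raw copy speed as this kind of loop gets.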
With most “real” graphics, the inner loops are often optimized to resemble the above data copy, or possibly with some addition and wrapping thrown in. For many (opaque) effects this is all you need (coupled with sparse setup code to initiate a strip of pixel rendering).
Actually I’m mostly a C coder, but I’m currently trying my hand at Flash, and fast pixel buffers seem like the most comfortable and familiar way to go. Haven’t actually written a line yet though, I’m surveying example code left and right at the moment.
April 27th, 2008 at 7:50 pm
Hah. I finally got my AS3 build environment up and running. I tried compiling this and playing around with it a bit, and indeed it doesn’t quite perform as might be expected.
My assumption was that writing values into an array would be relatively quick, but nope. Even writing a constant red value to the entire buffer was as slow as your “complex” gradient.
I guess the graphics speed is there for copying static bitmap buffers back and forth, but actual pixel/array manipulation is the culprit. Presumably the setPixel function bypasses whatever memory management overhead regular array manipulation incurs, and results in faster performance despite its added complexity of coordinate-to-address calculation and clipping.
Oh well, my primary goal is to make some good old sprite animations, so I guess I won’t suffer too much from this. Good luck with the assembly approach.