fast resize transparent images
Posted: Thu Mar 24, 2016 2:00 pm
by Cosmin
Hi.
Very nice collection.
I use it in Delphi XE8 update 1. OS: Windows 8.1.
I need it for quickly resizing many transparent bitmaps (pf32bit) and displaying them as an animation.
I tried the StretchBitmap function(s) from madGraphics.pas. Although I can pass in 32-bit bitmaps and it outputs 32-bit bitmaps, it doesn't process the alpha channel (transparency).
For example, the only difference between Bilinear32 and Bilinear24 is that a few 3s turn into 4s. It outputs 32 bits per pixel, but it doesn't seem to process the alpha field.
Could you please make it process the alpha too?
Thank you.
Re: fast resize transparent images
Posted: Thu Mar 24, 2016 2:10 pm
by madshi
Hello,
I haven't worked on madGraphics for years. Of course it would be possible to add alpha processing. But to be honest, my to-do list is already more than full with work that earns me money, so right now I simply have no time left for the free parts of madCollection. That said, if you feel like changing madGraphics yourself, I'd be happy to include your changes in my source code base.
Re: fast resize transparent images
Posted: Thu Mar 24, 2016 2:37 pm
by Cosmin
I understand.
But could you at least add some comments to the Bilinear32 function explaining what the lines of code do? I know how to use a bitmap's ScanLine property, but I've never seen code like yours.
It would help me a lot.
Thank you in advance.
Re: fast resize transparent images
Posted: Thu Mar 24, 2016 3:50 pm
by madshi
I guess I should have added more comments, but back when I wrote that code I wasn't used to writing a lot of comments. Anyway, I think this is the code that computes the RGB values:
Code:
dbLine^[0] := (sbLine1[xp1] * w11 + sbLine1[xp2] * w21 + sbLine2[xp1] * w12 + sbLine2[xp2] * w22) shr 16;
dbLine^[1] := (sbLine1[xp1 + 1] * w11 + sbLine1[xp2 + 1] * w21 + sbLine2[xp1 + 1] * w12 + sbLine2[xp2 + 1] * w22) shr 16;
dbLine^[2] := (sbLine1[xp1 + 2] * w11 + sbLine1[xp2 + 2] * w21 + sbLine2[xp1 + 2] * w12 + sbLine2[xp2 + 2] * w22) shr 16;
So if you just add one more line like this:
Code:
dbLine^[3] := (sbLine1[xp1 + 3] * w11 + sbLine1[xp2 + 3] * w21 + sbLine2[xp1 + 3] * w12 + sbLine2[xp2 + 3] * w22) shr 16;
That might already take care of the alpha channel.
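To make the idea concrete, here is a minimal, self-contained sketch of the same per-channel blend applied to a single pixel, with the alpha byte handled exactly like the colour bytes. It is not the madGraphics code: the type `TBGRA` and the function `BlendPixel` are hypothetical names for illustration, and it assumes the four weights are 16.16 fixed-point values that sum to 65536, which the `shr 16` above suggests.

```pascal
program BilinearPixelSketch;
{$APPTYPE CONSOLE}

type
  // One pf32bit pixel as it sits in a scanline: blue, green, red, alpha.
  TBGRA = array[0..3] of Byte;

// Blends four neighbouring pixels with fixed-point weights.
// Assumption: w11 + w21 + w12 + w22 = 65536, so "shr 16" renormalises.
function BlendPixel(const p11, p21, p12, p22: TBGRA;
  w11, w21, w12, w22: Cardinal): TBGRA;
var
  c: Integer;
begin
  for c := 0 to 3 do  // channel 3 is the alpha byte
    Result[c] := (p11[c] * w11 + p21[c] * w21 +
                  p12[c] * w12 + p22[c] * w22) shr 16;
end;

const
  OpaqueBlue: TBGRA = (255, 0, 0, 255);
  Transparent: TBGRA = (0, 0, 0, 0);

var
  px: TBGRA;
begin
  // Equal weights: each neighbour contributes a quarter (16384 / 65536).
  px := BlendPixel(OpaqueBlue, OpaqueBlue, Transparent, Transparent,
    16384, 16384, 16384, 16384);
  WriteLn('B=', px[0], ' G=', px[1], ' R=', px[2], ' A=', px[3]);
end.
```

The loop over `c` does the same work as the four unrolled `dbLine^[0]` .. `dbLine^[3]` assignments; the unrolled form in the thread is likely faster, the loop is only easier to read.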
Re: fast resize transparent images
Posted: Thu Mar 24, 2016 6:24 pm
by Cosmin
Yes, it works.
Thank you very much.
Re: fast resize transparent images
Posted: Sat Mar 26, 2016 10:18 am
by Cosmin
Just asking a question:
I'm thinking of rewriting the Bilinear32 function in asm, maybe even with MMX/SSE.
Do you think it would run a lot faster, so that it would be worth the work?
Re: fast resize transparent images
Posted: Sat Mar 26, 2016 10:28 am
by madshi
I'd go with SSE2. Almost all modern CPUs support it, it's nicer to work with, and it should produce a very noticeable performance improvement.
Re: fast resize transparent images
Posted: Sat Mar 26, 2016 10:51 am
by Cosmin
I'm not sure about SSE2; my application has to work on older CPUs too.
Btw, AMD's implementation of SSE2 doesn't perform as I expected. Until a year ago I had an AMD CPU, and the speed-up from SSE2-optimized code wasn't as large as on Intel CPUs.
I'm hoping MMX has a better implementation.
Re: fast resize transparent images
Posted: Sat Mar 26, 2016 1:20 pm
by Cosmin
Just so you have an idea of what I'm trying to make, here is a test app:
https://drive.google.com/open?id=0ByKxA ... VlFQ1BzVlU
And a test file:
https://drive.google.com/open?id=0ByKxA ... 2tGZEYybGs
Use the load button from the middle of the form to load the file and then click on Preview.
After it starts, use + and - to resize, or 0 to reset to the original size. Whenever the size differs from the default, your code is used.
In the caption of the main window the "delay display" parameter will jump from ~5..10 ms to 30..50 ms (on my 2 GHz processor).
Well, I want to decrease that delay.
Btw, if you're interested I'll show you the code too.
Re: fast resize transparent images
Posted: Sat Mar 26, 2016 1:28 pm
by madshi
I'm really short on time atm. But if you want to do "real time" animation scaling, you might want to consider using Direct3D. GPUs are much faster at that sort of stuff than even MMX/SSE/SSE2 etc.
Re: fast resize transparent images
Posted: Sat Mar 26, 2016 1:38 pm
by Cosmin
madshi wrote: I'm really short on time atm. But if you want to do "real time" animation scaling, you might want to consider using Direct3D. GPUs are much faster at that sort of stuff than even MMX/SSE/SSE2 etc.
Yes, I know.
I already found something called DelphiX
http://www.micrel.cz/Dx/
But the problem is I have to transfer all the frames into the video memory as textures so I can display them. And the animation I showed you is 3 GB uncompressed (!). Do you know a video card with 3+ GB video memory?
Re: fast resize transparent images
Posted: Sat Mar 26, 2016 1:47 pm
by madshi
Many GPUs these days have 2GB, some 4GB, some even more.
Anyway, you don't have to upload all the frames at once. Just create a queue of 3 frames, and delete frames from GPU RAM which were already displayed. That's how video players work.
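A rough sketch of that queue idea, with no real GPU calls: at most three frames are "resident" at a time, and a slot is reused as soon as its frame has been displayed. Everything here is hypothetical and for illustration only (`TFrameQueue`, `Upload`, `DisplayAndRelease`, and the 10-frame animation length); in a real program the two procedures would create and free textures instead of storing frame numbers.

```pascal
program FrameQueueSketch;
{$APPTYPE CONSOLE}

const
  QueueSize = 3;      // frames kept resident at once, as suggested above
  TotalFrames = 10;   // hypothetical animation length

type
  TFrameQueue = record
    Slots: array[0..QueueSize - 1] of Integer; // frame number per slot (stand-in for a texture)
    Head: Integer;    // slot holding the oldest frame
    Count: Integer;   // number of occupied slots
  end;

// Placeholder: a real version would upload the frame's pixels to the GPU here.
procedure Upload(var q: TFrameQueue; frame: Integer);
begin
  q.Slots[(q.Head + q.Count) mod QueueSize] := frame;
  Inc(q.Count);
end;

// Placeholder: returns the oldest frame and frees its slot for reuse.
function DisplayAndRelease(var q: TFrameQueue): Integer;
begin
  Result := q.Slots[q.Head];
  q.Head := (q.Head + 1) mod QueueSize;
  Dec(q.Count);
end;

var
  queue: TFrameQueue;
  nextFrame, shown: Integer;
begin
  queue.Head := 0;
  queue.Count := 0;
  nextFrame := 0;
  while (nextFrame < TotalFrames) or (queue.Count > 0) do
  begin
    // Top up the queue; never more than QueueSize frames are resident.
    while (queue.Count < QueueSize) and (nextFrame < TotalFrames) do
    begin
      Upload(queue, nextFrame);
      Inc(nextFrame);
    end;
    shown := DisplayAndRelease(queue);
    WriteLn('displayed frame ', shown, ', still resident: ', queue.Count);
  end;
end.
```

With this layout the memory needed on the GPU depends on the queue size and the frame size, not on the total length of the animation.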
Re: fast resize transparent images
Posted: Sat Mar 26, 2016 1:54 pm
by Cosmin
madshi wrote: Many GPUs these days have 2GB, some 4GB, some even more.
Not so many but, like I said, my app should work on older hardware too.
madshi wrote: Anyway, you don't have to upload all the frames at once. Just create a queue of 3 frames, and delete frames from GPU RAM which were already displayed. That's how video players work.
Good idea.
That's what I'm doing in RAM now.
Unfortunately, with DelphiX this is too slow. It takes a few hundred ms to transfer just one frame (768x768).
I also thought about using DSPack (to make a sort of "video player").
Re: fast resize transparent images
Posted: Sun Mar 27, 2016 10:02 am
by Cosmin
I started working on the asm conversion, and I understood why you recommended SSE2: MMX and SSE don't have 32-bit integer multiplication.
For now I have just tried to convert one line of code:
Code:
dbLine^[0] := (sbLine1[xp1] * w11 + sbLine1[xp2] * w21 + sbLine2[xp1] * w12 + sbLine2[xp2] * w22) shr 16;
The SSE2 asm version:
Code:
asm
  mov eax, [sbLine1]
  mov edx, [xp1]
  movzx ecx, byte ptr [eax+edx]
  movd xmm0, ecx        // sbLine1[xp1]
  mov edx, [xp2]
  movzx ecx, byte ptr [eax+edx]
  movd xmm4, ecx        // sbLine1[xp2]
  movd xmm2, [w11]
  movd xmm6, [w21]
  pmuludq xmm0, xmm2    // sbLine1[xp1] * w11
  pmuludq xmm4, xmm6    // sbLine1[xp2] * w21
  paddd xmm0, xmm4      // integer add: sbLine1[xp1] * w11 + sbLine1[xp2] * w21
  movd eax, xmm0
  push eax              // save sbLine1[xp1] * w11 + sbLine1[xp2] * w21 on the stack
  mov eax, [sbLine2]
  movzx ecx, byte ptr [eax+edx]   // edx still holds xp2
  movd xmm0, ecx        // sbLine2[xp2]
  mov edx, [xp1]
  movzx ecx, byte ptr [eax+edx]
  movd xmm4, ecx        // sbLine2[xp1]
  movd xmm2, [w22]
  movd xmm6, [w12]
  pmuludq xmm0, xmm2    // sbLine2[xp2] * w22
  pmuludq xmm4, xmm6    // sbLine2[xp1] * w12
  paddd xmm0, xmm4      // integer add: sbLine2[xp2] * w22 + sbLine2[xp1] * w12
  movd eax, xmm0
  pop edx               // restore sbLine1[xp1] * w11 + sbLine1[xp2] * w21
  add eax, edx          // sum of all four weighted samples
  shr eax, $10          // (sum of all four weighted samples) shr 16
  mov edx, [dbLine]
  mov [edx], al
end;
But instead of being faster, the code is slower (!?).
I wonder what I'm doing wrong.
Re: fast resize transparent images
Posted: Sun Mar 27, 2016 11:31 am
by madshi
Just using SSE2 instructions instead of normal x86/64 ASM instructions won't bring you any benefit. SSE2 doesn't multiply faster than x86/64. The purpose of SSE2 is not to do a single multiplication per instruction; it's to do 4 (dwords), 8 (words) or 16 (bytes) operations with one SSE2 instruction. Only if you do that do you get a speed improvement over x86/64.
So the proper way to use SSE2 is to 1) use an SSE2 instruction to load 16 bytes directly from RAM into an SSE2 register. Don't use x86/64 instructions to fill the SSE2 registers. 2) Use SSE2 instructions to operate on those 16 bytes directly somehow. 3) Use an SSE2 instruction to write the final result back to RAM.
Ideally you would do SSE2 operations on 16 different bytes (you know, 1 byte is one Red, Green, Blue or Alpha component of a 32bit RGBA pixel) "at once". Doing that will give you a very big speed gain. However, the code is more difficult to write, of course.
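As a small illustration of that load / operate / store pattern (not a bilinear resize), the sketch below averages two runs of 16 bytes, i.e. four 32-bit pixels each, using one SSE2 instruction for the arithmetic. It assumes the 32-bit Delphi compiler with the default register calling convention (first three parameters in EAX, EDX, ECX) and a CPU with SSE2; it will not work as written on Win64. `TBytes16` and `Average16` are hypothetical names, and `pavgb` computes a rounded equal-weight average, which is simpler than the weighted blend Bilinear32 needs.

```pascal
program Sse2Average16;
{$APPTYPE CONSOLE}

type
  TBytes16 = array[0..15] of Byte;  // four BGRA pixels

// 32-bit Delphi only: EAX = @A, EDX = @B, ECX = @Dest (register convention).
procedure Average16(const A, B: TBytes16; var Dest: TBytes16);
asm
  movdqu xmm0, [eax]    // 1) load 16 bytes straight from memory
  movdqu xmm1, [edx]
  pavgb  xmm0, xmm1     // 2) 16 rounded byte averages in one instruction
  movdqu [ecx], xmm0    // 3) store all 16 results at once
end;

var
  src1, src2, dst: TBytes16;
  i: Integer;
begin
  for i := 0 to 15 do
  begin
    src1[i] := i * 16;
    src2[i] := 255 - i * 16;
  end;
  Average16(src1, src2, dst);
  for i := 0 to 15 do
    Write(dst[i], ' ');
  WriteLn;
end.
```

A weighted version would need to widen the bytes to 16-bit words first (for example with `punpcklbw` against a zeroed register) and multiply with `pmullw` or `pmulhuw`, which is where the extra difficulty mentioned above comes from.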