fast resize transparent images
Hi.
Very nice collection.
I use it in Delphi XE8 update 1. OS: Windows 8.1.
For example, I need it to quickly resize many transparent bitmaps (pf32bit) and display them as an animation.
I tried the StretchBitmap function(s) from madGraphics.pas. Although they accept 32-bit bitmaps and output 32-bit, they don't process the alpha channel (transparency).
For example, the only difference between Bilinear32 and Bilinear24 is that a few 3's turn into 4's. It outputs 32-bit, but it doesn't seem to process the alpha field.
Could you please make it process the alpha too?
Thank you.
Re: fast resize transparent images
Hello,
I haven't worked on madGraphics for years. Of course it would be possible to add alpha processing. But to be honest, my to-do list is already more than full with stuff I'm earning money with. So right now I simply have no time left to work on the free parts of madCollection. That said, if you feel like changing madGraphics yourself, I'd be happy to include your changes in my source code base.
Re: fast resize transparent images
I understand.
But can you at least add some comments to the code of the Bilinear32 function explaining what the lines do? I know how to use a bitmap's ScanLine, but I've never seen code like yours.
It would help me a lot.
Thank you in advance.
Re: fast resize transparent images
I guess I should have added more comments, but back when I wrote that code I wasn't used to writing a lot of comments. Anyway, I think this code changes the RGB values:
Code: Select all
dbLine^[0] := (sbLine1[xp1 ] * w11 + sbLine1[xp2 ] * w21 + sbLine2[xp1 ] * w12 + sbLine2[xp2 ] * w22) shr 16;
dbLine^[1] := (sbLine1[xp1 + 1] * w11 + sbLine1[xp2 + 1] * w21 + sbLine2[xp1 + 1] * w12 + sbLine2[xp2 + 1] * w22) shr 16;
dbLine^[2] := (sbLine1[xp1 + 2] * w11 + sbLine1[xp2 + 2] * w21 + sbLine2[xp1 + 2] * w12 + sbLine2[xp2 + 2] * w22) shr 16;
So if you just add one more line like this:
Code: Select all
dbLine^[3] := (sbLine1[xp1 + 3] * w11 + sbLine1[xp2 + 3] * w21 + sbLine2[xp1 + 3] * w12 + sbLine2[xp2 + 3] * w22) shr 16;
That might already take care of the alpha channel.
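As a rough sketch of what those four lines compute, the per-channel blend can be written as a loop over all four bytes of a pixel. The names TBGRA and BlendPixel32 are made up for illustration and are not part of madGraphics; the weights are assumed to be 16.16 fixed-point values that sum to 65536, which is what the shr 16 suggests.

```pascal
type
  // One pf32bit pixel in memory order: blue, green, red, alpha.
  // (Hypothetical helper type, not from madGraphics.)
  TBGRA = array[0..3] of Byte;

// Blends four neighbouring pixels with 16.16 fixed-point weights.
// Assumption: w11 + w21 + w12 + w22 = 65536 (i.e. 1.0), so every
// result fits into a byte after the shift.
procedure BlendPixel32(const p11, p21, p12, p22: TBGRA;
  w11, w21, w12, w22: Cardinal; out dst: TBGRA);
var
  c: Integer;
begin
  // Channels 0..2 are the colour bytes, channel 3 is alpha.
  for c := 0 to 3 do
    dst[c] := Byte((p11[c] * w11 + p21[c] * w21 +
                    p12[c] * w12 + p22[c] * w22) shr 16);
end;
```

One caveat, offered as general background rather than anything known about madGraphics: if the bitmaps store straight (non-premultiplied) alpha, colour from fully transparent neighbours is blended in too, which can show up as dark or coloured fringes. Premultiplying the colour channels by alpha before scaling is the usual remedy.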
Re: fast resize transparent images
Yes, it works.
Thank you very much.
Re: fast resize transparent images
Just asking a question:
I'm thinking of rewriting the Bilinear32 function in asm, maybe even with MMX/SSE.
Do you think it would perform a lot faster, so that it would be worth the work?
Re: fast resize transparent images
I'd go SSE2, almost all modern CPUs support that, it's nicer to work with and should produce a very noticeable performance improvement.
Re: fast resize transparent images
Not sure about SSE2, my application has to work on older CPUs too.
Btw, AMD's implementation of SSE2 doesn't perform as I expected. Until a year ago I had an AMD CPU, and the speedup from SSE2-optimized code wasn't as large as on Intel CPUs.
I'm hoping MMX has a better implementation.
Re: fast resize transparent images
Just so you can have an idea about what I'm trying to make, here is a testing app:
https://drive.google.com/open?id=0ByKxA ... VlFQ1BzVlU
And a testing file:
https://drive.google.com/open?id=0ByKxA ... 2tGZEYybGs
Use the load button from the middle of the form to load the file and then click on Preview.
After it starts, use + - to resize or 0 to reset to original size. When its size is different from default then your code is used.
In the caption of the main window the "delay display" parameter will jump from ~5..10 ms to 30..50 ms (on my 2 GHz processor).
Well, I want to decrease that delay.
Btw, if you're interested I'll show you the code too.
Re: fast resize transparent images
I'm really short on time atm. But if you want to do "real time" animation scaling, you might want to consider using Direct3D. GPUs are much faster at that sort of stuff than even MMX/SSE/SSE2 etc.
Re: fast resize transparent images
madshi wrote: I'm really short on time atm. But if you want to do "real time" animation scaling, you might want to consider using Direct3D. GPUs are much faster at that sort of stuff than even MMX/SSE/SSE2 etc.
Yes, I know.
I already found something called DelphiX http://www.micrel.cz/Dx/
But the problem is I have to transfer all the frames into the video memory as textures so I can display them. And the animation I showed you is 3 GB uncompressed (!). Do you know a video card with 3+ GB video memory?
Re: fast resize transparent images
Many GPUs these days have 2GB, some 4GB, some even more.
Anyway, you don't have to upload all the frames at once. Just create a queue of 3 frames, and delete frames from GPU RAM which were already displayed. That's how video players work.
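The queue idea can be sketched as a small ring buffer. This only tracks frame numbers; the upload and release steps are left as comments because they depend on the graphics API, and all names here (TFrameQueue, InitQueue, PushFrame, PopFrame) are hypothetical.

```pascal
const
  QueueSize = 3;  // frames kept uploaded at once, as suggested above

type
  TFrameQueue = record
    Slots: array[0..QueueSize - 1] of Integer;  // resident frame numbers
    Head: Integer;   // index of the oldest resident frame
    Count: Integer;  // number of resident frames
  end;

procedure InitQueue(var q: TFrameQueue);
begin
  q.Head := 0;
  q.Count := 0;
end;

// Adds a frame if a slot is free; returns False when the queue is full.
function PushFrame(var q: TFrameQueue; frameNo: Integer): Boolean;
begin
  Result := q.Count < QueueSize;
  if Result then
  begin
    // A real player would upload the frame as a texture here.
    q.Slots[(q.Head + q.Count) mod QueueSize] := frameNo;
    Inc(q.Count);
  end;
end;

// Removes the oldest frame once it has been displayed, freeing its slot.
function PopFrame(var q: TFrameQueue; out frameNo: Integer): Boolean;
begin
  frameNo := -1;
  Result := q.Count > 0;
  if Result then
  begin
    frameNo := q.Slots[q.Head];
    // A real player would release the texture here.
    q.Head := (q.Head + 1) mod QueueSize;
    Dec(q.Count);
  end;
end;
```

With this shape, video memory only ever holds QueueSize frames, regardless of how large the whole animation is.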
Re: fast resize transparent images
madshi wrote: Many GPUs these days have 2GB, some 4GB, some even more.
Not so many but, like I said, my app should work on older hardware too.
madshi wrote: Anyway, you don't have to upload all the frames at once. Just create a queue of 3 frames, and delete frames from GPU RAM which were already displayed. That's how video players work.
Good idea.
That's what I'm doing in RAM now.
Unfortunately with DelphiX this is too slow. It takes a few hundred ms to transfer just one frame (768x768).
I also thought about using DSPack (to make a sort of "video player").
Re: fast resize transparent images
I started working on the asm conversion, and I understood why you recommended SSE2: MMX and SSE don't have 32-bit integer multiplication.
For now I just tried to convert a single line of code. But instead of being faster, the asm version is slower (!?).
The original line:
Code: Select all
dbLine^[0] := (sbLine1[xp1] * w11 + sbLine1[xp2] * w21 + sbLine2[xp1] * w12 + sbLine2[xp2] * w22) shr 16;
The SSE2 asm version:
Code: Select all
asm
mov eax,[sbline1]
mov edx,[xp1]
movzx ecx,[eax+edx]
movd xmm0, ecx //sbLine1[xp1]
mov edx,[xp2]
movzx ecx,[eax+edx]
movd xmm4, ecx //sbLine1[xp2]
movd xmm2, [w11]
movd xmm6, [w21]
pmuludq xmm0, xmm2 //sbLine1[xp1] * w11
pmuludq xmm4, xmm6 //sbLine1[xp2] * w21
addpd xmm0, xmm4 //sbLine1[xp1] * w11 + sbLine1[xp2] * w21
movd eax, xmm0
push eax //send sbLine1[xp1] * w11 + sbLine1[xp2] * w21 to stack
mov eax,[sbline2]
movzx ecx,[eax+edx]
movd xmm0, ecx //sbLine2[xp2]
mov edx,[xp1]
movzx ecx,[eax+edx]
movd xmm4, ecx //sbLine2[xp1]
movd xmm2, [w22]
movd xmm6, [w12]
pmuludq xmm0, xmm2 //sbLine2[xp2] * w22
pmuludq xmm4, xmm6 //sbLine2[xp1] * w12
addpd xmm0, xmm4 //sbLine2[xp2] * w22 + sbLine2[xp1] * w12
movd eax, xmm0
pop edx //get sbLine1[xp1] * w11 + sbLine1[xp2] * w21 from stack
add eax, edx //sbLine1[xp1] * w11 + sbLine1[xp2] * w21 + sbLine2[xp2] * w22 + sbLine2[xp1] * w12
shr eax,$10 //(sbLine1[xp1] * w11 + sbLine1[xp2] * w21 + sbLine2[xp2] * w22 + sbLine2[xp1] * w12) shr 16
mov edx,[dbLine]
mov [edx],al
end;
I wonder what I am doing wrong?
Re: fast resize transparent images
Just using SSE2 instructions instead of normal x86/64 ASM instructions won't bring you any benefit. SSE2 doesn't multiply faster than x86/64. The purpose of SSE2 is not to do a single multiplication per instruction. It's to do 4 (dwords), 8 (words) or 16 (bytes) operations with one SSE2 instruction. Only if you do that, you get a speed improvement over x86/64.
So the proper way to use SSE2 is to 1) use an SSE2 instruction to load 16 bytes directly from RAM into an SSE2 register. Don't use x86/64 instructions to fill the SSE2 registers. 2) Use SSE2 instructions to operate on those 16 bytes directly somehow. 3) Use an SSE2 instruction to write the final result back to RAM.
Ideally you would do SSE2 operations on 16 different bytes (you know, 1 byte is one Red, Green, Blue or Alpha component of a 32bit RGBA pixel) "at once". Doing that will give you a very big speed gain. However, the code is more difficult to write, of course.
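A minimal sketch of that load / operate / store pattern, assuming 32-bit Delphi with the default register calling convention (EAX, EDX and ECX carry the three parameters). Average16Bytes is a made-up name, and pavgb only averages two rows 50/50; it is not the weighted bilinear blend from Bilinear32, which would additionally need the bytes unpacked to words and multiplied by the weights.

```pascal
// Assumption: 32-bit Delphi, "register" calling convention:
//   EAX = address of src1, EDX = address of src2, ECX = address of dst.
// Each buffer must hold at least 16 bytes (four pf32bit pixels).
procedure Average16Bytes(const src1, src2; var dst);
asm
  movdqu xmm0, [eax]   // 1) load 16 bytes straight from RAM (unaligned load)
  movdqu xmm1, [edx]   //    load the 16 bytes of the second row
  pavgb  xmm0, xmm1    // 2) 16 rounded byte averages: (a + b + 1) shr 1
  movdqu [ecx], xmm0   // 3) write all 16 result bytes back to RAM
end;
```

Three memory instructions and one arithmetic instruction handle all sixteen channel bytes, with no general-purpose register traffic in between; that is where the gain over byte-at-a-time code would come from.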