
June 23, 2007

time for lossy nmf

[20:53:11] <nate> i've decided i'm never using lossless nmf again

[20:58:18] <Radix> why?
[20:58:32] <DJGrenola> you had to ask didn't you
[20:58:33] <DJGrenola> heheh
[21:00:27] <nate> disk read locks
[21:00:31] <nate> i have too much cpu with 2 cores
[21:00:40] <nate> so what happens when i put a 4 core cpu in there
[21:00:46] <nate> i buy a raid? i don't think so
[21:01:00] <nate> more cpu just means less relative cost to decode x264 original
[21:01:10] <nate> can't use it with interlaced stuff but that's not an issue
[21:01:16] <nate> this is post-mvbob stuff
[21:01:39] <nate> basically what happened is the zoid ss flv was cut off
[21:01:49] <nate> because it was one of those times p: filled up while i was in kansas
[21:01:54] <nate> and i only just caught it today
[21:02:01] <nate> so now it's taking ~12 hours to reencode the flv
[21:02:07] <nate> when it's less than half that cpu wise
[21:02:33] <DJGrenola> I'd definitely consider looking into ffvhuff in ffdshow
[21:02:39] <DJGrenola> I use that for everything
[21:02:47] <nate> if it's lossless then it's too big
[21:02:51] <nate> well
[21:02:59] <nate> if it's anything like huffyuv anyway
[21:03:04] <DJGrenola> it's a little smaller than huffyuv is, ~20% or so
[21:03:09] <nate> 3:1 compression for all #000000 is pathetic
[21:03:38] <Radix> have you tried a compressed partition?
[21:04:04] <DJGrenola> :o
[21:04:10] <nate> nope ... but do you really think blind compression i.e. not knowing it's compressing video will do a whole lot
[21:04:19] <DJGrenola> that's a clever idea
[21:04:27] <nate> certainly can't be more than the 3:1 the shit lossless ones get
[21:04:32] <Radix> i dunno, zip compress a lossless avi and compare to huffyuv
[21:04:51] <Radix> huffyuv was built for speed, not compression ratio
[21:06:33] <nate> ok making a test file
[21:06:39] <Radix> also try zip compressing a huffyuv avi
[21:06:45] <Radix> preferably the same one i guess :-p
[21:26:26] <nate> ok this is pal d1 f1
[21:26:44] <nate> samus wearing the varia fusion suit kills the sheegoth and gets the wave beam, then leaves c.i.temple
[21:33:30] <nate> clip is about 34 seconds long

1.9G test_rgb24.avi
924M test_rgb24.avi.zip
680M test_huffyuv.avi
611M test_huffyuv.avi.zip
467M test_lagarith.avi
467M test_lagarith.avi.zip
328M test_x264.avi
328M test_x264.avi.zip
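
As a rough check on the listing above, the sizes can be turned into compression ratios and average bitrates. This is only a sketch: it assumes the M and G suffixes are binary units (as in typical ls -lh or du -h output) and treats the clip as exactly 34 seconds long, so every figure is approximate. The names sizes_mib, ratio_vs_raw, zip_saving and mbit_per_s are made up for illustration.

```python
# Sizes copied from the listing above, in MiB (assumption: binary units,
# so 1.9G is taken as 1.9 * 1024 MiB).
CLIP_SECONDS = 34  # "about 34 seconds", so bitrates are approximate

sizes_mib = {
    "rgb24": 1.9 * 1024,
    "huffyuv": 680,
    "lagarith": 467,
    "x264": 328,
}
zipped_mib = {
    "rgb24": 924,
    "huffyuv": 611,
    "lagarith": 467,
    "x264": 328,
}


def ratio_vs_raw(name):
    """Compression ratio of a codec relative to the raw rgb24 file."""
    return sizes_mib["rgb24"] / sizes_mib[name]


def zip_saving(name):
    """Fraction of the file size removed by zipping it afterwards."""
    return 1 - zipped_mib[name] / sizes_mib[name]


def mbit_per_s(name):
    """Average bitrate in megabits (10^6 bits) per second."""
    return sizes_mib[name] * 1024 * 1024 * 8 / CLIP_SECONDS / 1e6


for name in sizes_mib:
    print(
        f"{name:9s} {ratio_vs_raw(name):5.2f}:1 vs raw, "
        f"zip saves {zip_saving(name):5.1%}, "
        f"~{mbit_per_s(name):.0f} Mbit/s"
    )
```

Under those assumptions, zipping only helps the formats that leave redundancy behind (raw rgb24 and, slightly, huffyuv); the lagarith and x264 files do not shrink at all.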

[22:02:54] <nate> quantizer 0 x264 avi it is
[22:03:12] <nate> though it seems lagarith would come close
[22:04:32] <nate> xp (10 megabaud) mpeg-2 original from dvd would be just over 42.5 meg
[22:05:10] <DJGrenola> wonder if snow has a lossless mode
[22:19:57] <nate> trying quantizer 1 etc
[22:37:46] <nate> looks like quantizer 1 and quality 1 are similar in size to the quantizer 0
[22:38:01] <nate> so what i'm going to do is no quantizer two pass 10 megabaud and look at the thing closely to see if i can see any artifacts
[...]
[23:13:42] <nate> the difference between the original and 10 megabaud is slight
[23:13:50] <nate> visible ... but not necessarily bad
[23:13:57] <nate> what it did was smooth out some areas
[23:14:03] <nate> typical of h.264 i guess
[23:14:18] <nate> the thing is that a lot of what it smoothed out was actually not supposed to be there in the first place
[23:14:22] <nate> because it was artifacts from mvbob
[23:14:34] <nate> so sometimes the lossy looks better than the lossless (1, 3)
[23:16:54] <nate> it's certainly better than mpeg-2!
[23:17:50] <nate> near key frames they're almost identical (2)

1: original | 10 megabaud 2-pass x264
2: original | 10 megabaud 2-pass x264
3: original | 10 megabaud 2-pass x264

[23:26:31] <nate> so 10 megabaud 2-pass x264 avi is the new standard nmf
[23:26:35] <nate> hurrah

Posted by njahnke at June 23, 2007 10:11 PM

Comments

Uhh...

This is for the output of avisynth from DVD deinterlaced and the input of the final encode, right?

1) The screenshots show contrast loss. For example, in the 3rd one the ball reaches true yellow (FFFF00) in the original, whereas in the encoded one it stays a little lower. It's easily visible.

2) 2-pass means doing MVBob twice. The time lost will be higher than the time saved...

3) It provides inconsistent quality, as with any bitrate-based encoding. For example, in the newest Soldier of Fortune 2 speedrun, some jungle segments still look terrible at insane quality, as 5mbps falls very short.

If the settings are the same ones you will be using:

4) Exhaustive motion search? That's crazy and useless. Doesn't it slow down the encoding several times?

5) B-frames and B-pyramid in AVI? These settings will cause a 4-frame lag (2 for the encoder to do 2-bframes, 1 for the AVIfile decoder for b-frames and the last one for b-pyramid), which will cause AV-desync (and is ugly anyway). If using directshow, it'll still be a 2-frame delay.

Why do you need such small filesize, when you were doing fine except for the bandwidth? Fixed quant at 5 or 10 or so should be indistinguishable and avoid the disk bottleneck.

If you are going to use AVI, by all means, disable B-frames, otherwise it's just more trouble. Open the file you just posted in VDub: The first 2 frames are dropped frames. If you advance frame by frame, the first keyframe will appear, but then it'll freeze for another 2 frames, for a total delay of 4, as I explained above.

For more speed, you could try XviD, 1 pass quality mode, disable B-frames, disable trellis, and set the quantizer to 2. This will be much faster for encoding and decoding, more robust for AVI, and still provide a reasonable filesize. You don't need (or want) the hyper-complexity of H264 for this sort of stuff.

Posted by: John ? at June 28, 2007 12:05 PM

thanks very much for your comments. this is still a work in progress as i only found out after i wrote the entry how much of a hack x264 in vfw is. unfortunately i haven't yet found a superior replacement, believe it or not. i can't use commandline because mp4box won't mux lossless audio and ffmpeg overflows and hangs encoding x264, and i can't use avidemux because it won't open avs.

1) yes, but is this compounded with each encode?
2) it's a very complex issue made even more complex by the fact that mvbob is limited to one core. therefore for single segment stuff i'm not sure there would really be any more time lost versus going through a temporary lagarith (remember that my cpus are so fast that even lagarith will read lock the disk). mvbob twice is nothing compared to mvbob 14 times or the disk read locking and choking the cpus 14 times (standard 7 qualities, 2 passes each).
3) inconsistent versus ... constant quality? i have to admit i'm not quite sure how cq even works, so i was afraid to try it. also keep in mind that sda iq is 5 megabaud while this would be 10.
4) again, cpu use is irrelevant coming from lagarith because the disk read locks. less insane settings would be used if this disk bandwidth thing were not already in my way.
5) yeah, i found out about the unfortunate state of b-frames into avi after i committed this entry. again, i'm not sure what to do at this point. b-frames are my best friend for game footage.

as i said before, rate can't go much above 10 megabaud without losing efficiency due to the storage bottleneck, and so the goal is to make good lossy nmf at that rate.

xvid? seriously? i'm not sure i can handle the thought of fixed macroblock size on white text on a black background. just seems like shooting myself in the foot especially for all digital captures.

thanks again for the reply.

Posted by: nathan jahnke Author Profile Page at June 28, 2007 12:36 PM

I don't understand how you can be so IO-bottlenecked. Perhaps there's something I don't know about your setup, but:

10mbps = 1.25 megabytes / second

Now let's assume the hard disk can provide 40 mbytes/s (a rather conservative estimate). For a 30fps source, you'd have to be encoding at 960fps for this to be the bottleneck. For a 60fps source, at 1920fps.

I can believe a lossless source can be a bottleneck (the lagarith encode you mention in the post is 115mbps), but for a lossy one, I think you should be able to go much higher (than 10mbps). Then there's the issue that the encoding is VBR, which means for a given bitrate some parts will "bottleneck" more than others as bitrate varies...
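
The arithmetic above can be written out as a small sketch. It assumes, as the comment does, a sustained read rate of 40 MB/s, and it further assumes the stream is read sequentially with no other disk traffic. bottleneck_fps is a hypothetical helper name, not part of any encoder or tool.

```python
def bottleneck_fps(disk_mbytes_per_s, stream_mbit_per_s, source_fps):
    """Encoding speed (frames per second) at which the disk, rather than
    the cpu, becomes the limit for a stream of the given average bitrate."""
    stream_mbytes_per_s = stream_mbit_per_s / 8  # 10 mbps -> 1.25 MB/s
    realtime_multiple = disk_mbytes_per_s / stream_mbytes_per_s
    return realtime_multiple * source_fps


# 10 mbps lossy intermediate, 40 MB/s disk
print(bottleneck_fps(40, 10, 30))  # 960.0
print(bottleneck_fps(40, 10, 60))  # 1920.0

# the same disk feeding the 115 mbps lagarith file from the post
print(round(bottleneck_fps(40, 115, 30), 1))
```

Because the bitrate is an average, a VBR stream would dip below these figures during its most expensive passages.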

1) I have no idea, you'll have to compare the final encode with the source for contrast loss...

2&3) Constant quantizer at 10-15 or so, or whatever gives you something in the neighborhood of 10-20mbps for "normal" sources... My point is that since you don't need a given filesize, 2 pass is overkill (and suboptimal since it gives you a size at the expense of varying quality, but you ideally want good quality always at the expense of varying size). Of course, with constant quantizer, bitrate will vary a lot depending on content, but at this stage bitrate isn't important as long as it is much less than lossless.

I see this "lossy nmf" thing as a way to speed things up overall, but I don't think obsessing over not being bottlenecked is good. Even at 50mbps you'd still be 2x as fast as lagarith. Is it really worth sacrificing quality for the sake of not being bottlenecked in some cases?

4) But this will not come from lagarith, it'll come from DVD->MVBob, right? esa is not really better than umh (in fact, it can very well be worse), and IIRC it was several times slower...

5) Same thing I've said... I don't think low bitrate is so important, you don't need compression efficiency. For example 15mbps w/o B-frames should look better than 10 with... do you really need so much compression for an intermediate step?

About XviD, I'd give it a try just for checking it out. It's not as terrible as you think it is, I believe it's a better quality/speed tradeoff for something like this, but if you say you are so bottlenecked, then I don't know anymore...

It does have a different distinct look to it, sharper (but perhaps noisier) than H264, which might be good or bad for what you want. At quantizer 2 it might very well be considered "almost lossless" with a very competitive bitrate (the benefit from sophisticated codecs like H264 is less as the bitrate goes up...)

A pedantic remark: "macroblocks" are always of fixed size (16x16) on both codecs. H264 does have a smaller transform (4x4 vs 8x8) that gives it an advantage on sharp edges and artificial sources, so your point stands :)

Posted by: John ? at June 28, 2007 2:29 PM

yes, the disk bottleneck does seem very strange. it looks like the d1 f1 lagarith is supposed to be ~14 megabytes per second, and you would think a decent 7200 rpm sata disk could read twice that fast at the very least. yet 120 fps x264 encoding is a fantasy for me, and the cpus are only at about 50% usage each reading that kind of file. much smaller sources e.g. vobs will solve the problem, and looking at the disk access led seemed to confirm that.

about cq, do you think only 1 pass is really ok? i was seeing some pretty nasty ghosting artifacts with cq like 10, talking about stuff like trails being left behind from a large object moving quickly across a black background leaving the background not so black (this was x264 as well).

interesting info about xvid. i'm wondering how good the vfw one is. i guess i'll run some tests and see what quant ~2 looks like visually and sizewise. if it's acceptable then that will also reduce the cpu load decoding the nmf once it's made. the thing that kind of gets me is that all dvd source stuff is run through a realtime hardware mpeg-2 encoder most often at only 5 megabaud and even i can't tell usually (that is without doing things the viewer won't do like blow up tiny sections to huge sizes). it's mainly only if there's a lot of interference (like nes over rf) or like i said, really high contrast jaggy text that i can tell, and so if i can get around that with the very high bitrate xvid then i'm sold. no more screwing with ghetto vfw x264.

Posted by: nathan jahnke Author Profile Page at June 28, 2007 3:21 PM

I've never seen artifacts like that in x264 with such low quants... something must have gone wrong :(

Keep in mind that the difference between 1-pass and 2-pass is just ratecontrol, that is, selecting the qp for every frame. If a 2-pass encode selects qps>10 for every frame, then it should always look worse than a qp=10 encode.

I've used fixed qp in the past and it worked as I expected, and provided good quality even with high values (>20), that is, with no surprises in that respect. Not much more that I can say.

Xvid for vfw is very robust (and has the same performance as the command line one). It can even do B-frames without delay (when encoding in VDub and using packed bitstream), although I'd advise against B-frames for this purpose (they are useful for increasing compression, but at a given qp they'll lower quality).

Also, if everything goes right, it's less prone to banding and ghosting than x264 (though at qps

It surprises me too how good the realtime mpeg2 encoders are, but there's more magic to it than meets the eye: you don't have the original to compare. They have gotten very good at avoiding artifacts, but they might be losing detail. I thought most of them aimed for 8mbps at top quality though; 5mbps sounds like a tough challenge.

Posted by: John ? at June 28, 2007 3:56 PM
