Wednesday 23 January 2013

Short hacking break

Finally got into some house stuff so have had a break from hacking. I'd hit a few 'milestones' anyway and thought it time for a decent break from it whilst i'm still on leave. Haven't been particularly interested in reading much net or watching much tv either.

Been fairly busy - cleared some overgrowth, did some downpipes, adjusted my suit trousers for a funeral (after 15 years and that adjustment, the suit still fits), got some road-base and barrow-ed it around the house, bought some pavers, ... That and a few other things should keep me busy for a few weeks.

Wednesday 16 January 2013

Kobo stuff

Well i had to do a factory reset of the Kobos - I just couldn't be fucked with seeing a blank page every time I opened a text file and dealing with their embarrassingly slow font tweaking window. I left it at the version it came with and just bypassed the "must have an account and be logged in to even use the machine" bullshit it comes with.

Very nice when simply "logging out" has it reboot to what appears to be a factory reset (it isn't, it just looks that way).

I left a rant on the mobileread forums about how shitfully fucked and slow the firmware is, but i'm just one voice amongst many there.

Anyway, so I guess I did do some kobo hacking sooner rather than later after-all, and today I spent too much time playing with some new widget code.

Basically i'm not all that happy with the old gadget toolkit stuff - although it surely would do the job. And although some guy who took a copy of some of the ReaderZ code via the koper project worked out how to make Swing talk to an alternative device (looks easier than I thought) ... after JavaFX i'm not a big fan of Swing either.

So, foolishly I started hacking on another toolkit which is basically a "cut down JavaFX". Which is obviously a bit pointless in itself because it should eventually be available in the ARM backend ... although that depends on if they support it for the soft float abi, and unless it comes with native eink support it might not be so easy to add it in.

Actually I just noticed that openjfx is GPLv2 + classpath exception, which I missed the first time I looked - should have been obvious I guess. Maybe I should just look at that (when it's ready?) ... although I guess by the time I get anything working maybe kobo will have released a working-enough firmware like 2.0.0 was.

Tuesday 15 January 2013

Kobo glo

So i was a bit bored so I went and got a kobo glo today - although there's nothing wrong with my kobo touch and I use it all the time, I thought the light might be handy. And besides it isn't very expensive. Actually I got two, as i'm going to give one to my luddite sister who is into reading books but otherwise wouldn't bother with such a machine.

I'm still going through the 'something happened to your network' errors whilst configuring it using the kobo setup software inside wine, but at least it appears to be working ... It did the upgrade now it's just syncing some books or something. Oh I can just skip that.

So I finally got to try the backlight after all that - the one in the shop (officeworks - bloody useless for customer service but it's close to me, although they only had black, or black) wasn't configured so just showed the setup screen. Looks like it's supposed to I guess, I will have to wait until I read tonight. If anything there is a slight bright line across the very top and slight dark line across the bottom but the whole text area is nice and even. The higher resolution vs the kobo touch is noticeable too, and now it's about the limit of my tired eyes.

The hardware looks nice as always - but it's a pity the software still has issues. Although the lack of the re-assuring home button is ... well, not as re-assuring as it is having one on the kobo touch. And the soft quilted back isn't as soft in its quilting as the touch is. As to the software, the i/o code needs to be multithreaded or something so it's more responsive to clicks, and other annoying things. Like setting the font waits for the whole document to be reformatted before refreshing. And it still takes FOREVER to close a text file down (sometimes ... garbage collection is a massive massive win).

Speaking of fonts, Times New Roman has vanished! Only left with Georgia as a remotely decent serif font - which is just not good enough. I tried some of the 'fattened' ones on the mobileread forums but ended up with DejaVu Sans Serif for now - the descenders are a bit squat and the serifs aren't pointy enough but otherwise it seems ok. Unfortunately the font fine-tuning which I used on the touch doesn't work on third-party fonts but i'm happy enough with the weight as it is.

Anyway, this frees up my kobo touch for just hacking on, although I don't have any immediate plans for doing so. Whilst I have some holidays left I need to get some shit organised around the yard, well if i don't get too lazy ...

Update: Well I 'upgraded' to the latest firmware, and it's pretty fucked. Dropped linux filesystem support - well that's not the end of the world. But text files always open blank until you change the font settings. And they are excruciatingly slow to close or to suspend - it takes 25 seconds to suspend when reading one text file, which is about how long it takes to 'close' a text file as well. The touch with its older firmware is pretty fast.

The hardware is mostly excellent, why is it so let down by shitty software?

So maybe it won't be that long before i'm hacking on it after-all ...

Update: So after having it for 6 weeks, here's a little update.

I did a factory reset to restore the original firmware it came with and bypassed the "must be logged in to use" mis-feature, and just load books using usb mass-storage or the sd-card. This makes it a pretty usable reading device, although without connecting to kobo I don't get a dictionary - but I can live without that even though it would be nice to have.

I've warmed to DejaVu Serif (extended version of Bitstream Vera Serif) and actually now prefer it to the Times New Roman on the touch (I never quite liked Times New Roman before, a little too thin). It just has a nice weight and horizontal density and good readability on the e-ink screen. The higher resolution of the glo is definitely noticeable compared to the touch too.

The backlight is really useful, allows me to read anywhere without any extra bits or reaching to turn off lights. I like the way the on/off button for the light works - unobtrusive and easy to access. As mentioned in some reviews I read, it would be nice if it was a touch dimmer at its lowest setting for reading in a completely dark situation, but it's still comfortable enough as it is.

The battery life is definitely better than the kobo touch was which seemed to drain fairly fast in suspend mode, although i've been reading more epub's which seem to use orders of magnitude less cpu resources than text files.

The touch responsiveness is "adequate". It still seems to miss touches sometimes or get ones I didn't intend at others if my fingers are near the edge of the screen, but it's a minor annoyance. I'm not a 'skim reader', so any delay from a wrong page isn't the end of the world.

My sister is also really happy with the one I bought her, for the convenience and backlight, and also having access to old editions of classics which are hard to find in the cultural backwater where she lives.

The only thing I can say I really miss is the soft-quilted back cover! The touch is definitely nicer to hold with the raised diamond pattern versus the flat one. It's a feature I would never have thought to be quite so important but it was one of the first things my sister mentioned when I showed it to her and I would have to agree it is noticeable.

Although I wish they would improve the software, it works fine as a novel reader and I would get another one. Although I can't see needing one until the battery dies its inevitable non-removable death. The support for standard formats and the ability to use it without any "desktop software" is a real winner for me. I just wish it came like that and didn't need poking at first.

Saturday 12 January 2013

A/V sync II

Had another go at a/v sync this morning. So far it's looking quite good although it needs a good clean-up from so many aborted experiments.

I managed to remove most of the synchronous code that was being used to try and coordinate everything and instead I'm using a centrally coordinated sequence number.

Every packet and frame created has a sequence number associated with it which is maintained through the data flow graph. Every time a seek operation is performed the sequence number is incremented. This feeds through to each stage and lets them make cheap decisions on what to do:

  • Decoders just flush if the sequence number changes.
  • Decoders discard any packets which have the wrong number.
  • Renderers discard any frames which have the wrong number.
  • Renderers reset any output they need to (i.e. audio flush, re-sync).

I also use similar code to JJMediaReader so that after a seek it discards any frames which come through before the desired seek position.
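To make the sequence-number idea concrete, here is a minimal sketch in plain Java. It is not the jjmpeg code: the class name `SeekSequenceSketch`, the millisecond timestamps and the single `accept()` test are all assumptions made for illustration. Each stage would call `accept()` with the number stamped on the packet or frame, and simply drop anything that is stale or still before the seek target.

```java
/** Hypothetical sketch of sequence-number seek handling; not the actual jjmpeg classes. */
final class SeekSequenceSketch {
    private int current;          // bumped once per seek
    private long seekTargetMs;    // position requested by the latest seek

    /** Reader side: start a new sequence for a seek and return its number. */
    synchronized int seek(long targetMs) {
        seekTargetMs = targetMs;
        return ++current;
    }

    synchronized int current() {
        return current;
    }

    /**
     * Stage side: keep a packet or frame only if it carries the current
     * sequence number and is not earlier than the seek target.
     */
    synchronized boolean accept(int sequence, long ptsMs) {
        return sequence == current && ptsMs >= seekTargetMs;
    }

    public static void main(String[] args) {
        SeekSequenceSketch seq = new SeekSequenceSketch();
        int old = seq.current();
        int now = seq.seek(5000);
        System.out.println("stale frame kept:   " + seq.accept(old, 6000));
        System.out.println("early frame kept:   " + seq.accept(now, 4000));
        System.out.println("current frame kept: " + seq.accept(now, 5000));
    }
}
```

The point of doing it this way is that no stage has to block waiting on another: a seek only bumps a counter, and everything already in flight falls out on its own.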

I still need some out-of-band callbacks for seek and pause because all of the above means the renderers may never see anything, but they don't need to do much.

The android player is somewhat broken so I need to fix that before committing anything.

Update: Got it working well enough and checked everything in. The new sync code causes some extra 'recovery time' after seeking, but once it's settled down it ends up with better sync. The recovery time is only noticeable on slow hardware.

Friday 11 January 2013

A/V sync

Had a bit of a limp stab at a/v sync today.

I started with something simple although it ended up a bit too complicated - trying to synchronise multiple decoding and display threads is a little messy. I was trying to hide as much as possible inside MediaReader and the Audio/VideoDecoder classes, but it got ugly.

As far as the timing goes, the simple approach nearly worked; there's just a couple of issues remaining:

  • After seeking on the container, the audio stream starts well before the video stream, so one ends up with a visual pause before starting, or even the video decoding never catching up (wrt packet buffering) and some nasty frame jumping.

    Hopefully this can be solved as I do in JJMediaReader: after a seek I discard packets and/or frames until the seek position is reached.

  • Getting the amount of audio data "buffered" is a bit of a pain. Trying to use the reported position is fine until you seek since it throws the position out.

The simple approach was to have a central MediaClock object which tracks the audio renderer position, and then does various timing calculations for the video sync. It manages pause as well as interacts with seek. Maybe it isn't so simple ...
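For illustration only, a central clock of this kind might look something like the sketch below. The class `MediaClockSketch`, its method names and the explicit `nowNanos` parameters (which would normally come from `System.nanoTime()`) are assumptions, not the real MediaClock.

```java
/** Hypothetical MediaClock sketch: video timing derived from the audio renderer position. */
final class MediaClockSketch {
    private long audioPositionMs;   // last media position reported by the audio renderer
    private long reportedAtNanos;   // monotonic time of that report
    private boolean paused = true;

    /** The audio renderer calls this whenever it knows how far playback has got. */
    synchronized void updateAudio(long positionMs, long nowNanos) {
        audioPositionMs = positionMs;
        reportedAtNanos = nowNanos;
        paused = false;
    }

    /** Freeze the clock at the current media time. */
    synchronized void pause(long nowNanos) {
        audioPositionMs = mediaTimeMs(nowNanos);
        paused = true;
    }

    /** After a seek the old position is meaningless, so restart from the target. */
    synchronized void seek(long targetMs, long nowNanos) {
        audioPositionMs = targetMs;
        reportedAtNanos = nowNanos;
    }

    /** Current media time: the last audio position plus the wall time elapsed since. */
    synchronized long mediaTimeMs(long nowNanos) {
        if (paused)
            return audioPositionMs;
        return audioPositionMs + (nowNanos - reportedAtNanos) / 1_000_000;
    }

    /** How long the video renderer should wait before showing a frame; <= 0 means it is late. */
    synchronized long delayForFrameMs(long framePtsMs, long nowNanos) {
        return framePtsMs - mediaTimeMs(nowNanos);
    }

    public static void main(String[] args) {
        MediaClockSketch clock = new MediaClockSketch();
        long start = System.nanoTime();
        clock.updateAudio(1000, start);
        System.out.println("delay for frame at 1040ms: "
                + clock.delayForFrameMs(1040, System.nanoTime()) + "ms");
    }
}
```

Passing the time in explicitly keeps the arithmetic testable; pause, seek and the audio reports all funnel through the same two fields, which is about as simple as it can get while still covering the awkward cases.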

Eventually I should work it out.

Wednesday 9 January 2013

Preserving arbitrary Aspect Ratio in JavaFX

I had an attempt at displaying the proper aspect ratio in JavaFX, and after a couple of false-starts came up with a pretty simple solution. The ImageView does its own aspect ratio preservation but for a WritableImage the pixels are always square - as far as I can tell.

So one must adjust it outside. First I just used:

        vout.setScaleX(aspect);

To set the ratio - this displayed the video properly but didn't take the adjusted size into account during layout and fitting to the display area. Not really a show-stopper as the user can just adjust the window until it fits, but I was sure I could do better than that.

I tried various things such as placing the vout (an ImageView) into a Group and so on - but this didn't work (of course) as I was setting the dimensions of the ImageView relative to the window size.

Actually it turned out to be extremely simple: since I'm scaling the ImageView when it is being displayed, I just have to scale the inverse when I'm binding its dimensions:

        vout.fitWidthProperty().bind(root.widthProperty().divide(aspect));

The height is simply bound to the root.height.

And also remember to take it into account on the initial scene size too:

        Scene scene = new Scene(root, width * aspect, height);

So apart from calculating the aspect ratio, those were the only lines of code required. Seems to work although i've only tested it on one PAL 16:9 video so far ...
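As a rough worked example of the arithmetic, here is a plain-Java sketch (no JavaFX needed) of where `aspect` comes from and what size ends up on screen. The helper names are made up, and it assumes the whole 720x576 PAL frame maps to 16:9, which ignores the finer points of broadcast active-area widths.

```java
/** Hypothetical helpers for non-square pixel display; names are illustrative only. */
final class AspectSketch {
    /** Pixel aspect ratio needed to show a stored frame at the given display aspect ratio. */
    static double pixelAspect(int width, int height, int darNum, int darDen) {
        return ((double) darNum / darDen) / ((double) width / height);
    }

    /** Largest (width * aspect) x height rectangle that fits the window; returns {w, h}. */
    static double[] fit(int width, int height, double aspect, double winW, double winH) {
        double scale = Math.min(winW / (width * aspect), winH / height);
        return new double[] { width * aspect * scale, height * scale };
    }

    public static void main(String[] args) {
        double aspect = pixelAspect(720, 576, 16, 9);
        double[] size = fit(720, 576, aspect, 512, 1000);
        System.out.printf("aspect=%.4f display=%.0fx%.0f%n", aspect, size[0], size[1]);
    }
}
```

For the PAL case the pixel aspect works out to 64/45, so a 720 pixel wide frame displays 1024 wide, and a window that is too narrow scales the whole thing down to fit horizontally.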

Although the screen capture stuff just stores the unscaled frame still ... which is probably what I want tbh.

Well not a bad haul today for a poor night's sleep and feeling a bit crap overall.

Update: Added some shots to show it works in JJPlayer.

Window too narrow - automatically scaled to fit horizontally:

Window too wide - automatically scaled to fit vertically (and controls fading out):

A raw non-corrected frame grab showing the image using square pixels:

More video player work

Mucked around with a few things in JJPlayer last night and this morning. Nothing on TV last night and I just wanted a couple of hours today.

  • Tried a much more aggressive fallback mode for frame skipping - it starts to drop decoding of frames as well.

    I'm not sure how useful it is if the cpu is just too slow - it eventually turns the video into an irregular slideshow, but can let the sound play fairly smoothly if there's enough time left. So maybe it has some use.

  • I noticed the "get audio location" isn't very reliable in Android (at least), so I changed the way it syncs to audio. I think get audio location is just reporting the head of the last buffer submitted to the sound device or something similar, so it jumps around compared to the actual time. So now it only re-syncs if it gets well out of whack. It seems to remove some annoying frame rate jitter, but as i'm playing back 50Hz material on 60Hz it's still pretty jittery.

  • I only allocate 3xtextures in the GLES backend if I don't need them to buffer. Less memory usage.

  • Moved some more functionality into the MediaReader class where it has the proper non-asynchronous state from which to make accurate decisions.

  • Added a full-screen mode in JavaFX (F11).

  • Bound the image size to the window size so it sizes to fit, maintaining aspect ratio. Still assumes square pixels.

  • Added keyboard controls in JavaFX. Space to pause/unpause, left/right to skip back/forward 30s, page up/down for start/end of file, escape exits full-screen mode or quits, and the afore-mentioned F11 for full-screen mode.

  • Changed to SVG icons in JavaFX, and even if they're a bit ugly they're consistent. Converting Inkscape SVG to code is a bit of a pain so it's a bunch of hard-coded stuff plus the curved triangle thing.

  • Did some styling to make the buttons flat/borderless, they pre-light.

  • Fixed some other seek related stuff in JavaFX frontend. It's better but still not perfect. Pause/resume still messes things up sometimes.
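The "only re-sync if it gets well out of whack" change from the list above can be sketched roughly as follows. This is an assumption-laden illustration in plain Java, not the JJPlayer code: the class name, the threshold parameter and the explicit `nowNanos` arguments are all invented here.

```java
/** Hypothetical sketch: free-running clock that only snaps to the audio position on large drift. */
final class DriftTolerantClockSketch {
    private final long thresholdMs;  // how far out of whack before re-syncing
    private long baseMediaMs;        // media time at the last re-sync
    private long baseNanos;          // monotonic time at the last re-sync

    DriftTolerantClockSketch(long thresholdMs, long startNanos) {
        this.thresholdMs = thresholdMs;
        this.baseNanos = startNanos;
    }

    /** Media time according to the smooth local clock. */
    long timeMs(long nowNanos) {
        return baseMediaMs + (nowNanos - baseNanos) / 1_000_000;
    }

    /** Feed in a (jittery) audio position; returns true if the clock was re-synced. */
    boolean report(long audioMs, long nowNanos) {
        long drift = audioMs - timeMs(nowNanos);
        if (Math.abs(drift) <= thresholdMs)
            return false;            // small jitter: ignore it
        baseMediaMs = audioMs;       // well out: snap to the audio position
        baseNanos = nowNanos;
        return true;
    }

    public static void main(String[] args) {
        DriftTolerantClockSketch clock = new DriftTolerantClockSketch(100, 0L);
        System.out.println("30ms jitter re-syncs:  " + clock.report(30, 0L));
        System.out.println("250ms drift re-syncs:  " + clock.report(1250, 1_000_000_000L));
    }
}
```

The video timing then follows the smooth local clock rather than the raw reported position, so buffer-sized jumps in the reported value don't turn into frame rate jitter.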

The new buttons and styling.

There were a few annoying things along the way. The keyboard input was a bit of a pain to get working, I had to override Slider.requestFocus() so it wouldn't grab keyboard focus when used via the mouse (despite not being in the focus traversal), and I added a 'glass-pane' over the top of the window to grab all keyboard events. I had to remove all buttons from the focus traversal group and only put the 'glass-pane' in it. I made the glass-pane mouse-transparent so the buttons still work.

Another strange thing was full-screen mode. Although JavaFX captures ESC to turn it off, the ESC key event still ends up coming through to my keyboard handler. i.e. I have to track the fullscreen state separately and quit only if it's pressed twice.

Next to fix is those a/v sync issues ...

Update: Well instead I added a frame capture function. Hit print-screen and it captures the currently displayed frame (raw RGB) and opens a pannable/zoomable image viewer. From here the image can be saved to a file.

Although I haven't implemented it, one could imagine adding options to automatically annotate it in various ways - timestamp, filename, and so on.

Update 2: I subsequently discovered "accelerators", so i've changed the code to use those instead of the 'glass pane' approach - a ton of pointless anonymous inner Runnables, but it feels less like a hack.

        scene.getAccelerators().put(new KeyCombination(...), ...);

Update 3: I tried it on my laptop which is a bit dated and runs 32-bit fedora with a shitty Intel onboard GPU (i.e. worthless). It runs, but it's pretty inefficient - 2-4x higher load than mplayer on same source. Partly due to Java2D pipeline I guess but it's probably all the excess frame copying and memory usage. I tried to do some profiling on the workstation but didn't have much luck. Just seems to be spending most of its time in Gtk_MainLoop, although the profiler doesn't seem to know how to sort properly either.

Monday 7 January 2013

JJPlayer controls

This morning I added sound and started work on some controls for the JavaFX version of JJPlayer.

It's mostly pretty buggy but it kind of works ... some of the time.

I got JavaSound working easily enough at least - although i'm just converting to stereo 16-bit for now as I did on Android.

But there are some big problems with the way i'm handling the a/v sync when pause or seeking, so things get messed up some of the time. Too lazy to fix it right now ...

I kept playing a bit with the UI and added a fade in/out of the controls as well as hiding the mouse pointer, and some stylesheet stuff.

Clearly the styling and the ASCII-icon buttons both leave something to be desired at this point.

I thought i'd have trouble getting the hiding to reverse if the user started moving the mouse whilst it was hiding, but it turned out to be pretty easy. Just change the rate on the fading out animation to reverse it, and it still runs to completion at the other end instead. So it fades in and out smoothly depending on user action, and without any eye-jars.

Saturday 5 January 2013

JJPlayer for JavaFX

I finally had enough of Android and started poking at a JavaFX version of JJPlayer this morning.

I'm hoping that working on both of them i'll be able to fine tune the design a bit and help with debugging - e.g. i already fixed the end-of-file bugs and "discovered" the av_frame_get_best_effort_timestamp() function (guh, what a name, bad memories of GNOME dev flooding back).

So far I just have unscaled video going (no sound). There are no controls but I should think JavaFX will make that pretty easy to add (and an opportunity to add some bling). But it can play multiple videos in sequence and the window sizes to fit without eye-jars. Once I get sound going i'll look at filling out the GUI for it - I prefer the OpenAL API but I might try with JavaSound this time.

Performance seems ok, although my first measurements had it on par with mplayer with the GL backend but that seemed to be a specific video (actually it's a touch lower cpu usage on that video). Uses 3-4x memory, but that's cheap isn't it? Unfortunately as one can only write RGB data to a WritableImage, it has to do a YUV-RGB step separately and then perform a redundant copy but there's not much option there.

No screenshot yet as it's pretty basic ...

The code (GPL3) will be in the jjmpeg-1.0 branch of jjmpeg by the time you read this or shortly thereafter inside jjmpeg-javafx.

Friday 4 January 2013

JJPlayer 1.0-a1

I decided I had enough hacked into the JJPlayer code to warrant a release - this one is actually (somewhat) usable as a player now, unlike the previous ones which were totally broken. I'm getting a bit bored with it so it seemed as good a point as any.

My ainol elf 2 can manage a test 720p h.264 file pretty well - with only the occasional group of stutters on complex scenes with an eventual catch up. Unfortunately the Mele can barely handle PAL MPEG ... Update: I was working on some decoding-skipping throttle code, and for whatever reason, the Mele can now handle PAL MPEG just fine (nothing to do with the new code). No idea ... maybe i/o related - the Mele seems to go funny after suspend too.

I managed to fit in a few useful things in no particular order:

  • Runs full-screen - no navigation or title bars (they return temporarily on touch);
  • Busy spinner when opening;
  • Open is performed asynchronously;
  • Seek mostly works;
  • Much better stability;
  • Display of frames can be dropped to try to catch up;
  • Video is synced to the audio and usually works (it doesn't drift at any rate);
  • Basic Android activity lifecycle stuff works, pause/resume, etc;
  • Improved the startup time;
  • Keeps the screen turned on (i think);
  • Some developer oriented debug stuff; and
  • A preferences page.

There are still some unsolved issues wrt audio sync, pixels are assumed square, the possibility of more aggressive frame dropping (i.e. not decoding B frames and/or I frames, etc), very inefficient memory use (e.g. i have 31 AVFrame buffers and 31 YUV textures although I only need 2 of the latter), performance, jarring UI manoeuvres when the UI is shown/hidden, and many others I can't be bothered enumerating at this point. And that doesn't include the 'missing features' like a pause button or subtitles.

Most of the Android specific code is a huge mess - experiments are still on-going.

See the downloads link in the jjmpeg project.

But if you're after a more polished free software Android video player, go look at Dolphin Player. Although that seems to have some audio glitches and drift, and is implemented afaict as a native SDL app ported to Android rather than as a Java player.

Whilst writing this my ADSL network went down - probably heat related - so maybe that's a hint to go drink beer instead of hack. Might start with the coffee i brewed and forgot to drink this morning - that'll make a nice iced coffee. Mmm, blenders.

We're in the middle of a heat-wave and I should probably be at the beach or something - but the house is (relatively) cool and I can get out and water the garden to keep it alive here ... meant to be 44 degrees today, but that isn't even enough to get a record. Yesterday I got out an IR thermometer and measured 75 degrees on the lid of a plastic green sulo bin that isn't even in full-sun all day. Today has the added bonus of being windy too - feels like being inside a fan-forced oven when you walk outside. Maybe it's better not to go riding in such weather ...

Thursday 3 January 2013

NEON YUV vs GPU

This morning I did some experiments with Android and the YUV code - although patience is wearing thin for such a shitty alternative to GNU/Linux that Android is. As icing on the cake most of the android developer site just doesn't render on most of my browsers anymore - I just get junk. Well I can always go elsewhere with my spare time ...

I changed the code to perform a simple doubling up of the U and V components without a separate pass, and changed to an RGB 565 output stage and embedded it into the code in another mess of crap. Then I did some profiling - comparing mainly to the frame-copying version.

Interestingly it is faster than sending the YUV planes to the GPU and using it to do the YUV conversion - and that is only including the CPU time for the frame copy/conversion, and the texture load. i.e. even using NEON it uses less CPU time (and presumably much less GPU time) even though it's doing more work. The volume of texture memory copied is also 33% more for the RGB565 case vs YUV420p one.
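The 33% figure is just bytes-per-pixel arithmetic: YUV420p averages 1.5 bytes per pixel and RGB565 is 2. A quick sketch (helper names invented, 720x576 picked purely as an example size):

```java
/** Hypothetical helpers comparing per-frame texture upload sizes. */
final class TextureBytesSketch {
    /** Full-resolution Y plane plus two chroma planes at half resolution in each direction. */
    static long yuv420p(int w, int h) {
        long chroma = (long) ((w + 1) / 2) * ((h + 1) / 2);
        return (long) w * h + 2 * chroma;
    }

    /** One 16-bit value per pixel. */
    static long rgb565(int w, int h) {
        return 2L * w * h;
    }

    public static void main(String[] args) {
        long yuv = yuv420p(720, 576);
        long rgb = rgb565(720, 576);
        System.out.println("yuv420p=" + yuv + " rgb565=" + rgb
                + " ratio=" + (double) rgb / yuv);
    }
}
```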

Still, 1ms isn't very much out of 10 or so.

The actual YUV420p to RGB565 conversion is only around 1/2 the speed of a simple AVFrame.copy() - ok considering it's writing 33% more data and I didn't try to optimise the scheduling.

Stop press: Whilst writing this I thought i'd look at the scheduling and also using the saturating left shift to clamp the values implicitly. Got the inner loop down from 54 to 35 cycles (according to the cycle counter), although it only runs about 10% faster. Better than a kick in the nuts at any rate. Fortunately due to the way I already used registers I could decouple the input loading/formatting from the calculations, so i simply interleaved the next block of data load within the calculations wherever there were delay slots and only made the data loading conditional.

The (unscheduled) output stage now becomes:

        @ saturating left shift (signed in, unsigned out) clamps to [0,0xffff]
        vqshlu.s16      q8,#2           @ red in upper 8 bits
        vqshlu.s16      q9,#2
        vqshlu.s16      q10,#2          @ green in upper 8 bits
        vqshlu.s16      q11,#2
        vqshlu.s16      q12,#2          @ blue in upper 8 bits
        vqshlu.s16      q13,#2

        vsri.16         q8,q10,#5       @ insert green
        vsri.16         q9,q11,#5
        vsri.16         q8,q12,#11      @ insert blue
        vsri.16         q9,q13,#11

        vst1.u16        { d16,d17,d18,d19 },[r3]!

Which saves all those clamps.
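For anyone who doesn't read NEON, here is a scalar Java model of what one lane of that output stage does, as I understand the instruction semantics. The method names are made up and the inputs are assumed to be the signed 16-bit intermediate values (colour * 64) from the conversion.

```java
/** Hypothetical scalar model of the vqshlu/vsri RGB565 output stage, one lane at a time. */
final class Rgb565Sketch {
    /** Models vqshlu.s16 #2: signed input, shift left by 2, saturate to unsigned 16 bits. */
    static int qshlu2(int v) {
        if (v < 0)
            return 0;
        return Math.min(v << 2, 0xffff);
    }

    /** Models vsri.16 dst,src,#n: keep the top n bits of dst, insert src >> n below them. */
    static int sri(int dst, int src, int n) {
        int keep = (0xffff << (16 - n)) & 0xffff;
        return (dst & keep) | ((src & 0xffff) >>> n);
    }

    /** Pack three fixed-point components into one RGB565 value. */
    static int pack(int r, int g, int b) {
        int out = qshlu2(r);               // red ends up in the top 5 bits
        out = sri(out, qshlu2(g), 5);      // green in the middle 6 bits
        out = sri(out, qshlu2(b), 11);     // blue in the low 5 bits
        return out;
    }

    public static void main(String[] args) {
        System.out.printf("white=%04x red=%04x%n",
                pack(255 * 64, 255 * 64, 255 * 64), pack(255 * 64, 0, 0));
    }
}
```

Out-of-range inputs (negative, or above 16383) saturate in the first step, which is why no separate clamp is needed.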

As suspected, the 8 bit arithmetic leads to a fairly low quality result, although the non-dithered RGB565 can't help either. Perhaps using shorts could improve that without much impact on performance. Still, it's passable for a mobile device given the constraints (and source material), but it isn't much chop on a big tv.

Of course, all this wouldn't be necessary if one had access to the overlay framebuffer hardware present on pretty well all ARM SOCs ... but Android doesn't let you do that does it ...

Update: I've checked a couple of variations of this into yuv-neon.s, although i'm not using it in the released JJPlayer yet.

Mele vs Ainol Elf II

The Elf is much faster than the Mele at almost everything - particularly video decoding (which uses multiple threads), but pretty much everything else is faster (Better memory? The Cortex-A9? The GPU?) and with the dual-cores means it just works a lot better. Can't be good for the battery though.

(as an aside, someone who spoke english should've told the guys in China that "anal elf 2" is probably not a good name for a computer!)

But the code is written with multiple cores in mind - demux, decoding of video and audio, and presentation is all executed on separate threads. Having all of the cpu-bound tasks executed in a single thread may help on the Mele, although by how much I will only know if and when I do it ...

Wednesday 2 January 2013

NEON yuv + scale

Well I still haven't checked the jjmpeg code in but I did end up playing with NEON yuv conversion yesterday, and a bit more today.

The YUV conversion alone for a 680x480 frame on the beagleboard-xm is about 4.3ms, which is ok enough. However with bi-linear scaling to 1024x600 as well it blows out somewhat to 28ms or so - which is definitely too slow.

Right now it's doing somewhat more work than it needs to - it's scaling two rows each time in X so it can feed into the Y scaling. Perhaps this could be reduced by about half (depending on the scaling going on), which might knock about 10ms off the processing time (assuming no funny cache interactions going on) which is still too slow to be useful. I'm a bit bored with it now and don't really feel like trying it out just yet.
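The row-reuse idea might look roughly like this in scalar Java: scale rows in X, blend pairs of them in Y, and when the lower row of one output line becomes the upper row of the next, keep it rather than scaling it again. This is only a sketch under my own assumptions (single 8-bit plane, 16.16 fixed-point coordinates, invented names); whether it would really save the time guessed at above is untested.

```java
/** Hypothetical separable bilinear scaler for one 8-bit plane, reusing X-scaled rows. */
final class BilinearSketch {
    /** Linearly interpolate one source row of width sw into dst (width dst.length). */
    static void scaleRowX(int[] src, int rowStart, int sw, int[] dst) {
        int dw = dst.length;
        for (int x = 0; x < dw; x++) {
            // source coordinate in 16.16 fixed point
            long fx = dw > 1 ? (((long) x * (sw - 1)) << 16) / (dw - 1) : 0;
            int x0 = (int) (fx >> 16);
            int x1 = Math.min(x0 + 1, sw - 1);
            int f = (int) (fx & 0xffff);
            dst[x] = (src[rowStart + x0] * (65536 - f) + src[rowStart + x1] * f) >> 16;
        }
    }

    /** Scale a row-major plane of values 0-255 from sw x sh to dw x dh. */
    static int[] scale(int[] src, int sw, int sh, int dw, int dh) {
        int[] dst = new int[dw * dh];
        int[] rowA = new int[dw];
        int[] rowB = new int[dw];
        int cachedA = -1;   // source row currently held in rowA
        int cachedB = -1;   // source row currently held in rowB
        for (int y = 0; y < dh; y++) {
            long fy = dh > 1 ? (((long) y * (sh - 1)) << 16) / (dh - 1) : 0;
            int y0 = (int) (fy >> 16);
            int y1 = Math.min(y0 + 1, sh - 1);
            int f = (int) (fy & 0xffff);
            if (y0 == cachedB && y0 != cachedA) {
                // the old lower row becomes the new upper row: reuse it
                int[] t = rowA; rowA = rowB; rowB = t;
                cachedA = cachedB;
                cachedB = -1;
            }
            if (y0 != cachedA) { scaleRowX(src, y0 * sw, sw, rowA); cachedA = y0; }
            if (y1 != cachedB) { scaleRowX(src, y1 * sw, sw, rowB); cachedB = y1; }
            for (int x = 0; x < dw; x++)
                dst[y * dw + x] = (rowA[x] * (65536 - f) + rowB[x] * f) >> 16;
        }
        return dst;
    }

    public static void main(String[] args) {
        int[] out = scale(new int[] { 0, 100, 100, 200 }, 2, 2, 3, 3);
        System.out.println(java.util.Arrays.toString(out));
    }
}
```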

Maybe the YUV only conversion might still be a win on Android though - if loading an RGB texture (or an RGB 565 one) is significantly faster than the 3x greyscale textures i'm using now. I need to run some benchmarks there to find out how fast each option is, although that will have to wait for another day.

yuv to rgb

The YUV conversion code is fairly straightforward in NEON, although I used 2:6 fixed-point for the scaling factors so I could multiply the 8 bit pixel values directly. I didn't check to see if it introduces too many errors to be practical mind you.

I got the constants and the maths from here.

        @ pre-load constants
        vmov.u8 d28,#90                 @ 1.402 * 64
        vmov.u8 d29,#113                @ 1.772 * 64
        vmov.u8 d30,#22                 @ 0.34414 * 64
        vmov.u8 d31,#46                 @ 0.71414 * 64

The main calculation is performed using 2.14 fixed-point signed mathematics, with the Y value being pre-scaled before accumulation. For simplification the code assumes YUV444 with a separate format conversion pass if required, and if executed per row should be cheap through L1 cache.

        vld1.u8 { d0, d1 }, [r0]!       @ y is 0-255
        vld1.u8 { d2, d3 }, [r1]!       @ u is to be -128-127
        vld1.u8 { d4, d5 }, [r2]!       @ v is to be -128-127

        vshll.u8        q10,d0,#6       @ y * 64
        vshll.u8        q11,d1,#6

        vsub.s8         q1,q3           @ u -= 128 (assumes q3 holds 128 in every lane)
        vsub.s8         q2,q3           @ v -= 128
        
        vmull.s8        q12,d29,d2      @ u * 1.772
        vmull.s8        q13,d29,d3

        vmull.s8        q8,d28,d4       @ v * 1.402
        vmull.s8        q9,d28,d5

        vadd.s16        q12,q10         @ y + 1.772 * u
        vadd.s16        q13,q11
        vadd.s16        q8,q10          @ y + 1.402 * v
        vadd.s16        q9,q11

        vmlsl.s8        q10,d30,d2      @ y -= 0.34414 * u
        vmlsl.s8        q11,d30,d3
        vmlsl.s8        q10,d31,d4      @ y -= 0.71414 * v
        vmlsl.s8        q11,d31,d5

And this neatly leaves the 16 RGB result values in order in q8-q13.
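As a cross-check of the arithmetic, the same calculation for a single pixel can be written in scalar Java. This is a reference sketch only, with invented names; the coefficients are the four constants pre-loaded above, and the results stay in the same scaled-by-64 form the NEON code leaves in q8-q13.

```java
/** Hypothetical scalar reference for the fixed-point YUV to RGB maths, one pixel at a time. */
final class YuvFixedPointSketch {
    // 2:6 fixed-point coefficients, i.e. round(c * 64)
    static final int CR_V = 90;    // 1.402
    static final int CB_U = 113;   // 1.772
    static final int CG_U = 22;    // 0.34414
    static final int CG_V = 46;    // 0.71414

    /** Returns {r, g, b} in the scaled-by-64 form, not yet clamped. */
    static int[] convert(int y, int u, int v) {
        int ys = y << 6;           // y * 64, like vshll.u8 #6
        int us = u - 128;          // centre chroma on zero
        int vs = v - 128;
        int r = ys + CR_V * vs;
        int b = ys + CB_U * us;
        int g = ys - CG_U * us - CG_V * vs;
        return new int[] { r, g, b };
    }

    /** Clamp to [0,16383] and scale back down to a byte value. */
    static int toByte(int fixed) {
        return Math.max(0, Math.min(16383, fixed)) >> 6;
    }

    public static void main(String[] args) {
        int[] rgb = convert(76, 85, 255);   // roughly pure red
        System.out.println(toByte(rgb[0]) + "," + toByte(rgb[1]) + "," + toByte(rgb[2]));
    }
}
```

The worst-case intermediates (for example 255 * 64 + 113 * 127) still fit in a signed 16-bit lane, which is what lets the NEON version get away with 16-bit accumulators.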

They still need to be clamped which is performed in the 2.14 fixed point scale (i.e. 16383 == 1.0):

        vmov.u8         q0,#0
        vmov.u16        q1,#16383

        vmax.s16        q8,q0
        vmax.s16        q9,q0
        vmax.s16        q10,q0
        vmax.s16        q11,q0
        vmax.s16        q12,q0
        vmax.s16        q13,q0
        
        vmin.s16        q8,q1
        vmin.s16        q9,q1
        vmin.s16        q10,q1
        vmin.s16        q11,q1
        vmin.s16        q12,q1
        vmin.s16        q13,q1

Then the fixed point values need to be scaled and converted back to byte:

        vshrn.i16       d16,q8,#6
        vshrn.i16       d17,q9,#6
        vshrn.i16       d18,q10,#6
        vshrn.i16       d19,q11,#6
        vshrn.i16       d20,q12,#6
        vshrn.i16       d21,q13,#6

And finally re-ordered into 3-byte RGB triplets and written to memory. vst3.u8 does this directly:

        vst3.u8         { d16,d18,d20 },[r3]!
        vst3.u8         { d17,d19,d21 },[r3]!

vst4.u8 could also be used to write out RGBx, or the planes kept separate if that is more useful.

Again, perhaps the 8x8 bit multiply is pushing it in terms of accuracy, although it's a fairly simple matter to use shorts instead. If shorts were used then perhaps the saturating doubling returning high half instructions could be used too, to avoid at least the input and output scaling.

Stop Press

As happens when one is writing this kind of thing I noticed that there is a saturating shift instruction - and as it supports signed input and unsigned output, it looks like it should allow me to remove the clamping code entirely if I read it correctly.

This leads to the following combined clamping and scaling stage:

        vqshrun.s16     d16,q8,#6
        vqshrun.s16     d17,q9,#6
        vqshrun.s16     d18,q10,#6
        vqshrun.s16     d19,q11,#6
        vqshrun.s16     d20,q12,#6
        vqshrun.s16     d21,q13,#6

Which appears to work on my small test case. This drops the test case execution time down to about 3.9ms.

And given that replacing the yuv2rgb step with a memcpy of the same data (all else being equal - i.e. yuv420p to yuv444 conversion) still takes over 3.7ms, that isn't too shabby at all.
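A scalar model of why the saturating narrowing shift can stand in for the separate clamp and shift, as far as I understand the instruction: shifting first and saturating to a byte gives the same answer as clamping to [0,16383] and then shifting, for every possible 16-bit input. The names below are invented for the sketch.

```java
/** Hypothetical scalar model comparing clamp-then-shift with a saturating narrowing shift. */
final class SaturatingNarrowSketch {
    /** Models the vmax/vmin clamp followed by vshrn #6. */
    static int clampThenShift(short value) {
        return Math.max(0, Math.min(16383, value)) >> 6;
    }

    /** Models one lane of vqshrun.s16 #6: arithmetic shift, then saturate to [0,255]. */
    static int qshrun6(short value) {
        return Math.max(0, Math.min(255, value >> 6));
    }

    /** Exhaustively compare the two over all signed 16-bit inputs. */
    static boolean equivalentForAllShorts() {
        for (int v = Short.MIN_VALUE; v <= Short.MAX_VALUE; v++) {
            if (clampThenShift((short) v) != qshrun6((short) v))
                return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("equivalent: " + equivalentForAllShorts());
    }
}
```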

RGB 565

An alternative scaling & output stage (after the clamping) could produce RGB 565 directly (I haven't checked this code works yet):

        vshl.i16        q8,#2           @ red in upper 8 bits
        vshl.i16        q9,#2
        vshl.i16        q10,#2          @ green in upper 8 bits
        vshl.i16        q11,#2
        vshl.i16        q12,#2          @ blue in upper 8 bits
        vshl.i16        q13,#2

        vsri.16         q8,q10,#5       @ insert green
        vsri.16         q9,q11,#5
        vsri.16         q8,q12,#11      @ insert blue
        vsri.16         q9,q13,#11

        vst1.u16        { d16,d17,d18,d19 },[r3]!

Tuesday 1 January 2013

jjmpeg android video player work

Yesterday I was too lazy to get out of the house, so after getting over-bored I fired up netbeans and had a poke at the android video player JJPlayer again.

And now it's next year.

I tried a few buffering strategies, copying video frames, loading the decoded frames directly with multiple buffers (works only on some decoders), and synchronously loading each decoded frame into a texture as it is decoded. I previously had some code to load the texture on another GL context but I didn't see if it still worked (it was rather slow which is why i let it rot, but it's probably worth a re-visit).

Just copying the raw video frame seems to be the most reliable solution, even with its supposed overheads. Actually it didn't seem to make much difference to performance how I did it - they all ran with similar cpu time (according to top).

I've looked into doing the scaling/yuv/rgb565 conversion in NEON but haven't got any code up and running yet (one has to be a bit keen to get stuck into it). I doubt it will be quicker as this means I won't be using the GPU for this processing, but given how slow the texture loading is it might be a win and it will let me avoid redundant copies.

I also fixed some of the android behavioural stuff - pause/resume pauses/resumes the playback, it runs in full-screen with a hidden 'ui' (although for whatever reason, setting the slider to invisible doesn't always work, and never on the mele), and it now opens the video in another thread with a busy spinner.

Although it's playing back most SD sized videos ok on my (dual core) ainol tablet, it is struggling on the mele. Even when the mele can decode fast enough the timing is funny - jumping around (and oddly, the load average is well over 2 yet it decodes all frames fast enough). I guess the video decode was behind the audio so it was just displaying frames as fast as it decoded them rather than with per-frame timing. So I tried another timing mechanism rather than just using an absolute clock from the first frame decoded, have it based on the audio playback position. This works quite a bit better and lets me add a tunable delay, although it isn't perfect. It falls down when you seek backwards, but otherwise it works reasonably well - and should be fixable. OTOH even with the busted timing the audio sync was consistent - unlike Dolphin Player which loses sync quite rapidly for whatever reason.

Unfortunately opening some videos seems to have become quite slow ... not sure what's going on there, whether it's just slow i/o or some ffmpeg tunable (I was poking around with some other stuff before getting back to jjplayer so i might've left something in). I don't know how to drop the decoding of video frames yet either - which would be useful. I think I need to add some debugging output to the display to see what's going on inside too, and eventually run it in the profiler again when I have most of a day free.

The code is still a bit of a pigs breakfast and I haven't checked it in yet. But i should get to it either today or soon after and will update the project news. I should probably release another package too - as the current one is too broken to be interesting.

In the not too distant future I will probably poke at a JavaFX version as well. If I don't find something better to do ... must really look at the Epiphany SDK sometime.