A Hacker's Craic: January 2014

Friday 31 January 2014

The weirdness of mass-market economics, m$l vs psn.

So this is about the playstation network on PS3 vs the microsoft live(?) thing on xbox, xbox 360 for on-line games and other on-line services.

On paper it seems like a no-brainer: zero-cost on PSN for multi-player games and all of the internet based services the XMB provides vs a yearly subscription for microsoft multi-player games and all of the internet based services they provide (even those which already require a subscription?). Oh and the one you pay for is the one full of fucking adverti$ing???

M$ are totally nuts right - surely everyone would just go with PSN?

Funny that.

Except, once people start paying for stuff they just assume it's "better" (and i'm sure a bit of viral/illegal marketing helped that idea take hold as well) and will argue as such to their death. Apart from that once you pay for something the whole "you may as well use it" factor comes in: once paid for people will naturally want to use the paid for service rather than the free one, and network effects start to kick in and suddenly everyone is using the paid service for all multi-player games, despite their best interests taking second stage.

So it's kind of sad ... that the only way Sony could possibly counter this is by just charging for their service too - as they have with the PS4. Very few people will want to pay for both (and to be honest, you'd have to be a bit thick in the head if you did) and so it forces everyone to make a choice. This breaks up the advantage of the network effects and although some will stick to what they know and are already paying, the magic factor for one vendor is instantly evapourated. By being that insignificant bit cheaper and leaving some services free anyway (not to mention the games) Sony are instead creating a positive network-effect in their favour and thus under-cutting years of effort by M$. (Apart from in the case of the PS4 vs xbone actually creating a games machine and not a gimped water-cooler/us-only-tv/football thing).

"master-stroke"?

Still, it's kinda dumb that the only way to fight a paid service is to charge as well and free just can't compete. Somewhat fucked, more than just dumb.

On a side note I did start with PS+ a few months ago myself but it was just for the games - I have zero interest in online mutli-player and don't have a PS4 anyway (SotC+Ico-HD pushed me over, although TBH I couldn't get into the former and forgot how to play the latter - even though i've previously finished it on a borrowed copy). I stopped buying games about 18-24 months ago so there's been a constant stream of good games that I never played that I thought I might one day. Many are ones I wont ever like or bother with but there's enough to keep me going for a few years already at the rate I play (and some I never even saw in the shops here at the time and certainly cannot be found there now).

Paying for this type of subscription service pretty much goes against every free software (and otherwise) bone in my body ... but for me it's not even an hours labour and less than 2 cartons of piss and in Australia it's substantially less than a single full-priced game so from a personal perspective it's pretty much insignificant. When my utility bills and rates are hitting around $8K/PA these days it somewhat changes how you think of anything under a grand (fortunately I have no mortgage, ooo-yeah!). Although with a small garden, good freezer, and a bit of smart shopping decent food is still cheap as shit here in Australia so those high costs aren't as bad as they sound - although it was a bit of a one-off, i'm currently going through a second vac-pac whole rump that only cost about $4/kg (~$2/lb). It's no $30/kg steak but it's great for curries and stir-fry (and the cat) and i don't mind having to use my teeth occasionally anyway.

The main (rather big) downside is that I can't share the games with friends.

I guess i'll see if they end up adding adverts. That is my new bug-bear at the moment: pretty much all investment today is based on something as stupid and non-producing as advertising dollars. I can deal with a certain amount but it's so overbearing now that it makes viewing the web or watching TV, well, unbearable. Google, i'm looking at you.

Voxel heightmap, dla

I've been mucking about a bit with the voxel stuff some more, trying some terrain generation. I first cooked up something from memory but it wasn't correct, tried perlin noise but wasn't really happy with it and then I came across this post about terrain generation using Diffusion Limited Aggregation (DLA) to generate mountain ranges.

Partial solution.

And the final result of the basic DLA. This version is cyclic.

The basic algorithm is very simple:

  Node[] nodes;  // grid of nodes
  int[] map;     // grid of points
  randomly seed Ns locations in nodes
  total = Ns;
  // until all locations are visited
  while total < width*height
    x = random(width);
    y = random(height);
    if (nodes[x,y] != null)
       continue;
    fi
    // randomly walk until something is hit
    while !hit
      // save location we came from
      lx = x;
      ly = y;
      // walk to next location
      (x,y) = randomly move by 1 cell in a compass direction
      if (cyclic)
        x = x & (width-1);
        y = y & (height-1);
      else if (out of range)
        break;
      fi
      // check for attachment point
      n = nodes[x,y]
      if (n != null)
        hit = true;
        total += 1;
        n = new node(n, lx, ly);
        nodes[lx,ly] = n;
        n.visit(map);
      fi
    wend
  wend

Where:
Node {
  Node parent;
  int x, y;

  void visit(int[] map) {
    map[x,y] += 1;
    if (parent != null)
      parent.visit(map);
  }
}

I'm then showing log(map).

It certainly has some nice 'erosion' like shapes, and can also make some nice lightning. I also experimented with something based on physical erosion but that didn't really pan out.

This algorithm grows the seed points like a crystal and because the search space is quite big is rather slow to get going on larger images. Although it really races to the end (the first image above was about 15 seconds in, the second was 2 seconds later). I tried a few variations to speed this up:

Using random line segments

This is much much faster but the result has fewer "fiddly bits" and is more strung out.

Randomly choosing an existing node from which to grow a new point

This is faster at the start but slows down as it starts to randomly choose nodes which have no where to go. It also produces a more regular shape more akin to mould growth; which is not really very mountainous.

It might be possible to play with how it chooses the growth points with this one, both to speed it up and tune the shapes it generates. I've experimented a little bit but didn't have much luck so far.

Unfortnately the pixel-scale of everything is a bit too detailed so I need a way to scale it up without losing the intricacies. I haven't tried the accumulation of multiple blur radii from the link above yet. That should be able to upscale at the same time too.

But yeah, right now it just isn't grabbing me enough to really get into it - but then again nothing is atm. Blah.

Saturday 25 January 2014

Voxels

Just playing around with a 3d implementation of the ray casting in the previous post.

Fly-through video here. All spheres have a radius of 30 voxels.

I'm just using the abs(normal.z) as the pixel value and a very simple projection. This is a double version and there are some rounding/stepping errors present. I changed the maths slightly so that the stepping is parametric rather than on x/y/z which simplifies the code somewhat.

It's pretty slow on my laptop, about 1 frame/s, although it doesn't really matter how deep the volume is - this one is 4096x512x512. Not that it's optimised in any way mind you.

At some point I do intend to see how it would run on epiphany ... but right now i'm too lazy! The main bottleneck will be the tree traversal for each ray although I think in practice quite a lot of that can be cached which would be a big speed-up. Other than that it could be a good fit for the epiphany given how small the code is and that only a single data strucutre needs to be traversed.

TBH i'm not sure if it's ever going to be practical for anything because of the memory requirements even if it can be made to run fast enough. *shrug*

Thursday 23 January 2014

Bresenham(ish) line, skipping, raycasting.

Haven't been up to much lately - just trying to have some holidays. I have been tentatively poking at some ideas for raycasting. Not that I have any particular idea in mind but it might be something to try on the parallella.

Part of that is to step through the cells looking for ray hits, which is where bresenham's line drawing algorithm comes in. I'm working toward a heirarchical data structure with wider branching than a typical quad/oct tree and so because of that I need to be able to step in arbitrary amounts and not just per-cell. Fairly cheaply.

I don't really like the description on wikipedia and although there are other explanations I ended up deriving the maths myself which allows me to handle arbitrary steps. Unfortunately this does need a division but I can't see how that can be avoided in the general case.

If one takes a simple expression for a line:

  y = mx + c
  m = dy / dx
    = (y1 - y0) / (x1 - x0)

And implements it directly using real arithmetic assuming x increments by a fixed amount:

 dx = x1-x0;
 dy = y1-y0;
 for (x=0; x<dx; x+=step) {
   x = x0 + x;
   y = round((real)x * dy / dx) + y0;
   plot(x, y);
 }

Since it steps incrementally the multiply/division can be replaced by addition:

  incy = (real)step * dy / dx;
  ry = 0;
  for (x=0; x<dx; x+=step) {
    x = x0 + x;
    y = round(ry) + y0;
    plot(x, y);
    ry += incy;
  }

But ... floating point aren't 'real' numbers they are just approximations, so this leads to rounding errors not to mention the rounding and type conversions needed.

Converting the equation to using rational numbers solves this problem.

  a = dy * step / dx;        // whole part of dy / dx
  b = dy * step - a * dx;    // remainder
  ry = 0;                    // whole part of y
  fy = 0;                    // fractional part of y
  for (x=0; x<dx; x+=step) {
    x = x0 + x;
    y = ry + y0;
    plot(x, y);
    ry += a;                 // increment by whole part
    fy += b;                 // tally remainder
    if (fy > dx) {           // is it a whole fraction yet?
      fy -= dx;              // carry it over to the whole part
      ry += 1;
    }
  }

Where y = ry + fy / dx (whole part plus fractional part).

For rendering into the centre of pixels one shifts everything by dx/2 but this cannot be accurately represented using integers. So instead the upper test on the fractional overflow is shifted by dx/2 by multiplying both sides by 2 which leads to:;

    if (fy * 2 > dx) {
      fy -= dx;
      ry += 1;
    }

This suits my requirements because step can be changed at any point during the algorithm (together with a newly calculated a and b) to step over any arbitrary number of locations. It only needs one division per step size.

The black outline is calculated using floating point/rounded, and the filled-colour dots are calculated using the final algorithm. The green are calculated by stepping 8 pixels at a time and the blue are calculated by starting at each green location with the same (y = ry + fy/dx) and running the equation with single stepping values for 3 pixels.

(i did this kinda shit on amiga coding 20 years ago; it's just been so long and outside of what i normally hack on that i always need to do a refresher when I revisit things like this).

Unfortunately for raycasting things are a bit more involved so I will have to see whether I can be bothered getting any further than this.

Update: So yeah after all that I realised it isn't much use for raycasting anyway since you start with floats from the view transform - converting this to integer coordinates will already be an approximation. And since hardware float multiply is just as fast as float add may as well just use the first equation with floats and pre-calculate dy/dx so any rounding errors will be consistent and not accumulate.

So here's a plot of a 2d ray through a 16-way (4x4) tree. Cyan X is when it hits a vertical cell wall, purple X is when it hits a horizontal cell wall, and black circle is when it hits a non-transparent leaf cell. The black boxes are the nodes and leaf elements in the tree, and the pink boxes are those which are touched by the ray. I just filled the tree with some random circles.

The code turned out to be pretty simple and consists of just two steps:

First the location is converted to an integer (using truncation) and that is used to lookup into the tree. Because each level of the tree is 4x4, 2 bits are used for both x and y at each level so the lookup at each level is a simple bit shift and mask. If the given cell is in a non-transparent leaf node it is a hit.

Then it calculates the amount it has to step in X to get beyond the current cell bounds: this will simply be MIN(cell.x + cell.width - loc.x, (cell.y + cell.height - loc.y) * slopex). The cell size is calculated implicitly based on the node depth.

The 2d case isn't all that interesting to me but it's easier to visualise/test.

Hmm, on the other hand this still has some pesky edge cases and rounding issues to deal with so perhaps an integer approach is still useful.

Friday 17 January 2014

GT6 is the best yet (so far)

I bought a few games recently but have been too lazy to even start them up, but one of them was gran turismo 6 that I just had a go with tonight.

So far for me it seems like the best gran turismo yet - and i haven't really liked any since 3 (my first one). GT4 was just confusing, and GT5 just wasn't very fun - that's not even counting the weird menus and loading times. I think with that one polyphony might've forgotten they were making a game for the general public and not just themselves.

The graphics aren't really any different to 5 (not a bad thing) but I really don't understand why polyphony keep insisting on upping the resolution - the hardware just can't do 1080 pees at that frame rate and the tearing they resort to is simply unforgivable (haven't heard of triple buffering??). I force my display to 720p which makes it go away for the most part and the lack of tearing more than makes up for any loss of resolution which is barely noticable from the couch anyway. The blocky/shimmering dynamic shadows are also a bit more than the hardware can deal with but the dynamic time and weather it enables makes it an acceptable trade-off. I only drive using the bonnet-cam so I couldn't care less about the internal car modelling and TBH GT3's cars looked good enough from the arse-end which is the only thing you can see when racing. I don't really mind GT's sound engine or even music for the most part either.

The menus are mostly better apart from the tuning screen with it's expanding tabs although most of the other menus are a lot better. Much quicker too and global short-cuts let you change or re-tune your car anywhere rather than having that shithouse partitioned garage from GT5 (where standard cars are segregated). A weird niggle is that it doesn't remember some of the driving options when you change cars. Collisions are still modelled on a plastic canoe but I suppose that's better than just ending the race in a shower of broken glass and burning rubber that a true simulation would demand, or god-forbid adding some "sands of time" to rewind.

The introduction to the game (first few races, not the movie) is more gentle and the annoying license tests are a lot easier and fewer in number. Having the racing line from your friends pop up is also a nice touch - "asynchonous online" could work very well for this game and I think could well be the killer-app for Drive Club if it can reach a critical mass. The economy seems to be more newbie friendly too - GT5 had me re-running shit-box races over and over just to be able to afford a couple of upgrades to beat the next shit-box race. So far i've only re-run races to get a better time, and for the first few races it only takes one or two tries and by then I had enough credits to upgrade a bonus car to finish all of the novice class races easily. A couple of the current seasonal events give out nice cars for little effort too which is more than enough to get established but even the shit-boxes have a bit more get up and go which makes the early races less of a chore. Apparently it has micro-transactions although I haven't seen where.

I guess the most important part - the driving - just feels way more "fun". I think the tyres just feel a bit grippier and don't overheat as much or for as long. It seems more possible to recover from a loss of traction rather than just turning into an uncontrollable slide/spin. And you can finally roll cars - no more weird elastic which keeps the car from flipping over. I think they toned down the slipperiness of the non-road surfaces for the beginner cars and maybe they tweaked the learning curve a bit to make these more enjoyable to drive. I know it's supposed to be a simulator but it definitely is a game first.

Despite the number it feels like a more rounded game than any of the recent ones and would be the best entry point to the series i've tried. It doesn't take the 'simulator' sub-title so seriously that it ruins the game part of the game.

And yeah ... Bathurst ... at night.

Update: Been putting in a few hours every few nights lately. Yeah the economy is way better than GT5, I somehow ended up with 1Mcr from some 90 second time trial in the seasonal thing. So much less (actually none) grinding needed compared to GT5 - thank fuck for that.

Some of the tracks suck a bit, but that's just the track. e.g. silverstone or the indy track; too flat and the actual track doesn't always follow the marked lines on the road so you have to follow the map otherwise you run off (pain on a time trial if you're not familiar with the track since a run-off == disqualify). And the tracks being so sucky put you off wanting to learn them. OTOH some of the tracks are pretty good. I did a lot of grinding in GT5 on Rome with the 4x4 seasonal truck race and so that's become a bit of a favourite, I always liked Trail Mountain (that straight up the tunnel), and there's really nothing at all like Bathurst at dawn or dusk. During the race there's some weird dithering on trees and on replays you notice how they are low-depth sprites but given they fit the whole forest on there it's pretty forgivable. At least the gum trees actually look like gum trees too - which is pretty rare in racing games given how distinctly different they appear to other trees and the milky way at night is pretty impressive from the southern hemisphere. It's a pity they don't have the original layout before they added the dog-leg at the bottom of conrod straight, but I guess 'licensing' wouldn't allow that. The Matahorn is also quite a bit of fun in a light nimble car although it's pretty easy to over-cook it on the down-hill bits.

One annoying thing is that it only runs in NTSC timing - no 50hz output on HDMI. This is pretty sucky because those extra 4ms would help with the screen tearing a great deal and I just don't see why we need to be subjected to some legacy timing from an ancient american video standard. I know i'm probably going against the grain from 'gamers' who think otherwise but they are just ignorant and ill-informed.

Hopefully the seasonal events start including races - time trials are pretty boring and it's starts to become 'not a game' when one simple mistake requires a whole lap to fix. Although at least for the car prizes you only need to do a single clean lap - regardless of the time - to win the cars. If it was the first GT ever I could understand all the time trails to some extent - it's a bit of a training exercise to be able to drive properly - but it's no.6 (+ all the prethings) in the series.

I was pretty surprised to see how Forza on the xbone uses PS2 era rendering techniques - fixed shadows (actually even worse, all shadows project directly down from the videos i've seen), fixed time of day, no weather, really basic textures and 2D crowds (oh dear). Because the views in car games are quite predictable they are always able to push better-than-average graphics at high frame-rates and the car models are getting so detailed they seem to be at a point of diminishing returns. But I think there's an awful lot left for future games; GT already has weather and dynamic shadows, driveclub improves on the shadows, adds 3D trees and volumetric clouds. Future games could add procedurally created seasons or even dynamically created landscapes that grow and age. Better water modelling (puddles, flowing water, and not just 'wet'). Not to mention VR stuff. I think the car models are good enough (more than good enough TBH), but there's still enough possibilities that no amount of FLOPS will ever exhaust them.

Thursday 16 January 2014

Original version PS3, dust, cost.

A few weeks ago I was bored one afternoon and thought my ps3 was making a bit too much fan noise so i opened it up to clean it up.

Boy, what an involved process. So many screws. So many ribbon connectors. Ended up being a whole afternoon operation.

I followed video of a guy taking one apart on youtube (can't find out which one) but I did the same mistake he did before I got to that part and cracked the ribbon cable connector for the blu-ray drive - it flips up. It still works so I guess it makes enough of a connection but it may be on borrowed time. I also temporarily lost the memory-card cover spring although I found it the next day (but didn't want to open it all up again).

I got it down to the point of taking the fan out to clean it - it has some very stuck-on dust which I wiped off, but apart from that most of the dust was outside of the main cooling thoroughfare and so was pretty irrelevent. After getting it all back together it didn't really make any difference. I had mine sitting on a frame I made up because until i blew up my amp it was sitting above that and both needed the ventilation - that probably reduced the dust getting into the insides.

In hindsight I should've just left it alone!

I only got about 2/3 the way through taking it apart too.

However I guess it was interesting to see the thing for myself. That fan motor is a total monster and the heatsink is gigantic - most of the whole base area. It must've cost a fortune to make and to assemble. With all those separate parts and shielding pieces to put together the initial production must have also been hard to test and very failure prone increasing costs.

Looking at the video tear-down and pictures of the PS4 motherboard the thing that immediately struck me is how simple it is: i'm not a hardware engineer but I can imagine that simple means cheaper to make. I wouldn't be surprised if the PS4 is already cheaper to make than the even the current PS3 on a per-unit basis - and if not it is something that will become much cheaper much faster. It's probably a lot cheaper to make than the xbone too, and again something that will become cheaper at a faster rate. Dunno where all those passive components went.

Wednesday 15 January 2014

resampling, again

Been having a little re-visit of resampling ... again. By tweaking the parameters of the data extraction code for the eye and face detectors i've come up with a ratio which lets me utilise multiple classifiers at different scales to increase performance and accuracy. I can run a face detection at one scale then check for eyes at 2x the scale, or simply check / improve accuracy with a 2x face classifier. This is a little trickier than it sounds because you can't just take 1/4 of the face and treat it as an eye - the choice of image normalisation for training has a big impact on performance (i.e. how big it is and where it sits relative to the bounding box); I have some numbers which look like they should work but I haven't tried them yet.

With all these powers of two I think i've come up with a simple way to create all the scales necessary for multi-scale detection; scale the input image a small number of times between [0.5, 1.0] so that the scale adjustment is linear. Then create all other scales using simple 2x2 averaging.

This produces good results quickly, and gives me all the octave-pairs I want to any scale; so I don't need to create any special scales for different classifiers.

The tricky bit is coming up with the initial scalers. Cubic resampling would probably work ok because of the limited range of the scale but I wanted to try to do a bit better. I came up with 3 intermediate scales above 0.5 and below 1.0 and spaced evenly on a logarithmic scale and then approximated them with single-digit ratios which can be implemented directly using upsample/filter/downsample filters. Even with very simple ratios they are quite close to the targets - within 0.7%. I then used octave to create a 5-tap filter for each phase of the upscaling and worked out (again) how to write a polyphase filter using it all.

(too lazy for images today)

This gives 4 scales including the original, and from there all the smaller scales are created by scaling each corresponding image by 1/2 in each dimension.

scale           approx ratio   approx value
  1.0             -              -   _
  0.840896       5/6           0.83333
  0.707107       5/7           0.71429
  0.594604       3/5           0.60000

Actually because of the way the algorithm works having single digit ratios isn't crticial - it just reduces the size of the filter banks needed. But even as a lower limit to size and upper limit to error, these ratios should be good enough for a practical implementation.

A full upfirdn implementation uses division but that can be changed to a single branch/conditional code because of the limited range of scales i.e. simple to put on epiphany. In a more general case it could just use fixed-point arithmetic and one multiplication (for the modulo), which would have enough accuracy for video image scaling.

This is a simple upfirdn filter implementation for this problem. Basically just for my own reference again.

  // scale ratio is u / d  (up / down)
  u = 5;
  d = 6;

  // filter paramters: taps per phase
  kn = 5;
  // filter coefficients: arranged per-phase, pre-reversed, pre-normalised
  float kern[u * kn];

  // source x location
  sx = 0;
  // filter phase
  p = 0;

  // resample one row
  for (dx=0; dx<dwidth; dx++) {

    // convolve with filter for this phase
    v = 0;
    for (i=0; i<kn; i++)
       v += src[clamp(i + sx - kn/2, 0, dwidth-1)] * kern[i + p * kn];
    dst[dx] = v;

    // increment src location by scale ratio using incremental star-slash/mod
    p += d;

    sx += (p >= u * 2) ? 2 : 1;
    p -= (p >= u * 2) ? u * 2 : u;

    // or general case using integer division:
    // sx += p / u;
    // p %= u;
  }

Filters can be created using octave via fir1(), e.g. I used fir1(u * kn - 1, 1/d). This creates a FIR filter which can be broken into 'u' phases, each 'kn' taps long.

This stuff would fit with the scaling thing I was working on for epiphany and allow for high quality one-pass scaling, although i haven't tried putting it in yet. I've been a bit distracted with other stuff lately. It would also work with the NEON code I described in an earlier post for horizontal SIMD resampling and of course for vertical it just happens.

AFAIK this is pretty much the type of algorithm that is in all hardware video scalers e.g. xbone/ps4/mobile phones/tablets etc. They might have some limitations on the ratios or the number of taps but the filter coefficients will be fully programmable. So basically all that talk of the xbone having some magic 'advanced' scaler was simply utter bullsnot. It also makes m$'s choice of some scaling parameters that cause severe over-sharpening all the more baffling. The above filter can also be broken in the same way: but it's something you always try to minimise, not enhance.

The algorithm above can create scalers of arbitrarly good quality - the scaler can never add any more signal than is originally present so a large zoom will become blurry. But a good quality scaler shouldn't add signal that wasn't there to start with or lose any signal that was. The xbone seems to be doing both but that's simply a poor choice of numbers and not due to the hardware.

Having said that, there are other more advanced techniques for resampling to higher resolutions that can achieve super-resolution such as those based on statistical inference, but they are not practical to fit on the small bit of silicon available on a current gpu even if they ran fast enough.

Looks like it wont reach 45 today after some clouds rolled in, but it's still a bit hot to be doing much of anything ... like hacking code or writing blogs.

Tuesday 14 January 2014

A great idea or capitalism gone awry?

I've been pondering crowd-funding lately and i'm not really sure it's a good idea.

It seems good on paper - democratic / meritic small-scale funding by interested public. Thing is we already have something like that: the stock market.

But unlike the stock market it is a complete free-for-all unregulated mess full of fraud and failures (ok so is the stock market, but even if it is also no better than a slot-machine, they do pay out sometimes).

In some ways crowd funding could be seen as a clever ploy by capital to finally remove all the risk from their side of the equation - most of it is already gone anyway. Rather than having lawyer backed due dilligence being used to take a calculated risk on an investment with some expected return, the public are taking uneducated risks based on emotion and group-think for no real return at all.

I don't regret helping to find the parallella, but i'm not sure I would do it again.

It is an industry sorely in need of regulation which will surely come before long. It should have a long-term place for small projects but once you get into the millions it seems far too skewed in favour of the fundees.

Saturday 11 January 2014

dead miele washing machine

Blah, washing machine blew up this morning. Novotronic W310, cost $1 900, bought Feb 2004.

During a spin cycle it started making very loud grinding noise and after turning it off and opening it up the drum had a really hot spot near the rim and a bit of a burnt rubber smell. Lucky I was home and it didn't catch fire. I was only washing a few t-shirts, shorts, undies, and my cycling shit.

Despite stating that the drum wont turn freely and that it rotates off-centre the service centre claims they can't tell if it will require major repair (it could only be a bearing, and that is a major repair) and still wants $200 for someone to come and have a look at it. Redeemable if i buy another one. I guess I got 5 weeks shy of 10 years out of it so I can't complain too much - then again being a male living alone for most of that time it hardly got much of a work-out either, so i'm not terribly inclined to buy another miele at the premium they charge here.

I guess i'll think about it over the weekend.

Washing machines aren't exactly a high priority item for a single male, but I don't want to have to deal with replacing broken shit either.

Friday 10 January 2014

Road Kill

Finally got out on the roadie today and went for a 30km blat down to the beach and back. I haven't gone for a recreational cycle in nearly a year - had to give it a good wipe down to remove the dust and reinflate the tyres after taking it off the wall. I wasn't going to lycra it up but the shy-shorts I have these days come down to my fucking knees (longs?) and it was way too hot to wear those. Ran like new though, it's an awesome bike to ride.

Overall I can't say the experience was particularly enjoyable however - several cars cut me off and a pair of fuckwits (having a race?) nearly took me out through a roundabout on seaview road next to the grange hotel. I always hate that stretch and the fuckwit grange council obviously just hates bikes - they haven't changed it in years so I think i'll just avoid going down that way ever again - grange and henly are nice beaches but all the facilities and vendors are completely anti-cyclist so they can all just go and get fucked. I had to cut it short anyway due to a "natural break" required from a bit too much home-made hot-sauce on my dinner last night which was getting a bit painful. And somehow the racing saddle manages to find the only boney bits of my arse as well so 1.5 hours in 34 degree heat was enough of a re-intro trip after such a long break from it.

At least as a bonus I chanced on a homebrew shop that had a capping bell for champagne bottles - something i've been looking for for a while (not that i really need it, i have a ton of glass longnecks, but champagne bottles are much stronger). Yesterday I finally bottled off the last brew (nearly 2 weeks late - but it was still good, i ended up drinking over 4 litres straight out of the wart - it's quite decent but too warm) and started another one. Unfortunately all I have left from last year is 1 super-hot chilli beer (actually i had a half-bottle of one of those last night, maybe that was the cause of the natural break requirement) and a few stout/porter-like things which are a bit heavy for this weather - so after i finish this lime cordial and soda I'm going to have to find something else to drink today.

I really need to get back into regular cycling but it's not going to happen unless I can find some route that is safe and enjoyable and not too boring to do it on. That's a big part of why I've been so slack at it since coming back from Perth (and all my cycling mates moved interstate or os). Chances are this is the just another last ride for another 6 months, but time will tell ...

Update: Sunday I went to see a friend at Taperoo and he took his young family down to the beach. Apart from a couple of spots that isn't such a bad ride so maybe I can do the Outer Harbour loop - it's about 90 minutes on a good day. Even though one road is a bit truck laden there's plenty of room. Given the weather this week I had thought of hitting the beach a couple of times but today it was already 41 by 10:30 - and a burning northerly wind - so I might not be going anywhere after the washing machine is delivered. Monday I went for a loop through the city and around about to buy a washing machine and do a bit of shopping and that was pretty much the limit - I seemed to catch every red light and waiting in the full sun on newly laid ashphalt on a still day really takes it out of you. The LCD panel on my speedo even started to turn black so hell only knows how hot it was out on the road. And it's warming up tomorrow ;-)

Update: Well this has now become very strange weather. 40+ in summer is as common as sheep shit around here but it ended up hitting 45.1 at 2pm which is a bit on the extreme side even for here (i believe it may be a record for Kent Town). And now thunderstorms are coming? They are looking to miss me, but if they hit it'll turn the place into a sauna. Just saw a nice fork of lightning about 18 seconds away (~6km). Time for beer and a light-show?

Update: Only 4th hottest day on record after all.

Wednesday 8 January 2014

Fast Face Detection in One Line of Code

                  ****    ****    ****
                  *  *    *  *    *  *
                  ****    ****    ****

                  ****    ****    ****
                  *  *    *  *    *  *
                  ****    ****    ****

                  ****    ****    ****
                  *  *    *  *    *  *
                  ****    ****    ****

(blogger is broken: this is supposed to be a pic)

Based on the work of the last couple of weeks I've written up a short article / paper about the algorithm and created a really basic android demo. I think it's something novel and potentially useful but i'm not sure if i'm just suffering from an island effect and it's already been tried and failed. I tried to find similar research but once outside of the software engineering realm the language changes too much to even know if you're looking at the same thing when you are. Statistics gives me the willies.

Since I did this at home and am not aquianted with the academic process I didn't know what else to do. None of my peers, aquaintances or contacts do similar work for a hobby.

I have created a basic 1990s style `home' page on my ISPs web server to store the paper and application and anything else I may want to put there. This may move in the future (and perhaps this blog will too).

And yes, it really does detect faces in a single line of code (well, significant code) - and we're talking C here, not apl - and on SIMD or GPU hardware it is super duper fast. I haven't even looked at optimising the OpenCL code properly yet.

I wasn't going to but I ended up creating an optimised NEON implementation; I kinda needed something to fill the timing table out a bit in the article (at the time; it filled out afterwards) but I guess it was worth it. I also wrote up a NEON implementation of the LBP code i'm using this afternoon and because it is only 4 bits it is quite a bit faster than the LBP8,1u2 code I used last time, although in total it's a pretty insignificant part of the processing pie.

Now perhaps it is time for summer holidays.

And just to help searchers: This is an impossibly simple algorithm for a very fast image classifier and face detector which uses local binary patterns (LBP) and can be implemented directly using single instruction multiple data (SIMD) processors such as ARM/NEON and scales very well to massively multi-core parallel processors including graphics processing units (GPU) and application processing units (APU). OpenCL, CUDA, AMD, NVidia.

Sunday 5 January 2014

Further beyond the ROC - training a better than perfect classifier.

I added a realtime plot of the population distribution to my training code, and noticed something a little odd. Although the two peaks worked themselves apart they always stayed joined at the midpoint. This is not really strange I guess - the optimiser isn't going to waste any effort trying to do something I didn't tell it to.

So with a bit of experimentation I tweaked the fitness sorting to produce a more desriable result. This allows training to keep improving the classifier even though the training data tells it it is 'perfect'. It was a little tricky to get right because incorrect sorting could lead to the evolution getting stuck in a local minimum but I have something now that seems to work quite well. I did a little tuning of the GA parameters to speed things up a bit and added a bit more randomisation to the mutation step.

The black line is the ROC curve (i.e. it's perfect), green is the positive training set, red is the negative. For the population distrtribution the horizontal range is the full possible range of the classifier score, and vertically it's just scaled to be useful. The score is inverted as part of the ROC curve generation so a high score is on the left.

The new fitness collator helps push the peaks of the bell curves outwards too moving the score distribution that bit closer to the ideal for a classifier.

The above is for a face detector - although I had great success with the eyes I wanted to confirm with another type of data. Eyes are actually a harder problem because of scale and distinguishing signal. Late yesterday I experimented with creating a face classifier using the CBCL data-set but I think either the quality of the images is too low or I broke something as it was abysmal and had me thinking I had hit a dead-end.

One reason I didn't try using the Color FERET data directly is I didn't want to try to create a negative training set to match it. But I figured that since the eye detector seemed to work ok with limited negative samples, so should a face detector so I had a go today. It works amazingly well considering the negative training set contains nothing outside of the Color FERET portraits.

Yes, it is Fantastic.

I suspect the reason the Color FERET data worked better is that due to the image sizes they are being downsampled - with the same algorithm as the probe calculation. So both the training data and test data is being run through the same image scaling algorithms. In effect the scaling is part of the LBP transform on which the processing runs.

This is using a 16x16 classifier with a custom 5-bit LBP code (mostly just LBP 8,1).

The classifier response is strong and location specific as can be seen here for a single scale. This detector here is very size specific but i've had almost as good results from one that synthesized some scaling variation.

I couldn't get the young klingon chick to detect no matter what I tried - it may just be her pose but her prosthetics do make her fall outside of the positive training set so perhaps it's just doing it's job.

Beyond the ROC

I mentioned a couple of posts ago that i was hitting a wall trying to improve the classifier using a genetic algorithm because the fitness measure i'm using reached 'perfect' ... well I just worked out how to go further.

Here is a plot of the integral of the population density curve (it's just the way it comes out of the code, the reader will have to differentiate this in their head) after 400 and 50K generations of a 16x16 classifier. I now have the full-window classifier working mostly in OpenCL so this only took about 20 minutes.

Although a perfect classifier just has a dividing line separating the two populations, it is clear that these two (near) perfect classifiers are not equal (the above plot was generated from a super-set of the training data, so are not perfect - there should be no overlap at the base of the curves). The wider and deeper the chasm between the positive and negative population, the more robust the classifier is to noise and harder to classify images.

400 generations is the first time it reached a perfect ROC curve on the training data. I just let it run to 50K generations to see how far it would get and although most of the improvement had been reached by about 10K generations it didn't appear to encounter an upper bound by the time I stopped it. Progress is quite slow though and there is probably cause to revisit the genetic algorithm i'm using (might be time to read some books).

This is a very significant improvement and creates much more robust detectors.

Because the genetic algorithm is doing the heavy lifting all I had to do was change the sorting criteria for ranking the population. If the area under the ROC curve is the same for each individual then the distance between the mean positive and mean negative score is used as the sort key instead.

The Work

So i'm kind of not sure where to go with this work. A short search didn't turn up anything on the internets and recent papers are still mucking about with MP-LBP and integral images on GPUs which I found 2 years ago are definitely not a marriage made in heaven. The eye detector result seems remarkable but quite a bit of work is required to create another detector to cross-check the results.

The code is so simple that it's effectiveness defies explanation - until the hidden maths is exposed.

I started writing it up and I've worked out where most of the mathematics behind it come from and it does have a sound basis. Actually I realised the algorithm is just an exisiting common algorithm but with a single specific decision causing almost all of the mathematics to vanish through simplification. I first looked at this about 18 months ago but despite showing some promise it's just been pretty much sitting idle since.

Saturday 4 January 2014

tablet firmware

I had my occasional look for updated firmware for my tablet yesterday - an Onda V712 Quad - and was pleased to find one came out late November. All firmwares, I guess.

For whatever reason this particular tablet seems to be remarkably uncommon on the internet. Apart from the piss-poor battery life it's pretty nice for it's price - although that is a fairly big issue I guess.

I apparently bricked it running the firmware updater via microsoft. Nothing seemed to be happening/it said the device was unplugged, so after a few minutes I unplugged it. Don't really know what happened but the list of instructions that then popped up in a requester managed to get it back on track. Not that I needed it this time, but every time I go into the recovery boot menu I forget which button is 'next' - the machine only has power and home - and always seem to press the wrong one, but luckily I didn't really need it. I know from previous readings that the allwinner SOCs are completely unbrickable anyway, so i wasn't terribly concerned about it.

After a bit of confusion with the - i presume - ipad like launcher they decided to change to, I got it back to where it was before. I didn't even need to reinstall any apps.

So although it's still android 4.2 (4.4 is out for some of their tablets but not this one, not sure if it will get it, Update 11/1/14: 4.4 is now up for my tablet but i haven't tried it yet) they fixed a few things.

The main one for me is that media player service plays streaming mp3 properly now: previously my internoderadioplayer app wouldn't play anything onitdidn't work. I might be motivated to fix a couple of things and do another release sometime soonish.

Other than that it just feels a bit snappier - although I really wouldn't mind if you could set the display to update at some low frame-rate like 12-15fps to save power. That full-screen render can't be cheap on battery and most of the animations just give me the shits to start with.

I'm still a bit annoyed how they changed the order of the software buttons along the bottom - having back in the corner followed by the rest made much more physical sense with all the varying screen sizes out there. Having them centred is a pain in the arse, and I keep accidentally activating that 'google' thing on that bizarre circular menu thing off the home button when trying to scroll scroll scroll through long web pages because there's no fucking scrollbar on anything anymore. I don't even know what that google thing is for (read: i have no interest in finding out) but I sure wish I could disable it from ever showing up.

Pretty much only use it as a web browser for the couch anyway - the screen is too shiny to use outside (despite a very high brightness). Typing on a touch screen is utterly deplorable, and playing games on one isn't much better. It's just passable as a PDF reader, although I wish mupdf handled off-set zoom with page-flipping better (it's a hard thing to get right though). I'm finding the over-bright black-on-white becoming somewhat irritating to read for very long so i might have to patch it for a grey background anyway. Actually it's quite useful for finding papers - for some reason Google search from my desktop has decided to keep returning stuff it thinks I want to read, rather than what I'm searching for. So that just means I keep finding the same fucking papers and articles i've already read - which isn't much use to me. Fortunately for whatever reason the tablet doesn't have this problem, ... yet.

I had to turn off javascript in firefox because it kept sucking battery (and generally just kept sucking full stop), any websites or features that rely on it just wont work - if that breaks a web site then I just dont bother returning. I've no interest in commenting on blogs or forums from it so it doesn't break much for me. Amazing how much smoother the internet is without all that crap. Everything has layout problems too because I force a typeface and font size that's big enough to read; but I have that problem on every browser. Dickhead web designers.

Friday 3 January 2014

eye detector again

I was playing around with generating ROC curves for various algorithms - first I was trying to determine whether some new LBP codes I came up with worked better in certain cases (not yet, but they have promise, ok i kept playing as below: maybe they do already). For this purpose I implemented a basic LBP histogram matching algorithm, as I figured having baseline would provide a decent comparison.

This just took all eye images as one class and all non-eye images as another and then measured the distance of them all to the classifier. I'm using 20x20 images and the classfier creates 16 histograms from 8x8 tiles overlapping by half in each direction. It isn't terribly valid as a measure because it is using the same set for training and for testing but i'm only after the relative results. There are 10560 positive training examples and 9940 negative examples in the data set, all taken (and synthesised) from Color FERET fa partition. I'm only using abs difference as the distance measure which isn't ideal, and perhaps the 8x8 tile area isn't a large enough sample to build a decent histogram - in short, these histogram results are not tuned for performance.

The Zucchi LBP is using some directional differential filters in order to build the LBP code bit-planes. The idea is that they're tunable to the problem, should be more robust to noise, and generally produce a more noise-free and accurate LBP code. They are much more expensive to calculate however. This tunability offers another possibility for GA optimisation. I guess this is something else I should really write up properly.

Then I revisited the classifier I came up with late 2012 (the one I put into a small android app). Here the data and LBP transform are the same, only the algorithm has changed.

Blew me away a bit, I really didn't expect that much of an improvement. The 'Zucchi' algorithm requires a tiny fraction of the storage and fewer cpu instructions to process. Training takes more memory but fewer instructions per pixel. The original goals in designing it were for it to be SIMD parallelisable (at execution time) so it optimises very well. The first algorithm should be more robust to registration errors, but on this test the differences are remarkable and in many cases you don't want/need such robustness.

Then I wondered if this classifier could be trained using GA instead. Here I truncated the training set to 8192+8192 images to match the OpenCL GA algorithm, so the results are slightly more valid.

Somewhat surprisingly given the size of the problem (each individual is ~6K bits) it works extremely well and attains a result in only 300 generations. Actually the problem I have is that it is almost too good(tm) - before doing much of a search of the problem space the utility of the fitness measure has been exhausted and the population no longer evolves. One might also note that this result is for the relatively information poor 4-bit LBP codes, and in all cases is only a single-stage classifier.

Given the results I should probably look at the data that isn't classifying properly at each end, some of it may just be poor / incorrectly labelled data. Update: I had a look - the false negatives at the upper end are from synthesised images some of which are over-rotated or rotated off-centre. A couple of the false positives are samples taken a bit too close to the eye although most are legitimate and from around under the nose to under the mouth.

The 8-feature test as described in the previous post doesn't fare so well here - even though it is an effective eye detector in practice and requires about 1/3 of the processing.

Here the differential LBP code is working very well too. And it works extremely well as an eye detector in practice.

For this post I wasn't going to run the differential LBP codes through the GA algorithm based on the first plot - it didn't seem that good. But the final plot and this eye detector heat map somewhat validates the initial reasoning behind the algorithm - reduced noise. See the previous post.

Looking closer at the LBP code image I now see there isn't enough similarity between the left and the right eye regions in the LBP domain to try to create a classifier which resolves both at the same time. What I might do instead is create separate classifiers but not include the other eye in the negative training set - this should improve the results of both. I guess i'm near the point where spectacles should be added to the mix.

I've started writing some of this up but i'm a bit lazy to put the effort in required to do a good job. Apart from the fact that i'm only doing this for self education and entertainment purposes in my spare time right now, i've little experience writing papers. I'm purportedly on holidays for that matter - technically i'm even unemployed but literally i'm just between contracts.

Update: Oops, looks like I made a bit of a mistake. I thought I was using a decent resampler whilst generating the scaled training data, but it wasn't - so aliasing was creating noise in the small-scale images and thus in the LBP codes. This is probably why the derivative LBP code managed to win out in the end. This mistake will lead to the plots showing poorer performance than they should be - but i'm not going to re-run the plots right now.

Seeing that the classifier was doing such a good job, I thought i'd push it a bit harder. I have a couple of other ideas too but the first one I tried was to create a classifier for a very small region. This has very large implications on execution time. So whilst working on an 8x8 classifier I noticed the aliasing and realised the scaling problem. I still have an issue with pixel-boundary sampling when extracting the normalised eye so the data is of a lower quality - but it has to deal with these problems in the input data too.

An 8x8 detector is absurdly small. Here's an example of a decent training image from the training set:

Testing on Lenna's photograph gives quite a few false positives - but out of 181 244 tests, 30 or even 100 false positives isn't too bad for a single-stage classifier. An 8x8 classifier requires 16x less total processing from the inner loop vs a 16x16 classifier to detect the same sized features at the same input scale - so even a modest true negative rate could be a big win if it is reliable. And this doesn't count the image scaling and LBP operator both of which also scale at an N*N rate.

This is just one run of a single-pass 8x8 detector generated via a GA using left and right eye images as the positive set. The threshold was chosen roughly. There are a total of 43 hits out of a total of 181 244 probes, with a false acceptance rate of 0.022%. It took about an hour to generate this detector via pure-Java code on a 6-core intel machine.

This is one run of a single-pass 20x20 detector generated via GA using left-eye images as the positive set. It is being executed over a wide range of scales from 0.2x to 1.5x of the original (512x512) source. There are a total of 22 hits out of a 5 507 592 probes, with a false acceptance rate of 0.00016%. It took about 3 minutes to generate the classifier using the same pure-Java code on the 6-core intel at which point evolution stopped due to having a perfect classifier on the training data set. It took under 200 generations on the improved input data vs the numbers in the top-half of this post.

Update 2: I wasn't going to, but I compared it to the raw results from the left-eye cascade from OpenCV - i remembered it not being particularly good but wanted to quantify it. This is showing the raw hits as exit the cascade and do not include grouping which would remove most of the false positives. There are 313 total hits.

For comparison purposes I re-ran the 20x20 classfier with settings that result in about the same number of scans at roughly the same scales. Here there are 64 total hits, although I just chose an arbitrary threshold value (a fairly loose one however). Haar cascades only provide a binary true/false result and have no threshold that can be adjusted - they also provide no quality indicator so there is no way to choose a peak match and one must resort to error prone averaging and merging.

This code executes about 30% faster than the haar cascade although perhaps the looseness of the haar detector would allow it to be run on fewer locations at the cost of accuracy. However; accuracy is often important if not critical. It should be possible to train a better haar-cascade but I haven't had any luck geting even close to the ones supplied in OpenCV. Likewise this is still only a single-stage classifier and can still be improved (I think) quite easily.

One last data point - the 8x8 classifier over similar scales executes 10-20x faster. These classifiers also scale much much better on parallel hardware - all the way from SIMD to multi-stream GPUs, so this 30% figure is a little misleading. It would also take only a tiny bit of FPGA logic ...

Thursday 2 January 2014

eye detector

I thought i'd see how well the detector works on more general data so first I cleaned up the data a bit and ran against a test image. I cleaned out some closed eyes and excessive squint, mis-alignment, and fiddled a bit with the ratio of eye-size to region. I ended up going with a 20x20 region because adding a bit more height to the classifier seems to help with nose and mouth false-positives. I synthesised a number of rotated images but kept the scale fixed, and included both left and right eyes in the positive training set.

In short, I trained a classifier for open and straight-on eyes only, with a small amount of head tilt (+- 15 degrees) and only misalignment variation to the scale. No spectacles.

This is the raw results after running the detector over 19 scales from 0.2 to 0.5 of the original Lenna image. The scale range isn't chosen arbitrarily obviously. Whilst the threshold can be adjusted to remove the false positive I left it in to demonstrate some remaining issues. Although blank areas aren't that big of a problem as they can be detected via other means such as variance.

This was the single highest response from all scales tested. The heat map is rendering exp(score), and the result is normalised to highlight the peaks more clearly. The signal being processed is the local binary pattern (LBP) image in the centre, which is a 4-bit CS-LBP 8,1 operator with pixel centered sampling.

As can be seen, It's very good at picking out Lenna's right eye ... her left eye is a bit obscured and off-angle and generally gives the detectors i've tried a bit of grief. It doesn't work well with outside images due to squint, or portraits against a blank background (blank areas score well). But given all of the negative training data come from only face images, it's working quite well as suppressing non-eyes. It has some problems with scale - once the face is about 2x the size the detector expects it starts to pick up eyebrows/noses/mouths/under-eyes but again given the training data this is to be expected - and generally the eye detector is only run after you have face candidates.

This is just with a single 8-feature (4x4) classifier which processes a 20x20 region. I had intended to create a multi-stage classifier but I got side-tracked.

This is the regions used by the classifier created in this example, overlayed with one of the positive training images. The data comes from Color FERET. Any significant overlap is essentially wasted processing (apart from a weighting factor of the overlapped region), so in this case the ga may not have chosen the best solution. It may be worth investigating whether I can get better results by pre-locking the regions and just letting the genetic algorithm choose the other classifier parameters instead.

It took a few runs of the classifier to get a this good result - although the ROC curve fitness measure works well, each run tends to 'lock on' to specific feature points and so that limits the performance to a local peak. If a random bit-flip hits the coordinates it just creates a poorer classifier and it dies immediately.

Some earlier trials had greater problems with background but this particular one was looking very good after about 1K generations so I just kept letting it run for much longer to see if it would get better - it did slightly and the first individual reached 'perfection' on the training set after about 55K generations (around 40 minutes, 8K of each class). Although that wasn't necessarily the best classifier in practice.

Experiments on-going ...

I've moved! So long and thanks for all the fish!

A Hacker's Craic