Sunday 31 May 2009

Press Any Key

Well that was a strange weekend. I felt very odd Saturday - very sore throat, dizzy, really nasty headache and a bit of a fever for a while. The fever and dizziness went but I've still got a damn sore throat (a bit less so). I slept a lot.

I managed to get sucked in to poking on code again ... and once I start it's hard to stop.

I wrote up a basic keyboard device - it's a user-mode server which listens to an incoming message port for request messages and for interrupts. If it gets an interrupt it buffers the keys (and provides decoding/etc), and if it gets a request it returns a key from the buffer (or queues the request until a key comes along). Very simple code. Then I wanted to have it synthesize key repeats too -- I find the pc keyboard repeats too slowly -- so that meant looking at writing a timing subsystem. I tried to think of how these devices should be set up and initialised but for now I've just hardcoded their process creation from the kickstart routine, and use a simple manual process for accessing them via a public port name.
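
A minimal sketch of that buffering logic, leaving out the message port and interrupt plumbing. All of the names here (kbd_state, kbd_on_interrupt, kbd_on_request) are made up for illustration, and the real server may well be structured differently.

```c
#include <assert.h>

#define KBD_BUF  16     /* ring size, arbitrary */
#define KBD_NONE (-1)   /* "nothing to hand back yet" */

struct ring { int item[KBD_BUF]; unsigned head, count; };

static int ring_put(struct ring *r, int v)
{
    if (r->count == KBD_BUF)
        return 0;                       /* full: the item is dropped */
    r->item[(r->head + r->count) % KBD_BUF] = v;
    r->count++;
    return 1;
}

static int ring_get(struct ring *r, int *v)
{
    if (r->count == 0)
        return 0;
    *v = r->item[r->head];
    r->head = (r->head + 1) % KBD_BUF;
    r->count--;
    return 1;
}

struct kbd_state {
    struct ring keys;      /* decoded keys nobody has asked for yet */
    struct ring waiting;   /* ids of requests still waiting for a key */
};

/* Interrupt arrived with a decoded key.  Returns the id of the waiting
   request that should be replied to with this key, or KBD_NONE if the
   key was buffered instead. */
int kbd_on_interrupt(struct kbd_state *k, int key)
{
    int req;

    assert(key >= 0);
    if (ring_get(&k->waiting, &req))
        return req;
    ring_put(&k->keys, key);
    return KBD_NONE;
}

/* Request arrived.  Returns a buffered key, or KBD_NONE if the request
   was queued until a key comes along. */
int kbd_on_request(struct kbd_state *k, int req)
{
    int key;

    if (ring_get(&k->keys, &key))
        return key;
    ring_put(&k->waiting, req);
    return KBD_NONE;
}
```

The server loop would call one or the other depending on what turned up at its port, and reply to the request whenever a key and a request have been paired up.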

So then I had to work out how to use the APIC for timing - I'll leave the PIT for scheduling for now. And with that, I wrote a preliminary 'timer.device', although I haven't tried to see if it works yet. I ummed and ahhed over putting the timing service directly in the kernel, rather than in a user-service. But the kernel can only send signals to processes, and the timer device can send messages, so it seems like it's more useful, even if there is somewhat more overhead involved. Sigh, I wish there was another APIC timer or two, it would be nice to have a good reliable period counter as well as one to be reprogrammed for varying intervals.

Friday 29 May 2009

It works

Well, the next thing works.

I finally implemented 'protected non-copying message passing', after a pretty lengthy effort. Phew.

Once I started putting it together I realised I needed a 'virtual address range' allocator, since I want to have a fixed global address range for messages. After a bit of mucking around I settled on an AVL tree based implementation, using first-fit. Although I'd written an AVL tree before (one perfect for the purpose) I'm not sure which version works anymore or where it is, so I just took the parent-based libavl implementation and 'tweaked' it to be more suitable - tree nodes are 'embedded' in objects (no data pointer), and never allocated, and there is no tree object or traversers (comparison functions are passed as arguments if needed). So basically it now has the same api as the one I wrote, but I know it's already debugged and working.

The nodes keep track of the unallocated memory range, and are sorted by address. This makes it easy to coalesce blocks if adjacent ones are freed, and so on. Looking up an empty block is O(n) but the empty block list should be pretty short. It shouldn't end up with much fragmentation since it's allocating groups of pages at a time. In the past I did a lot of research on memory algorithms and it ends up that first fit has some desirable characteristics that best fit doesn't - which is nice since it's simpler too.
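
A rough sketch of the first-fit and coalescing behaviour. To keep it short it swaps the AVL tree for an address-sorted linked list with a small fixed pool of nodes, so it shows the allocation policy rather than the data structure described above. All names (range_alloc, ra_alloc, ra_free) are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RANGES 32

struct range { uintptr_t start, size; struct range *next; };

struct range_alloc {
    struct range pool[MAX_RANGES];
    struct range *unused;   /* spare nodes */
    struct range *free;     /* free ranges, sorted by start address */
};

void ra_init(struct range_alloc *ra, uintptr_t start, uintptr_t size)
{
    assert(start != 0 && size > 0);     /* 0 is used as the failure value */
    ra->unused = NULL;
    for (int i = 0; i < MAX_RANGES; i++) {
        ra->pool[i].next = ra->unused;
        ra->unused = &ra->pool[i];
    }
    struct range *r = ra->unused;
    ra->unused = r->next;
    r->start = start; r->size = size; r->next = NULL;
    ra->free = r;
}

/* First fit: take the lowest-addressed free range that is big enough. */
uintptr_t ra_alloc(struct range_alloc *ra, uintptr_t size)
{
    for (struct range **pp = &ra->free; *pp; pp = &(*pp)->next) {
        struct range *r = *pp;
        if (r->size < size)
            continue;
        uintptr_t addr = r->start;
        r->start += size;
        r->size -= size;
        if (r->size == 0) {             /* used up: recycle the node */
            *pp = r->next;
            r->next = ra->unused;
            ra->unused = r;
        }
        return addr;
    }
    return 0;
}

/* Give a range back, merging it with adjacent free ranges.
   Returns 0 only if a new node was needed and none was left. */
int ra_free(struct range_alloc *ra, uintptr_t start, uintptr_t size)
{
    struct range *prev = NULL, *next = ra->free;
    while (next && next->start < start) { prev = next; next = next->next; }

    if (prev && prev->start + prev->size == start) {
        prev->size += size;                         /* grow predecessor */
        if (next && prev->start + prev->size == next->start) {
            prev->size += next->size;               /* gap closed: absorb next */
            prev->next = next->next;
            next->next = ra->unused;
            ra->unused = next;
        }
        return 1;
    }
    if (next && start + size == next->start) {      /* grow successor downwards */
        next->start = start;
        next->size += size;
        return 1;
    }
    struct range *r = ra->unused;
    if (!r)
        return 0;
    ra->unused = r->next;
    r->start = start; r->size = size; r->next = next;
    if (prev) prev->next = r; else ra->free = r;
    return 1;
}
```

Because the free ranges are kept in address order, checking the two neighbours is all the coalescing that is ever needed.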

Anyway, once I had a range of memory, it was simply a matter of mapping that to the calling process when it allocates a message, and re-mapping it to the destination process when it is sent. Well, almost. I was going to have the kernel take ownership of the memory but since it just uses the last-process's page table it can't, at least without globally sharing it and that loses any protection from other processes. So instead a PutMsg maps the memory to the target process's page table immediately, and tracks the object separately (unfortunately requiring dynamic memory allocation). Page tables are only changed when the thread changes - so the page table update shouldn't involve any unexpected overheads. Once the target process invokes GetMsg, the kernel just returns its pointer. It's something that needs to run fast (although 'fast' is relative when the cpu is so fast), so it would've been better to avoid the AllocMem/FreeMem overhead required to queue the message, but maybe I can find some other mechanism if it becomes an issue.

I also ran into an optimisation thing that interfered with the nasty hack I'm using to implement 'tag lists' - which basically treat a varargs list as an array of pairs of ints. Anyway, I had a small inline wrapper to call the real function (that takes an array) and all the vararg arguments were getting optimised out. __attribute__ ((noinline, unused)) works for now.
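
For comparison, here is a sketch of a portable way to write that kind of wrapper: rather than treating the stacked arguments as an array in place, it copies the pairs out with va_arg and then calls the array version. It is slower than the hack, but the optimiser can't break it. The names (tag_item, tag_lookup, tag_lookup_va, TAG_DONE as 0) are assumptions made for the example, not the actual code.

```c
#include <assert.h>
#include <stdarg.h>

#define TAG_DONE 0      /* assumed terminator value */
#define MAX_TAGS 16     /* arbitrary limit for the on-stack copy */

struct tag_item { int tag; int data; };

/* The "real" function: works on an array terminated by TAG_DONE. */
int tag_lookup(const struct tag_item *tags, int tag, int fallback)
{
    for (; tags->tag != TAG_DONE; tags++)
        if (tags->tag == tag)
            return tags->data;
    return fallback;
}

/* Varargs front end: copies (tag, data) pairs into a local array.
   Anything past MAX_TAGS pairs is silently ignored. */
int tag_lookup_va(int tag, int fallback, ...)
{
    struct tag_item tags[MAX_TAGS + 1];
    va_list ap;
    int n = 0;

    va_start(ap, fallback);
    while (n < MAX_TAGS) {
        int t = va_arg(ap, int);
        if (t == TAG_DONE)
            break;
        tags[n].tag = t;
        tags[n].data = va_arg(ap, int);
        n++;
    }
    va_end(ap);

    tags[n].tag = TAG_DONE;
    tags[n].data = 0;
    return tag_lookup(tags, tag, fallback);
}
```

A call then looks like tag_lookup_va(2, -1, 1, 10, 2, 20, TAG_DONE), which finds tag 2 and returns its data.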

Now that stuff is sorted, I guess I can start looking at 'devices' next (funny, I thought I was at that point a couple of weeks ago). Although I think I need to rest this weekend - I've gone and bloody gotten sick again. I usually don't get sick too often, so 3 times in a month is a bit of a shock. Just a sore throat so far this time, but it's enough to get in the way of things.

Thursday 28 May 2009

One door closes ...

... and another opens, in a never-ending corridor of closed doors. Or so it seems.

I managed to get at least some virtual memory stuff working. I can now create processes which have independent address spaces. Yay. After having to fix a lot of lazy coding errors along the way too. Hmm.

And I've got some initial process startup code done. It can initialise the 'exec' library for use by the process before it calls the entry point - from this it will then eventually access all other resources on the system. I'm still working on some of these mechanisms, but it'll do for a start. What makes these things tricky is the virtual memory (and memory protection), for example I can only easily pass a few registers to the initial entry point, unless I add a lot of messy page mapping stuff to pre-load the stack or data space. And I can't call anything inside exec until the structures are set up, so it all has to be done manually (i.e. prone to error). So I'm getting by with 2 arguments; real entry point, and the start of a 16kb block of pre-allocated application memory. Once exec is set up, the startup code can then use various calls to get arguments and whatnot if it needs them - which will be stored in the in-kernel process object.

But, with one hurdle out of the way, another pops up. I have to get message passing working again since all memory is now effectively process-private. I guess I will have to go with some sort of system call to allocate messages in a memory space that can be accessed by all processes (eventually moved amongst them), although I can probably hide that behind the memory allocation interface (MEMF_PUBLIC allocation flag, or so). I'd like any solution to work reasonably efficiently for intra-process message passing too, but I have a feeling that'll probably just about fall out of any solution anyway.

I still need to work out how I'm going to bring up the whole system too - the initial setup of all 'internal' libraries and devices and so on; if there ever is any that is.

One step at a time.

Tuesday 26 May 2009

MMUmbling along

Twas a bit of a slow weekend, still feeling a bit funny from the cold last week and started hung over from a few beers in town on Friday night. Saturday we went to McLaren Vale with a friend and I ended up buying way too much wine as usual. It always seems to taste better the further you go ... And I was pretty wasted by the time we got home and cooked up a BBQ.

When I had time to code I was just re-arranging stuff again. I hadn't really 'interface-ised' the code properly so I needed to finish that off. Plus I decided to move all of the supervisor level code to a different directory and make it run off its own management structure rather than 'execbase', and to settle on a consistent naming convention for structure members. So lots of killing and yanking lines of code and renaming stuff and mucking about with makefiles.

Last night I nutted out some of the main mechanics of process creation, although I still have a little way to go. I've again (about the 3rd time) written a set of utilities for MMU management but I've decided to go with the page tables allocated directly in kernel space, and I keep track of the virtual and real address of these using separate pointers. For memory pages I don't need to know the physical address after I've set up the PTE, so I just track those by index in an array.

I was going to map the whole page table to the last 4MB of memory using the self-referential PDE 'hack', but I decided against it. For one, I need to be able to easily manipulate more than 1 page table if I want to re-map messages between processes. And the overhead of keeping track of the page table virtual addresses is small.

For keeping track of physical pages I just have an array of ints, one for each available page of memory (to make it easy, it will be every page from address 0 to the highest physical address present). For available pages I have a linked list stored inside those ints. And once they've been used they're unlinked and the content becomes a reference count. And I have up to 12 bits I can use to indicate other things if required. I'm thinking the reference counting might be handy for keeping track of shared memory - on the other hand, I may just use globally shared sections to load shared library code into, so it might not be necessary at all. For now, it's also the simplest implementation.
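
A small sketch of that page array, assuming 32-bit entries where the low 20 bits hold either the next-free index or the reference count and the top 12 bits are left for flags. The names and the tiny NPAGES value are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define NPAGES     1024u                     /* illustrative only */
#define VALUE_BITS 20
#define VALUE_MASK ((1u << VALUE_BITS) - 1)  /* low bits: link or refcount */
#define PAGE_END   VALUE_MASK                /* end-of-free-list marker */

static uint32_t pages[NPAGES];
static uint32_t free_head = PAGE_END;

/* Thread every page onto the free list; page 0 ends up at the head. */
void pages_init(void)
{
    free_head = PAGE_END;
    for (uint32_t i = NPAGES; i-- > 0; ) {
        pages[i] = free_head;
        free_head = i;
    }
}

/* Unlink a page; its entry now holds a reference count of 1.
   Returns PAGE_END if no pages are left. */
uint32_t page_alloc(void)
{
    uint32_t p = free_head;
    if (p == PAGE_END)
        return PAGE_END;
    free_head = pages[p] & VALUE_MASK;
    pages[p] = 1;
    return p;
}

void page_ref(uint32_t p)
{
    pages[p]++;
}

/* Drop one reference; returns 1 if the page went back on the free list. */
int page_unref(uint32_t p)
{
    assert((pages[p] & VALUE_MASK) > 0);
    pages[p]--;
    if ((pages[p] & VALUE_MASK) != 0)
        return 0;
    pages[p] = free_head;       /* entry becomes a list link again */
    free_head = p;
    return 1;
}
```

One consequence of sharing the field is that an entry on its own can't tell you whether a page is free or in use; one of the spare flag bits could record that if it ever matters.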

I'm still thinking about the process creation primitives too. Maybe a CreateProc that takes either a function address in the current process, or a loaded ELF image. In the former case, I'll just copy the process and call the function, similar to fork() although there may be a more specific argument passing system. With the other case it will 'steal' the pages from the calling process and map them to the new process and kick it off. Might need to allocate the memory in a special range so it can be 'stolen' easily, or do some other sort of magic so it works, or even use a process loader server; in either case the loader would manage the details.

The first is required so I can start some initial in-`rom' processes, and also to implement fork() for some possible future posix layer. The second allows a process to load another one relatively efficiently, and do most of the work in user-space on the callee context.

I will have a separate task creation primitive which basically works like pthread_create does.

And I was thinking about alternative platforms again. I'm still dreading the point where I have to deal with even more fucked up components of the PC architecture like video and sound. I should really bite the bullet and get some sort of ARM platform - they're just not that easy to get around here yet. Maybe I should delve into the PS3 as well.

Friday 22 May 2009

Cold, kernel, KDE

After the cold I had last weekend I haven't been able to do much. I can't seem to get up in the morning and remain light-headed most of the day. So I've been dithering a bit with my coding and getting caught up in too many irrelevant details.

I got stuck contemplating the 'object' system used by libraries, and got side-tracked thinking up different approaches. I think I'll stick to what I have for now (similar to OS4's); as an 'object system' it is limited, and actually 'backwards', but as a collection of 'interfaces' it makes some sense. I still might go with something a lot simpler too (it's not like I'll really have to worry about keeping a stable api 'going forward', if it's never finished and/or never used). I've also been thinking of the user/kernel split some more. I'm liking the idea that each process will be more like a protected virtual os environment, and the 'exec' library will just run the same as any other local library. It will then have a very limited set of system calls only it can make to do things which require supervisor mode, or call a server for things which require global data. The 'supervisor' kernel code will then have its own data structures apart from the rest of the system and be quite limited in functionality:
  • Processes and Tasks
  • Signals
  • Ports and Message Passing
  • IO Port Management
  • Cache Management (for DMA/etc)
  • Virtual Memory Management

I thought I'd set up KDE on my old laptop for my flatmate, until now it has just had blackbox and xdm on it (xdm is rather difficult to get running on fedora too). Wow. KDE is so ... quaint. I can sort of see where they're coming from but they've missed the boat in so many ways it is difficult to enumerate them all. The launch menu is just odd - let's put a scrolling window inside a menu? The 'shutdown' button is even more confusing than xp. And everything runs like a total pig, why is resizing windows so slow? Just bringing up kdm vs xdm is another 20 seconds on the boot time and logging in another 20 or so. The sound scheme is really annoying. It's probably better than XP. But that isn't saying much.

Tuesday 19 May 2009

Share And Enjoy

Well I finished moving the executive library over to use object interfaces, and managed to get enough of the bootstrap process done so that it can now launch a `user process' which may one day complete the job. Still no virtual memory management but it's a start.

Then I started looking at the process of creating and initialising libraries. InitResident is the function that creates libraries and devices from simple (rommable) definitions. It allocates the base object and sets up its interfaces and may call init functions on either.

But the real issue is how to handle this type of shared library in a protected memory environment.

Basically the library (and device or resource) is just an object. It has a data area referenced through an instance pointer (with public and private parts), and it has a set of interfaces (function table). In the no-mmu case when you 'open' a library it (normally) only gets initialised once (for the entire system), and can contain global data within its `base' object. If it needs to do a lot of initialisation work (in time or space) then this can be a huge win since every process using the same library doesn't need to do it every time. But how to do that with multiple processes with virtual memory? In OS4 they just stick to using `public memory' - globally shared read/write memory. It's not as bad as it sounds, for a single-user system, but I'm looking at adding more protection.

Code is easy, it can just be relocated to the same virtual address range and then mapped to all processes. BSS+Data is a pita. I will need to either force no globals (no Data or BSS) or implement copy-on-write. Forcing no global data is fine for any libraries I write, but if I were to `wrap' existing libraries it might be too limiting. Library bases (the data component of library objects) are another issue. Basically once the init code is run, a library expects these to be initialised and accessible for all interface calls.

About the only idea I can see working is to completely re-init the library base again for every process that opens it. If the library wants some sort of global shared state it could start its own server process and use message ports to communicate; I will need to do this anyway for devices so maybe I'll put the feature into libraries instead. It loses some of the benefits but there are big gains in security and isolation.
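
To make the 'base plus interfaces' idea concrete, here is a toy sketch of a library as an object with a per-process base and a shared, read-only function table. Everything in it (lib_base, counter_iface, the 'counter.library' name) is invented for illustration and is not the actual layout.

```c
#include <assert.h>

struct lib_base;

/* An interface is just a table of functions that take the instance pointer. */
struct counter_iface {
    int  (*next)(struct lib_base *self);
    void (*reset)(struct lib_base *self);
};

struct lib_base {
    const char *name;                   /* public part */
    int open_count;
    const struct counter_iface *iface;  /* code: can be shared by everyone */
    int value;                          /* private state: one per process */
};

static int counter_next(struct lib_base *self) { return ++self->value; }
static void counter_reset(struct lib_base *self) { self->value = 0; }

/* Read-only, so it could be mapped into every process as-is. */
static const struct counter_iface counter_vtable = {
    counter_next, counter_reset
};

/* Run once per opening process, so each one gets a freshly set up base. */
void counter_init(struct lib_base *base)
{
    base->name = "counter.library";
    base->open_count = 1;
    base->iface = &counter_vtable;
    base->value = 0;
}
```

Two processes that each run counter_init on their own base share the code and the function table but never see each other's value, which is the isolation that re-initialising per process buys.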

The executive library is a special case, for one it needs to be set up before the setup functions inside it can be called. However it may make sense to have per-process data structures the library can access from process space too. So far I have had the supervisor-state code use the library base to store global system data, and I was thinking of having it accessible read-only to all processes. But maybe they can have their own copy of some of the data, e.g. memory allocator tables or current thread-id or thread-local-storage stuff. That way processes get their own copy of 'execbase', and the supervisor code has its own structure (or perhaps its own instance of the same structure to allow for maximal code re-use). Although I don't want to be duplicating any significant amount of data during a context switch a pointer or two might be ok. More thought required.

Monday 18 May 2009

The devel's in the details

Well the last post was written before the weekend, and it's been a long snotty weekend since then. I've become a little obsessed with it lately although I found a couple of hours to fix a sticking door.

I spent the rest of the weekend glued to the screen hacking away feverishly - although I'm getting stuck doing more support code than anything else.
  • Had another stab at MMU code, but I'm still not comfortable with what I'm coming up with so again I've shelved it for now. I'm getting too caught up with trying to write neat re-usable code or worrying about efficiency.
  • Wrote an extremely rudimentary ELF relocator, which will be required for shared library support at least. The doco's a bit slim so it was a bit of mucking about to get anything going.
  • More work on re-arranging the code structure to isolate platform-dependent parts and create a practical file hierarchy. Lots of makefile crap.
  • Added the ability for user processes to attach to interrupts. They just get translated into signals. I'm not sure if this will be flexible or efficient enough.
  • Came up with basic process and task (thread) objects and creation primitives and more thoughts on how they might work. Yet to test.
  • More thinking and research about the kickstart process after the initial entry point. How to set up exec, initialise ram, initialise included modules, etc. AmigaOS has a nifty extensible mechanism for automatic module-discovery and initialisation which I think I'll mimic.
  • Broke the context switcher for a few hours with a silly mistake. Bloody hard to debug. I started building with optimisation turned on too - about halves the number of instructions I have to step through.
  • Worked on implementing the user-level interfaces as object references rather than linked calls. Lots of scaffolding support work required here. I also need to decide how I handle in-kernel calls, although they will be isolated to a single library.
  • More thoughts on how shared libraries and devices might work, although I've still got a way to go there.
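
For the ELF relocator item in the list above, a rudimentary sketch of what applying i386 relocations involves. It handles three common types from the i386 ABI (R_386_32, R_386_PC32, R_386_RELATIVE) for REL-style entries, where the addend sits in the word being patched. As simplifying assumptions, r_offset is taken as an offset from the start of the loaded image and the symbols have already been resolved into a plain array; the function and struct names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { R_386_32 = 1, R_386_PC32 = 2, R_386_RELATIVE = 8 };

/* Same layout as an ELF32 REL entry: where to patch, and symbol/type. */
struct elf32_rel { uint32_t r_offset; uint32_t r_info; };

/* Patch 'image', which will run at virtual address 'base'.
   sym_value[i] is the already resolved address of symbol i.
   Returns 0 if an unsupported relocation type is found. */
int relocate(uint8_t *image, uint32_t base,
             const struct elf32_rel *rel, size_t nrel,
             const uint32_t *sym_value)
{
    for (size_t i = 0; i < nrel; i++) {
        uint32_t type = rel[i].r_info & 0xff;   /* low byte: type */
        uint32_t sym = rel[i].r_info >> 8;      /* rest: symbol index */
        uint8_t *where = image + rel[i].r_offset;
        uint32_t addend, value;

        memcpy(&addend, where, 4);              /* REL: addend is stored in place */
        switch (type) {
        case R_386_32:          /* S + A */
            value = sym_value[sym] + addend;
            break;
        case R_386_PC32:        /* S + A - P */
            value = sym_value[sym] + addend - (base + rel[i].r_offset);
            break;
        case R_386_RELATIVE:    /* B + A */
            value = base + addend;
            break;
        default:
            return 0;
        }
        memcpy(where, &value, 4);
    }
    return 1;
}
```

The memcpy calls avoid unaligned accesses into the image; a real loader would also have to find the relocation sections and the symbol table first, which is where most of the mucking about is.
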
Before one of the more major re-orgs I burnt the iso image to a dvd and loaded it up on an old pc. As much as bochs and qemu are handy, it was quite inspiring to see it work on real hardware even if it didn't do much.

Although they're not particularly efficient, I'm thinking of using taglists a lot for many of the interfaces - anywhere where you might need to extend functionality. At least on x86 32 bit they are very simple. If a syscall is required, the user-level code will then marshal the taglist into a more compact private structure and invoke the system handler. This would let it perform some validation checks too although the supervisor code will probably still have to do the same to avoid malicious code; which is an expensive pita. Maybe I can let the 'kernel' crash in this case and just throw away the calling task 'safely' instead - let the hardware do the checking.

I guess I'll be stuck doing some of this housekeeping work for a while, and then I can get back to trying to get a device going, or something else more 'interesting'.

Friday 15 May 2009

Asynchronous non-copying message passing

Well I decided to shelve work on the MMU code for now and focus on easier targets. I implemented the basic functions for message passing first. Although since I turned off the memory protection things are quite simple - it just passes around pointers for the most part. The idea for enabling memory protection would be to have each 'message' allocated page aligned in a global virtual address range, and just re-map the page to the target process address space when it is received or replied to, thus enforcing the rules of who owns the memory when; although any data the message references is another issue. I could force messages to include all data they reference (they have a length field), and perhaps special-case the buffers of io requests. If I force those semantics, then it could probably allow for a copying implementation to be written as well.

Ports are stored inside kernel memory and referenced by handle, rather than by address in user space - so they can go away without fatal problems and can't be corrupted. One problem is that I will probably need to add another query system call to get a copy of objects like these as they contain information a task might want to access - like the signal bit used. Probably useful for other privileged system objects anyway, like library bases or tasks. Anyway, I managed to send messages between tasks 'in the normal way'.
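
A sketch of how handle-based ports with a query call could look. The generation counter, which makes a stale handle fail after its port has gone away, is an addition for the example rather than something described above, and all of the names (kport, port_create, port_query, ...) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PORTS  64u
#define BAD_HANDLE 0u

struct kport { uint32_t gen; int in_use; int sigbit; };  /* lives in the kernel */
struct port_info { int sigbit; };                        /* copy given to the task */

static struct kport ports[MAX_PORTS];

/* Handle = generation in the top 24 bits, slot number + 1 in the low 8,
   so that 0 is never a valid handle. */
static uint32_t make_handle(uint32_t idx, uint32_t gen)
{
    return (gen << 8) | (idx + 1);
}

static struct kport *lookup(uint32_t h)
{
    uint32_t idx = (h & 0xff) - 1;      /* wraps to a huge value for 0 */
    if (idx >= MAX_PORTS)
        return NULL;
    struct kport *p = &ports[idx];
    if (!p->in_use || p->gen != (h >> 8))
        return NULL;
    return p;
}

uint32_t port_create(int sigbit)
{
    for (uint32_t i = 0; i < MAX_PORTS; i++) {
        if (!ports[i].in_use) {
            ports[i].in_use = 1;
            ports[i].sigbit = sigbit;
            return make_handle(i, ports[i].gen);
        }
    }
    return BAD_HANDLE;
}

int port_delete(uint32_t h)
{
    struct kport *p = lookup(h);
    if (!p)
        return 0;
    p->in_use = 0;
    p->gen = (p->gen + 1) & 0xffffff;   /* old handles stop matching */
    return 1;
}

/* The "query" call: hands back a copy, never the kernel object itself. */
int port_query(uint32_t h, struct port_info *out)
{
    struct kport *p = lookup(h);
    if (!p)
        return 0;
    out->sigbit = p->sigbit;
    return 1;
}
```

A task that holds on to a handle after the port is deleted just gets a failure back from the kernel instead of poking at freed memory.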

I also had to write some TagList utilities - and I added a new one which can 'pack' a taglist into a structure. Somewhat inefficient but it'll do for now.
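
A sketch of what 'packing' a taglist into a structure can look like: a small table maps each tag to a field offset, and the pack routine copies the data of every recognised tag into place. The table format and all of the names (pack_entry, pack_tags, port_args, the PORT_* tags) are assumptions for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TAG_DONE 0      /* assumed terminator value */

struct tag_item { int tag; int data; };

/* One row per supported tag: which field of the target struct it fills. */
struct pack_entry { int tag; size_t offset; };

/* Copy the data of each recognised tag to its offset inside dst.
   Unknown tags are skipped.  Returns the number of tags packed. */
int pack_tags(void *dst, const struct pack_entry *map, size_t nmap,
              const struct tag_item *tags)
{
    int packed = 0;

    for (; tags->tag != TAG_DONE; tags++) {
        for (size_t i = 0; i < nmap; i++) {
            if (map[i].tag == tags->tag) {
                memcpy((char *)dst + map[i].offset,
                       &tags->data, sizeof tags->data);
                packed++;
                break;
            }
        }
    }
    return packed;
}

/* Hypothetical example: arguments for creating a port. */
enum { PORT_SIGBIT = 1, PORT_PRI = 2 };

struct port_args { int sigbit; int pri; };

static const struct pack_entry port_map[] = {
    { PORT_SIGBIT, offsetof(struct port_args, sigbit) },
    { PORT_PRI,    offsetof(struct port_args, pri) },
};
```

The caller fills the structure with defaults first, so any tag that is left out simply keeps its default. The nested loop is where the inefficiency comes from; a table sorted by tag would fix that if it ever mattered.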

And I'm starting to think about how the various `kernel' interfaces might work in practice. Basically everything is accessed through (globally) shared libraries with a few system calls thrown in when necessary. But some of the code won't need to run in supervisor mode at all or go to another server.

So what is the next thing to look at ... well I decided to look at devices, and again checked out the AmigaOS implementation. It's a little different from microkernel models I've read about - io requests can be handled on the user context, for example. And apart from that they are actually implemented as shared libraries - complete with the possibility of public functions directly callable. But long-running i/o functionality runs in a separate task and may interact with interrupts and so on - like a normal uKernel approach.

In a protected environment things will have to change, although the basic ideas seem quite sound and usable.

But before I can get that far there's a lot of mucking about to do. I need to work out the design of the process and thread model and how they'll start and finish.

And how to do libraries; devices are just libraries too. Shared libraries will be objects, not just function tables, and in a virtual memory environment this creates some issues. e.g. should it allow system-wide shared data, per-user shared-data or just per-process/thread state. Also do I want to enforce `pure' re-entrant code for libraries - which saves the hassle when working out what to do with the data segments.

Wednesday 13 May 2009

Signals and such

After a couple of late nights I've hacked up a few more bits and pieces for my 'os'.

Signals were pretty easy to get going. These are a bit like posix signals, but also not at all the same. A low-level asynchronous bit-level synchronisation mechanism basically. You can allocate a signal 'bit', wait on it (task goes to sleep) or send it to another process (thus waking it up, and perhaps pre-empting yourself if it has a higher priority). It took a bit longer than I'd hoped because I also 'cleaned up' the code - started creating a better directory structure and moving things out of the single c file I had. I rewrote the interrupt handler and system call assembly too - they're pretty trivial so it wasn't much.
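
The bookkeeping behind that is tiny. A sketch with the scheduler left out: instead of actually sleeping or waking anything, the functions just report what the scheduler would have to do. The struct and function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

struct task {
    uint32_t sig_alloc;   /* bits that have been handed out */
    uint32_t sig_recvd;   /* bits sent to us but not yet consumed */
    uint32_t sig_wait;    /* bits we are asleep on, 0 if running */
};

/* Hand out the lowest free signal bit, or -1 if all 32 are taken. */
int alloc_signal(struct task *t)
{
    for (int bit = 0; bit < 32; bit++) {
        uint32_t m = UINT32_C(1) << bit;
        if (!(t->sig_alloc & m)) {
            t->sig_alloc |= m;
            return bit;
        }
    }
    return -1;
}

/* Returns (and clears) the pending bits covered by mask.  A result of 0
   means nothing has arrived yet and the task has to go to sleep. */
uint32_t wait_signals(struct task *t, uint32_t mask)
{
    uint32_t got = t->sig_recvd & mask;
    if (got) {
        t->sig_recvd &= ~got;
        t->sig_wait = 0;
        return got;
    }
    t->sig_wait = mask;
    return 0;
}

/* Post signals to a task.  Returns 1 if the task was waiting on any of
   them, i.e. the scheduler should wake it (and maybe pre-empt the sender). */
int send_signals(struct task *t, uint32_t mask)
{
    t->sig_recvd |= mask;
    return (t->sig_recvd & t->sig_wait) != 0;
}
```

In the real thing the woken task would re-run its wait and pick up the bits; here the caller does that by calling wait_signals again.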

I also did a lot of reading about `OS4', the latest AmigaOS. Some interesting ideas there, particularly the extension of the basic object oriented nature of the system further. Hmm, food for thought.

And investigated possible ways to get video output in the future. Sigh. PC video hardware sucks. It's not something I want to get bogged down in, but I might have to if it comes to that. I guess I can get some drivers from some generic library/X/or linux even, if they can be extracted or wrapped somehow anyway.

Being annoyed at the PC hardware I was looking at getting a beagleboard to play with - ARM looks like a lot more fun. But just as I was about to hit the buy button I had second thoughts. The display hardware is completely proprietary and undocumented and I can't really see the point if I can't get that to work. A real pity, as otherwise it looks like nice hardware that would be interesting to work with. Maybe somewhere down the line.

Then the next night I started looking at how to implement message ports and message passing. On AmigaOS these are really just arbitrated queues associated with a (possibly shared) signal bit. Very simple, very fast. Question is how to add memory protection in a way which maintains the simplicity. And since I hadn't done any paging stuff yet I got side-tracked learning how that works.
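
Without any memory protection the whole mechanism fits in a few lines. A sketch of a port as a queue tied to a signal bit, with the arbitration (disabling interrupts or similar around the list updates) left out, and with a plain pointer to the owner's pending-signal word standing in for the owning task. Names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct message {
    struct message *next;
    int length;
};

struct port {
    struct message *head;
    struct message **tail;   /* last 'next' pointer, for O(1) append */
    uint32_t sig_mask;       /* signal raised when a message arrives */
    uint32_t *owner_sigs;    /* stand-in for the owning task's pending signals */
};

void port_init(struct port *p, uint32_t *owner_sigs, int sigbit)
{
    p->head = NULL;
    p->tail = &p->head;
    p->sig_mask = UINT32_C(1) << sigbit;
    p->owner_sigs = owner_sigs;
}

/* Queue the message and signal the owner.  Nothing is copied. */
void put_msg(struct port *p, struct message *m)
{
    m->next = NULL;
    *p->tail = m;
    p->tail = &m->next;
    *p->owner_sigs |= p->sig_mask;
}

/* Take the oldest message off the port, or NULL if it is empty. */
struct message *get_msg(struct port *p)
{
    struct message *m = p->head;
    if (!m)
        return NULL;
    p->head = m->next;
    if (!p->head)
        p->tail = &p->head;
    return m;
}
```

The receiver waits on the port's signal bit and then drains the queue with get_msg until it returns NULL, since one signal can cover several messages.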

For message passing w/ memory protection in OS4 they've implemented MEMF_PUBLIC, so it actually means something, but I presume it also means all public memory is public to all processes, which is probably not what I want. That allows all processes to access the shared resources like message ports using the same virtual memory address. I thought instead perhaps I could just allocate the memory in a fixed 'public/shared' virtual address space, and migrate the memory around to the current owner using the paging system. But after reading various comments from kernel developers about the performance, I'm not sure it's a good idea. Then again, maybe it's just the cost of using an MMU and can't be avoided. And then I got so side-tracked with how the VM system should work (do I leave it in kernel space, or implement it as a service, etc) I feared I was getting too bogged down and had to put it aside for later.

And even as far as message ports go I probably need to move to a 'handle' model, rather than passing around pointers. So that meant an allocator that could initialise all the user fields, and OS4 has a generic tag (varargs-ish) based allocator for those so I had to look into tag utilities and such before even having a chance to go further.

And today I've got a little cold. Hrmph.

Sunday 10 May 2009

More muck'n about

In the quest to find something to keep me busy I've switched tacks again. I've been playing about with writing an operating system, of all things. I guess it will only be a diversion as it would take years of work to get anywhere, but it is nice to play with some low level stuff for once.

I think the idea came after I found some link to Minix 3 and had a look at that. It looked quite interesting, and I always had a soft-spot for micro-kernels. It's surprisingly functional and very small - boots in an instant. Although one problem is you really need to buy the book to understand the code as the documentation is lacking, and even then the code has moved on quite a bit since the book was written, so it isn't terribly useful anyway. The way the various servers inter-communicate isn't very simple either; for a microkernel it seems to have a tighter coupling between servers than I'd have imagined, although there is probably a reason for that. I think the original goal of nice clean readable code has also been corrupted by the bolting on of new features like virtual memory and a vfs layer. And then there are some serious performance issues with its synchronous message passing design; well not so much the message passing bit (although the overhead is quite large) as the way the services have been implemented. I'm sure it makes the code simpler but it just doesn't scale. And no threads. I do like my threads.

But anyway, the idea of a tiny micro-kernel intrigued me, so I read up on Minix 3, HURD, Mach, L4, eCos. Grand ideas flashed before my eyes. Then all the work involved pulled me back to earth. It took me days just to get a context switch working because of the shitty PC hardware (more on that below). So I definitely want to keep things simple and have pretty low expectations of where things might end up, if not just another long-forgotten directory on one of my PCs (which is pretty much a dead-set certainty no matter what else happens).

The PC isn't nice to play with at all. How disappointing that such an utterly bullshit architecture ever got so common and cheap. First there's the processor, which starts with an ancient and obtuse instruction set and then just goes down-hill from there. Segments and TSS? Enough said. Even the manuals aren't very good; after having read other processor and hardware documentation, they seem to make things far more difficult to find than necessary and leave out too many important details. Then there's the rubbish hardware beyond the rubbish processor. Only one real timing source? And even that isn't very good. Jesus, even the bloody C=64 had far more advanced peripheral hardware than that. Even one of its timer chips had 2 timers (plus a TOD/clock - which also had an interrupting alarm function), each of which could create interrupts, and they could be cascaded for very long intervals, they even had safe easy programming interfaces - and it had two of these chips! I don't even have to mention the SID chip vs the PC sound - oh that's right, let's just use another timer channel to make offensive beeps instead.

Sigh. So I should really be targeting the PS3 I guess; maybe if I get the system simulator up I'll look into it. The docs on the hypervisor are a bit slim though.

Well, despite all the rubbish you have to deal with I'm making slow progress; thanks to Grub and some of the free OS's out there and making lots of mistakes in Qemu and Bochs. I have a pretty simple context switch and system call mechanism sorted out, and basic interrupt handlers for timer and keyboard. I got bogged down just writing a very simple memory allocator, which was quite disheartening (I blame lack of sleep; I made it in the end). And instead of Minix or HURD I'm looking to AmigaOS for inspiration instead. It's such a simple and functional design; although I'd like to play with memory protection and similar ideas which don't translate directly and will make things a bit trickier and slower (than they might be otherwise). So that probably means the next thing to look at is signals - which are quite trivial (the only mechanism in AmigaOS used to wait on events or signal them), and then message ports - which aren't much more complex (although separate address spaces makes the whole notion quite different). Once they're done I can probably look at implementing simple devices like keyboard and timing. Although I might also have to look at how I might handle process isolation and virtual memory at some point fairly early in the piece too.

Well, unless I get bored with it all before then.