Too Busy For Words - the PaulWay Blog

Wed 28th Feb, 2024

The Experia, one year on.

On Friday, 1st March, it will be exactly one year since I walked into Zen Motorcycles, signed the paperwork, and got on my brand new Energica Experia electric motorbike. I then rode it back to Canberra, stopping at two places to charge along the way, but that was more in the nature of making sure - it could have done the trip on one better-chosen charging stop.

I got a call yesterday from a guy who had looked at the Experia Bruce has at Zen and was considering buying one. I talked with him for about three quarters of an hour, going through my experience, and to sum it up simply I can just say: this is a fantastic motorbike.

Firstly, it handles just like a standard motorbike - almost exactly like my previous Triumph Tiger Sport 1050. But it is so much easier to ride. You twist the throttle and you go. You wind it back and you slow down. If you want to, the bike will happily do nought to 100km/hr in under four seconds. But it will also happily and smoothly glide along in traffic. It says "you name the speed, I'm happy to go". It's not temperamental or impatient; it has no weird points where the throttle suddenly gets an extra boost or where the engine braking suddenly drops off. It is simple to ride.

As an aside, this makes it perfect for lane filtering. On my previous bike this would always be tinged with a frisson of danger - I had to rev it and ease the clutch in with a fair bit of power so I didn't accidentally stall it, but that always took some time. Now, I simply twist the throttle and I am ahead of the traffic - no danger of stalling, no delay in the clutch gripping, just power. It is much safer in that scenario.

I haven't done a lot of touring yet, but I've ridden up to Gosford once and up to Sydney several times. This is where Energica really is ahead of pretty much every other electric motorbike on the market now - they do DC fast charging. And by 'fast charger' here I mean anything from 50KW up; the Energica can only take 25KW maximum anyway :-) But this basically means I have to structure any stops we do around where I can charge up - no more stopping in at the local pub or a cafe on a whim for morning tea. That has to either offer DC fast charging or I'm moving on - the 3KW onboard AC charger means a 22KW AC charger is useless to me. In the hour or two we might stop for lunch I'd only get another 60 - 80 kilometres more range on AC; on DC I would be done in less than an hour.
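For anyone who wants to play with those numbers, here is a rough back-of-envelope sketch in Python. It assumes the 22.5KWHr battery and 250km highway range quoted in the March 2023 post further down this page, and it assumes charging is perfectly linear (it isn't - real chargers taper off as the battery fills), so treat the results as ballpark figures only. The function names are made up for this example.

```python
# Back-of-envelope charging estimates. Both constants are assumptions taken
# from the figures quoted in the 2023 post; real-world numbers will vary.
BATTERY_KWH = 22.5       # assumed pack capacity
HIGHWAY_RANGE_KM = 250   # assumed range on a full charge at highway speeds


def range_added_km(charger_kw, hours):
    """Kilometres of range gained from `hours` at `charger_kw`, ignoring taper."""
    energy_kwh = min(charger_kw * hours, BATTERY_KWH)  # can't overfill the pack
    return energy_kwh / BATTERY_KWH * HIGHWAY_RANGE_KM


def hours_to_full(charger_kw, start_fraction=0.0):
    """Hours to go from `start_fraction` (0.0 to 1.0) to full, ignoring taper."""
    return BATTERY_KWH * (1.0 - start_fraction) / charger_kw


print(round(range_added_km(3, 2)))   # two hours on the 3KW onboard AC charger
print(round(hours_to_full(25), 1))   # empty to full at the 25KW DC limit
```

Under those assumptions a two hour lunch on AC adds roughly 67km, while a full charge on DC takes about 0.9 of an hour, which is in line with what I see on the road.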

But OTOH my experience so far is that structuring those breaks around where I can charge up is relatively easy. Most riders will furiously nod when I say that I can't sit in the seat for more than two hours before I really need to stretch the legs and massage the bum :-) So if that break is at a DC charger, no problems. I can stop at Sutton Forest or Pheasant's Nest or even Campbelltown and, in the time it takes for me to go to the toilet and have a bit of a coffee and snack break, the bike is basically charged and ready to go again.

The lesson I've learned, though, is to always give it that bit longer and charge as much as I can up to 80%. It's tempting sometimes when I'm standing around in a car park watching the bike charge to move on and charge up a bit more at the next stop. The problem is that, with chargers still relatively rare and there often only being one or two at each site, a single charger not working can mean another fifty or even a hundred kilometres more riding. That's a quarter to half my range, so I cannot afford to risk that. Charge up and take a good book (and a spare set of headphones).

In the future, of course, when there's a bank of a dozen DC fast chargers in every town, this won't be a problem. Charger anxiety only exists because they are still relatively rare. When charging is easy to find and always available, and there are electric forecourts like the UK is starting to get, charging stops will be easy and will fit in with my riding.


Other advantages of the Experia:

You can get it with a complete set of Givi MonoKey top box and panniers. This means you can buy your own much nicer and more streamlined top box and it fits right on.

Charging at home takes about six hours, so it's easy to do overnight. The Experia comes with an EVSE so you don't need any special charger at home. And really, since the onboard AC charger can only accept 3KW, there's hardly any point in spending much money on a home charger for the Experia.

Minor niggles:

The seat is a bit hard. I'm considering getting the EONE Canyon saddle, although I also just need to try to work out how to get underneath the seat to see if I can fit my existing sheepskin seat cover.

There are a few occasional glitches in the display in certain rare situations. I've mentioned them to Energica, hopefully they'll be addressed.

Overall rating:

5 stars. Already recommending.

Last updated: | path: tech | permanent link to this entry

Sun 5th Mar, 2023

The Energica Experia

I recently bought an Energica Experia - the latest, largest and longest distance of Energica's electric motorbike models.

The decision to do this rather than build my own was complicated, and I'm going to mostly skip over the detail of that. At some time I might put it in another blog post. But for now it's enough to say that I'd accidentally cooked the motor in my Mark I, the work on the Mark II was going to take ages, and I was in the relatively fortunate situation of being able to afford the Experia if I sold my existing Triumph Tiger Sport and the parts for the Mark II.

For other complicated reasons I was planning to be in Sydney after the weekend that Bruce at Zen Motorcycles told me the bike would be arriving. Rather than have it freighted down, and since I would have room for my riding gear in our car, I decided to pick it up and ride it back on the Monday. In reconnoitering the route, we discovered that by pure coincidence Zen Motorcycles is on Euston Road in Alexandria, only 200 metres away from the entrance to WestConnex and the M8. So with one traffic light I could be out of Sydney.

I will admit to being more than a little excited that morning. Electric vehicles are still, in 2023, a rare enough commodity that waiting lists can be months long; I ordered this bike in October 2022 and it arrived in March 2023. So I'd had plenty of time to build my expectations. And likewise the thought of riding a brand new bike - literally one of the first of its kind in the country (it is the thirty-second Experia ever made!) - was a little daunting. I obtained PDF copies of the manual and familiarised myself with turning the cruise control on and off, as well as checking and setting the regen braking levels. Didn't want to stuff anything up on the way home.

There is that weird feeling in those situations of things being both very ordinary and completely unique. I met Bruce, we chatted, I saw the other Experia models in the store, and met Ed, who had come down to chat with Bruce and just happened to be the guy who rode a Harley Davidson Livewire from Perth to Sydney and then from Sydney to Cape Tribulation and back. He shared stories from his trip and tips on hypermiling. I signed paperwork, picked up the keys, put on my gear, prepared myself.

Even now I still get a bit choked up just thinking of that moment. Seeing that bike there, physically real, in front of me - after those months of anticipation - made the excitement real as well.

So finally, after making sure I wasn't floating, and making sure I had my ear plugs in and helmet on the right way round, I got on. Felt the bike's weight. Turned it on. Prepared myself. Took off. My partner followed behind, through the lights, onto the M8 toward Canberra. I gave her the thumbs up.

We planned to stop for lunch at Mittagong, while the NRMA still offers the free charger at the RSL there. One lady was charging her Nissan Leaf on the CHAdeMO side; shortly after I plugged in, a guy arrived in his Volvo XC40 Recharge. He had the bigger battery and would take longer; I just needed a ten minute top up to get me to Marulan.

I got to Marulan and plugged in; a guy came thinking he needed to tell the petrol motorbike not to park in the electric vehicle bay, but then realised that the plug was going into my bike. Kate headed off, having charged up as well, and I waited another ten minutes or so to get a bit more charge. Then I rode back.

I stopped only once more - at Mac's Reef Road. I turned off and did a U turn, then waited for the traffic to clear before trying the bike's acceleration. Believe me when I say this bike will absolutely do a 0-100km/hr in under four seconds! It is not a light bike, but when you pull on the power it gets up and goes.

Here is my basic review, given that experience and then having ridden it for about ten weeks around town.

The absolute best feature of the Energica Experia is that it is perfectly comfortable riding around town. Ease on the throttle and it gently takes off at the traffic lights and keeps pace with the traffic. Ease off, and it gently comes to rest with regenerative braking and a light touch on the rear brake after stopping to hold it still. If you want to take off faster, wind the throttle on more. It is not temperamental or twitchy, and you have no annoying gears and clutch to balance.

In fact, I feel much more confident lane filtering, because before I would have to have the clutch ready and be prepared to give the Tiger Sport lots of throttle lest I accidentally stall it in front of an irate line of traffic. With the Experia, I can simply wait peacefully - using no power - and then when the light goes green I simply twist on the throttle and I am away ahead of even the most aggressive car driver.

It is amazingly empowering.

I'm not going to bore you with the stats - you can probably look them up yourself if you care. The main thing to me is that it has DC fast charging, and watching 75KW go into a 22.5KWHr battery is just a little bit terrifying as well as incredibly cool. The stated range of 250km on a charge at highway speeds is absolutely correct, from my experience riding it down from Sydney. And that plus the fast charging means that I think it is going to be quite reasonable to tour on this bike, stopping off at fast or even mid-level chargers - even a boring 22KW charger can fill the battery up in an hour. The touring group I travel with stops often enough that if those stops can be top ups, I will not hold anyone up.

Some time in the near future I hope to have a nice fine day where I can take it out on the Cotter Loop. This is an 80km stretch of road that goes west of Canberra into the foothills of the Brindabella Ranges, out past the Deep Space Tracking Station and Tidbinbilla Nature Reserve. It's a great combination of curving country roads and hilly terrain, and reasonably well maintained as well. I did that on the Tiger Sport, with a GoPro, before I sold it - and if I can ever convince PiTiVi to actually compile the video from it I will put that hour's ride up on a platform somewhere.

I want to do that as much to show off Canberra's scenery as to show off the bike.

And if the CATL battery capacity improvement comes through to the rest of the industry, and we get bikes that can do 400km to 500km on a charge, then electric motorbike touring really will be no different to petrol motorbike touring. The Experia is definitely at the forefront of that change, but it is definitely possible on this bike.


Mon 26th Nov, 2018

New workshop, new lathe

I've been building a workshop for my wood turning hobby, and finally I had some time to put my lathe in. This involved hiring an engine hoist, disassembling and re-assembling that to get it through the doorway. Because the lathe's motor controller was permanently connected to the motor, power cord, controller and speed display, I had to get the lathe in the room before I could attach the controller to the wall and fit the cables into the slot in the bench. It was complicated.

Fortunately for you, you can watch the whole thing in time lapse format, speeded up so two hours becomes five minutes:

Lathe install


Wed 25th Apr, 2018

Temperature monitoring in Linux on an AMD Ryzen processor

I bought an AMD Ryzen processor and compatible motherboard as an upgrade for my home server. Seriously, this thing rocks! Four cores, eight threads, and still 65W of total power draw on the processor.

You need a fairly recent kernel to support the AMD Ryzen processor - the stock CentOS 7 kernels will intermittently and randomly lock up. One way of dealing with this is to install an updated kernel from EL Repo or similar, and that worked for me for a time. But at some point it dropped support for the Hauppauge NovaT USB DVB input I have for recording for MythTV, and I had to recompile the kernel. There are plenty of instructions out there for how to do that and it's relatively painless.

One thing that isn't currently included in the kernel source is the IO driver for the temperature and voltage monitoring on modern AMD boards. In fixing this I learnt a valuable lesson about how lm-sensors actually works. sensors looks for and reads sensor information from all the devices in the kernel that present the right interface; but you need to have the right device loaded in order for sensors to read it. So recompiling lm-sensors or re-running sensors-detect won't fix this.

Instead, you need to install the it87, or possibly the nct6775 driver at Instructions for doing this can be helpfully found at
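To check which drivers are actually presenting sensor data, you can walk the kernel's hwmon tree in sysfs yourself, which is roughly where sensors gets its readings. The following is only a minimal Python sketch of that idea, not how lm-sensors is implemented; the function name is made up for this example, and it only looks at temperature inputs (which sysfs reports in thousandths of a degree Celsius).

```python
from pathlib import Path


def read_hwmon(root="/sys/class/hwmon"):
    """List (device, chip name, {input: degrees C}) for each hwmon device.

    Illustrative sketch only: if the driver for your monitoring chip is not
    loaded, that chip simply will not appear in this list.
    """
    results = []
    for dev in sorted(Path(root).glob("hwmon*")):
        name_file = dev / "name"
        if not name_file.is_file():
            continue  # not a usable hwmon device directory
        chip = name_file.read_text().strip()
        temps = {}
        for inp in sorted(dev.glob("temp*_input")):
            try:
                milli_c = int(inp.read_text().strip())
            except (OSError, ValueError):
                continue  # some inputs error out or report garbage; skip them
            temps[inp.name[: -len("_input")]] = milli_c / 1000.0
        results.append((dev.name, chip, temps))
    return results
```

Calling `read_hwmon()` before and after loading a driver module should show a new entry appearing if that driver matches your board's chip.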


Sun 22nd Jan, 2017

LCA 2017 roundup

I've just come back from LCA at the Wrest Point hotel and fun complex in Hobart, over the 16th to the 20th of January. It was a really great conference and keeps the bar for both social and technical enjoyment at a high level.

I stayed at a nearby AirBNB property so I could have my own kitchenette - I prefer to be able to at least make my own breakfast rather than purchase it - and to give me a little exercise each day walking to and from the conference. Having the conference in the same building as a hotel was a good thing, though, as it both simplified accommodation for many attendees and meant that many other facilities were available. LCA this year provided lunch, which was a great relief as it meant more time to socialise and learn and it also spared the 'nearby' cafes and the hotel's restaurants from a huge overload. The catering worked very well.

From the first keynote right to the last closing ceremony, the standard was very high. I enjoyed all the keynotes - they really challenged us in many different ways. Pia gave us a positive view of the role of free, open source software in making the world a better place. Dan made us think of what happens to projects when they stop, for whatever reason. Nadia made us aware of the social problems facing maintainers of FOSS - a topic close to my heart, as I see the way we use many interdependent pieces of software as in conflict with users' social expectations that we produce some kind of seamless, smooth, cohesive whole for their consumption. And Robert asked us to really question our relationship with our users and to look at the "four freedoms" in terms of how we might help everyone, even people not using FOSS. The four keynotes really linked together well - an amazing piece of good work compared to other years - and I think gave us new drive.

I never had a session where I didn't want to see something - which has not always been true for LCA - and quite often I skipped seeing something I wanted to see in order to see something even more interesting. While the miniconferences sometimes lacked the technical punch or speaker polish, they were still all good and had something interesting to learn. I liked the variety of miniconf topics as well.

Standout presentations for me were:


Tue 10th May, 2016

How to get Fedora working on a System 76 Oryx Pro

Problems:
a) No sound
b) Only onboard screen, does not recognise HDMI or Mini-DP

Solutions:
1) Install Korora
2) Make sure you're not using an outdated kernel that doesn't have the snd-hda-intel driver available.
3) dnf install akmod-nvidia xorg-x11-drv-nvidia

Extra resources:


Fri 6th Nov, 2015

Open Source Developers Conference 2015

In the last week of October I attended the Open Source Developers Conference in lovely Hobart. It was about 90 people this year - for some reason people don't come to it if they have to travel a bit further. It's their loss - this year was excellent.

We started with Dr Maia Sauren's keynote on all the many many ways that government departments and not-for-profit organisations are working to open up our access to transparent democracy. I've never seen a talk given by going through browser tabs before but it was a good indication of just how much work is going on in this field. Then we had Ben Dechrai demonstrating how easy it is to install malware on systems running PHP, Julien Goodwin talking about the mistakes people make when securing data (like thinking NATting is the answer), and Katie McLaughlin with a good round-up of why Javascript is actually a good language (and why the "WAT" talks are amusing but irrelevant to the discussion).

Tuesday afternoon was GIS afternoon. Patrick Sunter gave a really amazing talk about urban planning, demonstrating mapping transit time across a city like Melbourne interactively - drop a pin on the map and in three seconds or so the new isochrone map would be generated. This allowed them to model the effects of proposed public transport changes - like a train line along the Eastern Freeway (get this done already!) - very quickly. Then Blair Wyatt demonstrated SubPos, a system of providing location data via WiFi SSID beacons - doesn't work on Apple phones though because Apple are into control. Matthew Cengia gave a comprehensive introduction to OpenStreetMap, then afternoon tea. I skipped the lightning talks since I normally find those a bit scattered - any talk where you spend more time hassling over how much time you have remaining and whether or not your technology is working is a talk wasted in my opinion. I needed a rest, though, since I was struggling with a nose and throat infection.

Then we headed off to dinner at the Apple Shed in the picturesque Huon Valley. Local ciders, local produce, good food, good company, good conversation. All the boxes satisfyingly checked :-). I bought a bottle of the Apple Schnapps to sample later.

Wednesday morning's keynote was by Mark Elwell and showed his experience as an educator looking at Second Life and OpenSim. This was a different take on openness - demonstrating how our desire to create and share is stronger than our greed. The things that SL and OpenSim have done to lock up 'intellectual property' and monetise people's interactions have generally hindered their success, and people still put hundreds or thousands of hours into modelling things just for the satisfaction of seeing it in a virtual world. It was a good reflection on one of the many reasons we create free open source software.

Casey West, Thor's younger brother, gave an excellent review of the 'time estimation' methods we've traditionally used in software engineering - the waterfall model, agile development, and scrum - and why they all usually end up with us lying about, or making up, how much time things take. One thing he said which struck home to me was "your company invests in you" - it was the answer to the problem of support (and security) being seen as a cost rather than a benefit. Kathy Reid gave an excellent talk about how to guide your career with some excellent speaking tips thrown in (an acknowledgement of country and assistance for hearing impaired people, amongst others). I skipped Paul Fenwick's CKAN talk as I wanted to prepare my lightning talk for later (hypocritical? Yes, I suppose so :-) ).

In the afternoon Chris Neugebauer gave a good demonstration on why HTTP/2 is going to rock, Scott Bragg talked about one of the more esoteric uses of BitCoin block chains, and Arjen Lentz showed the benefits (and absence of fail) in teaching primary school children to make their own robots (including soldering). Michael Cordover gave a highly anticipated talk on his progress trying to get the Australian Electoral Commission to reveal the source code for its "EasyCount" software that's used (amongst other things) to count Federal Senate elections. It's disappointing that the closed mindset exists so strongly in some areas of government - the reasons and the delays and the obstructions were more than just simple accident.

We then had a set of "Other Skills" lightning talks - people talking about other things they do outside of programming things. Unfortunately I can't remember many of these because I was preparing for mine, which was on constructing my electric motorbike. This was well received - quite a few people came up to me afterward to talk about motorbikes, and the practicalities of building an electric one. It's always satisfying to talk with people that don't need the basics (like "can't you put wind generators on it to generate power as you move?") explained.

The Thursday morning keynote was by Richard Tubb, talking about how we can create opportunities and use the situations we find ourselves in to open up and improve our lives, and showed some of the things achieved in the GovHack Tasmania he ran. Sven Dowideit, the author of Boot2docker, gave a good demonstration of the things you can do with containers - particularly good for build systems as they can be stripped down to avoid unexpected dependencies. Then I gave my talk on my experiences with logs and how we can improve the logs our programs generate; the feedback I got was good, but I'd like to add more examples and an actual library or two to implement the principles I talk about. Then John Dalton gave a talk about how to use ssh's tunnel flags; it was a good overview of how the various options work.

I don't remember what I was doing after lunch, and I don't remember the first talk - I think I was resting again. I did see Jacinta Richardson's talk on RPerl, which is basically a library that compiles your Perl code into C++. It's useful for computationally intensive things but the author of RPerl seems to have bizarre notions of how to interact with a community - like refusing to look at Github issues and requesting they be put on his Facebook page instead. We had a couple of 'thunder' talks - the main one I can remember was Morgan's talk on her PhD on Second Life and OpenSim (her mentor was Mark Elwell), which touched on the same points of social and open interaction.

After afternoon tea we had Pia Waugh speaking via Hangout from her home in Canberra - she wasn't able to attend in person because of imminent child process creation (!). She talked about GovHack, leading some of the projects to open up government processes and her work in dealing with the closed mindset of some people in government departments. Pia is always so positive and engaged, and her energy and enthusiasm is a great inspiration to a lot of people who struggle with similar interactions with less-than-cooperative bureaucrats. Sadly though, it was another demonstration of how we really need a high speed broadband network - the video stalled occasionally and Pia's voice was garbled at some times because of bandwidth problems.

We had another set of lightning talks which I stayed around for - and good thing too, because Fraser Tweedale demonstrated an amazing new system called Deo. It's essentially "encryption keys as a network service": a client can store a key in a network server and then request it later automatically. The two situations Fraser demonstrated for this were unlocking your Apache SSL certificate when Apache starts up (using a pass phrase helper) and unlocking LUKS disk encryption automatically when a machine boots (using a helper in LUKS). Since I'd recently had a customer ask for this very thing - machines with encrypted disks for data security outside the corporate network but that boot without user intervention when in the presence of the key server - this was hugely useful. I'm watching the Deo project eagerly, and have changed my attitude to lightning talks. If only more of them could be like this!

As is common with open source events, OSDC 2015 was collecting money for charity - in this case, the Tasmanian Refugee Defence Fund. After Lev Lafayette donated $1000 to the cause, I decided to match it. The few glimpses we get into the abysmal conditions in our costly, closed offshore detention camps are harrowing - yet we don't see (many) people in them saying "you know, take me back to Syria, I'll take my chances there". We're only hurting the poorest of the poor and the most desperate of the desperate, and only because of the xenophobia created by the Coalition and the conservative media. We're damaging people for life, and burdening our own society in coping with the problems we've created. In my opinion we're going to find out in the upcoming decades just how bad that problem really is. Anything we can do to alleviate it now is a good thing.

Overall, OSDC 2015 was a great learning experience. The "hallway track" was just as beneficial as the talks, the food was good, the venue was good, and I was glad I came.


Wed 9th Jul, 2014

New web server, same old content.

Over the last couple of years I've been implementing bits of my website in Django. Those initially started on my home server, and recently I moved them to a test domain on the new server my web host provided. Then they advised me that their old hardware was failing and they'd really like to move my domain off it onto the new one.

So I took backups, and copied files, and wrote new code, and converted old Django 1.2 code which worked in Django 1.4 up to the new standards of Django 1.6. Much of the site has been 404'ing for the last couple of days as I fix problems here and there. It's still work in progress, especially fixing the issues with URL compatibility - trying to make sure URLs that worked in the old site, in one Perl-based CGI system, work in the new site implemented in Django with a changed database structure.

Still, so far so good. My thanks once again to Daniel and Neill at Ace Hosting for their help and support.

Last updated: | path: tech / web | permanent link to this entry

Fri 11th Apr, 2014

Sitting at the feet of the Miller

Today I woke nearly an hour earlier than I'm used to, and got on a plane at a barely dignified hour, to travel for over three hours to visit a good friend of mine, Peter Miller, in Gosford.

Peter may be known to my readers, so I won't be otiose in describing him as a programmer with great experience who's worked in the Open Source community for decades. For the last couple of years he's been battling Leukaemia, a fight which has taken its toll - not only on him physically and on his work but also on his coding output. It's a telling point for all good coders to consider that he wrote tests on his good days - so that when he was feeling barely up to it but still wanted to do some coding he could write something that could be verified as correct.

I arrived while he was getting a blood transfusion at a local hospital, and we spent a pleasurable hour talking about good coding practices, why people don't care about how things work any more, how fascinating things that work are (ever seen inside a triple lay-shaft synchronous mesh gearbox?), how to deal with frustration and bad times, how inventions often build on one another and analogies to the open source movement, and many other topics. Once done, we went back to his place where I cooked him some toasted sandwiches and we talked about fiction, the elements of a good mystery, what we do to plan for the future, how to fix the health care system (even though it's nowhere near as broken as, say, the USA's), dealing with road accidents and fear, why you can never have too much bacon, what makes a good Linux Conference, and many other things.

Finally, we got around to talking about code. I wanted to ask him about a project I've talked about before - a new library for working with files that allows the application to insert, overwrite, and delete any amount of data anywhere in the file without having to read the entire file into memory, massage it, and write it back out again. Happily for me this turned out to be something that Peter had also given thought to, apropos of talking with Andrew Cowie about text editors (which was one of my many applications for such a system). He'd also independently worked out that such a system would also allow a fairly neat and comprehensive undo and versioning system, which was something I thought would be possible - although we differed on the implementation details, I felt like I was on the right track.
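To give a feel for the idea, here is a small Python sketch of a piece table, which is one well-known structure that allows inserts and deletes anywhere without rewriting the original data, and that gets undo almost for free. To be clear, this is not the design Peter and I discussed (we differed on the details anyway) - it is only an illustration, the class and method names are made up, and it keeps everything in memory where a real library would leave the original data on disk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Piece:
    source: str   # "orig" = untouched original data, "add" = appended edits
    start: int    # offset into that source buffer
    length: int   # number of bytes this piece covers


class PieceTable:
    """Edits only change a small list of pieces; the data itself never moves."""

    def __init__(self, original: bytes):
        self._orig = original          # a real library would read this lazily
        self._add = bytearray()        # append-only buffer of inserted data
        self._pieces = [Piece("orig", 0, len(original))] if original else []
        self._history = []             # snapshots of the piece list, for undo

    def __len__(self):
        return sum(p.length for p in self._pieces)

    def _split(self, offset):
        """Make a piece boundary at offset; return the index of the piece there."""
        pos = 0
        for i, p in enumerate(self._pieces):
            if offset == pos:
                return i
            if offset < pos + p.length:
                cut = offset - pos
                self._pieces[i:i + 1] = [
                    Piece(p.source, p.start, cut),
                    Piece(p.source, p.start + cut, p.length - cut),
                ]
                return i + 1
            pos += p.length
        return len(self._pieces)       # offset is exactly at the end

    def insert(self, offset, data: bytes):
        if not 0 <= offset <= len(self):
            raise IndexError("offset outside the data")
        if not data:
            return
        self._history.append(list(self._pieces))
        i = self._split(offset)
        self._pieces.insert(i, Piece("add", len(self._add), len(data)))
        self._add += data

    def delete(self, offset, length):
        if offset < 0 or length < 0 or offset + length > len(self):
            raise IndexError("range outside the data")
        if length == 0:
            return
        self._history.append(list(self._pieces))
        first = self._split(offset)
        last = self._split(offset + length)
        del self._pieces[first:last]

    def undo(self):
        """Step back one edit; old data is still in the buffers, so it's cheap."""
        if self._history:
            self._pieces = self._history.pop()

    def read(self) -> bytes:
        out = bytearray()
        for p in self._pieces:
            buf = self._orig if p.source == "orig" else self._add
            out += buf[p.start:p.start + p.length]
        return bytes(out)
```

An overwrite is just a delete followed by an insert at the same offset. Because nothing is ever removed from either buffer, each entry in the history is effectively a cheap version of the whole file.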

We discussed how such a system would minimise on-disk reads and writes, how it could offer transparent, randomly seekable, per-block compression, how to recover from partial file corruption, and what kind of API it should offer. Then Peter's son arrived and we talked a bit about his recently completed psychology degree, why psychologists are treated the same way that scientists and programmers are at parties (i.e. like a form of social death), and how useful it is to consider human beings as individuals when trying to help them. Then it was time for my train back to Sydney and on to Canberra and home.

Computing is famous, or denigrated, as an industry full of introverts, who would rather hack on code than interact with humans. Yet many of us are extroverts who don't really enjoy this mould we are forced into. We want to talk with other people - especially about code! For an extrovert like myself, having a chance to spend time with someone knowledgeable, funny, human, and sympathetic is to see sun again after long days of rain. I'm fired up to continue work on something that I thought was only an idle, personal fantasy unwanted by others.

I can only hope it means as much to Peter as it does to me.


Wed 15th Jan, 2014

Ignorable compression

On the way home from LCA, and on a whim, in Perth I started adding support for LZO compression to Cfile.

This turned out to have unexpected complications: while liblzo supports the wide variety of compression methods all grouped together as "LZO", it does not actually create '.lzo' files. This is because '.lzo' files also have a special header, added checksums, and file contents lists a bit like a tar file. All of this is added within the 'lzop' program - there is no external library for reading or writing lzo files in the same way that zlib handles gz files.

Now, I see three options here:

Yeah, I'm going for option one there.

LZO is a special case: it does a reasonable job of compression - not quite as much as standard gzip - but its memory requirements for compression can be minuscule and its decompression speed is very fast. It might work well for compression inside the file system, and is commonly used in consoles and embedded computers when reading compressed data. But for most common situations, even on mobile phones, I imagine gzip is still reasonably quick and produces smaller compressed output.

Now to put all the LZO work in a separate git branch and leave it as a warning to others.

Last updated: | path: tech / c | permanent link to this entry

Thu 31st Oct, 2013

Converting cordless drill batteries

We have an old and faithful Ryobi 12V cordless drill which is still going strong. Unfortunately, the two batteries it came with have been basically killed over time by the fairly basic charger it comes with. I bought a new battery some time ago at Battery World, but they now don't stock them and they cost $70 or so anyway. And even with a small box from Jaycar connected to the charger to make sure it doesn't cook the battery too much, I still don't want to buy another Nickel Metal Hydride battery when all the modern drills are using Lithium Ion batteries.

Well, as luck would have it I recently bought several LiIon batteries at a good price, and thought I might as well have the working drill with a nice, working battery pack too. And I'd bought a nice Lithium Ion battery balancer/charger, so I can make sure the battery lasts a lot longer than the old one. So I made the new battery fit in the old pack:

First, I opened up the battery pack by undoing the screws in the base of the pack:

There were ten cells inside - NiMH and NiCd are 1.2V per cell, so that makes 12V. The pack contacts were attached to the top cell, which was sitting on its own plinth above the others. The cells were all connected by spot-welded tabs. I really don't care about the cells so I cut the tabs, but I kept the pack contacts as undamaged as possible. The white wires connect to a small temperature sensor, which is presumably used by the battery charger to work out when the battery is charged; the drill doesn't have a central contact there. You could remove it, since we're not going to use it, but there's no need to.

The new battery is going to sit 'forward' out of the case, so I cut a hole for it by marking the outline of the new pack against the side of the old case. I then used a small fretsaw to cut out the sides of the square, cutting through one of the old screw channels in the process.

I use "Tamiya" connectors, which are designed for relatively high DC current and provide good separation between both pins on both connectors. Jaycar sells them as 2-pin miniature Molex connectors; I support buying local. I started with the Tamiya charge cable for my battery charger and plugged the other connector shell into it. Then I could align the positive (red) and negative (black) cables and check the polarity against the charger. I then crimped and soldered the wires for the battery into the connector, so I had the battery connected to the charger. (My battery came with a Deans connector, and the charger didn't have a Deans connector cable, which is why I was putting a new connector on.)

Aside: if you have to change a battery's connector over, cut only one side first. Once that is safely sealed in its connector you can then do the other. Having two bare wires on a 14V 3AH battery capable of 25C (i.e. 75A) is a recipe for either welding something, killing the battery, or both. Be absolutely careful around these things - there is no off switch on them and accidents are expensive.

Then I repeated the same process for the pack contacts, starting by attaching a red wire to the positive contact, since the negative contact already had a black wire attached. The aim here is to make sure that the drill gets the right polarity from the battery, which itself has the right polarity and gender for the charger cable. I then cut two small slots in the top of the pack case to let the connector sit outside the case, with the retaining catch at the top. My first attempt put this underneath, and it was very difficult to undo the battery for recharging once it was plugged in.

The battery then plugs into the pack case, and the wires are just the right length to hold the battery in place.

Then the pack plugs into the drill as normal.

The one thing that had me worried with this conversion was the difference in voltages. Lithium ion cells can range from 3.2V to 4.2V and normally sit around 3.7V. The drill is designed for 12V; with four Lithium Ion cells in the battery, the pack sits at 14.8V nominal and reaches 16.8V when fully charged. Would it damage the drill?

I tested it by connecting the battery to a separate set of thin wires, which I could then touch to the connector on the pack. I touched the battery to the pack, and no smoke escaped. I gingerly started the drill - it has a variable trigger for speed control - and it ran slowly with no smoke or other signs of obvious electric distress. I plugged the battery in and ran the drill - again, no problem. Finally, I put my largest bit in the drill, put a piece of hardwood in the vice, and went for it - the new battery handled it with ease. A cautious approach, perhaps, but it's always better to be safe than sorry.
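To put the voltage worry in numbers, here is a small sketch of the arithmetic, using only the per-cell figures quoted above (1.2V per NiMH cell, 3.2V to 4.2V per Lithium Ion cell) and the 3AH, 25C rating mentioned earlier. The names are just for illustration.

```python
# Per-cell voltages as quoted in the post.
NIMH_CELL_VOLTS = 1.2
LI_ION_CELL_VOLTS = {"flat": 3.2, "nominal": 3.7, "full": 4.2}


def pack_volts(cells, volts_per_cell):
    """Voltage of a pack of identical cells wired in series."""
    return round(cells * volts_per_cell, 1)


old_pack = pack_volts(10, NIMH_CELL_VOLTS)  # ten NiMH cells: 12.0
new_pack = {state: pack_volts(4, v) for state, v in LI_ION_CELL_VOLTS.items()}

# Maximum discharge current is capacity (AH) times the C rating.
max_amps = 3 * 25  # 75

print(f"old pack: {old_pack}V")
for state, volts in new_pack.items():
    print(f"new pack ({state}): {volts}V, {volts - old_pack:+.1f}V against the old pack")
print(f"maximum discharge current: {max_amps}A")
```

So the new pack only drops to roughly the drill's design voltage when it is nearly flat, and is 4.8V over it straight off the charger.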

So the result is that I now have a slightly ugly but much more powerful battery pack for the drill. It's also 3AH versus the 2AH of the original pack, so I get more life out of the pack. And I can swap the batteries over quite easily, and my charger can charge up to four batteries simultaneously, so I have something that will last a long time now.

I'm also writing this article for the ACT Woodcraft Guild, and I know that many of them will not want to buy a sophisticated remote control battery charger. Fortunately, there are many cheap four-cell all-in-one chargers at HobbyKing, such as their own 4S balance charger, or an iMAX 35W balance charger for under $10 that do the job well without lots of complicated options. These also run off the same 12V wall wart that runs the old pack charger.

Bringing new life to old devices is quite satisfying.


Mon 3rd Jun, 2013

New file system operations

Many many years ago I thought of the idea of having file operations that effectively allowed you to insert and delete, as well as overwrite, sections of a file. So if you needed to insert a paragraph in a document, you would simply seek to the byte in the file just before where you wanted to insert, and tell the file to insert the required number of bytes. The operating system would then be responsible for handling that, and it could then seamlessly reorganise the file to suit. Deleting a paragraph would be handled by similar means.

Now, I know this is tricky. Once you go smaller than the minimum allocation unit size, you have to do some fairly fancy handling in the file system, and that's not going to be easy unless your file system discards block allocation and goes with byte offsets. The pathological case of inserting one byte at the start of a file is almost certainly going to mean rewriting the entire file on any block-based file system. And I'm sure it offends some people, who would say that the operations we have on files at the moment are just fine and do everything one might efficiently need to do, and that this kind of chopping and changing is up to the application programmer to implement.

That, to me, has always seemed something of a cop-out. But I can see that having file operations that only work on some file systems is a limiting factor - adding specific file system support is usually done after the application works as is, rather than before. So there it sat.

Then a while ago, when I started writing this article, I found myself thinking of another set of operations that could work with the current crop of file systems. I was thinking specifically of the process that rsync has to do when it's updating a target file - it has to copy the existing file into a new, temporary file, add the bits from the source that are different, then remove the old file and substitute the new. In many cases we're simply appending new stuff to the end of the old file. It would be much quicker if rsync could simply copy the appended stuff into a new file, then tell the file system to truncate the old file at a specific byte offset (which would have to be rounded to an allocation unit size) and concatenate the two files in place.

This would be relatively easy for existing file systems to do - once the truncate is done the inodes or extents of the new file are simply copied into the table of the old file, and then the appended file is removed from the directory. It would be relatively quick. It would not take up much more space than the final file would. And there are several obvious uses - rsync, updating some types of archives - where you want to keep the existing file until you really know that it's going to be replaced.
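As a rough illustration of what that operation would look like from the caller's side, here is a user-space emulation in Python. It is only a sketch: the function name is made up, and it copies the appended bytes rather than re-linking extents, which is exactly the cost the proposed file system operation would avoid. It also accepts any byte offset, where the real thing would need one rounded to an allocation unit.

```python
import os
import shutil


def truncate_and_concat(target, tail, offset):
    """Cut `target` at `offset`, append the contents of `tail`, remove `tail`.

    Hypothetical helper: a file system implementing this natively would
    move the tail's blocks into the target's table instead of copying.
    """
    with open(target, "r+b") as dst, open(tail, "rb") as src:
        dst.truncate(offset)        # drop everything after the offset
        dst.seek(0, os.SEEK_END)    # position at the new end of file
        shutil.copyfileobj(src, dst)
    os.remove(tail)                 # the tail no longer exists separately
```

For the rsync case, `target` is the existing file, `tail` is the temporary file holding only the new data, and `offset` is the point up to which the old file is still valid.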

And then I thought: what other types of operations are there that could use this kind of technique? Splitting a file into component parts? Removing a block or inserting a block - i.e. the block-wise alternative to my byte offset operations above? All those would be relatively easy - rewriting the inode or offset map isn't, as I understand it, too difficult. Even limited to operations that are easy to implement in the file system, there are considerably more operations possible than those we currently have to work with.

I have no idea how to start this. I suspect it's a kind of 'chicken and egg' problem - no-one implements new operations for file systems because there are no clients needing them, and no clients use these operations because the file systems don't provide them. Worse, I suspect that there are probably several systems that do weird and wonderful tricks of their own - like allocating a large chunk of file as a contiguous extent of disk and then running their own block allocator on top of it.

Yes, it's not POSIX compliant. But it could easily be a new standard - something better.


Tue 14th May, 2013

Modern kernels and uncooperative monitors

Our main TV screen is a Kogan 32" TV hooked up to a Mini-ITX machine running a MythTV frontend on Fedora 18. Due to Kogan buying the cheapest monitors, which are the ones with the worst firmware, it has several annoyingly braindead features that make it hard to use with a computer:

Now, not having an EDID used not to be a problem when X did most of the heavy work of setting up the display, because you could, at a pinch, tell it to trust you on what modes the monitor could support. With a program like cvt you could generate a modeline that you'd stick in your /etc/X11/xorg.conf and it'd output the right frequencies. This is what I had to do for Fedora 16.

The new paradigm now is that the kernel sets the monitor resolution and X is basically a client application to use it. This solves a lot of problems for most people, but unfortunately the kernel doesn't really handle the situation when the monitor doesn't actually respond with a valid EDID. More unfortunately, this actually happens in numerous situations - dodgy monitors and dodgy KVM switches being two obvious ones.

It turns out, however, that there is a workaround. You can tell the kernel that you have a (made-up) EDID block to load that it's going to pretend came from the monitor. To do this, you have to generate an EDID block - handily explained in the Kernel documentation - which requires grabbing the kernel source code and Making the files in the Documentation/EDID directory. Then put the required file, say 1920x1080.bin, in a new directory /lib/firmware/edid, and add the parameter "drm_kms_helper.edid_firmware=edid/1920x1080.bin" to your kernel boot line in GRUB, and away you go.

Well, nearly. Because the monitor literally does not respond, rather than responding with something useless, the kernel doesn't turn that display on (after all, not responding is exactly what the HDMI and DVI ports are doing, because nothing is plugged into them). So you also have to tell the kernel that you really do have a monitor there, by including the parameter "video=VGA-1:e" on the kernel boot line as well.
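Putting the two boot parameters together, a tiny helper like the following (a hypothetical convenience, not part of any tool) builds the string to append to the kernel boot line. The connector name and EDID file name are the ones used above; yours may differ.

```python
def edid_boot_params(connector, edid_file):
    """Build the two kernel parameters described above.

    `edid_file` is the file placed in /lib/firmware/edid, and
    `connector` is the output to force on, e.g. "VGA-1".
    """
    return " ".join([
        f"drm_kms_helper.edid_firmware=edid/{edid_file}",  # load the made-up EDID
        f"video={connector}:e",                            # force the output enabled
    ])


print(edid_boot_params("VGA-1", "1920x1080.bin"))
# prints: drm_kms_helper.edid_firmware=edid/1920x1080.bin video=VGA-1:e
```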

Once you've done that, you're good to go. Thank you to the people at OSADL for documenting this. Domestic harmony at PaulWay Central is now restored.


Mon 1st Apr, 2013

Preventing patent obscurity

One of the problems I see with the patent system is that patents are often written in obscure language, using unusual and non-standard jargon, so as to both apply as broadly as possible and not show up as "obvious" inventions.

So imagine I'm going to try to use a particular technology, or I'm going to patent a new invention. As part of my due diligence, I have to provide a certified document that shows what search terms I used to search for patents, and why any patents I found were inapplicable to my use. Then, when a patent troll comes along and says "you're using our patent", my defence is, "Sorry, but your patent did not appear relevant in our searches (documentation attached)."

If my searches are considered reasonable by the court, then I've proved I've done due diligence and the patent troll's patent is unreasonably hard to find. OTOH, if my searches were unreasonable I've shown that I have deliberately looked for the wrong thing in the hopes that I can get away with patent infringement, so damages would increase. If I have no filing of what searches I did, then I've walked into the field ignorant and the question then turns on whether I can be shown to have infringed the patent or whether it's not applicable, but I can be judged as not taking the patent system seriously.

The patent applicant should be the one responsible for writing the patent in the clearest, most useful language possible. If not, why not use Chinese? Arpy-Darpy? Gangster Jive? Why not make up terms: "we define a 'fnibjaw' to be a sequence of bits at least eight bits long and in multiples of eight bits"? Why not define operations in big-endian notation where the actual use is in little-endian notation, so that your constants are expressed differently and your mathematical operations look nothing like the actual ones performed but your patent is still relevant? The language of patents is already obscure enough, and even if you did want to actually use a patent it is already hard enough with some patents to translate their language into the standard terms of art. Patent trolls rely on their patents being deliberately obscure so that lawyers and judges have to interpret them, rather than technical experts.

The other thing this does is to promote actual patent searches and potential usage. If, as patent proponents say, the patent system is there to promote actual use and license of patents before a product is implemented, then they should welcome something that encourages users to search and potentially license existing patents. The current system encourages people to actively ignore the patent system, because unknowing infringement is seen as much less of an offence than knowing infringement - and therefore any evidence of actually searching the patent system is seen as proof of knowing infringement. Designing a system so that people don't use it doesn't say a lot about the system...

This could be phased in - make it apply to all new patents, and give a grace period where searches are encouraged but not required to be filed. Make it also apply so that any existing patent that is used in a patent suit can be queried by the defendant as "too obscure" or "not using the terms of art", and require the patent owner to rewrite them to the satisfaction of the court. That way a gradual clean-up of the current mess of incomprehensible patents that have deliberately been obfuscated can occur.

If the people who say patents are a necessary and useful thing are really serious in their intent, then they should welcome any effort to make more people actually use the patent system rather than try to avoid it.

Personally I'm against patents. Every justification of patents appeals to the myth of the "home inventor", but they're clearly not the beneficiaries of the current system as is. The truth is that far from it being necessary to encourage people to invent, you can't stop people inventing! They'll do it regardless of whether they're sitting on billion-dollar ideas or just a better left-handed cheese grater. They're inventing and improving and thinking of new ideas all the time. And there are plenty of examples of patents not stopping infringement, and plenty of examples of companies with lots of money just steamrollering the "home inventor" regardless of the validity of their patents. Most of the "poster children" for the "home inventor" myth are now running patent troll companies. Nothing in the patent system is necessary for people to invent, and its actual objectives do not meet with the current reality.

I love watching companies like Microsoft and Apple get hit with patent lawsuits, especially by patent trolls, because they have to sit there with a stupid grin on their face and still admit that the system that is screwing billions of dollars in damages out of them is the one they also support because of their belief that patents actually have value.

So introducing some actual utility into the patent system should be a good thing, yeah?


Sat 16th Mar, 2013

Recording video at LCA

A couple of people have asked me about the process of recording the talks at Linux Conference Australia, and it's worth publishing something about it so more people get a better idea of what goes on.

The basic process of recording each talk involves recording a video camera, a number of microphones, the video (and possibly audio) of the speaker's laptop, and possibly other video and audio sources. For keynotes we recorded three different cameras plus the speaker's laptop video. In 2013 in the Manning Clark theatres we were able to tie into ANU's own video projection system, which mixed together the audio from the speaker's lapel microphone, the wireless microphone and the lectern microphone, and the video from the speaker's laptop and the document scanner. Llewellyn Hall provided a mixed feed of the audio in the room.

Immediately the problems are: how do you digitise all these things, how do you get them together into one recording system, and how do you produce a final recording of all of these things together? The answer to this at present is DVswitch, a program which takes one or more audio and video feeds and acts as a live mixing console. The sources can be local to the machine or available on other machines on the network, and the DVswitch program itself acts as a source that can then be saved to disk or mixed elsewhere. DVswitch also allows some effects such as picture-in-picture and fades between sources. The aim is for the room editor to start the recording before the start of the talk and cut each recording after the talk finishes so that each file ends up containing an entire talk. It's always better to record too much and cut it out later rather than stop recording just before the applause or questions. The file path gives the room and time and date of recording.

The current system then feeds these final per-room recordings into a system called Veyepar. It uses the programme of the conference to match the time, date and room of each recording with the talk being given in the room at that time. A fairly simple editing system then allows multiple people to 'mark up' the video - choosing which recorded files form part of the talk, and optionally setting the start and/or end times of each segment (so that the video starts at the speaker's introduction, not at the minute of setup beforehand).
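The matching step can be pictured with a short sketch. This is not Veyepar's code; the schedule layout, field names and talk titles are all invented for illustration. A recording is matched to every talk in the same room whose scheduled time overlaps the recording, which copes with recordings that start before the talk does.

```python
from datetime import datetime


def talks_for_recording(schedule, room, rec_start, rec_end):
    """Return titles of talks in `room` whose time slot overlaps the recording."""
    return [
        talk["title"]
        for talk in schedule
        if talk["room"] == room
        and talk["start"] < rec_end      # talk begins before the recording ends
        and rec_start < talk["end"]      # recording begins before the talk ends
    ]


# Hypothetical programme entries.
schedule = [
    {"title": "Talk A", "room": "MCC1",
     "start": datetime(2013, 1, 30, 10, 40), "end": datetime(2013, 1, 30, 11, 25)},
    {"title": "Talk B", "room": "MCC1",
     "start": datetime(2013, 1, 30, 11, 35), "end": datetime(2013, 1, 30, 12, 20)},
    {"title": "Talk C", "room": "MCC2",
     "start": datetime(2013, 1, 30, 10, 40), "end": datetime(2013, 1, 30, 11, 25)},
]

# A recording started five minutes early and cut just after the applause.
print(talks_for_recording(schedule, "MCC1",
                          datetime(2013, 1, 30, 10, 35),
                          datetime(2013, 1, 30, 11, 27)))
# prints: ['Talk A']
```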

When ready, the talk is marked for encoding in Veyepar and a script then runs the necessary programs to assemble the talk title and credits and the files that form the entire video into one single entity and produce the desired output files. These are stored on the main server and uploaded via rsync to and are then mirrored or downloaded from there. Veyepar can also email the speakers, tweet the completion of video files, and do other things to announce their existence to the world.

There are a couple of hurdles in this process. Firstly, DVswitch only deals with raw DV files recorded via Firewire. These consume about 13 gigabytes per hour of video, per room - the whole of LCA's raw recorded video for a week comes to about 2.2 terabytes. These are recorded to the hard drive of the master machine in each room; from there they have to be rsync'ed to the main video server before any actual mark-up and processing in Veyepar can begin. It also means that previews must be generated of each raw file before it can be watched normally in Veyepar, a further slow-down to the process of speedily delivering raw video. We tried using a file sink on the main video server that talked to the master laptop's DVswitch program and saved its recordings directly onto the disk in real time, but despite having tested this process in November 2012 and it working perfectly, during the conference it tended to produce a new file each second or three even when the master laptop was recording single, hour-long files.

Most people these days are wary of "yak shaving" - starting a series of dependent side-tasks that become increasingly irrelevant to solving the main problem. We're also wary of spending a lot of time doing something by hand that can or should be automated. In any large endeavour it is important to strike a balance between these two behaviours - one must work out when to stop work and improve the system as a whole, and when to keep using the system as is because improving it would take too long or risk breaking things irrevocably. I fear in running the AV system at LCA I have tended toward the latter too much - partly because of the desire within the team (and myself) to make sure we got video from the conference at all, and partly because I sometimes prefer a known irritation to the unknown.

The other major hurdle is that Veyepar is not inherently set up for distributed processing. In order to have a second Veyepar machine processing video, one must duplicate the entire Veyepar environment (which is written in Django) and point both at the same database on the main server. Due to a variety of complications, this was not possible without stopping Veyepar and possibly having to rebuild its database from scratch, and I and the team lacked the experience with Veyepar to know how to easily set it up in this configuration. I didn't want to start to set up Veyepar on other machines and find myself shaving a yak and looking for a piece of glass to mount a piece of 1000-grit wet and dry sandpaper on to sharpen the razor correctly.

Instead, I wrote a separate system that produced batch files in a 'todo' directory. A script running on each 'slave' encoding machine periodically checked this directory for new scripts; when it found one it would move it to a 'wip' directory, run it, and move it and its dependent file into a 'done' directory when finished. If the processes in the script failed it would be moved into a 'failed' directory and could be resumed manually without having to be regenerated. A separate script (already supplied in Veyepar and modified by me) periodically checked Veyepar for talks that were set to "encode", wrote their encode script and set them to "review". Thus, as each talk was marked up and saved as ready to encode, it would automatically be fed into the pipeline. If a slave saw multiple scripts it would try to execute them all, but would check that each script file existed before trying to execute it in case another encoding machine had got to it first.

That system took me about a week of gradual improvements to refine. It also took me giving a talk at the CLUG programming SIG on parallelising work (and the tricks thereof) to realise that instead of each machine trying to allocate work to itself in parallel, it was much more efficient to make each slave script do one thing at a time and then run multiple slave scripts on each encoder to get more parallel processing, thus avoiding the explicit communication of a single work queue per machine. It relies on NFS correctly handling the timing of a file move, so that one slave script cannot execute a script that another has already moved into work in progress, but at this granularity of work the window of overlap is very small.
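The directory-based queue can be sketched in a few lines of Python. This is a reconstruction of the idea rather than the scripts I actually ran: the function names and the `runner` parameter are invented, and it ignores the dependent file that travelled with each script. The claim step is just a rename from 'todo' to 'wip'; if another worker got there first, the rename fails and we move on.

```python
import os
import subprocess
from pathlib import Path


def claim_next(base):
    """Claim one job by renaming it from todo/ to wip/; None if nothing is left."""
    for script in sorted((base / "todo").iterdir()):
        claimed = base / "wip" / script.name
        try:
            os.rename(script, claimed)   # atomic on a local POSIX file system
        except FileNotFoundError:
            continue                     # another worker claimed it first
        return claimed
    return None


def run_one(base, runner=("sh",)):
    """Run a single claimed job, then file it under done/ or failed/."""
    job = claim_next(base)
    if job is None:
        return None
    result = subprocess.run([*runner, str(job)])
    outcome = "done" if result.returncode == 0 else "failed"
    os.rename(job, base / outcome / job.name)
    return outcome
```

Each worker calls `run_one` in a loop with a sleep when it returns None; to get more parallelism you simply start more workers on the same machine.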

I admit that, really, I was unprepared for just how much could go wrong with the gear during the conference. I had actually prepared; I had used the same system to record a number of CLUG talks in months leading up to the conference; I'd used the system by myself at home; I'd set it up with others in the team and tested it out for a weekend; I've used similar recording equipment for many years. What I wasn't prepared for was that things that I'd previously tested and had found to work perfectly would break in unexpected ways:

The other main problem that galls me is that there are inconsistencies in the recordings that I could have fixed if I'd been aware of them at the time. Some rooms are very loud, others quite soft. Some rooms cut the recording at the start of the applause, so I had to join the next segment of recording on and cut it early to include the applause that the speaker deserved. There were a few recordings that we missed entirely for reasons I don't know. I was busy trying to sort out all the problems with the main server and I was immensely proud of and thankful for the team of Matt Franklin, Tomas Miljenovic, Leon Wright, Euan De Koch, Luke John and Jason Nicholls who got there early, left late, worked tirelessly, and leapt - literally - up to fix a problem when it was reported. Even with a time machine some of those problems would never be fixed - I consider it both rude and amateur to interrupt a speaker to tell them that we need them to start again due to some glitch in the recording process.

But the main lesson to me is that you can only practice setting it up, using it, packing it up and trying again with something different in order to find out all the problems and know how to avoid them. The 2014 team were there in the AV room and they'll know all of what we faced, but they may still find their own unique problems that arise as a result of their location and technology.

There's a lot of interest and effort being put in to improve what we have. Tim Ansell has started producing gstswitch, a Gstreamer-based program similar to DVswitch which can cope with modern, high-definition, compressed media. There's a lot of interest in the LCA 2014 team and in other people to produce a better video system that is better suited to distributed processing, distributed storage and cloud computing. I'm hoping to be involved in this process but my time is already split between many different priorities and I don't have the raw knowledge of the technologies to be able to easily lead or contribute greatly to such a process. All I can do is to contribute my knowledge of how this particular LCA worked, and what I would improve.


Tue 5th Mar, 2013

Code on the beach!

In 2011 I ran an event called CodeCave, which saw nine intrepid coders and three intrepid family go to Yarrangobilly Caves to spend a cool, wet winter weekend coding, eating, exploring in caves, coding, playing Werewolf, taking photos, coding, swimming (!), talking, flying planes and helicopters, and coding. Being an extrovert, I love those opportunities to see friends doing cool things with code, and my impression is everyone enjoyed the weekend.

I had a hiatus in 2012 for various reasons, but this year I've decided to run another similar event. But, as lovely as Yarrangobilly is and as comfortable as the Caves House was to stay in, it's a fair old five hour drive for people in Sydney, and even Canberrans have to spend the best part of two hours driving to get there. And Peter Miller, who runs the fabulous CodeCon (on which CodeCave was styled) every year, is going to be a lot better off near his health care and preferred hospital. Where to have such an event, then?

One idea that I'd toyed with was the Pittwater YHA: close to Sydney (where many of the attendees of CodeCave and CodeCon come from), still within a reasonable driving distance from Canberra (from where much of the remainder of the attendees hail), and close to Peter's base in Gosford. But there's no road up to it - you literally have to catch the ferry and walk 15 minutes to get there. While this suits the internet-free aesthetic of previous events, for Peter it's probably less practical. I discussed it on Google+ a couple of weeks ago without a firm, obvious answer (Peter is, obviously, reserving his say until he knows what his health will be like, which will probably be somewhere about two to three weeks out I imagine :-) ).

And then Tridge calls me up and says "as it happens, my family has a house up on the Pittwater". To me it sounds brilliant - a house all to ourselves, with several bedrooms, a good kitchen, and best of all on the roads and transport side of the bay; close to local shops, close to public transport, and still within a reasonable drive via ambulance to Gosford Hospital (or, who knows, a helicopter). Tridge was enthusiastic, I was overjoyed, and after a week or so to reify some of my calendar that far out, I picked from Friday 26th July to Sunday 28th July 2013.

So it's now called CodeBeach 2013, and it also has a snazzy Google Form to take bookings on. Please drop me an email if you've got any questions. We'd love to have you there!


Mon 13th Aug, 2012

The Library That Should Be

In my current job, I have to look at PHP. Often, I have to run command-line programs written in PHP. All of these programs have a typically PHP approach to command line processing - in other words, it's often a hack, it's done without any great consistency, and you have to do a lot of the hard work yourself. There are at least three command-line processing libraries in PHP, but I longed for Perl's wonderful Getopt::Long module because it improved on them in several important ways:

The main thing we want to eliminate by using modules is 'boilerplate', and the current offerings for command-line processing in PHP still require lots of extra code to process their results. So, because the current offerings were insufficient, I decided to write my own. The result is:


Along the way I added a couple of things. For a start, Console_GetoptLong recognises --option=value arguments, as well as -ovalue where 'o' is a single letter option and doesn't already match a synonym. It also allows combining single-letter options, like tar -tvfz instead of tar -t -v -f -z (if you've specified that it should do that - this is off by default). It gives you several ways of handling something starting with a dash that isn't a defined synonym - warn, die, ignore, or add it to the unprocessed arguments list.

One recent feature which hopefully will also reduce the amount of boilerplate code is what I call 'ordered unflagged' options. These are parameters that aren't signified by an option but by their position in the argument list. We use commands like this every day - mv and cp are examples. By specifying that '_1' is a synonym for an option, Console_GetoptLong will automatically pick the first remaining argument off the processed list and, if that parameter isn't already set, it will make that first argument the value of that parameter. So you can have a command that takes both '-i input_file' and 'input_file' style arguments, in the one parameter definition.
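To show the 'ordered unflagged' idea without pretending to reproduce Console_GetoptLong's real interface, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `parse` function, the shape of the `spec` dictionary, and the simplification that every option takes a value.

```python
def parse(argv, spec):
    """Parse options; `spec` maps a parameter name to its list of synonyms.

    A synonym of the form '_1', '_2', ... means "fall back to the Nth
    leftover argument if this parameter was not given as an option".
    """
    lookup = {syn: name for name, syns in spec.items() for syn in syns}
    values, rest = {}, []
    args = list(argv)
    while args:
        arg = args.pop(0)
        name, sep, value = arg.lstrip("-").partition("=")
        if arg.startswith("-") and name in lookup:
            if not sep:                      # "-i file" form: value follows
                if not args:
                    raise ValueError(f"option {arg} needs a value")
                value = args.pop(0)
            values[lookup[name]] = value
        else:
            rest.append(arg)                 # not an option: leave unprocessed
    position = 1
    while rest and f"_{position}" in lookup:
        name = lookup[f"_{position}"]
        if name not in values:               # only fill parameters not already set
            values[name] = rest.pop(0)
        position += 1
    return values, rest


spec = {"input": ["i", "input", "_1"]}
print(parse(["-i", "data.csv"], spec))   # prints: ({'input': 'data.csv'}, [])
print(parse(["data.csv"], spec))         # prints: ({'input': 'data.csv'}, [])
```

Both command lines end up with the same parameter set, from the one definition.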

Another way of hopefully reducing the amount of boilerplate is that it can automatically generate your help listing for you. The details are superfluous to this post, but the other convenience here is that your help text and your synonyms for the parameter are all kept in one place, which makes sure that if you add a new option it's fairly obvious how to add help text to it.

As always, I welcome any feedback on this. Patches are even better, of course, but suggestions, bug reports, or critiques are also gladly accepted.


Sat 23rd Jun, 2012

Forgotten projects

MythTV has recently updated to version 0.25. That has meant a small but important change to the parameters necessary for updating guide data. Chris Yeoh was ahead of the game and, knowing I used it, sent me a patch for the tv_grab_oztivo script. He noted that he'd tried to get it from the last known good source, and it wasn't answering.

Well, it sort of is. The normal URL doesn't work but Google reveals Interestingly, its version number is still at the recognised place - 1.36 - but all other parts of the site seem to be having problems with its database. And since it hasn't been updated since this time in 2010, I think there's a good possibility it may remain unchanged from now on.

A number of years ago I offered to host the script on my home Subversion repository, but got no response. So I've blown the dust off, updated it, added Chris's patch, and it's now up to date at Please feel free to check that out and send me patches if there are other improvements to make to it.


Mon 16th Apr, 2012

Swapping Shackles

Charles Stross talks here about why book publishers are afraid of Amazon and that the publishers have given Amazon control over them by insisting on DRM. The problem I see with this analysis is that, actually, the publishers have another option: publish their own 'free' app that can read their own DRM. Cut Amazon out of the equation by selling direct to the readers. There may be contractual reasons why the Big Six can't set up a web store to compete directly with Amazon, but I'm sure that's a matter that their lawyers could sort out. There might be a possible legal reason - I don't study this field and Charlie does, so he might correct me there, but I don't see anything in his comments on it and a few people suggest it.

The cited reason that the Big Six don't sell their own books directly seems to be that they just haven't set up their websites. Bad news for Amazon: that's easy with the budgets the big publishers have - Baen already do sell their own ebooks, for example (without DRM, too). More bad news for Amazon: generating more sales by referrals (the "other readers also bought" stuff) isn't a matter of customers or catalogue, it's just a matter of data. Start selling books and you've got that kind of referral. Each publisher has reams of back catalogue begging to be digitised and sold. They've got the catalogue, they've got the direct access to the readers, they've got the money to set up the web sites, and they've now got the motivation to avoid Amazon and sell direct to the reader. That to me spells disaster for Amazon.

But it also means disaster for us. Because you're going to have multiple different publishers' proprietary e-book readers - each the only one that publisher will bless with their own DRM. Each one will have its own little annoyances, peccadilloes and bugs. Some won't let you search. Some won't let you bookmark. Some will make navigation difficult. Some won't remember where you were up to in one book if you open up another. Others might lock up your reader, have back doors into your system, use ugly fonts, be slow, have no 'night' mode, or might invasively scan your device for other free books and move them into their own locked-down storage. And you won't be able to change, because none of your books will work in any other reader than the publisher's own. After all, why would they give another app writer access to their DRM if it means the reader might then go to a different publisher and buy books elsewhere?

We already have this situation. I have to use the Angus & Robertson reader (created by Kobo) for reading some of my eBooks. It doesn't allow me to bookmark places in the text, its library view has one mode (and it's icons, not titles), I can't search for text, and its page view is per chapter (e.g. '24 of 229') not through the entire book. In those ways and more it's inferior to the free FBReader that I read the rest of my books in - mostly from Project Gutenberg - but I have no choice; the only way to get the books from the store is through the app. These are books I paid money for and I'm restricted by what the software company that works for the publishing broker contracted by the retailer wants to implement. This is not a good thing.

What can we, the general public, do about this? Nothing, basically. Write to your government and they'll nod politely, file your name in the "wants to hear more about the arts" mailing list, and not be able to do a thing. Write to a publisher and they'll nod vacantly, file your name in the wastepaper bin, and get back to thinking how they can make more profit. Write to your favourite author and they'll nod politely, wring their hands, say something about how it's out of their control what their editor's manager's manager's manager decides, and be unable to do anything about it. Everyone else is out of the picture.

Occasionally someone suggests that authors could just deal directly with the readers. At this point, everyone else sneers - even fanfic writers look down on self-publishers. And, sadly, they're right - because (as Charlie points out) we do actually need editors, copy-readers and proofers to turn the mass of words an author emits into a really compelling story. (I personally can't imagine Charlie writing bad prose or forgetting a character's name, but I can imagine an editor saying "hey, if you replaced that minor character with this other less minor character in this reference, it'd make the story more interesting", and it's these things that are what we often really enjoy about a story.) I've written fiction, and I've had what I thought was elegantly clear writing shown to be the confusing mess of conflicting ideas and rubbish imagery that it was. Editors are needed in this equation, and by extension publishers, imprints, marketers, cover designers, etc.

Likewise, instead of running your own site, why not get a couple of authors together and share the costs of running a site? Then you get something like Smashwords or any of the other indie book publishers - and then you get common design standards, the requirement to not have a conflicting title with another book on the same site, etc. So either way you're going to end up with publishers. And small publishers tend to get bought up by larger publishers, and so forth; capitalism tends to produce this kind of structure to organisations.

So as far as I can see, it's going to get worse, and then it's going to get even worse than that. I don't think Amazon will win - if nothing else, because they're already looking suspiciously like a monopolist to the US Government (it's just that the publishers and Apple were stupid enough to look like they were being greedier than Amazon). But either way, the people that will control your reading experience have no interest in sharing with anyone else, no interest in giving you free access to the book you've paid to read (and no reason if they can give you a license, call it a book, charge what a book costs, and then screw you later on), and everyone else has no control over what they're going to do with an ebook in the future. If the publisher wants to revoke it, rewrite it, charge you again for it, stop you re-reading it, disallow you reading previous pages, only read it in the publisher's colours of lime green on pink, or whatever, we have absolutely no way of stopping this. The vast majority of people are already happy to shackle themselves to Amazon, to lock themselves into Apple, and tell themselves they're doing just fine.

Sorry to be cynical about this, but I think this is going to be one of those situations where the disruptive technologies just come too little and too late. Even J. K. Rowling putting her books online DRM-free isn't going to change things - most of the commentators I've read just point to this and say "oh well, the rest of us aren't that powerful, we'll just have to co-operate with (Amazon|the publisher I'm already dealing with)". Even the ray of hope that Cory Doctorow offers with his piece on Digital Lysenkoism - that the Humble E-Book Bundle has authors wanting to get their publishers off DRM because there's a new smash-hit to be had with the Humble Bundle phenomenon - is a drop of nectar in the ocean of tears; no publisher's really going to care about the Humble Bundle success if it means facing down the bogey-man of unfettered public copying of ebooks that they themselves have been warning everyone about for the last twenty years.

So publishers are definitely worrying about Amazon's monopsony. But the idea that that will cause them to give up DRM is wishful thinking. They've got too much commitment to preventing people copying their books, they don't have to give up DRM in order to cut Amazon out of the deal, and if DRM then locks readers into a reliance on the publishers it's a three-way win for them. And a total lose for us, but then capitalism has never been about giving the customer what they want.


Wed 1st Feb, 2012

Going from zero

A friend of mine and I were discussing cars the other day. He said that he thought the invention of the electric motor was a curse on cars, because it meant you wouldn't have a gearbox to control which gear you were in. A suitable electric motor has enough power to drive the car from zero to a comfortable top speed (110km/hr) at a reasonable acceleration using a fixed gear ratio - the car stays in (in this case third) gear and you drive it around like that. He maintained, however, that you needed to know which gear you were in, and to change gears, because otherwise you could find yourself using a gear that you hadn't chosen.

I argued that, in fact, having to select a gear meant that drivers both new and experienced would occasionally miss a gear change and put the gearbox into neutral by mistake, causing grinding of gears and possible crashes as the car was now out of control. He claimed to have heard of a clever device that would sit over your gearbox and tell you when you weren't in gear, but you couldn't use the car like that all the time because it made the car too slow. So you tested the car with this gearbox-watcher, then once you knew that the car itself wouldn't normally miss a gear you just had to blame the driver if the car blew up, crashed, or had other problems. But he was absolutely consistent in attitude towards electric motors: you lost any chance to find out that you weren't in the right gear, and therefore the whole invention could be written off as basically misguided.

Now, clever readers will have worked out that at this point my conversation was not real, and was in fact by way of an analogy (from the strain on the examples, for one). The friend was real - Rusty Russell - but instead of electric motors we were discussing the Go programming language and instead of gearboxes we were discussing the state of variables.

In Go, all variables are defined as containing zero unless initialised otherwise. In C, a variable can be declared but undefined - the language standard AFAIK does not specify the state of a variable that is declared but not initialised. From the C perspective, there are several reasons you might not want to automatically pre-initialise a variable when you define it - it's about to be set from some other structure, for example - and pre-initialising it is a waste of time. And being able to detect when a variable has been used without knowing what its state is - using valgrind, for example - means you can detect subtle programming errors that can have hard-to-find consequences when the variable's meaning or initialisation is changed later on. If you can't know whether the programmer is using zero because that's what they really wanted or because it just happened to be the default and they didn't think about it, then how do you know which usage is correct?

From the Go perspective, in my opinion, these arguments are a kludgy way of seeing a bug as a feature. Optimising compilers can easily detect when a variable will be set twice without any intervening examination of state, and simply remove the first initialisation - so the 'waste of time' argument is a non-issue. Likewise, any self-respecting static analysis tool can determine if a variable is tested before it's explicitly defined, and I can think of a couple of heuristics for determining when this usage isn't intended.

And one of the most common errors in C is use of undefined variables; this happens to new and experienced programmers alike, and those subtle programming problems happen far more often in real-world code as it evolves over time - it is still rare for people to run valgrind over their code every time before they commit it to the project. It's far more useful to eliminate this entire category of bugs once and for all. As far as I can see, you lose nothing and you gain a lot more security.

To me, the arguments against a default value are a kind of lesser Stockholm Syndrome. C programmers learn from long experience to do things the 'right way', including making sure you initialise your variables explicitly before you use them, because of all the bugs - from brutally obvious to deviously subtle - that are caused by doing things in any other way. Tools like valgrind only work around this problem indirectly, after the fact. People even come to love them - like the people who love being deafened by the sound of growling, blaring petrol engines and associate the feeling of power with that cacophony. They mock those new silent electric motors because they don't have the same warts and the same pain-inducing behaviour as the old petrol engine.

I'm sure C has many good things to recommend it. But I don't think lack of default initialisation is one.


Critical Thinking

In the inevitable rant-fest that followed the LWN story on the proposal to have /lib and /bin point to /usr/lib and /usr/bin respectively (short story), I observe with wry amusement the vocal people who say "Look at PulseAudio - it's awful, I have to fight against it all the time, that's why we shouldn't do this". The strange, sad thing about these people is that they happily ignore all those people (like me) for whom PulseAudio just works. There's some little conceited part of their brain that says "I must be the only person that's right and everyone else has got it wrong." It's childish, really.

And in my experience, those people often make unrealistic demands on new software, or misuse it - consciously or unconsciously, and with or without learning about it. These people are semi-consciously determined to prove that the new thing is wrong, and everything they do then becomes in some way critical of it. Any success is overlooked as "because I knew what to do", every failure is pounced on as proof that "the thing doesn't work". I've seen this with new hardware, new software, new cars, new clothes, new houses, accommodation, etc. You can see it in the fact that there's almost no correlation between people who complain about wind generator noise and the actual noise levels measured at their property. Human beings all have a natural inclination to believe that they are right and everything else is wrong, and some of us fight past that to be rational and fair.

This is why I didn't get Rusty's post on the topic. It's either completely and brilliantly ironic, or (frankly) misguided. His good reasons are all factual; his 'bad' reasons are all ad-hominem attacks on a person. I'd understand if it was e.g. Microsoft he was criticising - e.g. "I don't trust Microsoft submitting a driver to the kernel; OT1H it's OK code, OTOH it's Microsoft and I don't trust their motives" - because Microsoft has proven so often that their larger motives are anti-competition even if their individual engineers and programmers mean well. But dmesg, PulseAudio, and systemd have all been (IMO) well thought out solutions to clearly defined problems. systemd, for example, succeeds because it uses methods that are simple, already in use and solve the problem naturally. PulseAudio does not pretend to solve the same problems as JACK. I agree that Lennart can be irritating sometimes, but I read an article once by someone clever that pointed out that you don't have to like the person in order to use their code...


Mon 19th Dec, 2011

PHP Getopt::Long

In my current work I have to occasionally work with PHP code. I don't really like PHP, for a variety of otiose reasons. But one of the things that surprised me was that it didn't have an equivalent to Perl's 'Getopt::Long' module. There are a couple of other modules that are in PHP's PEAR package repository which attempt to handle more than PHP's built-in getopt function, but all of these lack a couple of fundamental features:

  1. I want to be able to pass a single description - e.g. 'verbose|v' - and have the function recognise both as synonyms for the same setting.
  2. I want to be able to pass a variable reference and have that updated directly if the associated command line parameter is supplied.
  3. I want to have it remove all the processed arguments off the command line so that all that is left is the array of things that weren't parameters or their arguments.
  4. I want a single, simple call, rather than calling object methods for each separate parameter.
(To be clear: some of the PEAR modules provide some of these. But all of them lack goal 2, most lack goal 3, and while some are able to achieve goal 1 it's only by lots of extra code or option specification.)

So I wrote one.

The result is available from my nascent PHP Subversion library at:

It's released under version 3 of the GPL. It also comes with a simple test framework (written, naturally, in a clearly superior language: Perl).

This is still a work in progress, and there are a number of features I want to add to it - chief amongst them packaging it for use in PEAR. I'm not a PHP hacker, and it still astonishes me that PHP programmers have been content to use the mish-mash of different half-concocted options for command line processing when something clearly better exists - and that many of the PHP programs I have to work with don't use any of those but write their own minimal, failure-prone and ugly command line processing from scratch.

I'd love to hear from people with patches, suggestions or comments. If you want write access to the repository, let me know as well.


Tue 29th Nov, 2011

LED strip lighting for the deck

Yesterday we got the electrician to install a switch in the dining room which would turn on a power point in the ceiling. This powers a 100W 12V DC power supply which in turn powers twenty metres of LED strip lighting. The parts list:

The results are pretty much what I'd hoped for:

Trying to get a photo that shows what the eye sees of the strip when lit is hard - the camera just thinks it's way too bright. This is the closest I could get with our camera:

With the eye you can see the individual LEDs and they're bright but not so bright as to be difficult to look at. So the strip doesn't make the deck feel too bright or oversaturated. The light is warm without being monochrome or too intense. And the fact that it's a strip means that you don't get shadows or bits of the deck that are dark - the whole deck feels quite evenly lit, even at the corners.

100 watts feels like a lot, but in comparison to even one 18W fluorescent globe per space between beams (12 spaces, so 216W in total) it is still much more efficient on power. That arrangement of fluorescent bulbs would also mean shadows, single point sources, and having to put an extra beam in the middle of the deck. And let's not even consider spot lights. No, this is a really good layout.


Thu 17th Nov, 2011

Adding Cans to C

I've been meaning to copy some of my personal C libraries to CCAN, Rusty Russell's C Code Archive. It's not yet quite as comprehensive as he would like, I suspect, but it's certainly a good project. And I think, bold as this may be, that I have something to offer it even if I'm not a full-time C programmer.

The thing that's scared me off is the whole "meeting someone else's standards" thing. So after Rusty's talk at OSDC this year, and finding out that 'ccanlint' can prompt you with what you need to do to make a good package, I decided to give it a go. And after I started having a few minor problems understanding exactly what I needed to do to get it working, I decided to write it down here for other people.

  1. Check out the Git repository:
    git clone git:// ; cd ccan
  2. Make everything:
  3. Make the thing that isn't included in everything that you really need:
    make tools/ccanlint/ccanlint
  4. Create a directory for my work and put the already written stuff in there:
    mkdir ccan/cfile; cd ccan/cfile; cp ~/coding/pwlib-c/cfile/cfile* .
  5. Save yourself some typing:
    export CCT=../../tools; export CCL=$CCT/ccanlint/ccanlint
  6. Generate a standard configuration file for the project:
    $CCT/configurator/configurator > config.h
  7. Check what you've got so far:
    ccanlint will ask you if you want to create a _info file. This is the file which declares the 'metadata' about a ccan module. It's a C program. Let it generate it.
  8. Check what you've got so far:
    Keep repeating this step, fixing what ccanlint tells you is wrong, until you get a module in some form.
  9. Submit the module for inclusion in CCAN:
    haven't done this yet.

Last updated: | path: tech / c | permanent link to this entry

Sat 29th Oct, 2011

Who are you trusting

The "Secure Boot" proposal from Microsoft - to turn on digital signatures on new UEFI-enabled motherboards so that only signed operating systems get booted, allowing them to get motherboard manufacturers to lock Linux out under the guise of "preventing malware" - is worrying enough as it is. Several large Linux companies - Canonical and Red Hat amongst them - have already been working on white papers, and an expert in the field has proposed IMO a better solution to the problem. But really, if you think about it, Microsoft should be working to prevent the whole thing working at all.

Why? Very simple. Just think of the number of state-level attacks on software and Internet infrastructure in recent years. "Hackers" getting fraudulent SSL certificates issued for * and other sites. People requesting Mozilla remove CNNIC from the certificate authority list because of the Chinese government's similar faking of SSL certificates. Malware created by the German government for spying on people. British companies selling malware to the Egyptian government. The list goes on.

One can easily imagine any government in the world telling motherboard manufacturers that they need to install the government's own public keys in order to import motherboards into the country. It's obvious in the case of countries like Iran, Syria, and Jordan, and it's no stretch to imagine the US, Australian or any other 'Western' government doing it under the guise of "protecting our citizens". After all, we do want the government to snoop on those evil child molesters, don't we? Or at least, the people the government tells us are child molesters. Or, at least, the people who turn out to have child abuse material on their computers after the government has done their investigation. They wouldn't use those powers to spy on ordinary citizens, right? Right?

Wrong. For state-level actors, it's not about the ordinary citizens. It's about protecting the status quo. It's about protecting their access to information and protecting their powers. The idea that someone can lock government spyware out of their computer has an easy solution - make sure that the computer itself will always install the spyware. And they have the power to go to motherboard manufacturers and get these keys installed. It's a no-brainer for them, really.

I also have no doubt that secure booting to a secure operating system will do little to stop real malware. There are always flaws to be exploited in something as large and kludgy as Microsoft's software. The phenomena Microsoft is allegedly trying to protect against - rootkits that start at boot time - are a relatively small portion of the malware spectrum. And if you're going to let an unsigned binary run - the alternative being to lock all but the large players out of the Windows software market - then malware is already exploiting the user's trust in the system and their lack of knowledge about what is good software and what isn't. "Your PC is already infected" and all that; it's trojan horses all the way down.

I don't think Microsoft is going to care that state-level players can exploit the system they're proposing. It's not like they don't already give the source code to the Chinese government and so forth. But I think the rest of the PC using world has a right to be very worried about a system that will tell you that it's running signed software without you being able to choose which signatories you trust. And choice is never going to be on the agenda with Microsoft.


Thu 6th Oct, 2011


This is half a bleg and half an idea.

Trying to get useful information out of log files that are being continually written is kind of frustrating. The usual Linux method is to tail -f the file and then apply a bunch of grep, cut, sed or awk filters to the pipeline. This is clumsy if you don't know what you're dealing with or looking for yet, and there are a bunch of other limitations with this approach. So my idea is to create an application with these features:

I'm thinking it should be named 'wag', because it's the wag tailing the dog and because of a similarity to watch.

If such a thing even vaguely exists, please email me. Otherwise I'll have to think about learning how to write inotify-based ncurses-driven applications in my copious free time.


Wed 27th Jul, 2011

Every car should have one

What I want is a small device that has a fuel flow meter suitable for use in a car or motorbike, and a digital radio (e.g. bluetooth) to connect to it. It can run off the car's 12V power system, and simply reports the amount of fuel used every second in serial form.

Then I want a piece of software, say on my android phone, which reads information from the fuel meter and GPS coordinates. It then records how much fuel is used and where the car was at the end of that second. This can be used simply to work out how much fuel is being used, or a kilometres per litre or miles per gallon figure based on the current distance travelled. The software can then show you your average fuel consumption and km/l 'score' per trip.
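As a rough sketch of the calculation described here (in Python, with invented names and an invented sample format, since no such device exists), each per-second sample could be a tuple of litres used plus the GPS position at the end of that second. Distance is summed between consecutive positions with the haversine formula, using a mean Earth radius of 6371 km as an approximation.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two GPS positions."""
    radius_km = 6371.0  # mean Earth radius, an approximation
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def km_per_litre(samples):
    """samples: list of (litres_used_this_second, latitude, longitude).

    Returns kilometres travelled per litre of fuel, or None if no fuel
    was recorded. (Hypothetical sample format for illustration.)
    """
    litres = sum(s[0] for s in samples)
    if litres == 0:
        return None
    distance = sum(
        haversine_km(a[1], a[2], b[1], b[2])
        for a, b in zip(samples, samples[1:])
    )
    return distance / litres
```

Multiplying the result by roughly 2.35 gives miles per US gallon, if that is the figure you prefer.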

But what constitutes a trip? Well, the software can work that out fairly easily - the engine is consuming fuel constantly while it's on, and people usually start it before the start of the trip and turn it off at the end of the trip. A fairly simple check of start and end points could then group your trips by their purpose - going to work, going shopping, etc - and report your average and best score for each journey of the same purpose. You could then also compare fuel efficiency when going at different times and using different connecting roads to determine, on average, which paths and times were more efficient uses of your petrol.
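The trip detection described above could be sketched like this in Python. The function name, the (timestamp, litres) sample format and the 60-second gap threshold are all assumptions made for illustration: a trip is taken to be a run of seconds in which fuel is flowing, and a long enough pause in fuel flow starts a new trip.

```python
def split_trips(samples, max_gap=60):
    """Group per-second fuel samples into trips.

    samples: list of (timestamp_seconds, litres_used), sorted by time
    max_gap: seconds without fuel flow that ends a trip (assumed value)
    (Hypothetical helper for illustration only.)
    """
    trips = []
    current = []
    last_running = None
    for timestamp, litres in samples:
        if litres <= 0:
            continue  # engine off, or no fuel flowing this second
        if current and timestamp - last_running > max_gap:
            # The engine was off for long enough: close the previous trip
            trips.append(current)
            current = []
        current.append((timestamp, litres))
        last_running = timestamp
    if current:
        trips.append(current)
    return trips
```

Grouping trips by purpose would then be a matter of comparing the start and end positions of each trip, which this sketch leaves out.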

But journeys often start the same way - if you live in a cul-de-sac, you always drive to the end of it to get any further, for example. So looking at the paths can then break those into segments that are common, and you can be scored on your individual performance per segment. This also means that if you drop into the shops on your way to work then this counts for two or more separate segments rather than one. The algorithm could both find short segments - roads you always went along and never deviated from - and long segments that you occasionally deviated from but mostly drove in one go.

For many journeys there's more than one way to get there, and after a period of time the software can tell you which route was the most optimal and even possibly when to drive it to get the best efficiency. This would have saved a friend of mine, who had to suffer her father going many different ways between two points on a common journey in Brisbane to determine, over time and in varying traffic, what the most efficient way was. Of course, it can tell you what your best time was and that may be a different route from the most fuel-efficient path.

And then it can start to challenge you. You want to drive to work? How about doing it using less fuel than your best effort so far? It may even be able to tell you specific segments where you can improve - where your fuel efficiency varies widely, or where it is greater than your average over similar terrain. Once you get something that can actually tell you how to improve your fuel efficiency, I think that'll make a lasting difference to how much money people spend on fuel. Classic positive feedback technique.

Finally, a device which would actually offer to provably improve your fuel efficiency.

Sadly, it joins every other device out there being touted by snake oil salesmen, because - like them - it doesn't exist.


Fri 27th May, 2011

LWN Professional Supporter

Today I subscribed to Linux Weekly News for another 11 months at the new Professional Supporter level. Rusty asked for a higher level of subscription to be available a while back, and Jon (with his characteristic gentle, wry humour) implemented it recently.

It took me a little time to actually raise my subscription level - I had spent a bit of money on bike parts and other stuff and, though I could still have afforded it, just didn't feel like watching all my money escape in one go. (I'm still recovering from my somewhat exuberant donation to the flood relief funding at LCA 2011). But finally the stars aligned, the checksums matched and I paid for the shiny stars on my name.

Why? For two reasons. One, as Rusty says, is that Jon and the team at LWN are doing huge, exemplary, and difficult work condensing all the news that's important in the FOSS gamut into one easy-to-read site. If I had to buy a magazine for that I'd be paying at least half that. The second reason is congruent to my decision to support webcomic artists: that I love supporting anyone who is getting to do the thing they love. I love working with computers and I'm lucky enough to have found companies that employ me for my skills. If you want to be a journalist who writes about FOSS, it's much more difficult to find a company that gives you the freedom you need to write about the things you love. Being able to support them in that is a good thing.

Plus, I can write it off as an educational expense on my tax, and I get Jon owing me a beer rather than me owing him one :-). So it's good all round.

I'm not calling it maniacal. It's a perfectly sensible judgement in my opinion. There are lots of people who read LWN who are paid well and could easily afford to support them at that level. Hearing Jon's talk about running LWN for thirteen years was an insight into the trials and obstacles confronting anyone that wants to do as LWN has done. Given that there are well-known but not particularly well-respected IT news websites out there that also send their reporters to LCA - usually, it would seem, to stir up trouble - having LWN around to provide an intelligent, reasonably even-handed report on what goes on in the FOSS community is a great, unsung boon to us all.

Jon's philosophy in setting the prices for subscriptions - and allowing mostly unrestricted access for free - has been that Linux users like things to be free. I would argue that they like their software to be both zero-cost and unencumbered, but I don't think that necessarily extends to them expecting a free ride from other people. I'm sure there are lots of people that can afford to support LWN, even in a small way, for the service it provides. It may not be at the Professional Supporter level, but having this option gives people like myself the chance to support it at an appropriate level for our income.


Mon 23rd May, 2011

GNOME 3 improved

As part of starting at my new place of employment, I've installed the beta of Fedora 15 (because the real thing comes out tomorrow, curse it). With it comes GNOME 3, the latest update of the GNOME desktop environment.

So far my experience is pretty good. Yes, it's different, but no, it's not so different that I can't learn how to use it. It's a case of not thinking "why can't I do that the old way" but "I wonder what the new way is", and for the most part it's not that painful. Of course, there were a few things that I did want to make work the same as my previous GNOME setup and the main one was focus following the mouse pointer. After a bit of research on the net, I found the necessary command and will post it here for reference:

gconftool-2 -s /apps/metacity/general/focus_mode -t string mouse

(I'll spare my readers my cunning arguments about why focus following the mouse is the obvious, natural and optimal system for interfaces with an explicit focus indicator such as a mouse pointer. Suffice to say, just use it.)

Another thing that's changed is that Alt-TAB now groups all windows by application - all Firefox windows are treated as one group for the purpose of tabbing around, for example. When one application has multiple windows open, a little down-arrow appears at the bottom of its icon and, by mousing over it, you can then select the sub-window you require. This, however, is inconvenient if, like me, you use the keyboard a fair bit - moving to use the mouse takes time and effort. I discovered, with a bit of experimentation, that you can use the arrow keys for this as well - press Alt-TAB and use either TAB and Shift-TAB or left and right to navigate; when an application with sub-windows is selected, use down to show a list of its sub-windows and left and right to select from there.

Maybe there are other ways of using this; that's what worked for me. But it shows that a bit of experimentation can take less time than grumbling about how everything's changed and it no longer matches what you see.

And I think it's going to be a supreme bit of irony that there'll be all these Linux experts complaining about how GNOME has broken everything and they want their old GNOME look and feel back - the same people who keep on looking down on their friends for not wanting to move from Windows or OS X to GNOME because "it's a different look and feel". Take it on the chin, people.

Last updated: | path: tech | permanent link to this entry

Sun 8th May, 2011

A pack of improvements

I (finally) decided to make three changes to my battery pack calculator. One is to lift the maximum weight of the pack up to 500kg - this allows more realistic pack sizes for large vehicles such as utes and vans. The second is a check box to only show available cells - cells that have at least one online store selling them. This cuts out most of the unobtainable cells - cells (like Kokam) that are price-on-application are usually fairly expensive anyway and this at least gives you a ball-park figure for what the more common cells can do for you.

The final one is to allow you to sort the packs by some of the fields displayed. The critical ones are the cell type, pack weight, pack amp-hour rating, price, watt-hours stored and maximum amp delivery. After a bit of help from Khisanth in ##javascript to get the scripting working - basically, don't have a submit button named (or with an ID of) 'submit' or you override the form's natural submit() function - it works now.

I hope people find these improvements useful - I certainly do!

Last updated: | path: tech / web | permanent link to this entry

Fri 29th Apr, 2011

Edgar Wallace - The Green Rust

I've been reading ePub books on my Samsung Galaxy S almost since I got it, and it's been a Good Thing.

The screen is large enough to read text fairly quickly - a speed reader could easily scan each line in portrait mode without moving their eyes horizontally at all. Yet it's not too large to be uncomfortable to hold. The Aldiko reader supplied is adequate but I prefer FBReader - with a nice serif font (although there's little evidence at all for it being more legible, it does render the italics of some of my books correctly - something I'd originally thought had been an error in the book encoding or the reader), its 'night-time' white-on-black mode, and its better organisation of my library.

Mostly the books I've picked up have been from Project Gutenberg, which my readers probably know well. As my views on 'strong' copyright are also fairly well known it would be otiose to relate them, but I see Project Gutenberg as proof that the doomsday scenarios the 'strong' copyright lobby paints of the time after the copyright in a work expires - that either everyone will be copying it like crazy and making money out of it, or that it will mean obscurity and lack of recognition for the author - are baseless. Project Gutenberg makes these works available for free and by doing so the authors' works are preserved and gain value by their availability, but without any one company profiting from the process. In fact, the 'strong' copyright arguments basically devolve to "but we won't be making any money from it", even though often they aren't anyway.

(Message to Disney: there are children growing up now who have no idea what Mickey Mouse is. Deal with it.)

Anyway, one of my finds was The Green Rust by Edgar Wallace, based on having found that two early Agatha Christie novels - The Mysterious Affair at Styles and The Secret Adversary - were available, based on having read through A Study in Scarlet and looked up the subject "Detective and mystery stories". I'd link you to that search but for several badly thought-out reasons Project Gutenberg have decided that looking up a subject or author should have no well-formed, static URL.

Now, Edgar Wallace was a name I recognised from a different context. Many years ago Severed Heads released a track called "Dead Eyes Opened", originally released in 1984 but with a new dance remix in 1994 that became their only really mainstream hit. It borrows from Edgar Lustgarten's audio recording of the crime that I later found out was incongruously called The Crumbles Murders. In it, he says:

Then — I owe a debt here to Edgar Wallace, who edited the transcript of the Mahon trial — …

Yes, it's that Edgar Wallace. Not only a famous court reporter but also an author of many fiction novels, mostly detective and mystery stories, and the part inventor of King Kong. And I have to say that his writing is quite enjoyable - not as old as Doyle's and with a touch of genre-savvy, and with a bit less reveal-everything-at-the-end compared to Christie. It's strange how these little synchronicities in life come about.

Last updated: | path: tech | permanent link to this entry

Thu 7th Apr, 2011

The short term fallacy

There are a couple of things that I'm butting my head up against these days that all seem to be aspects of the same general problem, which I mentally label the 'short term fallacy'. This fallacy generally states that there's no point planning for something to survive a long time because if there are legacy problems they can be solved simply by starting again. Examples of this are:

Every time one of these 'short term' solutions is proposed, no matter how reasonable the assumption is that "no-one could ever need to do $long_term_activity for more than $time_period", it seems to be proved wrong in the long run. Then, inevitably, there's this long, gradually worsening process of fixes, workarounds, kludges and outright loss of service. Straight out of classic game theory, the cost of each workaround is compared against the cost of redoing the whole thing and found to be less, even as the total cost of all workarounds exceeds the cost of the correct long-term solution.

Yes, these problems are hard. Yes, limits have to be set - processors will use a certain number of bits for storing a register and so forth. Yes, sometimes it's impossible to predict the things that will change in your system - where your assumptions will be invalidated. But we exist in a world that moves on, changing constantly, and we must acknowledge that there is no way that the system we start with will be the same as the system we end up using. The only thing that's worse than building in limitations is to insert them in such a way that there is no way to upgrade or cope with change. Limitations exist, but preventing change is just stupid.

And the real annoyance here is that there are plenty of examples of other, equivalent systems coping with change perfectly. LVM can move the contents of one disk to another without the user even noticing (let alone having to stop the entire system). Tridge and Rusty have demonstrated several methods of replacing an old daemon with a newer version without even dropping a single packet - even if the old program wasn't designed for it in the first place. File systems that insist that it's impossible to shrink are shown up by file systems with similar performance that, again, can do so without even blocking a single IO. You don't even have to reboot for a kernel upgrade if you're using ksplice (thanks to Russell Coker for reminding me).

It's possible to do; sometimes it's even elegant. I can accept that some things will have a tradeoff - I don't expect the performance of a file system that's being defragmented to be the same as if it was under no extra load. But simply saying "we can't shrink your filesystem" is begging the question "why not", and the answer will reveal where you limited your design. The cost, in the long run, will always be higher to support a legacy system than to future-proof yourself.

Last updated: | path: tech / ideas | permanent link to this entry

Sat 12th Mar, 2011

CodeCave 2011 Update 1

Just a quick update to my readers to say that CodeCave 2011 is definitely going ahead and will be on the 3rd to 5th of June. Cost will be about $80 per person for the accommodation. I have about three or four more places, depending on a few factors. I haven't worked out a cost or menu for the meals for the weekend, but it will probably be fairly reasonable - $40 for the weekend is the figure I'm aiming for. Please email me if you'd like to come along!

Last updated: | path: tech / ideas | permanent link to this entry

Tue 15th Feb, 2011

Codecave Init

After talking with Peter Miller at CodeCon, and other people over the last year or two, I've decided to put together a similar style of event. For a weekend, we all go off into the bush, far from the internet and other quotidian distractions, and write code, eat and drink well, and share great ideas. What's the difference? It's in a cave.

Well, not literally. The location that I'm aiming for is the Yarrangobilly Caves, an area of limestone caves and other scenic delights about seventy kilometres due south-west of Canberra, although it's about two hours by car because the direct route goes over the Brindabella mountain range. As well as the caves, there are bushwalks and (perhaps most importantly) a thermal spring pool to soak in after a hard day's slaving over the laptop. We would be staying in the Yarrangobilly Caves House, a historic homestead of the region offering bedrooms for up to sixteen people, kitchen, dining room, lounge, verandahs, and (also importantly) power.

Interstate visitors who didn't want to drive all the way could be picked up from Canberra airport or bus stations and ferried to Yarrangobilly on Friday evening, coming back to Canberra on Sunday in time for flights home or other onward travel. For those that wanted it, interstate or local, I would do catering for the whole weekend at a fixed price and with a roster for jobs. If we had the whole place booked optimally it would be about $60 per person for the weekend, the complication being that the rooms are not all single - there are some bunk beds and some doubles. They also only book by entire wings (9 or 7 people), so the fewer people the more it would cost per person, in certain ratios depending on requirements. At this stage the earliest we could get a booking for a weekend is late May or early June.

If you're interested in coming to this, please drop me an email. I really need to get firm bookings, preferably by the end of February, to have any chance of getting the accommodation booked and the pricing finalised. I wouldn't run the event if the cost was more than $100 per person for the accommodation, which means that it won't run with fewer than five people. There's also no way to accommodate more than sixteen people, so bookings would be limited in date order. Please also email me if you've got suggestions, because a lot of the planning is flexible at this stage.

This will also be posted to the CLUG list and the Linux Australia email list.

(P.S. Sorry if the links don't work correctly, they seem to require session tokens which stuff direct linking up.)

Last updated: | path: tech / ideas | permanent link to this entry

Thu 27th Jan, 2011

Saying sorry and moving on

Today's keynote speech at LCA is from Eric Allman, the person who wrote 'sendmail', the main mail transfer agent that moves mail on the internet. There are many things in sendmail that have caused problems in the past, and its configuration syntax is known to cause brave men to wet themselves in fear. So I wanted to see how, or if, Eric would address these things in his talk.

Not only did he do this but he acknowledged the mistakes he had made. He talked about what he would do differently. He talked about the decisions he'd made that were forced by machine limitations, complete lack of standardisation of email address formats, and various other constraints. It's easy in hindsight to criticise some of these decisions but when you're starting out on a new system the horizon is wide open and you don't realise and sometimes can't even determine the scope of the consequences of your decisions.

Interestingly he pointed out the Postel principle - "be strict in what you emit and liberal in what you receive" - as perhaps one of these mistakes. In his defence he said that there were many often completely incompatible email address formats and exchange methods, and professors get incredibly irate when they find out that their grant application wasn't received. But it allows badly-written and incompletely-compatible systems to live and thrive, and I think we've seen this with HTML and other things - by Netscape allowing badly-written HTML to be rendered vaguely correctly it allowed IE to prosper.

But the thing I really appreciate is someone who will say "yeah, in hindsight that was a bad move". We all have reasons at the time, but there are a lot of bad decisions that are perpetuated - especially by large companies - because no-one is willing to admit that they made a mistake. It takes a lot of guts to stand up in front of eight hundred people who've all at one time or another struggled with sendmail and say "Yeah, M4, I don't think that was such a great idea". And yet it means we can now say "OK, well, let's get on with it anyway" and stop trying to blame sendmail for all our email problems.

His "takeaway" ideas were also really great and I think it validates Eric's experience as a programmer and system architect. The thing I would amplify from these was documentation: if you don't have documentation of your project, you'll never get any users.

Last updated: | path: tech / lca | permanent link to this entry

Mon 24th Jan, 2011

Hardware hacking conferences

As with LCA and OSDC, I feel the time has come for a hardware hacking conference.

The Arduino miniconference at LCA has (pardon the pun) taken off in popularity since its first session last year. Several cities in Australia now have hardware hacker groups and even hacker spaces. There are several other activities such as BarCamps where everyone gets to participate in presenting information and run things. Andy Gelme informs me that there are several companies around the world that are making open hardware and firmware that are earning over a million dollars, which shows that the field has a large following.

I've talked with several hardware hackers about this and one common idea that they raise is that there would probably be a lot more hands-on tutorials. I don't know if this is true - you'd think at something like OSDC there would be heaps of tutorials and lots of code being written, but in fact I think lots of people go just to find out what other people have been doing and to learn from that, and that's common to both hardware and software interests. But I certainly think that having more concrete goals - actually producing things, whether software or hardware, is a great thing to aim for.

Personally, I think there'll have to be a hardware hacking conference before the end of the year - there's just too much interest to contain it!

Last updated: | path: tech | permanent link to this entry

LCA 2013 bid process opens - Canberra at the ready!

For the last several months, a small group of people in Canberra including myself have been preparing a bid for LCA 2013. This is not just to give us more time to make the conference the most awesome, mind-pummelling LCA you've ever been to. No - 2013 is also the centenary of the founding of Canberra as the nation's capital. It's a very significant year for us and we'd all be thrilled if we could show the attendees of LCA our great city and Canberrans the great work the FOSS community does to improve everyone's lives.

So we're really stoked that the bidding process is going to be opened early, and I think it'll lead to a really interesting competition that will result, whoever wins, in the best LCA ever!

If you're interested in being a part of the team putting this event together, email me!

Last updated: | path: tech / lca | permanent link to this entry

Fri 17th Dec, 2010

Re-encoding Bridge

In re-reading my article on encoding bridge hands I realised that I made a stupid error in the calculation of how many bits would be needed to encode a complete hand including the order of the cards in each hand. 52 bits would encode not the choice of one in fifty-two cards, but 2^52 possible choices! The real number of bits required is actually 6*20 (to encode the cards chosen from 52 down to 33 remaining with six bits each) + 5*16 + 4*8 + 3*4 + 2*2 + 1 bits, coming to a total of 249 bits. With Base-64 encoding this comes to 42 characters; with Base-85 it comes to 39. So it's not quite as ludicrous as my original calculation, but it's still more than twice as long as the more optimal coding.
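As a cross-check on that tally, here is a minimal sketch in Python (my choice of language; the function names are made up for illustration). It assumes each card is picked from the n cards still undealt and so costs the smallest whole number of bits that can distinguish n alternatives, and it compares the total against log2(52!), which is the floor for any encoding that remembers the full order of the deck.

```python
import math


def bits_for_choice(n: int) -> int:
    """Smallest number of bits that can distinguish n alternatives."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, with no float rounding.
    return (n - 1).bit_length()


def ordered_deal_bits(cards: int = 52) -> int:
    """Bits used when each card is chosen from the cards still undealt."""
    return sum(bits_for_choice(remaining) for remaining in range(cards, 0, -1))


def chars_needed(bits: int, alphabet_size: int) -> int:
    """Characters needed to carry that many bits in the given alphabet."""
    return math.ceil(bits / math.log2(alphabet_size))


total_bits = ordered_deal_bits()
# Information-theoretic floor: there are 52! possible orderings of the deck.
lower_bound = math.log2(math.factorial(52))

print("bits:", total_bits)
print("base 64 characters:", chars_needed(total_bits, 64))
print("base 85 characters:", chars_needed(total_bits, 85))
print("floor (bits):", round(lower_bound, 2))
```

Any hand-worked total that comes out below the log2(52!) floor has to have dropped some cards somewhere, which makes the floor a handy sanity check.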

The other thought I had after writing it was just how many hands would benefit from the one-bit-per-card tail handling. This is a bit of a hard one to answer with my level of maths, but basically the first thing to consider here is that the encoding of 104 bits in base 64 does not come to exactly 18 characters; it's about 17.333333 characters, so we need eighteen characters to encode 104 bits worth of distribution. In order to get fewer than 17 characters, you'd have to save ten bits - in other words, the first forty-two cards have to be dealt in such a way that the last ten cards are only dealt to two players. You might save even more, because you don't have to use any bits when three players have thirteen cards. Two bits of saving - to get to 17 characters - is probably fairly likely, but ten bits saved is starting to push it; and you can still only get down to 13 characters anyway...
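Since the maths is awkward, one way to get a feel for it is to count the bits for actual deals. The following is a rough sketch in Python, not a worked-out answer: the function names, the 0 to 3 player numbering and the default trial count are all assumptions of mine. It applies both savings mentioned above (one bit per card once only two players can still receive cards, no bits once only one can), so its totals can come out lower than figures that only count the one-bit rule.

```python
import math
import random

PLAYERS = 4
CARDS_EACH = 13


def bits_with_tail_saving(deal):
    """Bits needed for a deal given as 52 player numbers (0-3) in deck order."""
    held = [0] * PLAYERS
    bits = 0
    for player in deal:
        # Players who can still be dealt a card at this point.
        open_players = sum(1 for count in held if count < CARDS_EACH)
        if open_players >= 3:
            bits += 2   # three or four candidates: full two bits
        elif open_players == 2:
            bits += 1   # two candidates: one bit is enough
        # one candidate left: the card's destination is forced, no bits
        held[player] += 1
    return bits


def chars_base64(bits):
    """Base-64 characters needed, at six bits per character."""
    return math.ceil(bits / 6)


def average_bits(trials=10000, seed=1):
    """Mean bit count over randomly shuffled deals (a crude estimate)."""
    rng = random.Random(seed)
    deal = [p for p in range(PLAYERS) for _ in range(CARDS_EACH)]
    total = 0
    for _ in range(trials):
        rng.shuffle(deal)
        total += bits_with_tail_saving(deal)
    return total / trials


round_robin = [0, 1, 2, 3] * CARDS_EACH
print(bits_with_tail_saving(round_robin), chars_base64(bits_with_tail_saving(round_robin)))
print(average_bits())
```

The round-robin deal is the least helpful case for the tail handling, because every player stays open until the last four cards; random deals can only do the same or better.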

(I'm not considering unique identifiers which are simply (randomly) assigned numbers or strings given to entries in a database. What I'm wanting is for two programs to be able to decode the same distribution of cards with only the deal identifier being shared.)

Last updated: | path: tech | permanent link to this entry

Thu 16th Dec, 2010

Proprietary Madness

I have just been configuring two machines at work for a SAN upgrade. The vendor has insisted we use their drivers and stick to particular kernel versions. Several delays have been encountered in their setting things up. E-mails and phone-calls were exchanged. Meetings were held, at which they explained how the upgrade would have to progress and I expressed my disbelief. Changes were raised.

The thing that I couldn't quite get past in the meeting was that the procedure for them to move from storage on one section of the SAN to another was as follows:

  1. They create some new devices.
  2. Reboot so that their driver can find the new devices.
  3. Set up the sync process, one per device. You have to know implicitly which device you're going to be moving from and to. Enter 'yes'.
  4. Start the sync process, one per device. Enter 'yes'.
  5. When that's finished, switch to the target device, one per device. Enter 'yes'.
  6. Commit the changes, one per device. Enter 'yes'.
  7. Complete the transaction, one per device. Enter 'yes'.
  8. They then remove the old devices, which are now helpfully called the new devices.
  9. Reboot again to make the old devices go away.

I think a WTF is called for at this point. Two reboots and a whole bunch of unscriptable manual commands to perform a change? An erudite friend of mine, wise in the ways of vendors, opined that they must do this for everything as part of some methodical procedure for making sure critical data was preserved at every stage. And I answered him with the words of the vendor, which were that Windows and Solaris can do this change automatically, and only Linux required this level of manual attention. I had questioned the process at the meeting and was told that it was all due to the drivers of the vendors of the fiber-channel cards and they would love to do it the intelligent way, really they would, but that those other vendors had made life too hard for them.

This gambit works successfully when the person the vendor is talking to is a manager or otherwise unfamiliar with the way Linux is developed. I at least pointed out that these drivers that they were complaining about were open source - they were freely available for the vendor to modify to support their esoteric requirements. At this the vendor's people shrugged and said that there was nothing they could do about this - which is true but useless. I don't know the details of how these things are constructed, so I really can't say how fiber-channel cards see their SANs or whether they can actually support hot-plug or not, so I can't really point them to a source file and line to show them otherwise. I know that SCSI wasn't designed for hot-plug, but then why is it using the SCSI bus if the devices support hot-plug? I don't know.

While I have been rebooting and waiting and rebooting on these servers for these devices to show up, I have meanwhile plugged a hard disk into my work machine, added its space to my main volume group, moved all the data off my old disk, reformatted the old disk, repartitioned it, put the /boot partition back and moved all the data back using LVM commands while the system was running. I thought I'd have to reboot at one point, but it turned out I still had /boot mounted and once I unmounted it I could rescan the partition table successfully. No reboots, no obscure proprietary binaries, no tainted kernels, no mysterious handwaves, nothing - just solid performance. And this is on commodity hardware! What is wrong with these people?

Oh, and in case you're wondering which vendor, there's a subtle clue somewhere in this document.

Last updated: | path: tech | permanent link to this entry

Tue 14th Dec, 2010

Encoding bridge

I was reading the newspaper the other day and I idly scanned over the section on analysis of bridge hands. It started by saying this was an analysis of 691 different plays on the same hand and went on to talk about a website that stored these different hands, played in tournaments, along with commentaries, scores achieved, and so forth. In a standard tournament of duplicate bridge each table's cards are shuffled once, then played in front of the player rather than mixed in the centre. At the end of the game each hand is put back into a tray and taken to the next table, and the players move around as well, so that by the end of the tournament each pair has played each hand against each other pair (for an odd number of tables and using the standard Mitchell rotation method.)

As a programmer, of course, my first reaction was "Hmmm, how would I represent a hand of bridge?" It would have to be easily identifiable - i.e. a particular deal should have a representation that can be quickly differentiated from another - and human-readable (i.e. relatively short and consisting entirely of plain characters). The first approximation was to start with the cards for North and, one by one, enumerate them. So North's hand would use 52 + 51 + 50 + ... 40 bits, then East's would use 39 ... 27 bits, and so forth. Thanks to Gauss we know this would be 53*52/2 or a total of 1378 bits; encoding in base 64 gives 230 characters.

Then I realised I could do better - because this encoding also remembers order as well as position; we don't need to know which order North keeps the cards in her hand in, we only need to know what cards are in it. So instead we can imagine starting with an ordered deck - any ordering will do so long as everyone uses it. Then for each card we remember which of the four players it got dealt to. This gives two bits per card, or 13 bytes per hand - with base 64 encoding this gives 18 characters. Much easier!
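To make the two-bits-per-card idea concrete, here is a minimal sketch in Python. Everything specific in it is an assumption for illustration: the function names, the numbering of players as 0 to 3, packing the first card of the agreed deck order into the highest bits, and the use of the URL-safe base 64 alphabet with the padding stripped.

```python
import base64
from collections import Counter

CARDS = 52


def encode_deal(owners):
    """Pack 52 player numbers (0-3, in the agreed deck order) into 18 characters."""
    if len(owners) != CARDS or any(o not in (0, 1, 2, 3) for o in owners):
        raise ValueError("expected 52 player numbers in the range 0-3")
    if any(count != 13 for count in Counter(owners).values()):
        raise ValueError("each player must receive exactly thirteen cards")
    value = 0
    for owner in owners:
        value = (value << 2) | owner          # two bits per card
    raw = value.to_bytes(13, "big")           # 104 bits = 13 bytes
    # 13 bytes encode to 18 characters plus "==" padding, which is dropped.
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def decode_deal(deal_id):
    """Recover the 52 player numbers from an 18-character deal identifier."""
    raw = base64.urlsafe_b64decode(deal_id + "==")
    value = int.from_bytes(raw, "big")
    return [(value >> (2 * (CARDS - 1 - i))) & 3 for i in range(CARDS)]


deal = [0, 1, 2, 3] * 13                      # cards dealt round-robin
deal_id = encode_deal(deal)
print(deal_id, len(deal_id))
assert decode_deal(deal_id) == deal
```

Two programs that agree on the deck ordering and the player numbering can then reconstruct the same deal from nothing but the identifier.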

(I then got side-tracked with the idea of making a small robotic dealer that would take an ordered deck and distribute it amongst the players according to the deal ID. But even smarter would be a robotic dealer that took a deck in any order and distributed it. The deck would be put in a hopper and each card in turn would be pulled across a scanner. The dealer would then work out which player that card went to from the hand ID and push that card toward that player. It could also fairly easily detect whether there were missing cards, erroneous cards like jokers, or duplicates in the process. But that's a project for another time.)

(And it would be possible to shave a couple of bits off the 104 bits - when two players have thirteen cards you need only store one bit to determine which of the two remaining players each card goes to. But only a relatively small number of hands that have been shuffled and dealt in a fair, random manner will have this happen, and it would make the code more complex and the display less regular for a minimal saving.)

Last updated: | path: tech | permanent link to this entry

Thu 9th Dec, 2010

Send in the Androids

After several months of prevarication, wishy-washyness, and preferring the known to the unknown, I have finally decided to buy an Android. (This was lengthened by Optus' local store deciding that the best thing to tell customers when you're out of stock is to not bother ringing them to ask whether they've got stock or not, and they're not going to tell you when it arrives either.) I settled on a Samsung Galaxy S because it seemed to be the best of the current crop of 'droid phones, and it was lighter and easier to use than the Nokia N8. Besides, I think Symbian is a dead end and would rather get something with a future and a bit of openness.

Overall the experience has been good. The interface is nice, the touch interface smooth, and once you get to know the basics it's fairly consistent. The hard part has been learning the new ways of doing things. For instance, you don't select a ring-tone from your own music files by going to the ring tone settings; you open up the music player, select the file, get the menu and choose 'set as' a ring tone.

The most frustrating thing was realising that my favourite ring tones - samples from The Goon Show and the BBC Radiophonic Workshop - and (more importantly, perhaps) my contacts were all in the memory of my old Nokia 5310i. I thought about the agonising process of trying to save all the numbers onto the SIM (losing lots of useful detail in the process because SIMs think address books are single-entry affairs) and then laboriously transferring the details out of my head into the phone again.

And then I remembered that the Nokia has a "Backup" feature, which will write a file out to your SD card containing the phone's memory. I'd done one of these relatively recently. It's a zip file. A quick search through the files in it showed a whole directory of vCard files. Another directory held all my ringtones (which also included a fabulous native duck call from a friend that sounds like it's going berserk - I use it for my brother). Unzip, copy them back onto the SD card, and the phone sees them again. Even better, the phone can do a bulk import of all vCard files it finds, so a relatively short time later I had all my contacts in - and saved to Google Contacts, so I can edit them on a reasonable computer.

Hooray for open technology!

Last updated: | path: tech | permanent link to this entry

Fri 26th Nov, 2010

Electronic Differential

One of my interests, congruent to my Electric Motorbike Without A Spiffy Name project, is to build an electric go-kart. One different from the usual ones you can see on YouTube doing burnouts and beating Mustangs in drag races thanks to their gigantic quantities of torque. This one is to have (dah dah daaah!) two motors.

No, stay with me, it's more interesting than that.

You see, this may show my lack of go-karting purism but I hate the fixed rear axle that most standard go-karts use. This is engineering simplicity but sharp turns require a 'drifting' technique to skid the rear wheels or you lose most of your revs and with the usual tiny motors that's hard to put back. It's also completely different from driving a car, so while kids might not know any better most adults have to learn a new technique for cornering at odds with their usual experience. Cars, of course, have a mechanical differential to distribute the power between the rear wheels - the torque applied to each wheel is the same, so the wheel with less resistance goes faster.

And there's your other problem. Because as far as I can see what you actually want is the torque applied to be in inverse proportion to the ratio of the wheel speeds. In equations this looks like:

total_rpm         = left_wheel_rpm + right_wheel_rpm
left_wheel_power  = right_wheel_rpm / total_rpm
right_wheel_power = left_wheel_rpm  / total_rpm

The left wheel and right wheel power are fractions - they add up to 1; see them as percentages if you like. What this means in practice is that when going straight both wheels get the same amount of power; when cornering the inside wheel gets more power; when one wheel slips power is decreased to it until it grips again. So both wheels will maintain traction around a corner and each wheel gets the maximum amount of power it can take. You can still spin the wheels by applying so much throttle that the wheel with the most grip will still spin, but you are much more likely to have complete traction all the way around.
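Those three equations translate almost directly into code. This is only a sketch in Python under assumptions of my own: the function name is invented, the RPM readings are taken to be non-negative, and the even split at a standstill is a guess at sensible behaviour, since the equations divide by zero when both wheels are stopped.

```python
def split_power(left_wheel_rpm, right_wheel_rpm):
    """Return (left_wheel_power, right_wheel_power) as fractions adding up to 1."""
    total_rpm = left_wheel_rpm + right_wheel_rpm
    if total_rpm <= 0:
        # Assumption: at a standstill, share the power evenly rather than
        # dividing by zero.
        return 0.5, 0.5
    # Each wheel's share is set by the *other* wheel's speed, so the slower
    # wheel (inside of a corner, or still gripping) gets the larger share.
    left_wheel_power = right_wheel_rpm / total_rpm
    right_wheel_power = left_wheel_rpm / total_rpm
    return left_wheel_power, right_wheel_power


print(split_power(100, 100))   # going straight
print(split_power(100, 300))   # left wheel on the inside of a corner
print(split_power(900, 100))   # left wheel slipping
```

How those fractions get scaled into the two throttle outputs is a separate design decision that the equations don't settle.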

This isn't really new. Most cars with electric motors - and even the massive trucks that haul hundreds of tonnes of ore in mines (which are driven by multi-megawatt electric motors powered by onboard diesel generators) - get designed to provide differential torque. Once you've got that control it's a no-brainer, it's a simple matter of programming.

What I want to do is to make a simple box that takes the hall effect sensors from two brushless DC motors and works out the RPM on each motor. It then takes the throttle information and outputs two throttles, one for each motor controller, plus feed-through links to the hall effect sensors. This would make it a simple drop-in part for go-karts, cars and anything else. It would also make it completely independent of the amount of power the controllers are supplying to the motor. There's also the possibility to use different power curves, ignore wheel stall, rev limit the motors, provide interactive feedback to the driver, and more.

(I did a bit of editing on the Wikipedia page on electronic differentials because most of the writing showed the original author's lack of familiarity with the English language, and sounded a bit like advertising for a particular company who was mentioned at the bottom of the page. The irony was that their web site hasn't been working since 2007 according to, so while their system-on-chip drop-in solution might have been wonderful it also wasn't enough to keep them going. And I really hate articles that use IEEE members-only articles as references.)

Last updated: | path: tech | permanent link to this entry

Thu 18th Nov, 2010

Fractional Distillation

One of the minor amusements XKCD unleashed upon the world recently was the equation "16/64 = 1/4", derived by cancelling the sixes from the left hand side. Deriving the formula for generating other such misleading fractions is relatively banal but I did it the other day: if the fraction is to be ab/bc = a/c, where a, b and c are single digits, then the equation 10ac + bc = 10ab + ac needs to be true. Rearranging gives b = 9ac / (10a - c), so creating a nine-by-nine matrix of a and c allows us to find the cases where b is a single-digit integer.

Thus the only fractions for which this holds are: 16/64 = 1/4, 19/95 = 1/5, 26/65 = 2/5 and the reducible fraction 49/98 = 4/8; and the uninteresting series where a = b = c beginning 11/11 = 1/1.
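
The search is small enough to check by brute force. The following is a minimal Python sketch of that check, using the b = 9ac / (10a - c) rearrangement; the function name is just a label for this example.

```python
def cancelling_fractions():
    """Find digits a, b, c (1-9) where the two-digit fraction ab/bc equals a/c.

    Returns tuples of (numerator, denominator, a, c).
    """
    results = []
    for a in range(1, 10):
        for c in range(1, 10):
            numerator, denominator = 9 * a * c, 10 * a - c
            if numerator % denominator:
                continue  # b would not be an integer
            b = numerator // denominator
            if 1 <= b <= 9:
                results.append((10 * a + b, 10 * b + c, a, c))
    return results


if __name__ == "__main__":
    for top, bottom, a, c in cancelling_fractions():
        if a != c:  # skip the a = b = c series
            print(f"{top}/{bottom} = {a}/{c}")
```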

Last updated: | path: tech | permanent link to this entry

Tue 16th Nov, 2010

Strange Views

Birds have started sprouting from the ground. Trees hang with fishes, and mice rain from the sky. Thousand foot high potato vines stalk the landscape devouring houses. Record company executives decide to license everything as Creative Commons. Giant space cockatoos from beyond the galactic rim successfully buy all the Weetbix in Australia and use them to fabricate a crude rocket-ship. And, in other news, Paul buys a copy of Windows Seven.

No, it's really actually not that strange. Since I've started using Linux (back in 2004 - a lot of people think it must have been much earlier, strangely) I've been much more conscious of the licensing and conditions of using software. One of the things that Microsoft implicitly relies on for its monopoly is the huge trade in illegal versions of Windows and other Microsoft software - it educates people to expect to use Microsoft, then they make their further purchasing decisions with that as an implicit default. And one of the standard benefits of using Free, Open Source Software is that you are legally licensed to copy it and share it. Once you put that as a benefit, you have to implicitly recognise that illegal copies are not right.

So, after building a new home gaming system, I therefore decided to get a new license for its Windows operating system. Too many of the games that I want to play are, unfortunately, only available on Windows; and it's an apposite time to upgrade the whole thing at once. It's also a good way for me to acknowledge the cost of my choices - I may not be Richard Stallman but I still think that if I choose to use Windows for something I should at least pay the cost. And considering I get a better operating system with more features and less lock-down for free with Linux, GNU and everything that hangs on top of it, I don't think this is suddenly going to cause me to abandon all my principles - and buy a Mac...

Last updated: | path: tech | permanent link to this entry


Up2date is the obsolete and badly-designed (IMO) install tool from the early days of Red Hat trying to dig its way out of 'RPM dependency hell'. It's badly designed because all of its sources are in one file (so new repositories can't be added via RPM file easily), it doesn't support mirrors, and it doesn't handle its metadata well. This has largely been fixed with yum, but unfortunately for me I have to manage some systems that do not have yum installed.

Recently I came across a problem where up2date would complain about not being able to download certain headers:

Fetching rpm headers...
There was an error downloading:
There was an error downloading:
There was an error downloading:
There was an error downloading:
There was an error downloading:
An error has occurred:
See /var/log/up2date for more information

When I looked at the web server it showed that this package was an old version. Yet obviously up2date thought it was still there. I tried purging the /var/spool/up2date (aside: spool? WTF?) directory but this didn't change anything. This was the only directory on my machine that had up2date files. Reasoning, therefore, that the problem was in metadata that was on the server, I switched to a different server and suddenly it worked. Once again the mirror problem comes back to bite (which would be funnier if this post was about vampires).

Last updated: | path: tech | permanent link to this entry

Sun 31st Oct, 2010

Putting speed in perspective

I just realised something: why don't we measure speed logarithmically? Instead of kilometres per hour or metres per second, why not the logarithm of metres per second? If you take the 'Factor' column in the handy Wikipedia Orders of magnitude (speed) page and take the base ten logarithm, you get this value. (I use the 'TD:' prefix here as an homage to Blake's Seven, which seems to have had the idea first.)

What's interesting here is that most of our fastest man-made things are around TD:2 - from the Bugatti Veyron to the SR-71 Blackbird. Most of our regular movement in everyday life is around TD:-1 to TD:1. Once you move up to TD:4 you're in escape velocity territory. Then it's all orbital velocity, elementary particles and light transmission up to TD:8, where you hit the speed of light (at TD:8.476820703 or thereabouts).
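
For anyone who wants to play with the scale, the conversion is a one-liner. This is a small Python sketch; the function name is mine, and the example speeds (the Veyron's top speed of roughly 408 km/h and Earth's escape velocity of roughly 11.2 km/s) are approximate figures supplied for the example rather than values from the Wikipedia table.

```python
import math


def td(metres_per_second):
    """Return the 'TD' value: the base ten logarithm of a speed in m/s."""
    return math.log10(metres_per_second)


if __name__ == "__main__":
    examples = {
        "walking pace (1.4 m/s)": 1.4,
        "Bugatti Veyron (about 408 km/h)": 408 / 3.6,
        "escape velocity (about 11.2 km/s)": 11_200,
        "speed of light": 299_792_458,
    }
    for name, speed in examples.items():
        print(f"{name}: TD:{td(speed):.3f}")
```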

From there it's all conjecture. A lightyear per second, an idea I explore elsewhere, is actually just under TD:16 - a colossal speed but even then painfully slow for exploring the universe. At that speed it still takes about four weeks to get to Andromeda, the nearest large galaxy, some two and a half million lightyears away. Upping the speed to TD:20 makes that journey into about four minutes, but the far reaches of the universe will still take weeks or months.

Bring it on, I say.

Last updated: | path: tech | permanent link to this entry

Thu 15th Jul, 2010

Proposals submitted...

The Linux Conference Australia call for papers is now out, and I've submitted two papers - one for a talk and one for a tutorial. Now the waiting begins...

In 2009 I got accepted to give a talk on writing good user documentation. I'd submitted several papers before then but never got accepted; the chief reason was that I had submitted papers about stuff I was interested in but was not actually a key contributor to. LCA is crazy hard to get to speak at, but is totally worth it because they really treat speakers well. And to me it's addictive - I loved it so much in 2009 I wanted to do awesome things just to get a place in 2010.

That didn't work out for me; mainly because I'm a neophile. I tend to be interested in a whole bunch of things but only shallowly - occasionally (such as when I decided to write the doco for LMMS) I dip in but I rarely seem to be able to sustain that involvement before the next thing comes along and lures me away. But I'm more hopeful I can get a speakership at 2011 because I'm putting forward two proposals for things that I'm actually really involved in and know about.

Ah well. Now for three months of anticipation. Better keep on working on my electric motorbike then...

Last updated: | path: tech / lca | permanent link to this entry

Tue 22nd Jun, 2010

Weird boot problems fixed by mkinitrd

My brother had a weird problem where his MythTV machine stopped booting. From a screenshot of the boot (as in JPEG sent by email) it appeared that the boot process was missing some part of the LVM volume group. I got him to boot the System Rescue CD and let me remotely SSH into it, and to my surprise I found the LVM working fine.

I mounted the logical volumes, used mount's -B option to bind the /dev, /proc and /sys into the same file system, and chrooted into the directory structure. All worked fine. Suspecting the boot process had an old record of the LVM layout, I rebuilt the initrd with mkinitrd. Copying it into place and rebooting brought the whole thing up again, much to his and his partner's delight.

Another successful bit of Linux troubleshooting achieved.

Last updated: | path: tech / fedora | permanent link to this entry

Wed 9th Jun, 2010


Wikis in general have revolutionised information on the internet. Not only is data and information more accessible but it can be improved as time goes on with the same amount of effort. They apply the 'many eyes' principle of fault-finding and make it easy for someone who can improve something to do so. Before wikis, web pages were arcane things guarded by religious orders of designers and programmers, charged with the sacred task of protecting these pages from just anyone editing them. Now content has been opened up to the masses.

We don't, however, have the same kind of tool for data. I'm talking both about tables of information - phone number lists, customer data, etc. - and the relationships between them. There's masses of data like this around, and most of it is in CSV files, HTML tables and occasionally in some database or other. There are interchange formats around and systems like ODBC for communicating between one database engine and another, but this still requires the database administrators to come forth from their temples and bless the queries and connections.

We need tools that can allow a web site to:

  1. Take a variety of formatted data - CSV, HTML, SQL dump, etc. and try to intelligently work out what it contains.
  2. Store the data in actual SQL tables, using data types appropriate to the task (i.e. everything does not get stored in VARCHAR(255) fields).
  3. Allow the user to specify constraints on the data, such as integer fields, phone number formats, etc.
  4. Allow the user to specify relationships between the data, assisted by the site where possible.
  5. Generate views of the data including joins, grouping and arithmetic expressions.
  6. Detect simple errors and inconsistencies if possible.
  7. Allow access to this data from other web sites in a structured format.

Microsoft Access is a good example of some of this - in one of my former jobs I saw many people use it to store and present a lot of useful data. But often they were really using it as a glorified form of spreadsheet - storing multiple addresses as separate columns rather than in a separate one-to-many table because they didn't know any better. Often these applications could have used a link to common facilities - I think I saw three separate tables of our branch locations, all inaccurate or lacking to some degree. Often these were critical applications to the business being maintained by one person and provided in a haphazard, unreliable fashion. Access's approach to multi-user work was to prevent it completely. While Access put the power of a database in people's hands, it did so badly and with little thought to being scalable (at least when I was using it in 1995-2000).

And while I'm finding Django to be a great framework to work with, I still seem to end up doing all the work of manually importing CSV files, KML lists, HTML tables and former SQL databases. This should be a simple process of no more than half a dozen steps. Every table in Wikipedia should be a data reference that can be sorted, keyed against, and used in someone else's pages. Google has put a lot of effort into understanding everything from movie times to stock prices, but for the rest of us it's a matter of asking a programmer. There's all sorts of interesting data mash-ups going on but they still seem to require APIs, server software and lots of code.
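
To give a feel for how small the first couple of steps in the list above could be, here is a minimal Python sketch that reads a CSV and guesses a column type for each field instead of defaulting everything to VARCHAR(255). The function names and the three-type scheme (INTEGER, REAL, TEXT) are assumptions made for this example; a real tool would need dates, phone numbers, user overrides and much more.

```python
import csv
import io


def infer_column_type(values):
    """Pick the narrowest of INTEGER, REAL or TEXT that fits every non-empty value."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "TEXT"
    for sql_type, caster in (("INTEGER", int), ("REAL", float)):
        try:
            for v in non_empty:
                caster(v)
            return sql_type
        except ValueError:
            continue  # at least one value does not fit; try the next type
    return "TEXT"


def infer_schema(csv_text):
    """Return a {column name: guessed SQL type} mapping for CSV text with a header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    columns = zip(*rows) if rows else [() for _ in header]
    return {name: infer_column_type(col) for name, col in zip(header, columns)}


if __name__ == "__main__":
    sample = "name,branch_id,balance\nAlice,3,10.50\nBob,12,7\n"
    print(infer_schema(sample))
```

A guess like this will happily call a phone number an INTEGER, which is exactly why the user still needs to be able to specify constraints afterwards.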

It's hard to stand on the shoulders of giants if you can't climb up...

The title of this page comes from the Wagiman language from the Northern Territory. It roughly means fast-find - given that I know of the language only what the dictionary gave me I don't know if I've formed the words correctly. But it's that key property that I think makes this such a compelling idea: the ability to throw a CSV table into a website and have it become searchable, sortable and accessible to others instantly. While I think hand-crafted data relationships will always be faster or more accurate than automatic imports, the latter is still better than locking the data away for want of a system to access it.

Last updated: | path: tech | permanent link to this entry

Tue 23rd Mar, 2010

When awesome is no longer enough

Kate and I went to collect the mail on Sunday from our post office box, and discovered I had a parcel waiting to be collected. "What is it?" she asked. I replied that I honestly didn't know. Either it was a T-shirt and book from Penny Arcade, or it was two books from Weregeek, and since both of them were coming from North America it was hard to say which had won the race. Besides, I love getting parcels (who doesn't?) and the delicious suspense was increased by finding out on Monday that the parcel was too large to fit in my small satchel and I would have to collect it on Tuesday.

It was from Penny Arcade, and contained their book The Splendid Magic of Penny Arcade and the Automata T-shirt. The latter was awesome enough, but from my brief foray into the former I have to say that it's everything I had hoped for and more. I'm only really a casual gamer so a fair number of the in-jokes about the industry go over my head, but even without that fine detail the humour still bites in. And it's not all gamer jokes, some bits (Cardboard Tube Samurai, for example) are just awesome, some bits are thought-provoking, and some I just love for how they hit home and make me say "Yeah, I can relate to that."

And I think Penny Arcade shows that two ordinary gamers - two ordinary people - can make something fantastic that goes from strength to strength and keeps coming up with new ways to be awesome. You don't need million dollar executives and lots of marketing. You need good things that reach out to people. Jerry and Mike love what they do, and they do what they love, and in the process they give millions of dollars of toys and money to charity, they run massive events like PAX that are insanely popular, they make a webcomic that is smart, funny, crude, bizarre, beautiful and even touching, and they prove that gamers can also be successful without sacrificing their origins or being chewed up and spat out by the industry. That's an awesome example to show to people.

This is why I love reading the background and history and detail behind Penny Arcade. I love seeing Jerry's comments on cartoons, I love seeing the toys and stuff that people have made for them, or the cartoons that others have drawn with Tycho and Gabe. I love it for the same reason I love watching the episodes of Penny Arcade TV - because I learn more about them as people by listening to them talk about how they make up a comic, and this gives their output more depth to me. To see a guy at PAX take the microphone and thank the guys for keeping him cheered up during his tour of duty in Iraq, and watching Jerry go and give him a hug, says more about how much those guys really care than all of the millions of dollars that Bill Gates donates to his own charity each year.

My charity philosophy this year is to support webcomic artists. I read in the order of sixteen webcomics - some daily, some every couple of days and some intermittent. (The great thing about being in the east coast of Australia is that 4PM is midnight in the USA, more or less, and that's when new comics are traditionally dropped into their waiting servers). I don't pay for reading them normally, and this year I have decided that I will. I've paid $25 to Cheyenne Wright, the Hugo-winning colourist for Girl Genius when he had an accident. I've now bought stuff from Weregeek and Penny Arcade. When the mood takes me I will buy more stuff from the web comics I like. Because this basically goes directly to them, modulo some postage and handling - there's no publisher or media outlet standing in the doorway taking my cash and saying "we'll 'pay' the guys, yeah, sure".

It's not tax-deductible. I don't get a picture of the child in Uganda that I saved from starvation. But someone out there gets to do what they love - write webcomics - and entertain me in the process, and overall I think that's a win for both of us. And I get some nice books and neat T-shirts too, which I can use much more readily than a photo or a ribbon.

Last updated: | path: tech | permanent link to this entry

Tue 9th Feb, 2010

File system sequences

I recently had the occasion to create a new filesystem on a partition:

mkfs -T largefile4 /dev/sdc1

This creates copies of the superblock on a bunch of sectors across the disk, which can be used for recovering the superblock of the disk should something tragic happen to the main one (such as overwriting the first megabyte of a disk by accident). A useful tip here is that one can do the same command with the '-n' option to see what sectors it would write the superblock to, without actually reformatting the partition, in order to then provide a copy of a superblock to fsck:

mkfs -n -T largefile4 /dev/sdc1

In my case, these copies were written to these offsets:

Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
	4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
	102400000, 214990848

What determines these magic numbers? Well, you can see from 163840 and 819200 that they're multiples of 32768. If we work out the multiples of the beginning offset for each offset, we get:

98304 = 3 * 32768
163840 = 5 * 32768
229376 = 7 * 32768
294912 = 9 * 32768
819200 = 25 * 32768
884736 = 27 * 32768
1605632 = 49 * 32768
2654208 = 81 * 32768
4096000 = 125 * 32768
7962624 = 243 * 32768
11239424 = 343 * 32768
20480000 = 625 * 32768
23887872 = 729 * 32768
71663616 = 2187 * 32768
78675968 = 2401 * 32768
102400000 = 3125 * 32768
214990848 = 6561 * 32768

Hmm. 3, 5, 7, eh? Then 9, which is 3 squared; then 25, which is 5 squared. Interesting. The 27 throws us for a second before we realise that that's 3 cubed, and it comes between 5 squared and 7 squared. And, sure enough, there's 81 (3^4) and 125 (5^3) ... it seems to be the sequence of successive squares, cubes, etc. of 3, 5 and 7. It's a sequence of successive powers.
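
The pattern is easy to reproduce. This is a minimal Python sketch that regenerates the offsets above from the successive powers of 3, 5 and 7; the group size of 32768 blocks is taken from the first offset in the mkfs output, the function name is just for this example, and the group count passed in is an assumed figure big enough to cover this partition, not something read from the disk.

```python
BLOCKS_PER_GROUP = 32768  # taken from the first backup offset in the mkfs output


def backup_superblock_offsets(num_groups, blocks_per_group=BLOCKS_PER_GROUP):
    """Return block offsets at group 1 and at powers of 3, 5 and 7 below num_groups."""
    groups = {1}
    for base in (3, 5, 7):
        power = base
        while power < num_groups:
            groups.add(power)
            power *= base
    return [g * blocks_per_group for g in sorted(groups)]


if __name__ == "__main__":
    # 7000 is an assumed group count, just large enough to include 6561 (3^8).
    print(backup_superblock_offsets(7000))
```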

Why? Well, the whole object here is to make sure that a copy of the superblock survives if some tragedy happens to the disk. There are two broad kinds of disaster scenario here - destroying a contiguous block of disk, and destroying multiples of a specific sector offset across the disk (e.g. 0, 10, 20, 30, 40...). For the first, we can see the successive powers method quickly generates fairly large numbers without leaving any obvious large gaps - the ratio of number N and number N+1 never goes higher than 3. For the second situation, you can fairly quickly see from number factor theory that multiples of N will increasingly rarely intersect with the successive powers series, and only when N is (a multiple of) 105 will it intersect all three sequences.

It's perhaps arguable here that drive technology has made some of this irrelevant - ATA block replacement changes the mapping between logical and physical block numbers - and in fact the types of disaster scenarios this scheme of superblock copies addresses aren't really reflected in real-world usage. For example, if you're striping blocks across two disks then all your superblock copies are going to start on one disk (even if they then get striped across the second disk) because the successive power series always generates odd numbers. But as a way of avoiding some of the more obvious failure modes, it makes a lot of sense.

Another little bit of trivia explained.

Last updated: | path: tech / ideas | permanent link to this entry

Mon 8th Feb, 2010

Energised communities

Last week I went along to a group at once new and very familiar. They all were passionately keen about a new technology, and yet they'd all had to explain the benefits over and over again to disbelievers. Most of them were working on their own projects but came together as a larger community. While they all knew it was the inevitable way of the future, powerful commercial interests were working against them and governments and the general public seemed indifferent to their cause.

This was, of course, electric vehicle hobbyists.

For my part, I'm keen on constructing an electric motorbike. I'm also interested in adding open source components and microprocessor controllers to various parts of the project, partly to keep the cost down (some of the proprietary parts are really expensive) and partly for the fun of tinkering.

There were three main topics of discussion during the night:

Firstly, there's a lot of interest in the local group in starting an EV racing standard and, within one to two years, getting actual races happening. Initial ideas revolved around a standard car chassis that is fully CAMS approved (which is necessary for official racing), but then someone mentioned go-karts as a lower-cost entry level category which also got a lot of nods. There's already moves in this direction (CAMS has had an Alternative Energy division since August 2008) but getting the community groups - schools, Scouts, youth groups, etc - involved is a great idea.

Secondly, the group is trying to collect information about building EVs into an online resource. I put in my oar and proposed using a wiki (which they sort of have already) and keeping it public (opposing the person who said it could be monetised in the future), both of which met with general agreement. The current process they're using is for one person to be a 'subject matter expert' that collates all the ideas from the group into an article, and that then gets put on the Wiki and people can edit it from there. This combines the best of both practices of document writing, and I think it's an excellent way to go.

Thirdly, there was a lot of interest in the hardware hacking theme that is all the rage at the moment. Everything from makerbots and repraps to arduinos and programmable fridges was met with interest and requests for more detail. I'm trying to find their email list to make a general announcement and I'm hoping that I'll get a few people coming along to the next CLUG meeting. There's a number of projects out there, from David Rowe's work on controllers to the Tumanako project that are applicable to EVs. I really need to point the Canberra EV group in the direction of the Electric Saker sports car - a New Zealand project!

My main quest for this month is to make the plans for my new electric motorbike, and to understand what a battery management system does and find one that doesn't suck.

Last updated: | path: tech | permanent link to this entry

Sun 24th Jan, 2010

The device will submit!

I arrived a bit early for the Southern Plumbers miniconference at LCA 2010 and ended up watching people trying to work out why the projection system wasn't working - staring at various devices, switching things off and on, sulking, calling for other knowledgeable advisors, opening cupboards, etc. It was rather like that scene in The Diamond Age where Doctor X is trying to get his nanotech working.

And I realised that, with virtually any other conference, if the projection system had stopped working there it would have been "Sorry, everyone, we can't get the projection system working, we're going to have to move". But here at LCA you have so many knowledgeable, analytical people - people for whom a piece of technology not working is almost a personal affront - that the problem won't resist for long. The problem will submit.

That's what makes it such a fun conference.

(We do need to realise, overall, that sometimes we need to take a step back and ask whether this is worth solving, or whether even we should solve this problem at all. I put it that if some engineers were told to open the gates of hell and let the unholy minions out upon the earth, they would try to work out how to do it rather than ask whether it was a good idea. But generally I think fixing things so they work - and knowing enough to fix things - is better than relying on someone else to do it.)

Last updated: | path: tech / lca | permanent link to this entry

Wed 2nd Dec, 2009

Power seller

I had known for some months that my laptop battery was gradually waning in power. When, at OSDC, the battery light started flashing three short orange and one long green, I didn't need to Google it (though I did, of course) to find out that this was the laptop's electronics saying "you really need to think about changing that battery soon".

The problem was, however, that in my previous examination of the situation I discovered that most of the people selling laptop batteries under website names that look like they should be Australian are, in fact, pretty much one or two companies in Hong Kong operating a plethora of different web fronts. Using 'whois' on their domain names mostly tells you the truth, as these people don't bother to get an address in Australia to use as their administrative contact. Do not be fooled by use of '.au' in their domain names or Australian flag icons appearing in their banners.

They made themselves look more dodgy still by spamming the Canberra Linux Users Group list shortly after I posted to it asking "where do I buy laptop batteries at a decent price from an Australian company?". The behaviour makes me think of the stereotypical Chinese market hawkers, yelling at you to try their products, very cheap, for you special price, broken English permeating the whole transaction with the feeling that somewhere, somehow, you're going to be ripped off. I resolved, after OSDC, to find an Australian company that would sell me a battery for my still perfectly serviceable Dell Inspiron 6400.

Lo and behold I found one - Laptop Plus, which advertises itself as "Proudly Australian Owned & Operated". I also found a few eBay sellers advertising batteries, but since eBay cares as much about verifying the location of their sellers as they do about checking whether the postage is reasonable, I decided against shopping there. There are a reasonable number of sellers of Inspiron laptop batteries claiming to be in Australia, some even selling the 7200mAh batteries. But I decided to go with Laptop Plus, even though they were more expensive.

My decision was rewarded with prompt service, prompt answering of my questions, and speedy delivery. The battery came in a nice foam-padded box and checks out fine with the laptop hardware. It just worked, and I'm very happy with their service. I can only hope they will be around in another three years so I can buy another battery from them.

Last updated: | path: tech | permanent link to this entry

Fri 27th Nov, 2009

The new age of programming

I gave a lightning talk at OSDC this year and thought I'd write my thoughts up into my blog. It was the confluence of a number of ideas, technologies and thoughts gradually merging, and I think it's going to be an increasingly important issue in the future.

Most laptops now have at least two cores in them. It's hard to get a desktop machine without at least two. The same chips for ordinary x86-architecture machines will soon have six, eight and twelve cores. The Niagara architecture has at least this many and quite possibly more. The Cell architecture allows for up to sixty-four cores on-chip, with a different architecture and instruction set between the PPE and SPE cores. The TileGX architecture includes one variant with a hundred 64-bit cores, connected to three internal DDR-3 memory interfaces and four internal 10-gigabit ethernet interfaces.

The future, it can therefore be said, is in parallel processing. No matter what new technologies are introduced to decrease the size of the smallest on-die feature, it's now easier to include more cores than it is to make the old one faster. Furthermore, other parts of our computers are now hefting considerable computation power of their own - graphics cards, network cards, PhysX engines, video encoder cards and other peripherals are building in processors of no mean power themselves.

To harness these requires a shift in the way we program. The people who have grown up with programming in the last thirty years have, by and large, been working on small, single-processor systems. The languages we've used have been designed to work on these architectures - parallel processing is either supported using third-party libraries or just plain impossible in the language. There have been parallel and concurrent programming languages, but for the most part they haven't had anywhere near the popularity of languages like Basic, C, Pascal, Perl, Python, Java, and so forth.

So my point is that we all need to change our way of thinking and programming. We need to learn to program in small units that can be pipelined, streamed, scattered and distributed as necessary. We need larger toolkits that implement the various semantics of distributed operation in the best way, so that we don't have people reinventing thread processing badly all the time. We need to make languages, toolkits, and operating systems that can easily share processing power across multiple processors, distributed across cores, chips, and computers. We need to help each other understand how things interact better, rather than controlling our own little environments and trying to optimise them in isolation.
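
As a tiny illustration of the "small units plus a toolkit" idea, here is a sketch in Python using the standard library's concurrent.futures module. The word-counting task and the function names are made up for the example. The point is only that the unit of work is a plain function and the pool decides how to scatter it; note that in CPython a thread pool will not speed up CPU-bound work, and ProcessPoolExecutor offers the same map interface for that case.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(chunk):
    """One small, independent unit of work: count the words in a chunk of text."""
    return len(chunk.split())


def parallel_word_count(chunks, workers=4):
    """Scatter the chunks across a pool of workers and gather the total."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(word_count, chunks))


if __name__ == "__main__":
    print(parallel_word_count(["the quick brown fox", "jumps over", "the lazy dog"]))
```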

I think it's going to be great.

Last updated: | path: tech / ideas | permanent link to this entry

Mon 2nd Nov, 2009

The journey is the destination

I attended CodeCon 2009 this year, along with two friends from Canberra. This is an event where you go camping in a nice out-of-the-way location with no internet connection, take along your laptop, and hack away on code. There's lots of talk, lots of coding, enough seeing and walking and doing to keep the various personalities interested, and lots of sharing of ideas and thoughts. Peter Miller organises it, including hiring a generator and bringing along a bunch of tarps, poles, cables, and other stuff to make it all work - he does a splendid job and gives a lot of his time to making sure it all runs smoothly.

My feelings, coming back from the event, are overwhelmingly positive. This is the sort of affirming event that LCA is to me - talking to people who share the same jokes and ideas and worries, being able to help as well as ask for it, and realising that there are people of all ages who enjoy both geeking out and camping out. It's not for everyone - you have to be prepared to bring everything you'll need, cook your own meals, set up your tent and not have running water or a light at the reach of your hands. But obviously some people do enjoy it, and that's just fine by us.

Highlights for me were those belly laughs from brilliantly timed witticisms by other people; seeing a Lyrebird about 20 metres away (thanks, Kate, for lending me your small binoculars); getting a whole bunch of coding done; those quiet times discussing how life works; and how, sometimes, you just have to be a bit patient to wait for the annoyances to move off.

This isn't really an hour-by-hour account, as I'm not sure that kind of write-up would do it justice. But it was really great, and if you're at all interested in camping and hacking on code then it's well worth making the effort to go to. You don't really need access to the internet to get things done!

(I should investigate the possibility of running something similar at the Yarrangobilly Caves - thermal springs ahoy!)

Last updated: | path: tech | permanent link to this entry

Tue 6th Oct, 2009

The wonders of modern technology

On Sunday I had some friends around to play computer games. Actually, due to one of those typical glitches in communication which happen when trying to arrange things a fortnight in advance with people on a Sunday night, only one friend turned up. The game we were most familiar with was StarCraft, which I still think has the best idea for getting people interested in playing - a 'spawn' version that allows you to run up to eight people on one registered copy of the program. Considering the problems I'm having trying to convince these friends to spend even $10 (for Supreme Commander through Impulse), a spawn version of some of these games makes perfect sense to get people interested without them having to shell out up front.

Of course, we first had to go through that dance of getting the networking set up and the machines talking to each other. My machine would appear on his display when I first set up the game but would immediately disappear. Wireshark from another computer on the same switch showed traffic from both, but short of being on a dumb hub (and who has them these days?) I couldn't tell where the problem was. Probably a firewall problem somewhere. Rather than spend a lot more time faffing around with networking settings in Windows, something I'm not entirely familiar with these days, I went with plan B.

Plan B worked perfectly, first time. Instantly we could see each other, and our games went perfectly smoothly with no lag or hitches. What was this wondrous technology?

Serial cable.

By some miracle both computers had nine pin RS-232 serial ports; by another miracle I had a null modem cable with nine (and twenty-five) pin connectors. I deduced that it was a null modem cable because it had two female plugs. StarCraft did the rest. Hours of enjoyment.

The next day I found how to get the two machines talking to each other - more precisely, how to convince the Windows Firewall that StarCraft was one of those programs it could deliver outside packets to. So next time we won't have to get the serial cable out. But I'm pretty happy that the option was there...

Last updated: | path: tech | permanent link to this entry

Tue 4th Aug, 2009

Understanding the Chinese Room

The Chinese Room argument against strong AI has always bothered me. It's taken me a while to realise what I dislike about the argument and to put it into words, though. For those of you who haven't read up on this, it's worth perusing the article above and others elsewhere to familiarise yourself with it, as there's a great deal of subtlety in Searle's arguing position.

Firstly, he's established that the computer program, as is, comfortably passes the Turing Test, so we know it's at least an artificial intelligence by that standard. Then he posits that he can perform the same program by following the same instructions (thus still passing the Turing Test), even though he himself "doesn't understand a word of Chinese". Then he proposes that he can memorise that set of instructions to pass the Turing Test in Chinese in his head, and still doesn't understand Chinese. If he can do that while not understanding Chinese, then the machine passing the Turing Test doesn't "understand" Chinese either.

So. Firstly, let's skip over the obvious problem: that the human trying to perform the computer program will do it millions of times slower. This speed is fairly important to the Turing Test, as we're judging the computer based on its ability to interact with us in real time - overly fast or slow responses can be used to identify the computer. A human that's learnt all the instructions by rote and follows them as a computer would still, I'd argue, be identifiably slow. We're assuming here that the person doesn't understand Chinese, so they have to follow the instructions rather than respond for themselves.

And let's skip over the big problem of what you can talk about in a Turing Test. Any system that can pass that has to be able to carry on a dialogue with quite a bit of stored state, has to be able to answer fairly esoteric questions about their history or their current state that a human has and a computer doesn't (e.g. what did you eat last, what sex are you, etc). I'm skipping that question because it's an even call as to whether this is in or out for current Turing Test practice: if an AI was programmed with an invented personality it might be able to pass this in ways a pure 'artificial intelligence' would not. It's a problem for the Chinese Room, because that too has to hold a detailed state in memory and have a life outside the questioning, and the example Searle gives is of a person simply answering questions and not actually carrying on some external 'life'. ("Can I tell you a secret later?" is the kind of thing that a human will remember to ask about later but the Chinese Room doesn't say anything about).

It's easy to criticise the Chinese Room at this point as being fairly stupid. You're not talking to the person inside the room, you're talking to a person inside the simulation. And the person executing all those instructions, even if they're in a high-level language, would have to be superhumanly ... something in order to merely execute those instructions rather than try to understand them. It's like expecting a person to take the numbers from one to a million in random order and sort them via bubble sort in their head, whilst forbidding them from just saying "one, two, three..." because they can see what the sequence is going to be.

To me the first flaw in Searle's argument is the assumption that his person in the room could somehow execute all those instructions without ever trying to understand what they mean. If nothing else, trying to learn Chinese is going to make the person's job considerably easier - she can skip the whole process of decoding meaning and go straight to the 'interact with the meaning' rules. Any attempt by Searle to interfere here and say that, no, you're not allowed to do that undermines his own claim that the person doesn't understand Chinese - if he makes her too simple to even understand a language, then how does she read the books; if he makes her incapable of learning then how did she learn to do this process in the first place, etc. So Searle's judgement that the AI doesn't really "understand" because the person in the room doesn't "understand" rests on the sophistry that you can have such a person in the first place.

But, more than this, the fundamental problem I have is that any process of trying to take statements / questions in a language and give responses to them in the same (or any other) language is bound to deal with the actual meaning and intelligence in the original question or statement. It's fairly counterintuitive to make an AI capable of interacting in a meaningful way in Chinese without understanding what makes a noun and a verb, understanding its rules of tense and plurality, or understanding its rules of grammar and structure and formality. If Searle would have us assume that we've somehow managed to create an AI that can pass the Turing Test without the programmers building these understandings of the actual meaning behind the symbols into the program, then I think he's constructed somewhat of an artificial (if you'll forgive the pun) situation.

To try and put this in context, imagine the instructions for the person in the room have been written in English (rather than in Python, for example). The obvious way to write this Chinese Room program, therefore, is by having big Chinese-English and English-Chinese dictionaries and a book of rules by which the person pretends that there's another person (the AI) answering the questions based on the English meaning of the words. I argue here that any attempt to obfuscate the process and remove the use of the dictionaries is not only basically impossible but would stop the Chinese Room being able to pass the Turing Test. It's impossible to remove the dictionaries because you're going to need some kind of mapping between each Chinese symbol and the English word that the instructions deal with, if for no other reason than that Chinese has plenty of homographs - symbols which have two different meanings depending on context or inflection - and you need a dictionary to distinguish between them. No matter how you try to disguise that verb as something else, you'll need to put it in context so that the person can answer questions about it, which is therefore to make it meaningful.

So once you have a person capable of learning a language, in a room where symbols are given meaning in that language, you have a person that understands (at some level) the meaning of the symbols, and therefore understands Chinese.

Even if you introduce the Python at this point, you've only added an extra level of indirection to the equation. A person reading a piece of Python code will eventually learn what the variables mean no matter how obscurely the code is written - if we're already positing a person capable of executing an entire program literally then they are already better than the best maintenance programmer. If you take away this ability to understand what the variables mean, then you also (in my view) take away the ability for the person to learn how to interpret that program in the first place.

Searle's argument, therefore, is based on two fallacies. Firstly, that it's possible to have a human that can successfully execute a computer program without trying to learn the process. Secondly, that the program will not at some point deal with the meaning of the Chinese in a way that a person would make sense of. So on both counts Searle's "Chinese Room" is no argument against a machine intelligence "understanding" in the same way we understand things.

What really irritates me about Searle's argument here - and it does not change anything in my disproof above - is that it's such an arrogant position. "Only a real *human* mind can understand Chinese, because all those computer thingies are really just playing around with symbols! I'm so clever that I can never possibly learn Chinese - oh, wait, what was that?" He's already talking about an entity that can pass the Turing Test - and the first thing I would argue about that test is that people look for understanding in their interlocutors - and then says that "understanding" isn't there because it's an implementation detail? Give me a break!

And then it all comes down to what "understand" means, and any time you get into semiotics it means that you've already lost.

Last updated: | path: tech / ideas | permanent link to this entry

Thu 2nd Jul, 2009

Look Mum, no bugs!

I recently encountered a bug in RhythmBox where, if you rename a directory, it thinks that all the files in the old directory have disappeared and there's a whole bunch of new files. You lose all the metadata - and for me that was hours of ratings as I worked my way through my time-shiftings of the chillout stream of Digitally Imported. Worse, if RhythmBox was running during the rename, when you try to play one of those files that has 'gone missing' it will just say "output error"; when you restart it because (naturally) you think it's borked its codecs or something, it then removes all those previous entries (giving you no chance to fix the problem if you'd just renamed the directory in error).

I decided to try to be good, so I found the GNOME bugzilla and tried to search for "directory", or "rhythmbox", or anything. Every time it would spend a lot of time waiting and then just finish with a blank page. Deciding that their Bugzilla was hosed, I went and got a Launchpad account and logged it there. Then, in a fit of "but I might have just got something wrong", I went back to the Bugzilla and tried to drill down instead of typing in a keyword.

Lo and behold, when I looked for bugs relating to "Rhythmbox", it turned up in the search bar as product:rhythmbox. Sure enough, if I typed in product:rhythmbox summary:directory then it came up with bugs that mentioned 'directory' in their summary line. If you don't get one of those keywords right, it just returns the blank screen as a mute way of saying "I don't know how to deal with your search terms".

So it would seem that the GNOME bugzilla has hit that classic problem: developer blindness. The developers all know how to use it, and therefore they don't believe anyone could possibly use it any differently. This extends to asserting that anyone using it wrong is "obviously" not worth listening to, and therefore the blank page serves as a neat way of excluding anyone who doesn't know the 'right' way to log a bug. And then they wonder why they get called iconoclastic, exclusive and annoying...

Sadly, the fix is easy. If you can't find any search terms you recognise, at least warn the user. Better still, assume that all terms that aren't tagged appropriately search the summary line. But maybe they're all waiting for a patch or something...
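As a rough illustration of that fix, here is a minimal sketch in Python. It is not how Bugzilla is actually implemented: the function name `parse_query`, the `KNOWN_FIELDS` set and the warning wording are all invented for this example. It only shows the idea of treating untagged terms as summary searches and warning about tags it doesn't recognise, rather than returning a blank page.

```python
# Hypothetical sketch: none of these names come from Bugzilla itself.
KNOWN_FIELDS = {"product", "summary"}  # assumed list of recognised tags


def parse_query(query, default_field="summary"):
    """Split a search string into (field, value) pairs plus a list of warnings.

    Untagged words are searched for in the default field. Words with an
    unrecognised or empty tag produce a warning and are also searched for
    in the default field, so the user always gets some kind of result.
    """
    terms = []
    warnings = []
    for word in query.split():
        field, sep, value = word.partition(":")
        if not sep:
            # No "field:" prefix at all: assume the user means the summary.
            terms.append((default_field, word))
        elif field.lower() in KNOWN_FIELDS and value:
            terms.append((field.lower(), value))
        else:
            # Tell the user what happened instead of silently failing.
            warnings.append(
                f"unrecognised search term {word!r}; "
                f"searching the {default_field} for it instead"
            )
            terms.append((default_field, word))
    return terms, warnings
```

With this, `parse_query("product:rhythmbox summary:directory")` keeps both tags, while a plain `parse_query("rename directory")` quietly becomes two summary searches.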

Last updated: | path: tech / web | permanent link to this entry

Mon 29th Jun, 2009

SELinux for SLUGs

Last Friday I gave two talks at the Sydney Linux Users Group, at the new Google offices. It was a pretty full-on day, as I'll explain in another post, and I was keen to get to the meeting pretty quickly. Fortunately the light rail in Sydney is pretty good, and finding one's way from the Star City stop to the offices was pretty easy. I happened to meet two people who were also going to the meeting, a lady escorting her young nephew (if I recall correctly) - she came and asked me if I knew where the Linux group meeting was. I talked with them for a bit on the tram, but I'm sorry to them if I was a little distracted - my thoughts were on getting to the meeting, getting set up correctly and giving the talk.

We arrived in the twilight zone between the day, when the lifts allow you to get to any floor without a pass, and the night, when the SLUG Google employees were shuttling people up to the fifth floor. So we climbed the ten flights of stairs - I was in need of a bit of a stretch. I then picked up my name badge - they were using Anyvite, so they could print out named labels easily for those that had bothered to RSVP on the site beforehand. I had a brief bit of hesitation when my laptop shut down because it thought it was out of power, a curious interaction between the failing battery and Fedora 11, but all came good. Then it was time to work out how to get connected to the projector.

This was the source of two startling discoveries. Firstly, Fedora 11's screen detection now works pretty much seamlessly - if you plug in a new screen and click the 'Detect Monitors' button, it just finds the new output on the VGA port and sets it up appropriately. Secondly, Open Office 3.0 has a 'presenter' mode that can take advantage of two screens and display your 'now and next' screen on your laptop screen while the projector just displays the current slide in all its streamlined beauty. This was one of those "Wow, It Just Works™" moments where you see how fast the pace of Linux development really is - I was all ready with arcane xrandr voodoo but this just worked perfectly.

Sadly, due to slight cabling problems my laptop was sitting on a server cabinet six metres away, but when I muttered to the nearest person that what I needed right now was a wireless presenter device, the same guy just pulled one from his bag and handed it to me. Whoever you are, you really made my day - thanks! Still, I would be deprived of the handy 'now and next' view and would occasionally have to look over my shoulder to make sure I was talking about the right thing. I'd practised both talks beforehand, so I was able to move on fairly smoothly. If you're going to do presentations, you have to do this - reading off your slides or looking at the screen to see where you are is really embarrassing.

The two talks went well, though I didn't receive anywhere near the amount of heckling that the CLUG people gave me when I gave the same talks. The questions asked were generally quite insightful, and I had to think hard about my answers. I remembered to restate the question for the microphone, and got to give two T-shirts to people who asked good questions. So overall I was pretty pleased about how it went.

I was talking with Andrew Cowie after the talk, and he gave me some very useful advice for approaching talks in the future. After you've done your initial bit of research working out who you're talking to and what level you should pitch your talk at, you really just have to go for it. I'd been worried that it might be too technical for some and not technical enough for others - and it was, of course; the point is that that's not really my problem. There always will be that spectrum of knowledge in the people attending a talk at a volunteer organisation, and it's not the presenter's problem to try and cater for everyone. You simply have to do the best you can and reach the most people you can, and not worry about whether you've got everyone interested.

After the talk I got to spend a bit of time with Andrew talking about trades and professions, what makes good meetings and presentations, and many other things that are now lost in the blur that that Friday became. He's an excellent speaker and, like me, wants to see people doing the right thing - being moral and ethical in all their dealings. I also have a small envy of his globetrotting ways, and admire his ability to write Java in Eclipse as fast as he can think about it, so it was good to get a chance to talk to him for an extended time rather than the usual 'nod in the corridor' meetings we've had in the past.

Overall, a good night. I've put both the SELinux for Beginners and SELinux for Sysadmins talks up on SlideShare for people to read.

Last updated: | path: tech | permanent link to this entry

Fri 5th Jun, 2009

What we owe Microsoft

Strangely, over the last month or two I've had a couple of people pose the idea to me that the computer industry should be thankful to Microsoft for producing Windows. One person stated that they keep us computer support people in a job; the other said that Microsoft's development of Windows was such an outstanding achievement that we should allow them to dictate how we use our computers and how other software companies interface with Windows. I've tried to debate rationally about these issues, given that us people who use Free Open Source Software see them as akin to chocolate on a fishing hook with a shotgun aimed at it - they're bait, and when you take it you're going to end up hurt, but all the same... it's chocolate...

Let's start by saying that these arguments, to me, make no sense. Microsoft is a convicted criminal, an abusive monopoly that has lied, cheated, bankrupted, threatened, bullied and undermined its way to the top by killing off competition wherever it could. It's done all this not because they want to improve things for the users - although they've said this, that's why they're liars - but simply to maximise profits. It's obvious from everything they do that they see themselves as the 400-kilo alpha male silverback gorilla in the software industry, and that they should be able to do whatever they like with no justification. You can and they do, of course, attach justification to everything but it's merely the covering on a deeply abusive relationship. Saying that we owe them anything is like saying people deserve to be raped - it's literally unthinkable to me.

On the other hand, the people that have espoused this "thank you Microsoft" point of view have valid points, so it always seems like it's worth trying to examine them rationally. For example: yes, Microsoft is not the only company to do naughty things to competitors and even the alleged friend of us FOSS zealots - Google - is at base a company for making money. Yes, we all hold the dream of having an invention that changes the world and being recognised for it. Yes, standards are a good thing and having a unified desktop has helped developers create software in a way that having many competing operating systems would make difficult. Yes, Ford doesn't 'need' to consult its competitors or manufacturers of after-market accessories for their products if they want to change some detail of how their car is designed.

There are two problems with all of these things. Firstly, they're superficial - the comparison breaks down if you follow the analogy through. Ford doesn't need to consult its competitors explicitly because it already does implicitly - Ford knows that it has to offer competitive features or be left behind. Just because Google puts prices on ads and puts them in our faces and does deals with companies for where their listing is going to sit in various searches doesn't justify Microsoft's behaviour. Ideas are not monopolies and should not imply a monopoly on their execution.

But, more fundamentally, the problem with all of these things is that they miss a fundamental point that Free Open Source Software people realised early on: collaboration works much much better than competition. We live in a community of people; we live in societies with shared goals and ideals, ethics and pastimes. Many things in life are not zero-sum games, and to portray everything as a win-or-lose, black-or-white scenario is not just incorrect, sometimes it's actually a form of cheating.

This point was driven home to me this afternoon when, just after having come out of an hour-and-a-half debate with one such Microsoft apologist, I read this exchange and it made sense. You simply cannot compare software to the real world, and you simply cannot compare the entire FOSS suite - the work of tens of millions of people all over the world in every profession and every category - with any other physical entity. Trying to stick to the car analogy is pointless because we're using a car, given to us for free, built by a whole range of people, which contains every possible combination of driving performance, style, comfort and efficiency - simultaneously! For free! And I can give you exactly the same car with minimal effort, absolutely legally. There's literally no analogy to it.

My observation here is that the car analogy, and many of the other analogies that get used to describe software and how it works, suits people who still believe in the politics and economics of scarcity. From the perspective of these analogies, Free Open Source Software makes no sense because it doesn't fit in the analogy. Strangely the people espousing these points of view don't see this as a sign that their analogy is broken, they see it as a flaw in the reasoning for Free Open Source Software.

Let's make it clear: Free Open Source Software works on the four principles of freedom espoused by the Free Software Foundation. They allow you to get software, use it, fix it if it breaks and improve it if you can, and share those improvements with other people. The key point unstated there which I think the Microsoft apologists are missing is that all this works on a community of sharing. The four freedoms make sense for you as an individual, but they are absolutely no-brainer logical when you are part of a large community of people that can help each other. Proprietary software's principles only make sense when you are an individual, without any connection to anyone else using the software but with only the connection to the software vendor. Even before the internet that was untrue; the internet merely made the processes of being in a community - communication, contribution, sharing and co-operation - available to an audience many orders of magnitude larger.

In the heat of the moment, though, I did think of one fundamental flaw in the car analogy that caused my interlocutor to reconsider his position. It would be a completely different thing for Ford to change the specifications of how after-market gadgets fit on their cars if Ford made 80% of the cars on the market and (most importantly) if the size of the after-market parts industry was ten to a hundred times the size of Ford's business. In that light it doesn't look like fair play to change their specs so that they can sell more brake-pads or steering wheel covers and everyone else has to go back to the drawing board for six months. But, since that breaks the analogy, it might not have made sense...

Last updated: | path: tech | permanent link to this entry

Sat 30th May, 2009

Canberra Linux Users Group monthly meeting for May 2009

The first of many CLUG Linux Learners Meetings!

The meeting was, in general and in my opinion, a success. Lana Brindley gave the first talk, entitled "10 Reasons Why You Do Not Want To Install Linux. Ever.", which was (no surprises) really a "10 old myths about not using Linux and why you should ignore them" talk. It was clever, well presented and covered all the things Linux users get tired of explaining. Several times Lana would pose a myth about Linux and people would automatically call out objections or corrections - which I take to be a good sign that her talk dispelled the myths that us enthusiasts want cleared away.

My talk, unfortunately, I feel was doomed from the start. It was "Paul's Ten Tips About Bash", and the content was definitely useful to some people - and I think it says a lot about Linux users that even the most learned people in the room still learnt a few tricks and mentioned some that I didn't know. However, it wasn't a talk for everybody, and importantly it contrasted with Lana's number one point: that Linux is perfectly possible to use without ever coming near the command line. My disappointment is that I didn't think of this earlier - I got carried away by my own geekiness. It should have been "Paul's ten tips on avoiding the command line", which would have been something that many more people learned from. Heck, I could have learnt a lot putting that talk together.

I'll do it at the next Linux Learners meeting, which will be in August (I think we'll set a schedule of doing them every three months and see how that goes).

Last updated: | path: tech / clug | permanent link to this entry

Mon 25th May, 2009

Canberra Linux Users Group Install Fest for May 2009

The Canberra Linux Users Group invites anyone in the Canberra region to come along and learn more about Linux. We help you install Linux on your computer, we teach you about how it works and the way to get around a modern Linux desktop, and we help you fix the problems that have been nagging you with your Linux system in the past. We can even demonstrate what it looks like and why it's not as dangerous as it sounds if you're not ready to install it just yet.

There's more information on my website, so please feel free to email me with your questions. If you want to come along and are interested in having a sausage for lunch, please also drop me an email by Friday!

Last updated: | path: tech / clug | permanent link to this entry

Canberra Linux Users Group monthly meeting for May 2009

The first of many CLUG Linux Learners Meetings!

This meeting is a 'fixfest' and learner session for people new to Linux or still finding their way around. (That's most of us!) We'll be having short talks about a variety of subjects but the majority of the night will be given over to people helping other people fixing problems and learning their way around Linux.

Lana Brindley will be starting the night with a talk entitled "10 Reasons Why You Do Not Want To Install Linux. Ever." and with a provocative title like that you can tell it's going to be interesting! Paul will then give a short talk on tips he's learnt in using Bash, the current standard command line shell in Linux.

You're welcome to bring your computer along but please email me beforehand so I can get an idea of the numbers of machines involved.

Last updated: | path: tech / clug | permanent link to this entry

Sat 23rd May, 2009

Installing and debugging Minimyth for fun and profit

In the ongoing quest to save power, I am moving my MythTV backend onto my low-power home web server, and building a new low-power frontend based on a fanless Via EPIA M board and the Minimyth custom Linux distro. This is a cut-down system designed to boot off TFTP or NFS, but can also be adapted to boot off a CompactFlash card, which is what I'm doing (my firewall has a DHCP server but the web interface doesn't allow me to set a TFTP boot option, and I can't be bothered to work out PXE boot just this moment).

It's a very neat little package, with everything you could want compiled in, but this also means that setting it up is a rather complicated process. First you write a specialised config file, which has to go in a specific place. Then you unpack the various bits and pieces onto the CF card and run syslinux on it to make it bootable. Then you stick it in the machine and boot it, and if anything goes wrong you telnet (yes, telnet) into it and peruse its /var/log/messages to find out what went wrong. Then you take the CF card out, fiddle with the contents a bit on your own computer, plug it back in and see if that helped.

This is made somewhat frustrating by the lack of examples and somewhat minimalistic approach to explanation that the minimyth documentation takes. It also leans more toward the network boot process (to have no flash drive in the machine at all, and to allow you to run a writable root partition) and covers the flash install side somewhat minimally. The site also doesn't provide a sample / working / minimal minimyth.conf file, so you have to google around and womp up one on your own. Add to that the minimyth machine's habit of only bringing up its telnet connection (yes, telnet) about two minutes after you've booted it,

I started this nearly a week ago, and various delays and frustrations have prevented me from documenting all the steps. But I'll try to get more of the process documented soon.
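One small thing that takes some of the tedium out of the boot-and-check cycle described above is to have a script wait for the telnet port to come up rather than retrying by hand. This is only a sketch using Python's standard `socket` module: the function name `wait_for_port` and the default timings are my own choices for illustration, not anything Minimyth provides, and the address in the usage note is made up.

```python
import socket
import time


def wait_for_port(host, port, timeout=180.0, interval=5.0):
    """Poll until host:port accepts a TCP connection.

    Returns True as soon as a connection succeeds, or False once
    `timeout` seconds have passed without one. Each attempt waits at
    most `interval` seconds, and failed attempts are followed by a
    pause of `interval` seconds before trying again.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError if the port is closed
            # or the machine isn't answering yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("192.168.1.50", 23)` (substitute whatever address your frontend actually gets) returns once the telnet service is answering, at which point you can log in and read /var/log/messages.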

Last updated: | path: tech | permanent link to this entry

Mon 18th May, 2009

CLUG Programmers SIG for May 2009

Last Thursday night we had Paul Fenwick from Perl Training Australia giving two talks - "The Art of Klingon Programming" and "Awesome Things You've Missed In Perl". I'd seen the first one before at OSDC 2008, but Paul is still constantly improving this talk so it's still worth seeing again. In particular it's the examples that often have me saying "of course!" and "how clever", as Paul introduces some subtle new feature or variation on what he's said before that gives a new perspective on autodie's usefulness.

"Awesome Things You've Missed In Perl" is a good way of updating seasoned Perl coders to the new things you can do with recent versions of Perl. It's more about the modules that have come out in recent years that make writing Perl much easier, such as Moose, autobox, and autodie (of course). But there's still much that Paul mentions that exists in Perl 5.10 that makes a coder's life easier; given/when, the smart match operator and named captures just for starters. For us people that still have a Camel book second edition on their desks (somewhat guiltily), it's an excellent refresher and reminder to get with the times. It also makes the transition to Perl 6 much easier.

The meeting was very well attended - 18 people - including three from the class that Paul was teaching. For me it was a wonderful "small world" moment as a friend of mine from Melbourne happened to be at the course - of course, my embarrassingly useless people memory caused me to have to ask her name. But it really was quite wonderful to see Louise again, albeit somewhat briefly. The main programmer for the water resources project that Paul was tutoring was interested to learn about the Canberra Perl Mongers group and will hopefully join and come along as a regular participant.

So, thank you Paul Fenwick for making this a really great night!

Last updated: | path: tech / clug | permanent link to this entry

Wed 25th Mar, 2009

Ada Lovelace Day 2009

I feel moved by the many posts on Ada Lovelace Day to mention a woman I know who has inspired me in my love of computing and really taught me that there are no limits to what you can do with a bit of perseverance. Many of the people talked about in the posts I've linked to above are all deserving in their own right, and many of them are well-known for their great works in the field. Maybe I'm biased in talking about the person who has inspired me but I think it's justified.

I want to tell you about my mum, Jane Fountain.

I can't link to her web page, because she doesn't have one. Her main activities on computers have been emailing friends, doing the job of secretary for the Society for Growing Australian Plants, and archiving her digital photos; she still occasionally struggles with the technology. It's been several months since her mobile phone account expired and she's only recently noticed - so she might not initially appear to be a good person to think of when dealing with technology.

But: my mum works as a teacher aide at the Chapel Hill State Primary School. Two years ago they got some Lego Mindstorms, and of course the teachers and teacher aides are supposed to teach the children how they work. So mum - on her own initiative - took the kit home, learnt the manual, and programmed it in the living room over a weekend. She taught herself how to program them, how to debug them, and how to program - having never learnt a programming language or gotten closer to programming them than watching her sons.

And, over the years, she's encouraged my brother and me to learn about computers and to use our abilities to their fullest. She might doubt her abilities sometimes but when she has a task to do she applies herself with a will. Fifteen years ago I can remember her remarking about a friend who knew the botanical names for plants and she stated that she'd never be able to do that - now she not only knows the botanical names for everything she plants but she gets frustrated with people who say it's difficult! She's artistic, intelligent, skilled at a huge range of crafts, and every time I visit she shows me some new clever thing she's thought of to help the children at her school learn and enjoy learning. And she's humble into the bargain :-)

In short, my mum continually inspires me to learn more, to apply myself, to never say I can't do it, and to stick to it when I do. I think those are qualities Ada Lovelace herself would have admired.

Last updated: | path: tech | permanent link to this entry

Tue 24th Mar, 2009

The intangible smell of dodgy

My Dell Inspiron 6400 has been a great laptop and is still doing pretty much everything I want three years after I bought it. I fully expect that it will keep on doing this for many years to come. Its battery, however, is gradually dying - now at 39% of its former capacity, according to the GNOME power widget. So I went searching for a new battery.

I came across the page, which I refuse to link to directly. It looks good to start with, but as you study the actual text you notice two things. Firstly, it looks like no person facile with Australian English ever wrote it - while I don't mind the occasional bit of Chinglish this seems more likely to have been fed through a cheap translator program. Secondly, it seems obvious that the words "Dell Inspiron 6400 laptop" have been dropped into a template without much concern for their context. Neither of these inspires confidence.

I was briefly tempted to write to the site contact and mention this, but as I looked at some of the other search results it became increasingly obvious that this was one in a number of very similar sites, all designed a bit differently but using the same text and offering the same prices. This set off a few more of my dodginess detectors and I decided to look elsewhere.

Last updated: | path: tech / web | permanent link to this entry

Fri 13th Mar, 2009

The Helpful Internet 0002

One of my work duties is to set up Nagios monitoring on our servers. I intend to use the Nagios Remote Plugin Executor plugin - 'nrpe' - and didn't want to futz around on work servers possibly stopping things from working correctly, so I set it up at home. (Yes, of course I have two servers at home, doesn't everyone?) I was following the handy guide to setting it up, when I hit this error (running nagios -v nagios.cfg to verify the configuration):

Error: Invalid max_attempts, check_interval, retry_interval, or notification_interval value for service 'CPU Load' on host 'media'
Error: Could not register service (config file '/etc/nagios/hosts.cfg', starting on line 41)

There's no setting you can get correct on this; changing the values doesn't seem to work, and if you remove one of them you get warned that they are required. Reading the version 2.0 documentation tells you that they're required but even if you obey it the above command still gives you the warning. This, by the way, is a hint that the problem lies elsewhere.

A bit of Googling found a few people with this error and not much more; one had someone 'helpfully' pointing out that the retry_interval keyword was only used in version 2 (which, of course, I was using). Our local Nagios expert came over and had a look, tried all the things I'd tried, and declared it unsolvable. After a bit more fiddling, I noticed that the service definitions in the PDF examples use the generic-service template, but the definitions in the localhost.cfg file (supplied by the EPEL nagios package) used local-service. I changed it in my new hosts configuration.

It worked.

And there it was, a second host in my Nagios display. Things looked even better after working out that, by default, nrpe's configuration doesn't allow commands to be given parameters by the server (as a plug for the obvious security hole), and therefore one had to set up specific command definitions for each command you wanted (rather than the standard Nagios configuration, which is to configure them in the service definition).

So the summary is that Nagios is a powerful tool, and its documentation really needs some tender love and care. I mean, its standard install instructions for Fedora ignore any possibility of packages and install from source, and then disable SELinux. On a server! I shudder to think what the other parts of the source package contain - maybe the CGI is set up to allow all users by default.

Last updated: | path: tech | permanent link to this entry

Thu 26th Feb, 2009

Error: insufficiently sincere headdesk

I discovered one of the servers that I manage had been placed in my hands with SELinux turned off. This, when SELinux is available, is a mistake, because if you ever need to turn SELinux on again you will find that nothing in the file system has the correct SELinux contexts, and everything will fail. Since this was a server that didn't have anything important running on it, I decided to reboot it with the /.autorelabel file in place and SELinux in permissive mode, thus re-establishing the contexts while the server wasn't heavily relied on.

After an hour with the server not appearing on the network, I started wondering what had happened. The console, it turns out, was displaying this message:

Creating root device
Mounting root filesystem
mount: error 6 mounting ext3
mount: error 2 mounting none
Switching to new root
switchroot: mount failed: 22
umount /initrd/dev failed: 2
Kernel panic - not syncing: Attempted to kill init!

Googling mount error 6 showed mostly complaints of disk and file system drivers not being compiled correctly, but further up the page (fortunately still on screen) I could see that the correct modules were being loaded. Pulling out my trusty rescue CD, I did a file system check. All fine. I tried recompiling the initrd file. All fine. I looked at the LVM information. All fine. Something niggled at the back of my head. I looked at the /boot/grub/grub.conf file. Then I discovered my error.

Concurrent to this reboot, I had renamed the logical volumes from the unhelpful (e.g. "LogVol00") to the sensible (e.g. "root_lv"). This can be done while the system is live (I discovered) so long as you also edit the /etc/fstab file to rename any device names. However, I hadn't renamed the devices in the Grub configuration - I usually make sure that the volumes are labelled and use root=LABEL=/root in the kernel parameters to set where the root volume is. So the kernel was looking for a non-existent block device and failing with the unhelpfully-named "mount error 6".

After a brief bit of enthusiastic head slapping, I fixed the Grub configuration, rebooted and all was well. Hopefully the next time someone encounters this error, they'll find this information useful.

Last updated: | path: tech | permanent link to this entry

Wed 18th Feb, 2009

The 'helpful' internet

I've been wrestling with the new VMWare Server 2.0 interface for a number of months now. Thwarting me has been the use of a "web" interface and its complete failure to be able to log in. Each time I try I've been presented with the dreaded and inexplicable error:

"The server is not responding. Please check that the server is running and accepting connections."

The 'help', if it can so be called, that people on the internet have provided for this has ranged from disabling SELinux, through disabling IPv6, to fiddling with various configuration in everything from VMWare to the /etc/services file. Almost universal in these pages is the fact that the person has done a couple of things (put some new line into /etc/vmware/locations, disabled iptables, installed xinetd) but none of them worked. Users are left either cluelessly following all of these things, hoping in a cargo-cult way that one of them might do the trick, or (like me, who eschews such thoughtless ritual) wondering what the real answer is.

Well, I finally found one suggestion that worked, for Fedora 10 on i386 at least. I can't say it will fix your problem, but it's at least a start. And it does seem to make sense with respect to what the problem seems to be - a failure in some part of the authentication process. Simply edit your /etc/pam.d/vmware-authd file to include these lines:

auth		required
account		required

I'm not entirely sure what this does, which probably puts me back in the cargo-cult camp.

You will also need to fix a couple of the libraries' SELinux permissions to allow VMWare to use libraries which require text relocation:

chcon -t textrel_shlib_t '/usr/lib/vmware/hostd/'
chcon -t textrel_shlib_t '/usr/lib/vmware/vmacore/'
And, once you've installed the VMWare viewer plugin for Firefox:

find /root /home -name -print0 | xargs -0 chcon -v -t textrel_shlib_t

Hopefully this will help a few people who want to get VMWare Server working on Fedora 10 without having to do crazy stuff (e.g. disable SELinux).

Last updated: | path: tech | permanent link to this entry

Wed 28th Jan, 2009

LCA flies by

In certain circumstances, bringing an airplane, the sun and some clouds into the proper relationship will show you an interesting phenomenon - a ring of brighter cloud, centred on the shadow of the plane. This happens at the angle where the crystals in clouds perfectly reflect the incident light back to you, and I'd love some optics physicist to explain it to me one day. But it has the unusual property, if you are close enough to the clouds, of focussing that small band - every detail of that area stands out. Individual filaments of cloud are shown to you before you swiftly move on to the next. If you watch one bit it fades into dullness and its detail is lost, but if you keep your eye moving every part of the cloud has its own delicate, infinitely detailed beauty.

I found myself in just such a conjunction of plane, sun and cloud on my flight back from Hobart to Melbourne after Linux Conference Australia, still dazed by the early morning start to get to the six o'clock plane. In this contemplation-conducive state, I thought the image above was a good metaphor for the conference overall - each little bit brilliant but fading when compared to the next bit of brilliance, and the overall brilliance only capturable in the human mind, where the individual experiences can be overlaid rather than replaced and forgotten as in a movie.

I'll stop trying to wax lyrical, and while lyrical waxes someone else I'll note down some highlights of the whole week of fun.

While it was a bit of a slog up the hill to the college from the Uni, it wasn't too hard and certainly got a few of us a bit fitter, myself included. The rooms were very nice, and despite being shunted out of my original room with other Canberrans I got to meet a bunch of new people which I always enjoy. Special thanks to Ian Beardslee for whiskey and perspective.

The venues were pretty good, but the fact that speakers had to hold radio mikes up to their faces led to a lot of pretty variable audio. Some people, like Tridge, Jeff Waugh, and Rusty already know how to project well - others were a bit shyer and/or uncertain how to speak to a microphone. The trick is to have it up near your chin - close enough to pick up every sound, but out of the direct breath path so that your 'P' sounds don't pop. The main point is that you are trying to get your spoken words across to everyone in the room and on the video, and that is much more important than feeling embarrassed. And never, ever blow into the microphone to test if it's on - tap it or scrape the mesh on the top instead. There's much less chance of damaging the pickup that way, or having an audio professional decapitate you with your own shirt for maltreating their equipment.

Being a speaker for the first time, I was really blown away with how well they treat speakers at LCA. You get picked up at the airport, you get your own (speakers) dinner and you get to go to the Professional Delegates Networking Session. So not only did I get to go to two very nice places to eat and see some of the attractions around Hobart, but I also got to pretend to be a professional. Being a part of the process that makes LCA great - the talks - is pretty awesome too. And having people talk to and email you afterward about the topic and ask more questions and have more discussion is even better. Still very happy with that.

However. In order to really rock as a speaker giving a "here's the coding project I've been working on" talk, I think you need one simple thing: results. There were a couple of talks - the High Def H.264 decoding in Intel GPU talk for example - that gave an overview one might give to technical management and showed us almost nothing in the way of actual code or working software. Compare this with the CELT talk, where Tim not only demonstrated why the code was so clever and why low latency was important, but demonstrated it right there. I don't really need a working demo, but I do need to see that the code is in use by real live people, not still on the drawing board. If drawing-board projects were the criterion for a good talk I would be occupying my own day at LCA. :-)

The conference dinner was very good - buffet style wins! The fund raising was also pretty awesome - although I'm not a big fan of the whole 'auction' thing when pretty quickly it has got out of the reach of any single person in the audience, I still think that it's an excellent example of why Open Source really does rule when we can raise over $40,000 for a charity from essentially a bunch of individuals with one tangible and a few intangible prizes (pictures in the kernel, people's integrity, etc.). If anything, the guy who spoke about the disease could have talked more about the research - most of the table I was sitting with was pretty bored through the 'here's some pictures of bad stuff' part but were riveted when it came to the 'and here's why it's a technically interesting problem' part.

The laptop case cover was well received but needs some work to straighten it out and stop it from cracking. It no longer attaches to the laptop - the tension on the outer surface simply pulls the catches back off again.

A judicious balance between coffee, V and water is what kept me going for most of the conference. I've found the 700ml Nudie bottles are light, easy to use, and contain enough water to keep you hydrated. It took me most of Monday to really feel like I was fully compos mentis.

I met lots of nice people in the LUG Comms meeting and more nice people in the LinuxChix lunch. I now owe Jon Corbet two beers, as part of a "I must buy you a drink for your excellent Linux Weekly News" plan gone horribly wrong, and Steve Walsh, Cafuego, James Purser and others need to be pinned down in a bar somewhere so I can buy them beers. Jon Oxer and Flame (who really should be called Black Flame) were excellent value, the keysigning was underpopulated but still worthwhile, and the sheer quantity of BOFs happening in spare rooms, in corridors, up trees and elsewhere was just too much for me.

The MythTV miniconference was a highlight - giving my talk at it was a lowlight because I should really have had much more technical detail; the lesson is "if you see someone suggesting a miniconference, only volunteer to talk on the subject if you have something that is at the generally high quality of Linux Conference talks". There were a few other MythTV talks that left me wanting a bit more detail, but there's no feeling quite like realising that all the technical people have left the room for your talk, and the only developer remaining is working on his presentation....

Overall, the quality of LCAs is still high, and I have no doubt that Wellington will pull out all the stops for a top-quality LCA too. If they can get their videos up a bit quicker than this year...

Last updated: | path: tech / lca | permanent link to this entry

Fri 23rd Jan, 2009

LCA - the conference that keeps on giving

I haven't really been in the right frame of mind to blog more regularly about LCA. But my current criteria for any new employer is whether they consider it important enough to my employment to be interested in sending me to LCA -

Last updated: | path: tech / lca | permanent link to this entry

Thu 15th Jan, 2009

Shared Spectrum Stupidity? or Telstra Treachery?

I'm on Internode's standard ADSL2 offering and I have a voice line through Optus. I've been watching Internode's progress to their Ultra offering, combining naked ADSL line, VOIP phone, and number portability through Optus. Finally, it arrived, and in order to procrastinate over all the things I need to do to get ready for Linux Conference Australia, I decided to try to sign up.

After a long wait (thank Bell for call back services!) I spoke to a nice young lady who, after the usual confusion over exactly what bits of documentation she wanted from me, informed me that changing over to this service would require a downtime of 10-21 days, starting at some arbitrary point in the future and continuing to whenever Telstra got around to changing my service over from being connected to their equipment to being connected to Internode's. This was something of a shock - I never go without internet access for more than a couple of days unless deliberately divorcing myself from all forms of technology - so I had to have a think about it.

This, to me, epitomises why Telstra, as a company, should basically be torn to shreds by the powers that be and thrown to the winds. Because its whole modus operandi for collecting money is basically down to these kinds of stupid extra delays and charges. If you're not a Telstra customer you have to wait until last because they have a monopoly on who can work on their copper. They'll slug you for every possible thing they can because it keeps their revenue up - we pay more in line rental to be an Optus customer than we would if we were with Telstra. And they introduce huge and unnecessary delays - it should take ten minutes to re-terminate some bits of copper, not three weeks - because it implicitly penalises you for not being a Telstra customer. You can bet that if I was a Telstra customer I'd be able to be chopped over instantly with perhaps a ten second loss of carrier.

So remember, if you're a Telstra shareholder your share value is being propped up by this kind of anti-competitive, nasty-business, strong-arm tactics. Your shares retain some kind of value only because Telstra is milking every customer for every cent that they're worth with every devious and underhanded tactic they can get away with. And remember when you voted to not allow the board to increase their salary, and they did anyway because they basically ignored a two-thirds majority? Yeah, that was good for those share prices too, eh.

I have no doubt that the line technicians and customer service people and all the low and middle ranks in Telstra are nice people in the usual ratio. I don't hate them specifically, except as a general principle of feeling that they should all just revolt against the bad reputation they get as Telstra's representatives. What I hate is the corporate bullies up in the upper echelons, who dribble broadband to us through NextG and other pitiful connections and then openly state that they could really open the throttle up for everyone but they'll only do so to drive other competitors off - in the mean time they're just quite content charging us huge amounts of money for a thing that barely earns the name 'service'.

There will be a reckoning, some day...

Last updated: | path: tech | permanent link to this entry

Tue 25th Nov, 2008

Random randomness

The SECRET_KEY setting in Django is used as a 'salt' in (one would hope) all hash calculations. When a new project is created, a piece of code generates a new random key for that site. I'd seen a couple of these and noted, in passing, that they seemed to have an unusually high amount of punctuation characters. But I didn't give it much thought.

Recently I had to generate a new one, and found a couple of recipes quite quickly. The routine (in Python) is:

from random import choice
print ''.join([choice('abcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*(-_=+)') for i in range(50)])
(Aside: Note how Python's idea of line breaks having grammatical meaning in the source code has meant making one liners is now back in style? Wasn't this supposed to be the readable language? Weren't one liners supposed to be a backward construction used in stupid languages? Is that the sound of a thousand Pythonistas hurriedly explaining that, yes, you can actually break that compound up into several lines, either on brackets or by \ characters or by partial construction? Oh, what a pity.)

Anyway. A friend of mine and I noted that it seemed a little odd that the upper case characters weren't included in the string. Maybe, we reasoned, there was some reason that they didn't include these characters (and the punctuation that isn't on the numeric keys). But, looking through the actual changeset that combined all the various salts and secrets into one thing, and looking at where the secret key was used in the code, it seems that it's always fed into the md5 hasher. This takes bytes, basically, so there was no reason to limit it to any particular character subset.

So my preferred snippet would be:

from random import choice
# All printable ASCII (32-126) except the single quote (39)
s = [chr(i) for i in range(32,39) + range(40,127)]
print ''.join([choice(s) for i in range(50)])
So you can at least read your secret key, and it doesn't include the single quote character (ASCII 39) that would terminate the string early. The update to the original functionality is in ticket 9687, so let's see what the Django admins make of it.
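For anyone on Python 3, here is a rough sketch of the same idea. It swaps `random.choice` for the standard library's `secrets` module, which is my substitution and not what the original recipe or the ticket uses. The name `make_secret_key` and the extra exclusion of the backslash are assumptions made for this sketch only.

```python
import secrets
import string

# Every printable ASCII character except the single quote (39) and, as an
# extra precaution that is not in the original snippet, the backslash (92).
ALPHABET = ''.join(
    c for c in string.ascii_letters + string.digits + string.punctuation + ' '
    if c not in "'\\"
)

def make_secret_key(length=50):
    """Return a random key drawn from ALPHABET using the OS random source."""
    return ''.join(secrets.choice(ALPHABET) for _ in range(length))

print(make_secret_key())
```

The result can be pasted between single quotes in a settings file without the string ending early.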

Last updated: | path: tech / web | permanent link to this entry

Fri 7th Nov, 2008

Don't Repeat Yourself - Next Generation

I've become a big fan of Django, a web framework that has a nice blend of Python, good design and flexibility. The template system might not appeal to the people who like to write code inside templates, but to me it forces programmers to put the code where it belongs - in the views (i.e. the controllers, to non-Djangoistas) or models. I love the whole philosophy of "Don't Repeat Yourself" in Django - that configuration should exist in one place and it should be easy to refer to that rather than having to write the same thing somewhere else. The admin system is nice, you can make it do AJAX without much trouble, and it behaves with WSGI so you can run a site in Django without it being too slow.

The one thing I've found myself struggling with in the various web pages I've designed is how to do the sort of general 'side bar menu' and 'pages list' - showing you a list of which applications (as Django calls them) are available and highlighting which you're currently in - without hard coding the templates. Not only do you have to override the base template in each application to get its page list to display correctly, but when you add a new application you then have to go through all your other base templates and add the new application in. This smacks to me of repeating oneself, so I decided that there had to be a better way.

Django's settings has an INSTALLED_APPS tuple listing all the installed applications. However, a friend pointed out that some things listed therein aren't actually to be displayed. Furthermore, the relationship between the application name and how you want it displayed is not obvious - likewise the URL you want to go to for the application. And I didn't want a separate list maintained somewhere that listed what applications needed to be displayed (Don't Repeat Yourself). I'm also not a hard-core Django hacker, so there may be some much better way of doing this that I haven't yet discovered. So my solution is a little complicated but basically goes like this:

First, you do actually need some settings for your 'shown' applications that's different from the 'silent' ones. For me this looks like:


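# SILENT_APPS (not shown here) is the tuple of applications that
# should be installed but never appear in the menu.

SHOWN_APPS = (
    ('portal', {
        'display_name'  : 'Info',
        'url_name'      : 'index',
    }),
    ('portal.kb', {
        'display_name'  : 'KB',
        'url_name'      : 'kb_index',
    }),
    ('portal.provision', {
        'display_name'  : 'Provision',
        'url_name'      : 'provision_index',
    }),
)

INSTALLED_APPS = SILENT_APPS + tuple(map(lambda x: x[0], SHOWN_APPS))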
We build the INSTALLED_APPS tuple that Django expects out of the silent and shown apps, although I imagine a few Python purists are wishing me dead for the map lambda construct. My excellent defence is a good grounding in functional programming. When my site supports Python 3000 and its pythonisations of these kinds of concepts, I'll rewrite it.
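For the purists, here is a sketch of what that rewrite might look like with a generator expression in place of the map lambda. The two tuples are cut-down, hypothetical stand-ins for the real settings, included only so the snippet runs on its own.

```python
# Hypothetical stand-ins for the real settings, just to make this runnable.
SILENT_APPS = ('django.contrib.auth', 'django.contrib.sessions')
SHOWN_APPS = (
    ('portal', {'display_name': 'Info', 'url_name': 'index'}),
    ('portal.kb', {'display_name': 'KB', 'url_name': 'kb_index'}),
)

# Same result as SILENT_APPS + tuple(map(lambda x: x[0], SHOWN_APPS)):
# keep only the application path from each (path, parameters) pair.
INSTALLED_APPS = SILENT_APPS + tuple(app for app, info in SHOWN_APPS)
print(INSTALLED_APPS)
```

Unpacking each pair into `app, info` also says what the tuple holds, which `x[0]` does not.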

So SHOWN_APPS is a tuple of tuples containing application paths and dictionaries with their parameters. In particular, each shown application can have a display_name and a url_name. The latter relates to a named URL in the URLs definition, so you then need to make sure that your index pages are listed in your application's file as:

    url(r'^$', 'kb.views.vIndex', name = 'kb_index'),
Note the 'name' parameter there, and the use of the url() constructor function.

You then need a 'context processor' to set up the information that can go to your template. This is a piece of code that gets called before the template gets compiled - it takes the request context and returns a dictionary which is added to the dictionary going to the template. At the moment mine is the file

from django.conf import settings
from django.core.urlresolvers import reverse

def app_names(request):
    """
        Get the current application name and the list of all
        installed applications.
    """
    context = {}
    app_list = []
    project_name = None
    for app, info in settings.SHOWN_APPS:
        if '.' in app:
            name = app.split('.')[1] # remove project name
        else:
            name = app
            project_name = name
        app_data = {
            'name'  : name,
        }
        # Display name - override or title from name
        if 'display_name' in info:
            app_data['display_name'] = info['display_name']
        else:
            app_data['display_name'] = name.title()
        # URL name - override or derive from name
        if 'url_name' in info:
            app_data['url'] = reverse(info['url_name'])
        else:
            app_data['url'] = reverse(name + '_index')
        app_list.append(app_data)
    context['app_names'] = app_list
    # The first path component tells us which application we are in;
    # an empty one means we are at the project's own index page.
    app_name = request.META['PATH_INFO'].split('/')[1]
    if app_name == '':
        app_name = project_name
    context['this_app'] = app_name
    return context
Note the use of reverse. This takes a URL name and returns the actual defined URL for that name. This locks in with the named URL in the snippet. This is the Don't Repeat Yourself principle once again: you've already defined how that URL looks in your, and you just look it up from there. Seriously, if you're not using reverse and get_absolute_url() in your Django templates, stop now and go and fix your code.

We also try to do the Django thing of not needing to override behaviour that is already more or less correct. So we get display names that are title-cased from their application name, and URL names which are the application name with '_index' appended. You now need to include this context processor in the list of template context processors that are called for every page. You do this by using the TEMPLATE_CONTEXT_PROCESSORS setting; unfortunately, if this isn't listed (and it isn't by default) then you get a set of four very useful context processors that you don't want to miss, so you have to include them all explicitly if you override this setting. So in your file you need to further add:
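As a sketch only, assuming Django 1.0's four default context processors and a hypothetical module path `portal.context_processors` for the `app_names` function, the setting would look something like this:

```python
# Settings fragment (sketch). The first four entries are the Django 1.0
# defaults; the last is a hypothetical path, so point it at wherever your
# app_names function actually lives.
TEMPLATE_CONTEXT_PROCESSORS = (
    'django.core.context_processors.auth',
    'django.core.context_processors.debug',
    'django.core.context_processors.i18n',
    'django.core.context_processors.media',
    'portal.context_processors.app_names',
)
```

Later Django releases moved and renamed some of these defaults, so check the settings reference for the version you are running before copying the list.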

The most inconvenient part of the whole lot is that you now have to use a specific subclass of the Context class in every template you render in order to get these context processors working. You need to do this anyway if you're writing a site that uses permissions, so there is good justification for doing it. For every render_to_response call you make, you now have to add a third argument - a RequestContext object. These calls will now look like:

    return render_to_response('template_file.html', {
        # dictionary of stuff to pass to the template
    }, context_instance=RequestContext(request))
The last line is the one that's essentially new.

Finally, you have to get your template to show it! This looks like:

<ul>{% for app in app_names %}
<li><a class="{% ifequal app.name this_app %}menu_selected{% else %}menu{% endifequal %}"
 href="{{ app.url }}">{{ app.display_name }}</a></li>
{% endfor %}</ul>
With the appropriate amount of CSS styles, you now get a list of applications with the current one selected, and whenever you add an application this will automatically change to include that new application. Yes, of course, the solution may be more complicated in the short term - but the long term benefits quite make up for it in my opinion. And (again in my opinion) we haven't done anything that is too outrageous or made

Last updated: | path: tech / web | permanent link to this entry

Tue 28th Oct, 2008

Cue high speed tape recorder sound effect

For the more technically minded, here is a brief synopsis of my criticisms of the "Clean Feed" 'initiative', sent in a letter to Senator Stephen Conroy:

  1. 1% false positive rate is way too high to be usable.
  2. 75% slower is too slow to be usable, and the faster filters have a higher false positive rate.
  3. It only blocks standard web traffic, not file sharing, chat or other protocols.
  4. If you filter HTTPS, you cripple the financial system of internet shopping, banking, and personal information (e.g. tax returns).
  5. If the Government ignores who's requesting filtered content, then those wishing to circumvent it can keep on looking with no punishment. If the Government does record who requests filtered content, then even ASIO will have a hard time searching through the mountain of false positives.
  6. We already have filtering solutions for those that want it, at no cost.
  7. Mandatory filtering leads to state-run censorship and gives an in for the Big Media Corporations to 'protect their assets' by blocking anything they like.
  8. The whole thing is morally indefensible: it doesn't prevent a majority of online abuse such as chat bullying or file trading, and it relies on the tired old 'think of the children' argument which is beneath contempt.
  9. People who assume that their children are safe under such a system and therefore do not use other protection mechanisms such as watching their children or providing appropriate support are living in a false sense of security.
Instead, the Government should either put the money toward the National Broadband Network programme, or run their own ISP with the clean feed technology to compete with the regular ISPs.

Regards, Paul.

I urge every Australian to write to Senator Conroy and/or their local Member of Parliament on this issue - it is one we cannot afford to be complacent about!

Last updated: | path: tech / web | permanent link to this entry

Tue 14th Oct, 2008

Hacking the LCA registration process for fun and, er, fun

With the way tickets went for Linux Conf AU this year gone, and getting paid today, I decided to get my registration in early. Once again I noted they had continued the fine tradition of having a random silly message per registrant. Once again I decided to hack it to make it say what I wanted it to say.

Needless to say they raised the bar this year. Up until 2007 it was just a hidden field in the form. In 2008 they added a checksum - this delayed me a good five minutes while I worked out how they'd generated it. This year they've upped the ante, including both a different checksum and adding a salt to it. Another five minutes' playing with bash revealed the exact combination of timestamp, delimiter, and phrase necessary to get a correct checksum. I am also made of cheese.

Naturally, don't bother emailing me to find out how I did it; the fun is in the discovery!

Last updated: | path: tech / web | permanent link to this entry

Thu 9th Oct, 2008

Perl threaded database query processing

In my work I've recently had to implement several pieces of code which follow this basic pattern:

  1. Retrieve data from the database
  2. Process data
  3. Store data somewhere.
Because of Perl DBI's habit (on the systems I've used) of grabbing all the data from the database into memory before actually giving it to the caller, and because that data can often get large enough to get my process swapping or killed, what this usually turns into is:

  1. Get a list of 'grouping' items (e.g. days, months, IP addresses, etc.)
  2. For each item in that group:
    1. Retrieve data from the database for that item.
    2. Process data
    3. Store data somewhere.
This runs into an unfortunate problem when the database server you're talking to takes a noticeable time to process your query - the whole thing slows down hugely. A typical slowdown I've seen is in the order of 500% - and both the database and the client processors are mostly idle during that time, as each query has to be individually fetched, processed, dumped back to the client, and then processed. It suffers the same problem if the time to process each group of data is significant - by the time you've got back to fetching the next group, the database has gone off and done other things and needs to get its disk heads back in the right place for your data.

These days we have processors capable of doing multiple things at the same time, and so it would be nice if the client could be processing rows at the same time as it's also requesting more data from the database. This is where Perl's threads and Thread::Queue libraries come in. It seems to me to be a generalisable task, so I'm sharing my first attempt at doing this in a generalisable way here. My main subroutine is:

use strict;
use warnings;
use threads;
use Thread::Queue;
use Carp;

sub Thread_Process {
    # We take one query which returns a list of items, a query which
    # returns other rows based on each of those items, and a function
    # which processes those rows.  We then run the processor function
    # in parallel to the fetching process to utilise the connection
    # to the database and keep the local processor active.
    # Requirements:
    # The item query must be executed and ready to return rows.  It
    #   can return any number of fields.
    # The rows query must be ready to be executed, and will be
    #   executed with the row_args_ref and then the items from each
    #   row in the item query in turn (as arrays).
    # The function takes as its last argument the Thread::Queue object
    #   that data will be passed through.  It must know exactly how
    #   many items it will take from each row, and that should match
    #   the number of items returned in the query.  For reasons as yet
    #   unclear, we can't pass references of any kind on the queue,
    #   so we pass the fields in each row as single scalars.  Any
    #   arguments that it needs should be given in fn_args_ref.  It
    #   should exit on receiving an undef.
    my ($items_qry, $rows_qry, $row_args_ref, $fn_ref, $fn_args_aref) = @_;
    my ($items_aref) = $items_qry->fetchall_arrayref;
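    unless (ref $items_aref eq 'ARRAY') {
        carp "Warning: got no rows from item query\n";
        return 0;
    }
    my $queue = Thread::Queue->new();
    my $thread = threads->create($fn_ref, @$fn_args_aref, $queue);
    foreach my $item_aref (@$items_aref) {
        $rows_qry->execute(@$row_args_ref, @$item_aref);
        while (my $aref = $rows_qry->fetchrow_arrayref) {
            # Pass each field of the row as a separate scalar.
            $queue->enqueue(@$aref);
        }
    }
    # Signal the end of the data, then wait for the processor to finish.
    $queue->enqueue(undef);
    $thread->join();
    return scalar @$items_aref;
}
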
A sample caller function would be:

sub Send_mail_to_everyone {
    my ($mail_handler, $template, $start_date, $end_date) = @_;
    my $servers_qry = $dbh->prepare(
        'select distinct mail_server from addresses'
       .' where birth_date between ? and ? and current = true',
    );
    my $args_ref = [$start_date, $end_date];
    # Thread_Process expects the item query to be executed already.
    $servers_qry->execute(@$args_ref);
    my $email_qry = $dbh->prepare(
        'select user_name, server, full_name, birth_date'
       .' from addresses'
       .' where birth_date between ? and ? and server = ?'
       .' and current = true'
    );
    my $mailer_sub = sub {
        my ($queue) = @_;
        # Four scalars per row; an undef in the first slot means "finished".
        while (defined(my $user_name = $queue->dequeue)) {
            my $server = $queue->dequeue;
            my $full_name = $queue->dequeue;
            my $birth_date = $queue->dequeue;
            my $email_body = sprintf $template
                , $user_name, $server, $full_name, $birth_date;
            $mail_handler->send("$user_name\@$server", body => $email_body);
        }
    };
    # Here most of the work gets done.
    Thread_Process($servers_qry, $email_qry, $args_ref, $mailer_sub, []);
}

Of course, this is somewhat of a contrived example, and gives you little in the way of feedback or error handling. But it's an example of how to use the Thread_Process subroutine. The mailer subroutine gets its $mail_handler and $template from being within the Send_mail_to_everyone routine.
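
For comparison, the same fetch-while-processing shape can be sketched in Python, the language of the Django posts on this blog. This is an illustration only, not a port of the Perl above: the names thread_process, fetch_rows, process_row and max_pending are all made up for the example, and the bounded queue is an assumption of the sketch rather than anything the Perl version does. In CPython the threads overlap the time spent waiting on the database with the processing; they do not run Python code on two cores at once.

```python
import queue
import threading


def thread_process(items, fetch_rows, process_row, max_pending=1000):
    """Fetch rows for each item while a worker thread processes them.

    items       -- list of grouping items (days, servers, ...)
    fetch_rows  -- hypothetical callable: item -> iterable of rows
    process_row -- hypothetical callable applied to every row
    max_pending -- cap on rows waiting in the queue (an assumption here)
    """
    pending = queue.Queue(maxsize=max_pending)
    errors = []

    def worker():
        while True:
            row = pending.get()
            if row is None:          # sentinel: no more rows are coming
                break
            try:
                process_row(row)
            except Exception as exc:
                # Keep draining so the producer can never block forever.
                errors.append(exc)

    consumer = threading.Thread(target=worker)
    consumer.start()
    try:
        for item in items:
            for row in fetch_rows(item):
                pending.put(row)     # blocks while the queue is full
    finally:
        pending.put(None)
        consumer.join()
    if errors:
        raise errors[0]
    return len(items)
```

Unlike the Perl queue, whole rows can be passed as single objects here, so the worker does not need to know how many fields make up a row.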

There are two problems I've discovered so far. The first is that trying to do any kind of database operation within the subroutine doesn't work, because the database handle needs to be cloned. On the systems I've tested this on, unfortunately, $dbh->clone seems to be a no-op and the DBI engine complains that a different thread is using the database handle. I've tried passing $dbh->clone to the handler function, and doing the clone inside the handler function, but they change nothing.

More annoying is the fact that the memory used by the process continues to rise even if the number of outstanding rows is constant or dropping. I haven't traced this down, and haven't really the time now, but it seems to be related to the Thread::Queue object - I've tested variations of my handler routine that reuse existing memory rather than doing undef @array and push @array, $data in the handler, and this changes little.

What I don't know yet is whether to package this up as a Perl module and start being a maintainer, or whether it's too trivial or not generalised enough to be useful for anyone but me.

Last updated: | path: tech / perl | permanent link to this entry

Wed 8th Oct, 2008

Anonymous Earpieces

I bought a "Stereo Headset Kit" at Landmark Computers in Braddon the other day. This consists of a pair of wireless headphones, a bluetooth dongle that functions as a stereo encoder, a CD, a USB cable, a charger and some basic instructions. It's almost impossible to find out exactly who made this - there's no actual brand on the box or devices and the only identifiable branding is of the USB connection software. The software does have a Linux version available but (almost inevitably) it isn't supplied - the disk is really Windows-only.

The dongle has a switch that allows it to function in either USB or Audio mode. In USB mode it is a fully-featured USB Bluetooth dongle - plugging it into my Fedora 9 install allowed me to see all nearby Bluetooth phones, computers and the headset. I haven't tried to see if I can get it to function as a Bluetooth audio device, but PulseAudio does apparently provide this. In Audio mode, it encodes input from the headphone jack on the dongle and sends it to the paired headset. This allows you to use other devices such as computers, phones and music players that don't have Bluetooth capability.

The headset supports the A2DP profile, which basically supports (reasonable quality) stereo audio over Bluetooth. The quality and the stereo separation are quite good and, although it might not be up to full studio monitoring quality, it is easily capable of delivering good quality audio for everyday use. It also supports the headset profile for phones that don't have A2DP capability, but the headphones don't have a microphone so you can't use it as a full headset.

The headphones are comfortable even after a couple of hours of use. They sit around the back of the head, and even for people who have a large skull (such as myself) they don't press in uncomfortably. They can also fold to be flat so they can easily fit in a pocket when not in use. The right speaker has a volume up/down control, a skip forward/back control (for phones that support such control) and a main button that can be used to turn the unit off and on and put it in pairing mode.

These are not a cheap device, at around $110. However, they are cheaper than many of the brand-name devices and are more comfortable than the BlueAnt X5s that I tried a while back. The larger speaker gives them a better bass response than smaller earphones and the lack of cord prevents all sorts of tangles and trip-ups. As someone who seems prone to turning away from the computer and pulling the earphones out of my ears by accident (and force), that's a good thing.

The major downside so far has been that, while they give perfectly good stereo audio between the encoder dongle and the headphones, the other two devices I've tried both have bizarre and annoying behaviour that makes them nearly unusable. My phone, a Nokia 5310 XpressMusic easily capable of A2DP, will drop them down from A2DP to Headset protocol, detectable as a flat mono signal, and then eventually (in three tests) drop them altogether, often getting wedged on the song being played back (a different song each time which has played normally otherwise, so that wasn't the problem). This wedges the headphones too, requiring a little press of the hidden reset button with a handy bent paperclip. Don't have one to hand? Too bad. (Note to young players - the little hole beneath the reset button is actually to seal the rubber protector in place, and should not be poked into unless you want to put a damaging hole in the speaker diaphragm. And you don't.)

The computer was even worse. I got the laptop bluetooth working with the instructions at and it all came through nicely. For two minutes. Then it went to Insanely Loud Mode. Once I'd dialled back all the settings in PulseAudio (which, for reasons as yet unclear, muted the audio completely at 60% main volume or 40% RhythmBox volume) it was listenable, although you could tell there was hard clipping going on somewhere before the volume reduction stage (audible as a 'crackliness' to the sound). Then, two minutes later, it dropped back to completely inaudible, and only by turning all the volumes up again did anything come out. Two minutes later it cycled back to insanely loud and kept doing this as long as I was prepared to put up with it. Adjusting the volume on the headset seemed to do little, although when I later connected it to my phone for a second test the volume on the headset was turned up very loud, and changing the volume on the headset altered the volume on the phone. So I assume that something in PulseAudio was 'helpfully' adjusting the volume, for reasons as yet unclear.

I haven't tested it with anything else that outputs A2DP via Bluetooth, so I haven't any other benchmarks to work against. But so far this is a device that works perfectly with its own adapter and appallingly with everything else; not a trait that endears it to me.

Last updated: | path: tech | permanent link to this entry

Wed 20th Aug, 2008

Error Message Hell

If there's one thing anyone that works with computers hates, it's an error message that is misleading or vague. "Syntax Error", "Bad Command Or File Name", "General Protection Fault", and so forth have haunted us for ages; kernel panics, strange reboots, devices that just don't seem to be recognised by the system, and programs mysteriously disappearing likewise. The trend has been to give people more information, and preferably a way to understand what they need to do to fix the problem.

I blog this because I've just been struggling with a problem in Django for the last day or so, and after much experimentation I've finally discovered what the error really means. Django, being written in Python, of course comes with huge backtraces, verbose error messages, and neat formatting of all the data in the hopes that it will give you more to work with when solving your problem. Unfortunately, this error message was both wrong - in that the error it was complaining about was not actually correct - and misleading - in that the real cause of the error was something else entirely.

Django has a file which defines a set of regular expressions for URLs, and the appropriate action to take when receiving each one. So you can set up r'/poll/(?P<poll_id>\d+)' as a URL, and it will call the associated view's method and pass the parameter poll_id to be whatever the URL contained. In the spirit of Don't Repeat Yourself, you can also name this URL, for example:

url(r'/poll/(?P<poll_id>\d+)', 'view_poll', name = 'poll_view_one')

And then in your templates you can say:

<a href="{{ url poll_view_one }}">{{ }}</a>

Django will then find the URL with that name, feed the poll ID in at the appropriate place in the expression, and there you are - you don't have to go rewriting all your links when your site structure changes. This, to me, is a great idea.

The problem was that Django was reporting that "Reverse for 'portal.address_new_in_street' not found." when it was clearly listed in a clearly working file. Finally, I started playing around with the expression, experimenting with what would work and what wouldn't in the expression. In this case, the pattern was:

new/in/(?P<suburb_id>\d+)/(?P[A-Za-z .'-]+)

When I changed this to:


It suddenly came good. And then I discovered that the thing being fed into the 'suburb_id' was not a number, but a string. So what that error message really means is "The pattern you tried to use didn't match because of format differences between the parameters and the regular expression." Maybe it means that you can have several patterns with the same name that will try to match based on the first such pattern that does so. But until then, I'll remember this; and hopefully someone else trying to figure out this problem won't butt their head against a wall for a day like I did.
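
The failure mode can be reproduced outside Django with nothing but Python's re module. This is a rough sketch of the idea, not Django's resolver: the second group name (street), the TEMPLATE string and the helper reverse_or_none are all assumptions made for the illustration. The point is only that a value which doesn't fit its group's sub-pattern makes the whole named pattern look as if it doesn't exist.

```python
import re

# Hypothetical pattern modelled on the one above. The group name "street"
# is assumed; only "suburb_id" is named in the post.
PATTERN = r"new/in/(?P<suburb_id>\d+)/(?P<street>[A-Za-z .'-]+)"
# Hand-written mirror of the pattern, used to build a candidate URL.
TEMPLATE = "new/in/{suburb_id}/{street}"


def reverse_or_none(template, pattern, **kwargs):
    """Build a URL from kwargs and return it only if the pattern accepts it."""
    candidate = template.format(**kwargs)
    if re.fullmatch(pattern, candidate):
        return candidate
    return None  # a reverser would report "Reverse for ... not found" here


# A numeric suburb_id fits the \d+ group, so the URL is produced.
print(reverse_or_none(TEMPLATE, PATTERN, suburb_id=42, street="Main Street"))
# A string where a number is expected fails the match, so nothing is found.
print(reverse_or_none(TEMPLATE, PATTERN, suburb_id="Braddon", street="Main Street"))
```

Seen this way, "not found" is better read as "no pattern with that name accepts these arguments".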

Last updated: | path: tech / web | permanent link to this entry

Tue 29th Jul, 2008

Django 101

At work I've started working on a portal written in Python using the Django framework. And I have to say I'm pretty impressed. Django does large quantities of magic to make the model data accessible, the templating language is pretty spiffy (it's about on a par with ClearSilver, which I'm more familiar with - each has bits that the other doesn't do), and the views and url mapping handling is nice too. I can see this as being a very attractive platform to get into in the future - I'm already considering writing my Set Dance Music Database in it just to see what it can do.

So how do I feel as a Perl programmer writing Python? Pretty good too. There are obvious differences, and traps for new players, but the fact that I can dive into something and fairly quickly be fixing bugs and implementing new features is pretty nice too. Overall, I think that once you get beyond the relatively trivial details of the structure of the code and how variables work and so on, what really makes languages strong is their libraries and interfaces. This to me is where Perl stands out with its overwhelmingly successful CPAN; Python, while slightly less organised from what I've seen so far, still has a similar level of power.

About the only criticism I have is the way the command line option processing is implemented - Python has tried one way (getopt) which is clearly thinking just like a C programmer, and another (optparse) which is more object oriented but is hugely cumbersome to use in its attempt to be flexible. Neither of these holds a candle to Perl's Getopt::Long module.

Last updated: | path: tech / web | permanent link to this entry

Tue 15th Jul, 2008

The lost limericks list

After that post, I thought I'd just check which category I'd put my previous limericks in. To my horror, I discovered that I hadn't blogged them at all, but had (merely) posted them to the Linux Australia list. So I rescued them and posted them here for posterity.

That wonderful man Andrew Tridgell
Over SaMBa keeps permanent vigil.
SMB, it is said,
He decodes in his head,
And CIFS 2 will some day bear his sigil.

The great LGuest programmer Rusty,
Is virtually never seen dusty.
He eats 16K pages,
And has done so for ages,
Yet his moustache is clean and not crusty.

That marvellous girl Pia Waugh
Is certainly hard to ignore.
With her leet ninja moves,
Open Source just improves -
All Linux Australians show awe!

Last updated: | path: tech | permanent link to this entry

The Wireless Jonathan Oxer

After the three limericks I wrote about Tridge, Pia and Rusty, the conversation came up on #linux-aus about whether I could make a similar epigram for Jon Oxer, former Linux Australia president, front-line hardware hacker and all-round good guy. It took me two months, but in an email to Jon I finally cracked it, packing much more into the rhyme than I originally thought would be possible:

The wireless Jonathan Oxer,
Waves his hand and his front door unloxer.
A remote-control loo,
And home theatre too -
If you ask me, his whole house just roxor!

Who's next, I wonder?

We tune to podcasting James Purser,
Long known as a rhymer and verser.
With his darling wife Karin
They are not known as barren:
Three children now stare at their cursor.

Steve Walsh, however, is going to take a bit more thinking about.

Send your suggestions of who should be next under the pen to

Last updated: | path: tech | permanent link to this entry

Sun 15th Jun, 2008

Common code in ClearSilver 001

I've been using ClearSilver as a template language for my CGI websites in earnest for about half a year now. I decided to rewrite my Set Dance Music Database in it and it's generally been a good thing. Initially, though, I had two problems: it was hard to know exactly what data had been put into the HDF object, and it was a pain to debug template rendering problems by having to upload them to the server (surprisingly, but I think justifiably, I don't run Apache and PostgreSQL on my laptop so as to have a 'production' environment at home).

I solved this problem rather neatly by getting my code to write out the HDF object to a file, rsync'ing that file back to my own machine, and then testing the template locally.

I knew that ClearSilver's Perl library had a 'readFile' method to slurp an HDF file directly into the HDF object, and a quick check of the C library said that it had an equivalent 'writeFile' call. So happily I found that they'd also provided this call in Perl. My 'site library' module provided the $hdf object and a Render function which took a template name; it was relatively simple to write to a file derived from the template name. That way I had a one-to-one correspondence between template file and data file.

Then I can run ClearSilver's cstest program to test the template - it takes two parameters, the template file and the HDF file. You either get the page rendered, or a backtrace to where the syntax error in your template occurred. I can also browse through the HDF file - which is just a text file - to work out what data is being sent to the template, which solves the problem of "why isn't that data being shown" fairly quickly.

Another possibility I haven't explored is to run a test suite against the entire site using standard HDF files each time I do a change to make sure there aren't any regressions before uploading.

Hopefully I've piqued a few people's interest in ClearSilver, because I'm going to be talking more about it in upcoming posts.

Last updated: | path: tech / web | permanent link to this entry

Wed 2nd Apr, 2008

Stupid Error 32512

For a while my brother's been having a problem with his MythTV setup - the mythfilldatabase script won't run the associated tv_grab_au script when run automatically, but will work just fine when run manually. In the logs it says:

FAILED: xmltv returned error code 32512.
Now, after a bit of searching I have finally found that 32512 is a magic code from the C system(3) call, which basically does a "sh -c (system call and arguments)". If sh can't find the file you've specified in the system() call, it returns 127, which is shifted into the upper eight bits of a 16-bit smallint (as far as I can make out, the lower eight bits are reserved for informing the caller that the system call was aborted due to a signal - e.g. a segmentation fault).

After a lot more searching, and a good deal of abuse on the #mythtv-users channel on, I finally found some information about shell exit codes, and it turns out that 127 is "command not found". In other words, mythfilldatabase at that point is trying to call the tv_grab_au grabber and not finding it. On my brother's machine, this is because sh under root does not get the path /usr/local/bin, which is where the grabber is stored.
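
The arithmetic is easy to check. The sketch below assumes the usual Linux layout of the 16-bit status that system(3) returns: the exit code in the upper eight bits, the terminating signal in the low seven bits, and bit 7 as the core-dump flag. The function name decode_wait_status is made up for this example; on Unix, Python's os.WEXITSTATUS performs the same shift.

```python
def decode_wait_status(status):
    """Split a 16-bit wait status into (exit_code, signal_number)."""
    exit_code = (status >> 8) & 0xFF   # upper eight bits: the exit code
    signal_number = status & 0x7F      # low seven bits: terminating signal, if any
    return exit_code, signal_number


# 32512 is 127 * 256, i.e. exit code 127 ("command not found") and no signal.
print(decode_wait_status(32512))  # (127, 0)
```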

(It works on my machine because I run it from a script which picks a random time, and includes /usr/local/bin in the path.)

So there are two solutions, as I see it:

1) Put tv_grab_au in /usr/bin/.

2) Run mythfilldatabase from cron using a script which includes /usr/local/bin in the path.

Given the bollocking I got in #mythtv-users for suggesting something so crude and hackish (in the words of Marcus Brown, mzb_d800) as cron, I guess I'll have to go with option 1. But here's hoping that this blog entry helps someone else out there - almost every post on the mythtv-users email list that mentions 32512 never mentions a solution...

Last updated: | path: tech | permanent link to this entry

Tue 18th Mar, 2008

Standard Observations

Simon Rumble mentioned Joel Spolsky's post on web standards and it really is an excellent read. The fundamental point is that as a standard grows, testing any arbitrary device's compliance with it grows harder. Given that, for rendering HTML, not only do we have a couple of 'official' standards: HTML 4, XHTML, etc., but we also have a number of 'defacto' standards - IE 5, IE 5.5, IE 6, IE 7, Firefox, Opera, etc. etc. etc ad nauseam. For a long time, Microsoft has banked on their desktop monopoly to lever their own defacto standards onto us, but I think they never intended it to be because of bugs in their own software. And now the chickens are coming home to roost, and they're stuck with either being bug-for-bug compatible with their own software (i.e. making it more expensive to produce) or breaking all those old web pages (i.e. making it much more unpopular).

I wonder if there was anyone in the Microsoft Internet Explorer development team around the time they were producing 5.0 who was saying, "No, we can't ship this until it complies with the standard; that way we know we'll have less work to do in the future." If so, I feel doubly sorry for you: you've been proved right, but you're still stuck.

However, this is not a new problem to us software engineers. We've invented various test-based coding methodologies that ensure that the software probably obeys the standard, or at least can be proven to obey some standard (as opposed to being random). We've also seen the nifty XSLT macro that takes the OpenFormula specification and produces an OpenDocument Spreadsheet that tests the formula - I can't find any live links to it but I saved a copy and put it here. So it shouldn't actually be that hard to go through and implement, if not all, then a good portion of the HTML standard as rigorous tests and then use browser scripting to test its actual output. Tell me that someone isn't doing this already.

But the problem isn't really with making software obey the standard - although obviously Microsoft has had some problem with that in the past, and therefore I don't feel we can trust them in the future. The problem is that those pieces of broken software have formed a defacto standard that isn't mapped by a document. In fact, they form several inconsistent and conflicting standards. If you want another problem, it's that people writing web site code to detect browser type in the past have written something like:

if ($browser eq 'IE') {
    if ($version <= 5.0) {
        # ...
    } elsif ($version <= 5.5) {
        # ...
    } else {
        # ...
    }
}

When IE 7 came along and broke new stuff, they added:

    } elsif ($version <= 6.0) {
        # ...

It doesn't take much of a genius to work out that you can't just assume that this current version is the last version of IE, or that new versions of IE aren't necessarily going to be bug-for-bug compatible with the last version. So really the people writing the websites are to blame.

Joel doesn't identify Microsoft's correct response in this situation. The reason for this is that we're all small coders reading Joel's blog and we just don't have the power of Microsoft. It should be relatively easy for them to write a program that goes out and checks web sites to see whether they render correctly in IE 8, and then they should work together with the web site owners whose web sites don't render correctly to fix this. Microsoft does a big publicity campaign about how it's cleaning up the web to make sure it's all standard compliant for its new standards-compliant browser, they call it a big win, everyone goes back to work without an extra headache. Instead, they're carrying on like it's not their fault that the problem exists in the first place.

Microsoft's talking big about how it's this nice friendly corporate citizen that plays nice these days - let's see it start fixing up some of its past mistakes.

Last updated: | path: tech / web | permanent link to this entry

Fri 14th Mar, 2008

Beat Counter Project

For a variety of reasons, I'm looking for a library that can not only determine the BPM of a song but count how many beats and bars are in it, excluding the introduction and finish of the song where there may be no actual music. (In other words it's not just a case of dividing the track length in real-number minutes by the BPM). Furthermore, one application has the complication of working with music that isn't four quarter-beats in a bar (i.e. 4/4 notation) - it might be 2/4, 3/4, 6/8 or 12/8. I do know this ahead of time - mostly - but automatic detection would be nice. The other application will require millisecond-precision locations of each beat, and must be able to compensate for tempo changes in the song.

So I've started a MicroPledge project for it and pledged $100US of my own money. The project must run on Linux, Mac OS-X and Windows and must also use an open source license, preferably the GPLv3. But I guess this is a bit of a bleg (thank you Mary Gardiner for introducing me to that term :-) if anyone knows of such a thing or some project that I can add code to.

Now to watch it fade into obscurity as I cast around for some way to write the thing myself...

Last updated: | path: tech | permanent link to this entry

Sun 2nd Mar, 2008

Floor Wax, Dessert Topping, Make-Up, Mould Release...

As a woodworker, I use Carnauba Wax mixed with lemon oil on my wood turning pieces to give them a nice shine that's also dust-proof and preserves the wood, preventing it from drying out and cracking. And as a student of popular culture, I've seen the reference to the Saturday Night Live sketch about Shimmer, the revolutionary product that's both a floor wax and a dessert topping.

So it amused and amazed even me to find out that Carnauba Wax is all this and more. It's the product of the Carnauba Palm, has a melting point way higher than most waxes, and is harder than concrete in pure form. It is used both in woodworking and in car polishes for its high-gloss, protective coating, but in that capacity (and because it's edible) it's also used as an ingredient in some cake icings and on the coatings of Tic-Tacs and other candy to make them glossy. Likewise it's used in products such as lipsticks and blushes for the glossy, resilient coating. With a solvent in a can, it's sprayed into moulds for epoxy resin products such as semiconductors to make sure the product breaks free from the mould easily; because it's not soluble in water or alcohol it can be used in liquid epoxy casting too.

And to think that most people think that Shimmer doesn't exist...

Last updated: | path: tech | permanent link to this entry

Sat 2nd Feb, 2008

LCA 2008 Google Party Mix

The day finally came, and though I was a ball of sweaty clothing from giving my Lightning Talk I was ready to do some mixing for the LCA 2008 Google Party. Thanks to some pre-prepared scripts, I put the mix up on my torrent server pretty soon afterward. If you want it, you can download the mix via BitTorrent or read the track listing. All the music is Creative Commons licensed and therefore my mix is also similarly licensed; I'll work out the exact license code when I've looked at the licenses on all the music, but for now I will release the mix under a Creative Commons 3.0 By-NC-SA license.

Thank you to Peter Lieverdink and the LCA 2008 team for allowing me to mix at LCA - I had a great time doing it. And my collection hat (thank you Stewart Smith) raised $24.30 to donate to the artists. I reckon that's pretty good for something completely voluntary where most people hadn't been really getting into the music much (that I could see). Now to work out how to donate it...

Last updated: | path: tech / lca | permanent link to this entry

Thu 31st Jan, 2008

Network Interactionativity

For some reason, on certain access points at LCA - for instance the one in the St. Mary's common room - I need to set my MTU to 1000 (i.e. down from 1500) in order to get Thunderbird to do secure POP. Everything else works fine, but Thunderbird just sits there timing out. I discovered this by watching the Wireshark log and noticing packet fragments disappearing (i.e. some packets where the tcp fragment analysis couldn't find parts of the packet to reassemble). Hopefully this isn't also causing Steve Walsh to pick up his specially sharpened LAN cable and hunt me down...

Last updated: | path: tech / lca | permanent link to this entry

On to other things

After spending four hours or so working on my hackfest entry, I was less than optimistic. My entry had yet to even be compiled on the test machines, and it still had huge areas of code that were completely unimplemented. When I went into the common room at St Mary's, Nick from OzLabs recognised me and helpfully mentioned that someone else not only had their code completely running but was in the process of optimising it. I promptly resigned.

I say "helpfully" sincerely there. It is a bit of a pity that my ideas won't see the light of day this hackfest, and that I won't be in the running to win whatever prizes they might offer. But since I don't have a snowball's chance in a furnace of winning anyway that's hardly a real disappointment. And I can go to bed with a clear head and prepare for my lightning talk and the Irish Set Dancing and mixing I plan to do at the Google party, which realistically are much higher priorities.

I do hope that we get to see the winning solutions, though...

Last updated: | path: tech / lca | permanent link to this entry


I've decided to have a more serious look at entering the hackfest, since I'm familiar with processing fractals with parallel algorithms. Downsides are that I've only done it with PVM, I haven't done anything with the Cell architecture and there's all these other really cool talks to go to. That and I need to have my eyes stop glazing over when I start reading anything more detailed than the "Fire hydrant and hose reel" sign opposite me.

Last updated: | path: tech / lca | permanent link to this entry

Tue 29th Jan, 2008

All Systems Go

After a night of continued problems with hardware in Canberra, I decided to test my mixing setup. Having borrowed a nice Edirol UA-1A USB audio input/output from my friend Mark, I wanted to test this in combination with the guitar amp from Andrew. I'd also changed my VMWare system over to use Host-only networking and convinced Samba and IPTables to talk to the VMWare client over this. So, was it going to actually work? Best not to find out on Friday Night...

After a bit of odd-hackery, I got it going - pleasingly well. The sound skips slightly when context-switching from the VMWare client, which is nothing unusual - the standard performance practice is to boot afresh and only start those things which you absolutely need anyway. So it's all systems go for Friday night...

Last updated: | path: tech | permanent link to this entry

Finding Sets Made Easy

I can't believe I only just thought of it. My Set Dancing Music Database has its sets and CDs referenced on the URL line by the internal database IDs. While this is unique and easy to link to, it looks pretty useless if you're sending the link to someone. I realised this when writing my post on my experiences at Naughton's Hotel: I wanted to link to my page on the South Galway Reel Set and thought "how dull is that?"

Suddenly I realised that I should do what wikis and most other good content management systems have done for ages - make URLs which reference things by name rather than number and let the software work it out in the background. Take the name for the set, flatten it into lower case and replace spaces with underscores; it would also be easily reversible. CDs might be a bit more challenging but there are only one or two that have a repeated name, and I'd have to handle such conflicts anyway at some point.
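
A minimal sketch of that name-to-URL flattening, written in Python purely for illustration (the function names and the sample CD titles are made up, and none of this is the site's actual code). Lower-casing throws information away, so here the reverse direction is handled by looking the slug up in an index built from the database rows, which also exposes any repeated names:

```python
from collections import defaultdict


def name_to_slug(name):
    """Flatten a name: lower case, runs of whitespace become one underscore."""
    return "_".join(name.lower().split())


def build_slug_index(rows):
    """Map each slug to the list of database IDs that share it.

    rows is an iterable of (database_id, name) pairs.
    """
    index = defaultdict(list)
    for db_id, name in rows:
        index[name_to_slug(name)].append(db_id)
    return dict(index)


def conflicts(index):
    """Return only the slugs that point at more than one database ID."""
    return {slug: ids for slug, ids in index.items() if len(ids) > 1}
```

A slug that maps to more than one ID is exactly the repeated-CD-name case, and would need some tie-breaker such as falling back to the numeric ID.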

That combined with my planned rewrite of the site to use some sane HTML templating language - my current choice is ClearSilver - so that it's not all ugly HTML-in-the-code has given me another project for a good week or so of coding. Pity I'm at LCA and have to absorb all those other great ideas...

Last updated: | path: tech / web | permanent link to this entry

Fri 18th Jan, 2008

Microsoft OOXML compliance

I don't know if anyone else has asked this about Microsoft and its proposed OOXML standard, but what guarantee do we have that Microsoft's own software implements it correctly? How do you know? What test suites do they have to prove that they comply with the 'standard'? Given that what I've seen of their standard includes the ability to have arbitrary binary blobs of data which seem to allow the ability to include proprietary formatting outside the standard, and that these and many other options (such as DoLineBreaksLineWord95 and so forth) are deliberately left unimplementable, how do we know that Microsoft's own software is using as much of the standard as can be implemented by other vendors? For all we know, they could put the critical Office 2007 formatting in binary blobs and any other vendor implementing the standard would look like they'd done it incorrectly, where it would simply be a case of not being given all the information.

Microsoft have already said that they plan to not follow the standard in the future, allegedly so that they can continue to innovate. So why would they even bother to implement the standard now? This to me is as compelling a reason for voting 'no' to OOXML as any other reason, because it doesn't matter how good that standard is: if Microsoft choose not to follow it, it won't be worth the paper it's written on.

Last updated: | path: tech | permanent link to this entry

Sat 12th Jan, 2008

Preparing for LCA early

I have a hope of performing a mix for LCA, as I did in 2005 and 2006 and post-2007. While I wait for the team to get back to me to work out if that will be possible or not, I've been making some (non-CC and possibly non-PC) mixes. Check them out: one Trance and one Goa/Psy, in OGG and MP3 format.

Some day I will work out the .inf format enough to be able to supply a simple file that can turn a mix file into a CD with tracks and everything, but since we're now in the age of 'portable music players' I think this may be a retrograde step. Alternatively, if someone worked out some format for annotating an OGG (or MP3) file to put points that you could skip to inside the file with the appropriate player then that would be great, but this sounds like a chicken-and-egg problem. Annodex, maybe?

Last updated: | path: tech | permanent link to this entry

Fri 7th Dec, 2007

The werewolf biting

Last weekend I bought some more RAM and a new, larger hard disk for my laptop. I was worried about the RAM, seeing the dramas that Chris Smart had with buying RAM for his laptop recently, but it went in smoothly. It may have been a bit more to buy Kingston RAM but I felt a lot more confident that it would work, and I really hate having to grovel around with different settings and playing with different configurations to try and determine whether a new hardware upgrade has worked.

The hard disk was likewise pretty easy to upgrade - SATA laptop drives don't have to have a special easy-to-connect dongle wasting space internally. Also, one of those naked drive to USB adapters is an essential part of any techie's kit. I set up a new boot partition and LVM system on the drive, copied my partitions over to it, and then booted up the Werewolf install CD (I find it faster to install off an NFS share, and better on the environment to reuse a CD-RW). Yes, that's right, I was installing Fedora 8 as well.

That last part has caused me the most grief of all. Werewolf is a very nice piece of work overall - it detected the full resolution of the screen without having to fiddle the BIOS with 915resolution, it had the Intel 3945 WiFi working with zero hassle (a big change from FC6) and its font choice, though very large, is quite nice. I have, however, had major problems with it in getting Compiz working and getting it to suspend to RAM without subsequently crashing. Both, more annoyingly, seem to have come good of their own accord, although when resuming I seem to have to press Ctrl-Alt-F6 and then Ctrl-Alt-F7 to get the screen to refresh. Also annoyingly, because of having to do a fresh install (because it couldn't resolve dependencies for the upgrade) it's clobbered my GNOME keyring.

Still, compared with Fista it's an absolute gem.

Last updated: | path: tech | permanent link to this entry

Wed 28th Nov, 2007

Gnumerical Inaccuracies

I've got three minor irritations with Gnumeric:

  1. I originally thought that it was impossible to make a point in an X-Y chart series have a transparent fill. Then I found out that I can use the 'custom colour' chooser to turn the opacity down. However, I cannot save this back to the colour selector, despite there being a row of colour buttons at the bottom that are specifically for custom colours. Since having a non-transparent fill for the 'cross', 'plus' and 'star' symbols is completely pointless, this is a needless extra hassle to go through for every single series I wish to use in this manner. Bug reported as GNOME bug 500113.
  2. Saving the chart as an SVG is great, but it doesn't remember where you put the last file. I'm saving to a directory five levels in from the root directory, and not underneath my home directory (the default 'save to' location) and this is incredibly irritating to either retype the path, select it from the directories, or copy and paste it from a previous foray. Bug reported as GNOME bug 500116.
  3. For X-Y charts, at least where the chart is a point-only chart, if you have three columns selected, then Gnumeric assumes that the third column is another X axis, not a second series to plot against the first column (the original X axis). It should assume that the first column is your X series data and the other columns are separate Y series data to plot against it. In fact, it almost makes no sense whatsoever the way it currently does it, because graphing two series against different X axes is almost unheard of (it's far more usual to graph the same X series data against two Y axes). Although I applaud them allowing you to easily edit both the X and Y axis data sources and the title's data source, and make the whole structure of the chart a logical tree structure that's easy to navigate, it would be much easier if there was an extra control that you could use to simply choose one of the original input series - like making each a drop-down box. Keeping the current control to allow you to select a new range is important, however. Bug reported as GNOME bug 500117.

On the plus side, Gnumeric's chart editing and SVG export are really great. It's a worthy little spreadsheet in its own right. I especially like its ability to use tab-completion and dropdown-prefill when selecting a place to save (although this may be a feature of GNOME save dialogs in general that I have not hitherto discovered). If only their choices for the default series colours didn't suck.

Last updated: | path: tech | permanent link to this entry

Mon 26th Nov, 2007

Saves some typing?

I had an idea on Friday for a utility that fills a little niche that I hit regularly. The particular example was wanting to save the iptables configuration after a couple of updates. This is put (on Red Hat standard boxes) in /etc/sysconfig/iptables, and I keep copies named /etc/sysconfig/iptables.yyyymmdd (where yyyymmdd is the current year, month and day) in case a change breaks something and I need to go back to a previous version. Other people use revision control systems like Mercurial for this, with its ability to watch edits to a file that isn't in a pre-set directory. I may be old fashioned here but this method does me fine. Normally, in order to roll the configuration over to the new version you would do:

mv /etc/sysconfig/iptables /etc/sysconfig/iptables.yyyymmdd
iptables-save > /etc/sysconfig/iptables

But what if you'd already done one edit today? Then you'd use a name like /etc/sysconfig/, where inc is an increment number or something. And you want that number to increment up until it finds a 'free' number. The usual convention for log files is to roll each file down, so /etc/sysconfig/iptables.yyyymmdd becomes /etc/sysconfig/iptables.yyyymmdd.1, /etc/sysconfig/iptables.yyyymmdd.1 becomes /etc/sysconfig/iptables.yyyymmdd.2 and so forth; I usually end up putting the latest revision at the end of the sequence rather than the earliest.
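
The log-file convention described above can be sketched as follows. This is a Python illustration under stated assumptions (the `roll_down` name is hypothetical), not the code of savelog or of any other existing tool:

```python
import os


def roll_down(base):
    """Shift base -> base.1, base.1 -> base.2 and so on, oldest first.

    Renaming starts from the highest existing number so that no file
    is overwritten on the way down.
    """
    # Find the end of the unbroken chain base.1, base.2, ...
    last = 0
    while os.path.exists(f"{base}.{last + 1}"):
        last += 1
    # Move each numbered file one step further down the chain
    for i in range(last, 0, -1):
        os.rename(f"{base}.{i}", f"{base}.{i + 1}")
    # Finally free up the base name itself
    if os.path.exists(base):
        os.rename(base, f"{base}.1")
```

After a call, the base name is free for the new file and the most recent old copy is always the one ending in .1.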

Now, of course, it would be relatively simple to do that renaming automatically given the base file name. Cafuego coded up a Bash one-liner in half an hour or so, and Debian already has the savelog utility to do just this (a fact I found out much later, not running Debian). However, that only really does half the job. We still end up with:

savelog /etc/sysconfig/iptables
iptables-save > /etc/sysconfig/iptables

That's one repetition of that annoying path too many, with its hostile tab-unfriendly sysconfig directory, for my taste. I realised that what I wanted was something like:

iptables-save | roll /etc/sysconfig/iptables

that would both roll the log file over and then 'redirect' standard input to the target file. Again, a relatively short piece of work in Perl or bash. But do you really want to have to call up all that framework just to roll one file over? I resolved to learn a bit more and do it in C. Not only that, but I'd forswear my usual use of the talloc library and do it as raw as possible.
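
To make the intended behaviour concrete, here is a rough sketch in Python rather than C. It uses the move-to-last naming (the first free increment number goes on the end), and the names `free_backup_name` and `roll` are illustrative only; it is not the source of the roll command itself:

```python
import datetime
import os


def free_backup_name(target, today=None):
    """Return target.yyyymmdd, or target.yyyymmdd.N for the first free N."""
    today = today or datetime.date.today()
    base = f"{target}.{today:%Y%m%d}"
    if not os.path.exists(base):
        return base
    n = 1
    while os.path.exists(f"{base}.{n}"):
        n += 1
    return f"{base}.{n}"


def roll(target, data, today=None):
    """Move any existing target aside, then write data (bytes) to target.

    Returns the backup file name, or None if there was nothing to move.
    """
    backup = None
    if os.path.exists(target):
        backup = free_backup_name(target, today)
        os.rename(target, backup)
    with open(target, "wb") as out:
        out.write(data)
    return backup
```

Wired up to standard input, a call such as `roll(path, sys.stdin.buffer.read())` would give the pipeline behaviour shown above, with the path typed only once.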

It took a day, but by the end of it I had coded up the first working version of the code. I presented it to the gallery on the #linux-aus IRC channel on Freenode and Cafuego pointed out that I'd only implemented the all-move-down method, not the move-to-last method. A bit more work that night added that. A bit more work with valgrind found the couple of memory leaks and odd overwrites. More work today put the command-line options processing in place, and corrected the move-to-last method to not only work, but in the process be more efficient.

So, now I release to the wider Linux and Unix community the roll command. You can find the source code at and check it out via Subversion through svn:// Comments, criticisms and suggestions as always are welcomed via Of course, the irony is that I could have written that mv /etc/sysconfig/iptables /etc/sysconfig/iptables.20071123 command by now...

Last updated: | path: tech / c | permanent link to this entry

Tue 20th Nov, 2007

Wiki Documentulation

In the process of writing up the new manual for LMMS, I've been asked by the lead developer to be able to render the entire manual as one large document. This he will feed into a custom C++ program written to take MediaWiki markup and turn it into TeX markup, for on-processing into a PDF. Presumably he sees a big market for a big chunk of printed document as opposed to distributing the HTML of the manual in some appropriately browsable format, and doesn't mind reinventing the wheel - his C++ program implements a good deal of Perl's string processing capabilities in order to step through the lines byte-by-byte and do something very similar to regular expressions. Although I might be mistaken in this opinion - I don't read C++ very well.

I had originally considered writing a Perl LWP [1] program that performed a request to edit the page, with my credentials, but I figured that was a ghastly kludge and would cause some sort of modern day wiki-equivalent of upsetting the bonk/oif ratio (even though MediaWiki obviously doesn't try to track who's editing what document when). But then I discovered MediaWiki's Special:Export page and realised I could hack it together with this.
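
As a hedged sketch of the Special:Export approach: MediaWiki will return a page's current wikitext wrapped in XML when you request Special:Export/ followed by the page title. The helper below only builds those URLs for a list of manual pages; the function name, base address and page titles are placeholders, and the exact URL layout depends on how the wiki is configured:

```python
from urllib.parse import quote


def export_urls(article_base, titles):
    """Build one Special:Export URL per page title, in the order given."""
    urls = []
    for title in titles:
        # MediaWiki treats spaces and underscores in titles as equivalent
        page = quote(title.replace(" ", "_"), safe=":/")
        urls.append(f"{article_base.rstrip('/')}/Special:Export/{page}")
    return urls
```

Fetching each URL in manual order and concatenating the page text would give the single large document asked for.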

The question, however, really comes down to: how does one go about taking a manual written in something like MediaWiki and producing some more static, less infrastructure-dependent, page or set of pages that contains the documentation while still preserving its links and cross-referencing? What tools are there for converting Wiki manuals into other formats? I know that toby has written the one I mentioned above; the author of this ghastly piece of giving-Perl-a-bad-name obviously thought it was useful enough to have another in the same vein. CPAN even has a library specifically for wikitext conversion.

This requires more research.

[1] - There's something very odd about using a PHP script on to get the manual of a Perl module. But it's the first one I found. And it's better than, which requires you to know the author name in order to list the documentation of the module. I want something with a URL like

Last updated: | path: tech / web | permanent link to this entry

Mon 19th Nov, 2007

What if we had no walls at all?

Incidentally, it struck me that my use of CGI::Ajax referred to in my previous post is an example of where Perl's loose object-orientation works with the programmer, not against them. I'm sure some anal-retentive, secretive paranoid that writes in Java or C++ would have made most of those methods private, and thus would have forced a major re-engineering of my code to fit in with their own personal way of doing things. Perl naturally tends toward letting you see all the methods, not just those publicly declared, but to me this is a general argument for letting people have a bit more control of your object than you think is good for it.

Last updated: | path: tech / perl | permanent link to this entry

Fri 9th Nov, 2007

Perl, Ajax and the learning experience - part 001

AJAX as a thing I use regularly on web pages is still an unknown territory to me, a person who's still not entirely au fait with CSS and who still uses Perl's CGI module to write scripts from scratch. I understand the whole technology behind AJAX - call a server-side function and do something with the result when it comes back later - but I lacked a toolkit that could make it relatively easy for me to use. Then I discovered CGI::Ajax and a light began to dawn.

Of course, there were still obstacles. CGI::Ajax's natural way of doing things is for you to feed all your HTML in and have it check for the javascript call and handle it, or mangle the script headers to include the javascript, and spit out the result by itself. All of my scripts are written so that the HTML is output progressively by print statements. This may be primitive to some and alien to others, but I'm not going to start rewriting all my scripts to pass gigantic strings of HTML around. So I started probing.

Internally this build_html function basically does:

if ($cgi->param('fname')) {
    print $ajax->handle_request;
} else {
    # Add the <script> tags into your HTML here
}

For me this equates to:

if ($cgi->param('fname')) {
    # An AJAX call came in: let CGI::Ajax answer it
    print $ajax->handle_request;
} else {
    # A normal page request: put the generated javascript in the page header
    print $cgi->header,
        $cgi->start_html( -script => $ajax->show_javascript );
    # Output your HTML here
}

I had to make one change to the CGI::Ajax module, which I duly made up as a patch and sent upstream: both CGI's start_html -script handler and CGI::Ajax's show_javascript method put your javascript in a <script> tag and then a CDATA tag to protect it against being read as XML. I added an option to the show_javascript method so that you say:

        $cgi->start_html( -script => $ajax->show_javascript({'no-script-tags' => 1}) ),
and it doesn't output a second set of tags for you.

So, a few little tricks to using this module if you're not going to do things exactly the way it expects. But it can be done, and that will probably mean, for most of us, that we don't have to extensively rewrite our scripts in order to get started with AJAX. And I can see the limitations of the CGI::Ajax module already, chief amongst them that it generates all the Javascript on the fly and puts it into every page, thus not allowing browsers to cache a javascript file. I'm going to have a further poke around and see if I can write a method for CGI::Ajax that allows you to place all the standard 'behind-the-scenes' Javascript it writes into a common file, thus cutting down on the page size and generate/transmit time. This really should only have to be done once per time you install or upgrade the CGI::Ajax module.

Now to find something actually useful to do with Ajax. The main trap to avoid, IMO, is to cause the page's URL to not display what you expect after the Javascript has been at work. For instance, if your AJAX is updating product details, then you want the URL to follow the product's page. It should always be possible to bookmark a page and come back to that exact page - if nothing else it makes it easier for people to find your pages in search engines.

Last updated: | path: tech / web | permanent link to this entry

Tue 18th Sep, 2007

Software Freedom Day 2007 in Canberra

On Saturday September 15th members of the Canberra Linux Users Group ran a stall at the Computer Markets at the Old Bus Depot in Kingston. We had two tables generously paid for by Ingenious Software, conveniently on the left near the entrance and the food (so that more people would see us, of course). We set up several laptops and a big wide screen (courtesy of Jason) on my gaming machine to show off the different capabilities of Fedora, Ubuntu and Open Source Software in general.

Mike Carden and Chris Smart were a force to be reckoned with, giving away almost all of our Ubuntu CDs and then our Open CDs by standing at the front gate giving them to people in line to get in. I discovered some CDs that I had put in my bag and forgotten at about 1:30PM, which gave us an extra stock, but while it was nice to have CDs available to hand to people who came to the actual stand, I agree with their fundamental aim to get those CDs to as many people as possible. Mind you, I still have over a hundred Open CDs left, so I'll be giving out more at work in the upcoming week.

We also had Daniel, Jason, Rainer, Matt, and particularly Fred and Ian on the stand, talking to people about the benefits of FOSS and Linux and answering their myriad questions on compatibility and usability. Fred's demonstration of DVB-T tuners in Kaffeine impressed many people. Ian had only recently posted on the CLUG list and had started using Linux with no background in IT at all. He'd patiently worked away, learning by asking questions and Googling, and was a great example to all of us and a great person to talk to the less technical people out there who might have been afraid of taking the plunge. Hopefully I haven't forgotten anyone else who helped on the stand.

I should also like to thank the many people from the Canberra Linux Users Group who donated $360 to purchase over 1000 CDs, and the many (overlapping set of) people who also helped burn Ubuntu and Open CD images onto them. That huge effort made the stand really worth something - many people had the CD and came over to the stand to find out more about what they'd just got :-). Thank you to everyone who made it possible.

Next year, I'd do three things differently. Firstly, I intend to get funding from Linux Australia to print large quantities of the CDs, for all SFD groups in Australia. That will significantly save on shipping, relieve the burden on us to both buy and then burn and print them (although getting donations from groups and individuals would be good), and give a more professional quality to the CDs we give to people. Secondly, I intend to get a bit more organised with the equipment on the day - it was good but I'd have liked to have a few more machines, even one or two which people could play with themselves. Thirdly, I'd like to have an Install Fest / Bar Camp day a week or two afterward so that people had a place to bring machines to get them installed, could learn a bit more about the Free Open Source Software world, and give the people who do know about Linux but haven't come along to a CLUG meeting before a chance to meet us and get into the Bar Camp atmosphere.

So thank you everyone who helped on the day and in the weeks beforehand. I think the response we got was very very positive - most people were interested, many people already knew about Linux and free software in general, and I think most people that looked at the stand were impressed by the quality of what FOSS has to offer. Go Software Freedom Day! (But preferably, next year with a T-shirt colour that looks good? Please?)

Last updated: | path: tech | permanent link to this entry

Fri 14th Sep, 2007

And They Pay These People?

I've spent several days printing The Open CD SFD 2007 labels on CDs for Software Freedom Day. Since it's all black and white, I selected "high quality greyscale" from the print options dialogue with the view to saving ink. Having a printer that can print directly on CDs has been a boon, and the Epson Stylus Photo R210 now seems to have full support in the gutenprint driver for all its more esoteric options. Three hundred and fifty CDs went past with only a few minor hitches. Then, at three in the afternoon on Friday the printer decides to stop right in the middle of a label and complain that it just cannot print any more until it gets some more ink.

I was a little surprised - even though the ink cartridges are tiny they'd usually lasted many prints more than that of full A4. But worse news arrived when I asked the printer which cartridges were out of ink. Yes, you guessed it - it was the cyan, light cyan and magenta cartridges. I've been only printing in greyscale, and I've run out of colour cartridges? It hadn't been a brilliant day, I still had any number of CDs to print, and here was the printer being obstreperous. Something in me finally snapped: I called Epson Tech Support to complain.

I will at least credit them with having a system where you can request a call back rather than wait on the phone, and even better having this system actually work and having someone call you back. This was the only bright point, because the conversation took the turn I had been expecting yet fearing: their tech support person told me, in the tones used to educate a particularly thick Louisianian, that the black ink could only produce black and in order to make grey they had to mix the colours. In hindsight he didn't even try to explain how this worked, but that may have been simply that I knew the explanation. It did not, however, satisfy me.

"What about the variable dot size your printers have?" I queried. "That only produces black," he responded automatically. "What about dithering?" I pursued lamely. "That would still only produce black," he repeated. Obviously no-one was going to get this guy to admit that their printers might be doing something stupid, so I said "Thank you" and hung up. Reasonable explanation I can cope with, but blatantly stupid and illogical statements that are repeated as dogma without answering my questions are just too much to take. If time allows, I will write a letter of complaint to Epson about this treatment. I can accept "sorry, the printer still just automatically uses coloured ink to produce a fine greyscale despite what you set the printer options to", but I can't accept "black only produces black" repeated endlessly.

What I ultimately want in a printer is three things: an attempt to use the absolute maximum of ink from their cartridges, the print head on the cartridge so that they get replaced regularly, and cartridges that are worth what they actually cost to make. I don't mind buying a printer for $500 instead of $150 if I know that the cartridges are going to be cheaper and that I'm going to get a lot of distance out of them. I think Choice would come out in favour of this kind of printer too. And with picolitre dots at 5760dpi resolution these days it can't be hard to produce greyscale that is truly only using black - I've even seen printers with 'grey ink' to produce better light grey scales.

Last updated: | path: tech | permanent link to this entry

Wed 12th Sep, 2007

Tiny wireless footsteps

I bought a tiny Targus bluetooth mouse yesterday in preparation for our trip overseas - I can stand using the touch pad for a good long time, but having a mouse that was also small and wire-free sounded like a good thing. Being cautious, however, when I bought it I asked if I could return it if it didn't work with my computer. The lady in the shop told me that, in that case, I should bring the computer in and she would get it to work. Trying to explain what a Free and Open Source Operating System is is hard enough without people wondering how you run a Windows executable on it, so that kind of pronouncement usually makes me want to get it working on my own.

Naturally, the manual had nothing on Linux at all - amusingly, it does have instructions for the three different driver sets that exist under Windows to talk to Bluetooth devices, all different of course. So much for standardisation, eh? I could use hcitool scan to see it but this wasn't actually connecting me to the mouse at all. Googling for the terms "fedora core" "bluetooth mouse", however, was fruitful: it took me to, which has the following cogent instructions:

To get the Logitech V270 Bluetooth mouse to work was simple.

As root in a terminal window type:
hidd --server --search
then push the reset button on the mouse and it will find it and pair.

You can check by doing (as user):
$ hidd --show
xx:xx:xx:xx:xx:xx Bluetooth HID Boot Protocol Device [046d:b002] connected 

Then you only need: 
chkconfig --level 35 hidd on

From now on the mouse just pairs up after the system is booted by moving the
mouse around until it responds.

And it did! Not only that, but it recognised every one of the mouse's buttons and knobs - it comes with a little joystick that acts as a two-dimensional scroll wheel, and when you press down on it it turns into a set of media keys (next track, previous track, volume up and down, pause, stop, and even fast-forward and rewind). All of this worked perfectly - no special programs to run, my other (USB optical) mouse didn't stop working, and I didn't even have to play around with my existing media key settings in the GNOME keyboard shortcuts. OK, I did have to do a bit of looking around to find the exact command I needed - it didn't actually do that step automatically for me - but still to me this compares favourably with having to use one of three different driver programs (which one?) and still hoping that it worked correctly. I've tried using Bluetooth drivers on Windows in the past with not a lot of success - even the Bluetooth hardware in my laptop (which Linux can see and use just fine) is disabled when I enter Windows and I cannot determine why or turn it back on.

Yet another proof that Linux is as current with hardware as any Microsoft offering.

Last updated: | path: tech | permanent link to this entry

Thu 6th Sep, 2007

The Soft Interface

This is adapted from a conversation between myself and a person I know in the USA on software, interfaces and Linux. Since I'd been having this argument with Hugh recently, I found the confluence of the two arguments enough to blog about.

In his last two paragraphs, he starts by observing "...I see no problem with striving to make the computer something that a person could use with the same amount of acumen they bring to using an oven or driving a car." He observes that this is a universal desire in the software industry, and finishes by saying that we shouldn't "have to understand the handshake protocols that underlie their cellular phone's tech to make a call." He then observes that he would be happy to use any other OS, "but [...] I'm not prepared to restock my shelves with new software that I don't know how to use or that promises functionality it cannot deliver."

I think it interesting that you talk in one paragraph about making it easy to use computers, and in another about not knowing how to use new software. What I think you demonstrate here is that fundamental problem: there is no such thing as the 'intuitive' interface. There are plenty of people who can't operate anything more than the basic functions of their oven, car, mobile phone or TV set; likewise, there is little or no standardisation of functionality in the interfaces of most of those things. The car may have the most standardised basic interface of all the devices we use regularly, but beyond steering and making it go and stop the interface possibilities are endless. How do you start it? How do you wash the windscreen? Turn off the radio? The lads at Top Gear often demonstrate these badly-designed interfaces brilliantly - one of my favourites was when they got their mothers to review some small cars, and one of the events was to open the windows, set the airconditioning on, and tune the radio to channel 4: all things that you'd want to do but often turned out to be maddeningly difficult for these ordinary people to master.

The real truth of it is that there is no standard interface - nor is one a good idea, for there's no standard human being or human brain. The only real constant thing is change, and because of this we have to teach people to adapt and change. Here, I argue that design standards such as Apple's Human Interface Guidelines actually do as much harm as good - as soon as you take a Mac user away from their beloved interface and plonk them in front of the most popular operating system on the market today (Windows XP) they're suddenly out of their depth. My dad encountered a web page that had a link that he needed to use in grey text, and since on his Mac grey text in a menu environment has always indicated an option that was not available, he never thought of trying to click on it. This is not adaptability: this is an in-bred, forced monoculture that removes the person's ability to change and learn.

The problem is closer than you think. You can't buy a copy of Flash 5 any more, and in the upgrading and changing Macromedia and Adobe have removed functions that people use; the Chapman brothers use Flash 5 simply because it does some things better than the current incarnations of Flash. So Adobe has removed functionality that worked for people. Pick up a copy of Photoshop CS and it doesn't do things in the same way that Photoshop 3 used to. You'd think that Microsoft would have kept Word the same, but no - they've changed the way things are pasted in Word 2007, which makes it substantially different to the previous versions; and that's something that people use almost every single day! So complaining about "software that [you] don't know how to use" is hardly a reason to stick with your proprietary software. Realistically, you're better off learning what you can do with the various software packages available, and trying to adapt and change.

Sure, some standardisation is necessary. Just don't fool yourself into thinking that what you're used to is the One True Way.


Mon 27th Aug, 2007

The truly bizarre world of spam, part 0001

Normally I can get just under 100 spam messages per day, a record I'd be proud to not have achieved. This is really because, for years, I've been using SpamCop to deal with spam, and thus have no fear in leaving my email address in the clear all over the place. Even the emails that get past it and get into my home email can be dumped into their checkers, and I like to believe it's an active step in trying to stop (or at least slow down) the spammers.

Every once in a while I get a spam that is truly bizarre, however. I'm not just talking about spam with no text, or spam advertising web sites that don't (yet) exist; something like this:

                                       ANTI-CRIME CONTROL
                                  UNITPENTAGON-WASHINGTON DC.
                                  Email :

Att: Sir/ Madam,
Be informed that the America ( FBI) is Monitoring all your movement,
so stop  communicate with any other person again for your payment.
This is to bring to your notice that the FBI eagle eyes discovered
earlier today, that those scammers  are reaching you again,They are
under  disguise telling you that,you are going to received your funds,
meanwhile, asking you to pay for charges, those scam message that has
been sent to you has been noted here in our office  ok.
The FBI have earlier warned you, and still warning you again to be very
careful. Becouse every transaction from Nigeria for now and concentrate
 on the one you have with the FBI.
Also be informed that the FBI is monitoring every bit of your movement.
Any attempt by you to send any money to Nigeria will be tracked down
by the FBI.


The FBI is here to protect you but you are going behind us to
perpetuate yourself in unclean affairs with Nigerians.
We are going to assit you legally to claim your funds from those
scammers, without you spending any scent ok, hope this is very clear to

Note: the America FBI will direct you on how to claim your funds with
out going through the bad eggs again.
Our international presence currently consists of more than 50 small
Legal Attaché offices (Legats) in U.S. embassies and consulates around
the world.

Their goals? Simple:

1. To stop foreign crime as far from American  0ff _ shores as
2. To help solve international crimes that do occur as quickly as
3. Make sure all America citizen who are scam vintins cover all there
Funds back without any fee again, or fell into any more problems.
More directive will be sent to you as soon as we hear from you again.

Ms. Mary Howards.
Cheif Secretary.
Email :
There are so many bits of bizarre weirdness in there that I almost don't know where to start. An FBI chief secretary sending from a address? "BEHIND THE BAR"? "vintins"? Why would anyone even bother to reply to them? It's like the spam equivalent of yak shaving - something so far removed from spam that it's almost ceased to be meaningful and yet seems, somehow, to be related to the original purpose of separating clueless people from money. I didn't even bother to file this one with SpamCop, since they really can only take action on websites advertised by spam and this just doesn't even get close.



Mon 13th Aug, 2007

The "solution giver" problem

Andrew Bennetts recently linked to Nicholas describing the kinds of problems that programmers can cause by not thinking things through. I went to add a comment to this, but since LiveJournal decided that that was too difficult I put it here instead. It also references something I was talking about with the people at PSIG last Thursday night, for a bit of extra relevance.

To me, programming today is like being an engineer in the early days of the industrial revolution, where new devices were created for new tasks seemingly every day - exciting and filled with endless possibilities. The problem Nicholas describes is that we usually don't have much practice at executing our plans well, and a lot of programmers have never done any formal training. There's one other big problem that I see in the industry, as well.

Most of us enjoy solving problems. If someone asks us for a new shower, we do it ourselves for the challenge rather than refer them to an actual builder. But, especially in employment, this has a downside. If a manager goes to his programming team and says "We'd like to call up dread Cthulhu and all his ghastly horrors from the nether pit of hell", the programming team will say "well, I suppose we could try randomly reading pages from the Necronomicon." No-one will say, "no, calling up dread Cthulhu and all his ghastly horrors from the nether pit of hell is a bad idea and you should not do it," nor will they say, "how about you go to Arkham Asylum and ask them to do it, because we aren't touching this one with a mystic pentacle." In our eagerness to solve problems, we not only implement stupid things that should not be, but we make our own lives a living hell. Sometimes, we have to resist the temptation to solve the problem as presented and instead answer the questions, "should we be doing this?" and "is there a better way to achieve the end result?"

This also manifests itself in choosing how we solve the problem. Usually there are a variety of ways to implement a new feature, or fix a bug; in our haste to explore all these options we often point out that there's a quick and dirty kludge that will probably fail but might just work. Management (at least, the traditional management that most techies face) will always choose that option - partly because they don't actually have to live with the consequences, and partly because they often don't realise that support time is always orders of magnitude larger than development time. They see it as simply playing the odds: if we luck in, the business gets the problem solved cheaply. We need to be strong when presenting solutions and do things the right way, rather than making rods for our own backs.

There's heaps of other problems: the missing example problem, the testing problem, cargo cult programming, the 'learned blindness' problem, the "yak shaving" problem (with a tip of the hat to Mikal Still) and more. But I've nattered on enough already. :-) Maybe I should put this into a "Programming 201 - The Things Your Professors or Hacker Buddies Know But Never Told You About How To Program Right" talk for a future LCA. It needs a catchy title, though. How about "Argh, The Giant Clams, Get Them Off Me!" or "Press The Button And Watch Paul's Brain Really Explode!"


Fri 3rd Aug, 2007

Brute force, cunning, and spare DVDs

I'm in the middle of burning a heap of CDs and DVDs for Software Freedom Day, and am rapidly getting through a supply of sticky CD labels (thanks to gLabels and cdrecord). In the middle of all that I realised that the DVDs I was writing to had a nice, matte white side for printing onto, and my old Epson Stylus Photo R210 had a CD printing tray.

*pondering ensues - but not for long*

Of course, the handily online Epson manuals say that you need specialised software that only exists for the favourite proprietary desktop operating systems. Unperturbed, I pressed on. Inserting a coaster into the tray and aligning it properly, I attempted to print onto the CD using gLabels. Easier said than done: minutes later in #linux-aus I remarked:

If there's one thing I hate about Linux right now it is the complete and utter lack of anything approaching standardisation of print dialogs.

Because, of course, gLabels' print dialog had but one printer option: when to print (because, when printing labels, one need only ensure that one's labels are only printed during daylight hours). Inkscape's print dialog is even worse, basically making you choose the printer via a unix command. OpenOffice had the right tools, but after one inadvertent trial on a real DVD I was faced with a dilemma: OpenOffice Writer does not allow me to (easily) overlay text on the label graphic (which in this case would be handy, since the default Fedora 7 CD artwork does not include architecture information), and OpenOffice Draw completely ignores such trivialities as page size and paper format when printing. The former is preferable in this instance as architecture information can be added with a pen later.

So. Don't believe that it can't be done, but don't leap to conclusions that you know what you're doing either...


Wed 11th Jul, 2007

Accessing the Deep Web

IP Australia has an interesting post about the "Deep Web" - those documents which are available on the internet but only by typing in a search query on the relevant website.

On reading their article I get the impression that they think that this is both a hitherto-unknown phenomenon and one which is still baffling web developers. This puzzles me, as even a relative neophyte such as myself knows how to make these documents available to search engines: indexes. All you need is a linked-to page somewhere which then lists all of the documents available. This page doesn't have to be as obvious as my Set Dance Music Database index - it can be tucked away in a 'site map' page somewhere so that it doesn't confuse too many people into thinking that that's the correct way to get access to their documents. However, don't try to hide it so that only search engines can see it, or you'll fall afoul of the regular 'link-farming' detection and elimination mechanisms most modern search engines employ.

Of course, being a traditionalist (as you can see from both the content and design of the Set Dance Music Database) I tend to think that lists are still useful, at least if kept small. And I do need to put in some mechanisms for searching on the SDMDB, as well as a few other drill-down methods. So giving your people just a search form alone may not be catering to all the methods people employ when finding content. Wikis realised this years ago - people like interlinking. And given that these 'deep web' documents are still accessible via a simple URL, if you really need to you can assist the search engines by creating your own index page to their documents by basically scripting up a search on their website that then puts the links into your index, avoiding listing duplicates.
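As a rough illustration of that index page idea, here is a minimal Python sketch. It assumes you have already gathered (title, URL) pairs somehow, for example from a few scripted searches; the function name build_index and the example URLs are made up for illustration, and nothing here is specific to any real site.

```python
from html import escape


def build_index(documents):
    """Return a plain HTML index page that links to each document once.

    `documents` is an iterable of (title, url) pairs. The same URL may
    turn up in several search results, so duplicates are skipped.
    """
    seen = set()
    items = []
    for title, url in documents:
        if url in seen:
            continue  # avoid listing duplicates
        seen.add(url)
        items.append(
            '<li><a href="%s">%s</a></li>' % (escape(url, quote=True), escape(title))
        )
    return "<html><body><ul>\n" + "\n".join(items) + "\n</ul></body></html>"


if __name__ == "__main__":
    # Hypothetical results from two overlapping searches.
    results = [
        ("Document A", "http://example.org/doc?id=1"),
        ("Document B", "http://example.org/doc?id=2"),
        ("Document A (again)", "http://example.org/doc?id=1"),
    ]
    print(build_index(results))
```

The page this produces is deliberately dull: a search engine only needs ordinary links it can follow, and a person who stumbles on it can still use it.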

So the real question is: why are the owners of these web sites not doing this? We may just need to suggest it to them if they haven't thought of it themselves. The benefits of having their documents listed on Google are many - what downsides are there? I'm sure the various criticisms of such indexing are mainly due to organisational bias and narrow-mindedness, and can either be solved or routed around.

There are two variants of this that annoy me. One is the various websites where the only way to get to what you want is by clicking - no direct link is ever provided and your entire navigation is all done through javascript, flash or unspeakable black magic. These people are making it purposefully hard for you to get straight to what you want, either because they want to show you a bunch of advertising on the way or because they want to know exactly what you're up to on their site for some insidious purpose. There is already one Irish music CD store online that I've basically had to completely ignore (except for cross-checking with material on other sites) because there is no way for me to refer people directly to a CD. I refuse outright to give instructions such as "go to and type in the words 'Tulla Ceili Band' in the search box", because that's not good navigation.

The other type of annoyance I find ties in with this: it is the practice of making a hidden index, or a privileged level of access, available to search engines that normal people don't see. I've seen a few computing and engineering websites do this, and Experts Exchange is particularly annoying for it: you can google your query and see an excerpt from the page with the question but when you go there you find out that access to the answers requires membership and/or payment. This, as far as I'm concerned, is just a blatant money-grabbing exercise and should be anathema. Either your results are free to access, or they're not - search engines should not be privileged in that respect.


Mon 18th Jun, 2007

Scraping WiFi off the walls

Apropos of Matt Palmer's observations on availability of WiFi in airports, it would be remiss of me to not mention that you can get free internet access through the Qantas Club just by sitting relatively near one of their lounges. You don't have to sit in it, of course, because they haven't come up with a way to keep it from going through walls. This means that most of the Qantas section of Canberra airport, a couple of cafes and some conveniently placed seats in Brisbane airport, and anywhere near gate 4 in Sydney terminal 3 will be good enough.

Taking my laptop on a trip up to Brisbane to visit my family has also spurred my interest in learning how to, er, how shall we say it... gain access to WEP-secured networks. Not particularly because I desperately need internet connections, or because I want to play with their computers and download bomb-making manifestos on other people's networks, but, you know, because they're there...

P.S. I also discovered that I do not enjoy removing three separate botnet instances on my mother's Windows XP machine, but I can do it rather easily. Most of them are fairly easy to spot with HijackThis, and they all seemed to essentially consist of a version of mIRC with a bunch of separate scripts set up to run off various commands. In the Task Manager you need to look for those processes with system-like names running as the user, rather than as SYSTEM.
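That Task Manager check can be written down as a small filter. This is only a sketch in Python: the list of 'system-like' names and the set of system accounts are my own assumptions rather than anything definitive, and the process list is hand-typed sample data, not read from a live machine.

```python
# Assumed examples of names a bot might borrow to look legitimate.
SYSTEM_LIKE_NAMES = {
    "svchost.exe", "services.exe", "lsass.exe",
    "csrss.exe", "winlogon.exe", "smss.exe",
}
# Assumed accounts that genuine system processes normally run under.
SYSTEM_ACCOUNTS = {"SYSTEM", "LOCAL SERVICE", "NETWORK SERVICE"}


def suspicious(processes):
    """Return (name, user) pairs with a system-like name but a normal user."""
    return [
        (name, user)
        for name, user in processes
        if name.lower() in SYSTEM_LIKE_NAMES
        and user.upper() not in SYSTEM_ACCOUNTS
    ]


if __name__ == "__main__":
    # Hypothetical Task Manager listing: (image name, user name).
    listing = [
        ("svchost.exe", "SYSTEM"),
        ("svchost.exe", "NETWORK SERVICE"),
        ("explorer.exe", "mum"),
        ("svchost.exe", "mum"),
    ]
    print(suspicious(listing))
```

It won't catch anything that uses a name you haven't listed, so treat it as a first pass before reaching for HijackThis.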


Mon 11th Jun, 2007

Man triumphs over machine

MixMeister, among its myriad of great features, can burn a mix to a CD. However, for a while now MixMeister has been refusing to talk to the CD writer in my laptop, reporting only "error -2110" when it finishes 'rendering' the mix and tries to start writing. I had let this frustrate me enough; it was time to try and give Nicole the same mix I gave Kira.

First brainwave was to try and find whatever MixMeister had 'rendered' and see what format that was. Lo and behold, in c:\temp there were files from '0.rb' to '11.rb' for a twelve-track mix, and an 'index.tlf' file in text that listed all of these, with a curious 'DA -P:0' postscript on all but the first. A quick Google on that incantation revealed that MixMeister apparently use Gear Software's CD writing library, since that text is listed in their Linux software manual. This told me that they were probably either ready to write or needed only the smallest amount of mabulation to write onto disk. I copied the files to my network SaMBa server and verified in hexedit that it had all the appearances of sixteen-bit signed integer dual-channel audio, my guess being in MSB order. When you see FFFE in a segment of near silence, you can guess that that means '-2'.
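To show why FFFE reads as '-2' only under one byte order, here is a small Python sketch using the standard struct module. The two bytes are the ones mentioned above; everything else is illustration.

```python
import struct

raw = bytes.fromhex("FFFE")  # the two bytes as they appear in the hex editor

# ">h" = big-endian (MSB first) signed 16-bit; "<h" = little-endian.
as_big_endian = struct.unpack(">h", raw)[0]
as_little_endian = struct.unpack("<h", raw)[0]

print(as_big_endian)     # -2
print(as_little_endian)  # -257
```

A value of -2 is plausible for near silence; -257 is not far from zero either, which is part of why a guess made from one quiet sample can go wrong.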

A bit of further reading of the cdrecord manual revealed that it can take .wav and .au files and ignore their header section, but anything else is assumed to be plain audio of exactly the type I was looking at. I tried this, and it burnt all the tracks and produced a CD of audio. A CD of white noise, actually - another coaster to add to the collection. White noise in this instance almost certainly means that you have got the endianness of the bytes wrong; fortunately there is a -swab option to cdrecord to fix this. So, after the mantrum:

cdrecord dev=/dev/cdwriter pregap=0 -v -eject -swab -dao -audio {0,1,2,3,4,5,6,7,8,9,10,11}.rb

I arrived at a perfectly working mix CD with no gaps between tracks, with only minimal help from MixMeister and with no thrashing around in Windows trying to wrestle with its unhelpful error messages and general lack of anything approaching information. As a friend commented later, a triumph of man over machine.
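For the curious, the byte swap that fixed the white noise amounts to exchanging each pair of bytes in the audio data. The Python sketch below shows the idea only; the function name swab is borrowed from the option, and this is not a claim about how cdrecord does it internally.

```python
def swab(data: bytes) -> bytes:
    """Swap every pair of bytes, turning each 16-bit sample's order around."""
    if len(data) % 2:
        raise ValueError("16-bit audio data must have an even number of bytes")
    out = bytearray(len(data))
    out[0::2] = data[1::2]  # second byte of each pair moves to the front
    out[1::2] = data[0::2]  # first byte of each pair moves to the back
    return bytes(out)


if __name__ == "__main__":
    print(swab(bytes.fromhex("FFFE0001")).hex())  # feff0100
```

Swapping twice gets you back where you started, which is why guessing wrong costs a coaster rather than the data.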


Sun 10th Jun, 2007

Maybe Swordfish had a point...

For a variety of otiose reasons, I have found myself on the phone for long periods of time. I can play card and other visual games, and solve a Rubik's Cube, but this cuts no code and designs no websites. But concentrating on words and sentences on a page or screen overrides the voice in my ear and I miss what the other person has been saying. I've been trying to think of a couple of graphical things I can be doing, beyond a limited amount of photo editing or designing icons for my website, that can be done purely graphically, allowing me to carry on a phone conversation and still keep some level of productivity.

Maybe that detestable, ridiculous scene in "Swordfish", where Hugh Jackman's character is typing obviously randomly on a keyboard while some three dimensional object swivels around on his display - because in a film which is all about an elite hacker that will gain them so much credibility with the hacker audience - has something after all...


Mon 4th Jun, 2007

Killing another Windows machine

We have a Windows XP machine at work which has somehow borked its network configuration. It can see its network hardware, and as far as it can tell it's all working perfectly, and the lights are on on the connector socket, but no actual packets get through anywhere. And, strangely, I'm still not happy. You see, we have two users in the lab who use a program called MrBayes, a tool to look at a list of DNA sequences and try to determine what was descended from what. For ages, they've been using the DOS version on this machine, transporting files over to it via USB stick and tying it up for hours. It's a dual-core Intel Pentium 4 machine and they're using only half its potential. And the fact that I have a perfectly good dual Opteron as a server and have run parallel programs using the LAM MPI libraries across all four processors named above seems only to aggravate the sense of waste.

Only recently, however, did it occur to me to ask "so what forms is MrBayes available in?".

My research showed that it's available in source form for UNIX, and comes with a nice pre-built Makefile. Better still, and in contrast to the many other programs (such as Muscle, Clustal-X, IQPNNI and so forth) that seem to bolt on parallelism as an afterthought - usually as the result of the work of another programmer entirely - this comes with MPI support built-in. The only minor pity is that the Makefile doesn't have a target to support it, so you have to make it using the slightly different command line they suggest on their FAQ. In fact, you seem to be better off editing the Makefile and changing the 'MPI ?= no' option, which does use mpicc despite the FAQ warning you not to. And why not provide separate targets to build the MPI and single binaries? The former needs quite a different environment to run. Who knows what these crazy programmers are up to?

So now I've got a multiprocessor MPI-capable version, but when I run it under LAM MPI it berks out with:

MPI_Recv: message truncated (rank 1, comm 3)
Rank (2, MPI_COMM_WORLD): Call stack within LAM:
Rank (2, MPI_COMM_WORLD):  - MPI_Recv()
Rank (2, MPI_COMM_WORLD):  - MPI_Bcast()
Rank (2, MPI_COMM_WORLD):  - MPI_Bcast()
Rank (2, MPI_COMM_WORLD):  - main()
MPI_Recv: message truncated (rank 2, comm 3)
Rank (4, MPI_COMM_WORLD): Call stack within LAM:
Rank (4, MPI_COMM_WORLD):  - MPI_Recv()
Rank (4, MPI_COMM_WORLD):  - MPI_Bcast()
Rank (4, MPI_COMM_WORLD):  - MPI_Bcast()
So, still some work to be done. But at least the dual-core Pentium is now running Fedora all the time, so I can actually use it.


Sun 3rd Jun, 2007

First test case cover out of the press

On Saturday I took the plunge and started the final stage of the Laptop Case Cover Project - that being actually trying to produce a laptop case cover. I took my other piece of MDF and added front and back stops to it to make the top form, then did a trial run. Satisfied with that, I then glued the first laptop case cover together. And, surprisingly, it didn't turn out too badly. It needed a few minor bits of glue to fix up a few dry patches, but otherwise I like it. It holds its shape well, it fits to the form nicely, and it's light but robust.

Now all I have to do is sand it and give it a coat of finish, and then I can see if it will actually attach to the case. That'd be nice.

It also looks like I will have to steam or soak the figured pieces. The Tasmanian oak front piece had its grain parallel to the curve, rather than at a tangent to it, so it bent around the front metal piece nicely. The figured pieces are much harder, which means they'll bend a bit but they'll probably break if I try to fit them around the relatively narrow corner of the front edge without some kind of prior inducement. So, I get to set up my steam press after all. Glee.

You can check the whole gallery out here.


Fri 25th May, 2007

Hardware problems

I completely failed to record the CLUG meeting last Thursday night, basically because I just didn't think it through enough. I set out with good intentions, a nice stereo microphone, a five-metre extension cable to allow me to monitor the recording without being up the speaker's nose, and a little impromptu microphone stand I'd fashioned out of some handy wood (the old 'if all you have is some wood' design criterion). I'd tested the set-up before and made sure the microphone was working, and got the levels right beforehand. I started recording way before the first speaker, so I could check that it was all working again.

The first minor glitch was discovering that I couldn't record and play back simultaneously. So I could watch the levels moving up and down, and note that they were at least vaguely reacting to the speaker and the audience, but I couldn't check that what was recorded was actually sensible. I tried stopping one recording (again, before the first speaker started) and then continuing recording in another session and playing back in the first session. Audacity played nothing. I tried to normalise the first while the second was recording. Audacity crashed. Don't do that again, then.

The first clue that something was going wrong was that the levels weren't following David's talking closely enough. They'd pick up loud noises in the room, but not David's pauses. The second clue was that this changed before Tridge started talking - the levels went up - but they stayed even flatter. Tridge knows when to raise his voice - we have no amplification for the CLUG meetings - and I should have seen definite peaks. And the levels weren't flat, either - so it wasn't recording silence. I thought I hadn't changed anything when the levels went up, but as it turned out I had.

The third clue had to wait until I got the recordings home: I listened to one and I could hear my comment to Tridge sitting beside me fairly clearly, but not his response. Otherwise, it was mostly buzz. Diagnosis: something was working, but the noise from the circuitry on the rather ordinary on-board audio was pretty high (no surprises there, laptops are not renowned for their onboard sownd, er, sound). And why had the levels gone up for the second recording? What had I changed? I couldn't remember. It took a sleep for it to all congeal in my head, and the next morning the ghastly truth of my elementary error flooded into my brain unbidden:

I had put the microphone in the headphone jack, and vice versa.

This is why it picked up vague noises, and could hear my comments: because headphones will act as microphones in a pinch (it's just the same circuit, only being read from rather than written to, if you will). This explained why the noise had gone up but no voice had been actually recorded in the second phase: because I'd unplugged the headphones half-way through (since they were useless as monitors). From then on it was picking up uncompleted circuit noise. And what had probably done it was that the extension cord from the microphone looks a lot like the extension cord that I use at home to plug my audio output into my home-made switch board.

So: save up for a nice Edirol UA-1EX USB adapter which can do 24-bit 96kHz recording and playback and has a microphone input, and possibly a nicer microphone stand so I don't look quite like a reject from Woodworking 101 with it. And try to solve the hardware problems in my brain before doing this again.


Fri 4th May, 2007

Real Time Suck

$950 worth of new computing hardware arrived in my hot little hands yesterday. It being my regular day off, I carefully took my old Athlon-XP system out of my gaming case (a Micro-ATX thing for portability) and put into it an Athlon64 X2 system with a Gainward GeForce 7900GS video card. The thing is now a Frankencomputer in a more literal sense, because the video card is so big that it gets in the way of the DVD drive bay (which in this case is removable) and the case's own power supply - a non-ATX style thing of annoying noisiness - is too pitifully weak to run the system (the video card requires two separate feeds from the power supply in addition to the on-board power). So the drive bay and the power supply are now sitting outside the machine, tethered by cables, so the whole thing risks destruction if I try to move it about.

All of this rather expensive and not aesthetically pleasing process was really in order to be able to play Supreme Commander at a frames-per-second rate rather than a seconds-per-frame rate. And, now that the responses come back less than a sip of coffee after when I click, I've been really enjoying it. As the logical successor to Total Annihilation, it has all the improvements that Chris Taylor put in TA - order queueing, construction assistance, and unlimited raw materials. And in addition, we now have co-ordinated attacks, a very nice ferrying system (once you get the hang of it) and repeating construction orders - very useful for pumping out a balanced army without a lot of micromanagement.

Of course, I do think that it needs a few bits and pieces. The left and right click thing in particular is a little contrary to what we regular computer users are used to. In SupCom, you select units with the left mouse button, and order them to do stuff with the right mouse button - what they do is dependent on what you're clicking on. Except when you've specifically selected a command from the toolbar at the bottom right (or by keyboard), in which case you use the left mouse button. In this case, if you want to not complete your order, you right-click. If you've got a unit selected and you don't want to order them to do anything, you left-click. Now, all this is perfectly logical in the game, but it's not the same as the regular GUI interface I'm used to. So it kind of grates.

More annoying is that the command queueing interface feels like it's half-done. If you're queueing movement or patrol orders, you can move the orders about but can't delete them from the queue. If you're queueing construction orders, you can delete them from the queue (via right-clicking to reduce the number of construction units to zero), but you can't move the order around. My solution to this would be to have the movement orders listed in the same area of the screen that the construction orders are listed in; then each order in that section has a 'minus' button that you click to remove it from the queue. Construction orders would also have a 'plus' icon to add to it (shift-click to get five more, control-click to get 50 more, etc.). But I wasn't asked, obviously.
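To make that suggestion a bit more concrete, here is a toy Python sketch of a single queue that treats movement and construction orders the same way, so every order can be both moved and removed. The class and method names are invented for illustration and have nothing to do with how SupCom actually stores its orders.

```python
class OrderQueue:
    """One queue for every kind of order: each can be moved or removed."""

    def __init__(self):
        self._orders = []  # each entry is [description, count]

    def add(self, description, count=1):
        self._orders.append([description, count])

    def remove(self, index):
        """The 'minus' button: take an order out of the queue."""
        del self._orders[index]

    def move(self, old_index, new_index):
        """Drag an order to a different place in the queue."""
        order = self._orders.pop(old_index)
        self._orders.insert(new_index, order)

    def add_more(self, index, extra=1):
        """The 'plus' icon: 1 for a click, 5 for shift-click, 50 for control-click."""
        self._orders[index][1] += extra

    def as_list(self):
        return [tuple(order) for order in self._orders]


if __name__ == "__main__":
    queue = OrderQueue()
    queue.add("move to ridge")
    queue.add("build tank", 2)
    queue.add("patrol coast")
    queue.move(2, 0)       # patrol first
    queue.remove(1)        # cancel the move order
    queue.add_more(1, 5)   # shift-click: five more tanks
    print(queue.as_list())
```

The point is only that once both kinds of order live in the same list, 'delete' and 'reorder' stop being features that belong to one kind and not the other.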

Another niggle I have is with the 'assisting' process. A standard scenario is to have one engineer building a bunch of structures, and several other engineers assisting. You set them assisting the original engineer and they follow them round. But it's too easy to forget which was the assistant and which one the leader, so you have to do a bit of shift-key juggling to work out who to give commands to in order to have them obeyed by the entire group. Of course, once you think of them in terms of a 'group', you realise that that's the big feature that's missing - group orders. Sure, you can collect a bunch of units and give them the same order, and in SupCom they'll even form nice straight lines instead of TA's untidy messes. But you can do that with any selected bunch of units. I've often ended up adding a transport or an engineer to a group simply because I forgot to stop holding down 'shift'. I can think of a bunch of different ways to group units so that you can give them one single order that appears in one single queue, and selecting one of those units will then allow you to select the entire group that it's in. Once again, they didn't think of asking me.

And what is with this stupid back-story? StarCraft might require you to play the abilities of your units like an orang-utan on a gigantic organ filled with stops marked "?", but at least it had factions that were interestingly dissimilar and a back-story that could entertain the average child of five. Here we have every faction with exactly the same attitude: kill everyone else to make the world safe. This to me sounds like standard USAdian brainwashed patriotic bullshit. Even the Aeon Illuminate, a sect allegedly populated almost entirely by women and dedicated to a peaceful religion, have their (male) war ministers spouting the finest "kill them all" claptrap. Their motto is "Cleanse" - as in cleanse the galaxy of everyone who doesn't believe. Maybe I'm analyzing this too much, but to me this is just morally repugnant. They're not selling people on the storyline (because they haven't bothered with a credible one), so why not just say "you play the part of one of the war staff in [faction]"? Bah.

But do not mistake my annoyance with these factors for a lack of liking for the game. Do not also mistake it for trying to make up for being not very good at it. I can build my initial army quick smart, but it's the process of generating the right army at the right time that I find the hardest. I fail miserably when trying to coordinate air and land attacks. I stumble constantly over trying to defeat the standard layout of enemy bases - a couple of anti-air turrets and a couple of anti-land turrets at strategic corners of the base (or a couple of anti-air and torpedo launchers for island bases). I seem to lose far too many units in such actions. (And don't get me started on the "why can't they just have a unit that can target air and land attacks?" harangue.)

So really it's been a mixed blessing. I've played the second scenario through four or more times, each time trying different tactics and troop ratios. I found out why I couldn't upgrade my structures, even though they said they were upgradeable. I found out how to ferry. I constructed massive armies, only to find that I had to do a naval run; then when my armies were depleted and my navy top notch I'd be given the next bit of the mission: defeat a land army. I feel as if any chance of going into battle on a public server will see me whipped from here to the Skellig Islands and back. It won't be pretty.

Oh well, it's too much fun to play just to give up. Now to save up for a nice big LCD monitor...

Last updated: | path: tech | permanent link to this entry

Thu 26th Apr, 2007

CLUG meeting April 2007 - DAR, OSM, ISO

Tonight was a bit of a last-minute effort, as Chris hadn't managed to arrange any speakers. I volunteered to do a talk on DAR, since I'd recently had a think-my-data-all-gone scare, and Andrew Loughhead volunteered to do a talk on OpenStreetMap, a project he's been involved in for a while. Brad Hards also volunteered to bring along some DVDs and some ISOs of the recent Feisty Fawn release, so our trio of TLAs could be complete.

I'd also managed to combine not preparing my talk until the last minute with forgetting that I had already said I'd go to see "For The Term Of His Natural Life" at the National Library with some friends. And I didn't realise this until I was basically committed to going to the CLUG meeting. So it all felt a bit rushed and unorganised.

To cover my bit of the talk, DAR is an archiver built for the modern world: it compresses, it slices, it encrypts, it allows you to include and exclude files easily, it seeks right to where you need to be on the disk rather than reading through an entire .tar.gz file, and it saves Extended Attributes so your SELinux contexts and Beagle annotations will be saved too. I'll put the slides of the talk up soon, but it wasn't a very in-depth look at the program anyway so it's not really going to tell you any more than you'd get reading the man pages.

I'd forgot, when I was standing up, to point out the absence of my laptop case cover. This is because, on this very day, I have picked up six pieces of 0.5mm stainless steel cut to my exacting specifications by a powerful jet of water, and taken them to a place where they can be bent into the right shape to fit in my laminated laptop case covers. I had to leave the case cover with Precision Metals so they could work out how to bend both pieces to the right shape. So I'm very close to actually being able to make them. With my dining table near completion, it looks like I'm actually finishing some of my projects!

Andrew's talk on OpenStreetMap was much more interesting, and he fielded a lot more questions on it as a result. OpenStreetMap is solving the libre problem of the data on Google Maps and similar mapping sites being free to look at but not free to use in your own work. With OpenStreetMap you can correct it if you think it's wrong (leaving aside the 'easter egg' issue), add to it, analyze the map data in new ways, and so forth. There were a lot of very good questions, which I think shows that geodata is one of the hot topics in computing these days.

Finally, I took orders for pizza (a new process to me!), Brad fired up his DVD burner, and we had the regular stand-around-and-talk that seems to be perennially popular at CLUG meetings.

Last updated: | path: tech / clug | permanent link to this entry

Wed 18th Apr, 2007

Cut with a wet knife

My research is producing fruit: Serafin in Queanbeyan can cut my six pieces of metal out of 0.2mm sheet stainless steel using a water cutter for $77, or around $25.67 per pair. So the raw materials cost so far per case cover is $26 for the metal, $5 for the ply and $20 for the veneer, so let's say around $51. So I'd probably sell them at around $100 pressed, sanded and finished. If I need to get other companies to bend and press the sheets into the required shapes, then that'll be extra.
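For anyone who wants to check the sums, here is the arithmetic as a small Python sketch. The figures are the ones quoted above; the variable names are just mine for illustration, and the bending and pressing cost is left out because I don't have a number for it yet.

```python
# Costing sketch for one laptop case cover, using the figures quoted above.
CUTTING_QUOTE = 77.00      # water cutting, all six pieces
PIECES = 6
PIECES_PER_COVER = 2       # each cover needs one pair of metal pieces
PLY_PER_COVER = 5.00
VENEER_PER_COVER = 20.00
SALE_PRICE = 100.00        # rough target price, finished

covers = PIECES // PIECES_PER_COVER
metal_per_cover = CUTTING_QUOTE / covers

# Round the metal up to whole dollars, as in the estimate above.
materials_per_cover = round(metal_per_cover) + PLY_PER_COVER + VENEER_PER_COVER
left_over = SALE_PRICE - materials_per_cover

print(f"metal per cover:     ${metal_per_cover:.2f}")
print(f"materials per cover: ${materials_per_cover:.2f}")
print(f"left for labour etc: ${left_over:.2f}")
```

Whatever is left over has to cover the bending, pressing, sanding and finishing, so it is not profit.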

Last updated: | path: tech | permanent link to this entry

Tue 17th Apr, 2007

A brilliant realisation

In a bit of idle browsing, I suddenly realised that there's another possibility to save energy when using a home 12V power supply 'rail'. All those halogen lights run off 12V and have 50W plug-packs up in the roof to power them. The plug packs overheat sometimes and switch off to protect themselves, which is a little irritating when you have visitors. But with a 12V power supply rail in the house, you'd just power them directly off the 12V supply, thus cutting out another conversion inefficiency.

The only question that this leaves me with is how to adequately protect the 12VDC circuit against all the things that the regular 240VAC circuits are protected from: ground leakage and short circuit. But this is the matter of a small Google search: there are, of course, quite a few circuit breakers for 12VDC systems. There's even a thread on the Home Owner's Chat forums to the effect that some modern AC circuit breakers are also rated for DC operation. Maybe not the same rating, however, because there's a difference between the RMS power of AC and the direct power of DC. If all else fails, I can probably find a suitably high-rated fuse pack for a car and use that.

There's even plenty of discussion on the question of wiring. The key observation is that there's more of an issue with voltage sag over household distances, but I'm still trying to determine the DC rating of standard 240VAC home wiring. The estimates I've made show that American Wire Gauge 12-14 is considered 'standard' for their 110VAC circuits, so we're probably using AWG 8-10 - for those who prefer metric, that's a cross-sectional conductor diameter of about 3.26 - 2.59 mm. There's a handy voltage sag calculator on-line. It'd be kind of useful to be able to use standard wire sizes, although I think I'd probably spray-paint mine some recognisable colour to indicate which lines were DC and which AC. Helps stop those annoying destruction of costly appliance moments...
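A rough version of that calculation is easy to sketch in Python. The AWG diameter formula is the standard one; the copper resistivity is the usual room-temperature figure; the 10 metre run and the single 50W halogen load are assumptions I've picked purely as an example, and the function names are my own.

```python
import math

COPPER_RESISTIVITY = 1.68e-8  # ohm metres, copper at roughly 20 degrees C


def awg_diameter_mm(gauge):
    """Conductor diameter in millimetres for a given AWG number."""
    return 0.127 * 92 ** ((36 - gauge) / 39)


def voltage_sag(gauge, one_way_length_m, current_a):
    """Volts lost in the cable, counting both the feed and return conductors."""
    radius_m = awg_diameter_mm(gauge) / 2 / 1000
    area_m2 = math.pi * radius_m ** 2
    resistance_ohms = COPPER_RESISTIVITY * (2 * one_way_length_m) / area_m2
    return current_a * resistance_ohms


if __name__ == "__main__":
    supply_volts = 12.0
    lamp_watts = 50.0                      # one halogen downlight
    current = lamp_watts / supply_volts    # a bit over 4 amps
    for gauge in (8, 10, 12, 14):
        sag = voltage_sag(gauge, one_way_length_m=10, current_a=current)
        print(f"AWG {gauge}: {awg_diameter_mm(gauge):.2f} mm, "
              f"sag {sag:.2f} V ({100 * sag / supply_volts:.1f}%)")
```

The point it makes is the same one as above: at 12V the current for a given load is twenty times what it is at 240V, so a cable that is fine for mains can lose a noticeable fraction of the supply over a household run.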

Last updated: | path: tech | permanent link to this entry

Mon 16th Apr, 2007

More projects, more sticks

Two projects occupy my idle time at the moment. The first is ringing around to find a place that will cut the metal end pieces for my laptop case cover design. I've got a very promising lead out of Serafin P/L in Queanbeyan, who do laser and water cutting, and have also given me the names of two companies who can do the folding and pressing of the cut pieces. So that's good.

The other project is to make a power converter box, a rehash of an old project that I've decided to bring off the back burner. I have a perfectly good 50W 12VDC power supply from my old Via EPIA firewall that died (capacitor bloat - watch out for it) that runs off my APC power line filter to stop it going down from transient power spikes or drops. It's running the Yawarra firewall, and I also want it to power the ADSL modem (15VDC at 1A - WTF?) and the network switch (6V at 0.75A). I've been scouring the net for compact DC-DC converters and have found several companies - V-Infinity, Recom and Analog Devices, as well as another supplier who I don't have to hand (I took the spec sheets home to study).
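A quick power budget shows whether the 50W supply has room for all of this. The sketch below uses the modem and switch ratings quoted above; the 85% converter efficiency is purely an assumption on my part (the real figure comes off the spec sheets), and I haven't measured what the firewall itself draws, so it just reports what is left over for it.

```python
SUPPLY_WATTS = 50.0
ASSUMED_EFFICIENCY = 0.85   # guess for a small DC-DC converter, not a measured value

# (output volts, output amps) for each device fed through a converter
LOADS = {
    "ADSL modem": (15.0, 1.0),
    "network switch": (6.0, 0.75),
}


def output_watts(loads):
    """Total power delivered to the devices."""
    return sum(volts * amps for volts, amps in loads.values())


def input_watts(loads, efficiency=ASSUMED_EFFICIENCY):
    """Power drawn from the 12V supply once converter losses are included."""
    return output_watts(loads) / efficiency


def headroom_watts(loads, supply=SUPPLY_WATTS, efficiency=ASSUMED_EFFICIENCY):
    """What is left on the supply for everything else, such as the firewall."""
    return supply - input_watts(loads, efficiency)


if __name__ == "__main__":
    print(f"delivered to devices: {output_watts(LOADS):.1f} W")
    print(f"drawn from 12V rail:  {input_watts(LOADS):.1f} W")
    print(f"left for firewall:    {headroom_watts(LOADS):.1f} W")
```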

My further aim is to have a solar panel charging a 12V battery array which can keep these, and other essential items like MythTV, going in the event of a power failure. Although regular solar panel systems involve an inverter and a grid interface system, I was thinking of also having a direct 12V feed cabled to where it was needed. The above components all take direct DC, and I'm sure a bit of searching will turn up some suitable power supplies that run off 12VDC input. I understand that this kind of direct power is popular for computers run off big UPS systems, as you don't take a double loss of converting from DC to AC and back to DC. They all seem to take screw-in terminals, which is a bit ... shock-prone for my liking - I'm surprised there isn't some 'IEC'-style connector designed for this purpose. But then I suppose they usually live in locked cabinets in secure facilities.

The original project was a 'wall wart eliminator' that could be run off a computer's 12V supply to run the various peripherals connected to the computer that didn't need to be on when the computer was off. Apart from being more efficient - a lot of older wall warts are simple linear power supplies and use power even when the device they're attached to is off - it also eliminates a lot of the unsightly clutter on your desk or wall. I personally hate the things - they're never a convenient width so that you can put two side by side, and on vertical walls I'm always afraid that they're going to fall out and my network switch is going to go down unexpectedly.

The problem is, of course, what voltages does one need to supply? For a generic device that suited everyone, the answer would seem to be "a huge range" - I've got devices that I use that take 3V, 4.5V, 6V, 7.5V, 9V, 12V and 15V. This had me stymied for a long while - being able to choose your preferred voltage for each output would imply a very complex, custom switched-mode converter. But gradually an idea congealed in my subconscious, and when I read the key words "industry standard pin placement" on one spec sheet I realised: you could simply have a number of blank spaces or sockets on the board, and the user simply orders and obtains the necessary converters for their desired outputs and everyone's happy. About the only thing that wouldn't require this is the 12V output, which would just be a direct feed from the 12V input. While it might sound like this would need some kind of regulation, we can be reasonably confident that the input is also fairly well regulated, so adding a regulator would just make that circuit less efficient. The 12V-12V converter just becomes a couple of wires.

So: more simple hardware hackery coming soon.

Last updated: | path: tech | permanent link to this entry

Thu 12th Apr, 2007

CLUG PSIG April 2007 - Python in C in Python in C ...

At 5:00 or so I got the phone call - Bob Edwards, the speaker for the night's Programming SIG at the Canberra Linux Users Group, was unwell and wouldn't be able to make it. I sent out a quick note to say that we'd be reverting to the more traditional "everyone talk in their little groups about whatever" format, which was a little unsatisfying. This may seem a contradiction from a person who loves to tell tall tales of programming history and relate episodes from the Daily WTF. But I want to learn as much as I like to teach, and I want to make sure that the CLUG PSIG doesn't turn into the "solve coding problems for Paul Wayper" SIG.

We had a good turn-out, though, and people who had seen my last-minute email had still come along. But the day was really saved when David Collett and Michael Cohen did an impromptu talk about integrating Python with C. Those of us with laptops and internet connections went to the PyFlag code repository that David and Michael and others have been working on, and followed along as David showed us how they'd progressed from using SWIG to writing entirely in C and using the Python integration library to pass data in and out and to call Python methods from within C. David knows his code well and he and Michael were able to demonstrate all the standard things you need to do to integrate the two languages, as well as why those methods were chosen. I was really impressed at their off-the-cuff presentation.

And then somehow I found myself explaining all about my sequence counting program, why I'd used C instead of Perl to implement it, and what its limitations were. Though everyone was listening attentively, I was secretly fearing that it was turning into a conversation between myself and Michael, who was doing most of the questioning. And there were a couple of good ideas - things like using array-based trie systems, seeking through Clustal .ALN files, and using buffer systems to break the problem down - that he mentioned that I'll have to follow up. But I'm very annoyed with myself that it turned into exactly what I have felt all along that it should not be (q.v.).

Finally it came down to Owen, Ian, Rhys (or Ryan, I can't remember) and myself talking about esoteric things like Big Bang vs Steady State theory, the four fundamental forces and their relative strengths, and the families and groups of the Periodic Table. So it ended on a good note after all.

Last updated: | path: tech / clug | permanent link to this entry

Mon 2nd Apr, 2007

How hard can it be?

My "wooden laptop case cover" project hit a snag last week. I'd spent some time the previous weekend using Qcad to draft up plans for the metal bits that I need for my laptop case cover so that it can clip onto the case and not fall off. I'd drafted them in a CAD program so I could send them through to a metal fabrication company. I sent the plans in DXF and PNG format, and asked them to email me if they had any problem reading the plans. Not hearing anything back, I assumed it was all OK and drove all the way out to the edge of Queanbeyan to talk with them.

First disappointment: they hadn't read either of the files, they said they didn't recognise them, but they hadn't let me know. Luckily I had thought of this eventuality and brought my laptop, case and plans. Second disappointment: they couldn't make anything even vaguely like what I needed. This was made even more bitter by the fact that if they had let me know about their inability to read the plans beforehand, I could have converted them to some format that they could read and then they could have told me over the phone that they weren't able to manufacture them and thus saved me a trip. There must be a certain point that businesses reach where they just get so used to working with their regular customers that they simply don't care whether or not they piss anyone else off. A pity.

There are a couple of alternatives that I thought of on my drive home. The most expensive was finding a company that could do this kind of intricate metal fabrication. It would probably require a custom-made metal folding jig to be made up, probably costing in the order of a thousand dollars or so. The companies that do this are in Sydney, so it'd be long-distance correspondence and long trips to inspect. Let's scratch that option from the list right now.

The rest of the alternatives roughly translate into "make it myself". I can either make a suitable mould and cast the things, make up my own metal jigs to press sheet steel, work out some CNC thing to carve the pieces out of larger blocks of metal, or talk to Vik Olliver to see what RepRap can do. The fundamental criteria are exact measurements, hardness of the resulting material, and resistance to wear and corrosion.

Time to do a bit more research. I'll post the CAD files of the parts up at some stage, I suppose...

Last updated: | path: tech | permanent link to this entry

Wed 14th Mar, 2007

MythTV Transcoding HOWTO

My brother has recently installed MythTV. Well, he installed MythTV, and then called me to get the TV guide and channel settings sorted out. But his was the dirtier job: crawling up into the roof to splice in a new (powered) aerial splitter. His signal to noise ratio and signal strength aren't great, but it works.

Now to help him learn how to use the thing :-)

Murray Wayper wrote:
> How do I transcode existing files?

The first thing to do is to edit out the commercials and anything else you don't want.

You do this by going into the show and pressing 'e' to edit the cut list. This is presented as a bar across the bottom. If the show has been commercial flagged - and, by default, any show on a commercial station will have been - you can press 'z' to import the commercial cut list that the commercial flagging process has determined. That will give you a list of regions. (Incidentally, commercial flagging also puts in a superfluous cut point at the start and end, so I remove them otherwise I get a half a second of the start and end of the recording tacked onto my transcoded programme.)

You navigate with the arrow and pageup/pagedown keys. Left and right move by the interval specified at the bottom right; up and down change that interval. You start at one second, and can go down to one frame and up to ten minutes. Pretty quickly you develop the technique of homing in on an ad break by a sort of binary subdivision method - start at a minute and move forward until you're in the ad break; then go down to twenty seconds and move out of the ad break; then go down to five seconds and move into the ad break, and so on. Press Space to insert a new break point - it'll ask you which direction you want it to go. The default direction alternates, so you usually don't have to change your selection. If you're close to an existing break point you can move it (useful for fixing the commercial detection, which isn't flawless), delete it, flip it around, or leave it alone.
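If it helps to see the homing-in technique written down, here it is as a little Python sketch. This is only an illustration of the search idea, not anything MythTV does internally: `find_break_edge`, `in_ad_break` and the step sizes are made-up names and values, and it assumes the ad break is longer than the biggest step so the first pass can't jump clean over it.

```python
def find_break_edge(in_ad_break, start=0.0, steps=(60.0, 20.0, 5.0, 1.0)):
    """Home in on the start of an ad break by alternating direction.

    in_ad_break(t) stands in for you looking at the screen at time t
    (in seconds) and deciding whether you are watching an ad.
    """
    position = start
    want_ad = True  # first pass: move forward until we land in the ads
    for step in steps:
        direction = 1 if want_ad else -1
        # Keep jumping by the current interval until what is on screen flips.
        while in_ad_break(position) != want_ad:
            position += direction * step
        # Next pass uses a smaller interval and goes back the other way.
        want_ad = not want_ad
    return position


if __name__ == "__main__":
    # Pretend the ads start 754 seconds into the recording.
    edge = find_break_edge(lambda t: t >= 754)
    print(f"stopped at {edge:.0f}s, within a second of the real edge")
```

Each change of interval reverses direction, which is exactly the "into the break, out of the break" rhythm described above; when you run out of intervals you are within the last step size of the edge.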

Page up and page down move between the break points. This is very useful for checking that the commercial detection has worked - set the jump interval to a second, then move to the first point and check either side. Move the break point if necessary, then use page down to move to the next break point and so on. Here's where you remove the superfluous break points that commercial detection puts in at the start and end of the show. It takes a little while to get the hang of this kind of editing, but you get used to it. You press 'e' when you're finished with the cutlist editing mode.

To start transcoding, you can do two things. If you're watching the show, you can press 'x' on the keyboard and it will start transcoding using the default profile for that show. So once I've finished editing the cut list, I usually skip about two seconds back into the end of the show, press 'e', and immediately press 'x' just before the show ends at the new cut point. Alternatively, from the 'watch recordings' menu, when you're looking at the particular show you want to transcode, press the right arrow and choose 'job options'. Then 'transcode', and choose the profile you want to use. High quality basically leaves the MPEG2 stream as is and just removes the commercials from it; Medium quality and Low quality involve a transcode down to MPEG4 (Xvid). You can set up the actual parameters of how much compression and so forth in the setup menu.

BTW, you'll also find it useful to know that you can press 'd' when watching a show to jump back to the 'watch recordings' screen and go into the 'delete show' menu. (I.e. you do get to confirm whether you want to delete the show or not, but it's faster than exiting the show, pressing right, and choosing delete.) And, for shows that have commercial detection that haven't been transcoded, you'll see a little box in the top right corner (I think) informing you of an upcoming commercial. At this point you can press 'end' to skip to the end of the commercial flagged region.

Last updated: | path: tech | permanent link to this entry

Tue 13th Mar, 2007

New discoveries

I made two new discoveries this morning. The first is that my MythTV system's auxiliary components - ffmpeg, transcode, mplayer et al. - work better if all sourced from the same repository. I finally settled on FreshRPMs, but AFAICS there's no real reason to choose one repository over another.

Second is that, with ADSL2+, I can run mplayer remotely and get ~1FPS on the local display. Of course, that's a tiny window and there's a whole bunch of latencies in the chain. But the fact that it ran at all surprised me.

Last updated: | path: tech | permanent link to this entry

Tue 6th Mar, 2007

Wiki defacement

Subject: Defacement of our wiki page by your user

Dear people,

On Wednesday the 28th of February, a user from your address made two edits to our Wiki. You can see the page as changed at, including the above address as the editor. Your client is obviously defacing our site and others like it, which is probably against your terms of service. In addition, they are too lame to be on the internet. Please take them off it so that they do not do any further damage to themselves and others.

We have reversed their changes and our site is back to normal.

Yours sincerely,

Paul Wayper

Last updated: | path: tech / web | permanent link to this entry

Fri 23rd Feb, 2007

CLUG February meeting - PS3 and Sushi

It's at times like this that I'm really glad I discovered the CLUG.

Firstly, the main focus of the presentation was the PS3. Hugh Blemings and Jeremy Kerr (I think) gave a talk about the heart of the PS3, and several of IBM's large blade servers, the Cell processor. I'll gloss over a lot of the technical detail because it's pretty easy to find, but the key things to me were:

Of course, writing code for the Cell's Synergistic Processing Units - the eight 'sub-processors' that do most of the SIMD work - is not an easy process. The 'Hello World' example involved lots of complicated IO, but that was only because the SPU isn't the right platform to use to put words on the console. What Jeremy and Hugh are interested in is seeing various libraries - FFT, audio and video codecs, rendering libraries, all sorts of other things that require lots of brute-force computation - ported to use the Cell's SPUs. The power of these things is not to be underestimated: their classic demo is a four-dimensional Julia set (!) made of glass (!) ray-tracing (!!) in real-time (!!!), and it's done with no clever OpenGL output, just blatting pixels onto a frame buffer. With so much work being done on general-purpose data processing algorithms that can be run on the Graphics Processing Unit of your modern graphics card, this offers significant performance increases if we can get these libraries in place.

Nick Piggin also gave a talk about his work on getting better performance from NUMA machines. NUMA is Non-Uniform Memory Access, where each processor or core has direct access only to a subsection of the total memory in the system, and has to ask for other blocks to be sent to it if the block it wants is attached to another processor. Blocks that are read-only can be duplicated in each processor's local memory: for example, the pages containing libc. Blocks that are read-write can be replicated while everyone's only reading them, and flushed out and reloaded when a write occurs. So overall this was a night for the supercomputing enthusiasts amongst us (e.g. me).

(Note to self: I need to find a good way to talk to Nick about his presentation style.)

Once most of the presenting is over, the night is given over to eating and chatting. Usually in CLUG meetings this is given over to a pizza feeding frenzy once famously compared to a gannet colony. Last night, however, we also had sushi. This was organised by myself with some assistance from Pascal Klein, and had to be arranged in advance. It was an experiment in alternative foods prompted by Hugh Fisher's talk on Open Source software communities. We had seven people, Hugh and Jeremy included, request sushi; one didn't show up, so despite me asking for an eighth person to join in we still only had seven people sharing the cost. So I got stuck with a bit of the bill, but it was worth it for the quantity of sushi. There was a good variety, in reasonably good quality, and enough wasabi to entirely destroy the sinuses of the entire CLUG attendance for the night. So I think this was a success; I'll do it next time.

I'm still casting around for other cuisines that have small, easy to eat portions that don't require cutlery and can be sourced relatively quickly and don't go off their serving temperature for the period they're stored between picking them up and dispensing them. But sushi twice in a row won't be all that bad...

Addendum: I called my brother up in Brisbane to tell him of the PS3 coolness, and ended up spending more time talking to his friend Nick who works there. Nick's a Linux user in a crowd of Windows geeks, like myself, so we ended up chewing the fat over processing coolness and Vista badness for a good hour or so. I also passed on to him the news about - i.e. that it exists. That should save him the agony of getting MythTV to compile...

Last updated: | path: tech / clug | permanent link to this entry

Mon 19th Feb, 2007

Programming Sig February 2007 - Rusty Russell

I should have reported this a week or more back, but the weekend was filled with dead and dying firewalls and hardware purchases that didn't work. So:

At the CLUG Programming SIG for February, we had Rusty Russell speaking on his project LGuest, a Linux x86_32 hypervisor system. He talked about his aims (to develop a framework for testing virtualisation in Linux) and how he overcame the various obstacles in his way, such as glibc's way of using a segmentation calculation overflow to store information about the program code. He finished up with some benchmarks and a bit of a comparison between the LGuest and the various other virtualisation and hypervisor packages out there - primarily Xen and VMware. There were at least a dozen people there, which was big for a PSIG meeting, and this included two people from ANU who had never been to any CLUG event before but had wanted to hear Rusty's presentation. The talk was very well received, and Rusty gave an excellent presentation in his usual engaging manner. He handled the constant interruptions from food arriving and plates departing neatly, and didn't miss out on his own meal either (this last is important, I think, for a speaker). My only apology to him was my brief and inelegant introduction, but since pretty much everyone knew who he was and what he was doing, I felt words were superfluous :-)

I've been working in the background trying to get various people that I know in the CLUG scene to give talks at the PSIG. There are essentially three things that I want to hear about:

That last one in particular interests me greatly. There are a number of topics I'd personally like to hear about:

The question is: how do I find people in the Canberra area to talk on these topics?

Last updated: | path: tech / clug | permanent link to this entry

Fri 16th Feb, 2007

The touch of life

After my firewall fun last week, and my modem fun at the start of the week, I think I now have working replacement gear to run my home network. I'm gradually configuring up the two boxes - the nice red Yawarra box running pfSense and a new Netcomm modem purchased from Harris Technology while I get my refund from Orange Computers (who, after a bit of fiddling around getting case numbers from Netgear, have agreed to not worry about the '7 days or less for refund' clause on their receipt).

After a bit of poking around in various manuals, I found that it is possible to put the modem in bridge mode, effectively allowing the pfSense firewall to do the PPPoE connection. I prefer this method to having the modem be a firewall and the second firewall just act as a pass-through - it seems less flexible to me. But this raised a somewhat sacrilegious question in my head.

With the way modern modems run cut-down free operating systems to do their firewalling and administration, is it necessary for me to have a separate firewall running another free operating system in order to get the functionality I want? My old Belkin Wifi AP, for instance, allowed two WPA pass-phrases - one for 'full network access' and one that could only access the internet; pfSense only has one, that gives full network access. The NetComm's advanced setup was as sophisticated as anything I've seen from pfSense or Smoothwall, albeit without the neat graphs and SSH access. Should I be paying an extra $450 for a separate piece of kit where a $100 modem has the same functionality?

Last updated: | path: tech | permanent link to this entry

Comment spam eradication, attempt 2

Dave's Web Of Lies allows people to submit new lies, a facility that is of course abused by comment spammers. These cretins seem to not notice the complete absence of any linkback generation and the proscription of any text including the magic phrase http://. Like most spammers, they don't care if 100% of their effort is blocked somewhere, because it won't be blocked somewhere else. And there's no penalty for them brutalising a server: their botnets are just trawling away spamming continuously, leaving the spammers free to exploit new markets. It is vital to understand these two factors when considering how to avoid and, ultimately, eradicate spam.

For a while now, I've done a certain amount of checking that the lie submitted meets certain sanity guidelines that also filter out a lot of comment spam. In each case, the user is greeted with a helpful yet not prescriptive error message: for instance, when the lie contains an exclamation point the user is told "Your lie is too enthusiastic". (We take lying seriously at Dave's Web Of Lies.) This should be enough for a person to read and deduce what they need to do to get a genuine lie submitted, but not enough for a spammer to work out quickly what characters to remove for their submission to get anywhere. Of course, this is violating rule 1 above: spammers don't care if any number of messages get blocked, so long as one message gets through somehow.

This still left me with a healthy chunk of spam to wade through and mark as rejected. This also fills up my database (albeit slowly), and I object to this on principle. So I implemented a suggestion from someone's blog: include a hidden field called "website" that, when filled in, indicates that it's from a spammer (since it's ordinarily impossible for a real person to fill any text in the field). Then we silently ignore any submission that has it filled in. No false positives? Sounds good to me.
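Put together, the checks so far look something like the following Python sketch. This is not the actual Web Of Lies code: the function name, the `lie` form key and the wording of the link message are all invented for illustration. Only the "too enthusiastic" message and the "website" field name are the real ones.

```python
def check_submission(form):
    """Decide what to do with a submitted lie.

    Returns (store, message): store says whether to save the lie, and
    message is the error to show, or None to show the normal page.
    """
    lie = form.get("lie", "")          # "lie" as a form key is an assumption

    # Honeypot: a real browser never shows this field, so only bots fill it.
    # Store nothing, but give no hint that anything went wrong.
    if form.get("website", "").strip():
        return False, None

    # Links are proscribed outright. (Message wording is hypothetical.)
    if "http://" in lie.lower():
        return False, "Your lie is trying to send people elsewhere"

    # Helpful to a person, not much use to a bot.
    if "!" in lie:
        return False, "Your lie is too enthusiastic"

    return True, None


if __name__ == "__main__":
    print(check_submission({"lie": "Cats can taste colours", "website": ""}))
    print(check_submission({"lie": "Great site!", "website": ""}))
    print(check_submission({"lie": "Cats can taste colours", "website": "x"}))
```

The honeypot check goes first so that a bot that trips it learns nothing at all about the other rules.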

Initial indications, however, were that it was having no effect. I changed the field from being hidden to having the style property "display: none", which causes any modern browser to not display it, but since this was in the stylesheet a spammer would have no real indication just by scraping the submit page that this field was not, in fact, used. This, alas, also had no effect. I surmised that this was probably because the form previously had no 'website' field and spammers were merely remembering what forms to fill in where, rather than re-scraping the form (though I have no evidence for this). Pity.

So my next step was to note that a lot of the remaining spam had a distinctive form. The 'lie' would be some random comment congratulating me on such an informative and helpful web site, the 'liar' would be a single word name, and there was a random character or two tacked on the lie to make it unlikely to be exactly the same as any previous submission. So I hand-crafted a 'badstarts.txt' file and, on lie submission, I read through this file and silently ignore the lie if it starts with a bad phrase. Since almost all of these are crafted to be such that no sane or reasonable lie could also start with the same words, this reduces the number of false positives - important (in my opinion) when we don't tell people whether their submission has succeeded or failed.
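The check itself is only a few lines. Here is a Python sketch of the idea rather than the code I actually run: the function names are invented, the one-phrase-per-line file layout and case-insensitive matching are assumptions, and the example phrases are made up (the real list stays private).

```python
def load_bad_starts(lines):
    """Turn the lines of a badstarts.txt-style file into a list of phrases."""
    return [line.strip().lower() for line in lines if line.strip()]


def starts_badly(lie, bad_starts):
    """True if the lie opens with one of the known spam phrases."""
    text = lie.strip().lower()
    return any(text.startswith(phrase) for phrase in bad_starts)


if __name__ == "__main__":
    # In real use: with open("badstarts.txt") as f: phrases = load_bad_starts(f)
    phrases = load_bad_starts([
        "Very informative site\n",     # made-up examples
        "Thanks for the great\n",
        "\n",
    ])
    for lie in ("Very informative site, keep it up xq",
                "The moon landing was filmed on the moon"):
        verdict = "silently dropped" if starts_badly(lie, phrases) else "kept"
        print(f"{verdict}: {lie}")
```

Matching only on the start of the lie is what keeps the false positives down: a genuine lie can mention an informative site somewhere in the middle and still get through.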

Sure enough, now we started getting rejected spams. The file now contains about 36 different phrases. I don't have any statistics on how many got through versus how many got blocked, but that's just a matter of time... And I'm probably reinventing some wheel somewhere, but it's a simple thing and I didn't want to use a larger, more complex but generalised solution.

I'd be willing to share the list with people, but I won't post the link in case spammers find it.

I really want to avoid a captcha system on the Web Of Lies. I like keeping Dave's original simplistic design, even if there are better, all-text designs that I could (or perhaps should) be using.

Last updated: | path: tech / web | permanent link to this entry

Tue 13th Feb, 2007

A warning to others

How stupid am I? I fell for a trick I already knew.

I picked up a Netgear DM111P on Saturday at the Computer Swap Meet, in order to accommodate my new Yawarra firewall with its lack of USB connector for my old ADSL modem. Since this new one does ADSL2+ via an ethernet port, and theoretically Internode will get around to upgrading the Belconnen exchange to ADSL2+ any year now, it's a timely upgrade. And it's exactly what I need: one RJ11 port for the phone line, one ethernet port for the firewall, no wifi, no extra ports. And it does PPPoE passthrough too, AFAICS.

I noticed at the time that the box had been opened and resealed with sticky tape. Stupidly, I thought nothing of it: maybe someone wanted to see what it came with - there are plenty of people who like opening the boxes in order to waste the sellers' time and look like a guru ("Oh, this comes with a yellow ethernet cable, I'm not having it!"). But subconsciously I recognised it as a bad sign.

The next bad sign was when I got it home and opened it, to find a previous receipt and post-it note stuck to the leaflets. This meant someone else had already bought it and returned it. You can bet it wasn't because they didn't like the colour. And, stupid and annoying as it is, it is a not uncommon practice of these low-margin swapmeet vendors to take a return and put it back on sale. Sure, they'll probably end up with another return in a week's time (assuming you can find the bloody people again), but in the meantime they've got your money and they haven't dirtied their reputation with their upstream suppliers.

Sure enough, I plugged it in and turned it on and the power light remained red permanently. After hunting down the manual, I determined that this meant that it wasn't working correctly. Pressing the 'reset' button made it go green and flash, but it never went out of this reset mode again. And I couldn't ping it at all, or find it using nmap on the subnet.

So, thank you Orange Computer & Networking of 352 Rocky Point Rd, Sans Souci, phone number 02 9583 9838. Thank you for bodging up another dodgy sale; thank you for pissing another customer off in order to make a quick buck. I will be complaining to the organisers, and ringing you to arrange a replacement - and that won't involve me driving up to Sydney or waiting for you to appear at a swap meet again, either. You've printed "All goods are one year warranty." on your receipt - time for you to honour it.

And just when I thought I'd got exactly what I wanted, too.

Last updated: | path: tech | permanent link to this entry

Fri 9th Feb, 2007

The Touch Of Death

It had to start on Thursday morning, right when I was involved with getting Kate to the airport for a conference on time. My Smoothwall firewall, which had given four years of continuous service, had decided during the night to stop working. Of course, since it's the DHCP server on the network as well I found this out by not having an IP address.

First I suspected the drive, since on rebooting it would get to the same spot in the boot process and freeze. So I got it working with a spare 40GB hard drive (everyone has one lying around, right?). That did me for Thursday night.

But in the morning it was dead again, and the heatsink on the Via C3 processor was almost too hot to touch. So, dead processor, eh? Hmmm. I grabbed one of my old machines which was still pretty much all in one piece (everyone has an old P3-800 lying around, right?), put the drive in it, and fired it up. Hmmm; not booting at all. I fiddled with a few things, but still didn't get the screen to fire up. Not so good.

Casting my eye around the room, I spied my gaming machine, which I had recently swapped for Kate's machine (everyone has a spare Athlon 3000 in a Micro-ATX machine with a GeForce 6600 lying around, right?). Plugged it in, turned it on, big grinding sound, smell of scorched plastic, no boot at all. Not good. I'd put in my large heatpipe cooler and hadn't noticed that the fan cowling was being squeezed by pressing on the Northbridge heatsink. So I got the spare AMD heatsink (everyone has a spare AMD heatsink, right?) and attached that, but it still wouldn't boot.

So, three machines down, and still no firewall. Time to bring that old P2 that no-one wants home from work. Everyone has a spare machine lying around at work, right?

But meanwhile, I have ordered a WRAP-1-1 box from Yawarra, with a mini-PCI 802.11a/b/g card to replace my Belkin. This finally realises my goal of having a firewall with no moving parts. Having pfSense preinstalled is a bonus, and with pfSense's ability to log to an external syslog server, there's no unnecessary writing to the CF card. Hooray!

Last updated: | path: tech | permanent link to this entry

Tue 6th Feb, 2007

The Linux Ads 1

The Linux Australia email list has been alive with questions about video ads that promote Linux as a usable alternative to other closed-source, proprietary, costly operating systems and software. I had a series of three ads in mind.

All three show two people using an ordinary PC side-by-side. We see brief snippets of the software they use in their everyday work and play. Each time, the one on the left is using Windows; the one on the right is using Linux. In the bottom corner, we see a constantly-updating price of the software they use, where the package name and cost appear and disappear and the running total is left on the screen. The voice-over describes what the people are doing, what operating systems they're using, and finishes up with a conclusion that varies per ad.

The first ad shows the two people using the same free, open-source packages: firefox, thunderbird, OpenOffice, gaim, inkscape, gimp, and so on. The price tag on the left shows the cost of the version of Windows, and each time a package is used it's shown as free. At the end, the voice-over points out that they've been able to use exactly the same software, but the second person hasn't had to pay for their operating system - it's free. And they can give copies to their friends, but this is a minor point in this ad.

The second ad shows the two people using different software. On the right the Linux person is using the same software as before; on the left, their proprietary equivalents: IE, outlook, Microsoft Office, MSN, Photoshop, Illustrator. The price tag quickly goes up into the thousands of dollars. The voice-over points out that not only can the person on the right use the files generated by the person on the left, but they haven't had to pay for the software. And they can still give it away.

The third ad shows the two people using the same list of software as in the second ad. But this time, the price tag for the person on the left comes up as PIRATED each time. Just as the demo ends, two police officers appear beside the person on the left and take them away. The voice over points out that copying proprietary software is illegal and you can face criminal penalties, but that Linux and all its software is still free and legal to share with your friends and family as well.

I'm sure there are a few variants on this theme: you can play many games such as Quake 4 natively, use Wine or Cedega to run Windows software you can't do without, be protected by industry-proven firewalls and security technology, have the latest GUI wobbly window whiz-bangery, and so on. But I really like the idea of comparing the actual cost side-by-side, and showing that you can still do all the stuff you want to in Linux, without paying a cent, and you're allowed to share it with your family and friends. That's one of the key things that I think we overlook in the open software world - it's so obvious that we never think how much of a revolutionary change it is to people bound into proprietary software licensing.

Last updated: | path: tech / ideas | permanent link to this entry

Wed 31st Jan, 2007

At last you know what you're getting

I've finally mangled up the track listings for the Flashing Google Badge Mix and the Wired Kernel Hacker Mix. Now you know what you're listening to!

Last updated: | path: tech / lca | permanent link to this entry

Mon 29th Jan, 2007

Domain Search Squatters Must Die episode #001

It looks like the SpinServer people that I mentioned nigh on nine months ago have disappeared. That I can cope with - a pity, because I liked their designs, but businesses come and go.

What INFURIATES me beyond measure is the way the people who run the domain registrars then cash in on any business's past success by installing a copy-cat templated redirector site that earns them a bit of money from the hapless people who mistake it for the real thing. They're getting good too: it was so well laid out it took me several moments to work out that there was nothing actually useful on the site. Previous attempts I've seen have been pretty much just a bunch of prepackaged searches on the keywords in your previous site listed down the page, with a generic picture of a woman holding a mouse or going windsurfing (or for the more extreme sites going windsurfing holding a mouse). Now it's getting nasty.

It's not good enough that these domain registrars take money for something they've been proven to lose, 'mistakenly' swap to another person, revoke without the slightest authority, fraudulently bill for, and that costs them nothing to generate. Then they have to leech off the popularity of any site that goes under, not only scamming a few quick thousand bucks in the process but confusing anyone who wanted just a simple page saying "this company is no longer doing business". There must be something preventing this from happening in real life - businesses registering the name of a competitor as soon as they'd closed, buying up the office space and setting up a new branch. Except that there'd be some dodgy marketing exec handing them money for every person who wandered in and asked "Is this where I get my car repaired?". This sounds criminal to me.

Last updated: | path: tech / web | permanent link to this entry

Thu 18th Jan, 2007

The "solving things" conference

Last year at LCA 2006 I had two guys take five minutes of their time to help me get my work laptop on the wifi, a process which included one of them lending me his PCMCIA card (that didn't require firmware) so that I could download the firmware for my inbuilt card. I've forgotten your names, whoever you were, but you guys rocked. And I've told the story many times to illustrate why groups of hackers getting together can achieve things that would take a single person a lot of work to troubleshoot.

This year I've had similar experiences. The first one was getting my DVD drive set up to use DMA. The Intel 82801 (ICH7) family of ATA bridges supplies a SATA interface for the hard disk and a PATA interface for the DVD drive. Unfortunately, the standard ATA driver doesn't interface with this combination correctly and doesn't enable DMA on the DVD drive (or allow you to set it via hdparm). To fix this, put the following incantation on the end of your kernel line in your GRUB configuration file (for me on Fedora Core 6, that's /boot/grub/grub.conf):

kernel ... combined_mode=libata hdc=noprobe

The other was finally getting CPU frequency scaling working on my Intel Core 2 Duo. It's an unfortunate but now well-known bug that the Fedora Core 6 anaconda installer will not correctly work out what type of chip this is. It therefore thinks that you need the Pentium (i586) kernel rather than the Pentium II and later (i686) kernel. Since Pentiums didn't come with frequency scaling, the kernel package doesn't include the necessary kernel objects for speed stepping. You'll know if this applies to you with the following command:

rpm -q kernel --queryformat "%{NAME} %{RELEASE} %{ARCH}\n"

The third column will have the architecture - standard rpm and rpm -qi commands won't tell you this. uname -a will tell you i686 even if the kernel is i586, so don't believe it. To download the new kernel version, use:

yum install kernel.i686

I think that you have to do some special magic to get it to install the i686 architecture of the same version. As of my writing, it picked up the 2.6.18-1.2868 version of the i686 and installed that beside the 2.6.18-1.2869 version already installed. Yum won't correctly replace the i586 architecture version with the i686 architecture version if it's the same release number, as far as I know. I don't know what you do in this case.
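
If you have several kernels installed, the output of the rpm query above is one "NAME RELEASE ARCH" line per package. A small Python sketch that picks out the i586 ones from that text; the sample output is invented for illustration and the function names are hypothetical:

```python
def parse_kernel_lines(output):
    """Split 'NAME RELEASE ARCH' lines into (name, release, arch) tuples."""
    kernels = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 3:  # skip blank or malformed lines
            kernels.append(tuple(parts))
    return kernels


def i586_releases(output):
    """Return the releases of any installed kernels built for i586."""
    return [release for _name, release, arch in parse_kernel_lines(output)
            if arch == "i586"]


# Invented sample of what the rpm query might print.
SAMPLE = "kernel 1.2869.fc6 i586\nkernel 1.2868.fc6 i686\n"
```

Anything returned by i586_releases() is a kernel that will be missing the speed stepping modules.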

Of course, while you're still running the current working kernel, download all your kernel-specific packages for things like wireless networking support. You have to download these RPMs from your local mirror and install them manually, because you're currently running a different kernel and yum will only install packages for that one. Of course, if your ipw3945 driver is compiled from source, you'll have to run make clean and compile it and the ieee80211 module from scratch again. Take it from me, there's some weird voodoo to get this working that took me a day to correctly incant.

Then you should have an acpi_cpufreq.ko module installed and be able to use one of the CPU speed regulator daemons. I think I have both installed somehow, which means they're probably fighting it out or something. Go me. Still, I can blog about it, which hopefully means that Google will index it and someone else will learn from my mistakes. That's the only reason I'm doing this, you know.

Last updated: | path: tech / lca | permanent link to this entry

Rad GNOME Presenters

Andrew Cowie and Davyd Madeley put on a good show for how to start writing GNOME applications in C and Java. Andrew in particular is an enthusiastic speaker, and understands that it's very difficult to choose which talk/tutorial to go to and sitting for ninety minutes and listening to one topic is sometimes difficult. I totally appreciate that. And, by Torvalds' Trousers, he's fast at using UIs - watching him work in Eclipse makes you realise how good programmers can churn out a fully implemented file browser in an evening.

I'm going to have to kidnap one or both of them and bring them to the CLUG Programmer's SIG meeting. This talk was exactly what the people at the PSIG that I've been talking to have asked for.

Last updated: | path: tech / lca | permanent link to this entry

I'm on the shirt that killed River Phoenix

In crowds, there's always someone heading vaguely toward you but heading somewhere else entirely. There are a whole lot of little protocols - not meeting their eye, negotiating a slightly different course - that allow a certain social space. So it's always disconcerting to have someone stride directly up to you - when they actually do mean to meet you and you've now been pointedly ignoring them. He pointed to my LinuxChix Miniconf "Standing out from the crowd" T-shirt and said:

"Where can I get one of those?"

I gave him Mary Gardiner's email, and whatever other methods I could remember of how to get in contact with her. But though it's a long sleeved shirt and it's a warm day, I'm totally chuffed to have got one now. LinuxChix roxxors!

(BTW, the title is a reference to TISM's popular song (He'll Never Be An) Ol' Man River, of course.)

Last updated: | path: tech / lca | permanent link to this entry

Wed 17th Jan, 2007

Kernels meeting in the middle?

Andy Tanenbaum's talk on microkernels was, IMO, really cool. The interesting thing to me was that this almost exactly mirrored Van Jacobson's talk at LCA 2006 on speeding up network access by moving the network drivers out of the kernel. Not only did this speed network access up, but it also removed a whole bunch of ugly locking stuff from the kernel, improving its quality as well. Another side benefit of this was that you could now run half a dozen network processes instead of one. With architectures like Sun's Niagara, Intel's quad cores and many other systems getting many cores on the same chip, this is going to deliver an increasing speed-up.

It occurs to me that this is the other good thing about Minix. The disk driver, the network driver and the screen driver can all run at full speed because they get 100% of their own CPU time. Separating these out onto separate processes that can run on separate CPUs will deliver better scaling than bloated kernels that have every driver and every system all bundled together. To me, this is not really a problem for Linux - we already have proof that these trends are happening. Linux might have a larger kernel, but we're meeting microkernels in the middle.

For Windows, though, I'd say that it will become increasingly obvious that it just can't compete on reliability and scaling in the area that they so desperately want to get into: the server market. The annoying thing about this is that it won't really matter, because Microsoft knows who to market to (the upper management who don't read technical journals) and have the budget to make anything look good. The fight is still on, but it's still not between Linux and Minix. Sorry, Marc, stirring that particular pot again does not get you any kudos.

Last updated: | path: tech / lca | permanent link to this entry

Tue 16th Jan, 2007

Submitting patches and watching devices

Two more excellent talks at the LinuxChix miniconf - how to work on open source if you're not a programmer and how to understand PCI if you're not a hardware hacker. It was amusing to see that the small room the miniconf has has been constantly full, with people often having to sit on side tables or stand in order to watch. For the latter talk in particular, a huge contingent of guys turned up to listen and strained the capacity of a room that had been boosted with lots of extra chairs. Very cool.

One of the key elements that has come out of the LinuxChix miniconf (in my opinion) is that social networking is just as important as digital networking. Part of this is meeting and greeting, something that even if LCA was twice as big would still be just as awesome. Another part is the smoothing of feathers, the shaking of hands, the stroking of egos - the little things that sometimes you have to do to get patches accepted or problems resolved. One trick which Val Henson mentioned is to submit a patch with one or two obvious errors (like submitting it in the wrong format) - then the developers can feel all important and tell you you did it wrong, and you quietly submit the correct patch and everyone feels happy.

Logically, it shouldn't have to be this way. Open Source prides itself on the idea that anyone can modify, anyone can help. But this, as Sulamita Garcia (the first LinuxChix speaker) pointed out, is a fiction - the reality is flame wars, shouting matches, and sexist comments. Getting patches accepted can often be as much about knowing who to talk to as what format to submit it in. A woman going along to a LUG meeting for the first time can be, as Sulamita described it, akin to the scene in the spaghetti western where the stranger walks into the bar and everything stops. This must change if we're to be anywhere near as equal and egalitarian as we claim to be.

And certainly for men it's sometimes a huge struggle. I think of myself as a feminist and consciously support equality and fairness, yet I still make the same mistakes as all the other guys I personally shrink away from. And even after this example, when you'd think I should have put a cork in my mouth, I was still putting my foot in instead.

For the last session of the LinuxChix miniconf, we went to the library lawn to sit in the dappled sunlight and talk about how difficult it is to get a fair rate of pay. This followed on from Val Henson's talk on negotiation and knowing how to get what you deserve, which was excellent and (I feel) applied to the wider community of computing workers. Mary Gardiner organised us into small groups and specifically cautioned the men in the groups to not talk too much (which would have been a good idea even if it wasn't a LinuxChix miniconf). So we start introducing ourselves, and what do I do?

Go into a long and tedious ramble about the pains of one of my previous jobs.

Mary, the lady organising our group, gently interrupted me and moved on, and I realised my error. Andre Pang, who was also in the group, was much better than I at keeping quiet and letting the women[1] talk. I silently made the motions of putting a cork in my mouth and managed, I think, to restrain myself.

Why must my urge to speak and be heard fight with my desire to be fair and equal?

[1] - Women? Ladies? Girls? Females? Whatever term I choose, I hit the age-old problem of them having social connotations.

Last updated: | path: tech / lca | permanent link to this entry

The invisible macho danger

I worked out that there were 38 women and 12 men for the first session of the LinuxChix miniconf. In the question time, it came out that the FOSSPOS study (I've yet to find it on the intarweb) showed that FOSS and Linux have an order of magnitude fewer women compared to the rest of the IT industry. And yet, when I asked my question "why is this?" Val Henson pointed out to us that, even with that proportion of women in the room, all of the questions up to and including mine had been asked by men.


Last updated: | path: tech / lca | permanent link to this entry

Getting your hands on a child's laptop

Chris Blizzard's talk today about the OLPC covered the question that everyone from the FOSS world (apparently) asks: can I have one? It's very true that even if you got 50,000 people wanting an OLPC (or whatever the actual thing is called), that's peanuts to delivering 20 times that number to one nation alone. However, the 50,000 number is being bandied around - what if there was a website for people to register their interest? And they could say how many they wanted? The more that people spread the word, the more people might find more uses for them. Entire classes or schools in first-world countries could sign up, whereas they would currently be denied. That's got to be good, right? Things like PledgeBank make it easy to get a good feel for how many people are interested in doing something - why not do something like that for the OLPC and see what the real interest from the people is?

Last updated: | path: tech / lca | permanent link to this entry

Mon 15th Jan, 2007

Odd hackery across operating systems

So the next step in getting MixMeister working under WINE seems to be to get a bunch of SELinux context problems sorted. The command to do this is chcon -t textrel_shlib_t <file> - it sets the file's SELinux type so that it can be loaded as a shared object library. I must remember that. This got MixMeister to show its front screen, but it still complains that WMVCore.DLL is missing.

Aside: my fallback, if this WINE hackery wasn't going to work, was going to be starting up in Windows (I still have an XP Home license installed on a smallish partition, since I'm loath to throw away something that costs money when I'm given it and I can find a use for it.) Resigning myself to not using Linux, I restarted in Windows. And then discovered two problems. One is that I only have a demo version of MixMeister Studio 7 that I was trying out a while ago. I'd have to grab the install files and my registration key off my home server - not impossible despite its 34MB installer size.

The second problem was that my portable drive, which regular readers will recall I specifically formatted into FAT32 (yes, using the -F 32 switch to mkfs -t vfat, which otherwise will give you a 12 or 16 bit FAT) in order to make it accessible under Windows should I have to go to this fallback position, is not recognised under Windows. It sees the drive, and the partition, and can even determine that it's a 32-bit FAT partition and see how much free space there is. But it just remains greyed out, and there is no option active in the context menu apart from "Delete Partition". Strangely enough, I don't want to do that.

Of course, Windows won't bother the user with such useless information as why it won't allow me to see that partition. Or what I can do about it. Or why it sees my LVM partition and assigns it a drive letter without being able to read it in the slightest. That's totally useless information to the user. Oooh, sorry, I said too much: that's totally useless. That's all that I need to say.

So, it's back to Linux again to see what I can do. Using Windows has reminded me that I have a perfectly working copy of Windows on my hard disk, with MixMeister installed and working on it. This means that there's a fully working copy of WMVCore.DLL in there somewhere. And thanks to prescience, I have already loaded the kernel NTFS drivers and can mount NTFS partitions. A bit of finding later, I've copied the WMVCore.DLL and another one it seemed to need (wmasf.dll) over to my WINE \windows\system32 directory and given them the necessary permissions. And MixMeister is now no longer complaining about missing DLL files, or producing SELinux audit messages in the system message log.

Instead, it's just crashing with the message "wineserver crashed, please report this."


Another thing to try and figure out. Another thing to wade into, blindly trying to find out any information I can about what's going wrong. Another thing to stab haphazardly at, pressing buttons at random just to see if anything changes. I'm sure at this point more clueful people just give up, knowing smiles on their faces, saying "only a true lunatic with far more time on their hands than is good for them would ever bother to try and work out what's going wrong at this point."

Maybe this would be a good Lightning Talk.

So, I find wineserver and find out how to run it in debug mode (-d2 -f). That didn't actually really help - nothing in the debug was any different between a good run (with the unpatched server) and a bad run except the bad run just cuts out with the helpful message (null)( ). Antti Roppola, looking over my shoulder at one point, suggested running wineserver under strace, and this revealed that wineserver is getting a segfault. Now to try and put a bit of debugging in the things I've added to see if this can tell me why.

Last updated: | path: tech / lca | permanent link to this entry

Conserving power

With my laptop on 32% charge (not quite a record for me) and my own internal batteries needing a bit of charging, I looked at the programme after afternoon tea. There wasn't anything that really stood out for me as a must-see, and as Jeff said in the opening speech it's important to conserve one's energy and pace oneself at LCA. So I enjoyed a quiet walk back down the slope to Shalom College, to see if they have Wifi here yet.

I have to say that I do like the UNSW campus. It has the same style as QUT in Brisbane - fairly closely packed, and a mixture of the old and the new with little nooks and lawns of greenery amongst it all to enjoy as one goes past. Passing the John Lions garden outside the Computing Science building is particularly poignant given last year's fundraiser. And, while some people might complain a bit about the walk uphill to the conference in the morning, and even I end up feeling unfit and slightly out of breath after tackling it, it's very pleasant to walk down in the afternoon.

After Jeff's talk was a talk on Conduit, a GNOME subsystem designed to make synchronising data between a source and one or more destinations easy. It looked absolutely awesome, because the guy who gave the talk understood the problem space well and had solved it in a way that allowed both 'headless' sync to happen behind the scenes and fully GUI-enabled ad-hoc sync with conflict resolution all beautifully handled. This allows any application that uses DBus to facilitate syncing of data, without having to be an expert in asynchronous synchronisation (not, in this case, a contradiction in terms). Jeff's talk was about enabling social networking and putting GNOME on a phone (so to speak), and this talk was about making synchronising that data seamless and easy. Cool!

And now to my afternoon's entertainment: getting my WINE patch to work. I've got all the code working, including the two bits I'd commented out because I didn't quite understand where the data was coming from. It compiles with no errors and only one or two warnings, which as far as I can see aren't caused by my code. Now I have to go through and manually label my build directory so that it has the right SELinux contexts to have execmod permissions. Presumably Fedora's SELinux configuration assumes that any big bunch of libraries compiled quite recently in your home directory aren't guaranteed to be trustworthy to use as libraries. Fine by me.

Maybe I'll find out if Wifi is enabled in Shalom, so I can post this and Google for solutions to the execmod problem.

Update: Nope, Wifi is not working in Shalom. A quick call to my front man Steve revealed that it hasn't worked today, will be ready when the gods have been appeased and the troubleshooters have shaken their voodoo sticks over it, and is only likely to be sporadic even then. I realise that this is as much a question of getting bandwidth down here as getting the time to set stuff up, and I least of all people want to hassle the network guys with requests that they're already trying to handle. But it's not a great way to end the day for everyone staying on campus.

Last updated: | path: tech / lca | permanent link to this entry

Why are wii here too?

Just a quick clarification before dozens of Jdub and GNOME fanboys jump on me and toast me to a crisp: the overall idea is sound. Social networking and making nice interfaces and embedded GNOME, yay. But demonstrating this with half an hour of rambling Wii play was not my idea of a good way to get this across.

Last updated: | path: tech / lca | permanent link to this entry

Why are wii here?

Here at the GNOME miniconf we've been watching Jeff play with his Wii. The talk's called "connecting the dots", but as far as I can see it should be titled "watch Jeff play with his Wii and get absolutely nothing done". We've got some guy down from the audience and are creating a new 'Mii' character for him, but I've already switched off. And the repeats of the music are going from mildly irritating to annoying, and in another five minutes it'll be insane rage time. Sorry, Jdub, but this doesn't cut it as a talk.

Still, I've got another inconsistency with my WINE patch sorted out, and I've just discovered another, so I've got something to do.

Last updated: | path: tech / lca | permanent link to this entry

It's not just for sharing music?

First talk: virtualisation; and Jon Oxer talking about trying to manage Xen clients across multiple machines without twelve layers of abstraction and SAN thrashing. Very good stuff, and while a friend of mine commented that it was "welcome to the 1980s" as far as IBM and large-scale mainframes are concerned I think that Jon's got a lot of good ideas to bring this to the Open Source world. My tip to speakers from that talk is to assume that you get about half of your time to talk and half of your time to answer questions. Oh, and manage your questions - the microphones are being passed around so that questions can be recorded; take the question from the person with the microphone.

Second talk: Avahi. I'd noticed this a while back when I was using RhythmBox at work - all these extra playlists would turn up in my sources list. I soon realised that these were Apple iTMS instances of other people just broadcasting themselves on the net. Learning about how these things work - and learning some of the tools to add Avahi functionality to programs - is pretty cool. Now to create my Avahi Mandelbrot Set calculator that grabs whatever processors it can find around the network. World domination, here we come!

Aside: had one of those embarrassing moments that seem common at LCAs: meeting a friend and not quite remembering where I'd seen him. On the one hand, I'd gone to sell him my old car, so I should have remembered him instantly. On the other hand, he'd changed his beard (again), grown taller than I remembered him, and my memory was tricked into thinking he was a person I knew from 2001 or so and I still (logically) owe a CD of Renaissance music. My memory is weird.

Jeff confirmed that they're still working out the programme for the Conference Party, and will get back to me tomorrow regarding whether I can mix there. I'm good with that.

Now, with a good spicy Beef Rendang in me, a lead recharging my laptop and a can of V (bought along with seven others and a block of 70% cocoa chocolate on Sunday afternoon) ready to go, it's time to decide what new coolness I'll see this afternoon. Probably Jeff's GNOME talk.

Last updated: | path: tech / lca | permanent link to this entry

LCA Day 1 - so far so good

Waking up was as easy as ever at LCA conferences - at 7AM the alarm rang and I cursed. My tossing and turning at the new, hard and somewhat scratchy bed hadn't made it an easy sleep. Getting to bed at half past midnight after coming home from the Lowenbrau Keller hadn't helped either. But, strangely, I was energised - rather than dropping back into sleep I was ready for the first day at LCA. Shower. Dress. Eat breakfast. Head on up to the top of the campus. Get connected to the Wifi, courtesy of a random Seven team person who was keen to find out whether the network was working. All good.

I note that the Programme has been updated, and that my 'request' to do some mixing at the Conference Party hasn't. All good. I better start practicing mixing again, then. I don't think my patches to WINE are going to come good, and VMWare is having trouble getting installed. It's going to be back to closed-source sinning...

Last updated: | path: tech / lca | permanent link to this entry

Mon 8th Jan, 2007

First Patch Manual Assist

I'm taking my first steps into a wider world of open source coding today, because I've decided to try to put in a patch for Gedit. It suddenly occurred to me today that the document menu, which shows you the various files you have open and allows you to choose from them, amongst other things, doesn't give you a 'short cut' key to select the document quickly. Given that the files (as tabs) are already accessed by <alt>-1 through <alt>-9 and <alt>-0, I thought that this number would be the logical choice for the short cut key (or accelerator, or whatever it's called in GNOME jargon).

So, off I go, downloading the CVS source for Gedit and charging into it. It took a little time to find where I needed to edit, but I guessed it was in gedit-window.c and it would be related to the words 'Document' and 'New Window' (since the text 'Move to New Window' occurs in the menu). A little searching turned up the update_documents_list_menu function, and a little further down there was a bit that got the various file names and turned them into radio actions complete with a tip for choosing them (e.g. <Alt>-1).

A bit of looking in the GNOME Developer's GTK+ reference manual taught me that I'd have to change the item label, and in Gedit this is helpfully fed through a function called gedit_utils_escape_underscore so that underscores in the file name wouldn't trigger GTK to make that the accelerator key. I just needed to prepend the suitably formatted accelerator number. Since the underscore escaping function obviously manipulated the string in some way, I looked at it, and found out that it used the g_string_append utility function to append characters to the new underscore-escaped string (which also grows the target string if necessary). I needed to prepend, so I looked for a g_string_prepend function. And, lo and behold, there was one.

So my patch so far is:

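diff -r1.47 gedit-window.c
>         /* Put a 0..9 as a shortcut key before the first ten items */
>         if (i < 10) {
>             /* g_strdup_printf allocates, so free the prefix once
>              * g_string_prepend has copied it into name */
>             gchar *prefix = g_strdup_printf("_%d: ", ((i + 1) % 10));
>             g_string_prepend(name, prefix);
>             g_free(prefix);
>         }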

I'm still installing bits and pieces (my intltool is out of date, it seems) so I haven't tested it yet. Who knows, it could be complete garbage. But one's task of contributing additions to open source code is made manifestly easier when the code base is already feature-rich - you can look at previous examples to find out how to do what you want, and there are probably already utility functions to do the lower-level manipulation.
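To make the numbering rule concrete, here is a minimal Python sketch of the same labelling logic, kept separate from the C patch. The function names are made up for illustration and are not Gedit's; the underscore doubling assumes GTK's convention that a doubled underscore in a mnemonic label is displayed as a single literal one.

```python
def escape_underscores(text):
    """Double each underscore so a mnemonic parser treats it as literal."""
    return text.replace("_", "__")


def document_menu_label(index, filename):
    """Return the menu label for the document at zero-based position `index`."""
    label = escape_underscores(filename)
    if index < 10:
        # Positions 0..8 get keys 1..9; position 9 wraps round to key 0,
        # matching the <Alt>-1 .. <Alt>-9, <Alt>-0 order of the tabs.
        label = "_%d: %s" % ((index + 1) % 10, label)
    return label
```

The modulo is what makes the tenth document land on 0 rather than 10; anything past the tenth is left without a shortcut.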

Footnote: Now submitted as bug #394153!

Last updated: | path: tech | permanent link to this entry

Tue 2nd Jan, 2007

Nouveau Riches!

I was keenly browsing the LCA 2007 Programme when I noted David Airlie's talk on nouveau, an attempt at a reverse-engineered driver for nVidia graphics cards, including full 3D, DRM (that's Direct Rendering Manager, not Digital Rights, er, Restrictions Management) and XvMC support. Or so I believe. If you like the idea of having all this in a blob-free, open source driver, then pledge some money for it!

Last updated: | path: tech | permanent link to this entry

Thu 28th Dec, 2006

Basic No Output System

Before Christmas, I bought a new, larger case for my home web and file server because I wanted to move the two SATA drives from my old desktop machine into it. The board is a Via EPIA Mini-ITX board with two SATA connectors, but not only did the old case not have any room to put them, the little 60W power supply just couldn't spin all three drives up. So, with a few enquiries as to noise output and size, I bought a Morex Venus 669 case (sorry, Dan, but AusPC Market's fee for delivery of same is just a little too high).

After a couple of days (for otiose reasons) I got it home and, casting those connected to my tracker to the four winds, I swapped the board over from its old case. This wasn't too difficult, although the new case is a little cramped for 3.5" drive space - I'd rather they made space for one 5.25" drive and had two or three 3.5" drive bays spaced out in front of the case. The server and the firewall are both attached to a KVM switch for historic reasons, so I plugged that back in and fired it up. It seemed to start, but no "loadity-load-load-load" activity came from the main hard drive. Hmmmmm. The KVM switch has no keyboard or screen, again for historic reasons, so it was time to get that old standby monitor from downstairs and the old bombproof IBM keyboard I use for gaming and see what was what.

Which was that one of the SATA drives wasn't being recognised at all. The BIOS seemed to be doing some kind of work trying to find it - when I started the machine again with it unplugged the BIOS booted normally, but with it connected it would pause for a long time at the "Searching for IDE devices" message, before finally timing out. I swear it came up once, but that was only with the other SATA drive removed and I tried that combination again without success.

Of course, Linux doesn't see hide nor hair of it.

The case power supply (200W) should be perfectly up to it. The motherboard is up to it. The SATA drive that works is a 200GB one and the one that doesn't is 250GB, but I'm not aware of any >200GB limit in BIOSes or anything. Even finding BIOS upgrades on VIA's site seems hard enough, but they don't list any specific fixes for problems like this so I'm not going to flash the BIOS just in case. And I really don't want to have to buy an external USB case for the drive, because I bought the larger case specifically to avoid this possibility.

So, Lazyweb: any ideas what might be wrong?

Last updated: | path: tech | permanent link to this entry

Mon 27th Nov, 2006

Internode vs. the D-Link DSL604-G

While at Mark and Carol's place this weekend, I was able to debug a persistent problem they'd been having with their internet connection. Every once in a while (apparently), websites wouldn't load and links wouldn't work. As they use Internode because I recommended it to them, I wanted to help them get to the bottom of this. I was able to immediately confirm that I'd never had such a problem, more or less eliminating Internode from the picture. So, what could it be?

I suspected their router, a D-Link DSL604-G. Firstly, I had problems trying to get my laptop on their WiFi network: despite several retypings of their WEP key (yes, I know and they know that this is not brilliant security) I was never able to connect. Watching the kernel messages established that I was connecting fine, but though I asked for a DHCP address (politely) several times I never received one. At this point NetworkManager tosses in the towel and assumes you can't get on the network, and there's no way to say "No, just use a static IP address". DHCP or nothing, eh? I was able to connect to it using a wired port, but as this was in another person's room this wasn't very convenient.

Sure enough, sniffing the wired network connection confirmed that there was something severely wrong with the way the D-Link answered DNS requests. If I pinged a DNS name, it would send out a DNS A request and get back the correct IP address. However, a Firefox request for the same website would do a DNS AAAA request (for an IPv6 address) first, which would be given a "no such name" response; then Firefox would ask for the A record and this would also get the same (bad) response. I could cause my machine to 'cache' the correct A record by pinging the server name before asking for the web page, but this was tedious in the extreme and not always reliable.
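For anyone wanting to poke at the same symptom without a packet sniffer, here is a rough Python sketch that asks the system resolver for IPv6 addresses first and IPv4 second, in the order described above. It is only an approximation: getaddrinfo goes through whatever caching and resolver configuration the machine has, so it will not necessarily put the same packets on the wire that Firefox did. The function names are illustrative.

```python
import socket


def lookup(host, family):
    """Return the sorted addresses the system resolver gives for one family."""
    try:
        infos = socket.getaddrinfo(host, None, family, socket.SOCK_STREAM)
    except socket.gaierror:
        return []  # the resolver reported a failure, e.g. "no such name"
    # Each entry is (family, type, proto, canonname, sockaddr);
    # the address string is the first element of sockaddr.
    return sorted({info[4][0] for info in infos})


def compare_families(host, resolve=lookup):
    """Query for IPv6 first, then IPv4, and report what each returned."""
    aaaa = resolve(host, socket.AF_INET6)
    a = resolve(host, socket.AF_INET)
    return {"AAAA": aaaa, "A": a}
```

An empty "AAAA" list with a populated "A" list is what a healthy IPv4-only site should look like; both lists empty for a name that ping can resolve would point at the behaviour seen here.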

A bit of (slow and tedious) Googling turned up that the DSL604-G has this kind of problem and a firmware upgrade is necessary to fix it. But, even though I patiently found and downloaded both firmwares (there's a model A and a model B, just to confuse everyone) I never got the chance to upload it. If the only way to do so is to run the included Windows executable, then their Mac-only household is going to be out of luck.

However, they've all put Internode's primary and secondary DNS servers in their TCP/IP configuration manually, ahead of the DSL604-G, so the problem has disappeared.

Last updated: | path: tech | permanent link to this entry

Dell DVD Driver Despair

I think I've discovered the cause of the problem I'm having on my laptop, where DVDs play in a stuttering, jerky fashion under Fedora Core 6. This is a classic symptom of the drive not being in DMA mode, confirmed by the large quantities of CPU time used during playback. The message I was getting back from hdparm -d 1 /dev/hdc was "Operation not permitted", and for a while I suspected that the SELinux context on the device wasn't allowing root to change its parameters (despite this being very unusual for Fedora Core's SELinux setup). But nothing was showing up in the kernel messages, so I had to go further.

Using strace on the above command revealed the ioctl call that was throwing the error, and as luck would have it this turned out to have been seen by one or two other people that had noted it somewhere that Google could catalogue. The salient comment was from none other than Mark Lord, who not only happens to be "The IDE Guy" but also the guy who built the excellent Hijack Kernel for the Empeg Car and Rio Car players. He asked if the correct kernel module had been loaded for the IDE interface.

I spent some time digging around in lspci and dmesg to find out what driver was being loaded, and if there was one for the PATA device that runs the NEC DVD drive (as opposed to the SATA hard disk, which is using the correct PIIX driver). But I didn't have much luck. I'm not even really sure that this is the correct line of investigation; but it's all I have to go on at the moment. According to Intel's documentation on the 945PM chipset, the 82801GBM provides one PATA channel and two SATA channels on the same chip. Given that information, and the fact that the DVD drive is coming up as /dev/hdc rather than /dev/sdc, I don't think I need to void my warranty by opening the case up and looking at the cables to determine that it's a PATA drive.

I still need to find out whether the correct driver is being used for the PATA interface. But how?
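One avenue that might answer this, sketched in Python and resting on an assumption about how sysfs is laid out on this kernel: follow /sys/block/hdc/device to the real device directory, then walk upwards through its parents and note every "driver" symlink on the way. The nearest one should be the media driver for the drive itself and one further up should be the host controller's driver. The function name is illustrative, and the layout may well differ between kernel versions.

```python
import os


def drivers_in_chain(block_name, sysfs_root="/sys"):
    """List (device directory, driver name) pairs from the drive up to the bus."""
    root = os.path.realpath(sysfs_root)
    node = os.path.realpath(os.path.join(root, "block", block_name, "device"))
    found = []
    # Climb one directory at a time until we leave the sysfs tree.
    while node.startswith(root + os.sep):
        link = os.path.join(node, "driver")
        if os.path.islink(link):
            driver = os.path.basename(os.path.realpath(link))
            found.append((os.path.basename(node), driver))
        node = os.path.dirname(node)
    return found
```

Calling drivers_in_chain("hdc") on the laptop would then show whether a chipset-specific driver or only a generic one is bound to the PATA controller; an empty list just means the assumed layout does not match.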

Last updated: | path: tech | permanent link to this entry

Sat 25th Nov, 2006

CLUG, Women and GPL3

At this month's CLUG meeting, Hugh Fisher led a wide-ranging and stimulating discussion on a bunch of issues facing Free and Open Source Software today. I always like these kinds of discussion - partly because it expands the area of my knowledge and partly because I think sometimes we need a devil's advocate in order to really understand why we stand for the things we do. If we never question the knowledge we have, we're no better than the unreasoning zealots who promote proprietary software.

There were two problems with the process, I felt. One was that having a discussion like this is possible up to about eight people - for a CLUG meeting of twenty or so it sometimes degenerated into a shouting match. I'm as guilty as the rest - I'd stick my hand up sometimes and wait patiently to be noticed, but then five minutes later I'd be calling out amusing comments or counterexamples with the rest of us.

The second problem was that Hugh's approach was basically to attack FOSS's dogmas and articles of faith. This often ended up with arguments coming from both sides - you can't say the Free Software Manifesto is equivalent to Marxism, and then say that there's nothing wrong with capitalism and proprietary software without ending up sounding like you're arguing about completely different things. And these are also the sorts of declarations that get Open Source practitioners somewhat riled up, which means that they want to go on the attack, which is hard if it's coming from the other side of the political arena.

Personally I don't have a problem with a lot of these statements. The Free Software Manifesto is a lot like classical Marxism - where people get confused is then thinking that it equals communism or Stalinism. Proprietary and free-but-closed-source software has a lot to teach Open Source programmers: Mark told me of two features of a Finder-replacement in Mac OS X which would make Nautilus or its KDE equivalent green with envy[1]. Sure, we can copy features from them just as they copy features from us, but it seems a curious inversion of the "it's worth nothing if it's free" mentality to say that the only software worth using has no cost. I certainly don't mind paying for a game or an application I might use; I know the hidden cost is there that I've been locked into their file formats and so forth. I just factor that into the equation. It's the everyday equivalent of reading and understanding all the EULA stuff.

One interesting topic came up right at the start: for a group of people that prides itself on 'openness' and hates companies and governments putting up barriers to participation, we're an awfully White Anglo-Saxon Programmer group. That night was especially poignant as we had no women in the twenty or so people there - and that's not uncommon either. As a contrast, even SLUG has a higher proportion of women. What are we doing wrong?

It's not that hard to deduce, when you ask the question 'where are all the beginners?' CLUG is, somewhat unashamedly, a very technical (and technical-for-the-sake-of-it) audience. Some newcomers (e.g. a former acquaintance) are driven away by the sheer technical complexity of the talks; others are driven away by the heckling random technical questions launched at the speaker from anywhere in the audience without warning. Others still, I would argue, are driven away by the way little cliques will form and gossip about geeky, technical, 'you have to have been reading the mailing list for three months to understand the joke' stuff given any opportunity - the speaker taking a breath, for instance. All of these and more drive women away - the guy at the back saying "I love women - I appreciate how they look" (or something like that) was just the tip of the iceberg.

I think there's a reasonable area between being patronising and being gruffly neutral in our attempts to encourage women to come along. I think part of the problem is that, just by there being fewer women, us guys feel a bit uncertain. Women don't automatically think that we're talking to them because we're chatting them up or trying to impress them. You can treat someone as an equal without them having to know as much as you in your little special interest fields. While I fear that the next woman who visits a CLUG meeting for the first time is going to be swamped with people trying to make her feel welcome ("have a chair!" "no, have mine!" "this one's warm!"), I think we may obsess too much over the problem to admit that the solution is just to be friendly and supportive - much as we (gender-neutral) all like being treated.

But also, we're going to try organising different meals to the standard Pizza Gannet Feeding Frenzy that CLUGgites call "a good way to wrap up a talk". My proposal is:

That way the gannets still get fed, and those of us (me included) who would also like something different (and possibly more nutritious) get to eat too. I don't think it's going to cause a mass influx of women in IT coming along to CLUG meetings, but at least it's a step on the way to making CLUG meetings more appealing to more people.

[1]: Feature one: tabbed panes, so you can keep multiple places in the file system and move between them easily. Feature two: a 'drop box' that you can collect files into and then drop them into your destination. Saves all that confusing control-shift-alt-left-click-with-a-fringe-on-top selecting in file list windows in order to grab the files you want.

Last updated: | path: tech / clug | permanent link to this entry

Wed 22nd Nov, 2006

Preventative partitioning

The virus which is attacking my nose and throat is not letting up, and while the lab I work in is really interested in identifying viruses quickly, I don't think they want to be identifying this first-hand. But I'm not doing myself any favours if I stay up until 2:30 in the morning hacking.

Last night's adventure was redoing that command that I've been killing the MythTV machine with. My first suspect was the mirrored LV - something about mirrors came up in the messages when I did it. The second time, I ran pvmove with the -v (verbose) option, which showed the following bit of logging:

[root@media ~]# pvmove -v /dev/hdb /dev/hda3
    Wiping cache of LVM-capable devices
    Finding volume group "vg_storage"
    Archiving volume group "vg_storage" metadata (seqno 80).
    Creating logical volume pvmove0
    Moving 56559 extents of logical volume vg_storage/lv_storage
    Moving 86 extents of logical volume vg_storage/lv_swap
    Moving 0 extents of logical volume vg_storage/lv_backup_root
    Found volume group "vg_storage"
    Found volume group "vg_storage"
    Updating volume group metadata
    Creating volume group backup "/etc/lvm/backup/vg_storage" (seqno 81).
    Found volume group "vg_storage"
    Found volume group "vg_storage"
    Suspending vg_storage-lv_storage (253:0)
    Found volume group "vg_storage"
    Found volume group "vg_storage"
    Suspending vg_storage-lv_swap (253:1)
    Found volume group "vg_storage"
    Creating vg_storage-pvmove0
    Loading vg_storage-pvmove0 table
  device-mapper: reload ioctl failed: Invalid argument
  ABORTING: Temporary mirror activation failed.  Run pvmove --abort.
    Found volume group "vg_storage"
    Loading vg_storage-pvmove0 table
  device-mapper: reload ioctl failed: Invalid argument
    Loading vg_storage-lv_storage table
  device-mapper: reload ioctl failed: Invalid argument
    Found volume group "vg_storage"
    Loading vg_storage-pvmove0 table
  device-mapper: reload ioctl failed: Invalid argument
    Loading vg_storage-lv_swap table
  device-mapper: reload ioctl failed: No such device or address
[root@media ~]# pvmove --abort

And there it fails. The ioctl makes me think that moving a PV that sits on the raw disk device (rather than in its own partition) must be what's causing it. Maybe the IO commands just don't like working on that part of the disk. So, test number three - try moving the data off the 160GB disk. Nothing wrong with that, the PV is all on a partition just fine.

Nup. Same problem again.

This time I'm quicker to restore the system to working order. I booted up off the System Rescue CD, but I couldn't issue the pvmove --abort command because the VG isn't complete (because I've unplugged one drive for the DVD). That's OK. I resize my root partition, create a new, small partition in the available space, copy the root filesystem off the System Rescue CD (i.e. off where it's uncompressed it - nifty trick, the partition image has a header that's a shell script that loads the cloop driver and mounts itself), copy the initrd and vmlinuz files off where the CD boots from, mangle up a new item in grub's config with those files, and, after changing the drives back so the LVM can initialise correctly, boot it.

It doesn't quite go perfectly, but it gets a long way further than I'd feared. I spend a bit of time trying to work out how this particular mangling of gentoo boots, but I don't get too far. So I just mount the uncompressed file system where it should be, run pvmove --abort and check the LVM state: everything comes up good. Reboot again and it's all working.

So now I'm making my own little equivalent to Dell's system rescue install - an unobtrusive partition that will give me all the tools to get the MythTV machine back on line without having to fiddle with cables and disable the LVM and so forth. Maybe I should sit down and make something like this - a version of the System Rescue CD, or a tool within its boot config, to install a version of the System Rescue CD on a partition, including (if possible) plumbing the correct options into grub to get it to come up as an option. It makes a lot of sense for those times when you want to get around a failure but can't put in a CD (or can't get in the box to attach one).

Maybe all the true Linux gurus just boot off their distro's rescue CD and go from there, though.

Last updated: | path: tech / fedora | permanent link to this entry

Tue 21st Nov, 2006

Not learning the hard way

It's now mid Sunday, and I'm mentally and physically wrecked. This is partly due to a throat infection thing that's going around, and partly due to weird LVM stuff, and partly due to sheer bloody-minded stupidity.

It started a couple of days ago, when I took a day off because of a sore throat. I decided, finally, to upgrade my MythTV machine to Fedora Core 6, and in the process remove the old boot drive and change over to a new Logical Volume (LV) under LVM. After several attempts at this problem before, I'd decided to use a mirrored LV to store the root volume on. Luckily I had three disks - mirroring in LVM requires a disk per mirror and an extra for the 'transaction log' - and I set it up, copied the old root file system to the new mirror, and that was enough for Fedora Core to recognise in order to install.

But, after a couple of mysterious crashes that ended up in file system checks throwing pages and pages of errors, I started really wondering. LVM is wonderful and stable and allows you to agglomerate disks in ways you would otherwise pay lots of money for hardware solutions to achieve, but my experience so far is that when it goes bad, it starts getting rather difficult to recover. Having the root file system stored in a way that I wasn't sure I could ever recover if one disk went bad - all FAQs and HowTos to the contrary - I decided to go back to plain old partitions.

I bought a 400GB disk for $199 at Aus PC Market with the intention of pensioning the 160GB drive off in the MythTV machine, and giving it a bit more recording headroom. But for various otiose reasons Friday knocked me out and I started feeling very congested and the sore throat had returned from Wednesday. Unable to sleep, I put the new drive in the MythTV machine, partitioned it, copied all the files from the old root file system across, and booted it - it came up fine. In a fit of what seemed at the time to be inspiration but I now know to be a madness brought on by addiction to Lemsip, I also decided to move the data off one of the 250GB drives temporarily so I could partition it.

Long ago when I was setting up the system, I had realised that LVM PVs can be created on the raw disk device as well as in partitions. This sounded like a brilliant idea - no partition to worry about, LVM could put LVs on it anyway, and one less command to perform. Interestingly, you also get about 96MB of extra space. However, this decision has come back to haunt me.

Firstly, back when I was first trying to eliminate the old 40GB disk, I wanted to have a three-way RAID. LVM doesn't do that, but MD does. But you need three partitions the same size. I couldn't repartition /dev/hdb because, well, there wasn't a partition on there to alter. So that idea eventually went out the window.

Now, I thought, I could lay the problem to rest. I had used pvmove before to move space off a SATA disk that I'd bought without knowing that (at the time, at least) the way SATA drives are accessed also causes my DVB cards to stutter (I think it's something to do with DMA, but I haven't traced this down). So, innocently, I issued pvmove /dev/hdb /dev/hda3.

Nothing happened. It wouldn't respond to Ctrl-C or Ctrl-Z (although other characters, uselessly, came up fine). Then every process that tried to access the LVM also seized up. "OK," I thought, "reboot and it'll be fine." But no: rebooting threw up a bunch of errors about a bad LVM state and kernel panicked. It's 5AM and I'm not feeling well and I have a dead MythTV machine - brilliant.

Of course, to add to my complications, I had returned to the old four-drive problem - I had to unplug one of the LVM drives (and thus render the LVM inoperable) in order to plug in the DVD drive to install something. I had the old MythTV partition still backed up in LVM (hopefully), so I reinstalled Fedora Core 6 from scratch (after a bunch of fruitless searching about how to disable the LVM checks at boot-up - it's possible, but you have to edit the nash init script and repack your initrd image and even then it didn't work perfectly; I was hoping for a nice kernel command-line option). Oh, and I have to install in Text mode because I didn't feel like lugging the monitor from downstairs, and even though the NVidia GeForce 5200 will display boot-up on all monitors and TV sets you have plugged in, it won't thereafter show any graphical modes on the TV without options in the Xorg config. Yay.

The new Fedora Core install allowed me to do a pvmove --abort, which then allowed me to see the storage VG and the old root VG. "Hell," I thought, "while I'm here I'll just rebuild the thing from scratch - I've got too much ATRPMS kruft in there anyway." That merrily ate up the hours from six until nine - copying config across, setting daemons to start, turning unwanted services off, updating the repository config with local mirrors, getting the video drivers working again, and so forth.

That night, I woke up for otiose reasons at about four in the morning. Unable to get back to sleep, I decided to look at the config again. The wool in my head and the nettles in my throat made me decide that retrying the pvmove command would be perfectly reasonable - it must have been a temporary glitch. This time, just in case, I dd'd the entire newly-created partition over to another system on my network, created a new 'old root' LV that wasn't striped, mirrored or afraid of water, copied the old 'old root' LV over to that, and removed the old one just in case it was something to do with the mirroring that had caused LVM to bork out. Now secure in my preventative measures, I issued the pvmove command again.

Same result. System locked up.

I rebooted, this time using the System Rescue CD, which allowed me to see the network and the partitions. Right, copy the dd image back again, and reboot... Nope, same problem. Worse, now the LVM partition on /dev/hda3 doesn't exist. Hmmmm. This is bad. Hmmmmm. /dev/hda3 sounds familiar - with that growing horror that computer problems specialise in, I realise that I copied the 20GB partition to /dev/hda3 (the LVM PV) rather than /dev/hda2 (the ext2 file system). Bugger. I can boot, and everything runs, but now the VG won't come up because one of its PVs is AWOL.

I tried grabbing the first couple of sectors of another PV, inserting the correct UUID (which, fortunately, the VG still knows about and includes in its complaints) in the correct spot (after a bit of guesswork - thank Bram Moolenaar for the binary editing capabilities of vim). Nup, no luck - didn't think I could fool it that easily. No-one in any of the IRC channels I was in could offer any assistance (#lvm on freenode is usually quiet as a grave anyway).

One of my worst habits is the way I avoid any problem that's stumped me a bit. Several games of Sudoku, Spider and Armagetron and a lot of idle chatting on various IRC channels later, I was still no nearer a solution. Then, realising that no-one was going to help me and I had to do it myself, I probed around in the options of pvcreate, and found I could specify a UUID. Brilliant! Suddenly the PV, VG and LV were back on the air. Five hours after I'd woken up, I collapsed back into bed. It was Sunday. (At this point, LVM hadn't put anything permanently in the /dev/hda3 PV, so it was merely a question of making sure it was included.)
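For the record, the recovery boils down to a very short command sequence. The Python sketch below only assembles the commands as lists and does not run anything. The only step described above is pvcreate with a UUID; the --restorefile argument and the vgcfgrestore and vgchange steps are assumptions about what a fuller recovery would usually involve (newer LVM releases, as far as I know, want --restorefile or --norestorefile alongside --uuid). The UUID shown in the usage note is a placeholder, not the real one.

```python
def pv_recovery_commands(uuid, device, vg_name, restore_file=None):
    """Build the commands to recreate a lost PV label and bring the VG back."""
    create = ["pvcreate", "--uuid", uuid]
    if restore_file is not None:
        # Assumption: point pvcreate at a metadata backup of the VG.
        create += ["--restorefile", restore_file]
    create.append(device)
    return [
        create,
        ["vgcfgrestore", vg_name],     # assumption: reload the VG metadata
        ["vgchange", "-ay", vg_name],  # assumption: activate the LVs again
    ]


def as_shell(commands):
    """Render the command lists as lines to review before typing them in."""
    return "\n".join(" ".join(cmd) for cmd in commands)
```

Something like print(as_shell(pv_recovery_commands("UUID-PLACEHOLDER", "/dev/hda3", "vg_storage"))) gives a list to check by eye; the UUID to use is the one the VG complains is missing.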

That afternoon, I made sure that MythTV was going to update its programme guide and relaxed, watching a few TV shows. It seemed an uncommon luxury.

Last updated: | path: tech / fedora | permanent link to this entry

Thu 9th Nov, 2006

Jumping into a new project

(Nearly all of this was written after the PSIG meeting on the 9th of November; then I got too busy and didn't finish it off. So "tonight" is two weeks ago as of this posting.)

Tonight at the Programmer's SIG we were 'supposed' to be having a sort of round-table discussion, with people with ideas meeting up with people who know how to implement them. Or, at least, have more knowledge of the way that Linux is organised and may be able to recommend language choices, libraries to look for and people to speak to. If any of those people had actually turned up, this would have happened. But they didn't.

After the usual early round of "Hey have you seen this cool stuff / weird shit" as meals were served (amazingly quickly, this time), I tried to jump start the thing by asking what people's ideas were. Maybe it's just me - this didn't seem to get any real discussion started. Conversation kept revolving around Pascal Klein's idea for rewriting the Linux kernel in C#, and the multivarious reasons why this would be a Bad Thing. As amusing as it is to discuss bad language choices, the things we hate about customers, and what's new on Slashdot, this wasn't really doing it for me as someone who a) has ideas and b) is a programmer.

Despite the good nature of Steve Walsh's teasing, I do worry that I'm talking too much about my own ideas. I say this because we then had a long and quite spirited discussion about how to solve a problem with my backup process. It started with me noting that I'd thought of an improvement to rsync:

At the moment, rsync will only try to synchronise changes to a file if the destination directory has a file with that name. If you've renamed the file, or copied it into a new directory, then rsync (AFAIK) won't recognise that and will copy the entire file again. However, rsync already has a mechanism to recognise which files are the same - it generates a checksum for each file it encounters and only copies the file if the checksums differ. So the idea is for the receiver to check if it already has a file with that checksum somewhere else. There's more to it than this, but I'll develop that in another post.
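As a toy illustration of the idea (and emphatically not how rsync itself is implemented), here is a Python sketch that hashes every file on both sides and then decides, per sender file, whether the receiver can skip it, copy it locally from a file it already holds under another name, or needs it transferred. All names are made up for the example, and hashing whole files with SHA-256 is an assumption chosen for simplicity.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_by_digest(root):
    """Map each content digest to the relative paths that have that content."""
    index = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            index.setdefault(file_digest(full), []).append(rel)
    return index


def plan_sync(sender_root, receiver_root):
    """Return sorted (path, action, local_source) tuples for the sender's files."""
    have = index_by_digest(receiver_root)
    plan = []
    for digest, rel_paths in index_by_digest(sender_root).items():
        candidates = sorted(have.get(digest, []))
        for rel in rel_paths:
            if rel in candidates:
                plan.append((rel, "skip", None))  # same name, same content
            elif candidates:
                plan.append((rel, "copy-local", candidates[0]))  # renamed or copied
            else:
                plan.append((rel, "transfer", None))  # genuinely new content
    return sorted(plan, key=lambda entry: entry[0])
```

The "copy-local" case is the one that matters for a 'Backup date' directory full of files the server already has: only the instruction to copy needs to cross the network.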

This all supports my partner's method of backing up her PhD - every once in a while, she takes all the files so far and copies them into a directory named 'Backup date'. Separately to this, I then rsync her entire directory up to my brother's machine in Brisbane, as an off-site backup. While I'm not especially worried about the time it takes or the amount of data transferred, since rsync's principal aim is to reduce both of these I thought it would be a useful improvement to optimise for the case where a file has been renamed on the client - why transmit the whole file again if you can just copy and delete on the server?

I suppose the thing I enjoyed was the idea of co-operatively solving a problem using the tools at everyone's disposal. Several people suggested that Revision Control Systems would be better in this scenario, because they would only store the diffs and would give instant reversion to any point in time. Other people suggested automated folders that would pick up the files in a 'drop' directory, put them in an appropriately labelled directory, and then start a remote copy of the appropriate folder on the remote server. Other people suggested that having two backups was overkill - that as long as I had the remote server updated I could retrieve backup copies should anything go wrong locally. All of these were good suggestions, and despite the problem that they didn't really solve the problem the way I wanted it to be solved, I did really appreciate the new ideas and approaches.

That led me to my next question, which was: rsync is a largish and complicated piece of software. The philosophy of Open Source says that if you have an idea, you should modify the source rather than ask someone else to do it; and I can program in C so the source of rsync wouldn't be foreign to me. So where do I start? One approach suggested was to generate a tags file and start tracing through the execution of the main routine; another was to find the printed text messages that are generated at the time that I want my revision to be used, and start reading from there. A further approach was to draw a concept map - sketch out the top-down design of rsync in order to narrow down the code I had to read. All excellent suggestions, and when I have some spare time I shall try them.

Then we had some real nuts-and-bolts stuff; I showed Hugh how to do Doxygen documentation, and Daniel showed me a bit about autoconf/automake and how to integrate them into my coding. He also suggested a technique of checking for the existence of a library at runtime (e.g. libmagic) in order to determine whether we should call the libmagic routines to check file type; unfortunately I can't now remember what this magical call was. I should have been writing this nine days ago.
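The call Daniel showed me is lost, so the following is only a sketch of the general technique, not a reconstruction of what he said. In C the usual route is probably dlopen() followed by dlsym(), but that is a guess on my part. The same idea in Python uses ctypes; the helper name load_optional_library is made up for this example.

```python
import ctypes
import ctypes.util


def load_optional_library(name):
    """Return a ctypes handle for the named shared library, or None if it
    cannot be found or loaded on this machine."""
    location = ctypes.util.find_library(name)
    if location is None:
        return None
    try:
        return ctypes.CDLL(location)
    except OSError:
        return None


# Decide at runtime whether the libmagic code path can be used at all.
magic = load_optional_library("magic")
if magic is not None:
    print("libmagic found: file type checks can use it")
else:
    print("libmagic not available: fall back to some other check")
```

The point is that the program still starts and works on a machine without the library; it just takes the fallback path.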

It started out not looking so good, but I think it was one of the better Programming SIGs I've been to.

P.S. I've also learnt tonight that, if my WiFi is connecting and then almost immediately disconnecting after showing no signal strength, unloading and reloading the kernel module (after stopping the ipw3945d service) will reset it; starting the ipw3945d service again will get things back on track. Or so it would seem from this initial test.

Last updated: | path: tech / clug | permanent link to this entry

Wed 8th Nov, 2006

Progress at last

In the long, complicated and frustrating quest to actually upgrade my MythTV machine to Fedora Core 6, and in the process get rid of the 40GB drive which holds its operating system, I think I've finally begun to make progress.

The final realisation today was that, if I abandon my crazy idea to have a MD RAID5 on top of three LVM LVs, I have much more likelihood of success. Instead, I can configure a mirrored LV[1] that spans two of the disks. Then, with a suitable mantra[2], you can copy the files across and set it up to boot from the new partition[3].

<voice style="bullwinkle">This time for sure!</voice>

[1]: Creating a mirrored LV with only two disks is actually completely impossible with LVM, despite it seeming perfectly logical. This is because you need a further physical volume on which to store the log volume, which is automatically created by lvcreate and stores the write journal to keep the two mirrored volumes in sync.

[2]: find / -mount -depth -print0 | cpio --null -apmuvd /new/mount/point. Or, if you're like me and want to leave it to do its job without clogging up your console, dtach -c ~/cpio-socket bash -c '( find / -mount -depth -print0 | cpio --null -apmuvd /new/mount/point )'

[3]: You do this by changing the grub configuration file /boot/grub/grub.conf. Change the 'root=LABEL=/' text to be 'root=/dev/volumegroup/rootvolume'. Make sure that your new root's /etc/fstab lists itself correctly.

Last updated: | path: tech | permanent link to this entry

Mon 6th Nov, 2006

Binary Lump Compatibility

I was thinking last night, as I vainly searched for sleep, about a long-standing idea of mine: the Blockless File System. If you imagine the entire disk as just a big string of bytes, then several problems go away. You don't need to have special ways of keeping small files or tail-ends of files in sub-parts of blocks (or, for that matter, waste the half-block (on average) at the end of files that aren't a neat multiple of the block length). The one I'm really excited about is the ability to insert and delete arbitrary lengths within a file. Or append to the start of a file. And what about file versioning?

Aaaanyway, for some reason I was thinking of what would go in the superblock. I have only done a tiny bit of study into what goes into the superblock in modern file systems, so I'm not speaking from the point of view of a learned expert. But I thought one idea would be for a header that could detect which endian-ness the file system was written in. A quick five seconds of thought produced the idea that the block 'HLhl' would not only be fairly easy to recognise in a binary, bit-oriented way, but would make it really easy for a human to check that it was big-endian ('HLhl'), little-endian ('lhLH'), middle-endian ('hlHL') or any of the twenty-one other combinations of 32-bit endianness the computing world has yet to explore.

It would also mean that the reader could simply translate whatever they read into their own endian-ness and then write that straight to disk (including setting the HLhl header in their native endian-ness). So writes are not penalised and reads are only penalised once. If the entire disk was written this way, it would mean that you could take the device to a machine with different endian-ness and it would behave almost exactly like normal - almost no loss of speed in writing or reading little-endian defined inode numbers or what-have-you. The entire disk could be a mix of endian-ness styles and it would still be perfectly readable.
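A minimal sketch of how a reader might use such a marker, assuming the marker is simply the first four bytes it sees. The function names are invented for the example, and it only covers 32-bit values. The reader works out which permutation of 'HLhl' is on disk and then uses that permutation to decode any other 32-bit number written by the same machine.

```python
import struct

MAGIC = b"HLhl"  # the marker as a big-endian writer would store it


def byte_order_of(header):
    """For each of the 4 bytes on disk, return its position in a big-endian
    word: (0, 1, 2, 3) is big-endian, (3, 2, 1, 0) is little-endian."""
    if sorted(header) != sorted(MAGIC):
        raise ValueError("not a recognised byte-order marker: %r" % header)
    return tuple(MAGIC.index(byte) for byte in header)


def read_u32(raw, order):
    """Decode 4 on-disk bytes into an integer using the detected order."""
    big_endian = bytearray(4)
    for disk_position, word_position in enumerate(order):
        big_endian[word_position] = raw[disk_position]
    return struct.unpack(">I", bytes(big_endian))[0]


# A little-endian machine would have written the marker as 'lhLH'.
order = byte_order_of(b"lhLH")
print(order, hex(read_u32(struct.pack("<I", 0x12345678), order)))
```

Because the order is just a permutation, the same two functions cope with all twenty-four arrangements without any special cases.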

Of course, it's not really going to matter if you don't take your disks to foreign architectures, or those architectures support some sort of SWAB instruction to swap endianness with little slow down, or if the users of disks that have been taken to foreign architectures don't mind a bit of slowdown reading the file system structure.

This is probably one to chalk up in the "Paul solves the problems of the early 1970s computer industry - in 2006" board.

Last updated: | path: tech / ideas | permanent link to this entry

Sun 5th Nov, 2006

Familiarity breeds contempt

Now that I've had the laptop for a bit over a week, I'm starting to find all the niggling little problems. Nothing disturbing, nothing worth enquiring to the manufacturer about, but things that need tracking down in all that free time that I don't seem to have.

I think there are problems with the Intel Pro-Wireless 3945 driver, or at least FreshRPMs' packaging of them. Occasionally I get 'Microcode SW errors' detected in /var/log/messages ("Can't stop Rx DMA" seems to be the popular one) that indicate that the wireless is going to fail. This usually presages the machine freezing for a second or so, and then after it's irritated me with this for a while it'll just hard crash and I'll have to power it off. I've had occasional luck with turning the machine off, leaving it a while to cool down, and turning it back on again. But not always, now being one of those times.

I also need to investigate more thoroughly how to get CPU frequency scaling working. I have a feeling that I'm driving the Centrino at top speed all the time in Linux, which while it gives me a lovely sense of power, means that the overheating problem above (if it is such) is worsened.

After finally getting MythFrontend working and connecting to the back end, it's worked some black magic in the sound settings that means that no audio comes out at all. I've turned up and unmuted every control that the GNOME volume manager can lay its hands on, to no avail. I'll have to see if this is a long-term problem - maybe it's just caused by having plugged my headphones in and confused it somehow.

DMA is apparently not enabled on either the hard drive or the DVD writer. Trying to turn it on gives an 'Operation not supported' on both - the former because it's a SATA device, the latter because... I don't know why. It makes DVD playback in Xine slightly skippy.

The Amazing Everything Card Reader in the side has manifestly failed to work. I get a bit annoyed at the proliferation of physically incompatible memory cards out there and at the complete lack of an obvious 'correct' insertion direction. I can figure that the contacts go in first, and the label probably goes up. But there's certainly nothing registered in the kernel messages whichever way I insert the thing.

I've worked out now that it won't go into standby mode when you shut the lid if you have the power connected. This confused me at first.

The few odd little crashes that seemed to accompany leaving it on and connected for any length of time have now apparently gone away, though. It's too short a baseline to tell.

Overall, it's still a wonderful thing. The AIGLX/Compiz Window Effects and the general smoothness and speed of the thing have been great. Surfing the net from the couch is starting to become very attractive. Typing while in bed likewise. I'm just trying to note down what has soured the experience slightly so that, if nothing else, I can read it later and remember the data points...

Last updated: | path: tech | permanent link to this entry

Tue 31st Oct, 2006

Brief notes on a working laptop

I'm still working on my page about installing Fedora Core 6 on a Dell Inspiron 6400. That will contain all the bits and pieces I've had to do to get it working to my satisfaction. But I have to say at the outset I've been very happy with how easy it's been to get working most of the things a laptop should do.

In my opinion, we're now at the stage that if Dell did the same process of installing, configuring, patching and testing on a modern Linux distribution as they do with Windows XP Home, the user would get the laptop with the same amount of functionality 'straight out of the wrapper'. Imagine that - an everyday person with average, non-technical experience with Linux, just opening it up and everything working just as it should. This is what Windows XP and OS X try to ensure, and for Linux it's really starting to become a reality.

Of course, we'd still need to train people. Scarcely anyone these days is completely computer-agnostic - they all have some prior experience, and that's usually Windows. The desktop experience may be quite similar, but it's not the same, and we do need to provide easy ways for people to get used to the different technologies 'under the hood'. Applications install in fundamentally different ways. The drive is laid out differently. Even simple things like training them that they use Evolution instead of Outlook and FireFox instead of Internet Exploder take time and patience. These are not difficult things; this is what the Make The Move project is trying to do. It takes time, but people are always adaptable. If you don't tell them it's a hassle, half the time they'll find it easy anyway.

(Make mental note: I must write the "Paul's friendly guide to finding your way around your new Linux system")

Last updated: | path: tech | permanent link to this entry

High Moral Ground Linux

Michael Ellerman says "distros don't get to take the high moral ground". I'm sure the entire Debian community are now baying for your blood, Michael; they've been taking the moral high ground for years. The current fuss about FireFox vs IceWeasel is as much Debian saying "We don't want to be encumbered by Mozilla Foundation's licensing arrangements" as Mozilla's manifest failure to co-operate with the Linux community over patching and source code control. If distros were supposed to make things work for their users, Kororaa wouldn't get rude letters from kernel developers stirring the pot about closed-source video card drivers. We'd all have MP3, MPEG2, WMA and MOV format players in our distros by default, instead of having to secretly install them behind the developers' backs by a sophisticated combination of extra repositories and secret handshakes.

My stand on the issue of scripts assuming /bin/sh is actually /bin/bash is that this is just plain wrong. It not only makes things a lot slower, on boot particularly, but isn't that difficult to fix: if your script requires /bin/bash, ask for it by name! Submitting patches to all those scripts that require /bin/bash is the punishment for developer laziness. I do wonder about the amount of testing that went into the patch that pointed /bin/sh at dash instead of bash in Edgy, because the failure reports seem to be coming thick and fast. But, just as kernel developers get to shake the finger when someone abuses the GPL, the entire Linux community should get to shake the finger and glare meaningfully at the developers who assume that they're getting /bin/bash by any name. Linking /bin/sh to /bin/bash is like putting on a blindfold and saying "I must be safe, I can't see the wumpus".

(Besides, it can't be that many shell scripts that require this assumption (i.e. they need functionality which /bin/sh doesn't provide but /bin/bash does). Edgy still boots, after all. I suspect the "millions of scripts" that would require changing is really between ten and fifty.)

"Distros don't get to take the high moral ground, they're supposed to make things work for their users". Bwah-ha-ha-ha-ha-ha-ha! Oh, Michael, you're a funny guy.

Last updated: | path: tech | permanent link to this entry

Fri 20th Oct, 2006

Brace yourself, the laptop is coming

Things I'm doing in preparation for the laptop:

1. I never thought I'd do it, but I'm really starting to consider decommissioning 'Tangram', my main home machine up until now. Not only is it hard to justify having two machines to work on, let alone stupid to even consider trying to share mail on them, but its bits won't go to waste anyway. With a bit of jiggery-pokery I can move its two 250GB disks into something like a Thecus N2100, but it (too) has a thriving community installing Linux on it. And it saves me having to buy two disk enclosures for something like a NSLU-2, as well as having to secure them somehow to stop them leaping off a desk at a crucial moment, as hardware seems inevitably to do. The Athlon 3200 processor can go back into Media, the MythTV machine, with a suitable cooling solution (I've found that heat pipe coolers don't work very well when the pipe is vertical). I can Ghost Kate's computer and move it onto what remains of Tangram, keeping her machine (Ludos) for its original purpose: Windows games. That way she doesn't have to use a computer whose fans produce a teeth-grinding phaser sound of combined fan hums.

2. Reading up on setting up wireless networks for absolute beginners. This starts with downloading the manual for the bundled Belkin wireless router ($33 US through Belkin but $88 Australian through Dell? And you only get the $100 'cash back' voucher if you pay for the modem? I smell a rort here... But maybe, to borrow the words of Bubs, that's not one of the 99 ways they aim to rip me off...). So far I haven't found any guide that isn't router-specific and that covers more than the simple "Don't use unencrypted connections, set your ESSID to something that can't be easily guessed, and get 802.11g" instructions. Oh well. I know I lose some geek cred by saying this, but it's manual-reading time.

3. Working out my plan for installing Linux (hopefully Fedora Core 6 will be fresh off the press when it arrives). Finding out what I have to do to set up the Intel 3945 wireless (which looks like it requires a firmware driver like the 2200 did on the Inspiron 9300 that work allowed me to use last LCA), the screen (easy: it seems the work on the 915resolution utility has also included adding a "ForceBIOS" option to the xorg.conf file to set it permanently), and power management (which seems to be handled pretty well in Fedora Core 5 now).

4. Plans to blog about this and put a new page up for Linux On Laptops to index.

Last updated: | path: tech | permanent link to this entry

Wed 11th Oct, 2006

Come up and see my library.

I'm now quite experienced in using Doxygen to add documentation to code. I've written pretty much all of the documentation for the CFile and progress libraries, and pwlib is pretty small so it'll be easier to do. Now: how do I get everyone to have a look at it?

Well, step 1: put it on my own web site. That's relatively easy: those sons of fun at Ace Hosting have subversion installed; just check the source code out in a directory underneath my public HTML directory. I also added a snippet of .htaccess file to prevent people looking in the .svn directories thus created, more to stop undue confusion and anxiety than any real security threat. Step 2 was a little more work: building the Doxygen HTML documentation. I'll have to get Ace Hosting to install Doxygen, but in the meantime I compiled it and ran it from my own bin directory. LaTeX is installed, so it can automatically generate the formulas as .png files and link them in automatically. Hooray!

The third step is to make this available as more than just a directory index. This probably means either kludging some Doxygen documentation together to contain it, or writing a file that can scan through the directories inside my /code directory and create a nice piece of HTML from a template that lists all the source files and links to the Doxygen documentation directly. I'll probably just write a templater; who needs another reason to reimplement the wheel? Although - maybe there's some clever way that Doxygen does this already - I should look into its recursive options...

Anyway, for now, you can see what's done so far in the /code directory. So far only the CFile and progress libraries are documented, but have a browse anyway. Genuflections, derisive laughter, offers of money, peanuts, etc. can be sent here. Of course, if you want to check out the Subversion code and play with it that way, you can retrieve it from svn://

Last updated: | path: tech / c | permanent link to this entry

Uncompressed bzip2 file sizes the easy way

Caching. It's all about caching. If you've got some result that took some time to calculate, save it for later use.

In this case, I'm referring to the uncompressed file size of a bzip2 file. A gzip file is easy - it's the last four bytes as an integer (signed? unsigned? I don't know). But Julian Seward, in his wisdom, didn't put such information in the bzip2 format. So the only way to determine the uncompressed size of a bzip2 file is to physically decompress it. In CFile I do this by opening a pipe to 'bzcat filename | wc -c', which is probably inefficient or insecure in some way, but I reckon does the job better than me rewriting a buffered reader in CFile for the purpose. It means that if you're reading a bzip2 file, you have this potentially long delay if you want to know the file size beforehand. (If you don't care how long the file is before reading it, then you won't incur this time overhead).
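For the gzip case, the gzip format specification (RFC 1952) settles the signed-or-unsigned question: the trailer field is called ISIZE, and it is an unsigned 32-bit little-endian integer holding the uncompressed size modulo 2^32. A minimal sketch of reading it, with a function name invented for this example:

```python
import os
import struct


def gzip_uncompressed_size(path):
    """Read the ISIZE trailer: the last four bytes of a gzip file,
    an unsigned little-endian 32-bit integer."""
    with open(path, "rb") as handle:
        handle.seek(-4, os.SEEK_END)
        (size,) = struct.unpack("<I", handle.read(4))
    return size
```

Two caveats follow from the format itself: files of 4GB or more wrap around, and if several gzip members have been concatenated the trailer only describes the last one.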

So: how do we cache this value for later? In a filesystem extended user attribute, that's how! There's just one minor problem: if you rewrite an existing file, then fopen() and its logical descendants will truncate and rewrite the file, rather than deleting the inode and then creating a new file. Which means that the extended attribute stays where it was, and now refers to the uncompressed size of the wrong file. To solve this, we write a timestamp in another attribute which determines the time that the size was calculated - if the file's modification time is later than this, then the file size attribute is out of date.

(Of course, if the file system doesn't support user extended attributes, then we bork out nicely and calculate the file size from scratch again.)
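Here is a rough sketch of that scheme, in Python rather than the C that CFile is written in. The attribute names user.cfile.size and user.cfile.size_time are invented for the example, and os.getxattr and os.setxattr only exist on Linux, which is why AttributeError is treated the same as a file system that refuses the request.

```python
import bz2
import os
import time

SIZE_ATTR = "user.cfile.size"        # hypothetical attribute names
STAMP_ATTR = "user.cfile.size_time"


def count_uncompressed_bytes(path, chunk_size=1 << 16):
    """The slow path: decompress the whole file and count the bytes."""
    total = 0
    with bz2.open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            total += len(chunk)
    return total


def cached_uncompressed_size(path):
    """Return the uncompressed size, using the cached attribute if it is
    at least as new as the file's modification time."""
    mtime = os.stat(path).st_mtime_ns
    try:
        stamp = int(os.getxattr(path, STAMP_ATTR))
        if mtime <= stamp:
            return int(os.getxattr(path, SIZE_ATTR))
    except (OSError, AttributeError, ValueError):
        pass  # no attribute, no xattr support, or garbage: recalculate
    stamp = time.time_ns()  # taken before counting, so later writes look newer
    size = count_uncompressed_bytes(path)
    try:
        os.setxattr(path, SIZE_ATTR, str(size).encode())
        os.setxattr(path, STAMP_ATTR, str(stamp).encode())
    except (OSError, AttributeError):
        pass  # file system without user attributes: just don't cache
    return size
```

Comparing against a stored timestamp is only as good as the clocks involved; a file rewritten within the same clock tick as the calculation could slip through, so treat this as an illustration rather than something hardened.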

Of course, you need user extended attributes on your file system. I thought this would be already available in most modern Linux kernels, but no! You have to explicitly turn it on by putting user_xattr in the options section of the file system's mount line in /etc/fstab first. Fortunately, you can do mount -n -o remount / to remount your root file system with the new options - as is so often the case with Linux, you don't have to reboot to set low-level operating system parameters! Yay us! Ext2, Ext3, XFS and ReiserFS all support extended user attributes, too. Once you've done this, you can test it by writing a parameter with something like attr -s foo -V bar file to set an attribute and attr -l file to list the file's attributes. You have to use 'user.' as a prefix to specify you want the user attribute namespace.

So, now to write the actual code!

Last updated: | path: tech / c | permanent link to this entry

Mon 9th Oct, 2006

talloc and require

BTW, I found a gotcha in recent versions of the talloc library. In order to work on all those platforms that don't have all the standard and not-so-standard utility functions that talloc and Samba are built against, Samba and talloc now have a 'replace' library that implements them all. So in order to build talloc, you now have to check out the replace library - you do:

svn co svn://

Then in the new replace directory do ./, ./configure, and make. You can do a make install to install the replace library, but talloc still seems to want direct access to the replace.o and snprintf.o files in the replace library, so you have to then go back to talloc, ./ and ./configure, then link the replace object files:

ln -s ../replace/replace.o
ln -s ../replace/snprintf.o

Then and only then can you do the make in the talloc directory to make the new talloc library. Oh, and if your code sets a destructor function, then you have to change the type of argument it takes, as it's now correctly typed (rather than taking a void *).

Last updated: | path: tech / c | permanent link to this entry

CFile now does bzip2

Over the last couple of days, I've actually been getting to do a fair bit of actual coding. It started with me adding the basics of support for bzip2 compression into my CFile library. Then I decided to redo my subversion repository for the libraries I've written so far (cfile, progress and pwlib) into separate directories, rather than the more standard but somewhat restrictive trunk|branches|tags structure that the Subversion book recommends.

Today I've added the rest of the bzip2 support, namely being able to read lines from them. It involved me copying an implementation of fgets that I found in stdio.c and implementing my own fgetc for bzip2 using array buffers. The hard part is detecting EOF, because it seems that the BZ2_bzerror routine doesn't actually return BZ_STREAM_END when the stream is at an end, it just returns BZ_OK. But BZ2_bzread will return 0 bytes if you're at the end of file, so I detect that and return EOF accordingly.
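The shape of that approach, sketched in Python rather than C and not taken from CFile itself: a small buffer is refilled from a read call, an empty read is treated as end of file, and a line reader in the style of fgets is built on top of the single-byte reader. The class and method names are invented; Python's bz2 file objects return an empty bytes object at end of file, which plays the role of BZ2_bzread returning 0.

```python
EOF = -1


class BufferedByteReader:
    """fgetc()/fgets() look-alikes on top of a read(n) call that returns
    an empty result at end of file."""

    def __init__(self, handle, buffer_size=4096):
        self.handle = handle
        self.buffer_size = buffer_size
        self.buffer = b""
        self.position = 0

    def getc(self):
        """Return the next byte as an integer, or EOF."""
        if self.position >= len(self.buffer):
            self.buffer = self.handle.read(self.buffer_size)
            self.position = 0
            if not self.buffer:  # zero bytes read: end of file
                return EOF
        byte = self.buffer[self.position]
        self.position += 1
        return byte

    def gets(self, limit):
        """Read up to limit - 1 bytes, stopping after a newline, like
        fgets(). Returns None when nothing could be read."""
        line = bytearray()
        while len(line) < limit - 1:
            byte = self.getc()
            if byte == EOF:
                break
            line.append(byte)
            if byte == 0x0A:
                break
        return bytes(line) if line else None
```

Used with bz2.open(path, "rb") as the handle, calling gets repeatedly walks through the file a line at a time and returns None once the stream is exhausted.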

This also gave me the impetus to correctly detect EOF in the rest of the code, something that I hadn't implemented correctly. I'm still not sure I'm obeying whatever ANSI or POSIX guidelines there are on this subject, but the test-cat program I've written reports no differences between the original uncompressed file and being fed through CFile, so I'm assuming I'm doing something right.

My experience, given that I can be reading 1.5GB uncompressed sequence files, is that compressing the inputs and outputs saves not only space, but time (to read and write the files). I noticed the other day that The Gimp also allows you to read and write files with .bz2 or .gz extensions as if they were the uncompressed images. Hopefully the CFile library will give that functionality to people who only want a dependency on Talloc, rather than on half of the Gimp, an external compression program, and enough spare filesystem space for the uncompressed file...

Last updated: | path: tech / c | permanent link to this entry

Fri 6th Oct, 2006

Hot for Supreme Commander

After ABC's new show "Good Game" previewed it, I read a couple of reviews of "Supreme Commander", or 'Supcom' as it seems to be known. The review by Gamespy was particularly enthusiastic, in which I learnt that the main lead is a guy called Chris Taylor, who also developed Total Annihilation. I know people who still play TA, and when Starcraft was still a glint in Blizzard's eye, TA had features that made Starcraft look cheap; TA has full 3D object models, full 3D terrain, and each unit can queue orders nearly infinitely. It still rocks.

Supreme Commander looks to do the same to TA. You can zoom in to see the side of a tank and out to see the entire continent, and every gradation in between. You can swivel your viewpoint around, something that TA alluded to but never actually gave you. It's using the full palette of lighting and effects that modern 3D cards give you. You can move the control panels anywhere - they're not glued to the bottom of the screen like in virtually every other RTS game I've seen. If you have two monitors, you can have two different viewpoints; or, with one monitor, you can split it in two for the same effect. The sheer range of sizes and uses of units seems to be staggering. But, to me, that's not the best bit.

Some people like the micromanagement in these games - setting up each unit with orders 'just so', having to grab each construction unit as it finishes building one structure and command it to build the next. I don't - this explains, in part, why I prefer to play Protoss in Starcraft; you can use one Probe to warp in half a dozen buildings and get it back to work without having to remember where it was each time. RTS games have always struggled with this, and I think it's probably one of the most variable player requirements - players go through the entire spectrum of hands-off to hands-on control.

So it's good to see that, in Supcom, you can not only give units long sets of instructions (as you could in TA) but the style of orders and the amount of detail in each seems to be variable too. Reviews speak of wolfpacks of submarines harassing a coastline - something you could potentially do in TA but that would rarely make much sense, because the scale of the maps was usually too small. You can also warp in a base commander - give em a number of units to command, and e'll keep the base repaired, protected, and producing things to a certain schedule. Even things like being able to tell a factory to just keep on producing tanks on a regular basis, rather than saying 'ten tanks then stop, please, because your tiny brain can't comprehend making more than ten of anything'. Nice.

Of course, there are Open Source RTS games that I've yet to check out. Three that I can see are Stratagus, Glest, and Spring (formed from a project originally just trying to be compatible with TA). As far as I can see, Spring doesn't have Linux binaries but is the most advanced of the three. Glest looks like more of a Warcraft III clone with improvements, and Stratagus is adaptable but, IMNSHO, basic. I have a feeling my previous researches into this topic had found another game, but I can't remember it.

I still have this funny feeling that Supcom is going to blow them all off the face of the earth...

Last updated: | path: tech | permanent link to this entry

Wed 4th Oct, 2006

Working with other people's code

I've long been a convert to the talloc library for memory allocation, after Tridge's talk mentioned it at LCA 2005. Granted, it's not a panacea for memory problems - you can still have memory leaking out your program's ears, but at least it's not completely unfreeable. And because it's typed and it does all the things you're supposed to do with memory, like zero the memory immediately after it's allocated and nullify the pointer immediately after it's freed, it does help you avoid the worst excesses of pointer mishandling.

For my own sake, I've written a little library that allows you to treat a gzip-compressed file (with the .gz extension) as being functionally identical to a normal file - you use the same library of functions to read or write to it (although not at the same time) regardless of whether it's compressed or not. When you open the file, the file type is determined automatically and the correct lower-level functions are used transparently. It also provides a useful 'getline' routine, which reallocates your string buffer (using talloc_realloc) on the fly to make enough space for whatever line you're reading.
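As an illustration of the idea only (this is Python, not CFile, and the name open_any is invented), the whole trick is to pick the low-level opener once, when the file is opened, and hand back an object that reads and writes the same way whichever opener was chosen. This sketch decides by file extension; the .bz2 entry reflects the bzip2 support described elsewhere on this blog rather than anything in this paragraph.

```python
import bz2
import gzip

# Extension -> function that opens that kind of file.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open}


def open_any(path, mode="rb"):
    """Open a plain or compressed file; the caller reads or writes it the
    same way in either case."""
    for extension, opener in OPENERS.items():
        if path.endswith(extension):
            return opener(path, mode)
    return open(path, mode)
```

Everything after the open call, such as iterating over lines or calling read and write, is then identical for compressed and uncompressed files.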

Now, I've just embarked on a project to read and write information in a simple dictionary section/key/value format - much like a .ini file. So I've started looking at a parser library rather than reinventing this particular wheel (I know, I know, shock horror, has Paul actually learned to do some research?). But it doesn't use talloc and it definitely hasn't heard of my CFile library.

It seems likely to me that someone's going to suggest an object oriented language like C++ here, so that I can extend the class. But as far as I can see this doesn't actually solve the problem. I don't want to add functionality to it (yet), I want to replace functionality that it relies on with other functions of my own choosing. Which means that Nicolas would have had to have written his library with a file and memory abstraction class, so that I could then extend those classes and have iniparser use that new functionality with no recompilation or reprogramming. What if he'd thought to abstract the file functions, but not the memory functions (after all, who expects to not use malloc in C?) I'd still be looking at rewriting part of his code. And, since I'm writing in C, it's all a bit of wishful thinking anyway.

So is there any way to do this? Do I have to modify Nicolas's code? I can search through the Samba codebase, because I'm sure they've implemented a .ini file reader in there, but I want to write the file too and maybe they haven't done that. And they won't be using my CFile library.

So is there a solution? Do we have to go to a new, revolutionary programming language, where somehow you not only override these and perhaps other even more basic functions, but do so knowing what other functions you're overriding and what guarantees you're providing to the code 'above'? Does such a thing exist?

Because you can bet Linus Torvalds' hair that there's no library so good, so all-encompassingly correct, that everyone will use it. talloc, for all its plusses, is not ideal in an environment where every byte may count; or at least there will be some people who will fight to the death to avoid using it. My CFile library is perhaps no-one's idea of good (or someone else would have done it way earlier, and I haven't seen one yet), but I reckon it solves a problem that I think is worth solving. It would be good to have ways of using these libraries without having to rework other people's code - it gives one the temptation to just write everything myself and save myself the trouble.

Last updated: | path: tech / c | permanent link to this entry

Mon 18th Sep, 2006

They just don't care, do they?

As I've mentioned before, I run the large and well-designed site known as Dave's Web Of Lies. Amongst its thousands of features is the ability to submit new lies to the database; naturally they are intensely scrutinised for any speck of truth beforehand. Now, the site's name might seem to give the game away, but those industrious linkback spammers obviously don't have time for such niceties as checking whether their handiwork has had any effect, or is even meaningful. My favourite 'comments' left in the submission form so far have been:

Looking for information and found it in this great site... - Jimpson.

Thank you for your site. I have found here much useful information... - Jesus.

The irony is that I don't know whether to include them because they are, indeed, genuine lies. But, on principle, I reject them. It's not as if liars get linkbacks on DWOL anyway...

Last updated: | path: tech / web | permanent link to this entry

Wed 13th Sep, 2006

Heap merge sort idea

Imagine you have a number of sorted lists that you want to merge together into a single sorted list. You're going to output this single sorted list (somewhere), and the input lists could be quite large, so you don't want to reserve double the memory to put them all together and sort them. So you're looking at an n-way merge, which is a fairly tedious algorithm: basically you have to scan the head of all the lists for the lowest element (i.e. n comparisons) and remove that element, and do this for every element in the lists. Oh, and you've also got to remember which lists have run out of elements, so you have further tests to do each time. Very quickly we're looking at a lot of excess work to get those lists merged.

Now, I've been thinking recently about binary heaps. A heap is like a very relaxed binary tree: each node has two children, but the only condition on the children (the 'heap property') is that the value in the parent has to be less than the value of any of its children. I say 'less', but that's one type of heap known as a min-heap - the other is a max-heap, where the parents are greater than any of their children. Read up on the binary heap page linked above for more information.

To return to the problem: we want a better way of finding the smallest element of n lists. I thought: 'heaps'! Put the first elements of all the lists into an array and make it a heap. Then pull the first element off (it's the smallest), put the next element from that list into its place, and re-establish the heap condition. In order to know which list to pull the next element from, each entry in the heap contains the number of the list it came from as well as the value taken from that list. When you run out of one list, the gap will be at the first element in the heap - swap the last element of the heap with the first, truncate the heap by one element, and re-establish the 'heap property'.
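Here's a minimal sketch of that heap-merge idea in C, under a few assumptions of mine: the lists hold plain ints, there are at most MAX_LISTS of them, and the names heap_item, sift_down and heap_merge are made up for illustration rather than taken from any existing code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LISTS 64   /* arbitrary limit chosen for this sketch */

/* One heap entry: the value at the head of a list, and which list it came from. */
struct heap_item {
    int value;
    size_t list;
};

/* Re-establish the min-heap property by sinking the entry at index i. */
static void sift_down(struct heap_item *heap, size_t size, size_t i)
{
    for (;;) {
        size_t smallest = i;
        size_t left = 2 * i + 1, right = 2 * i + 2;
        if (left < size && heap[left].value < heap[smallest].value)
            smallest = left;
        if (right < size && heap[right].value < heap[smallest].value)
            smallest = right;
        if (smallest == i)
            return;
        struct heap_item tmp = heap[i];
        heap[i] = heap[smallest];
        heap[smallest] = tmp;
        i = smallest;
    }
}

/* Merge k sorted lists into out[]; lists[j] holds lens[j] ints.
 * out must have room for the sum of all lens. Returns the count written. */
size_t heap_merge(const int *const *lists, const size_t *lens, size_t k, int *out)
{
    struct heap_item heap[MAX_LISTS];
    size_t pos[MAX_LISTS];            /* next unread index in each list */
    size_t size = 0, written = 0;

    assert(k <= MAX_LISTS);

    /* Put the first element of every non-empty list into the array. */
    for (size_t j = 0; j < k; j++) {
        pos[j] = 0;
        if (lens[j] > 0) {
            heap[size].value = lists[j][0];
            heap[size].list = j;
            size++;
            pos[j] = 1;
        }
    }
    /* Make it a heap: sift down from the last parent back to the root. */
    for (size_t i = size / 2; i-- > 0; )
        sift_down(heap, size, i);

    while (size > 0) {
        size_t j = heap[0].list;
        out[written++] = heap[0].value;       /* root is the smallest */
        if (pos[j] < lens[j]) {
            heap[0].value = lists[j][pos[j]++];   /* refill from the same list */
        } else {
            heap[0] = heap[size - 1];             /* list ran out: shrink the heap */
            size--;
        }
        sift_down(heap, size, 0);
    }
    return written;
}
```

Called with the three lists {1, 4, 9}, {2, 3, 10, 11} and {5}, this writes 1 2 3 4 5 9 10 11 into out and returns 8. The fixed-size arrays just keep the sketch short; a real version would allocate the heap to suit the number of lists.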

The problem with heapsort is that you're swapping the last element of the heap with the first - i.e. one that's much lower in the heap for the highest. In this heap-merge method, the new element is much closer to the value of the root element, so my intuition is that you're going to do a lot fewer comparisons and swaps than if you'd had to do an equivalent heapsort, whatever that might be.

I did an experiment by hand to merge four lists of eight elements each; for 32 elements I did 23 swaps. So that's fewer than one swap per element for that small sample, but I doubt it'd ever get down to O(log n) overall; the worst case for merging k lists is about log2(k) swaps per element, or O(n log k) in total. We've also had to expend at worst O(n log n) time sorting the lists, so that's the limiting order of complexity. But the main reason I'm interested in being able to sort a number of lists is that this leads to a parallelised version - each slave keeps its own sorted list and then sends them piece by piece to the master, which merges the lists back together with minimal extra processing and memory required.

So: test code, here we come!

Last updated: | path: tech | permanent link to this entry

Mon 11th Sep, 2006

Tense Code

Remember this statement from a previous blog post: "Who would have thought I could write a merge sort off the top of my head and have it work first go?"

I believe the canonical response is now: "Bwah-ha-ha-ha-ha-ha-ha!"

Of course it didn't work first go. Once I'd patched up the obvious memory leaks, it still leaked like a sieve. I then spent half a day putting in debugging code and writing a little perl program that would watch the memory allocation and work out at what stage it'd leak. That cleared up a few more bugs but it was still leaking vast swathes of memory. So I wrote more debugging code to keep a much closer eye on memory allocation, and after debugging that, I found the final leak. That wasn't too difficult to plug, as it turned out, and after a comprehensive run didn't show up any errors and the results matched the known-good-but-memory-inefficient version's output, I was moderately satisfied.

I feel like Scrat in the first scene from Ice Age, where he's holding onto a cliff trying to stop it leaking out at him, and balancing a large, precious acorn on his head. There's some satisfaction that I've tracked the problems down, but the overall solution is... tense.

To cut a long story short, I've tested a bunch of different methods: creating new blocks, not creating new blocks, array increments versus array indexing, and in-line sorting of two and three element arrays. The results are generally about equal - interestingly, the overhead from creating huge numbers of pointers and almost immediately disposing of them, and the speed gain of doing tense things like in-place array incrementing, wasn't as much as I thought. The major foul was the 'in-place' merge, which did a fair bit of realloc'ing, which was about three times slower than every other method. No big surprises there.

Of course, now I'm re-implementing it again (that's the fourth time this week!), with my heap-merge method. Stay tuned.

Last updated: | path: tech | permanent link to this entry

Wed 6th Sep, 2006

Simple joys

I love VMWare, it's revolutionised my life. Now I can do my mixing in MixMeister inside a VMWare instance of Windows XP, and it works completely! I can give it two sound cards, too (because I have two sound cards), and even access to the CD burner (with legacy support turned off) to allow it to burn mix CDs. This means I don't have to stop everything I do and leave Fedora in order to mix it up a bit.

So, after a night of idle mixing, I've neared completing my 'mix of all the vinyl records I have' (alas, no snappy title springs to mind) and continued work on my aiming-to-be-huge Goa mix. I'm listening to the latter now, and I've mashed them into a gooey 40 kilobit per second MP3 paste for my mobile phone.

Then, to continue the good news, I found out that Internode is now preparing to build ADSL2 in the Belconnen exchange (which I'm connected to), so I'm going to be on meaty fast internet goodness soon. (I'm going to keep using the old Alcatel Speedtouch 'Stingray' modem until I can be bothered going any faster - AFAICS I'll still get 8Mbps down and 1Mbps up, and the latter is what I'm wanting.)

And to top it all off I've got my license renewed and the program I've been writing for work is nearly working. Who would have thought I could write a merge sort off the top of my head and have it work first go?

Last updated: | path: tech | permanent link to this entry

Fri 1st Sep, 2006

A new approach to the problem

I've revisited a program of mine that I've written for work. The basic idea is to read a set of DNA sequences and produce different files which list all the different 'subsequences' (jargon: oligonucleotides) of different lengths, and how many we saw of each. I presented this idea at a Programmers SIG a while back and the reply came back: write them out and use sort(1) and uniq(1) -c. In my vanity I hadn't thought of this, and was instead making up great complex data structures and slab memory allocators and all sorts of junk. Then I was diverted to work on other projects; when I came back all this looked like so much gibberish.

So I wrote a simple test and showed that, for an average set of virus sequences, I get files ranging from about 24MB (six-letter words, called 'six-mers' in molecular biologists' jargon) to 77MB (21-mers). Sure enough, sort and uniq produce what I want in a not inefficient manner. But I'd like to run this as a single standalone executable, much as that's against the Unix piping ethos. For efficiency reasons I generate all the files simultaneously, and the thought of forking off fifteen separate sort | uniq -c pipes makes me shudder. There must be a better way, I think.

The first obvious improvement is to keep the lists in memory and use some kind of in-memory sort function. The files contain about three and a half million words apiece, so it would be possible using qsort(3) to fill a 21MB array with six-mers (since you wouldn't have to store the null at the end). There's a setting in talloc to allow grabbing chunks of memory greater than 64MB in size, so doing even the 21-mers (77MB in size) would be possible using this method.

The problem, though, is that the way I generate the sequences is to generate all the different ranges simultaneously - doing it in one pass through the sequences. Storing all of them in arrays simultaneously would require 812MB (roughly), and this seems to not be much better than my previous dodgy tree implementation.

Then I realised: all the six-mers are just the prefixes of all the seven-mers, plus all the six-mers that didn't fit in seven-mers. This process applies right up the scale. So I could generate an array which contained the longest strings (null-terminated) available at each location (up to the maximum length required), and sort that. If I did that with fixed-length 32-byte strings (more than enough for all the things we've been doing so far) you'd use 112MB or so. That now contains an ordered list of all the strings of lengths between the minimum and maximum we're using. In order to extract all the N-mers, ignore all strings of length less than N, and take the first N characters of the remaining strings. They're still in order, so counting their frequency is a simple matter. You could even do this in parallel, in one pass across the array (which would increase cache read performance).
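As a rough sketch of that prefix trick in C: sort the fixed-length strings once, then count any N-mer length by walking the sorted array. The names here (WORD_SIZE, nmer_count, sort_words, count_nmers) are mine, invented for illustration, and I'm assuming every string is null-terminated within its 32-byte slot.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define WORD_SIZE 32   /* fixed record size, as suggested in the text */

struct nmer_count {
    char word[WORD_SIZE];
    unsigned long count;
};

/* qsort comparator: each record is a null-terminated string in a 32-byte slot. */
static int compare_words(const void *a, const void *b)
{
    return strcmp((const char *)a, (const char *)b);
}

/* Sort the whole array of longest-available strings once. */
void sort_words(char (*words)[WORD_SIZE], size_t nwords)
{
    qsort(words, nwords, WORD_SIZE, compare_words);
}

/* Walk the sorted array and count every distinct n-character prefix,
 * ignoring strings shorter than n. Results go into out[] (at most max_out
 * entries); returns the number of distinct n-mers recorded. */
size_t count_nmers(char (*words)[WORD_SIZE], size_t nwords, size_t n,
                   struct nmer_count *out, size_t max_out)
{
    size_t found = 0;

    assert(n > 0 && n < WORD_SIZE);

    for (size_t i = 0; i < nwords; i++) {
        if (strlen(words[i]) < n)
            continue;                       /* too short to contain an n-mer */
        if (found > 0 && strncmp(out[found - 1].word, words[i], n) == 0) {
            out[found - 1].count++;         /* same prefix as the previous one */
        } else {
            if (found == max_out)
                break;                      /* no room left in out[] */
            memcpy(out[found].word, words[i], n);
            out[found].word[n] = '\0';
            out[found].count = 1;
            found++;
        }
    }
    return found;
}
```

For the toy sequence ACGTACG with lengths from two to four, the array would hold ACGT, CGTA, GTAC, TACG, ACG and CG; after sorting, counting with n = 2 gives AC twice, CG twice, and GT and TA once each. Shorter strings can't land in the middle of a run of equal prefixes, which is why skipping them is enough.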

Consider, for a moment, though, if you can't allocate huge arrays like that. So you have to break the single array into smaller, more manageable arrays, sort each in turn, and do a merge-condense to recombine each block. Which lends itself to parallelisation: each slave is given a contiguous chunk of the sequence (i.e. one with no non-sequence characters in it - like being given a paragraph), and breaks it up into words, sorts them, and then returns the data to the master (either in single messages, or better yet in chunks) for merging.

But think the sort process through: most common sort routines recursively descend through successively smaller chunks of the file, sorting each and then combining the sorted bits back together into larger sorted chunks. Could this process not also combine elements that were the same? So we might sort records of a string and a count (e.g. 30 byte strings and a two-byte count), initially starting each count at one, but as each sort finds duplicated strings they could be combined and their counts added together. This would also compact the data storage as we go, which might well mean that it would be good to read each sequence into a separate block, which is then sorted and compacted independently (giving a certain granularity to the process), and then the final merge process happens N-way rather than two-way. If that's too complex, make each merge two-way but just do as many two-way merges as necessary to get all the data into one sorted array, which would now be also compacted and contain the counts anyway.
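The two-way 'merge and condense' step might look something like this in C. It's only a sketch: counted_word and merge_condense are names I've made up, each input run is assumed to be sorted and already free of duplicates, and I've used a wider count than the two bytes mentioned above so the sketch doesn't have to deal with overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define WORD_LEN 30

struct counted_word {
    char word[WORD_LEN];     /* null-terminated, so at most 29 characters */
    unsigned long count;     /* wider than two bytes to sidestep overflow here */
};

/* Merge two sorted runs into out[], adding the counts of equal words together.
 * out needs room for na + nb records and must not overlap either input.
 * Returns the number of records written. */
size_t merge_condense(const struct counted_word *a, size_t na,
                      const struct counted_word *b, size_t nb,
                      struct counted_word *out)
{
    size_t i = 0, j = 0, n = 0;

    assert(out != a && out != b);   /* in-place merging is not handled */

    while (i < na && j < nb) {
        int cmp = strcmp(a[i].word, b[j].word);
        if (cmp < 0) {
            out[n++] = a[i++];
        } else if (cmp > 0) {
            out[n++] = b[j++];
        } else {
            /* Same word in both runs: keep one record, sum the counts. */
            out[n] = a[i++];
            out[n].count += b[j++].count;
            n++;
        }
    }
    /* Copy whatever is left of the longer run. */
    while (i < na)
        out[n++] = a[i++];
    while (j < nb)
        out[n++] = b[j++];
    return n;
}
```

A single record with a count of one is trivially a sorted, duplicate-free run, so repeatedly merging pairs of runs from the bottom up gives a merge sort whose output shrinks as duplicates are folded together.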

The best part of this AFAICS is that it's also neatly parallelisable. I can even write an application which, if it finds a PVM machine set up, can distribute itself amongst the hosts neatly, but if it doesn't it can still operate in 'single thread' mode and do all the work itself. Which, in turn, means that as long as I keep in mind that at some stage I could distribute the workload, I won't end up having to substantially rewrite the program (as it was looking with the last method).

So, you see, I do do things other than rant and rave sometimes.

Last updated: | path: tech / c | permanent link to this entry

Rocket and Jifty

Work on my Rocket module continues slowly. Slowly, because I've discovered yet another project - a database of music, instructions and notes on Irish Set Dances. It's an extrapolation of the spreadsheet I had that showed which sets were on which CDs and whether I had instructions for them; now it's fully relational so you can have multiple CDs and multiple instruction sources for each set, as well as recording the other tracks on each CD so that if you need waltzes (for example) you know which CDs they're on. Now that I've got the database structure set up, I've started using Rocket again to do the basic CGI work, so I've got back into working on the module again.

However, a friend mentioned Jifty, something he calls "Perl On Rails". A quick look at the Jifty website (which uses a wiki called Wifty - guess what it's based on) shows that, yes, indeed, it does have a lot of similarities to Ruby On Rails - Model/View/Controller structure, centralised config, templates with overrides - without the hassle of learning a new language that's irritatingly similar to one I already know and without a name that's an irritating simile for a long length of steel used to support things. I'm installing it on my home server. My, does it need a lot of Perl packages, though...

Last updated: | path: tech / perl | permanent link to this entry

Thu 31st Aug, 2006

Pandora opens my box

I see the National Library of Australia is now scanning my home photo gallery with a spider taken from the people. The project is called Pandora and the crawler site says that they're doing some kind of archiving for Australian pages. Well, that's certainly true of mine. But searching for the term 'Linux' on the main page produces "Linux at the Parkes Observatory", "Linux Australia", AusCERT's page, "Learning Linux" on (which Pandora tells me is currently restricted for some reason), and then we go onto international sites. So I don't know what that's all about...

Last updated: | path: tech / web | permanent link to this entry

Fri 25th Aug, 2006

The Return Of The King|Jedi|MythTV

Last night at the CLUG meeting I 'ran' a MythTV 'Fixfest'. Originally I'd intended this to be a chance for everyone with a MythTV machine in Canberra to come along and smooth the rough edges - get TV guides working, upgrade to latest releases, and set up extra features, all with plenty of expert troubleshooters around to help get things working. When I did a quick poll at the 'start' of the evening and found that only a third had MythTV machines at all, and only about a third of those had brought them along, I went to my Emergency Backup Presentation, which was about storage and transcoding in MythTV.

It took me a while to work transcoding out, mainly because I didn't understand how things worked. When receiving an analogue signal, you have to run some form of compression because otherwise you'd chew up a 250GB hard disk in about two hours of recording. But with digital TV, you get MPEG2 at either 4 or 8 Mbps for standard definition picture, and probably in the 12 to 16 range for high def. So it's already compressed and just gets put into the file system as is. You still chew through 2GB or more per hour of recording, so at some point you can cut the commercials out of the programme and transcode it down to a smaller size, optionally shrinking the picture size and recompressing the audio as well. Or, if you want to move the data out of MythTV altogether - e.g. because you want to play it on a phone or an iPod - you can use nuvexport or user jobs to automatically make a smaller file with a reasonable file name (rather than something like 1001_200608251730000.nuv). But you do lose a lot of the metadata in the process since that file is no longer kept with the metadata in MythTV.


I hope that didn't bore the people who were just wanting a 'spotters guide to MythTV', but everyone seemed reasonably happy and after I'd finished my near-interminable rambling we ordered pizza and got back to the important task of getting machines working. Bob and George Bray from the University of Canberra had a great time getting UC's UDP Multicast streams of various satellite broadcasts playing in the ANU lab subnet, and I had the great pleasure of watching the irrepressible Tridge and a couple of other guys get my infra-red remote control working (finally). The whole process was done from first principles - find out which card is receiving the IR, which device that corresponds to, how to get lirc to read that device (in my case, with an AverMedia DVB-T 771 card, it was /dev/input/event2 and you need -H dev/input in the lirc options file to get it to read the device correctly) and finally how that plugged into MythTV (or, more correctly, how MythTV plugged into lirc). Now I have a working remote control, and I just need to set up a ~/.mythtv/lircrc file to get the remote to do something in MythTV.

I also spent some time with a few people getting them to register for the data. Unfortunately it was playing up, no doubt because we had a dozen or so machines going through NAT and appearing as a laptop upstairs. I also helped Rainer Klein get his MythTV database installed, although I kept on being distracted by other things, the chief of which was upgrading Nick's machine to Fedora Core 5 and MythTV 0.19. That was a little more hassle than we really needed.

For no apparent reason, running the Fedora update process took the best part of two hours. Theories abounded but I have no really good idea why; on other machines that I've run it's been much faster. Then the ATI fglrx driver was out of date. In order to upgrade that we had to upgrade the kernel. yum upgrade threw up its hands at a lot of packages, mostly because they were ATRPMS dependencies, so I plugged in my livna and freshrpms setups, removed the worst offending packages (i.e. half of the MythTV front end, not a good sign), and upgraded. Still no upgraded kernel. Yum was firmly convinced that, indeed, there were no kernels available. Finally, at 11:30, in desperation I scp'd the current version of my work mirror. Success! Everyone - the four people who were waiting on giving me a lift, Bob who had to secure the room, others who were just masochistic, and not least Nick - breathed a bit of a sigh of relief. Then we had a bit more fun trying to get the screen configuration working, but eventually it chose the right kernel and all was well. We left at about 12:15am.

I'm really thankful to Tridge, Tom Ciolek and the other guys who got my infra-red remote working. I'm indebted to the patience of Bob and the others who waited around for this machine to work. I think there was a mutual understanding then that we couldn't just deliver Nick's machine back to him unworking and semi-catatonic and say "Sorry, don't have time now, I'll be back on Saturday, I'll get in touch." I feel that that's something that is deep in the Hacker Ethos - you can't just leave a problem unsolved, especially if someone's relying on this solution to work for their continued health and happiness (or at least domestic harmony).

Last updated: | path: tech / clug | permanent link to this entry

Google Summer of Silence

If I could give one suggestion to Google for their Summer Of Code, it would be to try to make blogs for every project and task. That way, not only can those of us who are impatient for all the cool features that we've heard about in some SoC project - e.g. the various MythTV projects - actually find out about them, but hopefully it will also prevent many of the projects dying and disappearing with little to show afterward.

Last updated: | path: tech | permanent link to this entry

Wed 23rd Aug, 2006

How Not To Solve A Problem

I will admit that I am somewhat addicted to Sudoku puzzles. My current program is gnome-sudoku, which is nice, scalable, and can print arbitrary quantities of puzzles for later enjoyment over lunch and so on. Before this, I used pysudoku, which, while less polished and not having a good grade of skill levels, did have the ability to give you hints of what it could find to solve next, based on the same rules that it used to construct puzzles. I hacked this to also tell you how it had deduced that, which improved my solving abilities no end.

My Nokia 6230i does not come with Gnome or Python, so rather than play mindless games of golf or car racing, I bought a sudoku game off Optus Zoo called Sudoku Garden. Written by MForma, that household name in quality software, it has taken a 'Japanese garden' theme, which means that the numbers and text are in hard-to-read colours and shapes. But this is not the aspect which most infuriates me about it.

Most sudoku puzzles are generated using the following method. Start with a blank grid and add numbers in symmetrical patterns that obey the existing numbers until you have a minimum fill level. If you can solve this puzzle internally, mark it as done; if not, add another couple of numbers, again symmetrically and obeying the rules of the existing numbers, and try again. The difficulty level of the puzzle can be ascertained by looking at the 'complexity' of the rules used to solve it.

Somehow, this method was overlooked by MForma in their quest to have puzzles which were aesthetically pleasing. They start with a known fully solved grid, and remove all the numbers in a pattern, leaving only the numbers in a pleasing shape remaining for you to solve. They've chosen templates which leave little nastinesses, like entire rows or columns blank; but this is not the biggest problem. Let me demonstrate by showing you the current grid I have:

7 9 5 4 1 2 6 8 3
4     9 3 8 2 5 7
2 8 3 7 6 5 9 4 1
3     5 8 4 7   9
8 7 9 6 2 3 5 1 4
5 4   1 7 9 3   8
6 5 8 3 4 7 1 9 2
9 2 7 8 5 1 4 3 6
1 3 4 2 9 6 8 7 5
Now, unless I've missed something, or solving sudoku puzzles on tiny mobile phone screens is harder than I thought, this actually admits three solutions:

7 9 5 4 1 2 6 8 3         7 9 5 4 1 2 6 8 3         7 9 5 4 1 2 6 8 3
4 1 6 9 3 8 2 5 7         4 6 1 9 3 8 2 5 7         4 6 1 9 3 8 2 5 7
2 8 3 7 6 5 9 4 1         2 8 3 7 6 5 9 4 1         2 8 3 7 6 5 9 4 1
3 6 1 5 8 4 7 2 9         3 1 2 5 8 4 7 6 9         3 1 6 5 8 4 7 2 9
8 7 9 6 2 3 5 1 4         8 7 9 6 2 3 5 1 4         8 7 9 6 2 3 5 1 4
5 4 2 1 7 9 3 6 8         5 4 6 1 7 9 3 2 8         5 4 2 1 7 9 3 6 8
6 5 8 3 4 7 1 9 2         6 5 8 3 4 7 1 9 2         6 5 8 3 4 7 1 9 2
9 2 7 8 5 1 4 3 6         9 2 7 8 5 1 4 3 6         9 2 7 8 5 1 4 3 6
1 3 4 2 9 6 8 7 5         1 3 4 2 9 6 8 7 5         1 3 4 2 9 6 8 7 5
Now, the whole point of the puzzle of Sudoku is that there is one unique solution to the puzzle. Sudoku means 'one number' in Japanese, after all - one number for every square. Not 'one number in most positions but these four were a little confused', or 'one number in every position, but you can't determine which by anything other than guessing'. Because this is the other dastardly thing that Sudoku Garden does to you: it has a correct solution to the problem and two of those three solutions above will be marked wrong!!!
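Checking for uniqueness isn't hard, either. Here's a sketch of a brute-force solution counter in C; I'm not claiming this is how any real Sudoku generator does it, the names can_place and count_solutions are mine, and it assumes the numbers already in the grid don't clash with each other. Empty squares are stored as zero.

```c
#include <assert.h>

/* Returns non-zero if v can go at (row, col) without clashing with the
 * numbers already in that row, column or 3x3 box. */
static int can_place(int grid[9][9], int row, int col, int v)
{
    for (int i = 0; i < 9; i++) {
        if (grid[row][i] == v || grid[i][col] == v)
            return 0;
    }
    int br = row - row % 3, bc = col - col % 3;
    for (int r = br; r < br + 3; r++) {
        for (int c = bc; c < bc + 3; c++) {
            if (grid[r][c] == v)
                return 0;
        }
    }
    return 1;
}

/* Count the solutions of a grid by backtracking, giving up once 'limit'
 * solutions have been found. The grid is restored before returning. */
int count_solutions(int grid[9][9], int limit)
{
    assert(limit > 0);

    for (int row = 0; row < 9; row++) {
        for (int col = 0; col < 9; col++) {
            if (grid[row][col] != 0)
                continue;
            /* First empty square: try every number that fits here. */
            int found = 0;
            for (int v = 1; v <= 9 && found < limit; v++) {
                if (can_place(grid, row, col, v)) {
                    grid[row][col] = v;
                    found += count_solutions(grid, limit - found);
                    grid[row][col] = 0;
                }
            }
            return found;
        }
    }
    return 1;   /* no empty squares left: this grid is one complete solution */
}
```

A puzzle generator would call count_solutions(grid, 2) and throw away anything that doesn't return exactly one. Fed the grid above with a generous limit, it comes back with three.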

*pant pant pant*

So if anyone has a better Java game for their mobile phone that will play a half decent game of Sudoku, let me know. I don't need fancy guess-recording mechanisms, I don't need beautiful backgrounds, and being able to enter a number directly from the keypad rather than having to use left and right to select it from a list would be nice. I'm prepared even to pay money for it. But what I must have is a correct, unique solution. Requiring guesswork in a logic puzzle is not only stupid, but encourages the worst excesses of the modern world.

But that's another rant. I'm done here.

Last updated: | path: tech | permanent link to this entry

Tue 8th Aug, 2006

A bunch of simple questions.

What is BeijingCrawler, who wrote it, and what is it doing looking at my webserver?

Does the book "Virtual Nation - The Internet In Australia" cover Telstra's overpricing, market gouging, rate limiting and continued anti-competitive behaviour? Does it talk about why Telstra is deliberately holding back on bandwidth improvements and Fibre To The Node in order to benefit its own bank balance?

Does Amazon realise that their "Sponsored Links" are the most irrelevant, stupid advertising tripe ever?

I've got a paper, or even two, to present at LCA, but I don't have any practice at writing up abstracts or things like that. The project in question is intended to be Open Source, probably licensed similar to Perl as most Perl modules are. It's up and running in production mode but a lot of the administration features are still relatively scanty, as is the installation process (since it's evolved from scratch). So how much of this kind of work do I have to do before it's ready to even submit as a paper?

How egotistical and rude is it to have given a talk for the last two CLUG meetings, be running the next CLUG meeting, and want to give a talk about my new programming projects at the Programming SIG?

Is going on a $2200 three-day PostgreSQL course going to teach me much more or give me some kind of qualifications that I can put on my resume? Is that even worth it?


Last updated: | path: tech | permanent link to this entry

Rocket module part 001

To borrow the words of Edna Mode, a new project that I can achieve consumes me as only superhero work can, darling. My main project is the Repository Configurator, which is working well so far. But the key component that I've been finally inspired to write, and that, in laying dormant in my mind for a couple of months, has had the various problems unconsciously solved, is a thing I call Rocket.

Ruby on Rails, by contrast, is a rapid web application framework. It's designed to easily deploy in a site and gradually you implement overrides to the various defaults - display, authentication, data validation and editing being the obvious tasks. From the little I've seen it looks like a great idea. It just means I'd have to learn Ruby - I haven't even got beyond simple Python, I'm not going to start learning another language - and you still have to work within its framework system.

Rocket has the same basic objective - get a website that's based on the contents of a database up and running quickly. But it takes a different tack, and doesn't try to be a whole framework; it just provides (or will provide) a couple of things: print a table or query, print an editor, validate and update the data from the editor. By a flexible system of parameter overrides, which can be set up when you create the Rocket object, when you call PrintTable, or in the database itself in a supplemental table called Booster (either as global, or table-specific, settings), you can control its behaviour. But the Rocket still does most of the work for you - finding the data, handling errors, displaying it neatly.

(My further plan is to have a separate but parallel system that will read your CGI code, find calls to Rocket, and replace them with neatly labelled and commented code that does what Rocket does. This then allows you to further customise the process if you've gone beyond Rocket's capabilities, while saving you the copy-and-paste (or, from my past, trial and error) process of doing the same "Display the data" and "Edit the data" processes. Optimally, it would produce hard, optimised code in the places where Rocket has to be flexible and therefore slower. But that's a whole further project...)

Whenever I have this much hubris, I start worrying, though; it's usually at this point that someone says "But doesn't WellKnown::PerlModule do exactly that?". But, honestly, the HTML::Template and other templating modules I've seen take a third approach, which is to just simply differentiate between presentation and layout. That's a noble goal, and I should look at using that in my code generator. But it doesn't solve the problem of getting a table of data up there quickly and easily - you still have to copy and paste a bunch of lines to get the data and throw it at the template...

So: Who wants to be stuck on Rails, when you can fly with a Rocket? :-)

Last updated: | path: tech / perl | permanent link to this entry

Fri 4th Aug, 2006

Get your nearest mirror here!

It just occurred to me, as I fired up my VMWare copy of Ubuntu and searched its universe repositories, and searched my local RPM mirrors on Fedora Core, for packages of "dar", the Disk Archiver of which I am enamoured, that surely there are local Ubuntu mirrors that I can use here on the ANU campus (I'm doing this from work). I've already found the local mirrors of the various RPM repositories that I use:,,,, and others.

I know other people on campus use Ubuntu. I know about, although I haven't configured my Ubuntu installation to use it as a source. I personally think it makes the Internet a better place to get your new and updated packages from the closest mirror you can. If your ISP has a mirror, then definitely use that because it almost certainly won't use up your download gigabytes per month quota.

So imagine if there was a system whereby users could submit and update yum and apt-get configurations based on IP ranges. Then a simple package would be able to look up which configuration would apply to their IP address, and it would automatically be installed. Instantly you'd get the fastest, most cost-effective mirrors available. You could probably do the lookup as a DNS query, too. It'd even save bandwidth for the regular mirrors and encourage ISPs to set up mirrors and maintain the configurations, knowing that this bandwidth saving would be instantly felt in their network rather than relying on their customers to find their mirrors and know how to customise their configurations to suit.
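To make the idea a bit more concrete, here's a sketch in C of the lookup step: given the client's IPv4 address, pick the mirror whose registered range matches most specifically. Everything in the table is hypothetical - the address ranges are the ones reserved for documentation and the mirror URLs are placeholders - and mirror_rule, parse_ipv4 and find_mirror are names I've invented. A real system would fetch the table (or do the DNS query) rather than compile it in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct mirror_rule {
    const char *network;    /* dotted-quad network address */
    unsigned prefix_len;    /* number of leading bits that must match, 0..32 */
    const char *mirror;     /* mirror to use for clients in that range */
};

/* Hypothetical submissions: a catch-all, an ISP range, and a campus range. */
static const struct mirror_rule rules[] = {
    { "0.0.0.0",     0,  "http://mirror.example.org/" },
    { "192.0.2.0",   24, "http://mirror.isp.example.net/" },
    { "192.0.2.128", 25, "http://mirror.campus.example.edu/" },
};

/* Parse "a.b.c.d" into a 32-bit number; returns 0 if the text is malformed. */
static int parse_ipv4(const char *text, uint32_t *out)
{
    unsigned a, b, c, d;
    char extra;
    if (sscanf(text, "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4)
        return 0;
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return 0;
    *out = ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | (uint32_t)d;
    return 1;
}

/* Return the mirror for the most specific range containing client_ip,
 * or NULL if the address can't be parsed. */
const char *find_mirror(const char *client_ip)
{
    uint32_t ip, net;
    const struct mirror_rule *best = NULL;

    if (!parse_ipv4(client_ip, &ip))
        return NULL;

    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        const struct mirror_rule *r = &rules[i];
        assert(r->prefix_len <= 32);
        /* Build a mask with prefix_len leading one bits. */
        uint32_t mask = r->prefix_len == 0
            ? 0 : (uint32_t)(UINT32_MAX << (32 - r->prefix_len));
        if (!parse_ipv4(r->network, &net))
            continue;
        if ((ip & mask) == (net & mask) &&
            (best == NULL || r->prefix_len > best->prefix_len))
            best = r;
    }
    return best ? best->mirror : NULL;
}
```

With this table, a client at 192.0.2.200 gets the campus mirror, one at 192.0.2.5 gets the ISP mirror, and everyone else falls through to the catch-all. Picking the longest matching prefix means a campus can override its ISP's entry without either of them having to coordinate.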

Hmmmm.... Need to think about this.

Last updated: | path: tech / web | permanent link to this entry

Mon 31st Jul, 2006

Installing MythTV Made Easy

I gave a small talk at the CLUG meeting last Thursday about installing MythTV. Like installing MythTV itself, this fifteen minute talk went for a bit over an hour, partly because the other presenter's laptop was continuing to refuse to talk to the projectors, partly because people were asking questions about the process and various internal details about MythTV (some of which I could answer), but partly (I fear) because of a tendency of mine to ramble.

Coincidentally, my brother is wanting to install MythTV as a general mechanism to getting TV recorded. During a brief chat on Sunday night he pointed out that he and his partner could either visit us this year or 'buy a MythTV setup', but not both. I enquired as to what he meant by 'buy a MythTV setup', which apparently meant 'spend nearly $1000 on a home theatre PC for the living room'.

Nothing could be further from the truth; and this was something I tried to point out (repeatedly) during my talk: that it doesn't take a lot of money to set up MythTV. You can do it simply by installing a TV card in your current machine: then as long as you have the MythTV backend daemon running you will have shows recorded. It's like a stealth VCR - it's just sitting there recording shows whenever they come on, without interrupting your doing other things. You don't have to be running the front end all the time; when you want to watch TV, run the front end and you'll see which shows have been recorded - you can do what you want from there.

(You don't even have to have a TV card to install MythTV, of course: you can install MythVideo and get it to organise your collection of perfectly legally downloaded open source videos, for instance. I don't recommend using the MythMusic plugin to manage your music, unless you have an entire collection of music that is one genre only and you always only ever listen to that genre. It's too difficult to navigate around, otherwise, I've found. But apparently it's undergoing a major rewrite as part of Google's Summer Of Code - if only Google actually had some kind of reporting system for how these projects are going then I'd actually know how far along that was. Anyway.)

My recommended method of installing MythTV in your home these days is to just start with the one machine you have and install a tuner card in it. Then, as you and your family get used to this new functionality, maybe buy an extra tuner and some more hard disk space for show storage. When it gets inconvenient for everyone to cluster around your PC watching TV, invest in a set-top box - as little as $400 will see you set up with a completely silent VIA Epia box, booting off your network and connecting to your backend automatically. Hopefully the Summer Of Code project to revamp and rejuvenate the Windows version of the MythTV front end will mean that those people in your household still shackled to the dominant proprietary operating system will someday soon be able to watch MythTV without rebooting; in the meantime you can either use the KnoppMyth installation or dual-boot their machine, as part of the inevitable process of converting them to the Light Side.

Either way, getting MythTV running in your home doesn't require massive expenditure or changing everyone over all at once. It can, if you want it to, though. It all depends on how much money you want to throw at it.

Last updated: | path: tech | permanent link to this entry

Fri 21st Jul, 2006

Indenting considered harmful

Or: If it compiles, it must be correct.

Michael Ellerman takes issue with my example of bad coding practice. It would seem that we both ultimately agree that indenting code is no guarantee for it to do what you want - except in Python. Because his example, to wit:

if (x > y)
	exch(x, y);
	if (x)
	x = 0;
Is another example of coders assuming that indenting is doing what they mean. This code, according to Kernighan and Ritchie on page 56, will be compiled to mean:

if (x > y) {
    exch(x, y);
if (x) {
} else {
    x = 0;
Which, though it will compile correctly and will not break if exch() is not a #define do { } while (0); macro, is not, I'll wager, what Michael has meant by his example (which is, I believe, that the else clause attaches to the first if). So I'm going to take his example as another proof of the worth of using braces even if you think you'll only ever have one line of code in the 'then' case.
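The rule being leaned on here is that an else pairs with the nearest preceding if that has no else of its own, whatever the indentation says. The following is a minimal sketch of that rule only; it is not Michael's example, and the function names which_branch and which_branch_braced are made up for illustration. Most compilers will warn about the first function, which is rather the point.

```c
#include <assert.h>

/* Indented as though the else belongs to the outer if.
 * The compiler pairs it with the inner `if (x)` instead. */
static int which_branch(int x, int y)
{
    if (x > y)
        if (x)
            return 1;
    else
        return 2;   /* runs when x > y and x == 0 */
    return 0;       /* runs when x <= y */
}

/* The same layout with braces: now the else really does
 * belong to the outer if. */
static int which_branch_braced(int x, int y)
{
    if (x > y) {
        if (x) {
            return 1;
        }
    } else {
        return 2;   /* runs when x <= y */
    }
    return 0;       /* runs when x > y and x == 0 */
}
```

With x = 0 and y = -1 the unbraced version returns 2 and the braced version returns 0, so a check such as assert(which_branch(0, -1) != which_branch_braced(0, -1)) passes: the two functions look alike on the page and behave differently.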

The other point which I left implicit in my post but Michael has pointed out is that compilation failure in this situation is caused by bad coding practice that is not necessarily related to the line you just added. If the code was:

if (x > y)
And I was trying to add the line exch(z,x); after exch(x,y);, then my addition has suddenly caused compilation to fail. But my statement is perfectly legitimate, and spending time looking at the macro of exch() isn't going to elucidate the problem. The problem is that the whole structure is going to assume that the then and else clauses are a single statement; using the do { } while (0); hack doesn't solve this - it doesn't make the bad coding style good. If the if clause is complex, or the lines are more verbose, or the maintenance programmer isn't a language lawyer or is just simply lazy, then they may hit the compilation error and dike their code out to a separate line:

if (x > y)
if (x > y)
Which suddenly means a lot more hassle when the tested expression changes or exch changes meaning or the new value of y is suddenly less than x (because it was exchanged with z in the else clause, before). No Good Will Come Of It.

Maybe I should download the kernel source, remove some of these do { } while (0); hacks, fix up the relevant usages so they're properly braced (heh), and make myself an instant top ten kernel patch submitter... :-)

Last updated: | path: tech / c | permanent link to this entry

Thu 20th Jul, 2006

Kernel programming considered harmful

I happened across the Kernel Newbies Wiki, a sensible attempt to encourage people new to kernel programming to get started, even if they're not going to be diving into hardcore device driver writing immediately. Reading the FAQ, I found a reference to the question "Why do so many #defines in Kernel files use a do { <code> } while 0; construct?" This piqued my interest.

The basic reason why is that if you have a define like:

#define exch(x,y) { int t; t = x; x = y; y = t; }
And you use it in code such as:

if (x > y)
    exch(x,y);
else
    x = 0;
Then this gets expanded to:

if (x > y)
    { int t; t = x; x = y; y = t; }; /* end of if statement here */
else /* dangling else */
    x = 0;
Now, yes, this is bad. But to me it's bad because the programmer has fallen for C's lazy statement syntax and assumed that the 'then' clause of the if statement is always going to be a single statement. This type of construct fails regularly because if you have to add some other statement to that if/then/else statement, like:

if (x > y)
    exch(x,y);
    call_foobar();
else
    x = 0;
Then you've fallen for the same trap. Once again, the exch() call completes the if statement and the call_foobar() call is executed unconditionally. Indenting in this case is worse than a sham, it actively deceives the programmer into thinking that the logic will work when it won't. Of course, if the programmer had initially written:

if (x > y) {
    exch(x,y);
} else {
    x = 0;
}
Then it would all make sense, the #define would work, the extra statements added in would work. What's a couple of extra characters and an extra brace line, compared to many hours hunting down an elusive bug? Lazy programming is good in certain circumstances, but it doesn't excuse bad style or the hubris of thinking you'll never have to revise or debug this piece of code.
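To put the two approaches side by side, here is a minimal sketch. The macro name EXCH and the two wrapper functions are invented for the illustration and are not taken from the kernel source; the macro uses the do { } while (0) wrapper that the FAQ describes.

```c
#include <assert.h>

/* The do { } while (0) form: the semicolon at the call site finishes
 * the do/while statement, so an unbraced if/else around it still parses. */
#define EXCH(x, y) do { int t = (x); (x) = (y); (y) = t; } while (0)

/* Unbraced caller: only compiles because of the wrapper above. */
static void order_unbraced(int *x, int *y)
{
    assert(x && y);
    if (*x > *y)
        EXCH(*x, *y);
    else
        *x = 0;
}

/* Braced caller: compiles with either style of macro, and keeps working
 * when someone adds a second statement to either branch. */
static void order_braced(int *x, int *y)
{
    assert(x && y);
    if (*x > *y) {
        EXCH(*x, *y);
    } else {
        *x = 0;
    }
}
```

Both functions behave identically as written. The difference only shows up later, when a second statement is added under the if: the unbraced version stops compiling (or, without the else, silently runs the new statement unconditionally), while the braced version just works.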

I know I'll have already put some people offside by commenting on this coding practice. But if and when I ever have to write something for the kernel I'll be using a sensible amount of braces, thank you.

Last updated: | path: tech / c | permanent link to this entry

Wed 19th Jul, 2006

Disks Gone Mad

Or: keep on going when you've lost sight of your aim. I should know by now. Whenever I think I'm doing pretty well at something, and I think I've got a reasonable handle on it, in about forty-five seconds (on average) something is going to come along and shatter that perception entirely. The more confident that I've been that some technological idea will work, the more likely that I'll be sweating and swearing away in six hours' time still trying to fix the broken pieces, having long since forgotten what I changed, what my objectives were, and why it seemed like such a good idea in the first place.

This was brought home to me forcefully yesterday. At 9:15, in my cycling pants ready to go to work, I thought "I'll just switch over the root partition labels and bring the MythTV machine back up". At 9:18 I was looking at a kernel panic, as it failed to find the new root disk, because either LVM and MD weren't started yet or my clever tripartite disk wasn't set to come up automatically. At 9:30 I was burning a copy of the Fedora Core 5 rescue CD. At 11:45 I'd successfully put the labels back but the thing was getting to the point where it loads the MBR off disk and stopping. At 1:45 I was going through grub-install options with a patient guy on the #fedora channel. At 2:00, in desperation I pulled all the drive connectors off except the one I was trying to boot off. Success! I felt like curling up in bed. I still had my bike pants on.

Of course, I'd made things more difficult for myself. I have four IDE drives in this system, so I had to unplug one in order to put the IDE CD-ROM drive in. Which meant either the LVM would be down because of a missing disk, or I couldn't access the 40GB drive that I was wanting to restore boot functionality to. What I hadn't thought of was that the CD-ROM was a master device - when I put it in place of another master device that IDE chain would be fine, but when I put it in place of an IDE slave the two masters would get grumpy and not speak to anyone, which was causing the boot lockup. And of course I was being quick-and-dirty and not swapping the CD-ROM out and the correct drive in when I rebooted just in case I needed it again. So I also made a one hour problem into a six hour problem by just not thinking.

I can only assume that there are a couple of readers of Planet Linux Australia out there chuckling away to themselves at my LVM / MD exertions. Because, in hindsight, MD on LVM makes no sense whatsoever. If one of the disks in a VG goes missing, and that disk has allocated blocks on it, the whole VG is considered dead. This then means that the entire MD is dead, and no amount of persuasion is going to bring it back. This is why an MD device needs to go on a raw disk partition - because MD itself is then doing the fault tolerance, not LVM. Lesson learnt.

Just got to keep thinking, I guess... And pull my head in.

Last updated: | path: tech / fedora | permanent link to this entry

Mon 17th Jul, 2006

CLUG Perl Programmers SIG = waste of time

A quick note about last Thursday's SIG meeting of the CLUG. This month was a special Perlmongers theme, given that we're trying to work with the Canberra chapter of the Australian Perl Mongers. So it was a pity that not only does no-one there other than me show any interest in Perl, but one guy (who I won't name) actively calls Perl "executable line noise". Yeah, thanks. That'll really make people enthusiastic about talking about Perl.

What I thought was ironic was that said Perl detractor has been working on some cool stuff - Microsoft's PDF-alternative format, and some other stuff in KDE that was more than nominally neat but has now slipped off the bottom of my stack. Now, I know that I tend to spend a lot of time telling people about my opinions on stuff - this blog may be a prime example. I do also try my best (I think) to be interested in other people's projects and interests, so I don't think I present a one-sided write-only approach. (As always, please email me, anonymously if you like, if you want to correct me on this :-) It's a little depressing, though, when no-one wants to talk about the topic of the night but wants their own ideas to be heard and applauded.

(Not that I hold anything against the person in question. I just wanted to talk gnarly Perl stuff, that's all :-)

Last updated: | path: tech / perl | permanent link to this entry

Crazy LVM Partitioning continued

Today, I decided to finally attack the root partition on my MythTV box. By "attack", here, I mean "move off the 40GB drive", or at least do as much of this process as I can. After a bit of a think, what I wanted was a RAID5 array, but LVM doesn't offer RAID. MD does, but all the space on the drives in question is used by LVM. Presto chango, I pvmoved a bit of space around until I had 8GB of space free on each drive, then I made three logical volumes (called, unoriginally, lv_root_1, lv_root_2, and lv_root_3) and created a RAID5 device across them with mdadm -C /dev/md1 -l raid5 -p ra /dev/vg_storage/lv_root_*. This time I kept the mkfs parameters fairly standard; then I mounted it and started copying the root directory across with cp --preserve=all -rxv / /mnt/test.

The thing that most impresses me about this process is that the copying is taking about 5% (on average) CPU for the actual cp process, and barely any time at all for the md1_raid5 and kjournald processes. So doing RAID5 in software certainly doesn't require a huge grunty CPU (this is an Athlon 2400, yes, but it hasn't even broken into a sweat yet). This'd all be possible on a VIA EPIA motherboard... And, once it's finished, I'll have a root volume that can stand a complete drive failure before it starts worrying, and when it does I'll simply add a new drive, create a new PV, add it to the VG, create a new LV for the new drive, add the drive as a new hot spare, and remove the faulty LV; all at my leisure.

The fact that I can understand all this and think it 'relatively simple' gives me a small measure of pride. One of the few things I would thank EDS for in the time I spent there was sending me on the Veritas volume management training course. LVM is still pretty easy to get a grip on without that kind of training, but it's still made things a little easier.

(Educated readers would be asking why I specified -p ra rather than using its default, ls; in other words, why use the parity write policy of right asymmetric rather than left symmetric? There's no particularly good reason. Firstly, I want asymmetric rather than symmetric to spread the parity load across disks, as is consistent with RAID5. Secondly, when I see an option like this I tend to want to choose the non-default option because, if all options are tested equally but most people use the default, then if a failure mode comes up it's more likely to be found in the default case, and that may not affect the non-default case. It's a version of the "all your eggs in one basket" argument. I don't give it much weight.)

And all of this over a command line, through SSH, to home. I love technology (when it works...)

Last updated: | path: tech / fedora | permanent link to this entry

Fri 14th Jul, 2006

Why I Love Linux part 002

Or: Linux Disk Craziness continued

I'm reconfiguring my MythTV server so I can remove the 40GB disk - it's old, it's dwarfed by the other drives and I'd like to have one IDE channel back for the DVD drive. This involved a bit more fun with LVM, so I thought I'd document it...

The first task was building a non-LVM /boot partition. That was accomplished by using pvresize. I wanted 100MB, but the LVM tools display size in GB, allow you to set it in MB, but round up to an extent size. I played it safe and reduced it by three times as much as I thought I'd need. Then I used fdisk to resize the LVM partition and add a new partition. First challenge: fdisk doesn't allow you to modify a partition, you can only delete it and then create it anew. Luckily I remembered to change the type of the new (LVM) partition to 8e. Second challenge: fdisk only allows you to specify partition sizes in number of cylinders or sectors. So a couple of quick back-of-envelope calculations later, I had something which was roughly the right size. I then formatted the new partition, using -i 16384 because /boot contains relatively few files that are relatively large - this saved a relatively trivial 3MB on copying. I then used pvresize to expand the LVM area back to its maximum extent inside the partition. All done.

The second task was creating a swap partition. Because I'm a speed-mad power freak, I wanted a stripe across all three LVM physical disks. Ooops, the current LV already takes up all free extents on the first two disks, so I go to work with pvmove. After I remember to specify /dev/hdc1:(starting PE)-(ending PE), rather than /dev/hdc1:(starting PE)-(number of extents), this works rather nicely. Then it's a simple matter of swapoff -a, lvcreate -L 1G -i 3 -n lv_swap vg_storage, mkswap /dev/vg_storage/lv_swap, vi /etc/fstab to change the swap device, and swapon -a to get it working again. That's the easy part. :-)

The third task is to create a root partition. Hmmm, slight problem: There's not enough space in the volume group for a 20GB root partition. Further problem: the storage LV in the main VG is formatted as XFS, and XFS can't be shrunk (at the moment). OK, I'm going to have to think about this one. But, driven mad by power now, I consider how I'd configure it: how about a mirror of two three-disk-striped LVM arrays? At first blush it sounds reasonable. How fast would it be? I set up two 1GB LVM partitions, and, delving into mdadm, create a new /dev/md1 as a RAID1 mirror across them (the full command was mdadm -C /dev/md1 -l mirror -n 2 /dev/vg_storage/lv_test_1 /dev/vg_storage/lv_test_2). time dd if=/dev/zero of=/mnt/test/thing count=200000 (displaying my great ability to choose meaningless names) reports 2 seconds to write a 100MB file. That's pretty good, I reckon.

Of course, if any one disk goes down, that does take the whole thing with it, which is not exactly the required effect. But I'll remember that bit of the Veritas Storage Manager course sooner or later, and in the meantime I have larger fish to fry...

Last updated: | path: tech / fedora | permanent link to this entry

Wed 12th Jul, 2006

Too much time, too little gain?

My 'home' home page has, for a while now, had the appearance of an old greenscreen monitor playing a text adventure. Since it's more or less just a way for me to gather up a few bits and pieces that I can't be bothered putting up on my regular page, I'm not really worried about creating a work of art.

But, the temptation to carry things too far has always been strong within me. So, of course, the flashing cursor at the end of the page wasn't good enough on its own: I had to have an appropriate command come up when you hovered over the link. After a fair amount of javascript abuse, and reading of tutorials, I finally got it working; I even got it so that the initial text (which has to be there for the javascript to work) doesn't get displayed when the document loads.

Score one for pointless javascript!

Last updated: | path: tech / web | permanent link to this entry

Shorter isn't always better!

I'm reading lines in a format I invented that uses a simple run-length compression on zeros: a line can be something like '1 4 2 8 5=7 3 6' and that means 'the values 1, 4, 2, 8, five zeros, 7, 3 and 6'. My code was:

foreach $count (split(/ /, $_)) {
    if ($count =~ /(\d+)=(\d+)/) {
        # "N=V": N zeros followed by the value V
        push @counts, (0)x$1, $2;
        $csubnum += $1 + 1;
    } elsif ($count =~ /(\d+)/) {
        # a plain value
        push @counts, $count;
    } elsif ($count =~ /^\s*$/) {
        # empty field from repeated spaces: ignore
    } else {
        warn "bad format in WAIF $vers line: $_ (part $count)\n";
    }
}

"No, wait!" I thought. "I can do all that in a mapping:"

@counts = map (
    {$_ =~ /(\d+)=(\d+)/ ? ((0) x $1, $2) : $_}
    (split / /, $_)
);
$csubnum += scalar @counts;

Testing, though, proved another thing. Reading a file with a reasonable number of zeros per line (lines like '114 0 3 6=3 27=3 3=1 10=3 79=1 8=1 0 1 4=3 16=1 0 1 43=7 15=12 36=16 27=2' are typical) took 21 seconds with the foreach code and 28 seconds with the map code. So that's an extra 33% time per file read. I can only assume this is because Perl is now juggling a bunch of different arrays in-place rather than just appending via push. Still, it's an interesting observation - the regexp gets tested on every value in both versions, so it's definitely not that...
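For anyone wanting to read the same format outside Perl, here is a rough sketch of the decoding step in C. It only illustrates the 'N=V means N zeros then V' rule described above; the function name decode_counts, the use of long for values and the fixed-size output array are all assumptions of mine and not part of the original code.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Decode one line such as "1 4 2 8 5=7 3 6" into out[].
 * A token "N=V" means N zeros followed by the value V; a bare number
 * is stored as-is. Returns the number of values written, or -1 for a
 * malformed token or if out[] (capacity max_out) would overflow. */
static int decode_counts(const char *line, long *out, int max_out)
{
    int n = 0;
    const char *p = line;

    assert(line != NULL && out != NULL);
    for (;;) {
        char *end;
        long value, zeros = 0;

        while (isspace((unsigned char)*p))
            p++;                        /* skip separators */
        if (*p == '\0')
            return n;                   /* end of line */

        value = strtol(p, &end, 10);
        if (end == p)
            return -1;                  /* token is not a number */
        if (*end == '=') {              /* "N=V": N zeros, then V */
            zeros = value;
            p = end + 1;
            value = strtol(p, &end, 10);
            if (end == p || zeros < 0)
                return -1;
        }
        if (*end != '\0' && !isspace((unsigned char)*end))
            return -1;                  /* trailing junk in token */
        if (zeros > max_out - n - 1)
            return -1;                  /* would overflow out[] */
        while (zeros-- > 0)
            out[n++] = 0;
        out[n++] = value;
        p = end;
    }
}
```

Called on the example line '1 4 2 8 5=7 3 6' with room for at least twelve values, this fills in 1, 4, 2, 8, five zeros, 7, 3 and 6. Like the foreach version it appends to a single array as it goes, rather than building a list per token and joining them afterwards.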

Last updated: | path: tech / perl | permanent link to this entry

New term, rudeness optional

I was explaining on #mythtv-users about the origins of RTFM, and had to explain FGI as well. I realised that we also need another term, FLOW, for when someone is asking for a detailed, moderately comprehensive exposition on a particular topic.

Put it into circulation, people! :-)

Last updated: | path: tech | permanent link to this entry

A new text-based captcha scheme

I had an idea for a text-based captcha that could work by cut-and-paste in the browser but would take a sophisticated CSS parser to decode automatically:

Define nine IDs in CSS, only three of which are set to display and the other six set to not display. Which three are displayed can be a random choice on your part, but it remains fixed in the CSS (i.e. the CSS can be static). Then, pick nine numbers or words and put each one in a span with a different ID. The script generating the captcha knows which three of the nine words will be displayed, so it saves those against a random number which you generate and put in a hidden field. Nothing relates the three words to the token except the data on the server, and while the user would see only the three set to display, the source HTML includes all nine. You could even mix up the order of the non-displaying fields, so long as the displaying fields always turned out in the same order.

I realise that it wouldn't take too much code to read the CSS and read the page and work out which fields were going to be displayed. But the whole point to these things is to act like a flashing light on a burglar alarm - to deter all but the (most) determined and resourceful. And I like the idea of not having to generate images - too many of those graphic captchas that I've seen wouldn't be too hard to decode, I reckon.

Another point to using text is that you can include words which identify the site. Fraudsters commonly use a variation of the Mongolian Horde Technique to get past the captchas on Yahoo and other web mail services: set up a simple porn site and require people to register by filling in a captcha - but the captcha they fill in is actually the captcha that the fraudster's script has grabbed off the web mail system. Porn-seeker fills in captcha, result is posted back to Yahoo, everyone's 'happy'. I don't know why the common users of these captchas don't include a watermark that includes the site it came from. In fact, that could be a very good captcha - take the company's name in two randomly chosen shades, overlay a translucent word in another random shade, and get people to pick the word that isn't the company's name.

Quick, off to the patent office!

Last updated: | path: tech | permanent link to this entry

Tue 11th Jul, 2006

Editing databases in Perl CGI revisited

I seem to end up often writing what must be a standard style of editor for database fields in Perl CGI: a table with the fields of the database in columns and the rows in, er, rows, and a "Submit" button at the bottom which then compiles the form data into a structure and passes it to a backend which updates the database. There are two minor but persistently annoying problems with this as I'm doing it, and some day I'm going to have to scratch the itch completely and solve them.

The first is that Perl's CGI module, for all its usefulness, makes the form data as supplied by the CGI POST take priority over what you supply in the field. If there's a field called "foo", and you've just POSTed "bar" to it, but the database update didn't work and thus the field should contain its previous value "wig", then the field as displayed says "bar" unless you supply the "override" option to each and every field. Presto, the user thinks the update worked and is surprised when they use the data somewhere else and it's not actually updated as they thought. Having to supply the "override" flag to every field means that you have to pass the CGI field method an anonymous hash with the parameters set up by name, which (in my petty mind) is not as neat as passing it an array with known parameter order.

(Although the solution to this, to work with the user rather than against them, is to highlight the fields that have errors and the rows that haven't been saved. Field priority therefore means that what the user entered is still in the field, but it's obvious that that row hasn't been saved. Or make it obvious that none of the updates have occurred, which is a little harsher but may make more sense from a data integrity point of view.)

The second is more philosophical: what happens if you're adding new rows? Firstly, it's possible to create a bunch of new empty fields for new data, but how many, and how does the program keep the data in them separate from real rows? Secondly, if the user submits the form but the row isn't valid, where does the row data go? With the above "fields over parameters" policy it's easy - the form data persists and doesn't need to be plucked out and re-entered by the CGI code. But we've already established that this persistence causes other problems, and it seems really kludgy to rely on persistence for some rows and not for others.

What I need to invent is a set of generic form handler routines that collect the form data, perform some inbuilt and some arbitrary checks against it, submit it to the backend module, get the updated data and display the whole thing, plus any extra blank rows. It will handle adding new rows, deleting existing rows, and allow for a full range of validation field display (i.e. each field in each row could potentially have a different error code recorded against it and it would all be displayed nicely).

I haven't started this project, though, because of the feeling that this wheel has already been re-invented so many times...

Last updated: | path: tech / perl | permanent link to this entry

Fri 7th Jul, 2006

Ignore the dogma!

Dogma seems to pervade computing just as much as religion. Two particular pieces of dogma that I've watched through the years are "Linux file systems don't fragment" and "XFS can't be shrunk". When anyone declares things like this out of hand, and I can see obvious counterexamples of other systems that work similarly that don't obey these dictums, I start asking why.

On the first, I had a brief chat with Ted Ts'o at LCA 2006 in which I ascertained that, yes, ext2 and most other Unix filesystems will fragment files if there is not enough space to put the file in one contiguous block. This only happens, of course, when the filesystem is nearly full, and is exacerbated by older filesystems (e.g. FAT) being fairly dumb about where to put new files in the first place - if you just keep on moving through the disk looking for a free block of appropriate size then you're going to wrap around sooner or later and start splitting useful blocks into non-useful pieces.

This is only half of the problem with file system fragmentation, by the way. The other half is the idea of concentrating access for similar things (e.g. files in the same directory) in a locale on the disk to minimise the seek times. As far as my understanding goes, this is done better in Unix than in DOS (and, to a certain extent, Windows) because a directory will be given an extent of the disk, so as you fill up the /lib directory all its files are close to each other on disk. This doesn't address the problem that you rarely read the entire /lib directory and then move on, nor the fact that most disks have faster data transfer on the outside of the disk (where track zero resides), so distributing files further into the disk than they need to be does result in more slowdown, as well as increasing the average seek times. I've yet to see anything good on this so one of my projects is to research this topic and give a talk at LCA sometime on it.

The other dogma about being unable to shrink an XFS filesystem is interesting. My research so far is that, in September 2005, a guy named Iustin Pop submitted a patch that would allow shrinking an XFS filesystem while it was online, including both the kernel routines to shrink the filesystem if the 'disappearing' area was completely unallocated and the necessary userspace programs to actually move the data around to achieve this. This seems to be the result of a lot of work on Iustin's part, after talking to Steve Lord in 2004 where Steve explained the problems of shrinking the filesystem. He notes:

The only way to really do this is a complete tree walk of all the metadata in the filesystem looking for any inode which has a metadata or data block in the allocation groups being freed, and reallocate that space elsewhere.

And? So? Is that so difficult? We walk through all the metadata all the time, when doing backups and stuff. Who said it had to be instantaneous? I've sent an email to Iustin to see what's become of his patch, because as far as I can see nothing has happened to this in the interim, and as late as June 2006 people (on the Red Hat mailing lists) were still spouting the dogma that shrinking XFS was completely impossible.

Beware of assumptions... they make an ass out of U and mptions. Or something. :-)

Last updated: | path: tech | permanent link to this entry

Wed 5th Jul, 2006

Curses, now I'll have to do something!

I got an email back from Elphel pretty quickly - I forgot they're based in Russia, which will be waking up about now (12:52AM AEST). The news is not so good:


That probably could be done and our FPGA code is GPL-ed so can be reused freely. On the other hand it can be a not-so-simple design and I do not have resources to implement it myself. So you can use our design, get some help/comments on the code, but that's basically all we can offer.


Which means that I'll have to agitate somewhere - presumably on the CLUG list - to see if anyone's interested in a hardware project... I hate having ideas and not the skills to easily implement them; having to beg other people to do my work feels degrading...

I am, of course, extremely impressed at Andrey's work, and the fact that he's made it available under the GPL. I reckon he's pushed the development of Ogg Theora ahead massively by attempting such an implementation, and it's only going to get better as more people work on it. Maybe I sound too whiney and ungrateful above - it's really just the disappointment of having yet another (what I think is a) good idea and not being able to realise it quickly and easily. *sigh*

Last updated: | path: tech | permanent link to this entry

Video Compression the Open Software Way

While hanging out on #mythtv-users, someone mentioned the Plextor ConvertX, a device to encode incoming video signals (from NTSC, presumably) into MPEG1, MPEG2 or MPEG4 (DivX). I immediately thought of the Elphel 333 camera - a LAN device that uses an FPGA to compress the incoming video into Ogg Theora format. The new model 333 can do 640x480 at 90 frames a second, thanks to a new, faster FPGA, DDR memory and larger buffers.

So I sent an email to Elphel asking them what they thought of making a USB device that encoded video to Ogg Theora. It could either take a PAL or NTSC signal directly, or be sent video across the USB bus - effectively making it a compression offload device. I'm hoping this idea will be favourably viewed...

Last updated: | path: tech | permanent link to this entry

Fri 30th Jun, 2006

Google Checkout For Lies

I run Dave's Web Of Lies (pluggity plug), an internet database of lies that I like to think of as the only place on the internet where you really know the truth of what you're seeing. One of the features that I've been working on in the background is the Lie Of The Day By Email system, where you can subscribe to get a steaming fresh lie delivered to your email address every day. The site as I inherited it from Dave Hancock will always be free, but for other things I feel allowed to make a bit of money to pay for hosting, etc.

At the moment the site is more or less working. You can subscribe anew, or existing subscribers can log in, and see their details and change things as necessary. Email addresses will be confirmed before actually allowing a change. The only thing that doesn't exist yet is the ability to take money for the subscription. Enter the stumbling block.

Up until now I've been intending to use PayPal to collect money, but the key thing holding me back is the paucity of actual working code to do the whole online verification of payment thing. Maybe I'm more than usually thick, but I find the offerings on PayPal's website and on CPAN to be hard to understand - they seem to dive too quickly into the technical details and leave out how to actually integrate the modules into your workflow. I'd be more than glad to hear from anyone with experience programming websites in Perl using PayPal, especially PayPal IPN. I obviously need time to persevere.

Now Google is offering their Checkout facility, and I'm wondering what their API is going to look like. Is it going to make it any easier to integrate into my website than PayPal? Is the convenience of single-sign-on billing going to be useful to me? Should I wait and see?

Last updated: | path: tech / web | permanent link to this entry

Mon 26th Jun, 2006

Adventures In Ubuntu part 001

After a weekend away in Canberra doing some of what I like best - visiting friends, dancing, drinking Brown Brothers' Orange Muscat And Flora - I've come up to Brisbane for a week with my family. This means I haven't had a chance to respond to Leon's criticism of my criticism of the "Copy A Cat" intelligent design argument; which everyone should probably be thankful for (short answer: I like Leon but I think the "Copy A Cat" argument is stupid and Intelligent Design is a fundamentally flawed idea and rooted in sinister, messianic soil). First stop: my brother's new place with his girlfriend, for a day or two of geeking out. The signs are good so far - there are four monitors across the desk here, and we've got Ubuntu PPC version on a Mac G4 that's going to become a new router for my Dad and his boarders.

Problem one was that the login screen was booting into too high a resolution; it was installed when it had a much nicer larger screen and this poor little 15" LCD screen can't do 1280x1024. (In fact, the first problem was remembering a login which actually worked after I'd last touched it a month or more ago. Fortunately, I have yet to thrash out of my brother the habit of using the same password on his internal systems...) It turns out that the solution, at least for Ubuntu 5.10, is to edit /etc/X11/xorg.conf so that the resolution you want comes first in the Screen -> Display -> Modes line. Helpful. I can only assume that this is handled better in Dapper Drake and so forth; it's handled much more nicely in Fedora. Maybe it should go with the rest of the "login screen" options?

Now to figure out the various iptables rules we need to run this as a headless four-port router that limits the bandwidth on each port in inverse proportion to how much downloading they've done in the last month... (If we can get the four networks to just be separate and have its own DHCP server, though, I'll be more than happy.)

Last updated: | path: tech | permanent link to this entry

Tue 20th Jun, 2006

1FUI - Rebutting Pascal

At the risk of perpetually turning this blog into a Pascal Klein rebuttal forum, I have to disagree with you, Pascal, about your interpretation of Hugh's One Frickin' User Interface rant. This is my rant.

You yourself, Pascal, work on the Tango Project - trying to get a consistent set of icons that mean the same thing across Linux distributions and across windowing interfaces and websites. You yourself acknowledge how important this kind of consistency is: that when you click on a button that has a little starry glint in its top right corner you expect it to make a new something, for example. If this was true only part of the time, then it'd be useless. I cannot see how open standards or DRM issues can hold back widespread use of Linux one tenth as much as a lack of an interface that is both consistent between Linux distributions and (vaguely) consistent with what the rest of the world uses. No matter how secure, speccy, feature-packed, free, unrestricted, modifiable, configurable, and well-supported Linux might be, if it has an interface that people find hard to get used to quickly, then they won't use it.

The argument that having KDE and GNOME is necessary for competition is a fallacy. We already have competition: Windows and OS-X. Having more than one GUI for Linux wastes development time, causes holy wars, and makes it much harder for developers to make applications that work across the range of installed systems. That's stupid, and pretending that it's useful is just denying that we have a problem. How much time did Canonical spend getting KUbuntu working because the KDE fanatics whined? How much time is spent keeping it parallel with Ubuntu now? Is that time we can waste? No!

To argue that the commercial failure of competing user interfaces is irrelevant because "that's just companies competing" is likewise fallacious: it's a straw man argument. Linux is competing for market share just like the proprietary operating systems; the fact that its source code is open and that by and large it's under non-restrictive usage licenses is just a development methodology. There are plenty of companies that would be more than happy to have a single GUI for Linux to standardise on - Canonical, Red Hat, Sun, IBM, SGI, Oracle... But no, we have more widget libraries and APIs than we know what to do with.

I offer an explanation to your observation that Windows users don't notice the consistent user interface: because it should be completely beneath notice! It would be noticeable if it was bad! Plonk those people down on a Linux distro and what do they say? "All the icons are wrong!" "Everything's in a different place to what I was expecting." "Everything looks funny." That's attention; unwanted attention. If you could plonk a Windows (or Mac) user down on a Linux box and they didn't notice at all, then the interface would have succeeded in its goal of allowing people to use it. This doesn't have to be at the cost of having all the nifty features that we all know and love, such as workspaces or tabbed windows. But it does have to be at the expense of "difference for the sake of being different."

I think it's pointless debating which was the top feature for Apple, a rigidly enforced hardware standard or giving top priority to the user interface. They're equally true. The Macintosh has continued to be a going concern, in spite of almost irresistible strongarm tactics from Microsoft and selling consistently more expensive equipment that was often still slower than same-era PC equipment, because usability is always given priority over flexibility. "Working out of the box" and "providing a pretty login screen" is just part of that usability priority.

The point Hugh and I are making here is that by having a plethora of APIs for GUIs on Linux, we're dividing the developer base. Microsoft and Apple couldn't want more division in our ranks. We should have developers saying, "how can we make it so easy for everyone to write Linux GUI applications that they'll do it quicker than for Windows or Apple?" Instead, we seem to have developers sitting in their little couch forts throwing abuse at the others in their other couch forts and refusing to move because they "provide much-needed diversity". All it does is give proprietary OSes time to consolidate and marginalise us again. And that's just stupid.

Last updated: | path: tech | permanent link to this entry

Mon 19th Jun, 2006

Watching the search engines

Or: why Google rocks so much.

I love blinkenlights and CPU load graphs and live logs. tail -f was invented for me. So it's been amusing for me, since posting my page collecting all the Linux Australia Q&A session torrents on my local machine, to watch who's been accessing it and when. Specifically, which search engines picked up on it.

I posted it at around 2PM AEST (= GMT+1000) on Saturday 17th. A couple of people who were on the #linux-aus channel picked it up and grabbed the torrents. Someone from Linux Australia - James Purser presumably - blogged it at about 5:30PM that day and there were a few more. Nothing much more exciting happened until about 11AM AEST on Sunday 18th. Google picked it up then - a total of eighteen hours after it was blogged. That is actually pretty quick.

Before any other search engine has time to scan the /~paulway/la page that was posted, Google has scanned it again (at 10:30PM same day). At 1:30 on Monday 19th, Yahoo catches up and indexes the page. During this time, several search engines have indexed various other bits of my local pages - my MythTV notes, a few pictures. IRLbot has had time to completely stuff up my tracker's URL (dropping the port suffix off the name), so it's asked for a huge number of announce and scrape and stats pages that don't actually exist on the webserver. Forex, whose bot notification sadly leads to an Apache default page, is third. Google, for reasons I start gleefully speculating on, then indexes all the torrent files just in case. No other search engine has hit it in the two days that it's been up so far.

It's little wonder why nothing compares to Google for searching...

Last updated: | path: tech | permanent link to this entry

GNOME oddness

I seem to be having intermittent problems with GNOME on my main work machine. After a seemingly irregular number of button-presses, the gnome-panel applet seems to lock up. I haven't found a consistent set of causal stimuli. Still working on it. Given that my job is not to debug GNOME but to actually write papers and code, and that I'm going away for ten days at the end of this week, it's a low priority.

Last updated: | path: tech | permanent link to this entry

Wed 14th Jun, 2006

Why I Hate Linux Printing Part 02

Anton Blanchard might have shot Eric Raymond, and everyone might be happy, but, by Linus and the penguin that bit him, ESR's right about The Horror Of CUPS Printing. As far as I can see, the people who designed CUPS and the printconf-gui interface knew exactly what they were doing and wanted to provide a vaguely friendly interface on it, rather than actually trying to solve the problem of printing.

It all started when I upgraded the cups package and all my printers ceased to work. I found out that the package upgrade had decided that it was going to put all of the backend drivers in /usr/lib/cups/backend, but that cups itself still looks in /usr/lib64/cups/backend for them. A few symlinks later and at least the laserjet works correctly. You can read all about this in Bug 193987; basically the justification is that they're executables, not libraries, so they should be in /usr/lib. Because that makes so much more sense than /usr/bin/cups/backend, for instance.

Because of this problem, I plugged the USB printer into my testing (i386) machine and got it working. "Fine," says I, "I'll just print to it across the LAN." "Hah-hah!" said cups, "We're going to make things as difficult for you as possible!" I followed Eric Raymond's steps and managed, at least, to get cups to bind to the network interface. But it still steadfastly refuses to either broadcast itself or let other people use it; every time I run the actual printer configuration program, it resets everything to stop other people seeing it too. This is just A1 weapons-grade stupid.

OK, maybe there's something screwy in my configuration. Why can't something tell me this? Why did the problem with the 64-bit backend drivers being moved manifest itself as an unknown lpr error, rather than something meaningful? Why did it appear at all? (It seems to be gone in the latest update to cups, 1.2.1-1.7, crossed fingers and touching wood.) Why doesn't the configuration un-screw itself? Why is it not Doing What I Say, let alone Doing What I Mean? And why do I seem to get no help on this at all?

Last updated: | path: tech / fedora | permanent link to this entry

Tue 13th Jun, 2006

Code compiles faster around Anton Blanchard.

After spending a bit of time reading the hard, shocking facts about Anton Blanchard, I tried to post some suggestions based on my own observations of reality and how it is warped near greatness. But for some reason I'm getting a 'Relaying Denied' message from - the fact that it thinks it's probably means that someone hasn't quite set things up correctly. Of course, it wouldn't be Anton Blanchard, the man who can configure servers merely by holding his hands over the keyboard and who can not only read files but write them. No, it'll be that Jeremy, or maybe Hugh. They'll be to blame. :-)

Why does this pique my creative talents in ways that only Dave's Web Of Lies (plug plug) does otherwise? And what does it say about me?

Last updated: | path: tech | permanent link to this entry

Sun 11th Jun, 2006

If HTML is 'semantic' markup...

... why doesn't it have a '<date>' tag? Something that would allow you to specify that a particular field was a date and could be formatted on the client's system to their particular date viewing method? You're allowed to mark up dictionary definitions, acronyms, document titles, emphasis, and citations. And date conversion, unlike money or distance measures, is a well-known, well-understood idea for most date systems.

I was thinking something like:

<date srctz="Australia/Canberra" iso8601="2006-06-11 12:03:15">11/6/06 12:03:15</date>

Time to write that RFC...

Last updated: | path: tech | permanent link to this entry

Thu 8th Jun, 2006

Heads Up on the Super Proxy Server

I don't know, this might be old hat for all you dedicated SANS-watchers. But I just got a ping in my Apache logs for a very long DNS name ending with ''. The only thing I could find about it was on SANS but is still somewhat sketchy. AFAICS it's just a very fast parallel proxy server scanner. I wonder if my curiosity is endangering feline lives by having done a wget on the URL they were requesting...

From an nmap scan (which I seem to do by reflex on these dodgy requests) it has ssh, smtp, httpd, half of the NetBIOS ports, MySQL, Tomcat and what I think is the Squid proxy management port open. Good work, hackers of China!

Last updated: | path: tech | permanent link to this entry

Wed 7th Jun, 2006

The Self-Adjusting Interface Idea

I love how ideas 'chain'. I was listening to my LCA 2006 Arc Cafe Night second mix (Goa and Hard Trance) and thinking of the fun I had while mixing it. I manage to get away with using proprietary, closed-source, for-money software at LCA somehow, but though I'm absolutely dead keen to have a free, Open Source program that could do what MixMeister does I have neither the skills nor the time for such a large project.

Still, I was thinking about the way I use the program. It has a main window divided up into three parts - the top half is divided into the catalogue of songs you have, and the playlist of songs actually in the mix. Then the bottom half is the graphical display of the mix and is where you tweak the mix so that the beats line up and the fade-in and fade-out happens correctly and so forth. The key problem with this in my view is that sometimes you want a lot of space for the catalogue so you can very quickly scan through the songs looking for something that has the right BPM and key signature and that you recognise as being a track that will blend smoothly with the current track, and sometimes you want a lot of space for your graphical mixing display.

So why not have an interface that watches what you're doing and gradually adjusts? The more time you spend in the mixing window, the more the top half gradually shrinks down to some minimum width. When you go back to choosing songs, the catalogue expands back to its median setting fairly quickly (over perhaps five seconds or so) and then gradually expands if you're spending more time there. In a way it's mimicking the actions you do with the grab bars to change the size of the window panes anyway; it's just doing it smoothly and unobtrusively in the background. You don't want it moving too quickly or changing your targets as you're using them, so all changes should happen over tens of seconds and any change in direction (from growing smaller to getting larger) should be preceded by a pause to check that the user hasn't just strayed into that area by accident. Even the relatively speedy 'return back to median' would happen over a long enough period that if you were able to pick something quickly and move back to your work area then it wouldn't involve too much of a wait for the windows to return to where they just were.
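To make the idea a bit more concrete, here's a rough sketch of the resizing rule as a function you'd call once per timer tick. Everything in it is made up for illustration: the names (pane_state, pane_tick), the 20%/50%/80% limits, the two-second pause and the rates are all assumptions, not anything MixMeister or Devil's Pie actually provides.

```c
#include <assert.h>

/* All of these numbers are illustrative guesses, not tuned values. */
#define PANE_MIN     0.20   /* smallest share of the window for the top half */
#define PANE_MEDIAN  0.50   /* the 'normal' split */
#define PANE_MAX     0.80   /* largest share for the top half */
#define PANE_SLOW    0.01   /* slow drift, in window fraction per second */
#define PANE_FAST    0.06   /* MIN back to MEDIAN in about five seconds */
#define PANE_PAUSE   2.0    /* seconds to wait after the user changes halves */

struct pane_state {
    double top_fraction;    /* share of the window given to the top half */
    int    focus_top;       /* 1 if the user was last seen in the top half */
    double since_switch;    /* seconds since the user last changed halves */
};

/* Advance the layout by dt seconds, given which half the user is in now. */
void pane_tick(struct pane_state *s, int focus_top, double dt)
{
    assert(dt >= 0.0);
    focus_top = (focus_top != 0);

    if (focus_top != s->focus_top) {
        /* Change of direction: restart the pause timer. */
        s->focus_top = focus_top;
        s->since_switch = 0.0;
    }
    s->since_switch += dt;
    if (s->since_switch < PANE_PAUSE)
        return;             /* they may just have strayed in by accident */

    if (focus_top) {
        if (s->top_fraction < PANE_MEDIAN) {
            /* Relatively speedy return to the median setting. */
            s->top_fraction += PANE_FAST * dt;
            if (s->top_fraction > PANE_MEDIAN)
                s->top_fraction = PANE_MEDIAN;
        } else {
            /* Then gradual growth while the user stays in the catalogue. */
            s->top_fraction += PANE_SLOW * dt;
            if (s->top_fraction > PANE_MAX)
                s->top_fraction = PANE_MAX;
        }
    } else {
        /* User is mixing: the top half gradually shrinks to its minimum. */
        s->top_fraction -= PANE_SLOW * dt;
        if (s->top_fraction < PANE_MIN)
            s->top_fraction = PANE_MIN;
    }
}
```

The hard part, of course, is not this arithmetic but getting an arbitrary application to let something else move its panes around.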

Of course this would take a lot of engineering to apply to an application. Or would it? We've got Devil's Pie, an application that can procedurally apply windowing effects to application windows. Could something similar be taught about adjusting the controls within an application? The possibilities are endless, but I have no idea at all how to go about doing it...

Seems to be the story of my life, really...

Last updated: | path: tech / ideas | permanent link to this entry

Tue 6th Jun, 2006

Calculating a correct average, thrashed to death

Regarding this 'signed integer average of two large numbers' problem, I would have thought that a neater solution would be:

low + (high - low) / 2

Works for signed and unsigned numbers, and also works when you have low negative and high positive (i.e. your range spans zero), as long as high - low itself fits in the type. Of course, a negative low isn't ever going to occur with array indices, but it's still a handy property for other uses.
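As a quick sketch of that expression in use (the function names midpoint and midpoint_u are just mine for illustration), assuming low <= high and that high - low is representable in the type:

```c
#include <assert.h>
#include <limits.h>   /* INT_MAX, for trying the large-value case */

/* Midpoint of low..high, rounded towards low.
 * Assumes low <= high and that high - low fits in an int, which always
 * holds when both are array indices (i.e. non-negative). */
int midpoint(int low, int high)
{
    assert(low <= high);
    return low + (high - low) / 2;
}

/* The unsigned version: high - low cannot wrap as long as low <= high. */
unsigned midpoint_u(unsigned low, unsigned high)
{
    assert(low <= high);
    return low + (high - low) / 2;
}
```

So midpoint(INT_MAX - 2, INT_MAX) gives INT_MAX - 1, where the naive (low + high) / 2 would overflow on the addition.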

Last updated: | path: tech / c | permanent link to this entry

Why I Love The Linux Community part 002

I've spent a bit of time hanging around on the #fedora channel on today. In the process I've had a couple of people help me with problems with gstreamer and rhythmbox, and been able to help people with advice on LVM and RAID, getting someone's monitor working (he hadn't found out that the little right-arrow beside the "Generic LCD Display" in system-config-display -> Hardware -> Monitor -> Configure would expand that list when clicked...) and in general been able to help people on their way. A couple of people have thanked me for helping them, and I've basically said that I'm just trying to pay back the help I've had in getting my systems working, and to pay forward the help I wished I'd had when I'd struggled with something for hours before getting it finally working.

There seem to be some people, especially on IRC, who think that helping with simpler queries is beneath them. And there seems to be a small minority who seem to actively enjoy calling people morons when they can't understand the intricacies of the KDE interface or why to not worry about finding a man page for fstab-sync (man fstab tells you that there's this automatic tool...). I can't always answer every question and I usually try to make sure people know when I don't know something but am only guessing. But if I can contribute to the community by answering 'moronic' questions, then I see that as good for me and for them.

Last updated: | path: tech | permanent link to this entry

Wed 31st May, 2006

String List Fixed, Head Screwed On

I think file formats are the bane of programmers everywhere. Certainly a lot of my troubles with my current program stem from one particularly bogus format - the CLUSTAL 'aligned' format (extension .ALN).

Normal nucleotide and protein sequences have a structure like this (this is FASTA, for example, although PIR is almost identical and GENBANK, while more verbose, has a similar structure):

>Sequence 1 name
>Sequence 2 name

I.e. a sequence name with a specific prefix, then everything from there until the next name line is part of that sequence. Clustal is a program for displaying 'aligned' sequences - where special 'zero-length' gaps are inserted into sequences to make them 'line up'. The 'lining up' process is a black art unto itself, but we won't worry about that here. Clustal displays the sequences as rows and the bases aligned in columns, and I shudder to think what hideous perversion caused the programmers to think that this was a more efficient structure to retrieve their data in:

CLUSTALW 1.8.2 format header


Even with a memory limit much smaller than the size of the file, I can think of at least two ways to quickly and easily retrieve the data for any 'rectangular area' of sequence from a FASTA or PIR file. You might need to keep a list of byte offsets where each sequence's data starts in the file, but unless you're doing something really evil (i.e. calculating the offset for a particular sequence derived from the line length and the header length alone) that's what you'd have to do anyway. So you've done nothing good at the expense of sequence name readability (did I mention that it's traditional in ALN files to smash the name down to the first word, so you get lots of sequences with names like GI:37592756 and no clue about what they actually are...). Goody.
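For what it's worth, the 'list of byte offsets' idea is only a few lines. This is a sketch, not what my program does: it scans an in-memory buffer rather than a file to keep it short, and the function name fasta_index is made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Record the byte offset at which each sequence's data starts in a FASTA
 * buffer, i.e. the first byte after each '>' name line.
 * Returns the number of sequences found; at most max offsets are stored. */
size_t fasta_index(const char *buf, size_t len, size_t *offsets, size_t max)
{
    size_t count = 0;
    size_t i = 0;               /* always at the start of a line here */

    assert(buf != NULL && offsets != NULL);
    while (i < len) {
        int is_header = (buf[i] == '>');

        /* Skip to the end of this line. */
        while (i < len && buf[i] != '\n')
            i++;
        if (i < len)
            i++;                /* step over the newline itself */

        if (is_header) {
            if (count < max)
                offsets[count] = i;
            count++;
        }
    }
    return count;
}
```

With those offsets in hand, pulling out columns 100 to 200 of sequence 7 is a seek and a short read, give or take the line breaks in the data.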

For me, it means that I have to read the entire file into memory, appending text to each sequence in turn (and I make no assumptions about whether the sequence order changes per 'block', either...). It means that my sequences are stored as a linked list of strings - i.e. the Stringlist structure has one pointer to a char, and one pointer to the next Stringlist. To make appending slightly less stupid, we keep a pointer to the first and the last Stringlist structures in the list, and therefore appending doesn't require us to traverse the list every time. That was a big speed-up from my stupid first implementation right there...

The only problem is that this method has fallen foul of the talloc block overhead. Each string added therefore incurs the talloc overhead for the linked list structure and the string copy - the former being only sixteen bytes in size (on my 64-bit machine). Not only that, but in order to use it as one string we have to allocate a new string with enough space to hold the entire thing, copy each of the list strings in turn, and then free the entire list structure (which is still handled neatly by talloc but does still mean a complete list traversal, effectively).
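For reference, the original string-per-node list looks roughly like this. It's a sketch with plain malloc standing in for talloc so that it compiles on its own, and the struct sequence wrapper and function names are invented for the example:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One list node per appended string: a pointer to the text, a pointer on. */
struct stringlist {
    char *str;
    struct stringlist *next;
};

/* Keeping both ends of the list means appending never walks the list. */
struct sequence {
    struct stringlist *first;
    struct stringlist *last;
    size_t total;               /* characters stored, excluding terminators */
};

/* Append a copy of text; returns 0 on success, -1 if out of memory. */
int seq_append(struct sequence *seq, const char *text)
{
    size_t len = strlen(text);
    struct stringlist *node = malloc(sizeof *node);

    assert(seq != NULL);
    if (node == NULL)
        return -1;
    node->str = malloc(len + 1);    /* second allocation: more overhead */
    if (node->str == NULL) {
        free(node);
        return -1;
    }
    memcpy(node->str, text, len + 1);
    node->next = NULL;

    if (seq->last != NULL)
        seq->last->next = node;
    else
        seq->first = node;
    seq->last = node;
    seq->total += len;
    return 0;
}

/* 'Stringify': copy every fragment into one newly allocated string. */
char *seq_join(const struct sequence *seq)
{
    char *out = malloc(seq->total + 1);
    char *p = out;

    if (out == NULL)
        return NULL;
    for (const struct stringlist *n = seq->first; n != NULL; n = n->next) {
        size_t len = strlen(n->str);
        memcpy(p, n->str, len);
        p += len;
    }
    *p = '\0';
    return out;
}
```

Two allocations per fragment is where the allocator overhead piles up: every 60-odd character line of the ALN file pays it twice.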

The solution was obvious when I looked at it in hindsight: each stringlist structure now contains an array of 1024 chars, into which strings are agglomerated. When we run out of characters in the last block we simply create a new block and put the string into it. So, OK, each string in the list may not be 100% full, but it doesn't really need to be - we're already getting a huge saving in talloc overhead, list traversal and string copying anyway.
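In sketch form (again with malloc standing in for talloc, and with names like strblock and bl_append made up for illustration), the block version is something like:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_CHARS 1024

/* Each list node now holds up to 1024 characters directly. */
struct strblock {
    struct strblock *next;
    size_t used;                /* characters filled in data[] */
    char data[BLOCK_CHARS];
};

struct blocklist {
    struct strblock *first;
    struct strblock *last;
    size_t total;
};

/* Append text to the last block, starting a new block if it won't fit.
 * This sketch assumes fragments are no longer than one block.
 * Returns 0 on success, -1 on failure. */
int bl_append(struct blocklist *bl, const char *text)
{
    size_t len = strlen(text);

    assert(bl != NULL);
    if (len > BLOCK_CHARS)
        return -1;
    if (bl->last == NULL || BLOCK_CHARS - bl->last->used < len) {
        struct strblock *b = malloc(sizeof *b);
        if (b == NULL)
            return -1;
        b->next = NULL;
        b->used = 0;
        if (bl->last != NULL)
            bl->last->next = b;
        else
            bl->first = b;
        bl->last = b;           /* the old block may be left partly empty */
    }
    memcpy(bl->last->data + bl->last->used, text, len);
    bl->last->used += len;
    bl->total += len;
    return 0;
}

/* Copy all blocks into one newly allocated, NUL-terminated string. */
char *bl_join(const struct blocklist *bl)
{
    char *out = malloc(bl->total + 1);
    char *p = out;

    if (out == NULL)
        return NULL;
    for (const struct strblock *b = bl->first; b != NULL; b = b->next) {
        memcpy(p, b->data, b->used);
        p += b->used;
    }
    *p = '\0';
    return out;
}
```

One allocation now covers a dozen or more appended lines, and joining touches far fewer nodes.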

The 'stringification' process above now doesn't reorganise the linked list. But that's OK, since we pretty much only use the stringified sequence once anyway, and the caller was getting a talloc_strdup'ed copy of the stringified sequence anyway. So in a way the code just got simpler... And that can only be good. (And we're probably about to discard the sequence from its storage structure anyway...)

Now to work out the finer points of the Rainer structure. Block agglomeration is good!

Last updated: | path: tech / c | permanent link to this entry

Tue 30th May, 2006

Large Memory Structures in C

I wrote up a simple program that calculated how much memory would be used by my data structures under various length and fill ratio parameters. When I go past length 12, it looks like the previous 'Rainer' structure I was working on - like a tree, but with four bases per node rather than one (still keeping individual level counts) - suddenly gets to around 2^24 nodes, which (at about 2.5K per node) comprehensively doesn't fit in memory.

Length 12 has 1 + 256 + 65536 nodes, for a total of around 171MB in structure. That provides direct access to 16.7 million unique subsequences, much more than my usual number of around 3 million at length 12 or above. The main problem is to construct a suffix tree from there without chewing up lots of nodes (and hence adding lots of talloc overhead). If I was just counting only the subsequences of a particular length, the whole storage thing would be more of a moot point, but I can't get away from the feeling that to have to rescan the sequence or reload the file to calculate each different length is a wasteful process.
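The little calculator boils down to something like the following sketch. It assumes a completely filled tree, and the 2600-byte node size used in the note below is my rounding of 'about 2.5K', not an exact figure; the function names are invented for the example.

```c
#include <assert.h>

/* Number of nodes in a completely filled four-bases-per-node tree that
 * covers subsequences of the given length (which must be a multiple of 4):
 * one root, 256 children, 256*256 grandchildren, and so on. */
unsigned long long rainer_nodes(unsigned length)
{
    unsigned long long nodes = 0;
    unsigned long long level = 1;   /* nodes on the current level */

    assert(length % 4 == 0);
    for (unsigned depth = 0; depth < length / 4; depth++) {
        nodes += level;
        level *= 256;
    }
    return nodes;
}

/* Total bytes for that many nodes at node_size bytes each. */
unsigned long long rainer_bytes(unsigned length, unsigned long long node_size)
{
    return rainer_nodes(length) * node_size;
}
```

rainer_nodes(12) is 65,793 and rainer_bytes(12, 2600) is about 171MB; rainer_nodes(16) adds another 2^24 nodes for 16,843,009 in total, which at the same node size is over 43GB.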

Now to find why taking a linked list of string fragments and mangling them into a complete single string is taking so long (it's only copying about 29K/sec)...

Last updated: | path: tech / c | permanent link to this entry

ADSL2, Internode, Belconnen, Glee!

Start laying aside those bottles of champagne and vintage port, for Internode has announced that Belconnen is now in the Planned stage for ADSL2+ rollout. Not only this, but it's actually cheaper to be on the extreme plan at 1500/256 while it's in Planned stage (~$60/mo) rather than my current plan (~$70/mo).

I just have to work out if I can bear stepping down from 512/512 to 1500/256. I seed all of my mixes from my home machine - this gets 20kBps upload in rtorrent - and the rest allows me to fetch things across to work or elsewhere in a timely manner. I hate making changes that cost you money ($20 to change plans)... even if it pays for the change in two months...

(Aside: if only I could find a way to encourage people to keep seeding my mixes...)

Last updated: | path: tech | permanent link to this entry

Sat 27th May, 2006

Fun with QEMU part 1

After scoring a copy of The Linux Journal courtesy of Tridge (courtesy of Vanessa Kendrick, for whom Tridge did some interesting data recovery), I read about QEMU and its ability to simulate a simple set of hardware suitable for running different operating systems on. This was convenient, for I had recently got a project where I wanted to be able to run a program in Windows but didn't want to have to reboot into XP every time to do it.

(The program is the GURPS Character Assistant, fourth edition. Yes, I'm one of those dice-wielding role-playing nerds. And you thought it couldn't get any worse... :-)

In the process, I discovered a small, er, novelty about QEMU: its FAT drive emulation is FAT16. Any directory that's too large to fit in that will either give you a helpful error message:

Directory does not fit in FAT16
qemu: could not open hard disk image 'fat:/home/'

Or a less helpful one:

qemu: /builddir/build/BUILD/qemu-0.8.0/block-vvfat.c:97: array_get: Assertion `index < array->next' failed.

A couple of people have mentioned this problem in a couple of posts on different forums. No solutions yet, nor any indication that this is indeed the problem. But I thought I'd mention it...

Now to dispel that nagging doubt that this will be another of those moments where I later regret opening my mouth because it appears that it was opened foot-input rather than intelligence-output. The fact that this doesn't seem to get much attention, with QEMU being so heavily used everywhere, probably means that I've just stated the bleeding obvious and trivial. Like emailing Abigail with a suggestion for an improvement to a Perl module and then realising that the very thing I was suggesting was already available in the language and that it was completely a moot point. Yeah, that was a good feeling.

Last updated: | path: tech | permanent link to this entry

Fri 26th May, 2006

Why I Hate Printing Under Fedora part 01

Printing under Fedora Core has been a mixed bag. The most recent installment of non-stop 360° fun was when I discovered all my printers had stopped working on my newish install of FC5. A bit of probing discovered that all the backend drivers that are supposed to be in /usr/lib64/cups/backend/ that actually do something - usb, socket (for JetDirect printers), half a dozen others - have just magically disappeared.

I do a bit of experimenting. yum whatprovides /usr/lib64/cups/backend/usb says that the cups package from the base repository provides them, but it's not installed. I download it, remove the old package with rpm -ev --nodeps cups and then rpm -ivh --nodeps --oldpackage cups-1.1.23-30.2.x86_64.rpm to install the base package. Goodie, all the backend drivers are back, but ugh, cupsd now fails with client error 127, whatever the hell that means. yum upgrade upgrades cups, which then gleefully (I swear I heard the words "A working cups installation! Let's adger it thoroughly with pitchforks!" coming from the motherboard...) removes all the backend drivers again.

And no-one on #fedora seems able to help me. I'm up to begging for help on the blogosphere / lazyweb.

(Maybe it's an x86_64 thing - my test i386 machine seems to have both the old package and the backend drivers. But that still doesn't help me ...)

Last updated: | path: tech / fedora | permanent link to this entry

There's Nothing Like Learning Experiences...

And I'm beginning to fear that my sequence counter with the tree structure shows no learning experience happening. I chatted with the knowledgeable Rainer Klein last night and he and I were exploring different ways of storing the data. He pointed out the options I hadn't considered, like suffix trees and hash tables - options I'd discarded because when I'd originally explored this problem, using Perl, hash tables and trees were running out of memory much faster than the current C incarnation.

Of course, Perl's hash functions are designed for any string; knowing that I'm only going to be dealing with A, C, G, and T (or, when using amino acids, A-Z minus B, J, O, U, X, and Z) does make things simpler. For nucleotides (the former case) I can pack four bases into one byte for an instant hash. Or, using that same packed byte as an index, have an array of 256 pointers to further structures. Using this 'four-ply' structure saves me creation time, talloc overhead, and traversal time. And if I think about it a bit there's probably a clever way to do some 32-bit multiply, shift or masking operation that takes four chars known to be ACGT and turns them into the correct lookup in one swell foop. (I discovered, conveniently, that (base>>1)&3 yields A = 00, C = 01, G = 11, and T = 10, and works for lower case too.)
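The straightforward (not yet clever) version of that packing, as a sketch: base_code and pack4 are names I've made up here, and the code assumes ASCII input that really is only A, C, G or T in either case.

```c
#include <assert.h>
#include <stddef.h>

/* Two-bit code for one base, using bits 1 and 2 of the ASCII character:
 * A -> 0 (00), C -> 1 (01), G -> 3 (11), T -> 2 (10). Because bit 5 is
 * ignored, lower case letters give the same codes. */
unsigned base_code(char base)
{
    return ((unsigned char)base >> 1) & 3u;
}

/* Pack four bases into a single value in the range 0..255, first base in
 * the top two bits. The result can index a 256-entry table directly. */
unsigned pack4(const char *bases)
{
    unsigned packed = 0;

    assert(bases != NULL);
    for (int i = 0; i < 4; i++)
        packed = (packed << 2) | base_code(bases[i]);
    return packed;
}
```

So "ACGT" packs to binary 00011110, which is 30, and "acgt" packs to the same value.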

My 21-long subsequence count had 2.8 million unique sequences. I realised after talking to Rainer that if I stored them as a massive array of string pointers I'd have 22 bytes per string plus 8 for the pointer (on 64-bit architecture) for 30 bytes per string, or about 84MB. Much less than 1.5GB, eh? Of course, the search lookup time on 2.8 million array elements would be prohibitive, but it made me realise that my one-branch-per-character structure was inefficient. Rainer also made me realise that most of my strings after length 14 (at least for this data) were unique. So after that length, why not just store the remaining string? Or use a more generic suffix tree system, where adding ACCT to a tree with an ACGT suffix produces one AC node with two branches, CT and GT. This would again save on lookup time, talloc overhead and traversal time.

And, heaven forefend, I could even do some research and look up papers on suffix trees.

Resolution: I need to consider many new ways of doing things rather than just trying the first thing that pops into my head.

Last updated: | path: tech / c | permanent link to this entry

Tue 23rd May, 2006

Talloc, Don't Leave Me!

Since Tridge's talk in 2005 about the wonders of (among other things) talloc, and having Rusty give me a rundown on his own "use talloc or live in sin" talk at LCA 2006 (which I was going to be elsewhere for, unfortunately), I've been using talloc in my C programming for work. While those still into endless pointer maintenance and doing everything by hand might scoff at something that almost but not quite does garbage collection for you, it's been invaluable for saving me having to write and debug a lot of tedious memory clean-up code.

Unfortunately, I have a minor hassle. I'm creating a tree/array hybrid structure - each node has four pointers to subnodes (because I'm dealing with nucleotide bases - A, C, G and T), and rather than have a single root I allocate an array of nodes at the 'minimum tree level'. The structure isn't very complex - it's basically six pointers and an unsigned long count - and even with eight-byte pointers it takes about 56 bytes in memory.

So imagine my chagrin when I discover that the talloc structure that's invisibly maintained behind all that is over twice the size of my tree structure.

This is a problem because, even with 4GB in the machine, the tree structure can easily take up 1GB. Even with a relatively small sample of virus sequences - only 4MB or so of uncompressed data - the tree out to length 21 is 29 million nodes or 1,506MB. Triple that usage and we move to The Valley Of Eternal Disk Paging. It's a small comfort to find out that the internal tally of how much memory has been used by the structures directly is exactly what it should be according to the number of nodes actually allocated. And I'm aiming for much larger sequence sizes too - the shortest human gene is about 12MB in size. You can forget about trying to page that in, out and all about.

Of course, I need to consider moving beyond the bounds of RAM and into using disk - intelligently, not by creating a massive swap file. I can't expect 4GB to always be there, or for my requirements to always fit within the constraints of my memory limitations. So I need to do some research into alternative techniques.

The main avenue I'm working on is a 'slab' allocator, where 'slab' here means 'a chunk of 65536 structures'. Instead of each node having a pointer to its branches, it has an unsigned short slab number and unsigned short position in that slab. Looking that up then requires a couple of relatively quick array lookups (depending on your implementation). Sure, it's not going to be as fast as direct pointer linking, but the talloc overhead goes way down because suddenly you only have an overhead per slab rather than per node.
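A first cut at that might look like the sketch below. None of this is finished code: calloc stands in for talloc, the names (noderef, slabpool, pool_alloc, pool_get) are invented, the node is cut down to four child references and a count, and it assumes a 16-bit unsigned short. It also leans on the first node allocated (slab 0, position 0) being the root, so that an all-zero reference can mean 'no child'.

```c
#include <assert.h>
#include <stdlib.h>

#define SLAB_NODES 65536u   /* structures per slab, as described above */
#define MAX_SLABS  65536u   /* as many slabs as an unsigned short can name */

/* A four-byte replacement for an eight-byte pointer. */
struct noderef {
    unsigned short slab;
    unsigned short pos;
};

struct node {
    struct noderef child[4];    /* A, C, G, T; {0,0} means 'no child' */
    unsigned long count;
};

struct slabpool {
    struct node *slabs[MAX_SLABS];
    unsigned nslabs;            /* slabs allocated so far */
    unsigned next_pos;          /* next free position in the newest slab */
};

/* Create an empty pool; returns NULL if out of memory. */
struct slabpool *pool_new(void)
{
    return calloc(1, sizeof(struct slabpool));
}

/* Hand out one zeroed node, writing its reference to *ref.
 * Returns 0 on success, -1 if memory or slab numbers have run out. */
int pool_alloc(struct slabpool *pool, struct noderef *ref)
{
    if (pool->nslabs == 0 || pool->next_pos == SLAB_NODES) {
        struct node *slab;

        if (pool->nslabs == MAX_SLABS)
            return -1;
        slab = calloc(SLAB_NODES, sizeof *slab);    /* one overhead per slab */
        if (slab == NULL)
            return -1;
        pool->slabs[pool->nslabs++] = slab;
        pool->next_pos = 0;
    }
    ref->slab = (unsigned short)(pool->nslabs - 1);
    ref->pos = (unsigned short)pool->next_pos++;
    return 0;
}

/* Turn a reference back into a pointer: two array lookups. */
struct node *pool_get(const struct slabpool *pool, struct noderef ref)
{
    assert(ref.slab < pool->nslabs);
    return &pool->slabs[ref.slab][ref.pos];
}
```

A nice side effect is that a child reference shrinks from eight bytes to four, so the node itself gets smaller as well as the allocator overhead.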

And I think I need to find out how mmap works...

Last updated: | path: tech / c | permanent link to this entry

Sun 21st May, 2006

Once Bitten, Twice Shy...

About three weeks ago, I received a large envelope from IceTV which contained a prospectus and an offer to buy in their upcoming public share offering. I'd subscribed to them for a month when I was having trouble with the TV guide for my MythTV setup - NineMSN had changed their site again specifically in order to bugger up people who used the NineMSN guide ripper. I knew it was going to happen again, and I preferred the idea of paying for known-good TV guide information.

In that month I discovered the OzTiVo project and converted over to using their guide information. OK, it doesn't have the week's solid coverage that IceTV does, but it's free and it's community-supported and I already had a username on the OzTiVo Wiki, which you need to download your guide information. Once the IceTV subscription ran out I changed over and haven't looked back.

Now, I've already been bitten by the IPO monster. I used to be an "Agent Of Chaos" - a Chaos Music 'frequent buyer' - and when they had an IPO back in, oh, 1998 or so I thought it was a good idea. After all, CD sales were increasingly internet-based, and Chaos Music was the leader in the field. How could it go wrong?

I bought my $3000 of shares at $1.00 each. They went up to $1.43 in a day or three, and then dropped like a stone. Last time I saw them listed, they were at three cents each; then they disappeared because they were offered back to the owner of Chaos Music as junk stock. Funnily enough I received no notification of that one, and finding the information itself required extensive searching of the Chaos Music website. So there's $3000 I'll never see again.

To me, IceTV looks similar. Yes, it's unique in offering a subscription-based EPG in Australia, but both HWW and EBroadcast - the companies that have the legal rights to the guide data in Australia - are setting up portals and gearing up to sell their data AFAICS. There's the free source, which more and more people will turn to as time goes on. There's, which has said in the past that MythTV users can use their EPG information and then, from what I understand, locked it off so you can't. There's also the possibility of the other companies who put TV guides on the web, such as NineMSN, to start offering their own subscription services. As soon as the EPG-enabled PVRs have started to 'get good market penetration', everyone and her Aibo is going to want a piece of the pie.

(Gordon Capital's prospectus on the IceTV offering does mention this. The fact that they identify the main competitors for EPG information as being Telstra and Foxtel, rather than anyone who actually owns or distributes EPG information, makes their understanding of the market look, to me, rather minimal.) So I'm not going to buy IceTV shares. This does not constitute legal or financial advice, and if you trust it and you lose money I don't care. I wish IceTV all the luck in the world, because to me it looks like a bunch of people who have tried to do the right thing and have been free and open about what they're offering. But once the gorillas start wanting the banana trees, the smaller forest dwellers aren't going to have an easy time of it.

Last updated: | path: tech | permanent link to this entry

Thu 18th May, 2006

Hitting the Hacker Button

I did a bit more research today on Bluetooth headphones, since I seem to be using my new phone as a media player more (and this will only increase when I get the 1GB MMC card for it). The Nokia wired 'phones have the soul of an epileptic octopus - they seem to tangle themselves up at the slightest provocation and their slightly rubbery cord, while more pleasing than hard plastic, makes them much more difficult to untangle. So wireless 'phones it is.

I found the BlueTake BT-420ex Bluetooth Sports Headphones a while back. They come with the BT430 'audio out to bluetooth' converter dongle that would make these ideal for listening to my Rio Karma or the TV without wires. If they were comfortable then this would be great for Kate. But according to IpDepot, the sole distributor of BlueTake gear in Australia, they're not available in Australia despite being listed on their website. Bummer.

A bit more prodding turned up the BlueAnt X5 stereo headphones. These seem to come with the same type of converter dongle, and function as a headset to boot. Getting a good, comfortable set of earphones that I can use for my home computer, my phone, and to talk on both my phone and on Skype (Linux version of course!) would be excellent. So naturally I'm hesitant at buying a pair from some internet company, trying them on, finding them uncomfortable or difficult to use and then finding that in fact I can't send them back and get a refund. Being forced to join the ranks of dodgy people selling badly-designed half-working things on eBay in order to get a bit of value back on the $200 or more I fork out on these things is not my idea of fun.

So I send off a few emails to a few suppliers asking what their return and refund policies are. I get automated responses from them, but one simply contains the text "RGVhciBWYWx1ZWQgQ3VzdG9tZXINCg0KVGhpcyBpcyBhbiBBVVRPTUFURUQgUkVQTFkgdG8gdGhhbmsgeW91IGZvciB0YWtpbmcgdGhlIHRpbWUgb3V0IHRvIHByb3ZpZGUgdXMgd2l0aCB5b3VyIGVucXVpcnkuICBXZSB2YWx1ZSB5b3VyIGJ1c2luZXNzIGFuZCB3aWxsIGdldCBiYWNrIHRvIHlvdSB3aXRoIGEgcGVyc29uYWwgcmVzcG9uc2Ugd2l0aGluIDI0IGhvdXJzLiANCg0KVGhhbmtzIGZvciB2aXNpdGluZyB3d3cubXJnYWRnZXQuY29tLmF1DQoNCkJlc3QgUmVnYXJkcw0KTXIgR2FkZ2V0IEF1c3RyYWxpYQ0KDQpQaDogMTgwMCA4MzMgMzE1". Hmmm. The fact that it has '[Auto-Reply] <the original subject of my email to them>' as a subject line indicates to me that it's just an automated reply and something's borked out in it. I send off a quick note to their feedback line telling them that something's wrong.

But the Hacker Button has been pressed, and the desire to know exactly what was in that email and what format it's in has diverted me from doing anything else. A quick play around with Perl's unpack() function (the Camel book includes a miniature Base64 decoder as an example) reveals it to be Base64. The MIME headers on the document say that it was supposed to be Content-type: text/plain; charset=utf-8 and Content-transfer-encoding: 8bit, so obviously it's been mislabelled. The Base64 decoding, however, doesn't elucidate the entire content - something's broken in the decoding that causes it to lose sync about a quarter of the way through.
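For the curious, the decoding itself is small. This is not the Perl I used; it's a minimal sketch of a Base64 decoder in C, with b64_value and b64_decode being names I've made up for the example. It collects six bits per character and emits a byte whenever eight are available, which also shows why a single dropped character knocks everything after it out of alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map one Base64 character to its 6-bit value, or -1 if it is not in the alphabet. */
static int b64_value(char c)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    const char *p = (c != '\0') ? strchr(alphabet, c) : NULL;
    return p ? (int)(p - alphabet) : -1;
}

/*
 * Decode Base64 text from `in` into `out` (capacity `cap` bytes).
 * Characters outside the alphabet (line breaks, spaces) are skipped,
 * and decoding stops at the first '=' padding character.
 * Returns the number of bytes written, or -1 if `out` is too small.
 */
long b64_decode(const char *in, unsigned char *out, size_t cap)
{
    assert(in != NULL && out != NULL);
    unsigned long acc = 0;   /* bit accumulator */
    int bits = 0;            /* number of valid bits held in acc */
    size_t n = 0;

    for (; *in != '\0' && *in != '='; in++) {
        int v = b64_value(*in);
        if (v < 0)
            continue;
        acc = (acc << 6) | (unsigned long)v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            if (n == cap)
                return -1;
            out[n++] = (unsigned char)((acc >> bits) & 0xFF);
            acc &= (1UL << bits) - 1;   /* keep only the leftover bits */
        }
    }
    return (long)n;
}
```

Feeding it the first 32 characters of the message above, "RGVhciBWYWx1ZWQgQ3VzdG9tZXINCg0K", gives the 24 bytes "Dear Valued Customer" followed by two CRLF pairs.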

A bit more coding and I have a command-line Base64 encoder in Perl. I recode the start of their message; it looks like they've missed out a character. Paste that in and it makes the next bit come out in plain text but a bit before is now broken. At this stage I've read enough of the message to know that it is an automated reply, so I can at least give up on trying to fix it for now. But at least I'm armed if their actual reply comes out in this format too.

(Later I realised that I could possibly play around with the message content in Thunderbird's mail store, making it look like Base64 encoding, and see what Thunderbird does. But I'll save it for the real message. And besides, it's much more important to blog about it now...)

Last updated: | path: tech | permanent link to this entry

Recycling bytes the easy way

On Tuesday I rebuilt a machine for a friend. Windows 2000 had decided that something in its boot process was sufficiently adgered that the drive would stop responding after a bit and would lock up the machine. Funnily enough, booting into the System Rescue CD was able to see the entire drive and copy all of the information off it with no lock-ups or obvious problems. Trying to boot into the Windows 'Rescue' tools on the 2000 install CD locked it up as well, which I assume was because it tried to access the drive in a funny way that the drive was no longer having a bar of (and which Linux didn't do, at least in its System Rescue CD configuration).

So, one trip to Harris Technology later, we have a new, faster, 80GB drive (and an ADSL modem for the upcoming move into the modern world of broadband for the Armstrong house). While Rob gets on with his actual work, I install Windows, install all the motherboard drivers and add-ons that Windows natively doesn't find, install all the Service Packs, and do so in only three reboots. Finally we copy everything back with the help of Explore2FS (because the only room on my 80GB portable drive was in the ext2 partition). Then, because half of this is client data that Rob's never quite got around to putting on the Raid-1 80GB SATA drive pair that is supposed to be his work drive, I have to shred the contents of this directory on my portable drive.

If only shred came with a -r option.

So, one small Perl script later, we have a mechanism that will span across multiple directories, shredding every file that it can find and then removing the directory after itself. I did a bit more work on it this morning to fix its remaining bug: not being able to cope with names with spaces in them because of the way throws parameters around (i.e. it creates a small shell script and passes the parameters into that, which then gets reinterpreted for spaces).

And then I found wipe.

But, hey, who needs a reason for reinventing the wheel?

(I also plugged the drive into the USB2 connector rather than the USB connector and increased its speed by an order of magnitude...)

Last updated: | path: tech / perl | permanent link to this entry

Tue 16th May, 2006

Well, I am shocked!

Mainly, I'm shocked to see that nesting functions is not allowed in plain C. It seemed so... so harmless! Of course, after reading up a bit about closures and the upward and downward Funarg problems, I can see that it's a bit more complex than I thought. But, really, when Pascal can do nested functions, is it that hard for C to do them? If you avoid the upward funarg problem (by never allowing a function to return a function), you seem to avoid a whole class of problems by never needing to use a heap for argument and function execution record information, and therefore avoid having to do garbage collection, which I agree would be anathema in C.

I'm also a little shocked, Ian, that your 'trusted source' on whether nested functions are good or not is someone who just says "They're evil!" without a bit of justification or explanation. OK, they're evil in the current way gcc implements them, because they change behaviour depending on their contents and they break on architectures that don't support execution from stack. OK, they're evil because they're a perversion of the C standard. But I don't really like people just saying "They're evil!"; there are (I feel) plenty of good reasons why they should exist, so if I say "They're wonderful!" does it balance out?

I'm a tiny bit shocked, Cameron, that you could recommend C++ despite it not offering nested functions or closures either. What it does, in what seems to me to be a typical C++ "Let's adger the rules sufficiently to allow us to do what we want and who cares how difficult it is for newbies to understand" way, is allow you to overload the operator() function in order to have a sort of pseudo function object (because overloading something to do something else that you didn't quite expect is absolutely traditional and part of the C++ way, as far as I can see). This method doesn't even offer actual closure (i.e. encapsulating the environment around the function), though, which is all I really want in this instance. So C++ isn't really a solution to my problems, sadly.

I'm glad that my little example, which wasn't meant to be a criticism of C or gcc, has turned into two good articles which have further enlightened me (and hopefully a few others). And please don't take my comments on C++ to heart, anyone - I find it difficult to understand and overly complex, but I'm sure it's second nature to others. BTW, I did actually deduce that the problem was probably due to the fact that the Opteron architecture didn't allow execution of the code block for the passed function, but I didn't think that it would be on the stack. But this does explain how running under valgrind makes it work again - valgrind simply catches the creation of the function on the stack and puts it back in non-stack (i.e. executable) memory again.

Computers are weird things, eh?

Last updated: | path: tech / c | permanent link to this entry

Fri 12th May, 2006

The hidden trampoline in gcc

I found out today that if you pass a local function (a) as a parameter to another function (b) from an outer calling function (c), then gcc makes that local function a trampoline that's resolved at runtime, because AFAICS the function is on the stack. (To reiterate, if from (c) I call function (b) that takes a function and (c) passes (a) to (b), then the function pointer for (a) points onto the stack). That's amusing, but if for some reason function (a) refers to variables local to (c), then the trampoline pointer goes haywire for some reason and ends up generating a segfault - at least on my Opteron work machine.

If you prefer, have a look at the code to see what I'm talking about.

I tried to trace this down, but found that valgrind silently fooled with this trampoline, turning it back into a pointer which was no longer off the end of the trolley. So when you run it under valgrind, it works perfectly. Excellent for debugging, and so practical too! gdb leaves the trampoline as is, which means that the jump to the local function fails under gdb without telling you much more.

At this point, everyone who codes in C is saying "well, don't do that then! Pass a global function." The reason why not is: local state. I want that local function (a) to have state across its use inside the calling function (c); (b) shouldn't have to know about this state, and I shouldn't have to pass an arbitrary void pointer in calls to (b) to store this state (because if one (c)-like routine has different state for its internal (a)-function from another (c)-like routine, you can't just make (b) take a pointer to an integer because your second (c)-like function might need a character string as state, or a complex structure, or whatever).

A quick chat on the ##C channel confirmed that other people see the segfault behaviour, so it's not just me or my version of gcc. It also confirmed that everyone thought I was mad - a situation that is familiar to me; most people said "Pass the state around in an arbitrary pointer", which I'm not keen on, "Pass the state around in global variables", which I'm even less keen on, or "It's totally non-portable", which both puzzles me and completely fails to answer the question. So that's been fun.

A more sophisticated example might be a search or sort routine; you pass an array to bsearch() or qsort() as well as the upper and lower bounds to search and a function which compares two array indices. Inside this comparator you want to see how many times it's been called, but only for the most recent top-level call to bsearch(). It makes perfect sense for this call count to be a local variable to the caller, and not state in the function passed to bsearch().
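For contrast, here is roughly what the "pass the state around in an arbitrary pointer" suggestion looks like for that call-counting case. It's a sketch only: sort_ints_ctx is a hypothetical stand-in for qsort() (a plain insertion sort over ints), because the standard qsort() and bsearch() have no context argument. The count is still a local variable of the caller, but the price is exactly the void pointer that the routine in the middle has to carry around.

```c
#include <assert.h>
#include <stddef.h>

/* Comparator that also receives an opaque pointer to the caller's state. */
typedef int (*cmp_ctx_fn)(const int *a, const int *b, void *ctx);

/* Plays the role of (b): an insertion sort that passes `ctx` through untouched. */
void sort_ints_ctx(int *v, size_t n, cmp_ctx_fn cmp, void *ctx)
{
    assert(v != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        int key = v[i];
        size_t j = i;
        while (j > 0 && cmp(&v[j - 1], &key, ctx) > 0) {
            v[j] = v[j - 1];   /* shift larger elements up one place */
            j--;
        }
        v[j] = key;
    }
}

/* Plays the role of (a): compares two ints and bumps the caller's counter. */
static int counting_cmp(const int *a, const int *b, void *ctx)
{
    unsigned long *calls = ctx;
    (*calls)++;
    return (*a > *b) - (*a < *b);
}

/* Plays the role of (c): the call count lives in a local variable here. */
unsigned long sort_and_count(int *v, size_t n)
{
    unsigned long calls = 0;
    sort_ints_ctx(v, n, counting_cmp, &calls);
    return calls;
}
```

Calling sort_and_count() on an array sorts it in place and returns how many comparisons that one top-level call made. A second (c)-like routine with different state would pass a pointer to its own structure instead, which is the type-unsafe part I was grumbling about.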

Unfortunately I didn't find all this out until after the CLUG Programmers SIG meeting, otherwise I would have brought it up as a question there and had the opportunity for better minds than mine to tell me how stupid and backward I'm being in doing something so manifestly non-C-like as wanting portability and opaque state...

Last updated: | path: tech / c | permanent link to this entry

Mon 8th May, 2006

Travel, Geometry and Interactive Maps all rolled into one!

A random neuron fired somewhere and I was inspired to read the Wikipedia article on the Great Circle. It linked to the Great Circle Mapper, a map that allows you to put in airport destinations and it'll show you the 'correct' (i.e. least distance) path between them. You can put in a route (CBR-MEL, MEL-SIN, SIN-LHR, LHR-SNN), you can get the map in a variety of projections, you can get information about the airport (including, in some cases, a map of its runway configuration), and you can show the 'range' circle that is equidistant from a point (e.g. 8000nm@SYD will show the range of the Airbus A380 in passenger configuration). Fun!

Incidentally, did you know that the Airbus A380 uses a switched 100baseTX fast ethernet network in star topology to control the plane using UDP? And this is significantly more advanced than the bus topology of ARINC 629 used in Boeing 777s (which themselves are a step ahead of the old 747-400)...

Last updated: | path: tech | permanent link to this entry

Sun 7th May, 2006

The Delicious Revenge of OpenOffice

Kate sent out an email to the teachers attending my Canberra Set Dance Weekend letting them know the final programme. Because she has to work with me on this, and while I can read and write Microsoft Word documents with OpenOffice, the subtle conversions each time cause enough 'format rot' to make Kate concede to using OpenOffice for all documents for the event. She must be too used to it by now, because she sent them all an OpenOffice Writer document.

When I saw this, I opened it and converted it to PDF, and then sent them all a brief email saying "What, Microsoft Word can't read an open, publicly available standard document format? How much did you pay for it then? My word processor comes for free, reads and writes Word format as well as a dozen or so others. Try it today!", apologised for my Open Source Zealot Ranting, and attached the PDF. And then snuggled into bed with extra words of love for Kate.

(The first win was when she conceded that Word's habit of 'correcting' anything that looks vaguely like a list into a full-blown styled list without asking you, which OpenOffice only does if you ask it to, was almost enough reason for her to convert over. Almost but not quite. I'll have to find another feature :-)

Last updated: | path: tech | permanent link to this entry

Wed 3rd May, 2006

Fighting With Blosxom

Sorry for the recent return from the grave of a few of my previous posts, fellow bloggers. I updated them to remove a few minor tag problems (unnecessarily closing <p> tags and so forth), and Blosxom likes to use the modification date on the file as when to show the post. I assume there is a fix, but I just haven't got the time to research it.

Last updated: | path: tech | permanent link to this entry

Tue 2nd May, 2006

Return Of The SELinux Security Contexts

I wrote my own SELinux policy file today!

Today I realised my CGI pages weren't coming up because the scripts weren't allowed to connect to, read and write the Postgres socket. (They also seem to require the ability to getattr and read the krb5.conf file, and I have absolutely no freaking clue why, because my code doesn't use Kerberos in any way). I'd done a bit of research and found the command:

grep <error messages> /var/log/messages | audit2allow

A bit of questioning on #selinux revealed that if I did audit2allow -M <name>, I'd get a module with the name I gave, so that later I can identify what particular policy modules I've loaded into SELinux and what they do (rather than just 'local'). (They even have version numbers too!) The module .te file is a text file, so you can edit it. Including all the various permissions you want to set in one file means that when you compile and load it with:

checkmodule -m -M -o <name>.mod <name>.te
semodule_package -o <name>.pp -m <name>.mod
semodule -i <name>.pp

(which the audit2allow script will do all but the last automatically) then you can have all your policy revisions in one neat place, rather than grabbing each separate error, making a separate policy for it, and then probably overwriting the last policy module you called 'local' or whatever.


Now all I have to learn is how to create new SELinux types, so I can say "only these HTTP scripts are allowed to read and write to this directory". Then I will truly know what the hell I'm doing. Possibly.

Last updated: | path: tech / fedora | permanent link to this entry

Sun 30th Apr, 2006

Junk DNA Is Watching You

I note Leon has linked to a story about the IBM research into repeated bits of the human genome. Coincidentally, I've got that very paper on my desk as it's in a field that I'm working on. But I'd like to tell you about a lecture I heard from a professor from UQ on the issue of what all this 'Junk DNA' is really used for. If I could remember names, or technical details, I'd use them, but for now you'll just have to cope with my limited memory.

A quick revision for you non-biologists: DNA is transcribed and translated inside the cell to produce proteins, which then go out of the nucleus and do their work in the body. The shape of the protein, as it folds up in three dimensions as it's being created, is what gives the protein its particular abilities - it binds to the three-dimensional shape of whatever it's supposed to work on. There's a fairly direct mapping between the DNA and the protein which it produces, so it's relatively easy to find the bits of DNA that 'code for' a particular protein. These are called 'coding regions', and the 'non-coding regions' are where this 'junk DNA' lies.

Firstly, junk DNA is not restricted to humans. Anything past prokaryotic stage - anything complex enough to have a cell wall and a nucleus to contain its DNA - has junk DNA. The more 'complex' (in an arbitrary, non-technical sense) the organism, the more of this 'junk DNA'; the actual number of proteins that the DNA codes for stays roughly the same (in fact, some viruses express more proteins than our DNA does). And the most interesting thing is that large tracts of the non-coding regions are still transcribed perfectly across generations, which implies that there's a lot of positive pressure for them to be there. More mutation occurs in coding regions than some of these non-coding regions!

This Professor's hypothesis is that the 'junk DNA' codes for proteins (or even RNA - roughly, single-stranded DNA) that stay inside the nucleus and regulate when various proteins are made, and possibly even how they're folded. This would explain a lot of what we don't know about protein production, which is mostly in the area of why the body produces some particular protein at some times and not others. They do have evidence to show that RNA within the nucleus affects and regulates protein production. Research continues, as far as I know.

It's like observing a machine from the other side of the internet. You have the source code but it's not in any language you understand, and you're trying to deduce what parts of the code do. You can map what inputs and outputs it has, and from those you can pick up what bits of code might produce those messages. But the memory management stuff? The swapping? The disk IO routines? Even the process management code is never physically represented by a single packet sent from the machine. So you write off all that code as 'junk code' and don't worry about trying to understand it.

Stupid, eh?

One final challenge that the Professor offered, which I think is worthy of the minds of Open Source: Try to come up with a way of encoding a picture such that the picture contains the instructions to build itself, and the machinery to execute those instructions, at any scale, and is still a recognisable picture (i.e. simple quines don't count - it has to look like something.)

Last updated: | path: tech | permanent link to this entry

Sat 29th Apr, 2006

The CanberraNet idea

I had a very nice afternoon drinking beer and eating at Das Kleinhaus (as long as I've used the correct gender - I don't know) on Saturday with Rainer, Chris and Matthew from Kororaa with brief appearances of a very tired Pascal. (We shared a brief complaint about how non-Canberrans, Sydneysiders especially, feel a need to disparage Canberra, then both dismiss any attempt at rebuttal with disdain and get all defensive about their native city as if no person in their right mind could question the urge to live in Sydney. Thank you, Hypocrisy Central.)

We talked about the idea of a wireless mesh network in Canberra; specifically, a network that existed separately to the internet (which avoids many of the legal problems that you get embroiled in if you look like an ISP). My concept here is that this mesh would duplicate many features of the internet; it would have its own IP range (possibly using IPv6), DNS TLD, and enthusiastic contributors could provide search engines, web pages, VOIP, Jabber, and so forth. Because the same basic structure and technology that powers the internet would be used in the mesh, it would be covered by the same laws: which means that publishing unauthorised copyrighted material is illegal but the network is not held responsible for enforcing that.

I know there's been a similar proposal hanging around here (and elsewhere) for years. I don't know the specifics, and to my mind it gets hung up on the whole "how do I know what people are using my internet connection" problem that's implied when you talk about making the mesh join the internet. I think there are deeper technical issues such as routing and address spaces and such that also need to be solved in that case. This is why I think that any successful mesh needs to have its aim solely as providing an extra backbone for data transfer on its own network that's completely independent of the internet. But this in itself is not a compelling reason to anyone individually to set it up.

There are two problems here: having useful content available to actually make it interesting, and having a way for end users to find that content. Again, these problems have been solved on the internet - we now have lots of people putting all sorts of interesting stuff there, and search engines go around and find out what's there and index it for easy finding later. The real problem is content; and to complicate it is the issue of why put anything on the mesh if you're not going to put it on the internet (and, vice versa, why put anything on the mesh that's already on the internet). There may be some things, like live video and audio or high-quality voice chat, that can be done better in the mesh than through the internet - but why reinvent the wheel?

Last updated: | path: tech / ideas | permanent link to this entry

Tunnelling PVM, or Clustered Computing Without IPSEC

I've experimented with PVM (Parallel Virtual Machine) for a long while. Like MPI, it's an abstraction of the message passing, data translation and task scheduling mechanisms that drive a parallel computation so that the separate processes can be on separate processors or separate machines on a LAN. It grew up in the 'good old days', when your only firewall was the one that protected you from the outside (and sometimes not even that) - each server you add to the PVM group communicates on UDP ports somewhere in the upper half of the port range. The port is chosen randomly when your initial PVM daemon starts up the PVM daemon on the new server, and is then 'shared knowledge' amongst the group that your PVM programs (unwittingly) use when sending work to the remote machine.

In these enlightened days, of course, running a machine without iptables blocking everything that you don't explicitly trust would be considered grounds for sectioning by some. Of course, if you're running a PVM cluster, you would normally put all the machines on your local LAN, put a big firewall at the door (or, even better, remove any connection between the LAN and the internet), remove iptables and anything that might slow the network down, and go for it. Fair enough.

Recently, here at the CLUG we have a loaner IBM Power5 machine, and the irrepressible Steve Walsh has a 60 day free trial of a Sun Sunfire. It's more multiprocessor power than I've ever had ssh access to, but of course not only do I have to do something useful with this power (rather than just running but both machines are on the other side of campus; so they're both naturally running firewalls with everything locked down. Neither Steve nor Bob is going to be interested in opening UDP access to the upper half of the spectrum for a system which looks suspiciously like Sun RPC. Nor are they going to be interested in setting up IPSEC to VPN my machine to theirs. PVM, after all, runs as a user process for a good reason - separation of privilege. Let's keep it that way.

Now the applications I've written and the ones I'm trialling exclusively do a scatter/gather operation - a 'master' node starts a bunch of 'slave' nodes, sends them all a bunch of work to do, and collects the results. The slaves only communicate with the master. Other topologies are, of course, possible in PVM, but the tasks I'm working on (mandelbrot sets and sequence comparison, for example) are suited to this star topology. This means that, as far as I'm concerned, I only have to provide a pipe between the master and the slave; this may be suited to ssh's port forwarding capabilities.

PVM, when starting the daemon remotely, ssh's in (using keys to avoid password prompts) and starts the remote daemon. I renamed my ssh to /usr/bin/real-ssh, put a script in as /usr/bin/ssh that echoed the date and the command line to a file, and then called the real ssh. (I also managed to create a recursive bomb when I forgot to change the script to use the real ssh program instead of itself... but that's typical for me.) The command line it uses is:

ssh remote $PVM_ROOT/lib/pvmd -s -d0x0 -nremote 1 c0a81701:8665 4080 12 c0a817fb:0000

$PVM_ROOT/lib/pvmd is actually a script which finds the correct pvmd binary, makes sure that its environment is correct, and runs the daemon. It includes the amusing little note:

export PVM_ARCH
# make a joyful noise.

-nremote is the name of the remote machine's daemon; my guess is that -s is the flag to say "you're the remote side over there". I guess that the 1 after -nremote is the number of processors the remote side has. Everything else is unexplained; there doesn't appear to be any documentation on the flags to the pvmd3 binary. So it's time to trawl the source, and try some experiments.

So far I've discovered that the :8665 part is the UDP port of the source machine, in hex. The UDP port of the remote machine must be communicated through the ssh connection, because when I exclude ssh from the ethereal session and add the host, the first thing I see is the local machine talking to the remote on the new high port. I'll have to look at being more invasive with my fake ssh, and trying to log the entire session bidirectionally. Oh goody.
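To make poking at those arguments easier, here's a small sketch in C that pulls a token like c0a81701:8665 apart. The port being hex is what I've actually observed; treating the eight hex digits before the colon as a big-endian IPv4 address is only my assumption (it does come out as a plausible 192.168.x.x address), and parse_pvmd_addr is a name I've invented, not anything from the PVM source.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse a "hhhhhhhh:pppp" token as seen on the pvmd command line.
 * The port field is hexadecimal (observed); reading the first field as a
 * big-endian IPv4 address is an assumption, not documented behaviour.
 * Returns 0 on success, -1 if the token does not match the pattern.
 */
int parse_pvmd_addr(const char *token, unsigned char ip[4], unsigned int *port)
{
    assert(token != NULL && ip != NULL && port != NULL);
    unsigned long addr;
    unsigned int p;

    if (sscanf(token, "%8lx:%4x", &addr, &p) != 2)
        return -1;

    ip[0] = (unsigned char)((addr >> 24) & 0xFF);
    ip[1] = (unsigned char)((addr >> 16) & 0xFF);
    ip[2] = (unsigned char)((addr >> 8) & 0xFF);
    ip[3] = (unsigned char)(addr & 0xFF);
    *port = p;
    return 0;
}
```

On that reading, c0a81701:8665 is 192.168.23.1 port 34405, and c0a817fb:0000 is 192.168.23.251 with no port assigned yet.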

This may all be futile anyway: ssh's tunnelling abilities only act in the way of a proxy. Since the remote PVM daemon is wanting to send UDP packets to the local machine directly, and not via another port on the remote machine, it may be that I can't do what I want with ssh anyway. Even if I added the ability to use a proxy for the remote end (which is stretching the bounds of my C hacking skills), I'd still have to make the local PVM process use a proxy, and possibly a different proxy per connection. And it still doesn't cover the case of true multiprocessing, where messages pass between slaves. Maybe IPSEC is the only way to go here...

Last updated: | path: tech | permanent link to this entry

Fri 28th Apr, 2006

More research...

Hmmm - maybe I don't want Bluetooth, maybe I want Wireless USB. Of course, it's more pie-in-the-sky than existing technology at this stage, but the bandwidth (480Mbit/sec unwired!) is easily enough to just send a 48kHz 16-bit stereo pair (about 1.5Mbit/sec) unencoded across the wire. Hell, the microphone and the palm player could both be talking to the WUSB base station simultaneously and it wouldn't sweat at all. (I could send multiple HDTV streams across it and it would be only marginally irritated).

Of course, the distributor here in Australia doesn't have the headphone kit that I want for Kate in stock; they only have the more irritating 'clip onto the ears and attach with a loose cord' style. And they want $200 for them. I wonder what their return policy is...

Last updated: | path: tech / ideas | permanent link to this entry

Thu 27th Apr, 2006

The Dance Caller's Gadget

I, as some people may have noticed, teach Irish Set Dance [1] - er, Dance. To do this without developing a voice that can kill at ten paces, I bought a PA system with a built-in wireless microphone, CD player and echo (!), which can run off its own internal batteries. Combined with my music player, this means that I can go pretty much anywhere [2] to do a dance.

Of course, the Karma is plugged in via line-in, and it doesn't have a remote control, so I have to race back to the player to turn it off, all the while avoiding getting too close to the speaker to set up a quick bout of ear-pulverising feedback. I've got a belt-pack and somewhat uncomfortable headset to wear, which has some niggling internal fault that causes its gain control to not work, meaning that 2.11 on the dial on the back of the PA system is too soft to hear, and 2.13 is dangerously loud. I also occasionally have to carry around an index card with the notes for the dance on it, due to Irish Set Dance's one consistency: there's no rule about how a particular movement is done that isn't broken by at least one historical set. (Occasionally I even have to refer back to the book because something isn't quite clear in my notes, and sometimes even the book doesn't clarify it perfectly...). If I had a remote control, that'd be another thing to carry.

A while ago I started work on the Irish Set Dancing Markup Language, my "what's a DTD?" attempt at writing an XML specification for encoding set dance notes. (As an aside, here, I think that Irish Set Dancing and American Contra are the two forms of dancing that programmers and techies grok best: they involve keywords that code for a set of specific, usually standardised movements, they have recursive and iterative structure, and you almost always get to dance with members of the opposite sex.) The idea was that, with an appropriate browser on a palm computer, you could get the entire instructions for a dance in a modest size; and you could increase or decrease the complexity using some simple controls. You might have "{A&R, HHW}x2", but at the click of a button it turns into "Advance and Retire (in waltz hold) once, then House Half Way (in waltz hold), and repeat those two to get back to place". Sometimes all you need for the same instruction is "Slides". The ISDML was to try and give a short description at each level, so that each subgroup would have an abstract when 'rolled up'. If someone better at speaking XML and with time on their hands could email me, then I'd appreciate it.
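The "click of a button" expansion can be sketched in a few lines. This is only an illustration under loud assumptions: it parses the bracketed shorthand as a plain string rather than any real ISDML XML, the abbreviation table holds just the two moves from the example, and the wording it produces is much cruder than a proper set of notes.

```python
import re

# Abbreviations taken from the example above; a real table would be much longer.
MOVES = {
    "A&R": "Advance and Retire",
    "HHW": "House Half Way",
}

def expand(shorthand: str) -> str:
    """Expand '{A&R, HHW}x2'-style shorthand into spelled-out instructions."""
    match = re.fullmatch(r"\{(.+)\}x(\d+)", shorthand.strip())
    if match is None:
        raise ValueError(f"unrecognised shorthand: {shorthand!r}")
    # Look up each comma-separated abbreviation in the table.
    names = [MOVES[abbr.strip()] for abbr in match.group(1).split(",")]
    return f"{', then '.join(names)}; do all of that {match.group(2)} times"

print(expand("{A&R, HHW}x2"))
# Advance and Retire, then House Half Way; do all of that 2 times
```

A real version would want a description at each level of detail per move, which is where a structured format earns its keep over a lookup table.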

And now that palm computers can play standard media files like mp3s (and possibly oggs), we can start to construct a palm device that can act as a reminder card and a music controller. I think what I need next is a bluetooth audio interface - a bluetooth device that provides an audio plug (two RCA sockets, a 1/4" jack, a 1/8" stereo jack, whatever), so that the palm computer can send its audio across the room wirelessly to my PA system. If the palm computer could also simultaneously have a connection to a bluetooth phone headset - i.e. a wireless microphone - then I'd throw all the other stuff away. Hell, half of this my Nokia 6230i (link not shown because it requires MacroDictator Flash 8) could do.

I'd be willing to pay $1000 for software that could do all this, and would open source it with whatever license you want. Anyone interested?

1: While 'Safe For Work', this picture may cause involuntary vomiting and the uncontrollable desire to poke one's eyes out. Be careful. And don't ask why all eight men in the set are dressed up as women. It's safer not to know.

2: A set dance weekend away camp at Katoomba YHA, where I'm told that they have a big wooden floor just right for doing set dancing on? Why, whatever made you think of that idea?

Last updated: | path: tech / ideas | permanent link to this entry

Bluetooth Audio Devices

Regarding my quest for bluetooth devices that can send and receive audio, I've found the Taiwanese manufacturer BlueTake, with their distributor in Australia being IP Depot (IP here standing for Innovative Products, natch). The bluetooth headphone pair BT-420EX with BT430 transmitter dongle looks particularly interesting, as Kate has been wanting a set of wireless headphones for TV watching that aren't heavy, ugly things like the Dick Smith monstrosities (those are the best we've seen so far, and they've still been uncomfortable, heavy, and bulky).

Time to try the idea out on Kate...

Last updated: | path: tech / ideas | permanent link to this entry

BeOS, Haiku, what else?

First there was BeOS;
From its ashes came Haiku.
Where do we go now?

I played around with the free distribution of BeOS R3. It was cool, although (like any shift to a new and quite different User Interface) it took a while to get used to how things were done. There was this feeling inside it (or inside me) that an "out with the old, in with the new" approach to Operating Systems was needed in the industry around 1998, as the kludge of Windows 3.1 on DOS 6, and Windows 95, and the growing snarl of problems that was Mac OS System 7, threatened to choke user interfaces in the legacy of their own dim, dark pasts.

One of my lecturers in 1994 talked about a plan (by HP, I think) to have processors of 1GHz, with ten cores per processor, with ten processors per machine, by 2010. We can see the tip of it now, with processor speeds peaking at around 2GHz to 4GHz and more work being done on making multi-core chips and multi-processor boards. For Be to say in 1991 that they were making a full multi-processor OS for consumers, complete with multi-processor hardware to run it, was daring and inspirational. To talk about getting rid of the legacy of single processors and dedicated eight-bit hardware and kludgy file system designs could only be a step forward.

Cut to now. Be doesn't exist. A group of committed enthusiasts are working on Haiku, an attempt to build the BeOS that was hinted at in R5, working not from any Be source code but from the release of the BeOS APIs as codified in R5 Personal Edition. Blue Eyed OS and Cosmoe are other projects attempting the same thing but Haiku seems to have the most support. They can do this because BeOS was modular, so as they write each unit they can put it into the rest of the OS and see how it behaves. (Try doing that with a more expensive OS.)

Certainly one of the things that strikes me about the current state of play with commercial and free OSes is that the common thing they have is legacy code and legacy APIs, and in some cases legacy hardware, to support. Apple got caught in that trap back in the 1990s, where System 7 had to support the possibility of running on a Mac Plus and a Mac IIfx, which were quite different processor architectures. Now they only support a small group of relatively similar architectures. Microsoft is caught in a similar trap, with people trying to install Windows XP on Pentium IIs with 64MB of memory. I'd feel sorry for them both if it wasn't for the fact that Linux can run on most of these architectures almost equally well.

As far as I can see, this is because the Open Source community surrounding the GNU-Linux Kernel and the various distributions on top of it are relatively quick to take in new ideas and throw away old systems if the new one is better in some tangible way. Rather than some manager calling a meeting and starting a three month process to evaluate the stakeholders and maintain shareholder value, someone with a better way of doing something comes along and writes the code to do it. If this is seen to be better, it's included. These days they register a domain name and put up a web site and make it easy for other people to contribute - acknowledging that, although they might be expert in the field they're working on, other people are too.

But I still wonder, looking at the other OSes out there, if there are still legacy bits of code in GNU-Linux that are slowing things down. I can't help but look at the horrible experience I've had trying to get printing to work on my brand new install of Fedora Core 5, or the hassle I have trying to get Bluetooth to do anything more complex than find out the equivalent of a MAC address on my phone, and wonder what's holding these things up. Programmer time, to be sure. But are there people being told "No, we can't just scrap the old Berkeley LP system, we've got to work on top of it" or "You have to integrate bluetooth into a system designed for 2400 baud modems"? Is Fedora, or Debian, or Ubuntu, being held back in producing an OS that comprehensively and without question whips Microsoft's and Apple's arses to a bleeding pulp because mailing lists and IRC channels and web forums are clogged with old command-line hackers who refuse to grant anyone the ability to use a mouse or talk to their new mobile phone because "arr, in my day we din't 'ave none of that fancy wireless stuff, we had to toggle the opcodes of the boot loader in by hand, uphill both ways, and we enjoyed it!"?


Please mail me (or, as my fingers originally typed in a subconscious forecast of doom, 'maul me') with your opinions. I'm interested to know what you think holds GNU-Linux up from real World Domination.

Last updated: | path: tech / ideas | permanent link to this entry

Wed 26th Apr, 2006

Learning SELinux-fu 101b

Incidentally, this is much better than the audit2allow method of fixing this problem, which just blasts a new rule to cover that specific case in. This might solve access to the directory but not then allow access to the files therein, requiring further audit2allow calls to fix, and so on. You're better off finding out what the original policy was for this daemon and then adding a new rule that covers your new configuration.

It seems so easy, I wonder why I haven't found it before...

Although I still want to know where the policy rules file is so I can make sure it's backed up...

Last updated: | path: tech / fedora | permanent link to this entry

Learning SELinux-fu 101

Today I get to play around with SELinux all day, because today I'm trying to get all the services running that need to be. Because I've moved some of the directories around (to put all my data on /opt, out of habit), and restored some of those directories from DAR backup, not only do the files not have the right contexts but the rules for determining contexts aren't in place for those new directories. So, after a bit of Q&A time with the folks in #selinux on, I worked out how to use the semanage command.

semanage fcontext -l | grep mysql
told me what I needed to know about the existing context rules. With a bit of copy and paste,
semanage fcontext -a -t mysqld_db_t "/opt/mysql(/.*)?"
restorecon -v -R /opt/mysql
installed the new rule and relabelled the files in the /opt/mysql tree to match. Finally I found out that I had to put the [client] section into the /etc/my.cnf file, with a socket line to tell it to look in the new path for the socket, and all was well.
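The "(/.*)?" on the end of that pattern is what makes one rule cover both the directory and everything under it. Here is a rough sketch of which paths it would catch, using Python's re module as a stand-in; that is an assumption of convenience, since SELinux uses its own regex matching, and the example paths are invented.

```python
import re

# The pattern from the semanage command above. File-context patterns are
# matched against the whole path, which re.fullmatch approximates here.
PATTERN = re.compile(r"/opt/mysql(/.*)?")

def rule_covers(path: str) -> bool:
    """Return True if the file-context pattern would apply to this path."""
    return PATTERN.fullmatch(path) is not None

# Hypothetical paths: the directory itself, a file inside it, and two near misses.
for path in ["/opt/mysql", "/opt/mysql/data/ibdata1", "/opt/mysql-old", "/opt"]:
    print(path, rule_covers(path))
```

The first two paths are covered and the last two are not, so a sibling directory that merely starts with the same name is left alone.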

Ironically, the server was starting just fine; it was the 'check that the server is now running' part of the script that was failing. It took me a while to work this out... :-/

Last updated: | path: tech / fedora | permanent link to this entry

Fri 21st Apr, 2006

A tiny bit more pressure not to be stupid

Reading the actual article on the Philips patent for a device to stop people changing channels when there's advertising on, I came up with another group who should be worried about this invention.

Program Producers.

Think about it: if people can't change channels during an ad break, then when can they change channels? During the show. Ratings will be homogenised, because the ratings organisations count you as watching a program if you tuned in to any part of it; so as everyone frantically skips around, hoping not to be locked into an ad that's showing on the channel they're surfing to, every program gets flagged as having been watched. And ratings are what make programs popular; it's hard to get more money from your show if everyone else's show seems to be as popular.

But the main people won't care: TV manufacturers will bang it all in regardless if Philips gets smart and starts offering all-in-one digital TV chips which obey their program coding, and TV broadcasters don't care one whit about the sanity or interest of their audience. As I said in my other post, they'll probably find ways to abuse the flag so that you can't change at all, anytime, ever.

Last updated: | path: tech | permanent link to this entry

Thu 20th Apr, 2006

The Price Of Fitting In

There's a big empty spot in the garage now. It used to be filled with eight monitors, half a dozen computers, and a few miscellaneous bits and pieces of hardware that I'd been meaning to build into firewalls and other machines for friends and family. But finally I realised that I wasn't actually inspired to get a bunch of decrepit Pentium Is and IIs working, and Kate convinced me that it was better to take them to a Computer Charity (called, coincidentally, Charity Computers) and let them (with their boxes full of RAM and racks full of hard disks and their volunteer labour) take care of it.

I came away relatively unscathed. It only cost me $75 for them to take the four non-working monitors to a Better Place; the rest was gratis. That's OK, I can take that $75 out of Trevor's hide since he was the one who dumped them on me in the first place ("I've been keeping them in my garage for the last three years - do you want them?"). Of course, what do they put on them? Microsoft Windows 2000. They've got a special 'distributor' license or something.

It's the Microsoft Standard. Everyone uses it, so everyone has to learn it. Charity Computers are dealing with people on unemployment cards who need to learn modern computing skills to get into the workplace. Since half of these people are still at the "Why is turning it on so difficult?" stage, making them learn another operating system for work versus the one they have at home is allegedly doing them no favours. It's what everyone's familiar with, after all.

I can't deny that it's popular. Pity that that popularity is because of strong-arm business tactics and dirty dealings, rather than being technically and operationally superior (i.e. easier to use). Pity that the Windows 2000 interface is quite different from the Windows XP that the big businesses are now using at work. Pity that Windows 2000 support is being phased out. Pity that they're probably not supplied with firewalls or anti-virus software - they're probably lucky to be up-to-date with patches. (I think they said they installed OpenOffice, but who knows? Maybe the poor unfortunates get Microsoft Works.) Pity that all the latest games and software probably won't run on their old OS, even if they pay the $450 or so to get a modern machine - so they're still going to have to pay the Microsoft Tax sooner or later...

So, overall, I don't think they're doing as many favours as they think they are. Do they offer any training on how to use these new computers? What about support and maintenance, when they get a virus or have a breakdown? At least the people there are learning how to repair computers. But I have this fear that, like many Christian projects (of which this is one), they're going more for good intentions and less for full plans and consequences taken care of.

And I still don't see what the big problem with Linux in that environment is. It still won't be able to play all their favourite games, and it'll look and feel a bit different compared to the operating system that they'll probably be using at work. But it'll have an industry-proven firewall, pretty much no virus susceptibility, a whole range of free software just waiting to be installed on it, and it comes with no hidden Microsoft Taxes. How can you say no?

Last updated: | path: tech | permanent link to this entry

Watch The Screen, Big Brother Commands You!

Thanks, Chris Samuel, for pointing out the patent for a TV advert enforcer that would stop you changing channels during ads. But why stop there? Why not have the device not allow you to change channels, ever! The broadcaster certainly doesn't want you to change, and they're the ones that are going to be broadcasting the 'lock channel' flag... Let's go further! Have a TV that doesn't allow you to switch it off! Let's make it force you to watch by requiring you to strap yourself in to watch. Let's make people who buy the TV sign a contract that says they will watch TV all the time. Let's make the contract force them to go out and buy stuff we advertise! What a wonderful idea!

It's times like these that I really seriously wonder about how people get to be adults. This is not something that just popped out of some automated slot: a number of people ranging from techs to management went through the entire process of making and filing this patent. Did any one of them think "But I wouldn't want it in my house"? Did they think "There's a lot of people that wouldn't like this device"? In the face of these personal ethical questions, why did they continue?

I know people that are convinced that Bill Gates is completely evil; that he's created a business whose sole purpose is to make life miserable for anyone who comes into contact with them. Now, this is obviously stretching the truth. But did Bill, at any point where Microsoft was grinding Netscape or Digital Research into the dust, say "You know, we're probably not going to get away with this." Or "We can't rely on a monopoly position forever - some day we're actually going to have a product that people want to buy, rather than are forced to because we've stuffed it down their throats.". Or "You know, if we keep shafting competitors, users and others in the industry, some day we're going to meet an organisation that's more powerful than us and are going to do the same thing to us. Maybe we should play fair to start with."

Coincidentally I got talking with a friend last night about whether the sixth book of the Lord Of The Rings sextet should have been put in the last movie. It's the part where the hobbits, after their part in the more ethereal battle of Good against Evil, come back and find that the Shire has been taken over and they have to fight the fight at home as well as off in Far Off Lands. (I don't think it was necessary - to me that whole section raised Frodo's determination and stamina from the heroic to the unbelievable; but that's another argument.) And this, I think, is a good point for us in the Open Source world to think on: that we work with an operating system and (for many projects) a set of principles that encourage us not to bow to any master, to not give in to any evil even if it seems convenient at the time.

I think part of the problem with the world today is that people don't think they need to fight any fight - that they can be excused for devising machines to kill and maim and brutalise people, or to screw over thousands of people in third-world countries indirectly by purchasing cheap foods and goods, because they haven't been doing the nasty work directly. The people who work for companies like Nestle, Microsoft, Enron, Union Carbide (responsible for the Bhopal disaster) and so on - from the people who turn on the valves and make the milk formula to the marketers and executives who can see the health problems and suffering of millions of people as just another marketing opportunity - how is it that they can sleep at night with their consciences at peace? Only by pretending that they're not directly responsible.

We all have a responsibility. We all have the choice. It's up to us to choose, as often as we can, the most ethically, environmentally, socially and morally responsible options that we're presented with. Yes, we sometimes have to weigh up the balance of several of those options and make our own choice. But that is still our responsibility; our choice in these situations makes up the moral, ethical, environmental and social temperament of our society. We need to work constantly to keep ourselves informed of the true impacts of our decisions. We make this decision in our own workplace as much as at home; we cannot assume that it'll be somebody else's problem. If all the workers voted with their feet and left organisations that were doing irresponsible things, then there'd be a lot less 'evil' companies around.

Go Google!

Last updated: | path: tech | permanent link to this entry

Mon 17th Apr, 2006

Why I Love The 'Linux' Community part 001

(I put Linux in quotes there because this applies to the wider Open Source and Internet communities, but more to some parts than to others, and trying to fit that in a heading is too much work.)

The problem I had with SELinux is apparently not so uncommon. Someone else, in searching for it, found my blog entry. They then found the solution. And then they posted the solution in and emailed me to let me know what he'd found. It's this kind of helpfulness that, I think, characterises a good community; and it inspires me to post more solutions and to help other people more. Thank you!

Last updated: | path: tech | permanent link to this entry

Wed 12th Apr, 2006

Why I Love Computers Part 001

At my work I've written a program which is designed to find 'conserved sites' in DNA - places in a set of sequences that don't change much. It uses a variety of different methods of ranking sites and chooses the top N based on which ranking method you pick. In order to test this I wrote another program to pick random subsamples of a set of viruses, and Mark asked me to compare how many sites chosen by each of the methods were in the full sequence, for each of the five random samples and four random sample sizes (5%, 10%, 20%, and 50%).

So naturally I wrote another program to do that calculation for me (after doing about half of a page of 20 by hand, throwing the pencil across the room and saying "Why am I doing it the hard way?"). (This took half an hour to write the program, versus more than two hours to do it by hand - always a good sign.) Then I wrote another program that expanded further on that and did the entire statistical summary for me, being able to tell the difference between each data file by its name (which I had kept to a good pattern, based on long experience). This output its results in CSV format, which I loaded into OpenOffice Calc and used to make nice graphs.
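The calculation I'd been doing by pencil boils down to a set intersection. A minimal sketch, assuming each method's output is simply a list of site positions; the function name and the numbers here are made up for illustration and are not from the real program.

```python
def sites_in_common(sample_sites, full_sites):
    """Count how many sites chosen from a subsample were also chosen from the full set."""
    return len(set(sample_sites) & set(full_sites))

full = [12, 40, 77, 103, 150]    # hypothetical top-5 sites from the full sequence set
sample = [12, 41, 77, 150, 212]  # hypothetical top-5 sites from one random subsample

print(sites_in_common(sample, full))  # 3
```

Run that over every method, sample and sample size, and you have the table that took two hours by hand.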

We took the plots of three of these comparisons to a statistician at ANU who said that it sounded like we were being reasonable with numbers and agreed that the method that I didn't like didn't make a lot of sense. Later, Mark asked me to produce the plots for the other two comparisons. Rewriting the command line to do the other two plots and running the program took ten seconds. That's the sort of work I like to do.

Last updated: | path: tech | permanent link to this entry

Mon 10th Apr, 2006

SELinux Strikes Back!

Now I've found that the reason my USB hard drive isn't mounting is a string of SELinux denial messages:

Apr 10 09:37:52 biojanus kernel: audit(1144625872.015:1953): avc:  denied  { getattr } for  pid=2316 comm="hald" name="/" dev=dm-1 ino=2 scontext=system_u:system_r:hald_t:s0 tcontext=system_u:object_r:file_t:s0 tclass=dir
Apr 10 09:37:52 biojanus kernel: audit(1144625872.015:1954): avc:  denied  { getattr } for  pid=2316 comm="hald" name="/" dev=sdc1 ino=2 scontext=system_u:system_r:hald_t:s0 tcontext=system_u:object_r:file_t:s0 tclass=dir
Apr 10 09:37:52 biojanus kernel: audit(1144625872.167:1955): avc:  denied  { getattr } for  pid=2316 comm="hald" name="/" dev=dm-1 ino=2 scontext=system_u:system_r:hald_t:s0 tcontext=system_u:object_r:file_t:s0 tclass=dir
Apr 10 09:37:52 biojanus kernel: audit(1144625872.167:1956): avc:  denied  { getattr } for  pid=2316 comm="hald" name="/" dev=sdc1 ino=2 scontext=system_u:system_r:hald_t:s0 tcontext=system_u:object_r:file_t:s0 tclass=dir
Apr 10 09:37:52 biojanus kernel: audit(1144625872.335:1957): avc:  denied  { getattr } for  pid=12272 comm="hal-system-stor" name="/" dev=sdc1 ino=2 scontext=system_u:system_r:hald_t:s0 tcontext=system_u:object_r:file_t:s0 tclass=dir
Apr 10 09:37:52 biojanus kernel: audit(1144625872.335:1958): avc:  denied  { getattr } for  pid=12272 comm="hal-system-stor" name="/" dev=sdc1 ino=2 scontext=system_u:system_r:hald_t:s0 tcontext=system_u:object_r:file_t:s0 tclass=dir
Apr 10 09:37:52 biojanus kernel: audit(1144625872.339:1959): avc:  denied  { search } for  pid=12280 comm="touch" name="/" dev=sdc1 ino=2 scontext=system_u:system_r:hald_t:s0 tcontext=system_u:object_r:file_t:s0 tclass=dir
... and so on. Now to figure out how to fix it...
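When staring at a wall of these, it helps to pull out just the fields that matter: who was denied what, on which kind of object. This is a rough sketch with a regex I made up for this log format; it is not an official audit parser and may well trip over other message styles.

```python
import re

def parse_avc(line: str) -> dict:
    """Pull the denied permission and the key=value fields out of one avc line."""
    perms = re.search(r"denied\s+\{\s*([^}]*?)\s*\}", line)
    # Values are either double-quoted strings or runs of non-space characters.
    fields = dict(re.findall(r'(\w+)=("[^"]*"|\S+)', line))
    fields = {key: value.strip('"') for key, value in fields.items()}
    fields["denied"] = perms.group(1).split() if perms else []
    return fields

# The last denial from the log above, split only to keep the lines short.
line = ('Apr 10 09:37:52 biojanus kernel: audit(1144625872.339:1959): avc:  denied  '
        '{ search } for  pid=12280 comm="touch" name="/" dev=sdc1 ino=2 '
        'scontext=system_u:system_r:hald_t:s0 tcontext=system_u:object_r:file_t:s0 tclass=dir')
info = parse_avc(line)
print(info["denied"], info["comm"], info["scontext"], info["tcontext"], info["tclass"])
```

Read that way, every message says the same thing: a process running as hald_t was refused access to a directory labelled file_t on the USB drive.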

Last updated: | path: tech / fedora | permanent link to this entry

To run ISP: require one TCP/IP Networking License.

I've been trying to repay my IRC debt by hanging around on #samba on and trying to help people with simple problems. It feels a bit like the blind leading the blind, or the bland leading the blonde, but I've helped one or two people so far and that's been good.

Then I started chatting with a Jordanian who is using an ISP called Horizon Satellite Services (, range It seems that the people running this ISP haven't ever been to a basic TCP/IP course, because they're allocating all their internal subscribers addresses in the 90.1.x.y range. Yes, that's right, not 10.x.y.z - 90.1 is a range owned by Wanadoo in France.
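If you want to check an address like this yourself, Python's standard ipaddress module knows which ranges are reserved for private use. A small sketch; the host parts of the two example addresses are invented.

```python
import ipaddress

def is_private(addr: str) -> bool:
    """True if the address falls in a range reserved for private networks."""
    return ipaddress.ip_address(addr).is_private

print(is_private("90.1.2.3"))  # False
print(is_private("10.1.2.3"))  # True
```

Anything in 10.x.y.z is safe to hand out internally; 90.1.x.y is ordinary public address space that belongs to somebody else.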

The depressing thing is that the guy I was chatting to was using Ubuntu. He's trying to share files using Samba so that other people on the ISP can read them. He has no idea if there's a firewall in Ubuntu, or how to get at any of the graphical tools to configure Samba, or even why 90.1.x.y addresses on his local network were a canonically Bad Thing. At that point I lost all interest and pointed him in the direction of Eric Raymond's How to ask questions the smart way, since he seemed to think that I had enough time to hold his hand through every single step. He finally decided it was probably easier to set up an FTP server.

Coincidentally, yesterday I was talking to a friend of mine who's just come back from living in Oman for two years. Her opinion of the problem with the Middle East was that the entire Middle Eastern, and Asian, philosophy of learning is to learn by rote and copy exactly. Never try to think through a problem yourself, never question or ask if something seems illogical. Everything is justified in terms of what went before - what historical precedent can be warped into suiting the new situation. Maybe I'm harsh in categorising this guy as symptomatic of that whole problem - Trellis knows that there are enough morons who expect to be spoon fed through every challenge in Australia, let alone anywhere else in the world. But the more people expect to just not know, to show no interest but to demand results, the worse the world will get...

Last updated: | path: tech | permanent link to this entry

Fri 7th Apr, 2006

I have a sudden craving for purple rackmount cases...

In wandering around looking for where to submit a bug about CFS to, I stumbled across the pages of Spinserver. No, it's not a daemon to allow you to run spin locks faster; these are rackmount cases where the 'back' of the motherboard - the bit with all the ports on it - is at the front. Despite it flying in the face of tradition, there's a lot to recommend this configuration:

For a start, it's a lot cooler. With the back being almost entirely comprised of large fans, you can move serious amounts of air through these cases. The 3RU cases can get up to 3 cubic metres of air per second - something I have a hard time just imagining. And, because the only cable at the back is the power cable, most of the clutter that slows air down back there is eliminated. You can even put the entire rack unit against a wall, or back to back with another, since you pretty much don't need access to the back for anything short of complete disaster.

Since all the cabling is at the front, you can do all your patching in one place. That's where the ports on your network switch are, right? You can even get rid of KVM switches by simply plugging the head monitor, keyboard and mouse cables in directly, saving a lot of extra clutter and making it bleeding obvious which machine you're working on. You get all the diagnostic lights at the front, too, so it's much more obvious when a cable isn't plugged in quite right.

And to cap it all off they're made from top-quality components right here in Australia. I want one so badly now! I wonder if they do a 2RU case that'll take two Via EPIA motherboards side-by-side...

Last updated: | path: tech | permanent link to this entry

Keeping your coding directory secure...

I've used the Cryptographic File System for a while now. It's a bit of a misnomer, though, as it's not really a separate partition with its own filesystem encrypted somehow (of which enough exist anyway).

You have a directory whose entire contents are encrypted, although the files are still plain files with encrypted content. You perform a specific cattach command, and this creates a directory under a special NFS mount point. This directory is the unencrypted version of your encrypted directory: a read goes through NFS to the local cfs daemon, which fetches the block you want and decrypts it. Writes happen in a similar but reverse fashion. So your files are never ever seen as an actual cleartext sitting on an unencrypted filesystem. There's a further variant where the NFS sees the encrypted file and not the unencrypted one, and which therefore allows you to run encrypted shares across a network (whereas CFS would leak like a sieve).

The minor problem is that there's a bug on the x86_64 platform. I can create an entire new encrypted directory and the password is correct, but the cattach command just says cattach: incorrect passphrase. I don't know exactly where it's caused; I haven't scrutinised the code. (Again the tyranny of Open Source... :-). I don't know if this affects the PPC architecture as well. But for the rest of us running on i386, it's a very useful way to keep those little bits of your filesystem that you don't want to share with others nice and secure.

Last updated: | path: tech | permanent link to this entry

Scratching the Surface of SELinux

In restoring the server to its former glory I needed to restore the installation of Lahey Fortran for our old Fortran programmers. (With Fortran programmers, as with Fortran, one learns to not question why they need to do something, or to ask them to learn something new, but to just work with what they're giving you. I'm not about to start asking them why they can't write their programs in some version of Fortran that the GNU Fortran compiler supports...)

Unfortunately, running the program gave me the error "library name: cannot restore segment prot after reloc: Permission denied. ". A Google on this error message showed me that it's caused by SELinux, which doesn't just allow anyone to come along and install new shared object libraries - you have to make sure that they're set to be a shared library (type 'texrel_shlib_t'). So applying the command 'chcon -v -t texrel_shlib_t /path/to/library' made everything suddenly work. And made me learn a little more about how SELinux sits in with all the other parts of Linux.

Learning: it's a good thing.

Last updated: | path: tech / fedora | permanent link to this entry

Where's the Bloody Bluetooth?!

Resisting the temptation to rant, I'll just say that I'm struggling to get bluetooth working under Fedora. I can get the hci device started (hciconfig hci0 up, surprisingly enough!) and see the phone via hcitool (and another device only identified as "SuperDuper", which no-one in the lab is owning up to owning ;-). I can even get an RFCOMM device connected to the phone in question using sdptool. From there it goes downhill.

Gnome-bluetooth-manager doesn't see anything when I do a scan. Gnokii seems fixated on /dev/ttyS0, and infuriatingly impossible to move to any other TTY port. There doesn't seem to be a lot else to use. A quick google doesn't show up much. I'd appreciate an email from anyone that has been able to successfully transfer their phonebook from one phone to another via bluetooth in Gnome, or Fedora, or Linux, or anything that isn't proprietary.

Last updated: | path: tech | permanent link to this entry

Wed 5th Apr, 2006

Bite the bullet for a cheaper phone

I decided yesterday to swap my old Nokia 6310i for a Nokia 6230i. This was a hard decision that's taken a long time, because I'm quite attached to the 6310i. It's a long phone, which fits well in my larger hands and makes the keypad easier to use, and it doesn't have all these modern folderols - camera, video, colour screen, MP3 player, FM radio and so on - that I don't really need. OK, the MP3 player is cool, and being able to have a ghastly scream or the "Radiophonic Stomach" effect off the Goon Show as a ring tone has been an objective of mine for years, but they haven't been make or break decisions.

What finally brought me to upgrade was the keypad, specifically its gradual deterioration into senility. The 8 and 0 keys no longer always register, and texting away only to find out that it's misspelled something because it didn't know to put a space in there, despite you asking nicely, is really a pain. The desire to change over to a new, cheaper phone plan was another way to entice my sentimental side to go for the new and the swish.

The big disappointment was having the retail Optus outlet that I got it from unable to transfer the old 6310i's phone memory into the new phone. I have to go home, transfer the data off the old phone (presumably using the old crappy Nokia PC-only software, because I haven't got the Linux bluetooth utils working yet), swap the sim over, and swap it back. I have to do this because no sim card yet invented has thought of the idea of having an address book with multiple contact details per entry. Get with the 19th Century here, guys, people worked out how to do this with paper and pencil.

The minor disappointment was the keypad. The keys feel a lot more wobbly and you have to push them fairly squarely in order to feel like they aren't going to spring off the phone like a particularly badly made calculator. From observation, I would say that the old 6310i's keys were rounded so you put force on the middle, whereas these are flat and rectangular so it's easier to get a fingernail near the edge. Gee, that's a feature. I'm glad it looks better than it feels to use...

So I've updated its wallpaper, changed the ring tone a bit, sent a text message and made a phone call. I still haven't found the games yet, which is important for me when waiting for buses or whatnot. And unfortunately the 512MB MMC card and case will come next week. Still, I haven't played with the bluetooth earpiece I got for it, so that may be my next little bit of fun.

Last updated: | path: tech | permanent link to this entry

Tue 4th Apr, 2006

All that is visible must grow beyond itself...

Dumont is now back online, thanks to AusPC Market and AAE. I feel whole again.

Better still, I finally nailed how to get the vt1211 kernel module installed so I can check the temperature of the processor and so forth. I know this probably sounds sooo easy for everyone else but I'll record it here so I can remember it later...

  1. Grab the latest version of the vt1211.c source from Lars Ekman's vt1211 page.
  2. Use the script which I wrote, which contains the commands from Lars' page to make the module against the current kernel source.
  3. Install the kernel module by copying it into the kernel's device module hierarchy:
    cp -p vt1211.ko /lib/modules/2.6.16-1.2069_FC4/kernel/drivers/i2c/chips/
  4. depmod
  5. modprobe vt1211
Now if only I can get the Via Unichrome X drivers installed, it'll be complete. It might even be capable of running a MythTV frontend!

I've also found that scourge of repositories, atrpms (fx: ptui!), has kept its ghastly stranglehold on my base repository, and it's continued to load base modules from atrpms. I can see why people say that the only way to get rid of it is to reinstall from scratch. And so we continue with another round of beating its yum configuration into submission...

Last updated: | path: tech | permanent link to this entry

Mon 3rd Apr, 2006

The Heatsink Fandango part 2

Plug the Via back into its case, start it up, it reboots at exactly the same point as before. The heatsink is now firmly glued to the board; nothing short of actual disintegration is going to get it off again. The whole thing's rooted. Place purchase at AusPC Market for another Via board, this time an 800MHz fanless board with SATA connectors. Sigh as $310 leaks out of my savings.

At least this means that I can now take my brother's 200GB backup drive (when it arrives) and plug it into Dumont (the machine that the Via runs) directly rather than having to use Media or Tangram (the machines with SATA connectors) and NFS shares.

Last updated: | path: tech | permanent link to this entry

Aaaarrgghghgh, the horrors of publicity!

Reading the Planet Linux feed today, I discovered that I'd been added to it. My mild embarrassment turned to utter horror as I discovered that, for no readily apparent reason, every post I've made before April 1st appears on that day at 4:21AM. It's like an April Fool's joke gone suddenly very bitter. Anyone reading Planet Linux now gets an additional 42MB of the finest PaulWay Drivel™ imaginable. Goody.

To regain my honour, I shall commit seppuku with a frisbee, as is traditional among my people.

Last updated: | path: tech | permanent link to this entry

Sun 2nd Apr, 2006

The Heatsink Fandango - or how to take two working computers and get only one

40mm heatsink fans are the bane of my existence. No 40mm fan that I've ever owned has kept on spinning silently doing its job until the motherboard has been retired. Instead, I seem to have to replace them every six months or so, as one after another starts whirring, chugging or just plain stopping. And, since each one seems to have a slightly different way of mounting, I end up with carcinogenic heatsink goo all over the place as I attempt to get a good bond that will transfer heat reliably from chip to heatsink.

I've been going increasingly for the Zalman NB47J Northbridge cooler. I have two systems that I've converted from noisy heatsink fan to silent Zalman blue coolness. Unfortunately my Via Epia M-10000's main C3 processor didn't like it - it would run fine, but under heavy load it would reset. At least the board has thermal protection. When my media machine's heatsink fan died, I knew what I had to do - move the Zalman to media and get a new heatsink with fan for the Via.

I'd bought a rather dodgy but suitably beefy Northbridge cooler plus fan from a computer swap meet as a replacement for the Via's Zalman. You know it's a bad sign when the heatsink is a nice glossy black - not matte black anodised aluminium, but something that's been dipped in paint. OK, maybe reflective surfaces radiate slightly more heat than matte surfaces, I don't know. (Cue half-hour distraction reading pages about what's wrong with the US Army, starting from bad camouflage in deserts through to sleep deprivation, out of date equipment and most of their time spent mowing lawns rather than doing exercises.) But it didn't inspire me with confidence.

My interest took a nosedive when I looked at the mounting system for it. The Zalman, in order to accommodate different locations for the Springy Push-In-And-Lock Things that all Northbridge heatsinks seem to have, has a set of arms that have a hex nut that fits into a channel in the heatsink. You get them into roughly the right position, screw them up until they're stiff but movable, position them exactly without pushing them through, do up the screw until it's tight, and your arms are locked in position. Good grip, good pull on the arms from the SPIALT, no problems.

The same channel was in the new heatsink, but its arms just have a pushed-out lug to keep them in the channel. The springs are supposed to exert enough pressure to push the arms up (at nearly a 10° angle) and hold the heatsink down on the chipset hard enough to get a good thermal contact. Yeah, right!

Sure enough, I tried several times to get it working but it steadfastly refused to be a heatsink for more than about five seconds at a time. Fedora Core 4 would boot up and often not get past loading the vmlinuz file before the machine would happily reset. In one test it was sitting at 40°C in the BIOS PC Health screen, which is far too hot for any chip to be idling at in a Canberra autumn. And each of these tests is carried out by laboriously fitting the board back into its case, plugging everything back in, and going to the other end of the house to plug it back into its KVM port and power, only to have it do a good impression of the Eiffel Tower upside down.

I tried other combinations with other heatsinks, including the one the board originally came with. I tried small and large amounts of thermal paste. I tried no fans, low speed fans, high speed fans. It was obviously a cooling problem, since more heatsink compound, and the presence of a fan, kept it up for a few more seconds. But nothing has restored to it the stability of the Zalman, and the Zalman is now needed in the Media machine.

There are two options left. One is to swap the Zalman back; the last heatsink on the Northbridge of the Media machine was a dodgy piece of flat aluminium with folded sides stuck on with about a five-cent-piece's area worth of heatsink compound, so it can obviously cope with worse heatsink bedding than the Via. But that would leave the glossy black thing with the abysmal fastening mechanism on the Media machine, and I can cope with Dumont being down far better than I (and Kate) can cope with no TV being recorded.

The other is to whip out the final resort: a two-part thermal epoxy resin from Arctic Silver. This should guarantee a good contact with the chip, but at the cost of never, ever being able to take the heatsink off again. So if it turns out that I haven't done quite a good enough job of sticking the heatsink onto the chip, I can't just have another go. It'll be a trip to AusPC Market to buy $300 worth of new motherboard. $300 that I was trying to put into the home loan to be a good boy.

So, here goes. Clean motherboard and heatsink thoroughly, add part A to part B in equal measure, apply carefully to heatsink, stick together and hope...

Last updated: | path: tech | permanent link to this entry

Thu 30th Mar, 2006

How not to install Fedora Core 5

I wanted to reinstall the operating system on my server. After my totally newbie install of Fedora Core 2 test 3, then using the development and bleeding-edge repositories and adgering the Python system (including up2date), and then using atrpms with innocent glee until I found out that they had a totally different and even nastier version of 'bleeding edge', the whole system felt as if it had been limping along being delicately prodded to stay upright. The final straw was the attack on another machine in the lab which, while it didn't seem to have actually broken into my machine, still didn't give me the Ring Of Confidence.

But here's how not to do an install of Fedora Core 5:

  1. Start on the day you've just installed a Java system for another person in the lab - the day before he goes overseas for a conference.
  2. Make sure the last disk image you have is slightly corrupted so that, after getting through all the other packages, Fedora Core says it can't install trivial-program- and has to reboot, leaving the entire install adgered.
  3. Don't check your images against the SHA1SUM file but try for a while to install using the net boot disk off your local hard drive, getting the same install error as above in different places.
  4. Use the i386 net install disk when you want to install an x86_64 server. Waste some more time finding and burning the x86_64 net install disk, as you can't quite get either the internal or the external burner to erase and write the disk correctly. Waste some more time finding this out the hard way by booting off a disk that's not correctly burnt.
  5. When you finally get the system upright, find out that the portable USB disk that has its own power supply seems to not recognise any system you plug it into, with the exception of the Windows XP system that is, for some reason, adgered sufficiently that it cannot see anything on the network anywhere.
  6. Once you've got the disk freed from its enclosure and plugged into a separate machine (because finding an IDE cable and installing it in the actual install machine is 'too hard'), remember to make the temporary logical volume too small to take all the files. If you're lucky, you can catch yourself doing this and re-make it before you actually copy anything across the network.
  7. Remember to install specific rules in your backup scripts that don't backup your Thunderbird email or your scratch directory full of music. Especially, remember not to disable these rules when making that final backup.
  8. Try restoring the differential backup first, without restoring the base backup. Assume that doing the first automatically also does the second. Install the files in the places they're supposed to go. Wonder why half your configuration doesn't seem to be there, or working. Waste some more time redoing it the right way. Here, again, you can be lucky if you learn from previous mistakes and don't just restore over the places things were backed up from but restore to a temporary directory and be selective about what you copy back.
Why am I so stupid?

On the plus side, I learnt a few things:

Last updated: | path: tech / fedora | permanent link to this entry

Why I Love Linux part 001

When I set up my new FC5 installation it allowed me to create a swap partition on my LVM, but the installer didn't have the option to make it stripe across the two drives. It's a trivial thing, I know: the server has 4GB of SDRAM, so it's not really likely to need a large bunch of fast swap space. But I could play with it, so I did.

And the thing I love about Linux is that you have total control. You want to turn off swapping? swapoff -a. There it is, gone, and the system is still mabulating away happily. Delete the old swap partition? lvremove /dev/mainvg/swaplv. Create the new swap partition? lvcreate -i 2 -I 64 -L 4G -n swaplv mainvg to create a 4GB partition striped across the two disks. After a bit of mental prodding I remembered I have to do mkswap /dev/mainvg/swaplv to make the partition look like a swap device, and then swapon -a Just Works. There you go, 4GB of new improved faster swap space. No reboots, no special utilities, no "what's this C:\win386.swp file?". Easy.

Last updated: | path: tech / fedora | permanent link to this entry

Tue 28th Mar, 2006

"Presentation in Python", or "Does Zope solve the problem?"

I've started to use Zope as a web interface for doing CGI stuff. It has a lot going for it - it encourages you to break your page down into little modules that each do something relatively simple, and pull all that together via an HTML (actually HTML plus some extra XML bits) template. It encourages you to separate your presentation and your logic by making it really easy to separate out a little bit of logic (e.g. one that displays a file system date in a nice format) into a separate script rather than jumping through hoops in your presentation template.

Er, hang on, what did I just say? Move a piece of presentation into a script? Does that make sense? I thought we were trying to get away from having presentation in scripts. What I've come to realise is that this is a mantra that people use without actually either meaning it or making sense. It has become a way of pushing whatever technology the proponent is selling, er, giving away. Zope separates presentation from logic, just like Clearsilver, or Microsoft ASP.NET, or PHP, or...

The fallacy here is that it's possible to completely separate presentation from logic. By building a dynamic web site, you're saying that some parts of the presentation are going to change over time, which requires logic to do this. The logic has to know the semantic meaning of what it's switching on and off, and the presentation has to know that some bits of it are going to be switched. Ergo, interlinking. There's no way to avoid it.

Now, it is possible to separate these enough to mean that the logic doesn't write HTML and the HTML doesn't do complex logic. But OTOH the worst examples of pages that are infernally difficult to understand are PHP and ASP pages where the coder (it's never a graphics designer) says "It's too difficult to do this bit of presentation in HTML, I'll just spit bits of HTML out of my code." The worst bits of it turn up on The Daily WTF, but they are the tip of the iceberg that is resting on the vital organs of maintenance web designers everywhere.

What brought on this rant was a brief session on the #zope IRC channel. I asked a pair of questions about how to do fairly simple things (yes, I'm that new to Zope) and every reply was "I'd put that in a Python script". All I was talking about is 'if a picture with this URL (constructed out of a product code) is present, display it now'. Zope provides excellent path introspection in its templating language: I can do exactly what I want with:

<img tal:condition="exists:path:string:container/covers/${item/item_code}.jpg" tal:attributes="src string:/cgi/covers/${item/item_code}.jpg">

But the couple of people I posed this question to suggested that the correct way to do this was to call a Python script that would find out if the file existed and write up the IMG tag for you. One even suggested that the exists:path:string idea was a dirty kludge that was against the spirit of Zope and TAL. Instead, I'm supposed to have a separate piece of code somewhere that is called with an item code (which, to pass it the item code, would require me to do <tal:block tal:replace="python:here/item_code_image(item_code = item.item_code)"> Image Goes Here</tal:block>) and that then mystically returns the correct HTML (i.e. it knows exactly where it's called and what tag it's in). Yeah, like that's reliable design!

The impression I'm getting is that, once again, people are falling for the old trap that moves all logic, and the presentation it controls, out into scripts. The template ends up as a child's handful of calls to the code, which does all the presentation that the template can't be trusted to do. And we're back where we started: locking the designers and artists out of doing what they're good at (designing websites) and making the programmers do everything. Again! As if it was a good idea in the first place.

Now I can see why the web designer in this team of two (that I'm gradually becoming a third of) doesn't want to learn about Zope or TAL. Because the control of what he's actually displaying is off in Code Hell. He not only doesn't want to learn Python, he doesn't have the time to - and he shouldn't have to.

I'm all for separation of presentation and logic, but web designers can cope with simple repeat loops and conditions if they're easy to understand. It's not too difficult. The beauty of TAL, as opposed to ASP and PHP and Perl and all the other things that web pages are built in, is that it has an XML namespace that things like Dreamweaver can respect and leave alone. My partner can understand TAL when I point it out to her, even though she's never used a programming language in her life.

I do agree with Zope's principle of encouraging people to make small scripts to do simple things, and then binding them together in the template. We should resist the temptation to say "But it's easier to do that in <scripting language of choice>". (BTW, can you believe that some lunatics have taken Zope, a Python system, and made it recognise Perl scripts? Because two languages are so much easier to learn than one! And it's so simple to cross back and forth between them in the same system! What were they smoking?)

It's not as if the Python scripts in Zope are particularly simple, either. You have to start by binding and then grabbing a bunch of extra objects that link you into the Zope namespace. You can print HTML out, but then you have to return what you printed because it didn't actually get printed when you thought it did. If your session variable uses a list, you have to buggerise around with it because it's sort of but not quite a mutable object. Yes, of course, it makes things so much easier!

Keep up the work, guys.

Last updated: | path: tech / zope | permanent link to this entry

Mon 27th Mar, 2006

More window effects.

I'm fired up to see what's involved in writing plugins for Compiz and Xgl. It seems to be a pretty good interface - something that's easy to add new effects into. I reckon there's a lot of visual coolness still to be written, and having this kind of environment, as well as the Open Source model to make it easy to learn from the ways of others, will mean that Compiz and Xgl have many more cool effects available than their proprietary Operating System 'competitors'.

One area to explore is the view of the workspaces. Sure, the default out-of-the-box configuration has four workspaces side-by-side, and this neatly maps onto the four side faces of a cube. But what about if you have more than four in a row? What if you have two by two? At one place I worked I had three across and two down, and I've seen some people that have four-by-four. So how does that map?

The shape we need to follow is a toroid. The first idea is to have the workspaces as panes tangential to the toroid. Two windows across get duplicated into four, you see the front of the current workspace and, behind it, a 'back-to-front' view of the same workspace. Three windows becomes a triangle formation, and more horizontal workspaces cause more faces to appear. Two workspaces vertically get seen as a 'two-sided' pane, like a picture with two sides, that gets flipped vertically when you move 'down' or 'up' a workspace. Three vertically is a triangular prism, and so on. There's no 'top' or 'bottom' faces like in a cube, so you can see across to the other workspace on the far side from where you are.

The second idea is to map the workspaces directly onto the toroid, by making them toroidal surface segments. This actually makes the display rules easier, as we don't have to have special rules for handling two workspaces vertically or horizontally. So a two-by-two workspace would have the workspace you just viewed as the nearest outer half of the toroid, the next horizontal space being the furthest outer half, the one 'below' where you were as being the nearest inner half, and the remaining workspace being the further inner half. The transformation animation between flat workspace and toroid surface segment is fairly easy to imagine.

At first glance I thought it would be a bicubic antialiased son of a bastard to do this, but then I realised that what's probably going to happen is that you break the screen up into subsegments and map them into positions on the surface of the toroid. Their location in space is actually fairly easy to calculate, even when they're flying into position to or from the torus. The user could specify the subdivision factor - less than two shouldn't be allowed, to keep the object looking nominally like a torus - and more subdivisions would make the object look more like a toroid at the expense of rendering speed and/or computation power.
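As a rough illustration of the subsegment idea, here is a minimal Python sketch that places the corners of a workspace's subsegments on a torus and interpolates them from their flat positions. It is not Compiz code: the function names, the radii and the choice of which angle maps to horizontal and which to vertical workspaces are all assumptions made up for the example.

```python
import math

def torus_point(u, v, major_radius=3.0, minor_radius=1.0):
    """Map normalised coordinates (u, v), each in [0, 1), onto a torus.

    u runs around the ring (horizontal workspaces) and v runs around
    the tube (vertical workspaces). The radii are arbitrary assumptions.
    """
    theta = 2 * math.pi * u
    phi = 2 * math.pi * v
    ring = major_radius + minor_radius * math.cos(phi)
    return (ring * math.cos(theta),
            ring * math.sin(theta),
            minor_radius * math.sin(phi))

def workspace_patch(col, row, cols, rows, subdivisions=4):
    """Return a grid of 3D vertices for one workspace's surface segment.

    The workspace at (col, row) in a cols-by-rows layout is split into
    subdivisions x subdivisions subsegments, giving a grid of
    (subdivisions + 1) x (subdivisions + 1) corner vertices.
    """
    if subdivisions < 2:
        raise ValueError("need at least two subdivisions to look like a torus")
    grid = []
    for j in range(subdivisions + 1):
        v = (row + j / subdivisions) / rows
        grid.append([torus_point((col + i / subdivisions) / cols, v)
                     for i in range(subdivisions + 1)])
    return grid

def lerp(flat, curved, t):
    """Position of a vertex part-way (t in [0, 1]) from flat to curved."""
    return tuple(a + (b - a) * t for a, b in zip(flat, curved))
```

Each animation frame would call lerp on every vertex with an increasing t, so the cost grows with the square of the subdivision factor, which is the speed-versus-smoothness trade-off mentioned above.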

And this is only the start of the possibilities. Hooray, Open Source and plug-in architectures!

Last updated: | path: tech / ideas | permanent link to this entry

Sun 26th Mar, 2006

I am the worst event organiser of all time.

What is it about me? Do I smell? Do I do something ghastly in public that no-one can bring themselves to tell me about? Do I talk about myself too much (and here I am blogging this...)? Is there some bad thing I do with organising? What is it?

I turned up at 10:20AM outside the right building. I waited until 10:50AM before I set off - alone. I cycled as casually as I could up through Turner, across the barren wasteland that is the GDE, and arrived at Haydon Drive to find Dileepa waiting for me (as he said he would). We then rode on, swapping Sun stories and enjoying nearly falling off my bike. We arrived at the designated picnic spot at 12:00. I ate lunch (Dileepa had had a large breakfast) and met up with Rainer Klein (who had gone kayaking) at 1:00. I then headed over to Das Kleinhaus and Dileepa headed back to Macquarie.

Steve Hanley organises a ride on a rainy day and gets half a dozen people; I organise one on a beautiful fine day and get two. Yay for me.

Last updated: | path: tech / clug | permanent link to this entry

Fri 24th Mar, 2006

The Oldest Trick in the Book

Shortly after I got in to work this morning, I found out that one of the lab machines I administer (running FC4) has been rootkitted. Damn. I feel incredibly guilty for this, as if I've done something personally wrong by not examining the admin logs every day, as if that could prevent such a thing occurring. Fortunately, Fedora Core 5 has recently been released, so I can do my trick from last time - boot up off the network install disk and install from the ISO images through NFS back to the server. I go to work out whether everything has been backed up on the troubled machine, and it's got a screen saver lock. I say to the user, "Can you type in your password?" and he says "Oh, it's just the same as my username."

Oh dear.

Fortunately he hasn't used the same password elsewhere, so my main server and the dual-core Intel machine are still intact. As far as I and chkrootkit can tell.

I'm still going to be upgrading my server by blowing everything away and restoring, to finally blow away the lingering cobwebs of my problems with development and atrpms repositories that I had when I installed this thing when Fedora Core 2 was just out... My plan is to use dar to back up everything with the permissions intact, and then restore selectively from there using dar-static from the archive disk (a 250GB USB drive). Or at least, that's the plan once I've finished editing the paper I've got to finish.

Last updated: | path: tech | permanent link to this entry

Thu 23rd Mar, 2006

Eduroam, Tango and Kororaa

Two presentations this evening. The first from Steve Walsh about Eduroam, a system that allows academics from a participating institution to go to any other participating institution, log on using their own credentials, and get internet access from there. So I could go to University of Glasgow and log in using my ANU password and get internet access from there. It's obviously not "access as if you were a staff member of that institution", but it means you don't have to pay for dialup and roaming dialup logins or make special arrangements for each person visiting somewhere else. A simple idea and a good one, although their fights with LDAP servers and authentication systems need to go down in legend, preferably in Old English to sit alongside that of Beowulf.

The second one, by Pascal Klein, was about the Tango Project, which is a project to create an icon set under a Creative Commons license, so that a consistent look and feel can be applied to GNOME, KDE, XFCE, and (if Pascal gets his way) XGL. Too many old-school hackers deride anything more complex (or simpler) than a command line as dumbing things down, usually in the same breath as they whine about how proprietary operating systems are taking over the planet. You cannot overestimate how valuable it is - both for new users and old - to have a consistent interface. The same command-liners will probably cringe when you take their beloved emacs away from them and give them an ordinary GUI text editor, because it doesn't have their favourite alt-left-shift-control-spoon key combination for correctly indenting XML in a boustrophedonic environment. That's called the interface, you morons. Get with it.

The third, 'unofficial', presentation was by Chris Smart, showing off Kororaa and XGL. Heaps of funky stuff, some borrowed from Mac OS X, some completely new. Pascal made himself dizzy by holding down Ctrl-Alt-Shift-Right-Arrow and watching the cube of the workspaces whizz around before his very eyes. Hopefully it will support i810 integrated graphics, because that's what Pascal's new laptop is going to support, and he's going to be a very disappointed boy if he doesn't get shiny and whizzy. Actually, I should lay off Pascal because he copped enough stick from Steve over his double-edged-sword work with Mark Shuttleworth. But we do need to register

The thing I wanted to note here is that one thing I'm worried about with things like XGL is that we're just going to have rubberised windows as the only behaviour because it's whizzy enough. I think there are a lot of ways of making things behave on a desktop, and I think Linux is all about choosing what behaviour you want. Just on the issue of window moving, I see several more ways to make windows behave as you move them around the screen:

There are a couple of key issues here regarding behaviour.

  1. It has to be quick. Don't do some glacier-like melt and flow or a Cheshire Cat fade and reappear if it takes ages to do. People instinctively wait while these things happen, because (a) they think that the system is too slow for them to grab another window while the first one is moving, and (b) they want to know where that new window has ended up before they make any more decisions about what they can click on. If this takes more than about half a second, people are going to get very tired of waiting. It will feel slow to use and the glamour will wear off all the effects.
  2. It has to show you where the window will go. Imagine starting to drag and only having the cursor go with you - the window stays where it was. Only when you drop the window does it appear in its new spot. What if that's not exactly where the user intended it? They have to guess again. That's bad. Providing constant feedback as to where the window will end up is essential. (It doesn't have to be perfect - you can just show an outline or a shadow or a translucent image. But the user has to know where it's really going to end up.)
  3. It should maintain the idea that windows are semi-solid things that have a physical presence in some virtual space that we're looking into. Imagine if the window sort of melted into other windows as you dragged, and solidified into position when you dropped it (with all the other damage to other windows being undone in the process). People are going to be afraid to move anything in case it doesn't undo correctly, or in case the text from one window does get mixed up in the other.
  4. It shouldn't startle the user or make them think that something's gone wrong. Rubberised windows are OK because as soon as you let go it all snaps back into place and everything's OK again. Having a dematerialisation process that throws electric sparks out of the window as if it's suffering a major electrical failure, and then plonking down the window in its new position like a new clone in Paranoia, is going to be disturbing.
And now I really have to fly - bowling tonight with friends! Whee!

Last updated: | path: tech / clug | permanent link to this entry

Shuffling a list - the power of factorials.

I needed to shuffle a list of DNA sequences for a work program. Having been suitably chastised by Damian Conway's excellent entry in maddog's ticket picking mini-competition, I decided that in future I'd do more searching on CPAN before re-inventing the wheel for the sixteenth time.

I feel so stupid. I downloaded Algorithm::Numerical::Shuffle, written by abigail, as I think it was the first entry that turned up in a search on CPAN. I should have looked by category under List::Util, since shuffling is a utility function on lists. But anyway. abigail pointed out in the POD that the Perl rand() routine is a linear congruential generator - in other words, from a given random seed the numbers will always follow a predictable pattern. The algorithm works by going through the entire list and swapping the Nth member with another random member. Thus, the actual final ordering is dependent only on the initial RNG seed, which is a 32-bit number.

(The List::Util::shuffle routine suffers from this similarly, despite using a somewhat more bizarre method of giving a unique random ordering to the list.)

Now, 13! is 6227020800, and 2^32 is 4294967296, so there are more possible orderings of a 13-element list than there are values of a 32-bit number. Which means, if you think about it, that there are some orderings that you will never get from the shuffle routine. It's just simple information theory, like compressing data: each seed produces exactly one ordering, so with fewer seeds than orderings there must be orderings that no random seed will ever cause to come out.
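The arithmetic is easy to check. Here is a minimal sketch in Python (not Perl, purely for illustration) that finds the smallest list length whose number of orderings exceeds the number of 32-bit seeds. The function name is made up for this example.

```python
import math

SEED_BITS = 32                 # seed size discussed in the post
seed_count = 2 ** SEED_BITS    # number of distinct seeds

def smallest_list_exceeding(seeds: int) -> int:
    """Return the smallest n for which n! is larger than `seeds`."""
    n = 1
    while math.factorial(n) <= seeds:
        n += 1
    return n

n = smallest_list_exceeding(seed_count)
orderings = math.factorial(n)

# Each seed yields one ordering, so at least this many orderings
# of an n-element list can never be produced.
unreachable = orderings - seed_count

print(n, orderings, seed_count, unreachable)
```

The count of unreachable orderings is a lower bound: if two seeds happen to produce the same ordering, even more orderings are missed.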

Here's where once again I forgot to learn the basic rule of my programming existence: What You Want Has Probably Already Been Invented. I wrote an email to abigail suggesting that, if the user of the shuffle routine wanted a truly random shuffle, they could pass in a sub reference to their own rand() function, or one from e.g. MCrypt or OpenSSL::Rand, that does at least have more bits of random seed (or, for preference, draws on a 'true' source of randomness.) I even started writing up a revision of abigail's code to do this.
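The idea of passing in your own random source can be sketched like this, again in Python rather than Perl, so it shows the shape of the idea and not abigail's code. The `shuffle` function and its `randbelow` parameter are hypothetical names. Python's seeded generator is a Mersenne Twister, not a linear congruential generator, but it is used here only to show that a fixed seed fixes the ordering.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

def shuffle(items: Sequence[T], randbelow: Callable[[int], int]) -> List[T]:
    """Fisher-Yates shuffle. randbelow(n) must return an int in range(n)."""
    result = list(items)
    for i in range(len(result) - 1, 0, -1):
        j = randbelow(i + 1)   # pick one of positions 0..i inclusive
        result[i], result[j] = result[j], result[i]
    return result

# Made-up sequences standing in for the real data.
sequences = ["ACGT", "TTGA", "CCAT", "GGTC", "ATAT"]

# A seeded generator: the same seed always gives the same ordering.
seeded_a = shuffle(sequences, random.Random(42).randrange)
seeded_b = shuffle(sequences, random.Random(42).randrange)

# A generator that draws on the operating system's entropy source instead.
system_shuffled = shuffle(sequences, random.SystemRandom().randrange)

print(seeded_a == seeded_b)   # the seed alone decided the ordering
print(system_shuffled)
```

Taking the random source as an argument means the caller decides how good the randomness needs to be, and the shuffle itself never changes.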

Then I asked on the #perl channel whether the inbuilt rand() function could be passed as a sub reference (it can't, apparently). Someone asked why I wanted to do such a thing. I told them the story. I spent some time convincing them that the statistics said it was possible, and that someone might actually care about being statistically correct. Then someone else said "why don't you just override the rand() function for that module?".

Why not indeed! Perl doesn't make package namespaces sacred, so it's perfectly possible to put a sub rand(!) definition in my code after a package List::Util declaration which calls the rand() function I want. Of course, I've yet to work out how to do that, and the random picks of the sequences are good enough for testing so far, so it's a bit of a moot point. But I hate it when I should have known better all along.

I'm somewhat glad that my email to abigail bounced...

Last updated: | path: tech / perl | permanent link to this entry

At last, I've worked out why <pre> formatting in Perl CGI is wrong!

The problem: When I dump a file out in a <pre> block, it comes out with a space in front of every line but the first.

The realisation: I used a $cgi->comment() block to throw in some debugging comments in a CGI script, and lo and behold every line but the first had a space in front of it. For some reason, today it reminded me of the $cgi->p() formatting, which will put a space between each argument if it's given a list. I'm guessing this 'join with a space' behaviour is going all the way through the rest of the CGI module routines.

The solution: Write my own <pre> formatter routine, which just prints '<pre>', the text, and '</pre>'. Ugh. There's probably a better way somewhere in CGI, but at least I know what's going wrong.
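To see why 'join with a space' produces exactly this symptom, here is a small sketch in Python. It only mimics the behaviour guessed at above and does not call the CGI module, and `pre_block` is a made-up name for the hand-rolled formatter.

```python
# Lines as they would come from reading a file: each keeps its newline.
lines = ["first line\n", "second line\n", "third line\n"]

# Assumed behaviour: list arguments are joined with a single space.
# The space lands after each newline, i.e. at the start of the next line.
joined = " ".join(lines)

def pre_block(text_lines):
    """Wrap the lines in <pre> tags, concatenating with no separator."""
    return "<pre>" + "".join(text_lines) + "</pre>"

print(repr(joined))
print(pre_block(lines))
```

The formatter does no HTML escaping, so text containing '<' or '&' would need escaping before being wrapped.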

Last updated: | path: tech / perl | permanent link to this entry

Research is your friend

I have a rule in Thunderbird that tags everything whose X-Mailer header is 'The Bat!' as spam. It's never been wrong yet. And yet a bit of research found that The Bat! is apparently a legitimate mailer. Just as Outlook gets impersonated by spam bulk mailers, apparently they also impersonate The Bat!. Strangely enough, I haven't seen them use the default setting in the Advanced Mass Sender, which is 'Advanced Mass Sender'. I wonder why...

I find it ironic that this probably makes The Bat! look much more popular and Outlook less so...

Last updated: | path: tech | permanent link to this entry

Wed 22nd Mar, 2006

Prove that you wrote that - ten years ago

In the lab that I work in, I'm supposed to write down everything I do, and everything I think of, and date it, in a big red book. Should there ever be a time where someone else informs me that my idea is not original and they were first to think of that, this book is theoretically going to settle the matter. Of course, I might still be wrong, but it wouldn't be just me saying so. And, of course, it looks better if I've got it in a book and they haven't.

Of course, I'm lazy, I type much faster than I write, and I don't have ideas that can be neatly mapped out on a page. But files - ah, files are untrustworthy. Discs are untrustworthy. You can fiddle with their contents whenever you like. It doesn't take much work to make a file look like it was created on the day I was born in 1971 - although going back past the Unix epoch (or your local filesystem's equivalent) is a bit more difficult. And since any challenge like this happens over the course of weeks, not in midnight raids, there's a real temptation to fudge things a little to make it look like you came up with that idea for a method of delivering formatted content through the internet two years before Tim Berners-Lee ever thought of HTML.

So what one needs in this situation is a trustworthy repository with an audit trail. It must be a trusted third party, so you can deny any direct involvement. The audit trail must itself be untamperable. The third party has to also prove to you that your files, and their record thereof, haven't been tampered with - that no-one else has submitted work claiming to be you. And, most importantly, it must use fairly simple mechanisms that can act on a day-to-day (or even more frequent?) basis, so that I don't have to wait until a CD is filled before mailing it off to the escrow agency.

Now, all this isn't too hard. Public Key encryption allows you to sign your work in ways that are provably hard to forge or tamper with; it also allows them to sign their logs in such a way that you can verify with their public key that the log is a true and correct account, even if you can't play with it directly. Rsync and other methods provide a simple way to make a copy of your work here to a remote location with a minimal transfer, also using a secure transport (ssh). There are other methods - scp, webdav, version control systems; I don't think it should need to be one protocol for moving the files to the remote location, just so long as you can be confident that it's been received and that no-one else can tamper with your work.

At the remote site, I would imagine a system where every change - and using rsync or diff here would make a lot of sense - would be written to a log. Each entry in that log would be digitally signed by the secret key of the logging system. This then gets written to a CD-R, or some other permanent "can't be changed again" system - stone tablets, for example - stored in enough locations that it's too difficult for an attacker to change or destroy them all. And because your changes were signed with your private key, anyone holding your public key can check that the online log of your changes agrees with what you say happened.
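A rough sketch of such a log, in Python, might look like the following. It is an illustration under loud assumptions, not a design. An HMAC with a shared secret stands in for the logging system's public-key signature, because the standard library has no public-key signing. Every name, key, date and log message here is hypothetical. Each entry also records the signature of the entry before it, so altering any earlier entry breaks the chain.

```python
import hashlib
import hmac
import json

# Stand-in for the logging system's secret key (hypothetical value).
LOG_KEY = b"hypothetical-logging-service-secret"

def sign(payload: bytes) -> str:
    """HMAC-SHA256 as a stand-in for a real public-key signature."""
    return hmac.new(LOG_KEY, payload, hashlib.sha256).hexdigest()

def append_entry(log: list, author: str, date: str, change: str) -> dict:
    """Record a hash of the change, chained to the previous entry."""
    body = {
        "author": author,
        "date": date,
        "change_sha256": hashlib.sha256(change.encode()).hexdigest(),
        "prev": log[-1]["signature"] if log else "",
    }
    payload = json.dumps(body, sort_keys=True).encode()
    entry = dict(body, signature=sign(payload))
    log.append(entry)
    return entry

def verify(log: list) -> bool:
    """Check every signature and every link to the previous entry."""
    prev = ""
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "signature"}
        payload = json.dumps(body, sort_keys=True).encode()
        if body["prev"] != prev:
            return False
        if not hmac.compare_digest(sign(payload), entry["signature"]):
            return False
        prev = entry["signature"]
    return True

log = []
append_entry(log, "paul", "2006-03-22", "idea: formatted content over the internet")
append_entry(log, "paul", "2006-03-23", "revision: add hyperlinks")
ok_before = verify(log)

# Tampering with an earlier entry is detected.
log[0]["change_sha256"] = hashlib.sha256(b"forged").hexdigest()
ok_after = verify(log)

print(ok_before, ok_after)
```

Storing only a hash of each change means the log can be published or copied widely without revealing the work itself.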

In fact, if you use the model of open source backups (i.e. "Real men don't make backups, they just upload their work to a public FTP server and call it the Linux Kernel."), you could create a system similar to FreeNet where people hosted small chunks of this growing data corpus, hashed and encrypted and distributed to such an extent that no-one knew what was in the blocks that they held. If you had to do that to access the system, then while you might be saving your own competitors' information, at least you're improving your own security in doing so. And obviating the need for stone tablets is a Good Thing.

Now I just need to invent the system to protect this document, so that in a year's time when someone wants to do this I can say "Ah, but I invented it first! Your solution must be open source and free for everyone!"

Last updated: | path: tech / ideas | permanent link to this entry

CLUG Bike Ride - March 2006

This is just a sort of advance notice of my plans for the Linux Bike Ride March 2006 Canberra. Or whatever its title should be.

Please email the CLUG list or me if you're interested in coming. Please feel free to wait at an appropriate point for us, or phone me on my mobile (0422 392 081) to arrange to be at a particular locus in the space-time continuum. Even if you just meet up for lunch, that'd be great!

Last updated: | path: tech / clug | permanent link to this entry

Tue 21st Mar, 2006

How big a pipe, you say?

I'm probably the third-last Fedora fan to find this out, but version five is now steaming and fresh on the various mirrors I have access to. I'd resolved long ago to share FC5 when it came out via torrents, but it makes more sense for me to download it from a mirror I don't get charged for. I ssh'd in from work to my home machine and started downloading it across the 512/512 DSL, and then thought, "hang on a minute..." and checked AARNET.

Which the ANU has three 55Mbit Frame Relay links to.

Which means that I downloaded the five images in about five minutes - at the average rate of 10MB/sec (the speed of my link to the building switch, as it turns out. I've got a gigabit switch here but my upstream is still limited...).
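For a sense of scale, here is the back-of-the-envelope arithmetic as a small Python sketch. The rate and duration are the rounded figures quoted above, so the total is an estimate and not the real size of the images. It assumes 1 MB is 1000 kB and ignores protocol overhead.

```python
RATE_MB_PER_S = 10       # average rate quoted above
DURATION_S = 5 * 60      # "about five minutes"

# Approximate total transferred, in megabytes.
total_mb = RATE_MB_PER_S * DURATION_S

# The same transfer over the 512 kbit/s home DSL link.
DSL_KBIT_PER_S = 512
dsl_mb_per_s = DSL_KBIT_PER_S / 8 / 1000   # kbit/s -> kB/s -> MB/s
dsl_hours = total_mb / dsl_mb_per_s / 3600

print(total_mb, round(dsl_hours, 1))
```

On those numbers the DSL link would have needed most of a day for what took a few minutes at work.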

Then I copied it onto my 80GB portable laptop drive, and whisked it home, for the purposes of sharing for those less fortunate.

Internode still only has the DVD image. I'll get that too. That can wait until I get home.

Last updated: | path: tech / fedora | permanent link to this entry

All posts licensed under the CC-BY-NC license. Author Paul Wayper.

Main index / tbfw/ - © 2004-2023 Paul Wayper