My approach to operational transformation February 17, 2012Posted by olvidadorio in Programming.
The whole approach is rather simple and vanilla, except for the method of operational transformation I chose to control concurrent edits to a file:
I assign a unique id and a floating point number between 0 and 1 to every character in a document, such that ordering the characters by those points yields the document in its original sequence.
Adding a new character between two others now simply amounts to inserting it somewhere between the neighbors’ respective positions. In this manner no shifting indexes or clocks must be handled.
The problem that this helps solve is the following: One user (let’s call him Bert) is editing the file at index 23, whereas another user (Nancy) is editing around index 42. Bert hits a key, editing the document, and 30 miliseconds later Nancy hits the delete key, removing one character from the document. Both Bert and Nancy’s edits now reach the server. It now has to figure out which key Nancy actually deleted. Since Bert’s edit happened before Nancy’s the naive assumption would be that she actually deleted the character at (old index) 22, after having been shifted once over by Bert’s edit (which however hadn’t yet reached Nancy’s client). So in order to mitigate this situation the server need not only know the exact state of every client, it also constantly has to struggle with different connection speeds and the like to establish the correct sequence.
Using my technique, prioritizing edits is a non-issue, all edits are assumed as factual statements about a position, independently of the order in which they are submitted. The only pathological case appears if two users are concurrently editing the same spot (which is problematic anyway), this pathological case is mitigated by introducing some random jitter to submitted characters’ positions.
However, you may have already noticed that inserting characters in my fashion can suffer from an entirely different problem: Insufficient floating point precision. If I always insert my character in the exact middle between its two adjacents I get my very own uncomfortable version of Zeno’s paradox, with distance between values getting arbitrarily small rather quickly (0.5, 0.75, 0.875 …).
The first approach to mitigate this that I’ve implemented is to acknowledge the fact that text is usually typed from one direction to another (e.g. latin script from left to right). So I try to insert new characters closer to their left than their right neighbor (not smack-dab in the middle).
Another approach would be an unfortunately slightly more complex handoff-process in which all clients and the server agree on a point in the future, at which all clients concurrently create a new version of the ordering placements, according to a cannonical method (i.e. evenly distributed between 0 and 1). All edits that were performed before that point in time are included in that canonical re-scaling, all thereafter are inserted by the clients into the new order, according to the above method. Problems may arise in case some edits performed before that point arrive late, after re-assigning, which necessitates all clients to re-assign afresh for every late edit (and in the worst case re-assign places for consecutive edits).
I bet there’s a correct name for this technique of ordering concurrent inserts to a sorted collection, but I haven’t found it yet. I also haven’t seen anyone apply this technique to OT, but if you have, please let me know!
A bug walks into a program…
…Nay, it rather puts on a suit, and becomes a program!
In this article I suggest a novel approach to programming that’s more like poetry and hopefully a lot less sucky… Caution, it’s a rough draft, but I’m quite keenly interested in making it intelligible, so if it isn’t please hit me. So here goes the story:
This afternoon, I was sitting on the loo, letting the usual trivialities flow through my mind; visual programming and such, the usual. One of my thoughts was: How cool would it be to have a user-interface that returns to the expressiveness of command line interfaces and the days of programming languages as operating system. That once again makes the average user an applied programmer as opposed to a mere button pusher along the lines of “select from this range of three fuckin’ fabrics”, nothing more than a consumer of pre-programmed options.
So while sitting there trying to figure out what that kind of user-interface might look like this idea for a new programming language paradigm (re-)struck. This paradigm kind of goes in the direction of natural language interfaces but might also be combined with some visual approach. But let’s try to put it together:
Poetic programming or the imprecise language
An interpreted programming language that is not precise, where the interpreter doesn’t let you or make you fully control, predict or understand the syntax or actual execution steps of your program. This language gives you the freedom to do “not all things considered” style programming in exchange for giving up control of well-defined execution behavior. Essentially it treats your code not as a drab cookbook but as a deep and mysterious poem, full of unexplored subtleities.
..But you say: This is bullshit. I don’t care if it hurts! I wanna have control. Heaven forbid if things go horribly wrong?!
Well, then we need fault-tolerance built into the language. Because, coming to think of it, we need fault-tolerance anyway. We should constantly be monitoring and testing our applications’ behavior anyway. Because things are going to go wrong anyway, anyhow, anywhere, at any time, and preferably while you’d rather be doing something else…
So a few fault-tolerance characteristics:
- Undo everywhere. All actions are journaled and undoable if ever possible.
- A limited and access-controlled set of non-undoable actions
- Possibly “let-it-crash” supervision and process capsuling mechanisms as known from Erlang, Akka etc.
- Context, context, context. The interpreter basically performs anaphora and general reference resolution on the tokens used in the programmer’s instructions. This may be an unsolved problem – in general and in natural language – but since we are creating an iterable artificial language from scratch, reference resolution might actually be tractable. I imagine a system in which various methods could be plugged in and refined bit by bit. Maybe one could even create a kind of economy/evolution, with selection criteria and feedback. But fundamentally this is a part where the language must be very malleable and customizable.
- Side effects all over the place; While functional programming languages such as Haskell try to get rid of all side effects altogether in order to get safe programs, we take the opposite approach, put side-effects into everything. Let functions’ execution affect each other, this allows for context, more information, less work. And then make sure you can undo it when things go wrong.
- Case-based programming. Have various views in which to do things, these actions are automatically journaled and can then be recalled using incomplete references.
Work In Progress August 11, 2008Posted by olvidadorio in Programming.
Tags: neural networks, obama, thesis, xmpp
add a comment
I am still hacking away at my thesis. That’s why I haven’t posted in a while. I would have loved to have made some pithy comments about Obama’s evolving policy for president and how seamlessly he’s getting into the mood. Oh, and I’d loved to have talked about a few more issues: There were some further thoughts on Browser as Platform vs. other ways of doing that (remember, I had opted for IM’s). It’s interesting that XMPP has been getting more and more attention (as well as discontention) as it becomes clear that some sort of push-mechanism is really kinda necessary…
Most of the other topics are fermenting! My thesis, as I said, has taken a front seat. I still haven’t really started writing a text (that could worry me). I set my time to finish until the end of September, where I should be done & ready to review by mid Sept. – which doesn’t leave a heck of a lot of time. Oh no. Next steps are: wrapping up the programming part. Since a straightforward way of learning the task hasn’t worked, I’m opting for an augmentative strategy. Very much like heuristics help you in search problems, I’ll be helping the learning algorithm along by providing a (hopefully) well-picked set of more easily learnable sub-problems and sub-networks that are merged and integrated step-by-step. Lets hope it’ll fly.
Back to hack.
Platform Problems June 15, 2008Posted by olvidadorio in Programming.
Tags: air, ajax, bill gates, browser, erlang, flex, gears, jabber, silverlight, steve jobs, virtual-machine
Earlier today I read a TechCrunch article (had been reddited) on Google Gears and its position as a direct competitor. The main thrust is that the browser — possibly with some extension — is going to be the new important virtual machine — and that Google is positioning Gears as an alternative to both Flex/Air and that M$ technology. I was having an uneasy feeling dans le gutt.
Also, I was watching a joint interview by Steve Jobs and Bill Gates. Besides this being an intriguing setting as a whole, just watch the second part of this sub-clip. Steve & Bill are asked for their take on exactly this browser-as-new-standard-platform thesis. Of course they basically laugh it off. But as Jobs puts it: they are the dinosaurs. But watch the clip, it’s nice.
Here’s my (& my gutt’s) take: The browser (or conversely, the www) is quite a shitty choice for a standard platform because: