<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>chaostangent &#187; Code</title>
	<atom:link href="http://blog.chaostangent.com/archives/category/code/feed" rel="self" type="application/rss+xml" />
	<link>http://blog.chaostangent.com</link>
	<description>More squirrels than sense</description>
	<pubDate>Wed, 27 Aug 2008 19:25:42 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.1</generator>
	<language>en</language>
			<item>
		<title>Expectancy: PHP 5.3</title>
		<link>http://blog.chaostangent.com/archives/446</link>
		<comments>http://blog.chaostangent.com/archives/446#comments</comments>
		<pubDate>Wed, 27 Aug 2008 19:18:00 +0000</pubDate>
		<dc:creator>ChaosTangent</dc:creator>
		
		<category><![CDATA[Code]]></category>

		<category><![CDATA[mysql]]></category>

		<category><![CDATA[namespaces]]></category>

		<category><![CDATA[oop]]></category>

		<category><![CDATA[php]]></category>

		<category><![CDATA[php4]]></category>

		<category><![CDATA[php5]]></category>

		<category><![CDATA[php5.2]]></category>

		<category><![CDATA[php5.3]]></category>

		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://blog.chaostangent.com/?p=446</guid>
		<description><![CDATA[
The release of PHP 5.3 is due sometime soon and with a feature freeze in place since the 24th of July and a pre-release alpha now available, it&#8217;s worth exploring some of the many additions and changes that are going to be introduced.
As PHP is the language I most frequently work in and one which [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://blog.chaostangent.com/wp-content/uploads/2008/08/php.gif"><img class="alignnone size-medium wp-image-447" title="Pretty Hard Panda" src="http://blog.chaostangent.com/wp-content/uploads/2008/08/php.gif" alt="" width="120" height="67" /></a></p>
<p>The release of PHP 5.3 is due sometime soon and with a feature freeze in place since the 24th of July <em>and</em> a pre-release alpha now available, it&#8217;s worth exploring some of the many additions and changes that are going to be introduced.<span id="more-446"></span></p>
<p>As PHP is the language I most frequently work in and one which I&#8217;ve done all sorts with (from web applications, to <a href="http://blog.chaostangent.com/archives/370">file exploration</a> to <a href="http://blog.chaostangent.com/archives/14">media player scripting</a>), I like to think I&#8217;m sensitive to deficiencies and oddities in the released implementations. Version 5.3 contains a lot of elements backported from the still distant version 6, the most glaring omission being end-to-end Unicode support without mb_* fudges or iconv; being able to use string-backed functions like array_unique() without suspicion will be a big help, but I&nbsp;digress.</p>
<p>The most high-profile addition is that of namespaces; gone will be the warts that dot current frameworks (e.g. Zend_Db_Table_Rowset), which will make different frameworks and modules far easier to use and far more friendly when you want them to play nicely&nbsp;together.</p>
<p>Static functions have also been promoted to allow a lot of the meta-programming niceties that member functions have, including true overloading support, which will allow first level abstractions such as database wrappers to not require instantiation before being called (which I discovered around the same time as <a href="http://blog.chaostangent.com/archives/40">my get_class exploration</a>). For instance, if using an ORM, doing People::getAllById() will now be easier to achieve. Alongside this, many of the magic methods have been tightened up to make them less ambiguous (__get can only be public and not static, signatures enforced&nbsp;etc.).</p>
<p>Looking through some of the other <a href="http://wiki.php.net/doc/scratchpad/upgrade/53">changes detailed in the PHP Wiki</a> it seems that a selection of new functions surrounding garbage collection are now being exposed including checking whether it is enabled, and selectively enabling or disabling it. Whether this is a mistake (close by get_extension_funcs() is detailed as a new function but <a href="http://uk3.php.net/manual/en/function.get-extension-funcs.php">appears to have been in since PHP4</a>) and these are bleed-throughs from the Zend Engine is unclear, but without some surrounding memory management facilities, it would seem unwise to disable or allow disabling of garbage&nbsp;collection.</p>
<p>On the extension front numerous ones have been standardised and moved into the PECL system which goes some way to neatening things up; the change <a href="http://blog.felho.hu/what-is-new-in-php-53-part-3-mysqlnd.html">some are talking about</a> is the choice between a local MySQL library (mysqlnd) versus the native libmysql library that comes when compiling against a MySQL release. PHP and MySQL have always been bedfellows despite their conflicting release licenses (especially so since Sun gobbled up MySQL) so this seems like a smart move for all concerned with separate code-base, better engine integration and statistical analysis now possible (<a href="http://www.hristov.com/andrey/projects/php_stuff/pres/mysqlnd_vikinger.pdf">PDF&nbsp;details</a>).</p>
<p>What all of this adds up to is a release that&#8217;s solid on paper, but the bum-rush for patches is sure to be as swift as any other PHP release. Especially with the OO enhancements though, it feels like these should have been included from day one, as not only will there now be a disjoint between PHP4 and PHP5 shared servers, but PHP5.2 and PHP5.3 as well. For someone who runs their own server this is not a massive worry, especially when the list of backwards compatibility changes is so small, but for service providers (hosts, ISPs etc.) still dragging their feet over 4 &gt; 5 &gt; 5.2, this adds another step of&nbsp;complexity.</p>
<p>The real test will obviously be the frameworks and high profile applications that PHP utilises and with word that the <a href="http://framework.zend.com/">Zend Framework</a> won&#8217;t be <a href="http://www.nabble.com/PHP-5.3-Namespaces-on-ZF-td18836642.html">supporting namespaces until its 2.0</a> release next year the lead time could be immense, especially when you consider phpBB, what was once considered the yardstick of PHP usage, <a href="http://www.phpbb.com/support/documentation/3.0/quickstart/quick_requirements.php">still supports 4.3</a> with its most recent version, the playing field for cutting edge PHP seems less than&nbsp;agile.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.chaostangent.com/archives/446/feed</wfw:commentRss>
		</item>
		<item>
		<title>Javascriptery: Tabbed forms</title>
		<link>http://blog.chaostangent.com/archives/380</link>
		<comments>http://blog.chaostangent.com/archives/380#comments</comments>
		<pubDate>Wed, 12 Mar 2008 21:44:34 +0000</pubDate>
		<dc:creator>ChaosTangent</dc:creator>
		
		<category><![CDATA[Code]]></category>

		<category><![CDATA[accessibility]]></category>

		<category><![CDATA[ala]]></category>

		<category><![CDATA[forms]]></category>

		<category><![CDATA[javascript]]></category>

		<category><![CDATA[script]]></category>

		<category><![CDATA[snippet]]></category>

		<category><![CDATA[tabbed]]></category>

		<category><![CDATA[tabs]]></category>

		<guid isPermaLink="false">http://blog.chaostangent.com/archives/380</guid>
		<description><![CDATA[
Forms are perhaps the bane of web development for me; you can&#8217;t get them to look good, you can&#8217;t find a foolproof way to make them act well and lets not even start of trying to get them into a pacified state, free from the dangers of user input (surprise ending: form input will never [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://gallery.chaostangent.com/galleries/random/blog/tabbedform.png" width="512" height="100" alt="" /></p>
<p>Forms are perhaps the bane of web development for me; you can&#8217;t get them to look good, you can&#8217;t find a foolproof way to make them act well and let&#8217;s not even start on trying to get them into a pacified state, free from the dangers of user input (surprise ending: form input will never be completely trustworthy). A lot of sites would appear to have aesthetically pleasing forms, however this is a careful ruse by them as they sidestep the problem of forms by having only one or two of them, and then they usually only have a few fields. The monstrosities I am required to deal with almost daily are things of grotesque beauty, veritable Rube Goldberg machines of complexity.<span id="more-380"></span></p>
<p>The long and the short of this diversion into why forms are evil (please, end my suffering quickly <a href="http://www.w3.org/MarkUp/Forms/">XForms</a>) is that to get a form looking good, you have to spend a long time fiddling with things. Enough of this banter anyway, my fiddling with JavaScript (like the dirty little bastard child of C and Perl it is) produced a way of creating a tabbed form that defaults to a standard single form if a user prefers to use <a href="http://noscript.net/">NoScript</a> or an antiquated browser of&nbsp;yore.</p>
<p>So the following&nbsp;markup:</p>
<pre><code>&lt;form method="post" id="theForm"&gt;
	&lt;fieldset&gt;
		&lt;legend&gt;First tab&lt;/legend&gt;
		&lt;ol&gt;

			&lt;li&gt;&lt;label for="formone"&gt;One&lt;/label&gt; &lt;input type="text" name="one" id="formone" /&gt;&lt;/li&gt;
			&lt;li&gt;&lt;label for="formtwo"&gt;Two&lt;/label&gt; &lt;input type="text" name="two" id="formtwo" /&gt;&lt;/li&gt;
			&lt;li&gt;&lt;label for="formthree"&gt;Three&lt;/label&gt; &lt;input type="text" name="three" id="formthree" /&gt;&lt;/li&gt;
		&lt;/ol&gt;
	&lt;/fieldset&gt;
	&lt;fieldset&gt;

		&lt;legend&gt;Second tab&lt;/legend&gt;
		&lt;ol&gt;
			&lt;li&gt;&lt;label for="formfour"&gt;Four&lt;/label&gt; &lt;input type="text" name="four" id="formfour" /&gt;&lt;/li&gt;
			&lt;li&gt;&lt;label for="formfive"&gt;Five&lt;/label&gt; &lt;input type="text" name="five" id="formfive" /&gt;&lt;/li&gt;
			&lt;li&gt;&lt;label for="formsix"&gt;Six&lt;/label&gt; &lt;input type="text" name="six" id="formsix" /&gt;&lt;/li&gt;

		&lt;/ol&gt;
	&lt;/fieldset&gt;
	&lt;fieldset&gt;
		&lt;legend&gt;Third tab&lt;/legend&gt;
		&lt;ol&gt;
			&lt;li&gt;&lt;label for="formseven"&gt;Seven&lt;/label&gt; &lt;input type="text" name="seven" id="formseven" /&gt;&lt;/li&gt;
			&lt;li&gt;&lt;label for="formeight"&gt;Eight&lt;/label&gt; &lt;input type="text" name="eight" id="formeight" /&gt;&lt;/li&gt;
			&lt;li&gt;&lt;label for="formnine"&gt;Nine&lt;/label&gt; &lt;input type="text" name="nine" id="formnine" /&gt;&lt;/li&gt;
		&lt;/ol&gt;
	&lt;/fieldset&gt;
&lt;/form&gt;</code></pre>
<p>The default will be to display the first fieldset, with links in a list to display the other two. A trivial form like this certainly doesn&#8217;t require a tabbed layout, but a monstrosity that contains 27 input fields (some multiple choice) could do with a little information management when displayed to the user. The general markup is what I&#8217;ve settled on for the majority of my forms and is based almost entirely on <a href="http://www.alistapart.com/articles/prettyaccessibleforms/">Nick Rigby&#8217;s article on ALA</a> but the styling isn&#8217;t what&#8217;s important&nbsp;here.</p>
<p>For this project, like any I undertake with JavaScript, I&#8217;ll be using the <a href="http://www.prototypejs.org/">Prototype library</a> (version 1.6 specifically for this snippet); this could be done without it with minimum fuss, but Prototype is lovely so I usually already have it&nbsp;included.</p>
<p>The functionality of this project is pretty minimal, the building of a list of the available fieldsets lies at the core of it. When the script is invoked it will hide all but the first fieldset, build an unordered list of the fieldsets (taking the names from the &lt;legend&gt; elements) and then set up event listeners for that list to change the visible state of each&nbsp;fieldset.</p>
<p>First things first, set up the Javascript object and hiding of the&nbsp;fieldsets:</p>
<pre><code>var tabbedForm = {
	init: function() {
		var formElem = $('theForm');
		if(formElem)
		{
			$A(formElem.getElementsByTagName('fieldset')).each(function(s, i) {
				var fieldsetId = s.identify();
				// hide all but the first
				if(i != 0)
				{
					s.hide();
				}
			});
		}
	}
};</code></pre>
<p>Nothing spectacular: it uses the oft-ignored index argument that each() passes to its iterator to scry when an element is not the first in the list, though there are plenty of other ways of achieving this. The next job is to build the list of available fieldsets and plop that into the document at some point, so augmenting the init()&nbsp;function:</p>
<pre><code>init: function() {
	var formElem = $('theForm');
	if(formElem)
	{
		var listElem = document.createElement('ul');
		$A(formElem.getElementsByTagName('fieldset')).each(function(s, i) {
			var fieldsetId = s.identify();
			// hide all but the first
			if(i != 0)
			{
				s.hide();
			}

			var legendElem = s.down('legend');
			if(legendElem)
			{
				var listItemElem = document.createElement('li');
				var linkElem = document.createElement('a');
				linkElem.href = fieldsetId;
				linkElem.innerHTML = legendElem.innerHTML;
				Element.addClassName(linkElem, fieldsetId);
				Event.observe(linkElem, 'click', tabbedForm.tabClicked);

				listItemElem.appendChild(linkElem);
				listElem.appendChild(listItemElem);
			}
		});

		Element.insert(formElem, {before: listElem});
	}
}</code></pre>
<p>An unordered list is created, then for each fieldset the &lt;legend&gt; element is nabbed and its contents used as the title for each list item. Probably the only questionable part is making the link element point to the ID of the fieldset; this is just how I do things so that when a link is clicked, the ID is available. Other people I know put these sorts of items within the Javascript object itself or in a classname or somesuch, whatever works for you; I don&#8217;t have to worry about non-Javascript users clicking the links because the entire structure is generated rather than marked up. I drop the completed unordered list above the form element which fits with the &#8220;tab&#8221; metaphor we&#8217;re aiming&nbsp;for.</p>
<p>The only remaining function is what happens when a link in the generated list is clicked which according to my event listener is called (cunningly enough),&nbsp;&#8220;tabClicked&#8221;:</p>
<pre><code>tabClicked: function(evt) {
	Event.stop(evt);
	var linkElem = Event.findElement(evt, 'a');
	var formElem = $('theForm');
	if(linkElem &amp;&amp; formElem)
	{
		var idToShow = linkElem.href.substr(linkElem.href.lastIndexOf('/')+1);
		$A(formElem.getElementsByTagName('fieldset')).each(function(s) {
			if(s.identify() == idToShow)
			{
				s.show();
			}
			else
			{
				s.hide();
			}
		});
	}
}</code></pre>
<p>After stopping the link click event from bubbling up any further it grabs the clicked link element (I find it best not to take for granted which element has been clicked and just do a &#8220;findElement&#8221; to make sure we&#8217;re on the same page), pulls the ID from the href attribute, then iterates through the form&#8217;s fieldsets to find the one it refers&nbsp;to.</p>
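<p>The substr() call recovers the ID because a browser resolves a relative href against the page address, so linkElem.href reads back as a full URL with the fieldset ID as its last segment. A minimal sketch of just that step, runnable outside the browser; the name idFromHref and the example URL are made up for illustration:</p>

```javascript
// Hypothetical helper: returns everything after the last "/" of a resolved URL.
// If there is no "/" at all, lastIndexOf() gives -1 and the whole string is returned.
function idFromHref(href) {
  return href.substring(href.lastIndexOf('/') + 1);
}

// Example with a made-up page URL and fieldset ID.
var resolved = 'http://example.com/forms/fieldset-two';
console.log(idFromHref(resolved)); // fieldset-two
```

<p>This relies on the ID itself never containing a slash; pointing the link at a &#8220;#&#8221; fragment and reading the part after the hash would be one alternative.</p>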
<p>At this point the scripting is completed and a <a href="http://blog.chaostangent.com/stuff/tabbedform/">barebones proof of concept</a> can be seen. Obviously with no style it&#8217;s not going to look like tabs, but with a little <a href="http://www.alistapart.com/articles/slidingdoors/">sliding-door tomfoolery</a>, you&#8217;ll be tabbed up in no time. At this point you&#8217;ll likely want to expand on the functions above by dropping in some choice CSS classes, setting the active tab to &#8220;on&#8221; for appropriate styling and maybe even adding some other classes to let your stylesheet know things have been modified by a script (I find simply adding a &#8220;scripted&#8221; class to the container element works&nbsp;wonders).</p>
<p>The beauty of this is it&#8217;s accessible (the form still works 100% without scripting) and it prevents a user from seeing just what a mammoth form they may be completing (blood of your first born? yes&nbsp;please).</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.chaostangent.com/archives/380/feed</wfw:commentRss>
		</item>
		<item>
		<title>Blogosphereotronomatic - GO!</title>
		<link>http://blog.chaostangent.com/archives/379</link>
		<comments>http://blog.chaostangent.com/archives/379#comments</comments>
		<pubDate>Tue, 11 Mar 2008 23:12:06 +0000</pubDate>
		<dc:creator>ChaosTangent</dc:creator>
		
		<category><![CDATA[Code]]></category>

		<category><![CDATA[Stuff]]></category>

		<guid isPermaLink="false">http://blog.chaostangent.com/archives/379</guid>
		<description><![CDATA[I have put in a list of links of other websites I frequent in the right hand column (the much maligned right column) but only on the homepage. Many people may be tempted to call it a &#8220;Blogroll&#8221; which is precisely what Wordpress informed me it was called when I put in the links; this [...]]]></description>
			<content:encoded><![CDATA[<p>I have put in a list of links of other websites I frequent in the right hand column (the much maligned right column) but only on the homepage. Many people may be tempted to call it a &#8220;Blogroll&#8221; which is precisely what WordPress informed me it was called when I put in the links; this however sounds far too close to &#8220;bogroll&#8221; which means toilet paper and I doubt that&#8217;s something that I would like to imply about sites which you are potentially scurrying towards, away from this nexus of madness. As such, and to lampoon the grotesque word &#8220;blogosphere&#8221;, this list of links is now called the Blogosphereotronomatic, bow before your new Scrabble word&nbsp;god.</p>
<p>I plucked these links from my RSS reader (<a href="http://www.rssowl.org/">RSS Owl</a>, how I love thee) however there is one category still left to add, the ubiquitous &#8220;Stuff&#8221; category which more or less defines this site as well as my life so take from that what you will. I&#8217;m hoping to tweak the design at some point and possibly add RSS feed links (as that&#8217;s usually how I read them) but that&#8217;s for another&nbsp;time.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.chaostangent.com/archives/379/feed</wfw:commentRss>
		</item>
		<item>
		<title>Deconstruction 2: JUDGEMENT DAY</title>
		<link>http://blog.chaostangent.com/archives/372</link>
		<comments>http://blog.chaostangent.com/archives/372#comments</comments>
		<pubDate>Thu, 10 Jan 2008 20:08:39 +0000</pubDate>
		<dc:creator>ChaosTangent</dc:creator>
		
		<category><![CDATA[Code]]></category>

		<guid isPermaLink="false">http://blog.chaostangent.com/archives/372</guid>
		<description><![CDATA[Attacking those &#8220;random&#8221; files a couple of days ago provided enough of a challenge to keep me interested for a few hours, especially as it seemed like I was treading new ground in terms of spec&#8217;ing out previously unexplored file formats. It turned out that the files had already been mapped and successfully decompressed and [...]]]></description>
			<content:encoded><![CDATA[<p>Attacking those &#8220;random&#8221; files a <a href="http://blog.chaostangent.com/archives/370">couple of days ago</a> provided enough of a challenge to keep me interested for a few hours, especially as it seemed like I was treading new ground in terms of spec&#8217;ing out previously unexplored file formats. <a href="http://blog.chaostangent.com/archives/370#comments">It turned out</a> that the files had already been mapped and successfully decompressed and the only thing left to do was build an unpacker which was in the pipeline. It seemed my work wasn&#8217;t exactly fruitless but other, probably smarter people had everything under control. I wasn&#8217;t about to let that stop me though.<span id="more-372"></span></p>
<p><em>Note (2008-01-11): The full (official?) SDK for this file format <a href="http://yaneurao.hp.infoseek.co.jp/yaneSDK2nd/">has been located</a> which includes both a packer and an unpacker as well as other tools I&#8217;m sure are useful for working on the file format. The full name of the file format is &#8220;Yaneurao&#8221; with the SDK going by the nomenclature of &#8220;yaneSDK&#8221; which is the stem for the file format signature of &#8220;yanepkDx&#8221;. There is already a <a href="http://yanesdkdotnet.sourceforge.jp/">.NET version of the SDK</a> so if you&#8217;re interested in my deconstruction process then read on, otherwise I would recommend using the official/fully-featured&nbsp;SDKs.</em></p>
<p>The compression format was identified as <abbr title="Lempel-Ziv-Storer-Szymanski">LZSS</abbr> and reading through <a href="http://sekai.insani.org/archives/24">several</a> <a href="http://oldwww.rasip.fer.hr/research/compress/algorithms/fund/lz/lzss.html">sites</a> revealed that some of the data I had initially spotted but attributed to SHIFT JIS (or at one point a Unicode Byte Order Marker, perfect for a non-Unicode file) were the tell-tale signatures of LZSS; the gradual degradation into junk data was also typical of the algorithm as the further into the file the stream progresses, the more back references are&nbsp;present.</p>
<p><img src="/stuff/deconstruction/06.png" width="382" height="20" alt="" /><br />
While I hadn&#8217;t heard of LZSS, it came as no surprise that it was a modified version of <a href="http://en.wikipedia.org/wiki/LZ77">LZ77</a> which I had come across before though never toyed with. Having to <a href="http://www.cs.duke.edu/courses/spring03/cps296.5/papers/ziv_lempel_1977_universal_algorithm.pdf">dig through a dense PDF</a> was not my idea of fun and my university days had proven that reading academic proofs rarely led to workable implementations for me, so I <a href="http://www.google.co.uk/search?q=lzss+php">searched for a ready-made PHP version</a> which (for reasons which will soon become glaringly apparent) didn&#8217;t prove fruitful. After coming up against dead ends with other languages I settled on the <a href="http://www.koders.com/c/fidC554142F5E42CA3433CD4C8B9043D09C8A092DF8.aspx">de facto C version</a>, which most other versions I found seemed to be based&nbsp;on.</p>
<p>Ignoring my <a href="/stuff/deconstruction/deconstructor.zip">original deconstruction script</a> for the moment, I worked on the assumption that each individual file contained within the large .dat files was individually compressed, given that each file had a readable opening section of bytes which (according to the LZSS spec) wouldn&#8217;t yet have any back references. Like with other implementations of algorithms I didn&#8217;t fully understand, I copied the C code more or less exactly, altering formatting to my tastes and altering code to take into account any PHP idioms that I could foresee. Checking things over, I pumped in one of the compressed files and, unsurprisingly, the output file was more or less blank. After rechecking the code and running it again, the output file was once again filled with spaces and some sporadic junk bytes that didn&#8217;t look&nbsp;familiar.</p>
<p>The script wasn&#8217;t even outputting the uncompressed data at the beginning of the file and the output was larger than the input but still not the size flagged in the original .dat files. After scratching my head for a while I set about spitting out some debug data to pinpoint what had gone wrong and where. The algorithm is broken down into roughly three main sections, in two main control structures. Putting in some basic output formatting to check each section was executing proved that each section was being run in a way that I could only assume was&nbsp;correct:</p>
<p><img src="/stuff/deconstruction/07.png" width="520" height="144" alt="" /></p>
<p>This assumption of course turned out to be false but I wouldn&#8217;t realise this until later in the day. The LZSS algorithm uses a number of constants to define things such as the size of the sliding buffer window, maximal reference length and minimal reference length (a change from the LZ77 algorithm to prevent the encoding being longer than the original) so I tweaked the values first with sensible then ridiculous values only to have the script spit out similarly broken output. The C algorithm also had several places where it used hex values to do bitwise operations, converting these to decimal (obviously) proved ineffective and I was ready by now to admit that I was stumped. I had been working on it for a while now so I took a break for lunch, during which I decided to ditch the C algorithm and start from scratch so that I actually understood what was going&nbsp;on.</p>
<p>This proved even more torturous so I switched back to my original script and started spitting out some fairly detailed output including: the section of the algorithm, the current byte location in the stream, the hex value of the most pertinent read byte and the binary value of that&nbsp;byte.</p>
<p><img src="/stuff/deconstruction/08.png" width="520" height="144" alt="" /></p>
<p>This more or less nailed down that the entire implementation was broken, the values it was generating from the very beginning were incorrect which of course meant all the back references and so forth were incorrect. Using the binary output and a bit of paper I worked out what the values were <em>supposed</em> to be and started following the values through the algorithm. This part was absolutely essential to working out what was wrong with the implementation of the algorithm as it elucidated what each part&nbsp;did:</p>
<ol>
<li>The first section (which I had termed &#8220;FLAGS&#8221;) worked out whether a byte was a control byte and set a flags&nbsp;variable</li>
<li>The second section (which I had termed &#8220;AND1&#8221;) assumed it was reading a raw byte and simply wrote it to the output stream (and the&nbsp;buffer).</li>
<li>The third section (which I had termed &#8220;CONTROL&#8221;) read the two control bytes which formed a back reference and then read the appropriate data from the buffer and subsequently the&nbsp;output.</li>
</ol>
<p>From my output it was apparent the meat of the algorithm, reading the raw data, wasn&#8217;t being done. Then, in that moment of lucid elation, I realised exactly what was going&nbsp;wrong.</p>
<p>PHP was grabbing a byte from the input file as a string, and being a loosely-typed language meant that when it came to doing bitwise operations, the underlying type was incredibly important. I&#8217;m more than willing to admit that this state of affairs was my own damn fault for prototyping this in a language that wasn&#8217;t built for algorithms and bit level operations and had I done this in a strongly-typed language, everything would have been dandy. Of course, had I simply dumped the implementation I found into a C file and compiled away, I wouldn&#8217;t really understand what was going on, so my retardedness didn&#8217;t go to&nbsp;waste.</p>
<p>Long story short, forcing the read bytes into an integer using the ord() function (and intval() just to make sure) solved the issue and the file I was working on transformed before my&nbsp;eyes.</p>
<p><img src="/stuff/deconstruction/09.png" width="520" height="144" alt="" /></p>
<p><img src="/stuff/deconstruction/10.png" width="520" height="150" alt="" /></p>
<p>Almost.</p>
<p>Turns out what &#8220;sage&#8221; had said in my comments on the original version of my unpacker was slightly wrong, the sliding window wasn&#8217;t 256 bytes (0x100) but the standard LZSS implementation window size of 4096 bytes which means that nothing really needed to be changed from the standard C implementation of the algorithm. As a proof of&nbsp;concept:</p>
<p><a href="/stuff/deconstruction/chara_init_third.xml.lzss">Sample LZSS compressed file</a>, <a href="/stuff/deconstruction/chara_init_third.xml">Sample uncompressed&nbsp;file</a></p>
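<p>For anyone following along, the three sections listed earlier can be sketched roughly as follows. This is written in JavaScript rather than PHP so it runs as-is under Node, and it is not the deconstructor script itself. The constants (4096-byte window, 18-byte maximum match, threshold of 2), the space-filled starting buffer and the bit layout of the two control bytes are assumed from the commonly circulated LZSS.C reference code, and lzssDecode is a made-up name. Because the input is a Uint8Array, every byte is already an integer, which sidesteps the string problem entirely:</p>

```javascript
// Assumed constants, following the commonly circulated LZSS.C conventions.
const WINDOW_SIZE = 4096; // size of the sliding window (ring buffer)
const MAX_MATCH = 18;     // longest back reference
const THRESHOLD = 2;      // a reference always copies more bytes than this

// Hypothetical decoder: takes a Uint8Array, returns a Uint8Array.
function lzssDecode(input) {
  const ring = new Uint8Array(WINDOW_SIZE).fill(0x20); // window pre-filled with spaces
  let ringPos = WINDOW_SIZE - MAX_MATCH;
  const output = [];
  let pos = 0;
  let flags = 0;

  // Every decoded byte goes to the output and into the window.
  const emit = function (byte) {
    output.push(byte);
    ring[ringPos] = byte;
    ringPos = (ringPos + 1) % WINDOW_SIZE;
  };

  while (true) {
    // "FLAGS": once the eight flag bits are used up, read a new flag byte.
    flags = flags >> 1;
    if ((flags & 0x100) === 0) {
      if (pos >= input.length) break;
      flags = input[pos++] | 0xff00; // high byte counts down the eight bits
    }
    if ((flags & 1) === 1) {
      // "AND1": a set flag bit means the next byte is a raw literal.
      if (pos >= input.length) break;
      emit(input[pos++]);
    } else {
      // "CONTROL": two bytes hold a 12-bit window offset and a 4-bit length.
      if (pos + 1 >= input.length) break;
      const low = input[pos++];
      const high = input[pos++];
      const offset = low | ((high & 0xf0) << 4);
      const length = (high & 0x0f) + THRESHOLD + 1;
      for (let k = 0; k < length; k++) {
        emit(ring[(offset + k) % WINDOW_SIZE]);
      }
    }
  }
  return Uint8Array.from(output);
}

// Three literals ("abc") followed by one back reference to those three bytes.
const sample = Uint8Array.from([0x07, 0x61, 0x62, 0x63, 0xee, 0xf0]);
console.log(String.fromCharCode(...lzssDecode(sample))); // abcabc
```

<p>The hand-built sample is only there to exercise both branches; it has not been checked against the real .dat contents.</p>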
<p>So I now present version 1.1 of the deconstructor script which is released under the same <a href="http://creativecommons.org/licenses/by/2.0/">Creative Commons Attribution 2.0 Generic&nbsp;license</a>.</p>
<p><a href="/stuff/deconstruction/deconstructor1.1.zip">deconstructor1.1.zip&nbsp;(1.4KB)</a></p>
<p>The usage is exactly the same:<br />
<code>php deconstructor.php data1.dat&nbsp;output\</code></p>
<p>The only difference will be the output spat out by the script which will tell you when a file has been decompressed and whether it succeeded or failed (done by checking the canonical size in the .dat file versus the output&nbsp;size).</p>
<p><strong>To-do</strong><br />
At the moment the script outputs a file to a temporary name and then operates on that file. This isn&#8217;t optimal but I was having trouble getting my implementation to work in-stream, probably due to fatigue. I may or may not fix that for the PHP version as the next step is to drop the entire deconstructor into a C or C++ file and do a native compile so you don&#8217;t have to mess around with PHP and I feel like I&#8217;ve developed something in a big-boys&#8217; language. If I get the time and the inclination I may do that over the&nbsp;weekend.</p>
<p>As well as the unpacker, I get the feeling that the <a href="http://blog.seiha.org/">friend</a> who this is a favour for will require a repacker which will obviously mean doing the LZSS algorithm in reverse and also bundling everything into a .dat file. Should be an intriguing challenge to see if I&#8217;ve learned anything from this little&nbsp;endeavour.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.chaostangent.com/archives/372/feed</wfw:commentRss>
		</item>
		<item>
		<title>Deconstruction</title>
		<link>http://blog.chaostangent.com/archives/370</link>
		<comments>http://blog.chaostangent.com/archives/370#comments</comments>
		<pubDate>Mon, 07 Jan 2008 20:41:36 +0000</pubDate>
		<dc:creator>ChaosTangent</dc:creator>
		
		<category><![CDATA[Code]]></category>

		<guid isPermaLink="false">http://blog.chaostangent.com/archives/370</guid>
		<description><![CDATA[Out of curiosity and a favour to someone, I decided to take a look at some random .dat files that were ripe for the translating; what ensued was a morning of head scratching, hex scrying and using some of the lesser used PHP functions.
Sample File 1, Sample File 2, Sample File&#160;3
All screenshots taken from data1.data, [...]]]></description>
			<content:encoded><![CDATA[<p>Out of curiosity and a <a href="http://blog.seiha.org/">favour to someone</a>, I decided to take a look at some random .dat files that were ripe for the translating; what ensued was a morning of head scratching, hex scrying and using some of the lesser used PHP functions.<span id="more-370"></span></p>
<p><a href="http://blog.chaostangent.com/stuff/deconstruction/data1.dat">Sample File 1</a>, <a href="http://blog.chaostangent.com/stuff/deconstruction/data3.dat">Sample File 2</a>, <a href="http://blog.chaostangent.com/stuff/deconstruction/data5.dat">Sample File&nbsp;3</a></p>
<p><em>All screenshots taken from data1.dat, sample file 1 and the window is resized for the most appropriate screenshot rather than general&nbsp;workability.</em></p>
<p>First thing I did was to crank open the lovely XVI32 hex editor and have a look at the sample files provided, their .dat extension more or less indicated they were a proprietary format and were unlikely to relinquish their secrets easily. What was known was that the files contained a header portion, a bundle of XML files in a contiguous stream and a lot of junk data. The XML files could be seen and their encoding was stated as SHIFT JIS and, after cursing its existence, I attributed the junk data to that which seemed like a good place to&nbsp;start.</p>
<p><img src="/stuff/deconstruction/01.png" width="320" height="20" alt="" /><br />
The first eight bytes seemed to be a file signature, but <a href="http://www.google.com/search?q=yanepk">Google</a> <a href="http://www.google.com/search?q=yanepkdx">searches</a> for all or parts of the signature were fruitless which meant it was time to pick things&nbsp;apart.</p>
<p><img src="/stuff/deconstruction/02.png" width="320" height="20" alt="" /><br />
The next four bytes were different for each file. At first I thought they were part of the block format that made up the header, but the repetition of the header sections didn&#8217;t match up. After converting the value to a variety of different number formats (I&#8217;m no hex wizard and I originally thought it was only a two byte short rather than a four byte integer or long), I assumed it was an unsigned long (32 bits) in little-endian&nbsp;order.</p>
<p><img src="/stuff/deconstruction/03.png" width="443" height="438" alt="" /><br />
The pattern of the next section repeated a number of times until the point where the embedded XML files obviously started. After a bit of byte counting and &#8220;duh&#8221; moments, the general format of the section&nbsp;is:</p>
<p><code>256 bytes - file path and name<br />
4 bytes - unsigned long<br />
4 bytes - unsigned long<br />
4 bytes - unsigned&nbsp;long</code></p>
<p>At a total of 268 bytes for each block, this layout repeats for precisely the number of times specified by the very first unsigned long (after the file signature). So the entire header block consists&nbsp;of:</p>
<p><code>8 bytes - signature "yanepkDx"<br />
4 bytes - number of header entries<br />
(number of header entries * 268 bytes) - header&nbsp;entries</code></p>
<p>This was all well and good but didn&#8217;t really illuminate exactly what the three numbers were. After pulling out all the entries, a few things became&nbsp;clear:</p>
<ul>
<li>The first number in each block increases for each successive&nbsp;block</li>
<li>The second number was always larger than or equal to the third&nbsp;number</li>
<li>The first number plus the third number always equalled the first number of the block immediately after the current&nbsp;one</li>
</ul>
<p>So without resorting to rocket science: the first number is the absolute byte offset of the file&#8217;s data within the .dat, the second number was a bit of a mystery, and the third number is the length in bytes of that data. After pushing this info through a script it became obvious this was the de facto format of the file; no complex tree structures or other nasties were awaiting. The XML files were pulled out without problem and within a few minutes their original file structure was&nbsp;recreated.</p>
<p>All done, right? Wrong. My initial thought that the XML files were SHIFT JIS encoded was indeed correct, however it didn&#8217;t explain the junk that proliferated in <strong>some</strong> of the&nbsp;files.</p>
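<p>As a rough illustration of the layout described above, the header could be read in PHP with unpack(). This is only a sketch and not the contents of deconstructor.php: the function name readDatHeader and the array keys offset, unknown and length are made up for this example, and it assumes the 256 byte path field is padded with NUL&nbsp;bytes.</p>
<pre><code>// Hypothetical helper: parse the header of a "yanepkDx" archive held in $data
function readDatHeader($data)
{
	if(substr($data, 0, 8) !== 'yanepkDx')
	{
		return false; // not the signature we expect
	}

	// 'V' is an unsigned 32 bit integer in little-endian byte order
	$header = unpack('Vcount', substr($data, 8, 4));
	$entries = array();

	for($i = 0; $i &lt; $header['count']; $i++)
	{
		// each entry is 268 bytes: a 256 byte path followed by three unsigned longs
		$block = substr($data, 12 + ($i * 268), 268);
		$entry = unpack('a256path/Voffset/Vunknown/Vlength', $block);
		$entry['path'] = rtrim($entry['path'], "\0"); // assumption: NUL padded
		$entries[] = $entry;
	}

	return $entries;
}</code></pre>
<p>Under the same assumptions, the raw bytes for an entry would then be substr($data, $entry['offset'],&nbsp;$entry['length']).</p>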
<p><a href="/stuff/deconstruction/arive_boss_plane.xml" rel="text/xml">Sample un-junked file</a>, <a href="/stuff/deconstruction/chara_growth.xml" rel="text/xml">Sample junked file</a><br />
<img src="/stuff/deconstruction/04.png" width="520" height="150" alt="" /><br />
Trying to shift the format into different encodings using known functions only seemed to jumble the junk around rather than get rid of it. It now became apparent that the data was more than likely compressed or otherwise encoded which illuminated what the mysterious second number was in each of the header blocks. The third number represented the packed size of the data, the second represented the unpacked size; this was obvious as the smaller, un-junked files had the same values for each, usually less than 150&nbsp;bytes.</p>
<p><img src="/stuff/deconstruction/05.png" width="520" height="144" alt="" /><br />
Running both the individual files and the larger .dat file through various decompressors proved less than useful, as most of the time the output was so garbled that it sent a few hundred bell tones to my computer speaker, making it sound like it was having a seizure. I tried various versions and functions of the gzip/zlib library, bzip2, LHA (which I knew the Japanese were particularly fond of) and of course good old fashioned zip. It stood to reason that the compression wasn&#8217;t going to be processor intensive (very few game compression schemes are), which more or less ruled out predictive algorithms (PPM et al) as well as the ACE and 7z formats. The files also seemed to lack any form of dictionary entries: for each file the XML declaration was always intact, which meant that the compression seemed to start an arbitrary length into the file (which would explain why the smaller files were&nbsp;untouched).</p>
<p>This is unfortunately as far as I got after a morning&#8217;s work, a decent amount of which was spent attempting to track down information. The game the files come from is <a href="http://blog.seiha.org/?p=92">Battle Moon Wars Act 3</a> and it seems to use TYPE MOON characters; other games featuring them have been successfully translated, which may be one avenue to investigate. The <a href="http://en.wikipedia.org/wiki/Battle_Moon_Wars">developers of the game are &#8220;Werk&#8221;</a> and if any of their other games (either in the series or otherwise) have been pulled apart, that may give some indication as to how to go forward. There does seem to be information in someone&#8217;s brain, as not only was an <a href="/stuff/deconstruction/data1unpacked.dat">&#8220;unpacked&#8221; version of data1.dat unearthed</a>, but <a href="http://forums.visualnews.net/showthread.php?t=11925">forum</a> <a href="http://nrvnqsr.proboards20.com/index.cgi?action=display&#038;board=doujin&#038;thread=1124787854&#038;page=3">posts</a> indicate that work had already begun (if not already been aborted) on the technical side of&nbsp;things.</p>
<p>For today at least I&#8217;m done with attempting to reverse-engineer arbitrary files and perhaps after sleeping on it some bright idea will be revealed to me that daylight failed to illuminate. For now there is the command line PHP script I quickly prototyped to deconstruct the .dat files (released under the <a href="http://creativecommons.org/licenses/by/2.0/">Creative Commons Attribution 2.0 Generic License</a>) and the promise of further work in the&nbsp;future:</p>
<p><a href="/stuff/deconstruction/deconstructor.zip">deconstructor.zip&nbsp;(1KB)</a></p>
<p>Things should be self explanatory from the file; get a command line PHP interpreter set up and run &#8220;deconstructor.php&#8221; with the name of the file to tear apart and optionally an output folder e.g.<br />
<code>php deconstructor.php data1.dat&nbsp;output\</code></p>
<p>This is an open call to anyone who wants to help scry the encoding/compression of the XML files: whether you already know the answer or just want to take a stab at it, you are more than&nbsp;welcome.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.chaostangent.com/archives/370/feed</wfw:commentRss>
		</item>
		<item>
		<title>Upcoming and updates</title>
		<link>http://blog.chaostangent.com/archives/324</link>
		<comments>http://blog.chaostangent.com/archives/324#comments</comments>
		<pubDate>Mon, 15 Oct 2007 09:17:25 +0000</pubDate>
		<dc:creator>ChaosTangent</dc:creator>
		
		<category><![CDATA[Anime]]></category>

		<category><![CDATA[Code]]></category>

		<category><![CDATA[Stuff]]></category>

		<guid isPermaLink="false">http://blog.chaostangent.com/archives/324</guid>
		<description><![CDATA[I feel I&#8217;ve stuck to my &#8220;post a day&#8221; routine for a while, unfortunately that has dried up due to running out of anime series that I was watching; the only one remaining is Umisho which I&#8217;m awaiting the last episode of before writing a review. I did consider reviews of the two movies I&#8217;ve [...]]]></description>
			<content:encoded><![CDATA[<p>I feel I&#8217;ve stuck to my &#8220;post a day&#8221; routine for a while, unfortunately that has dried up due to running out of anime series that I was watching; the only one remaining is Umisho which I&#8217;m awaiting the last episode of before writing a review. I did consider reviews of the two movies I&#8217;ve recently watched: Paprika and 5cm a Second, which would have been done Saturday and Sunday had I not been drained on Saturday after completing the seminal Half Life 2: Episode 2 and uninspired on Sunday after watching the cripplingly mediocre Resident Evil: Extinction.<span id="more-324"></span></p>
<p>I have tweaked a couple of items in the backend. The first is the <abbr title="Really Simple Syndication">RSS</abbr> category links; as I have been known to post about whatever I feel like, taxonomy feeds are a good way of ignoring the general rambling I tend to do while squiffy or bored. The problem was that my permalink structure on Wordpress <a href="http://trac.wordpress.org/ticket/4550">caused some problems</a> with the automatic feed links which made me hold off implementing them; <a href="http://trac.wordpress.org/changeset/6100">the solution</a> was simple and worked a charm. The other change was the aggregating of very old entries in the archive list; this was something that I had done when I programmed my own abortive attempt at a blog system but wasn&#8217;t something Wordpress did natively. I did think of putting it into a plugin but I considered that overkill as I was already monkeying around with the guts of Wordpress. I&#8217;d made the decision to stick on 2.2.3 and not upgrade to 2.3 until I was sure all the bugs in 2.3 had been rooted out; taking a quick glance at the <a href="http://trac.wordpress.org/report/3">trac listing</a> for open tickets for 2.3.1 and beyond reveals very little that would stop me, however I&#8217;d been burned by the near weekly 2.2.x updates before. The reason there are very old entries is due to the merging of my old <a href="http://www.deadjournal.com">DeadJournal</a> entries with this blog; primarily for posterity but (as I mention in the <a href="http://blog.chaostangent.com/archives/category/yeoldedeadjournal">category description</a>, another new addition) also to remind me of what my writing <em>used</em> to be&nbsp;like.</p>
<p>As far as future entries are concerned, I&#8217;ll finish my draft review of Paprika and start one on 5cm a Second which should provide enough time to start a new feature: the three episode taste test. While this anime season doesn&#8217;t look nearly as spectacular as the last one, it has a number of shows which may hold interest. Even with most series now on two episodes it&#8217;s pretty obvious which ones I&#8217;m going to stick with, however three is enough to get over the budget burn of the first and languid story/character drive of the second. At some point I&#8217;m also hoping to finish off my <a href="http://blog.chaostangent.com/archives/category/cuba-2k7">Cuba write up</a>, although I need to retroactively populate previous entries with photos (already done for days <a href="http://blog.chaostangent.com/archives/45">one</a>, <a href="http://blog.chaostangent.com/archives/46">two</a>, <a href="http://blog.chaostangent.com/archives/54">three</a> and <a href="http://blog.chaostangent.com/archives/55">four</a>) and continue writing; each entry can take close to two hours to write and check which is basically the entire&nbsp;evening.</p>
<p>There are other things on the horizon as well with <a href="http://chaostangent.com/">chaostangent.com</a> in general. I recently attended the <a href="http://www.futureofwebapps.com/">Future of Web Apps</a> conference in London which was a bit of a mixed bag but generally awesome. Apart from the highly anti-social laptop usage and people continually trying to sell me things, most all of the developer talks I attended were excellent, the stand out one was by <a href="http://ejohn.org/">John Resig</a> of the <a href="http://www.mozilla.org/">Mozilla Corporation</a> which was both very informative and interesting. One thing the conference did do was give me some ideas for a project which has been sitting on the back burner for a while now. I&#8217;ve also been fiddling with Wordpress more and more having decided to stick with 2.2.3 for the time being so I can hopefully start integrating some of my ideas that I&#8217;d originally written off due to restrictions on the plugin&nbsp;architecture.</p>
<p>You can consider this post to be an interstitial between your (sometimes) regularly scheduled program, just to indicate that I&#8217;m not going to slip into another four week&nbsp;lull.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.chaostangent.com/archives/324/feed</wfw:commentRss>
		</item>
		<item>
		<title>Silver Air</title>
		<link>http://blog.chaostangent.com/archives/52</link>
		<comments>http://blog.chaostangent.com/archives/52#comments</comments>
		<pubDate>Tue, 12 Jun 2007 21:09:55 +0000</pubDate>
		<dc:creator>ChaosTangent</dc:creator>
		
		<category><![CDATA[Code]]></category>

		<guid isPermaLink="false">http://blog.chaostangent.com/archives/52</guid>
		<description><![CDATA[With news that Adobe had recently released and renamed their &#8220;Apollo&#8221; project to &#8220;AIR&#8221;, I decided that it was perhaps time to discover exactly what Adobe was incubating. In terms of software and companies, there are none which I follow with any amount of zeal; if a piece of software delivers on promises of being [...]]]></description>
			<content:encoded><![CDATA[<p>With news that Adobe had recently released and renamed their &#8220;Apollo&#8221; project to &#8220;<a href="http://labs.adobe.com/technologies/air/">AIR</a>&#8221;, I decided that it was perhaps time to discover exactly what Adobe was incubating. In terms of software and companies, there are none which I follow with any amount of zeal; if a piece of software delivers on promises of being better than what I currently use then I have no qualms about switching. As such, I hadn&#8217;t followed Adobe very closely since their merger with Macromedia, and was only dimly aware of their release of&nbsp;CS3.</p>
<p>It surprised me then that AIR, the Adobe Integrated Runtime, seemed such a departure from what Adobe had released previously. Essentially it is a virtual machine that allows developers to use HTML, CSS and Javascript or Flash and Flex to develop &#8220;<a href="http://en.wikipedia.org/wiki/Rich_internet_applications">Rich Internet Applications</a>&#8221; for the desktop. That&#8217;s the corporate rhetoric, and it took me the best part of a morning to figure out exactly what AIR was trying to do and the rest of the day wondering <em>Why?</em><span id="more-52"></span></p>
<p>AIR is nothing more than a virtual machine in the same vein as the <a href="http://www.java.com/getjava">Java Virtual Machine</a>: it provides a central target for development and, in theory, will allow programs developed for it to run identically across all platforms which support it. On paper this is an admirable goal (Java&#8217;s maligned &#8220;<a href="http://www.answers.com/topic/write-once-run-anywhere">Write once, debug anywhere</a>&#8221; idiom notwithstanding); it&#8217;s not until you dig deeper that you realise just how ridiculous AIR appears. Ignoring the jargon and the acronym soup, AIR allows HTML, Javascript and Flash developers to create applications for your desktop; the end result is a program which installs and runs from your computer instead of the vicious wilds of the internet. The question is raised really as to why I would <em>want</em> a Flash developer to go anywhere near my desktop: a pantheon reserved for people with some semblance of UI design which the vast majority (not all) of Flash developers&nbsp;lack.</p>
<p>Both the <a href="http://labs.adobe.com/wiki/index.php/AIR:Applications:Samples">sample</a> and <a href="http://labs.adobe.com/showcase/air/">showcase</a> applications are less than stellar, with such awe inspiring developments as an RSS reader! A desktop bookmarks organiser! An online storage solution! Or how about a good old fashioned map? What about some eternally useful desktop widgets? Testing them out I could only think: five years ago these would have been merely&nbsp;interesting.</p>
<p>&#8220;Rich Internet Application&#8221; seems like some kind of multilayered oxymoron, instantly conjuring up a swathe of gradients and pithy animations, married with protracted jargon like &#8220;mash-up&#8221; and &#8220;web 2.0&#8221; all the while robbing me of my standardised UI interactions. The web is only just taking its first formative steps into this field of greater interactivity; we&#8217;re still testing this through a <em>browser</em>, not an <em>experiencer</em>. Until both the technologies and mindsets involved mature, we certainly don&#8217;t need to bring this frontier pursuit to the&nbsp;desktop.</p>
<p>Of course, the one good thing to come from this distasteful look at where Adobe is heading is the chance to engage with some of the other technologies both Adobe and others have developed. <a href="http://www.adobe.com/products/flex/">Flex</a> was next on the list and after once again cutting through the advertising spiel and marketing jargon, it turned out to be a set of UI components for Flash. Obviously Flex has come a long way from the Server/Client setup that the late <a href="http://www.macromedia.com/">Macromedia</a> used, and with a neat <a href="http://en.wikipedia.org/wiki/MXML">XML system</a>, Flex attempts to bring Flash to application developers. Once again I fail to see why this is a good thing. Apart from the speed of prototyping, I can think of no Application Developer worth their title who would seriously consider Flash over more capable, accessible and mature&nbsp;technologies.</p>
<p>So two for two, Adobe seem to be pushing technologies which when isolated, are advanced and progressive, but seem to focus more on the wow than the&nbsp;why.</p>
<p>This trail ultimately led to the Microsoft &#8220;Flash-killer&#8221;, <a href="http://silverlight.net/">Silverlight</a>. Essentially Silverlight is a combination of both Flex and Flash in a far more, if you&#8217;ll forgive the pun, <em>flexible</em> coating. Silverlight attempts to offer a genuine alternative to standard applications with a still evolving set of UI features but with the capacity for tried-and-tested programming languages to control them. Fundamentally, Silverlight suffers from the same downfalls as an AIR/Flash/HTML combo: accessibility and speed. However it seems that Microsoft has a far better grasp of their goal (a better form of online application) than&nbsp;Adobe.</p>
<p>Some of the <a href="http://arstechnica.com/news.ars/post/20070501-microsofts-flash-killer-steals-the-show-at-mix07.html">criticism levelled</a> at Microsoft over Silverlight includes their lack of adherence to standards such as <a href="http://www.w3.org/Graphics/SVG/">SVG</a>. Ordinarily I would agree with this given the ongoing fiasco (to put it lightly) with Internet Explorer; however Silverlight and its competitors are still very much emerging technologies and it&#8217;s only the disparate parts which have been standardised rather than the entire package. Microsoft is a paid-up member of the <a href="http://www.w3.org">W3C</a> and could well have had a hand in those standards; but with increasingly vocal barbs being thrown about as to the practices of the W3C, it makes one wonder whether the standards are actually worth anything. Microsoft once said that just being a standard does not make it the best way to do things; this may seem like a spoiled brat reinventing the wheel (how&#8217;s that for metaphor mixing?) but in a small number of cases, this is very much true. You can be just as much hindered by those standards as empowered by&nbsp;them.</p>
<p>And then of course, at the very end of the rainbow was the recently released <a href="http://gears.google.com/">Google Gears</a> which claimed to be able to run web applications offline. Once again, I fail to see the point in this beyond mobile devices, but kudos to Google for making the implementation rhetoric free and refreshingly simple to&nbsp;explain.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.chaostangent.com/archives/52/feed</wfw:commentRss>
		</item>
		<item>
		<title>PHP request routing</title>
		<link>http://blog.chaostangent.com/archives/47</link>
		<comments>http://blog.chaostangent.com/archives/47#comments</comments>
		<pubDate>Sat, 09 Jun 2007 17:13:00 +0000</pubDate>
		<dc:creator>ChaosTangent</dc:creator>
		
		<category><![CDATA[Code]]></category>

		<guid isPermaLink="false">http://blog.chaostangent.com/archives/47</guid>
		<description><![CDATA[Request routing in PHP is when you take away a degree of URL control from your web-server and hand it over to your PHP application. Ordinarily, an http URI points to a file or directory, whereby the file structure of your publicly accessible area dictates the URI&#8217;s which are available. When a web server can&#8217;t [...]]]></description>
			<content:encoded><![CDATA[<p>Request routing in PHP is when you take away a degree of URL control from your web-server and hand it over to your PHP application. Ordinarily, an http URI points to a file or directory, whereby the file structure of your publicly accessible area dictates the URIs which are available. When a web server can&#8217;t find a file, it throws a 404 and presents an error page. Request routing doesn&#8217;t remove the file structure URIs (if properly configured) but supplements the available URI space with one which can be controlled through&nbsp;PHP.</p>
<p>The benefits for doing this versus the work taken to achieve it weigh heavily on how much you value &#8220;nice&#8221; URLs in terms of memorability and search engine snuggliness. Request routing allows for arbitrary URLs, however, it adds an overhead to nearly every page request made to your application and also means your PHP system now needs to deal with the 404 error messages that the web server would have otherwise transparently sent. Despite its downsides and moderately complex methodology, for large or commercial systems, the ability to tightly control the URLs can greatly aid extensibility and maintainability.<span id="more-47"></span>The only <a href="http://www.phpaddiction.com/tags/axial/url-routing-with-php-part-one/">other</a> <a href="http://www.phpaddiction.com/tags/php/url-routing-with-php-part-two/">article</a> I&#8217;ve read on the subject seemed to get the idea but it lacked a lot of nice functionality. The system mentioned in those articles can easily be achieved with some Apache mod_rewrite trickery without the hassle of involving PHP. Indeed, the system proposed allows for only one type of URL and doesn&#8217;t touch on the 404 handling. An ideal way would be the ability to route arbitrary as well as tokenised URLs. If you&#8217;ve programmed with <a href="http://www.rubyonrails.org/">RoR</a> or <a href="http://www.cakephp.org/">Cake</a> or even (to a certain extent) the grand-daddy of frameworks, <a href="http://www.fusebox.org/">Fusebox</a>, then this is all probably sounding very&nbsp;familiar.</p>
<p>The way your PHP system is set up will ultimately define which tokens are available, but most modern OO frameworks follow the MVC pattern, which means a request for most pages will be sent to a <strong>controller</strong> (or <strong>command</strong>) which will undertake an <strong>action</strong> on a particular set of arguments, usually a unique identifier,&nbsp;<strong>id</strong>.</p>
<p><strong>Getting the URL into&nbsp;PHP</strong></p>
<p>Before all that however, we need to get PHP to deal with requests which involves alleviating our webserver of the task. For a standard Apache, mod_rewrite and .htaccess&nbsp;system:</p>
<pre><code>RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]</code></pre>
<p>If a request does not reference a file and does not reference a directory, pass it to index.php. The same task can probably be done on lighttpd using either url.redirect or url.rewrite-repeat. IIS apparently doesn&#8217;t do request based conditions out-of-the-box and an <a href="http://www.iismods.com/url-rewrite/index.htm">additional</a> <a href="http://www.isapirewrite.com/">ISAPI</a> <a href="http://www.qwerksoft.com/products/iisrewrite/">module</a> must be&nbsp;used.</p>
<p>In this example the request is passed to index.php, this is because I use a single entry point for my applications (which has a number of benefits such as application hiding, strict error checking and so forth), however this can easily be redirected to any script within your system; and as products like <a href="http://www.wordpress.org">Wordpress</a> have shown, this approach works in a multi-file as well as single entry point&nbsp;system.</p>
<p>With the URL now handed over, the first task is to strip down the URL into only the part we&#8217;re interested in. This means peeling off any query strings or base-directories which may have come along for the ride. I&#8217;ve come to the conclusion that there is no compact way of achieving this, and even the code snippet below omits some of the thornier&nbsp;details:</p>
<pre><code>$parsedRequestURI = parse_url($_SERVER['REQUEST_URI']);
$requestURI = (!empty($parsedRequestURI['path'])) ? $parsedRequestURI['path'] : $_SERVER['REQUEST_URI'];
$baseDir = dirname($_SERVER['SCRIPT_NAME']);

if(substr($requestURI, 0, strlen($baseDir)) == $baseDir)
{
	$requestURI = substr($requestURI, strlen($baseDir));
}</code></pre>
<p>If you&#8217;re using Windows, you&#8217;ll need to either find a different method for calculating the base directory or make sure the path separators are a forward slash not a back slash (a simple str_replace would suffice). There isn&#8217;t anything particularly taxing in that snippet and at the end of it all, $requestURI will have the URI that we want to focus&nbsp;on.</p>
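<p>A minimal sketch of that str_replace approach follows. The helper name stripBaseDir is hypothetical, and it assumes the request path has already been pulled out with parse_url as in the snippet&nbsp;above.</p>
<pre><code>// Hypothetical helper: remove the base directory from the front of a request path
function stripBaseDir($requestPath, $scriptName)
{
	// dirname() can hand back back slashes on Windows, so normalise them first
	$baseDir = str_replace('\\', '/', dirname($scriptName));
	$baseDir = rtrim($baseDir, '/'); // a root install ("/") becomes the empty string

	if($baseDir !== '' &amp;&amp; strpos($requestPath, $baseDir . '/') === 0)
	{
		return substr($requestPath, strlen($baseDir));
	}

	return $requestPath;
}</code></pre>
<p>Called as stripBaseDir('/blog/image/show/3', '/blog/index.php') this would give back '/image/show/3'; a request made against a root install is returned&nbsp;untouched.</p>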
<p><strong>Our&nbsp;routes</strong></p>
<p>The concept of a route revolves around being passed a URL and matching it to a specific part of our system. If your router isn&#8217;t going to be part of an MVC framework, then the system matching may simply be a static template or another script to run. The only part of the following explanation which needs to be adapted is what is spit out at the end for your system (or &#8220;dispatcher&#8221;) to deal&nbsp;with.</p>
<p>In my case, the route table is simply an&nbsp;array:</p>
<pre><code>$routes = array('random' =&gt; array('command' =&gt; 'image', 'action' =&gt; 'random'),
	':command/:action/:id' =&gt; '', // standard routes
	':command/:action' =&gt; '',
	':command' =&gt; '');</code></pre>
<p>The array key is the URL we wish to match against (complete with tokens) and the value is an array of options we wish to supply our system. Obviously for a fully tokenised URL, we want the options to be constructed from the URL (which sounds cyclic but bear with it) hence why those values are the empty&nbsp;string.</p>
<p><strong>The&nbsp;crunch</strong></p>
<p>We now have both parts necessary to route: the URL supplied by the user (e.g. mycommand/youraction/hisid) and a list of routes we&#8217;re going to match against. Our router has to perform one vital function: match the route given the URL. The converse of this function, get a URL from a route, will only be used internally but allows the PHP system to completely decouple itself from the concept of URLs (should it choose to do&nbsp;so).</p>
<p>The routeFromURL() function will break down our URL (and subsequently our list of URLs to match against) into parts, then iteratively attempt to match those parts until a full match is found. To break down the URL into&nbsp;parts:</p>
<pre><code>$requestedRoute = array_merge(array_filter(explode('/', urldecode($url))));</code></pre>
<p>This decodes any errant URL identifiers, splits it on the forward slash, filters out any blank entries then finally renumbers the index using the array_merge function. The renumbering of the keys is important when matching like-for-like on&nbsp;URLs.</p>
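<p>To make the effect concrete, here is a small worked example with a made-up URL. It also shows one caveat worth knowing: array_filter without a callback drops every falsy value, which includes the string &#8220;0&#8221;, so the second form (an alternative, not what the router above uses) filters on strlen&nbsp;instead.</p>
<pre><code>$url = '/image//show/42/';
$parts = array_merge(array_filter(explode('/', urldecode($url))));
// $parts is now array('image', 'show', '42')

// a part of "0" would be thrown away by the default filter; strlen keeps it
$strict = array_values(array_filter(explode('/', '/page/0/'), 'strlen'));
// $strict is now array('page', '0')</code></pre>
<p>Either form yields a zero-indexed list of parts, which is what the matching loop relies&nbsp;on.</p>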
<pre><code>foreach($this-&gt;routes AS $route =&gt; $options)
{
	if(!empty($route))
	{
		$routeParts = array_merge(array_filter(explode('/', $route)));
		$options = (!is_array($options)) ? array() : $options; // tokenised routes start out as an empty string
		$matched = false; // this specifies whether we've matched so far, if not, move on to the next route

		// routes do not have to match absolutely, only the first parts of the route can match
		if(count($routeParts) &lt;= count($requestedRoute))
		{
			for($i = 0; $i &lt; count($routeParts); $i++)
			{
				// specialised matcher parts of the route
				if(substr($routeParts[$i], 0, 1) == ':')
				{
					$special = substr($routeParts[$i], 1);
					switch($special)
					{
						case 'command':
						case 'action':
						case 'id':
						case 'extra':
							$options[$special] = strtolower($requestedRoute[$i]);
							break;
					}
					$matched = true;
				}
				elseif($routeParts[$i] == $requestedRoute[$i])
				{
					$matched = true;
				}
				else
				{
					$matched = false;
					break;
				}
			}

			if($matched)
			{
				return $options;
			}
		}
	}
}</code></pre>
<p>The iterations may be a little hard to follow in that snippet, essentially: for each route in our route table &gt; for each part of that route &gt; if it matches the equivalent part of the supplied URL &gt; go on to the next part ELSE break and go on to the next route in our&nbsp;table.</p>
<p>At the end of this we&#8217;ll have an $options array which details our command (controller), action, id and anything extra. The comments in the code indicate that this isn&#8217;t a strict matcher and that the specificity of a match matters. All this means is that, passed a URL of &#8220;this/is/a/route&#8221;, if the route table contains a match for &#8220;this/is&#8221;, this will match even if there is <em>another </em>route in the table after it (&#8220;this/is/a&#8221;) which is more specific. The easiest way to deal with this is either to strictly match the number of parts in the passed URL to the matched&nbsp;URL:</p>
<pre><code>if(count($routeParts) &lt;= count($requestedRoute))</code></pre>
<p>becomes:</p>
<pre><code>if(count($routeParts) == count($requestedRoute))</code></pre>
<p>Or to simply order your route table with your most specific routes first. I&#8217;ve not had a problem with this behaviour, however your circumstances may dictate exact matching as a&nbsp;requirement.</p>
<p>The token aspect of this system allows for very flexible routes to be constructed which may be wholly tokenised (such as &#8220;:action/:id/:command&#8221;), a hybrid (&#8220;image/show/:id&#8221;) or arbitrary (&#8220;show/me/the/money&#8221;). The number of tokens you can use is entirely customisable; the use of a switch statement prevents a large if block but for a large number of possible tokens, you may want to examine a better look-up system (perhaps store the tokens in an&nbsp;array).</p>
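<p>One way the array look-up could look is sketched below. The function name applyToken and the $knownTokens variable are invented for illustration; this is an alternative to the switch, not code taken from the router&nbsp;itself.</p>
<pre><code>// Hypothetical replacement for the switch: the known tokens live in one array
$knownTokens = array('command', 'action', 'id', 'extra');

function applyToken(array $options, $routePart, $urlPart, array $knownTokens)
{
	$token = substr($routePart, 1); // drop the leading colon
	if(in_array($token, $knownTokens, true))
	{
		$options[$token] = strtolower($urlPart);
	}
	return $options; // unknown tokens leave the options untouched
}</code></pre>
<p>Adding a new token then means adding one entry to the array rather than another case to the&nbsp;switch.</p>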
<p>Obviously in a production environment you&#8217;ll be doing error checking either within your router or in your dispatcher to make sure what is passed is sane. The dispatcher (where the $options array means something in the context of your system) is the place where 404 errors will be triggered as it&#8217;s only at that point when it becomes apparent if your system can handle the request or not. To trigger a 404 you need to send headers along with your PHP&nbsp;script:</p>
<pre><code>header('HTTP/1.0 404 Not Found');</code></pre>
<p>The content passed to the browser should be a valid page describing the error; obviously this is the place where you can provide the ability for customisable 404 pages or even initiate a search to try and discern what the user was looking&nbsp;for.</p>
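<p>A bare-bones sketch of such a response is below. The function name notFoundPage, the markup it returns and the example path are all placeholders; the one firm requirement is that the header goes out before any&nbsp;output.</p>
<pre><code>// Hypothetical helper: build a small page describing the error
function notFoundPage($requestedPath)
{
	// the path comes from the user, so escape it before echoing it back
	$safePath = htmlspecialchars($requestedPath, ENT_QUOTES);
	return '&lt;h1&gt;Page not found&lt;/h1&gt;&lt;p&gt;Nothing lives at ' . $safePath . '&lt;/p&gt;';
}

$requestedPath = '/no/such/page'; // placeholder for the URI your router was given

header('HTTP/1.0 404 Not Found'); // must be sent before any output
echo notFoundPage($requestedPath);</code></pre>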
<p><strong>getURLFromRoute</strong></p>
<p>As mentioned before, the inverse of the function is to provide the router with a route and get back a URL. This allows your system to redirect pages or provide contextual URLs within pages. On the surface it sounds like a reasonably simple task; in practice it proves to be much more&nbsp;vexing.</p>
<p>Essentially we wish to match the route arrays to one another, and the route array which matches will then be our URL. The tokens muddle this slightly; however, the main iteration is the same as in&nbsp;getRouteFromURL:</p>
<pre><code>$specials = array('command','action','id','extra');
foreach($this-&gt;routes AS $route =&gt; $options)
{
	// if there are specials within the route, substitute them into the options array
	if(strpos($route, ':') !== FALSE)
	{
		$options = (!is_array($options)) ? array() : $options;
		foreach($specials AS $special)
		{
			if(strpos($route, ":{$special}") !== FALSE)
			{
				$options[$special] = (array_key_exists($special, $requestedRoute)) ? $requestedRoute[$special] : '';
			}
		}
	}

	if($requestedRoute == $options)
	{
		// swap each ":key" token in the route for the matching option value
		$replace = array_values($options);
		$search = array_map(create_function('$a', 'return ":{$a}";'), array_keys($options));

		$url = str_replace($search, $replace, $route);
		return $url;
	}
}</code></pre>
<p>The function works as follows: iterate through the route table &gt; if there are specials for a route, copy their values from the requested route into our route array &gt; if our constructed route array matches the requested route array, replace any tokens and return the URL, otherwise cycle to the next route in the&nbsp;table.</p>
<p>Perhaps the only confounding part of that code is the presence of a lambda-style function, which simply prefixes each key of the $options array with a colon for easier use with&nbsp;str_replace.</p>
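<p>To make the lambda step concrete, here is a small standalone sketch of the token substitution on its own. The route and option values are made up for illustration, and it assumes PHP 5.3 or later, using an anonymous function in place of create_function (which later PHP versions&nbsp;removed):</p>
<pre><code>// illustrative values only: a tokenised route and the options to fill it with
$route = 'image/:action/:id';
$options = array('action' =&gt; 'show', 'id' =&gt; '42');

// prefix each option key with a colon so it matches the token in the route
$search = array_map(function ($key) { return ":{$key}"; }, array_keys($options));
$replace = array_values($options);

$url = str_replace($search, $replace, $route);
echo $url; // image/show/42</code></pre>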
<p><strong>Conclusion</strong></p>
<p>As you can see, a lot of code, logic and pitfalls go into providing a tokenised routing system, and this hasn&#8217;t even touched on the dispatching part of it (which can open a new can of worms with tasks like class Reflection and file existence&nbsp;checking).</p>
<p>Ordinarily something like this would go against my usual PHP coding methodology: less is better. I am however convinced that in the correct circumstances, the benefits can far outweigh the 0.03 seconds this adds to your execution time. Perhaps I&#8217;m becoming softer in my old&nbsp;age&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.chaostangent.com/archives/47/feed</wfw:commentRss>
		</item>
		<item>
		<title>It&#8217;s not a joke if you can choke on the thought of it</title>
		<link>http://blog.chaostangent.com/archives/40</link>
		<comments>http://blog.chaostangent.com/archives/40#comments</comments>
		<pubDate>Tue, 10 Apr 2007 21:25:16 +0000</pubDate>
		<dc:creator>ChaosTangent</dc:creator>
		
		<category><![CDATA[Code]]></category>

		<guid isPermaLink="false">http://blog.chaostangent.com/archives/40</guid>
		<description><![CDATA[I was trying to explain to a friend some of the gripes that I had with PHP&#8217;s gung-ho approach to object-orientation and launched into an extended description of my biggest bugbear: class inheritance. At the time I was slightly less than sober (as is so often the case with important discussions and me) so in [...]]]></description>
			<content:encoded><![CDATA[<p>I was trying to explain to a friend some of the gripes that I had with PHP&#8217;s gung-ho approach to object-orientation and launched into an extended description of my biggest bugbear: class inheritance. At the time I was slightly less than sober (as is so often the case with important discussions and me) so in the aftermath I began questioning what I had said, especially as it was a while ago I last tried to push past the&nbsp;&#8220;bug&#8221;.</p>
<p>First, an explanation. If you have an abstract class you can define abstract methods, which must be fleshed out in any non-abstract subclasses, but you can also define some standard methods (with bodies) which any subclasses will happily take with them. The easiest way to explain this is to just dump the code and go from there:<span id="more-40"></span></p>
<p><em>Edit: This has essentially been &#8220;solved&#8221; now and boiled down to me not reading the documentation. The behaviour explained below is not a bug: when get_class is called without an argument from inside a method, it returns the name of the class in which that method is defined. To get the &#8220;real&#8221; class name of an instance, simply pass $this to the get_class&nbsp;function.</em></p>
<pre><code>abstract class AbstractClass
{
	public function methodOne()
	{
		echo get_class()."\n";
	}
}

class ConcreteClass extends AbstractClass
{
	public function methodTwo()
	{
		echo get_class()."\n";
	}
}

$conc = new ConcreteClass();
$conc-&gt;methodOne();
$conc-&gt;methodTwo();</code></pre>
<p>This is as simple as it gets: one abstract class, one &#8220;Concrete&#8221; class (because Concrete is cool kids) and two method invocations. Now those methods <em>should</em> output &#8220;ConcreteClass&#8221; twice; however, methodOne() outputs &#8220;AbstractClass&#8221; and methodTwo() outputs &#8220;ConcreteClass&#8221;. Looking at the <a href="http://uk.php.net/manual/en/function.get-class.php">definition for &#8220;get_class&#8221;</a> reveals that what it returns is technically true ($conc is indeed an instance of AbstractClass) but this functionality is the opposite side of&nbsp;useful.</p>
<p>When I tried this back at the beginning of 2006, I cursed and chalked it up to PHP being PHP and got on with things. This simple case isn&#8217;t earth-shattering and can be worked around by putting a static function in AbstractClass which accepts an object and gets the class from that but that sort of roundabout route mangles an otherwise elegant&nbsp;solution.</p>
<p>It&#8217;s not hard to imagine a situation for this: an abstract &#8220;Model&#8221; class with methods for automagically building up SQL functions (pulling the name of the instantiating class as the name of the database table). No longer can you call $myModel-&gt;create() as the get_class() function would only return &#8220;Model&#8221; as the class rather than &#8220;MyModel&#8221;. The &#8220;solution&#8221; to this is, as mentioned above, to use static methods in Model, so $myModel-&gt;create() becomes Model::create($myModel) which is syntactically more convoluted and the static-charge of the method could easily cause problems further down the&nbsp;line.</p>
<p>When explaining this, I began to wonder whether I had gone space-crazy and perhaps my original implementation of this suffered from being too complex. Or perhaps it was indeed a bug and had been fixed. Neither was true. But then I wondered whether I was assuming something of PHP which perhaps didn&#8217;t hold true in other languages; perhaps I was imagining the functionality I&nbsp;wanted&#8230;</p>
<p><strong>Ruby</strong>:</p>
<pre><code>class AbstractClass
	def method_one
		puts self.class.to_s
	end
end

class ConcreteClass &lt; AbstractClass
	def method_two
		puts self.class.to_s
	end
end

concrete = ConcreteClass.new
concrete.method_one
concrete.method_two</code></pre>
<p><strong>Java</strong>:</p>
<pre><code>public abstract class AbstractClass
{
	public void methodOne()
	{
		System.out.println(this.getClass().getName());
	}
}

public class ConcreteClass extends AbstractClass
{
	public void methodTwo()
	{
		System.out.println(this.getClass().getName());
	}
}

public class Bimble
{
	public static void main(String[] args)
	{
		ConcreteClass conc = new ConcreteClass();
		conc.methodOne();
		conc.methodTwo();
	}
}</code></pre>
<p>Both of these dirt-simple examples print out &#8220;ConcreteClass&#8221; twice and prove I wasn&#8217;t going crazy (at least not degenerating any further than is normal). I don&#8217;t really have much to say about the situation. PHP has always been a bastard child of OO and scripting, but it&#8217;s foibles like these which build up from irksome to&nbsp;tiresome.</p>
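<p>For completeness, here is a rough sketch of the fix from the edit note at the top, applied to the &#8220;Model&#8221; scenario. The tableName() method and the lower-casing are my own illustrative assumptions, not part of any real&nbsp;framework:</p>
<pre><code>abstract class Model
{
	// hypothetical method: derive a table name from the instantiating class
	public function tableName()
	{
		// get_class($this) reports the class of the instance itself, whereas
		// get_class() with no argument reports the class the method is defined in
		return strtolower(get_class($this));
	}
}

class MyModel extends Model
{
}

$myModel = new MyModel();
echo $myModel-&gt;tableName()."\n"; // mymodel</code></pre>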
]]></content:encoded>
			<wfw:commentRss>http://blog.chaostangent.com/archives/40/feed</wfw:commentRss>
		</item>
		<item>
		<title>As a sign on my skin</title>
		<link>http://blog.chaostangent.com/archives/39</link>
		<comments>http://blog.chaostangent.com/archives/39#comments</comments>
		<pubDate>Sat, 24 Mar 2007 21:31:39 +0000</pubDate>
		<dc:creator>ChaosTangent</dc:creator>
		
		<category><![CDATA[Code]]></category>

		<guid isPermaLink="false">http://blog.chaostangent.com/archives/39</guid>
		<description><![CDATA[I have met my nemesis; its name is character encoding. The idea that when programming, the physical memory size of a string does not necessarily equate to how many characters (or &#8220;glyphs&#8221; to use the vernacular) are within that text&#160;string.
The antagonist in this comedy of errors is not ignorance of the situation, but knowledge of [...]]]></description>
			<content:encoded><![CDATA[<p>I have met my nemesis; its name is character encoding. The idea that when programming, the physical memory size of a string does not necessarily equate to how many characters (or &#8220;glyphs&#8221; to use the vernacular) are within that text&nbsp;string.</p>
<p>The antagonist in this comedy of errors is not ignorance of the situation, but knowledge of the ineptitude surrounding it. In theory, Unicode is the great equaliser, the One Ring and so forth. In practice, however, things are different. For web projects I boiled down the dilemma into four places where Things Can Go Wrong. The first is the web-page markup itself: nestled cosily in the &lt;head&gt; tag is the content-type, oft forgotten and left to Dreamweaver to assign. This is the encoding that is passed to your programming language of choice, which is the second choke point. If all you do is pass the string to your database you may just be able to pull off the perfect murder; if, however, you wish to do any kind of modification to the string, then your programming language needs to know how to deal with the character encoding or how to convert to an encoding it can use. Once you&#8217;ve spent enough time with the jesters, it&#8217;s time to pass things over to your database which, just to be pedantic, has two points of failure. The first is the connection encoding, whereby the database tries to convert whatever it <strong>stores</strong> the data in to a suitable encoding for its client; and then there&#8217;s the minor issue of how your string is stored within the&nbsp;database.</p>
<p>Projects that don&#8217;t need to worry about languages other than English can well ignore character sets entirely and pretend that everything is buttercups and puppies in the world of ASCII. Move even a little though, even to Roman-based alphabets like French or Swedish and things break down. The ideal, the blue-sky pie would be UTF-8 from start to finish and back again, but this article wouldn&#8217;t exist if it were that&nbsp;simple.</p>
<p>The reality of the situation is that browsers are like panes of glass, you fit them correctly and you don&#8217;t need to worry about them. As long as you&#8217;re not using a database of antiquity then character encoding within the database is solid, the &#8220;major&#8221; databases have you covered. It&#8217;s a shame then that it all breaks down with the programming language/environment especially the scripting&nbsp;languages.</p>
<p>I&#8217;ve been informed that Perl, the most distinguished of scripting languages (what other language allows a power-user to be called a &#8220;monk&#8221;?), has everything neatly arranged and ready for surgery. Python shares a similar preparedness. It&#8217;s unfortunate then that PHP (the greatest scripting language evar!) and Ruby (the new greatest scripting language evar!) are so arse-backward despite their&nbsp;popularity.</p>
<p>PHP I don&#8217;t necessarily blame for its inadequacy; it&#8217;s something that I&#8217;ve come to expect from the Quasimodo bell-ringing approach it takes: volume over grace. Sure, you have the multi-byte string module which is not included by default, or the iconv conversion module, again not included by default, or the perpetually in development PHP 6. These are just plasters over a gaping tumour that is the lack of built-in character encoding&nbsp;support.</p>
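<p>A tiny sketch of the byte-versus-character mismatch, assuming the optional multi-byte string module happens to be loaded (the sample word is just an&nbsp;illustration):</p>
<pre><code>// "café" with the é written as its explicit UTF-8 bytes (0xC3 0xA9),
// so the example does not depend on how this source file is encoded
$word = "caf\xC3\xA9";

echo strlen($word)."\n"; // 5, because strlen counts bytes

// assumes the optional mbstring extension is available
if(function_exists('mb_strlen'))
{
	echo mb_strlen($word, 'UTF-8')."\n"; // 4, because mb_strlen counts characters
}</code></pre>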
<p>Ruby on the other hand, I held so much hope for. The fervent few claimed it would do so much for so little investment and yet where the lack of character encoding support grew from stagnation and sloth-like speed with PHP, Ruby has just omitted to deal with it entirely. As a language it is actively developed and yet once again character encoding support is resigned to labyrinthian workarounds and obscure modules (or &#8220;mixins&#8221; as I&#8217;ve been commanded to call&nbsp;them).</p>
<p>Were I a petty man I would blame the programmers of yore for their lack of foresight, but when ASCII was developed most computers had trouble rendering more colours than I have teeth so forgiveness I give retrospectively. What I can&#8217;t fathom is why two widely (read: cosmopolitan) used languages would be so flaccid when it comes to &#8220;simple&#8221; text. It&#8217;s not that I can read Chinese, or Japanese, or Korean, or Swedish or even French, but it irks me that were I wont to, so many obstacles stand in the&nbsp;way.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.chaostangent.com/archives/39/feed</wfw:commentRss>
		</item>
	</channel>
</rss>

