The Slow Lane

A blog about autocrossing, some geeky stuff & Philadelphia.

Earth Hour is a failure.

This was too funny not to repost. Apparently yesterday everyone was supposed to turn off the lights for one hour as a show of concern about climate change. Anthony Watts reports that out in California, Earth Hour did not make a dent in the electrical demand load reported by CA-ISO, California’s electrical grid operator. In that blog post is a quote from outspoken global warming skeptic Ross McKitrick.

I don’t want to go back to nature. Haiti just went back to nature. For humans, living in “Nature” meant a short life span marked by violence, disease and ignorance.
…through the use of pollution control technology and advanced engineering, our air quality has dramatically improved since the 1960s despite the expansion of industry and the power supply. If, after all this, we are going to take the view that the remaining air emissions outweigh all the benefits of electricity, and that we ought to be shamed into sitting in darkness for an hour, like naughty children
…then we are setting up unspoiled nature as an absolute, transcendent ideal that obliterates all other ethical and humane obligations. No thanks. I like visiting nature but I don’t want to live there, and I refuse to accept the idea that civilization is something to be ashamed of.

This sums up a lot of how I feel about the global warming/carbon issue. McKitrick said it a bit more artfully than I would have been able to. Original credit for posting this quote and linking to McKitrick’s reaction goes to the No Frakking Consensus Blog.

We’re in!

Well, all my complaining in my last post may be taken to be null and void considering what I’m about to say next. But I stand by my position. The first-time homebuyer tax credit is/was creating artificial spikes in home prices. All you have to do is look at how home sales and home prices fell in December. But now we are part of the club. We are homeowners! And this is the main reason for my lack of posting for the last three months.

The homebuying experience was interesting and stressful, and we learned some things to look out for next time. But we have closed! We are 75% moved. And lots of repairs and painting are under way. Hopefully I’ll be able to make smaller posts along the way as we fix some of the issues with the house and mold it into our vision. Now that I have finally installed the Wordpress2 iPhone app, which actually works with my self-hosted blog, that will be easier!

More homebuyer tax credits? No thanks!

You must be thinking “is this guy crazy?” An incentive for buying a house, who doesn’t want that? Well, I don’t. And up until a week ago I was trying to take advantage of the current first-time homebuyer tax credit. Key word there is trying. You see, my wife and I have discovered there is quite a lot of competition out there for homes in the “first time buyer price range.” And competition drives up prices. People are clamoring to take advantage of the credit. You would think that the credit would help. But in the end it hurts.

My wife and I aren’t the only ones to think so. There are other articles whose authors also feel this is bad. I will just tell our personal story. That all starts with a trip to the lawyer’s office. Our apartment lease doesn’t end until April 30th, and if we closed on November 30th, the last day of the credit, we’d be on the hook for 5 months of rent. So half of the tax credit gone, straight away. The lawyer cut a deal with the apartment complex and we were off shopping. But we weren’t the only ones. Twice we ran into situations where agents had scheduled showings for the same or overlapping times. Both times there wasn’t one other potential buyer there, but two! And both homes stayed on the market less than a week. We live in Pennsylvania, but the same thing is happening in Santa Clarita, CA.

We ended up making three offers in the span of two months. And we lost out on two of those offers. Both were at asking price! We had been watching lots of HGTV and none of those people paid asking! In one case there was an escalation clause involved! My agent was beside himself both times. So you may be asking, what happened to the third? We were under contract on that third house. The home inspection revealed a few big-ticket items that needed to be addressed. And the homeowner didn’t want to give us any money towards repairs. This homeowner was quite “hostile” during negotiations. But I also got a sense that they felt we could suck it up since we were getting free government money. Well, it isn’t free if we have to use it to rebuild the falling garage we paid for.

This quote from the Calculated Risk Blog sums up the market situation nicely.

This level of first-time buyers is completely unsustainable – even if another tax credit is enacted. There was significant pent up demand from potential first-time buyers who were priced out of the market in 2004-2006, and then were afraid to buy as prices fell. But demand from these buyers will wane.

We were part of that pent up demand. Before the market “collapsed” we wondered if we would ever be able to afford a home. Well even after the collapse we still can’t apparently.

Here are some other points to ponder. Others have pointed out that a credit such as this, especially if it is extended to everyone, will raise the price of homes by that amount. And that is true. But who does that benefit? Not the buyer. If that $8,000 credit goes towards your down payment, and you put 10% down, that allows you to buy $80,000 more house. Although you were able to afford more house you have a larger mortgage. You have to pay some of that money back, compounded with interest. Your mortgage company likes that. Yeah those same guys that marketed their “toxic”, bound to fail mortgages to others, that got us into this whole mess. And we will all have to pay for this program via taxes. Meanwhile the seller just walked away with $80,000 more than they would have been able to get otherwise. Let’s not forget to mention the people trying to defraud the system. So write or call your senators and tell them no.

Performance of static vs. instantiated method calls

As you saw in my past post, I am working on filtering user input into my PHP application. I don’t want to get too much into the boring details, because I started to write the post explaining all the little details and I could see it getting very long, drawn out and unfocused. But I was experimenting with calling the filter functions as static methods of a class. Then I thought about making objects of the class and calling the method on the instantiated object. I wanted to know if there was a performance difference between the two, so I created a test. And there is.

For the test I am using my Filter_UTF8 class, discussed a little in my last blog post. I am calling the validate method. This is not a “hello world” type of test. The method does some heavy lifting and/or calculating. For all the tests I would call the method 10,000 times, to validate a ~1,200 kB text file. The same file would be validated over and over again.

The first test was to use call_user_func_array to call the method. This took 10 to 11 seconds to run.

$iteration = 10000;
$i = 0;
while ($i < $iteration) {
    $ret = call_user_func_array(array('Filter_UTF8', 'validate'), 
        array($text, 4096));
    $i++;
}

Next was creating an object, calling the method, and then destroying the object. I did it this way because it simulates one of the common design patterns for doing filters: a collection object holds a bunch of filter instances, one for each value of the form. Then when “validation” is run, each one is called to do its thing, the form is processed, and they are all destroyed once the form data is saved or the form is re-rendered. So each instance is created, run once (or twice if you have a getMessage() type function), and then destroyed. I feel this test is close to how the above design pattern would work on a large scale.

$iteration = 10000;
$i = 0;
while ($i < $iteration) {
    $my = new Filter_UTF8();
    $ret = $my->validate($text, 4096);
    $i++;
    $my = null;
}

The results were surprising. It took 33 seconds for this to run. Eeek!

I thought maybe the act of creating and destroying all those objects was causing the slowdown. So I created a third test that creates one instance and calls the validate() method 10,000 times.

$iteration = 10000;
$i = 0;
$my = new Filter_UTF8();
while ($i < $iteration) {
    $ret = $my->validate($text, 4096);
    $i++;
}

I was really surprised when this took the same 33 seconds as creating 10,000 instances did. The crappy thing is, creating a bunch of instances is easier than trying to manage calling them statically, unless you want to type out each filter call in a bunch of if/else statements (I’m trying to do an automated form type of thing). I just can’t believe the performance difference. You wouldn’t notice the difference on a single page hit, even one with 100 of these calls. But if your script had 100 people all doing the same thing at once, that ends up being a big difference.
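For anyone who wants to repeat the comparison, here is a minimal timing sketch. Filter_UTF8 isn’t shown in this post, so it uses a stand-in class, Demo_Filter, with a trivial static validate(). That class, the time_calls() helper and the sample text are all made up for illustration, so the numbers it prints will not match the ones above. Each variant is wrapped in a closure, which adds the same small overhead to all three.

```php
<?php
// Stand-in for the real filter class, which is not shown in the post.
class Demo_Filter
{
    // Hypothetical check: true when $text is no longer than $maxLength bytes.
    public static function validate($text, $maxLength)
    {
        return strlen($text) <= $maxLength;
    }
}

// Run $callback $iterations times and return the elapsed wall-clock seconds.
function time_calls($callback, $iterations)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $callback();
    }
    return microtime(true) - $start;
}

$text = str_repeat('a', 1024);
$iterations = 10000;

// Variant 1: static call through call_user_func_array.
$viaCallUserFunc = time_calls(function () use ($text) {
    return call_user_func_array(array('Demo_Filter', 'validate'), array($text, 4096));
}, $iterations);

// Variant 2: a fresh instance for every call.
$viaNewEachTime = time_calls(function () use ($text) {
    $my = new Demo_Filter();
    return $my->validate($text, 4096);
}, $iterations);

// Variant 3: one instance reused for every call.
$my = new Demo_Filter();
$viaOneInstance = time_calls(function () use ($my, $text) {
    return $my->validate($text, 4096);
}, $iterations);

printf("call_user_func_array: %.4f s\n", $viaCallUserFunc);
printf("new instance per call: %.4f s\n", $viaNewEachTime);
printf("single instance:       %.4f s\n", $viaOneInstance);
```

Swapping in a real filter class and a real text file should only require changing the class name and the $text assignment.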

UTF-8 Validation and PHP, do you?

I wonder why none of the major PHP frameworks have validators for UTF-8 encoding? You may be asking why do I need to validate incoming text as UTF-8? UTF-8 is the preferred character encoding for the web, if you want to display languages other than the Latin derived ones. All of the browsers support it. And so do all the major databases. I won’t get into the basics about what UTF-8 is and why you should use it. There are plenty of other resources for that. I’m also going to assume you are sending the correct charset header (in HTTP, not relying on a meta tag). And that your DB tables and connections are declared to use UTF-8. That stuff is well covered elsewhere also.

I still haven’t really told you why you would want to validate incoming text as UTF-8, I’ve only told you why you should use it. And the simple answer is the old security mantras of all input is evil and Filter Input and Escape Output. If this text is input I want to be validating it, no? The W3C recommends that you validate UTF-8 text. WACT does too. It’s been proven that you can launch an XSS attack on a site using “incorrectly” encoded text. Chris’ example used GBK encoding, but I think you can do the same thing with UTF-8. Is it immune?

I’ve been looking at a lot of example code looking for answers. The “major” PHP frameworks I looked at were Zend Framework, CakePHP, Symfony, Solar, Codeigniter, and Kohana. None of them include any validators for UTF-8, or any other text encoding. I looked at some other smaller frameworks but they were lucky to have validators at all.

I also looked at a few of the big mainstream PHP projects: Joomla, MediaWiki, phpBB, and Wordpress. Of them, Joomla contains the library PHP UTF-8 by Harry Fuecks. The purpose of this library is to provide “native” PHP multibyte string functions when mbstring isn’t loaded on your server (and presumably you can’t do anything about it because you are on shared hosting). In the back of this library are three functions. The first actually validates UTF-8 and returns true or false. The second converts UTF-8 to its Unicode code points, returned as an array. And the last converts that array back to UTF-8. The last two are used in the library’s utf8_strtolower() and utf8_strtoupper() functions, but are otherwise unused by Joomla. The first function, called utf8_is_valid(), is the one I am most interested in, and it is not used at all. Interestingly, Kohana includes this same library, but they rearranged the files and function names, and stripped out utf8_is_valid().

MediaWiki and phpBB both use a set of functions to normalize UTF-8 data strings. This goes beyond just validating the byte stream. It is also recommended to normalize UTF-8 so it sorts properly and consistently, and that is probably why these two packages do it. Both, especially MediaWiki, count on being able to search strings well. But it also seems to be compute intensive. For what it’s worth, it appears phpBB borrowed MediaWiki’s code and refactored it.

The last of our quartet, Wordpress, contains a function to do a basic check of whether a stream is UTF-8. However, in practice this seems to be more UTF-8 detection than validation. The function, called seems_utf8(), allows 5 and 6 byte sequences, which were apparently in the early UTF-8 versions, before the Unicode consortium decided to limit the code point range to U+10FFFF, making anything over 4 bytes unnecessary. It also does not check for the disallowed UTF-16 surrogate code points, or byte order marks. Those last two points are important because Windows, Java, and Oracle store text internally in UTF-16. So a botched conversion from one of these sources into the browser could send invalid text to your PHP application. I don’t know the ins and outs of copying and pasting from one of these sources to a browser. I assume a conversion happens but don’t know where.
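To make those gaps concrete, here is a rough sketch of what a stricter byte-level check could look like. It is not the Wordpress code and not the PHP UTF-8 code. The function name is_strict_utf8() is made up for this example. It rejects 5 and 6 byte sequences, overlong forms, surrogate code points and anything above U+10FFFF. It does not treat a UTF-8 byte order mark specially, so a leading EF BB BF passes as the ordinary code point U+FEFF.

```php
<?php
// Hypothetical helper: true only if $bytes is well-formed UTF-8
// with code points limited to U+0000..U+10FFFF and no surrogates.
function is_strict_utf8($bytes)
{
    $length = strlen($bytes);
    $i = 0;
    while ($i < $length) {
        $lead = ord($bytes[$i]);
        if ($lead < 0x80) {                          // plain ASCII
            $i++;
            continue;
        }
        if ($lead >= 0xC2 && $lead <= 0xDF) {        // 2-byte sequence
            $extra = 1;
            $min = 0x80;
        } elseif ($lead >= 0xE0 && $lead <= 0xEF) {  // 3-byte sequence
            $extra = 2;
            $min = 0x800;
        } elseif ($lead >= 0xF0 && $lead <= 0xF4) {  // 4-byte sequence
            $extra = 3;
            $min = 0x10000;
        } else {
            // Stray continuation byte, overlong lead (C0/C1),
            // or a lead byte for 5/6-byte forms and beyond (F5..FF).
            return false;
        }
        if ($i + $extra >= $length) {
            return false;                            // sequence is cut off
        }
        // Keep only the payload bits of the lead byte.
        $codePoint = $lead & (0x3F >> $extra);
        for ($k = 1; $k <= $extra; $k++) {
            $next = ord($bytes[$i + $k]);
            if (($next & 0xC0) !== 0x80) {
                return false;                        // not a continuation byte
            }
            $codePoint = ($codePoint << 6) | ($next & 0x3F);
        }
        if ($codePoint < $min) {
            return false;                            // overlong encoding
        }
        if ($codePoint >= 0xD800 && $codePoint <= 0xDFFF) {
            return false;                            // UTF-16 surrogate
        }
        if ($codePoint > 0x10FFFF) {
            return false;                            // beyond the Unicode range
        }
        $i += $extra + 1;
    }
    return true;
}

var_dump(is_strict_utf8("caf\xC3\xA9"));          // bool(true)
var_dump(is_strict_utf8("\xED\xA0\x80"));         // bool(false), surrogate U+D800
var_dump(is_strict_utf8("\xF8\x88\x80\x80\x80")); // bool(false), 5-byte form
```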

Getting back to the utf8_is_valid() function in PHP UTF-8, this is essentially what I have in mind for my CMS. The code in that function comes from another small library written by Henri Sivonen of validator.nu fame. He provides a function to convert a UTF-8 string to Unicode code points, returned in an array, and another one to convert the array to a UTF-8 string. Sound familiar? This is where the PHP UTF-8 library got its code that does the same thing. I stumbled onto this library through the WACT site, and ended up coding essentially the same thing Harry Fuecks did. I also made a “sanitizer” version that “deconstructs” the byte stream and throws out all the bad byte sequences without the intermediate step of the array. It just concatenates the good byte sequences into a new string.
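To give a feel for what a sanitizer does, here is a small sketch. It takes a different route from the one described above: instead of a hand-written byte loop, it uses a regular expression to pick out the well-formed sequences, so it does build an intermediate array of matches. The function name strip_invalid_utf8() is made up for this example, and the pattern is just one way of writing down the well-formed byte ranges.

```php
<?php
// Hypothetical helper: return $bytes with every byte that is not part of
// a well-formed UTF-8 sequence removed.
function strip_invalid_utf8($bytes)
{
    // No /u modifier, so PCRE works on raw bytes.
    $wellFormed = '/[\x00-\x7F]+'                // runs of ASCII
        . '|[\xC2-\xDF][\x80-\xBF]'              // 2-byte, no overlongs
        . '|\xE0[\xA0-\xBF][\x80-\xBF]'          // 3-byte, no overlongs
        . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'   // 3-byte, general case
        . '|\xED[\x80-\x9F][\x80-\xBF]'          // 3-byte, no surrogates
        . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'       // 4-byte, no overlongs
        . '|[\xF1-\xF3][\x80-\xBF]{3}'           // 4-byte, general case
        . '|\xF4[\x80-\x8F][\x80-\xBF]{2}/';     // 4-byte, up to U+10FFFF

    if (preg_match_all($wellFormed, $bytes, $matches) === false) {
        return '';                               // regex engine error
    }
    // Glue the good sequences back together; bad bytes were never matched.
    return implode('', $matches[0]);
}

var_dump(strip_invalid_utf8("a\xC0\xAFb"));      // string(2) "ab"
var_dump(strip_invalid_utf8("x\xED\xA0\x80y"));  // string(2) "xy"
```

Silently dropping bytes changes the user’s text, so whether to sanitize or to reject the input outright is a separate decision.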

I haven’t gotten into the functions PHP natively provides for checking and converting character encodings. But this post is long enough so that will have to wait until another day. So for anyone in the PHP community lucky or unlucky enough to read this post, should we be validating strings to make sure they are in the encoding we think they are in?