The Slow Lane

A blog about autocrossing, some geeky stuff & Philadelphia.

We’re in!

Well, all the complaining in my last post may seem null and void considering what I’m about to say next, but I stand by my position: the first-time homebuyer tax credit is/was creating artificial spikes in home prices. All you have to do is look at how home sales and home prices fell in December. But now we are part of the club. We are homeowners! And this is the main reason for my lack of posting for the last three months.

The homebuying experience was interesting and stressful, and we learned some things to look out for next time. But we have closed! We are 75% moved, and lots of repairs and painting are under way. Hopefully I’ll be able to make smaller posts along the way as we fix some of the issues with the house and mold it into our vision. Now that I’ve finally installed the Wordpress2 iPhone app, which actually works with my self-hosted blog, that will be easier!

More homebuyer tax credits? No thanks!

You must be thinking “is this guy crazy?” An incentive for buying a house, who doesn’t want that? Well, I don’t. And up until a week ago I was trying to take advantage of the current first-time homebuyer tax credit. The key word there is trying. You see, my wife and I have discovered there is quite a lot of competition out there for homes in the “first-time buyer price range.” And competition drives up prices. People are clamoring to take advantage of the credit. You would think that the credit would help. But in the end it hurts.

My wife and I aren’t the only ones to think so. Other articles also argue that this is bad. I will just tell our personal story. It all starts with a trip to the lawyer’s office. Our apartment lease doesn’t end until April 30th, and if we closed on November 30th, the last day of the credit, we’d be on the hook for 5 months of rent. So half of the tax credit would be gone straight away. The lawyer cut a deal with the apartment complex and we were off shopping. But we weren’t the only ones. Twice we ran into situations where agents had scheduled showings for the same or overlapping times. Both times there wasn’t one other potential buyer there, but two! And both homes stayed on the market less than a week. We live in Pennsylvania, but the same thing is happening in Santa Clarita, CA.

We ended up making three offers in the span of two months, and we lost out on two of them. Both were at asking price! We had been watching lots of HGTV and none of those people paid asking! In one case there was an escalation clause involved! My agent was beside himself both times. So you may be asking, what happened to the third? We were under contract on that third house. The home inspection revealed a few big-ticket items that needed to be addressed, and the homeowner didn’t want to give us any money towards repairs. This homeowner was quite “hostile” during negotiations. But I also got a sense that they felt we could suck it up since we were getting free government money. Well, it isn’t free if we have to use it to rebuild the falling garage we paid for.

This quote from the Calculated Risk Blog sums up the market situation nicely.

This level of first-time buyers is completely unsustainable – even if another tax credit is enacted. There was significant pent up demand from potential first-time buyers who were priced out of the market in 2004-2006, and then were afraid to buy as prices fell. But demand from these buyers will wane.

We were part of that pent-up demand. Before the market “collapsed” we wondered if we would ever be able to afford a home. Well, even after the collapse we still can’t, apparently.

Here are some other points to ponder. Others have pointed out that a credit such as this, especially if it is extended to everyone, will raise the price of homes by that amount. And that is true. But who does that benefit? Not the buyer. If that $8,000 credit goes towards your down payment, and you put 10% down, that allows you to buy $80,000 more house, since $8,000 is 10% of $80,000. Although you were able to afford more house, you have a larger mortgage. You have to pay some of that money back, with compounded interest. Your mortgage company likes that. Yeah, those same guys who marketed their “toxic”, bound-to-fail mortgages to others and got us into this whole mess. And we will all have to pay for this program via taxes. Meanwhile the seller just walked away with $80,000 more than they would have been able to get otherwise. Let’s not forget to mention the people trying to defraud the system. So write or call your senators and tell them no.

Performance of static vs. instantiated method calls

As you saw in my past post, I am working on filtering user input into my PHP application. I don’t want to get too much into the boring details, because when I started to write the post explaining them I could see it getting very long, drawn out, and unfocused. But I was experimenting with calling the filter functions as static methods of a class. Then I thought about making objects of the class and calling the method on the instance. I wanted to know if there was a performance difference between the two, so I created a test. And there is.

For the test I am using my Filter_UTF8 class, discussed a little in my last blog post. I am calling the validate method. This is not a “hello world” type of test. The method does some heavy lifting and/or calculating. For all the tests I would call the method 10,000 times, to validate a ~1,200 kB text file. The same file would be validated over and over again.

The first test was to use call_user_func_array to call the method. This took 10 to 11 seconds to run.

$iteration = 10000;
$i = 0;
while ($i < $iteration) {
    $ret = call_user_func_array(array('Filter_UTF8', 'validate'), 
        array($text, 4096));
    $i++;
}

Next was creating an object, calling the method, and then destroying the object. I did it this way because it simulates one of the common design patterns for filters: a collection object holds a filter instance for each value of the form. When “validation” is run, each one is called to do its thing, then the form is processed, and they are all destroyed once the form data is saved or the form is re-rendered. So each instance is created, run once (or twice if you have a getMessage() type function), and then destroyed. I feel this test is close to how the above design pattern would work on a large scale.

$iteration = 10000;
$i = 0;
while ($i < $iteration) {
    $my = new Filter_UTF8();
    $ret = $my->validate($text, 4096);
    $i++;
    $my = null;
}

The results were surprising. It took 33 seconds for this to run. Eeek!

I thought maybe the act of creating and destroying all those objects was causing the slowdown. So I created a third test that created one instance and called the validate() method 10,000 times.

$iteration = 10000;
$i = 0;
$my = new Filter_UTF8();
while ($i < $iteration) {
    $ret = $my->validate($text, 4096);
    $i++;
}

I was really surprised when this took the same 33 seconds as creating 10,000 instances did. Per call, that works out to roughly 1 ms through call_user_func_array versus about 3.3 ms through an instance. The crappy thing is, creating a bunch of instances is easier than trying to manage calling them statically, unless you want to type out each filter call in a bunch of if/else statements (I’m trying to do an automated form type of thing). I just can’t believe the performance difference. You wouldn’t notice the difference on a single page hit with 100 of these calls. But if 100 people were all hitting that script at once, it ends up being a big difference.
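For anyone who wants to repeat the comparison, here is a minimal timing sketch. It is not the original test: Demo_Filter is a made-up stand-in for Filter_UTF8 (whose code is not shown here), demo_time() is a hypothetical helper, and the sketch also times a direct static call, which the three tests above did not cover. It assumes PHP 7 or later, and the numbers will vary with the PHP version and the work done inside the method.

```php
<?php
// Hypothetical stand-in for Filter_UTF8; the real class is not shown in the post.
class Demo_Filter
{
    // Static and instance variants share the same trivial body, so only
    // the calling convention differs between the timed cases.
    public static function validateStatic(string $text, int $maxLength): bool
    {
        return strlen($text) <= $maxLength;
    }

    public function validate(string $text, int $maxLength): bool
    {
        return strlen($text) <= $maxLength;
    }
}

// Hypothetical helper: run $body $iterations times, return elapsed seconds.
// The closure call adds the same small overhead to every case.
function demo_time(callable $body, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $body();
    }
    return microtime(true) - $start;
}

$text = str_repeat('x', 1024);   // placeholder input, not the post's text file
$iterations = 10000;
$filter = new Demo_Filter();

$timings = [
    'call_user_func_array' => demo_time(function () use ($text) {
        call_user_func_array(['Demo_Filter', 'validateStatic'], [$text, 4096]);
    }, $iterations),
    'direct static call' => demo_time(function () use ($text) {
        Demo_Filter::validateStatic($text, 4096);
    }, $iterations),
    'new instance per call' => demo_time(function () use ($text) {
        $f = new Demo_Filter();
        $f->validate($text, 4096);
    }, $iterations),
    'one shared instance' => demo_time(function () use ($text, $filter) {
        $filter->validate($text, 4096);
    }, $iterations),
];

foreach ($timings as $label => $seconds) {
    printf("%-24s %.4f ms per call\n", $label, $seconds / $iterations * 1000);
}
```

Swapping the real class in for Demo_Filter and running the script several times should show whether the gap comes from the call style or from something inside the method itself.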

UTF-8 Validation and PHP, do you?

I wonder why none of the major PHP frameworks have validators for UTF-8 encoding? You may be asking, why do I need to validate incoming text as UTF-8? UTF-8 is the preferred character encoding for the web if you want to display languages other than the Latin-derived ones. All of the browsers support it, and so do all the major databases. I won’t get into the basics about what UTF-8 is and why you should use it. There are plenty of other resources for that. I’m also going to assume you are sending the correct charset header (in HTTP, not relying on a meta tag), and that your DB tables and connections are declared to use UTF-8. That stuff is well covered elsewhere too.

I still haven’t really told you why you would want to validate incoming text as UTF-8; I’ve only told you why you should use it. The simple answer is the old security mantras of “all input is evil” and “Filter Input and Escape Output.” If this text is input, I want to be validating it, no? The W3C recommends that you validate UTF-8 text. WACT does too. It’s been proven that you can launch an XSS attack on a site using “incorrectly” encoded text. Chris’ example used GBK encoding, but I think you can do the same thing with UTF-8. Is it immune?

I’ve been looking at a lot of example code looking for answers. The “major” PHP frameworks I looked at were Zend Framework, CakePHP, Symfony, Solar, Codeigniter, and Kohana. None of them include any validators for UTF-8, or any other text encoding. I looked at some other smaller frameworks but they were lucky to have validators at all.

I also looked at a few of the big mainstream PHP projects: Joomla, MediaWiki, phpBB, and Wordpress. Of them, Joomla contains the library PHP UTF-8 by Harry Fuecks. The purpose of this library is to provide “native” PHP multibyte string functions when mbstring isn’t loaded on your server (and presumably you can’t do anything about it because you are on shared hosting). In the back of this library are three functions. The first actually validates UTF-8 and returns true or false. The second converts UTF-8 to its Unicode code points, returned as an array. And the last converts that array back to UTF-8. The last two are used in the library’s utf8_strtolower() and utf8_strtoupper() functions, but are otherwise unused by Joomla. The first function, called utf8_is_valid(), is the one I am most interested in, and it is not used at all. Interestingly, Kohana includes this same library, but they rearranged the files and function names, and stripped out utf8_is_valid().

MediaWiki and phpBB both use a set of functions to normalize UTF-8 data strings. This goes beyond just validating the byte stream. Normalizing UTF-8 is also recommended so that text sorts properly and consistently, and that is probably why these two packages do it. Both, especially MediaWiki, count on being able to search strings well. But it also seems to be compute-intensive. For what it’s worth, it appears phpBB borrowed MediaWiki’s code and refactored it.

The last of our quartet, Wordpress, contains a function to do a basic check of whether a stream is UTF-8. However, in practice this seems to be more UTF-8 detection than validation. The function, called seems_utf8(), allows 5 and 6 byte sequences, which were apparently allowed in the early UTF-8 versions, before the Unicode consortium decided to limit the code point range to U+10FFFF, making anything over 4 bytes unnecessary. It also does not check for the disallowed UTF-16 surrogate code points, or byte order marks. Those last two points are important because Windows, Java, and Oracle store text internally in UTF-16. So a botched conversion from one of these sources into the browser could send invalid text to your PHP application. I don’t know the ins and outs of copying and pasting from one of these sources to a browser. I assume a conversion happens but don’t know where.
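To make those gaps concrete, here is a minimal sketch of a stricter check that follows the byte ranges in RFC 3629. It rejects 5 and 6 byte sequences, overlong forms, surrogates (U+D800 to U+DFFF), and anything above U+10FFFF. The name demo_utf8_is_valid() is made up for illustration; this is not the code from Wordpress or from PHP UTF-8, and it assumes PHP 7 or later for the type declarations. It does not treat a byte order mark specially, since the bytes EF BB BF are themselves well-formed UTF-8.

```php
<?php
// Hypothetical helper: true only if $bytes is well-formed UTF-8 per RFC 3629.
function demo_utf8_is_valid(string $bytes): bool
{
    $len = strlen($bytes);
    $i = 0;
    while ($i < $len) {
        $lead = ord($bytes[$i]);
        if ($lead < 0x80) {                      // plain ASCII
            $i++;
            continue;
        }
        // Pick the number of continuation bytes and the allowed range
        // for the first continuation byte, based on the lead byte.
        if ($lead >= 0xC2 && $lead <= 0xDF) {
            $need = 1; $min = 0x80; $max = 0xBF;
        } elseif ($lead === 0xE0) {
            $need = 2; $min = 0xA0; $max = 0xBF; // excludes overlong 3-byte forms
        } elseif ($lead === 0xED) {
            $need = 2; $min = 0x80; $max = 0x9F; // excludes UTF-16 surrogates
        } elseif ($lead >= 0xE1 && $lead <= 0xEF) {
            $need = 2; $min = 0x80; $max = 0xBF;
        } elseif ($lead === 0xF0) {
            $need = 3; $min = 0x90; $max = 0xBF; // excludes overlong 4-byte forms
        } elseif ($lead >= 0xF1 && $lead <= 0xF3) {
            $need = 3; $min = 0x80; $max = 0xBF;
        } elseif ($lead === 0xF4) {
            $need = 3; $min = 0x80; $max = 0x8F; // caps the range at U+10FFFF
        } else {
            return false; // stray continuation byte, 0xC0/0xC1, or 0xF5 and up
        }
        if ($i + $need >= $len) {
            return false;                        // sequence is cut off
        }
        $second = ord($bytes[$i + 1]);
        if ($second < $min || $second > $max) {
            return false;
        }
        for ($k = 2; $k <= $need; $k++) {
            $b = ord($bytes[$i + $k]);
            if ($b < 0x80 || $b > 0xBF) {
                return false;
            }
        }
        $i += $need + 1;
    }
    return true;
}
```

The two narrowed ranges, after 0xED and after 0xF4, are what reject the surrogates and the code points beyond U+10FFFF.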

Getting back to the utf8_is_valid() function in PHP UTF-8, this is essentially what I have in mind for my CMS. The code in that function comes from another small library written by Henri Sivonen of validator.nu fame. He provides a function to convert a UTF-8 string to Unicode code points, returned in an array, and another one to convert the array to a UTF-8 string. Sound familiar? This is where the PHP UTF-8 library got its code that does the same thing. I stumbled onto this library through the WACT site, and ended up coding essentially the same thing Harry Fuecks did. I also made a “sanitizer” version that “deconstructs” the byte stream and throws out all the bad byte sequences without the intermediate step of the array. It just concatenates the good byte sequences into a new string.
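The “sanitizer” idea can be sketched roughly like this. It is an illustration of the approach described above, not the actual code from my CMS or from either library: demo_utf8_seq_len() and demo_utf8_strip_invalid() are hypothetical names, the range table is taken from RFC 3629, and PHP 7 or later is assumed.

```php
<?php
// Hypothetical helper: length in bytes of the well-formed UTF-8 sequence
// starting at offset $i, or 0 if the bytes there are not well-formed.
function demo_utf8_seq_len(string $bytes, int $i): int
{
    // [lead low, lead high, continuation bytes, min 2nd byte, max 2nd byte]
    static $ranges = [
        [0xC2, 0xDF, 1, 0x80, 0xBF],
        [0xE0, 0xE0, 2, 0xA0, 0xBF],
        [0xE1, 0xEC, 2, 0x80, 0xBF],
        [0xED, 0xED, 2, 0x80, 0x9F],
        [0xEE, 0xEF, 2, 0x80, 0xBF],
        [0xF0, 0xF0, 3, 0x90, 0xBF],
        [0xF1, 0xF3, 3, 0x80, 0xBF],
        [0xF4, 0xF4, 3, 0x80, 0x8F],
    ];
    $len = strlen($bytes);
    $lead = ord($bytes[$i]);
    if ($lead < 0x80) {
        return 1;                               // ASCII
    }
    foreach ($ranges as $r) {
        list($lo, $hi, $extra, $min, $max) = $r;
        if ($lead < $lo || $lead > $hi) {
            continue;
        }
        if ($i + $extra >= $len) {
            return 0;                           // truncated sequence
        }
        $second = ord($bytes[$i + 1]);
        if ($second < $min || $second > $max) {
            return 0;
        }
        for ($k = 2; $k <= $extra; $k++) {
            $b = ord($bytes[$i + $k]);
            if ($b < 0x80 || $b > 0xBF) {
                return 0;
            }
        }
        return $extra + 1;
    }
    return 0;                                   // not a legal lead byte
}

// Hypothetical sanitizer: copy good sequences, skip bad bytes one at a time.
function demo_utf8_strip_invalid(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    $i = 0;
    while ($i < $len) {
        $n = demo_utf8_seq_len($bytes, $i);
        if ($n > 0) {
            $out .= substr($bytes, $i, $n);     // keep the whole sequence
            $i += $n;
        } else {
            $i++;                               // drop one byte, then resync
        }
    }
    return $out;
}
```

One caveat with this design: silently deleting bytes joins the text on either side of them, so a filter that runs before the sanitizer could miss a string that only appears after the cleanup. Substituting the replacement character U+FFFD instead of deleting is a commonly recommended alternative.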

I haven’t gotten into the functions PHP natively provides for checking and converting character encodings. But this post is long enough so that will have to wait until another day. So for anyone in the PHP community lucky or unlucky enough to read this post, should we be validating strings to make sure they are in the encoding we think they are in?

An idea and a setback

I’ve had an idea. I’ve been having lots of ideas lately, but this one has been in my head for a while. There are lots of Content Management Systems, or CMSs, out there: software that you use to build and maintain a website. They come in all shapes, sizes, and complexities. I have observed there is a missing niche among all of these CMSs: something that is simple to use for non-computer people and offers the basics without going overboard with complexity or features. This Wordpress blog I’m typing on now is a good example. In fact, Wordpress has become very popular as a CMS for small websites where you just need to make a few pages.

That is where my inspiration came from. Our SCCA webpage could use a CMS like that. And a friend of mine runs a videography business where he wants to be able to update his site frequently. You could just use Wordpress for these sites, but why drag along all the blog-oriented code when you aren’t going to use it? Plus I believe Wordpress could use a good tune-up. It is still written for the no-longer-supported PHP4.

So I started writing my own little CMS in my spare time. It’s very slow going. It’s hard to get something really going when you are doing it two hours at a time. And of course I’ve been running into some snags and doing a lot of learning. One of those snags turned out to be database access. I saw that I would be writing a bunch of similar queries, so I wrote a lightweight database abstraction layer to automate some of that SQL creation. This database layer is based on PHP5’s PDO, meaning it has an object-oriented interface and I can use it to connect to many different databases, assuming the SQL I write is compatible.
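To give a flavor of what “automating some of that SQL creation” can look like on top of PDO, here is a minimal sketch. It is not the layer from my CMS: demo_build_insert() and demo_insert() are hypothetical names, the pages table in the usage note is invented, and the sketch assumes PHP 7 or later and a PDO connection set to PDO::ERRMODE_EXCEPTION.

```php
<?php
// Hypothetical helper: build an INSERT with named placeholders from a
// column => value array. Returns [sql, params].
function demo_build_insert(string $table, array $data): array
{
    if ($data === []) {
        throw new InvalidArgumentException('No columns given');
    }
    // Identifiers cannot be bound as parameters, so only plain names
    // are accepted here; anything else is refused.
    foreach (array_merge([$table], array_keys($data)) as $name) {
        if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', (string) $name)) {
            throw new InvalidArgumentException("Unsafe identifier: $name");
        }
    }
    $columns = array_keys($data);
    $placeholders = array_map(function ($column) {
        return ':' . $column;
    }, $columns);
    $sql = sprintf(
        'INSERT INTO %s (%s) VALUES (%s)',
        $table,
        implode(', ', $columns),
        implode(', ', $placeholders)
    );
    return [$sql, $data];
}

// Hypothetical wrapper: prepare and execute the generated statement.
// Assumes $pdo throws on errors (PDO::ERRMODE_EXCEPTION).
function demo_insert(PDO $pdo, string $table, array $data): bool
{
    list($sql, $params) = demo_build_insert($table, $data);
    $statement = $pdo->prepare($sql);
    return $statement->execute($params);
}
```

With an open connection in $pdo, a call such as demo_insert($pdo, 'pages', ['title' => 'Home', 'body' => 'Hi']) would prepare the generated statement and bind both values through the named placeholders.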

I don’t want to re-invent the wheel, so I looked at two frameworks that do something similar, Zend_Db and Solar_Sql, for ideas. They each take a different approach to handling prepared statements and passing data into them. I tried to take the middle road and support both. The solution I came up with, I recently found out, won’t work. So I’ve got to give it a big re-think. It’s things like this that are slowing the project down. Plus the stop-starting from lack of time. I didn’t want to talk too much about this project until it was more together. But based on this “little” setback, I realized that it’s going to take a lot longer than I hoped for this thing to see the light of day. So I might as well talk about it online. I certainly haven’t been autocrossing this summer. :(