Oracle and Postgres Redux
12/10/2008

So, my little rant about Oracle got a lot more attention than I'd expected.
While I was mainly focused on the price of Oracle, now I'd like to explore my favorite alternative a little more: PostgreSQL (PG). I've been busy the last few days using the excellent Ora2Pg to convert my old 8i database. What I've done is a quick and dirty off-hours conversion for a project that I've been working on for several months just to see how PG would work.
The project is a dashboard, a business reporting tool of the sort that is all the rage these days. It includes every business indicator they could think of, from the hours a deal spent waiting for approval, to projected totals of deals not yet approved, to charts and graphs of deals long ago approved. Gone are the days when you could give the users a copy of Crystal Reports and let them at it. They expect much more.
In the beginning I was saddled with an enormous spec that delineated only some of what the application should do, plus a number of existing reports. Unfortunately, not one of them ran in under 10 minutes, and each was a knotted mess of SQL depending on an untold number of views and procs. It took months to streamline the SQL and create a new data warehousing project. When all was said and done, 51 queries dominated the main screen of the dashboard.
And it still took 30 seconds to draw for simple searches.
That was when I began caching like mad. There were hundreds of thousands of possible search term combinations, meaning I couldn't simply populate the cache ahead of time, but at least each query only needed to run every so often. Terracotta, which I'll write about soon, was a huge help. It's ten pounds of awesome in a five pound sack.
Caching is probably the only reason my users haven't gone running into the hands of Oracle, thrusting dollars at them. It alleviates the load and allows only queries that haven't yet been run to actually touch the database. I can get away with a whole lot less horsepower than I would need otherwise.
Porting
This isn't meant to be a benchmark, because that would be fantastically unfair to Postgres. I've spent an enormous amount of effort on Oracle optimization. It's a big job filled with expensive consultants who post weird pictures of themselves.
The schema contains materialized views built on partitioned tables, which greatly cuts down the amount of data Oracle must sort through to fetch rows. The indexing has been deliberate and extensively tested, sometimes only to gain a 0.10-second difference. The schema has been denormalized as much as possible, but the structure of the data made that difficult; a couple of queries still touch as many as five tables. The queries themselves have also been written and rewritten with Oracle in mind, using inline views, subselects, or whatever Oracle happened to fetch faster. I also made some use of optimizer hints like /*+ RULE */ and /*+ JUST_USE_MY_INDEX_ALREADY(table) */. I think of them as an implicit admission that the optimizer is broken if I have to tell it what to use.
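For the curious, Oracle hints are just specially formatted comments placed immediately after the SELECT keyword. The second hint above is a joke; the real hint it stands in for is INDEX. A sketch, with made-up table and index names:

```sql
-- An Oracle hint rides in a comment right after SELECT.
-- "deals" and "deals_approved_idx" are hypothetical names.
SELECT /*+ INDEX(deals deals_approved_idx) */
       deal_id, approved_at
FROM   deals
WHERE  approved_at >= DATE '2008-01-01';
```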
But for Postgres I generally let Ora2Pg do its thing. I didn't even bother changing the types on many of the columns, some of which defaulted to the slower arbitrary-precision numeric type, even though Postgres supports more sensible integral types like int2 and int4. I also didn't bother partitioning the tables to match my Oracle setup, and I didn't even compile Postgres myself for my architecture for the best performance. I pulled 8.3 using apt-get from the Ubuntu repository.
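Tightening those types up later would be a one-statement fix per table. A sketch, with hypothetical table and column names, assuming the columns only ever hold integral values:

```sql
-- Narrow Ora2Pg's default numeric columns to proper integer types.
-- Postgres applies its numeric-to-integer cast automatically here.
ALTER TABLE deals
  ALTER COLUMN status_code TYPE int4,
  ALTER COLUMN retry_count TYPE int2;
```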
In short, I did about as little as possible. The only changes I made to the queries were the simple syntax differences between the two, such as Postgres requiring the AS keyword when aliasing columns in the SELECT list, or a few type casts here and there. I found that a few of my indexes depended on Oracle's trunc function, so I created my own function that simply returns the result of a ::date cast and indexed that. There's a lot of room to improve these results.
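Roughly what that looked like, with made-up table and column names. The one wrinkle is that the wrapper must be declared IMMUTABLE, or Postgres will refuse to use it in an index:

```sql
-- A stand-in for Oracle's trunc(date): just a ::date cast.
-- IMMUTABLE is required for use in an expression index.
CREATE OR REPLACE FUNCTION trunc(timestamp) RETURNS date AS
  'SELECT $1::date'
LANGUAGE sql IMMUTABLE;

-- Rebuild the old function-based index on top of it.
CREATE INDEX deals_trunc_approved_idx ON deals (trunc(approved_at));
```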
Results
So without further ado, here's how an unoptimized PG fared:
That's pretty impressive.