PostgreSQL Bloggers at Sun

I guess you already heard about Sun Storage F5100 Flash Array and its world record benchmarks.  

But it's not F5100 that I am going to talk about but its smaller sibling called Sun Flash Accelerator F20 PCIe Card.  The name is a mouthful like all Sun product names so I will just call it "The Accelerator Card" in the remainder of this blog entry.  Of course the idea is not to start with the answer and find a problem with it. But I am going to narrate  is how we saw a problem and then thought of using this answer to solve the problem.

Recently our group ISV-E was doing our standard thing of  making applications run best on Sun. In this particular project with PeopleSoft Enterprise 9.0 on M5000 system using Sun Storage 6540, we encountered a problem that certain batch jobs where taking a long time to execute. Peoplesoft Enterprise 9.0 actually have ways to breakup jobs and run them in parallel so as to use the multi-core of the multi-processor system. But yet we could not really leverage the system enough to be satisfactory.  In this project they were using Oracle Database 11g. I got to give it to Oracle, they do have good tools. We used Oracle Enterprise Manager and saw for the troubled batch process, it was showing lot of blue color in its output.


Also looking at the top Objects, the tool reported which tables and index were  troublesome which was causing that amount of blue appear in the chart. This "Blue" problem is what led us to an idea to test out the Accelerator Card in the system and see if can help out here. What we did was created a few tablespaces and spread them out on the four Flash Modules on the Accelerator Card and moved the highly active (or "hot" ) tables and indices to the newly created tablespace. What we saw was simply huge reduction in the blue area and more green. That lead to the slogan in our team

"Go Green with the Accelerator Card !"

The Accelerator card not only reduced the time on this process but many other batch processes which had high IO components.  Here is a relative comparison of how it helped (with additional slight boost from upgrading SPARC64 VII from 2.4Ghz to 2.53Ghz CPUs).


Of course the next question is what if you take the same thing to its bigger sibling, Sun Storage F5100 Flash Array, well that's exactly what we did and as they say the rest is history.(Hint: Read the world records link and search for PeopleSoft)  For more information check out Vince's blog entry on  PeopleSoft Enterprise Payroll 9.0 NA and also  Why Sun Storage F5100 is good for PeopleSoft 9.0 NA Payroll application.

Truly if you use Oracle and use Oracle Enterprise Manager to monitor your application performance and are turning blue by seeing lot of Blue area in the chart then just remember

"Go Green with the Accelerator Card !"


Recently I was working on a project which used Infobright as the database. The version tested was 3.1.1 both on OpenSolaris as well as Solaris 10. Infobright is like a column-oriented database engine for MySQL primarily targeted towards data warehouse, data mining type of project deployments.

While everything was working as expected, one thing we did notice that as number of concurrent connections tried to query against the database we noticed that queries deteriorated fast in the sense that not much parallel benefits were being squeezed from the machine. Now this sucks! (apparently sucks is now a technical term). It sucks because the server has definitely many  cores and typically each Infobright query still can at the max peg a core. So the expectation will be typically to atleast handle concurrent queries which is close to the number of cores  (figuratively speaking though in reality it depends).

 Anyway we started digging into this problem. First we noticed that CPU cycles were heavy so IO was probably not the culprit (in this case). Using plockstat we found

# plockstat -A -p 2039    (where 2039 is the PID of mysqld server running 4 simultaneous queries)

^C 
Mutex hold 

Count     nsec Lock                         Caller 
------------------------------------------------------------------------------- 
3634393     1122 libc.so.1`libc_malloc_lock   libstdc++.so.6.0.3`_Znwm+0x2b 
3626645     1047 libc.so.1`libc_malloc_lock   libstdc++.so.6.0.3`_ZdlPv+0xe 
    2 536317885 0x177b878                    mysqld`_ZN7IBMutex6UnlockEv+0x12 
   12  6338626 mysqld`LOCK_open             mysqld`_Z10open_tableP3THDP13st_table_listP11st_mem_rootPbj+0x55a 
 9057     1275 libc.so.1`libc_malloc_lock   libstdc++.so.6.0.3`_Znwm+0x2b 
 8493     1051 libc.so.1`libc_malloc_lock   libstdc++.so.6.0.3`_ZdlPv+0xe 
 7928     1119 libc.so.1`libc_malloc_lock   libstdc++.so.6.0.3`_ZdlPv+0xe 
    5   326542 0x177b878                    mysqld`_ZN7IBMutex6UnlockEv+0x12 
  683     1189 libc.so.1`libc_malloc_lock   libstdc++.so.6.0.3`_Znwm+0x2b 
  564     1339 libc.so.1`libc_malloc_lock   libstdc++.so.6.0.3`_Znwm+0x2b 
  564     1274 libc.so.1`libc_malloc_lock   libstdc++.so.6.0.3`_Znwm+0x2b 
  564     1156 libc.so.1`libc_malloc_lock   libstdc++.so.6.0.3`_ZdlPv+0xe 
   17    36292 0x1777780                    mysqld`_ZN7IBMutex6UnlockEv+0x12 
    2   246377 mysqld`rccontrol+0x18        mysqld`_ZN7IBMutex6UnlockEv+0x12 
   57     8074 mysqld`_iob+0xa8             libstdc++.so.6.0.3`_ZNSo5flushEv+0x30 
  218     1479 libc.so.1`libc_malloc_lock   libstdc++.so.6.0.3`_Znwm+0x2b 
    4    78172 mysqld`rccontrol+0x18        mysqld`_ZN7IBMutex6UnlockEv+0x12 
    4    75161 mysqld`rccontrol+0x18        mysqld`_ZN7IBMutex6UnlockEv+0x12 
….

R/W reader hold 

Count     nsec Lock                         Caller 
------------------------------------------------------------------------------- 
   44     1171 mysqld`THR_LOCK_plugin       mysqld`_Z24plugin_foreach_with_maskP3THDPFcS0_P13st_plugin_intPvEijS3_+0xa3 
   12     3144 mysqld`LOCK_grant            mysqld`_Z11check_grantP3THDmP13st_table_listjjb+0x38c 
    1    14125 0xf7aa18                     mysqld`_ZN11Query_cache21send_result_to_clientEP3THDPcj+0x536 
    1    12089 0xf762e8                     mysqld`_ZN11Query_cache21send_result_to_clientEP3THDPcj+0x536 
    2     1886 mysqld`LOCK_grant            mysqld`_Z11check_grantP3THDmP13st_table_listjjb+0x38c 
    2     1776 mysqld`LOCK_grant            mysqld`_Z11check_grantP3THDmP13st_table_listjjb+0x38c 
    1     3006 mysqld`LOCK_grant            mysqld`_Z11check_grantP3THDmP13st_table_listjjb+0x38c 
    1     2765 mysqld`LOCK_grant            mysqld`_Z11check_grantP3THDmP13st_table_listjjb+0x38c 
    1     1797 mysqld`LOCK_grant            mysqld`_Z11check_grantP3THDmP13st_table_listjjb+0x38c 
    1     1131 mysqld`THR_LOCK_plugin       mysqld`_Z24plugin_foreach_with_maskP3THDPFcS0_P13st_plugin_intPvEijS3_+0xa3 

Mutex block 

Count     nsec Lock                         Caller 
------------------------------------------------------------------------------- 
 2175 11867793 libc.so.1`libc_malloc_lock   libstdc++.so.6.0.3`_ZdlPv+0xe 
 1931 12334706 libc.so.1`libc_malloc_lock   libstdc++.so.6.0.3`_Znwm+0x2b 
    3 93404485 libc.so.1`libc_malloc_lock   mysqld`my_malloc+0x32 
    1    11581 libc.so.1`libc_malloc_lock   mysqld`_ZN11Item_stringD0Ev+0x49 
    1     1769 libc.so.1`libc_malloc_lock   libstdc++.so.6.0.3`_ZnwmRKSt9nothrow_t+0x20
..

Now typically if you see libc_malloc_lock in a plockstat for a  multi-threaded program then it is a sign that the default malloc/free routines in libc is the culprit since the default malloc is not scalable enough for a multi-threaded program. There are alternate implementations which are more scalable than the default. Two such options which are already part of OpenSolaris, Solaris 10 are libmtmalloc.so and libumem.so. They can be forced to be used instead of the default without recompiling the binaries by preloading anyone of them before the startup command.

In case of the 64-bit Infobright binaries we did that by modifying the startup script mysqld-ib and added the following line just before invocation of mysqld command.

LD_PRELOAD_64=/usr/lib/64/libmtmalloc.so; export LD_PRELOAD_64

What we found was now the response times for each query was more in-line as it was being executed on its own. well not true entirely but you get the point. For a 4 concurrent queries we found that it had improved from like 1X to 2.5X reduction in total execution time.

Similary when we used libumem.so we found the reduction more like 3X when 4 queries were executing concurrently.

LD_PRELOAD_64=/usr/lib/64/libumem.so; export LD_PRELOAD_64

Definitely something to use for all Infobright installations on OpenSolaris or Solaris 10.

In a following blog post we will see other ways to tune Infobright which are not as drastic as this one but still buys some percentage of improvements. Stay tuned!!










Recently I got access to the refreshed Sun Fire X4140 consisting of 2 x 6-core Opterons with 36GB RAM. Since the release of the final PostgreSQL 8.4 bits I had not tried it out so I downloaded the Solaris 10 binaries of PostgreSQL 8.4 (64-bits) from the download site of postgresql.org and took it for the test drive with the same iGen benchmarks that I had used earlier for my PGCon2009 presentation.

The system already had Solaris 10 5/09 installed with couple of  SSDs  and a RAID LUN for the database. I put the WAL log on an internal drive with ZFS intent log on SSDs and the tablespaces on the RAID LUN (on an external storage array).

Notice the crossing of the 400K tpm boundary with PostgreSQL here using this benchmark toolkit. None of my tests have ever done that before. I consider this to be a milestone achievement with PostgreSQL, Solaris 10, Sun Fire Systems with Opterons.




Sun is launching systems with multisocket  6-core Opterons (Istanbul) today. Last week I got access to  Sun Fire X4140 with 2 x 6-core Opterons with 36GB RAM. It is always great to see such a 1RU system packaged with so many x64 cores.

# psrinfo -vp
The physical processor has 6 virtual processors (0-5)
  x86 (chipid 0x0 AuthenticAMD family 16 model 8 step 0 clock 2600 MHz)
    Six-Core AMD Opteron(tm) Processor 8435
The physical processor has 6 virtual processors (6-11)
  x86 (chipid 0x1 AuthenticAMD family 16 model 8 step 0 clock 2600 MHz)
    Six-Core AMD Opteron(tm) Processor 8435


I decided to take the system for a test drive with Olio. Olio is a Web 2.0 toolkit consisting on a web 2.0 event calendar application  which can help stress a system. Depending on your favorite scripting language you can use either PHP, Ruby on Rails, Java as the language used to create the application. (I took the easy way out and selected Olio PHP's prebundled binary kit)

Please don't let the small 2MB kit size fool you thinking it will be a easy workload to test it out. While setting it up I figured that to generate the data population for say 5000 users you will need space with atleast 500GB disk space for the content that it generates for it. Yes I quickly had to figure out how to get a storage array for Olio with about 800GB LUN.

Olio requires a webserver, PHP (of course) and  a database for its metadata store (it has scripts for MySQL already in the kit). The system came preconfigured with Solaris 10 5/09. I downloaded MySQL 5.4.1 beta  and also the Sun WebStack kit which has Apache Httpd 2.2, PHP 5.2 (and also MySQL 5.1 which had not used since I had already downloaded MySQL 5.4 Beta). Memcached 1.2.5 is part of the WebStack download and Olio is configured to use it also by default (but can be disabled too).

Eventually everything was installed and configured in the same X4140 and using the Faban Harness on another system started executing some runs with file store and the meta store preconfigured to handle all the way up to 5000 concurrent users. The results are as follows:

OlioPHP

Here are my observation/interpretations:

  • Eventually beyond 10 cores run I find that the system memory (36GB) is not enough to sustain more concurrent users to fully utilize the remaining cores. I would probably need RAM  in the range of 48GB or more to handle more users. (PHP is not completely thread-safe and hence the web server used here spawns processes)
  • This 1RU system can handle more than 3200 users  (with everything on the same system) with CPU cycles to spare is pretty impressive. It means you still have enough CPU to log into the system without seeing degraded performance.
  • Actually you can see here that SMP (or should be called  SMC - Scalable Multi Cores) type system helps when the initial cores are added  instead of using multiple single core systems (ala in Cloud).

 In an upcoming blog entries I will talk more about the individual components used.



MC Brown and I co-presented a session at CommunityOne West on how to add probes to applications, using MySQL and PostgreSQL as case study. In the presentation, I used a very simple example to demonstrate how easy it is to add probes. If you want to try out yourself, here is the code. Extract the files, run gmake (or gnumake on OS X) to build, and run the executable in one terminal and the DTrace script in another terminal to see the output from the probes. To see more complicated examples, checkout the MySQL or PostgreSQL source code.

At PgCon 2009 in Ottawa, I did a lightning talk on DTrace probes in PostreSQL 8.4. I wanted to show several demos but ran out of time. If you want to try them out, use the scripts below.

Here is the script, query_time.d, used in slide 14. This script is used to identify slow queries by printing out the query execution time.


#!/usr/sbin/dtrace -s
#pragma D option quiet

dtrace:::BEGIN
{
  printf("Tracing... Hit Ctrl-C to end.\n");
}

postgresql*:::query-start
{
    self->query = copyinstr(arg0);
    self->ts = timestamp;
}

postgresql*:::query-done
/self->ts/
{
    @query_time[self->query] = avg(timestamp - self->ts);
    self->query = 0;
    self->ts = 0;
}

dtrace:::END
{
     printf("%10s %s\n", "TIME (ns)", "QUERY");
     printf("==============================================================\n");
     printa("%@10d %s\n", @query_time);
}

Here is the second script, sort.d, used in slide 16. This script tells the type of sort, whether the sort was done in memory or on disk, and the time to perform the sort.


#!/usr/sbin/dtrace -qs

dtrace:::BEGIN
{
        sorttype[0] = "TUPLE";
        sorttype[1] = "INDEX";
        sorttype[2] = "DATUM";

        sortmethod[0] = "INTERNAL";
        sortmethod[1] = "EXTERNAL";
}

postgresql*:::sort-start
{
        self->ts = timestamp;
        printf("\nBegin %s sort, workmem = %d KB \n", sorttype[arg0], arg3);
}

postgresql*:::sort-done
/self->ts && arg0 == 0/
{
        /* Internal Sort */
        printf("%s sort ended, space used = %d KB \n", sortmethod[arg0], arg1);
        printf("Sort time = %d ms\n\n", (timestamp - self->ts) / 1000000);
}

postgresql*:::sort-done
/self->ts && arg0 == 1 /
{        /* External Sort */
        printf("%s sort ended, space used = %d disk blocks\n", sortmethod[arg0], 
arg1);
        printf("Sort time = %d ms\n\n", (timestamp - self->ts) / 1000000);
}

Below is the last script, query_trace.d, used in slide 23. This script provides useful data that will allow you to dig down deeper. In this example, the buffer reads to table 16397 (this is the OID) is huge. This signals a red flag that an index may be needed for this table. To find out the table name from OID, run "SELECT relname FROM pg_class WHERE relfilenode=16397" in psql.


#!/usr/sbin/dtrace -qs

postgresql*:::query-start
{
        self->ts = timestamp;
        self->pid = pid;
}

postgresql*:::buffer-read-start
/self->pid/
{
        self->readts = timestamp;
}

postgresql*:::buffer-read-done
/self->pid && arg7/
{
        /* Buffer cache hit */
        @read_count[arg2, arg3, arg4] = count();
        @read_hit_total["Total buffer cache hits      : "] = count();
        @read_hit_time["Average read time from cache : "] = avg (timestamp - self->readts);
        self->readts = 0;
}

postgresql*:::buffer-read-done
/self->pid && !arg7/
{
        /* Buffer cache miss */
        @read_count[arg2, arg3, arg4] = count();
        @read_miss_total["Total buffer cache misses    : "] = count();
        @read_miss_time["Average read time from disk  : "] = avg (timestamp - self->readts);
        self->readts = 0;
}

postgresql*:::buffer-flush-start
/self->pid/
{
        self->writets = timestamp;
}

postgresql*:::buffer-flush-done
/self->pid/
{
        @write_count[arg2, arg3, arg4] = count();
        @write_time["Average write time to disk   : "] = avg (timestamp - self->writets);
        self->writets = 0;
}

postgresql*:::query-done
/self->ts && self->pid == pid/
{
        printf("\n============ Buffer Read Counts ============\n");
        printf("%10s %10s %10s %10s\n","Tablespace", "Database", "Table", "Count");
        printa("%10d %10d %10d %@10d\n",@read_count);

        printf("\n======= Buffer Write Request Counts ========\n");
        printf("%10s %10s %10s %10s\n","Tablespace", "Database", "Table", "Count");
        printa("%10d %10d %10d %@10d\n",@write_count);

        printf("\n========== Additional Statistics ===========\n");

        printf ("Backend PID    : %d\n", pid);
        printf ("SQL Statement  : %s\n", copyinstr(arg0));
        printf ("Execution time : %d.%03d sec (%d ns)\n", (timestamp - self->ts) / 1000000000, ((timestamp - self->ts) / 1000000) % 1000, timestamp - self->ts);
        printa("\n%19s %@8d\n",@read_hit_total);
        printa("%19s %@8d\n",@read_miss_total);
        printa("%19s %@8d (ns)\n",@read_hit_time);
        printa("%19s %@8d (ns)\n",@read_miss_time);
        printa("%19s %@8d (ns)\n",@write_time);
        printf("\n\n");

        trunc(@read_count);
        trunc(@write_count);
        trunc(@read_hit_total);
        trunc(@read_miss_total);
        trunc(@read_hit_time);
        trunc(@read_miss_time);
        trunc(@write_time);

        self->ts = 0;
        self->pid = 0;
}

To see more sample scripts as well as a GUI tool, check out the PostgreSQL DTrace Toolkit.

As many of you know, PostgreSQL 8.4 has quite a few more DTrace probes. See my previous blog post for more details . To use the probes, you need to write DTrace scripts, which is quite easy to do, but to make it easier to use the probes (especially for those who are new to DTrace), I have written some scripts that you can just run from the command line. In addition, I've integrated some of those script with Chime to make it even easier to visualize the data. Check out the toolkit on PgFoundry.

With the release of the OpenSolaris 2009.06, I thought it is time to update the Minimal OpenSolaris 2008.11  Appliance OVF image that I had created earlier. The script create_osol2009006_app.sh has been updated to create minimal OpenSolaris 2009.06 Appliance images for VirtualBox. 

How to use the OVF image?

  • Download VirtualBox 2.2.4 and install it on your host platform.
  • Download the OpenSolaris 2009.06 App OVF image zip file and then unzip it.
  • Fire up Virtualbox GUI and  use menu item VirtualBox->File->Import Appliance to import the image (using the  OSOL200906App.ovf file ) into a new VirtualBox VM
  • Start the newly created VM and in few minutes you will be  ready to login into OpenSolaris 2009.06 kernel.The preset login information is user: root with password: opensolaris.

Comments welcome.

Simon Riggs of 2nd Quadrant recently submitted a patch for testing which should improve read only scalability of Postgres. I took it for a test drive for my setup. In the first set of tests I used the same benchmark as previous ones so as to have the same reference point.

It seems changing the Number of Buffer Partitions for this workload does not have any impact. My dataset for this iGen benchmark is pretty small and should easily fit under 2GB size and hence may not be stressing the buffer partitions too much to warrant bigger number. The patch still helps to get good healthy 4-6% gain in peak values.


At PGCon 2009, Jesper Pedersen talked to me about the new Binary Transfer patch which was submitted to the JDBC Driver for Postgres 8.4. I thought it will be nice to compare how the JDBC 8.4 driver compared to older 8.3 JDBC Driver. Hence I took it for a drive

The 8.4 JDBC Driver with BinaryTransfer patch seems to get to a better peak faster but since to taper off at high clients. I don't know if this benchmark was the right benchmark for it. Need more benchmarks which uses JDBC to see the performance difference with this feature.


During my PGCon 2009 presentation there was a question on the saw tooth nature of the workload results on the high end side of benchmark runs. To which Matthew Wilcox (from Intel) commented it could be scheduler related. I did not give it much thought at that time till today when I was trying to do some iGen runs for the JDBC Binary Transfer patch (more on that in another blog post) and also Simon's read only scalability runs . Then I realized that I was not following one of my one tuning advice for running Postgres on OpenSolaris. The advice is to  use FX Class of scheduler instead of the default TS Class on OpenSolaris . More details on various scheduler classes can be found on docs.sun.com.

Now how many times I have forgotten to do that with Postgres on OpenSolaris I have no idea. But yes it is highly recommended specially on multi-core systems to use FX scheduler class for Postgres on OpenSolaris. How much gain are we talking about? The following graph will give an indication using the default TS scheduler class Vs the FX Scheduler class using the iGen benchmark.

The gain is about 14% by just switching over to FX Class. How did I get Postgres server instance to use FX class? I cheated and put all processes of the user (with userid 236177)  in FX class using the following command line.

# priocntl -s -c FX -i uid 236177

One thing to figure out is how to make sure Postgres uses FX scheduler class out of the box on OpenSolaris so I don't keep forgetting about that minute performance tip.





On the first day of PGCon 2009 I presented on my results of my testing with Postgres 8.4beta1 vs the earlier version (8.3.7). The good news is it should not cause any regressions to existing users of 8.3.7 to upgrade and exploit the opportunity to use the new features of Postgres 8.4. 


Comments/Questions welcome.


While working on my upcoming presentation for PGCon 2009 on Thursday, I found that sometimes it is misleading to just take one snapshot of locks to figure the hot locks in PostgreSQL workload characterization.

So again starting from one of the DTrace scripts I arrived at pglockwait_84.d

NOTE: It only works with operating systems that support DTrace. I have only tested it on OpenSolaris as of now.

It can either be used to track to summarize all PostgreSQL backends (using '*')  or selected one using process id using 10 second interval. It also prints time so that it can be dumped into a file for post-processing analysis. 

An example output  is show below during dbt-2 runs using PostgreSQL 8.4 beta1.

# ./pglockwait_84.d '*' 2009 May 19 02:52:14 Lock-Id Mode Wait-Time(ms) Count Dynamic Locks Exclusive 0 5 ProcArrayLock Shared 0 37 Dynamic Locks Shared 1 52 CLogControlLock Exclusive 1 85 BufFreelistLock Exclusive 1 81 CLogControlLock Shared 1 103 ProcArrayLock Exclusive 2 112 BgWriterCommLock Exclusive 10 123 BufMappingLock Exclusive 11 636 XidGenLock Exclusive 17 2 BufMappingLock Shared 34 1566 WALInsertLock Exclusive 49 2305 LockMgrLock Exclusive 65 852 2009 May 19 02:52:24 Lock-Id Mode Wait-Time(ms) Count XidGenLock Shared 0 1 XidGenLock Exclusive 0 12 ProcArrayLock Shared 1 86 BufFreelistLock Exclusive 4 240 BgWriterCommLock Exclusive 5 213 Dynamic Locks Shared 5 157 CLogControlLock Exclusive 6 238 CLogControlLock Shared 6 384 ProcArrayLock Exclusive 57 360 Dynamic Locks Exclusive 158 7 WALInsertLock Exclusive 187 7837 LockMgrLock Exclusive 226 3251 BufMappingLock Exclusive 289 2141 BufMappingLock Shared 895 5513 2009 May 19 02:52:34 Lock-Id Mode Wait-Time(ms) Count XidGenLock Shared 0 0 Dynamic Locks Exclusive 0 6 XidGenLock Exclusive 0 5 ProcArrayLock Shared 1 76 BufFreelistLock Exclusive 3 183 BgWriterCommLock Exclusive 4 118 ProcArrayLock Exclusive 5 229 Dynamic Locks Shared 5 91 CLogControlLock Exclusive 29 198 CLogControlLock Shared 62 272 BufMappingLock Exclusive 141 1685 LockMgrLock Exclusive 206 2175 WALInsertLock Exclusive 221 5540 BufMappingLock Shared 279 4180 2009 May 19 02:52:44 Lock-Id Mode Wait-Time(ms) Count XidGenLock Shared 0 0 Dynamic Locks Exclusive 0 3 XidGenLock Exclusive 0 5 ProcArrayLock Shared 0 67 BgWriterCommLock Exclusive 1 69 BufFreelistLock Exclusive 2 148 CLogControlLock Shared 3 262 CLogControlLock Exclusive 4 199 ProcArrayLock Exclusive 47 277 WALWriteLock Exclusive 64 2 BufMappingLock Exclusive 79 1599 WALInsertLock Exclusive 151 5949 LockMgrLock Exclusive 198 2377 BufMappingLock Shared 223 4345 Dynamic Locks Shared 1568 144 ^C

It throws an output every 10 second and the time spent in acquiring the locks. For the BufMappingLock, LockMgrLock and Dynamic Locks it aggregates all of them together respectively. It's bit high on system resources if you track all Postgres backends but if you already know which one then it can be low on overhead. Hope it is useful to you too as I found it for my purpose.


 I modified one of Robert's dtrace scripts so that it is  useful for my purpose to measure often asked transactions per second  for random workload running on PostgreSQL.

The script is as follows:

#!/usr/sbin/dtrace -qs
postgresql*:::transaction-start
{
	@startpersec["New"] = count();
}
postgresql*:::transaction-commit
{
	@commitpersec[ "Commit"] = count();
}
postgresql*:::transaction-abort
{
	@abort["Abort"] = count();
}
profile:::tick-1s
{
        printf("******** Transactions Per Second *********\n");
	printf("%20s %15s\n", "Txn Type", "Count");
	printf("==========================================\n");
	printa("%20s %@15d\n", @startpersec);
	printa("%20s %@15d\n", @commitpersec);
	printa("%20s %@15d\n", @abort);
        printf("\n");
	clear(@startpersec);
	clear(@commitpersec);
	clear(@abort);
}


UPDATE: You can also download it pgtps.d

When you execute it you see outputs every second as follows:

 
# ./tps.d
******** Transactions Per Second *********
            Txn Type           Count
==========================================
                 New             192
              Commit             192
               Abort               1

******** Transactions Per Second *********
            Txn Type           Count
==========================================
                 New             175
              Commit             172
               Abort               0

******** Transactions Per Second *********
            Txn Type           Count
==========================================
                 New             195
              Commit             198
               Abort               0

******** Transactions Per Second *********
            Txn Type           Count
==========================================
                 New             183
              Commit             178
               Abort               2


How to interpret the output?

  • New mentions how many transactions started per second
  • Commit talks about how many transactions commited per second.
  • Aborts talks about transactions aborted in that second

Useful specially when some one  asks a questions that they are generally reading from a questionaire like how many transactions per second are we doing?

Where is your TPS report?

Postgres 8.4 Beta1 community binaries are now for OpenSolaris 2008.11. The Beta1 binaries for OpenSolaris can be downloaded from postgresql.org binary location. Postgres 8.4 binaries for Solaris 10 are also available.

For people who don't have OpenSolaris installed on their laptop but want to try out the new improved DTrace Probes in Postgres 8.4beta1, you can install the Minimal OpenSolaris Appliance OVF image for VirtualBox 2.2 and install the Postgres 8.4beta1 binaries in the appliance to try it out. You can also use the DTrace probes on your Mac OS X too.

Easiest way to install the binaries on the OpenSolaris Appliance is to first install SUNWwget package from the OpenSolaris repository

pkg install SUNWwget

and then using copy the download mirror url for those binaries using http and download it with wget in the appliance

wget "http://wwwmaster.postgresql.org/redir/198/h/binary/v8.4beta\
/solaris/opensolaris/i386/postgresql-8.4beta1-opensolaris.i386-32.tar.bz2"

The community binaries typically should be untarred in /opt.

bzcat postgresql-8.4beta1-opensolaris.i386-32.tar.bz2 |tar -xf -

This will then have the binaries in /opt/postgres/8.4beta1/. If you also untar the 64-bit binaries then the the 64-bit binaries are available from /opt/postgres/8.4beta1/64.

One thing that I have noticed with these binaries that it does not pick up the libraries if installed in /opt by default so depending on the type of bits you may need to set the following

LD_LIBRARY_PATH=/opt/postgres/8.4beta1/lib; export LD_LIBRARY_PATH

or

LD_LIBRARY_PATH_64=/opt/postgres/8.4beta1/lib/64; export LD_LIBRARY_PATH_64

Beyond that everything should work as you would expect. Well almost... One thing to also note is that the new 8.4 GUC parameter effective_io_concurrency to allow readahead for bitmap scan index scans is disabled on OpenSolaris / Solaris 10.

 If you do find something that doesn't seem to work, please feel free to leave comments.

 

Going through the comments for Minimal OpenSolaris Appliance entry,  I thought I will go over the problems I encountered when I was working on the create_solaris_appliance.sh script and what I think we can do to improve OpenSolaris in those areas.

1. Setting up the disk

This is probably the first thing that most appliance creators will have to do is to format a new media before it is usable by OpenSolaris. Now to make a disk usable by OpenSolaris (on x86/x64 platforms) two things needs to be done, one a primary Solaris partition needs to be created using fdisk and then a regular Solaris VTOC needs to be initialized on the Solaris partition. While the experience is bit easier with the interactive option of the commands, putting it in a script can be challenging.

Fortunately fdisk has -B option. From the man page:

    -B
         Default to one Solaris partition  that  uses  the  whole
         disk. On an x86 machine, if the disk is larger than 2 TB
         (terabytes), the default size of the  Solaris  partition
         will be limited to 2 TB.

Hence I could use the following easily in my script:

fdisk -n -B /dev/rdsk/${INSTALLDISK}p0

Unfortunately I saw no such option for fmthard. Infact it made it more difficult since you need to enter the geometry information of the target disk. I took the easy way out by finding the same for the default VirtualBox disk size which is 16GB and using it as follows:

fmthard -d 0:2:00:48195:33447330 /dev/rdsk/${INSTALLDISK}p0

There are some scripting ways to work this around as Vikram Datta commented on my earlier entry:

  SecCnt=`prtvtoc /dev/rdsk/${INSTALLDISK}p0 | awk '/sectors\/cylinder/ { print $2 }'`
  LastSect=`prtvtoc /dev/rdsk/${INSTALLDISK}p0 | awk '$1 == "2" { print $5 }'` 
  LastSect=`expr $LastSect - $SecCnt` 
  fmthard -d 0:2:00:${SecCnt}:${LastSect} /dev/rdsk/${INSTALLDISK}p0 
  

But I think it should be as easy as  fdisk.. i.e. doing the following:

fmthard -B /dev/rdsk/${INSTALLDISK}p0

Hence I have filed a new RFE 6829475. I think this RFE is useful not just for my script but in general helps improve usability of the command to a new learner of OpenSolaris.

2. Setting up the ZPool

The next step of creating zpool for the root device was pretty straight forward

zpool create -f rpool ${INSTALLDISK}s0
zfs set compression=on rpool 
zfs create -o mountpoint=legacy rpool/ROOT
zfs create -o mountpoint=$PKG_IMAGE rpool/ROOT/VOSApp
zfs create -V 128M rpool/swap
zfs create -V 16M rpool/dump
zfs create rpool/ROOT/VOSApp/opt
zfs create rpool/ROOT/VOSApp/var
zfs create rpool/export
zpool set bootfs=rpool/ROOT/VOSApp rpool

Here I took liberty of separating out /opt /var into separate dataset so that I can enable zfs snapshots just for "optional" and "variable" data of the applications. This is a point of view of deployment. Your view may be different here.

3. Setting up the packages from OpenSolaris repository

The next step on how to use OpenSolaris repository to install the pacakges. Alex Eremin had great pointers on his blog that I adapted . The initial setup can be easily done by exporting the PKG_IMAGE environment variable to the directory where you are currently mounting the zpool and then using pkg image-create command.

pkg image-create -f -F -a opensolaris.org=http://pkg.opensolaris.org/release $PKG_IMAGE
pkg refresh

Then  I played with the package list over and over again to get a minimal size with most network adapter drivers to get on the internet and all required pacakges to allow "pkg" command to work. Of course this is the piece that took me quite a bit of trial and error to figure out the right mix of pacakges so OpenSolaris does boot up successfully and allow "pkg" to run successfully.

pkg install SUNWcsd 
pkg install SUNWcs 
pkg install SUNWcar SUNWcakr SUNWkvm SUNWos86r SUNWrmodr \
 SUNWpsdcr SUNWpsdir SUNWcnetr SUNWesu SUNWkey SUNWnfsckr \
 SUNWnfsc SUNWgss SUNWgssc SUNWbip SUNWbash SUNWloc SUNWsshcu \
 SUNWsshd SUNWssh SUNWtoo SUNWzfskr SUNWipf SUNWintgige SUNWipkg \
 SUNWadmr SUNWadmap SUNWPython SUNWperl584core SUNWgrub SUNWxcu6\
 SUNWxcu4 SUNWgawk SUNWgtar SUNWgnu-coreutils SUNWscp SUNWfmd \
 SUNWxge SUNWbge SUNWnge SUNWrge SUNWrtls \
 SUNWixgb SUNWchxge SUNWzfs-auto-snapshot SUNWsolnm

I did realize one thing.. Loading one package at a time with pkg is awfully slow. By putting all the packages within one line (except for SUNWcsd and SUNWcs) I could cut down the time from hours to minutes. This was my "Eureka" moment when I could install the packages to a bare-metal .. well a bare-virtualbox VDI within 10 minutes.

4. Setting up the SMF Database on the system

cp $PKG_IMAGE/lib/svc/seed/global.db $PKG_IMAGE/etc/svc/repository.db
chmod 0600 $PKG_IMAGE/etc/svc/repository.db
chown root:sys $PKG_IMAGE/etc/svc/repository.db
# setup smf profiles
ln -s ns_files.xml $PKG_IMAGE/var/svc/profile/name_service.xml
ln -s generic_limited_net.xml $PKG_IMAGE/var/svc/profile/generic.xml
ln -s inetd_generic.xml $PKG_IMAGE/var/svc/profile/inetd_services.xml
ln -s platform_none.xml $PKG_IMAGE/var/svc/profile/platform.xml
# Set the environment variables for svccfg.
SVCCFG_DTD=${PKG_IMAGE}/usr/share/lib/xml/dtd/service_bundle.dtd.1
SVCCFG_REPOSITORY=${PKG_IMAGE}/etc/svc/repository.db
SVCCFG=/usr/sbin/svccfg
export SVCCFG_DTD SVCCFG_REPOSITORY SVCCFG
${SVCCFG} import ${PKG_IMAGE}/var/svc/manifest/milestone/sysconfig.xml 

Again this is one area I think OpenSolaris can improve a bit. It does some amount of research (google, Alex Eremin) before understanding how to set it up properly. (But then I am not a kernel engineer.)

5. Other Miscellaneous but important stuff to get a bootable system

The following are basically hacks in some ways to get to a bootable system. 

# Set TimeZone 
echo "$TIMEZONE" >⁞ $PKG_IMAGE/etc/TIMEZONE
echo $HOSTNAME > $PKG_IMAGE/etc/nodename 
# configure our new /etc/vfstab
printf "rpool/ROOT/VOSApp -\t/\t\tzfs\t-\tno\t-\n" >> $PKG_IMAGE/etc/vfstab
printf "/dev/zvol/dsk/rpool/swap\t-\t-\t\tswap\t-\tno\t-\n" >> $PKG_IMAGE/etc/vfstab
chmod a+r $PKG_IMAGE/etc/vfstab
# turn off root as a role
printf "/^root::::type=role;\ns/^root::::type=role;/root::::/\nw" |\
ed -s $PKG_IMAGE/etc/user_attr
# Edit etc/ssh/sshd_config to allow ssh to root account
printf "/^PermitRootLogin no\ns/^PermitRootLogin no/PermitRootLogin yes/\nw" |\
ed -s ${PKG_IMAGE}/etc/ssh/sshd_config 
# Generate ssh keys
ssh-keygen -t dsa -f $PKG_IMAGE/etc/ssh/ssh_host_dsa_key -N ''
ssh-keygen -t rsa -f $PKG_IMAGE/etc/ssh/ssh_host_rsa_key -N ''

6. Finally the boot archive and  grub

# configure /dev in the new image
devfsadm -R $PKG_IMAGE
bootadm update-archive -R $PKG_IMAGE
$PKG_IMAGE/boot/solaris/bin/update_grub -R $PKG_IMAGE
# For zfs root, menu.lst has moved to /rpool/boot/grub/menu.lst. # 
# create the new real grub menu
cat <⁞<-EOF > /rpool/boot/grub/menu.lst
default 0
timeout 10
splashimage /boot/grub/splash.xpm.gz
title  Appliance based on OpenSolaris 2008.11 
findroot (pool_rpool,0,a)
bootfs rpool/ROOT/VOSApp
kernel\$ /platform/i86pc/kernel/\$ISADIR/unix  -B \$ZFS-BOOTFS
module\$ /platform/i86pc/\$ISADIR/boot_archive
EOF
# make the grub menu files readable by everyone.
chmod a+r $PKG_IMAGE/boot/grub/menu.lst
chmod a+r /rpool/boot/grub/menu.lst
# setup /etc/bootsign so that grub can find this zpool
dir -p /rpool/etc>
echo pool_rpool > /rpool/etc/bootsign
zfs set mountpoint=/ rpool/ROOT/VOSApp
zfs set compression=off rpool 


Hope this makes it easier for  someone thinking of making their own appliances based on OpenSolaris.

NOTE: There was fair number of people who did download the images. I did accidently lose the OpenSolaris image once after it was downloaded 75 times.  But you can track the images directly at mediacast.sun.com/tags/appliance



Few days ago I talked about a Postgres 8.3 Appliance based on OpenSolaris. Today lets look at how to use that appliance image to get an Openbravo ERP 2.40 appliance based on OpenSolaris in VirtualBox.

Download the Postgres 8.3 Appliance OVF image and unzip the two files. Fire up VirtualBox 2.2 and use File->Import Appliance and point it to the .ovf file  from the zip file. Change the networking from NAT to "Bridged Network" and start the VM and soon you get "postgresdb login:" screen.  Use root/opensolaris to login into the system and verify that postgres instance is already running as follows:

# svcs -a |grep postgres
disabled       19:08:00 svc:/application/database/postgresql_83:default_64bit
online         19:08:23 svc:/application/database/postgresql_83:default_32bit

The default options of postgresql.conf are pretty low so bump them up slightly

# vi /var/postgres/8.3/data/postgresql.conf

shared_buffers=128MB
wal_buffers=128kB
checkpoint_segments=16
listen_addresses='*'

# svcadm restart svc:/application/database/postgresql_83:default_32bit

 Now import other required dependencies for Openbravo ERP 2.40

# pkg install SUNWj6dev SUNWant SUNWtcat
DOWNLOAD                                    PKGS       FILES     XFER (MB)
SUNWj6dev                                    0/4     25/4756    1.08/84.90

Make sure that your newly installed tomcat setup has a valid server.xml file or copy it from an example file included. 

# cp /var/apache/tomcat/conf/server.xml-example /var/apache/tomcat/conf/server.xml

 Now download Openbravo ERP 2.40 installer as follows:

# pkg install SUNWwget

# wget "http://voxel.dl.sourceforge.net/sourceforge/openbravo/OpenbravoERP_2.40-solaris-intel-installer.bin"

# chmod a+x OpenbravoERP_2.40-solaris-intel-installer.bin

# ./OpenbravoERP_2.40-solaris-intel-installer.bin

 And use the following options:

  • /opt/OpenbravoERP | /var/OpenbravoERP/AppsOpenbravo/attachments
  • Complete |Standard | /usr/jdk/latest | /usr/bin/ant 
  • /var/apache/tomcat
  • PostgreSQL
  • /usr/postgres/8.3/bin
  • localhost     5432
  • (Enter password for postgres user as "postgres" twice)
  • openbravo    tad     (Enter password for tad user  twice)
  • Context name: openbravo
  • Date format: DD MM YYYY, Date Separator -, Time format 24h, Time Separator :
  • Demo data: Y or N depending on your preferences

After the information the installation GUI takes quite a bit of time to complete specially if you select to load the demo data. (Hope you made changes to PostgreSQL before to tune this loading.)

Once the installation completes  start tomcat as follows

# /usr/apache/tomcat/bin/startup.sh

Now from any other machine (or host machine) fire a browser and enter the IP address of the VM with port 8080 and uri openbravo and you now have a virtual VM with Openbravo running

http://myVMipaddress:8080/openbravo

The login screen for Openbravo should appear. Use Openbravo as username and openbravo (all lower case) as password to login and set it up for your business.



One of the things that I always miss in OpenSolaris is text install and the primary reason for using this is in VirtualBox where I want to install just enough OpenSolaris to do a particular task with it like Database Server, Drupal, Java Application Server, etc (but more server like tasks). I really don't need GNOME or any other desktop utilities and it is primary a waste of space if I really have to do it for multiple VMs.

I have created a new OVF image which is less than 270MB size and can be used to play with the OpenSolaris kernel in VirtualBox 2.2 and if you like it can install the rest of the OpenSolaris Desktop to get to the default full blown installation of OpenSolaris 2008.11

Download the OpenSolaris App OVF image, unzip it and just use VirtualBox->File->Import Appliance to import the image into a new VirtualBox VM and you are ready to boot into OpenSolaris kernel. The preset login information is user: root with password: opensolaris.

TIP: If you like this minimal install and would like to try out the full fledge Desktop install just execute the following command

# pkg install slim_install

This will take a long time since it installs packages of another 600MB on top of the installation. But it will be pretty much the same set of packages that comes with OpenSolaris 2008.11 CD.  (Actually I expected "# pkg install entire" to do the intended task but unfortunately it just installed "entire" and not its dependencies. Must be a bug somewhere with "entire". )

 If you are interested in your own custom OVF image, here is how I have created the image:

  • For the time being this requires the  OpenSolaris 2008.11 CD image (650+ MB) only to execute my script create_solaris_appliance.sh which is about 5KB.
  • Download Virtualbox and install it
  • Create a New Virtual Box VM with following parameters in the Wizard Screen
    • Call it  My OpenSolaris Appliance
    • Select OS Type "OpenSolaris" 
    • Base Memory  768MB or increase it to 1GB if you can spare your RAM for it
    • Create a New Dynamic Expanding Image with exactly 16.00 GB (Any other size may not work)
  • Once the VM is created, immediately  click the blue Network link and modify it to select a "Bridged Network"  (in the setup make sure it is connected to the active host interface - wired or wireless depending on the host system)
  • Also Click the CD-ROM image and point it to the osol-200811.iso image that you downloaded earlier
  • Boot up the VM and select the first LiveCD option in the Grub Menu options
  • Select through the defaults to get the full Gnome Desktop
  • Open Firefox in the VM and make  sure your VM  has internet access
  • Open a terminal window and execute the following commands in sequence
    • wget "http://blogs.sun.com/jkshah/resource/create_solaris_appliance.sh"
    • time pfexec sh -x create_solaris_appliance.sh
    • pfexec halt
  • At this point stop the VM and disconnect the CD image connected to the VM and use VirtualBox->File->Export Appliance Wizard to export the image
It takes about 12 minutes to create an image of about 270MB using the above steps with a decent internet connection and decent desktop.

If you want to own customized Image, take a look at create_solaris_appliance.sh and create_pg_appliance.sh and create your own versions of it for say Drupal, Ruby, Glassfish, etc.


With the release of VirtualBox 2.2 today, exporting and importing appliances becomes lot easier. The new features of VirtualBox 2.2 includes import and export of appliances based on OVF or Open Virtualization Format. 

One of  my earlier blog entry talks about creating appliances. Using the same script mentioned in the earlier blog entry,  I now have a OVF Image  which will work with VirtualBox 2.2  "Import Appliance"  and make it easier to try a virtual PostgreSQL 8.3 database appliance running OpenSolaris under the cover.

The PostgreSQL 8.3 Appliance Image is less than 330MB (unlike a full install of OpenSolaris which could take more than 1500MB).  Just unzip the files in a directory and import them through VirtualBox 2.2 using File->Import Appliance Wizard.  Since an appliance (database in this case) is typically accessed through external application servers, you will want to use Bridge Network instead of NAT.  Detailed instructions to make it accessible (both VirtualBox VM as well as PostgreSQL 8.3)  from external clients are published on Postgres 8.3 Appliance page.

Finally  it is lot easier to finally see the following grub menu on your own VirtualBox 2.2:

 Also for PostgreSQL users/developers who are not familiar with OpenSolaris,  this VM  allows you to try the DTrace probes and ZFS snapshots with PostgreSQL 8.3. More information related to it is  coming soon on Postgres 8.3 Appliance page. Also the PostgreSQL DTrace Toolkit 2009.03.29 is available on PGFoundry.org.

Tip: You can also use my PostgreSQL Monitor Demo to connect to it:


As usual if you have questions, I will be happy to answer them.

I presented a talk at PostgreSQL East 2009 at Drexel University today morning.  The topic was "Effects of Flash/SSDs on PostgreSQL".  Here are the slides from the presentation

If you have questions please leave comments.

During last year's PgCon 2008 I had presented about "Problems with PostgreSQL on Multi-core Systems".  On slide 14 I talked about the results with IGEN with various think times and had identied the problem of how it is difficult to scale with increasing number of users. The chart showed of how 200ms think time tests will saturate about 1000  active users and then throughput starts to fall.  On slide 16 & 17  I identified ProcArrayLock as the culprit of why scalability tanks with increasing number of users. 

Today I was again intrigued by the same problem as I was trying out PostgreSQL 8.4 builds and once again hit the bottleneck at about 1000 users and frustrated that PostgreSQL cannot scale even with only 1 active socket (64 strands)  of Sun Enterprise SPARC T5240 which is a 2 socket UltraSPARC T2 plus system.  

Again I was digging through the source code of PostgreSQL 8.4 snapshot to see what can be done. While  reviewing lwlock.c I thought of a change  and quickly  changed couple  of lines in the code and recompiled the database binaries and re-ran the test. The results are as show below (8.4 Vs 8.4fix)

The setup was quite standard (as how I test them). The database and log files are on RAM (/tmp). The think times were about 200ms for each user between transactions.  But the results was quite pleasing. On the same system setup,  TPM throughput went up about 1.89x up and response time now shows a more gradual increase rather than a knee-jerk response.

So what was the change in the source code? Well before I explain what I did, let me explain what is PostgreSQL trying to do in that code logic.

After a process acquires a lock and does its critical work, when it is ready to release the lock, it finds the next postgres process that it needs to wake up so effectively not causing starvation for processes trying to do exclusive locks  and also I think it is trying to avoid a Thundering Herd problem. It is a good thing to do for single core systems since CPU resources are sacred and effective usage results in better performane. However on systems with many CPU resources (say like 256 threads of Sun SPARC Enterprise T5440) this ends up artificially bottlenecking since it is not the system but the application determining which next process should wake up and try to get the lock.

The change I did was discard the selective process waiter wake-up and just wake up all waiters waiting for that lock and let the processes, OS Dispatcher, and CPU resources do its magic on its own way (and that worked,  worked very well.)


          //if (!proc->lwExclusive)
           if (1)
           {
                            while (proc->lwWaitLink != NULL &&
                                         1)
                                          // !proc->lwWaitLink->lwExclusive)
                                       proc = proc->lwWaitLink;
           }


Its amazing that last year I had tried so many different approaches to the problem and a mere simple approach proved to be more effective.

I have put a proposal to the PostgreSQL Performance alias that a postgresql.conf tunable be defined for PostgreSQL 8.4 so people can tweak their instances to use the above method of awaking all waiters without impacting the existing behavior for most existing users.

In the meantime if you do compile your own PostgreSQL binaries, try the workaround if you are using PostgreSQL on ay 16 or more cores/threads system and provide feedback about the impact on your workloads.


Ever since I had upgraded my workstation to Solaris 10 10/08 (Update 6) on my Sun Blade 2000 I had noticed that my patch update connection had not worked.Initially it would give me some weird Cocoa error or sometimes an empty dialog box with just OK in it.

Toda finally I was determined to fix it so I can apply the latest patches automatically instead of downloading them manually. Anyway the trick is to read these two Sun Forum entries.

  1. Entry regarding the Solaris 10 patch  that was released in January 2009
  2. Entry regarding the correct Java version that is required by underlying smpatch. 

In short in my case I had to install patch 121118-15 (for SPARC)  and also remove references to JDK 1.6_10 in my /usr/jdk so that I can make my update manager work again on my desktop. Finally my useful update manager is now working on Solaris 10 10/08 to find and install recent patches to the system.

Sometimes you need a kickstart to get one moving from inertia. Considering that I haven't blogged in 2009 at all, I welcome the PostgreSQL Conference East 2009  email reminders to get me going again.  That will motivate me to do some targetted work with PostgreSQL for the community. I will be presenting on SSDs and PostgreSQL during the event. Unlike Robert Treat who can pretty much prepare all his slides during the lunch break of the conference, some of us require more time.  As a reminder PostgreSQL East 2009 registration has already started. 

 I am also looking forward to PostgreSQL 8.4 beta and test drive it on 128-threaded Sun SPARC Enterprise  T5240.



DTrace probes were introduced in PostgreSQL starting in 8.2. Initially, only a handful were added, and they were mostly for developers. In 8.4, many more (46 to be exact) have been added, and they are targeted more toward database administrators as shown below.

query-parse-start(const char *)
query-parse-done(const char *)
query-rewrite-start(const char *)
query-rewrite-done(const char *)
query-plan-start()
query-plan-done()
query-execute-start()
query-execute-done()
query-start(const char *)
query-done(const char *)
statement-status(const char *)
sort-start(int, bool, int, int, bool)
sort-done(unsigned long, long)
buffer-read-start(ForkNumber, BlockNumber, Oid, Oid, Oid, bool)
buffer-read-done(ForkNumber, BlockNumber, Oid, Oid, Oid, bool, bool)buffer-flush-start(Oid, Oid, Oid)
buffer-flush-done(Oid, Oid, Oid)
buffer-hit(bool)buffer-miss(bool)
buffer-checkpoint-start(int)
buffer-checkpoint-sync-start()
buffer-checkpoint-done()
buffer-sync-start(int, int)
buffer-sync-written(int)
buffer-sync-done(int, int, int)
buffer-write-dirty-start(ForkNumber, BlockNumber, Oid, Oid, Oid)
buffer-write-dirty-done(ForkNumber, BlockNumber, Oid, Oid, Oid)
deadlock-found()
checkpoint-start(int)
checkpoint-done(int, int, int, int, int)
clog-checkpoint-start(bool)
clog-checkpoint-done(bool)
subtrans-checkpoint-start(bool)
subtrans-checkpoint-done(bool)
multixact-checkpoint-start(bool)
multixact-checkpoint-done(bool)
twophase-checkpoint-start()
twophase-checkpoint-done()
smgr-md-read-start(ForkNumber, BlockNumber, Oid, Oid, Oid)
smgr-md-read-done(ForkNumber, BlockNumber, Oid, Oid, Oid, const char *, int, int)
smgr-md-write-start(ForkNumber, BlockNumber, Oid, Oid, Oid)
smgr-md-write-done(ForkNumber, BlockNumber, Oid, Oid, Oid, const char *, int, int)
xlog-insert(unsigned char, unsigned char)
xlog-switch()
wal-buffer-write-dirty-start()
wal-buffer-write-dirty-done()

Documentation should be available in 8.4 doc page soon, but if you don't want to wait, check out the doc patch I recently submitted. If you're using or plan to use the probes, I'd love to hear your feedback, both positive and constructive!

Special thanks to Theo Schlossnagle, Robert Treat, Zdenek Kotala, Alvaro Herrera and Simon Riggs for their contributions with the probes as well as reviewing them.

There are some basic, golden rules when it comes to having a vibrant community of contributors.

The following are rules I have extracted and learned based on my experience managing and working with engineers actively involved and participating in the Apache/Derby, PostgreSQL and MySQL open-source communities. These rules are also based on extensive discussions with many folks involved with the MySQL community, with the PostgreSQL community and with the Apache/Derby (Java DB) community, over many years.

Before I go through these rules, I would like to thank Marten Mickos for having suggested some of the headings for these rules. (I originally had much longer headings for all of them.) I would also like to thank many of MySQL, PostgreSQL and Java DB colleagues, as well as to many other colleagues involved in open-source development, for having contributed to the ideas and practices behind these rules.

A) Transparency.
1.Often, this openness can span all the way from development (architectural specification, implementation design and planning, implementation, code review and walk-through) to testing, qualification and release.
2.It may be possible to move towards greater transparency over time but openness in development is often the minimum starting point. 

B) Dialog.
1.It should be possible to conduct open dialog and conversation regarding any aspect of the development (and other aspects of) work.
2.When mailing lists and other archive-able communication channels (such as wikis) focused on development work are opened up, it becomes easier to conduct open dialog and conversation regarding the development work. 
3.Of course, when a corporation or business concern contributes (either as a major contributor or a minor contributor) to the development of an open-source product, it is to be expected that some aspects of the development work (e.g. those related to specific customer needs) may remain obscure through mechanisms such as withholding of a customer's name. 

C) Pace.
1.It should be possible to track the fate of any contribution and have a public archive of the conversation conducted regarding that contribution—recording decisions made and various feedback loops in time for the purposes of learning and further work.
2.For this purpose, it is often sufficient to have a time record of the conversation conducted with respect to the given contribution.
3.These records can be searched to determine the fate of the contribution.
4.These records help provide a learning platform for the future contributors.

D) Setting Expectations.
1.Using available and open information, the contributor community should be able to form and entertain valid expectations regarding milestones, releases, timelines, etc.
2.Anticipating the future and related risk management helps all market participants to reduce transaction costs.

E) Small is Beautiful.
1.While it should be possible to absorb contribution of any size, emphasis should be put on  absorbing smaller and incremental contributions.
2.To create mass and momentum and community and quality, it helps to encourage smaller contributions.

F) Differences.
1.Not all contributions are equal.
Contributions are judged by whether they are well designed, fit into business roadmaps, are well documented, comply with standards, do not produce regressions in the code and improve performance.
2.Not all contributors are equal.
Contributors vary in expertise, skill and experience.
These variations give meaning to the practices and procedures of the contributor community.

G) Places.
1.It is clear where one needs to work.
There are enough branches or trees to serve distinctly different target groups.
2.Trees and branches are well-groomed.
Active code branches or trees are kept at a minimum set in order to keep the product roadmap and expectations coherent.

H) Parallelism.
1.Contributions are added in parallel with frequent synchronization so that community participants can respond to each others' work.Parallel work leads—naturally and out of brute necessity—to modularization, better and faster integration.

I) Incrementalism.
1.Work is conducted in increments.
2.Each contribution does one thing.
3.Each contribution has a test case that exercises it.

J) Learning.
1.Contributor community assets (channels of communications, forums, bug databases, etc.) are developed to improve learning by all participants and contributors.


Acknowledgment

I'd like to thank Brian Aker, Knut Anders Hatlen, Davi Arnaut, Kaj Arnö, Jorgen Austvik, Igor Babaev, Mark Callaghan, Peter Eisentraut, Sergei Golubchik, Shawn Green, Lenz Grimmer, Rick Hillegas, Stefan Hinz, Geir Hoydalsvik, Henrik Ingo, Alexey Kopytov, Mark Leith, Dmitry Lenev, Manyi Lu, Giuseppe Maxia, Paul McCullagh, Mårten Mickos, Chad Miller, Francois Orsini, Konstantin Osipov, Trudy Pelzer, Sergey Petrunia, Jay Pipes, Jeffrey Pugh, Ole Solberg, Georg Richter, Mikael Ronström, Kristian Waagan, Dag Wanvik, Monty Widenius, Jeff Wiss, and more.

Based on Alex Eremin's blog entry on minimal script and some of my own hacking,  I have a script that should allow to create a basic PostgreSQL 8.3 Appliance using OpenSolaris 2008.11 Kernel  in Virtualbox easily with an image which is not bloated (No X, Gnome, etc).

Here is how to try it out:

  • For the time being this requires the  OpenSolaris 2008.11 CD image (650+ MB) only to execute my script create_pg_appliance.sh which is about 6KB. Bear with me till I solve this problem and play along :-)
  • Download Virtualbox and install it
  • Create a New Virtual Box VM with following parameters in the Wizard Screen
    • Call it PostgreSQL 8.3 Appliance
    • Select OS Type "OpenSolaris" 
    • Base Memory  768MB or increase it to 1GB if you can spare your RAM for it
    • Create a New Dynamic Expanding Image with exactly 16.00 GB (Any other wise may not work)
  • Once the VM is created, immediately  click the blue Network link and modify it to select a "Host Based Network"  (in the setup make sure it is connected to the active host interface - wired or wireless depending on the host system)
  • Also Click the CD-ROM image and point it to the osol-200811.iso image that you downloaded earlier
  • Boot up the VM and select the first LiveCD option in the Grub Menu options
  • Select through the defaults to get the full Gnome Desktop
  • Open Firefox in the VM and make  sure your VM  has internet access
  • Open a terminal window and execute the following commands in sequence
    • wget "http://blogs.sun.com/jkshah/resource/create_pg_appliance.sh"
    • pfexec sh -x create_pg_appliance.sh
    • pfexec reboot
  • At this point stop the VM and disconnect the CD image connected to the VM and boot again or  select the  grub entry
    • Boot from Hard disk
  • Once the VM boots from the hard disk, select the  grub menu item
    • PostgreSQL 8.3 Appliance based on OpenSolaris 2008.11
  • The Operating system boots up to give a login prompt. Login as root/opensolaris
  • The PostgreSQL database server is already running with PGDATA in /var/postgres/8.3/data
  • Modify postgresql.conf  to add  listen-address='*' parameter
  • Modify pg_hba .conf to  allow your clients in pg_hba.conf
  •  Restart the postgresql server instance as follows:
    • svcadm restart postgresql_83:default_32bit
  • Now the PostgreSQL Appliance is ready to be connected from your clients.
  • Remember to take snapshots of rpool/VOSApp/var so you can always revert back to earlier copies if you ever need it.
Image of Grub Menu

Note: In a trial installation on a MacBook Pro the script executed in  about 11 minutes which  includes the time it takes to download packages from pkg.opensolaris.org.  I thought that was fantastic.

Maybe I should also create a minimized LiveCD just to execute the script on Virtualbox.  If you have problems executing the script let me know.



This year CommunityOne is being held also in New York (March 18,  2009) in addition to the CommunityOne held in San Francisco (June 1, 2009). CommunityOne is a conference sponsored by Sun Microsystems focused on open source innovation and implementation. It brings together developers, technologists and students for technical education and exchange.

The Call for Paper is on and been extended to December 19.  Since quite a bit of PostgreSQL crowd is in east coast maybe we should also put in some PostgreSQL proposals atleast for the New York event in March 2009. Any takers/suggestions?

Here is what that comes to my mind instantaneously :


Others that I missed?


The second supported version of OpenSolaris called OpenSolaris 2008.11 is now officially launched. With the release, I guess now PostgreSQL 8.3 is now "officially" supported on OpenSolaris too (though you could have used it prior  anyway with the development releases of OpenSolaris after the initial OpenSolaris 2008.05 was released).

 Anyway let's quickly look at some tips and how-to which I think could be useful to improve the experience of PostgreSQL 8.3 with OpenSolaris 2008.11.  One way to take a quick drive of OpenSolaris 2008.11 is to also try out the new VirtualBox 2.0.6 and install OpenSolaris 2008.11 in a VM. (Note: If you do plan to access the VM from other systems then use Host Interface instead of NAT to make it easy to access it from outside.)

PostgreSQL 8.3 is not already installed once you install OpenSolaris 2008.11 from the LiveCD. This gives an opportunity for a tweak that I would highly recommend. By default OpenSolaris 2008.11 does not have option for separate ZFS dataset for /var (like Solaris 10 10/08 - U6 does). I would certainly like my PostgreSQL database to be on a separate dataset that I can control PostgreSQL snapshots  independently. Since I know the default PostgreSQL database path for PostgreSQL 8.3 on OpenSolaris is /var/postgres/8.3 I generally create a separate dataset as follows before I use pkg script to install PostgreSQL 8.3

# zfs create -o mountpoint=/var/postgres rpool/postgres

Now I am ready to install the various packages of PostgreSQL 8.3

# pkg install SUNWpostgr-83-server

# pkg install SUNWpostgr-83-client SUNWpostgr-jdbc SUNWpostgr-83-contrib

# pkg install SUNWpostgr-83-docs  SUNWpostgr-83-devel

# pkg install SUNWpostgr-83-tcl SUNWpostgr-83-pl

Now to install PGAdmin3 as follows:

# pkg install SUNWpgadmin3

Even pgbouncer is available in the repository

# pkg install SUNWpgbouncer-pg83

Now that all binaries are installed, let's look on how to get started. Generally the best way to start and stop PostgreSQL 8.3 server is via the svcadm command of OpenSolaris. The SMF manifest for PostgreSQL 8.3 is installed with SUNWpostgr-83-server, however the import of the script does not happen till the next reboot. We can always work that around by doing a quick manual update as follows:

# svccfg import /var/svc/manifest/application/database/postgresql_83.xml

This creates two entries one for 32-bit server instance and one for 64-bit server instance.

# svcs -a |grep postgres
disabled        1:14:44 svc:/application/database/postgresql_83:default_64bit
disabled        1:14:45 svc:/application/database/postgresql_83:default_32bit

Depending on your server/choice you can start the corresponding server. Or 32-bit instance may be just easier to select if there are doubts.

# svcadm enable postgresql_83:default_32bit

# svcs -a |grep postgres
disabled        1:14:44 svc:/application/database/postgresql_83:default_64bit
online          2:12:37 svc:/application/database/postgresql_83:default_32bit

# zfs list rpool/postgres
NAME             USED  AVAIL  REFER  MOUNTPOINT
rpool/postgres  30.4M  11.8G  30.4M  /var/postgres

The client psql is still not in the default path. The path /usr/postgres/8.3/bin should be in your search PATH. (or if you are using 64-bit add /usr/postgres/8.3/bin/64).

# /usr/postgres/8.3/bin/psql -U postgres postgres
Welcome to psql 8.3.4, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

postgres=#

For people who are using PostgreSQL 8.2 on OpenSolaris 2008.05 already, there is a way to get to PostgreSQL 8.3. First you have to update your OpenSolaris 2008.05 to OpenSolaris 2008.11 image using

# pkg image-update

Then install PostgreSQL 8.3 binaries as above and also install an additional package

# pkg install SUNWpostgr-upgrade

If there are more question feel free to leave a comment.




With the release of JavaFX 1.0, a new version of Netbeans 6.5 is now also available. Let's quickly look at its current PostgreSQL Support.

The JDBC driver for PostgreSQL is integrated in Netbeans 6.5. It is  located at:

netbeans-6.5/ide10/modules/ext/postgresql-8.3-603.jdbc3.jar

As you can see it is pretty much upto sync to the latest (about a rev back since the latest is 604 build).  Clicking the Services tab of Netbeans shows Database where a new entry can be created using using the "Direct URL Entry" using the format as follows:

jdbc:postgresql://hostname/databasename

providing the user name and password for the connection and then the schema name (hint: public schema) in the Advanced Tab and press OK to create a new connection to PostgreSQL database.

Once connected clicking the expand sign which looks like  o- can  be expanded to see the Tables/Views/Procedures and more expansions can be done on Tables to see  data either using menu  options or using the SQL Editor.

With the PostgreSQL setup in place now it should be easier to program with your favorite language be it Java  or JavaFX , Python, PHP, Ruby, etc using PostgreSQL database as the backend.

( Note for myself:  Try out Greg Smith's pgtune project which uses Python with the new Netbeans's Python editor.)



Over the years I have accumulated quite a bit of digital content (music, pictures, videos, archived tax returns, etc) and have used external drives, CDs/DVDs, and even USB sticks to backup the data. After a while, it has become increasing inconvenience to locate the content let alone share it with multiple computers, so I decided to shop for a home NAS. But which one? There are quite a few of them on the market with varying prices ($200 to $2000+) and features. I liked the Netgear's ReadyNAS, but it was quite pricy. With 2TB of storage, it was around $1,500 at the time. This costed more than a PC with the same amount of storage, so I thought to myself, "Why not use a PC with ZFS."

After some Googling, I found a number of people had already done exactly this, and Simon's blog was particularly helpful , especially for the system config that worked well for him. So, I decided to build a similar system. I ordered the components from Newegg.com, and the total price was less than $900 for a pretty powerful system (2.6 Ghz AMD Athlon, 2GB ECC RAM, 4x500GB WD drives). It took me a couple of days to put the system together, install OpenSolaris 2008.05, setup ZFS pools, and CIFS server/client, and voila, I have a system that can be used as a NAS as well as a general purpose server. Now, I have all my data in one place and be able to get to it from all the computers on the network, and be assured that if a drive fails, I won't lose data thanks to the RAID support in ZFS.

Hyperic just released Hyperic HQ 4.x, and I recently took it for a test drive on OpenSolaris 2008.05. The installation went smoothly, but I ran into one issue when starting the server. With a 64-bit kernel, the startup script assumes you have 64-bit JRE on the system. It my case, the 64-bit JRE wasn't available, so the server startup failed. I could have installed the 64 bit version, but I decide to just comment out the check in the startup script (hq-server.sh) to use the 32-bit version. Below are the details of the installation steps.

Before running the Hyperic HQ install script, setup the database first. Below are the commands I used for PostgreSQL. Run psql, and then run the following commands:

postgres=# create role admin with login createdb password 'hqadmin';
CREATE ROLE
postgres=# CREATE DATABASE "HQ" OWNER admin;
CREATE DATABASE
postgres=# 

Now, run the install script (assuming you're unpacked the download in /var/tmp) and note the answers I entered in bold.

-bash-3.2# /var/tmp/hyperic-hq-installer/setup.sh
Initializing Hyperic HQ 4.0.1 Installation...
Loading taskdefs...
Taskdefs loaded
Choose which software to install:
1: Hyperic HQ Server
2: Hyperic HQ Agent
You may enter multiple choices, separated by commas.
1,2
HQ server installation path [default '/home/hyperic']:
/usr/local/hyperic
Choices:
	1: Oracle 9i/10g
	2: PostgreSQL
	3: MySQL 5.x
What backend database should the HQ server use? [default '1']:
2
Enter the JDBC connection URL for the PostgreSQL database [default 'jdbc:postgresql://localhost:5432/HQ?protocolVersion=2']: Enter

Enter the username to use to connect to the database:
admin
Enter the password to use to connect to the database: enter password here
(again): enter password again
HQ agent installation path [default '/usr/local/hyperic']: Enter

Loading install configuration...
Install configuration loaded.
Preparing to install...
Validating agent install configuration...
Validating server install configuration...
Checking server webapp port...
Checking server secure webapp port...
Checking server JRMP port...
Checking server JNP port...
Checking database permissions...
Verifying admin user properties
Validating server DB configuration...
Installing the agent...
Looking for previous installation
Unpacking agent to: /usr/local/hyperic/agent-4.0.1...
Setting permissions on agent binaries...
Fixing line endings on text files...
--------------------------------------------------------------------------------
Installation Complete:
  Agent successfully installed to: /usr/local/hyperic/agent-4.0.1
--------------------------------------------------------------------------------

 You can now start your HQ agent by running this command:

  /usr/local/hyperic/agent-4.0.1/bin/hq-agent.sh start

Installing the server...
Unpacking server to: /usr/local/hyperic/server-4.0.1...
Creating server configuration files...
Copying binaries and libraries to server installation...
Copying server configuration file...
Copying server control file...
Copying server binaries...
Copying server libs...
Setting up server database...
Setting permissions on server binaries...
Fixing line endings on text files...
--------------------------------------------------------------------------------
Installation Complete:
  Server successfully installed to: /usr/local/hyperic/server-4.0.1
--------------------------------------------------------------------------------


 You can now start your HQ server by running this command:

  /usr/local/hyperic/server-4.0.1/bin/hq-server.sh start

 Note that the first time the HQ server starts up it may take several minutes
 to initialize.  Subsequent startups will be much faster.

 Once the HQ server reports that it has successfully started, you can log in
 to your HQ server at: 

  http://opensolaris.local:7080/
  username: hqadmin
  password: hqadmin

 To change your password, log in to the HQ server, click the "Administration"
 link, choose "List Users", then click on the "hqadmin" user.


Setup completed.
A copy of the output shown above has been saved to:
  /var/tmp/hyperic-hq-installer/installer-4.0.1/./hq-install.log

At this point the installation is completed, and the server can be started with the following line, but as you can see HQ failed to start.

-bash-3.2# /usr/local/hyperic/server-4.0.1/bin/hq-server.sh start
Starting HQ server...
Initializing HQ server configuration...
Checking jboss jndi port...
Checking jboss mbean port...
Setting -d64 JAVA OPTION to enable SunOS 64-bit JRE
Booting the HQ server (Using JAVA_OPTS=-XX:MaxPermSize=192m -Xmx512m -Xms512m -d64)...
HQ failed to start
The log file /usr/local/hyperic/server-4.0.1/logs/server.out may contain further details on why it failed to start.

Looking at the hq-server.sh script, I notice that it checks if you're running a 64-bit kernel. If so, it uses the 64-bit JRE. Since I only have 32-bit JRE on my system, I commented out the following section in hq-server.sh.

#if [ $THISOS = "SunOS" ] ; then
#       ARCH=`isainfo -kv`
#
#       case $ARCH in
#               *64-bit*)
#                 echo "Setting -d64 JAVA OPTION to enable SunOS 64-bit JRE"
#                       HQ_JAVA_OPTS="${HQ_JAVA_OPTS} -d64"
#                       ;;
#       esac
#fi

And rerunning the script, the server started up properly.

-bash-3.2# /usr/local/hyperic/server-4.0.1/bin/hq-server.sh start
Starting HQ server...
Removing stale pid file /usr/local/hyperic/server-4.0.1/logs/hq-server.pid
Initializing HQ server configuration...
Checking jboss jndi port...
Checking jboss mbean port...
Booting the HQ server (Using JAVA_OPTS=-XX:MaxPermSize=192m -Xmx512m -Xms512m)...
HQ server booted.
Login to HQ at: http://127.0.0.1:7080/
-bash-3.2#

At this point you can just hit the server by pointing the browser to http://127.0.0.1:7080/, and you should see the Portal Dashboard.

This is just a quick cheatsheet to get you started. The official installation/configuration instructions are available on Hyperic website.

Recently I burned a copy of OpenSolaris 2008.11 RC1 and used it with Songbird (using pkg install SUNWsongbird)  and Fluendo MP3 Decoder (which is free for OpenSolaris),  Flash and soon I had a setup in my bedroom which I love to call OpenSolaris Home Theatre Edition that I used to listen to my songs collection and watch online shows that I controlled via Remote Desktop  which to me was a serious contender to those Media Centers out there. 

However I realized that while I wanted to "Pump it up" for my personal usage, I really wanted to "Strip it Down" for business usage. What I meant is in order for someone to try it out with say another Open Source Software it is easier now to say try that Open Source Software pre-installed on OpenSolaris in a Virtualbox Image. However I found it very hard to do it in practice.

Say for example I want to get somebody to try out the latest version of MySQL on OpenSolaris. The easiest thing is to give them a VDI image of preinstalled version of MySQL 5.1 RC  running on OpenSolaris that somebody just double clicks and boom a virutalbox instance starts up with MySQL ready to deliver.

However there is a problem with that theory. The VDI image of OpenSolaris fresh install in a virtualbox instance is about 2.5 - 3 GB. Of course adding MySQL on top of it won't drastically increase the size but I still have a problem with the base size itself. Since the only way that this can work is to use some sort of Peer to Peer File Sharing technology as hosting and hoping people will download this DVD Size downloads without any problems is like going at 10:00am for a Thanksgiving Deal at a store that opens at 6:00am with less than 20 SKU available (Tough Luck!!).

Anyway I started dissecting on how to reduce the size of the VDI image. There are few tricks provided by Virtualbox itself to release zero'ed out sectors using

VBoxManage modifyvdi VDINAME compact

However trying this out on a VDI holding OpenSolaris 2008.11 RC1 candidate did not really help.

The next thing I tried was to list out all the packages that are installed using

pkg list

There are about 550 packages installed as reported by that list. I tried  removing many of the non-essential desktop things (Aha who needs Xorg, Gnome, Firefox, Thunderbird) and reduced it to less than 400 packages. However even with that the VDI images is still not much smaller than 2.5GB. I tried the trick of cloning the disk to another VDI via ZFS replace thinking it will get rid of all the freed up space obtained by removing all the packages but the resulting new VDI image was still 2.5GB.

Looking at ZFS list output I found that for my virtualbox instance with 512MB I have three main zfs file systems defined which gives me the approximate 2.5GB usage.

rpool/ROOT/opensolaris- The legacy mount which is root (/) which is about 1.3 GB

rpool/dump which is about 512MB

rpool/swap which is again about 512MB

Now the goal is to reduce the VDI size without creating complications for the end user trying out the VDI and also still have the capability of going back to the fully installed version of OpenSolaris.

Hence the quest still continues..


Continuing on my earlier blog entry on setup of PostgreSQL using Sun Storage 7000  Unified Storage (Amber Road), let's now look at how to take snapshots for such systems. 

Considering a typical PostgreSQL system there are two ways to take backup: 

Cold or Offline backup: Backing of PostgreSQL while PostgreSQL server is  not running. The advantage of such a backup is that it is simple to achieve the purpose of backing up the database. However the disadvantage is that the database system is unavailable for work.  The general steps for doing a cold backup is, shutdown gracefully the PostgreSQL server and backup all the files including $PGDATA, pg_xlog if it is in different location and all the tablespaces location used for that server instance. After backup the files, start the PostgreSQL server again and you are done. In case if you ever want to go back to that earlier backup version, you shutdown the database, restore all the old files and restart the database.  This is essentially the same logic which we will follow with snapshots except if you now have a multi gigabyte database, the snapshot operation  will look like million times faster than having to do a "tar" or "cpio" (specially if you consider that "tar" is a single threaded application doing its best but still slow if you use it on multi-core systems).

Lets consider an example on how to take a snapshot with OpenStorage systems with Cold or Offline backup strategy.

Assume that Sun Storage 7000  Unified Storage System  has the  hostname "amberroad" (Sun Storage 7000 Unified Storage Systems is mouthful ) have a project defined "postgres" which has three devices exported "pgdata" which holds $PGDATA, "pglog" which holds $PGDATA/pg_xlog and "pgtbs1" which is one of the tablespaces defined.

Then a backup/snapshot script  (as postgres user ) will look something similar to the following (Disclaimer:  Provided AS-IS. You will have to modify it for your own setups):

#!/bin/sh
date
echo "---------------------------------------------------------------"
echo "Shutting down the PostgreSQL database..."
echo "---------------------------------------------------------------"
pg_ctl -D $PGDATA stop
echo "---------------------------------------------------------------"
echo "Deleting earlier snapshot called testsnap.."
echo "---------------------------------------------------------------"
ssh -l root amberroad confirm shares select postgres snapshots select testsnap destroy
echo "---------------------------------------------------------------"
echo "Taking a new snapshot..."
echo "---------------------------------------------------------------"
ssh -l root amberroad shares select postgres snapshots snapshot testsnap
echo "---------------------------------------------------------------"
echo "Verifying the snapshot.."
echo "---------------------------------------------------------------"
ssh -l root amberroad shares select postgres snapshots show
ssh -l root amberroad shares select postgres snapshots select testsnap show
echo "---------------------------------------------------------------"
echo "Restarting the database.."
echo "---------------------------------------------------------------"
pg_ctl -D $PGDATA start -l $PGDATA/server.log
echo "---------------------------------------------------------------"
date

Note: The testsnap should not exist otherwise snapshot operation will fail and hence we delete any earlier snaps called testsnap in the script. You may modify the script to suit your priorities.

Now in case you want to restore to the snapshot "testsnap" version
you can use a sample script as follows:

#!/bin/sh
date
echo "Restoring the database"
echo "---------------------------------------------------------------"
echo "Shutting down the database..."
echo "---------------------------------------------------------------"
pg_ctl -D $PGDATA stop
echo "---------------------------------------------------------------"
echo "Restoring the snapshot.."
echo "---------------------------------------------------------------"
ssh -l root amberroad confirm shares select postgres select pgdata snapshots \
select testsnap rollback
ssh -l root amberroad confirm shares select postgres select pglog snapshots \
select testsnap rollback
ssh -l root amberroad confirm shares select postgres select pgtbs1 snapshots \
select testsnap rollback
echo "---------------------------------------------------------------"
echo "Restarting the database.."
echo "---------------------------------------------------------------"
pg_ctl -D $PGDATA start -l $PGDATA/server.log
echo "---------------------------------------------------------------"
echo "Database restored to earlier snapshot: testsnap."
echo "---------------------------------------------------------------"
date

NOTE: When you use the above script, all snapshots taken after “testsnap” are lost as they are rolled back to the point in time when “testsnap” was taken. There is another feature to clone the snapshot incase if you don't want to use rollback.

Hot or  Online backup:  A Hot or Online backup is when one takes a backup while PostgreSQL server is running. The advantage is that the database server is available. The disadvantage is that the setup is more complex than the Cold or Offline backup strategy. The recommended way to use Hot or Online backup with PostgreSQL is to use it in conjuction with PITR - Point in Time Recovery feature of PostgreSQL.

This can be achieved by turning on continuous WAL archiving in PostgreSQL and using   SELECT pg_start_backup('label');before taking the snapshot and then issuing SELECT pg_stop_backup(); after the snapshot is completed.

For this you will also need another dedicated device under the "postgres" project to just hold the WAL archive files. (In my case I call it "pgwal" and mounted as /pgwal/archive 

To turn on continuous WAL archiving you need to set the following variables in postgresql.conf (and then restart the PostgreSQL server):

archive_mode = true
archive_command = 'test ! -f /var/lib/postgres/8.2/data/backup_in_progress || cp -i %p /pgwal/archive/%f < /dev/null'
archive_timeout=3600

Then a sample backup/snapshot with hot backup  script can look as follows:

#!/bin/sh
#PGDATA= /var/postgres/8.3/data; export PGDATA # if not set 
date
echo "---------------------------------------------------------------"
echo "Indicating PostgreSQL for a hot Snapshot.."
echo "---------------------------------------------------------------"
touch $PGDATA/backup_in_progress
psql  -c “select pg_start_backup('MyHotSnapShot');” postgres
echo "---------------------------------------------------------------"
echo "Deleting earlier snapshot.."
echo "---------------------------------------------------------------"
ssh -l root amberroad confirm shares select postgres snapshots select testsnap destroy
echo "---------------------------------------------------------------"
echo "Taking a new snapshot..."
echo "---------------------------------------------------------------"
ssh -l root amberroad shares select postgres snapshots snapshot testsnap
echo "---------------------------------------------------------------"
echo "Verifying the snapshot.."
echo "---------------------------------------------------------------"
ssh -l root amberroad shares select postgres snapshots show
ssh -l root amberroad shares select postgres snapshots select testsnap show
echo "---------------------------------------------------------------"
echo "Indicating to PostgreSQL Server that Hot Snapshot is done."
echo "---------------------------------------------------------------"
psql  -c “ select pg_stop_backup();” postgres
rm $PGDATA/backup_in_progress
echo "---------------------------------------------------------------"
date

Restoring from a hot backup snapshot is also bit tricky since we need to decide whether we need to roll forward and upto what point. It is probably beyond the scope of my post and hence I will just do a simple one with all files in the wal arhcives.

First of all restoring with PITR requires a recovery.conf file which should contain the path to the WAL Archives:

restore_command = 'cp /pgwal/archive/%f "%p"'

Also since we will be doing rollforward with the archive files,  we need to remove all the old pg_xlog files, create an archive status directory, remove the postmaster.pid file and an artifact of our earlier script, remove the backup_in_progress marker file too.

(We will do the above steps also as part  of our restore script.)

A restore snapshot script file can look something like as follows:

#!/bin/sh
#PGDATA= /var/postgres/8.3/data; export PGDATA # if not set 
date
echo "Restoring the database"
echo "---------------------------------------------------------------"
echo "Shutting down the database..."
echo "---------------------------------------------------------------"
pg_ctl -D $PGDATA stop
echo "---------------------------------------------------------------"
echo "Restoring the snapshot.."
echo "---------------------------------------------------------------"
ssh -l root amberroad confirm shares select postgres select pgdata snapshots \
select testsnap rollback
ssh -l root amberroad confirm shares select postgres select pglog snapshots \
select testsnap rollback
ssh -l root amberroad confirm shares select postgres select pgtbs1 snapshots \
select testsnap rollback
#if there is reason to rollback walarchives then uncomment the next two lines
#ssh -l root amberroad confirm shares select postgres select pgwal snapshots \
#select testsnap rollback
echo "---------------------------------------------------------------"
echo "Make sure pg_xlog/* is empty ..."
echo "Make sure pg_xlog/archive_status exists..."
echo "Make sure postmaster.pid is removed"
echo "---------------------------------------------------------------"
rm $PGDATA/backup_in_progress
rm $PGDATA/postmaster.pid
rm $PGDATA/pg_xlog/*
mkdir -p $PGDATA/pg_xlog/archive_status
echo "---------------------------------------------------------------"
echo "Set recovery.conf for recover.."
echo "---------------------------------------------------------------"
echo restore_command = \'cp /pgwal/archive/%f "%p"\' > $PGDATA/recovery.conf
echo "---------------------------------------------------------------"
echo "Restarting the database.."
echo "---------------------------------------------------------------"
pg_ctl -D $PGDATA start -l $PGDATA/server.log
echo "---------------------------------------------------------------"
echo "Database restored."
echo "---------------------------------------------------------------"
date

Of course the above scripts may not be best but gives an idea of how one can use snapshot features provided by Sun Storage 7000  Unified Storage Systems with PostgreSQL.



Somehow I must have missed this  try and buy offer before.  Evaluate the Sun Fire X4600 with PostgreSQL, MySQL or Microsoft SQL Server 2005. The Sun Fire X4600 M2 Server included in the program is available in two configurations. First configuration is  4 x Quad-Core with 16GB RAM and second configuration is with 8x Quad-Core (yes 32 cores in all) with 64GB RAM. Of course it only comes with two disks in it which means also need to "Try and Buy" the Sun StorageTek 2540 Array or the Sun Storage 7110 Unified Storage System with it.

Would love to see some real life deployment saturate 32-core Opteron system using only PostgreSQL or MySQL as the database system in it. 

Sun just announced a new line of storage appliances that will forever change the NAS appliance market. By using OpenSolaris and technologies such as ZFS, DTrace, FMA and SMF with standard-based storage server, Sun is able to bring the products to market at a much cheaper price point than its competitor and yet provide more differentiated features such as:

  1. Analytics - With DTrace as the underlying technologies, Analytics allow users to ask questions about the storage system in production in realtime from a GUI.
  2. Hybrid Storage Pool - Leveraging the Flash technology, the ZFS Hybrid Storage Pool transparently manages DRAM, Flash, and low-cost hard drives, providing a high performance and cost effective storage solution.
  3. Remote Replication - Data can easily replicated with minimum configuration from the browser using the powerful ZFS features.
  4. Supported Protocols: NFS, CIFS, iSCSI, HTTP, WebDAV, FTP

To get more technical information about these exciting new NAS appliances, checkout the blogs and demos from the engineering team.

BTW, you can also try out the products for 60 days for FREE.

The New Sun Microsystems  announced a new line of OpenStorage devices. The new Sun Storage 7000 Unified Storage Systems (code name Amber Road) are Storage appliances which uses ZFS as the core technology and DTrace as the means of "understanding" what's happening with the devices. This now allows systems which DO NOT have ZFS capabilities  (ala Linux, Windows, etc) to use and gain  benefit from the features of ZFS and DTrace (albeit limited to the "Storage" part only).

PostgreSQL (on Linux,Solaris, Windows) can use such systems through one of the standard protocols that these OpenStorage devices support

  • NFS
  • CIFS (Only on Windows)
  • iSCSI

PostgreSQL on NFS is already discussed at length (both merits and also the demerits of using NFS with PostgreSQL ) in the PostgreSQL community. Also I haven't met anybody yet who said that they are using PostgreSQL with their database on CIFS mounted devices. So I think one might actually select iSCSI as the favorable protocol for such OpenStorage devices with PostgreSQL. (Correct me via comments if you think otherwise.)

Here are quick steps that I had used to configure the PostgreSQL server (running Solaris 10) to use iSCSI devices exported from Sun Storage 7000 Unified Storage Platform

 In my case I had created two iSCSI LUN devices (one for PGDATA and one for pg_xlog) with file permissions for the postgres user. (This setup is generally done using the BUI - Browser User Interface of the OpenStorage "Amber Road" device.)

Now on the PostgreSQL Server I did the following to setup iSCSI initiators and LUNS:

  1. Add and display iSCSI target discovery address by giving the IP of the "Amber Road" system.

    # iscsiadm add discovery-address 10.9.168.93

    # iscsiadm list discovery-address

    Discovery Address: 10.9.168.93:3260

  2. Display iSCSI targets discovered from the Amber Road system

    # iscsiadm list discovery-address -v 10.9.168.93

    Discovery Address: 10.6.140.151:3260

    Target name: iqn.1986-03.com.sun:02:a4602145-85f8-64fa-c0ef-a059394d9a12

    Target address: 10.9.168.93:3260, 1

    Target name: iqn.1986-03.com.sun:02:0449398a-486f-4296-9716-bcba3c1be41c

    Target address: 10.9.168.93:3260, 1

  3. Enable and display static discovery.

    # iscsiadm modify discovery --static enable

    # iscsiadm list discovery

    Discovery:

    Static: enabled

    Send Targets: disabled

    iSNS: disabled

  4. Adds a target to the list of statically configured targets. A connection to the target will not be attempted unless the static configuration method of discovery has been enabled.

  5. # iscsiadm add static-config iqn.1986-03.com.sun:02:9e0b0e03-8823-eb7e-d449-f9c21930ba15,10.9.168.93

    # iscsiadm add static-config iqn.1986-03.com.sun:02:2cc4fe10-c7ba-697f-d95f-fa75efe50239,10.9.168.93

  6. Use Solaris devfsadm(1M) to create iSCSI device nodes.

    # devfsadm -i iscsi

  7. Use format(1M) command to access iSCSI disks. The disk(s) to be selected contain /scsi_vhci in their path name. Local disks are listed before iSCSI disks in the format command list. The following shows disk no. 4 and 5 are iSCSI disks.

    # format

    Searching for disks...done

    AVAILABLE DISK SELECTIONS:

    0. c0t0d0 <DEFAULT cyl 17830 alt 2 hd 255 sec 63>

    /pci@0,0/pci8086,25f8@4/pci108e,286@0/sd@0,0

    1. c0t1d0 <DEFAULT cyl 17830 alt 2 hd 255 sec 63>

    /pci@0,0/pci8086,25f8@4/pci108e,286@0/sd@1,0

    2. c0t2d0 <DEFAULT cyl 17830 alt 2 hd 255 sec 63>

    /pci@0,0/pci8086,25f8@4/pci108e,286@0/sd@2,0

    3. c0t3d0 <DEFAULT cyl 17830 alt 2 hd 255 sec 63>

    /pci@0,0/pci8086,25f8@4/pci108e,286@0/sd@3,0

    4. c2t600144F04890703F0000144FA6CCAC00d0 <DEFAULT cyl 13051 alt 2 hd 255 sec 63>

    /scsi_vhci/disk@g600144f04890703f0000144fa6ccac00

    5. c2t600144F0489070250000144FA6CCAC00d0 <DEFAULT cyl 13051 alt 2 hd 255 sec 63>

    /scsi_vhci/disk@g600144f0489070250000144fa6ccac00

  8. For UFS Follow the following procedure

    1. Create file systems on iSCSI disks.

    # newfs /dev/rdsk/c2t600144F04890703F0000144FA6CCAC00d0s0

    # newfs /dev/rdsk/c2t600144F0489070250000144FA6CCAC00d0s0

    1. Make the mount points.

    # mkdir /pgdata

    # mkdir /pglog

    1. Mount the iSCSI LUNs

    # mount -F ufs -o forcedirectio /dev/dsk/c2t600144F04890703F0000144FA6CCAC00d0s /pgdata

    # mount -F ufs -o forcedirectio /dev/dsk/c2t600144F0489070250000144FA6CCAC00d0s0 /pglog


  1. For ZFS Create zpool as follows:

    # zpool create pgdata c2t600144F04890703F0000144FA6CCAC00d0s

    # zpool create pglog c2t600144F0489070250000144FA6CCAC00d0s0



  1. For ZFS with Read-Cache and Separate Intent Log (OpenSolaris 2008.05 and later) using local disks

    # zpool create pgdata c2t600144F04890703F0000144FA6CCAC00d0s cache c0t1d0s1 log c0t1d0s0

    # zpool create pglog c2t600144F0489070250000144FA6CCAC00d0s0 cache c0t2d0s1 log c0t2d0s0

PostgreSQL Setup:

Many times, UFS filesystems have “lost+found” directory which causes PostgreSQL's initdb command to fail as directory is not empty. Also major versions of database are incompatible and hence it makes sense to take a naming convention as follows and create a sub directory within /pgdata and /pglog depending on the version you are planning to use:

mkdir /pgdata/8.3 /pglog/8.3

mkdir /pgdata/8.2 /pglog/8.2

mkdir /pgdata/8.1 /pglog/8.1


Then for PostgreSQL 8.3 (Solaris 10 10/08) the step is as follows

/usr/postgres/8.3/bin/initdb -D /pgdata/8.3 -X /pglog/8.3/pg_xlog


As for 8.2 the steps are as follows:

/usr/postgres/8.2/bin/initdb -D /pgdata/8.2

mv /pgdata/8.2/pg_xlog /pglog/8.2/

ln -s /pglog/8.2/pg_xlog /pgdata/8.2/pg_xlog


And for the default 8.1 the steps are similar:

/usr/bin/initdb -D /pgdata/8.1

mv /pgdata/8.1/pg_xlog /pglog/8.1/

ln -s /pglog/8.1/pg_xlog /pgdata/8.1/pg_xlog

And then use the pg_ctl from the version of PostgreSQL being used to start and stop the server:

pg_ctl start -D /pgdata/8.3 -l /pgdata/8.3/server.log

pg_ctl stop -D /pgdata/8.3

In another blog post I will explain on how to take snapshots with PostgreSQL and the OpenStorage devices.


The New Sun Microsystems  announced a new Glassfish/MySQL/OpenSolaris SPECjAppServer2004 result today. The  real highlight is that all software used in this benchmark is all Open Source Software. It is  a win for Open Source Software including Open Source Databases like PostgreSQL and MySQL. We need more of such benchmarks to highlight the exorbitant prices charged by Proprietary Database & other Software Vendors who charge and force customers to give them all their major dollars of their IT budget.

Tom Daly's blog entry highlight that in terms of Price/Performance, the proprietary database vendors who  conveniently also happen to be the Application Server Vendors charge  a much higher cost including all hardware/software (10X) associated with same performance which can be obtained  by Open Source database alternatives like PostgreSQL and MySQL. Check BM Seer's blog entry for more comparison.


SPEC required disclosure : - SPEC, SPECjAppServer reg tm of Standard Performance Evaluation Corporation. Results from www.spec.org as of 5th Nov 2008 All comparisons are based on the SPEC SPECjAppServer2004JOPS@Standard metric from www.spec.org or on pricing made using the bill of materials included in each SPECjAppServer2004 result

Solaris 10 10/08 (ala Solaris 10 Update 6) is now available. One of the new features of Solaris 10 10/08 is that PostgreSQL 8.3 is now bundled and available at /usr/postgres/8.3. I thought I will take this opportunity to help folks on how to use the new version on PostgreSQL using Openbravo ERP 2.40 as an example. (Note: This is specifically for Solaris 10 Update 6. If you are looking for instructions on how to install Openbravo and PostgreSQL on OpenSolaris check the other blog entry.)

First of all I recommend people to use the ZFS root feature now available in this update.  On my Sunblade 2000 I had to use the text installer via

boot cdrom - text

to install the system with ZFS boot.

Also another piece of recommendation is to put /var in a separate dataset. Why? Well all postgresql databases typically goes by default in /var/postgres  and hence to use the snapshot feature of ZFS, this provides a fine granularity.

Once Solaris 10 Update 6 is installed, then to see all options of Postgres available on Solaris 10 Update 6 do the following:

# svcs -a |grep postgres
disabled       Oct_31   svc:/application/database/postgresql:version_82_64bit
disabled       Oct_31   svc:/application/database/postgresql:version_82
disabled       Oct_31   svc:/application/database/postgresql:version_81
disabled       Oct_31   svc:/application/database/postgresql_83:default_64bit
disabled       Oct_31   svc:/application/database/postgresql_83:default_32bit

In my case I want to use a 32-bit instance of  PostgreSQL 8.3 so I just do the following.

# svcadm enable svc:/application/database/postgresql_83:default_32bit
# svcs -a |grep postgres
disabled       Oct_31   svc:/application/database/postgresql:version_82_64bit
disabled       Oct_31   svc:/application/database/postgresql:version_82
disabled       Oct_31   svc:/application/database/postgresql:version_81
disabled       Oct_31   svc:/application/database/postgresql_83:default_64bit
online         16:56:35 svc:/application/database/postgresql_83:default_32bit

Note when you do svcadm enable for any of the PostgreSQL instances, it first does an initdb and then does the equivalent of pg_ctl start. So only after the first enable of the service, you will see that the data directory initialized in /var/postgres/8.3/data (for 64-bit it will be data_64).

At this point of time I typically edit postgresql.conf located in /var/postgres/8.3/data/postgresql.conf and modify it for my basic tweaks:

checkpoint_segments=16 
shared_buffers = 128MB
wal_buffers=512kB

and then restart PostgreSQL server so the changes are incorporated before I start using the PostgreSQL server instance.

# svcadm restart svc:/application/database/postgresql_83:default_32bit

Great now PostgreSQL 8.3 server is up and running. Don't forget to use psql from  /usr/postgres/8.3/bin instead of the default /usr/bin which uses PostgreSQL 8.1 psql.

Lets now use an application on top of this new PostgreSQL server. Openbravo ERP 2.40 which was recently released also requires tomcat .

Now there are two versions of tomcat in Solaris 10 Update 6 (unlike OpenSolaris which has none by default and when you install it via pkg it is the latest version).  The two versions are the original Tomcat 4 and Tomcat 5.5.

/usr/apache/tomcat    - Tomcat 4

/usr/apache/tomcat55 - Tomcat 5.5 

For ease of use and Openbravo we want to use the newer Tomcat version.

so let's set the already bundled tomcat55 quickly as follows:

# cp /var/apache/tomcat55/conf/server.xml-example 
/var/apache/tomcat/conf55/server.xml

Openbravo ERP 2.40 requires ant to be 1.6.5 or higher.  The ant in /usr/sfw/bin in Solaris 10 Update 6 is still the original version  1.5.4.  I downloaded apache-ant-1.7.1-bin.zip from ant.apache.org  and unzipped it in /opt

Download the Openbravo ERP 2.40 Solaris (SPARC) installer available on its download site.

First of all if you are not logged in as root and use "su -" to root,  then using another terminal using the normal userid,  give X server display rights to root. As a short cut I use "xhost localhost"  to give permission to all users on localhost.

Then back as root

# chmod a+x OpenbravoERP_2.40-solaris-sparc-installer.bin
# DISPLAY=localhost:0.0; export DISPLAY
# ./OpenbravoERP_2.40-solaris-sparc-installer.bin


Here is my quick cheat sheet of answers for the questions that OpenBravo installation GUI asks (note the ones in bold are changed from default):

  • /opt/OpenbravoERP
  • /var/OpenbravoERP/AppsOpenbravo/attachments
  • Complete
  • Standard
  • /usr/jdk/instances/jdk1.5.0
  • /opt/apache-ant-1.7.1/bin/ant 
  • /var/apache/tomcat55
  • PostgreSQL
  • /usr/postgres/8.3/bin
  • localhost     5432
  • (Enter password for postgres user as "postgres" twice)
  • openbravo    tad     (Enter password for tad user  twice)
  • Context name: openbravo
  • Date format: DD MM YYYY, Date Separator -, Time format 24h, Time Separator :
  • Demo data: Y or N depending on your preferences

After the information the installation GUI takes quite a bit of time to complete specially if you select to load the demo data. (Hope you made changes to PostgreSQL before to tune this loading.)

Notice that all user related data (PostgreSQL, Tomcat, Openbravo Attachments) are now going in /var. This allows us now to take a snapshot of /var to revert back to this installed image in case we accidentally corrupt user data or application settings.

Before taking the snapshot it might be a good procedure to make sure the database and tomcat is not running. Since we haven't started tomcat55 yet we only have to bring down PostgreSQL server.

# svcadm disable svc:/application/database/postgresql_83:default_32bit

# zfs snapshot rpool/ROOT/s10s_u6wos_07b/var@openbravoinstall

Once the installation completes  start tomcat as follows

# svcadm enable svc:/application/database/postgresql_83:default_32bit

# /usr/apache/tomcat55/bin/startup.sh

Once tomcat has completely started (depending on system it could be more than couple of seconds),  open a browser and go to

http://localhost:8080/openbravo

If you see the login screen if everything goes right !. Use Openbravo as username and openbravo (all lower case) as password to login and set it up for your business.

If you want to rollback to this intial version just do:

# /usr/apache/tomcat55/bin/shutdown.sh

# svcadm disable svc:/application/database/postgresql_83:default_32bit

# zfs rollback rpool/ROOT/s10s_u6wos_07b/var@obenbravoinstall

# svcadm enable svc:/application/database/postgresql_83:default_32bit

# /usr/apache/tomcat55/bin/startup.sh


Once tomcat has completely started,  open a browser and go back to the browser and start again:

http://localhost:8080/openbravo

More help on installation is available on their wiki site.

There are a quite number of Open Source ERP apps out there, but only a few (Compiere/Adempiere, OpenBravo, xTuple, OFBiz) seem to have gained traction. Recently, I tried out OFBiz and getting it to work with PostgreSQL was quite simple. OFBiz is actually shipped with an embedded Java DB (Derby), so for testing or development, you probably don't need an external DB.

I did my test on OpenSolaris 2008.05, so if you use a different version of Solaris or a different OS, the instructions below may vary.

First, check out the source, compile, and run with the embedded database to make sure everything works properly.

  1. Create a directory where you want OFBiz to reside, and run "svn co http://svn.apache.org/repos/asf/ofbiz/trunk ofbiz"
  2. Go to the OFBiz directory (now referred to as $OFBIZDIR), and run "./ant run-install"
  3. In $OFBIZDIR, run "java -jar ofbiz.jar"
  4. Once OFBiz is started, check out the sample app by pointing the browser to http://localhost:8080/ecommerce/

Once your have it working with Derby, follow the steps below to switch to PostgreSQL. I use version 8.3.x.

  1. Assuming you have Postgres installed and ready to go, run:

    postgres $ createuser -S -D -R -P -E ofbiz
    postgres $ createdb -O ofbiz ofbizdb

  2. Edit the OFBiz DB config file in $OFBIZDIR/framework/entity/config/entityengine.xml

    <delegator name="default" entity-model-reader="main" entity-group-reader="main" entity-eca-reader="main" distributed-cache-clear-enabled="false">
    <group-map group-name="org.ofbiz" datasource-name="localpostnew"/>
    ...
    </delegator>

    <datasource name="localpostnew"
    ...
    <inline-jdbc
    jdbc-driver="org.postgresql.Driver"
    jdbc-uri="jdbc:postgresql://127.0.0.1:5432/ofbizdb"
    jdbc-username="ofbiz"
    jdbc-password="ofbiz"
    isolation-level="ReadCommitted"
    pool-minsize="2"
    pool-maxsize="250"/>
    </datasource>

  3. Copy PostgreSQL JDBC driver to $OFBIZDIR/framework/entity/lib/jdbc. Make sure to use the JDBC4 driver.
  4. Go to $OFBIZDIR, and run "./ant run-install" to populate the data in PostgreSQL
  5. Re/start OFBiz by running "java -jar ofbiz.jar"
  6. Once OFBiz is up and running, you should be able to access http://localhost:8080/ecommerce/

That's all there is to it.

Jignesh has a post on setting up OpenBravo with Postgres on OpenSolaris 2008.05.

Setting up MediaWiki (I use version 1.13.1) with PostgreSQL 8.3 is quite straightforward. Thanks to Greg Mullane for fixing the issues.

Since tsearch2 is integrated into core in 8.3, you only need to run the following three commands before running the MediaWiki install script.

$ createuser -S -D -R -P -E wikiuser
$ createdb -O wikiuser wikidb
$ createlang plpgsql wikidb

With 8.2, you have to run the following commands in addition to the above.

$ psql wikidb < /usr/postgres/8.2/share/contrib/tsearch2.sql (path for PostgreSQL on Solaris)
$ psql -d wikidb -c "grant select on pg_ts_cfg to wikiuser;"
$ psql -d wikidb -c "grant select on pg_ts_cfgmap to wikiuser;"
$ psql -d wikidb -c "grant select on pg_ts_dict to wikiuser;"
$ psql -d wikidb -c "grant select on pg_ts_parser to wikiuser;"
$ psql -d wikidb -c "update pg_ts_cfg set locale = current_setting('lc_collate') where ts_name = 'default';"

If you need to upgrade to PostgreSQL 8.3 from an older version of PostgreSQL, you may find the following links useful.
http://www.mediawiki.org/wiki/Manual:Upgrading_Postgres
http://julien.danjou.info/blog/index.php/post/2008/05/19/Upgrading-mediawiki-from-PostgreSQL-82-to-83

Jignesh K. Shah and Robert Lor describe how to set up Openbravo (an open-source ERP system) and Drupal (an open-source CMS) on PostgreSQL on OpenSolaris.
Check out a new PostgreSQL buildfarm set up by Zdenek Kotala. (By the PostgreSQL community convention, I'm referring to it as a "buildfarm" but it should probably be better known as a "testfarm".)

Drupal is quickly gaining popularity among the many Open Source Content Management Systems (CMS) out there. AFAIK, Drupal is mostly used with MySQL on Linux, so I wanted to find out myself how easy (or hard) it is to get it working with PostgreSQL on OpenSolaris. Below were the steps I followed:

  1. Install OpenSolaris 2008.05. The installation from the Live CD went very smoothly, much simpler than previous versions of Solaris. Great job to the installer team! If you don't have the CD, you can download it from http://www.opensolaris.com/. Remember, OpenSolaris comes with the bare minimum, so additional software need to be installed from the network repository.

  2. Install Web Stack (Apache, PHP, MySQL, etc). After the OS installation, I fired up Firefox and just followed the instructions from the Welcome Page, clicked on "the world of OpenSolaris" link and that took me to http://www.opensolaris.com/learn/. From there, I followed these links: "Web Stack Getting Started Guide" > "2. Getting Started" > "Installing Web Stack Components" > "Setting Up Your AMP Development Environment". I decided to install everything using the following command.

  3. # pkg install amp-dev

  4. Initialize the Web Stack Environment & Start Apache. To do this, I just followed the Web Stack Getting Started Guide. Clicked on Applications > Developer Tools > Web Stack Getting Start Guide, and did the followings:
  5. Initialize the environment: Clicked on Applications > Developer Tools > Web Stack Initialize

    (The guide is a bit confusing, so run the following commands first before starting Apache/MySQL; otherwise you'll get errors)
    # svccfg import /var/svc/manifest/network/http-apache22.xml
    # svccfg import /var/svc/manifest/application/database/mysql.xml

    Start Apache & MySQL: Clicked on Applications > Developer Tools > Web Stack Admin > Start Apache2/MySQL servers

    Point browser to http://localhost so check that Apache is running.

  6. Install PostgreSQL. I wanted to install PostgreSQL 8.3 from the network repository, but I found out that 8.3 didn't make it to OpenSolaris 2008.05 which was based on Nevada build 86. To see all the available PostgreSQL packages, run.

    # pkg search -r postgres

    The output shows a lot of packages, but unfortunately the 8.3 packages are for later builds of Nevada. I did try to install one package, but it didn't work.

    I could build Postgres 8.3 myself, by fortunately Sun also provides the binary in tarball format at http://www.postgresql.org/ftp/binary/v8.3.3/solaris/opensolaris/i386/, so I just used this version.

  7. Setup PostgreSQL. I wanted to run Postgres using using the "postgres" user, but it's setup as a role by default, so I did the following to change it to a normal user:

    a) As root, run "passwd -d postgres"
    b) Edit /etc/user_attr and change the type for "postgres" from role to normal
    c) Change the home directory and shell for "postgres" using the usermod command

    Now I need to create the Postgres DB cluster. Make sure the directory where Postgres is installed is added to PATH.

    postgres$ initdb -D /directory/path/to/data
  8. Install Drupal 6.4. Follow the instructions in INSTALL.txt and copy the files to /var/apache2/2.2/htdocs.

    Follow the instructions in INSTALL.pgsql.txt to setup the database. Here's what I did. Make the Postgres is running.

    postgres$ createuser --pwprompt --encrypted --no-adduser --no-createdb drupal
    Enter password for new role:
    Enter it again:
    Shall the new role be allowed to create more new roles? (y/n) y

    postgres$ createdb --encoding=UNICODE --owner=drupal drupaldb

  9. Configure Drupal. Make sure Apache is running and point the browser to http://localhost, and you should see the Drupal setup page.

  10. That's it. Besides a couple of small hiccups, I think the whole process was quite straightforward.

If you really like using pgAdmin to administer you Postgresql server instance, then this tip is important for you.

Solaris/OpenSolaris supports many versions of Postgres. Currently, OpenSolaris supports postgresql version 8.1, 8.2 and 8.3; which means you can install any of these versions from the repository and it will work just out of the box. pgAdmin, a postgresql database administration/monitoring tool (also available in the repository) is not tied to any specific version of the postgres server. So, You can install pgAdmin from the repository and simply connect to the postgres server that you are using.

Now, there is small glitch here. If you are trying to run programs like pg_backup, pg_restore from pgAdmin (Option Backup, Restore on database), then you need to be careful. pgAdmin expects these binaries to be present in the same directory as itself. i.e. /usr/bin. However, different postgresql server binary versions are installed in different locations, such as,


Postgres 8.1 : /usr/bin

Postgres 8.2 : /usr/postgres/8.2/bin

Postgres 8.3 : /usr/postgres/8.3/bin

Now, Lets say, you are using pgAdmin to restore a database of version 8.1, then you are fine. But, if you do not have 8.1 installed, pgAdmin will simply disable these options from the menu, as it would not be able to see the binaries at all. Also, If you have both 8.1  & 8.2 installed, but you are using 8.2 server then pgAdmin will use backup & restore binaries from the 8.1 bin directory, simply because they exist in the same directory as pgadmin. This will result in inconsistent behavior.

The best way to resolve all such issues is to create a symlink to /usr/bin/pgadmin3 from the bin directory of the server you are using. So, e.g., If you are using 8.2, then you could do following,


bash# ln -s /usr/bin/pgadmin3 /usr/postgres/8.2/bin/pgadmin3

and the same for 8.3 as well. Obvious that, this is not required if you are using 8.1 server.

then just include the 'bin' directory  of your server at the begining of your  PATH and you will always use  pgAdmin and the correct backup/restore binaries.

As a part of my work on in-place upgrade project I need cleanup code and prepare unified API for page content access. The work went smoothly and I thought that I'm ready with first prototype. Everything worked fine and regression test passed without any problems. However, I has expected that new interface should have a performance penalty and I wanted to know how big the impact is. I asked my colleagues from performance team if they can help me. Paul van den Bogaard offered me that he helps me but he needs 64bit SPARC binaries. I answered "It's no problem". I used same flags for compilation as we use for Solaris PostgreSQL official packages. And of course as a true hacker a tried gmake check at the end. Upsssssss...

...
     bit                  ... ok
     numeric              ... ok
     txid                 ... ok
     uuid                 ... FAILED (test process exited with exit code 2)
     enum                 ... FAILED (test process exited with exit code 2)
     money                ... ok
...

======================================================
 27 of 115 tests failed, 1 of these failures ignored. 
======================================================

Hmmm. It does not look really good. Quick look into regression.diffs confirm my doubt that it is in hash index.

  -- btree and hash index creation test
  CREATE INDEX guid1_btree ON guid1 USING BTREE (guid_field);
  CREATE INDEX guid1_hash  ON guid1 USING HASH  (guid_field);
! server closed the connection unexpectedly
!       This probably means the server terminated abnormally
!       before or while processing the request.
! connection to server was lost

Ok. Core dump shows more. Fortunately I set system to store core into special directory and with "core.%f.%p" pattern (see coreadm(1M)). It helps to keep all generated core files separately. And now what pstack(1) shows:

 00000001000858f8 _hash_metapinit (1006c2818, ff00, fc00, 1000038c9bc, 13d, 2) + 260
 000000010008238c hashbuild (ffffffff7fffa1c8, 1006c2708, 1006ba670, 100645840, 1004e2000, 1006c2818) + 84
 00000001002c66d8 OidFunctionCall3 (100082308, 1006c2708, 1006c2818, 100645840, 3, ffffffff7fffa570) + 48
 00000001000d1eb8 index_build (1006c2708, 1006c2818, 100645840, 1006c24c8, 1006c47f0, 0) + 3c
 00000001000d16c0 index_create (0, 1, 4085, 100645840, ffffffff7fffa7b8, 1006c2818) + 78c

Good, I adjusted HashMetaPageData structure and added new HashPageGetMeta macro for get pointer to the correct place and of course the change must be backward compatible.

--- a/src/include/access/hash.h Fri Aug 01 13:16:08 2008 +0000
+++ b/src/include/access/hash.h Sat Aug 02 18:04:16 2008 +0200
@@ -138,7 +138,6 @@
 
 typedef struct HashMetaPageData
 {
-       PageHeaderData hashm_phdr;      /* pad for page header (do not use) */
        uint32          hashm_magic;    /* magic no. for hash tables */
        uint32          hashm_version;  /* version ID */
        double          hashm_ntuples;  /* number of tuples stored in the table */

...

+#define HashPageGetMeta(page) \
+ ((HashMetaPage) (((char *)(page)) + SizeOfPageHeaderData + sizeof(ItemIdData)))

Because I got SIGBUS (Bus Error) there is problem with memory alignment but start address of hashm_magic is on correct place. Strange, time to take debugger:

_hash_metapinit+0x260:          clrx      [%l4 + 0x24]

After investigation (thanks to Vita Batrla for help with SPARC assembler). Register l4 contains pointer to the page. It means that it points to hashm_ntuples which is double and double has to be located on 8 bytes aligned address. But how can it happen when structure starts on same address? The problem is size of PageHeaderData structure. It is 28bytes long and of course it is not divided by 8. It means compiler has to put 4 padding bytes between hashm_version and hashm_ntuples in original structure. But padding is not necessary in the new one and unfortunately new structure is different. I compare it with code generated from head and difference is clear:

_hash_metapinit+0x248:          clrx      [%i3 + 0x28]

During investigation I found another interesting thing. When I compiled PostgreSQL without optimization problem disappeared. Close look into code shows what happened:

_hash_metapinit+0x32c:          std       %f36, [%l0 + 0x8]

The STD instruction works fine with nonalignment floating point.

And what is conclusion? I can break backward compatibility and cleanup hash code or create several version of HashMetaPage for different page layout. Everything depends how new hash index implementation ends up. When it will be integrated into PostgreSQL 8.4 compatibility will be broken anyway and clean up can be done.

I dropped by "LinuxWorld Expo" in San Francisco on Tuesday, and I think the only thing worth noting is the PgDay.

Five years ago, when I attended the same Expo, it had a completely different spirit, with a lot more participation by the main Linux vendors and a large variety of software companies.

(Other references: PostgreSQL on Solaris and on OpenSolaris.) 

In the previous entries many graphs were shown. The once showing throughput by the amount of commands handled had this typical sinusoidal pattern. I now think I found the cause of it. Although I definitely do not understand it (yet).

Since, as usual, a picture tells us more than a thousand words here is one with that pattern. The vertical axes shows the cumulative time by all PostgreSQL processes in a certain state:


mail_commitdelay.png

Ok, what is shown here? The red graph depicts the spent while sleeping as a direct result of the commit_delay setting. Currently the average number of processes with active transactions when a flush is called is around 85. Therefore that part of the equation is true, so indeed it is the commit_delay setting that determines if the sleep is done: all other prerequisites are true (read the source if you do not understand :-) .)

Now when setting commit_delay to zero, the result becomes:


mail_nocommitdelay.png

The time for XLogFlush now fully determines the time spent in the routine that handle the RecordTransactionCommit. It is also approximately 60% of the time spend handling commands.

The downside of not disabling commit_delay is total throughput. A little bit of delay reduced the amount of (WAL related) IO significantly. This was already shown in an earlier blog entry. However the behavior is much more constant without the delay. would it be beneficial if the same constancy could be realized in the commit_delay case?

To answer that one, some extra investigations are called for.

If you haven't heard already, Peter Eisentraut is joining Sun to work on PostgreSQL. This is very exciting, and I'm looking forward to working with Peter on some interesting projects.

Just as Peter decided to join Sun, Josh Berkus decided to leave us. Josh has made many contributions, and he will be missed. I have learned a lot from him for the past couple of years. Josh, good luck with your new endeavors.

BTW, I'm at OSCON this week. I arrived here Saturday night so I could attend PDXPug Day. I'm glad I decided to attend this mini-conference because the talks were excellent and I also got to meet some cool PostgreSQL people.

On July 22nd, 2008, I will be joining Sun Microsystems as PostgreSQL software engineer. Sun has been a valuable contributor to the PostgreSQL project for a number of years now, and I am looking forward to joining them in this effort. I am glad that I will be able to continue my personal role in the PostgreSQL project with the support of the great resources that Sun provides.

I would like to take this opportunity as well to thank my former colleagues at credativ GmbH for their support of the PostgreSQL project and my own career. I wish them much success in their continued development.

So, I expect that I will have more time to contribute to PostgreSQL development from now on, and both Sun and I have a sizeable backlog of projects and ideas that we would like to realize. Time to get started!

In previous posts a typical pattern in handling commands was seen. The handling versus reading commands cumulative time consumption showed a pattern as can be seen in the next graph:

mediumheap.png

I had a hunch that this could be related to the client (load generator) behaviour. The client program is written in Java. It runs 128 application threads that continuously communicate with the database. Lots of rows inserted and retrieved. The number of objects created should be considerable. Garbage collection to clean up the Java heap might be the driving force for the above mentioned pattern.

My standard configuration runs with a 2 Giga byte heap. I decided to increase this to 3.2GB, and also increase the Eden size to 1.5GB. For those a little familiar with Java, this is the space in which the objects are created. The test resulted in the following graph:


largeheap.png

Indeed it is the client who is responsible for these spikes. Using a larger heap means more objects can be created in the Java heap before a garbage collection is forced. Also a larger heap holds many more objects translating in longer GC times. Indeed the time spent reading commands has increased. Please note, garbage collections in Eden are "stop the world", meaning the client does not respond to the database process: during a GC the readcommand does take significant more time.