open thought and learning

Archive for the ‘code’ Category

Tuning a JVM for Berkeley DB Java Edition

For those who have not heard of Berkeley DB (BDB): it is a transactional storage engine built around simple key/value pairs, lightweight and performance-oriented, with no SQL-like query engine. The Java Edition differs from the native version in quite a few ways and is useful when the store has to be embedded in a Java application.

The aim is to keep as much of the database in RAM as possible, so that queries are answered quickly. Based on this, here’s my take on tuning the JVM that hosts BDB:

  • JVM heap size should be around the same size as the data store
  • Use the Concurrent Mark/Sweep GC algorithm to have low-pause GC times
  • Since most of the objects are going to be living ‘forever’, it’ll make sense to have a huge tenured generation
  • If the DB size can vary, avoid setting Xmx and Xms to the same value. Leave a wide gap between them so that the JVM can grow the heap as your data grows

This is what CATALINA_OPTS might look like (includes a lot of debug flags as well):

CATALINA_OPTS="-server -Xms1024m -Xmx4096m -XX:+UseMembar -XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime -XX:NewRatio=4 -XX:+UseConcMarkSweepGC -verbose:gc -Xloggc:/appl/tomcat/logs/gcdata.txt"

-XX:+UseMembar is there to work around the high IO waits I had been seeing; I think there’s a problem on Linux with how the JDK uses memory barriers. I read about the bug here.
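
As a rough check on the tenured-generation sizing in the options above: -XX:NewRatio=4 asks for a tenured generation four times the size of the young generation, so about four fifths of the heap. The short Python sketch below just does that arithmetic for the -Xms and -Xmx values used here. The function name generation_sizes is made up for illustration, and the JVM aligns and rounds the real sizes, so treat the numbers as approximations rather than what the GC log will report.

```python
def generation_sizes(heap_mb, new_ratio):
    """Split a heap into approximate (young, tenured) sizes in MB.

    NewRatio is the tenured:young ratio, so the young generation gets
    1 / (new_ratio + 1) of the heap and the tenured generation the rest.
    This is plain arithmetic, not a query against a running JVM.
    """
    young_mb = heap_mb // (new_ratio + 1)
    tenured_mb = heap_mb - young_mb
    return young_mb, tenured_mb


# The initial and maximum heap sizes from the CATALINA_OPTS line above.
for label, heap_mb in (("-Xms1024m", 1024), ("-Xmx4096m", 4096)):
    young_mb, tenured_mb = generation_sizes(heap_mb, 4)
    print("%s: young ~%d MB, tenured ~%d MB" % (label, young_mb, tenured_mb))
```

If the data store is expected to outgrow the tenured share of the maximum heap, that is a hint to raise -Xmx or NewRatio rather than let the old generation fill up.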

BDB Java Edition is not a replacement for a traditional database, but is a means to have almost immediate results for things like look-up data, subscriptions and most frequently-used information. There are quite a number of on-line resources available to help you set it up and use it – native or Java, whichever your flavor is.

memcached is another such tool that is useful when improving performance for an application-database connection.  More on it in another post some other time.



Written by mohitsuley

August 8, 2008 at 9:30 pm

My mod_python 101

After building mod_python.so and adding LoadModule python_module modules/mod_python.so, I expected everything to work, given that I had set SetHandler python-program and PythonHandler helloworld within an Apache virtual host.

With that, I could get Hello World! on the page using the following snippet:

$ cat helloworld.py
from mod_python import apache

def handler(req):
    req.content_type = "text/plain"
    req.write("Hello World!")
    return apache.OK

That’s when the problems began.

1. I couldn’t put another .py file in the same directory and have it run successfully; it returned a 404
2. All HTML files returned a 404 as well
3. There was nothing in the error logs

After much reading around, here’s the final configuration that worked:

LoadModule python_module modules/mod_python.so
PythonDebug On
AddHandler python-program .py
PythonHandler mod_python.publisher
PythonPath "['/appl/python2.5/lib/','/appl/webdocs/','/appl/python2.5/bin/python/','/appl/python2.5/lib/python2.5/site-packages/mod_python/','/appl/python2.5/lib/python2.5/site-packages/'] + sys.path"

And the URL that worked was http://myhost/helloworld.py/handler

Apparently, mod_python ships with three standard handlers: cgihandler, publisher and psp.

cgihandler – runs existing CGI scripts by emulating a CGI environment
psp – Python Server Pages, just like JSP
publisher – The preferred way to handle Python pages; it maps URLs of the form page/method to functions in a module

publisher is recommended for newer applications, and as you move forward and your application increases in complexity and entry points, you can make your own handlers.
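
To illustrate the page/method mapping that publisher uses, here is a minimal sketch of what a publisher-style helloworld.py could look like. It is not the file from my setup: the function greet and its name parameter are invented for the example. It relies on publisher calling index when the URL names no function, sending the returned string as the response body, and passing query-string fields as keyword arguments.

```python
# helloworld.py, served through mod_python.publisher (illustrative sketch)

def index(req):
    # Reached via http://myhost/helloworld.py (no function named in the URL).
    req.content_type = "text/plain"
    return "Hello World!"


def greet(req, name="World"):
    # Reached via http://myhost/helloworld.py/greet?name=Reader
    # "greet" and "name" are hypothetical names chosen for this example.
    req.content_type = "text/plain"
    return "Hello %s!" % name
```

With this layout each new entry point is just another function in the module, which is why the URL that worked earlier ended in /handler.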

Like I said, 101 and basics, but what the heck, I get to write something here. Itchy fingers.

More later.

Written by mohitsuley

July 31, 2008 at 1:28 am

Posted in code, linux
