Posts Tagged ‘sysadmin’
disown and nohup
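This was the first time I had started a file transfer and then, in hindsight, smacked my head and said “Wish I’d started this with nohup or screen…”; I could have left for home on time instead of having this laptop tag along with me.
What I didn’t know *then* (I do now) is that bash has a beautiful built-in called disown. It removes jobs from the shell’s job table or, with -h, just marks them so the shell doesn’t send them SIGHUP when it exits; either way the job keeps running after you log out and gets re-parented to init. Yay! Here is how I rescued the transfer: suspend it with Ctrl-Z, resume it in the background with bg, then disown it: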
[root@linux-test data]# scp RHEL4-U5-i386-AS-disc4.iso suleym@192.168.1.1:/appl/RHEL-AS4/
suleym@192.168.1.1's password:
RHEL4-U5-i386-AS-disc4.iso 0% 1016KB 1.0MB/s 06:21 ETA
[1]+ Stopped scp RHEL4-U5-i386-AS-disc4.iso suleym@192.168.1.1:/appl/RHEL-AS4/
[root@linux-test data]# bg
[1]+ scp RHEL4-U5-i386-AS-disc4.iso suleym@192.168.1.1:/appl/RHEL-AS4/ &
[root@linux-test data]# disown -h
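Ctrl-Z, bg and disown rescue a job you have already started. If you know up front that a transfer will outlive your session, you can launch it detached instead. Below is a minimal Python sketch of that (my own illustration, not part of the session above): subprocess.Popen with start_new_session=True runs the child under setsid(), so a hangup on your terminal never reaches it, and its output goes to a log file. The function name launch_detached and any log path you pass are hypothetical. It only suits non-interactive commands, since there is no terminal left to type an scp password into; use key-based authentication (scp -B refuses to prompt).

```python
import subprocess


def launch_detached(cmd, log_path):
    """Start cmd so it survives the terminal going away (like nohup + disown).

    start_new_session=True runs the child under setsid(): it gets its own
    session with no controlling terminal, so a hangup on your terminal is
    never delivered to it. stdin is /dev/null; stdout and stderr are
    appended to log_path.
    """
    log = open(log_path, "ab")
    try:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,
        )
    finally:
        # The child holds its own copy of the descriptor; close ours.
        log.close()
    return proc
```

For example, launch_detached(["scp", "-B", "RHEL4-U5-i386-AS-disc4.iso", "suleym@192.168.1.1:/appl/RHEL-AS4/"], "/tmp/scp-transfer.log") returns immediately, and the copy carries on after you log out.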
More about disown here and in man bash.
I’m writing this post so I don’t have to reverse-engineer my Google searches next time, and so that people see disown *related* to nohup.
Tuning a JVM for Berkeley DB Java Edition
For those who haven’t heard of Berkeley DB (BDB): it is a transactional storage engine for simple key/value pairs, lightweight and highly performance-oriented, with no SQL engine on top. Compared to its native (C) version, the Java Edition, a pure-Java implementation, has quite a few differences, and is useful when you want to embed it in a plain Java application.
The aim is to keep as much of the database in RAM as possible at all times, so that every query returns fast. With that in mind, here’s my take on tuning the JVM that hosts BDB:
- The JVM heap should be around the same size as the data store (JE’s cache lives inside the heap and only gets a share of it, set by je.maxMemoryPercent, so err on the larger side)
- Use the Concurrent Mark/Sweep (CMS) collector to keep GC pauses short
- Since most objects will live ‘forever’, it makes sense to have a huge tenured generation
- If the DB size can vary, don’t give -Xms and -Xmx the same value. Leave a wide gap so the heap can grow along with your data
This is what CATALINA_OPTS might look like (includes a lot of debug flags as well):
CATALINA_OPTS="-server -Xms1024m -Xmx4096m -XX:+UseMembar -XX:+PrintGCDetails \
-XX:+PrintGCApplicationStoppedTime -XX:NewRatio=4 -XX:+UseConcMarkSweepGC \
-verbose:gc -Xloggc:/appl/tomcat/logs/gcdata.txt"
-XX:+UseMembar is there to work around the high I/O waits I had been seeing; I think there’s a problem on Linux with how the JDK uses memory barriers. I read about the bug here.
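To make the sizing rules above concrete, here is a small Python sketch that builds a CATALINA_OPTS string from the current and expected data-store sizes. The assumption (mine, not taken from the setup above) is that JE’s cache gets cache_percent of the heap via je.maxMemoryPercent, so the heap is scaled up until the cache can hold the whole store; pass cache_percent=100 to follow the ‘heap about the size of the store’ rule literally. The function name and its defaults are hypothetical.

```python
def catalina_opts(store_mb, max_store_mb, cache_percent=60,
                  gc_log="/appl/tomcat/logs/gcdata.txt"):
    """Build CATALINA_OPTS for a Tomcat JVM hosting BDB JE.

    Assumes JE's cache is cache_percent of the heap, so the heap is sized
    for the cache to hold the whole store: -Xms for today's store,
    -Xmx for the largest store you expect (deliberately not equal).
    """
    if max_store_mb < store_mb:
        raise ValueError("max_store_mb must be >= store_mb")

    def heap_for(mb):
        # Ceiling division: smallest heap (in MB) whose cache share fits mb.
        return -(-(mb * 100) // cache_percent)

    flags = [
        "-server",
        "-Xms%dm" % heap_for(store_mb),
        "-Xmx%dm" % heap_for(max_store_mb),
        "-XX:+UseConcMarkSweepGC",   # low-pause collector
        "-XX:NewRatio=4",            # tenured generation is 4x the young one
        "-verbose:gc",
        "-XX:+PrintGCDetails",
        "-Xloggc:%s" % gc_log,
    ]
    return " ".join(flags)
```

catalina_opts(600, 2400) yields -Xms1000m and -Xmx4000m: the heap starts large enough for today’s 600 MB store and can grow to fit 2.4 GB.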
BDB Java Edition is not a replacement for a traditional database; it is a way to get near-immediate results for things like lookup data, subscriptions and the most frequently used information. There are quite a few online resources to help you set it up and use it, native or Java, whichever your flavor is.
memcached is another tool that helps improve performance between an application and its database. More on it in another post.
Cheers!
Making SSI work on a JSP response
If you need to parse SSI from a JSP response, there are two simple ways to do it:
1. Use the SSIServlet and handle it within Tomcat
2. If you have a separate web server like Apache in front of tomcat, and you want that web server to do it, the plot thickens.
If you ask, ‘Why, when you are already using Java? You can do everything SSI does in a JSP, right?’, you might be surprised. Let’s just say the reason is out of scope for this post.
So, you have a three-tier architecture with web servers spread across the world and app/DB servers local to certain data-centers. Naturally, you might want to ‘assimilate’ content on the web servers (closest to local users based on 3DNS/similar) where it’s already present instead of shuttling bytes back-and-forth between the web and app layers. That’s the reason. And did I say earlier it was out of scope? My bad.
The way to do it is to set up a specific Location block in Apache, add an AddOutputFilterByType directive for the MIME type text/x-server-parsed-html and, finally, have the JSP itself set that MIME type on the response with response.setHeader (or response.setContentType).
Your Location section might look like this:
<Location /application/ssiparser >
Options +Includes
AddOutputFilterByType INCLUDES;DEFLATE text/x-server-parsed-html
</Location>
In an ideal world, everything would have been hunky-dory, but life isn’t so simple. At least, it didn’t go that smoothly for me.
Earlier, to improve performance, I had added a CompressionFilter on Tomcat to gzip all of its responses, so that app-to-web traffic would be faster as well. This meant that by the time a response reached Apache it was already gzipped, and SSI parsing was impossible. Mind you, this is Apache 2.0.x, not 2.2.x, where mod_filter lets you set up FilterDeclare and friends.
There are two ways to get around this problem:
1. Get the CompressionFilter to exclude the URL you have set up for SSI, and then let Apache apply INCLUDES;DEFLATE via AddOutputFilterByType.
2. Or, unset the Accept-Encoding header on the request first, so that it no longer offers gzip and the CompressionFilter doesn’t compress the response at all. The catch is that Apache’s DEFLATE filter then doesn’t compress it on the way out either.
The problem with (2) is that you end up sending uncompressed data over the wire. Option (1) is the right way to go.
(1) entails a change to your application’s web.xml.
(2) looks like this:
<Location /application/ssiparser >
Options +Includes
RequestHeader unset Accept-Encoding
RequestHeader set Accept-Encoding deflate
AddOutputFilterByType INCLUDES;DEFLATE text/x-server-parsed-html
</Location>
The JSP will start with:
<%
response.setHeader("Content-Type","text/x-server-parsed-html");
%>
<!--#include virtual="/static/content/news.html"-->
<!--#include virtual="/static/content/weather.html"-->
<!--#include virtual="/static/content/media.html"-->
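The CompressionFilter itself is Java and is configured in web.xml, but the decision both options hinge on is small enough to sketch. Here it is in Python, purely as an illustration: the app tier gzips a response only if the request still offers gzip (which option (2) takes away) and the path is not under the SSI Location (option (1)). The function names and the excluded prefix are hypothetical, and the Accept-Encoding parsing is simplified: it only looks at the gzip token and its q-value.

```python
def accepts_gzip(accept_encoding):
    """True if an Accept-Encoding value offers gzip with a non-zero q-value."""
    for item in (accept_encoding or "").split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "gzip":
            continue
        q = 1.0
        for param in parts[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        return q > 0
    return False


def app_should_gzip(path, accept_encoding,
                    excluded_prefixes=("/application/ssiparser",)):
    """Should the app tier gzip this response?

    Option (1): responses under an SSI Location are never compressed here,
    so Apache can run INCLUDES and then DEFLATE itself.
    Option (2): once Apache rewrites Accept-Encoding to drop gzip,
    accepts_gzip() is False and nothing is compressed at this tier.
    """
    if any(path.startswith(prefix) for prefix in excluded_prefixes):
        return False
    return accepts_gzip(accept_encoding)
```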
Most folks don’t upgrade Apache as often as other kinds of software, simply because it’s so damn stable and meets their requirements very well. However, if you need to work with filters and play around with them, I feel 2.2 is the way to go.
OpenDeploy rollback across a WAN
While working on Interwoven OpenDeploy I came across the following problem:
Large deployments or file pushes spanning a WAN, or a continent, would sometimes time out or roll back. The problem showed up when the file lists differed significantly in size.
This is what happens:
- OD starts n threads based on the n lists of files to be deployed.
- Thread 1 finishes and the remaining n-1 threads continue file transfer.
- After exactly 5 minutes, thread 1’s idle connection times out (tcpdump shows a TCP packet with the RST flag set), and once all threads finish, the deployment fails and the transaction is rolled back.
Root cause:
Some network device along the way drops TCP sessions that have been idle for more than 300 seconds and sends an RST, essentially killing the connection. When this happens, OpenDeploy considers the transaction corrupt and rolls it back.
Fix
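- Get the firewall team to extend the timeout to something more reasonable (perhaps matching the default tcp_keepalive_time of 7200 seconds?); not practical if a number of teams are involved.
- Lower net.ipv4.tcp_keepalive_time to ~200 seconds, so keepalive probes go out on an idle connection before the 300-second timer fires. This only applies to sockets that have SO_KEEPALIVE enabled.
- If the keepalive change alone doesn’t help (typically because the application never sets SO_KEEPALIVE), try http://libkeepalive.sourceforge.net, which turns it on via LD_PRELOAD. Works like a charm!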
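If you own the client code, you don’t need LD_PRELOAD tricks: the application can switch keepalive on for its own sockets, which is essentially what libkeepalive does from the outside. A minimal Python sketch, assuming Linux (TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are Linux-specific, hence the hasattr guard). The function names and the 200/30/4 values are illustrative, picked so the first probe goes out well before a 300-second idle timeout.

```python
import socket

_LINUX_KEEPALIVE = all(
    hasattr(socket, name) for name in ("TCP_KEEPIDLE", "TCP_KEEPINTVL", "TCP_KEEPCNT")
)


def keepalive_socket(idle=200, interval=30, count=4):
    """Create a TCP socket that probes the peer after `idle` seconds of silence.

    SO_KEEPALIVE turns probing on for this socket; the system-wide
    tcp_keepalive_time only ever applies to sockets that have it set.
    On Linux the three TCP_KEEP* options override the system defaults
    for this socket alone.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if _LINUX_KEEPALIVE:
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)       # idle seconds before first probe
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)  # seconds between probes
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)       # unanswered probes before giving up
    return s


def system_keepalive_time(path="/proc/sys/net/ipv4/tcp_keepalive_time"):
    """Read the system-wide default (7200 seconds unless someone changed it)."""
    with open(path) as f:
        return int(f.read().strip())
```

Call connect() on the returned socket as usual. To lower the system-wide default instead, sysctl -w net.ipv4.tcp_keepalive_time=200 does it as root.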
Generally speaking, and not just for OpenDeploy, I learnt the importance of keepalive packets, and that the default of 7200 seconds may not be practical when an application talks to servers across network boundaries.
Thanks to my colleague Prajal Sutaria for working on this!
Caching problems with SAML
Anyone who has worked with SAML knows very well how effective and simple it is to achieve federated services with your own authentication mechanism. What needs to be remembered, though, is that end users might very well be behind firewalls. And with firewalls come proxies, and those proxies open up Pandora’s box, a.k.a. the cache.
Proxies can cache the response to the POST from the authentication user agent and make user1 see a page that says ‘Welcome user2’. Do a forced refresh in the browser (Ctrl-F5, or Cmd-Shift-R on a Mac), and you see your own ID again.
Fixes:
1. Ensure proxies don’t cache any content for your authentication domain.
2. Append a ‘random’ value, like a timestamp, to the URL using JavaScript (to make it unique)
3. Have both the content provider’s web server and the authenticating web server set Cache-Control to max-age=0 and proxy-revalidate.
4. Make sure you’re sending an invalidation string in the packet as well.
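Fixes 2 and 3 are easy to get slightly wrong, so here is a minimal Python sketch of both, done server-side (for instance when building a redirect URL) rather than in JavaScript as described above. The parameter name _ts, the function name and the exact header set are my assumptions; private keeps shared proxies from storing the page at all, on top of the max-age=0 and proxy-revalidate directives from fix 3.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# private: shared caches (the corporate proxies) must not store the page.
# no-cache, max-age=0: any cache must revalidate before reusing a copy.
# proxy-revalidate: shared caches must revalidate once a copy is stale.
NO_CACHE_HEADERS = {
    "Cache-Control": "private, no-cache, max-age=0, proxy-revalidate",
    "Pragma": "no-cache",  # for HTTP/1.0 proxies that ignore Cache-Control
}


def cache_busted(url, param="_ts", now=None):
    """Return url with a millisecond timestamp in `param`, replacing any old one."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != param]
    stamp = int((time.time() if now is None else now) * 1000)
    query.append((param, str(stamp)))
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(query), parts.fragment))
```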
Flushing the caches on some ~100 proxy servers in a company might not be the right choice. The onus should be on the development and sysadmin teams to make sure important pages are non-cacheable. The motto here: never trust proxy servers.
Pinging hostnames from /etc/hosts
Problem Statement: Ability to ping a user-defined hostname with a valid IP address
Solution: Simple, put it in the /etc/hosts file and you’re done.
You still can’t? Did you check /etc/nsswitch.conf? This is what its hosts line should say: hosts: files dns
So, with the right /etc/nsswitch.conf and /etc/hosts, should it work?
root@treebeard:~# cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 treebeard
192.168.2.2 mithrandir
root@treebeard:~# ping mithrandir
PING mithrandir (192.168.2.2) 56(84) bytes of data.
64 bytes from mithrandir (192.168.2.2): icmp_seq=1 ttl=64 time=0.092 ms
64 bytes from mithrandir (192.168.2.2): icmp_seq=2 ttl=64 time=0.067 ms
It works!
But…
root@treebeard:~# sudo su - mohit
mohit@treebeard:~$ ping mithrandir
ping: unknown host
It seems when I switch to a non-root user, entries in /etc/hosts fail to take effect.
Why?
The problem was the read permissions on /etc/nsswitch.conf: I hadn’t noticed that it was not world-readable.
root@treebeard:~# chmod o+r /etc/nsswitch.conf
root@treebeard:~# sudo su - mohit
mohit@treebeard:~$ ping mithrandir
PING mithrandir (192.168.2.2) 56(84) bytes of data.
64 bytes from mithrandir (192.168.2.2): icmp_seq=1 ttl=64 time=0.092 ms
64 bytes from mithrandir (192.168.2.2): icmp_seq=2 ttl=64 time=0.067 ms
Worked, finally. The weird thing is that I would have expected ping to complain that it couldn’t read a file, but there was nothing of the sort; the resolver silently falls back to a built-in default. This means you can effectively force a user to stick to DNS resolution while all daemons and root-owned processes use /etc/hosts.
Bad idea, I’d say; this might be a ticking time-bomb. I hit this problem while configuring two nodes for a 10g RAC cluster. The DB runs as a non-root user, and the DBA had a tough time getting the private interconnect working, thanks to nsswitch.conf.
Lesson learnt.
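A quick check would have saved the DBA some grief. Below is a minimal Python sketch (the function names are hypothetical) that flags the two things that can bite here: an nsswitch.conf that ‘other’ users cannot read, and a hosts: line without files on it. It checks the permission bits rather than calling os.access(), because root can read the file either way and would otherwise see no problem.

```python
import os
import re
import stat


def hosts_sources(nsswitch_text):
    """Return the sources on the 'hosts:' line, e.g. ['files', 'dns']."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if line.startswith("hosts:"):
            # Strip action items such as [NOTFOUND=return].
            value = re.sub(r"\[[^\]]*\]", " ", line[len("hosts:"):])
            return value.split()
    return []


def check_nsswitch(path="/etc/nsswitch.conf"):
    """List the problems that would make non-root lookups ignore /etc/hosts."""
    problems = []
    # Check the 'other' read bit rather than os.access(): root can read the
    # file regardless, so os.access() would hide the problem from it.
    if not os.stat(path).st_mode & stat.S_IROTH:
        problems.append("%s is not world-readable; users other than its "
                        "owner and group cannot read it" % path)
    try:
        with open(path) as f:
            sources = hosts_sources(f.read())
    except OSError:
        return problems   # unreadable for us; the permission check covers it
    if "files" not in sources:
        problems.append("'files' is missing from the hosts: line")
    return problems
```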
My mod_python 101
After building mod_python.so and adding LoadModule python_module modules/mod_python.so, I expected everything to work, given a SetHandler python-program and a PythonHandler helloworld inside an Apache VirtualHost.
And I did get Hello World! on the page, with this snippet:
$ cat helloworld.py
from mod_python import apache

def handler(req):
    req.content_type = "text/plain"
    req.write("Hello World!")
    return apache.OK
That’s when the problems began.
1. No other .py file in the same directory would run; each returned a 404
2. All HTML files returned a 404 as well
3. There was nothing on the error logs
After much reading around, here’s the final configuration that worked:
--snip--
LoadModule python_module modules/mod_python.so
--snip--
PythonDebug On
AddHandler python-program .py
PythonHandler mod_python.publisher
PythonPath "['/appl/python2.5/lib/','/appl/webdocs/','/appl/python2.5/bin/python/','/appl/python2.5/lib/python2.5/site-packages/mod_python/','/appl/python2.5/lib/python2.5/site-packages/'] + sys.path"
--snip--
And the URL that worked was http://myhost/helloworld.py/handler
As it turns out, mod_python ships with three standard handlers: cgihandler, publisher and psp.
cgihandler: runs existing CGI scripts by emulating a CGI environment
psp: Python Server Pages, much like JSP
publisher: the preferred way to serve Python pages; it maps a URL like /page.py/method to a function in the module
publisher is recommended for new applications, and as your application grows in complexity and entry points, you can write your own handlers.
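For comparison with the req.write()/apache.OK style above, here is what a publisher-style module might look like. This is a hypothetical greet.py, assuming the AddHandler python-program .py / PythonHandler mod_python.publisher configuration above: publisher calls the function named in the URL, passes matching query-string fields as keyword arguments, and sends back whatever the function returns, so the module doesn’t even need to import mod_python.

```python
# greet.py: a hypothetical module served by mod_python.publisher.
# /greet.py/hello?name=Bob  -> hello(req, name="Bob")
# /greet.py                 -> index(req)
# Whatever a function returns is sent back as the response body.


def index(req):
    req.content_type = "text/plain"
    return "Try /greet.py/hello?name=you"


def hello(req, name="World"):
    req.content_type = "text/plain"
    return _build_greeting(name)


def _build_greeting(name):
    # Names starting with an underscore are not published, so this
    # helper cannot be called from a URL.
    return "Hello %s!" % name
```

With that in place, http://myhost/greet.py/hello?name=Bob should return Hello Bob!, and http://myhost/greet.py should fall through to index().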
Like I said, 101 and basics, but what the heck, I get to write something here. Itchy fingers.
More later.
Automator on Google Code
Finally, I have a home for Automator on Google Code at sysadmin-automator. I will start uploading existing scripts tomorrow.
Automator
I have finally been able to create a project on SourceForge for that pet project of mine: Automator.
It all started with my irritation at having to execute the same commands on a number of servers, many times a day. Unfortunately, Webmin was not much of an option; I’m not fond of it because of its highly invasive nature. Moreover, I wanted to be able to run a script over a group of servers that I know belong to an application or a tier. For instance, if today I wanted to sync all my Linux servers with a specific NTP server *manually*, I should ideally be able to execute a command whose pseudocode might look like:
execute on all servers that have OS as linux
Starting from this concept, I realized this was not much of a technical feat. The real work was the cumbersome task of ‘documenting’ what each server was for, right from the very beginning, and then using that information in my day-to-day tasks.
Let’s say there’s an application called WORDPRESS which has three tiers (DB, WEB and APP) and three separate sets of servers each for DEVelopment, STAGE and PRODuction (that’s how big companies work, you know). To add to it, there are servers across the world, but this particular application lives in a data center in NYC. Using this information (and much more that can be collected automatically from each server, like OS and architecture), I should be able to tell which servers to work on.
In a nutshell, this is how Automator might work:
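Here is a minimal sketch of that lookup in Python, using the standard-library sqlite3 module as a stand-in for the MySQL database. The table layout, the function names and the APP-ENV-TIER naming of groups are assumptions for illustration; hosts_by_os covers the ‘all servers that have OS as linux’ case, and ssh_commands only builds the commands rather than running them.

```python
import sqlite3

# Hypothetical inventory table; MySQL would hold the same columns.
SCHEMA = """
CREATE TABLE servers (
    hostname TEXT PRIMARY KEY,
    app  TEXT,   -- e.g. WORDPRESS
    env  TEXT,   -- DEV, STAGE or PROD
    tier TEXT,   -- DB, WEB or APP
    os   TEXT,   -- collected automatically, e.g. Linux
    site TEXT    -- data center, e.g. NYC
)
"""


def resolve_group(conn, group):
    """Expand 'APP[-ENV[-TIER]]', e.g. 'WORDPRESS-PROD-WEB', to hostnames."""
    parts = group.upper().split("-")
    if not 1 <= len(parts) <= 3:
        raise ValueError("expected APP[-ENV[-TIER]], got %r" % group)
    columns = ("app", "env", "tier")[:len(parts)]
    # Column names come from the fixed tuple above; values are bound safely.
    where = " AND ".join("UPPER(%s) = ?" % col for col in columns)
    sql = "SELECT hostname FROM servers WHERE %s ORDER BY hostname" % where
    return [row[0] for row in conn.execute(sql, parts)]


def hosts_by_os(conn, os_name):
    """'Execute on all servers that have OS as linux' starts with this list."""
    sql = "SELECT hostname FROM servers WHERE LOWER(os) = LOWER(?) ORDER BY hostname"
    return [row[0] for row in conn.execute(sql, (os_name,))]


def ssh_commands(hosts, command):
    """Build, but do not run, one ssh invocation per host."""
    return [["ssh", host, command] for host in hosts]
```

Handing ssh_commands(resolve_group(conn, "WORDPRESS-PROD-WEB"), "ntpdate ntp.timeserver.com") to subprocess, one command at a time, would give roughly the auto-command example that follows.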
mohit@test$auto-command -g WORDPRESS-PROD-WEB -c ntpdate ntp.timeserver.com
This will run ntpdate on all WordPress production web servers. Neat! But how do you bell the cat? The only way I have been able to figure out is during the initial handover of the box. Every organization that manages more than a hundred servers has a process in place where a team ‘audits’ the server and runs scripts to configure default services before it is handed over to the customer. This is when we can run our own ‘Automator service registration’: connect to the remote server, push your script, run it remotely, grab some data automatically and some manually (like the environment and the name of the application), and store it all in a MySQL database.
I have this working with a set of bash shell scripts. But with time, modifications and the sheer number of servers I manage, I am moving it over to Python/MySQL. That said, I’d like to keep it simple: just a command-line interface with a very basic framework that other SAs can include as a package and build their own scripts on.
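The registration step could start with something as small as this Python sketch, run on the new box as part of the handover scripts: it collects what the machine can tell you about itself and takes the rest (application, environment, tier, site) as arguments. The function and field names are hypothetical; the version described here is currently a set of bash scripts.

```python
import platform
import socket


def collect_facts(app, env, tier, site):
    """Facts for the inventory: some read from the box, some supplied by hand."""
    return {
        # Collected automatically.
        "hostname": socket.gethostname(),
        "os": platform.system(),       # e.g. 'Linux'
        "arch": platform.machine(),    # e.g. 'x86_64'
        "kernel": platform.release(),
        # Supplied by whoever hands the box over; nothing on the
        # server knows which application or environment it serves.
        "app": app.upper(),
        "env": env.upper(),
        "tier": tier.upper(),
        "site": site.upper(),
    }
```

Printing the dictionary as JSON (json.dumps) over ssh is one simple way to get it back to the machine that writes to MySQL.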
Let’s see how far this idea goes.
