Ran into a problem at work today that shouldn’t have been an issue at all. Something that normally takes a few minutes ended up taking all morning to resolve, and it shed light on a few other oddities. Early in the morning I got a ticket saying a user no longer had access to a directory on a server that has been restored numerous times over the last few weeks while the developers try to get it working again. I logged on, checked the permissions, found the folders were owned by root, and quickly realized they were NFS mounts. No matter what I did, though, I couldn’t get access to those mounts from that system. Other systems mounted from that NFS server fine, and this system mounted other NFS shares fine too. It turned out that the server had started sending its traffic to the NFS server in question over a VIP instead of its primary link. This VIP, of course, wasn’t in DNS, in the hosts file, or in the vfstab file. A quick addition to /etc/hosts and it started working again. For future reference, here is a list of steps for troubleshooting NFS in a Solaris environment:
- Can the client ping the NFS server?
- Can the server ping the client?
- Can the server resolve the IP of the client to a name?
- Are the NFS service and its associated services (rpcbind, portmap, etc.) running?
- Are they running on the client as well?
A little more in-depth:
- Does the share show up as an export when you run share?
- Is the client an allowed client?
- What happens when you run showmount -e against the server from the client?
- Are the permissions valid?
- If the permissions are for NIS users/groups, are both systems seeing the same NIS server?
- Do the ACLs make sense?
- Don’t forget: getfacl on Solaris <= 9, ls -v on Solaris >= 10
- Run snoop and watch the NFS traffic
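The root cause above was the client picking an unexpected source address, which the NFS server couldn’t resolve. A minimal Python 3 sketch of the first few checks, assuming IPv4; the function names are my own and not part of any Solaris tool. Connecting a UDP socket sends no packets, but it makes the kernel choose a route and source address for the destination, which getsockname() then reports. The port check only covers TCP and doesn’t replace rpcinfo -p or showmount -e.

```python
import socket


def source_address_for(dest_ip, port=2049):
    """Return the local IPv4 address the kernel would use to reach dest_ip.

    Connecting a UDP socket sends nothing; it only makes the kernel pick
    a route and source address, which getsockname() reports. The port
    (2049, the usual NFS port) is just part of the route lookup.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((dest_ip, port))
        return s.getsockname()[0]


def reverse_name(ip):
    """Return the name ip resolves to (DNS or /etc/hosts), or None."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # herror and gaierror are both OSError subclasses
        return None


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def nfs_ports_open(host, timeout=3.0):
    """Check rpcbind/portmapper (111/tcp) and nfsd (2049/tcp) on host."""
    return {name: tcp_port_open(host, port, timeout)
            for name, port in (("rpcbind", 111), ("nfsd", 2049))}
```

Usage, with 192.0.2.10 standing in for the NFS server: on the client, src = source_address_for("192.0.2.10") shows which address the server will see; then on the NFS server, reverse_name(src) returning None means that address doesn’t resolve there, which is exactly what the unregistered VIP caused. nfs_ports_open("192.0.2.10") covers the "is the service running" question from the outside.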
Ledger is a pretty sweet finance system that fits my all-text-all-the-time style. The only problem is that installing Ledger 3.0 on Red Hat 6 is a bit of a pain. There are no native packages, and the 2.x branch is pretty much dead. To install it you need to compile it yourself, and of course it needs newer versions of some tools than Red Hat ships, so it’s a multi-step process. Here is how I did it.
Install yum prerequisites
Some of the prerequisites are provided by yum, so just in case, run the following:
yum groupinstall "Development Tools"
yum install cmake cmake28
yum install 'mpfr*'
Download Boost 1.46.1 (I had issues with 1.52; they may have been caused by other things, but this version works) and extract it:
tar -zxf boost_1_46_1.tar.gz
Now it’s time to compile and install Boost.
Now it’s time to install Ledger. The instructions are almost the same as on the GitHub page:
git clone git://github.com/jwiegley/ledger.git
cd ledger
git checkout -b master origin/master
./acprep update make install
You should now be able to use Ledger 3.0. If you have any trouble, feel free to leave a comment or contact me on IRC.
Recently I came up with a few things that make Graphite even easier to use, namely around the Graphlot view. It’s a much nicer view for single graphs than the Composer, as it lets you zoom in easily, and unlike RRD, you can zoom both horizontally and vertically. The problem is that it sucks for building graphs, and I’ve got lots of complicated graphs I want to look at. So I took a hint from the Graphite Composer bookmarklet at obfuscurity and created a bookmarklet for Graphlot. Just drag the link into your bookmarks bar; it will prompt for a Graphite URL and then take you to the Graphlot page for it: Open In Graphlot
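The bookmarklet itself is JavaScript, but the rewrite such a bookmarklet has to do is simple enough to sketch in Python. This assumes Graphite is served from the site root and that Graphlot lives at /graphlot/ and accepts the same query string (target, from, until) as a render URL; that path mapping is my assumption about a stock install, not something taken from the bookmarklet.

```python
from urllib.parse import urlsplit, urlunsplit


def to_graphlot_url(graphite_url):
    """Point a Graphite render/composer URL at the Graphlot view.

    The query string (targets, from/until, etc.) is kept unchanged;
    only the path is swapped for the assumed /graphlot/ location.
    """
    parts = urlsplit(graphite_url)
    return urlunsplit((parts.scheme, parts.netloc, "/graphlot/", parts.query, ""))
```

For example, to_graphlot_url("http://graphite.example.com/render/?target=foo.bar&from=-1h") gives "http://graphite.example.com/graphlot/?target=foo.bar&from=-1h" (the host is a placeholder).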
Here’s the diff:
< //Added UTC Offset to correct date, set to millisecond difference between local timezone and utc
< //Added UTCOFFSET to correct date
< var d = new Date(v - UTCOFFSET);
> var d = new Date(v);
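UTCOFFSET in the diff is meant to be the millisecond difference between the local timezone and UTC. A small Python sketch of that arithmetic, assuming "local minus UTC" as the sign convention (for example, -18000000 for UTC-5); whether the diff’s v - UTCOFFSET wants that sign depends on how Graphlot builds v, which isn’t shown here. In the browser, new Date().getTimezoneOffset() gives the same information in minutes, with the opposite sign (UTC minus local).

```python
from datetime import datetime, timezone


def utc_offset_ms(epoch_seconds=None, tz=None):
    """Offset of tz from UTC in milliseconds (local minus UTC).

    epoch_seconds picks the instant (default: now), because the offset
    changes across daylight saving transitions. tz=None means the
    system's local timezone.
    """
    if epoch_seconds is None:
        moment = datetime.now(timezone.utc)
    else:
        moment = datetime.fromtimestamp(epoch_seconds, timezone.utc)
    local = moment.astimezone(tz)
    return int(local.utcoffset().total_seconds() * 1000)
```

Taking the instant as a parameter is deliberate: a single hard-coded UTCOFFSET will be off by an hour for the points that fall on the other side of a DST change.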
Hopefully these make it a little easier and nicer to use.
This is a rant; it won’t be too long, but…
So I just started playing with Graphite, which is all sorts of awesome. However, most of the data I care about needs to come from SNMP, because either A) it’s a SAN and I can’t just install agents on it, or B) it’s a locked-down Solaris box with ancient versions of everything that won’t work with any of the new hotness. So I looked at collection agents that work with Graphite, and… yeah, nothing except collectd. Collectd sucks. I had nothing but issues getting it working, and it doesn’t play nice with Graphite. I like Graphite because it’s simple: a name, a value, and a timestamp over a TCP port. That’s the sort of stuff that is awesome. collectd tries to do all this other stuff, and while I’m sure it’s great once it’s set up, I don’t like it. I don’t use agents, because my servers have more important things to do than hang on a crappy plugin that doesn’t handle edge cases well.
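Since the appeal here is how simple carbon’s plaintext protocol is, here is a minimal Python sketch of it: one "metric.path value timestamp" line per data point, newline-terminated, written to carbon’s TCP listener. Port 2003 is carbon’s usual default plaintext port (LINE_RECEIVER_PORT in carbon.conf), but check your own config; the helper names are mine.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """One line of carbon's plaintext protocol: 'path value timestamp\\n'."""
    if timestamp is None:
        timestamp = time.time()
    return f"{path} {value} {int(timestamp)}\n"


def send_metrics(lines, host="localhost", port=2003, timeout=5.0):
    """Write pre-formatted plaintext lines to a carbon listener over TCP."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
```

Usage, with a placeholder host and metric name: send_metrics([format_metric("san.array01.read_iops", 1234)], host="graphite.example.com").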
Enough bitching, though. I want to go back to the Unix philosophy: do one thing, and do it well. Graphite does one thing well, graphing; carbon does one thing well, getting data. SNMP does one thing… poorly… very poorly. Unfortunately, I have yet to find a decent system that can poll SNMP data and present it in a useful fashion without going through a million configuration steps. I don’t want my SNMP poller to autodiscover my network and try to be smarter than me (OpenNMS, I’m looking at you). I just want to say: these MIBs, these hosts, this interval, go.
As a pet project, I’m going to start writing one in Python. Until it’s ready for prime time, I’ll deal with collectd and its screwing up my nice Graphite naming scheme.
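For the "these MIBs, these hosts, this interval, go" idea, a rough sketch of the core loop in Python. The SNMP request itself is left as a hypothetical fetch(host, oid) callable, since Python’s standard library has no SNMP client; the config layout, host names, and metric naming scheme are all illustrative assumptions. The OID shown is IF-MIB ifInOctets for interface index 1.

```python
import time

# Hypothetical config: these MIBs (OIDs), these hosts, this interval.
CONFIG = {
    "hosts": ["san01", "solaris01"],
    "oids": {"ifInOctets.1": "1.3.6.1.2.1.2.2.1.10.1"},
    "interval": 60,
}


def metric_name(host, oid_name, prefix="snmp"):
    """Build a Graphite path; dots inside names would add extra levels."""
    return ".".join([prefix, host.replace(".", "_"), oid_name.replace(".", "_")])


def poll_once(config, fetch, now=None):
    """Poll every OID on every host once; return carbon plaintext lines.

    fetch(host, oid) is a hypothetical wrapper around whatever SNMP
    library you use; it should return a number, or None on failure.
    """
    ts = int(now if now is not None else time.time())
    lines = []
    for host in config["hosts"]:
        for name, oid in config["oids"].items():
            value = fetch(host, oid)
            if value is None:  # skip failures instead of stalling the cycle
                continue
            lines.append(f"{metric_name(host, name)} {value} {ts}\n")
    return lines


def run_forever(config, fetch, send):
    """Poll on a fixed interval; send(lines) ships the lines to carbon."""
    while True:
        started = time.time()
        send(poll_once(config, fetch, started))
        time.sleep(max(0, config["interval"] - (time.time() - started)))
```

ifInOctets is a counter, so on the Graphite side you would typically wrap it in nonNegativeDerivative() to get a rate rather than storing a derived value in the poller.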
Recently, one of the more embarrassing things happened to me at the new job. I spent several hours trying to figure out my TERM settings on the Solaris boxes, only to find out that my TERM was correct; I had just forgotten how to use vi. In case anyone else runs into this, it’s important to know that the vi that ships with Solaris is vi: not vim, not vim linked as vi, not a modern version of vi, but old-school vi. Old school as in, delete and backspace are features that don’t exist, and the arrow keys, fughedaboutit.
Now, if you’re not expecting this because you’ve been spoiled by the wonders of modern vi clones like vim that are usable and flexible, you might be tempted to blame your terminal settings for the garbage and weirdness that shows up on your screen. The most noticeable effect is that, most of the time, the Delete key turns into a toggle-case key, flipping the case of the character under the cursor. This is maddeningly frustrating, and after spending 5 hours digging into various TERM settings, I can tell you it’s definitely supposed to be that way.
So, just a reminder: hjkl are the cursor keys, x deletes the character under the cursor, cw is change word, A appends to the end of the line, and D deletes from the cursor to the end of the line. Most importantly, :q! gets you out after you’ve screwed up the file in a new and imaginative way.
Just created a new project on Google Code, pyDepGrapher. I’ve been working on an easy-to-use dependency graph generator for a while. It’s much nicer to generate pretty dependency charts on the computer than to keep drawing them on the whiteboard at work. Since I recently started a new job, I’ve spent the last couple of weeks doing nothing but documenting how all the systems interact and what depends on what.
You enter the data into a CSV file, specifying each item’s type and its dependencies, and a configuration file, which accepts any Graphviz options, specifies how each type is drawn. Then it spits out a nice PNG of your graph. Over the next few weeks I’ll be adding quite a few features, including subgraphs/clusters, some limited analysis capabilities, and maybe even a web front end for maintaining the overall list.
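pyDepGrapher’s actual file formats aren’t described here, so this is only a sketch of the general CSV-to-Graphviz idea in Python, under assumed columns item,type,depends_on (one row per dependency) and a per-type style dict standing in for the configuration file. None of these names come from the project itself.

```python
import csv
import io

# Hypothetical per-type styles standing in for the configuration file;
# any Graphviz node attribute could go in these dicts.
STYLES = {
    "server": {"shape": "box"},
    "database": {"shape": "ellipse"},
}


def quote(s):
    """Quote a string as a DOT ID, escaping embedded double quotes."""
    return '"' + s.replace('"', '\\"') + '"'


def to_dot(csv_text, styles):
    """Turn rows of item,type,depends_on into a Graphviz digraph.

    One row per dependency; leave depends_on empty for items with none.
    """
    nodes = {}  # item -> type, in first-seen order
    edges = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        nodes[row["item"]] = row["type"]
        dep = (row.get("depends_on") or "").strip()
        if dep:
            edges.append((row["item"], dep))
    out = ["digraph deps {"]
    for item, kind in nodes.items():
        attrs = ", ".join(f"{k}={quote(v)}" for k, v in styles.get(kind, {}).items())
        out.append(f"  {quote(item)} [{attrs}];" if attrs else f"  {quote(item)};")
    for src, dst in edges:
        out.append(f"  {quote(src)} -> {quote(dst)};")
    out.append("}")
    return "\n".join(out) + "\n"
```

Write the result to deps.dot and render it with Graphviz, e.g. dot -Tpng deps.dot -o deps.png.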
I’ve been playing around with writing some Python tools to get data into and out of Bugzilla. I’ve dumped them on Google Code at https://code.google.com/p/gardens-bugzilla-tools/
Just found out that when setting up a sqlldr job on a Windows system, you MUST put in a working directory when you set up the action. Otherwise sqlldr silently fails.
Unlike older versions, fence_xvm now restarts the VM by default. WTF? If it dies, I want it dead; as they say, Shoot The Other Node In The Head (STONITH). To fix it, create the fence_xvm device as a shared fence, then manually edit the cluster.conf file and add option="off" to the fence section. Stupid stupid stupid.
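A minimal Python sketch of that manual cluster.conf edit, assuming the usual RHEL 6 layout where per-node fence references live in clusternode/fence/method/device and the shared device is declared in fencedevices/fencedevice with agent="fence_xvm". It sets option="off" on the per-node references (my reading of "the fence section") and bumps config_version, since the cluster only accepts an updated config with a higher version. set_xvm_off is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def set_xvm_off(conf_xml):
    """Return cluster.conf text with option="off" on per-node fence_xvm
    device references and config_version incremented by one."""
    root = ET.fromstring(conf_xml)
    # Names of shared fence devices declared with the fence_xvm agent.
    xvm_names = {
        fd.get("name")
        for fd in root.iter("fencedevice")
        if fd.get("agent") == "fence_xvm"
    }
    # Per-node references: clusternode/fence/method/device. Iterating only
    # under <fence> leaves any <unfence> section alone.
    for fence in root.iter("fence"):
        for dev in fence.iter("device"):
            if dev.get("name") in xvm_names:
                dev.set("option", "off")
    # An updated cluster.conf must carry a higher config_version.
    root.set("config_version", str(int(root.get("config_version", "0")) + 1))
    # Note: ElementTree drops XML comments, so diff against a backup.
    return ET.tostring(root, encoding="unicode")
```

Read /etc/cluster/cluster.conf, keep a backup, and diff the output before propagating it (on RHEL 6, typically with cman_tool version -r).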
If you have multiple interfaces and your default route points to a different interface than the one used for cluster communications, you MUST specify the interface when launching fence_xvmd on the physical hosts.