Archive for Administration

Basic Postfix Queue Management

I had to wrangle with a bunch of overloaded email servers recently and wanted to share my new best friend when this sort of thing happens:

postsuper

This guy is available if your email server is running postfix (at /usr/sbin/postsuper) or if you are running Zimbra which uses postfix under the hood (at /opt/zimbra/postfix/sbin/postsuper in that case). You should be able to run it and access the man page for it as root. This is pretty basic stuff for a postfix admin but I had a lot of trouble finding even basic descriptions of how to go about doing the things that postsuper can do.

Postfix files messages into several queues. The main ones are the following:

  • Incoming/Active - These are the common queues where incoming/outgoing messages live until they finish being delivered or received.
  • Deferred - If messages cannot be delivered they go here and delivery is reattempted until the messages expire.
  • Hold - A queue that you the administrator can move messages to. Messages placed here are not processed, no attempt is made to redeliver them, and they do not expire. As far as I can tell this queue only exists to make life easy on the administrator and gives you a safe place to store things temporarily.

Sometimes an email server will get crushed with volume from an attack, a misconfiguration, a mistake, or something similar and your machine will peg itself at full load while it tries to move the messages. This is when postsuper shines. There is a lot more to postfix and postsuper but these three commands can help you save your server and the people that depend on it for email in certain situations.

  • postsuper -d (deletes messages)
  • postsuper -h (move messages to Hold queue)
  • postsuper -r (requeue messages, can requeue messages in Hold to Incoming/Active)

Each of the above options takes a ‘queue_id’ argument. This value can be ALL (must be all caps) to apply the operation to all messages in all queues, a dash (-) to apply the operation to the queue_ids provided on stdin, or a specific message’s queue_id. The stdin option is especially nice as you can create a file with a queue_id on each line and then have postsuper process all of those messages in one run. You can also do ALL [queue name in lowercase] to apply the operation to all messages in a specific queue. For example, postsuper -d ALL hold would delete all messages in the Hold queue. Do NOT leave the queue name off if you are trying to do this, or else the ALL will snag everything in all queues.
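To make the argument forms concrete, here is a minimal sketch of the calls described above. The run wrapper and the DRY_RUN variable are assumptions added for illustration and are not part of postfix. By default the script only prints what it would do, since a postsuper -d cannot be undone. The queue id A1B2C3D4E5 is made up.

```shell
#!/bin/sh
# Dry-run wrapper (hypothetical helper): prints the command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

run postsuper -h ALL            # hold every message in every queue
run postsuper -h ALL deferred   # hold only the messages in the deferred queue
run postsuper -d ALL hold       # delete only the messages in the hold queue
run postsuper -d A1B2C3D4E5     # delete one message by its (made-up) queue id
run postsuper -r ALL            # requeue everything, held messages included

# The stdin form reads one queue id per line:
#   postsuper -d - < list_of_queue_ids.txt
```

Run it as-is to review the commands, then set DRY_RUN=0 in the environment once you are sure they do what you want.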

Using the commands above, this is how you could deal with a single host, or a small set of hosts that is slamming your server with messages. Say these messages are not important and do not need to be read but they are backing up the postfix queues and preventing delivery of real email to your users.

(1) postsuper -h ALL

  • Pushes all messages (the junk and the valid ones) to the Hold queue so you can breathe and think.
  • Empties out the Incoming/Active queues so that new messages can be effectively delivered.
  • At this point you need to stop the hosts sending the volume you don’t want, either by blocking the hosts if it is an attack or by fixing the issues if it is a misconfiguration or issue with a server you have control over.

(2) Next you would want to delete all of the messages you don’t want from the Hold queue. There are several approaches here, but this is one:

  • Create a file containing the information on the bad messages:
    mailq | grep theEnemyHostName > bad_messages.txt
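  • Write a script to generate a new file from bad_messages.txt that contains just the queue id on each line, nothing else. Use perl or whatever you are comfortable with. The queue id is the first chunk of the first line mailq prints for each message, so you likely just need to grab the first eleven or so characters of each line. Two things to watch for: mailq tacks a * (active) or ! (hold) onto the end of the queue id, which has to be stripped, and the grep above only catches a message if the host name shows up on the same line as the queue id.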
  • postsuper -d - < list_of_queue_ids.txt. This will delete all of the messages you identified.
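The extraction script from step (2) could look something like this. It is a sketch, not a tested production tool: the function name extract_queue_ids is hypothetical, and it assumes the usual mailq layout (a header line starting with "-Queue ID-", one block of lines per message separated by blank lines, and the queue id as the first field, optionally followed by * or !). Because it works on whole message blocks, it also catches messages where the host name only appears on a recipient line.

```shell
#!/bin/sh
# extract_queue_ids HOST
# Reads mailq output on stdin and prints the queue id of every message whose
# listing mentions HOST anywhere (sender, recipient, or error text).
extract_queue_ids() {
    # Drop the header and the "-- N Kbytes in M Requests." summary line.
    grep -v '^-' |
    awk -v host="$1" '
        BEGIN { RS = "" }            # paragraph mode: one record per message
        index($0, host) > 0 {
            id = $1                  # first field of the first line
            sub(/[*!]$/, "", id)     # strip the active (*) or hold (!) marker
            print id
        }'
}
```

Usage would be along the lines of mailq | extract_queue_ids theEnemyHostName > list_of_queue_ids.txt, after which the file can be checked by eye before it is handed to postsuper -d -.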

(3) Requeue all of the remaining, valid messages with postsuper -r ALL.

All of the above applies to Zimbra as well. Even though it has that fancy web-based administrative interface and a real nice way to view and filter your mail queues, that interface always blows up and pukes errors when I try to do an operation involving any kind of heavy message volume. So if you get in a similar jam the CLI is what you have to use. The postsuper command is also very fast, so even if you have tens or hundreds of thousands of messages jamming things up it should not be a problem.


OpenSuSE No More

I’ve installed OpenSuSE on a dozen or so work servers, used it as my previous development environment for about a year, and generally have been a big fan.

However, the ‘official’ repositories seem to get more and more out of date (I am running 10.2 mostly and my impression is it has been left to rot) and I’ve grown increasingly frustrated with how slow yast has become at updating its caches of rpms and repositories whenever I want to install or update software. I generally load yast, wait 10 - 15 minutes, then do what I was looking to do. I have the machines set to update themselves every week. Do I need to toggle another setting to make them go ahead and update their software lists and repository caches?

That hasn’t been too big of a deal. The machines had been rock solid stable (300 day + uptime) so I didn’t want to fiddle with something that was working. I had a weird experience this week though when I realized what was happening when I rebooted some of the long stable machines.

The first case happened at the office for a machine that wasn’t very important. I rebooted and when the machine came back up there were dozens of errors related to runlevel 3 applications not being able to start because my /var partition wasn’t accessible, networking didn’t start correctly, and the keyboard did not work. For this machine I just blamed it on the HD and requested a replacement from Dell.

Then I went to the datacenter and rebooted a production application server to troubleshoot an amber light and the exact same thing happened. I did not expect that and could not write off that machine as it served several important roles.

I booted up with a live cd and all of the system partitions were fine. Everything could be mounted, fsck came back clean, and I could chroot into the SuSE system and stuff worked. I checked over my /boot partition, GRUB configuration, and inittab file, and still had no idea why it would fail so utterly at boot time.

Our basic (non-Database) server is set up like this:

/boot - primary Linux ext3
swap - primary Linux swap
/ - primary LVM
/dev/system/root as '/' on the LVM partition as ext3
/dev/system/var as '/var' on the LVM partition as ext3

These machines were updating every week, but kernel updates are not applied until reboots, so my gut feeling was that perhaps something related to LVM changed with the kernel upgrade. I spent literally 6 hours troubleshooting down this path and was inclined to believe this was the issue because there is in fact a lot of chatter on Google about kernel upgrades screwing with LVM. I even tried creating a non-LVM /var, copying the contents of the LVM /var there and booting. That almost worked but the network did not start and I still could not use the keyboard.

At about hour 5 I pulled up the Novell documentation for the init process of OpenSuSE and started working through it step by step chroot’ed in from the live cd.

What was the issue?

OpenSuSE deleted its own /etc/init.d/boot script.

Seems impossible, right? I’ve watched it happen 3 times now and still have a lot of machines to reboot that I fully expect to have the same problem. Perhaps a penalty for long uptime? I missed it completely when I was checking over inittab initially - I guess I just assumed the core script that kicks off EVERYTHING would not have been deleted by the official update/upgrade process of a mature Linux distribution. I managed to find it by progressively stepping back the initial run level passed to the kernel by GRUB until I could see far enough up the boot process to see the ‘file not found’ message. I didn’t see anyone else having this issue, so I am hoping that if someone else hits it they find this post instead of wasting half a day chasing false causes.
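A quick check along these lines, run from the live cd against the mounted system, might catch a missing boot script before a reboot does. It is only a sketch: the function name check_inittab_scripts is hypothetical, it assumes a SysV-style inittab (id:runlevels:action:process), and it only looks at the first word of each process field.

```shell
#!/bin/sh
# check_inittab_scripts ROOT
# Prints "MISSING <path>" for every program named in ROOT/etc/inittab
# that does not exist under ROOT.
check_inittab_scripts() {
    root="${1%/}"
    # Skip comment lines; field 4 of each entry is the command to run.
    awk -F: '!/^[ \t]*#/ && NF >= 4 && $4 != "" {
                 split($4, words, " ")
                 print words[1]
             }' "$root/etc/inittab" |
    sort -u |
    while read -r prog; do
        case "$prog" in
            /*) [ -e "$root$prog" ] || echo "MISSING $prog" ;;
        esac
    done
}
```

With the installed system mounted at, say, /mnt, check_inittab_scripts /mnt would list anything inittab expects that is no longer on disk.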

So now the question is what distribution should go on our servers (a distribution that neuters itself during an upgrade cannot stay). I am pretty fond of Ubuntu server but Sean at the office has pointed me at Arch and I am really, really digging the way they do things. I hope to make a separate post about that at the company blog in the near future.


Zimbra Migration Postmortem

I posted a short while back about excitement surrounding a migration from Exchange 2003 to Zimbra for our company. The migration has had its ups and downs and now that it has happened and I have had a couple weeks to dig in as both a user and administrator I would like to share our experience.

The general takeaways are that Zimbra isn’t perfect. It does some things worse than Exchange and some things better but the balance, in my opinion, slants heavily in Zimbra’s favor. I’ll break it up into migration and then administration/usage.

The Migration

The migration was a bruiser. It involved a couple nights of failed attempts and then a brutal 6pm - 4am effort to get everything finished well enough to go to sleep. I had a sysadmin helping me that knew his stuff so the details of how to complete it aren’t here (he handled most of the work), just the headaches I saw. The issues included:

  • The bulk migration tool was not able to migrate calendars.
  • The individual .pst importing tool also was not able to migrate the calendars. It would just fail like crazy and then give up because the error count was too high. For users with 2k+ appointments the migration would fail after only a few dozen events. I eventually got these calendars over by doing .pst exports/imports with Outlook itself rather than trying to use server-side migration tools.
  • We had to run the bulk migration over 2 nights because it took a long time. This isn’t a huge surprise because we had hundreds of thousands of emails, events, and contacts to migrate, but the issue is that the second run re-imported everything imported in the first batch despite settings to the contrary. This essentially created duplicates of all emails and contacts.

To remove the duplicates of emails I used a perl script found at this page (this script actually worked fantastic). For contacts I used the Zimbra CLI to bulk clear the applicable address books and used client apps to re-import cleanly.

Administration/Usage

Zimbra started to shine after the migration ordeal. We immediately had all of our OSX users sync’ing their iCal, Apple Mail, and Address Book apps with the server, I had most of the Outlook users on the Zimbra Outlook connector without much effort, and most things worked well. There were a few issues I encountered.

  • The Outlook connector worked flawlessly in XP Pro but was very difficult to install in Vista. You need to follow the tip here and then just keep trying until it works. If it doesn’t work remove the program and try again. I really hate Vista and the fact that it makes things so hard.
  • The activesync with Windows Mobile is pretty flaky. It fails often for no apparent reason. I settled on using IMAP for email and just sync’ing my contacts and calendar and this seems to work consistently. It was as if it was stumbling over the greater volume of items to sync when the email was part of it.
  • I’m not real happy with the calendar sharing. Without admin intervention a user must share their calendar with each individual user and each of those individuals must log in to the web interface to accept the share and see it. These notifications cannot be accepted in Mail/Outlook/Entourage or whatever else. Once these calendars are accepted though you can use almost any app you want as your calendar and that is nice.
  • There are connector apps for almost everything, but many of them are not updated to the latest versions of their target apps and none of them are completely polished and perfect. The Outlook and OSX ones seem to be the best but those also are not without issues.

In general though Zimbra works pretty well. I have calendar and contacts sync’d with my laptop using the OSX sync services and also sync’d to my Windows Mobile phone using activesync - a setup that never would have been possible with Exchange (without Entourage, but Entourage sucks in my opinion).

There are shortcomings but as I have worked through various user issues I have discovered what I believe is Zimbra’s biggest strength - its openness and open source underpinnings. It is a huge, powerful piece of code and between the CLI and the REST API you can do almost anything as an admin. Now that I am getting the hang of it I have created a set of quick scripts to interact with the CLI for doing things like auto-mounting calendars shared with distribution groups (getting around the email acceptance bummer mentioned above). The REST API is great and documented a bit here. It is completely trivial to export people’s contacts or calendars and to constrain what is exported using different parameters using the REST API.
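As a rough illustration of the REST exports mentioned above, the sketch below just builds the URL. The /home/USER/FOLDER?fmt=FORMAT shape is how I understand the Zimbra REST interface to work, so treat it as an assumption and check it against the documentation for your version. The function name, server name, and account are hypothetical.

```shell
#!/bin/sh
# zimbra_export_url SERVER USER FOLDER FORMAT
# Builds an export URL of the assumed form https://SERVER/home/USER/FOLDER?fmt=FORMAT
zimbra_export_url() {
    printf 'https://%s/home/%s/%s?fmt=%s\n' "$1" "$2" "$3" "$4"
}
```

Something like curl -u jdoe "$(zimbra_export_url mail.example.com jdoe calendar ics)" -o jdoe.ics would then pull a calendar, and swapping in contacts and csv would do the same for an address book.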

Another big advantage in Zimbra’s favor is the community is quite strong and helpful. They have a wiki, forums, and bugzilla all very active and open.

So this is a bit of a ramble, but overall I am exceptionally happy that we made this switch. Zimbra is not perfect but it is powerful and utterly open making it possible to find workarounds for almost anything and it helps that it runs on Linux as well.


Zimbra Anticipation and Exchange Hatred

I have mentioned it in passing before, but almost every server associated with our company is running Linux (and Mac has managed to take over the workstations surprisingly fast - only 2 Windows machines being used now). The last holdout on the server side was the Exchange server we set up when we first got an office, which for obvious reasons had to be running Windows Server 2003. This was the same server I had the raid fun with.

Finally, after a year+ now of me hating Exchange solo, the requests and general feeling of the office have shifted against it across development AND sales and the migration to Zimbra is scheduled to be completed next week. I couldn’t be happier about it. Among many things I am most looking forward to administering a Linux machine, having a better web client (that doesn’t change by browser), and Apple iSync support. I am also looking forward to the Blackberry support which, despite my unwanted but nontrivial experience with Windows servers, I absolutely could not make work with Exchange.

To properly send Exchange on its way I thought I would enumerate some of the many reasons I hate it :)

  • It runs on Windows. Windows is decent for a workstation but makes for an awful server in my opinion. Perhaps it comes down to experience, but I feel that the Windows approach to server administration (meaning hundreds of obscure windows, tabs, and buttons) requires more effort to learn, involves completely unnecessary abstractions over known technology, and makes everything you need to do take longer. They are unstable, require reboots to update (wtf?), and you have to use remote desktop to administer them. Enough about Windows as a server in general, back to Exchange.
  • There is no reasonable method for setting up a catch all. Read this page if you need to do it and prepare to be disgusted.
  • There is no reasonable method for forwarding email. How did they mess this one up so badly? It seems that the ability to set up an email forward would be a core feature of server software designed to send and receive email. To do this you have to create a dummy contact with the forward email, then create an exchange user account (with a different name and username else there is a collision), then configure that exchange user account to forward its mail to the dummy contact record, which will then cause the email to be forwarded to the final destination. Completely ridiculous as it bloats the active directory listing with loads of dead entries and takes too many steps to set up.
  • I have had a lot of trouble with Exchange’s SMTP connectors where HTML emails headed towards external email accounts (via forwarding hack mentioned in previous bullet) back up in the queues for absolutely no reason and prevent messages from being delivered for hours sometimes.
  • It doesn’t support iSync as far as I know.
  • It doesn’t have spam filtering built in (it kind of does but it does an awful job in our experience).
  • The web client is pretty terrible, and if you try to access it with anything other than IE it takes a severe dive to awful. In the non-IE mode you can’t search, can’t create folders, can’t create rules, and it is genuinely unusable.

The only big strength Exchange offered, and the reason we used it to begin with, was the calendar synchronization and thankfully Zimbra has arrived to offer an alternative to the mess that is Exchange. Zimbra is now feature rich, stable, and validated by huge installations such as the one at Georgia Tech. The feedback and reviews are glowing and the documentation makes it clear that all the little things I hate about Exchange because they take too long or are too kludgy are quick command line or file editing steps.

I’ll post again once I have put some hours of usage in with some content that doesn’t mention Exchange once and instead talks about Zimbra. I think it is safe to say though that if you are starting a company just skip Exchange from the start. You can get hosted Zimbra just as you can hosted Exchange if you don’t want to manage your own server.


Intel Raid Fun

One of the first machines we set up at the office was an Exchange server built from parts. As anyone at the office knows, I absolutely hate Exchange and one day that last Windows Server box in the company will be gone and replaced with Zimbra or something else.

The machine uses an Intel D945PSN desktop board with a fake raid raid1 across 2 Samsung SP2504C drives and runs Windows Server 2003. The fake raid is managed through either a chunk of software accessible at boot time with CTRL-I or through the Intel Matrix Storage Console once booted into Windows.

It ran nicely for 2ish years but then about 2 weeks ago one of the disks in the raid1 failed. I had a heck of a time replacing that drive and wanted to share what worked and what didn’t here. The process was complicated by the fact that it was our single Exchange server and managed all corporate email so I couldn’t be as adventurous with getting it back in order.

Things That Didn’t Work

First off, the replacement drives I ordered didn’t work as straight up simple replacements and are the source of all the difficulty. They were firmware VT100-52 and the older original drives were VT100-33. The newer drives were 100 MB smaller and thus the Intel Matrix software would not let me rebuild on to them. So we had one original 232.9 GB drive and a new 232.8 GB drive.

Next, I purchased BootItNG (which ended up being crucial to this task overall) and tried resizing the NTFS partition of the existing drive. After doing this and booting Windows back up the Intel Matrix software still saw the old drive as 232.9 GB so this didn’t work either.

Calling Intel for support did not work. They refused to assist over the phone with a “chipset issue”, and over e-mail they refused to assist because I was running Server 2003 on the desktop board and apparently this wasn’t supported. Despite my response of “The OS is not the problem, I need help with the Intel raid” and creating a new ticket where I lied and said I was using XP they never responded to me again.

Calling Samsung for support did not work. Their website did not have anything useful for this model number hard drive. I called them multiple times seeking a way to downgrade the newer drives or upgrade the older drives in terms of firmware. Eventually they had me call some Far East Services company in New Jersey whose phone system was busted (they couldn’t hear me - tried from multiple lines) and whose website was offline. I unleashed an email to some common addresses (info/admin/support/technical) hoping to get a hit and despite ‘admin’ going through I got no response. I finally got Samsung to connect me to them on some other number but they then told me they only handled exchanges and Samsung would handle any firmware issues. So no luck there.

What Did Work

So I did finally get things working, and here is the sequence that worked. By no means am I guaranteeing this will work for you, so proceed at your own risk. In short you must disable the existing raid volume, copy the data from your remaining original disk to the new smaller disk, then boot back into Windows and create a new raid volume.

  • Go buy the BootItNG / Image For Windows combo pack or something that can accomplish the same things. It is only $50 for the combo.
  • Image your drive and back it up to another machine. The odds of something going wrong are high enough.
  • Shut off the machine and remove the dead hard drive.
  • Boot the machine and use CTRL-I to access the Matrix ROM software.
  • Reset your remaining disk to Non-Raid.

    • At this point the software will issue stern warnings and tell you it will destroy ALL data on the drive being reset. I took a gamble based off forum chatter and it turns out this was not the case FOR ME (but who knows whether it ever is true). All it did was remove the disk from the fake raid controller’s knowledge and it makes the previously mirrored drive standalone and bootable outside of the raid array.
    • If you omit this step and try to boot off of the mirrored drive while it is still in the raid volume it will not work.
  • Now reboot your machine and make sure everything is happy. Windows should boot, and if you check the Intel Matrix software once booted it will now show your single drive as Non-Raid. At this point you are running on your single original disk and not using any raid volume.
  • Power off your machine and insert your new 100 MB smaller hard drive.
  • Boot the machine up with the BootItNG CD inserted.
  • Copy the contents of the original drive to the new drive using BootItNG’s copy/paste metaphor.
  • Power off your machine and remove your old disk.
  • Boot the machine back up and make sure it gets all the way into Windows. We are now running on a single new disk of the smaller size. After the boot Windows will likely want to do a CHKDSK, this is fine, just let it do its thing.
  • Power off the machine again and either insert the original 100 MB larger drive or another new drive and boot it back up with BootItNG.
  • Delete anything on the disk you just inserted, we are preparing it for usage as the second disk in a raid 1 mirror.
  • Finally, reboot again this time into Windows. It may tell you that new hardware has been detected and want to reboot AGAIN. Just let it if this is the case. Dang Windows and its constant restarts.
  • Once in Windows, in the Intel Matrix software you want to select “Create a raid volume from existing Data”.
  • Select the new drive that you copied your old data over to (that Windows booted off of) as the source, and the empty 2nd disk as the target.
  • It will crank for awhile, but you should be in good shape now. Your machine can keep running and working while the raid creation is completing.

Summary

  • All of this hassle was due to a 100 MB discrepancy in disk size. Buying bigger disks (300GB+) to make sure they exceeded the size of the originals would have worked as well and it would have been easier but that meant wasted disks, wasted space, and missing an opportunity to tinker for me.
  • In raid both the model number and firmware of the drives are extremely important, because different firmwares of the same model number can have varying amounts of available space.
  • Buy backup drives when you buy the machine and the firmware isn’t going to be a problem.
  • BootItNG is a handy program that is worth its modest price.
  • Fake raid kind of sucks and should never be explored as an option for performance purposes. Only raid for redundancy makes any sense with it.


Startup Technology Expenses

One aspect of a software startup that cannot be escaped is money must be spent on technology and development of technology. Whether this is a good or bad thing depends on if you ask the engineer or the accountant. My general rules of thumb are:

  • Purchases that help people do their jobs better or faster are worth paying for.
  • Before spending money on something look for an open source alternative that is cheap or free. Often you will find something better or only slightly inferior to the commercial item.
  • If you are going to spend money on something, the price-to-substance ratio is important.

And now a smattering of thoughts and plugs for each rule of thumb in the context of our company that is full of my personal opinions. I do realize that the earliest days of a startup largely must ignore most of this list. For example, when you don’t have an office yet (and everybody works from their homes) you don’t really worry about getting comfortable chairs, good machines, etc. for that office.

Purchases that help people work

  • Screen real estate is important. I used to think this meant 2 screens but have refined this to mean total resolution. With my macbook pro and spaces I went from using 2 computers and 3 monitors to just 1 laptop and I feel more efficient now. I like to give 2 monitors to any person that wants them - especially engineers, designers, and QA.
  • Good chairs are worth paying for. I’ve worked places in the past that gave their engineers hand me down garage sale garbage to sit on. The nature of a software company means people are going to spend a lot of time sitting and the chairs need to be good enough that people don’t notice them all day (and often longer given the nature of startups). Aerons are great if you can get a deal on them but there are solid options in the $200 - $300 range. CWC sells better quality furniture at the best price.
  • Don’t skimp on workstation hardware. I personally think the mac path is worth the premium for developers. On a per-item basis the price is virtually equivalent but given Dell’s willingness to haggle and price slash (especially if buying multiple items) a premium does remain. I think it is worth it.

Open Source

  • We use Java and I think it is better than .NET and it is free. You can build it on Windows/Linux/Mac and you can deploy it to all 3 as well. I think PostgreSQL is better than SQL Server (and MySQL). The Microsoft lock in has never made any sense to me and I feel the Java community is a great place in that the number of unqualified engineers is relatively small and it is full of extremely qualified people. Java also scales vertically or horizontally very, very well. It has the whole 10,000 frameworks/libraries to choose from “problem” that .NET does not have but that is okay in my opinion. We went with Spring/Hibernate/DWR and it has worked out great.
  • PostgreSQL is fantastic. The developers are accessible and helpful and the community is strong. We’ve run it up to a 1TB database and it handles it just fine. You obviously have to run it on a reasonable machine as load increases but it scales vertically wonderfully and there are addons for replication. Check out Slony and/or Mammoth Replicator if you need that replication, we haven’t yet. Visit this site for installing Postgres on your local mac workstation.
  • Linux is the way to go for servers. I don’t think the Linux/Dell combo can be beaten on the server side.

Price-to-Substance Ratio - Some Examples

  • IntelliJ IDEA is worth its cost. It is magical and exceeds a plugin-ridden eclipse install for features out of the box and I think the editing experience and source control interaction are superior.
  • Despite stability issues I think the Leopard incremental upgrade to OSX was worth it for productivity overall. Spotlight and Spaces have changed my workflow completely.
  • Dell provides a fantastic ratio here. I would strongly recommend them for server hardware, especially their latest models. Solid architecture, solid raid controllers, RAM, etc. If you go with Dell get in sync with a Small Business team. It will save you money and streamline the process as you get to talk to the same people every time. Their business lines of laptop (Latitude) and desktops (Optiplex) are also solid.
  • Good consultants and contractors are worth their rates for focused, time-constrained assistance. You have to be careful though because there are a large number of unqualified people posing as consultants and contractors that aren’t worth the time it takes to arrange a contract. If you find somebody you can work with and does a good job keep using them as needed.
  • Parallels is worth its very manageable price for providing IE6/IE7 testing to mac-using developers. See this post for help setting up the free VMs provided by Microsoft for doing this testing.
  • FlexBuilder isn’t worth the cost. When I used it a long while back it was $700+ with charting and had marginally more functionality than notepad2. Following that link, it looks like they are pumping Flex 3 now. The fact that Flex 2 has profound issues makes this especially troublesome.
  • Flex Data Services pricing defies all reasoning. $20k per CPU. Same for pretty much any other product that charges per-CPU. If anyone knows of ANY per-CPU product that is worth paying for let me know. I recently priced out a better WYSIWYG editor for portions of our product and they wanted pricing per CPU for a text editor.
  • And finally, I think sharp, qualified engineers that you can interact with in person in the US are superior to any offshore team. When you consider the time differences, communication barriers, and general lack of quality offshore I believe a 5 man team of people that know what they are doing and work together here could outperform a 50 man team of offshore cube farm drones. I have 3 specific experiences (admittedly not that many) working with offshore teams. 2 ended in utter failure to complete the task, and 1 was bailed out of before it got too far along because even the onshore PM/BA assigned were completely clueless and ineffective. I feel like the offshoring development companies live in an alternative universe where you just keep a neutral look on your face through meetings and shuffle out inferior product making fixes until the customer is too frustrated, tired, or so accustomed to the low quality that they start to believe the software is good and consider the project a “success.”

So there you have a smattering of my thoughts. I expect to elaborate on many of these items in separate posts in the future. You can likely tell by the tones which items I find most interesting and/or alarming.


Big ext3 partitions in openSUSE 10.2

I realize this is a pretty niche topic but spent several hours today trying to figure out how to create a couple 3.4TB ext3 partitions in openSUSE servers and wanted to share what worked.

The biggest tip is don’t try to use the openSUSE installer to partition and/or format the big partition. In my case it screwed things up no matter how I attempted to tweak settings. One sequence that does work is this:

  • Install openSUSE, but don’t touch the big disk (I’ll call it /dev/sdb for this post).
  • Once installed, login as root and run parted /dev/sdb. This is a great tool that I only discovered today. This page provides a good overview plus documentation and the help system in the tool pretty much tells you anything you need to know.
  • From the parted prompt type mklabel gpt.
  • Type mkpart primary start end. This creates a primary partition beginning at ‘start’ and ending at ‘end’. These can be fixed MB amounts or percentages. In my case this was mkpart primary 0 100%. This creates just the partition; it does not set up a file system. In this example the new partition would be /dev/sdb1.
  • Type quit. The parted tool has great commands for making file systems as well but they don’t support ext3.
  • Now back at a regular prompt type mkfs.ext3 -b 4096 /dev/sdb1, let it crank for a while, and you’ll have your big ext3 partition. The -b argument specifies the block size in bytes. The maximum size of an ext3 file system is determined by its block size; there is more information on the Wikipedia page.
  • Mount the file system to wherever you want and add the relevant entry to /etc/fstab
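
The same sequence can be scripted. What follows is only a sketch, assuming parted's non-interactive -s (script) mode and the /dev/sdb example disk from this post. The /data mount point, the DRY_RUN switch, and the 0% start value (used here instead of the 0 I typed interactively) are placeholders of mine. By default it only prints the commands, because actually running them wipes the disk.

```shell
#!/bin/sh
# Sketch only: one big GPT partition on DISK, formatted as ext3.
# DISK, MOUNT_POINT and DRY_RUN are illustrative names, not from the post.
DISK="${DISK:-/dev/sdb}"
MOUNT_POINT="${MOUNT_POINT:-/data}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

make_big_ext3() {
    run parted -s "$DISK" mklabel gpt               # new GPT disk label
    run parted -s "$DISK" mkpart primary 0% 100%    # one partition, whole disk
    run mkfs.ext3 -b 4096 "${DISK}1"                # 4096-byte blocks
    run mkdir -p "$MOUNT_POINT"
    run mount "${DISK}1" "$MOUNT_POINT"
}

make_big_ext3
```

Run it once as-is to review the printed commands, then with DRY_RUN=0 as root when they look right. The /etc/fstab entry is left as a manual step; a line along the lines of /dev/sdb1 /data ext3 defaults 1 2 is a typical starting point, but check it against your own setup.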

Hopefully this will save somebody else a couple hours - the key in my case was to not let the openSUSE installer play any role in setting up the partition.

Comments

openSUSE 10.2 autoyast

I’ve become a pretty huge fan of openSUSE. The installer is excellent; it just works really well, and I really like having the option of yast to manage most aspects of the system, even when working from a command line. In comparison to RHEL it has more file system options, newer/more rpms in the official repositories, and in my opinion yast is superior to RHEL’s up2date. If support is an issue you can get SUSE Linux Enterprise preinstalled by hardware vendors (including Dell) as well as enterprise support from Novell.

Though the openSUSE installer is pretty solid, manually booting, configuring, installing, and updating an OS can get old really fast, especially if you are installing on machines meant to have the same or similar roles. As part of the effort to improve our ability to manage more machines at work I decided to explore two tools to make life easier:

  • Setting up our own installation server
  • Using autoyast to automate 95% of the install for new machines

I generally followed the guidance offered at this novell.com page but want to walk through the specific process I went through as well as some specific gotchas and details in the hopes of helping out anybody else trying to do the same with 64bit openSUSE 10.2 on servers. By “on servers” I mean “no x windows”.

Setting up the Installation Server

If you have an existing openSUSE box, setting up the installation server is pretty easy. Here are the steps involved in setting the server up and linking it to the official Novell yast repositories so your new installations get updated packages.

  • Run yast and go to Software -> Software Management
  • Search for and install yast2-instserver
  • Exit and restart yast and go to Miscellaneous -> Installation Server
  • From here you will be walked through the process of copying the files from your installation media to the HD and exposing the sources with FTP, HTTP, or NFS
  • For this particular example I went with FTP; openSUSE installed and attempted to configure vsftpd
  • I had to manually run /sbin/service vsftpd start to make it work.
  • By default vsftpd was configured to allow only anonymous access with read-only permissions, and /srv/ftp was set as the root of what anonymous can see on the disk, so the config was perfect by default.
  • The full path to the 64bit installation source CD contents was /srv/ftp/sources/suse-10.2-64bit/. It is a good idea to give the source directory a specific name as that allows you to add alternate sources (like 32bit) to the same installation server in the future.
  • Go to /srv/ftp/sources/suse-10.2-64bit/CD1 and create a new file named add_on_products.
  • Edit this new file and enter any number of source repositories that you want to be included in new installs, one per line. In my case it looked like this:
    http://download.opensuse.org/distribution/10.2/repo/oss
    http://download.opensuse.org/distribution/10.2/repo/non-oss
    http://download.suse.com/update/10.2
  • Sources entered here will also automatically be registered as installation sources for the new machines. If you aren’t using 10.2 your source repositories will be different. Check this page for all of them.
  • That wraps up the installation server. Assuming the vsftpd service started up you are good to go.
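
If you prefer to create the add_on_products file from a shell instead of an editor, something like the following should do it. It is a minimal sketch: the function name is made up for illustration, and the three URLs are simply the ones listed above for 10.2.

```shell
#!/bin/sh
# Sketch: write the add_on_products file into an installation source.
# write_add_on_products is a hypothetical helper name; the URLs are the
# 10.2 repositories listed in the post.
write_add_on_products() {
    # $1 = path to the CD1 directory of the installation source
    cat > "$1/add_on_products" <<'EOF'
http://download.opensuse.org/distribution/10.2/repo/oss
http://download.opensuse.org/distribution/10.2/repo/non-oss
http://download.suse.com/update/10.2
EOF
}

# Example call on the installation server (path taken from the post):
# write_add_on_products /srv/ftp/sources/suse-10.2-64bit/CD1
```

Keeping this in a small function makes it easy to repeat for a second source directory (a 32bit one, say) with a different path.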

At this point, you can set up new openSUSE machines by installing against this server. You would need to boot the machine with some sort of openSUSE installation media (the DVD, CD1, a properly set up USB key, or the minimal install CD) to get to the installation menu. From there hit F4, enter your FTP installation server and the /sources/suse-10.2-64bit/CD1 directory, press enter, and then continue with the installation. Having the installation server is really nice because you can control and manage a single, consistent set of rpms.

Setting up autoyast

Just having a central installation server is great but with autoyast you can almost completely automate installation of new openSUSE servers. This works by creating an autoyast control file at which you point new installations. The control file can include instructions for disk partitioning, installed software, services, custom config files, and directions to run extra scripts at various stages of the installation. The link at the top of this post provides a pretty good overview and the documentation here is very helpful as well. That documentation provides almost all of the information you need so where details are excluded from the following look there.

In my specific case (an autoyast file for JBoss servers) the process went like this:

  • Uploaded the latest versions of JBoss and Java (yast didn’t have 1.5), init.d scripts for JBoss, as well as our custom /etc/profile.d/environment.sh file to the installation server under a different directory accessible through FTP.
  • Wrote a script meant to run after new installs to download and configure the above. Really just a bunch of wgets, copying, linking, chmod/chown changes. This was going to be downloaded and run in the init-scripts stage of the autoyast install.
  • Set up a fresh install of openSUSE exactly as I wanted it for a JBoss server and ran yast2 autoyast from the command line.
  • Selected Tools -> Create Reference Profile
  • Selected the areas I cared about including. Note that selections here are in addition to a default set of information that includes partitioning and installed packages. In my case Firewall, Online Update Config (I enabled this on the reference server), Local Security, and User Management made sense.
  • Next was to add a custom sshd_config file. With the reference profile loaded, went to Miscellaneous -> Complete Configuration Files and then alt-E for configure.
  • Alt-w for new, file path of /etc/ssh/sshd_config for the new installs, and then loaded the contents of my existing sshd_config file.
  • Lastly, I wanted to run the script I mentioned above as an init-script. These are scripts which run after installation is complete and networking is functional on a new server. init-scripts cannot be configured through the autoyast tool so I did File -> Save As and generated my baseline autoyast file.
  • If you see warnings about the format of the generated xml file (the autoyast control file) ignore them. The Suse team has issues with their schema files.
  • Finally, I edited the autoyast file and added my init-script to the end. It looked like this:

    <scripts>
      <init-scripts config:type="list">
        <script>
          <location>ftp://myserver/myscript.sh</location>
          <interpreter>shell</interpreter>
        </script>
      </init-scripts>
    </scripts>

  • Then I just uploaded this file to the same FTP server so it was accessible during new installs.
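
For a sense of what the post-install script from the list above can look like, here is a rough sketch. It is not the script I actually used: every file name, the BASE_URL directory, and the jboss user are placeholders, and it only prints its commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical init-script of the kind described above: fetch files from
# the installation server, put them in place, fix permissions.
# BASE_URL, all file names and the "jboss" user are placeholders.
BASE_URL="${BASE_URL:-ftp://myserver/files}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_jboss() {
    # Download everything into /tmp first.
    run wget -q -P /tmp "$BASE_URL/jboss.tar.gz"
    run wget -q -P /tmp "$BASE_URL/jboss-init.sh"
    run wget -q -P /tmp "$BASE_URL/environment.sh"
    # Unpack and install.
    run tar -xzf /tmp/jboss.tar.gz -C /opt
    run cp /tmp/jboss-init.sh /etc/init.d/jboss
    run chmod 755 /etc/init.d/jboss
    run cp /tmp/environment.sh /etc/profile.d/environment.sh
    run chown -R jboss:jboss /opt/jboss
}

setup_jboss
```

Because init-scripts run unattended, it is worth testing a script like this by hand on a scratch machine before pointing the autoyast file at it.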

Though the number of steps I just listed seems long, these autoyast files are really very quick to make. You could create any number of them for different machine roles and make them all available for new installs.

Setting up a New Server

Now that you have an installation server (FTP-based in this specific case) and all the autoyast files and other resources a new machine could need, you can set up a new machine from scratch by doing the following:

  • Boot from the openSUSE DVD, CD1, or minimal installation CD. With some more work you can set up a bootable USB key or use the PXE boot capability of newer machines to boot from a network resource.
  • Once you see the installation menu, hit F4, enter your FTP installation server and the /sources/suse-10.2-64bit/CD1 directory, and press enter.
  • Move the cursor over the Installation option and type autoyast=ftp://[installserver]/[autoyast-file]. What you type appears in the command line options along the bottom of the screen.
  • Press enter and walk away from the machine for a while so the installation can complete.
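
If you type these boot options often, it can help to generate the string once and keep it on a sticky note. The sketch below just assembles it. The autoyast= part is what I used; the install= part is an assumption on my side (as far as I know the installer accepts it as an alternative to the F4 dialog, but verify that on your media). The host name and control file path are placeholders.

```shell
#!/bin/sh
# Sketch: build the boot option line for an automated install.
# install= is assumed to be accepted by the installer in place of the
# F4 dialog; host and file names below are placeholders.
boot_options() {
    # $1 = install server host, $2 = source directory, $3 = control file path
    echo "install=ftp://$1$2 autoyast=ftp://$1$3"
}

boot_options installserver /sources/suse-10.2-64bit/CD1 /autoyast/jboss.xml
```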

Now, when I set this up, GRUB wouldn’t boot the newly installed machine. It turned out that the kernel version I was running on the reference server (and from which I generated the initial autoyast file) was different from the kernel provided by the installation server. This meant in my autoyast file the GRUB configuration portion was trying to reference a file (vmlinuz-2.6.18.2-34-default) that didn’t exist. So make sure your installation server is tied to the official repositories and make sure your reference machine is fully up to date before creating the baseline autoyast file.
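
One cheap way to catch this before an install is to list the kernel image names the control file refers to and compare them by eye with what the installation server actually provides. This is a minimal sketch; the function name and the control file path in the example are made up.

```shell
#!/bin/sh
# Sketch: list the distinct vmlinuz-* file names mentioned in an
# autoyast control file. list_kernel_refs is a hypothetical helper.
list_kernel_refs() {
    # $1 = path to the autoyast control file
    grep -o 'vmlinuz-[A-Za-z0-9._-]*' "$1" | sort -u
}

# Example (hypothetical path):
# list_kernel_refs /srv/ftp/autoyast/jboss.xml
```

If the version printed here differs from the kernel a fresh install from your server ends up with, the GRUB section of the control file needs updating (or the reference machine needs to be brought up to date and the profile regenerated).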

I used this same approach to create configurations for JBoss, e-mail, and basic openSUSE-based servers.

Comments (2)

WebMux Setup

At work we have been pulling various infrastructure tasks into development sprints to prep for some larger clients in the near future. I snagged the task of researching and setting up load balancing equipment and want to share the experience. Perhaps my google searching abilities are just not strong enough, but I had real difficulty finding current, meaningfully deep discussion or comparison about the hardware load balancing products available. I did find Load Balancing Digest to be pretty helpful for general information and introduction. This post is just a record of my relatively shallow and unqualified experience. I would really love to hear any comments, feedback, or opinions.

Options

In my searching I encountered a pretty clean separation of product categories. There are devices that cost less than $5k each and then devices that cost more than $10k. From my limited research it seems the 5-figure devices were presented as “appliances” and “platforms” that were full of features I really didn’t need. I just wanted a load balancer not a firewall + router + load balancer + ssl accelerator + whatever else all in one package. We also don’t anticipate needing the connection counts and throughput abilities of some of these more expensive products for a long while so the sub $5k market suited our situation just fine.

I spent some time looking at the following vendors’ spec sheets:

LoadBalancer.org
Coyote Point
Barracuda
CAI Networks

The fact that all of the various spec sheets offered different fields, combined with my inability to find very much meaningful discussion/comparison online, caused my decision to be weighted heavily by the small pieces of information I did find. The CAI Networks WebMux products were spoken highly of in several forums, their low end device had specs more than satisfying our requirements, they support replicated pairs, and they had more capable products should we need to upgrade in the future. So I contacted AVANU, one of the resellers listed on the CAI Networks website, and had an evaluation unit of the WebMux 481S shipped over free of charge. It arrived in 3 days.

Setup

Once we got it in the office I was able to set it up in 20 - 30 minutes. The documentation is reasonable and it is based on Linux Virtual Server (as are many of the load balancing products out there), so the documentation for that project can be consulted for details that the WebMux documentation leaves out about scheduling methods or terminology.

I went with the Out-of-Path Mode configuration described in the manual and we did layer 4 least connections persistent scheduling. Our servers kept their existing IPs and all clients are sent to the farm IP set up in the WebMux. The manual suggests adding a loopback to the machines involved in your cluster using iptables, but I instead set up a loopback alias by running /sbin/ifconfig lo:1 [farm IP] netmask 255.255.255.255 up as root.

So literally the complete configuration involved only the following:

  • Power up the WebMux and connect its server LAN port to our switch
  • Follow the Common Configuration instructions in the manual for initial setup
  • Use the web configuration panel to add a farm and assign it an IP
  • Use the same panel to add the servers to the farm
  • Log in to each server and set up the loopback alias to the farm IP
  • Done
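
With more than a couple of servers, the loopback step gets repetitive. The sketch below only prints one ssh command per server so you can review them before running anything. The farm IP (from the documentation address range) and the host names are placeholders, and it assumes root ssh access to each machine.

```shell
#!/bin/sh
# Sketch: print the loopback alias command for every server in the farm.
# FARM_IP and the host names are placeholders; nothing is executed.
FARM_IP="${FARM_IP:-192.0.2.10}"

print_loopback_commands() {
    # $@ = host names of the servers in the farm
    for host in "$@"; do
        echo "ssh root@$host /sbin/ifconfig lo:1 $FARM_IP netmask 255.255.255.255 up"
    done
}

print_loopback_commands web1 web2
```

Keep in mind that an alias created with ifconfig does not survive a reboot, so the same command also needs to go into whatever startup script your servers use.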

We have not yet taken the step of using the JBoss Cluster capabilities, so on the software end configuration was straightforward. The WebMux supports multiple farms as well, so you could use the same device to cluster other services (SMTP, DB) behind the web servers.

After setup I fired up JBoss on all of the involved servers, used a load testing tool (WebLoad) to send a ton of fake users to the farm IP and watched the WebMux web panel to verify connections were appearing evenly across the machines. Everything worked perfectly.

Of course all of this was done in our development environment. Once I have actual experience with the machines in production maybe I’ll post again with more informative content.

Comments (2)

Server Naming Conventions

It has been a bit too long since the last post and I hope to make up for it with the particularly “startup” taste of this one.

When we got started as a company we didn’t have a lot of hardware of the server variety to worry about. We had a production web/application server, production database server, a corporate mail server, and a couple development servers meant to mimic the production environment. Such a small number of machines is easy to keep track of and there was no need to create names more unique than “prodweb”, “proddb”, etc.

Though we still couldn’t fill a full rack with equipment the machine count did get large enough that a better naming scheme was necessary. After a fairly short brainstorming session we unanimously agreed that beer provided the best namespace for new machines. With that decision made we refined the naming scheme by assigning regions to machine roles. A few samples:

  • Database servers, generally the strongest and best, are named after Belgian beers
  • Web/Application servers are named after English beers
  • Servers for QA purposes are named after cheap American macrobrews

The naming scheme alone is pretty excellent but a few weeks later we came up with a further refinement of the system. Thus far our servers have come from Dell. The price combined with the availability of 2 - 4 hr hardware replacement (Silver and Gold support packages can generally be negotiated to low or no cost) make up most of that decision. The reason the vendor matters here is that Dell servers ship with sharp looking gray bezels that snap into the front of the servers and cover the inputs and drives to give a rack of machines a consistent, clean look. Here is a picture of one if you are unfamiliar with Dell servers.

bezel.jpg


The Dell logo in their center is almost exactly the same diameter as the cap from a beer bottle. The reason this is wonderful should be obvious. We realized we could remove the Dell logos, replace them with the beer bottle caps matching individual server names and have unique, physical identifiers for our servers. Here is the approach:

Step 1: Remove the Dell logo

remove.gif

The logos are glued to a ring of rubber-like material that itself is glued to the bezel. The best way to remove them is to flip the bezel over and insert a screwdriver through the larger of the slots behind the logo. Pressing firmly here will separate the logo partly from the rubber. Next flip the bezel back over and use the screwdriver to pry the logo away fully. It doesn’t matter if the rubber ring remains or not as the bottle caps will fit over it fine.

Step 2: Attach the bottle cap

supplies.gif

The best adhesive for attaching a bottle cap appears to be rubber cement. It avoids taking the stronger step of using super glue and dries clear. You will need to apply a solid layer and let the cap sit in place for several minutes before picking up the bezel and placing it back on the machine.

Here are a few of our servers now:


newcastle.gif

full_stack.gif

It is possible that a similar naming and labeling scheme wouldn’t be allowed in some environments, but we’ve enjoyed it enough that we couldn’t hoard the idea. You could take the same approach with most of Dell’s equipment, but the logos placed on some of their products are larger and the bottle cap can look out of place.

Comments (1)