Posts Tagged ‘monitoring’

Oddities in Gathering Windows Performance Data

February 17th, 2009

At Zenoss we do quite a bit of remote monitoring of computers running Windows. In the Enterprise edition of the product, we collect raw performance counter data using the conventional remote Windows Registry APIs.

We ran into an issue recently with a customer running Windows 2000 where the data from the remote server was being truncated prematurely. Since we implement our own remote API (so we can run natively on Linux and with Python, rather than requiring Windows), there was some immediate concern that we had run into a low-level bug in our protocol implementation. Thanks to the release of the Windows Communications Protocols (MCPP) last year, we have great detail on how our API layer should function.

Reviewing our implementation against the MCPP in detail showed no bugs against the specification, but I did notice some odd behavior. Normally when using the RegQueryValue API you specify a NULL buffer pointer and a zero-length buffer size so that the call will provide the actual size of the buffer needed. With this particular customer’s server I noticed that the call wasn’t behaving as documented in the MCPP.

An error code of ERROR_MORE_DATA was being returned. The MCPP says that when this value is returned the server will populate the size output variable with the actual size in bytes of the needed buffer. In this case, the returned size was always the same as the input size. After some experimentation I found that if I passed in a buffer approximately 64 Kbytes larger, the call would finally succeed.
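For contrast, here is a minimal sketch of the conventional size probe as the text says the specification describes it. `fake_query_value` and `read_value` are hypothetical names invented for illustration; the fake simply stands in for the remote call and is not part of any real API.

```python
ERROR_SUCCESS = 0
ERROR_MORE_DATA = 234  # Win32 error code for "more data is available"


def fake_query_value(stored, buffer_size):
    """Hypothetical stand-in for the remote query; not a real API.

    Returns (status, size, data). A too-small buffer yields
    ERROR_MORE_DATA together with the size that is actually needed.
    """
    if buffer_size < len(stored):
        return ERROR_MORE_DATA, len(stored), b""
    return ERROR_SUCCESS, len(stored), stored


def read_value(stored):
    # First call: zero-length buffer, only to learn the required size.
    status, needed, data = fake_query_value(stored, 0)
    if status == ERROR_MORE_DATA:
        # Second call: a buffer of exactly the reported size.
        status, needed, data = fake_query_value(stored, needed)
    if status != ERROR_SUCCESS:
        raise OSError(status, "query failed")
    return data[:needed]
```

The customer's server broke this pattern at the first step: the size that came back with ERROR_MORE_DATA was just the input size, so there was nothing useful to allocate for the second call.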

While quite odd, this is actually the documented and expected behavior in the Win32 API documentation for RegQueryValueEx, but not in the MCPP. Specifically, when using the HKEY_PERFORMANCE_DATA key, ERROR_MORE_DATA behaves differently and the caller has more responsibility for guessing an appropriate buffer size.

The following pseudo-code shows the basic flow for how RegQueryValueEx should be used for either local or remote performance data access.

size = 65536  # starting size, probably computed from a previous registry call
params.in.data = params.out.data = buffer(size)
while True:
    params.in.size = size
    params.out.size = 0
    dcerpc_winreg_QueryValue(params)
    if params.out.result == ERROR_MORE_DATA:
        # the returned size is no help here, so grow by a fixed step and retry
        size = size + 65536  # add another 64 Kbytes to the buffer
        params.in.data = params.out.data = buffer(size)
        continue
    break

After fixing that issue I was still left with one oddity. Let’s say, for example, that the buffer had to grow to 293,500 bytes before the RegQueryValueEx call was successful. The actual amount of returned data would then be only 195,000 bytes, or something similar. This behavior seems quite different from what we have seen on the other Windows operating systems we have tried so far.
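The grow-and-retry loop and the size mismatch can be reproduced with a small simulation. This is only a sketch under stated assumptions: `make_fake_server` and `query_with_growth` are hypothetical helpers, the fake server imitates the behavior described in this post rather than any real Windows call, and the `max_size` cap is an added safety limit that the original pseudo-code does not have.

```python
ERROR_SUCCESS = 0
ERROR_MORE_DATA = 234  # Win32 error code for "more data is available"


def make_fake_server(required_size, payload_size):
    """Hypothetical server: refuses buffers below required_size, then
    returns only payload_size bytes (the oddity described above)."""
    def query(buffer_size):
        if buffer_size < required_size:
            # The size comes back unchanged, so it gives the caller no hint.
            return ERROR_MORE_DATA, buffer_size, b""
        return ERROR_SUCCESS, payload_size, b"\x00" * payload_size
    return query


def query_with_growth(query, start_size=65536, step=65536,
                      max_size=16 * 1024 * 1024):
    """Retry with a buffer that grows by a fixed step until the call succeeds."""
    size = start_size
    attempts = 0
    while size <= max_size:
        attempts += 1
        status, out_size, data = query(size)
        if status == ERROR_MORE_DATA:
            size += step  # add another 64 Kbytes and try again
            continue
        if status != ERROR_SUCCESS:
            raise OSError(status, "query failed")
        return data[:out_size], size, attempts
    raise RuntimeError("gave up: buffer would exceed max_size")


# Figures taken from the example in the text.
server = make_fake_server(required_size=293500, payload_size=195000)
data, final_size, attempts = query_with_growth(server)
```

With those figures the loop makes five attempts and ends with a 327,680-byte buffer, while only 195,000 bytes actually come back.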

This is the first time we’ve tried our data collection against a Windows 2000 server running Exchange locally. Windows 2000 has also been the source of several other key behavior differences in how performance data is returned, so my current speculation is that how the server determines what data to return varies greatly between operating system versions. We normally query the performance counter registry for only a subset of values. It may well be that Windows 2000 requires a buffer large enough to retrieve all performance counters, even though the completed call actually uses quite a bit less.

Quirky, but another bug gone.

Using Zenoss Core to Watch the Home Network

November 25th, 2008

Zenoss is an open-source infrastructure management product. Normally used by institutions to watch their networking and server infrastructure, it also is used in smaller, less mission-critical scenarios. Scenarios like mine: I want to monitor my home & home office infrastructure since it has grown over time to contain a fair number of devices.

Without open-source alternatives like Zenoss, monitoring an extensive home or small office network is cost-prohibitive, so it often just isn’t done.

Since I also work for Zenoss, I’m often using various versions of Zenoss on my development systems to watch the home network. But these installations are not stable and long-running. One alternative I’ve considered is to install a new Virtual Machine on my home workstation that is a Zenoss appliance, but the issue here is that my home workstation is not always completely stable. I want my Zenoss install to be stable and relatively free from interference from my other activities. A dedicated server is the answer.

Normally I have enough spare computer parts in the closet that I could cobble together a working machine, but no such luck this time. Since I was building a server from scratch, I really did not want to go and build another large machine that was power-hungry, loud or took up a lot of space. Intel has come to the rescue with the new Atom-based systems that are very low power and use the mini-ITX form factor.

I wound up choosing the Intel D945GCLF2 main-board that comes with a dual-core Intel Atom 330 already installed. I configured it with 2GB of DDR2 667 memory. I picked an Apex MI-100 case with a 250W power supply for it all to live in. Since this case is passively cooled, the only noise in the system is the small fan on the main-board’s memory controller (the CPU is passively cooled!) and any noise from the hard drive. I did have some spare hard drives lying around, so I picked an older Western Digital 150GB Raptor to help with system speed. Total out of pocket cost since I already had the drive? $171.22 shipped from Newegg.

IMG_0483

As you can see from the relative size of the hard drive, the whole system is tiny. I don’t need an optical drive, so if I wanted to put another 3.5″ hard drive in the case for mirroring purposes I could easily do so. Chances are better that I’ll eventually put a low-cost solid-state drive in the system, as a 16 or 32 GB size would be plenty.

IMG_0487

I put Ubuntu Server on the system using the network-based installer, so I didn’t need to put an optical drive in the system. Rather than mess around with a network boot (which is great when it works, but I never have a working server setup to host the boot files) I just put the boot image on a USB memory stick and booted the core installation from there. The only issue here was a drive number reordering problem once the system was rebooted and the memory stick removed, but that was easily fixed.

Once running, the system is effectively like any other x86-based Linux server. It’s not even slow, as the 1.6 GHz dual-core Atom supplies plenty of computing power. It’s clearly not as fast as current or previous generation desktop or server chips, but for building Zenoss I don’t find it any slower than the virtual machines in Zenoss’s development VMware farm. In many ways, I like having a slower system here to use because it helps keep me mindful that not every user has plenty of computing power to throw at our software.

I decided to run Zenoss using a source-based install, but instead of running off the trunk I picked the 2.3.x branch to start with. This branch is from the version we just released, which is proving to be our best release yet. I’ll switch to new releases using the source install but avoid the trunk, given the large amount of turmoil that can occur there as new features are integrated into the product.

After getting Zenoss running I let it discover my home network. As expected, it found nearly all of the devices I have (13 out of the 16). There will be some configuration to do on some of the devices to enable true monitoring, but this is a great start.

ZenossHomeNetwork