/mnt/config/hosts/<HOSTNAME>/config/condor_config.local
Network Installation of Condor
This page describes how to install Condor on a network-shared location. Installing Condor on the network allows for centralized configuration and convenient upgrades. The shared location should be on a high-availability file server, preferably a high-bandwidth, RAID-enabled NAS, that all members of the Condor pool can access.
Our pool uses a dedicated NAS server with a 2.7 TB RAID 5+1 drive configuration, which provides faster access to the data, especially for reads, than ordinary hard drive I/O.
Install Binaries
To install the binaries onto the tesla.cs.wlu.edu NAS, run these commands in the terminal:
cd /mnt/config/src/fedora64
sudo ./condor_configure --type=manager,submit,execute --central-manager=john.cs.wlu.edu --local-dir=/mnt/config/hosts/_default --install-dir=/mnt/config/release/x86_64_rhap_5 --install --verbose
Add Machines to Condor Pool
Add Local condor User
For the daemons to run correctly and for permissions to be set properly, a local condor user must be present on every member of the Condor pool. The condor user must have the following IDs:
condor UID = 1344
condor GID = 1610
First, check whether the condor user exists on the machine. Do this by running:
grep '^condor:' /etc/passwd
If you get a match, first reset its settings in case the user wasn't created correctly.
sudo groupmod -g 1610 condor
sudo usermod -c "Owner of Condor Daemons" -d "/var/lib/condor" -m -u 1344 -g condor -s "/sbin/nologin" -L condor
If you get a message saying that the directory /var/lib/condor already exists, run this command next:
sudo chown -R condor:condor /var/lib/condor
If you do not get a match, you need to manually add the user. To do this, run:
sudo groupadd -g 1610 condor
sudo useradd -c "Owner of Condor Daemons" -d "/var/lib/condor" -m -u 1344 -g condor -s "/sbin/nologin" condor
sudo usermod -L condor
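The two cases above (user present, user missing) can also be told apart in a script. The following is a minimal sketch and not part of the original procedure: the function name classify_condor_entry is hypothetical, and because it only reads /etc/passwd it will not see accounts supplied by a network directory service.

```shell
#!/bin/sh
# Expected IDs for the condor user, as listed on this page.
EXPECTED_UID=1344
EXPECTED_GID=1610

# classify_condor_entry LINE   (hypothetical helper)
# LINE is one passwd(5)-format record, or empty if no record was found.
# Prints "missing", "ok", or "mismatch".
classify_condor_entry() {
    line=$1
    if [ -z "$line" ]; then
        echo "missing"
        return 0
    fi
    # passwd(5) fields: name:password:UID:GID:comment:home:shell
    uid=$(printf '%s\n' "$line" | cut -d: -f3)
    gid=$(printf '%s\n' "$line" | cut -d: -f4)
    if [ "$uid" = "$EXPECTED_UID" ] && [ "$gid" = "$EXPECTED_GID" ]; then
        echo "ok"
    else
        echo "mismatch"
    fi
}

# Classify the entry on this machine; grep prints nothing if there is none.
classify_condor_entry "$(grep '^condor:' /etc/passwd)"
```

A result of "missing" corresponds to the groupadd/useradd branch, and "mismatch" to the groupmod/usermod branch.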
Just to be sure, run
ls -al /var/lib/condor
and verify that the entry . is owned by the condor user and belongs to the condor group. If not, you probably have a conflicting UID or GID and will have to set it manually. Choose one that is not in use by the local user system or by the network, and then set the CONDOR_IDS variable in that individual host's Condor local configuration file.1)
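CONDOR_IDS takes the form uid.gid. As a hedged illustration of the fallback described above, the sketch below prints such a setting. The helper name condor_ids_line and the IDs 2001 and 2002 are hypothetical placeholders, not values used by this pool.

```shell
#!/bin/sh
# condor_ids_line UID GID   (hypothetical helper)
# Prints a CONDOR_IDS setting in Condor's "uid.gid" form.
condor_ids_line() {
    printf 'CONDOR_IDS = %s.%s\n' "$1" "$2"
}

# Example with placeholder replacement IDs 2001 (UID) and 2002 (GID):
condor_ids_line 2001 2002

# To use the IDs actually assigned to the local condor user instead:
#   condor_ids_line "$(id -u condor)" "$(id -g condor)"
```

The printed line would then be added to that host's Condor local configuration file.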
Set Machine Variables
The problem with putting as much of Condor as possible on the NAS is that it introduces a lot of NFS traffic onto the network, especially when Condor jobs are running. With the user executables stored centrally on the NAS, all of the computers would be reading from the NAS almost constantly as the executables are opened and run.
The W&L Computer Science Department Systems Administrator, Steve Goryl, had a similar problem when all of the Linux applications on the lab computers were installed on, and run from, the central CS department server. This produced higher-than-expected traffic on the network, and the programs became laggy. Installing the applications on the lab computers' local hard drives was more of a pain administratively but gave much better overall performance.
We can still have good performance2) while keeping Condor centrally located: Condor's binaries and all of the configuration files live on the NAS, while the user executables of currently running Condor jobs are stored locally on the executing machine's hard drive. Condor's binaries stay on the NAS for the sake of easy upgrades, and job binaries are stored on and run from the execute machines' local hard drives.
To do this, we need to create certain directories on every machine, owned by the (local) condor user. These directories will serve as the playground for Condor jobs while they execute on a machine. We create the directories first and then tell Condor where they are and what to do with them.
sudo mkdir /var/lib/condor/execute
sudo chown -R condor:condor /var/lib/condor/execute
sudo chmod -R 755 /var/lib/condor/execute
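After running the commands above, the result can be checked with a short script. This is a sketch under stated assumptions: the helper name dir_matches is hypothetical, it relies on GNU stat (the -c option, as shipped on Fedora), and it compares numeric IDs against the 1344/1610 values given earlier on this page. Adjust those if a host uses different IDs.

```shell
#!/bin/sh
# dir_matches DIR UID GID MODE   (hypothetical helper)
# Succeeds only if DIR is a directory whose numeric owner, numeric group,
# and octal permission bits match the given values.
dir_matches() {
    [ -d "$1" ] || return 1
    # GNU stat: %u = owner UID, %g = group GID, %a = octal mode
    actual=$(stat -c '%u:%g:%a' "$1") || return 1
    [ "$actual" = "$2:$3:$4" ]
}

if dir_matches /var/lib/condor/execute 1344 1610 755; then
    echo "execute directory looks correct"
else
    echo "execute directory is missing or has the wrong owner, group, or mode"
fi
```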