Thursday, March 31, 2011

/usr/bin/ld cannot find -lf2c for CentOS 5

If you encounter the error "/usr/bin/ld: cannot find -lf2c", the linker cannot find the f2c library, most likely because the f2c package is not installed. Download f2c-20031026-3.0.1.el5.x86_64.rpm from the ATrpms repository for CentOS 5 (RHEL 5) and install it:

# wget http://dl.atrpms.net/el5-x86_64/atrpms/stable/f2c-20031026-3.0.1.el5.x86_64.rpm
# rpm -Uvh f2c-20031026-3.0.1.el5.x86_64.rpm
# ldconfig
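
To see whether the library is where the linker would look, a quick check can help. This is only a rough sketch: find_lib is a made-up helper name, and the directories passed to it are just a few common library locations, not the linker's full search path.

```shell
# Print every libNAME.so / libNAME.a found in the given directories.
# find_lib is a hypothetical helper, not part of any package.
find_lib() {
    name=$1; shift
    for dir in "$@"; do
        for f in "$dir/lib$name.so" "$dir/lib$name.a"; do
            if [ -e "$f" ]; then echo "$f"; fi
        done
    done
}

# Look for libf2c in a few common library directories (assumed locations).
find_lib f2c /usr/lib64 /usr/lib /usr/local/lib
```

If nothing is printed, the library is not in any of the listed directories.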

Wednesday, March 30, 2011

Compiling MPI BLACS on CentOS 5

An interesting article from Linux Cluster on Compiling MPI BLACS on CentOS 5. BLACS is compiled with OpenMPI 1.4.x with g77 and gfortran.

For more information see Compiling BLACS on CentOS 5

Compiling LAPACK on CentOS 5

Download the latest stable version of LAPACK (lapack-3.3.0.tgz) from http://www.netlib.org/lapack/
# cd /root
# tar -xzvf lapack-3.3.0.tgz
# cd /root/lapack-3.3.0
# cp make.inc.example make.inc
Edit make.inc. Assuming ATLAS was installed as described in Compiling ATLAS on CentOS 5, comment out the default BLASLIB and point it to the ATLAS libraries
#BLASLIB = ../../blas$(PLAT).a
BLASLIB = /usr/local/atlas/lib/libf77blas.a /usr/local/atlas/lib/libatlas.a
Compile lapack package
# make
Copy the libraries to /usr/local/lapack/lib
# mkdir -p /usr/local/lapack/lib
# cp /root/lapack-3.3.0/*.a /usr/local/lapack/lib
# cd /usr/local/lapack/lib/
# chmod 555 *.a
Other related Information
  1. Compiling ATLAS on CentOS 5

Tuesday, March 29, 2011

Compiling ATLAS on CentOS 5

This tutorial is to help you compile ATLAS (Automatically Tuned Linear Algebra Software) with gfortran. For those who are using the Intel Compiler, you have the reliable Intel MKL (Math Kernel Library).

First things first, some comparison between ATLAS and MKL.

ATLAS
The Automatically Tuned Linear Algebra Software (ATLAS) provides a complete implementation of the BLAS API 3 and a subset of LAPACK 3. A large number of instruction-set-specific optimizations are used throughout the library to achieve peak performance on a wide variety of HW-platforms.

ATLAS provides both C and Fortran interfaces.

ATLAS is available for all HW-platforms capable of running UNIX or UNIX-like operating systems as well as Windows (tm).
MKL
Intel's Math Kernel Library (MKL) implements a set of linear algebra, fast Fourier transforms and vector math functions. It includes LAPACK 3, BLAS 3 and extended BLAS and provides both C and Fortran interfaces.

MKL is available for Windows (tm) and Linux (x86/i686 and above) only.
Download the latest stable package from ATLAS (http://sourceforge.net/projects/math-atlas/files/Stable/). The current stable version is atlas3.8.3.tar.gz. Do note that ATLAS cannot be configured inside its source directory, hence the need to create a separate ATLAS_BUILD directory.
# cd /root
# tar -xzvf atlas3.8.3.tar.gz
# mkdir /root/ATLAS_BUILD
# cd /root/ATLAS_BUILD
# /root/ATLAS/configure
You will need to turn off CPU throttling before running configure, because throttling makes the ATLAS timings unreliable. For CentOS and Fedora, use
# /usr/bin/cpufreq-selector -g performance
For more information, you can see my blog entry Switching off CPU Throttling on CentOS or Fedora

Compile ATLAS
make
make check
make ptcheck
make time
make install
By default, ATLAS is installed to /usr/local/atlas

Finally remember to add /usr/local/atlas/lib to your LD_LIBRARY_PATH
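
One way to do this in bash, as a small sketch: append the ATLAS library directory only if it is not already in the path. The directory is the default install location mentioned above; the variable name atlas_lib is just my own choice.

```shell
# Append the ATLAS library directory to LD_LIBRARY_PATH unless it is already there.
atlas_lib=/usr/local/atlas/lib    # default ATLAS install location (see above)

case ":${LD_LIBRARY_PATH}:" in
    *":${atlas_lib}:"*) ;;        # already present, nothing to do
    *) LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}${atlas_lib}" ;;
esac
export LD_LIBRARY_PATH

echo "$LD_LIBRARY_PATH"
```

You could put these lines in your ~/.bashrc so that the setting survives a new login.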

Friday, March 25, 2011

Switching off CPU Throttling on CentOS or Fedora

Under CentOS and Fedora, you can switch off CPU Throttling or "Dynamic Frequency Scaling" to maximise your CPU performance. For more information on CPU Throttling, you can also read Dynamic Frequency Scaling from Wikipedia. Just type the command

# /usr/bin/cpufreq-selector -g performance
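
To check which governor is active, you can read it from sysfs. A minimal sketch, assuming the kernel exposes cpufreq under /sys/devices/system/cpu (the usual Linux location); show_governors is a made-up helper name, and it prints nothing if cpufreq is not available.

```shell
# Print the scaling governor of every CPU found under a sysfs CPU directory.
# show_governors is a hypothetical helper; pass another directory as $1 to override.
show_governors() {
    cpu_root=${1:-/sys/devices/system/cpu}
    for f in "$cpu_root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue          # no cpufreq support, or the glob did not match
        cpu=${f#"$cpu_root"/}            # strip the leading directory
        cpu=${cpu%%/*}                   # keep only "cpuN"
        printf '%s: %s\n' "$cpu" "$(cat "$f")"
    done
}

show_governors
```

After switching off throttling, each CPU should report "performance".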

For Debian-based systems, you may want to take a look at
Looking at Xorg High CPU Usage Issue for netbook

Wednesday, March 23, 2011

Installing Gromacs 4.0.x on CentOS 5.x


GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles.

 
It is primarily designed for biochemical molecules like proteins, lipids and nucleic acids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions (that usually dominate simulations) many groups are also using it for research on non-biological systems, e.g. polymers

Do note that this Gromacs installation guide is for Gromacs 4.0.x. For detailed instructions, see GROMACS Installation Instructions. For installation of FFTW, you may want to take a look at the blog entry Installing FFTW

Since I'm using FFTW with MPI (OpenMPI to be exact) and configured FFTW with --prefix=/usr/local/fftw,

I've configured GROMACS as follows

# ./configure CPPFLAGS="-I/usr/local/fftw/include" LDFLAGS="-L/usr/local/fftw/lib" \
--with-fft=fftw3 --enable-mpi --disable-float
Some notes (assuming you are using bash)


  1. CPPFLAGS="-I/usr/local/fftw/include" points the preprocessor to the FFTW headers
  2. LDFLAGS="-L/usr/local/fftw/lib" points the linker to the FFTW libraries
  3. "--with-fft=fftw3" compiles with FFTW version 3
  4. "--enable-mpi" enables MPI
  5. "--disable-float" selects double precision
# make -j 8
where 8 is the number of cores.

# make mdrun
* If you only want to build the mdrun executable (in the case of an MPI build, i.e. if you have configured with "--enable-mpi")
# make install
* Installs all the binaries, libraries and shared data files
# make install-mdrun
* If you built only the mdrun executable, installs just mdrun

# make links
* If you want to create links in /usr/local/bin to the installed GROMACS executables
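
If you do not know the number of cores offhand, the value for make -j can be read from /proc/cpuinfo. This is just a sketch for Linux; it counts logical CPUs, falls back to 1 if the count cannot be determined, and only prints the make command instead of running it.

```shell
# Count logical CPUs from /proc/cpuinfo (Linux only); fall back to 1.
cores=$(grep -c '^processor' /proc/cpuinfo 2>/dev/null)
[ -n "$cores" ] && [ "$cores" -gt 0 ] || cores=1

# Show the resulting command; run it yourself inside the GROMACS source tree.
echo "make -j $cores"
```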

Tuesday, March 22, 2011

How to fix -fPIC errors

A very good article on -fPIC errors: see 3. HOWTO fix -fPIC errors by Gentoo Linux.

It is worth reading if you run into an error like "relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC .libs/assert.o: could not read symbols: Bad value".

The article lists four cases of -fPIC errors

Case 1: Broken Compiler
At least GCC 3.4 is known to have a broken implementation of the -fvisibility-inlines-hidden flag. The use of this flag is therefore highly discouraged, reported bugs are usually marked as RESOLVED INVALID. See bug 108872 for an example of a typical error message caused by this flag.
Case 2: Broken `-fPIC' support checks in configure
Many configure tools check whether the compiler supports the -fPIC flag or not. They do so by compiling a minimalistic program with the -fPIC flag and checking stderr. If the compiler prints *any* warnings, it is assumed that the -fPIC flag is not supported by the compiler and is therefore abandoned. Unfortunately, if the user specifies a non-existing flag (i.e. C++-only flags in CFLAGS or flags introduced by newer versions of GCC but unknown to older ones), GCC prints a warning too, resulting in borkage.

To prevent this kind of breakage, the AMD64 profiles use a bashrc that filters out invalid flags in C[XX]FLAGS

Case 3: Lack of `-fPIC' flag in the software to be built
This is the most common case. It is a real bug in the build system and should be fixed in the ebuild, preferably with a patch that is sent upstream. Assuming the error message looks like this:


Code Listing 6.1: A sample error message
.libs/assert.o: relocation R_X86_64_32 against `a local symbol' can not be used
when making a shared object; recompile with -fPIC .libs/assert.o: could not
read symbols: Bad value

This means that the file assert.o was not compiled with the -fPIC flag, which it should. When you fix this kind of error, make sure only objects that are used in shared libraries are compiled with -fPIC.
In this case, globally adding -fPIC to C[XX]FLAGS resolves the issue, although this practice is discouraged because the executables end up being PIC-enabled, too.

Case 4: Linking dynamically against static archives
Sometimes a package tries to build shared libraries using statically built archives which are not PIC-enabled. There are two main reasons why this happens:
Often it is the result of mixing USE=static and USE=-static. If a library package can be built statically by setting USE=static, it usually doesn't create a .so file but only a .a archive. However, when GCC is given the -l flag to link to said (dynamic or static) library, it falls back to the static archive when it can't find a shared lib. In this case, the preferred solution is to build the static library using the -fPIC flag too.

Sometimes it is also the case that a library isn't intended to be a shared library at all, e.g. because it makes heavy usage of global variables. In this case the solution is to turn the to-be-built shared library into a static one.

    Monday, March 21, 2011

    Resolving "specifies multiple packages" error when removing a package

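    If you use the good old rpm -e to remove a package, you may encounter an error, in my case "blas" specifies multiple packages. This happens when more than one installed package matches the name (for example, both the i386 and x86_64 builds), and rpm will refuse to remove them. To remove all matching packages, use --allmatches. Note that --nodeps also skips the dependency checks, so use it with care.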

    # rpm -e --nodeps --allmatches (package)
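
    Before removing anything, it can help to see which package names are installed more than once. A rough sketch: dup_names is a made-up helper that reads "name arch" lines and prints the names that occur more than once. The sample input below stands in for real rpm output.

```shell
# Read "name arch" lines on stdin and print each name that occurs more than once.
# dup_names is a hypothetical helper, not an rpm feature.
dup_names() {
    awk '{ count[$1]++ } END { for (n in count) if (count[n] > 1) print n }' | sort
}

# On a live system you would feed it from rpm:
#   rpm -qa --qf '%{NAME} %{ARCH}\n' | dup_names
# Here, sample input is used instead:
printf 'blas i386\nblas x86_64\nlapack x86_64\n' | dup_names
```

    Packages that are legitimately installed in several versions (the kernel, for instance) will show up in the list too, so treat the output as a hint only.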

    Sunday, March 20, 2011

    Resolving Single and Double Precision Discrepancy between pre-Nehalem Chipsets and Nehalem Chipsets

    One of our researchers was running a job on an SMP machine with older Intel processors, the Intel(R) Xeon(R) CPU X7460 @2.66GHz (code-named "Dunnington"), and we noticed that the single and double precision results agreed to about 5 decimal places.

    For example:
    0.623291xxxxxxx (Single Precision Code)
    0.623290xxxxxxx (Double Precision Code)

    One important thing to note is that the Intel Compiler is 11.x

    But if we run the same code on the newer Intel Nehalem architecture, the discrepancy between the single and double precision results is quite large. We noticed that the results already differ in the first decimal place.

    For example:
    0.523667xxxxx (Single Precision Code)
    0.4353836xxxxx (Double Precision Code)

    Similarly, the Compiler used is the Intel Compiler 11.x

    If we compare the results between the Dunnington Chipsets and the Nehalem Architecture, the discrepancy is really quite unacceptable.

    Well, the solution is actually quite easy: update the Intel Compiler to the latest Intel® Parallel Studio XE 2011 for Linux*. The discrepancy should be eliminated, and the single and double precision results should again be close to each other. The Intel® Parallel Studio XE 2011 for Linux* has the latest libraries for the Nehalem Architecture.

    For more information on where to download, do look at the Free Non-Commercial Intel Compiler Download

    Tuesday, March 15, 2011

    Torque Resource Manager Server Parameters

    Useful Information on Torque Resource Manager Server Parameters. Do take a look
    Torque Resource Manager Appendix B: Server Parameters

    Monday, March 14, 2011

    Dealing with stuck jobs in Torque and MAUI

    This is an add-on for the blog entry "Manually Deleting Torque and PBS jobs using MAUI"

    1. Force the Torque Server or MOM to send an obituary of the job ID to the server
    # qsig -s 0 job_id

    2. Use the momctl command on the compute nodes where the job is listed. You can use tracejob to check which nodes the job has been sent to
    # momctl -c job_id -h compute_node_1

    3i. Setting the qmgr server setting mom_job_sync to True might help prevent jobs from hanging.
    # qmgr -c "set server mom_job_sync = True"

    3ii. To verify that the setting in 3i is in place, you can use the command
    # qmgr -c "p s"

    4. The final option. If all else fails, purge the job with
    # qdel -p job_id

    For more information, see Adaptive Computing Website Section 11.1.7 Stuck Jobs

    Sunday, March 13, 2011

    Manually Deleting Torque and PBS jobs using MAUI

    Tracing Jobs
    To list the running jobs with MAUI commands, including the nodes the jobs are residing on, you can use the command
    # showq -r

    Alternatively, you can use the Torque tracejob command to trace the job activity
    # tracejob job_id


    Deleting Jobs
    To delete a job with MAUI commands, you can use the commands,
    # canceljob job_id

    Alternatively, you can also use PBS commands to delete a job
    # qdel job_id


    PBS mom control
    If you are unable to delete a stale job which has no process, you can use the momctl command to diagnose it. The momctl command allows remote shutdown, reconfiguration, diagnostics, and querying of the pbs_mom daemon. For more information on momctl, do look at momctl by http://www.clusterresources.com/:

    Example 1: Diagnosis of pbs_mom
    # momctl -h node1 -d 1
    Example 2: Cycle the pbs_mom on node1
    # momctl -h node1 -C

    Manually deleting the jobs
    To manually delete the jobs, you should first shut down the pbs server
    # service pbs_server stop

    Remove the job spool files
    # rm /var/spool/pbs/server_priv/jobs/111.host.SC 
    # rm /var/spool/pbs/server_priv/jobs/111.host.JB

    Restart the pbs_server
    # service pbs_server restart
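
    The file removal step can be wrapped in a small function so that both files of a job are handled together. This is only a sketch: purge_job_files is a made-up name, and the default directory is simply the path used above, which may differ on your installation.

```shell
# Remove the .SC and .JB files of one job from a Torque job directory.
# purge_job_files is a hypothetical helper; $2 overrides the assumed default directory.
purge_job_files() {
    job=$1
    jobs_dir=${2:-/var/spool/pbs/server_priv/jobs}
    for ext in SC JB; do
        f="$jobs_dir/$job.$ext"
        if [ -e "$f" ]; then
            rm -f -- "$f" && echo "removed $f"
        else
            echo "not found: $f"
        fi
    done
}

# Usage (only while pbs_server is stopped):
#   purge_job_files 111.host
```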

    Further Information:
    1. Deleting PBS/Maui Jobs

    Thursday, March 10, 2011

    VMware View for iPad is finally here!


    VMware View Client for iPad is finally here.

    VMware View Client for iPad makes it easy to access your Windows virtual desktop from your iPad with the best possible user experience on the Local Area Network (LAN) or across a Wide Area Network (WAN).

    Monday, March 7, 2011

    Fast Access to the last Directory accessed

    To switch quickly between 2 directories, instead of typing the whole path, you can use this command

    # cd -
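
    A short illustration of how it behaves, using / and /usr only as example directories. The shell keeps the previous directory in OLDPWD, and cd - normally prints the directory it switches to (redirected here to keep the output short).

```shell
cd /usr
cd /
cd - > /dev/null     # back to /usr
echo "$PWD"          # prints /usr
cd - > /dev/null     # and back to / again
echo "$PWD"          # prints /
```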

    Friday, March 4, 2011

    Installing Centrify Express on CentOS 5


    I tried installing the 64-bit Centrify Express on CentOS 5.4 x86_64 and it went quite smoothly.

    Prerequisites:
    1. You have root account and password
    2. In order for you to join the domain, you need an Active Directory account with permission to add computers to the domain

      Download Centrify Express:
      1. Go to Download Centrify Express
      2. You may also wish to look at the Centrify Express Linux Quick Start Guide (pdf) and Centrify Express Admin Guide

        Preparation for the Linux Box to join Centrify
        1. Change of Hostname for the Linux Computer. See blog entry Changing the hostname on CentOS

        2. Ensure your /etc/nsswitch.conf contains the following lines
        hosts: files dns 
        See man page for nsswitch.conf for more information on configuring for nsswitch

        3. Ensure your resolv.conf includes a DNS server that can resolve SRV records for your domain
        # less /etc/resolv.conf
        You should get something like
        search example.com
        nameserver 192.168.1.5

        4. Now you are ready to install
        # mkdir centrify-suite

        # mv centrify-suite-2011-rhel3-x86_64.tgz centrify-suite/

        # cd centrify-suite

        # tar -zxvf centrify-suite-2011-rhel3-x86_64.tgz

        # ./install-express.sh

        Respond to the installation prompt (Taken from Centrify Admin)

        How do you want to proceed? (E|S|X|C|Q) [X]:
        Accept the default, X (for Express Edition), by pressing Enter.

        Do you want to run adcheck to verify your AD
        environment? (Q|Y|N) [Y]:
        Accept the default answer, Y (to run adcheck), by pressing
        Enter.

        Please enter the Active Directory domain to check:
        Enter the fully qualified name of your AD domain; for example,
        ad.example.com

        Join an Active Directory domain? (Q|Y|N) [Y]
        Accept the default answer, Y to join a domain.

        Enter the Active Directory authorized user
        [administrator]:
        Enter the password for the Active Directory user:

        Press Enter to select the defaults for the following prompts:
        Enter the computer name: [QA1.sales.acme.com]
        Enter the container DN [Computers]:
        Enter the name of the domain controller [auto detect]:
        Reboot the computer after the installation (Q|Y|N) [Y]:

        You will see summary text similar to the following:

        You chose Centrify Suite Express Edition and entered the following:
        Install CentrifyDC 4.4.0 package: Y
        Install CentrifyDC-nis 4.4.0 package: N
        Install CentrifyDC-openssh 4.3.1 package: Y
        Install CentrifyDA 1.1.2 package: N
        Run adcheck : Y
        Join an Active Directory domain : Y
        Active Directory domain to join : ad.example.com
        Active Directory authorized user : administrator
        computer name : computername.ad.example.com
        container DN : Computers
        domain controller name : auto detect
        Reboot computer : Y

        If the join does not succeed during the installation, you can still try to do a direct Active Directory domain join.
        # adjoin ad.example.com -u admin_user --force

        Thursday, March 3, 2011

        Encountering LW_ERROR_LDAP_INSUFFICIENT_ACCESS [LW_ERROR_LDAP_INSUFFICIENT_ACCESS] on Likewise Open

        I was trying to join my Linux box to an MS Active Directory domain using Likewise Open from Likewise. Although I have permission to join computers to the MS Active Directory domain, the join failed when I used the command
        # ./domainjoin-cli --logfile logfile join --ou my_OU my_AD_Domain Administrators

        The stack trace that came out was

        Stack Trace:
        /builder/src-buildserver/Platform-6.0/src/linux/domainjoin/domainjoin-cli/src/main.c:937
        /builder/src-buildserver/Platform-6.0/src/linux/domainjoin/domainjoin-cli/src/main.c:493
        /builder/src-buildserver/Platform-6.0/src/linux/domainjoin/libdomainjoin/src/djmodule.c:332
        /builder/src-buildserver/Platform-6.0/src/linux/domainjoin/libdomainjoin/src/djauthinfo.c:722
        /builder/src-buildserver/Platform-6.0/src/linux/domainjoin/libdomainjoin/src/djauthinfo.c:1157
        20110302130337:WARNING:Short domain name not specified. Defaulting to 'mydomain'
        20110302130343:ERROR:LW_ERROR_LDAP_INSUFFICIENT_ACCESS [LW_ERROR_LDAP_INSUFFICIENT_ACCESS]

        I suspect that I may not have permission to set certain attributes (such as Description) on the computer account.