Monday, July 30, 2012

Intel InfiniBand Solution - TrueScale InfiniBand


Information on Intel® TrueScale InfiniBand Solutions.
You probably already know about Intel's acquisition of QLogic's InfiniBand business. Intel has not published much detailed information on its site yet, but you can find similar information on the QLogic site, albeit without the Intel-specific part numbers.

  1. Intel® TrueScale InfiniBand Edge and Director Switches
  2. Intel® TrueScale InfiniBand Fabric Management and Software Tools 
  3. Intel® TrueScale InfiniBand Host Adapters
  4. Intel® InfiniBand Cables

Sunday, July 29, 2012

Basic Active Directory Authentication with Centrify Express for CentOS 6

Centrify Express is a comprehensive suite of free Active Directory-based integration solutions for authentication, single sign-on, remote access, file sharing, and monitoring. In this tutorial, you will learn how to install Centrify Express on CentOS.

Do read Basic Active Directory Authentication with Centrify Express for CentOS 6.

Thursday, July 26, 2012

Another look at Changing hostname for CentOS

This is an extension of the article Changing the hostname on CentOS. You can replace Steps 2 and 3 in that article with a one-line hostname command:

# hostname www.hostserver.com

To check the hostname:
# hostname

www.hostserver.com
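
Keep in mind that the hostname command only changes the name of the running system. The sketch below compares it with the name configured for the next boot. It assumes the persistent name is stored in a HOSTNAME= line in /etc/sysconfig/network, as on CentOS 5 and 6; the function name is made up for this example.

```shell
#!/bin/sh
# Sketch: compare the running hostname with the one configured for boot.
# Assumption: the persistent name lives in a HOSTNAME= line of
# /etc/sysconfig/network (CentOS 5/6 layout).

# Print the HOSTNAME value found in the given network config file.
configured_hostname() {
    sed -n 's/^HOSTNAME=//p' "$1" | tail -n 1
}

conf=${1:-/etc/sysconfig/network}
if [ -r "$conf" ]; then
    echo "running:    $(hostname)"
    echo "configured: $(configured_hostname "$conf")"
fi
```

If the two names differ, the running name will most likely be lost at the next reboot.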



Saturday, July 21, 2012

Compiling QUEST with Intel XE Compilers on CentOS



This is a continuation of Compiling QUEST with GNU Compilers on CentOS. If you are planning to use the Intel XE compilers instead, you have to edit the make.inc file. Here are my settings. Do note that there are major changes between Intel XE and previous versions of the Intel compilers. Do look at Linking Applications with Intel MKL version 10.

At make.inc, line 23
# Intel Fortran compiler
FC = ifort
FC_FLAGS    = -O3 -warn -openmp
NOOPT_FLAGS = -O0 -warn -openmp

At make.inc, line 46
# Intel C++ compiler
CXX = icc
CXX_FLAGS = -O3 -openmp $(CUDAINC) $(MAGMAINC)

At make.inc, line 56
# Intel MKL library
MKLPATH   = /opt/intel/mkl/10.2.6.038/lib/em64t
LAPACKLIB = -L$(MKLPATH) -lguide -lpthread -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core


Finally, follow the verification process as stated in Compiling QUEST with GNU Compilers on CentOS.

Compiling QUEST with GNU Compilers on CentOS


QUEST (QUantum Electron Simulation Toolbox) is a Fortran 90/95 package that implements the Determinant Quantum Monte Carlo (DQMC) method for quantum electron simulations.

Compiling under gcc is a complete breeze. Just do the following:
# tar -xzf QUEST-1.3.0.tgz

# cd QUEST-1.3.0

# make
For different environments, please edit the make.inc file to suit your system. For GNU, you do not need to edit it.

For testing, you can go to QUEST-1.3.0/EXAMPLE/verify
# ./verify

.....
.....
Parameters : t =  1.00, mu = -0.50, U =  0.00, beta =  1.50,
============================================================================
                    Theoretical | Computed (avg +- error) |  |T-C| : error
----------------------------------------------------------------------------
          Density :   0.828924  |  0.828924 +-  0.000000  |    0.00 :  0.00
           Energy :  -0.964017  | -0.964017 +-  0.000000  |    0.00 :  0.00
============================================================================
 91.11% within 1 error bar (Expected 63.2%).
100.00% within 2 error bar (Expected 86.5%).
 Running time:   322.1790     (second)

Friday, July 20, 2012

Encountering Warning No xauth data; using fake authentication data for X11 forwarding

I encountered this warning recently while trying to forward X11 from a remote site:

"Warning: No xauth data; using fake authentication data for X11 forwarding."
and no picture was displayed.

These are the steps I took to troubleshoot:
  1. I checked my /etc/ssh/sshd_config and noted that I have "X11Forwarding yes".
  2. In my .ssh/config, I have "ForwardX11 yes".
  3. But one of the parameters in /etc/ssh/sshd_config was "X11UseLocalhost yes". Apparently, I was able to forward X11 for hosts specified in my /etc/hosts file, but for hosts outside my hosts file, the picture was not displayed.
  4. Once I changed it to "X11UseLocalhost no", the issue was resolved.
A user explained this quite well in this post (http://www.authsecu.com/nntp/comp-security-ssh/19540-comp-security-ssh-what-does-%22x11uselocalhost-no%22-do.htm):

When doing X forwarding, sshd listens on a TCP socket for connections from X clients. Normally, it will accept connections addressed to the loopback address only (127.0.0.1), restricting it to clients on the local host. X11UseLocalhost no means it will accept connections from anywhere. 
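
To review the X11-related settings quickly, something like the sketch below may help. It prints only the uncommented X11Forwarding, X11UseLocalhost and X11DisplayOffset lines of an sshd_config-style file. The function name is made up for this example, and the default path assumes a stock CentOS layout.

```shell
#!/bin/sh
# Sketch: show the active (uncommented) X11 lines of an sshd_config file.
# Keyword matching is case-insensitive, as sshd keywords are.
x11_settings() {
    grep -iE '^[[:space:]]*(X11Forwarding|X11UseLocalhost|X11DisplayOffset)[[:space:]]' "$1"
}

conf=/etc/ssh/sshd_config   # assumed default location
if [ -r "$conf" ]; then
    x11_settings "$conf" || echo "no explicit X11 settings in $conf"
fi
```

After changing any of these values, sshd has to be restarted (for example with service sshd restart) before new sessions pick them up.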

Wednesday, July 18, 2012

Locating executable or binary for a command

Linux can be fun and yet easy. You most probably know this command already, but it is very useful when you want to see where a binary is located. The command is whereis.

1. If you are looking for the binary of a particular program, for example R:
$ whereis -b R

R: /usr/bin/R /usr/lib/R /usr/local/bin/R /usr/include/R /usr/share/R


2. If you are looking for the source of a particular program, for example R:
$ whereis -s R


3. If you are looking for the manual of a particular program, for example R:
$ whereis -m R

R: /usr/share/man/man1/R.1.gz
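
whereis lists every matching file in the locations it searches. If you want to know which binary the shell would actually run, command -v is a useful companion. The sketch below shows it, together with a small helper (a made-up name) that walks $PATH in lookup order. ls is used only as a stand-in for any command name.

```shell
#!/bin/sh
# Sketch: find what the shell would run, and every match on $PATH.
cmd=ls   # stand-in; replace with the command you are interested in

# The path (or builtin/alias name) the shell resolves the command to.
command -v "$cmd"

# Print every executable file named $1 on $PATH, in lookup order.
find_on_path() {
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        candidate=${dir:-.}/$1   # an empty PATH entry means the current directory
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            echo "$candidate"
        fi
    done
    IFS=$old_ifs
}

find_on_path "$cmd"
```

The first line printed by find_on_path is normally the one that gets executed. Later lines are shadowed copies, which is handy to know when several versions of R are installed.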

Tuesday, July 17, 2012

Issues arising when a node has multiple queues with Torque

I noticed that with Torque/MAUI, when compute nodes belong to more than one queue, Torque/MAUI may wrongly conclude that the resource pool does not have sufficient resources.

I'm using Torque 2.5.3 and MAUI 3.3.1.

For example, suppose /var/spool/torque/server_priv/nodes assigns your nodes to queues like this:
node01 np=8 queue1 queue2
node02 np=8 queue1 queue2
node03 np=8 queue2 queue3
node04 np=8 queue2 queue3

If you submit a job to queue2, something like

$ qsub -q queue2 -l nodes=3:ppn=8 -v file=my_mpi_file openmpi.sh

Based on the resources in queue2, there should be enough (all four nodes belong to queue2), but somehow MAUI sees the resources as insufficient. One of the best ways to identify the issue is to use checkjob. See Using MAUI checkjob command.

It is recommended that each compute resource is tagged to only one queue to prevent Torque/MAUI miscalculation.
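
To find the nodes that are tagged to more than one queue, a small awk filter over the nodes file can help. This is only a sketch: it assumes every word after the node name that does not contain "=" is a queue name, as in the example above, whereas real nodes files may carry other properties as well. The function name is made up for this example.

```shell
#!/bin/sh
# Sketch: list nodes that carry more than one queue tag in a Torque nodes file.
# Assumption: words without "=" after the node name are queue names.
multi_queue_nodes() {
    awk '
        NF == 0    { next }                 # skip blank lines
        $1 ~ /^#/  { next }                 # skip comment lines
        {
            n = 0; queues = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /=/) continue      # skip np=8 and similar settings
                n++
                queues = queues " " $i
            }
            if (n > 1) print $1 ":" queues
        }
    ' "$1"
}

nodes_file=${1:-/var/spool/torque/server_priv/nodes}
if [ -r "$nodes_file" ]; then
    multi_queue_nodes "$nodes_file"
fi
```

With the four-node example above, every node would be reported, since each one belongs to two queues.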



Monday, July 16, 2012

Using MAUI checkjob command

The checkjob command is very useful for identifying job issues if you are using MAUI or MOAB.

1. One of my favourite commands to check why a job is idle is
$ checkjob -v 6235
The above command displays the reasons why an idle job is blocked, ignoring node state and current node utilization constraints:
node-c00              accepted : 8 tasks supported
node-c02              accepted : 8 tasks supported
node-c03              accepted : 8 tasks supported
node-c04              accepted : 8 tasks supported
node-c05              accepted : 8 tasks supported
node-c06              accepted : 8 tasks supported
node-c07              rejected : CPU
node-c08              rejected : CPU
node-c09              rejected : CPU
Some of the reasons why a node is rejected are listed below. The list is not exhaustive.

Features - queue or features do not meet the requirements
CPU - CPU does not match the requirements
State - used by other jobs

For more information, see Adaptive Computing checkjob. Do note that if you are using MAUI, some of the checkjob features are available only in the commercial scheduler.
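
On a large cluster the per-node list gets long, so it can help to tally it. The sketch below counts accepted nodes and groups rejected nodes by reason. It assumes the per-node lines look exactly like the output shown above (node name, accepted or rejected, a colon, then the reason); the function name is made up for this example.

```shell
#!/bin/sh
# Sketch: tally the per-node lines of "checkjob -v" output read from stdin.
# Assumption: field 2 is accepted/rejected and field 4 is the rejection reason.
summarize_nodes() {
    awk '
        $2 == "accepted" { accepted++ }
        $2 == "rejected" { rejected[$4]++ }
        END {
            printf "accepted: %d\n", accepted
            for (r in rejected) printf "rejected (%s): %d\n", r, rejected[r]
        }
    '
}

# Demo on a few lines taken from the output above.
summarize_nodes <<'EOF'
node-c05              accepted : 8 tasks supported
node-c06              accepted : 8 tasks supported
node-c07              rejected : CPU
EOF
```

In practice you would pipe the real output into it, for example checkjob -v 6235 | summarize_nodes.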


Tuesday, July 10, 2012

An error occurred while loading or saving configuration information for gnome-settings-daemon

When an account is deleted and a new account with the same username is then created on the system, the GConf configuration system may fail to handle the configuration for gnome-settings-daemon and other components.

The error will look like

An error occurred while loading or saving configuration information for gnome-settings-daemon. Some of your configuration settings may not work properly.

You might lose a couple of icons as well, and the screen seems unusable.

To solve this issue, you might have to remove the user's gconfd-username directory found in /tmp, for example:
# rm -rf /tmp/gconfd-session/

If you are using xrdp, do
# service xrdp restart

Similarly, if you are using VNC, remember to kill your old session and create a new one.

For more information, do look at

What can I do if I get 'An error occurred while loading or saving configuration information for gnome-settings-daemon' when I run a GNOME session?

Sunday, July 8, 2012

Enabling Server Side Includes - SSI on Apache 2

Enabling Server Side Includes for Apache is a very simple process.

First things first, you have to make sure Apache knows which files should be parsed using SSI.

At /etc/httpd/conf/httpd.conf, ensure the following has been done.
AddType text/html .shtml
AddHandler server-parsed .shtml

The next thing is to ensure that the Directory section of the Apache configuration for the location where the .shtml files reside contains the following. This is especially true if you set AllowOverride.

<Directory /home/tester/public_html>
Options +Includes
AllowOverride All
Order allow,deny
Allow from all
</Directory>
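
A quick way to check the result is a tiny test page. The sketch below writes one that uses the mod_include echo directive. The function name and the default output path are made up for this example; pass the real target, such as a file under the directory configured above, as the first argument.

```shell
#!/bin/sh
# Sketch: write a minimal SSI test page.
# If SSI is active, the echo directive is replaced by the server's local time.
# If it is not, the directive reaches the browser unchanged (see "view source").
write_ssi_test() {
    cat > "$1" <<'EOF'
<html>
<body>
<p>Server time: <!--#echo var="DATE_LOCAL" --></p>
</body>
</html>
EOF
}

# Hypothetical default target; override it with the first argument.
write_ssi_test "${1:-${TMPDIR:-/tmp}/ssi-test.shtml}"
```

Before loading the page, it is worth running apachectl configtest and restarting Apache (service httpd restart) so that the changed configuration is actually in effect.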

For more information and resources, see
  1. Apache Module mod_include
  2. Why my Apache Server Side Include (SSI) is not working?

Friday, July 6, 2012

Inserting leap second causing kernel to hang for CentOS 5.7

I encountered an interesting problem today: an error on one of my servers.

kernel: Clock: inserting leap second 23:59:60 UTC

It resulted in the hanging of one of my nodes. There is a report on this: Bug 479765 - Leap second message can hang the kernel. Apparently, the CPU ran into a

According to the report, the widely circulated fix below, run as root, fixes the symptoms

# date -s "`date -u`"


Using ntpdate in debugging mode

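The ntpdate command sets the date and time via NTP. For more information on how to set up NTP, do look at Configuring NTP Server and Client on CentOS 5.x.

I found that the simple -d flag for ntpdate is very helpful. It enables debugging mode, in which ntpdate goes through all the steps and prints them but does not adjust the local clock.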
# ntpdate -d 0.centos.pool.ntp.org


Looking for host 0.centos.pool.ntp.org and service ntp
host found : 202-150-213-154.rev.ne.com.sg
transmit(202.150.213.154)
.....
.....
reference time:    d3a041ca.66da1dd6  Fri, Jul  6 2012  0:39:38.401
originate timestamp: d3a04688.94d79870  Fri, Jul  6 2012  0:59:52.581
transmit timestamp:  d3a04688.93710d5e  Fri, Jul  6 2012  0:59:52.575



Resources:
How do I join pool.ntp.org?

Monday, July 2, 2012

Configuring Torque Submission Node

If you are planning to have more nodes where users can submit jobs apart from the Head Node of the cluster, you may want to configure a Submission Node. By default, TORQUE only allows one submission node. There are 2 ways to configure this submission node. One way is by using RCmd authentication, the other is by using the "submit_hosts" parameter in the Torque Server.

We will focus on the "submit_hosts" parameter in the Torque Server for this blog entry. For more information, see my other blog Configuring Torque Submission Node.
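
As a rough sketch of the submit_hosts approach, the script below prints (but does not run) the qmgr commands that would add each host to the server's submit_hosts list. The host names are placeholders and the function name is made up for this example. The printed commands are meant to be run as a Torque administrator on the server.

```shell
#!/bin/sh
# Sketch: print the qmgr commands that add hosts to the submit_hosts list.
# Nothing is executed here; review the output, then run it on the Torque server.
submit_host_cmds() {
    for host in "$@"; do
        printf 'qmgr -c "set server submit_hosts += %s"\n' "$host"
    done
}

# Placeholder host names; replace them with your own submission nodes.
submit_host_cmds login01.example.com login02.example.com
```

Afterwards, qmgr -c "print server" should list the new submit_hosts entries.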