Wednesday, October 31, 2012

Installing NFS4 on CentOS 5 and 6

Taken from Installing NFS4 on CentOS 5 and 6 (my alternative Linux Cluster Blog). This tutorial is a guide on how to install NFSv4 on CentOS 5 and 6.

Step 1: Installing the packages
# yum install nfs-utils nfs4-acl-tools portmap
Some facts about the tools above, as given by yum info:
nfs-utils - The nfs-utils package provides a daemon for the kernel NFS server and related tools, which provides a much higher level of performance than the traditional Linux NFS server used by most users.
This package also contains the showmount program. Showmount queries the mount daemon on a remote host for information about the NFS (Network File System) server on the remote host. For example, showmount can display the clients which are mounted on that host. This package also contains the mount.nfs and umount.nfs programs.
nfs4-acl-tools - This package contains command-line and GUI ACL utilities for the Linux NFSv4 client.
portmap - The portmapper program is a security tool which prevents theft of NIS (YP), NFS and other sensitive information via the portmapper. A portmapper manages RPC connections, which are used by protocols like NFS and NIS.
The portmap package should be installed on any machine which acts as a server for protocols using RPC.


Step 2: Export the file system from the NFS server (similar to NFSv3, except for the inclusion of fsid=0)
Add the following entries to /etc/exports:
/home           192.168.1.0/24(rw,no_root_squash,sync,no_subtree_check,fsid=0)
/install        192.168.1.0/24(rw,no_root_squash,sync,no_subtree_check,fsid=1)
The fsid option provides a number used to identify the filesystem. This number must be different for every filesystem in /etc/exports that uses the fsid option, so only one directory can be exported with each fsid value. In general the option is only necessary for exporting filesystems that reside on a block device with a minor number above 255; for NFSv4, fsid=0 has the additional meaning of marking the export that clients see as the root (/), which is why /home is mounted as 192.168.1.1:/ on the client.
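Since each fsid value has to be unique within /etc/exports, a quick check before running exportfs can catch a copy-and-paste mistake. The following is only a minimal sketch: the function name find_duplicate_fsids is made up for this illustration, and it assumes simple entries like the two above.

```shell
# Hypothetical helper: print every fsid=N value that appears more than once
# in an exports file (defaults to /etc/exports).
find_duplicate_fsids() {
    exports_file="${1:-/etc/exports}"
    awk '{
        sub(/#.*/, "")                      # ignore comments
        line = $0
        # Collect every fsid=... token on the line and count it.
        while (match(line, /fsid=[^,)]*/)) {
            seen[substr(line, RSTART, RLENGTH)]++
            line = substr(line, RSTART + RLENGTH)
        }
    }
    END { for (k in seen) if (seen[k] > 1) print k }' "$exports_file"
}

# Usage: no output means every fsid is unique.
# find_duplicate_fsids /etc/exports
```

If the function prints anything, fix the clashing entries before exporting.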

Export the file system
# exportfs -av

Start the NFS service
# service nfs start
If you also support NFSv3, you have to start portmap, because NFSv3 requires it (on CentOS 6, the rpcbind service takes its place). NFSv4 by itself does not need to interact with the rpcbind, rpc.lockd, and rpc.statd daemons. For a more in-depth understanding, see the Fedora documentation, Chapter 9. Network File System (NFS) – How it works.
# service portmap restart


Step 3: Mount the export on the client
# mount -t nfs4 192.168.1.1:/ /home

Tuesday, October 30, 2012

Updating the udev configuration on CentOS

This is an add-on to the blog entries:
  1. "Device eth0 does not seem to be present" on cloned CentOS VM
  2. Cannot get device settings No such device.
After modifying the udev configuration as described in the two blog entries, you can reload the new configuration into memory with the start_udev command:
# start_udev

Update the network configuration.
# service network restart
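To confirm that the interface came back after the two commands above, you can look for it under /sys/class/net. This is just a sketch: iface_present is a made-up helper name, and eth0 is assumed to be the interface you were fixing.

```shell
# Succeeds if the kernel currently exposes a network interface with this name.
# The optional second argument (a sysfs-style directory) is only there so the
# check can be tried against a test directory.
iface_present() {
    [ -e "${2:-/sys/class/net}/$1" ]
}

# After start_udev and the network restart, eth0 should be listed again:
# iface_present eth0 && echo "eth0 is present" || echo "eth0 is still missing"
```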


Further information:
  1. Look at Changing the ethX to Ethernet Device Mapping in EL6


Monday, October 29, 2012

Tools for OpenFlow


What is OpenFlow?

Taken from http://www.openflow.org/

OpenFlow is an open standard that enables researchers to run experimental protocols in the campus networks we use every day. OpenFlow is added as a feature to commercial Ethernet switches, routers and wireless access points – and provides a standardized hook to allow researchers to run experiments, without requiring vendors to expose the internal workings of their network devices. OpenFlow is currently being implemented by major vendors, with OpenFlow-enabled switches now commercially available.

Tools for Software Defined Network Controller
  1. Floodlight
  2. Nox


Wednesday, October 24, 2012

Go Parallel - Wonderful Resource for Parallel Software Development

The Go Parallel website, sponsored by Intel in partnership with Geeknet, is a portal for parallel software development, offering videos, tutorials, news and more.

Do check out the site.

Tuesday, October 23, 2012

NFS4 Information from the University of Michigan

NFSv4 information can be found at the University of Michigan Center for Information Technology Integration project, NFS Version 4 Open Source Reference Implementation.

I like the RFC 3530 definition of NFSv4 as quoted on the site:

The Network File System (NFS) version 4 is a distributed filesystem protocol which owes heritage to NFS protocol version 2, RFC 1094, and version 3, RFC 1813. Unlike earlier versions, the NFS version 4 protocol supports traditional file access while integrating support for file locking and the mount protocol. In addition, support for strong security (and its negotiation), compound operations, client caching, and internationalization have been added. Of course, attention has been applied to making NFS version 4 operate well in an Internet environment. 

Interesting and relevant information:

  1. NFSv4 wiki
    (Includes information on 4.1, pNFS prototype)
  2. Connectathon Test Suite
  3. General troubleshooting recommendations
  4. Performance and Stress tests for NFS

Monday, October 22, 2012

NFS4 Client unable to mount Server NFS4 file

When mounting an NFSv4 export from a CentOS 6 server on a CentOS 5 client, I received this error:
# mount -t nfs4 192.168.1.1:/tmp /home
mount.nfs4: 192.168.1.1:/tmp failed, reason given by server: 
No such file or directory.

My NFSv4 server exports:
/tmp   192.168.1.0/255.255.255.0(rw,no_root_squash,sync,no_subtree_check,fsid=0)

On the NFSv4 server, I had exported the file system and started the NFS service:
# exportfs -av
# service nfs start

I had this issue because I failed to understand the characteristics of NFSv4. NFSv4 uses a virtual file system to present the server’s exports and associated root filehandles to the client. The key idea is to look at what fsid=0 means on the NFS server: the directory exported with fsid=0 becomes the root (/) that clients see. For more information, do look at
A brief look at the difference between NFSv3 and NFSv4

The solution is to mount the export as the root (/), because /tmp is exported with fsid=0:
# mount -t nfs4 192.168.1.1:/ /home

With that, the mount succeeds.
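To make the fsid=0 behaviour concrete: the client has to name paths relative to the directory exported with fsid=0, not by their full path on the server. Below is a rough sketch of that translation, assuming the export setup shown in this post. nfs4_client_path is a made-up helper, not part of nfs-utils.

```shell
# Hypothetical helper: translate a server-side path into the path an NFSv4
# client should request, given the directory exported with fsid=0.
nfs4_client_path() {
    root="${1%/}"      # server directory exported with fsid=0, e.g. /tmp
    path="${2%/}"      # server-side directory the client wants
    case "$path" in
        "$root")   echo "/" ;;                  # the exported root itself
        "$root"/*) echo "${path#"$root"}" ;;    # strip the root prefix
        *)         echo "outside fsid=0 root: $2" >&2; return 1 ;;
    esac
}

# With /tmp exported as fsid=0, the client asks for "/" rather than "/tmp":
nfs4_client_path /tmp /tmp          # prints /
nfs4_client_path /tmp /tmp/scratch  # prints /scratch
```

This is why 192.168.1.1:/tmp fails with "No such file or directory": from the client's point of view there is no /tmp below the exported root.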

Friday, October 19, 2012

A brief look at the difference between NFSv3 and NFSv4

There are a few interesting differences between NFSv3 and NFSv4. Comparisons of NFSv3 and NFSv4 are quite hard to find; the information here is taken from the NFS Version 4 Open Source Project.
From a file system perspective, the differences are:
Export Management
  1. In NFSv3, the client must rely on an auxiliary protocol, the mount protocol, to request a list of the server’s exports and obtain the root filehandle of a given export. Once obtained, the root filehandle is fed into the NFS protocol proper.
  2. NFSv4 uses a virtual file system to present the server’s exports and associated root filehandles to the client.
  3. NFSv4 defines a special operation to retrieve the root filehandle, and the NFS server presents the appearance to the client that each export is just a directory in the pseudo file system (pseudofs).
  4. The NFSv4 pseudo file system is intended to provide maximum flexibility: export pathnames on the server can be changed transparently to clients.
State
  1. NFSv3 is stateless. In other words, if the server reboots, the clients can pick up where they left off. No state has been lost.
  2. NFSv3 is typically used with NLM, an auxiliary protocol for file locking. NLM is stateful, in that the server’s lockd keeps track of locks.
  3. In NFSv4, locking operations are part of the protocol.
  4. NFSv4 servers keep track of open files and delegations.
Blocking Locks
  1. NFSv3 relies on NLM. Basically, the client process is put to “sleep”. When a callback is received from the server, the client process is granted the lock.
  2. In NFSv4, the client is also put to sleep, but it polls the server periodically for the lock.
  3. The benefit of this mechanism is that only one-way reachability from client to server is required. However, it may be less efficient.
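The polling behaviour described under Blocking Locks can be pictured with a small local analogy. This is only a sketch of the poll-and-retry pattern, using mkdir as a stand-in for the lock request. It is not how the NFSv4 client is implemented, and the function name and its arguments are invented for the example.

```shell
# Illustration only: retry a lock request at a fixed interval, the way an
# NFSv4 client re-asks the server instead of waiting for a callback.
# mkdir is used as a local stand-in because it succeeds for one caller only.
poll_for_lock() {
    lock_dir="$1"       # directory whose existence means "lock held"
    interval="$2"       # seconds to sleep between attempts
    max_tries="$3"      # give up after this many attempts
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        if mkdir "$lock_dir" 2>/dev/null; then
            return 0    # lock granted
        fi
        tries=$((tries + 1))
        sleep "$interval"   # "put to sleep", then poll again
    done
    return 1            # still locked after max_tries polls
}

# Usage: poll every 5 seconds, at most 12 times, then release with rmdir.
# poll_for_lock /tmp/demo.lock 5 12 && { echo "got the lock"; rmdir /tmp/demo.lock; }
```

The trade-off matches the list above: the waiting side needs no callback path from the lock holder, but it may notice a released lock only at the next poll.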

Saturday, October 13, 2012

IBM Interconnect 2012 Live Stream

An IBM InterConnect 2012 event in Singapore

View replay in Livestream

  1. A source of global innovation, the art of the possible in growth markets
    John Dunderdale, vice president of Software, IBM Growth Markets
  2. Turning opportunities into outcomes
    Steve Mills, senior vice president and Group Executive, IBM Software & Systems
  3. Unleashing innovation: The new economics of IT
    Rod Adkins, senior vice president, IBM Systems & Technology Group
  4. Managing the velocity of Change
    Robert LeBlanc, senior vice president, IBM Middleware Software
  5. Reinventing relationships and uncovering new markets
    Mike Rhodin, senior vice president, IBM Software Solutions Group
    Jim Bramante, senior vice president, IBM Growth Markets

Friday, October 12, 2012

PBS (Portable Batch System) Commands on Torque

There are some PBS commands that you can use in your customised PBS templates and scripts.
Note:
# Remarks:
# A line beginning with # is a comment;
# A line beginning with #PBS is a PBS command;
# Case sensitive.
Job Name (Default)
#PBS -N jobname
Specifies the number of nodes (nodes=N) and the number of processors per node (ppn=M) that the job should use
#PBS -l nodes=2:ppn=8
Specifies the maximum amount of physical memory used by any process in the job.
#PBS -l pmem=4gb
Specifies maximum walltime (real time, not CPU time)
#PBS -l walltime=24:00:00
Queue Name (If default is used, there is no need to specify)
#PBS -q fastqueue
Group account (for example, g12345) to be charged
#PBS -W group_list=g12345
Put both normal output and error output into the same output file.
#PBS -j oe
Send me an email when the job begins, ends or aborts
#PBS -m bea
#PBS -M mymail@mydomain.com
Export all my environment variables to the job
#PBS -V
Mark the job as rerunnable, so that it can be rerun if it fails
#PBS -r y
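Put together, the directives above might form a job script like the one below. Treat it as a sketch: the job name, queue, group account, e-mail address and the final command are the placeholder values from this post, and the fallback to the current directory is my own addition so the script can also be tried outside Torque.

```shell
#!/bin/sh
#PBS -N jobname
#PBS -l nodes=2:ppn=8
#PBS -l pmem=4gb
#PBS -l walltime=24:00:00
#PBS -q fastqueue
#PBS -W group_list=g12345
#PBS -j oe
#PBS -m bea
#PBS -M mymail@mydomain.com
#PBS -V
#PBS -r y

# Torque starts the job in the home directory, so change to the directory
# qsub was run from. The fallback to $PWD is only for runs outside the
# batch system, where PBS_O_WORKDIR is not set.
workdir="${PBS_O_WORKDIR:-$PWD}"
cd "$workdir" || exit 1

# Placeholder workload: replace with the real program.
echo "Job ${PBS_JOBNAME:-jobname} started in $workdir"
```

Save it as, say, myjob.pbs and submit it with qsub myjob.pbs.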

Wednesday, October 10, 2012

Predefined Environmental Variables for OpenPBS qsub

The following environment variables reflect the environment in which the user ran qsub:
  1. PBS_O_HOST - The host where you ran the qsub command.
  2. PBS_O_LOGNAME - Your user ID where you ran qsub
  3. PBS_O_HOME - Your home directory where you ran qsub
  4. PBS_O_WORKDIR - The working directory where you ran qsub

The following reflect the environment where the job is executing:
  1. PBS_ENVIRONMENT - Set to PBS_BATCH to indicate the job is a batch job, or to PBS_INTERACTIVE to indicate the job is a PBS interactive job
  2. PBS_O_QUEUE - The original queue you submitted to
  3. PBS_QUEUE - The queue the job is executing from
  4. PBS_JOBNAME - The job’s name
  5. PBS_NODEFILE - The name of the file containing the list of nodes assigned to the job
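As a small example of putting these variables to work, the node file can be summarised from inside a job script. This sketch assumes the usual Torque layout of one line per assigned processor slot in the file named by PBS_NODEFILE. The two function names are my own.

```shell
# Count the processor slots assigned to the job
# (assumes one line per slot in the node file).
count_slots() {
    wc -l < "$1" | tr -d ' '
}

# List each assigned node once, followed by its number of slots.
slots_per_node() {
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# Inside a job script (the variables are only set by the batch system):
# cd "$PBS_O_WORKDIR"
# echo "Job $PBS_JOBNAME in queue $PBS_QUEUE has $(count_slots "$PBS_NODEFILE") slots"
# slots_per_node "$PBS_NODEFILE"
```

The slot count is typically what gets passed on to an MPI launcher.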

Tuesday, October 9, 2012

iWARP, RDMA and TOE

I have captured some basic information on iWARP, RDMA, TOE and RDMA communication.

Remote Direct Memory Access (RDMA) allows data to be transferred over a network from the memory of one computer to the memory of another computer without CPU intervention. There are 2 types of RDMA hardware: InfiniBand and RDMA over IP (iWARP). The OpenFabrics Enterprise Distribution (OFED) stack provides a common interface to both types of RDMA hardware.

For more information: iWARP, RDMA and TOE by Linux Cluster

Saturday, October 6, 2012

I/O and filled disk error when running Molpro 2010

I encountered this error when running molpro 2010 binary on a compute node.

ERROR WRITING        32768 WORDS AT OFFSET   20630927. TO FILE 1  IMPLEMENTATION=df   FILE HANDLE=  1018  IERR=******
 ? Error 
 ? I/O error
 ? The problem occurs in writew
Write error in iow_direct_write; fd=12, l=32768, p=20630927; write returns -1
This may indicate a filled disk, or that a disk quota has been exceeded
1:1:fehler:: 21556614
(rank:1 hostname:node-c00.cluster.spms.ntu.edu.sg pid:3742):ARMCI DASSERT fail. 
src/armci.c:ARMCI_Error():276 cond:0  1: ARMCI aborting 21556614 (0x148ed86).
Write error in iow_direct_write; fd=12, l=32768, p=20630927; write returns -1
This may indicate a filled disk, or that a disk quota has been exceeded
3:3:fehler:: 21556614
(rank:3 hostname:node-c00.cluster.spms.ntu.edu.sg pid:3745):ARMCI DASSERT fail. 
src/armci.c:ARMCI_Error():276 cond:0  3: ARMCI aborting 21556614 (0x148ed86).
0:0:fehler:: 21556614
(rank:0 hostname:node-c00.cluster.spms.ntu.edu.sg pid:3741):ARMCI DASSERT fail. src/armci.c:ARMCI_Error():276 cond:0
  0: ARMCI aborting 21556614 (0x148ed86).
Write error in iow_direct_write; fd=12, l=32768, p=20630927; write returns -1
This may indicate a filled disk, or that a disk quota has been exceeded
2:2:fehler:: 21556614(rank:2 hostname:node-c00.cluster.spms.ntu.edu.sg pid:3744):ARMCI DASSERT fail. src/armci.c:ARMCI_Error():276 cond:0
  2: ARMCI aborting 21556614 (0x148ed86).
Write error in iow_direct_write; fd=12, l=32768, p=20590741; write returns -1
This may indicate a filled disk, or that a disk quota has been exceeded
5:5:fehler:: 21556614
(rank:5 hostname:node-c00.cluster.spms.ntu.edu.sg pid:3747):ARMCI DASSERT fail. src/armci.c:ARMCI_Error():276 cond:0
  5: ARMCI aborting 21556614 (0x148ed86).
As the error message suggests, a disk or partition used by Molpro 2010 has filled up. Check the following:
  • Molpro scratch file directories
  • /tmp
  • Quotas set by administrators
Any of these can cause the errors above.
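A quick way to check the first two candidates is to ask df how full the underlying filesystems are. The snippet below is a minimal sketch: the function names are made up, the 90% threshold is arbitrary, and the scratch path in the usage line is a placeholder for wherever your Molpro scratch files actually go.

```shell
# Print the percentage of space used on the filesystem holding a directory.
# df -P gives the stable POSIX layout, where the fifth column is the capacity.
percent_used() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn about every listed directory that is at or above the threshold.
check_space() {
    threshold="$1"
    shift
    for dir in "$@"; do
        used=$(percent_used "$dir")
        # Skip directories for which df gave no usable number.
        case "$used" in ''|*[!0-9]*) continue ;; esac
        if [ "$used" -ge "$threshold" ]; then
            echo "WARNING: $dir is ${used}% full"
        fi
    done
}

# Example: /tmp plus a scratch directory (placeholder path).
# check_space 90 /tmp /scratch/$USER
# quota -s    # per-user quota usage, where quotas are enabled
```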

Friday, October 5, 2012

Goodbye to VSphere vRAM licensing


With the upcoming release of vSphere 5.1, VMware is removing the vRAM licensing requirements and returning to the previous CPU-based licensing model. You may want to read this interesting article on what else is coming besides the removal of the "vTax", as MS coined it.

Information:
  1. For a good summary of the new features in vSphere 5.1, do look at VMware releases vSphere 5.1
  2. Wave good-bye to VMware's unloved vSphere vRAM 'vTax'


Thursday, October 4, 2012

Encountering PBS chdir /home/user1 failed. No such file or directory

You may encounter this error after submitting a job with qsub on Torque:

PBS: chdir to /home/user1 failed: No such file or directory.

If you have set PBS to send mail, the message shows the issue more clearly:
Post job file processing error; job 17676.headnode-h00 on host node-c05/7+node-c05/6+node-c05/5+node-c05/4+node-c05/3+node-c05/2+node-c05/1+node-c05/0Unknown resource type  REJHOST=node-c05 MSG=invalid home directory '/home/user1' specified, errno=2 (No such file or directory)

Apparently, node-c05 had /home unmounted, so any job sent to it was lost. The solution is simple: remount /home on that node.
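To catch this before jobs are lost, a node can check whether /home is really mounted. Here is a minimal sketch that reads /proc/mounts. is_mounted is a made-up helper name, and the remount in the usage line assumes /home has an entry in /etc/fstab on the compute node.

```shell
# Succeeds if the given directory is listed as a mount point.
# Reads /proc/mounts by default; the optional second argument exists only so
# the check can be tried against a sample file.
is_mounted() {
    mount_point="$1"
    mounts_file="${2:-/proc/mounts}"
    # Field 2 of each line in /proc/mounts is the mount point.
    awk -v mp="$mount_point" '$2 == mp { found = 1 } END { exit !found }' "$mounts_file"
}

# On the compute node:
# is_mounted /home || mount /home
```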