Monday, March 30, 2015

Using ibdev2netdev to quickly identify ports

ibdev2netdev is a handy tool for quickly seeing which InfiniBand device port maps to which network interface (for example, ib0).

[root@headnode-h99 ~]# ibdev2netdev
mlx4_0 port 1 ==> ib0 (Up)
mlx4_0 port 2 ==> ib1 (Down)
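
On nodes with several HCAs or ports, the same output can be filtered in a script. Here is a minimal sketch, assuming the output format shown above; the function name ib_up_interfaces is made up for illustration.

```shell
# Print the network interface of every port that ibdev2netdev reports as Up.
# Assumes lines of the form: "mlx4_0 port 1 ==> ib0 (Up)".
ib_up_interfaces() {
    awk '$NF == "(Up)" { print $(NF - 1) }'
}

# Usage on a node with ibdev2netdev installed:
#   ibdev2netdev | ib_up_interfaces
```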

Tools for Performance Test for IB

ibportstate
  • Enables querying of the logical link and physical port states of an IB port.
  • Displays information such as LinkSpeed, LinkWidth and extended link speed.
  • Allows adjusting the link speed that is enabled on any IB port.
# ibportstate LID PortNumber
# Port info: Lid 15 port 1
LinkState:.......................Active
PhysLinkState:...................LinkUp
Lid:.............................15
SMLid:...........................1
LMC:.............................0
LinkWidthSupported:..............1X or 4X
LinkWidthEnabled:................1X or 4X
LinkWidthActive:.................4X
LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedActive:.................10.0 Gbps
LinkSpeedExtSupported:...........14.0625 Gbps
LinkSpeedExtEnabled:.............14.0625 Gbps
LinkSpeedExtActive:..............14.0625 Gbps
Mkey:............................<not displayed>
MkeyLeasePeriod:.................0
ProtectBits:.....................0
# MLNX ext Port info: Lid 15 port 1
StateChangeEnable:...............0x00
LinkSpeedSupported:..............0x01
LinkSpeedEnabled:................0x01
LinkSpeedActive:.................0x00
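
When checking many ports, it is usually only the active width and speed that matter. The sketch below pulls those fields out of the dotted output shown above. It assumes that exact "Name:....value" layout, and the function name ib_link_summary is hypothetical.

```shell
# Summarise the active link width and speed from ibportstate output.
# Stops at the "# MLNX ext Port info" section, where LinkSpeedActive
# appears again as a raw hex value.
ib_link_summary() {
    awk -F':' '
        /^# MLNX/ { exit }
        $1 == "LinkWidthActive" || $1 == "LinkSpeedActive" || $1 == "LinkSpeedExtActive" {
            value = $2
            sub(/^\.+/, "", value)   # drop the dotted padding
            print $1 "=" value
        }
    '
}

# Usage (LID 15, port 1 as in the example above):
#   ibportstate 15 1 | ib_link_summary
```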

Friday, March 27, 2015

Leap Second on 30th June 2015 and effects on CentOS and RHEL

At 23:59 UTC on June 30, 2015, clocks will count all the way up to 60 seconds (23:59:60) instead of rolling over after 59. The extra second lets the Earth's rotation catch up with atomic time.

Background - http://www.usatoday.com/story/tech/2015/01/08/computer-chaos-feares/21433363/

All of Red Hat Enterprise Linux 4, 5, 6 & 7 will be affected.

*Resolve Leap Second Issues in Red Hat Enterprise Linux
https://access.redhat.com/articles/15145

*Are we susceptible to a leap second event?
https://access.redhat.com/articles/199563

*Labs: Leap Second Issue Detector
https://access.redhat.com/labs/leapsecond/

Basic Configuration of Octopus 4.1.2 with OpenMPI on CentOS 6

Octopus is a scientific program aimed at the ab initio virtual experimentation on a hopefully ever-increasing range of system types. Electrons are described quantum-mechanically within density-functional theory (DFT), in its time-dependent form (TDDFT) when doing simulations in time. Nuclei are described classically as point particles. Electron-nucleus interaction is described within the pseudopotential approximation.

Do take a look at the installation writeup by linuxcluster: Basic Configuration of Octopus 4.1.2 with OpenMPI on CentOS 6

Saturday, March 21, 2015

Unable to Submit via Torque Submission Node - Socket_Connect Error for Torque 4.2.7

I am using Torque Server version 4.2.7 and was trying to configure a submission node. Here is a sample of my qmgr -c "p s" output. The firewall allows the necessary traffic in outr

# qmgr -c "p s"
.......... 
set server acl_hosts = submission_node.cluster.spms.ntu.edu.sg
set server acl_hosts += head_node.cluster.spms.ntu.edu.sg
set server submit_hosts = submission_node.cluster.spms.ntu.edu.sg
set server submit_hosts += head_node.cluster.spms.ntu.edu.sg
set server allow_node_submit = True 
.......
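
To confirm that the server really lists the submission node, the submit_hosts entries can be pulled out of the qmgr output. This is a small sketch that assumes the "set server submit_hosts = ..." line format shown above; list_submit_hosts is a name I made up.

```shell
# Print every host named in a "set server submit_hosts" line.
list_submit_hosts() {
    awk '$1 == "set" && $2 == "server" && $3 == "submit_hosts" { print $NF }'
}

# Usage on the head node:
#   qmgr -c "p s" | list_submit_hosts
```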

After I ssh into the submission_node and try to submit as a normal user, I get the errors below. Yes, the submission_node has been configured as a conventional client.

socket_connect error (VERIFY THAT trqauthd IS RUNNING)
Error in connection to trqauthd (15137)-[could not connect to unix socket /tmp/trqauthd-unix: 111]
socket_connect error (VERIFY THAT trqauthd IS RUNNING)
Error in connection to trqauthd (15137)-[could not connect to unix socket /tmp/trqauthd-unix: 111]
socket_connect error (VERIFY THAT trqauthd IS RUNNING)
Error in connection to trqauthd (15137)-[could not connect to unix socket /tmp/trqauthd-unix: 111]
Unable to communicate with head_node(10.10.10.20)
Communication failure. qsub: cannot connect to server head_node (errno=15137) could not connect to trqauthd

According to the Torque 4.2.7 documentation, you have to make sure the submission node has the trqauthd script in /etc/init.d if you are using RH / CentOS. You can easily scp /etc/init.d/trqauthd to the submission node.

From the head_node
# scp -v /etc/init.d/trqauthd root@submission_node:/etc/init.d/

Create an /etc/hosts.equiv file on the head_node
# touch /etc/hosts.equiv
Put the Submission_Node host name in /etc/hosts.equiv on the head_node
submission_node

At the Submission_Node, start the trqauthd service
# service trqauthd start

Now try submitting as a normal user.
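
If qsub still fails, a first-pass check is whether the unix socket from the error message exists at all. Error 111 on Linux is ECONNREFUSED, meaning nothing was listening. The sketch below only tests for the socket file, so treat it as a hint rather than proof that trqauthd is healthy; the function name is hypothetical and the default path is taken from the error text above.

```shell
# Succeeds if the trqauthd unix socket exists.
# Default path comes from the error message; pass another path to override.
trqauthd_socket_present() {
    socket_path=${1:-/tmp/trqauthd-unix}
    [ -S "$socket_path" ]
}

# Usage on the submission node:
#   trqauthd_socket_present || echo "trqauthd socket missing, is the service running?"
```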

Tuesday, March 17, 2015

Where to download Intel Compiler?



I often have to google for a while before I can locate the download site for our purchased Intel Compiler. Here is the link in case I forget again. Just log on and you can access the Intel Compilers.

https://registrationcenter.intel.com/RegCenter/MyProducts.aspx


Enabling Massive Multi-GPU Scaling and Peering



Do take a look at http://www.cirrascale.com/ for high density Multi-GPU Scaling and Peering.

Monday, March 9, 2015

Enabling Predictive Cache Statistics (PCS) for Data OnTap 8.2p

* node1 is the controller that currently owns the aggregate/vol/LUN.

Step 1: Enable PCS
node1::> node run -node node1
node1::> options flexscale.enable on
node1::> options flexscale.enable
flexscale.enable pcs    <- you should see this
node1::> options flexscale.pcs_size 330GB    <- based on 3 x 200GB SSD RAID4

Step 2: Run your representative workload

Step 3: Collect data throughout the process
node1::>stats show -p flexscale-access
NetApp recommends issuing this command through an SSH connection and logging the output throughout the observation period because you want to capture and observe the peak performance of the system and the cache. This output can also be easily imported into spreadsheet software, graphed, and so on.

This process initially provides information on the “cold” state of the emulated cache. That is, no data is in the cache at the start of the test, and the cache is filled as the workload runs. The best time to observe the emulated cache is once it is filled, or “warmed”, as this will be the point when it enters a steady state. Filling the emulated cache can take a considerable amount of time and depends greatly on the workload.

References:
  1. Introduction to Predictive Cache Statistics
  2. Clustered_Data_ONTAP_82_System_Administration
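
For the logging advice in Step 3, one simple approach is to timestamp each line as it arrives over SSH and append it to a file. This is only a sketch: timestamp_lines is a helper I made up, and the user, host name and exact remote command string in the usage comment are assumptions that will need adjusting for your cluster.

```shell
# Prefix every line read from stdin with the current date and time.
timestamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# Usage (user, host and remote command are assumptions):
#   ssh admin@node1 'node run -node node1 stats show -p flexscale-access' \
#       | timestamp_lines | tee -a pcs_stats.log
```

The timestamps make it easier to line up the cache statistics with the workload when the log is later imported into a spreadsheet.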

Sunday, March 8, 2015

Using Tuned to tune CentOS 6 System

Tuned is a dynamic adaptive system tuning daemon. According to the manual page:

tuned is a dynamic adaptive system tuning daemon that tunes system settings dynamically depending on usage. For each hardware subsystem a specific monitoring plugin collects data periodically. This information is then used by tuning plugins to change system settings to lower or higher power saving modes in order to adapt to the current usage. Currently monitoring and tuning plugins for CPU, ethernet network and ATA harddisk devices are implemented.

Using Tuned

1. Installing tuned
# yum install tuned

2. To view a list of available tuning profiles
[root@myCentOS ~]# tuned-adm list
Available profiles:
- laptop-ac-powersave
- server-powersave
- laptop-battery-powersave
- desktop-powersave
- virtual-host
- virtual-guest
- enterprise-storage
- throughput-performance
- latency-performance
- spindown-disk
- default

3. Tuning to a specific profile
# tuned-adm profile latency-performance
Switching to profile 'latency-performance'
Applying deadline elevator: dm-0 dm-1 dm-2 sda             [  OK  ]
Applying ktune sysctl settings:
/etc/ktune.d/tunedadm.conf:                                [  OK  ]
Calling '/etc/ktune.d/tunedadm.sh start':                  [  OK  ]
Applying sysctl settings from /etc/sysctl.conf
Starting tuned:                                            [  OK  ]

4. Checking current tuned profile used and its status
# tuned-adm active
Current active profile: latency-performance
Service tuned: enabled, running
Service ktune: enabled, running

5. Turning off the tuned daemon
# tuned-adm off
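
If you want a script to apply a profile only when it is not already active, the profile name can be parsed from the "tuned-adm active" output. A minimal sketch, assuming the "Current active profile:" line format shown in step 4; active_tuned_profile is a hypothetical helper name.

```shell
# Extract the profile name from `tuned-adm active` output.
active_tuned_profile() {
    sed -n 's/^Current active profile: //p'
}

# Usage:
#   [ "$(tuned-adm active | active_tuned_profile)" = "latency-performance" ] \
#       || tuned-adm profile latency-performance
```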

References:
  1. Tuning Your System With Tuned (http://servicesblog.redhat.com)

Compiling Gromacs 5.0.4 on CentOS 6

Compiling Gromacs has never been easier now that it uses cmake. There are a few assumptions.
  1. Use MKL and Intel Compilers
  2. Use OpenMPI as the MPI-of-choice. The necessary PATH and LD_LIBRARY_PATH have been placed in .bashrc
  3. We will use single precision for speed, and build mdrun only with the MPI flag enabled
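
Since the build depends on the compilers and MPI wrapper being on the PATH, a quick check before running cmake can save a failed configure. This is a sketch only; missing_tools is a made-up helper, and mpicc is assumed to be the OpenMPI compiler wrapper in use.

```shell
# Print the name of each given tool that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Usage: prints nothing if the Intel compilers and MPI wrapper are all found.
#   missing_tools icc icpc mpicc
```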
Here is my configuration using the Intel Compilers
# tar xfz gromacs-5.0.4.tar.gz
# cd gromacs-5.0.4
# mkdir build
# cd build

# /usr/local/cmake-3.1.3/bin/cmake .. -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=ON \
-DCMAKE_INSTALL_PREFIX=/usr/local/gromacs-5.0.4 -DGMX_MPI=on -DGMX_FFT_LIBRARY=mkl \
-DGMX_DOUBLE=off -DGMX_BUILD_MDRUN_ONLY=on -DCMAKE_C_COMPILER=icc -DCMAKE_CXX_COMPILER=icpc

# make
# make check
# sudo make install
# source /usr/local/gromacs-5.0.4/bin/GMXRC

References:
  1.  Compiling Gromacs 5.0.4 on CentOS 6 (linuxcluster.wordpress.com)

Friday, March 6, 2015

FREAK (Factoring Attack on RSA-EXPORT Keys) Attack

The vulnerability allows attackers to intercept HTTPS connections between vulnerable clients and servers and force them to use ‘export-grade’ cryptography (weak export cipher suites), which can then be decrypted.

It is recommended to update to the latest software patches. For OpenSSL (CVE-2015-0204), versions before 1.0.1k are vulnerable.
For non-OpenSSL software, disable support for any export cipher suites and known insecure ciphers on your web server.
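
To see whether a local OpenSSL build still offers export suites at all, the cipher list can be filtered for names starting with "EXP-", which is how OpenSSL names its export-grade suites. This is a rough sketch with a hypothetical helper name; it inspects the local library only and says nothing about what a remote server will negotiate.

```shell
# Read a colon-separated OpenSSL cipher list on stdin and print only
# the export-grade suites (names beginning with "EXP-").
export_ciphers() {
    tr ':' '\n' | grep '^EXP-'
}

# Usage: no output means the local build offers no export suites.
#   openssl ciphers 'ALL' | export_ciphers
```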

Solutions:
  1. Use the latest version of Chrome/IE/Mozilla instead of the Android Browser and Safari.
  2. Check if your site is vulnerable. SSL Labs - https://www.ssllabs.com/ssltest/

References:
  1. FREAK Attack - https://freakattack.com/
  2. Graham Cluley - https://grahamcluley.com/2015/03/freak-attack-what-is-it-heres-what-you-need-to-know/
  3. Recommended Configuration - https://wiki.mozilla.org/Security/Server_Side_TLS#Recommended_configurations

do_vfs_lock: VFS is out of sync with lock manager for CentOS 5

If you are seeing "do_vfs_lock: VFS is out of sync with lock manager" messages on your screen or in your log files, here is the explanation according to the Red Hat site:

The message will be printed whenever there is locking contention (two or more processes trying to lock the same file) and the mount had nolock specified.

The RHEL-5 code prints the message unconditionally, while in the upstream code it is a debugging message, so it won't be seen in normal operation there.

Do take a look at your /etc/fstab and the mount options. You should remove the "nolock" option.
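
To find the affected entries quickly, the fstab can be scanned for NFS mounts that carry the nolock option. A minimal sketch, assuming the usual six-column fstab layout; nfs_nolock_mounts is a made-up helper name.

```shell
# Print "device mountpoint" for every uncommented NFS entry whose
# options include nolock. Defaults to /etc/fstab.
nfs_nolock_mounts() {
    awk '$1 !~ /^#/ && $3 ~ /^nfs/ && ("," $4 ",") ~ /,nolock,/ { print $1, $2 }' "${1:-/etc/fstab}"
}

# Usage:
#   nfs_nolock_mounts
#   nfs_nolock_mounts /path/to/other/fstab
```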

References:
  1. Many "do_vfs_lock: VFS is out of sync with lock manager" messages on a "-o nolock" NFS mount in RHEL?