Friday, February 28, 2014

Amazon EC2 Console Improvements

AWS has made some improvements to the EC2 console. Do take a look at the Amazon Web Services Blog. Some of the features are:

  1. Cloning Security Group Rules
    You can now copy the rules from an existing security group to a new one by selecting the existing group and choosing Copy to new from the Actions menu.
  2. Managing Outbound Rules in VPC Security Groups
    You can now edit the outbound rules of a VPC Security Group from within the EC2 console (this operation was previously available from the VPC console).
  3. Deep Linking Across EC2 Resources
    The new deep linking feature lets you easily locate and work with resources that are associated with one another. For example, you can move from an instance to one of its security groups with a single click.
  4. Compare Spot Prices Across AZs
    The updated Spot Pricing History graph makes it easier for you to compare Spot prices across Availability Zones. Simply hover your cursor over the graph and observe the Spot prices across all of the Availability Zones in the Region.
  5. Tagging of Spot Requests
    You can now add tags to requests for EC2 Spot instances (see the CLI sketch after this list).
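
The Spot request tagging in item 5 also has an AWS CLI equivalent. A minimal sketch, assuming the AWS CLI is installed and configured, and that sir-abcd1234 is a placeholder Spot request ID:

$ aws ec2 create-tags --resources sir-abcd1234 --tags Key=Project,Value=Demo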


Thursday, February 27, 2014

The SGI UV way of doing Hadoop


I thought this video clip was quite interesting, as it proposes a large SMP-like machine, in this case the SGI UV, to solve "Big Analytics" issues. Do look at this interesting video from SGI - Hadoop-WHAT? How to find that needle of information in your own Big Data haystacks (HD)

Wednesday, February 26, 2014

Using mcelog to detect cpu and memory issues on CentOS 6

mcelog is a daemon that collects and decodes Machine Check Exception data on x86-64 machines.

According to the mcelog website,

The mcelog daemon accounts memory and some other errors in various ways. mcelog --client can be used to query a running daemon. The daemon can also execute triggers when configurable error thresholds are exceeded. This is used to implement a range of automatic predictive failure analysis algorithms: including bad page offlining and automatic cache error handling. User defined actions can also be configured.
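
As the quote mentions, a running daemon can be queried directly. For example (assuming the mcelogd daemon is running):

# mcelog --client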

For CentOS 6, mcelog is installed by default, but you can also install it with yum:
# yum install mcelog
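
Note that mcelog can run either as a daemon or as an hourly cron job (the CentOS 6 default, shown below). To run it in daemon mode instead, a sketch using the CentOS 6 service name:

# service mcelogd start
# chkconfig mcelogd on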

CentOS has already configured a cron job to run an hourly check. You can take a look at /etc/cron.hourly/mcelog.cron. It should look something like this:
#!/bin/bash

# do not run if mcelogd is running
service mcelogd status >& /dev/null
[ $? -eq 0 ] && exit 0

# is mcelog supported?
/usr/sbin/mcelog --supported >& /dev/null
if [ $? -eq 1 ]; then
       exit 1;
fi

/usr/sbin/mcelog --ignorenodev --filter >> /var/log/mcelog

To view the errors, take a look at /var/log/mcelog:
# less /var/log/mcelog

To see the log in real time:
# tail -f /var/log/mcelog

Tuesday, February 25, 2014

A peek into NTU's supercomputer and hybrid cloud

A write-up, "A peek into NTU's supercomputer and hybrid cloud", covers NTU's efforts to deploy a hybrid cloud IT architecture.

Monday, February 24, 2014

NFS Share getting a (1) appended when adding NFS storage in vCenter

I seemed to be getting a "(1)" appended to the NFS share name when I entered the NFS server, volume, and datastore name in the vCenter "Add Storage" dialogue box. I was using vSphere 5.1 and vCenter 5.1.

There are 2 scenarios:

Scenario 1 - Case-sensitive Server Name
Do look at NFS share getting a (1) appended when adding to a new host in existing datacenter. The problem is caused by using different capitalisation for the server name across hosts.


Scenario 2 - Missing "/" for the Storage Folder

For example (I have this on my existing host):

Server: 192.168.1.1
Folder: vol/vol1
DataStore Name: MyDataStore 

Do note that the volume was already mapped on other hosts. I realised the error was very simple: I had missed the "/" in front of vol/vol1.

Server: 192.168.1.1
Folder: /vol/vol1
DataStore Name: MyDataStore 
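
If you prefer the command line, the same NFS mount can be added from the ESXi 5.x shell with esxcli, using the server, folder, and datastore name from the example above:

# esxcli storage nfs add --host 192.168.1.1 --share /vol/vol1 --volume-name MyDataStore
# esxcli storage nfs list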

Sunday, February 23, 2014

50% of Enterprises to Use Hybrid Cloud by 2017

I read this article from InformationWeek: Gartner: 50% Of Enterprises Use Hybrid Cloud By 2017. Some excerpts from the article:



Gartner predicts that almost half of large enterprises will be engaged in a combined, public/private cloud operation, often described as "hybrid" cloud computing, four years from now.
......
......
VMware, IBM and Microsoft have all launched technology initiatives on the strength of the future prospects of hybrid cloud computing. Despite that, Bittman noted that "actual hybrid cloud computing deployments are rare." While three-fourths of those polled predicted hybrid deployments would occur in the next two years, Bittman scaled that optimism back to "nearly half by the end of 2017" for his prediction.
.......
.......
If IT finds public cloud use follows a pattern already established in the private cloud, then it becomes possible, in some cases, to gain the flexibility of adding resources when needed from the public cloud. There will still be limiting factors. Only some public clouds are likely to have a degree of compatibility with a given private cloud. Application-to-application integration may not be possible for many legacy apps. But public/private cloud operations will be a logical outcome of the path many organizations have already started down, Bittman concluded.

Tuesday, February 18, 2014

SPICE - Simple Protocol for Independent Computing Environments

According to Wikipedia,

In computing, SPICE (the Simple Protocol for Independent Computing Environments) is a remote-display system built for virtual environments which allows users to view a computing "desktop" environment - not only on its computer-server machine, but also from anywhere on the Internet and using a wide variety of machine architectures.

There is a good write-up on SPICE usage experiences and performance, with comments on other remote protocols. Do read Taking SPICE for a Spin.
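
If you want a quick taste of SPICE on a Linux KVM host, the remote-viewer client from the virt-viewer package can connect to a guest's SPICE display. A minimal sketch, assuming a guest whose SPICE server listens on port 5900 of a placeholder host kvmhost.example.com:

# yum install virt-viewer
# remote-viewer spice://kvmhost.example.com:5900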

Sunday, February 16, 2014

Compiling GSL/4.1 from GIT on CentOS 5

GSL/4.1 is a code construction tool. It will generate code in all languages and for all purposes. To compile GSL, do the following:

Prerequisites
# yum install pcre


Compilation
# git clone git://github.com/imatix/gsl
# cd gsl/src
# make
# sudo make install

To show the command-line help:
# ./gsl

Friday, February 14, 2014

High Performance Capabilities for Windows Azure

Taken from HPCWire Weekly Updates (3 Feb 2014)...



The Windows Azure Cloud Service now includes two new compute-intensive virtual machine sizes. Known as A8 and A9, they are Azure’s most performant instances to date. The A8 instance comes with 8 Intel virtual processor cores and 56 GB of RAM, while A9 comes with 16 such cores and 112 GB of memory. The instance family also includes 40 Gbps InfiniBand networking for low-latency and high-throughput communication.
......
......
The new instance type actually employs two interconnect protocols. Traditional Ethernet is the link to Azure Storage, CDN, and other Windows Azure services or solutions, while a 40 Gbps InfiniBand network connects compute instances within the same Cloud Services deployment. Furthermore, the InfiniBand network employs remote direct memory access (RDMA) technology for maximum efficiency of parallel MPI applications, an enhancement that Microsoft first previewed more than a year ago, when it debuted its Big Compute strategy.

References
  1. New High Performance Capabilities for Windows Azure
  2. Windows Azure Continues ‘Big Compute’ Rollout

Wednesday, February 12, 2014

Open Source Enterprise-Ready Request Trackers



If you are looking for an open-source, enterprise-ready request tracker, do take a look at RT: Request Tracker. The current version is 4.2.2.

According to the website, RT is a battle-tested issue tracking system which thousands of organizations use for bug tracking, help desk ticketing, customer service, workflow processes, change management, network operations, youth counselling and even more.

Thursday, February 6, 2014

MPI fail due to insufficient space

If you encounter an error that causes your MPI run to fail, such as the one below, it is due to /tmp not having sufficient space:

[node1:25646] [[31090,0],0] ORTE_ERROR_LOG: Error in file orterun.c at line 543
Number of requested processors =  8
[node1:25663] opal_os_dirpath_create: Error: Unable to create the sub-directory 
(/tmp/openmpi-sessions-user1@node1_0) of (/tmp/openmpi-sessions-user1@node1_0/31075/0/0), mkdir failed [1]
[node1:25663] [[31075,0],0] ORTE_ERROR_LOG: Error in file util/session_dir.c at line 106
[node1:25663] [[31075,0],0] ORTE_ERROR_LOG: Error in file util/session_dir.c at line 399
[node1:25663] [[31075,0],0] ORTE_ERROR_LOG: Error in file ess_hnp_module.c at line 304

The solution is to clean up /tmp or the partition where /tmp resides. You can run "df -h" at the console to verify your available disk space:

# df -h
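
Alternatively, if /tmp cannot be cleaned up right away, Open MPI can be pointed at a different scratch directory through the orte_tmpdir_base MCA parameter. A minimal sketch, assuming /scratch/tmp is a writable directory with free space:

$ mpirun --mca orte_tmpdir_base /scratch/tmp -np 8 ./my_mpi_app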

Tuesday, February 4, 2014

Deprecated libguide.so replaced by libiomp5.so

If your program fails with an error like the one below when it runs:

"Error while loading shared libraries: libguide.so: cannot open shared object file: No such file or directory"

It could be because you are using the deprecated libguide.so, which may not be available in recent versions of the Intel compilers. To solve the issue, use libiomp5.so, which ships with more recent Intel compilers as the replacement for libguide.so.

So do change the link flag from -lguide to -liomp5.
libguide.so was not compatible with gcc's implementation of OpenMP; libiomp5.so is compatible and is now the default OpenMP runtime library for the Intel compilers.
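
As a minimal sketch of the change, assuming an old makefile that links the Intel OpenMP runtime explicitly (myprog.o and the library path are placeholders):

# old: links the deprecated runtime and fails at run time with the error above
$ gcc myprog.o -L/opt/intel/lib/intel64 -lguide -o myprog
# new: link the current Intel OpenMP runtime instead
$ gcc myprog.o -L/opt/intel/lib/intel64 -liomp5 -o myprog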

References:
  1. libguide.so library in Maya 2011

Saturday, February 1, 2014

Using mpirun --mca orte_base_help_aggregate 0 to debug errors

If your mpirun dies without any error messages, you may want to read this entry from the OpenMPI FAQ on debugging applications in parallel: "7. My process dies without any output. Why?"

If your application fails due to memory corruption, Open MPI may subsequently fail to output an error message before dying. Specifically, starting with v1.3, Open MPI attempts to aggregate error messages from multiple processes in an attempt to show unique error messages only once (vs. one for each MPI process -- which can be unwieldy, especially when running large MPI jobs).

However, this aggregation process requires allocating memory in the MPI process when it displays the error message. If the process' memory is already corrupted, Open MPI's attempt to allocate memory may fail and the process will simply die, possibly silently. When Open MPI does not attempt to aggregate error messages, most of its setup work is done during MPI_INIT and no memory is allocated during the "print the error" routine. It therefore almost always successfully outputs error messages in real time -- but at the expense that you'll potentially see the same error message for each MPI process that encountered the error.

Hence, the error message aggregation is usually a good thing, but sometimes it can mask a real error. You can disable Open MPI's error message aggregation with the orte_base_help_aggregate MCA parameter. For example: 


 $ mpirun --mca orte_base_help_aggregate 0 ...
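
For instance, a full invocation might look like this (the application name and process count are placeholders):

$ mpirun --mca orte_base_help_aggregate 0 -np 8 ./my_mpi_app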