Frequently Asked Questions
RCDC ACCOUNTS
How Do I Request an RCDC Account?
The easiest way to request a new account is to fill out the Account Request Form. After we create your account, you will receive an email with further instructions and additional information on synchronizing your password with the rest of campus. Please read and follow the instructions carefully. Also note that a 1-hour introductory course is a co-requisite.
How Do I Find Some Quick Start Information?
Consult the RCDC new user orientation for advice on getting started with RCDC resources quickly.
I Can’t Log In (New User / First Time Login)
After submitting the account request, you must wait for the account creation confirmation email before attempting to log in. As mentioned in the confirmation email, your password for logging into RCDC resources is your standard Cougarnet account password.
I’ve Forgotten My Password
Your password for logging into RCDC resources is your standard Cougarnet account password. You can reset your password through the Cougarnet password reset page.
How Do I Request an Account for an External Collaborator?
To get the form needed to create an account for an external collaborator, first go to the sponsored accounts page and follow the procedure for creating a sponsored Cougarnet account.
How Can I Maintain My RCDC Account After Graduating/Leaving UH?
Your RCDC account (and all associated data) will remain active while your general UH account remains active. You can expect your UH account to be deleted within approximately 90 days of leaving the university.
To maintain your UH account (and hence your RCDC account) you should ensure that a faculty sponsor renews your account annually. This process can be initiated by completing an external collaborator request.
I Am Getting A “Module: Command Not Found” Message When I Log In To My RCDC Account
This message generally indicates that your shell configuration scripts have been deleted or corrupted, which prevents the Module environment from initializing correctly. To fix the issue, you will need to clean up your .bashrc or .cshrc file.
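One possible way to clean up a damaged .bashrc is sketched below. It assumes the system keeps default startup files in /etc/skel, which is common on Red Hat systems but has not been confirmed for RCDC machines. The helper name reset_bashrc and its arguments are hypothetical.

```shell
#!/bin/bash
# Sketch only: reset_bashrc is a hypothetical helper, and /etc/skel as the
# source of default startup files is an assumption about the system.
reset_bashrc() {
    local home_dir="${1:-$HOME}"
    local skel_dir="${2:-/etc/skel}"
    local stamp
    stamp="$(date +%Y%m%d%H%M%S)"

    # Keep a timestamped copy of the current file so nothing is lost.
    if [ -f "$home_dir/.bashrc" ]; then
        cp -p "$home_dir/.bashrc" "$home_dir/.bashrc.bak.$stamp" || return 1
    fi

    # Replace it with the system default, if one is available.
    if [ -f "$skel_dir/.bashrc" ]; then
        cp "$skel_dir/.bashrc" "$home_dir/.bashrc"
    else
        echo "No default .bashrc found in $skel_dir" >&2
        return 1
    fi
}

# Usage (not run here): call reset_bashrc, then log out and back in.
```

Any customizations you still need can be copied back from the .bashrc.bak.* file one line at a time, checking after each change that the module command still works.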
How much does it cost to use RCDC’s systems?
Currently, the services provided by RCDC to the UH community are free. Charges may apply for external participants.
REMOTE ACCESS
Why Is My Remote Connection So Slow?
Slow remote GUI response time is difficult to troubleshoot. First, “slow” has no fully tangible metric: here it simply means the connection is impractical for you to work with. Second, it depends on numerous conditions:
- OS and graphics card/drivers on your system
- Remote visualization client (PuTTY/Xming)
- Network reliability (on-campus wired is usually pretty good)
- Load (CPU, RAM, network, disk I/O) on the machine you are trying to access
For the first two conditions, you need laptop/desktop support, which is provided by your department/college IT support staff.
For the third, you can verify whether you are consistently getting at least 100 Mbps using the UIT Network Test.
For the fourth, you can monitor the load on RCDC machines.
TRAINING
How Do I Receive Training?
RCDC User Support conducts introductory training for new users all year round. Video recordings of the basic courses are available anytime to users with an active Cougarnet account.
Periodically throughout the year, RCDC organizes scheduled training classes (typically at the beginning of Fall and Spring semesters). These classes are announced on the HPE DSI Tutorials & Courses page.
Outside of the scheduled training classes, training is provided on-demand by RCDC User Support staff. Please send an email to contact@hpedsi.uh.edu to arrange a training session.
Do I need to bring anything to the training or workshop?
In most cases, no: training and workshops are meant to be hands-on, which means your hands should be on a keyboard while you’re learning. We have desktop computers with the appropriate software stack available for training and workshops.
RCDC SOFTWARE
What Is The Available Software At RCDC?
RCDC maintains information and usage instructions for all currently available software. This information can also be accessed at the RCDC cluster command line using the module help and module whatis commands.
How Do I Request Software To Be Installed/Updated?
RCDC encourages users to install software packages in their $HOME directory. If a package is used by the wider UH community, RCDC will install it as a module for all users.
All paid license software requests for new/updated software on RCDC systems must come from UH Faculty. Please submit a Software Request Form that is attached to the RCDC Software Policy. The software policy also contains information on licensing and cost sharing with RCDC.
Please Note: all software requests require license approval via the University’s General Counsel. This process can take up to 21 business days from receipt of the software request. If you require an expedited service, please make that apparent on your request form.
NFS STORAGE
How Do I Check My Available NFS Storage?
You can check your $HOME directory disk usage with the following commands:
> cd $HOME
> du -sh
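If the total is larger than expected, it can help to see which entries account for most of it. The sketch below builds on du for that purpose. The helper name largest_entries is hypothetical and is not a command provided by RCDC.

```shell
# Sketch: list the largest entries directly under a directory (default: $HOME).
# largest_entries is a hypothetical helper name, not an RCDC-provided command.
largest_entries() {
    local dir="${1:-$HOME}"
    local count="${2:-10}"
    # -s: one total per entry; -k: sizes in KiB so that sort -n can order them.
    # Note that the * glob skips hidden files and directories.
    du -sk -- "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}
```

Calling largest_entries with no arguments lists up to ten entries under $HOME, largest first, which is usually enough to decide what to delete or archive.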
How Do I Request More NFS Storage Space?
Please submit a storage space request by creating a support ticket with a brief justification for the additional storage space.
How Do I Give A Particular User/Group Access To A Directory In My Personal NFS Space?
For a detailed explanation, see this documentation.
I Am Receiving A Locking Authority File Error
If you receive the following error:
> /usr/bin/xauth: error in locking authority file /nfs/RCDC.uh.edu/user...
you have most likely exhausted your NFS storage quota. Please delete unwanted files and/or request more storage space.
I’m Not Seeing My Output When Running From NFS!
NFS improves I/O performance by caching program output on the local host. The output data is only written to the file server when the program is complete. To overcome this issue, you should add the fsync command to your submission script. Please see the wiki page for more details.
I Am Getting The Following Message Using Find
> find ./ -type d -print
find: WARNING: Hard link count is wrong for .: this may be a bug in your filesystem driver.
Use the -noleaf option when using the find command:
-noleaf Do not optimize by assuming that directories contain 2 fewer subdirectories than their hard link count. This option is needed when searching file systems that do not follow the Unix directory-link convention, such as CD-ROM or MS-DOS file systems or NFS volume mount points.
For example:
> find ./ -noleaf -type d -print
HIGH-PERFORMANCE STORAGE
How Do I Check My Available (/scratch) Storage?
Currently there is no high-performance storage on Opuntia, but it will be added to Sabine in the near future.
How Do I Request More (/scratch) Storage Space?
Currently there is no high performance storage.
In case you are interested in high performance storage, please email contact@hpedsi.uh.edu with the following information:
- Name
- Name of PI
- Brief Justification for the need
JOBS
My Job Is Aborted At Runtime Due To Excessive Disk-Swapping
If your job is aborted at runtime due to disk-swapping, this typically indicates that your simulation is starved of RAM. A general rule of thumb is to request a maximum of 3.2 GB of RAM for every core your simulation requires. For example, if your simulation requires 8 GB of RAM in total, specify that amount in your Slurm job submission script.
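Read the other way around, the rule of thumb gives a minimum core count for a given memory requirement. The sketch below does that arithmetic, assuming the 3.2 GB-per-core figure applies to the nodes you are targeting. The helper name min_cores is hypothetical.

```shell
# Sketch: smallest core count that keeps a job within the 3.2 GB-per-core
# rule of thumb. min_cores is a hypothetical helper name.
min_cores() {
    local total_gb="$1"
    local gb_per_core="${2:-3.2}"
    awk -v m="$total_gb" -v per="$gb_per_core" 'BEGIN {
        c = int(m / per)          # whole cores fully covered
        if (c * per < m) c++      # round up for any remainder
        if (c < 1) c = 1          # always request at least one core
        print c
    }'
}

min_cores 8    # prints 3, since 8 / 3.2 = 2.5 rounds up to 3
```

Under this reading, a job that needs 8 GB in total would request at least 3 cores so that its per-core share stays at or below 3.2 GB.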
If you are unsure how much RAM your job will require, you can use the Ganglia Memory Monitoring tool to see how much RAM your job is using. There should not be a problem as long as your job’s Actual memory graph never reaches the Total in-core memory graph. If it does, the machine has run out of RAM for your job and the system will use the hard disk to read/write data. This can cause your job to take longer than expected and, if it reads/writes to disk excessively, can potentially crash the system your job is running on.
The RCDC also provides some large memory nodes (512GB or 1TB RAM) which can be used for particularly demanding simulations.
How Does Swap Affect Performance?
Once a system starts to run out of RAM, it “swaps” in memory from the hard drive. Hard drives are much slower than RAM, so any process that uses swapped memory will experience a substantial decrease in performance.
What Is A Swap File And Where Is It Located?
There is a swap file on each machine. It is a portion of the hard drive set aside to supplement RAM, or physical memory. The swap file is also known as virtual memory. Usually, the swap file is used as a RAM overflow, if you will. When there isn’t enough RAM to run an application, or applications, the machine will swap between RAM and the swap file. Since accessing the hard drive is much slower than RAM, performance begins to degrade. When a large portion of swap is needed, it can cripple or crash a machine.
Do Different Operating System Versions Affect Performance?
Each machine uses a version of Red Hat, which has various versions available. These different Red Hat builds can have a slight impact on performance.
How Do I Target The Large-Memory Nodes?
If your simulation requires a large amount of memory then you may be eligible to access a limited set of large-memory (512GB) nodes.
You can target these nodes in your submission script by specifying the memory required:
#SBATCH --mem=512GB
If your simulation requires all 512GB/1TB RAM of the large-memory nodes, then you should request all 40/60 cores of the large-memory nodes respectively to prevent other users from running on the same machine and using a portion of the RAM. You can request all the cores by specifying the following in your submission script:
#SBATCH -n 40 -N 1
Note that the maximum runtime for jobs using the large memory nodes is 14 days.
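Put together, a submission script for a 512GB node might look like the sketch below. The memory and core directives are the ones given above. The time limit uses Slurm's days-hours:minutes:seconds format to request the 14-day maximum. The executable name is hypothetical, and your own script may need further directives that this FAQ does not cover.

```shell
#!/bin/bash
# Target a large-memory node.
#SBATCH --mem=512GB
# Request all 40 cores on a single node.
#SBATCH -n 40 -N 1
# Request 14 days, the stated maximum runtime for these nodes.
#SBATCH -t 14-00:00:00

# Report where the job landed. The SLURM_* variables are set by Slurm at
# runtime and fall back to placeholders when the script is run by hand.
job_summary="job ${SLURM_JOB_ID:-none} on $(hostname) with ${SLURM_NTASKS:-40} tasks"
echo "$job_summary"

# Launch the simulation here. ./my_simulation is a hypothetical executable.
# srun ./my_simulation
```

Requesting less than the maximum runtime when you can is usually worthwhile, since shorter jobs are generally easier for the scheduler to place.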
‘Unable to run job: error: no suitable queues’ Message When Submitting Job Script
Please check the partition that you have specified in your Slurm job submission script. Either set it to -p gpu or remove the partition field altogether.
How Can I Monitor The Behavior Of My Running Jobs?
The behavior of running jobs can be monitored using the Ganglia online tool. After using squeue to identify the node(s) on which your jobs are running, use Ganglia to locate the specific node name. Associated with each node is a series of sub-links that you can use to monitor CPU status, memory usage, communication statistics, etc.
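The squeue step can be sketched as follows. The helper name unique_nodes is hypothetical; the squeue options shown in the comments are standard Slurm options, but the command itself is left commented out because it only works on the cluster.

```shell
# Sketch: list the node(s) hosting your running jobs, ready to look up in Ganglia.
# unique_nodes is a hypothetical helper that reads "JOBID NODELIST" lines.
unique_nodes() {
    awk 'NF >= 2 { print $2 }' | sort -u
}

# On the cluster (not run here), feed it from squeue:
#   -h drops the header, -u selects your jobs, -t RUNNING filters by state,
#   -o '%i %N' prints the job ID and the node list.
# squeue -h -u "$USER" -t RUNNING -o '%i %N' | unique_nodes
```

For multi-node jobs, squeue prints the node list in a compressed range form; scontrol show hostnames followed by that range expands it into one node name per line.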
My Job Has Created Zombies… What Does That Mean?
Zombie processes are processes that remain running even though your job has terminated within the Slurm batch system. There are various reasons why this happens, including ungraceful termination of MPI due to either software or hardware issues.
The RCDC plans to periodically run a script that removes these zombie processes from our systems. If the script detects zombie processes left by one of your jobs, you will be notified via email. If you continue to receive zombie notifications regularly, please contact HPE DSI Support for assistance in detecting the cause.
APPLICATIONS
I Need To Install ArcGIS On My System
Please consult the library staff on how to request an installation on your local system.
I am trying to compile my code with mpi but cannot find libmpi.so
Please compile your code on the login node. Please note that this front-end node is accessible from within the campus network, or through VPN if you are off-campus.
I See Incomplete Images In COMSOL v4+
The OpenGL graphics libraries on our machines are not compatible with the latest versions of COMSOL v4+. This results in distortion or incompleteness of rendered images. To overcome this issue, please set your COMSOL graphics configuration to use software rendering via the following menu option:
Option > Preferences > Graphics > Rendering > Software
My Gaussian Jobs Are Being Aborted Due To /tmp Over-usage
By default, GAUSSIAN jobs write temporary working data to the /tmp directory of the compute node. For large GAUSSIAN jobs, this /tmp space can fill very quickly, causing the machine to slow down or crash (note: machine supervisors may pre-empt machine crashes by killing your jobs).
To avoid this situation, please direct GAUSSIAN to write temporary working files to your own storage allocation (either NFS or, in the future, /scratch). Details on how to set this up in your GAUSSIAN scripts can be found at the Gaussian wiki page.
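As a rough illustration of what such a setup can look like: Gaussian reads its scratch location from the GAUSS_SCRDIR environment variable, so a submission script can point that variable at a per-job directory in your own storage. The helper name use_gaussian_scratch, the directory layout, and the g16 invocation in the comments are assumptions; the wiki page remains the authoritative source for RCDC systems.

```shell
# Sketch: point Gaussian at a per-job scratch directory outside /tmp.
# GAUSS_SCRDIR is the environment variable Gaussian reads for its scratch
# location; use_gaussian_scratch and the directory layout are hypothetical.
use_gaussian_scratch() {
    local base="${1:-$HOME/gaussian_scratch}"
    local job="${SLURM_JOB_ID:-manual}"
    export GAUSS_SCRDIR="$base/$job"
    mkdir -p "$GAUSS_SCRDIR"
}

# In a submission script (not run here):
# use_gaussian_scratch
# g16 < input.com > output.log     # g16 and the file names are placeholders
# rm -rf "$GAUSS_SCRDIR"           # clean up once the job has finished
```

Using one directory per job ID keeps concurrent jobs from overwriting each other's scratch files and makes cleanup straightforward. Keep in mind that scratch files written to NFS count against your storage quota.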
Question: How Can I Request ArcGIS Installation and Support?
All GIS software installation and GIS desktop support is handled by the UH library or departmental IT support. You can get the desktop support team contact info from the library website.
GRANTS/PROPOSALS
Question: How Do I Acknowledge the RCDC In My Proposals?
To acknowledge collaboration with the RCDC, please refer to this page.
Question: Does the RCDC Provide a “Facilities and Equipment” Document?
You can download the document here. Please consult one of the HPE DSI staff for more information.