Wednesday, March 30, 2016

OpsMgr (SCOM) - Unix/Linux Agents Requisites and Troubleshooting

UNIX/Linux monitoring and discovery in OpsMgr can be very hard to troubleshoot.
You can run into several discovery issues, such as:
- SSH connection errors;
- Certificate issues;
- Network port issues;
- Bad sudo permissions;
- And so on.

There's a lot of information spread across several blogs, but I've never found a centralized source of troubleshooting steps on this topic, so I've decided to create this post and keep updating it with the errors I find and their resolutions.

First, check whether your error is already documented on TechNet:
http://social.technet.microsoft.com/wiki/contents/articles/4966.scom-2012-troubleshooting-unixlinux-agent-discovery.aspx

If not, you can read further :)

To start, here are all the prerequisites your environment needs for this to work.

First, you need a user!
Root, or a user with sudo permissions?
If your UNIX/Linux sysadmin wants to restrict what that user can do in the 'sudoers' file, this is what you need:
(root) NOPASSWD: /bin/sh -c cp /tmp/scx-monuser/scx.pem /etc/opt/microsoft/scx/ssl/scx.pem; rm -rf /tmp/scx-monuser; /opt/microsoft/scx/bin/tools/scxadmin -restart  
(root) NOPASSWD: /bin/sh -c sh /tmp/scx-monuser/GetOSVersion.sh; EC\=$?; rm -rf /tmp/scx-monuser; exit $EC  
(root) NOPASSWD: /bin/sh -c cat /etc/opt/microsoft/scx/ssl/scx.pem  
(root) NOPASSWD: /bin/sh -c rpm -e scx  
(root) NOPASSWD: /bin/sh -c /bin/rpm -F --force /tmp/scx-monuser/scx-*.rpm; EC\=$?; cd /tmp; rm -rf /tmp/scx-monuser; exit $EC  
(root) NOPASSWD: /bin/sh -c /bin/rpm -U --force /tmp/scx-monuser/scx-*.rpm; EC\=$?; cd /tmp; rm -rf /tmp/scx-monuser; exit $EC  
(root) NOPASSWD: /opt/microsoft/scx/bin/scxlogfilereader -p  
# I've added this so you can re-generate certificates if you need to  
(root) NOPASSWD: /opt/microsoft/scx/bin/tools/scxsslconfig *  

If not, you just need:
monuser ALL=(ALL) NOPASSWD: ALL
Remember that you also need to uncomment this line:
#Defaults !requiretty
so that it reads:
Defaults !requiretty
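
A quick way to confirm the requiretty setting is to grep the sudoers file for an active line. This is only a minimal sketch: `has_no_requiretty` is a hypothetical helper name of mine, and the demo runs against a scratch file rather than the real /etc/sudoers.

```shell
#!/bin/sh
# Sketch: report whether a sudoers file has an active (uncommented)
# "Defaults !requiretty" line, either global or scoped to one user.
# has_no_requiretty is a hypothetical helper, not part of sudo or SCOM.
has_no_requiretty() {
    file=$1
    user=$2
    # Matches "Defaults !requiretty" and "Defaults:<user> !requiretty".
    # Commented lines start with "#", so the ^ anchor rejects them.
    grep -Eq "^[[:space:]]*Defaults(:${user})?[[:space:]]+!requiretty" "$file"
}

# Demo on a scratch file, so the real /etc/sudoers is never touched.
sample=$(mktemp)
printf '%s\n' '#Defaults !requiretty' 'monuser ALL=(ALL) NOPASSWD: ALL' > "$sample"
if has_no_requiretty "$sample" monuser; then
    echo "active !requiretty line found"
else
    echo "no active !requiretty line found (missing or commented out)"
fi
rm -f "$sample"
```

On a real server you would point the function at /etc/sudoers as root. After editing the file, `visudo -c` checks its syntax before you log out.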

Next, make sure your UNIX/Linux Resource Pool servers can open TCP connections to your UNIX/Linux servers on ports 22 (SSH) and 1270 (the agent's WS-Management port).

Another thing you might need to do is regenerate your SCX agent certificate.
Some companies have two different FQDNs for the same server, so it can answer on a different network interface (the management network instead of the service network). If you discover your server by its management FQDN, the certificate must be generated for that same FQDN.
E.g.:
You have server myserver.mydomain.com to discover
The management FQDN is myserver.mymngtdomain.com
You'll be discovering your server by myserver.mymngtdomain.com
So, you need to ensure that the certificate is generated for it.
Log in via SSH to myserver.mymngtdomain.com and run:
openssl x509 -noout -in /etc/opt/microsoft/scx/ssl/scx.pem -subject -issuer -dates  
If it's not the FQDN you want, run :
sudo /opt/microsoft/scx/bin/tools/scxsslconfig -f -h myserver -d mymngtdomain.com
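
If you'd rather not read the subject line by eye, the comparison can be scripted. The sketch below assumes the FQDN shows up as a CN entry in the subject and that openssl prints it in one of its two usual layouts. `subject_has_cn` is a hypothetical helper and the sample subject string is made up for the demo, not captured from a real agent.

```shell
#!/bin/sh
# Sketch: check whether an expected FQDN appears as a CN in the subject line
# printed by "openssl x509 -noout -subject".
# subject_has_cn is a hypothetical helper name.
subject_has_cn() {
    subject=$1
    fqdn=$2
    # Normalise both OpenSSL layouts ("/CN=name" and "CN = name, ...") to one
    # attribute per line, then look for an exact, case-insensitive CN match.
    printf '%s\n' "$subject" |
        sed -e 's/^subject=//' -e 's/ *= */=/g' |
        tr '/,' '\n\n' |
        sed -e 's/^ *//' -e 's/ *$//' |
        grep -Fixq "CN=$fqdn"
}

# Demo with an assumed sample subject. On a real agent, pass in the output of:
#   openssl x509 -noout -subject -in /etc/opt/microsoft/scx/ssl/scx.pem
sample='subject= /DC=com/DC=mydomain/CN=myserver/CN=myserver.mydomain.com'
if subject_has_cn "$sample" myserver.mymngtdomain.com; then
    echo "certificate matches the discovery FQDN"
else
    echo "certificate does not match, regenerate it with scxsslconfig"
fi
```

The match is done on whole CN values, so a certificate issued for the short host name only will not pass for the FQDN.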

The errors you can come across are:

1) WinRM cannot complete the operation.
    Verify that the specified computer name is valid, that the computer is accessible over the network, and that a firewall exception for the WinRM service is enabled and allows access from this computer.
   By default, the WinRM firewall exception for public profiles limits access to remote computers within the same local subnet.
2) Agent verification failed. Error detail: The server certificate on the destination computer (SERVER_FQDN:1270) has the following errors
3) DNS Configuration error:
    The provided hostname SERVER_FQDN resolved to the IP address of x.x.x.x.
    The hostname SERVER_FQDN returned by reverse lookup of the IP address x.x.x.x did not match the provided hostname.
    Verify the DNS configuration and try the request again.
4) sudo: no tty present and no askpass program specified
5) The agent responded to the request but the WSMan connection failed due to: Access is Denied.

# 1) WinRM cannot complete the operation.

Verify that your FQDN is correct;
Verify that you have TCP/IP connectivity to your server on ports 22 and 1270;
You can use this PowerShell script from one of your UNIX/Linux Resource Pool servers to check:
$list = Get-Content -Path 'Path_to_ServerList'
foreach ($server in $list) {
    # The TcpClient constructor throws when the port is unreachable, so report $false in that case
    $SSHStatus  = try { (New-Object System.Net.Sockets.TcpClient($server, 22)).Connected } catch { $false }
    $MNGTStatus = try { (New-Object System.Net.Sockets.TcpClient($server, 1270)).Connected } catch { $false }
    "$server | $SSHStatus | $MNGTStatus"
}
# 2) Agent verification failed. Error detail: The server certificate on the destination computer (SERVER_FQDN:1270) has the following errors

The certificate does not match the FQDN you're discovering the server with.
For example:
You have server myserver.mydomain.com to discover
The management FQDN is myserver.mymngtdomain.com
You'll be discovering your server by myserver.mymngtdomain.com
So, you need to ensure that the certificate is generated for it.
Log in via SSH to myserver.mymngtdomain.com and run:
openssl x509 -noout -in /etc/opt/microsoft/scx/ssl/scx.pem -subject -issuer -dates
If it's not the FQDN you want, run :
sudo /opt/microsoft/scx/bin/tools/scxsslconfig -f -h myserver -d mymngtdomain.com
If you have several UNIX/Linux servers in this condition, use this shell script I've made to correct them:
(I personally use MobaXterm for this kind of thing)
#!/bin/sh
# "list" is a file with one management FQDN per line
for i in $(cat list)
do
    # Short host name = everything before the first dot
    a=$(echo "$i" | cut -d"." -f1)
    ssh "monuser@$i" "sudo /opt/microsoft/scx/bin/tools/scxsslconfig -f -h $a -d mymngtdomain.com"
done
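
The loop above hardcodes mymngtdomain.com as the domain. If your list mixes domains, both the -h and -d values can be derived from each FQDN instead. Here is a minimal sketch of that idea; `build_scxsslconfig_cmd` is a hypothetical helper that only prints the command, so you can review it before running anything over SSH.

```shell
#!/bin/sh
# Sketch: split an FQDN at its first dot and print the matching
# scxsslconfig command line.
# build_scxsslconfig_cmd is a hypothetical helper name.
build_scxsslconfig_cmd() {
    fqdn=$1
    # Refuse names without a dot, since there is no domain part to extract.
    case $fqdn in
        *.*) ;;
        *) echo "not an FQDN: $fqdn" >&2; return 1 ;;
    esac
    host=${fqdn%%.*}     # text before the first dot
    domain=${fqdn#*.}    # text after the first dot
    printf 'sudo /opt/microsoft/scx/bin/tools/scxsslconfig -f -h %s -d %s\n' "$host" "$domain"
}

# prints: sudo /opt/microsoft/scx/bin/tools/scxsslconfig -f -h myserver -d mymngtdomain.com
build_scxsslconfig_cmd myserver.mymngtdomain.com
```

Inside the loop, the printed command could be passed as the remote command to ssh in place of the hardcoded string.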
# 3) DNS Configuration error.

You might need to fix the forward and reverse lookup records for the server on your DNS server.
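
To see the mismatch for yourself, you can do the forward and reverse lookups by hand and compare the names. The sketch below assumes the `dig` tool is installed on the machine you run it from. `names_match` and `check_dns` are hypothetical helper names, and only the comparison runs in the demo because the lookups need a live DNS server.

```shell
#!/bin/sh
# Sketch: compare a hostname with the name returned by a reverse lookup.
# names_match and check_dns are hypothetical helper names.

# Compare two DNS names, ignoring case and a trailing dot.
names_match() {
    a=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    b=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    [ "${a%.}" = "${b%.}" ]
}

# Forward lookup, then reverse lookup of the resulting address.
# Needs network access and dig, so it is defined here but not called.
check_dns() {
    name=$1
    ip=$(dig +short "$name" | tail -n 1)
    [ -n "$ip" ] || { echo "$name does not resolve"; return 1; }
    reverse=$(dig +short -x "$ip" | head -n 1)
    if names_match "$name" "$reverse"; then
        echo "$name <-> $ip OK"
    else
        echo "$name resolves to $ip, but $ip maps back to '${reverse:-nothing}'"
        return 1
    fi
}

# Offline demo of the comparison: dig prints reverse names with a trailing dot.
names_match myserver.mymngtdomain.com MYSERVER.mymngtdomain.com. && echo "names match"
```

Usage on a real network would be something like `check_dns myserver.mymngtdomain.com`.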

# 4) sudo: no tty present and no askpass program specified

Well, this is a sudo problem.
Check that you have either this in /etc/sudoers:
monuser ALL=(ALL) NOPASSWD: ALL
or this:
(root) NOPASSWD: /bin/sh -c cp /tmp/scx-monuser/scx.pem /etc/opt/microsoft/scx/ssl/scx.pem; rm -rf /tmp/scx-monuser; /opt/microsoft/scx/bin/tools/scxadmin -restart
(root) NOPASSWD: /bin/sh -c sh /tmp/scx-monuser/GetOSVersion.sh; EC\=$?; rm -rf /tmp/scx-monuser; exit $EC
(root) NOPASSWD: /bin/sh -c cat /etc/opt/microsoft/scx/ssl/scx.pem
(root) NOPASSWD: /bin/sh -c rpm -e scx
(root) NOPASSWD: /bin/sh -c /bin/rpm -F --force /tmp/scx-monuser/scx-*.rpm; EC\=$?; cd /tmp; rm -rf /tmp/scx-monuser; exit $EC
(root) NOPASSWD: /bin/sh -c /bin/rpm -U --force /tmp/scx-monuser/scx-*.rpm; EC\=$?; cd /tmp; rm -rf /tmp/scx-monuser; exit $EC
(root) NOPASSWD: /opt/microsoft/scx/bin/scxlogfilereader -p
# I've added this so you can re-generate certificates if you need to
(root) NOPASSWD: /opt/microsoft/scx/bin/tools/scxsslconfig *
Also check whether this line is still commented out in /etc/sudoers (it needs to be uncommented):
#Defaults !requiretty
# 5) The agent responded to the request but the WSMan connection failed due to: Access is Denied.

If you get the "The agent responded to the request but the WSMan connection failed due to: Access is Denied." error, first run this from one of your UNIX/Linux Resource Pool servers:
Test-WSMan -Port 1270 -ComputerName "ServerName" -Authentication Basic -Credential (Get-Credential) -UseSSL
If you get an error, you might need to edit your PAM file (/etc/pam.d/omi) so that it looks like this:
omi auth sufficient pam_vas3.so create_homedir get_nonvas_pass store_creds try_first_pass
omi auth requisite pam_vas3.so echo_return
omi auth required /usr/lib/security/pam_aix use_new_state use_first_pass
omi account required /usr/lib/security/pam_seos.o
omi account sufficient pam_vas3.so
omi account requisite pam_vas3.so echo_return
omi account required /usr/lib/security/pam_aix
Re-discover your agent, and you'll get it working!
