Sunday, 31 July 2011

Troubleshooting a ShoreTel IP655

Before I tell you the story of how I troubleshot this device, let me first tell you about the product.

ShoreTel IP655


The ShoreTel IP655 is the latest IP phone produced by ShoreTel, and is designed primarily for hosting conference calls. Some key features this phone has to offer are a Gigabit port, an integrated VPN client, Visual Voice Mail and a 640x480 backlit LCD display. The UI is entirely touch-based for initiating calls, changing the Call Handling mode, and much more. The phone also comes with 12 lines that can be used for calls, speed dials, and other features. Alright, time for the story now.

I installed one of these bad boys at a client's site a few weeks back, after they got fed up with their old IP8000 constantly dying during boardroom meetings. Just when I thought we had resolved the problem, this phone started to act up too. "The screen is black!" was the complaint this time. I figured I could just go there, check the power and leave, but it wasn't so easy. I got there and found the phone looking dead as a doornail, so I went to unplug it and plug it back in. Upon unplugging it, I heard an audible *click!*, meaning there was power!
This time, however, the screen turned on but displayed "No Ethernet". "Okay, time to verify we don't have any bad or disconnected cables!", I thought. I swapped out the power adapter for a new one and grabbed a fresh CAT5 cable to test with. After powering the device back on, the error was still there.

Next, I went to check the port on the access switch. The LED for the port I had patched the phone into last time wasn't lit, so there definitely was a bad connection somewhere in the cable path. I tried a different drop in another office but got the same result. I couldn't confirm whether that drop was activated, so I connected the phone to a port that I knew was servicing a working phone. No Ethernet still!? This is a brand spanking new phone from ShoreTel and it's defective somehow??? The phone was getting power, yet there was a problem with the connection. I verified the cable was seated properly in the port and not loose. As a last effort before requesting an RMA from ShoreTel, I tried pushing the cable a bit further into the phone... WE HAVE LIFE NOW! The phone made contact with the DHCP server to get its IP addressing info. I flipped the cable around, thinking that end of it was bad; the error presented itself again, so I jammed it in further and it worked again!

I looked at the back of the phone this time to see if something was wrong with the LAN port, and everything looked kosher. Looking closer, I saw that one of the pins looked a tad off from the rest. I tried looking at the port from a different angle and BINGO! One of the pins was protruding right out of the port!



It only takes one offset pin to cause the connection to fail. The phone had power because PoE only uses some of the eight pins, not all of them; however, sending and receiving data is a whole different story. Now I'm definitely going to RMA this piece of junk!
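To make the pin reasoning a little more concrete, here is a rough Python sketch that maps each RJ45 pin to what normally rides on it. The pin assignments follow the usual 10/100, 1000BASE-T and 802.3af PoE conventions; the function name is hypothetical, and I never confirmed which pin or which PoE mode was actually in play on this phone, so treat it as an illustration rather than a diagnosis.

```python
# Which Ethernet services depend on each RJ45 pin (standard conventions).
# Illustrative sketch only: the names here are hypothetical, not a vendor tool.

SERVICES = {
    "10/100 data": {1, 2, 3, 6},          # one transmit pair, one receive pair
    "1000BASE-T data": set(range(1, 9)),  # Gigabit needs all four pairs
    "PoE Mode A": {1, 2, 3, 6},           # power shares the 10/100 data pairs
    "PoE Mode B": {4, 5, 7, 8},           # power on the pairs 10/100 leaves spare
}


def services_hit_by_bad_pin(pin):
    """Return the services that lose a conductor when `pin` makes no contact."""
    if pin not in range(1, 9):
        raise ValueError("RJ45 pins are numbered 1 through 8")
    return sorted(name for name, pins in SERVICES.items() if pin in pins)


if __name__ == "__main__":
    for pin in range(1, 9):
        print(pin, services_hit_by_bad_pin(pin))
```

For example, with Mode A power a bad pin 7 would leave the power conductors untouched but still take a conductor away from a Gigabit link.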

Saturday, 25 June 2011

Intro to Wireless Networking



I know it's been a while since my last post. I've been quite busy researching and testing a lot of wireless products, but the good news is I'm going to tell you all about them! Below is a list of the topics I will be blogging about in the posts to come, so stay tuned!
  • Autonomous vs. Lightweight Access Points (LWAPs)
  • 802.11n networking and 2.4 GHz vs. 5 GHz bands
  • Wireless Spectrum Analyzers and troubleshooting tools
  • How to deploy a second Access Point for increased coverage
  • Deploying Cisco Aironet 1140 series Access Points
  • My new Linksys E2500 Advanced Dual-Band N Router
    Let's get started on the first topic: Autonomous vs. LWAPs. I'm going to make a very bold assumption here that 90% of you reading this are at least a little familiar with Autonomous APs (you might not know it yet, but you are). These are typically any consumer wireless router available at your local computer outlet that you administer yourself, such as a Linksys, Netgear, Belkin, or D-Link wireless router.
    Cisco also has a myriad of Autonomous APs (hereafter known as APs), such as the Cisco 800 series for SOHOs and the Cisco Aironet 1100 series for small to mid-sized businesses. Cisco and other vendors also offer these popular Access Points in another flavor known as a Lightweight AP or LWAP. LWAPs are not autonomous and are not locally administered by the Network Administrator; instead, they obtain their wireless settings, such as the SSID, WEP/WPA/WPA2 keys, VLAN, and so on, from a Wireless LAN Controller (WLC).

    LWAPs and WLCs are an attractive solution for Network Administrators who manage dozens if not hundreds of Access Points throughout their company, because they automate deployments and relieve the burden of manually configuring new APs. Every LWAP connected to the network downloads its configuration from the WLAN Controller, which also means device management is centralized at a single point on the network. One thing you other consultants or network admins should know about Cisco Aironet Access Points, particularly the 1100 series, is that they come in both flavors. In my very first post on this blog, I discussed how I was not able to manually configure a brand new Cisco Aironet 1130-AG because we had accidentally ordered an LAP instead of an AP model; this resulted in an hour or so of troubleshooting, and a few minutes of Googling, until we discovered the problem.

    There is some good news though: you're not totally screwed if you accidentally procure the wrong model, because you can easily download and upgrade to the correct firmware for an Autonomous AP. This is how it is done:

    #1 - Backup existing firmware

    First, make a backup of the current IOS firmware on the AP so we can quickly revert to the original in case we brick the device. If you don't already have a TFTP server on your network, download and install a free one from the Internet (I find Solarwinds TFTP Server is a good one to use); otherwise, you can skip that step. Next, make sure you are using a wired connection for this upgrade and avoid performing it over Wi-Fi; Wi-Fi is subject to a lot of signal interference, which puts you in a position to brick the AP, so trust in good ol' wired Ethernet!

    Just before you do this next step, test that you have connectivity between the Access Point and your TFTP server via a PING. Assuming the PING was successful, log into the AP, change to enable mode, and issue the command:

    archive upload-sw tftp://x.x.x.x/filename

    x.x.x.x should be the IP address of your TFTP server, and filename is the name of the IOS image currently loaded on the AP. After the upload completes with no errors, move on to the next step.

    #2 - Upload new firmware

    Download the new IOS firmware and copy it to your TFTP server; this will be under C:\TFTP-Root if using Solarwinds TFTP. All that is left now is to issue the following command from the AP:

    archive download-sw /overwrite tftp://x.x.x.x/filename_of_new_image

    NOTE: Make sure to include the file extension of the IOS image (e.g: .tar) or running this command will produce an error on the console. 

    Finally, reload the Access Point to activate the new IOS and run the show version command to confirm the firmware has been properly upgraded. You could also include the /reload option in the above command, before the tftp path, to reboot the device immediately after the upgrade. Another neat thing to know is that if you omit the /reload option you can perform these upgrades without an outage window, even while clients are associated to the Access Point; nonetheless, you will have to reload it eventually to complete the upgrade.
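If you run this upgrade on more than a handful of APs, it can help to generate the two commands rather than retype them. The following is a minimal Python sketch that only assembles the command strings shown above and checks the easy mistakes first. The helper names and the example file name are hypothetical, and the check assumes the image is a .tar archive, as in the note above.

```python
# Build the two AP upgrade commands from this post and catch the easy mistakes
# before pasting them into a console. Helper names are hypothetical.
import ipaddress


def backup_command(tftp_ip, image_name):
    """Step 1: copy the AP's current image up to the TFTP server."""
    ipaddress.ip_address(tftp_ip)  # raises ValueError on a malformed address
    return f"archive upload-sw tftp://{tftp_ip}/{image_name}"


def upgrade_command(tftp_ip, image_name, reload=False):
    """Step 2: pull the new image down, optionally rebooting afterwards."""
    ipaddress.ip_address(tftp_ip)
    if not image_name.endswith(".tar"):
        # Assumption: the image is a .tar archive; a missing extension
        # makes the command fail on the console.
        raise ValueError("include the image's file extension, e.g. .tar")
    options = "/overwrite /reload" if reload else "/overwrite"
    return f"archive download-sw {options} tftp://{tftp_ip}/{image_name}"


if __name__ == "__main__":
    # "new-image.tar" is a placeholder, not a real Cisco image name.
    print(backup_command("192.168.0.50", "current-image"))
    print(upgrade_command("192.168.0.50", "new-image.tar", reload=True))
```

The functions only return text; nothing is sent to a device, so the output can be reviewed before it goes anywhere near a console session.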

    Alright, I think that is it for now on the AP vs. LWAP discussion; I hope you stick around to read more wireless stuff, because I guarantee that it's going to get deep!

    Saturday, 21 May 2011

    SNMP, NetFlow and OpManager

    For the past week, I have been getting familiar with a new monitoring tool I found while searching the web.
    I was looking for one because one of our clients was complaining about performance issues with their network, and I also wanted to start using Cisco's NetFlow protocol to get better traffic statistics. Although I was most interested in NetFlow, I also wanted to start using the infamous SNMP protocol. After a few hours of searching on Google, I stumbled across a program called OpManager, which gathers stats through SNMP and also has a NetFlow Analyzer plug-in!

    Allow me to briefly explain what the heck SNMP and NetFlow are.

    SNMP

    For those who just started learning about IP networks, devices that support SNMP can report all sorts of cool information about themselves, such as CPU, memory and disk utilization (and that's just some of the cool things). SNMP typically runs over UDP port 161, and consists of a Managed Device (running an SNMP Agent) that reports info to a Network Management System (NMS). The NMS polls the agents and processes the data they return into tables or nice graphical reports. In order for an agent and an NMS to talk, both must be configured with the same keyword, or 'Community String'. The latest version of the SNMP protocol (v3) includes support for user authentication and better protection against sniffing attacks.
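To see why sniffing is a concern with the older versions, it helps to look at what actually goes on the wire. The sketch below hand-encodes an SNMPv2c GetRequest in Python, the kind of message an NMS could send to an agent on UDP port 161. The community string "public" and the sysUpTime OID are assumed example values, and the helper names are mine; a real deployment would use an SNMP library rather than this.

```python
# Hand-encode an SNMPv2c GetRequest to show what goes over UDP port 161.
# Sketch only: the community string and OID below are assumed example values.

def ber_length(n):
    """BER length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def tlv(tag, value):
    """Wrap a value in a BER tag-length-value triple."""
    return bytes([tag]) + ber_length(len(value)) + value


def ber_int(n):
    """Non-negative INTEGER, leaving room for the sign bit."""
    return tlv(0x02, n.to_bytes(n.bit_length() // 8 + 1, "big"))


def ber_oid(dotted):
    """OBJECT IDENTIFIER: first two arcs share a byte, the rest are base 128."""
    arcs = [int(a) for a in dotted.split(".")]
    body = bytearray([40 * arcs[0] + arcs[1]])
    for arc in arcs[2:]:
        chunk = [arc & 0x7F]
        arc >>= 7
        while arc:
            chunk.append((arc & 0x7F) | 0x80)
            arc >>= 7
        body.extend(reversed(chunk))
    return tlv(0x06, bytes(body))


def get_request(community, oid, request_id=1):
    """Build a complete SNMPv2c GetRequest message for a single OID."""
    varbind = tlv(0x30, ber_oid(oid) + tlv(0x05, b""))  # OID plus NULL value
    pdu = tlv(0xA0, ber_int(request_id) + ber_int(0) + ber_int(0)
              + tlv(0x30, varbind))                     # GetRequest PDU
    return tlv(0x30, ber_int(1)                         # version field 1 = v2c
               + tlv(0x04, community.encode("ascii")) + pdu)


if __name__ == "__main__":
    packet = get_request("public", "1.3.6.1.2.1.1.3.0")  # sysUpTime.0
    print(packet.hex())
```

The community string sits in the packet as plain ASCII, so anyone who can capture the traffic can read it.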

    NetFlow

    Cisco defines a flow as a series of packets that share all of the following:

    • Source IP Address
    • Destination IP Address
    • Source TCP or UDP port
    • Destination TCP or UDP port
    • Layer 3 protocol
    • Class of Service
    • Input interface


    Using a tool like the NetFlow Analyzer, you can tell your devices to send these flow stats to a server over UDP (the NetFlow Analyzer listens on port 9996 by default), and then generate some nice graphical reports showing who's using up your bandwidth.
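Under the hood, a "Top Talkers" report is just grouping and summing. The following minimal Python sketch groups packets into flows using the seven key fields listed above and then ranks source hosts by bytes sent. The field names and sample packets are made up for illustration, and this is not how OpManager or any Cisco device actually implements it.

```python
# Group packets into flows using the seven key fields listed above, then rank
# source hosts by bytes sent. The sample packets are made-up illustration data.
from collections import Counter
from typing import NamedTuple


class FlowKey(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int   # IP protocol number, e.g. 6 = TCP, 17 = UDP
    tos: int        # class of service
    input_if: str


def aggregate(packets):
    """Sum bytes per flow; packets are (FlowKey, byte_count) pairs."""
    flows = Counter()
    for key, size in packets:
        flows[key] += size
    return flows


def top_talkers(flows, n=3):
    """Rank source IPs by total bytes across all of their flows."""
    per_host = Counter()
    for key, total in flows.items():
        per_host[key.src_ip] += total
    return per_host.most_common(n)


if __name__ == "__main__":
    web = FlowKey("10.0.0.5", "10.0.0.9", 51000, 80, 6, 0, "Fa0")
    dns = FlowKey("10.0.0.7", "10.0.0.9", 51001, 53, 17, 0, "Fa0")
    flows = aggregate([(web, 1500), (web, 500), (dns, 100)])
    print(top_talkers(flows))
```

Two packets that agree on all seven fields land in the same flow; change any one field (a different source port, say) and a new flow record starts.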

    Instead of explaining all this, how about I just show you what I'm talking about








    The first two screenshots are from SNMP monitoring and the rest are NetFlow stats. The NetFlow graphs show the Top Talkers on the network: the 10.x.x.x addresses are the internal source and destination IP addresses, shown alongside the applications they are using.

    If you would like to start monitoring your network too, you can download the Free Edition of OpManager if you'll just be monitoring 10 devices, or you can get a trial/licensed version to use in your company. Download here http://www.manageengine.com/network-monitoring/download.html


    Configuring SNMP on an ASA:
    snmp-server host inside 192.168.0.50
    snmp-server community secretpasswordhere
    To pin a specific SNMP version, add it to the host line: snmp-server host inside 192.168.0.50 version 2c


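    Configuring NetFlow on an ASA device:
    flow-export destination inside server_ip 9996
    access-list acl_name extended permit ip any any
    class-map class_name
     match access-list acl_name
    policy-map policy_name
     class class_name
      flow-export event-type all destination server_ip
    The policy map only takes effect once it is applied with a service-policy (or the class is added to the policy map that is already applied globally).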

    Configuring NetFlow on IOS devices:
    Go under the interface you want to monitor:
    int FastEthernet0
     ip route-cache flow
    Then, back in global configuration mode:
    ip flow-export destination ipaddress 9996

    Wednesday, 27 April 2011

    Network Blunders: Sabotage

    It's late, so I'm going to try to get down as much as I can remember about the day it happened.

    Last Thursday, before the long Easter weekend, I was pre-configuring the ShoreGear switches for our client's offices so they could be shipped to their remote locations in the U.K. and be Plug-N-Play when they go live.
    After I had completely set up the switches (with the exception of one that needed an RMA), I was on my way home to begin my 3-day weekend, until I got a call from my boss.
    "Are you doing anything on XYZ client's network right now?" I told him no, as I had been at our other client's site all afternoon. "Well, all the phones just did a reboot," he said. I told him to call me back if they went down again. Before I got home, I did a small bit of shopping and had not received any call back from my boss. Just to be on the safe side, I gave him a follow-up call, and not only were the phones still down, but the entire network was down now too! This is bad, very bad, I thought, and I had a feeling my 3-day weekend was going to have to wait; so I went back on-site to see what was going on. Part of my reasoning for going back in was that I had been involved in a change the night before to set up a new VLAN for a sister company that was utilizing the client's existing infrastructure.

    My boss lets me in the door, and I see he's already logged into the HP 5406 switch checking the logs and configuration. After a good hour or so of dead-end troubleshooting, these were our findings:


    • Access to the Net - FAILED
    • PINGs to inside IP on WAN router from Desktop VLAN - FAILED
    • PINGs to outside interface from Internet (tested using 3G connection) - PASS
    • PING from Core Switch management VLAN IP to IPs assigned to VLAN interfaces - PASS
    • PING from core switch to IP phones - FAILED
    • PING from any VLAN to any other VLAN - FAILED


    So, basically a PING from outside the LAN passes, but communication between VLANs fails. This really started to make us believe there is an Inter-VLAN routing issue, and I started to think it was caused by the changes that I made the night before; but let's recap here, what happened between last night and today?


    1. 9PM, night before: added the new VLAN, tagged fiber ports and untagged edge ports for users
    2. Next day: network humming smoothly for over 20 hours with no hiccups
    3. 4:30PM: network goes completely KAPUT and stays down


    What changed in that time? Why did everything just cease to function at the end of the day? Was it a possible reboot of the switch? If it was, I did a write mem the night before, so the config would have been maintained in NVRAM. Another hour passed by, and some of the actual IT guys who work for this client showed up and tried to help, but truthfully we all just scratched our heads together. Eventually, I saw a ShoreTel IP phone on a user's desk displaying their name, with the speaker light lit. It appeared to be working, until I took it off-hook, heard no dial tone, and the LCD displayed "No service for 10.x.x.x"; all the phones did this every minute, even if you didn't take one off-hook. Then the little gears started to turn in my head.

    The Desktop VLAN, where workstations and phones reside, is different from the VLAN used for the phone system, so how did the phone get registered with the user's name if the VLAN communication was broken? We took a different approach now: we assumed the configuration had not changed on the core switch and started to look for a flapping fiber or other type of connection that would cause intermittent connectivity, dropping the phones every so often. The four of us began brainstorming possibilities: "bad fiber link", "faulty switch", "bad UPS", "loop", "DHCP overlap with Linksys device", etc. My boss and I headed down to one of the floors to physically inspect a switch in the access layer. When we got there, we knew something was definitely wrong: the switchports for users were not blinking... every single port was pegged and solid green!

    Without a second thought, we isolated the floor from the rest of the LAN by unplugging the Gigabit fiber links; we now saw IP phones on the other floors registering and giving dial tone. That told us the root cause was somewhere on this floor, so we started physically checking on top of and under each desk for unsupported network hardware (hub, switch from BestBuy, etc.) and any connections that could be causing a switching loop. We put in as much effort as we could, but then figured it would be easier to go back to the riser room and unplug each port until the link LEDs went back to normal. In one of the 5 modules (modules A-E) on the switch, Craig pulled the lucky cable and the LEDs started to blink again; we traced the cable back to the patch panel and recorded the drop (or wall jack) number. To our amazement, all the phones on the floor were alive again, except for one, which was looped into two jacks on the wall.

    Basic Network and VoIP background info

    IP phones, whether Cisco, ShoreTel or Avaya, have a mini 2-port network switch built into them.
    One port goes to the network, and the other connects to the LAN port on your PC, so the phone gives it access to the network. If you instead plug both the network port and the PC port on the back of the phone into wall jacks, you create a loop: broadcast frames get forwarded endlessly and keep the MAC address tables in the switches from ever settling. This is exactly what happened. Root cause: a network loop caused a broadcast storm through the entire network.

    Some of you might be thinking, "But you had VLANs; I thought VLANs were supposed to eliminate these broadcast storms?" Well, that is not so when you route VLANs in the core and span the same VLAN across ALL your switches. Another very valid point that also crossed my mind: "WHAT ABOUT SPANNING-TREE? WASN'T THAT ENABLED?" The short answer is "No, it wasn't". If you want the long answer, it was left disabled intentionally; it came down to a design decision (not mine) to disable it on all of our clients' switches because it has been known to interfere with the phone system.
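If you want to convince yourself how a single looped phone can keep a broadcast alive with no Spanning Tree, the toy Python model below counts the copies of one broadcast frame in flight, hop by hop. It assumes every switch (including the phone's mini switch) simply floods a broadcast out of every port except the one it arrived on. The switch names and topology are hypothetical stand-ins, not the client's real layout.

```python
# Count broadcast copies in flight when switches flood with no Spanning Tree.
# Toy model with hypothetical switch names; one time step = one hop.

def frames_in_flight(links, origin, steps):
    """Return the number of frame copies on the wire after each step.

    links: list of (switch_a, switch_b) cables. A second cable between the
    same two devices is listed twice, which is what a looped phone creates.
    """
    # Each cable end is a port; flooding sends a copy out every other port.
    ports = {}
    for cable_id, (a, b) in enumerate(links):
        ports.setdefault(a, []).append((cable_id, b))
        ports.setdefault(b, []).append((cable_id, a))

    # The origin sends its broadcast out of every port it has.
    in_flight = list(ports[origin])
    history = [len(in_flight)]
    for _ in range(steps):
        next_hop = []
        for arrived_on, switch in in_flight:
            for cable_id, peer in ports[switch]:
                if cable_id != arrived_on:  # never back out the ingress port
                    next_hop.append((cable_id, peer))
        in_flight = next_hop
        history.append(len(in_flight))
    return history


if __name__ == "__main__":
    tree = [("core", "floor1"), ("core", "floor2")]
    looped = [("core", "floor1"), ("floor1", "phone"), ("floor1", "phone")]
    print("no loop:", frames_in_flight(tree, "floor1", 3))
    print("looped phone:", frames_in_flight(looped, "core", 5))
```

Without the loop, the frame reaches the last switch and dies out. With the phone cabled into two jacks, copies keep circulating and the uplink gets fresh duplicates on every lap. Real hosts keep sending new broadcasts on top of that, so the load only piles up.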


    But that's not the worst of it: this is apparently not the first time something like this has happened for the same client, at the same time of day. When the night was over (we were there from 8PM-12AM) and we had resolved the problem, my boss decided to rule this incident as sabotage and notified the IT staff so they could launch an investigation.

    Looking ahead, I did some research on how we can prevent this from happening again and stumbled across a feature in newer HP switches called Loop-detection, which can be used independently of Spanning-Tree; we will be looking into this as a solution in the near future.

    Tuesday, 26 April 2011

    I've been slacking!

    Sorry all, I've been swamped with things for weeks! I promise as soon as I have a day to just do nothing but stare at a wall (or screen) all day I will post all the news I have backlogged!

    Some things to expect:

    • Results from the graveyard shift I did a while back
    • Huge catastrophic network outage before Easter weekend
    • Integrating ShoreTel with Microsoft's Office Communications Server (OCS) 2007
    • ShoreTel IP8000 conference phone (SIP)
    • Configuring an HP switch for VLANs and "VLAN tagging"
    • Writing a VLAN Access Control List (VACL) on an HP switch - not as easy as ACLs in Cisco!
    • DHCP Relay - How to serve DHCP requests for multiple VLANs from a single server
    • Example configs of a small subnet I recently setup with a complex VACL!

    Hopefully I will have the majority of this written by the weekend. Stay tuned!

    Thursday, 7 April 2011

    Network Blunders: ACL flub

    Alright, this was the plan:
    1. Translate connections outside on TCP port 3392 to port 3389 (RDP) on one of the inside hosts
    2. Configure Access List on the WAN interface to only allow this connection from our office
    Seemed simple enough; I did something similar a week back for a different client and had no issues. But this time... I broke something.

    I went to paste in the Access List Entries (ACEs) I had prepared in Notepad. Each entry contained "line" followed by a number, but I realized the "line" keyword was unrecognized on this device. So I figured that since I couldn't use it, I would have to modify the ACL the old-fashioned way: remove the whole ACL with the "no access-list" command and paste in my new ACL without the 'line' keyword. This did not go as planned at all. When I pasted the ACL into the config terminal, it stopped right at the beginning. I thought "ooh...fudge." as my PuTTY session just hung there and I could no longer PING or access the router.

    I knew exactly what had happened: the ACL was still applied to the WAN interface, so the router immediately started denying any traffic that did not match the few lines I had managed to paste. I tried not to look panicked, but I was freaking out inside. I told my manager that I had lost connection with the site, and he gave me the O.K. to run like the wind and go onsite to fix the mess I caused. I arrived at the site relieved to find that nobody had noticed the impact; they could still access resources on the LAN and the Internet. However, anyone connected remotely by VPN or Terminal Services definitely noticed it. So even though the impact was minimal, we still had to restore the router to its original configuration from before the changes, which meant rebooting it and causing a temporary outage.
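The lockout comes down to two ACL rules of thumb: entries are checked top-down with the first match winning, and anything that matches nothing hits the implicit deny at the end. The following simplified Python sketch shows that logic. It matches on source network only, and the addresses are documentation ranges standing in for the real ones, so it illustrates the behaviour rather than replaying the actual ACL.

```python
# First-match ACL evaluation with the implicit deny at the end.
# Simplified sketch: entries match on source network only, and the addresses
# are documentation ranges standing in for the real ones.
import ipaddress


def permitted(acl, src_ip):
    """Return True if the first matching entry permits src_ip."""
    addr = ipaddress.ip_address(src_ip)
    for action, network in acl:
        if addr in ipaddress.ip_network(network):
            return action == "permit"
    return False  # nothing matched: the implicit "deny any" applies


FULL_ACL = [
    ("permit", "198.51.100.0/24"),  # hypothetical: some other permitted range
    ("permit", "203.0.113.0/24"),   # hypothetical: the office I was working from
]

# The paste stalled after the first entry while the ACL was still applied.
PARTIAL_ACL = FULL_ACL[:1]

if __name__ == "__main__":
    print("full ACL, my session:", permitted(FULL_ACL, "203.0.113.10"))
    print("partial ACL, my session:", permitted(PARTIAL_ACL, "203.0.113.10"))
```

With only the first entry in place, the management session no longer matches anything and falls through to the implicit deny, which is why the rest of the paste never arrived.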

    We were provided an outage window in the afternoon to re-do the change onsite, while everybody was on lunch. Before then, I spoke to one of our senior consultants and he told me what I did wrong. Apparently the IOS does support the line numbering but not the same way as on the ASAs.

    On the Cisco IOS software, the entries look like this

    1 permit 192.168.1.0, wildcard bits 0.0.0.255
    2 permit 192.168.2.0, wildcard bits 0.0.0.255
    3 permit 192.168.3.0, wildcard bits 0.0.0.255
    4 permit 192.168.4.0, wildcard bits 0.0.0.255

    On the Cisco ASA software it looks like this:

    access-list mylist line 1 extended permit tcp any any eq http
    access-list mylist line 2 extended permit tcp any any eq ftp
    access-list mylist line 3 extended permit tcp any any eq telnet

    Using this newly acquired knowledge, I performed another ACL change for a client and they remained up and running smoothly.

    Saturday, 26 March 2011

    Graveyard shift

    I thought I would never have to speak those words again, but it turns out I'm involved with a network change tonight at 11PM. Our client has been having performance issues ever since they switched ISPs, so they want me on site to troubleshoot after hours. I don't really have many tools at my disposal, but I'm going to be thorough and hopefully spot the problem and fix it. That is, if I can get inside the building tonight!

    Getting a Cisco Adaptive Security Appliance!



    My coworker and I have been bugging our manager about procuring a Cisco ASA to play with and get familiar with, so we can support our clients that have them installed. A couple of days ago, he replied to us saying they had ordered a Cisco ASA 5505, and we should see it in the next couple of days. I can't wait to console into it!

    Also, I found some really helpful Quick Learning Modules for the ASA 5505 on Cisco.com if anyone else needs training on them.

    First client Demo

    A couple of weeks back, I was part of a demo or sales pitch for a potential client who was considering installing a ShoreTel system. This client used to be a very large Oil & Gas company but has since downsized to fewer than 50 people, one of whom was the head IT guy we were presenting to. Not being knowledgeable enough with the ShoreTel system to run the demo by myself, I mostly kept quiet and just listened to Ben (my Project Manager) pitch the product. When it was time for Q&A, the IT guy, also named Ben, had a lot of tough questions for us; one that had us stumped was about support for Secure Sockets Layer (SSL) when logging into the Administration interface. A very reasonable question, but with me being the only technical guy, and not having the answer, we had to do some research before we could get back to him; and I was the one to find out.

    After spending an afternoon on Google, I wasn't finding any clear answers. I was even starting to lose faith in the product, thinking maybe it did not have the support, when even Cisco's CUCM has HTTPS on by default!
    Eventually, I started seeing the bigger picture: unlike CUCM, which is an all-in-one appliance, ShoreTel requires you to install Microsoft IIS before installing the server. I did some more Googling and found a document on how to enable SSL on the IIS server. After installing a Certificate Authority and certificates, and reconfiguring the website in IIS to require SSL with 128-bit encryption, I tested it out and could confidently reply: yes, ShoreTel will support encrypted SSL sessions.

    What a week that was!

    Saturday, 12 March 2011

    ShoreTel and XIRRUS

    Well, the first week has gone by and I already feel exhausted, but I'm starting to get the hang of things at the new job. We had a speaker come in and demo a wireless product called XIRRUS for us, which looks rather interesting and is said to do Wi-Fi differently than anyone else in the industry. A unit costs about $5,000 USD, but hopefully we get to start playing with an AP soon. Anyways, the past few days I have been getting trained on the ShoreTel phone system, as we do a lot of installations and projects around it and are a certified reseller for the company. Because I have experience with VoIP, primarily in the Cisco world, I'm able to grasp this other vendor's product with ease. ShoreTel's IP telephony solution is very similar to Cisco's, but I believe they do things a little differently, if not more simply, when it comes to administration. The Project Manager who is helping train me says we have approximately 8 ShoreTel projects/installations coming down the pipe, and a large deployment just around the bend. I can't wait!

    Mid-week, I was shown the ShoreTel server installation process, then finally got to configure the phones on my own and test them out. The image below shows a ShoreTel IP Phone (aka ShorePhone) 265 with a ShoreTel BB24 24-line button box, and the second one shows some other phones I also set up.










    Monday, 7 March 2011

    Day 1 - Orientation??

    Today, after returning from a nice chilly week in Montreal, I finally started my new job as an IT Consultant.

    We all know how the first day or two of orientation goes: you meet the rest of your colleagues, get set up with payroll, and of course, configure a Cisco Aironet 1141 Access Point from scratch for a client! This was my first time ever touching one of these devices I had read so much about in my previous job, but hey, it beats the hell out of doing password resets for end users, so I'm not complaining. Anyways, I plugged in the power and the console cable and brought up a terminal to configure it. I've got the enable prompt now, time to run a 'config t'. Expecting a config prompt, I instead received "% Invalid input detected at '^' marker." I checked and double-checked that I had typed the command correctly, but still no luck. Being the new guy, I figured I must be doing something wrong, so I resorted to consulting my boss and other members of the team; they appeared to be just as stumped as I was.

    OKAY, time to hit up my good friend Google for some assistance, and we're in luck! The first result pointed out that the IOS image preloaded on my particular part # was for a Controller-based AP and not a Standalone one. Luckily, I had read enough about these devices in my previous job to know the difference: with Controller-based (aka Lightweight) Access Points, all the configuration and management is done on a Wireless LAN Controller and downloaded to the AP; you CANNOT issue configuration commands directly, like you would on any other Cisco IOS device. Alright, so the solution seems simple enough: procure a Standalone version of the AP. But wait, even better, apparently we can just download a Standalone image and load it onto the AP we already have! :)

    While I discovered all this info by myself (with a little help from Google), one of our senior techs in another city had just figured it out too. Once we got the right image loaded onto the device, we could enter configuration mode and dump the config onto it. I knew victory was mine once I got one of our test laptops associated with the SSID; although the DHCP config was missing, mission accomplished. So this is the first entry in the journal I will be keeping on my adventures at this new position. I hope it has been both helpful and entertaining to my fellow Cisco guys out there.

    If any of you run into this problem, this is a great resource to steer you in the right direction