Disclaimer: The opinions expressed in this blog are my own views and not those of Cisco.
Dear readers,
Yesterday I reached an important milestone in my life and I wanted to share my story with all of you.
After graduating from high school, with very little education other than my self-taught knowledge about computers, I found my very first job. I took out trash, mopped floors, and cleaned toilets for minimum wage in a department store. Dissatisfied with this career path, I quit a year later. A couple of years passed and I found myself with no job, bumming around the house, and goofing off on the Internet from late night until early morning. Eventually, I decided to do something with my life and went back to school to further my education.
In February of 2007, I enrolled in a six-month program at a local post secondary institution to certify as a Network Technician. Midway through the program, I started this class called "Internetworking" where the topic was this thing called "Cisco". At the time, I had no idea who or what Cisco was, and instantly thought about a dude with platinum hair on the beach singing “The Thong Song”.
After my first few Cisco classes, my eyes widened and I was intrigued to learn more. I learned that this "Cisco" thing is partly responsible for creating the Internet we have today (NOTE: I'm not talking about the ARPANET). I thought to myself, "This is cool!" and I started to invest hours into studying by staying after class and traveling to campus on weekends to do labs. I honestly didn't care about passing the course or getting good grades; it was just fun and interesting to me. Before I graduated from the Cisco Networking Academy, I challenged my first Cisco certification (CCNA). It was one of the hardest exams I’ve ever taken, and to my surprise, I passed it on the first try.
Immediately after graduating from the program with honors, I was hired on a contract by IBM as a Cisco VoIP specialist, where my passion for anything to do with Cisco grew even more. When I wasn't working, I read thick textbooks published by Cisco Press and studied day and night for more Cisco exams. By 2010, I had set up my own "Cisco lab" at home, passed 5 more Cisco exams, and earned an intermediate-level Cisco certification (CCVP). After my contract expired, I did a couple more network admin jobs that involved administering Cisco networks. A few years later, I was out of work for a while just before the economic downturn in oil prices.
Months went by and I was struggling to find employment. I'm not a kid anymore, so now I have responsibilities and bills to worry about paying. I applied at several companies and went to a dozen or more interviews. Then, last February, as I was about to go to yet another interview for a short-term contract, I decided to meet with an acquaintance for lunch. We had met through a video game we played together on our Android smartphones and had hardly interacted outside of the game before. I also didn't really know what he did for work, but I knew it had something to do with writing software. My intentions for meeting with him were just to hang out and have a bite to eat before my interview. After I told him how unsuccessful I had been with job searching, he told me his workplace was hiring and they were looking for somebody experienced with networks and computer security, like me. "Meh, what the hell, I'll apply", I thought.
Showing a very mild interest, I asked him "Where do you work?", and he replied, "Cisco".
After a month, I had an interview; then, after a couple of weeks of waiting impatiently, ready to lose all hope of finding a job, I received a formal offer of employment with the company. I can't describe the joy I felt; I was now officially an employee of one of the largest tech companies on the planet, a company I've admired and invested my career in, and without which I would probably still be bumming around at home with no ambitions.
Yesterday, I celebrated my first year of service as a full-time employee at Cisco, and for the first time in my career, I could see myself working at the same company until I reach retirement.
As I reflect back to my teenage years, I feel a strong sense of pride in myself, where my career started, and all of my accomplishments since. I hope this post inspires other people, young and old, to never give up and to always follow their dreams.
Yours Truly,
Kevin - Proud Cisco Employee
P.S:
One person I can't forget to thank for making this happen is Jonathan, the guy who met me for lunch. If you're reading this, I hope you can now understand how eternally grateful I am to you.
Diary of a Cisco guy
Thursday, 14 April 2016
Wednesday, 12 September 2012
My CCNP SWITCH Story
Today, I took the 642-813 Exam, better known as the CCNP SWITCH Exam, for the first time and passed. For those of you not aware, this exam is one of the three qualifying exams to become a Cisco Certified Network Professional (CCNP®). The exam consisted of a little under 50 questions and I passed with 923 points. I was totally shocked because in my previous mock tests I couldn't get above 78% and they were a hell of a lot easier than the real thing!
I started studying the material on the first day of June and my goal was to do the exam by September. I took a break for a week or two to finish an assignment for a business course I'm currently doing which made me go a little past the preparation deadline I set. Actually, you could say I started preparing earlier in the Fall of 2010 but I only read up until the High Availability topics back then, so I had to start from the beginning to refresh my memory. So, the total time I took to study was roughly 4 months.
Below is a breakdown of the primary study material I used to prepare myself:
Books:
Other materials:
- Chris Bryant Advantage CCNP video bundle (very affordable compared to CBT Nuggets!!)
- CCNP SWITCH 642-813 Cert Kit: Video, Flash Card, and Quick Reference Preparation Package
- Cisco Learning Labs for CCNP SWITCH 25-hour 90-day labs
- 2x Cisco Catalyst 3550 Switches
I knew from the get-go that the new CCNP track that Cisco introduced in 2010 was going to be more challenging than the previous CCNP exams and I needed to be more than just a CLI wizard. I needed to know all of the technologies inside and out and master every topic covered in the exam objectives. I feel I should warn those experienced or inexperienced network admins who may feel the need to rely on Pass4Sure to familiarize themselves with the questions. It's a dangerous gamble, and they really should just change the name, because using it will NOT help in this exam; you might get away with a few questions but you won't be able to pass those lovely simulation questions, and let me tell you, there are a lot this time! I was surprised to be greeted by over 5 simulation questions during my exam. I had one on my second question, followed by some multiple choice, and the pattern continued all the way to near the end. With 30 mins left on the clock, I was not far away from the last 10 questions and was greeted with yet another sim! I was ready to throw in the towel; that was just way too many sims!
Before I took the exam, I already knew what the major topics would be: First-Hop Redundancy Protocols (HSRP, VRRP, GLBP) and Switch Security (DHCP Snooping, DAI, Port Security, etc.), and I was right. I didn't get very many Voice questions, I don't remember a single question about wireless technologies, and I had none about SNMP or IP SLA. The first sim I got was a nightmare of six separate questions that were all part of the same scenario involving HSRP. Now, I just have the ROUTE and TSHOOT exams to complete to be certified, but I think a small break from four grueling months of studying is in order.
For everybody's convenience, I took the liberty of writing notes on a little device called the DigiMemo which allowed me to export the pages to PDF documents. You can download them all from my DropBox (I apologize for any spelling mistakes or words that don't come out clearly as I didn't go back and check every page for errors). Also, for practice I created flash cards of my own on FlashcardExchange.com to quiz myself while I learned the material. You can view those here.
I hope you enjoyed reading about my SWITCH Story and that it's helped some CCNP candidates out there. Good luck!!
Sunday, 31 July 2011
Troubleshooting a ShoreTel IP655
Before I tell you the story of how I troubleshot this device, let me first tell you about the product.
ShoreTel IP655
The ShoreTel IP655 is the latest IP Phone produced by ShoreTel, and is typically designed for hosting conference calls. Some key features this phone has to offer are a Gigabit port, an Integrated VPN Client, Visual Voice Mail and a 640x480 backlit LCD display. The UI is entirely touch-based for initiating calls, changing the Call Handling mode, and much more. The phone also comes with 12 lines that can be used for calls, speed dials, and other features. Alright, time for the story now.
I installed one of these bad boys at a client's site a few weeks back, after they got fed up with their old IP8000 constantly dying during boardroom meetings. Just when I thought we had resolved the problem, this phone started to act up too. "The screen is black!" was the complaint this time; I figured I could just go there, check the power and leave, but it wasn't so easy. I got there to find the phone looking dead as a doornail, so I went to unplug it and plug it back in. Upon unplugging it, I heard an audible *click!* sound, meaning there was power!
This time however, the screen turned on but displayed "No Ethernet". "Okay, time to verify we don't have any bad or disconnected cables!", I thought. I swapped out the power adapter with a new one and a fresh CAT5 to test with. After powering back on the device, the error was still there.
Next, I went to check the port on the Access Switch. The port LED for where I installed it last time wasn't lit, so there definitely was a bad connection somewhere in the cable path. Now, I thought to try a different drop in another office but had the same result. I couldn't confirm whether that drop was activated, so I connected the phone to a port that I knew was servicing a working phone. No Ethernet still!? This is a brand spanking new phone from ShoreTel and it's defective somehow??? The phone is getting power, yet there is a problem with the connection. I verify the cable is seated properly in the port and not loose. As a last effort before requesting an RMA from ShoreTel, I try to push the cable a bit further into the phone... WE HAVE LIFE NOW! The phone makes contact with the DHCP server to get its IP addressing info. I switch the cable end for end, thinking that end of the cable was bad; the error presented itself again, so I jammed it in further and it worked again!
I look at the back of the phone this time to see if something is wrong with the LAN port and everything looks kosher. I look closer now and see one of the pins looks a tad off from the rest. Now, I try looking at the port from a different angle and BINGO! One of the pins is protruding right out of the port!
It only takes one offset pin to cause the connection to fail. The phone had power because PoE only uses two of the four wire pairs and not all eight pins; however, sending and receiving data is a whole different story. Now I'm definitely going to RMA this piece of junk!
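To make the pin reasoning concrete, here is a minimal Python sketch of which RJ-45 pins carry data and which carry PoE. The pin sets follow the standard 802.3 assignments; the function name, the defaults (a 10/100 link powered in PoE Mode B), and the choice of bad pin are assumptions for illustration, since the post doesn't say which pin was bent or which PoE mode the switch used.

```python
# RJ-45 pins are numbered 1-8.
DATA_PINS = {
    "10/100BASE-TX": {1, 2, 3, 6},            # one transmit pair, one receive pair
    "1000BASE-T": {1, 2, 3, 4, 5, 6, 7, 8},   # all four pairs carry data
}
POE_PINS = {
    "Mode A": {1, 2, 3, 6},  # power rides on the 10/100 data pairs
    "Mode B": {4, 5, 7, 8},  # power rides on the pairs 10/100 leaves spare
}


def impact_of_bad_pin(pin, link="10/100BASE-TX", poe_mode="Mode B"):
    """Report whether one non-contacting pin breaks data, power, or both.

    The defaults are assumptions for illustration, not facts from the story.
    """
    if pin not in range(1, 9):
        raise ValueError("RJ-45 pins are numbered 1 through 8")
    return {
        "data_affected": pin in DATA_PINS[link],
        "power_affected": pin in POE_PINS[poe_mode],
    }


if __name__ == "__main__":
    # A bent pin 3 on a 10/100 link fed by Mode B PoE: power stays up, data dies.
    print(impact_of_bad_pin(3))
```

Under those assumptions a single bad data pin gives exactly the symptom described: a phone that powers on but reports "No Ethernet".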
Saturday, 25 June 2011
Intro to Wireless Networking
I know it's been a while since my last post, but I've been quite busy researching and testing a lot of wireless products; but the good news is I'm going to tell you all about them! Below is a list of the topics I will be blogging about in the posts to come, so stay tuned!
- Autonomous vs. Lightweight Access Points (LWAPs)
- 802.11n Networking and 2.4 GHz vs 5 GHz bands
- Wireless Spectrum Analyzers and troubleshooting tools
- How to deploy a second Access Point for increased coverage
- Deploying Cisco Aironet 1140 series Access Points
- My new Linksys E2500 Advanced Dual-Band N Router
Let's get started on the first topic: Autonomous vs. LWAPs. I'm going to make a very bold assumption here that 90% of you reading this are at least a little familiar with Autonomous APs (you might not know it yet, but you are); these are typically any consumer wireless router available at your local computer outlet that you administer yourself, such as a Linksys, Netgear, Belkin, or D-Link wireless router.
Cisco also has a myriad of Autonomous APs (hereafter known as APs) such as the Cisco 800 series, for SOHOs; and the Cisco Aironet 1100 series, for small to mid-sized businesses. Cisco and other vendors also offer these popular Access Points in another flavor known as a Lightweight AP or LWAP. LWAPs are not autonomous and are not locally administered by the Network Administrator; instead, they obtain wireless network related settings such as their Wireless SSID, WEP/WPA/WPA2 keys, VLAN, and other settings from a Wireless LAN Controller (WLC).
LWAPs and WLCs are an attractive solution for Network Administrators who manage dozens if not hundreds of Access Points throughout their company, because they automate deployments and relieve the burden of manually configuring new APs. Every LWAP connected to the network downloads its unique configuration from the WLAN Controller; this also means device management is centralized to a single point on the network. One thing you other consultants or network admins should know about Cisco Aironet Access Points, particularly the 1100 series, is that they come in both flavors. If you read my very first post in this blog, I discussed how I was not able to manually configure a brand new Cisco Aironet 1130-AG, because we accidentally ordered an LAP instead of an AP model; this resulted in an hour or so of troubleshooting, and a few minutes of Googling until we discovered the problem.
There is some good news to this though; you're not totally screwed if you accidentally procure the wrong model, you can easily download and upgrade to the correct firmware for an Autonomous AP. This is how it is done:
#1 - Backup existing firmware
First, make a backup of the current IOS firmware on the AP so we can quickly revert to the original firmware in case we brick the device. If you don't already have a TFTP server installed on your network, download and install a free one from the Internet (I find Solarwinds TFTP Server is a good one to use); otherwise, you can skip that step. Next, make sure you are using a wired connection for this upgrade and avoid performing the upgrade over Wi-Fi; Wi-Fi is subject to a lot of signal interference, which puts you in a position to brick the AP, so trust in good ol' wired Ethernet!
Just before you do this next step, test that you have connectivity between the Access Point and your TFTP Server via a PING. Now, assuming the PING was successful, log into the AP, change to enable mode, and issue the command:
archive upload-sw tftp://x.x.x.x/filename
x.x.x.x should be the IP address of your TFTP Server, and filename is the name of the IOS image currently loaded on the AP. After the upload completes with no errors, move on to the next step.
#2 - Upload new firmware
Download and copy the new IOS firmware to your TFTP Server; this will be under C:\TFTP-Root if using Solarwinds TFTP. All that is left now is to issue the following command from the AP:
archive download-sw /overwrite tftp://x.x.x.x/filename_of_new_image
NOTE: Make sure to include the file extension of the IOS image (e.g: .tar) or running this command will produce an error on the console.
Finally, reload the Access Point to activate the new IOS and run the show version command to confirm the firmware has been properly upgraded. You could also include the /reload option in the above command, before the tftp path, to immediately reboot the device after the upgrade. Another neat thing to know is that you can perform these upgrades without an outage window (if you omit the /reload option), even while clients are associated to the Access Point; nonetheless, you will have to reload it eventually to complete the upgrade.
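If you do this on more than one AP, it helps to build the command string in a script so the easy mistakes (a mistyped IP, a forgotten file extension) get caught before you paste anything into the console. The Python sketch below only assembles the command text shown above; the function name is made up for illustration, the image name is a placeholder, and the check for a .tar extension assumes your image is a tar archive like the example in the note.

```python
import ipaddress


def build_download_command(tftp_ip, image, overwrite=True, reload=False):
    """Return the 'archive download-sw' line to paste into the AP console.

    Hypothetical helper: it only formats text and never contacts the AP.
    """
    ipaddress.ip_address(tftp_ip)  # raises ValueError on a malformed address
    if not image.lower().endswith(".tar"):
        # Assumption: the image is a .tar archive, as in the note above.
        raise ValueError("include the file extension of the image, e.g. .tar")
    options = []
    if overwrite:
        options.append("/overwrite")
    if reload:
        options.append("/reload")  # reboot as soon as the upgrade finishes
    return " ".join(["archive download-sw", *options, f"tftp://{tftp_ip}/{image}"])


if __name__ == "__main__":
    # 192.0.2.10 and example-image.tar are placeholders, not real values.
    print(build_download_command("192.0.2.10", "example-image.tar"))
    print(build_download_command("192.0.2.10", "example-image.tar", reload=True))
```

Leaving reload at its default of False matches the no-outage approach: the image is staged now and the AP is reloaded later.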
Alright, I think that is it for now on the AP vs. LWAP discussion; I hope you stick around to read more wireless stuff, because I guarantee that it's going to get deep!
Saturday, 21 May 2011
SNMP, NetFlow and OpManager
For the past week, I have been getting familiar with a new monitoring tool I found while searching the web.
I was looking for one because one of our clients was complaining about performance issues with their network, and I also wanted to start using Cisco's NetFlow protocol to monitor traffic statistics better. Although I was very interested in NetFlow, I also wanted to start using the infamous SNMP protocol too. After a few hours of searching on Google, I stumbled across this program called OpManager, which gathers stats through SNMP and also has a NetFlow Analyzer plug-in!
Allow me to briefly explain what the heck SNMP and NetFlow are:
SNMP
For those who just started learning about IP Networks, devices that support the SNMP protocol can advertise all sorts of cool information about themselves, such as CPU, Memory and Disk utilization (and that's just some of the cool things). SNMP typically runs on port 161 over UDP (traps go to port 162), and consists of a Managed Device (or SNMP Agent) that advertises info to a Network Management System (NMS). The NMS receives SNMP messages from all the agents and processes the data into tables or nice graphical reports. In order for SNMP agents to talk to an NMS, both sides must be configured with the same keyword, or 'Community String'. The latest version of the SNMP protocol (v3) includes support for user authentication and better security against sniffing attacks.
NetFlow
Cisco defines a flow by packets that match the same criteria of:
- Source IP Address
- Destination IP Address
- Source TCP or UDP port
- Destination TCP or UDP port
- Layer 3 protocol
- Class of Service
- Input interface
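To illustrate how those seven fields define a flow, here is a small Python sketch that groups packets by that key and totals the bytes per flow, which is essentially what a "Top Talkers" report is built on. The class, function, field names, and sample packets are all made up for illustration; a real exporter does this inside the router.

```python
from collections import Counter
from typing import NamedTuple


class FlowKey(NamedTuple):
    """The seven fields from the list above (field names are illustrative)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int      # Layer 3 protocol number, e.g. 6 for TCP, 17 for UDP
    tos: int           # Class of Service byte
    input_if: str


def top_talkers(packets, n=3):
    """Sum bytes per flow and return the n largest flows.

    `packets` is an iterable of (FlowKey, byte_count) pairs.
    """
    totals = Counter()
    for key, size in packets:
        totals[key] += size  # packets with an identical key belong to one flow
    return totals.most_common(n)


if __name__ == "__main__":
    web = FlowKey("10.0.0.5", "10.0.1.9", 51000, 80, 6, 0, "Fa0")
    dns = FlowKey("10.0.0.5", "10.0.1.2", 53000, 53, 17, 0, "Fa0")
    sample = [(web, 1500), (dns, 80), (web, 1500)]
    for key, total in top_talkers(sample):
        print(key.src_ip, "->", key.dst_ip, key.dst_port, total, "bytes")
```

Change any one field (a different source port, say) and the packet lands in a new flow, which is why one busy host can show up as many flows in the reports.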
Using a tool like the NetFlow Analyzer, you can tell your devices to send these flow stats to a server over UDP (port 9996 is the NetFlow Analyzer's default listening port) and also generate some nice graphical reports as to who's using up your bandwidth.
Instead of explaining all this, how about I just show you what I'm talking about.
The first two screenshots are from SNMP monitoring and the last are NetFlow stats. They show the internal Source and Destination IP Addresses (the 10.x.x.x addresses) and what applications they are using. These graphs show the Top Talkers on the network.
If you would like to start monitoring your network too, you can download the Free Edition of OpManager if you'll just be monitoring 10 devices, or you can get a trial/licensed version to use in your company. Download here http://www.manageengine.com/network-monitoring/download.html
Configuring SNMP on an ASA:
snmp-server host inside 192.168.0.50
snmp-server community secretpasswordhere
Also, to specify a specific version, add the version keyword to the host line: snmp-server host inside 192.168.0.50 version 2c
Configuring NetFlow on an ASA device:
flow-export destination inside ipaddress 9996
access-list acl_name extended permit ip any any
class-map class_name
 match access-list acl_name
policy-map policy_name
 class class_name
  flow-export event-type all destination server_ip
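Configuring NetFlow on IOS devices:
Go under the interface you want to monitor and enable flow caching, then exit back to global configuration mode to set the export destination:
int FastEthernet0
 ip route-cache flow
exit
ip flow-export destination ipaddress 9996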
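Before pointing a device at a full collector, you can check that export packets are arriving at all. The Python sketch below binds a UDP socket and prints the sender and the NetFlow version of each datagram; it relies only on the fact that NetFlow export packets begin with a 16-bit version field in network byte order. The function names, port default, and packet limit are assumptions for illustration, and this is not a collector (it decodes nothing beyond the version).

```python
import socket
import struct


def netflow_version(datagram: bytes) -> int:
    """Return the version number from the first two bytes of an export packet."""
    if len(datagram) < 2:
        raise ValueError("datagram too short to hold a NetFlow header")
    (version,) = struct.unpack("!H", datagram[:2])  # big-endian unsigned short
    return version


def listen(port=9996, bind_addr="0.0.0.0", max_packets=5):
    """Print the sender and NetFlow version of the next few datagrams.

    Hypothetical helper: stop NetFlow Analyzer first if it already owns the port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((bind_addr, port))
        for _ in range(max_packets):
            data, (sender, _sender_port) = sock.recvfrom(65535)
            print(f"{sender}: NetFlow v{netflow_version(data)}, {len(data)} bytes")


if __name__ == "__main__":
    listen()
```

If nothing prints, look at the export destination, the source interface, and any firewall between the device and the server before blaming the analyzer.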
Wednesday, 27 April 2011
Network Blunders: Sabotage
It's late, so I'm going to try and get in as much as I can possibly remember the day it happened.
Last Thursday, before the long Easter weekend, I was pre-configuring the ShoreGear switches for our clients offices, so they could be shipped to their remote locations in the U.K, and be Plug-N-Play when they go live.
After I completely setup the switches (with the exception of one that needed an RMA) I was on my way home to begin my 3-day weekend; until I got a call from my boss.
"Are you doing anything on "XYZ clients" network right now? I told him no, as I have been at our other clients site all afternoon. "well, all the phones just did a reboot" he said, I told him to call me back if they go down again. Before I got home, I did a small bit of shopping and had not received any call back from my boss. Just to be on the safe side, I gave him a follow up call and not only were the phones still down, but the entire network is down now too! This is bad, very bad I thought and I had a feeling, my 3-day weekend is going to have to wait until later to start; so I went back on-site to see what was going on. Part of my reasoning for going back down to work was because I was involved in a change the night before, to setup a new VLAN fore a sister company that was utilizing the clients existing infrastructure.
My boss let's me in the door, and I see he's already logged into the HP 5406 switch checking the logs and configuration. After about a good hour or so of dead-end troubleshooting, these are our findings:
So, basically a PING from outside the LAN passes, but communication between VLANs fails. This really started to make us believe there is an Inter-VLAN routing issue, and I started to think it was caused by the changes that I made the night before; but let's recap here, what happened between last night and today?
What changed in that time? why did everything just cease to function at the end of the day? was it a possible reboot on the switch? if it was, I did a write mem the night before so the config would have been maintained in NVRAM. Another hour passed by, and some of the actual IT guys who work for this client showed up and try to help, but truthfully we all just scratched our heads together. Eventually, I see a ShoreTel IP phone on a users desk display their name and the speaker light lit. It appeared to be working now, until I took it off-hook and heard absent dial-tone and the LCD displayed "No service for 10.x.x.x"; all the phones did every minute even if you didn't put one off-hook. Then, the little gears started to turn in my head.
The Desktop VLAN, where workstations and phones reside is different from the VLAN used for the phone system, how did the phone get registered with his/her name if the VLAN communication is broken? We begin to take a different approach now, we assume the configuration has not changed on the core switch and start to look for a flapping Fiber or other type of connection that would cause intermittent connectivity, dropping the phones every so often. The four of us begin brainstorming possibilities: "bad fiber link", "faulty switch", "bad UPS", "loop", "DHCP overlap with Linksys device", etc. Me and my boss head down to one of the floors to physically inspect a switch in the access layer. When we got there, we knew something was definitely wrong, the switchports for users were not blinking...every single port was pegged and solid green!
Without a second thought, we isolated the floor from the rest of the LAN by unplugging the Gigabit fiber links; we now see IP phones on the other floors registering and giving dial-tone. Our observation tells us the root cause is on this floor somewhere, we start physically checking on top and under each desk for unsupported network hardware (hub, switch from BestBuy, etc) and any connections that could be causing a switching loop. We put in as much effort as we could, but then figured it would be easier to go back to the riser room and unplug each port until the Link LEDs go normal again. In one of the 5 modules (modules A-E) on the switch, Craig pulled the lucky cable and the LEDs started to blink again; we traced the cable back the patch panel and recorded the drop (or wall jack) #. To our amazement, all the phones on the floor were alive again, except for one, which was looped into two jacks on the wall.
Basic Network and VoIP background info
IP phones, whether Cisco, ShoreTel or Avaya have a mini 2-port network switch built into them.
One port goes to the network, and the other connects to the LAN port on your PC, so the phone gives it access to the network. If instead, you plug a cable to the network and PC port on the back of the phone to the wall jack, you create a loop, and broadcast packets endlessly get forwarded and corrupt the MAC address Table in the switch. This is exactly what happened, root cause: Network loop caused broadcast storm through entire network.
Some of you might be thinking "But you had VLANs, VLANs are suppose to eliminate these broadcast storms I thought", well, that is not so when you route VLANs in the core, and span the same VLAN across ALL your switches. Another very valid point that also crossed my mind, "WHAT ABOUT SPANNING-TREE? WASN'T THAT ENABLED?", the short answer is "No, it wasn't". But if you want the long answer, it wasn't enable intentionally, it came down to a design decision (not mine) to disable it on all of our clients switches because it has been known to interfere with the phone system.
But that's not the worst of it, this apparently is not the first time something like this has happened, for the same client, at the same time of day; it has apparently happened a few times in the past too. When the night was over (we were there from 8PM-12AM) and we resolved the problem, my boss decided to rule this incident as sabotage and has notified the IT staff to spring an investigation against it.
Planning forward, I did some research on ways how we can prevent this from happening again in the future and stumbled across a feature in newer HP switches called Loop-detection which can be used independently from Spanning-Tree; we will be looking into this as a solution in the near future.
Last Thursday, before the long Easter weekend, I was pre-configuring the ShoreGear switches for our clients offices, so they could be shipped to their remote locations in the U.K, and be Plug-N-Play when they go live.
After I completely setup the switches (with the exception of one that needed an RMA) I was on my way home to begin my 3-day weekend; until I got a call from my boss.
"Are you doing anything on "XYZ clients" network right now? I told him no, as I have been at our other clients site all afternoon. "well, all the phones just did a reboot" he said, I told him to call me back if they go down again. Before I got home, I did a small bit of shopping and had not received any call back from my boss. Just to be on the safe side, I gave him a follow up call and not only were the phones still down, but the entire network is down now too! This is bad, very bad I thought and I had a feeling, my 3-day weekend is going to have to wait until later to start; so I went back on-site to see what was going on. Part of my reasoning for going back down to work was because I was involved in a change the night before, to setup a new VLAN fore a sister company that was utilizing the clients existing infrastructure.
My boss let's me in the door, and I see he's already logged into the HP 5406 switch checking the logs and configuration. After about a good hour or so of dead-end troubleshooting, these are our findings:
- Access to the Net - FAILED
- PINGs to inside IP on WAN router from Desktop VLAN - FAILED
- PINGs to outside interface from Internet (tested using 3G connection) - PASS
- PING from Core Switch management VLAN IP to IPs assigned to VLAN interfaces - PASS
- PING from core switch to IP phones - FAILED
- PING from any VLAN to any other VLAN - FAILED
So, basically, a PING from outside the LAN passes, but communication between VLANs fails. This really started to make us believe there was an inter-VLAN routing issue, and I started to think it was caused by the changes that I had made the night before. But let's recap here: what happened between last night and today?
- 9PM - Night Before:
  - Added the new VLAN, tagged fiber ports and untagged edge ports for users
- Next Day:
  - Network humming smoothly for over 20 hours with no hiccups
  - 4:30PM: Network goes completely KAPUT and stays down
What changed in that time? Why did everything just cease to function at the end of the day? Was it a possible reboot of the switch? If it was, I had done a write mem the night before, so the config would have been preserved in NVRAM. Another hour passed, and some of the actual IT guys who work for this client showed up and tried to help, but truthfully we all just scratched our heads together. Eventually, I saw a ShoreTel IP phone on a user's desk displaying their name with the speaker light lit. It appeared to be working, until I took it off-hook, heard no dial tone, and the LCD displayed "No service for 10.x.x.x"; all the phones did this every minute, even if you didn't take one off-hook. Then the little gears started to turn in my head.
The Desktop VLAN, where workstations and phones reside, is different from the VLAN used for the phone system, so how did the phone get registered with the user's name if the VLAN communication was broken? We took a different approach: we assumed the configuration had not changed on the core switch and started to look for a flapping fiber or other type of connection that would cause intermittent connectivity, dropping the phones every so often. The four of us began brainstorming possibilities: "bad fiber link", "faulty switch", "bad UPS", "loop", "DHCP overlap with Linksys device", etc. My boss and I headed down to one of the floors to physically inspect a switch in the access layer. When we got there, we knew something was definitely wrong: the switchports for users were not blinking...every single port was pegged and solid green!
Without a second thought, we isolated the floor from the rest of the LAN by unplugging the Gigabit fiber links, and we immediately saw IP phones on the other floors registering and giving dial tone. That told us the root cause was somewhere on this floor, so we started physically checking on top of and under each desk for unsupported network hardware (hub, switch from BestBuy, etc.) and any connections that could be causing a switching loop. We put in as much effort as we could, but then figured it would be easier to go back to the riser room and unplug each port until the Link LEDs went back to normal. In one of the 5 modules (modules A-E) on the switch, Craig pulled the lucky cable and the LEDs started to blink again; we traced the cable back to the patch panel and recorded the drop (or wall jack) number. To our amazement, all the phones on the floor were alive again, except for one, which was looped into two jacks on the wall.
Basic Network and VoIP background info
IP phones, whether Cisco, ShoreTel or Avaya, have a mini 2-port network switch built into them.
One port goes to the network, and the other connects to the LAN port on your PC, so the phone gives the PC access to the network. If you instead cable both the network port and the PC port on the back of the phone to wall jacks, you create a loop: broadcast frames get forwarded endlessly and thrash the MAC address table in the switch. This is exactly what happened. Root cause: a network loop caused a broadcast storm through the entire network.
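To make the loop easier to picture, here is a minimal Python sketch of the flooding behaviour. It is a toy model, not how any real switch is implemented: every device name and the `flood` function are made up for illustration, each device simply repeats a broadcast out of every link except the one it arrived on, and there is no Spanning-Tree (just like on this network).

```python
from collections import defaultdict


def flood(links, origin, max_ticks=20):
    """Count broadcast frames in flight at each tick after `origin` sends one.

    links is a list of (device_a, device_b) cables; parallel cables are allowed.
    """
    # ports[device] -> list of (link_id, neighbour) for every cable plugged in
    ports = defaultdict(list)
    for link_id, (a, b) in enumerate(links):
        ports[a].append((link_id, b))
        ports[b].append((link_id, a))

    # A frame in flight is (cable it is travelling on, device it will reach).
    in_flight = list(ports[origin])
    history = []
    for _ in range(max_ticks):
        history.append(len(in_flight))
        next_frames = []
        for arrived_on, device in in_flight:
            # Flood out of every cable except the one the frame came in on.
            for link_id, neighbour in ports[device]:
                if link_id != arrived_on:
                    next_frames.append((link_id, neighbour))
        in_flight = next_frames
        if not in_flight:
            break  # the broadcast has died out
    return history


# Hypothetical healthy floor: the phone's second port goes to a PC.
healthy = flood(
    [("core", "floor"), ("floor", "pc1"), ("floor", "phone"), ("phone", "pc2")],
    origin="pc1",
)

# Hypothetical looped floor: both phone ports are patched into wall jacks.
looped = flood(
    [("core", "floor"), ("floor", "pc1"), ("floor", "phone"), ("floor", "phone")],
    origin="pc1",
)

print(healthy)      # [1, 2, 1]
print(min(looped))  # never drops to zero within max_ticks
```

In the healthy wiring the broadcast dies out after three hops. With the phone patched into two jacks, the same single frame is still circulating when the simulation stops, and every pass through the floor switch sends fresh copies up to the core and out to the PC.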
Some of you might be thinking, "But you had VLANs, and I thought VLANs are supposed to eliminate these broadcast storms." Well, that is not so when you route VLANs in the core and span the same VLAN across ALL your switches. Another very valid point that also crossed my mind: "WHAT ABOUT SPANNING-TREE? WASN'T THAT ENABLED?" The short answer is "No, it wasn't". If you want the long answer, it was left disabled intentionally; it came down to a design decision (not mine) to disable it on all of our clients' switches because it has been known to interfere with the phone system.
But that's not the worst of it. This is apparently not the first time something like this has happened for the same client, at the same time of day; it has apparently happened a few times in the past too. When the night was over (we were there from 8PM-12AM) and we had resolved the problem, my boss decided to rule this incident as sabotage and notified the IT staff so they could launch an investigation.
Looking ahead, I did some research on how we can prevent this from happening again and stumbled across a feature in newer HP switches called Loop-detection, which can be used independently of Spanning-Tree; we will be looking into this as a solution in the near future.
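I have not configured this feature yet, so treat the following as my rough mental model of how a loop-detection mechanism of this kind works rather than HP's actual implementation: the switch sends a small probe frame out of each protected port, and if that probe comes back in on one of its own ports, it shuts down the port that sent it. The port names and the `find_ports_to_disable` function below are hypothetical.

```python
def find_ports_to_disable(returns_on, protected_ports):
    """Return the set of ports a loop-detecting switch would shut down.

    returns_on maps a sending port to the port its probe comes back on;
    ports whose probe never returns are simply absent from the mapping.
    """
    disabled = set()
    for port in sorted(protected_ports):
        if port in disabled:
            continue
        came_back_on = returns_on.get(port)
        # A probe can only come back if the return path is still up.
        if came_back_on is not None and came_back_on not in disabled:
            disabled.add(port)
    return disabled


# Hypothetical floor switch: one phone is patched into both A5 and A6.
wiring = {"A5": "A6", "A6": "A5"}
ports = ["A1", "A2", "A3", "A4", "A5", "A6"]

print(find_ports_to_disable(wiring, ports))  # {'A5'}
```

Shutting down just one side of the loop is enough in this model: once A5 is disabled, the probe sent out of A6 has no path back, so the phone keeps a single working uplink and the rest of the floor is unaffected.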
Tuesday, 26 April 2011
I've been slacking!
Sorry all, I've been swamped with things for weeks! I promise as soon as I have a day to just do nothing but stare at a wall (or screen) all day I will post all the news I have backlogged!
Some things to expect:
- Results from the graveyard shift I did a while back
- Huge catastrophic network outage before Easter weekend
- Integrating ShoreTel with Microsoft's Office Communications Server (OCS) 2007
- ShoreTel IP8000 conference phone (SIP)
- Configuring an HP switch for VLANs and "VLAN tagging"
- Writing a VLAN Access Control List (VACL) on an HP switch - not as easy as ACLs in Cisco!
- DHCP Relay - How to hand out DHCP addresses from a single server to multiple VLANs
- Example configs of a small subnet I recently set up with a complex VACL!
Hopefully I will have the majority of this written by the weekend. Stay tuned!