Browsing the “notification” Category
January 29th, 2008

Slice Diagnostics

As of Friday, January 25, the Slice Manager can display simple diagnostics for your Slices: whether your Slice and its host server are up, the load on the host server, and the rate of swap usage (reads and writes per second). We've also added links to the Just-Ping service so you can ping your Slice, or your own internet connection, from over 20 points around the world.

Datacenter 1 will have a network maintenance window on Friday, Dec. 14, between 0601 and 1200 GMT. During this window, BGP configurations will be modified and BGP sessions moved to the new core, onto which we were physically moved last week. The expected impact is momentary outages for some customers as BGP updates propagate, with most of the changes taking place between 0700 and 1000 GMT. We'll be available in the chatroom during the window. As always, please contact us with any questions.

December 7th, 2007

New Notifications

Today we’ve enabled a few notifications for Slicehosters. These are sent to the customer email addresses, not the billing email addresses.

Swap

The first notification is about swap usage. We're monitoring raw I/O on each Slice's swap partition, and if it exceeds our current threshold (subject to change) you'll receive an email. This lets you know when your Slice is using more swap than it should, which matters because heavy swapping degrades performance dramatically. After a notification, you have 3 days to fix the problem before you are emailed again.

We’ll follow up next week with some ways to tackle swap usage on your slice.
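The swap notification is described only as I/O on the swap partition exceeding a threshold, with 3 days' grace before a repeat email. If you want to watch the same signal on your own Slice, here is a minimal sketch. It assumes a Linux kernel that exposes the cumulative `pswpin` and `pswpout` counters (pages swapped in and out) in `/proc/vmstat`; those count pages rather than I/O requests, so they are a close proxy for swap reads and writes, not an exact match. This is an illustration, not Slicehost's actual monitor: the threshold value and all function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

# Hypothetical threshold: the real value is not published (and is "subject to change").
THRESHOLD_PAGES_PER_SEC = 100.0
# Grace period before a repeat notification, per the 3-day rule above.
RENOTIFY_AFTER = timedelta(days=3)


def parse_vmstat(text: str) -> Tuple[int, int]:
    """Return the cumulative (pswpin, pswpout) counters from /proc/vmstat text."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters["pswpin"], counters["pswpout"]


def sample(path: str = "/proc/vmstat") -> Tuple[int, int]:
    """Read the current swap counters (Linux only)."""
    with open(path) as f:
        return parse_vmstat(f.read())


def swap_rate(before: Tuple[int, int], after: Tuple[int, int], seconds: float) -> float:
    """Average pages swapped in plus out per second between two samples."""
    (in0, out0), (in1, out1) = before, after
    return ((in1 - in0) + (out1 - out0)) / seconds


def should_notify(rate: float, last_sent: Optional[datetime], now: datetime) -> bool:
    """Email only when over the threshold, and at most once per grace period."""
    if rate <= THRESHOLD_PAGES_PER_SEC:
        return False
    return last_sent is None or now - last_sent >= RENOTIFY_AFTER
```

To measure, call `sample()`, wait a few seconds, call it again, and pass both results with the elapsed time to `swap_rate()`. A rate that stays well above zero means the Slice is actively swapping, rather than just holding a few idle pages in swap.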

Blocklist

This monitor checks the IPs we provide against 2 spam blocklists, with more to come soon. The email you receive links to the blocklist's website, where you'll find information on how to remove your IP(s) from that list.
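The post doesn't name the two lists or say how they are queried, but most spam blocklists are DNS-based (DNSBLs): you reverse the octets of an IPv4 address, append the list's zone, and look the name up; an answer means the IP is listed. A minimal sketch of that lookup, assuming a DNS-based list; the zone `bl.example.org` is a placeholder, not one of the lists we monitor.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL lookup name: reversed IPv4 octets plus the list's zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """True if the zone returns an A record for the IP, i.e. the IP is listed."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # Raised for NXDOMAIN (not listed) but also for other resolution
        # failures; this sketch treats both as "not listed".
        return False
```

For example, `dnsbl_query_name("192.0.2.10", "bl.example.org")` builds `10.2.0.192.bl.example.org`; you would pass the zone of whichever list the notification email links to.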

Bandwidth Overage

You'll receive this notification on the day you go over your bandwidth allotment. There is no change in service, but you will be charged an extra $0.30 (USD) per gigabyte of transfer beyond your allotment.
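To make the overage rate concrete, here is a minimal sketch of the arithmetic. The allotment figures are hypothetical (each Slice plan has its own), and it assumes partial gigabytes are charged pro rata, which the announcement does not specify.

```python
from decimal import Decimal

# Overage rate from the announcement: $0.30 (USD) per extra gigabyte.
OVERAGE_PER_GB = Decimal("0.30")


def overage_charge(used_gb, allotment_gb) -> Decimal:
    """Extra charge in USD for transfer beyond the allotment (pro rata assumed)."""
    # str() first so float inputs like 130.5 convert cleanly to Decimal.
    extra = Decimal(str(used_gb)) - Decimal(str(allotment_gb))
    return max(extra, Decimal(0)) * OVERAGE_PER_GB
```

For example, a hypothetical 100 GB allotment with 130 GB of transfer gives `overage_charge(130, 100)`, i.e. 30 GB × $0.30 = $9.00; staying under the allotment costs nothing extra.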

Datacenter 1 is in the final stages of moving us to a new core network, and we'll be moved this Friday, Dec 7, between 0700 and 0900 GMT. There will be a network outage of 5-10 minutes during this window while we are physically moved onto the new network. We'll be in the chatrooms during the window. As always, we apologize for the downtime, but ultimately it should result in a more stable network.

Network maintenance will be performed at our first datacenter on Friday, Nov 16, from 0600 to 1100 GMT. The expected impact is momentary network outages as traffic is migrated to the new core between 0700 and 1000 GMT. This should be one of the last major maintenance events in our move onto the new core at this facility. We'll be online during the window. As always, please contact us with any questions.

November 11th, 2007

Switch firmware upgrades

No nightclubbing for us this Saturday night; instead we completed emergency maintenance, upgrading the firmware in all of our switches to work around a bug that had started appearing. This was done between 0500 and 0700 GMT at both datacenters, and you shouldn't have noticed any downtime.

We've been running on our backup connection most of the day. We failed over around 1645 GMT yesterday and have stayed there while the NOC investigates what happened. At 0800 GMT they will move us back to the primary connection, and you may see brief outages as the routes are updated.

Your prayers to the NOC gods have been answered! Earlier this week we turned up our second facility and started burning through the waitlist once again. What does this mean?

Current customers: You won't notice a difference; SliceManager handles all of this transparently. When you add a slice, it's kept on the same local network as your other slices for fast communication. If you want to add slices at another facility for redundancy, currently the best way to do this is to create another account. That may change in the future, but we wanted to keep things simple to start. If you'd like to have slices in both facilities, send us an email and we'll push you through the waitlist.

New customers: It's been a long and painful journey, but things will get better fast. We started tearing through the backlog this week and should get everyone onboard faster than the estimates on the waitlist page. Pretty soon, you'll be able to tell all the newbies to wait patiently in line!

Future: We're pretty pumped because this is the fruit of several months of labor. These changes allow us to grow easily in both the short and long term, and we can get cracking on some swell stuff we've been dreaming up in the meantime.

We just received word of a datacenter maintenance event scheduled for this Friday. It will affect most customers, although not those who signed up this week. Within the 4-hour window, the service-impacting activities are scheduled for 0800-1000 GMT.

Details
  • routing config changes on core routers
  • software updates on core routers
  • replacing an interface card that has a defective port

The expected impact is 3-4 outages of 5 minutes or less. We'll be monitoring during the window; remember to check status.slicehost.com, slicechat and our twitterstream for updates.

Update

These are the time frames to expect network outages if everything goes as planned:

  • 0800 to 0810 GMT
  • 0830 to 0835 GMT
  • 0900 to 0910 GMT

November 5th, 2007

Slice image updates

FYI:

  • Ubuntu Gutsy (7.10) is available
  • Gentoo is now 2007.0
  • CentOS is now 4.5 (5 is coming soon)
  • Bug fixes for all images

The datacenter network maintenance that caused the outage yesterday was rolled back and will be performed again tonight (Nov 3, 0800 GMT). It was expected to be non-impacting, and so is tonight's maintenance. The Cisco bug that caused the problem has a workaround, and it will be applied this evening.

We'll be monitoring things on our end; should there be problems, remember to check the chatrooms, Twitter or status.slicehost.com. This maintenance is in preparation for the core cutover scheduled for next week.

Relaying more information as we receive it: the outage was caused by a Cisco IOS bug involving HSRP. Unfortunately, the bug should have been worked around, so the outage can be attributed to human error during the maintenance procedure. We're meeting with the datacenter leads next week to discuss the issues of the past week and what is being done to correct them. There's also discussion of moving the core cutover up, so keep an eye on the blog/forum for any news on that front; we'll send an email if it is happening sooner than expected. Again, we are truly sorry for an unacceptable round of outages. As always, please contact us if you have any questions or concerns.

November 1st, 2007

Nov 1 network outage [update]

For those of you just tuning in, there was another network outage from 0900 to 1100 GMT. We're still piecing together exactly what happened, but here is what we know so far:

  • our datacenter had scheduled a non-impacting maintenance in preparation for the core upgrade taking place in about a week. They were inserting a switch in parallel with the existing core switches.
  • the outage began at 0900 GMT, when the maintenance was completed.
  • the issue was resolved with Cisco's help around 1100 GMT.

This is an unacceptable level of service, and people have expressed concerns in the chatroom about the recent outages. We agree completely and cannot offer more than our sincerest apologies at the moment. We are gathering more information and reviewing our options; updates will follow. In the meantime, if you have any questions or feel like venting, you can contact us via email or call me directly (314.266.3502). If I don't answer, leave a message and one of us will get back to you shortly.

As I'm sure most people noticed, we had 2 major bouts of network trouble: one yesterday afternoon and one today. It appears they were distributed denial of service attacks directed at multiple customers. The attack traffic was filling both of the datacenter's incoming pipes. There is no excuse for it taking 2 extended outages to figure this out, and we'll be addressing that issue with our provider. All I can offer are our sincere apologies and a promise that this will get better.

During the outages a lot of questions came up regarding multiple carriers, routers, etc. So to answer a couple of those:

  • we have redundant transport via XO and Verizon; however, this doesn't help against an attack like this, since the traffic ends up filling both.
  • we also have redundant routers on our side that use VRRP to fail over, as does the datacenter.

Again, we’re extremely disappointed with how this was handled, but here is some good news. In the next couple of weeks, the datacenter’s core network (not ours, but how we connect to them) will be upgraded. We’re assured this will result in a more stable setup for growth.

Second, and this is the announcement we referred to earlier this week, we are putting the final touches on a second facility. This space uses separate carriers and we will be managing the network, but everything else stays the same. We hope this expansion allows us to accelerate our growth and offer a higher level of redundancy for customers who need it.

Thanks to everyone for their patience during the past 2 days. For future reference, aside from the chatrooms, the slicehost twitterstream and the network status page are worth bookmarking for quick updates.

This appears to be an upstream issue at the moment; we'll have more details in a later post. Our apologies for the downtime and the flurry of pages/emails/calls you all likely received.

Update 2145 GMT – issue appears to be resolved.