It’s a couple of weeks old, but I wanted to mention SysAdmin’s Chronicles’ extensive review of Slicehost and the new SliceManager. It covers interacting with a Slice and has several great screenshots. A great article for people considering our services.

We’ve been running on our backup connection most of the day. Around 1645 GMT yesterday we failed over and remained there while the NOC researched what happened. At 0800 GMT they will be moving us back over; people may see brief outages as the routes are updated.

November 9th, 2007

Welcome Matt Myers to Slicehost

Matt Myers has joined Slicehost as our Hardware/NOC guru. He’ll be diving into operational issues such as hardware provisioning, networking and purchasing. Matt lives in St. Louis and will be at the office, so that means Slicehost’s “pants free Fridays” are a thing of the past. Give him a shout in chat or the forums; he goes by SpaceGhost online.

November 9th, 2007

Slices at both datacenters

An update to yesterday’s post: we realized it’s a hassle for customers to create a new account. Instead, just open a support request with us and we will add the slices for you at another datacenter. SliceManager will handle everything else.

Your prayers to the NOC gods have been answered! Earlier this week we turned up our second facility and started burning through the waitlist once again. What does this mean?

Current customers: You won’t notice a difference; SliceManager handles all of this transparently. When you add a slice, it’s kept on the same local network as your other slices for fast communication. If you want to add slices at another facility for redundancy, the best way to do this currently is to create another account. That may change in the future, but we wanted to keep things simple to start. If you’d like to have slices in both facilities, send us an email and we’ll push you through the waitlist.

New customers: It’s been a long and painful journey, but things will get better fast. We started tearing through the backlog this week and should get everyone onboard faster than the estimates on the waitlist page. Pretty soon, you’ll be able to tell all the newbies to wait patiently in line!

Future: We’re pretty pumped because this is the fruit of several months of labor. These changes allow us to grow easily in both the short and long term. And we can get cracking on some swell stuff we’ve been dreaming up for the past several months.

We just received word of a datacenter maintenance event scheduled for this Friday. It will affect most customers, although not those who signed up this week. Within the 4-hour window, the service-impacting activities are scheduled to occur between 0800 and 1000 GMT.

Details
  • routing config changes on core routers
  • software updates on core routers
  • replacing an interface card that has a defective port

The expected impact is 3-4 outages of 5 minutes or less. We’ll be monitoring during the window; remember to check status.slicehost.com, slicechat and our twitterstream for updates.

Update

These are the time frames to expect network outages if everything goes as planned:

  • 0800 to 0810
  • 0830 to 0835
  • 0900 to 0910

November 5th, 2007

Slice image updates

FYI:

  • Ubuntu Gutsy (7.10) is available
  • Gentoo is now 2007.0
  • CentOS is now 4.5 (5 is coming soon)
  • Bug fixes for all images

The datacenter network maintenance that caused the outage yesterday was rolled back and will be performed again tonight (Nov 3 0800 GMT). It was supposed to be non-impacting, as is tonight’s maintenance. The Cisco bug that caused the problem has a workaround, and it will be employed this evening.

We’ll be monitoring things on our end – should there be problems, remember to check the chatrooms, Twitter or status.slicehost.com. This maintenance is in preparation for the core cutover scheduled for next week.

Relaying more information as we receive it: the outage was caused by a Cisco IOS bug dealing with HSRP. Unfortunately, the bug should have been circumvented, and the failure to do so can be attributed to human error during the maintenance procedure. We’re meeting with the datacenter leads next week to discuss the issues of the past week and what is being done to correct them. There’s also discussion of moving the core cutover up; keep an eye on the blog/forum for any news on that front – we’ll send an email if it is happening sooner than expected. Again, we are truly sorry for an unacceptable round of outages. As always, please contact us if you have any questions or concerns.

November 1st, 2007

Nov 1 network outage [update]

For those of you just tuning in, there was another network outage from 0900 to 1100 GMT. We’re still piecing together exactly what happened, but here is what we know so far:

  • there was a non-impacting maintenance scheduled by our datacenter in preparation for the core upgrade taking place in about a week. They were inserting a switch in parallel with the existing core switches.
  • the outage started at 0900 GMT, when the maintenance was completed.
  • after working with Cisco, the issue was resolved around 1100 GMT.

This is an unacceptable level of service and people have expressed concerns in the chatroom about the recent outages. We agree completely and cannot offer more than our sincerest apologies at the moment. We are gathering more information and reviewing our options; updates will follow. In the meantime, if you have any questions or feel like venting, you can contact us via email or call me directly (314.266.3502). If I don’t answer, leave a message and one of us will get back to you shortly.

As I’m sure most people noticed, we had 2 major bouts of network trouble – one yesterday afternoon and one today. It appears they were distributed denial of service attacks directed at multiple customers. The amount of traffic was filling both of the datacenter’s incoming pipes. There aren’t any excuses for why it took 2 extended outages to figure this out, and we’ll be addressing that issue with our provider. All I can offer are our sincere apologies and a promise that this will get better.

During the outages a lot of questions came up regarding multiple carriers, routers, etc. So to answer a couple of those:

  • we have redundant transport via XO and Verizon; however, this doesn’t help against an attack like this, since the traffic ends up filling both.
  • we also have redundant routers using VRRP to fail over on our side, as does the datacenter.

Again, we’re extremely disappointed with how this was handled, but here is some good news. In the next couple of weeks, the datacenter’s core network (not ours, but how we connect to them) will be upgraded. We’re assured this will result in a more stable setup for growth.

Secondly, and this is the announcement we referred to earlier this week, we are putting the final touches on a second facility. This space uses separate carriers and we will be managing the network, but everything else stays the same. We hope this expansion allows us to accelerate our growth and offer a higher level of redundancy for customers who need it.

Thanks to everyone for their patience during the past 2 days. For future reference, aside from the chatrooms, the slicehost twitterstream and the network status page are worth bookmarking for quick updates.

This appears to be an upstream issue at the moment; we’ll have more details later in another post. Our apologies for the downtime and flurry of pages/emails/calls you all likely received.

Update 2145 GMT – issue appears to be resolved.

It took a while to get going thanks to some last-minute bugs and an edge Rails change that bit us, but the new SliceManager is live, as is the revamped backend powering it. We haven’t talked about this rewrite, mainly because we’ve been working like dogs to get it out the door. Here are some highlights of the new software and how it affects you.

Slices

  • The design of SliceManager has changed – we streamlined the UI for the growing number of customers working with multiple slices for different projects.
  • You can add a slice quickly from the front page; previously this was buried under the Account tab.
  • Clicking a slice’s name from the master list allows you to work with it.
  • The web-based console now supports multiple windows; the lack of support was a bug in the previous version.
  • The stats page now allows you to retrieve a snapshot of your slice’s performance and review a list of previous stats for comparison.
  • Backups won’t look different, but they are under the microscope and we’ll be changing them in the future. Please note, for the next week old daily and weekly backups are unusable via SliceManager, as they’re being overwritten to be compatible with the new system. We still have them; you can email us if you need one restored. Ditto for snapshots.
  • A big user request – you can rename a slice at any time!
  • Reboots and resizes stay the same, but now have progress metrics to show you how far along they are.
  • In addition to Rescue mode, you can also reset a slice’s root password should you lock yourself out (that never happens, right?).
  • Drumroll please – extra IPs. They’re $2/month. You’ll need to email us to allocate them, and please provide justification.

Adding, removing and resizing Slices

  • Adding/upgrading a slice stays the same – you’re billed a prorated amount based on the number of days remaining in your billing cycle.
  • Previously, removing/downgrading a slice would only affect the next billing cycle (you wouldn’t be billed for it). Now, when you remove/downgrade a Slice, a credit is applied to your account (minus one day). For example, say you need a 1GB Slice for a week and you’re halfway through a billing cycle – you’d be charged $35 (half of $70). A week later you’re finished and delete the Slice – you’d receive a credit of ~$17.50. So the most you’re on the hook for at any given time with a new Slice is 1 day. Not quite hourly billing à la EC2, but a step in the right direction ;)
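
For anyone who wants to check the math, here is a rough sketch of the proration described above. It is not our billing code: the function names, the 28-day cycle and the reading of "minus one day" as one day held back from the credit are all assumptions made for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def prorated_charge(monthly_rate, days_remaining, days_in_cycle):
    """Charge for adding a slice with `days_remaining` left in the billing cycle.

    Hypothetical helper: assumes a flat daily rate of monthly_rate / days_in_cycle.
    """
    amount = Decimal(monthly_rate) * days_remaining / days_in_cycle
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)


def removal_credit(monthly_rate, days_remaining, days_in_cycle, held_back_days=1):
    """Credit for removing a slice with `days_remaining` left in the cycle.

    Assumption: "minus one day" means one day is held back from the credit.
    """
    credited_days = max(days_remaining - held_back_days, 0)
    return prorated_charge(monthly_rate, credited_days, days_in_cycle)


# 1GB Slice at $70/month, added halfway through an assumed 28-day cycle.
charge = prorated_charge(70, days_remaining=14, days_in_cycle=28)

# Deleted a week later, with 7 days left in the cycle.
credit_before_holdback = removal_credit(70, 7, 28, held_back_days=0)
credit_after_holdback = removal_credit(70, 7, 28)

print(charge, credit_before_holdback, credit_after_holdback)
```

Under these assumptions the charge comes to $35.00 and the credit to $17.50 before the one-day holdback, or $15.00 after it. The ~$17.50 figure in the example above is a round number, so treat the exact holdback handling here as a guess.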

DNS

  • We did away with the easy/advanced DNS distinction, since most users preferred the advanced interface. Now there is a dedicated DNS tab where you can handle zones, records and reverse DNS.
  • We increased the length of the data field, which should help with DKIM records.
  • There is no longer a delay between entering/editing a record and its propagation to our DNS server. We used to sync records between SliceManager and the DNS servers; now all of the records are manipulated via a REST API on top of the DNS server.

Help and Support

  • We removed the emergency pager, since our monitoring system alerts us of trouble preemptively.
  • You can review previous support requests before contacting us about a new issue.

Account and Billing

  • The Email & Password display allows you to edit your login password and email address, which are now separate from the billing email address and information. This should help people who use a corporate card for billing and invoicing, but have a technical person logging in to interact with slices.
  • It’s not quite ready yet, but the groundwork has been laid for allowing multiple users to log in and interact with slices via SliceManager. We’re still finalizing how granular the access control will be, so keep an eye out for updates on this feature.
  • The Payments, Balance and Invoices page also contains a bunch of information designed to simplify billing. At any time you can see your balance, pending charges to your account, the next billing cycle and your current monthly rate.
  • Going forward, invoices are saved, so every time your card is charged you’ll receive an email and can review the invoice in SliceManager down the road.
  • If you are carrying a positive balance (from a large prepayment), we’ll send you monthly invoices showing what was deducted from your balance each month.

Future

This rewrite is the culmination of a year’s worth of lessons learned, outstanding community feedback and our plans for the company. You can expect APIs, enhanced slice images and multi-user accounts in the coming months, in addition to some news we hope to share with everyone in the next week.

October 19th, 2007

SliceManager cutover Monday

Heads up – we’ll be switching to new versions of our backend software and SliceManager Monday morning. The manage.slicehost.com site may experience minor outages as we move things into place and squash any last-minute bugs. There will be no service impact to your Slices during this time, but the webconsole, rescue mode, adding slices and upgrades will be unavailable while the site is down.

We’ll be around Monday to keep an eye on things. If you notice any problems after logging into the new site, please let us know. Once things have settled down, we plan to resume new signups and have some exciting news to share.

October 8th, 2007

CVE-2007-4573 patched

Background info – Pickled Onion has put up a new page on the wiki with more details.