How to Sleep Tight When You Have 50+ Servers to Monitor 😴

Posted by Forcelate Team on January 17, 2019 · 3 mins read

Spoiler: Use Slackbots!

At Forcelate we have to keep an eye on 50+ web servers. Every day. From Monday to Sunday. On New Year’s Eve, Christmas, 4th of May, April Fools and 29th of February ⚡️ Here’s how we manage this!

If you read my previous post (if you didn’t — how dare you) you know that at Forcelate we created an insanely useful 🤖 Slack bot that helps us keep an eye on all web servers in a snap.

Naturally, as your company grows, it gets harder to monitor all servers & keep customers happy. Free tracking tools become a no-go because they limit how many web servers you can track.

Meanwhile, creating a Slack bot has tons of benefits. Just think of it — we already spend hours in Slack! Who wants to check 100500 different apps, just to confirm that everything is up and running? Moreover, if you create your own tool — you can add all the features you require.

How does it work?

It’s simple. Our infrastructure bot reacts to certain keywords (status, show, update, etc.). For example, to get an overview of your websites, simply type ‘show’. If everything is OK, you will see the following message (all IPs are randomized for safety purposes):

This is how it looks if there is a problem:

Let’s say you don’t need everyone on your team to use this app. No worries, you can easily set up permissions for just the co-workers who are involved:

To help you enjoy life, this bot keeps you posted on your tech infrastructure daily. Every day at 8 am, 2 pm and 9 pm it tells you to relax, take it easy, watch 🎬 Netflix because everything is just fine.
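To make the daily schedule concrete, here is a minimal Python sketch of how a bot could work out when its next report is due. Only the three report times come from this post; the `next_report` function and the `REPORT_TIMES` constant are made-up names for illustration, not our actual implementation.

```python
from datetime import datetime, time, timedelta

# Report times taken from the post: 8 am, 2 pm and 9 pm.
REPORT_TIMES = (time(8, 0), time(14, 0), time(21, 0))


def next_report(now: datetime) -> datetime:
    """Return the first scheduled report strictly after `now` (hypothetical helper)."""
    for slot in REPORT_TIMES:
        # Keep the caller's timezone (if any) so the comparison stays valid.
        candidate = datetime.combine(now.date(), slot, tzinfo=now.tzinfo)
        if candidate > now:
            return candidate
    # All of today's slots have passed, so fall through to tomorrow's first slot.
    tomorrow = now.date() + timedelta(days=1)
    return datetime.combine(tomorrow, REPORT_TIMES[0], tzinfo=now.tzinfo)
```

A scheduler loop could sleep until `next_report(datetime.now())` and then post the summary to Slack.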

But what if you need to quickly check additional info — for instance, if a certain profile is deployed on your server? All you need to do is to type ‘ext’.

Note: this feature works only with Java Spring-based servers (for now).
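If you are curious how keyword handling and permissions could fit together, here is a rough Python sketch. Everything in it is an assumption for illustration: the user IDs, the reply texts, and the names `ALLOWED_USERS`, `HANDLERS` and `handle_message` are hypothetical, and only the ‘show’ and ‘ext’ keywords are wired up because those are the two described here.

```python
from typing import Callable, Dict, Set

# Hypothetical allow-list of Slack user IDs; the real permission model may differ.
ALLOWED_USERS: Set[str] = {"U_ALICE", "U_BOB"}


def show_overview() -> str:
    # Placeholder reply; a real bot would query each server here.
    return "overview of all web servers"


def show_extended() -> str:
    # Placeholder reply for the 'ext' keyword.
    return "extended info (for example, deployed profiles)"


# Map each keyword to the function that builds its reply.
HANDLERS: Dict[str, Callable[[], str]] = {
    "show": show_overview,
    "ext": show_extended,
}


def handle_message(user_id: str, text: str) -> str:
    """Check permissions first, then dispatch on the normalized keyword."""
    if user_id not in ALLOWED_USERS:
        return "Sorry, you are not allowed to use this bot."
    handler = HANDLERS.get(text.strip().lower())
    if handler is None:
        return "Unknown keyword. Try: " + ", ".join(sorted(HANDLERS))
    return handler()
```

Keeping the keywords in a dictionary means a new command is one extra entry rather than another `if` branch.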

When there’s a problem…

The infrastructure bot notifies you immediately if there are any issues. It keeps an eye on your tech infrastructure, and as soon as a server goes down, it alerts you ASAP.

When the server is back up and live, you receive an ‘OK’ message saying that everything is alright! 👏 Hurray!
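The down/back-up alerts described above boil down to comparing two health snapshots. Here is a small Python sketch of that idea. The `detect_transitions` function, the server names and the message wording are all hypothetical, and how the snapshots get collected (pings, HTTP checks, etc.) is left out on purpose.

```python
from typing import Dict, List


def detect_transitions(previous: Dict[str, bool], current: Dict[str, bool]) -> List[str]:
    """Compare two snapshots (server name -> is_up) and build alert messages."""
    messages = []
    for server in sorted(current):
        is_up = current[server]
        # Assumption: a server we have never seen before counts as previously up.
        was_up = previous.get(server, True)
        if was_up and not is_up:
            messages.append(f"ALERT: {server} is down")
        elif not was_up and is_up:
            messages.append(f"OK: {server} is back online")
    return messages
```

Alerting only on state changes, rather than on every failed check, keeps the channel quiet while a server stays down.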

Conclusion

When the server goes offline — everyone loses their mind. There’s no traffic on a website. No sales. There’s an angry boss — and furious customers are calling tech support non-stop.

You can’t predict when this is going to happen. But your best bet is to get notified as soon as possible. If your team uses Slack at work, an Infrastructure bot is hard to beat. It monitors all your web servers, gives you timely alerts & helps you lead a peaceful, calm life without worrying about servers every 5 minutes! 😁