When we have no more budget left, we stop all our engineering work and focus solely on improving the reliability of the system. When the reliability of our system suffers over a sustained period of time, we call a multiple-alarm fire (a system used for categorizing the seriousness of a fire in the US and Canada). When this occurs, the entire team deprioritizes feature development work and focuses only on restoring the reliability of the platform for which the multiple-alarm fire was called for. This is the situation we found ourselves in toward the end of 2020. Let's take a quick look at how we run our Splunk deployment before we get into the problems we face when running Splunk at scale. We run a Splunk Enterprise deployment on AWS EC2. Our deployment consists of a single search head cluster and multiple indexer clusters in a single AWS region.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |