Then, sometime in late December, the logs stopped going into ElasticSearch. No errors were to be seen anywhere on the Logstash server, just no logs in ElasticSearch.
Life is busy, between the many things happening at Adobe and the class I'm teaching at BYU, so I poked at it on and off for a few weeks. I fired off an email to a contact I have at ES, but got dead air. I eventually joined the #logstash IRC channel on freenode, but never found the time to troubleshoot to the point where I could succinctly describe the situation and environment.
This situation has driven me crazy ever since. I've looked at it again several times, but never found any debug info or logs that helped me figure out what was going on. netstat and tcpdump showed that the traffic was flowing, but nothing was going to ES. Upgrading Logstash and ElasticSearch, while adding cool features, did not help in the slightest. Changing the output to a file didn't help either, so it likely wasn't ES or the ES plugin. Data was coming in, so the logstash-forwarders were doing their jobs.
And, on top of that, not having much time to work on it meant that it was going to be a pebble in my shoe for some time. Adobe uses Splunk, so this was something of a skunkworks project anyway, and it would be hard to push for a product that just stops working for no apparent reason. So, to keep my sanity (or what's left of it), I eventually turned off the logstash-forwarders and let the logstash VM lie fallow.
Fast forward a couple of months, and now I'm looking for a new monitoring solution for my business unit. I've come to understand that at our scale (40,000+ devices), tracking monitoring data, especially with historical data, graphs, etc., really is a big data problem. In fact, just tracking the configuration of monitoring is a small-ish big data problem. So, I've been studying up on big data tools. Among the tools that kept popping up was ElasticSearch.
Ah, ElasticSearch and that beautiful Logstash-forwarded data. It was great while it lasted. Maybe it was worth another go-around. Wouldn't it be nice to get the logs flowing again and actually be able to use my server and application logs?
There's no time at work, so, a couple of nights ago I decided to set it up as if from scratch. I updated to the current version of Logstash and ElasticSearch. No change yet. I double-checked connections with netcat and netstat. Still reachable and connections still there. I recreated the Logstash and logstash-forwarder configs for a couple of servers. Nothing.
Then I replaced all the certificates used by logstash-forwarder and voilà. Logs started flowing again.
Upon some investigation, I found that the original cert I'd created had expired at the end of December -- the default expiration for openssl certificates is 1 month. While I'm sure this is very secure, 1 month is not very long in the real world.
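If I'd thought to check, the expiry would have been easy to spot, and the fix is a one-liner. A sketch of what I'd do next time -- the filenames and CN here are assumptions for illustration, not the actual paths from my setup:

```shell
# Generate a new self-signed key + cert in one shot, valid for ~10 years
# instead of openssl's short default validity.
openssl req -x509 -newkey rsa:2048 -nodes -days 3650 \
  -keyout logstash-forwarder.key -out logstash-forwarder.crt \
  -subj "/CN=logstash.example.com"

# Check when a cert expires -- the step that would have saved me months.
openssl x509 -in logstash-forwarder.crt -noout -enddate
```

The `-enddate` check prints a `notAfter=` line you can eyeball (or feed to a cron job that nags you before the date arrives).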
So, now, I have a few years before this new cert expires. Maybe I'll be better at debugging Logstash by then. Or maybe I'll find this blog post. Either way, I'll probably facepalm, generate & distribute a new cert, and go for a few more years.
While listening to LS and ES YouTube videos, I heard Jordan Sissel (who is awesome, by the way) describe himself as a SysAdmin like this: "I'm a SysAdmin, which means that I like being angry with computers." How true that is at times! I love his concept of "anger-driven development"--something makes him angry, so he writes some project that fixes the problem. That's cathartic in the best sense of the word, I think. Logstash, fpm and his other projects are awesome, and awesomely cathartic for me, too.