In our seventeenth issue of the Architects' Newsletter we are taking a break from our regularly scheduled program to bring you a "year in review" edition. The highlights below combine our favourite articles, the articles most popular with readers, and some new content that builds on the themes we've covered over the last year.

We hope you have enjoyed reading the newsletter over the past year, and we will continue with the theme of microservices in January 2019. Happy Holidays!

News Highlights for 2018

Five Things Every Developer Should Know About Software Architecture

In the InfoQ article "Five Things Every Developer Should Know about Software Architecture", Simon Brown argued that understanding the basics of software architecture is more important than ever before, given the distributed nature of the software systems we're now building, and the distributed nature of the teams building them. A key goal is getting the amount of "up front design" correct - somewhere between too much and none at all - and architects should focus on understanding the significant decisions and trade-offs that influence the shape of a software system.

Brown stated that effective architects are active members of the development team, from collaborating on code to coaching and providing technical leadership. Communicating about software architecture is challenging, and the C4 model can help structure the dialogue, starting with a context diagram and working down to more technical aspects of the system. Brown concluded the article by stating that, contrary to some popular assumptions, putting effort into good architecture actually enables agility.

Microservices in a Post-Kubernetes Era

The microservice architecture is still the most popular architectural style for distributed systems, argued Bilgin Ibryam in a recent InfoQ article, but Kubernetes and the cloud native movement have redefined certain aspects of application design and development at scale. On a cloud native platform, observability of services is not enough; a more fundamental prerequisite is to make microservices automatable, by implementing health checks, reacting to signals, declaring resource consumption, etc.
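As a rough illustration of what "automatable" can mean in practice, the following Python sketch shows a service tracking its own liveness and readiness and reacting to a termination signal. The class name, method names and status codes are our own assumptions for illustration, not taken from Ibryam's article, and a real service would expose these checks over HTTP or a similar probe mechanism.

```python
import signal


class ServiceLifecycle:
    """Hypothetical helper tracking the state a platform needs to automate a service."""

    def __init__(self):
        self.ready = False          # set once dependencies are initialised
        self.shutting_down = False  # set when a termination signal arrives

    def liveness(self):
        # The process is alive as long as it can answer at all.
        return 200, "alive"

    def readiness(self):
        # Report "not ready" during start-up and shutdown so that the
        # platform stops routing traffic to this instance.
        if self.ready and not self.shutting_down:
            return 200, "ready"
        return 503, "not ready"

    def handle_sigterm(self, signum, frame):
        # React to the termination signal: stop accepting new work and let
        # in-flight requests finish before the process exits.
        self.shutting_down = True

    def install(self):
        # Register the handler; must be called from the main thread.
        signal.signal(signal.SIGTERM, self.handle_sigterm)
```

Separating liveness from readiness lets a platform distinguish "restart this instance" from "temporarily stop sending it traffic", which is the kind of decision an orchestrator can only automate if the service reports its own state.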

In what Ibryam refers to as the "post-Kubernetes era", using libraries to implement operational networking concerns (such as Hystrix circuit breaking) has been completely overtaken by service mesh technology. Ibryam argues that microservices must now be designed for "recovery", by implementing idempotency from multiple dimensions; and that modern developers must be fluent in a programming language to implement the business functionality, and equally fluent in cloud native technologies to address the non-functional infrastructure level requirements.
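To make the idea of designing for recovery a little more concrete, here is a minimal Python sketch of one common way to make an operation idempotent: the caller supplies an idempotency key, and a retried request returns the stored result instead of repeating the side effect. The `PaymentService` class and its in-memory store are hypothetical and ours, not from the article; a production service would need a durable store and an atomic check-and-write.

```python
class PaymentService:
    """Hypothetical service whose charge operation is safe to retry."""

    def __init__(self):
        self._results = {}   # idempotency key -> stored result (in-memory only)
        self.balance = 100

    def charge(self, idempotency_key, amount):
        # A retried request carries the same key, so return the stored
        # result instead of applying the side effect a second time.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.balance -= amount
        result = {"charged": amount, "balance": self.balance}
        self._results[idempotency_key] = result
        return result
```

With this shape, a platform or service mesh can retry a failed or timed-out call without the risk of charging twice.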

In March of 2018, InfoQ published an eMag on the topic of "Microservices: Patterns and Practices".

The Present and Future of Serverless Observability

At QCon London, Yan Cui provided an overview of the challenges of observing a serverless architecture. He discussed the tradeoffs to consider and the current state of serverless observability tooling, and examined new and proposed tooling that may help with the current challenges.

Attempting to observe Function-as-a-Service (FaaS) serverless applications can present many challenges. First, there is no longer anywhere to install monitoring agents, and no opportunity for background processing. Any telemetry data therefore has to be sent during a function's invocation, while the user is still waiting on a (potentially business critical) response.

Second, the deep integration between AWS Lambda and Amazon Kinesis has made event-driven architectures much easier to implement within the AWS ecosystem, and patterns like CQRS have become much simpler to implement in practice. However, tracing function invocations through asynchronous event sources like Kinesis is not easy, and is not currently supported out-of-the-box by existing tools like AWS X-Ray.
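One widely used workaround for tracing across an asynchronous hop is to carry a correlation ID inside each record and attach it to everything the consumer logs. The Python sketch below illustrates the idea with an in-memory queue standing in for the event stream; it does not call any AWS API, the function and field names are our own assumptions, and it is not presented as the approach from Cui's talk.

```python
import json
import uuid
from collections import deque

stream = deque()  # in-memory stand-in for an asynchronous event source


def publish(payload, correlation_id=None):
    # Reuse the caller's correlation ID, or start a new trace.
    correlation_id = correlation_id or str(uuid.uuid4())
    record = {"correlation_id": correlation_id, "payload": payload}
    stream.append(json.dumps(record))
    return correlation_id


def consume(log):
    # The consuming function recovers the ID from the record and attaches
    # it to everything it logs, so both sides of the hop can be joined.
    record = json.loads(stream.popleft())
    log.append({"correlation_id": record["correlation_id"],
                "message": "processed", "payload": record["payload"]})
    return record["correlation_id"]
```

Because the ID travels with the data rather than with the request, log entries from the producing and consuming functions can be joined afterwards even though no synchronous call links them.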

Cui concluded by stating that serverless observability tooling has become better over the past year, and that "through the work by many smart people, more and more developers are waking up to the new constraints and challenges around operations and observability" when it comes to serverless technologies like AWS Lambda. FaaS technologies can provide many benefits, but architects must be aware of the potential tradeoffs.

InfoQ published two related eMags in 2018, covering the topics of "Observability" and "Testing Your Distributed (Cloud) Systems".

Chaos Engineering

"At Netflix", senior software engineer Nora Jones wrote in her introduction to her eMag for InfoQ on chaos engineering, "we've been embracing chaos engineering since Chaos Monkey was born in 2011". Jones continued by describing the evolution of the technology over the past six years: "It has gone through several iterations and tools that eventually evolved into the Failure Injection Testing (FIT) platform and, ultimately, Chaos Automation Platform (ChAP), a platform for safely automating and running chaos experiments in production, through the efforts of many amazing engineers".

The eMag features articles by Michael Kehoe from LinkedIn, Patrick Higgins from Gremlin, Aaron Rinehart, and John Allspaw, and is a great overview of the topic.

On the Packt Hub, Richard Gall explained that "Chaos Engineering is based on a fundamental assertion about software infrastructure today: that it is inherently chaotic. Or, to be more specific, it is chaotic because it is complex". After acknowledging the initial contributions of Netflix to the discipline of Chaos Engineering, he described how other organisations are now getting involved. For example, Facebook's Project Storm simulates data center failures on a huge scale, while Uber uses a tool called uDestroy.

Gall discussed the key challenges of chaos engineering and stated that, first and foremost, it requires a big cultural change. Engineers and leadership need to be aware of both the complexity of their systems and the business impact of any associated failure or degradation. The article concluded by asking how many businesses actually want to have these conversations: it's not just about the inclination, it's also about the time and money.

A video recording is also available of the QCon London talk "Chaos Engineering: Why the World Needs More Resilient Systems" by Tammy Butow, principal SRE at Gremlin, which explains how resilience can be improved through the practice of chaos engineering. The talk suggests that three prerequisites must be in place before chaos engineering work can begin: high-severity ("SEV") incident management; effective monitoring; and the ability to measure the impact of a failure (in both technical and business terms). Butow also presents a series of guidelines, tools and principles for creating a chaos testing practice.

Containers in 2018: The Challenges of Security and Networking

In a video recording of the DevOps Pro Vilnius talk, "It's 2018; Are My Containers Secure Yet?", Phil Estes, senior technical staff member at IBM, examines the topic of container security. Estes begins with a deep-dive exploration of what container technology actually is, and then explores what engineers require in regard to security from both the implementation and supporting toolchain. He also enumerates and evaluates the current strengths and weaknesses of the container technology landscape.

The key takeaway is that although container security has improved dramatically over the past five years, engineers must still be aware of the current limitations, and act (and test) accordingly.

On a related topic, implementing effective and secure networking within a container-based environment is typically non-trivial due to the inherent complexity of multiple abstractions, many moving parts, and the ephemeral nature of the underlying platform. The term "service mesh" has been created to group together a number of network proxy-based implementations that attempt to overcome these challenges. Technologies within this space include Envoy, Istio, Cilium, NGINX nginMesh, and Consul Connect.

InfoQ has recently published a free eMag, "Service Meshes: Managing Complex Communication within Cloud Native Applications", which provides a guide to the problem space and a summary of current work. A recording of Daniel Bryant's introductory microXchg conference talk has been published on InfoQ: "What is a Service Mesh, and Do I Need One When Developing Microservices?". A recording of Matt Klein's QCon NY talk on Lyft's usage of Envoy is also available on InfoQ: "Lyft's Envoy: Embracing a Service Mesh".

Missed a newsletter? You can find all of the previous issues on InfoQ.

InfoQ strives to facilitate the spread of knowledge and innovation within this space, and in this newsletter we aim to curate and summarise key learnings from news items, articles and presentations created by industry peers, both on InfoQ and across the web. We aim to keep readers informed and educated about emerging trends, peer-validated early adoption of technologies, and architectural best practices, and are always keen to receive feedback from our readers.

The Software Architects' Newsletter, December 2018