In our most recent post we made note of the common pattern of centralized policy input coupled with distributed implementation of policies, and this week’s issue follows that thread into the world of Service Meshes.
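To make that pattern concrete before handing things over to this week's letters, here is a minimal, purely illustrative sketch in Go: one control plane holds a policy that an operator authors centrally, and each sidecar proxy queries it and enforces the policy locally on every request it forwards. The names (Policy, controlPlane, sidecar) are invented for illustration and do not correspond to any particular mesh's API.

    // Sketch of "centralized policy input, distributed implementation":
    // policy is written once at the control plane, enforced at many sidecars.
    package main

    import "fmt"

    // Policy is what the operator authors centrally, in one place.
    type Policy struct {
        Route      string
        MaxRetries int
        RequireTLS bool
    }

    // controlPlane hands the same policy to every sidecar that asks for it.
    type controlPlane struct{ policies map[string]Policy }

    func (cp *controlPlane) policyFor(route string) Policy { return cp.policies[route] }

    // sidecar sits next to one service instance and applies the policy to each
    // request it forwards, with no further help from the control plane.
    type sidecar struct {
        name string
        cp   *controlPlane
    }

    func (s *sidecar) forward(route string) {
        p := s.cp.policyFor(route)
        fmt.Printf("%s: forwarding %s (retries=%d, tls=%v)\n",
            s.name, route, p.MaxRetries, p.RequireTLS)
    }

    func main() {
        cp := &controlPlane{policies: map[string]Policy{
            "/orders": {Route: "/orders", MaxRetries: 3, RequireTLS: true},
        }}
        // Many distributed enforcement points, one central source of policy.
        for _, name := range []string{"sidecar-a", "sidecar-b"} {
            s := &sidecar{name: name, cp: cp}
            s.forward("/orders")
        }
    }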
Hi Larry!
Great to see you looking into popular stuff with a CS fundamentals approach.
Looking at service meshes as a forwarding/policy system is a somewhat limiting view, though. We are using a LinkerD-based mesh, and our main benefit is the excellent observability it provides out of the box.
Our services receive large input data such as documents or sound files, and perform computationally expensive tasks (preprocessing, extraction, conversion, machine learning, etc.) alongside regular database/cache or other service accesses.
Even our ML researchers can look at the visualization the LinkerD dashboard generates and quickly see which paths a transaction took, and where the high-latency or high-error-rate links are, in a very large cloud system.
Now, all of this can be custom built, but it is a lot of work, especially when you have hundreds of services built by teams whose main competence is not necessarily the network/cloud.
This also wouldn't work well at anything below L7, as the information becomes less accessible to people and cannot capture what is going on in today's multiplexed protocols (HTTP/2, gRPC, etc.).
A service mesh is at just the right level to capture meaningful information and integrate easily with other tracing/metrics systems.
LinkerD's L7 proxy performance is more than enough for us; even on a beefed-up GPU machine, a service can only process a few hundred transactions per second, and we can scale vertically instead. If you are looking for 10k+ transactions per second at minimum machine cost, a microservice architecture is not for you anyway.
There are other benefits for sure (policies, upgrading connections to HTTPS, providing an in-cluster CA, etc.), but to me, the out-of-the-box observability dwarfs everything else.
Best,
Gurer Ozen
Gurer,
Good to hear from you! Yes, you are absolutely right that visibility is a huge part of the value proposition for Service Meshes. If I can connect the dots back to SDN :-) In-band Network Telemetry (INT) is the SDN counterpart that is getting traction, and a big part of the argument in favor of P4-programmable forwarding pipelines. Closing the control loop (e.g., with ML or a runtime verifier) is happening up and down the stack.
It's good to hear that proxy performance is not a problem for your workloads, and I'm confident we'll figure out how to optimize it even more as Service Meshes become commonplace. I do expect having multiple implementation points, with different tradeoffs, will remain important though.
INT looks highly interesting. I would love to have the capability to pick a transaction and zoom into every level of communication happening within it. Right now you can only do this if you have control over all the systems involved, and even then it is painful to associate individual traces/metrics with one another. Thanks for the info! - Gurer
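Gurer's wish to pick one transaction and zoom into every hop depends on the trace context surviving each hop, which is exactly what breaks when some of the systems involved are outside your control. As a minimal sketch (standard-library Go only; the downstream URL is a made-up placeholder, and a real deployment would use a tracing library or the mesh's proxy rather than hand-rolled header copying), here is what propagating the W3C traceparent header through a service looks like:

    // Sketch: forward the incoming traceparent header on outgoing calls so a
    // single transaction can be followed across service hops.
    package main

    import (
        "io"
        "log"
        "net/http"
    )

    // withTraceContext copies the incoming traceparent header (if any) onto an
    // outgoing request, so the downstream service sees the same trace ID.
    func withTraceContext(in *http.Request, out *http.Request) {
        if tp := in.Header.Get("traceparent"); tp != "" {
            out.Header.Set("traceparent", tp)
        }
    }

    func handler(w http.ResponseWriter, r *http.Request) {
        // Call a downstream service (placeholder URL), propagating the context.
        req, err := http.NewRequest("GET", "http://downstream.local/work", nil)
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        withTraceContext(r, req)
        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            http.Error(w, err.Error(), http.StatusBadGateway)
            return
        }
        defer resp.Body.Close()
        io.Copy(w, resp.Body)
    }

    func main() {
        http.HandleFunc("/", handler)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }

If any service in the path drops that header, the trace breaks there; a sidecar proxy can do this forwarding on behalf of every service, which is one reason the mesh is a convenient place to hang observability.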