极客时间 Service Mesh in Practice (hands-on part)
Contents:
31 Hands-on Lab 1: Project preparation and build process
Flux is a GitOps operator for Kubernetes, an automated deployment tool based on GitOps, with the following characteristics:
- Automatic synchronization
- Declarative
- Driven by pull requests against the code repository
Installation steps:
- Install the fluxctl CLI (on macOS: brew install fluxctl)
- Create a namespace for Flux
- Download the Flux release from github.com/fluxcd/flux
- Apply the manifests in the deploy directory
In the deploy manifests, settings such as the Git URL need to be adjusted; alternatively, you can deploy with fluxctl install:
fluxctl install \
--git-user=xxx \
--git-email=xxx@xxx \
--git-url=git@github.com:xxx/smdemo \
--namespace=flux | kubectl apply -f -
The command to view the SSH key is fluxctl identity (add this key to the Git repository as a deploy key).
Push the manifests to the configured Git repository and run fluxctl sync to trigger a sync; by default this also runs automatically every 5 minutes.
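A minimal sketch of the day-to-day GitOps loop, assuming the repository from the fluxctl install command above and a hypothetical workloads/ directory:
# add or change a manifest in the repository that Flux watches
git add workloads/httpbin-deploy.yaml
git commit -m "deploy httpbin" && git push origin master
# trigger a sync immediately instead of waiting for the 5-minute poll
fluxctl sync --k8s-fwd-ns flux
# list the workloads Flux manages (namespace is an example)
fluxctl list-workloads --k8s-fwd-ns flux -n demo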
32 Hands-on Lab 2: Automated canary releases
Flagger
Based on its configuration, Flagger creates a VirtualService and shifts traffic between versions by adjusting weights.
helm repo add flagger https://flagger.app
kubectl apply -f https://raw.githubusercontent.com/weaveworks/flagger/master/artifacts/flagger/crd.yaml
# deploy flagger with istio
helm upgrade -i flagger flagger/flagger \
--namespace=istio-system \
--set crd.create=false \
--set meshProvider=istio \
--set metricsServer=http://prometheus.istio-system:9090
# grafana
helm upgrade -i flagger-grafana flagger/grafana \
--namespace=istio-system \
--set url=http://prometheus.istio-system:9090 \
--set user=admin \
--set password=admin
Expose the service through the ingress gateway
# ingress gateway for exposing the mesh
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: public-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
Create the Canary resource
# canary analysis definition
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: httpbin
  namespace: demo
spec:
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: httpbin
  # the maximum time in seconds for the canary deployment
  # to make progress before it is rolled back (default 600s)
  progressDeadlineSeconds: 60
  # HPA reference (optional)
  autoscalerRef:
    apiVersion: autoscaling/v2beta1
    kind: HorizontalPodAutoscaler
    name: httpbin
  service:
    # service port number
    port: 8000
    # container port number or name (optional)
    targetPort: 80
    # Istio gateways (optional)
    gateways:
    - public-gateway.istio-system.svc.cluster.local
  analysis:
    # schedule interval (default 60s)
    interval: 30s
    # max number of failed metric checks before rollback
    threshold: 5
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 100
    # canary increment step
    # percentage (0-100)
    stepWeight: 20
    metrics:
    - name: request-success-rate
      # minimum req success rate (non 5xx responses)
      # percentage (0-100)
      thresholdRange:
        min: 99
      interval: 1m
    - name: latency
      templateRef:
        name: latency
        namespace: istio-system
      # maximum req duration P99
      # milliseconds
      thresholdRange:
        max: 500
      interval: 30s
    # testing (optional)
    webhooks:
    - name: load-test
      url: http://flagger-loadtester.test/
      timeout: 5s
      metadata:
        cmd: "hey -z 1m -q 10 -c 2 http://httpbin-canary.demo:8000/headers"
After this Canary resource is created, Flagger creates for you:
- a VirtualService that controls the traffic split
- a primary Deployment; the original Deployment is scaled down to 0 replicas, and the primary now provides the pods
- primary and canary Services
When an update is detected, a canary Deployment is created automatically
Once the rollout completes, the canary is promoted to primary
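A quick way to trigger and watch a canary rollout, assuming the httpbin Deployment above (container name and image tag are only examples):
# update the target deployment, e.g. roll out a new image tag
kubectl -n demo set image deployment/httpbin httpbin=docker.io/kennethreitz/httpbin:latest
# Flagger detects the change and starts shifting traffic in stepWeight increments
kubectl -n demo get canary httpbin
kubectl -n demo describe canary httpbin   # shows the analysis events and current weights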
33 Hands-on Lab 3: Improving system resilience
Resilience design mainly covers:
- Fault tolerance: retries, idempotency
- Scalability: autoscaling
- Overload protection: timeouts, circuit breaking, rate limiting, and graceful degradation
- Resilience testing: fault injection
Before Istio 1.5 there was a (Mixer-based) rate-limiting feature, but it was deprecated because of performance problems
Timeouts
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
  namespace: demo
spec:
  hosts:
  - "*"
  gateways:
  - httpbin-gateway
  http:
  - route:
    - destination:
        host: httpbin
        port:
          number: 8000
    timeout: 1s
Retries
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
  namespace: demo
spec:
  hosts:
  - "*"
  gateways:
  - httpbin-gateway
  http:
  - route:
    - destination:
        host: httpbin
        port:
          number: 8000
    retries:
      attempts: 3
      perTryTimeout: 1s
    timeout: 8s
Circuit breaking
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: httpbin
  namespace: demo
spec:
  host: httpbin
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 1
      http:
        http1MaxPendingRequests: 1
        maxRequestsPerConnection: 1
    outlierDetection:
      consecutiveErrors: 1
      interval: 1s
      baseEjectionTime: 3m
      maxEjectionPercent: 100
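A way to see the circuit breaker trip, assuming a fortio client pod in the demo namespace (e.g. deployed from the Istio samples' fortio-deploy.yaml):
FORTIO_POD=$(kubectl -n demo get pod -l app=fortio -o jsonpath='{.items[0].metadata.name}')
# with maxConnections and http1MaxPendingRequests both set to 1, sending 3 concurrent
# connections should cause a portion of the requests to be rejected with 503
kubectl -n demo exec "$FORTIO_POD" -c fortio -- /usr/bin/fortio load -c 3 -qps 0 -n 30 http://httpbin:8000/get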
34 Hands-on Lab 4: Configuring security policies
Istio security components:
- istiod: CA, issues certificates
- API server: distributes authentication policies
- envoy
- sidecar
- sidecar agent: responsible for provisioning certificates and keys
Creating authorization policies
# authorization for a specific workload; note there are no rules, which means all requests to this workload are denied
kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: httpbin
  namespace: demo
spec:
  selector:
    matchLabels:
      app: httpbin
EOF
# allow only requests whose source is the sleep service account or the demo namespace
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: httpbin
  namespace: demo
spec:
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/demo/sa/sleep"]
    - source:
        namespaces: ["demo"]
# allow only specific methods and paths
kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: httpbin
  namespace: demo
spec:
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/demo/sa/sleep"]
    - source:
        namespaces: ["demo"]
    to:
    - operation:
        methods: ["GET"]
        paths: ["/get"]
EOF
# additional conditions - request headers
kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: httpbin
  namespace: demo
spec:
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/demo/sa/sleep"]
    - source:
        namespaces: ["demo"]
    to:
    - operation:
        methods: ["GET"]
        paths: ["/get"]
    when:
    - key: request.headers[x-rfma-token]
      values: ["test*"]
EOF
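A quick check of the last policy from inside the mesh, assuming a sleep pod in the demo namespace (as in the Istio samples):
SLEEP_POD=$(kubectl -n demo get pod -l app=sleep -o jsonpath='{.items[0].metadata.name}')
# allowed: GET /get with a matching token header (expect 200)
kubectl -n demo exec "$SLEEP_POD" -c sleep -- curl -s -o /dev/null -w "%{http_code}\n" -H "x-rfma-token: test1" http://httpbin:8000/get
# denied: missing header (expect 403)
kubectl -n demo exec "$SLEEP_POD" -c sleep -- curl -s -o /dev/null -w "%{http_code}\n" http://httpbin:8000/get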
35 Hands-on Lab 5: Collecting metrics and monitoring applications
Where Istio exposes metrics:
- istiod: the /metrics endpoint
- Envoy sidecar: :15090/stats/prometheus
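A quick way to peek at the sidecar metrics, assuming the httpbin workload from the earlier labs:
# forward the sidecar's Prometheus endpoint of an httpbin pod to localhost
kubectl -n demo port-forward deploy/httpbin 15090:15090
# in another terminal
curl -s http://localhost:15090/stats/prometheus | grep istio_requests_total | head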
36 Hands-on Lab 6: Integrating the ELK stack logging suite
Elasticsearch + Kibana
kind: List
apiVersion: v1
items:
- apiVersion: apps/v1beta1
  kind: Deployment
  metadata:
    name: kibana
  spec:
    replicas: 1
    template:
      metadata:
        name: kibana
        labels:
          app: kibana
      spec:
        containers:
        - image: docker.elastic.co/kibana/kibana:6.4.0
          name: kibana
          env:
          - name: ELASTICSEARCH_URL
            value: "http://elasticsearch:9200"
          ports:
          - name: http
            containerPort: 5601
- apiVersion: v1
  kind: Service
  metadata:
    name: kibana
  spec:
    type: NodePort
    ports:
    - name: http
      port: 5601
      targetPort: 5601
      nodePort: 32001
    selector:
      app: kibana
- apiVersion: apps/v1beta1
  kind: Deployment
  metadata:
    name: elasticsearch
  spec:
    replicas: 1
    template:
      metadata:
        name: elasticsearch
        labels:
          app: elasticsearch
      spec:
        containers:
        - image: docker.elastic.co/elasticsearch/elasticsearch:6.4.0
          name: elasticsearch
          env:
          - name: network.host
            value: "_site_"
          - name: node.name
            value: "${HOSTNAME}"
          - name: discovery.zen.ping.unicast.hosts
            value: "${ELASTICSEARCH_NODEPORT_SERVICE_HOST}"
          - name: cluster.name
            value: "test-single"
          - name: ES_JAVA_OPTS
            value: "-Xms128m -Xmx128m"
          volumeMounts:
          - name: es-data
            mountPath: /usr/share/elasticsearch/data
        volumes:
        - name: es-data
          emptyDir: {}
- apiVersion: v1
  kind: Service
  metadata:
    name: elasticsearch-nodeport
  spec:
    type: NodePort
    ports:
    - name: http
      port: 9200
      targetPort: 9200
      nodePort: 32002
    - name: tcp
      port: 9300
      targetPort: 9300
      nodePort: 32003
    selector:
      app: elasticsearch
- apiVersion: v1
  kind: Service
  metadata:
    name: elasticsearch
  spec:
    clusterIP: None
    ports:
    - name: http
      port: 9200
    - name: tcp
      port: 9300
    selector:
      app: elasticsearch
filebeat
kind: List
apiVersion: v1
items:
- apiVersion: v1
  kind: ConfigMap
  metadata:
    name: filebeat-config
    labels:
      k8s-app: filebeat
      kubernetes.io/cluster-service: "true"
      app: filebeat-config
  data:
    filebeat.yml: |
      processors:
      - add_cloud_metadata:
      filebeat.modules:
      - module: system
      filebeat.inputs:
      - type: log
        paths:
        - /var/log/containers/*.log
        symlinks: true
      output.elasticsearch:
        hosts: ['elasticsearch:9200']
      logging.level: info
- apiVersion: extensions/v1beta1
  kind: Deployment
  metadata:
    name: filebeat
    labels:
      k8s-app: filebeat
      kubernetes.io/cluster-service: "true"
  spec:
    replicas: 1
    template:
      metadata:
        name: filebeat
        labels:
          app: filebeat
          k8s-app: filebeat
          kubernetes.io/cluster-service: "true"
      spec:
        containers:
        - image: docker.elastic.co/beats/filebeat:6.4.0
          name: filebeat
          args: [
            "-c", "/home/filebeat-config/filebeat.yml",
            "-e",
          ]
          securityContext:
            runAsUser: 0
          volumeMounts:
          - name: filebeat-storage
            mountPath: /var/log/containers
          - name: varlogpods
            mountPath: /var/log/pods
          - name: varlibdockercontainers
            mountPath: /var/lib/docker/containers
          - name: "filebeat-volume"
            mountPath: "/home/filebeat-config"
        volumes:
        - name: filebeat-storage
          hostPath:
            path: /var/log/containers
        - name: varlogpods
          hostPath:
            path: /var/log/pods
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: filebeat-volume
          configMap:
            name: filebeat-config
- apiVersion: rbac.authorization.k8s.io/v1beta1
  kind: ClusterRoleBinding
  metadata:
    name: filebeat
  subjects:
  - kind: ServiceAccount
    name: filebeat
    namespace: elk
  roleRef:
    kind: ClusterRole
    name: filebeat
    apiGroup: rbac.authorization.k8s.io
- apiVersion: rbac.authorization.k8s.io/v1beta1
  kind: ClusterRole
  metadata:
    name: filebeat
    labels:
      k8s-app: filebeat
  rules:
  - apiGroups: [""] # "" indicates the core API group
    resources:
    - namespaces
    - pods
    verbs:
    - get
    - watch
    - list
- apiVersion: v1
  kind: ServiceAccount
  metadata:
    name: filebeat
    namespace: elk
    labels:
      k8s-app: filebeat
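One possible way to apply the two manifests above, assuming they are saved as elk.yaml and filebeat.yaml (file names are examples):
kubectl create ns elk
kubectl -n elk apply -f elk.yaml
kubectl -n elk apply -f filebeat.yaml
kubectl -n elk get pods
# Kibana is exposed as a NodePort service on 32001:
# open http://<node-ip>:32001 and create an index pattern for the filebeat-* indices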
37 Hands-on Lab 7: Integrating a distributed tracing tool
Installation
# clone repo
git clone https://github.com/jaegertracing/jaeger-operator.git
# set WATCH_NAMESPACE to empty so the operator watches all namespaces
WATCH_NAMESPACE =
# install crd
kubectl create -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/crds/jaegertracing.io_jaegers_crd.yaml
# create the namespace
k create ns observability
# 1. install operator
kubectl create -n observability -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/service_account.yaml
kubectl create -n observability -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/role.yaml
kubectl create -n observability -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/role_binding.yaml
kubectl create -n observability -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/operator.yaml
# install cluster role
kubectl create -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/cluster_role.yaml
kubectl create -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/cluster_role_binding.yaml
# apply jaeger
k apply -f examples/simplest.yaml -n observability
# 2. integrate with istio
# --set values.global.tracer.zipkin.address=<jaeger-collector-service>.<jaeger-collector-namespace>:9411
istioctl manifest apply \
--set values.global.tracer.zipkin.address=simplest-collector.observability:9411 \
--set values.tracing.ingress.enabled=true \
--set values.pilot.traceSampling=100
Restart the pods
# annotation that injects the jaeger agent
sidecar.jaegertracing.io/inject: "true"
kubectl patch deployment productpage-v1 -p "{\"spec\":{\"template\":{\"metadata\":{\"labels\":{\"date\":\"`date +'%s'`\"}}}}}"
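For reference, the examples/simplest.yaml applied above is essentially a minimal Jaeger custom resource, and the inject annotation from the previous step goes onto the workload's Deployment; a sketch (deployment name assumed from the Bookinfo sample):
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: simplest
# annotate a deployment so the operator injects the jaeger-agent sidecar,
# then force a rolling restart as with the kubectl patch command above
kubectl annotate deployment productpage-v1 sidecar.jaegertracing.io/inject="true"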
38 Debugging tools and methods: what tools and methods are there for debugging the mesh?
- the istioctl command line
- the ControlZ self-inspection tool
- the Envoy admin interface
- the Pilot debug interface
istioctl
Installation and deployment:
- istioctl verify-install: verify the cluster environment
- istioctl manifest apply/diff/generate/migrate/versions
- istioctl profile list/diff
- istioctl analyze: configuration validation
- istioctl dashboard controlz/envoy/grafana/jaeger/kiali/prometheus/zipkin
- istioctl kube-inject: manual sidecar injection
Checking network configuration
Configuration sync status:
istioctl ps (proxy-status) <pod>
Possible states: SYNCED (pushed and acknowledged) / NOT SENT (nothing pushed) / STALE (pushed but the pod has not acknowledged it)
Configuration details:
istioctl pc (proxy-config) cluster/route <pod-name.namespace>
Mesh configuration information for a pod:
istioctl x describe pod <pod-name>
Verifies whether the pod is in the mesh and validates its VirtualService, DestinationRule, routes, etc.
Configuration diagnostics:
istioctl analyze -n <namespace>
istioctl analyze --use-kube=false a.yaml my-app-config/
ControlZ
istioctl d controlz <pod> -n <namespace>
A common use is adjusting the Logging Scopes log levels; this is available for control-plane services such as pilot
Envoy admin
istioctl d envoy <pod>.<namespace>
or
kubectl port-forward <pod> 15000:15000
To change the log level, send a POST to the admin interface's /logging endpoint (e.g. /logging?level=debug)
Pilot debug interface
kubectl port-forward service/istio-pilot -n istio-system 8080:8080
Browse the available endpoints at :8080/debug
Downloaded profile files can be analyzed with
go tool pprof <filename>
then type top to view the results
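A minimal sketch of poking the debug interface (the service name varies by Istio version: istio-pilot in older releases, istiod in newer ones):
# forward the debug port, then list the available endpoints
kubectl -n istio-system port-forward service/istio-pilot 8080:8080
curl -s localhost:8080/debug
# e.g. check the configuration sync status of all proxies
curl -s localhost:8080/debug/syncz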
39 Lessons learned: common problems in real-world adoption
Common problems
503 errors
- Upstream service unavailable (UH)
- Upstream connection failure (UF)
- Circuit breaking / overflow (UO)
- No route configured (NR)
Possible approaches:
- Use the RESPONSE_FLAGS in the Envoy debug/access logs to identify the cause
- Guarantee configuration availability: push configuration before the application depends on it (configure first, then roll out)
Analyzing interrupted requests
Previously calls were direct (A -> B); with Istio the path becomes A -> A' -> B' -> B through both sidecars, which makes it hard to tell which hop failed
Approaches:
- Chain upstream and downstream requests together by request ID
- Analyze the upstream/downstream tuple information in the Envoy logs
Routing rules not taking effect
- Do the Pods and Services meet Istio's definition requirements?
- Are any of Istio's default ports being occupied?
- Is configuration distribution delayed?
- Use Kiali's configuration validation
Conflicting routing rules
Approaches:
- Avoid overlapping definitions
- Use Kiali's configuration validation
VirtualService scope
Controlled mainly by the gateways field:
- Apply at the gateway: list the gateway name
- Apply inside the mesh: use mesh, or leave the field empty
- Apply to both: explicitly list both the gateway name and mesh (see the sketch after this list)
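A minimal sketch of the "both" case, reusing the gateway from Lab 2:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
  namespace: demo
spec:
  hosts:
  - httpbin
  gateways:
  - public-gateway.istio-system.svc.cluster.local   # applies to traffic entering via the gateway
  - mesh                                            # applies to traffic from inside the mesh
  http:
  - route:
    - destination:
        host: httpbin
        port:
          number: 8000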
JWT authentication failures (see the RequestAuthentication sketch after this list)
- Make sure the correct jwks or jwksUri is configured
- Make sure the issuer is set correctly
- Make sure the token itself is valid and not expired
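A minimal RequestAuthentication sketch to check these fields against (the issuer and jwksUri values are placeholders):
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: httpbin-jwt
  namespace: demo
spec:
  selector:
    matchLabels:
      app: httpbin
  jwtRules:
  - issuer: "testing@secure.istio.io"          # must match the token's iss claim
    jwksUri: "https://example.com/jwks.json"   # must point to the real JWKS document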
TLS/mTLS connection failures
- Make sure Citadel (the CA component) is running normally
- Make sure the authentication policies are distributed correctly
- Check the scope in which the policy takes effect
- Make sure both the client and server sides have automatic mTLS configured correctly
Performance problems
Control plane:
- Use ControlZ to observe memory and GC behavior
- Use Pilot's debug interface
Data plane:
- Use the Envoy admin API
Traffic management
Always configure a default route
kind: VirtualService
...
spec:
  hosts:
  - reviews
  http:
  - match:
    - headers:
        end-user:
          exact: jason
    route:
    - destination:
        host: reviews
        subset: v2
  - route:
    - destination:
        host: reviews
        subset: v1
This prevents requests that match none of the rules from being left without a route
Configure namespace visibility
kind: VirtualService
...
spec:
  hosts:
  - myservice.com
  exportTo:
  - "."
  http:
  - route:
    - destination:
        host: myservice
Split up complex routing configuration
Split the configuration of different services into separate, single-purpose resources
Pay attention to the order in which configuration takes effect
Because Pilot does not guarantee the order in which configuration is pushed, a routing rule may be delivered before the subset it references, temporarily making the service unavailable
When adding a subset (see the sketch after this list):
- Update the DestinationRule first, adding the new subset
- Then update the VirtualServices that use it
When removing a subset:
- Remove all references to the subset from the VirtualServices first
- Then delete the subset from the DestinationRule
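A minimal sketch of the add-subset order (names and labels assumed from the Bookinfo reviews example):
# step 1: add the new subset to the DestinationRule and apply it
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2        # the newly added subset
    labels:
      version: v2
# step 2: only after this is applied, update the VirtualService to reference subset v2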
Security recommendations
- Isolate access permissions by namespace
- Use ISTIO_MUTUAL mode whenever possible so certificates and keys are managed automatically
- JWT smooth transition: add the new rule first, then remove the old one
- mTLS smooth transition: prefer PERMISSIVE mode first (see the sketch after this list)
- Use third-party ServiceAccount tokens:
--set values.global.jwtPolicy=third-party-jwt
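A minimal sketch of the PERMISSIVE-first mTLS migration mentioned above (namespace assumed to be demo):
# step 1: accept both plaintext and mTLS while clients are being migrated
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: demo
spec:
  mtls:
    mode: PERMISSIVE
# step 2: once all clients send mTLS traffic, change mode to STRICT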
Troubleshooting
Make good use of the debug tools:
- istioctl proxy-status/proxy-config/analyze
- the istio-pilot debug interface on port 8080
- the Envoy admin interface on port 15000
Correlate the logs and monitoring data from when the failure occurred for analysis
40 Future architecture: from Service Mesh toward cloud native
The end state of cloud native: micrologic plus the Mecha component