极客时间 - Service Mesh in Practice (hands-on part)

Date: March 14, 2021

Contents:

31 Hands-on Lab 1: Project Preparation and the Build Process

Flux is a GitOps operator for Kubernetes: an automated deployment tool built on GitOps, with the following characteristics:

  • Automatic synchronization
  • Declarative configuration
  • Driven by code pull requests

Installation steps

  1. Install the fluxctl CLI (on macOS: brew install fluxctl)
  2. Create a namespace for Flux
  3. Download the Flux manifests from github.com/fluxcd/flux
  4. Apply the manifests in the deploy directory

In the deploy manifests, settings such as the git URL need to be adjusted; alternatively, you can deploy via fluxctl install:

fluxctl install \
--git-user=xxx \
--git-email=xxx@xxx \
--git-url=git@github.com:xxx/smdemo \
--namespace=flux | kubectl apply -f -

To view the SSH key that Flux generated, run fluxctl identity.

Push the workload manifests to the configured git repository and run fluxctl sync; by default this synchronization also happens automatically every 5 minutes.
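
A minimal sketch of that workflow (assuming Flux was installed into the flux namespace as above; fluxctl reaches it via --k8s-fwd-ns):

# print the SSH public key generated by Flux, then add it as a deploy key
# (with write access) to the git repository configured above
fluxctl identity --k8s-fwd-ns flux

# after pushing the workload manifests to the repository, trigger a sync immediately
# (otherwise Flux polls the repository roughly every 5 minutes)
fluxctl sync --k8s-fwd-ns flux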

32 Hands-on Lab 2: Implementing Automated Canary Releases

Flagger

Based on its configuration, Flagger creates a VirtualService and shifts traffic between versions by adjusting route weights.

helm repo add flagger https://flagger.app
kubectl apply -f https://raw.githubusercontent.com/weaveworks/flagger/master/artifacts/flagger/crd.yaml
# deploy Flagger with Istio
helm upgrade -i flagger flagger/flagger \
--namespace=istio-system \
--set crd.create=false \
--set meshProvider=istio \
--set metricsServer=http://prometheus.istio-system:9090
# grafana
helm upgrade -i flagger-grafana flagger/grafana \
--namespace=istio-system \
--set url=http://prometheus.istio-system:9090 \
--set user=admin \
--set password=admin

Expose the service through the Istio ingress gateway

# gateway for exposing mesh services through the ingress gateway
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: public-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"

Create the Canary resource

# Canary resource with canary analysis
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: httpbin
  namespace: demo
spec:
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: httpbin
  # the maximum time in seconds for the canary deployment
  # to make progress before it is rolled back (default 600s)
  progressDeadlineSeconds: 60
  # HPA reference (optional)
  autoscalerRef:
    apiVersion: autoscaling/v2beta1
    kind: HorizontalPodAutoscaler
    name: httpbin
  service:
    # service port number
    port: 8000
    # container port number or name (optional)
    targetPort: 80
    # Istio gateways (optional)
    gateways:
    - public-gateway.istio-system.svc.cluster.local
  analysis:
    # schedule interval (default 60s)
    interval: 30s
    # max number of failed metric checks before rollback
    threshold: 5
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 100
    # canary increment step
    # percentage (0-100)
    stepWeight: 20
    metrics:
    - name: request-success-rate
      # minimum req success rate (non 5xx responses)
      # percentage (0-100)
      thresholdRange:
        min: 99
      interval: 1m
    - name: latency
      templateRef:
        name: latency
        namespace: istio-system
      # maximum req duration P99
      # milliseconds
      thresholdRange:
        max: 500
      interval: 30s
    # testing (optional)
    webhooks:
      - name: load-test
        url: http://flagger-loadtester.test/
        timeout: 5s
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://httpbin-canary.demo:8000/headers"

After this Canary is created, Flagger creates the following for you:

  • A VirtualService that controls the traffic split
  • Deployments: an httpbin-primary deployment is created, the original deployment is scaled down to 0 replicas, and the primary now provides the pods
  • Services: a primary and a canary service

When an update is detected, a canary deployment is created automatically.

Once the rollout completes successfully, the canary is promoted to primary.
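
For reference, the VirtualService that Flagger generates for the Canary above looks roughly like this (a sketch inferred from the Canary spec; exact output may differ by Flagger version):

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
  namespace: demo
spec:
  gateways:
  - public-gateway.istio-system.svc.cluster.local
  hosts:
  - httpbin
  http:
  - route:
    - destination:
        host: httpbin-primary
      weight: 100   # shifted toward the canary in stepWeight increments during analysis
    - destination:
        host: httpbin-canary
      weight: 0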

33 Hands-on Lab 3: Improving System Resilience

Resilience design mainly covers:

  • Fault tolerance: retries, idempotency
  • Scalability: automatic scaling
  • Overload protection: timeouts, circuit breaking, rate limiting, and degradation
  • Resilience testing: fault injection (see the sketch at the end of this section)

Istio had a built-in rate-limiting feature before 1.5, but it was deprecated due to performance problems.

Timeout

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
  namespace: demo
spec:
  hosts:
  - "*"
  gateways:
  - httpbin-gateway
  http:
  - route:
    - destination:
        host: httpbin
        port:
          number: 8000
    timeout: 1s

Retry

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
  namespace: demo
spec:
  hosts:
  - "*"
  gateways:
  - httpbin-gateway
  http:
  - route:
    - destination:
        host: httpbin
        port:
          number: 8000
    retries:
      attempts: 3
      perTryTimeout: 1s
    timeout: 8s

Circuit breaking

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: httpbin
  namespace: demo
spec:
  host: httpbin
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 1
      http:
        http1MaxPendingRequests: 1
        maxRequestsPerConnection: 1
    outlierDetection:
      consecutiveErrors: 1
      interval: 1s
      baseEjectionTime: 3m
      maxEjectionPercent: 100
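
Fault injection

Fault injection (the resilience-testing item listed at the start of this section) is also configured on the VirtualService. A sketch against the same demo httpbin service, delaying half of the requests and aborting 10% with HTTP 500:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
  namespace: demo
spec:
  hosts:
  - "*"
  gateways:
  - httpbin-gateway
  http:
  - fault:
      delay:
        percentage:
          value: 50
        fixedDelay: 5s
      abort:
        percentage:
          value: 10
        httpStatus: 500
    route:
    - destination:
        host: httpbin
        port:
          number: 8000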

34 Hands-on Lab 4: Configuring Security Policies

Istio's security architecture involves:

  • istiod: acts as the CA and issues certificates
  • API server: distributes authentication and authorization policies
  • Envoy sidecar proxies: enforce the policies at runtime
  • istio-agent (sidecar agent): handles certificate and key provisioning
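
For example, namespace-wide mutual TLS between sidecars can be enforced with a PeerAuthentication policy (a sketch for the demo namespace used in these labs):

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: demo
spec:
  mtls:
    mode: STRICT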

Creating authorization policies

# authorization for a specific service; note there are no rules, which means all requests to this service are denied
kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: httpbin
  namespace: demo
spec:
  selector:
    matchLabels:
      app: httpbin
EOF
# the source must come from the demo namespace
kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
 name: httpbin
 namespace: demo
spec:
 action: ALLOW
 rules:
 - from:
   - source:
       principals: ["cluster.local/ns/demo/sa/sleep"]
   - source:
       namespaces: ["demo"]
# only allow specific operations
kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
 name: httpbin
 namespace: demo
spec:
 action: ALLOW
 rules:
 - from:
   - source:
       principals: ["cluster.local/ns/demo/sa/sleep"]
   - source:
       namespaces: ["demo"]
   to:
   - operation:
       methods: ["GET"]
       paths: ["/get"]
EOF
# additional conditions - request headers
kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
 name: httpbin
 namespace: demo
spec:
 action: ALLOW
 rules:
 - from:
   - source:
       principals: ["cluster.local/ns/demo/sa/sleep"]
   - source:
       namespaces: ["demo"]
   to:
   - operation:
       methods: ["GET"]
       paths: ["/get"]
   when:
   - key: request.headers[x-rfma-token]
     values: ["test*"]
EOF
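
To verify the last policy you can send requests from the sleep workload whose service account is allowed (a sketch; assumes a sleep deployment with curl exists in the demo namespace, as referenced by the principals above):

# allowed: GET /get from the sleep service account, expect 200
kubectl exec deploy/sleep -n demo -- curl -s -o /dev/null -w "%{http_code}\n" http://httpbin.demo:8000/get

# denied: POST is not in the allowed methods, expect 403
kubectl exec deploy/sleep -n demo -- curl -s -o /dev/null -w "%{http_code}\n" -X POST http://httpbin.demo:8000/post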

35 Hands-on Lab 5: Collecting Metrics and Monitoring Applications

Istio exposes metrics via:

  • istiod's /metrics endpoint
  • Envoy's :15090/stats/prometheus endpoint
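
A quick way to check both endpoints by hand (a sketch; assumes the demo httpbin workload from earlier, a non-distroless proxy image that ships curl, and istiod's default monitoring port 15014):

# Envoy sidecar metrics, scraped by Prometheus by default
kubectl exec deploy/httpbin -n demo -c istio-proxy -- curl -s localhost:15090/stats/prometheus | head

# istiod control-plane metrics (run the curl in another terminal)
kubectl -n istio-system port-forward deploy/istiod 15014:15014 &
curl -s localhost:15014/metrics | head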

36 Hands-on Lab 6: Integrating the ELK Stack Logging Suite

Elasticsearch + Kibana

kind: List
apiVersion: v1
items:
- apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: kibana
  spec:
    replicas: 1
    selector:
      matchLabels:
        app: kibana
    template:
      metadata:
        name: kibana
        labels:
          app: kibana
      spec:
        containers:
        - image: docker.elastic.co/kibana/kibana:6.4.0
          name: kibana
          env:
          - name: ELASTICSEARCH_URL
            value: "http://elasticsearch:9200"
          ports:
          - name: http
            containerPort: 5601
- apiVersion: v1
  kind: Service
  metadata:
    name: kibana
  spec:
    type: NodePort
    ports:
    - name: http
      port: 5601
      targetPort: 5601 
      nodePort: 32001
    selector:
      app: kibana            
- apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: elasticsearch
  spec:
    replicas: 1
    selector:
      matchLabels:
        app: elasticsearch
    template:
      metadata:
        name: elasticsearch
        labels:
          app: elasticsearch
      spec:
        containers:
        - image: docker.elastic.co/elasticsearch/elasticsearch:6.4.0
          name: elasticsearch
          env:
          - name: network.host
            value: "_site_"
          - name: node.name
            value: "${HOSTNAME}"
          - name: discovery.zen.ping.unicast.hosts
            value: "${ELASTICSEARCH_NODEPORT_SERVICE_HOST}"
          - name: cluster.name
            value: "test-single"
          - name: ES_JAVA_OPTS
            value: "-Xms128m -Xmx128m"
          volumeMounts:
          - name: es-data
            mountPath: /usr/share/elasticsearch/data
        volumes:
          - name: es-data
            emptyDir: {}
- apiVersion: v1
  kind: Service
  metadata: 
    name: elasticsearch-nodeport
  spec:
    type: NodePort
    ports:
    - name: http
      port: 9200
      targetPort: 9200
      nodePort: 32002
    - name: tcp
      port: 9300
      targetPort: 9300
      nodePort: 32003
    selector:
      app: elasticsearch
- apiVersion: v1
  kind: Service
  metadata:
    name: elasticsearch
  spec:
    clusterIP: None
    ports:
    - name: http
      port: 9200
    - name: tcp
      port: 9300
    selector:
      app: elasticsearch

Filebeat

kind: List
apiVersion: v1
items:
- apiVersion: v1
  kind: ConfigMap
  metadata:
    name: filebeat-config
    labels:
      k8s-app: filebeat
      kubernetes.io/cluster-service: "true"
      app: filebeat-config
  data:
    filebeat.yml: |
      processors:
        - add_cloud_metadata:
      filebeat.modules:
      - module: system
      filebeat.inputs:
      - type: log
        paths:
          - /var/log/containers/*.log
        symlinks: true
      output.elasticsearch:
        hosts: ['elasticsearch:9200']
      logging.level: info        
- apiVersion: apps/v1
  kind: Deployment 
  metadata:
    name: filebeat
    labels:
      k8s-app: filebeat
      kubernetes.io/cluster-service: "true"
  spec:
    replicas: 1
    selector:
      matchLabels:
        app: filebeat
    template:
      metadata:
        name: filebeat
        labels:
          app: filebeat
          k8s-app: filebeat
          kubernetes.io/cluster-service: "true"
      spec:
        serviceAccountName: filebeat
        containers:
        - image: docker.elastic.co/beats/filebeat:6.4.0
          name: filebeat
          args: [
            "-c", "/home/filebeat-config/filebeat.yml",
            "-e",
          ]
          securityContext:
            runAsUser: 0
          volumeMounts:
          - name: filebeat-storage
            mountPath: /var/log/containers
          - name: varlogpods
            mountPath: /var/log/pods
          - name: varlibdockercontainers
            mountPath: /var/lib/docker/containers
          - name: "filebeat-volume"
            mountPath: "/home/filebeat-config"
        volumes:
          - name: filebeat-storage
            hostPath:
              path: /var/log/containers
          - name: varlogpods
            hostPath:
              path: /var/log/pods
          - name: varlibdockercontainers
            hostPath:
              path: /var/lib/docker/containers
          - name: filebeat-volume
            configMap:
              name: filebeat-config
- apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRoleBinding
  metadata:
    name: filebeat
  subjects:
  - kind: ServiceAccount
    name: filebeat
    namespace: elk
  roleRef:
    kind: ClusterRole
    name: filebeat
    apiGroup: rbac.authorization.k8s.io
- apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRole
  metadata:
    name: filebeat
    labels:
      k8s-app: filebeat
  rules:
  - apiGroups: [""] # "" indicates the core API group
    resources:
    - namespaces
    - pods
    verbs:
    - get
    - watch
    - list
- apiVersion: v1
  kind: ServiceAccount
  metadata:
    name: filebeat
    namespace: elk
    labels:
      k8s-app: filebeat

37 Hands-on Lab 7: Integrating a Distributed Tracing Tool

Installation

# clone the repo
git clone https://github.com/jaegertracing/jaeger-operator.git

# in deploy/operator.yaml, set WATCH_NAMESPACE to empty so the operator watches all namespaces
WATCH_NAMESPACE =

# install crd
kubectl create -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/crds/jaegertracing.io_jaegers_crd.yaml

# create the namespace
kubectl create ns observability

# 1. install operator
kubectl create -n observability -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/service_account.yaml
kubectl create -n observability -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/role.yaml
kubectl create -n observability -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/role_binding.yaml
kubectl create -n observability -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/operator.yaml
# install cluster role
kubectl create -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/cluster_role.yaml
kubectl create -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/cluster_role_binding.yaml

# apply jaeger
kubectl apply -f examples/simplest.yaml -n observability

# 2. integrate with Istio
# --set values.global.tracer.zipkin.address=<jaeger-collector-service>.<jaeger-collector-namespace>:9411
istioctl manifest apply \
--set values.global.tracer.zipkin.address=simplest-collector.observability:9411 \
--set values.tracing.ingress.enabled=true \
--set values.pilot.traceSampling=100

Restart the pods

# add this annotation to the Deployment to inject the Jaeger agent
sidecar.jaegertracing.io/inject: "true"

kubectl patch deployment productpage-v1 -p "{\"spec\":{\"template\":{\"metadata\":{\"labels\":{\"date\":\"`date +'%s'`\"}}}}}"
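
The annotation itself lives on the Deployment metadata; a sketch of adding it with kubectl patch (adjust the namespace to wherever productpage-v1 is deployed):

kubectl patch deployment productpage-v1 --type merge \
  -p '{"metadata":{"annotations":{"sidecar.jaegertracing.io/inject":"true"}}}'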

38 Debugging Tools and Methods: What Tools and Methods Are Available for Debugging the Mesh?

  • The istioctl command line
  • The ControlZ self-inspection tool
  • Envoy's admin interface
  • The Pilot debug interface

istioctl

Installation and deployment

  • istioctl verify-install: validate the cluster environment
  • istioctl manifest apply/diff/generate/migrate/versions
  • istioctl profile list/diff
  • istioctl analyze: configuration validation
  • istioctl dashboard controlz/envoy/grafana/jaeger/kiali/prometheus/zipkin
  • istioctl kube-inject: manual sidecar injection

Network configuration checks

Config sync status

istioctl ps(proxy-status) <pod>

Possible statuses: SYNCED (config pushed and acknowledged) / NOT SENT (nothing pushed yet) / STALE (config pushed but the pod has not acknowledged it)

Config details

istioctl pc(proxy-config) cluster/route <pod-name.namespace>

Mesh configuration related to a pod

istioctl x describe pod <pod-name>

Verifies whether the pod is inside the mesh and validates its VirtualService, DestinationRule, routes, and so on.

Config diagnostics

istioctl analyze -n <namespace>
istioctl analyze --use-kube=false a.yaml my-app-config/

controlz

istioctl d controlz <pod> -n <namespace>

The most common use is adjusting the Logging Scopes (log levels); ControlZ is a feature of control-plane components such as pilot.

envoy admin

istioctl d envoy <pod>.<namespace>
# or
kubectl port-forward <pod> 15000:15000

To change the log level, send a POST request to :15000/logging?level=debug
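
A sketch of doing this with curl after the port-forward above:

# raise all Envoy loggers to debug
curl -X POST "http://localhost:15000/logging?level=debug"
# or raise a single logger, e.g. the connection logger
curl -X POST "http://localhost:15000/logging?connection=debug"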

Pilot debug interface

kubectl port-forward service/istio-pilot -n istio-system 8080:8080

Browse the available endpoints at :8080/debug

Downloaded profile files can be inspected with:

go tool pprof <filename>

then enter top at the pprof prompt to view the results

39 Lessons Learned: What Are the Common Problems in Real-World Adoption?

Common problems

503 errors

  • Upstream service unavailable (UH: no healthy upstream)
  • Connection failure (UF: upstream connection failure)
  • Circuit breaking (UO: upstream overflow)
  • Missing route configuration (NR: no route configured)

Possible solutions:

  • Use the RESPONSE_FLAGS field in Envoy's debug/access logs to identify the cause
  • Ensure configuration availability: push the configuration before the applications and routes that depend on it

Analyzing interrupted requests

Calls used to be direct (A -> B); with Istio the path becomes A -> A' -> B' -> B (through both sidecars), which makes it hard to tell which hop caused the problem.

Solutions

  • Correlate upstream and downstream requests by request ID
  • Analyze the upstream/downstream tuple information in the Envoy logs

Routing rules not taking effect

  • Check whether the Pods and Services meet Istio's definition requirements
  • Check whether Istio's default ports are being occupied
  • Check whether configuration distribution is delayed
  • Use Kiali's configuration validation

Conflicting routing rules

Solutions

  • Avoid overlapping definitions
  • Use Kiali's configuration validation

VirtualService scope

This is controlled mainly by the gateways field:

  • Applies to a gateway: set gateways to the gateway name
  • Applies inside the mesh: mesh, or leave the field empty (the default)
  • Applies to both: explicitly list the gateway name and mesh (see the sketch below)
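
A sketch of the last case: the same routes applied both at the ingress gateway and inside the mesh (names are illustrative, reusing the httpbin gateway from the resilience lab):

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
  namespace: demo
spec:
  hosts:
  - httpbin
  gateways:
  - httpbin-gateway   # traffic entering through this gateway
  - mesh              # plus sidecar-to-sidecar traffic inside the mesh
  http:
  - route:
    - destination:
        host: httpbin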

JWT authentication failures

  • Make sure jwks or jwksUri is set correctly
  • Make sure the issuer is set correctly
  • Make sure the token is valid and has not expired

TLS/mTLS connection failures

  • Make sure Citadel (the CA) is running properly
  • Make sure the authorization policies were pushed correctly
  • Check the scope where the policy takes effect
  • Make sure automatic mTLS is configured correctly on both the client and the server side

Performance problems

Control plane

  • Use ControlZ to observe memory usage and GC behavior
  • Use the Pilot debug interface

Data plane

  • Use Envoy's admin API

Traffic management

Always configure a default route

kind: VirtualService
...
spec: 
  hosts:
  - reviews
  http:
  - match:
    - headers:
        end-user:
          exact: jason
    route:
    - destination:
        host: reviews
        subset: v2
  - route:
    - destination:
        host: reviews
        subset: v1       

This prevents requests that do not match any of the specific rules from being left without a route.

Configure namespace visibility

kind: VirtualService
...
spec: 
  hosts:
  - myservice.com
  exportTo:
  - "."
  http:
  - route:
    - destination:
        host: myservice 

Split complex routing configuration

Split the configuration for different services into separate, single-purpose resources.

Mind the order in which configuration takes effect

Because Istio Pilot does not guarantee the order in which configuration is pushed, routing rules may arrive before the subsets they reference, which can make the service temporarily unavailable.

When adding a subset:

  1. Update the DestinationRule to add the new subset
  2. Then update the VirtualServices that use it (see the sketch after these lists)

When removing a subset:

  1. Remove all references to the subset from the VirtualServices
  2. Then remove the subset from the DestinationRule
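
A sketch of the add case, reusing the reviews example from above: apply the DestinationRule containing the new subset first, and only then the VirtualService that references it:

# 1. apply first: DestinationRule containing the new subset
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2          # newly added subset
    labels:
      version: v2
---
# 2. apply second: VirtualService that routes to the new subset
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v2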

Security recommendations

  • Isolate access permissions by namespace
  • Use ISTIO_MUTUAL mode whenever possible so certificates and keys are managed automatically
  • JWT smooth transition: add the new rule first, then remove the old one
  • mTLS smooth transition: prefer PERMISSIVE mode first
  • Use third-party ServiceAccount tokens: set values.global.jwtPolicy=third-party-jwt (see the sketch below)
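
A sketch of passing that option at install time, in the same istioctl manifest apply form used in the tracing lab (adjust to your own install method):

istioctl manifest apply --set values.global.jwtPolicy=third-party-jwt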

Troubleshooting

Make good use of the debug tools:

  • istioctl proxy-status/proxy-config/analyze
  • The istio-pilot debug interface on port 8080
  • The Envoy admin interface on port 15000

Correlate the logs and monitoring data from the time of the failure when analyzing.

40 Future Architecture: From Service Mesh to Cloud Native

The final form of cloud native: micrologic plus Mecha components