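HTTP Traffic Management in Istio

Date: Feb. 26, 2019 Category: Containers

HTTP traffic management

A core function of Istio is connecting large numbers of microservices so that they can call one another reliably.

Its traffic management features help with network failures, upgrades, testing, scaling, and fault isolation.

See also the ServiceMesh WeChat official account.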

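Defining destination rules

A Service can map to a single Deployment or to several.

A Kubernetes Service cannot choose among its backends; it simply spreads calls across them at random. In Istio, the different backends of one service are called subsets, and they are defined with a DestinationRule.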

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: flaskapp
spec:
  host: flaskapp.default.svc.cluster.local
  subsets: 
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2

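host is a required field. It names a Service or a ServiceEntry (which defines an external service).
subsets defines the subsets by label selector.
trafficPolicy sets the traffic policy. It can be defined on the DestinationRule as a whole or on an individual subset; the subset-level setting takes precedence.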
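
As a minimal sketch of that precedence rule, the DestinationRule below sets a load-balancing policy for the whole host and overrides it for subset v2 only. The name flaskapp-lb and the specific policies chosen are illustrative assumptions, not part of the original setup.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: flaskapp-lb            # hypothetical name, for illustration only
spec:
  host: flaskapp.default.svc.cluster.local
  trafficPolicy:               # applies to every subset unless overridden
    loadBalancer:
      simple: ROUND_ROBIN
  subsets:
  - name: v1                   # inherits ROUND_ROBIN from the rule level
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
    trafficPolicy:             # subset-level policy takes precedence for v2
      loadBalancer:
        simple: RANDOM
```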

Defining a default route

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: flaskapp-default-v1
spec:
  hosts: 
  - flaskapp.default.svc.cluster.local
  http:
  - route:
    - destination:
        host: flaskapp.default.svc.cluster.local
        subset: v1

Splitting and migrating traffic

In a VirtualService, http.route takes an array of destinations, each with its own weight:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: flaskapp
spec:
  hosts: 
  - flaskapp.default.svc.cluster.local
  http:
  - route:
    - destination:
        host: flaskapp.default.svc.cluster.local
        subset: v1
      weight: 70
    - destination:
        host: flaskapp.default.svc.cluster.local
        subset: v2
      weight: 30

Then test with 100 requests:

$ export SOURCE_POD=$(kubectl get pod -l app=sleep,version=v1 -o jsonpath={.items..metadata.name})
$ kubectl exec -it -c sleep $SOURCE_POD bash
bash-4.4# for i in `seq 100`; do http --body http://flaskapp/env/version ;done  | awk -F"v1" '{print NF-1}'
75

Change the v1 and v2 weights to 10 and 90:

bash-4.4# for i in `seq 100`; do http --body http://flaskapp/env/version ;done  | awk -F"v1" '{print NF-1}'
13

Change the v1 and v2 weights to 0 and 100:

bash-4.4# for i in `seq 100`; do http --body http://flaskapp/env/version ;done  | awk -F"v1" '{print NF-1}'
0

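The weights must add up to 100. If a route has a single destination and no weight is declared explicitly, the weight defaults to 100.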

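Canary deployment

When releasing a new version, a subset of users is chosen as canary users and their requests are identified by an HTTP header; all other users keep hitting the other version.

This example uses the header lab: canary.

Configure the VirtualService: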

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: flaskapp
spec:
  hosts:
  - flaskapp.default.svc.cluster.local
  http:
  - match:
    - headers:
        lab:
          exact: canary
    route:
    - destination:
        host: flaskapp.default.svc.cluster.local
        subset: v1
  - route:
    - destination:
        host: flaskapp.default.svc.cluster.local
        subset: v2

Verify:

$ export SOURCE_POD=$(kubectl get pod -l app=sleep,version=v1 -o jsonpath={.items..metadata.name})
$ kubectl exec -it -c sleep $SOURCE_POD bash
bash-4.4# http --body http://flaskapp/env/version
v2

bash-4.4# http --body http://flaskapp/env/version
v2

bash-4.4# http --body http://flaskapp/env/version lab:why
v2

bash-4.4# http --body http://flaskapp/env/version lab:why
v2

bash-4.4# http --body http://flaskapp/env/version lab:canary
v1

bash-4.4# http --body http://flaskapp/env/version lab:canary
v1

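Besides HTTP headers, match conditions can use the URI, scheme, method, authority, port, source labels, and gateway.

Besides exact matching, prefix matching (prefix) and regular-expression matching (regex) are also available.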
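
A hedged sketch of how these matchers combine: conditions inside a single match entry are ANDed, so the first rule below applies only to GET requests whose path starts with /env/ and whose lab header matches the regular expression. The header values implied by the pattern (for example canary-eu) are hypothetical.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: flaskapp
spec:
  hosts:
  - flaskapp.default.svc.cluster.local
  http:
  - match:
    - uri:
        prefix: /env/          # any path beginning with /env/
      headers:
        lab:
          regex: "canary-.*"   # hypothetical values such as canary-eu
      method:
        exact: GET
    route:
    - destination:
        host: flaskapp.default.svc.cluster.local
        subset: v1
  - route:                     # everything else falls through to v2
    - destination:
        host: flaskapp.default.svc.cluster.local
        subset: v2
```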

Routing by source service

Match on the labels of the calling workload:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: flaskapp
spec:
  hosts:
  - flaskapp.default.svc.cluster.local
  http:
  - match:
    - sourceLabels:
        app: sleep
        version: v1
    route:
    - destination:
        host: flaskapp.default.svc.cluster.local
        subset: v1
  - route:
    - destination:
        host: flaskapp.default.svc.cluster.local
        subset: v2

Verify:

$ export SOURCE_POD=$(kubectl get pod -l app=sleep,version=v1 -o jsonpath={.items..metadata.name})
$ kubectl exec -it -c sleep $SOURCE_POD bash
bash-4.4# http --body http://flaskapp/env/version 
v1

bash-4.4# http --body http://flaskapp/env/version 
v1
bash-4.4# exit
$ export SOURCE_POD=$(kubectl get pod -l app=sleep,version=v2 -o jsonpath={.items..metadata.name})
$ kubectl exec -it -c sleep $SOURCE_POD bash
bash-4.4# http --body http://flaskapp/env/version 
v2

bash-4.4# http --body http://flaskapp/env/version 
v2

The v1 sleep Pod gets version v1 back from flaskapp, and the v2 sleep Pod gets v2.

Redirecting a URL

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: flaskapp
spec:
  hosts:
  - flaskapp.default.svc.cluster.local
  http:
  - match:
    - sourceLabels:
        app: sleep
        version: v1
      uri:
         exact: "/env/HOSTNAME"
    redirect:
      uri: /env/version
  - route:
    - destination:
        host: flaskapp.default.svc.cluster.local
        subset: v2

Verify:

$ export SOURCE_POD=$(kubectl get pod -l app=sleep,version=v2 -o jsonpath={.items..metadata.name})
$ kubectl exec -it -c sleep $SOURCE_POD bash
bash-4.4# http http://flaskapp/env/HOSTNAME
HTTP/1.1 200 OK
content-length: 28
content-type: text/html; charset=utf-8
date: Fri, 22 Feb 2019 09:09:39 GMT
server: envoy
x-envoy-upstream-service-time: 1

flaskapp-v2-787ddcbf8c-lgfgt
$ export SOURCE_POD=$(kubectl get pod -l app=sleep,version=v1 -o jsonpath={.items..metadata.name})
$ kubectl exec -it -c sleep $SOURCE_POD bash
bash-4.4# http http://flaskapp/env/HOSTNAME
HTTP/1.1 301 Moved Permanently
content-length: 0
date: Fri, 22 Feb 2019 09:04:46 GMT
location: http://flaskapp/env/version
server: envoy

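The request from v2 returns the hostname, while the request from v1 receives a redirect.

With the --follow flag, the redirect is followed and the version is returned: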

bash-4.4# http --follow http://flaskapp/env/HOSTNAME
HTTP/1.1 200 OK
content-length: 2
content-type: text/html; charset=utf-8
date: Fri, 22 Feb 2019 09:08:49 GMT
server: envoy
x-envoy-upstream-service-time: 1

v2

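With this approach, however, redirect replaces the URI as a whole, so it is not very flexible, and a 301 redirect does not work for POST requests.

Something like a rewrite is needed instead.

Start with httpbin, which supports POST: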

bash-4.4# http -f POST http://httpbin:8000/post
HTTP/1.1 200 OK
access-control-allow-credentials: true
access-control-allow-origin: *
content-length: 559
content-type: application/json
date: Fri, 22 Feb 2019 09:29:43 GMT
server: envoy
x-envoy-upstream-service-time: 9

{
    "args": {},
    "data": "",
    "files": {},
    "form": {},
    "headers": {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate",
        "Content-Length": "0",
        "Content-Type": "application/x-www-form-urlencoded; charset=utf-8",
        "Host": "httpbin:8000",
        "User-Agent": "HTTPie/0.9.9",
        "X-B3-Sampled": "0",
        "X-B3-Spanid": "8985aea6578dcc4b",
        "X-B3-Traceid": "8985aea6578dcc4b",
        "X-Request-Id": "98268bac-3e74-48c1-864c-06df74c27dcd"
    },
    "json": null,
    "origin": "127.0.0.1",
    "url": "http://httpbin:8000/post"
}

bash-4.4# http -f POST http://httpbin:8000/get
HTTP/1.1 405 Method Not Allowed
access-control-allow-credentials: true
access-control-allow-origin: *
allow: OPTIONS, HEAD, GET
content-length: 178
content-type: text/html
date: Fri, 22 Feb 2019 09:29:48 GMT
server: envoy
x-envoy-upstream-service-time: 9

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>405 Method Not Allowed</title>
<h1>Method Not Allowed</h1>
<p>The method is not allowed for the requested URL.</p>

POSTing directly to the /get endpoint returns 405 Method Not Allowed.

Create a VirtualService for httpbin:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
spec:
  hosts:
  - httpbin.default.svc.cluster.local
  http:
  - match:
    - uri:
        exact: "/get"
    redirect:
      uri: /post
  - route:
    - destination:
        host: httpbin.default.svc.cluster.local

Check how a POST behaves with the 301 redirect:

bash-4.4# http -f POST http://httpbin:8000/get
HTTP/1.1 301 Moved Permanently
content-length: 0
date: Fri, 22 Feb 2019 09:52:54 GMT
location: http://httpbin:8000/post
server: envoy

The request still does not return the expected data. Replace redirect with rewrite:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
spec:
  hosts:
  - httpbin.default.svc.cluster.local
  http:
  - match:
    - uri:
        exact: "/get"
    rewrite:
      uri: /post
    route:
    - destination:
        host: httpbin.default.svc.cluster.local
  - route:
    - destination:
        host: httpbin.default.svc.cluster.local

Verify:

bash-4.4# http -f POST http://httpbin:8000/get
HTTP/1.1 200 OK
access-control-allow-credentials: true
access-control-allow-origin: *
content-length: 597
content-type: application/json
date: Fri, 22 Feb 2019 10:03:45 GMT
server: envoy
x-envoy-upstream-service-time: 5

{
    "args": {},
    "data": "",
    "files": {},
    "form": {},
    "headers": {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate",
        "Content-Length": "0",
        "Content-Type": "application/x-www-form-urlencoded; charset=utf-8",
        "Host": "httpbin:8000",
        "User-Agent": "HTTPie/0.9.9",
        "X-B3-Sampled": "0",
        "X-B3-Spanid": "3408b8a4c435f9fc",
        "X-B3-Traceid": "3408b8a4c435f9fc",
        "X-Envoy-Original-Path": "/get",
        "X-Request-Id": "6862e97e-d3f0-4dcf-b265-46e828177124"
    },
    "json": null,
    "origin": "127.0.0.1",
    "url": "http://httpbin:8000/post"
}

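Note also that rewrite and redirect cannot be used together in the same route.

Timeout control

When a service or the network has problems and nothing bounds the wait, callers wait for whatever timeout the application or the server defines. Request handling time grows sharply, and if the fault persists, requests pile up and the failure spreads.

A VirtualService can set an upper bound on the timeout: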

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
spec:
  hosts:
  - httpbin.default.svc.cluster.local
  http:
  - route:
    - destination:
        host: httpbin.default.svc.cluster.local
    timeout: 3s

Verify:

The /delay endpoint takes an integer and returns its response after the server has waited that many seconds.

$ export SOURCE_POD=$(kubectl get pod -l app=sleep,version=v2 -o jsonpath={.items..metadata.name})
$ kubectl exec -it -c sleep $SOURCE_POD bash
bash-4.4# http --body http://httpbin:8000/delay/2
{
    "args": {},
    "data": "",
    "files": {},
    "form": {},
    "headers": {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate",
        "Content-Length": "0",
        "Host": "httpbin:8000",
        "User-Agent": "HTTPie/0.9.9",
        "X-B3-Sampled": "0",
        "X-B3-Spanid": "76735cdf02abccf0",
        "X-B3-Traceid": "76735cdf02abccf0",
        "X-Request-Id": "3203e7f0-c2a3-47b3-ad26-ef6ae18b5d1e"
    },
    "origin": "127.0.0.1",
    "url": "http://httpbin:8000/delay/2"
}

bash-4.4# http --body http://httpbin:8000/delay/5
upstream request timeout

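A 2-second delay returns normally, while a 5-second delay exceeds the 3-second timeout.

Timeout policies can also be scoped by match conditions such as the source labels and destination of a request.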
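
One way to scope a timeout by caller, sketched under the assumption that the sleep Pods carry the app and version labels used elsewhere in this article: requests from sleep v1 get a tighter 1-second limit, and all other callers keep the 3-second limit. The specific durations are illustrative.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
spec:
  hosts:
  - httpbin.default.svc.cluster.local
  http:
  - match:
    - sourceLabels:            # only callers labelled app=sleep,version=v1
        app: sleep
        version: v1
    route:
    - destination:
        host: httpbin.default.svc.cluster.local
    timeout: 1s                # illustrative value
  - route:                     # default rule for every other caller
    - destination:
        host: httpbin.default.svc.cluster.local
    timeout: 3s
```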

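Retry control

Service calls sometimes hit brief outages in which the target service is unavailable for a short time; retrying may let the call complete successfully.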

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
spec:
  hosts:
  - httpbin.default.svc.cluster.local
  http:
  - route:
    - destination:
        host: httpbin.default.svc.cluster.local
    retries:
      attempts: 3

Verify:

$ export SOURCE_POD=$(kubectl get pod -l app=sleep,version=v2 -o jsonpath={.items..metadata.name})
$ kubectl exec -it -c sleep $SOURCE_POD bash
bash-4.4# http http://httpbin:8000/status/500
HTTP/1.1 500 Internal Server Error
access-control-allow-credentials: true
access-control-allow-origin: *
content-length: 0
content-type: text/html; charset=utf-8
date: Mon, 25 Feb 2019 02:59:07 GMT
server: envoy
x-envoy-upstream-service-time: 224

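How can we tell that the request was retried three times? Watch the logs of the istio-proxy container in the httpbin Pod and send the request again: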

$ kubectl logs -f httpbin-776487d667-2dsd2 -c istio-proxy
...
[2019-02-25T02:59:07.566Z] "GET /status/500 HTTP/1.1" 500 - 0 0 3 1 "-" "HTTPie/0.9.9" "7748274e-b093-4283-8466-94ba376c1f2a" "httpbin:8000" "127.0.0.1:8000" inbound|8000||httpbin.default.svc.cluster.local - 10.244.2.149:8000 10.244.2.144:57380
[2019-02-25T02:59:07.593Z] "GET /status/500 HTTP/1.1" 500 - 0 0 2 1 "-" "HTTPie/0.9.9" "7748274e-b093-4283-8466-94ba376c1f2a" "httpbin:8000" "127.0.0.1:8000" inbound|8000||httpbin.default.svc.cluster.local - 10.244.2.149:8000 10.244.2.144:57384
[2019-02-25T02:59:07.660Z] "GET /status/500 HTTP/1.1" 500 - 0 0 1 1 "-" "HTTPie/0.9.9" "7748274e-b093-4283-8466-94ba376c1f2a" "httpbin:8000" "127.0.0.1:8000" inbound|8000||httpbin.default.svc.cluster.local - 10.244.2.149:8000 10.244.2.144:57388
[2019-02-25T02:59:07.788Z] "GET /status/500 HTTP/1.1" 500 - 0 0 2 1 "-" "HTTPie/0.9.9" "7748274e-b093-4283-8466-94ba376c1f2a" "httpbin:8000" "127.0.0.1:8000" inbound|8000||httpbin.default.svc.cluster.local - 10.244.2.149:8000 10.244.2.144:57392

Each client request produces four access-log entries: the original request plus three retries.

Now combine a timeout with retries:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
spec:
  hosts:
  - httpbin.default.svc.cluster.local
  http:
  - route:
    - destination:
        host: httpbin.default.svc.cluster.local
    retries:
      attempts: 3
      perTryTimeout: 1s
    timeout: 8s

Verify:

bash-4.4# time http http://httpbin:8000/delay/7
HTTP/1.1 504 Gateway Timeout
content-length: 24
content-type: text/plain
date: Mon, 25 Feb 2019 03:39:45 GMT
server: envoy

upstream request timeout


real    0m4.409s
user    0m0.310s
sys 0m0.048s

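The whole request took a little over 4 seconds: four attempts were made, and each was cut off by the 1-second per-try timeout.

Now change the per-try timeout: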

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
spec:
  hosts:
  - httpbin.default.svc.cluster.local
  http:
  - route:
    - destination:
        host: httpbin.default.svc.cluster.local
    retries:
      attempts: 3
      perTryTimeout: 8s
    timeout: 3s

Verify:

bash-4.4# time http http://httpbin:8000/delay/7
HTTP/1.1 504 Gateway Timeout
content-length: 24
content-type: text/plain
date: Mon, 25 Feb 2019 03:54:30 GMT
server: envoy

upstream request timeout


real    0m3.357s
user    0m0.312s
sys 0m0.044s

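So the effective limit is the smaller of two values: the overall timeout, and the per-try timeout multiplied by the number of attempts (the original request plus the retries).

Ingress traffic management

A Gateway acts as a load balancer at the edge of the mesh. It accepts and handles inbound and outbound connections at the mesh edge, and its configuration includes the exposed ports and TLS settings.

When deploying Istio with Helm, the Ingress Gateway is enabled through the gateways values: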

gateways:
  enabled: true

  istio-ingressgateway:
    enabled: true

It is enabled by default.

By default, the gateways field of a VirtualService is set to:

  gateways:
  - mesh

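mesh is Istio's built-in virtual Gateway and stands for all sidecars in the mesh.

In other words, all traffic between services inside the mesh goes through this virtual gateway. Exposing a service outside the mesh requires a Gateway.

Using a Gateway

As with a Kubernetes Ingress, first define a Gateway: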

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata: 
  name: example-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
      - "*.whysdomain.xyz"
      - "*.whysdomain.rocks"

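selector is a label selector that picks which gateway Pods run this Gateway object.
hosts uses wildcard domains to list the hostnames this Gateway may serve. The ingress gateway Service can be inspected with kubectl get svc -n istio-system istio-ingressgateway.

$ kubectl apply -f example-gateway.yml
gateway.networking.istio.io/example-gateway created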
$ kubectl get svc -n istio-system istio-ingressgateway
NAME                   TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                                                                                                                   AGE
istio-ingressgateway   LoadBalancer   10.109.241.220   <pending>     80:31380/TCP,443:31390/TCP,31400:31400/TCP,15011:31179/TCP,8060:30246/TCP,853:32472/TCP,15030:32300/TCP,15031:31520/TCP   9d

Resolve flaskapp.whysdomain.xyz to the CLUSTER-IP of istio-ingressgateway:

$ vi /etc/hosts
...
10.109.241.220 flaskapp.whysdomain.xyz

Send an HTTP request from outside the mesh:

$ yum install -y python2-httpie
$ http flaskapp.whysdomain.xyz/env/version
HTTP/1.1 404 Not Found
content-length: 0
date: Mon, 25 Feb 2019 06:51:26 GMT
server: envoy

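The 404 is returned because no service has been bound to this host yet. Update the flaskapp VirtualService:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService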
metadata:
  name: flaskapp
spec:
  hosts: 
  - flaskapp.default.svc.cluster.local
  - flaskapp.whysdomain.xyz
  gateways:
  - mesh
  - example-gateway
  http:
  - route:
    - destination:
        host: flaskapp.default.svc.cluster.local
        subset: v2

Send the request from outside again:

http flaskapp.whysdomain.xyz/env/version
HTTP/1.1 200 OK
content-length: 2
content-type: text/html; charset=utf-8
date: Mon, 25 Feb 2019 07:10:22 GMT
server: envoy
x-envoy-upstream-service-time: 2

v2

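The service is now exposed through the Gateway.

Adding certificate support to the Gateway

Create a Secret from the certificate files:

kubectl create -n istio-system secret tls istio-ingressgateway-certs --key rocks/key.pem --cert rocks/cert.pem

Configure the Gateway to match: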

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: example-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
      - "*.whysdomain.xyz"
      - "*.whysdomain.rocks"
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: SIMPLE
        serverCertificate: /etc/istio/ingressgateway-certs/tls.crt
        privateKey: /etc/istio/ingressgateway-certs/tls.key
      hosts:
      - "*.whysdomain.xyz"
      - "*.whysdomain.rocks"

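HTTPS requests to flaskapp.whysdomain.xyz now work.

Supporting multiple certificates on a Gateway

A tls Secret can hold only one certificate and key pair, so a Gateway backed by it cannot serve HTTPS for two domains. A generic Secret can be used instead: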

$ kubectl create secret generic istio-ingressgateway-certs -n istio-system \
  --from-file=rocks-cert.pem \
  --from-file=rocks-key.pem \
  --from-file=xyz-cert.pem \
  --from-file=xyz-key.pem
secret/istio-ingressgateway-certs created

Then configure the Gateway accordingly.
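
A rough sketch of what such a Gateway might look like, with one HTTPS server entry per domain. It assumes the Secret is mounted at /etc/istio/ingressgateway-certs as in the single-certificate example, and that the files inside it are named rocks-cert.pem, rocks-key.pem, xyz-cert.pem and xyz-key.pem; the xyz file names and the port names are assumptions made for illustration.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: example-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 443
      name: https-rocks        # port names must be unique within the Gateway
      protocol: HTTPS
    tls:
      mode: SIMPLE
      serverCertificate: /etc/istio/ingressgateway-certs/rocks-cert.pem
      privateKey: /etc/istio/ingressgateway-certs/rocks-key.pem
    hosts:
    - "*.whysdomain.rocks"
  - port:
      number: 443
      name: https-xyz
      protocol: HTTPS
    tls:
      mode: SIMPLE
      # assumed file names; use whatever keys the Secret actually contains
      serverCertificate: /etc/istio/ingressgateway-certs/xyz-cert.pem
      privateKey: /etc/istio/ingressgateway-certs/xyz-key.pem
    hosts:
    - "*.whysdomain.xyz"
```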

Routing ingress traffic

The route-matching features of VirtualService apply to ingress traffic as well.
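
For instance, a match entry can name a gateway, so that traffic entering through the ingress gateway is routed differently from traffic inside the mesh. The following is a sketch only; the choice of subsets and the /env/ prefix are illustrative assumptions.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: flaskapp
spec:
  hosts:
  - flaskapp.default.svc.cluster.local
  - flaskapp.whysdomain.xyz
  gateways:
  - mesh
  - example-gateway
  http:
  - match:
    - gateways:                # rule applies only to traffic from this gateway
      - example-gateway
      uri:
        prefix: /env/
    route:
    - destination:
        host: flaskapp.default.svc.cluster.local
        subset: v2
  - route:                     # in-mesh callers and all other paths
    - destination:
        host: flaskapp.default.svc.cluster.local
        subset: v1
```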

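Egress traffic management

Once a sidecar is injected, Istio intercepts all of the application's traffic, and by default applications in the mesh cannot reach services outside it: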

bash-4.4# http http://api.jd.com
HTTP/1.1 404 Not Found
content-length: 0
date: Mon, 25 Feb 2019 08:16:09 GMT
server: envoy

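Calls from services inside the mesh to the outside network are a common need, however, and Istio offers several ways to allow them:

Limit the sidecar's interception range: use IP ranges to tell the sidecar which external resources may be reached directly.
Register a ServiceEntry: bring the external service into the mesh as a ServiceEntry.

Limiting the sidecar's interception range

Traffic interception is set up by the istio-init container. Two settings affect its range:

The proxy.includeIPRanges variable in values.yaml
The Pod annotation traffic.sidecar.istio.io/includeOutboundIPRanges

The first works much like the earlier changes to Helm input values, but the injected Pods must be recreated afterwards.

The second uses an annotation: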

apiVersion: v1
kind: Service
metadata:
  name: sleep
  labels:
    app: sleep
spec:
  selector:
    app: sleep
  ports:
    - name: ssh
      port: 80
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: sleep-v1
spec:
  replicas: 1
  template:
    metadata:
      annotations:
        traffic.sidecar.istio.io/includeOutboundIPRanges: 10.96.0.0/12
      labels:
        app: sleep
        version: v1
    spec:
      containers:
      - name: sleep
        image: dustise/sleep
        imagePullPolicy: IfNotPresent
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: sleep-v2
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: sleep
        version: v2
    spec:
      containers:
      - name: sleep
        image: dustise/sleep
        imagePullPolicy: IfNotPresent

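This sets the IP range to intercept: the init container redirects only traffic to IPs within this range. The subnet configured here is the cluster's service-cluster-ip-range.

It can be found with kubectl -n kube-system get pod kube-apiserver-master -o json.

Now make the request from the v1 Pod: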

$ export SOURCE_POD=$(kubectl get pod -l app=sleep,version=v1 -o jsonpath={.items..metadata.name})
$ kubectl exec -it -c sleep $SOURCE_POD bash
bash-4.4# http http://api.jd.com
HTTP/1.1 200 OK
Cache-Control: max-age=0
Connection: close
Content-Encoding: gzip
Content-Type: text/html
Date: Mon, 25 Feb 2019 08:40:33 GMT
ETag: W/"131-1547781676000"
Expires: Mon, 25 Feb 2019 08:40:33 GMT
Last-Modified: Fri, 18 Jan 2019 03:21:16 GMT
Server: JDWS/1.0.0
Transfer-Encoding: chunked
Vary: Accept-Encoding

<html>
<script type="text/javascript">
    window.location="http://jos.jd.com";
</script>
<body>
<h2></h2>
</body>
</html>

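This works because requests to external addresses leave the application Pod directly, bypassing the sidecar, so Istio does not monitor them at all. If the subnet configured here is not the cluster's service IP range, traffic to in-mesh services will not go through the sidecar either.

Setting up a ServiceEntry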

apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: api-jd
spec:
  hosts:
  - api.jd.com
  ports:
  - number: 80
    name: http
    protocol: HTTP
  resolution: DNS

Remember to restore the original annotations on sleep before verifying:

bash-4.4# http http://api.jd.com
HTTP/1.1 200 OK
cache-control: max-age=0
content-encoding: gzip
content-type: text/html
date: Mon, 25 Feb 2019 10:06:28 GMT
etag: W/"131-1547781676000"
expires: Mon, 25 Feb 2019 10:06:28 GMT
last-modified: Fri, 18 Jan 2019 03:21:16 GMT
server: envoy
transfer-encoding: chunked
vary: Accept-Encoding
x-envoy-upstream-service-time: 80

<html>
<script type="text/javascript">
    window.location="http://jos.jd.com";
</script>
<body>
<h2></h2>
</body>
</html>

Access works this way too. The advantage of this approach is that the traffic can now be controlled with a VirtualService.

First create a ServiceEntry for httpbin.org:

apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: httpbin-ext
spec:
  hosts:
  - httpbin.org
  ports:
  - number: 80
    name: http
    protocol: HTTP
  resolution: DNS

The external httpbin.org is now reachable:

bash-4.4# http http://httpbin.org/delay/10
HTTP/1.1 200 OK
access-control-allow-credentials: true
access-control-allow-origin: *
content-encoding: gzip
content-length: 479
content-type: application/json
date: Mon, 25 Feb 2019 10:12:23 GMT
server: envoy
x-envoy-upstream-service-time: 10449

...

Then add a VirtualService:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
spec:
  hosts:
  - httpbin.org
  http:
  - route:
    - destination:
        host: httpbin.org
    timeout: 3s

Verify the timeout:

bash-4.4# time http http://httpbin.org/delay/10
HTTP/1.1 504 Gateway Timeout
content-length: 24
content-type: text/plain
date: Mon, 25 Feb 2019 10:15:30 GMT
server: envoy

upstream request timeout


real    0m3.460s
user    0m0.398s
sys 0m0.058s

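The timeout policy has taken effect.

Creating a new gateway controller

In practice, several edge gateways may be needed for different jobs, for example when only certain nodes have outbound connectivity, or when an external load balancer can distribute load to only some of the servers. The Gateway controller deployment in the mesh then needs to be customized.

Gateway controllers for different purposes can run on different nodes or use different amounts of resources.

The Istio Helm chart supports creating additional gateways: customize the input values, then use Helm to generate the new controller.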

~/istio-1.0.4/install/kubernetes/helm/istio/values.yaml

gateways:
  enabled: true

  istio-myingress:
    enabled: true
    labels:
      app: istio-ingressgateway
      istio: myingress
    replicaCount: 3
    autoscaleMax: 5
    resources: {}
    cpu:
      targetAverageUtilization: 80
    loadBalancerIP: ""
    serviceAnnotations: {}
    type: LoadBalancer 
    ports:
    - port: 80
      targetPort: 80
      name: http-myingress

  istio-ingressgateway:
    enabled: true
    labels:

This adds a custom istio-myingress gateway. The type can be NodePort, ClusterIP, or LoadBalancer.

Deploy:

$ helm template install/kubernetes/helm/istio --name istio --namespace istio-system -f install/kubernetes/helm/istio/values.yaml | kubectl apply -f -
$ kubectl -n istio-system get svc istio-myingress
NAME              TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
istio-myingress   LoadBalancer   10.107.204.213   <pending>     80:31117/TCP   76s

Create a new Gateway object for the httpbin service:

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata: 
  name: my-gateway
spec:
  selector:
    istio: myingress
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
      - "*"

This definition opens port 80 with a wildcard hostname, and the selector picks the gateway controller defined above.

Create a VirtualService for httpbin:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
spec:
  hosts:
    - "*"
  gateways:
    - my-gateway
  http:
    - route:
       - destination:
           host: httpbin.default.svc.cluster.local

Create the matching hosts entry:

10.107.204.213 httpbin.whysdomain.xyz

Then verify:

time http httpbin.whysdomain.xyz/delay/2
HTTP/1.1 200 OK
access-control-allow-credentials: true
access-control-allow-origin: *
content-length: 526
content-type: application/json
date: Tue, 26 Feb 2019 04:39:42 GMT
server: envoy
x-envoy-upstream-service-time: 2005

{
    "args": {}, 
    "data": "", 
    "files": {}, 
    "form": {}, 
    "headers": {
        "Accept": "*/*", 
        "Accept-Encoding": "gzip, deflate", 
        "Content-Length": "0", 
        "Host": "httpbin.whysdomain.xyz", 
        "User-Agent": "HTTPie/0.9.4", 
        "X-B3-Sampled": "0", 
        "X-B3-Spanid": "d90dc1be9a929e42", 
        "X-B3-Traceid": "d90dc1be9a929e42", 
        "X-Envoy-Internal": "true", 
        "X-Request-Id": "fd22206a-8565-4d26-a828-89951ab12db2"
    }, 
    "origin": "10.244.0.0", 
    "url": "http://httpbin.whysdomain.xyz/delay/2"
}


real    0m2.290s
user    0m0.226s
sys 0m0.053s

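Circuit breaking

Circuit breaking is a protective measure. When a service instance cannot serve requests properly, it is removed from the load-balancing pool and given no more work, so that requests do not pile up on a failing instance. Once the instance has had time to recover, the failed Pod is added back to the pool.

Istio's circuit breaking is non-intrusive as well; it is configured with a DestinationRule: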

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: httpbin
spec:
  host: httpbin
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 1
      http:
        http1MaxPendingRequests: 1
        maxRequestsPerConnection: 1
    outlierDetection:
      consecutiveErrors: 1
      interval: 1s
      baseEjectionTime: 3m
      maxEjectionPercent: 100

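These settings are deliberately extreme:

The TCP and HTTP connection pools are both limited to 1.
Only one error is tolerated.
The error count is evaluated once per second.
Up to 100% of the Pods can be removed from the load-balancing pool.
A failed Pod stays out of the pool for at least 3 minutes before it can rejoin.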
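
For comparison, a less aggressive variant might look like the sketch below. Every number here is an illustrative assumption rather than a recommendation; suitable values depend on the service's real capacity and error profile. It uses the same name as the rule above, so applying it would replace that rule.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: httpbin
spec:
  host: httpbin
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100            # illustrative value
      http:
        http1MaxPendingRequests: 10    # illustrative value
        maxRequestsPerConnection: 10   # illustrative value
    outlierDetection:
      consecutiveErrors: 5             # tolerate a few errors before ejecting
      interval: 10s                    # evaluate hosts every 10 seconds
      baseEjectionTime: 30s            # minimum time an ejected host stays out
      maxEjectionPercent: 50           # never eject more than half of the pool
```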

First test the normal case, before the circuit breaker is applied:

bash-4.4# wrk -c 3 -t 3 http://httpbin:8000/ip
Running 10s test @ http://httpbin:8000/ip
  3 threads and 3 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.79ms    1.29ms  20.41ms   93.56%
    Req/Sec   373.74     59.60   454.00     81.67%
  11172 requests in 10.01s, 2.74MB read
Requests/sec:   1116.00
Transfer/sec:    280.09KB

Then apply the circuit breaker and test again:

bash-4.4# wrk -c 3 -t 3 http://httpbin:8000/ip
Running 10s test @ http://httpbin:8000/ip
  3 threads and 3 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.78ms    1.36ms  21.81ms   77.47%
    Req/Sec   367.15     40.42   525.00     78.67%
  10982 requests in 10.02s, 2.63MB read
  Non-2xx or 3xx responses: 2003
Requests/sec:   1096.51
Transfer/sec:    268.98KB

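Roughly the same 10,000-plus requests were sent, but this time 2003 of them received responses outside the 2xx/3xx range.

Fault injection testing

Testing microservices requires simulating network failure scenarios. Istio provides two kinds of fault injection: delays and aborts.

Both are configured through a VirtualService.

Injecting a delay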

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
spec:
  hosts:
  - httpbin.default.svc.cluster.local
  http:
  - route:
    - destination:
        host: httpbin.default.svc.cluster.local
    fault:
      delay:
        fixedDelay: 3s
        percent: 100

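This adds a 3-second delay to the service.

percent is the percentage of requests that receive the injected delay; the author notes that it also defaults to 100.
fixedDelay is the length of the delay; it must be at least 1 millisecond.

Verify: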

bash-4.4# time http http://httpbin:8000/delay/1
HTTP/1.1 200 OK
access-control-allow-credentials: true
access-control-allow-origin: *
content-length: 472
content-type: application/json
date: Tue, 26 Feb 2019 02:36:09 GMT
server: envoy
x-envoy-upstream-service-time: 1006

{
    "args": {},
    "data": "",
    "files": {},
    "form": {},
    "headers": {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate",
        "Content-Length": "0",
        "Host": "httpbin:8000",
        "User-Agent": "HTTPie/0.9.9",
        "X-B3-Sampled": "0",
        "X-B3-Spanid": "646f8d770c83a818",
        "X-B3-Traceid": "646f8d770c83a818",
        "X-Request-Id": "2ad7a43e-f7a4-4d24-8216-e1540c587da4"
    },
    "origin": "127.0.0.1",
    "url": "http://httpbin:8000/delay/1"
}


real    0m4.430s
user    0m0.387s
sys 0m0.034s

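The request asked for a 1-second delay but actually took about 4 seconds, which shows that the injected delay took effect.

What happens when the injected delay is combined with a timeout on the service?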

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
spec:
  hosts:
  - httpbin.default.svc.cluster.local
  http:
  - route:
    - destination:
        host: httpbin.default.svc.cluster.local
    fault:
      delay:
        fixedDelay: 3s
        percent: 100
    timeout: 3s

Verify:

bash-4.4# time http --body http://httpbin:8000/delay/1
{
    "args": {},
    "data": "",
    "files": {},
    "form": {},
    "headers": {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate",
        "Content-Length": "0",
        "Host": "httpbin:8000",
        "User-Agent": "HTTPie/0.9.9",
        "X-B3-Sampled": "0",
        "X-B3-Spanid": "0bdcf3e55d31a3d7",
        "X-B3-Traceid": "0bdcf3e55d31a3d7",
        "X-Request-Id": "533daead-dbe4-41f6-a788-d6cdf0a3b6c0"
    },
    "origin": "127.0.0.1",
    "url": "http://httpbin:8000/delay/1"
}


real    0m4.416s
user    0m0.370s
sys 0m0.036s
bash-4.4# time http --body http://httpbin:8000/delay/3
upstream request timeout


real    0m6.393s
user    0m0.350s
sys 0m0.040s

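With the 3-second timeout in place, the 1-second request still succeeds: the injected 3-second delay is not counted toward the timeout. A request for a 3-second delay, however, times out.

Injecting an abort

Injecting aborts into service calls makes it possible to test what happens when communication with a service is cut off.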

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
spec:
  hosts:
    - httpbin.default.svc.cluster.local
  http:
    - match:
      - sourceLabels:
          version: v1
      route:
        - destination:
            host: httpbin.default.svc.cluster.local
      fault:
        abort:
          httpStatus: 500
          percent: 100
    - route:
       - destination:
           host: httpbin.default.svc.cluster.local

Verify:

$ export SOURCE_POD=$(kubectl get pod -l app=sleep,version=v1 -o jsonpath={.items..metadata.name})
$ kubectl exec -it -c sleep $SOURCE_POD bash 
bash-4.4# http http://httpbin:8000/ip
HTTP/1.1 500 Internal Server Error
content-length: 18
content-type: text/plain
date: Tue, 26 Feb 2019 03:35:05 GMT
server: envoy

fault filter abort

bash-4.4# exit
$ export SOURCE_POD=$(kubectl get pod -l app=sleep,version=v2 -o jsonpath={.items..metadata.name})
$ kubectl exec -it -c sleep $SOURCE_POD bash
bash-4.4# http http://httpbin:8000/ip 
HTTP/1.1 200 OK
access-control-allow-credentials: true
access-control-allow-origin: *
content-length: 28
content-type: application/json
date: Tue, 26 Feb 2019 03:35:22 GMT
server: envoy
x-envoy-upstream-service-time: 4

{
    "origin": "127.0.0.1"
}

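Requests from v1 get an injected HTTP 500 error, while requests from v2 are unaffected.

Traffic mirroring

Traffic mirroring is another feature used for testing. Traffic sent to one version of a service is copied and sent to another version. The mirrored requests do not wait for responses, so the impact on the performance of the live application is small.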

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: flaskapp
spec:
  hosts:
  - flaskapp.default.svc.cluster.local
  http:
  - route:
    - destination:
        host: flaskapp.default.svc.cluster.local
        subset: v1
    mirror:
      host: flaskapp.default.svc.cluster.local
      subset: v2

Verify:

bash-4.4# http http://flaskapp/env/version
HTTP/1.1 200 OK
content-length: 2
content-type: text/html; charset=utf-8
date: Tue, 26 Feb 2019 03:41:59 GMT
server: envoy
x-envoy-upstream-service-time: 1

v1

Check the logs of flaskapp-v1 and flaskapp-v2:

$ kubectl logs -f flaskapp-v1-76b45bc665-t48d6  -c flaskapp | tail 
...
10.244.2.144 - - [26/Feb/2019:03:40:37 +0000] "GET /env/version HTTP/1.1" 200 2 "-" "HTTPie/0.9.9" "-"
$ kubectl logs -f flaskapp-v2-787ddcbf8c-lgfgt  -c flaskapp | tail 
...
10.244.2.144 - - [26/Feb/2019:03:40:37 +0000] "GET /env/version HTTP/1.1" 200 2 "-" "HTTPie/0.9.9" "10.244.2.144"

The logs show that the request sent to v1 was also copied to v2.
