First Experience with Kubernetes on CentOS 7

Background

At the time of writing, the latest stable community release of Kubernetes is v1.15.2.

Components

  • docker: the container runtime
  • kubelet: the agent that manages pods and the node lifecycle; one instance runs on every node
  • kubectl: the command-line client, used on the master
  • kubeadm: the tool used to bootstrap the Kubernetes cluster

Installation

Install kubelet

Configure the yum repository

Create the file /etc/yum.repos.d/kubernetes.repo with the following content:

[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg

Install the packages

yum install -y kubelet kubeadm kubectl
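
If you want to pin the exact version used in this post (assuming those package builds are available in the repository), you can install versioned packages instead:

yum install -y kubelet-1.15.2 kubeadm-1.15.2 kubectl-1.15.2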

Start kubelet

systemctl enable kubelet
systemctl start kubelet
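
At this point kubelet has nothing to run yet, so it will keep restarting until kubeadm init or kubeadm join hands it a configuration; that is expected. You can still confirm the unit is enabled and loaded:

systemctl status kubelet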

Install Docker

Configure the yum repository

wget https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo -O /etc/yum.repos.d/docker-ce.repo

Install the package

yum install -y docker-ce

Change the cgroup driver

Change the cgroup driver from cgroupfs to systemd.

Create the file /etc/docker/daemon.json with the following content:

{
  "exec-opts": ["native.cgroupdriver=systemd"]
}

Start Docker

Start Docker, and enable both Docker and kubelet to start on boot.

systemctl start docker
systemctl enable docker kubelet
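
Once Docker is running, you can confirm the cgroup driver change took effect:

docker info | grep -i "cgroup driver"
# Expected output: Cgroup Driver: systemd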

Bootstrap the cluster

List the required images

# kubeadm config images list
W0810 16:34:24.725390   26710 version.go:98] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable-1.txt": Get https://dl.k8s.io/release/stable-1.txt: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
W0810 16:34:24.725465   26710 version.go:99] falling back to the local client version: v1.15.2
k8s.gcr.io/kube-apiserver:v1.15.2
k8s.gcr.io/kube-controller-manager:v1.15.2
k8s.gcr.io/kube-scheduler:v1.15.2
k8s.gcr.io/kube-proxy:v1.15.2
k8s.gcr.io/pause:3.1
k8s.gcr.io/etcd:3.3.10
k8s.gcr.io/coredns:1.3.1

Download the images

Since there is no network connectivity to k8s.gcr.io, the images can only be downloaded indirectly from a mirror and re-tagged.

REGISTRY=gcr.azk8s.cn/google-containers

docker pull ${REGISTRY}/kube-apiserver:v1.15.2
docker pull ${REGISTRY}/kube-controller-manager:v1.15.2
docker pull ${REGISTRY}/kube-scheduler:v1.15.2
docker pull ${REGISTRY}/kube-proxy:v1.15.2
docker pull ${REGISTRY}/pause:3.1
docker pull ${REGISTRY}/etcd:3.3.10
docker pull ${REGISTRY}/coredns:1.3.1

docker tag ${REGISTRY}/kube-apiserver:v1.15.2 k8s.gcr.io/kube-apiserver:v1.15.2
docker tag ${REGISTRY}/kube-controller-manager:v1.15.2 k8s.gcr.io/kube-controller-manager:v1.15.2
docker tag ${REGISTRY}/kube-scheduler:v1.15.2 k8s.gcr.io/kube-scheduler:v1.15.2
docker tag ${REGISTRY}/kube-proxy:v1.15.2 k8s.gcr.io/kube-proxy:v1.15.2
docker tag ${REGISTRY}/pause:3.1 k8s.gcr.io/pause:3.1
docker tag ${REGISTRY}/etcd:3.3.10 k8s.gcr.io/etcd:3.3.10
docker tag ${REGISTRY}/coredns:1.3.1 k8s.gcr.io/coredns:1.3.1

# Remove the mirror-tagged images
docker rmi ${REGISTRY}/kube-apiserver:v1.15.2
docker rmi ${REGISTRY}/kube-controller-manager:v1.15.2
docker rmi ${REGISTRY}/kube-scheduler:v1.15.2
docker rmi ${REGISTRY}/kube-proxy:v1.15.2
docker rmi ${REGISTRY}/pause:3.1
docker rmi ${REGISTRY}/etcd:3.3.10
docker rmi ${REGISTRY}/coredns:1.3.1
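
The per-image commands above can also be written as one short loop; this is just a compact equivalent using the same mirror:

REGISTRY=gcr.azk8s.cn/google-containers

for image in kube-apiserver:v1.15.2 kube-controller-manager:v1.15.2 \
             kube-scheduler:v1.15.2 kube-proxy:v1.15.2 \
             pause:3.1 etcd:3.3.10 coredns:1.3.1; do
    docker pull ${REGISTRY}/${image}                      # pull from the reachable mirror
    docker tag ${REGISTRY}/${image} k8s.gcr.io/${image}   # re-tag to the name kubeadm expects
    docker rmi ${REGISTRY}/${image}                       # drop the mirror tag
done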

Initialize the cluster

View the default cluster configuration

# kubeadm config print init-defaults
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 1.2.3.4
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: 172-19-120-198
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.14.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
scheduler: {}

Initialize the cluster with the default configuration

If you use Flannel, you must pass --pod-network-cidr=10.244.0.0/16 to kubeadm init. On hosts with multiple network interfaces, also specify which address the API server should advertise with --apiserver-advertise-address=. If you already ran init without these flags, run kubeadm reset and initialize again.

kubeadm init --pod-network-cidr 10.244.0.0/16
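
On a host with more than one network interface, the call might look like this (the address below is simply the master IP used elsewhere in this post):

kubeadm init --pod-network-cidr 10.244.0.0/16 --apiserver-advertise-address 172.19.120.198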

Result

[init] Using Kubernetes version: v1.15.2
[preflight] Running pre-flight checks
	[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 19.03.1. Latest validated version: 18.09
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [172-19-120-198 localhost] and IPs [172.19.120.198 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [172-19-120-198 localhost] and IPs [172.19.120.198 127.0.0.1 ::1]
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [172-19-120-198 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 172.19.120.198]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 19.501895 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.15" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node 172-19-120-198 as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node 172-19-120-198 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: vc7hcw.v297nbb3j06ok3at
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 172.19.120.198:6443 --token ktiiwq.7xdu0zof4224ce6o \
    --discovery-token-ca-cert-hash sha256:390c780bf0879b8fc8e5a8b52f5f80e7bed1e49614d5ca376c0c84f9ee9caf93
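
Note that the bootstrap token in the join command above expires after 24 hours by default; if it has expired, a fresh join command can be printed on the master at any time:

# kubeadm token create --print-join-command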

Configure kubectl access

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config
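
Alternatively, if you are working as root you can simply point kubectl at the admin kubeconfig instead of copying it:

  export KUBECONFIG=/etc/kubernetes/admin.conf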

Deploy Flannel

# wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# kubectl apply -f kube-flannel.yml

Verify the master installed successfully

# kubectl get nodes
NAME             STATUS   ROLES    AGE     VERSION
172-19-120-198   Ready    master   3m43s   v1.15.2
# kubectl get pods --all-namespaces -o wide
NAMESPACE     NAME                                     READY   STATUS    RESTARTS   AGE     IP               NODE             NOMINATED NODE   READINESS GATES
kube-system   coredns-5c98db65d4-cz8x2                 1/1     Running   0          3m19s   10.244.0.2       172-19-120-198   <none>           <none>
kube-system   coredns-5c98db65d4-vzrrh                 1/1     Running   0          3m19s   10.244.0.3       172-19-120-198   <none>           <none>
kube-system   etcd-172-19-120-198                      1/1     Running   0          2m19s   172.19.120.198   172-19-120-198   <none>           <none>
kube-system   kube-apiserver-172-19-120-198            1/1     Running   0          2m24s   172.19.120.198   172-19-120-198   <none>           <none>
kube-system   kube-controller-manager-172-19-120-198   1/1     Running   0          2m33s   172.19.120.198   172-19-120-198   <none>           <none>
kube-system   kube-flannel-ds-amd64-llktb              1/1     Running   0          2m32s   172.19.120.198   172-19-120-198   <none>           <none>
kube-system   kube-proxy-zvdsm                         1/1     Running   0          3m19s   172.19.120.198   172-19-120-198   <none>           <none>
kube-system   kube-scheduler-172-19-120-198            1/1     Running   0          2m17s   172.19.120.198   172-19-120-198   <none>           <none>

Add nodes to the cluster

Load the br_netfilter kernel module first, otherwise pod networking will not work properly.

# modprobe br_netfilter
# kubeadm join 172.19.120.198:6443 --token ktiiwq.7xdu0zof4224ce6o  --discovery-token-ca-cert-hash sha256:390c780bf0879b8fc8e5a8b52f5f80e7bed1e49614d5ca376c0c84f9ee9caf93
[preflight] Running pre-flight checks
	[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 19.03.1. Latest validated version: 18.09
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.15" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Activating the kubelet service
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
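
Note that modprobe only loads the module for the current boot. A sketch of making br_netfilter and the related bridge sysctls persistent across reboots, assuming standard CentOS 7 paths (this is not part of the original steps):

cat <<EOF > /etc/modules-load.d/k8s.conf
br_netfilter
EOF

cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF

sysctl --system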

If something went wrong, you can reset:

kubeadm reset
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W0810 21:24:21.782720   24113 removeetcdmember.go:79] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] No etcd config found. Assuming external etcd
[reset] Please, manually reset etcd to prevent further issues
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes]

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually.
For example:
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.

Check component status

# kubectl get cs
NAME                 STATUS    MESSAGE             ERROR
controller-manager   Healthy   ok
scheduler            Healthy   ok
etcd-0               Healthy   {"health":"true"}

Check cluster status

# kubectl get nodes -o wide
NAME             STATUS   ROLES    AGE     VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION              CONTAINER-RUNTIME
172-19-120-198   Ready    master   5m57s   v1.15.2   172.19.120.198   <none>        CentOS Linux 7 (Core)   3.10.0-693.2.2.el7.x86_64   docker://19.3.1
172-19-120-201   Ready    <none>   80s     v1.15.2   172.19.120.201   <none>        CentOS Linux 7 (Core)   3.10.0-693.2.2.el7.x86_64   docker://19.3.1
172-19-120-202   Ready    <none>   42s     v1.15.2   172.19.120.202   <none>        CentOS Linux 7 (Core)   3.10.0-693.2.2.el7.x86_64   docker://19.3.1
172-19-120-203   Ready    <none>   41s     v1.15.2   172.19.120.203   <none>        CentOS Linux 7 (Core)   3.10.0-693.2.2.el7.x86_64   docker://19.3.1

Restart kubelet

# Reload all modified unit files
systemctl daemon-reload
# Restart kubelet
systemctl restart kubelet.service

View cluster info

# kubectl cluster-info
Kubernetes master is running at https://172.19.120.198:6443
KubeDNS is running at https://172.19.120.198:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

List all pods

# kubectl get pods --all-namespaces -o wide
NAMESPACE     NAME                                     READY   STATUS    RESTARTS   AGE     IP               NODE             NOMINATED NODE   READINESS GATES
kube-system   coredns-5c98db65d4-cz8x2                 1/1     Running   0          6m6s    10.244.0.2       172-19-120-198   <none>           <none>
kube-system   coredns-5c98db65d4-vzrrh                 1/1     Running   0          6m6s    10.244.0.3       172-19-120-198   <none>           <none>
kube-system   etcd-172-19-120-198                      1/1     Running   0          5m6s    172.19.120.198   172-19-120-198   <none>           <none>
kube-system   kube-apiserver-172-19-120-198            1/1     Running   0          5m11s   172.19.120.198   172-19-120-198   <none>           <none>
kube-system   kube-controller-manager-172-19-120-198   1/1     Running   0          5m20s   172.19.120.198   172-19-120-198   <none>           <none>
kube-system   kube-flannel-ds-amd64-8cgkr              1/1     Running   0          72s     172.19.120.202   172-19-120-202   <none>           <none>
kube-system   kube-flannel-ds-amd64-drtwj              1/1     Running   0          110s    172.19.120.201   172-19-120-201   <none>           <none>
kube-system   kube-flannel-ds-amd64-llktb              1/1     Running   0          5m19s   172.19.120.198   172-19-120-198   <none>           <none>
kube-system   kube-flannel-ds-amd64-mv6pt              1/1     Running   0          71s     172.19.120.203   172-19-120-203   <none>           <none>
kube-system   kube-proxy-fl5n9                         1/1     Running   0          71s     172.19.120.203   172-19-120-203   <none>           <none>
kube-system   kube-proxy-kbz6k                         1/1     Running   0          72s     172.19.120.202   172-19-120-202   <none>           <none>
kube-system   kube-proxy-pvx57                         1/1     Running   0          110s    172.19.120.201   172-19-120-201   <none>           <none>
kube-system   kube-proxy-zvdsm                         1/1     Running   0          6m6s    172.19.120.198   172-19-120-198   <none>           <none>
kube-system   kube-scheduler-172-19-120-198            1/1     Running   0          5m4s    172.19.120.198   172-19-120-198   <none>           <none>

Install Helm

Download Helm

# wget "https://get.helm.sh/helm-v2.14.3-linux-amd64.tar.gz"

# tar xvf helm-v2.14.3-linux-amd64.tar.gz
linux-amd64/
linux-amd64/helm
linux-amd64/README.md
linux-amd64/LICENSE
linux-amd64/tiller

# cd linux-amd64/
# mv helm /usr/local/bin/
# mv tiller /usr/local/bin/

Install Tiller (pulling the default image directly is not feasible from mainland China due to network restrictions)

Create the file rbac-config.yaml with the following content:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system

Apply the configuration

# kubectl create -f rbac-config.yaml
serviceaccount/tiller created
clusterrolebinding.rbac.authorization.k8s.io/tiller created
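
The helm init call that actually deploys Tiller is not shown in the original notes; presumably something like the following was run, pointing Tiller at the service account just created:

# helm init --service-account tiller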

Check the status

# kubectl get pods -n kube-system
NAME                                     READY   STATUS         RESTARTS   AGE
tiller-deploy-8557598fbc-mszs7           0/1     ErrImagePull   0          65s

The pod did not come up; describe it to find the reason:

# kubectl describe pod tiller-deploy-8557598fbc-mszs7 -n kube-system
Failed to pull image "gcr.io/kubernetes-helm/tiller:v2.14.3"

Pull the image manually

# docker pull gcr.azk8s.cn/kubernetes-helm/tiller:v2.14.3

# Re-tag the image
# docker tag gcr.azk8s.cn/kubernetes-helm/tiller:v2.14.3 gcr.io/kubernetes-helm/tiller:v2.14.3

Redeploy

# Delete the failed pod; the Deployment will recreate it with the locally available image
# kubectl delete pods tiller-deploy-8557598fbc-mszs7 -n kube-system

Confirm the deployment is healthy

# kubectl get pods -n kube-system
NAME                                     READY   STATUS    RESTARTS   AGE
tiller-deploy-8557598fbc-hmskh           1/1     Running   0          18m

Re-initialize Tiller

# helm reset -f
# helm init --service-account tiller --tiller-image gcr.io/kubernetes-helm/tiller:v2.14.3 --skip-refresh
$HELM_HOME has been configured at /root/.helm.

Tiller (the Helm server-side component) has been installed into your Kubernetes Cluster.

Please note: by default, Tiller is deployed with an insecure 'allow unauthenticated users' policy.
To prevent this, run `helm init` with the --tiller-tls-verify flag.
For more information on securing your installation see: https://docs.helm.sh/using_helm/#securing-your-helm-installation
# kubectl get pod -n kube-system -l app=helm
NAME                             READY   STATUS    RESTARTS   AGE
tiller-deploy-8557598fbc-z7cg4   1/1     Running   0          4m53s

Change the Helm chart repository to the mirror provided by Azure China:

# helm repo add stable http://mirror.azure.cn/kubernetes/charts
"stable" has been added to your repositories

# helm repo list
NAME  	URL
stable	http://mirror.azure.cn/kubernetes/charts
local 	http://127.0.0.1:8879/charts

Deploy Nginx Ingress with Helm

Use Helm to deploy Nginx Ingress onto the edge node.

Use 172-19-120-201 as the edge node.

# kubectl label node 172-19-120-201 node-role.kubernetes.io/edge=
node/172-19-120-201 labeled

# kubectl get node
NAME             STATUS   ROLES    AGE   VERSION
172-19-120-198   Ready    master   62m   v1.15.2
172-19-120-201   Ready    edge     59m   v1.15.2
172-19-120-202   Ready    <none>   59m   v1.15.2
172-19-120-203   Ready    <none>   59m   v1.15.2

Create the file ingress-nginx.yaml with the following content:

controller:
  replicaCount: 1
  hostNetwork: true
  nodeSelector:
    node-role.kubernetes.io/edge: ''
  affinity:
    podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - nginx-ingress
            - key: component
              operator: In
              values:
              - controller
          topologyKey: kubernetes.io/hostname
  tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: PreferNoSchedule
defaultBackend:
  nodeSelector:
    node-role.kubernetes.io/edge: ''
  tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: PreferNoSchedule

Where:

  • Replica count of 1: replicaCount: 1
  • Schedule onto the edge node: nodeSelector with the edge label
  • Use the host network: hostNetwork: true

Install the chart

# helm repo update
Hang tight while we grab the latest from your chart repositories...
...Skip local chart repository
...Successfully got an update from the "stable" chart repository
Update Complete.

# helm install stable/nginx-ingress -n nginx-ingress --namespace ingress-nginx -f ingress-nginx.yaml
NAME:   nginx-ingress
LAST DEPLOYED: Wed Aug 21 00:08:39 2019
NAMESPACE: ingress-nginx
STATUS: DEPLOYED

Check the resources

# kubectl get pod -n ingress-nginx -o wide
NAME                                             READY   STATUS              RESTARTS   AGE   IP               NODE             NOMINATED NODE   READINESS GATES
nginx-ingress-controller-598c7fd878-smwwr        0/1     Running             0          36s   172.19.120.201   172-19-120-201   <none>           <none>
nginx-ingress-default-backend-7b8b45bd49-st4vh   0/1     ContainerCreating   0          36s   <none>           172-19-120-201   <none>           <none>

Pull the missing image

The default-backend pod is stuck waiting for the image k8s.gcr.io/defaultbackend-amd64:1.5, which cannot be pulled directly. Download it from the mirror and re-tag it (on the node where the pod is scheduled, here 172-19-120-201):

# docker pull gcr.azk8s.cn/google-containers/defaultbackend-amd64:1.5

# Re-tag the image
# docker tag gcr.azk8s.cn/google-containers/defaultbackend-amd64:1.5 k8s.gcr.io/defaultbackend-amd64:1.5

Confirm the installation works

# curl "172.19.120.201"
default backend - 404

Deploy the Dashboard with Helm

Uninstall

If it was installed before, uninstall it first.

helm delete kubernetes-dashboard
helm del --purge kubernetes-dashboard

Install

# helm install stable/kubernetes-dashboard --name kubernetes-dashboard --namespace kube-system
NAME:   kubernetes-dashboard
LAST DEPLOYED: Wed Aug 21 11:06:39 2019
NAMESPACE: kube-system
STATUS: DEPLOYED

Check the status; the image pull is failing:

# kubectl describe pod kubernetes-dashboard-77f54dc48f-5fdt4 -n kube-system

 Pulling image "k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.1"

If the image cannot be pulled directly, download it from the mirror:

# docker pull gcr.azk8s.cn/google-containers/kubernetes-dashboard-amd64:v1.10.1

# Re-tag the image
# docker tag gcr.azk8s.cn/google-containers/kubernetes-dashboard-amd64:v1.10.1 k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.1

Expose the service

By default the service is exposed via a ClusterIP.

Change it to NodePort so it can be reached from outside the cluster.

Edit the service definition:

# kubectl edit service kubernetes-dashboard --namespace kube-system

Change the type from ClusterIP to NodePort.
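
The same change can also be made non-interactively; a one-line sketch using kubectl patch:

# kubectl -n kube-system patch service kubernetes-dashboard -p '{"spec":{"type":"NodePort"}}'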

# kubectl get services --namespace kube-system
NAME                   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
kubernetes-dashboard   NodePort    10.109.214.222   <none>        443:30404/TCP            59m

To access the service, open https://172.19.120.198:30404 in a browser.

Access token

Get the token

# kubectl describe -n kube-system secret/kubernetes-dashboard-token-xljml | grep ^token
token:      eyJhbxxxxVIw

This token alone lets you log in, but the account has no permission to view anything once logged in.

Create an account with permissions

# kubectl create serviceaccount dashboard-admin -n kube-system
serviceaccount/dashboard-admin created

# kubectl create clusterrolebinding dashboard-cluster-admin --clusterrole=cluster-admin --serviceaccount=kube-system:dashboard-admin
clusterrolebinding.rbac.authorization.k8s.io/dashboard-cluster-admin created

# kubectl get secret -n kube-system | grep ^dashboard-admin-token
dashboard-admin-token-zz66k                      kubernetes.io/service-account-token   3      91s

# kubectl describe secret dashboard-admin-token-zz66k -n kube-system | grep ^token
token:      eyJhxxxRYw
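
If you prefer, the secret lookup and the token extraction can be combined into one pipeline; a sketch:

# kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep dashboard-admin-token | awk '{print $1}') | grep ^token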

Deploy metrics-server with Helm

Uninstall

If it was installed before, uninstall it first.

helm delete metrics-server
helm del --purge metrics-server

Install

Create the configuration file metrics-server.yaml as follows:

args:
- --logtostderr
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP
nodeSelector:
    node-role.kubernetes.io/edge: ''
tolerations:
    - key: node-role.kubernetes.io/master
      operator: Exists
      effect: NoSchedule
    - key: node-role.kubernetes.io/master
      operator: Exists
      effect: PreferNoSchedule

Install the chart

# helm install stable/metrics-server --name metrics-server --namespace kube-system -f metrics-server.yaml
NAME:   metrics-server
LAST DEPLOYED: Wed Aug 21 14:15:38 2019
NAMESPACE: kube-system
STATUS: DEPLOYED

Check the status; the image pull is failing:

# kubectl describe pod metrics-server-d97f5c6d9-zgt7x -n kube-system

Failed to pull image "gcr.io/google_containers/metrics-server-amd64:v0.3.2"

Pull the image manually

# docker pull gcr.azk8s.cn/google-containers/metrics-server-amd64:v0.3.2

# Re-tag the image
# docker tag gcr.azk8s.cn/google-containers/metrics-server-amd64:v0.3.2 gcr.io/google_containers/metrics-server-amd64:v0.3.2

View basic metrics

# kubectl top node
NAME             CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
172-19-120-198   193m         4%     5819Mi          36%
172-19-120-201   66m          1%     2168Mi          13%
172-19-120-202   37m          0%     1683Mi          10%
172-19-120-203   45m          1%     1694Mi          10%
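
metrics-server also serves pod-level metrics, for example:

# kubectl top pod -n kube-system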

Troubleshooting common problems

NotReady

# kubectl get nodes
NAME             STATUS     ROLES    AGE   VERSION
172-19-120-198   NotReady   master   96m   v1.15.2
172-19-120-201   NotReady   <none>   81m   v1.15.2
172-19-120-202   NotReady   <none>   67m   v1.15.2
172-19-120-203   NotReady   <none>   67m   v1.15.2

All the nodes are NotReady. What now?

Check pod status

# kubectl get pods -n kube-system -o wide
NAME                                     READY   STATUS    RESTARTS   AGE   IP               NODE             NOMINATED NODE   READINESS GATES
coredns-5c98db65d4-nqpq4                 0/1     Pending   0          96m   <none>           <none>           <none>           <none>
coredns-5c98db65d4-wf4th                 0/1     Pending   0          96m   <none>           <none>           <none>           <none>
etcd-172-19-120-198                      1/1     Running   0          95m   172.19.120.198   172-19-120-198   <none>           <none>
kube-apiserver-172-19-120-198            1/1     Running   0          95m   172.19.120.198   172-19-120-198   <none>           <none>
kube-controller-manager-172-19-120-198   1/1     Running   0          95m   172.19.120.198   172-19-120-198   <none>           <none>
kube-proxy-dwz8l                         1/1     Running   0          68m   172.19.120.202   172-19-120-202   <none>           <none>
kube-proxy-qxv9p                         1/1     Running   0          81m   172.19.120.201   172-19-120-201   <none>           <none>
kube-proxy-tbn9w                         1/1     Running   0          67m   172.19.120.203   172-19-120-203   <none>           <none>
kube-proxy-xxmqw                         1/1     Running   0          96m   172.19.120.198   172-19-120-198   <none>           <none>
kube-scheduler-172-19-120-198            1/1     Running   0          95m   172.19.120.198   172-19-120-198   <none>           <none>

Check the logs of the pods whose status is abnormal; here we look at coredns-5c98db65d4-nqpq4.

# kubectl -n kube-system logs coredns-5c98db65d4-nqpq4

The output is empty, i.e. there are no logs.

Check the kubelet logs on the master node; run on the master:

# journalctl -f -u kubelet
8月 10 22:14:00 172-19-120-198 kubelet[10749]: W0810 22:14:00.186949   10749 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
8月 10 22:14:00 172-19-120-198 kubelet[10749]: E0810 22:14:00.841316   10749 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

A quick search shows that the Flannel pods are missing.

Install Flannel on the master

# wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# kubectl apply -f kube-flannel.yml

Exposing a ClusterIP service

Start the proxy

kubectl proxy --address='0.0.0.0'  --accept-hosts='^*$' --port=31100

Access it at:

http://localhost:31100/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/#/login

References