First Experience with Kubernetes on CentOS 7
Background
At the time of writing, the latest stable community release of Kubernetes is v1.15.2.
Components
- docker: the container runtime
- kubelet: the agent that manages pod and node lifecycle; one runs on every node
- kubectl: the command-line control tool, used on the master
- kubeadm: the tool used to bootstrap the Kubernetes cluster
Installation
Installing kubelet
Configure the package source
Create the file /etc/yum.repos.d/kubernetes.repo with the following content:
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
Install the packages
yum install -y kubelet kubeadm kubectl
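To confirm the installed versions match expectations (v1.15.2 at the time of writing), a quick sketch of the checks:
kubelet --version
kubeadm version -o short
kubectl version --client --short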
Start kubelet
systemctl enable kubelet
systemctl start kubelet
Installing Docker
Configure the package source
wget https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo -O /etc/yum.repos.d/docker-ce.repo
Install the package
yum install -y docker-ce
Change the cgroup driver
Switch Docker's cgroup driver from cgroupfs to systemd, the driver recommended for kubeadm deployments.
Create the file /etc/docker/daemon.json with the following content:
{
"exec-opts": ["native.cgroupdriver=systemd"]
}
Start Docker
Start Docker, and enable both Docker and kubelet to start at boot:
systemctl start docker
systemctl enable docker kubelet
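Once Docker is running, it is worth confirming that the cgroup driver change took effect; the following should print "Cgroup Driver: systemd":
docker info | grep -i 'cgroup driver'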
Bootstrap the cluster
List the required images
kubeadm config images list
W0810 16:34:24.725390 26710 version.go:98] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable-1.txt": Get https://dl.k8s.io/release/stable-1.txt: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
W0810 16:34:24.725465 26710 version.go:99] falling back to the local client version: v1.15.2
k8s.gcr.io/kube-apiserver:v1.15.2
k8s.gcr.io/kube-controller-manager:v1.15.2
k8s.gcr.io/kube-scheduler:v1.15.2
k8s.gcr.io/kube-proxy:v1.15.2
k8s.gcr.io/pause:3.1
k8s.gcr.io/etcd:3.3.10
k8s.gcr.io/coredns:1.3.1
Pull the images
Since k8s.gcr.io is unreachable from here, the images have to be fetched indirectly: pull them from a mirror registry and re-tag them.
REGISTRY=gcr.azk8s.cn/google-containers
docker pull ${REGISTRY}/kube-apiserver:v1.15.2
docker pull ${REGISTRY}/kube-controller-manager:v1.15.2
docker pull ${REGISTRY}/kube-scheduler:v1.15.2
docker pull ${REGISTRY}/kube-proxy:v1.15.2
docker pull ${REGISTRY}/pause:3.1
docker pull ${REGISTRY}/etcd:3.3.10
docker pull ${REGISTRY}/coredns:1.3.1
docker tag ${REGISTRY}/kube-apiserver:v1.15.2 k8s.gcr.io/kube-apiserver:v1.15.2
docker tag ${REGISTRY}/kube-controller-manager:v1.15.2 k8s.gcr.io/kube-controller-manager:v1.15.2
docker tag ${REGISTRY}/kube-scheduler:v1.15.2 k8s.gcr.io/kube-scheduler:v1.15.2
docker tag ${REGISTRY}/kube-proxy:v1.15.2 k8s.gcr.io/kube-proxy:v1.15.2
docker tag ${REGISTRY}/pause:3.1 k8s.gcr.io/pause:3.1
docker tag ${REGISTRY}/etcd:3.3.10 k8s.gcr.io/etcd:3.3.10
docker tag ${REGISTRY}/coredns:1.3.1 k8s.gcr.io/coredns:1.3.1
# remove the mirror-tagged aliases
docker rmi ${REGISTRY}/kube-apiserver:v1.15.2
docker rmi ${REGISTRY}/kube-controller-manager:v1.15.2
docker rmi ${REGISTRY}/kube-scheduler:v1.15.2
docker rmi ${REGISTRY}/kube-proxy:v1.15.2
docker rmi ${REGISTRY}/pause:3.1
docker rmi ${REGISTRY}/etcd:3.3.10
docker rmi ${REGISTRY}/coredns:1.3.1
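The pull/tag/rmi boilerplate above can also be collapsed into a loop over the image list; a sketch assuming the same mirror and versions:
REGISTRY=gcr.azk8s.cn/google-containers
for IMAGE in kube-apiserver:v1.15.2 kube-controller-manager:v1.15.2 \
             kube-scheduler:v1.15.2 kube-proxy:v1.15.2 \
             pause:3.1 etcd:3.3.10 coredns:1.3.1; do
    docker pull ${REGISTRY}/${IMAGE}                     # fetch from the reachable mirror
    docker tag ${REGISTRY}/${IMAGE} k8s.gcr.io/${IMAGE}  # re-tag to the name kubeadm expects
    docker rmi ${REGISTRY}/${IMAGE}                      # drop the mirror-tagged alias
done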
Initialize the cluster
View the default cluster configuration
# kubeadm config print init-defaults
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 1.2.3.4
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: 172-19-120-198
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.14.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
scheduler: {}
Initialize the cluster with the default configuration
If you use Flannel, init must be run with the --pod-network-cidr=10.244.0.0/16 flag. On hosts with multiple network interfaces, also pin the API server address with --apiserver-advertise-address=. If init was already run without these flags, run kubeadm reset and initialize again.
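On a multi-homed host, a fully pinned invocation might look like this sketch (address taken from this cluster's master); the run below uses only the CIDR flag:
kubeadm init --pod-network-cidr 10.244.0.0/16 --apiserver-advertise-address 172.19.120.198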
kubeadm init --pod-network-cidr 10.244.0.0/16
Result
[init] Using Kubernetes version: v1.15.2
[preflight] Running pre-flight checks
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 19.03.1. Latest validated version: 18.09
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [172-19-120-198 localhost] and IPs [172.19.120.198 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [172-19-120-198 localhost] and IPs [172.19.120.198 127.0.0.1 ::1]
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [172-19-120-198 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 172.19.120.198]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 19.501895 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.15" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node 172-19-120-198 as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node 172-19-120-198 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: vc7hcw.v297nbb3j06ok3at
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 172.19.120.198:6443 --token ktiiwq.7xdu0zof4224ce6o \
--discovery-token-ca-cert-hash sha256:390c780bf0879b8fc8e5a8b52f5f80e7bed1e49614d5ca376c0c84f9ee9caf93
Configure kubectl access
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
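Alternatively, when operating as root, pointing KUBECONFIG at the admin config also works:
export KUBECONFIG=/etc/kubernetes/admin.conf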
Install Flannel
# wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# kubectl apply -f kube-flannel.yml
Verify that the master came up successfully
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
172-19-120-198 Ready master 3m43s v1.15.2
# kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system coredns-5c98db65d4-cz8x2 1/1 Running 0 3m19s 10.244.0.2 172-19-120-198 <none> <none>
kube-system coredns-5c98db65d4-vzrrh 1/1 Running 0 3m19s 10.244.0.3 172-19-120-198 <none> <none>
kube-system etcd-172-19-120-198 1/1 Running 0 2m19s 172.19.120.198 172-19-120-198 <none> <none>
kube-system kube-apiserver-172-19-120-198 1/1 Running 0 2m24s 172.19.120.198 172-19-120-198 <none> <none>
kube-system kube-controller-manager-172-19-120-198 1/1 Running 0 2m33s 172.19.120.198 172-19-120-198 <none> <none>
kube-system kube-flannel-ds-amd64-llktb 1/1 Running 0 2m32s 172.19.120.198 172-19-120-198 <none> <none>
kube-system kube-proxy-zvdsm 1/1 Running 0 3m19s 172.19.120.198 172-19-120-198 <none> <none>
kube-system kube-scheduler-172-19-120-198 1/1 Running 0 2m17s 172.19.120.198 172-19-120-198 <none> <none>
Adding nodes to the cluster
Load the br_netfilter kernel module first, otherwise pod networking will have problems:
# modprobe br_netfilter
# kubeadm join 172.19.120.198:6443 --token ktiiwq.7xdu0zof4224ce6o --discovery-token-ca-cert-hash sha256:390c780bf0879b8fc8e5a8b52f5f80e7bed1e49614d5ca376c0c84f9ee9caf93
[preflight] Running pre-flight checks
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 19.03.1. Latest validated version: 18.09
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.15" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Activating the kubelet service
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
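Note that modprobe only lasts until the next reboot. To make the module and the related bridge sysctls persistent, something like the following is common (the file names are a convention, not mandated):
cat <<EOF > /etc/modules-load.d/k8s.conf
br_netfilter
EOF
cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
sysctl --system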
If something went wrong, you can reset:
kubeadm reset
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W0810 21:24:21.782720 24113 removeetcdmember.go:79] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] No etcd config found. Assuming external etcd
[reset] Please, manually reset etcd to prevent further issues
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes]
The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually.
For example:
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.
The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
View component status
# kubectl get cs
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health":"true"}
View cluster status
# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
172-19-120-198 Ready master 5m57s v1.15.2 172.19.120.198 <none> CentOS Linux 7 (Core) 3.10.0-693.2.2.el7.x86_64 docker://19.3.1
172-19-120-201 Ready <none> 80s v1.15.2 172.19.120.201 <none> CentOS Linux 7 (Core) 3.10.0-693.2.2.el7.x86_64 docker://19.3.1
172-19-120-202 Ready <none> 42s v1.15.2 172.19.120.202 <none> CentOS Linux 7 (Core) 3.10.0-693.2.2.el7.x86_64 docker://19.3.1
172-19-120-203 Ready <none> 41s v1.15.2 172.19.120.203 <none> CentOS Linux 7 (Core) 3.10.0-693.2.2.el7.x86_64 docker://19.3.1
Restarting kubelet
# reload all modified unit files
systemctl daemon-reload
# restart kubelet
systemctl restart kubelet.service
View cluster info
# kubectl cluster-info
Kubernetes master is running at https://172.19.120.198:6443
KubeDNS is running at https://172.19.120.198:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
List all pods
# kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system coredns-5c98db65d4-cz8x2 1/1 Running 0 6m6s 10.244.0.2 172-19-120-198 <none> <none>
kube-system coredns-5c98db65d4-vzrrh 1/1 Running 0 6m6s 10.244.0.3 172-19-120-198 <none> <none>
kube-system etcd-172-19-120-198 1/1 Running 0 5m6s 172.19.120.198 172-19-120-198 <none> <none>
kube-system kube-apiserver-172-19-120-198 1/1 Running 0 5m11s 172.19.120.198 172-19-120-198 <none> <none>
kube-system kube-controller-manager-172-19-120-198 1/1 Running 0 5m20s 172.19.120.198 172-19-120-198 <none> <none>
kube-system kube-flannel-ds-amd64-8cgkr 1/1 Running 0 72s 172.19.120.202 172-19-120-202 <none> <none>
kube-system kube-flannel-ds-amd64-drtwj 1/1 Running 0 110s 172.19.120.201 172-19-120-201 <none> <none>
kube-system kube-flannel-ds-amd64-llktb 1/1 Running 0 5m19s 172.19.120.198 172-19-120-198 <none> <none>
kube-system kube-flannel-ds-amd64-mv6pt 1/1 Running 0 71s 172.19.120.203 172-19-120-203 <none> <none>
kube-system kube-proxy-fl5n9 1/1 Running 0 71s 172.19.120.203 172-19-120-203 <none> <none>
kube-system kube-proxy-kbz6k 1/1 Running 0 72s 172.19.120.202 172-19-120-202 <none> <none>
kube-system kube-proxy-pvx57 1/1 Running 0 110s 172.19.120.201 172-19-120-201 <none> <none>
kube-system kube-proxy-zvdsm 1/1 Running 0 6m6s 172.19.120.198 172-19-120-198 <none> <none>
kube-system kube-scheduler-172-19-120-198 1/1 Running 0 5m4s 172.19.120.198 172-19-120-198 <none> <none>
Installing Helm
Download Helm
# wget "https://get.helm.sh/helm-v2.14.3-linux-amd64.tar.gz"
# tar xvf helm-v2.14.3-linux-amd64.tar.gz
linux-amd64/
linux-amd64/helm
linux-amd64/README.md
linux-amd64/LICENSE
linux-amd64/tiller
# cd linux-amd64/
# mv helm /usr/local/bin/
# mv tiller /usr/local/bin/
Install Tiller (the default image cannot be pulled from within China)
Create a file named rbac-config.yaml with the following content:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: tiller
  namespace: kube-system
Install
# kubectl create -f rbac-config.yaml
serviceaccount/tiller created
clusterrolebinding.rbac.authorization.k8s.io/tiller created
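With the RBAC objects in place, Tiller is deployed via helm init bound to that service account (this step is implied by the Tiller pod appearing below; its default image will fail to pull, which the next steps work around):
helm init --service-account tiller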
Check the status
# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
tiller-deploy-8557598fbc-mszs7 0/1 ErrImagePull 0 65s
The Tiller pod failed to start; inspect it to see why:
# kubectl describe pod tiller-deploy-8557598fbc-mszs7 -n kube-system
Failed to pull image "gcr.io/kubernetes-helm/tiller:v2.14.3"
Pull the image separately from the mirror
# docker pull gcr.azk8s.cn/kubernetes-helm/tiller:v2.14.3
Re-tag it
# docker tag gcr.azk8s.cn/kubernetes-helm/tiller:v2.14.3 gcr.io/kubernetes-helm/tiller:v2.14.3
Redeploy
Delete the failed pod so the Deployment recreates it:
# kubectl delete pods tiller-deploy-8557598fbc-mszs7 -n kube-system
Confirm the deployment is healthy
# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
tiller-deploy-8557598fbc-hmskh 1/1 Running 0 18m
Re-initialize Tiller
# helm reset -f
# helm init --service-account tiller --tiller-image gcr.io/kubernetes-helm/tiller:v2.14.3 --skip-refresh
$HELM_HOME has been configured at /root/.helm.
Tiller (the Helm server-side component) has been installed into your Kubernetes Cluster.
Please note: by default, Tiller is deployed with an insecure 'allow unauthenticated users' policy.
To prevent this, run `helm init` with the --tiller-tls-verify flag.
For more information on securing your installation see: https://docs.helm.sh/using_helm/#securing-your-helm-installation
# kubectl get pod -n kube-system -l app=helm
NAME READY STATUS RESTARTS AGE
tiller-deploy-8557598fbc-z7cg4 1/1 Running 0 4m53s
Point the Helm chart repository at the mirror hosted on Azure:
# helm repo add stable http://mirror.azure.cn/kubernetes/charts
"stable" has been added to your repositories
# helm repo list
NAME URL
stable http://mirror.azure.cn/kubernetes/charts
local http://127.0.0.1:8879/charts
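A quick search against the newly added repository confirms it is usable:
# helm search nginx-ingress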
Deploying Nginx Ingress with Helm
Use Helm to deploy Nginx Ingress onto an edge node; here 172-19-120-201 serves as the edge node.
# kubectl label node 172-19-120-201 node-role.kubernetes.io/edge=
node/172-19-120-201 labeled
# kubectl get node
NAME STATUS ROLES AGE VERSION
172-19-120-198 Ready master 62m v1.15.2
172-19-120-201 Ready edge 59m v1.15.2
172-19-120-202 Ready <none> 59m v1.15.2
172-19-120-203 Ready <none> 59m v1.15.2
Create a file named ingress-nginx.yaml with the following content:
controller:
  replicaCount: 1
  hostNetwork: true
  nodeSelector:
    node-role.kubernetes.io/edge: ''
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - nginx-ingress
          - key: component
            operator: In
            values:
            - controller
        topologyKey: kubernetes.io/hostname
  tolerations:
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: PreferNoSchedule
defaultBackend:
  nodeSelector:
    node-role.kubernetes.io/edge: ''
  tolerations:
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: PreferNoSchedule
Where:
- replicaCount: 1 runs a single controller replica
- the nodeSelector schedules the pods onto nodes labeled node-role.kubernetes.io/edge
- hostNetwork: true uses the host's network namespace
Install the chart
# helm repo update
Hang tight while we grab the latest from your chart repositories...
...Skip local chart repository
...Successfully got an update from the "stable" chart repository
Update Complete.
# helm install stable/nginx-ingress -n nginx-ingress --namespace ingress-nginx -f ingress-nginx.yaml
NAME: nginx-ingress
LAST DEPLOYED: Wed Aug 21 00:08:39 2019
NAMESPACE: ingress-nginx
STATUS: DEPLOYED
Check the resources
# kubectl get pod -n ingress-nginx -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-ingress-controller-598c7fd878-smwwr 0/1 Running 0 36s 172.19.120.201 172-19-120-201 <none> <none>
nginx-ingress-default-backend-7b8b45bd49-st4vh 0/1 ContainerCreating 0 36s <none> 172-19-120-201 <none> <none>
Pull the missing image
The pods above are stuck because the default backend image k8s.gcr.io/defaultbackend-amd64:1.5 cannot be pulled directly. Fetch it from the mirror and re-tag it:
# docker pull gcr.azk8s.cn/google-containers/defaultbackend-amd64:1.5
# docker tag gcr.azk8s.cn/google-containers/defaultbackend-amd64:1.5 k8s.gcr.io/defaultbackend-amd64:1.5
Verify the installation
# curl "172.19.120.201"
default backend - 404
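With the controller answering on the edge node, an Ingress object routes traffic by host and path; a minimal sketch, assuming an existing Service named my-svc on port 80 (both names are hypothetical):
cat <<EOF | kubectl apply -f -
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: my-ingress
spec:
  rules:
  - host: my.example.com          # hypothetical host name
    http:
      paths:
      - path: /
        backend:
          serviceName: my-svc     # hypothetical backend Service
          servicePort: 80
EOF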
Deploying the Dashboard with Helm
Uninstall
If it was installed before, remove it first.
helm delete kubernetes-dashboard
helm del --purge kubernetes-dashboard
Install
# helm install stable/kubernetes-dashboard --name kubernetes-dashboard --namespace kube-system
NAME: kubernetes-dashboard
LAST DEPLOYED: Wed Aug 21 11:06:39 2019
NAMESPACE: kube-system
STATUS: DEPLOYED
Check the status; the image pull fails:
# kubectl describe pod kubernetes-dashboard-77f54dc48f-5fdt4 -n kube-system
Pulling image "k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.1"
If the network is blocked, pull the image separately from the mirror
# docker pull gcr.azk8s.cn/google-containers/kubernetes-dashboard-amd64:v1.10.1
Re-tag it
# docker tag gcr.azk8s.cn/google-containers/kubernetes-dashboard-amd64:v1.10.1 k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.1
Expose the service
By default the service is exposed through a ClusterIP, which is only reachable from inside the cluster. Change it to NodePort to allow external access.
Edit the service definition:
# kubectl edit service kubernetes-dashboard --namespace kube-system
Change type: ClusterIP to type: NodePort.
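The same change can be made non-interactively; a sketch using kubectl patch:
# kubectl -n kube-system patch service kubernetes-dashboard -p '{"spec":{"type":"NodePort"}}'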
# kubectl get services --namespace kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes-dashboard NodePort 10.109.214.222 <none> 443:30404/TCP 59m
Open https://172.19.120.198:30404 in a browser to access the service.
Access token
Get the token
# kubectl describe -n kube-system secret/kubernetes-dashboard-token-xljml | grep ^token
token: eyJhbxxxxVIw
This token alone is enough to log in, but once inside you have no permission to view anything.
Grant permissions
# kubectl create serviceaccount dashboard-admin -n kube-system
serviceaccount/dashboard-admin created
# kubectl create clusterrolebinding dashboard-cluster-admin --clusterrole=cluster-admin --serviceaccount=kube-system:dashboard-admin
clusterrolebinding.rbac.authorization.k8s.io/dashboard-cluster-admin created
# kubectl get secret -n kube-system | grep ^dashboard-admin-token
dashboard-admin-token-zz66k kubernetes.io/service-account-token 3 91s
# kubectl describe secret dashboard-admin-token-zz66k -n kube-system | grep ^token
token: eyJhxxxRYw
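The secret lookup and token extraction can also be combined into one line; a sketch:
kubectl -n kube-system describe secret \
    $(kubectl -n kube-system get secret | awk '/^dashboard-admin-token/{print $1}') | grep ^token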
Deploying metrics-server with Helm
Uninstall
If it was installed before, remove it first.
helm delete metrics-server
helm del --purge metrics-server
Install
Create the configuration file metrics-server.yaml with the following content:
args:
- --logtostderr
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP
nodeSelector:
  node-role.kubernetes.io/edge: ''
tolerations:
- key: node-role.kubernetes.io/master
  operator: Exists
  effect: NoSchedule
- key: node-role.kubernetes.io/master
  operator: Exists
  effect: PreferNoSchedule
Install the chart
# helm install stable/metrics-server --name metrics-server --namespace kube-system -f metrics-server.yaml
NAME: metrics-server
LAST DEPLOYED: Wed Aug 21 14:15:38 2019
NAMESPACE: kube-system
STATUS: DEPLOYED
Check the status; the image pull fails:
# kubectl describe pod metrics-server-d97f5c6d9-zgt7x -n kube-system
Failed to pull image "gcr.io/google_containers/metrics-server-amd64:v0.3.2"
Pull the image separately from the mirror
# docker pull gcr.azk8s.cn/google-containers/metrics-server-amd64:v0.3.2
Re-tag it
# docker tag gcr.azk8s.cn/google-containers/metrics-server-amd64:v0.3.2 gcr.io/google_containers/metrics-server-amd64:v0.3.2
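As with Tiller earlier, delete the failed pod so the Deployment recreates it with the now-local image (pod name taken from the describe step above):
# kubectl delete pod metrics-server-d97f5c6d9-zgt7x -n kube-system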
View basic metrics
# kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
172-19-120-198 193m 4% 5819Mi 36%
172-19-120-201 66m 1% 2168Mi 13%
172-19-120-202 37m 0% 1683Mi 10%
172-19-120-203 45m 1% 1694Mi 10%
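Pod-level metrics are available the same way:
# kubectl top pod --all-namespaces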
Troubleshooting common issues
NotReady
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
172-19-120-198 NotReady master 96m v1.15.2
172-19-120-201 NotReady <none> 81m v1.15.2
172-19-120-202 NotReady <none> 67m v1.15.2
172-19-120-203 NotReady <none> 67m v1.15.2
The nodes are all NotReady. What's going on?
Check pod status
kubectl get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-5c98db65d4-nqpq4 0/1 Pending 0 96m <none> <none> <none> <none>
coredns-5c98db65d4-wf4th 0/1 Pending 0 96m <none> <none> <none> <none>
etcd-172-19-120-198 1/1 Running 0 95m 172.19.120.198 172-19-120-198 <none> <none>
kube-apiserver-172-19-120-198 1/1 Running 0 95m 172.19.120.198 172-19-120-198 <none> <none>
kube-controller-manager-172-19-120-198 1/1 Running 0 95m 172.19.120.198 172-19-120-198 <none> <none>
kube-proxy-dwz8l 1/1 Running 0 68m 172.19.120.202 172-19-120-202 <none> <none>
kube-proxy-qxv9p 1/1 Running 0 81m 172.19.120.201 172-19-120-201 <none> <none>
kube-proxy-tbn9w 1/1 Running 0 67m 172.19.120.203 172-19-120-203 <none> <none>
kube-proxy-xxmqw 1/1 Running 0 96m 172.19.120.198 172-19-120-198 <none> <none>
kube-scheduler-172-19-120-198 1/1 Running 0 95m 172.19.120.198 172-19-120-198 <none> <none>
Check the logs of any pod with a bad status; here, coredns-5c98db65d4-nqpq4:
# kubectl -n kube-system logs coredns-5c98db65d4-nqpq4
The output is empty; there are no logs.
Check the kubelet logs on the master node; on the master, run:
# journalctl -f -u kubelet
8月 10 22:14:00 172-19-120-198 kubelet[10749]: W0810 22:14:00.186949 10749 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
8月 10 22:14:00 172-19-120-198 kubelet[10749]: E0810 22:14:00.841316 10749 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
A quick search shows the cause: the Flannel pods are missing.
Install Flannel on the master
# wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# kubectl apply -f kube-flannel.yml
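Once the Flannel DaemonSet pods are running, the nodes should flip to Ready; this can be watched with:
# kubectl get nodes -w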
Exposing a ClusterIP service via kubectl proxy
Start the proxy
kubectl proxy --address='0.0.0.0' --accept-hosts='^*$' --port=31100
Then open:
http://localhost:31100/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/#/login