A cerebro-worker is overloaded
To understand where the connections are coming from on an overloaded worker, collect output from a few of the stats endpoints on the affected worker:
- Use the "System" tab in the Okera UI to identify the host and IP address for the "cerebro_worker:webui" service of the overloaded worker
- Option 1 (if the port is exposed):
- Paste the <host:port> into a browser.
- Click on the "/queries" link to look at the list of queries.
- Click on the "/sessions" link to look at the list of sessions and related information.
- Click on the "/logs" link to look for errors.
- Option 2:
- Use kubectl to exec into the worker pod on that host.
- Find the kubernetes pod running the worker, for example:
kubectl get pods
- Fetch the /queries endpoint from inside that pod, redirecting the output to a local file:
kubectl exec -it <pod name> -- curl localhost:11050/queries > /tmp/queries.txt
For example, using the pod id found above:
kubectl exec -it 5c9e6d107e3c -- curl localhost:11050/queries > /tmp/queries.txt
Do the same thing against the /sessions endpoint:
kubectl exec -it 5c9e6d107e3c -- curl localhost:11050/sessions > /tmp/sessions.txt
Examine /tmp/queries.txt and /tmp/sessions.txt to identify the query or session that is causing the bottleneck.
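When the collected files are large, a short script can help surface the longest-running queries as a starting point. The sketch below is a hedged example: the `duration=<N>s` pattern is an assumption about what the endpoint output looks like, not a documented format, so adjust the regex to match the lines you actually see in /tmp/queries.txt.

```python
import re

def longest_running(text, top=5):
    """Return the `top` entries with the largest reported duration.

    ASSUMPTION: each query line carries a field like 'duration=123s'.
    The real /queries output may be formatted differently; adapt the
    regex to the file you collected.
    """
    rows = []
    for line in text.splitlines():
        match = re.search(r"duration=(\d+)s", line)
        if match:
            rows.append((int(match.group(1)), line.strip()))
    # Sort largest duration first.
    rows.sort(reverse=True)
    return rows[:top]
```

After collecting the files, call `longest_running(open("/tmp/queries.txt").read())` and inspect the top entries; the same helper works on /tmp/sessions.txt if its lines carry a similar duration field.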