4. Deploying Streams on Kubernetes

In this getting started guide, the Data Flow Server is run as a standalone application outside the Kubernetes cluster. A future version will allow the Data Flow Server itself to run on Kubernetes.

  1. Deploy a Kubernetes cluster.

    The Kubernetes Getting Started guide describes many deployment options, so you can pick the one you are most comfortable with. We have successfully used the Vagrant option from a downloaded Kubernetes release.

    Note that docker-compose-kubernetes is not among those options, but the developers of this project have also used it to run a local Kubernetes cluster using Docker Compose.

    The rest of this getting started guide assumes that you have a working Kubernetes cluster and a kubectl command line.
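
    As a quick sanity check that kubectl can talk to your cluster, you can run, for example:

    $ kubectl cluster-info    # confirm that kubectl can reach the cluster
    $ kubectl get nodes       # confirm that the cluster's nodes are in the Ready state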

  2. Create a Kafka service on the Kubernetes cluster.

    The Kafka service will be used for messaging between the modules in the stream. There are sample replication controller and service YAML files in the spring-cloud-dataflow-server-kubernetes repository that you can use as a starting point, since they have the required metadata set for service discovery by the modules.

    $ git clone https://github.com/spring-cloud/spring-cloud-dataflow-server-kubernetes
    $ cd spring-cloud-dataflow-server-kubernetes
    $ kubectl create -f src/etc/kubernetes/kafka-controller.yml
    $ kubectl create -f src/etc/kubernetes/kafka-service.yml

    You can use the command kubectl get pods to verify that the controller is running. Note that it can take a minute or so before an external IP address is assigned to the Kafka server. Use the command kubectl get services to check the state of the service, and wait until a value appears in the EXTERNAL-IP column. Use the commands kubectl delete svc kafka and kubectl delete rc kafka to clean up afterwards.
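
    For reference, the verification and cleanup commands from the paragraph above are:

    $ kubectl get pods              # verify that the kafka controller's pod is running
    $ kubectl get services          # wait for a value to appear in the EXTERNAL-IP column
    $ kubectl delete svc kafka      # clean up the Kafka service afterwards
    $ kubectl delete rc kafka       # clean up the Kafka replication controller afterwards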

  3. Determine the location of your Kubernetes Master URL, for example:

    $ kubectl cluster-info
    
    Kubernetes master is running at https://10.245.1.2
    
       ...other output omitted...
  4. Export environment variables to connect to Kubernetes.

    The Data Flow Server uses the fabric8 Java client library to connect to the Kubernetes cluster. The client can be configured using system properties, environment variables, and the Kube config file. In our testing with Google Container Engine, we only needed to set the KUBERNETES_MASTER and KUBERNETES_NAMESPACE environment variables; the other configuration values were read from the Kube config file.

    $ export KUBERNETES_MASTER=https://10.245.1.2/
    $ export KUBERNETES_NAMESPACE=default

    This approach supports using one Data Flow Server instance per Kubernetes namespace.
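
    For example, a second Data Flow Server instance could target a different namespace. The qa namespace below is hypothetical and must already exist in your cluster:

    $ kubectl get namespaces             # list the namespaces available in the cluster
    $ export KUBERNETES_MASTER=https://10.245.1.2/
    $ export KUBERNETES_NAMESPACE=qa     # hypothetical namespace for the second server instance
    $ java -jar spring-cloud-dataflow-server-kubernetes-1.0.0.M2.jar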

  5. Run a local Redis server.

    $ cd <redis-install-dir>
    $ ./src/redis-server

    This Redis instance is used by the locally running Data Flow Server to store the URIs of the registered stream app modules, which are used in stream definitions.

  6. Download and run the Spring Cloud Data Flow Server for Kubernetes.

    $ wget http://repo.spring.io/milestone/org/springframework/cloud/spring-cloud-dataflow-server-kubernetes/1.0.0.M2/spring-cloud-dataflow-server-kubernetes-1.0.0.M2.jar
    
    $ java -jar spring-cloud-dataflow-server-kubernetes-1.0.0.M2.jar --spring.cloud.deployer.kubernetes.memory=768Mi

    Note: We haven't tuned the memory use of the out-of-the-box apps yet, so to be on the safe side we increase the memory for the pods by providing the property --spring.cloud.deployer.kubernetes.memory=768Mi.

    Note: If you are running Kubernetes locally using Vagrant, you might need to increase the CPU for the deployed apps by providing the property --spring.cloud.deployer.kubernetes.cpu=1.

    Ensure that the Data Flow Server is running in the same terminal session that has the Kubernetes environment variables set.
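
    If you need both of the adjustments mentioned in the notes above, the deployer properties can be combined on a single command line, for example:

    $ java -jar spring-cloud-dataflow-server-kubernetes-1.0.0.M2.jar \
        --spring.cloud.deployer.kubernetes.memory=768Mi \
        --spring.cloud.deployer.kubernetes.cpu=1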

  7. Download and run the Spring Cloud Data Flow shell.

    $ wget http://repo.spring.io/milestone/org/springframework/cloud/spring-cloud-dataflow-shell/1.0.0.M3/spring-cloud-dataflow-shell-1.0.0.M3.jar
    
    $ java -jar spring-cloud-dataflow-shell-1.0.0.M3.jar
  8. Register the Kafka versions of the time and log app modules using the shell.

    dataflow:>module register --type source --name time --uri docker:springcloudstream/time-source-kafka
    dataflow:>module register --type sink --name log --uri docker:springcloudstream/log-sink-kafka
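
    If you want to double-check the registrations, the shell's module list command should show both entries (this is the command name used by the 1.0.0 milestone shell; adjust it if your shell version differs):

    dataflow:>module list
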
  9. Deploy a simple stream in the shell.

    dataflow:>stream create --name ticktock --definition "time | log" --deploy

    You can use the command kubectl get pods to check the state of the pods corresponding to this stream. You can run it from the Data Flow shell as an OS command by prefixing it with "!".

    dataflow:>! kubectl get pods
    command is:kubectl get pods
    NAME                  READY     STATUS    RESTARTS   AGE
    kafka-d207a           1/1       Running   0          50m
    ticktock-log-qnk72    1/1       Running   0          2m
    ticktock-time-r65cn   1/1       Running   0          2m

    Look at the logs for the pod deployed for the log sink.

    $ kubectl logs -f ticktock-log-qnk72
    ...
    2015-12-28 18:50:02.897  INFO 1 --- [           main] o.s.c.s.module.log.LogSinkApplication    : Started LogSinkApplication in 10.973 seconds (JVM running for 50.055)
    2015-12-28 18:50:08.561  INFO 1 --- [hannel-adapter1] log.sink                                 : 2015-12-28 18:50:08
    2015-12-28 18:50:09.556  INFO 1 --- [hannel-adapter1] log.sink                                 : 2015-12-28 18:50:09
    2015-12-28 18:50:10.557  INFO 1 --- [hannel-adapter1] log.sink                                 : 2015-12-28 18:50:10
    2015-12-28 18:50:11.558  INFO 1 --- [hannel-adapter1] log.sink                                 : 2015-12-28 18:50:11

    Note: If you need to connect from outside the Kubernetes cluster to an app that you deploy, such as the http-source, you can provide the deployment property spring.cloud.deployer.kubernetes.createLoadBalancer=true for the app module. This creates a LoadBalancer with an external IP address for the app's service.

    To register the http-source and deploy it so that you can post data to it, use the following commands:

    dataflow:>module register --type source --name http --uri docker:springcloudstream/http-source-kafka
    dataflow:>stream create --name test --definition "http | log"
    dataflow:>stream deploy test --properties "module.http.spring.cloud.deployer.kubernetes.createLoadBalancer=true"

    Now, look up the external IP address for the http app (it can sometimes take a minute or two for the external IP to get assigned):

    dataflow:>! kubectl get service
    command is:kubectl get service
    NAME         CLUSTER-IP       EXTERNAL-IP      PORT(S)    AGE
    kafka        10.103.240.92    <none>           9092/TCP   7m
    kubernetes   10.103.240.1     <none>           443/TCP    4h
    test-http    10.103.251.157   130.211.200.96   8080/TCP   58s
    test-log     10.103.240.28    <none>           8080/TCP   59s
    zk           10.103.247.25    <none>           2181/TCP   7m

    Next, post some data to the test-http app:

    dataflow:>http post --target http://130.211.200.96:8080 --data "Hello"

    Finally, look at the logs for the test-log pod:

    dataflow:>! kubectl get pods
    command is:kubectl get pods
    NAME              READY     STATUS             RESTARTS   AGE
    kafka-o20qq       1/1       Running            0          9m
    test-http-9obkq   1/1       Running            0          2m
    test-log-ysiz3    1/1       Running            0          2m
    dataflow:>! kubectl logs test-log-ysiz3
    command is:kubectl logs test-log-ysiz3
    ...
    2016-04-27 16:54:29.789  INFO 1 --- [           main] o.s.c.s.b.k.KafkaMessageChannelBinder$3  : started inbound.test.http.test
    2016-04-27 16:54:29.799  INFO 1 --- [           main] o.s.c.support.DefaultLifecycleProcessor  : Starting beans in phase 0
    2016-04-27 16:54:29.799  INFO 1 --- [           main] o.s.c.support.DefaultLifecycleProcessor  : Starting beans in phase 2147482647
    2016-04-27 16:54:29.895  INFO 1 --- [           main] s.b.c.e.t.TomcatEmbeddedServletContainer : Tomcat started on port(s): 8080 (http)
    2016-04-27 16:54:29.896  INFO 1 --- [  kafka-binder-] log.sink                                 : Hello

    A useful option when troubleshooting issues, such as a container that fails with a fatal error during startup, is --previous, which shows the log of the last terminated container (see the sketch after the describe example below). You can also get more detailed information about a pod by using kubectl describe, for example:

    kubectl describe pods/ticktock-log-qnk72
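
    A minimal sketch of the --previous option, using the ticktock-log pod name from the earlier kubectl get pods output:

    $ kubectl logs --previous ticktock-log-qnk72    # log of the last terminated container in this pod
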
  10. Destroy the stream.

    dataflow:>stream destroy --name ticktock

    Warning: If you stop and restart the Data Flow Server while streams are deployed, you will not be able to destroy them via shell commands. Instead, you will have to delete the corresponding services and replication controllers using kubectl, as sketched below. This is a bug that will be addressed in a future release.
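
    A sketch of that manual cleanup, assuming the services and replication controllers follow the <stream>-<module> naming seen in the pod names above; list them first to confirm the actual names:

    $ kubectl get svc                                  # list the services created for the stream's modules
    $ kubectl get rc                                   # list the corresponding replication controllers
    $ kubectl delete svc ticktock-log ticktock-time
    $ kubectl delete rc ticktock-log ticktock-time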