In this getting started guide, the Data Flow Server is run as a standalone application outside the Kubernetes cluster. A future version will allow the Data Flow Server itself to run on Kubernetes.
Deploy a Kubernetes cluster.
The Kubernetes Getting Started guide lets you choose among many deployment options, so you can pick the one you are most comfortable using. We have successfully used the Vagrant option from a downloaded Kubernetes release. The docker-compose-kubernetes project is not among those options, but the developers of this project have also used it to run a local Kubernetes cluster with Docker Compose.
The rest of this getting started guide assumes that you have a working Kubernetes cluster and a kubectl command line.
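Before going further, it is worth confirming that kubectl can actually reach your cluster. A minimal sanity check, assuming kubectl is on your PATH and configured for the cluster you just deployed, looks like:

```shell
# Print the master URL and cluster services; this fails fast if kubectl
# cannot reach the cluster at all.
kubectl cluster-info

# List the nodes; at least one node should report STATUS "Ready".
kubectl get nodes
```

The exact output depends on how you deployed the cluster; what matters is that both commands succeed without connection errors.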
Create a Kafka service on the Kubernetes cluster.
The Kafka service will be used for messaging between modules in the stream. There are sample replication controller and service YAML files in the spring-cloud-dataflow-server-kubernetes repository that you can use as a starting point, as they have the required metadata set for service discovery by the modules.
$ git clone https://github.com/spring-cloud/spring-cloud-dataflow-server-kubernetes
$ cd spring-cloud-dataflow-server-kubernetes
$ kubectl create -f src/etc/kubernetes/kafka-controller.yml
$ kubectl create -f src/etc/kubernetes/kafka-service.yml
You can use the command kubectl get pods to verify that the controller is running. Note that it can take a minute or so until there is an external IP address for the Kafka server. Use the command kubectl get services to check on the state of the service and look for a value in the EXTERNAL-IP column. Use the commands kubectl delete svc kafka and kubectl delete rc kafka to clean up afterwards.
Determine the location of your Kubernetes Master URL, for example:
$ kubectl cluster-info
Kubernetes master is running at https://10.245.1.2
...other output omitted...
Export environment variables to connect to Kubernetes.
The Data Flow Server uses the fabric8 Java client library to connect to the Kubernetes cluster. It can be configured using system properties, environment variables, and the Kube config file. In testing with Google Container Engine, only the environment variables KUBERNETES_MASTER and KUBERNETES_NAMESPACE needed to be set; other configuration values were read from the Kube config file.
$ export KUBERNETES_MASTER=https://10.245.1.2/
$ export KUBERNETES_NAMESPACE=default
This approach supports using one Data Flow Server instance per Kubernetes namespace.
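As a sketch of what that looks like in practice: to run a second server instance against another namespace, start it in a separate terminal with different environment variables. The "qa" namespace name and the alternate port below are assumptions for illustration; the namespace must already exist in your cluster.

```shell
# Hypothetical second Data Flow Server instance, scoped to a "qa" namespace.
export KUBERNETES_MASTER=https://10.245.1.2/
export KUBERNETES_NAMESPACE=qa

# Use a different HTTP port (standard Spring Boot property) so this instance
# can coexist with the one serving the "default" namespace.
java -jar spring-cloud-dataflow-server-kubernetes-1.0.0.M2.jar --server.port=9394
```

Each instance then deploys stream modules only into its own namespace.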
Run a local Redis server.
$ cd <redis-install-dir>
$ ./src/redis-server
The locally running Data Flow Server uses Redis to store the URIs of registered stream app modules, which are then used in stream definitions.
Download and run the Spring Cloud Data Flow Server for Kubernetes.
$ wget http://repo.spring.io/milestone/org/springframework/cloud/spring-cloud-dataflow-server-kubernetes/1.0.0.M2/spring-cloud-dataflow-server-kubernetes-1.0.0.M2.jar
$ java -jar spring-cloud-dataflow-server-kubernetes-1.0.0.M2.jar --spring.cloud.deployer.kubernetes.memory=768Mi
Note: We haven’t tuned the memory use of the out-of-the-box apps yet, so to be on the safe side we increase the memory for the pods by providing the spring.cloud.deployer.kubernetes.memory=768Mi property shown in the command above.
Note: If you are running Kubernetes locally using Vagrant, you might need to increase the CPU for the deployed apps using the corresponding CPU property of the Kubernetes deployer.
Ensure that the Data Flow Server is running in the same terminal session that has the Kubernetes environment variables set.
Download and run the Spring Cloud Data Flow shell.
$ wget http://repo.spring.io/milestone/org/springframework/cloud/spring-cloud-dataflow-shell/1.0.0.M3/spring-cloud-dataflow-shell-1.0.0.M3.jar
$ java -jar spring-cloud-dataflow-shell-1.0.0.M3.jar
Register the Kafka versions of the time and log app modules using the shell:
dataflow:>module register --type source --name time --uri docker:springcloudstream/time-source-kafka
dataflow:>module register --type sink --name log --uri docker:springcloudstream/log-sink-kafka
Deploy a simple stream in the shell
dataflow:>stream create --name ticktock --definition "time | log" --deploy
You can use the command kubectl get pods to check on the state of the pods corresponding to this stream. You can run it from the Data Flow shell as an OS command by prefixing it with "!".
dataflow:>! kubectl get pods
command is:kubectl get pods
NAME                  READY     STATUS    RESTARTS   AGE
kafka-d207a           1/1       Running   0          50m
ticktock-log-qnk72    1/1       Running   0          2m
ticktock-time-r65cn   1/1       Running   0          2m
Look at the logs for the pod deployed for the log sink.
$ kubectl logs -f ticktock-log-qnk72
...
2015-12-28 18:50:02.897  INFO 1 --- [           main] o.s.c.s.module.log.LogSinkApplication    : Started LogSinkApplication in 10.973 seconds (JVM running for 50.055)
2015-12-28 18:50:08.561  INFO 1 --- [hannel-adapter1] log.sink                                 : 2015-12-28 18:50:08
2015-12-28 18:50:09.556  INFO 1 --- [hannel-adapter1] log.sink                                 : 2015-12-28 18:50:09
2015-12-28 18:50:10.557  INFO 1 --- [hannel-adapter1] log.sink                                 : 2015-12-28 18:50:10
2015-12-28 18:50:11.558  INFO 1 --- [hannel-adapter1] log.sink                                 : 2015-12-28 18:50:11
Note: If you need to connect from outside the Kubernetes cluster to an app that you deploy, such as the http source, you can have the deployer create a load balancer for it, as shown below.

To register the http source and deploy it so you can post data to it, use the following commands:
dataflow:>module register --type source --name http --uri docker:springcloudstream/http-source-kafka
dataflow:>stream create --name test --definition "http | log"
dataflow:>stream deploy test --properties "module.http.spring.cloud.deployer.kubernetes.createLoadBalancer=true"
Now, look up the external IP address for the http app (it can sometimes take a minute or two for the external IP to get assigned):
dataflow:>! kubectl get service
command is:kubectl get service
NAME         CLUSTER-IP       EXTERNAL-IP      PORT(S)    AGE
kafka        10.103.240.92    <none>           9092/TCP   7m
kubernetes   10.103.240.1     <none>           443/TCP    4h
test-http    10.103.251.157   130.211.200.96   8080/TCP   58s
test-log     10.103.240.28    <none>           8080/TCP   59s
zk           10.103.247.25    <none>           2181/TCP   7m
Next, post some data to the test-http
app:
dataflow:>http post --target http://130.211.200.96:8080 --data "Hello"
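If you prefer to post from outside the Data Flow shell, a plain curl call against the same external IP should be equivalent. The IP address below comes from the example output above and will differ in your cluster:

```shell
# Post the same payload with curl instead of the shell's "http post" command.
curl -X POST -d "Hello" http://130.211.200.96:8080
```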
Finally, look at the logs for the test-log
pod:
dataflow:>! kubectl get pods
command is:kubectl get pods
NAME              READY     STATUS    RESTARTS   AGE
kafka-o20qq       1/1       Running   0          9m
test-http-9obkq   1/1       Running   0          2m
test-log-ysiz3    1/1       Running   0          2m

dataflow:>! kubectl logs test-log-ysiz3
command is:kubectl logs test-log-ysiz3
...
2016-04-27 16:54:29.789  INFO 1 --- [           main] o.s.c.s.b.k.KafkaMessageChannelBinder$3  : started inbound.test.http.test
2016-04-27 16:54:29.799  INFO 1 --- [           main] o.s.c.support.DefaultLifecycleProcessor  : Starting beans in phase 0
2016-04-27 16:54:29.799  INFO 1 --- [           main] o.s.c.support.DefaultLifecycleProcessor  : Starting beans in phase 2147482647
2016-04-27 16:54:29.895  INFO 1 --- [           main] s.b.c.e.t.TomcatEmbeddedServletContainer : Tomcat started on port(s): 8080 (http)
2016-04-27 16:54:29.896  INFO 1 --- [  kafka-binder-] log.sink                                 : Hello
A useful option when troubleshooting issues, such as a container that has a fatal error on startup, is --previous, which shows the log of the last terminated container. You can also get more detailed information about the pods by using kubectl describe, for example:
kubectl describe pods/ticktock-log-qnk72
Destroy the stream
dataflow:>stream destroy --name ticktock
Warning: If you stop and restart the Data Flow Server while streams are deployed, you will not be able to destroy them via shell commands. You will have to delete the corresponding services and replication controllers directly, using the kubectl delete commands shown earlier.