CBDOCLOADER


NAME

cbdocloader - Loads sample datasets into Couchbase

SYNOPSIS

cbdocloader [--cluster <host>] [--username <username>] [--password <password>]
            [--bucket <bucket>] [--bucket-quota <quota>] [--dataset <path>]
            [--threads <num>] [--verbose]

DESCRIPTION

cbdocloader loads Couchbase sample datasets into Couchbase Server. Sample datasets are zip files provided by Couchbase that contain documents and index definitions. They let users explore Couchbase features before loading their own data.

OPTIONS

Below is a list of required and optional parameters for the cbdocloader command.

Required
-c, --cluster <host>

The hostname of one of the nodes in the cluster to load data into. See the HOST FORMATS section below for hostname specification details.

-u, --username <username>

The username for cluster authentication. The user must have the appropriate privileges to create a bucket, write data, and create view, secondary index, and full-text index definitions.

-p, --password <password>

The password for cluster authentication.

-b, --bucket <bucket>

The name of the bucket to create and load data into. If the bucket already exists, bucket creation is skipped and data is loaded into the existing bucket.

-m, --bucket-quota <quota>

The amount of memory, in megabytes, to assign to the bucket's cache. If the bucket already exists, this parameter is ignored.

-d, --dataset <path>

The path to the sample dataset to load. The path can refer to either a zip file or a directory.

Optional
-t, --threads <num>

Specifies the number of concurrent clients to use when loading data. Fewer clients use fewer cluster resources but load data more slowly; more clients load data faster but use more cluster resources. This parameter defaults to 1. Setting it higher than the number of CPUs on the machine where the command is run is not recommended.

-v, --verbose

Prints log messages to stdout. This flag is useful for debugging errors in the data loading process.
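The guidance on the threads option above can be sketched as a small shell helper. This is only an illustration: clamp_threads is a hypothetical function, not part of cbdocloader, and it assumes getconf _NPROCESSORS_ONLN reports the CPU count on the machine (it falls back to 1 otherwise). The final line prints a command line as a dry run rather than executing it.

```shell
#!/bin/sh
# Hypothetical helper: cap a requested thread count at the number of online CPUs.
clamp_threads() {
    requested=$1
    # Ask the system for the online CPU count; fall back to 1 if unavailable.
    cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    case "$cpus" in
        ''|*[!0-9]*) cpus=1 ;;   # guard against empty or non-numeric output
    esac
    if [ "$requested" -gt "$cpus" ]; then
        echo "$cpus"
    else
        echo "$requested"
    fi
}

THREADS=$(clamp_threads 8)
# Dry run: show the thread count that would be passed to cbdocloader.
echo "cbdocloader ... -t $THREADS"
```

Capping the value this way follows the recommendation without hard-coding a number that may be too high for a smaller machine.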

HOST FORMATS

When specifying a host for the cbdocloader command, one of the following formats is expected:

couchbase://<addr>

<addr>:<port>

http://<addr>:<port>

The couchbase://<addr> format is recommended for standard installations. The other two formats accept a port number, which is needed for non-default installations where the admin port has been set to something other than 8091.
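As a rough illustration of the rule above, the following shell sketch picks a host string based on the admin port. The cluster_host function is a hypothetical helper, and port 9000 is an assumed example of a non-default admin port; neither comes from cbdocloader itself.

```shell
#!/bin/sh
# Hypothetical helper: build the value for -c/--cluster from an address and
# an optional admin port (assumed to default to 8091).
cluster_host() {
    addr=$1
    port=${2:-8091}
    if [ "$port" = "8091" ]; then
        # Standard installation: use the recommended format.
        echo "couchbase://$addr"
    else
        # Non-default admin port: use a format that carries the port.
        echo "http://$addr:$port"
    fi
}

cluster_host 127.0.0.1        # prints couchbase://127.0.0.1
cluster_host 127.0.0.1 9000   # prints http://127.0.0.1:9000
```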

EXAMPLES

To load the dataset travel-sample.zip, located at /opt/couchbase/samples/travel-sample.zip, into a bucket with a memory quota of 1024MB, run the following command:

$ cbdocloader -c couchbase://127.0.0.1 -u Administrator -p password -m 1024 \
-b travel-sample -d /opt/couchbase/samples/travel-sample.zip

To increase the parallelism of data loading, use the threads option. The example below uses 4 threads.

$ cbdocloader -c couchbase://127.0.0.1 -u Administrator -p password -m 1024 \
-b travel-sample -d /opt/couchbase/samples/travel-sample.zip -t 4

The cbdocloader command can also load data from a folder. The folder must contain files that follow the sample data format. See the SAMPLE DATA FORMAT section below for more information on this format. Below is an example of how to load data from the folder /home/alice/census-sample:

$ cbdocloader -c couchbase://127.0.0.1 -u Administrator -p password -m 1024 \
-b census-sample -d /home/alice/census-sample

SAMPLE DATA FORMAT

The cbdocloader command is used to load data from zip files or folders that correspond to the Couchbase sample data format. An example of this format is below.

+ sample_folder
    + design_docs
        indexes.json
        design_doc.json
    + docs
        document1.json
        document2.json
        document3.json
        document4.json

The top-level directory can have any name and always contains two folders. The "design_docs" folder holds index definitions. It contains zero or more JSON files that define the indexes to create when the sample dataset is loaded. Global Secondary Indexes (GSI) should always be defined in a file named "indexes.json". Below is an example of the format for defining GSI indexes.

{
    "statements": [
        {
            "statement": "CREATE PRIMARY INDEX on `bucket1`",
            "args": null
        },
        {
            "statement": "CREATE INDEX by_type on `bucket1`(name) WHERE _type='User'",
            "args": null
        }
    ]
}

GSI indexes are defined in a JSON document in which each index definition is an element of a list called "statements". Each element is an object with two keys: "statement" contains the actual index definition, and "args" is used if the statement contains any positional arguments.

All other files in the design_docs folder are used to define view design documents and each design document should be put into a separate file. These files can be named anything, but should always have the ".json" file extension. Below is an example of a view design document definition.

{
    "_id": "_design/players",
    "views": {
        "total_experience": {
            "map": "function(doc,meta){if(doc.jsonType ==
            "reduce": "_sum"
        },
        "player_list": {
            "map": "function (doc, meta){if(doc.jsonType ==
        }
    }
}

In the document above, the "_id" field is used to name the design document. This name should always be prefixed with "_design/". The other field in the top level of the document is the "views" field. This field contains a map of view definitions. The key for each element in the map is the name of the view. Each view must contain a "map" element that defines the map function and may also contain an optional "reduce" element that defines the reduce function.

View design documents support map-reduce views as well as spatial views. Below is an example of a spatial view definition. Spatial views follow rules similar to those for the map-reduce views above.

{
    "_id": "_design/spatial",
    "spatial": {
        "position": "<spatial view function definition>",
        "location": "<spatial view function definition>"
    }
}

Note that a spatial view uses only a single function to define the index. As a result, this function is given as the value of the spatial view's name.

The other folder at the top level of a sample data folder is the "docs" folder. This folder contains all of the documents to load into Couchbase. Each file in this folder should contain a single JSON document. The key for the document is the name of the file. Each file should have a ".json" file extension, which is removed from the key when the data is loaded. Because each document is in a separate file, there can be a large number of files. The docs folder allows subfolders to help categorize documents.
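The layout described in this section can be assembled by hand. The sketch below builds a minimal folder in a temporary directory and lists the document keys implied by the file names. The folder name census-sample, the bucket name in the index statement, and the document contents are illustrative assumptions, and the script only prepares files; it does not run cbdocloader.

```shell
#!/bin/sh
set -e

# Hypothetical dataset folder; the top-level directory may have any name.
SAMPLE_DIR="$(mktemp -d)/census-sample"
mkdir -p "$SAMPLE_DIR/design_docs" "$SAMPLE_DIR/docs/users"

# GSI definitions go in design_docs/indexes.json (quoted heredoc keeps backticks literal).
cat > "$SAMPLE_DIR/design_docs/indexes.json" <<'EOF'
{
    "statements": [
        {
            "statement": "CREATE PRIMARY INDEX on `census-sample`",
            "args": null
        }
    ]
}
EOF

# One JSON document per file; a subfolder is used only to organize files.
echo '{"_type": "User", "name": "Alice"}' > "$SAMPLE_DIR/docs/user_1.json"
echo '{"_type": "User", "name": "Bob"}'   > "$SAMPLE_DIR/docs/users/user_2.json"

# The document key is the file name with the ".json" extension removed.
KEYS=$(find "$SAMPLE_DIR/docs" -name '*.json' -exec basename {} .json \; | sort)
echo "$KEYS"
```

The resulting folder could then be passed to the dataset option in the same way as /home/alice/census-sample in the EXAMPLES section.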

ENVIRONMENT AND CONFIGURATION VARIABLES

CB_USERNAME - The username of the Couchbase cluster to connect to.

CB_PASSWORD - The password of the Couchbase cluster to connect to.
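A possible way to use these variables is sketched below. It assumes, without confirmation from this page, that cbdocloader falls back to CB_USERNAME and CB_PASSWORD when the username and password options are omitted. The credentials and paths are the placeholder values from the EXAMPLES section, and the command runs only if cbdocloader is found on the PATH.

```shell
#!/bin/sh
# Placeholder credentials taken from the examples; substitute your own.
export CB_USERNAME=Administrator
export CB_PASSWORD=password

# Assumption: with the variables set, -u and -p can be left off the command line.
if command -v cbdocloader >/dev/null 2>&1; then
    cbdocloader -c couchbase://127.0.0.1 -m 1024 \
        -b travel-sample -d /opt/couchbase/samples/travel-sample.zip \
        || echo "load failed" >&2
fi
```

Keeping credentials in the environment avoids repeating them on every command line.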

SEE ALSO

cbimport(1), cbexport(1)

CBDOCLOADER

Part of the cbdocloader(1) suite