### Evaluating the scalability of a DBMS cluster

------

Scalability scenarios deploy a DBMS cluster of *n* nodes on the specified cloud resources and execute the specified workload against this cluster.

The resulting metrics are throughput and latency, correlated with the applied cluster sizes to enable the scalability computation.
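As an illustration, the scalability computation over the collected per-cluster-size throughput numbers can be sketched as follows; the metric names and sample values are assumptions, not Mowgli output:

```python
# Illustrative scalability computation over measured throughput per
# cluster size. The sample numbers are made up; Mowgli provides the raw
# throughput/latency results, the metrics below are one common choice.
throughput = {1: 4200.0, 3: 11500.0, 5: 17800.0}  # ops/s per cluster size

baseline = throughput[1]
for size in sorted(throughput):
    speedup = throughput[size] / baseline  # relative to the 1-node cluster
    efficiency = speedup / size            # 1.0 would be perfectly linear scaling
    print(f"{size} nodes: speedup={speedup:.2f}, efficiency={efficiency:.2f}")
```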

------



#### Supported DBMS

- Cassandra
- CockroachDB
- Couchbase
- Elasticsearch
- MongoDB
  - sharded: no replica sets; data is sharded across all nodes
  - replicated: 1 data node and *n* replica sets
- Riak
- Yugabyte (beta)

##### DBMS Deployment Model
- the DBMS cluster deployment model is separated into three components
- *databaseSeedComponent* defines the seed node, which is deployed first; all remaining nodes register at this node to join the cluster
  - instances = 1 **(mandatory!)**
- *databaseDataComponent* defines the data nodes that join the cluster via the seed node
  - instances >= 1 (seed + data nodes = total number of cluster nodes serving client requests)
- *databaseManagementComponent* defines DBMS-specific components (currently only MongoDB and Yugabyte require them) that handle management tasks such as request routing
  - instances = 1 **(mandatory!)**
- *ReplicationFactor* -> 1 .. cluster size - 1
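The component split above can be illustrated with a hypothetical deployment spec; the key names mirror the component names in this section but are not Mowgli's actual template schema:

```python
# Hypothetical sketch of the three-component deployment model; the key
# names are illustrative, not the real Mowgli template schema.
cluster_spec = {
    "databaseSeedComponent": {"instances": 1},        # mandatory: exactly one seed
    "databaseDataComponent": {"instances": 4},        # >= 1 data nodes
    "databaseManagementComponent": {"instances": 1},  # only for MongoDB / Yugabyte
}

# seed + data nodes form the cluster that serves client requests
cluster_size = (cluster_spec["databaseSeedComponent"]["instances"]
                + cluster_spec["databaseDataComponent"]["instances"])
assert cluster_spec["databaseSeedComponent"]["instances"] == 1
print(f"cluster size: {cluster_size}")  # → cluster size: 5
```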

##### Additional Deployment Parameters

- *ReplicationFactor* defines the total number of replicas in the cluster; valid range: >= 1 and <= cluster size
- *DataMemory* specifies the data memory limit of the selected DBMS; setting it to 0 uses the DBMS default value
- *IndexMemory* specifies the index memory limit of the selected DBMS; setting it to 0 uses the DBMS default value

------

#### Supported Workloads

#### YCSB
- Sensor Storage (YCSB, write-only)
- YCSB (multi-phase: write phase + CRUD phase)

##### Important YCSB Parameters
- *maxExecutionTime* defines the runtime of the evaluation in seconds
- *recordCount* defines the number of records to be inserted into the DBMS, i.e. the total number of operations in a sensor storage scenario
- *operations* defines the total number of operations to be executed in a multi-phase scenario
- *fieldLength* defines the size of a single field in bytes; each record consists of 10 fields, so the record size is *fieldLength* * 10
- DBMS-specific read/write consistency settings are available; please check the [YCSB DBMS bindings](https://github.com/brianfrankcooper/YCSB) for more details
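A quick way to reason about these parameters is to estimate the raw dataset size they imply; a back-of-the-envelope sketch with assumed sample values:

```python
# Rough dataset sizing from the YCSB parameters above; ignores
# replication, indexes and storage overhead.
record_count = 1_000_000  # recordCount (sample value)
field_length = 100        # fieldLength in bytes, per field (sample value)
fields_per_record = 10    # YCSB's default field count per record

raw_bytes = record_count * field_length * fields_per_record
print(f"raw payload: {raw_bytes / 1024**3:.2f} GiB")  # → raw payload: 0.93 GiB
```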

#### TPC-C
- the TPC-C implementation is currently only supported for CockroachDB (via the Cockroach loadgen)
- for workload-specific properties, check the [loadgen documentation](https://github.com/cockroachdb/loadgen)

------



#### Supported Clouds

- OpenStack V2
- OpenStack V3
- Amazon EC2

------



#### Examples

To get started, have a look at the scalability [example templates](../examples/scalability).

Mowgli automates the following evaluation workflow:
![scalability evaluation process](../misc/evaluation_process_scalability.png)

#### Evaluation Scenario Workflow

The evaluation scenario execution comprises the following steps:

1. deploy *DBMS_X* on cloud resources *CR_Y*

   1.1 select an example template from the [scalability examples](../examples/scalability) 

   1.2 get a VM template as described here and replace the TODOs for the resource object

   1.3 replace the TODO for the workload instance with the public IP of the prepared Workload-API instance (e.g. the Mowgli VM, or a separate VM if applicable). To use multiple Workload-API instances, add their IPs as a comma-separated list.

2. execute workload *W_Z* 

   2.1 for the **YCSB Load Scenario**, open http://MOWGLI_IP:8282/explorer/#!/scalability/scenarioScalabilityPost; for the **YCSB Multi-Phase Scenario**, open http://MOWGLI_IP:8282/explorer/#!/scalability/scenarioScalabilityMultiPhasePost

   2.2 copy the prepared scenario template into the body form

   2.3 fill in the remaining fields

   2.4 execute the scenario by clicking the **TRY** button

3. monitor the evaluation process

   3.1 check the evaluation-orchestrator logs via the Portainer dashboard: http://MOWGLI_IP:9001/#/containers (user: admin, password: mowgli19)

   3.2 follow the system metrics via the Chronograf dashboard: http://MOWGLI_IP:8888

4. check the results under `/opt/evaluation-results` after the evaluation has finished, as described below.
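Steps 2.1-2.4 can also be scripted instead of using the Swagger explorer. A minimal sketch, assuming a REST route derived from the explorer URL; the host, path, and payload below are placeholders, so check the explorer page for the actual route and body schema:

```python
import json
import urllib.request

MOWGLI_IP = "203.0.113.10"  # placeholder: your Mowgli host
# assumed route behind the Swagger explorer URL; verify in the explorer
url = f"http://{MOWGLI_IP}:8282/scalability/scenario"

scenario = {"scenarioName": "ycsb-load-demo"}  # your filled-in scenario template

request = urllib.request.Request(
    url,
    data=json.dumps(scenario).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(request)  # uncomment to actually submit the scenario
print(request.get_method(), request.full_url)
```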



------

#### Evaluation Results 

All evaluation results are stored on the file system of the host that runs the Mowgli framework, in the following structure:

```
opt
 |_evaluation-results
  |_SCENARIO
   |_CLOUD
    |_DBMS
     |_CONFIG
      |_RUN_X
       |_data        # contains raw evaluation data
       |_monitoring  # contains system usage plots
       |_specs       # contains the applied templates
       |_taskLogs    # additional logs
       |_timeseries  # throughput plot of the evaluation run
      |_plots        # aggregated evaluation data over all runs (manual processing steps required)
```
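Given this layout, collecting all run directories for post-processing can be sketched as follows (the glob pattern follows the tree above):

```python
from pathlib import Path

# Collect all RUN_X directories from the layout
# SCENARIO/CLOUD/DBMS/CONFIG/RUN_X shown above.
def list_runs(results_root="/opt/evaluation-results"):
    root = Path(results_root)
    return sorted(p for p in root.glob("*/*/*/*/RUN_*") if p.is_dir())

for run_dir in list_runs():
    scenario, cloud, dbms, config, run = run_dir.parts[-5:]
    print(f"{dbms} on {cloud}: {scenario}/{config}/{run}")
```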

##### Evaluation Scenario Specs

Under the `specs` folder you will find the applied DBMS and cloud resource specs in `dbmsSpec.json`, as well as the applied workload specs as JSON files.

In addition, you will find `resourceMapping.json`, which maps the applied abstract VM template to the concrete properties of the selected cloud provider.

##### Workload Metrics

The `data` folder contains the raw evaluation results of the selected workload.

**YCSB**

The `data` folder contains the raw results of the load phase in `load.txt` and of the CRUD phase (the *transaction* phase in YCSB terminology) in the corresponding transaction output file.

By default, plots for throughput and latency are generated under the `timeseries` folder.

In addition, system metric plots of the Workload-API instances and the DBMS nodes are available under the `monitoring` folder.
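The overall throughput can be extracted from the raw YCSB output files, which report it as `[OVERALL], Throughput(ops/sec), <value>`; a small parsing sketch:

```python
# Extract the overall throughput from a YCSB result file such as load.txt.
def overall_throughput(ycsb_output: str) -> float:
    for line in ycsb_output.splitlines():
        if line.startswith("[OVERALL]") and "Throughput" in line:
            return float(line.rsplit(",", 1)[1])
    raise ValueError("no [OVERALL] throughput line found")

sample = "[OVERALL], RunTime(ms), 2134.0\n[OVERALL], Throughput(ops/sec), 4687.5"
print(overall_throughput(sample))  # → 4687.5
```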

**TPC-C**

The `data` folder contains the raw evaluation results.

Currently, no advanced plotting support is available.

In addition, system metric plots of the Workload-API instances and the DBMS nodes are available under the `monitoring` folder.