- 28 Aug, 2019 25 commits
-
-
Ante Kresic authored
Tags used to be string values only, but with recent changes their types are correctly inserted into the schema. This commit does the same for jsonb tagsets, using JSON numbers for numeric tag values and strings for everything else.
-
Ante Kresic authored
This change should put the query on equal footing with the TimescaleDB version of the query: time interval handling was previously pushed to the query engine, but it is now handled during query generation.
-
Ante Kresic authored
This is a second pass at optimizing the queries; it mainly consists of filtering out trucks with NULL-valued names for better performance.
-
Ante Kresic authored
The time format used for Influx was the same as RFC3339, so rather than redefining that format ourselves we just use the provided constant from the `time` package.
-
Blagoj Atanasovski authored
The GET method allows only read-only queries to be issued; for the IoT use case we need to issue a SELECT INTO query. The POST method of the InfluxDB API accepts both write and read-only queries.
-
Ante Kresic authored
Added first pass of Influx queries to query generator with accompanying unit tests.
-
Blagoj Atanasovski authored
-
Blagoj Atanasovski authored
-
Blagoj Atanasovski authored
Tests are written to cover the new code and the file format for ClickHouse and Postgres/TimescaleDB is updated to have the tag types in the header
-
Blagoj Atanasovski authored
Changes are done to the Point interface so that tags are no longer just strings, and serializers for the different databases are modified to serialize the tags depending on their type. For TimescaleDB and Postgres it doesn't matter which type it is; for InfluxDB the tag is serialized as a field.
-
Ante Kresic authored
NormFloat64 returns a normally distributed value in the range [-math.MaxFloat64, +math.MaxFloat64], which means the current load value could be negative when it should not be. We change that call to Float64, which returns a value in the range [0.0, 1.0).
-
Ante Kresic authored
This change fixes the aforementioned flags, whose implementations were commented out and not working properly.
-
Ante Kresic authored
The previous description referred to hosts, which are relevant only for the devops use case. With new use cases added, the description had to be changed to cover them as well.
-
Blagoj Atanasovski authored
A loading worker can be configured to have a minimum interval between two batches being inserted. Configuration is optional; if not configured, batches are inserted as soon as possible. Intervals are expressed in seconds, and can also be configured as a range of sleeping intervals. The sleep regulator is in charge of putting the calling goroutine to sleep.
-
Ante Kresic authored
Previous implementation used the value `NULL` when inserting missing values into jsonb column. This commit fixes that to the correct value `null`.
-
Ante Kresic authored
Adding the necessary boilerplate for supporting the queries of the new IoT use case and implementing the first versions of them. A single pass of optimizations has been done but more optimization passes are needed by a TimescaleDB query specialist.
-
Rob Kiefer authored
Since strings.ReplaceAll was only added in Go 1.12, it is not usable on all the Go versions we currently support (1.9+). Even if we were to drop some of the older versions we support, like 1.9 or 1.10, it still would not compile on versions less than a year old. So for now, we'll use the old way.
-
Blagoj Atanasovski authored
With the introduction of the possibility of missing values in the IoT use case, the serializer for InfluxDB needs to be made aware of them and skip those tags and fields in order to generate valid Line Protocol inserts.
-
Ante Kresic authored
The fuel state range changed from 0.0 - 100.0 to 0.0 - 1.0 to better reflect the real world (car gauges report from Empty to Full). We also add refueling of the trucks when the state drops below a minimum value.
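The refueling rule can be sketched in a few lines (the threshold constant and function name are illustrative, not the actual simulator code): fuel is a fraction in [0.0, 1.0], and dropping below the minimum refills the tank to Full.

```go
package main

import "fmt"

// minFuel is an illustrative threshold below which a truck refuels.
const minFuel = 0.1

// nextFuel advances the fuel state by the consumed amount, refilling to
// Full (1.0) once the gauge would drop below the minimum.
func nextFuel(current, consumed float64) float64 {
	f := current - consumed
	if f < minFuel {
		return 1.0
	}
	return f
}

func main() {
	fmt.Println(nextFuel(0.5, 0.2))  // normal consumption
	fmt.Println(nextFuel(0.15, 0.1)) // below minimum: refilled to 1
}
```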
-
Ante Kresic authored
The previous implementation assumed the devops use case by default. Here we refactor to support multiple use cases and add the initial IoT use case query generator for the TimescaleDB database.
-
Ante Kresic authored
-
Ante Kresic authored
IoT data can contain empty field and tag values. We need to support that in the data loaders to be able to load the data correctly into the database, in this case TimescaleDB. We also add some tests to verify that empty field and tag values are stored correctly.
-
Ante Kresic authored
IoT data sets contain a lot of irregularities like lots of gaps, out of order entries, missing entries, zero values etc. This change updates the data generator so it can create data sets which contain these features in a deterministic way.
-
Ante Kresic authored
This first version of the data generator behaves similarly as the devops use case and does not contain any data irregularity features which will be added in a future commit.
-
Ante Kresic authored
This improves code quality by extracting the common parts of the logic which can be reused for multiple use cases. First step in to creating data generators for the next use case.
-
- 26 Jul, 2019 1 commit
-
-
Ante Kresic authored
When using the `pgx` sql driver, running the query does not wait for a response from the server. In order to verify that the query has returned complete results, we must call `Rows.Next()` until it returns false, meaning we have fetched all the rows from the query result. Note that this behavior is different from the current implementation of the `pq` driver.
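The draining pattern is simple; this sketch abstracts the relevant slice of the Rows API behind an interface and uses a fake so it runs without a database (the names here are illustrative, not pgx's own types):

```go
package main

import "fmt"

// rowIterator captures the part of a Rows result set the pattern needs.
// With pgx, executing a query does not itself wait for the full result, so
// the caller must call Next() until it returns false to be certain every
// row has been received from the server.
type rowIterator interface {
	Next() bool
	Err() error
}

// drain pulls all rows from the iterator and reports how many were seen.
func drain(rows rowIterator) (int, error) {
	n := 0
	for rows.Next() {
		n++
	}
	return n, rows.Err()
}

// fakeRows stands in for a real result set so the sketch is runnable.
type fakeRows struct{ left int }

func (f *fakeRows) Next() bool { f.left--; return f.left >= 0 }
func (f *fakeRows) Err() error { return nil }

func main() {
	n, _ := drain(&fakeRows{left: 3})
	fmt.Println(n)
}
```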
-
- 15 Jul, 2019 1 commit
-
-
Stephen Polcyn authored
Previously, the -n flag sent its data to the "max-queries" variable, which resulted in an unknown-variable error when running the script, because the Python variable used to generate the run script is 'limit' (see line 163). "max-queries" is only applicable as a flag for the tsbs_run_queries script, i.e., "--max-queries=###".
-
- 17 Jun, 2019 1 commit
-
-
Ruslan Kovalov authored
This includes support for data generation and querying for the devops use case.
-
- 28 May, 2019 1 commit
-
-
Blagoj Atanasovski authored
The statProcessor responsible for gathering statistics when executing queries was built as a struct. This commit changes it to an interface to make the BenchmarkRunner code easier to test. This commit also adds some unit tests for the benchmark runner that check whether proper argument checks are done, and whether proper init happens when the Run method is called.
-
- 24 May, 2019 1 commit
-
-
Ante Kresic authored
Covering query generation functions for Influx, ClickHouse and SiriDB databases. Tests are covering basic pre-generated outputs and provide visual sanity checks. More robust tests are left as a future task.
-
- 22 May, 2019 3 commits
-
-
Rob Kiefer authored
This interface is not tied to the devops use case in any way, so its naming was a misnomer. It is actually generic and can be used for any use case, so this renaming reflects that.
-
Rob Kiefer authored
Previously the devops use case generation code used a call to log.Fatalf when something went wrong. This makes it awkward to test error conditions when generating queries from other packages, since we need a way to (a) replace the unexported call to log.Fatalf and (b) prevent the runtime from actually quitting. It is better for the library to actually return errors on calls that can fail, rather than either fataling or panicking. Now other packages can handle the errors themselves and also test error conditions in their packages as well. This refactor was pruned a bit to bubble the 'panic' up one level for now. When the actual generation code encounters the error during normal execution, it will panic. But these are easier to test for and don't require adding hooks to replace the 'fatal' path in the original package.
-
Rob Kiefer authored
Query generation and Cassandra's query running both used a type called TimeInterval that did roughly the same thing. This change combines the two into one type that can be used from the utils package in internal/. This improves code reuse and keeps the two representations in sync, and also increases the testability of the code.
-
- 20 May, 2019 1 commit
-
-
Ante Kresic authored
For now the tests are mainly matching the output against pre-generated/known outputs to ensure we have some coverage. A more robust checking, e.g., making sure the semantics of the query are actually correct, is a future task.
-
- 25 Apr, 2019 1 commit
-
-
Lee Hampton authored
This fixes a bug where the PostCreateDB function would exit early when the user set --do-create-db=false and/or --create-metrics-table=False. This early exit caused TSBS to skip the updating of some global caches, which broke assumptions in other parts of the codebase. This commit also refactors the PostCreateDB function to split the parsing of columns and the potential creation of tables and indexes into separate functions. This makes it easier to test the functions in isolation and cleaner to create the conditional create-table logic that is at the heart of this bug. While this does add tests to the parsing function, the create tables/index function remains untested. This is left for a later PR that will hopefully clean up global state and provide a more comprehensive framework for testing IO.
-
- 18 Apr, 2019 1 commit
-
-
Rob Kiefer authored
Unlike libpq/sqlx, pgx expects JSON/B fields in the copy command to be in the 'native' format, which is a map[string]interface{}, not a string in valid JSON format. Without this change, the copy would fail with "ERROR: unsupported jsonb version number 123".
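In practice the fix means decoding the JSON string into a Go map before handing it to the copy; a minimal sketch (the helper name is illustrative):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toNativeJSON converts a JSON-encoded string into the decoded Go value
// (map[string]interface{}) that pgx expects for a JSONB column in a copy;
// passing the raw string is what triggered the
// "unsupported jsonb version number" error.
func toNativeJSON(s string) (map[string]interface{}, error) {
	var m map[string]interface{}
	if err := json.Unmarshal([]byte(s), &m); err != nil {
		return nil, err
	}
	return m, nil
}

func main() {
	m, err := toNativeJSON(`{"hostname":"host_0","region":"us-west-1"}`)
	fmt.Println(m["hostname"], err)
}
```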
-
- 09 Apr, 2019 1 commit
-
-
Rob Kiefer authored
This PR continues on the work in the previous one that changed tsbs_generate_data to use a new internal/inputs package. This PR adds a new Generator type for query generation called QueryGenerator. Now that these two generators share some common code, they both become much more robust and easier to test and manage. Previously tsbs_generate_queries had no test coverage, but with this change it will actually have quite high coverage. There are still some rough spots with this refactor. In particular, how the useCaseMatrix is handled needs some more thought, especially if we are going to add more use cases going forward. Additionally, the database specific flags like TimescaleUseJSON could probably be handled in a cleaner way as well.
-
- 04 Apr, 2019 1 commit
-
-
Rob Kiefer authored
For a long time, our two generation binaries -- tsbs_generate_data and tsbs_generate_queries -- have shared (roughly) a fair bit of code, especially when it comes to flags and validation. However, they were never truly in sync, and combining them has been a long-wanted to-do. Similarly, to enable better tooling around TSBS, it would be beneficial if more of its functionality were exposed as a library instead of a CLI that needs to be called.

To those ends, this PR is a first step in addressing both. It introduces the internal/inputs package, which can eventually be moved to pkg/inputs when we are ready for other things to consume its API. This package will contain the structs, interfaces, and functions for generating 'input' to other TSBS tools. For now, that only includes generating data files (for tsbs_load_* binaries) and query files (for tsbs_run_queries_* binaries). This PR starts by introducing these building blocks and converting tsbs_generate_data to use it.

The idea is that each type of input (e.g., data, queries) is handled by a Generator, which is customized by a GeneratorConfig. The config contains fields such as the PRNG seed, number of items to generate, etc., which are used by the Generator to control the output. These GeneratorConfigs come with a means of easily adding their fields to a flag.FlagSet, making them work well with CLIs while also not restricting their use to only CLIs. Once configured, a GeneratorConfig is passed to a Generator, which then produces the output.

This design has a few other nice features to help clean up TSBS. One, it uses an approach of bubbling up errors and passing them back to the caller, allowing for more graceful error handling. CLIs can output them to the console, while other programs using the library can pass them to another error handling system if they desire. Two, Generators should be designed with an Out field that allows the caller to point to any io.Writer it wants -- not just the console or a file.

The next step will be to convert tsbs_generate_queries to use this as well, which will be done in a follow-up PR.
-
- 28 Mar, 2019 1 commit
-
-
niksa authored
Using the binary format when talking to TimescaleDB means less data being sent back and forth. A config option is added to force the TEXT format if needed (binary is the default). The PGX driver is used for binary and the PQ driver for TEXT. Based on some benchmarks, binary should increase write throughput by 5-10% and result in about 35% faster queries.
-
- 20 Mar, 2019 1 commit
-
-
Ante Kresic authored
on the longer side but it makes a clear point not to confuse it with a similar flag used for specifying the benchmark database name.
-