- 28 Aug, 2019 13 commits
-
-
Ante Kresic authored
The previous description referred to hosts, which are relevant only for the devops use case. With new use cases added, the description had to be changed to include them as well.
-
Blagoj Atanasovski authored
A loading worker can be configured with a minimum interval between two batch inserts. The configuration is optional; if not set, batches are inserted as soon as possible. Intervals are expressed in seconds and can also be given as a range of sleeping intervals. The sleep regulator is in charge of putting the calling goroutine to sleep.
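Below is a minimal sketch of the idea, with illustrative names rather than TSBS's actual API: a regulator that sleeps the calling goroutine until at least the configured interval, possibly drawn from a range, has elapsed since the last insert.

```go
package load

import (
	"math/rand"
	"time"
)

type sleepRegulator struct {
	minInterval time.Duration // lower bound between two batch inserts
	maxInterval time.Duration // upper bound when a range is configured
	lastInsert  time.Time
}

// sleepIfNeeded puts the calling goroutine to sleep so that at least the
// configured interval elapses between two consecutive batch inserts.
func (s *sleepRegulator) sleepIfNeeded() {
	interval := s.minInterval
	if s.maxInterval > s.minInterval {
		// pick a random interval within the configured range
		interval += time.Duration(rand.Int63n(int64(s.maxInterval - s.minInterval)))
	}
	if elapsed := time.Since(s.lastInsert); elapsed < interval {
		time.Sleep(interval - elapsed)
	}
	s.lastInsert = time.Now()
}
```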
-
Ante Kresic authored
The previous implementation used the value `NULL` when inserting missing values into a jsonb column. This commit fixes that to use the correct JSON value `null`.
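For illustration (not the actual serializer code): Go's json.Marshal already renders nil values as the JSON literal `null`, which is what a jsonb column accepts, whereas the SQL keyword `NULL` inside the document is not valid JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// a missing tag value represented as nil
	tags := map[string]interface{}{"hostname": "host_0", "rack": nil}
	b, err := json.Marshal(tags)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b)) // {"hostname":"host_0","rack":null}
}
```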
-
Ante Kresic authored
Adding the necessary boilerplate for supporting the queries of the new IoT use case and implementing their first versions. A single pass of optimizations has been done, but more optimization passes by a TimescaleDB query specialist are needed.
-
Rob Kiefer authored
Since strings.ReplaceAll was only added in Go 1.12, it is not usable across all the Go versions we currently support (1.9+). Even if we were to drop some of the older versions we support, like 1.9 or 1.10, it still would not compile on versions less than a year old. So for now, we'll use the old way.
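For reference, the pre-1.12 equivalent is strings.Replace with a count of -1:

```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	s := "cpu,cpu,cpu"
	// strings.ReplaceAll(s, "cpu", "mem") is equivalent to:
	fmt.Println(strings.Replace(s, "cpu", "mem", -1)) // mem,mem,mem
}
```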
-
Blagoj Atanasovski authored
With the introduction of the possibility of missing values in the IoT use case, the serializer for InfluxDB needs to be made aware of them and skip those tags and fields in order to generate valid Line Protocol inserts.
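A minimal sketch of the skip logic (illustrative, not the actual serializer): empty tag values are omitted entirely so the output never contains a dangling `key=`.

```go
package main

import (
	"fmt"
	"strings"
)

// appendTags writes ",key=value" pairs, skipping empty values so the
// resulting Line Protocol stays valid.
func appendTags(sb *strings.Builder, keys, values []string) {
	for i, k := range keys {
		if values[i] == "" {
			continue // skip missing tag values entirely
		}
		fmt.Fprintf(sb, ",%s=%s", k, values[i])
	}
}

func main() {
	var sb strings.Builder
	sb.WriteString("readings")
	appendTags(&sb, []string{"name", "fleet"}, []string{"truck_0", ""})
	fmt.Println(sb.String()) // readings,name=truck_0
}
```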
-
Ante Kresic authored
The fuel state range changed from 0.0-100.0 to 0.0-1.0 to better reflect the real world (car gauges report from Empty to Full). We also add refueling of the trucks when the state goes below the minimum value.
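A tiny sketch of the rule, with an assumed threshold (the actual minimum may differ):

```go
package iot

const minFuel = 0.1 // illustrative threshold, not the actual value

// nextFuelState consumes fuel in the new [0.0, 1.0] range and refuels
// the truck to Full when the state drops below the minimum.
func nextFuelState(fuel, consumed float64) float64 {
	fuel -= consumed
	if fuel < minFuel {
		return 1.0 // refuel: gauge back to Full
	}
	return fuel
}
```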
-
Ante Kresic authored
The previous implementation assumed the devops use case by default. Here we refactor to support multiple use cases and add the initial IoT use case query generator for the TimescaleDB database.
-
Ante Kresic authored
-
Ante Kresic authored
IoT data can contain empty field and tag values. We need to support that in the data loaders to be able to load the data correctly into the database, in this case TimescaleDB. We also add some tests to verify that empty field and tag values are stored correctly.
-
Ante Kresic authored
IoT data sets contain many irregularities, such as gaps, out-of-order entries, missing entries, zero values, etc. This change updates the data generator so it can create data sets containing these features in a deterministic way.
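A sketch of how determinism can be achieved (names are illustrative): every irregularity decision is drawn from a PRNG seeded by the caller, so the same seed reproduces the same gaps and anomalies.

```go
package iot

import "math/rand"

type irregularitySource struct {
	rng     *rand.Rand
	gapProb float64 // probability of dropping a reading
}

// newIrregularitySource seeds the PRNG so runs are reproducible.
func newIrregularitySource(seed int64, gapProb float64) *irregularitySource {
	return &irregularitySource{rng: rand.New(rand.NewSource(seed)), gapProb: gapProb}
}

// skip reports whether the next reading should be omitted (a gap).
func (s *irregularitySource) skip() bool {
	return s.rng.Float64() < s.gapProb
}
```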
-
Ante Kresic authored
This first version of the data generator behaves similarly to the devops use case and does not contain any data irregularity features; those will be added in a future commit.
-
Ante Kresic authored
This improves code quality by extracting the common parts of the logic, which can be reused across multiple use cases. This is the first step toward creating data generators for the next use case.
-
- 26 Jul, 2019 1 commit
-
-
Ante Kresic authored
When using the `pgx` sql driver, running the query does not wait for a response from the server. In order to verify that the query has returned complete results, we must run `Rows.Next()` until it returns false, meaning we have fetched all the rows from the query result. Note that this behavior differs from the current implementation of the `pq` driver.
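The pattern looks roughly like this under database/sql (a sketch, not the exact TSBS code):

```go
package query

import "database/sql"

// consumeAll drains a result set; with the pgx driver this is what
// confirms the query has returned complete results.
func consumeAll(db *sql.DB, q string) (int, error) {
	rows, err := db.Query(q)
	if err != nil {
		return 0, err
	}
	defer rows.Close()
	n := 0
	for rows.Next() { // must iterate until false to fetch every row
		n++
	}
	return n, rows.Err() // reports any error that ended the iteration
}
```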
-
- 15 Jul, 2019 1 commit
-
-
Stephen Polcyn authored
Previously, the -n flag sent its data to the "max-queries" variable, which resulted in an unknown variable name when running the script, because the Python variable used to generate the run script is 'limit' (see line 163). "max-queries" is only applicable as a flag for the tsbs_run_queries script, i.e., "--max-queries=###".
-
- 17 Jun, 2019 1 commit
-
-
Ruslan Kovalov authored
This includes support for data generation and querying for the devops use case.
-
- 28 May, 2019 1 commit
-
-
Blagoj Atanasovski authored
The statProcessor responsible for gathering statistics when executing queries was built as a struct. This commit changes it to an interface to make the BenchmarkRunner code easier to test. This commit also adds some unit tests for the benchmark runner that check whether proper argument checks are done and whether proper initialization happens when the Run method is called.
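Sketched, with assumed method names (the real interface may differ):

```go
package query

type stat struct {
	label string
	value float64
}

// statProcessor as an interface lets tests substitute a fake that merely
// records calls, instead of standing up the real statistics pipeline.
type statProcessor interface {
	send(stats []*stat)   // collect measurements from a worker
	process(workers uint) // aggregate and report results
}
```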
-
- 24 May, 2019 1 commit
-
-
Ante Kresic authored
Covering query generation functions for the Influx, ClickHouse, and SiriDB databases. The tests cover basic pre-generated outputs and provide visual sanity checks. More robust tests are left as a future task.
-
- 22 May, 2019 3 commits
-
-
Rob Kiefer authored
This interface is not tied to the devops use case in any way, so its naming was a misnomer. It is actually generic and can be used for any use case, so this renaming reflects that.
-
Rob Kiefer authored
Previously the devops use case generation code used a call to log.Fatalf when something went wrong. This makes it awkward to test error conditions when generating queries from other packages, since we need a way to (a) replace the unexported call to log.Fatalf and (b) prevent the runtime from actually quitting. It is better for the library to actually return errors on calls that can fail, rather than either fataling or panicking. Now other packages can handle the errors themselves and also test error conditions in their packages as well. This refactor was pruned a bit to bubble the 'panic' up one level for now. When the actual generation code encounters the error during normal execution, it will panic. But these are easier to test for and don't require adding hooks to replace the 'fatal' path in the original package.
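In sketch form, the change is from fataling inside the library to returning the error (names are illustrative):

```go
package devops

import "fmt"

// Before, a failure inside query generation called log.Fatalf directly.
// After, the library returns the error and the caller (CLI, test, or
// another package) decides what to do with it.
func validateRange(start, end int) error {
	if start > end {
		return fmt.Errorf("invalid range: start (%d) is after end (%d)", start, end)
	}
	return nil
}
```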
-
Rob Kiefer authored
Query generation and Cassandra's query running both used a type called TimeInterval that did roughly the same thing. This change combines the two into one type that can be used from the utils package in internal/. This improves code reuse, keeps the two representations in sync, and increases the testability of the code.
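The shared type looks roughly like this (field and method names are assumptions, not the exact internal/utils API):

```go
package utils

import "time"

type TimeInterval struct {
	start, end time.Time
}

// Overlap reports whether two intervals intersect, the kind of logic
// that previously existed twice and could drift out of sync.
func (ti *TimeInterval) Overlap(other *TimeInterval) bool {
	return ti.start.Before(other.end) && other.start.Before(ti.end)
}
```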
-
- 20 May, 2019 1 commit
-
-
Ante Kresic authored
For now the tests mainly match the output against pre-generated/known outputs to ensure we have some coverage. More robust checking, e.g., making sure the semantics of the query are actually correct, is a future task.
-
- 25 Apr, 2019 1 commit
-
-
Lee Hampton authored
This fixes a bug where the PostCreateDB function would exit early when the user set --do-create-db=false and/or --create-metrics-table=False. This early exit caused TSBS to skip the updating of some global caches, which broke assumptions in other parts of the codebase. This commit also refactors the PostCreateDB function to split the parsing of columns and the potential creation of tables and indexes into separate functions. This makes it easier to test the functions in isolation and cleaner to create the conditional create-table logic that is at the heart of this bug. While this does add tests to the parsing function, the create tables/index function remains untested. This is left for a later PR that will hopefully clean up global state and provide a more comprehensive framework for testing IO.
-
- 18 Apr, 2019 1 commit
-
-
Rob Kiefer authored
Unlike libpq/sqlx, pgx expects JSON/B fields in the copy command to be in the 'native' format, which is a map[string]interface{}, not a string in valid JSON format. Without this change, the copy would fail with "ERROR: unsupported jsonb version number 123".
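The fix amounts to handing pgx the unmarshalled value rather than the JSON string (a simplified sketch of preparing a COPY value):

```go
package load

import "encoding/json"

// jsonbValue converts a serialized JSON document into the native
// map[string]interface{} form that pgx encodes as jsonb in a copy.
func jsonbValue(tagsJSON string) (interface{}, error) {
	var m map[string]interface{}
	if err := json.Unmarshal([]byte(tagsJSON), &m); err != nil {
		return nil, err
	}
	return m, nil
}
```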
-
- 09 Apr, 2019 1 commit
-
-
Rob Kiefer authored
This PR continues on the work in the previous one that changed tsbs_generate_data to use a new internal/inputs package. This PR adds a new Generator type for query generation called QueryGenerator. Now that these two generators share some common code, they both become much more robust and easier to test and manage. Previously tsbs_generate_queries had no test coverage, but with this change it will actually have quite high coverage.

There are still some rough spots with this refactor. In particular, how the useCaseMatrix is handled needs some more thought, especially if we are going to add more use cases going forward. Additionally, the database-specific flags like TimescaleUseJSON could probably be handled in a cleaner way as well.
-
- 04 Apr, 2019 1 commit
-
-
Rob Kiefer authored
For a long time, our two generation binaries -- tsbs_generate_data and tsbs_generate_queries -- have shared (roughly) a fair bit of code, especially when it comes to flags and validation. However, they were never truly in sync, and combining them has been a long-wanted to-do. Similarly, to enable better tooling around TSBS, it would be beneficial if more of its functionality was exposed as a library instead of a CLI that needs to be called. To those ends, this PR is a first step in addressing both. It introduces the internal/inputs package, which can eventually be moved to pkg/inputs when we are ready for other things to consume its API. This package will contain the structs, interfaces, and functions for generating 'input' to other TSBS tools. For now, that only includes generating data files (for tsbs_load_* binaries) and query files (for tsbs_run_queries_* binaries).

This PR starts by introducing these building blocks and converting tsbs_generate_data to use them. The idea is that each type of input (e.g., data, queries) is handled by a Generator, which is customized by a GeneratorConfig. The config contains fields such as the PRNG seed, the number of items to generate, etc., which are used by the Generator to control the output. These GeneratorConfigs come with a means of easily adding their fields to a flag.FlagSet, making them work well with CLIs while also not restricting their use to only CLIs. Once configured, a GeneratorConfig is passed to a Generator, which then produces the output.

This design has a few other nice features to help clean up TSBS. One, it uses an approach of bubbling up errors and passing them back to the caller, allowing for more graceful error handling. CLIs can output them to the console, while other programs using the library can pass them to another error handling system if they desire. Two, Generators should be designed with an Out field that allows the caller to point to any io.Writer it wants -- not just the console or a file.

The next step will be to convert tsbs_generate_queries to use this as well, which will be done in a follow-up PR.
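A condensed sketch of that shape (simplified relative to the real internal/inputs package):

```go
package inputs

import (
	"flag"
	"io"
)

type GeneratorConfig struct {
	Seed  int64
	Limit uint64
}

// AddToFlagSet exposes the config via CLI flags without tying it to a CLI.
func (c *GeneratorConfig) AddToFlagSet(fs *flag.FlagSet) {
	fs.Int64Var(&c.Seed, "seed", 0, "PRNG seed (0 uses the current time)")
	fs.Uint64Var(&c.Limit, "max-items", 0, "number of items to generate")
}

type Generator interface {
	// Generate writes its output to out and returns errors instead of
	// calling log.Fatal, so callers choose how to handle failure.
	Generate(c *GeneratorConfig, out io.Writer) error
}
```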
-
- 28 Mar, 2019 1 commit
-
-
niksa authored
Using binary format when talking to TimescaleDB means less data being sent back and forth. A config option is added to force TEXT format if needed (binary is the default). The PGX driver is used for binary and the PQ driver for TEXT. Based on some benchmarks, binary should increase write throughput by 5-10% and result in about 35% faster queries.
-
- 20 Mar, 2019 1 commit
-
-
Ante Kresic authored
The new name is on the longer side, but it makes a clear point not to confuse it with a similar flag used for specifying the benchmark database name.
-
- 15 Mar, 2019 1 commit
-
-
Rob Kiefer authored
The tags creation code was stored in main.go, while all the other database setup code was located in creator.go. This change moves the function there and also takes the idempotent step of dropping the tags table if it exists before trying to create it. This helps in scenarios where the database is already created and may have a tags table left over.
-
- 06 Mar, 2019 1 commit
-
-
Lee Hampton authored
Using the -pass and -port flags, users can now specify custom ports and passwords without having to override the connection string using the -postgres parameter.
-
- 02 Mar, 2019 1 commit
-
-
Rob Kiefer authored
These functions are easy to cover with unit tests, so let's get them in there.
-
- 01 Mar, 2019 1 commit
-
-
Rob Kiefer authored
The single-group-by queries SQL for TimescaleDB was missing a comma after the time grouping, creating an incorrect SQL statement. In addition, a sanity check is added to make sure there are other select clauses besides just the time grouping. This also includes minor nits:
- fix the alphabetical order in the README
- have the generate_queries script output a symlink per query
- add support for the -timescaledb-use-time-bucket flag to scripts
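One way to make this class of bug impossible, sketched in Go (clause contents are illustrative, not the actual generator code): build the SELECT list as a slice and join it, rather than concatenating strings.

```go
package timescaledb

import (
	"fmt"
	"strings"
)

func selectClause(metrics []string) string {
	cols := []string{"time_bucket('1 minute', time) AS minute"}
	for _, m := range metrics {
		cols = append(cols, fmt.Sprintf("max(%s) AS max_%s", m, m))
	}
	// strings.Join guarantees exactly one comma between clauses
	return "SELECT " + strings.Join(cols, ", ")
}
```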
-
- 26 Feb, 2019 4 commits
-
-
Kevin Bowling authored
-
niksa authored
There are some cases when the database needs to be configured/set up in a certain way (e.g., set up using some other tooling). In those cases you may only want to drop the metrics table, create a new one, and leave the rest of the database intact.
-
Anja Bruls authored
-
AnjaBruls authored
Add SiriDB database support (squashed commit). Includes the SiriDB serializer (with a bug fix and removal of a print method), data loading via the qpack package (replacing a local import of qpack), shell script and lint fixes, tests, documentation updates, support for adding a replica, and removal of influx_1.7.1_amd64.deb.
-
- 19 Feb, 2019 2 commits
-
-
lilvinz authored
Previously, the values returned from inserting tags assumed the data was stored in a table and used the first column as the key to the client-side index. However, for JSON that first column was a byte array representing the tags JSON, which created a mapping with the wrong keys. Now the JSON is unmarshalled first, and the value corresponding to the first tags column is used as the key.
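A simplified sketch of the fix (the parameter names are illustrative):

```go
package load

import "encoding/json"

// tagIndexKey unmarshals the returned JSON column and extracts the value
// of the first tags column to use as the client-side index key.
func tagIndexKey(returnedCol []byte, firstTagCol string) (string, error) {
	var tags map[string]interface{}
	if err := json.Unmarshal(returnedCol, &tags); err != nil {
		return "", err
	}
	key, _ := tags[firstTagCol].(string)
	return key, nil
}
```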
-
Rob Kiefer authored
To improve the testability of the tsbs_load_timescaledb binary and reduce the complexity of one of its most complex functions, we split the processing of the insertData slice into tags and data out into its own function. This separates a non-DB task from a DB task so that the former can be tested without needing an adequate mock for the latter.
-
- 13 Feb, 2019 2 commits
-
-
Rob Kiefer authored
Pseudo was incorrectly spelled as 'psuedo' and it was propagating everywhere. Also a test had 'ouput' instead of 'output'.
-
Rob Kiefer authored
Before, the generate_data and generate_queries scripts would leave behind broken gzip files if the underlying command failed. This then required the user to remove the file before trying again; otherwise the script would just detect the old (broken) file and skip the step. This commit adds a 'trap' to clean up any files when an error occurs during generation.
-