annotate contrib/gmaggregate/README.md @ 5584:7ed9e32706d0 surveysperbottleneckid

Merged default
author Sascha Wilde <wilde@sha-bang.de>
date Fri, 01 Apr 2022 16:47:53 +0200
parents 02c2d0edeb2a
children
5548
02c2d0edeb2a Added gmaggregate tool as contrib.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
# gmaggregate

*Attention:* This is a copy of [gmaggregate](https://heptapod.host/intevation/gemma/gmaggregate).

A log message transformation tool for gauge measurement (gm) imports
in the gemma server.

We noticed that the logging of the gm imports itself produces a lot
of data because it is very verbose and redundant. As a result,
over 99% of the log messages of all imports in the gemma server
stem from the gm imports: hundreds of millions of log lines.

The logging itself is now done in a more compact and aggregated
way, which also makes the logs more readable.
To get rid of the repetitive old log entries without losing
information, these logs have to be aggregated in the same way.

Normally, we use SQL or PL/pgSQL scripts for this kind of migration.
We had such a script for this task, but first experiments
showed that its run time would only be acceptable
for small data sets, not for the data sets with millions of log lines
in the production system. It also had little to no potential
for significant improvement. Therefore we rewrote the tool in Go.

## Build

You need a working Go build environment (tested successfully with 1.17).

```shell
hg clone https://heptapod.host/intevation/gemma/gmaggregate
cd gmaggregate
go build
```

Place the resulting `gmaggregate` binary into the `PATH` of your
database server. The `postgres` user needs permission to execute it.

If you've modified the expressions in [matcher.rl](matcher.rl), you need
an installation of the [Ragel](http://www.colm.net/open-source/ragel/) FSM compiler.
Compile the modified sources with:

```shell
go generate
go build
```

## Usage

`gmaggregate` works in two phases: **filter** and **transfer**.
The **filter** phase creates a new table in the database in which
the aggregated logs of the gm imports are stored. In this
phase the original logs are __not__ modified. The modifications
are done in the **transfer** phase. In this phase the original
log lines associated with the gm imports that led to the entries
in the table created in the first phase are removed from the
database. All other log lines are not touched. After the deletion
of the old lines the content of the new table is copied back into
the log table and the new table is dropped. All operations in the
transfer phase are encapsulated within a transaction so that
no harm is done if the execution fails.

By default, `gmaggregate` runs the two phases **filter** and **transfer**
one right after the other. If you want to run them
separately by hand, you can do this with the `-phases`
flag.
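
To make the order of operations in the **transfer** phase more concrete, here is a minimal Go sketch of such a step. It is not the actual `gmaggregate` code. The name of the aggregated table (`import.gm_logs_aggregated`), the `import_id` column and the assumption that both tables share the same column layout are hypothetical and only serve the illustration.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// transferStatements returns the SQL statements of a transfer-like step
// in execution order. The import_id column and the identical column
// layout of both tables are assumptions made for this sketch.
func transferStatements(logTable, aggTable string) []string {
	return []string{
		// 1. Remove the original log lines of all imports that were aggregated.
		fmt.Sprintf("DELETE FROM %s WHERE import_id IN (SELECT DISTINCT import_id FROM %s)",
			logTable, aggTable),
		// 2. Copy the aggregated lines back into the log table.
		fmt.Sprintf("INSERT INTO %s SELECT * FROM %s", logTable, aggTable),
		// 3. Drop the table created in the filter phase.
		fmt.Sprintf("DROP TABLE %s", aggTable),
	}
}

// transfer runs all statements inside a single transaction, so a failure
// in any statement leaves the log table unchanged.
func transfer(ctx context.Context, db *sql.DB, stmts []string) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	// After a successful Commit this Rollback does nothing harmful;
	// it only returns sql.ErrTxDone, which is ignored here.
	defer tx.Rollback()
	for _, stmt := range stmts {
		if _, err := tx.ExecContext(ctx, stmt); err != nil {
			return fmt.Errorf("transfer failed at %q: %w", stmt, err)
		}
	}
	return tx.Commit()
}

func main() {
	// Print the statements instead of executing them; the aggregated
	// table name is hypothetical.
	for _, stmt := range transferStatements("import.import_logs", "import.gm_logs_aggregated") {
		fmt.Println(stmt)
	}
}
```

Because PostgreSQL treats `DROP TABLE` as transactional, the drop can be part of the same transaction as the delete and the copy, so either all three steps take effect or none does.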

> **CSV export**: For debugging purposes `gmaggregate` supports
> exporting the aggregated log lines as a CSV file. Use the
> `-c` flag to specify the file to write it to. When running
> the CSV export, the new table in the database is not created,
> so the transfer phase will fail. Therefore you should use
> the CSV export together with the `-phase=filter` flag.

For more options see `gmaggregate --help`.

> Tip: After running the `gmaggregate` migration you should consider running
> `VACUUM FULL; CLUSTER import.import_logs USING import_logs_import_id;`
> from a `psql` shell on the database to recover the space
> used by the original log lines and to physically order the data
> in a way corresponding to the process of logging.

## License

This is Free Software covered by the terms of the
GNU Affero General Public License version 3.0 or later.
See [AGPL-3.0.txt](../../LICENSES/AGPL-3.0.txt) for details.