contrib/gmaggregate/README.md @ 5638:1709a00f8e30

changeset: Make --openssl-legacy-provider option to node dependent on node version.
author: Sascha Wilde <wilde@sha-bang.de>
date: Thu, 29 Jun 2023 13:17:06 +0200
parent: 02c2d0edeb2a
# gmaggregate

*Attention:* This is a copy of
[gmaggregate](https://heptapod.host/intevation/gemma/gmaggregate).

A log message transformation tool for gauge measurement (gm) imports in
the gemma server.

We recognized that the logging of the gm imports itself produces a lot
of data by being very verbose and redundant. This has led to the fact
that over 99% of the log messages of all imports in the gemma server
stem from the gm imports ... hundreds of millions of log lines. The
logging is now done in a more compact and aggregated way, which also
increases the readability of the logs. To get rid of the repetitive old
log entries without losing information, these logs have to be
aggregated in the same way, too.

Normally, we use SQL or PL/pgSQL scripts for this kind of migration. We
had such a version for this task, but first experiments showed that its
run time would only be acceptable for small data sets, not for the
multi-million-row data sets of the production system. It also had no to
very little potential to be significantly improved. Therefore we
re-crafted this tool in Go.

## Build

You need a working Go build environment (tested successfully with 1.17).

```shell
hg clone https://heptapod.host/intevation/gemma/gmaggregate
cd gmaggregate
go build
```

Place the resulting `gmaggregate` binary into the `PATH` of your
database server. It needs execution rights for the `postgres` user.

If you've modified the expressions in [matcher.rl](matcher.rl) you need
an installation of the [Ragel](http://www.colm.net/open-source/ragel/)
FSM compiler. Compile the modified sources with:

```shell
go generate
go build
```

## Usage

`gmaggregate` works in two phases: **filter** and **transfer**.

The **filter** phase creates a new table in the database in which the
aggregated logs of the gm import logs are stored. In this phase the
original logs are __not__ modified. The modifications are done in the
**transfer** phase.
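The aggregation idea can be sketched in Go as follows. This is only an
illustration of collapsing runs of identical messages into one line with
a repetition count; the tool's actual matching is generated with Ragel
from [matcher.rl](matcher.rl) and operates on the database tables, and
the `logLine` type and `aggregate` function here are hypothetical:

```go
package main

import "fmt"

// logLine is a minimal stand-in for a gm import log entry.
type logLine struct {
	msg string
}

// aggregate collapses each run of identical consecutive messages
// into a single line carrying a repetition count.
func aggregate(lines []logLine) []string {
	var out []string
	for i := 0; i < len(lines); {
		// Find the end of the run of messages equal to lines[i].
		j := i
		for j < len(lines) && lines[j].msg == lines[i].msg {
			j++
		}
		if n := j - i; n > 1 {
			out = append(out, fmt.Sprintf("%s (repeated %d times)", lines[i].msg, n))
		} else {
			out = append(out, lines[i].msg)
		}
		i = j
	}
	return out
}

func main() {
	in := []logLine{
		{"measurement stored"}, {"measurement stored"},
		{"measurement stored"}, {"import finished"},
	}
	for _, l := range aggregate(in) {
		fmt.Println(l)
	}
	// Prints:
	// measurement stored (repeated 3 times)
	// import finished
}
```

Aggregating this way keeps the full information (which message, how
often) while shrinking hundreds of repeated lines to one.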
In the **transfer** phase the original log lines which are associated
with the gm imports leading to the entries in the table created in the
first phase are removed from the database. All other log lines are not
touched. After the deletion of the old lines the content of the new
table is copied back into the log table and the new table is dropped.
All operations in the transfer phase are encapsulated within a
transaction, so that no harm is done if the execution fails.

`gmaggregate` runs the two phases **filter** and **transfer** one right
after the other. If you want to run them separately by hand you can do
this with the `-phase` flag.

> **CSV export**: For debugging purposes `gmaggregate` supports
> exporting the aggregated log lines as a CSV file. Use the
> `-c` flag to specify the file to write it to. When running
> the CSV export the new table in the database is not created,
> so the transfer phase will fail. Therefore you should use
> the CSV export together with the `-phase=filter` flag.

For more options see `gmaggregate --help`.

> Tip: After running the `gmaggregate` migration you should consider running
> `VACUUM FULL; CLUSTER import.import_logs USING import_logs_import_id;`
> from a `psql` shell on the database to recover the space
> used by the original log lines and to physically order the data
> in a way corresponding to the process of logging.

## License

This is Free Software covered by the terms of the
GNU Affero General Public License version 3.0 or later.
See [AGPL-3.0.txt](../../LICENSES/AGPL-3.0.txt) for details.