Logging generations with SQLite #2557

Closed
wants to merge 3 commits

Conversation

JohannesGaessler
Collaborator

I run a lot of benchmarks to test my implementations, but the process of manually running the program and copying the results from the console is kind of tedious. So I want to implement a way to automate benchmarking and transform the data into a human-readable format. This PR aims to do that: I added a very simple prototype for logging generations to a SQLite database, as well as two example scripts that let you run a benchmark and plot the results. Something like this should be the output after running benchmark.sh and plot_ts_per_ngl.py:

[attached image: benchmark plot produced by plot_ts_per_ngl.py]

I am willing to do a proper implementation for this but in order to avoid wasted effort please let me know:

  1. Is SQLite even desired? It would be my preferred way of logging results but something like JSON or YAML would also be an option.
  2. At what level should logging be integrated? It would be possible to bake it directly into ggml but my intuition is that this is not desirable and instead should be at the level of llama.cpp.
  3. How should compilation be handled? SQLite is public domain software so we could simply add a copy to the repository and I think this would be the easiest solution (currently you need to have it installed on your system).

@klosax
Contributor

klosax commented Aug 8, 2023

I really like the idea of good logging capabilities that could be used for testing and benchmarking purposes. But adding SQLite might be a bit of overkill; JSON, or even something as simple as writing to a CSV file, should be enough. Also, I think there should be a parameter to enable logging.

@Green-Sky
Collaborator

something as simple as writing to a csv file

Just wanted to add that SQLite can use CSV files as a backend :)

@slaren
Member

slaren commented Aug 9, 2023

A csv file would be more than enough; sqlite doesn't really add anything here. Instead of adding this to the already bloated main example, I think a new example that can perform a series of predefined benchmarks would be better. If it is simple enough to use, it may motivate others to run the benchmark and share the results.

@JohannesGaessler
Collaborator Author

A csv file would be more than enough; sqlite doesn't really add anything here.

Yes it does. SQLite has ACID transactions so the log file is not going to be corrupted if the program for some reason crashes mid-write or if multiple processes write to it simultaneously (which I would want to do when running batch jobs on my machine with 3x P40).

On the analysis side I would also be using something like SQLite regardless of the llama.cpp output format because SQL would allow me to query and process the data in the easiest way (e.g. for grouping data by git revision and quantization format and then calculating mean and standard deviation for each group).
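The kind of grouped query described above can be sketched with Python's built-in sqlite3 module. The schema and values here are made up for illustration, and since SQLite has no built-in standard-deviation aggregate, the variance is derived from AVG:

```python
import math
import sqlite3

# Hypothetical schema: one row per generation run (illustrative only).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE runs (git_rev TEXT, quant TEXT, ts REAL)")
con.executemany(
    "INSERT INTO runs VALUES (?, ?, ?)",
    [("abc123", "q4_0", 30.1), ("abc123", "q4_0", 29.9),
     ("abc123", "q8_0", 21.0), ("abc123", "q8_0", 23.0)],
)

# Group by git revision and quantization format, then compute the mean
# and the (population) variance per group; stddev is sqrt(variance).
rows = con.execute(
    "SELECT git_rev, quant, AVG(ts), AVG(ts*ts) - AVG(ts)*AVG(ts) "
    "FROM runs GROUP BY git_rev, quant ORDER BY git_rev, quant"
).fetchall()
for git_rev, quant, mean, var in rows:
    print(git_rev, quant, round(mean, 2), round(math.sqrt(max(var, 0.0)), 2))
```

The same GROUP BY works unchanged whether the data was logged directly to SQLite or imported from a CSV first.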

Instead of adding this to the already bloated main example, a new example that can perform a series of predefined benchmarks I think would be better.

That's not what I want or need. I don't want a binary that runs a pre-defined benchmark. I want the ability to write bash scripts that only need to call the pre-existing binaries with arbitrary parameters and to then run another script that analyzes the result (a dedicated benchmark binary would not be mutually exclusive with this though).

More generally, while I put the code for the prototype in the main example, this is not how I would do it in the actual implementation. I would instead put it into llama.cpp and add corresponding calls to main, perplexity, and server.

@slaren
Member

slaren commented Aug 9, 2023

The program is not going to crash during a write. For multiple processes, you can output to a different file for each. You can use whatever you want for the analysis, if you like SQL convert the csv to sqlite. If you want something specifically tailored to what you need and don't care about doing it in a way that may be useful to other people, then maybe there is no need to merge it.

@JohannesGaessler
Collaborator Author

The program is not going to crash during a write.

How do you know? Even if there is no bug in llama.cpp, there could always be, e.g., a power outage. For mere logs that's probably not too much of a problem, but the risk is not zero.

If you want something specifically tailored to what you need and don't care about doing it in a way that may be useful to other people, then maybe there is no need to merge it.

I'm fine with just doing a quick and dirty implementation for myself but I would argue that something like this is useful for anyone with even rudimentary scripting skills.

@slaren
Member

slaren commented Aug 9, 2023

The worst that can happen is that you may have to delete the last line of the csv; that's just not a good reason to add a dependency to the C/C++ code. Maybe if it was isolated to one specific example only.

I am not entirely sure what API you want to add to llama.cpp. I think that something that returns more detailed timings would be great, but a function that writes a sqlite or even a csv file is too high level and not generally useful enough to add to the library.

@JohannesGaessler
Collaborator Author

@ggerganov as the authority regarding llama.cpp scope, can you weigh in?

@SlyEcho
Collaborator

SlyEcho commented Aug 9, 2023

A database would be useful on the server. Saving and retrieving context data for multiple parallel clients. And maybe, in the future, chat sessions.

@ggerganov
Member

Benchmarking tools and examples are welcome, but an SQLite dependency in main or llama.cpp is not desirable.

An alternative approach is to have a tool, say to-sqlite, and you use it like this for example:

./main [some args] | to-sqlite mydb.sql [etc ..]

The to-sqlite tool can be a Python script, a C++ program, etc., and it will parse the text output from main to extract the necessary info. main can be extended to output more parse-able info if necessary.
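A sketch of what such a to-sqlite filter could look like as a Python script. The "eval time" pattern matched here is an assumption about main's timing output and would need to track the actual format:

```python
import re
import sqlite3

# Hypothetical to-sqlite filter: scan main's console output for timing
# lines and append the extracted values to a SQLite table.
PATTERN = re.compile(r"eval time =\s*([0-9.]+) ms /\s*(\d+) runs")

def ingest(lines, db_path=":memory:"):
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS timings (ms REAL, runs INTEGER)")
    for line in lines:
        m = PATTERN.search(line)
        if m:
            con.execute("INSERT INTO timings VALUES (?, ?)",
                        (float(m.group(1)), int(m.group(2))))
    con.commit()
    return con

# Demo on canned lines; a real filter would read sys.stdin instead.
sample = [
    "llama_print_timings: eval time =  1234.56 ms /   100 runs",
    "some unrelated output line",
]
con = ingest(sample)
print(con.execute("SELECT ms, runs FROM timings").fetchall())
```

In actual use the script would be invoked at the end of a pipe, passing `sys.stdin` to `ingest` and a real database path instead of `:memory:`.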

@JohannesGaessler
Collaborator Author

I've been thinking about parsing stdout/stderr but I don't think it would be a good solution. The first issue is that parsing text that is designed to be human-readable is just very fragile. The second issue is that pipes are a feature that only exists on Unix-like systems. I only ever boot up Windows once every few months but I would like to do some Windows vs. Linux testing if possible.

How about this: a command line option like --yaml <directory> that creates YAML files in said directory, one for each run of the program. The YAML syntax is extremely simple so it wouldn't take too much effort to write short pieces of code that dump the various structs/objects that llama.cpp uses as YAML and then you'd only need to write that to a file.

@SlyEcho
Collaborator

SlyEcho commented Aug 14, 2023

I did some experiments with logging JSON files: one object per line. You can do it with fprintf, no library needed. It's pretty easy to parse, even with command-line tools like jq.
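Reading such a one-object-per-line log back needs nothing more than the json standard library; the field names here are illustrative:

```python
import json

# JSONL: one self-contained JSON object per line, written with fprintf
# on the C side; parsing it back is a one-liner per line.
log = '{"ne0": 4096, "time": 120}\n{"ne0": 4096, "time": 118}\n'
records = [json.loads(line) for line in log.splitlines() if line.strip()]
print(len(records), sum(r["time"] for r in records))  # → 2 238
```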

@ggerganov
Member

How about this: a command line option like --yaml <directory> that creates YAML files in said directory, one for each run of the program. The YAML syntax is extremely simple so it wouldn't take too much effort to write short pieces of code that dump the various structs/objects that llama.cpp uses as YAML and then you'd only need to write that to a file.

Yes, this can work. The main requirements are to avoid 3rd-party libs and implement something simple.

@JohannesGaessler
Collaborator Author

JSON would also work, with the caveat that unlike YAML the data structures need explicit beginnings and endings. With YAML you could just dump a bunch of <key>: <value> lines completely independently of one another and be done. In any case, reading data always requires more work than writing it, so I would be fine with either format as long as we use a commonly used one that already has library support.
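The difference can be illustrated in a few lines of Python (the keys are illustrative, not llama.cpp's actual fields):

```python
import json

# YAML-style key/value lines can be appended one at a time, in any
# order, with no opening or closing delimiters:
yaml_lines = ["model: llama-7b", "n_gpu_layers: 35"]
parsed = dict(line.split(": ", 1) for line in yaml_lines)

# The JSON equivalent has to be emitted as one complete object:
as_json = json.dumps({"model": "llama-7b", "n_gpu_layers": 35})
print(parsed["model"], json.loads(as_json)["n_gpu_layers"])  # → llama-7b 35
```

Note the naive `split(": ", 1)` only handles flat scalar values; real YAML parsing would use a library.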

@slaren
Member

slaren commented Aug 14, 2023

JSON may have the advantage that it is more widely supported than YAML. For example, the Python library pandas can read JSON files directly, but not YAML. For tabular data, I think that csv would be perfectly adequate too.
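For instance, the JSONL form loads straight into a pandas DataFrame with `lines=True` (field names here are illustrative):

```python
import io
import pandas as pd

# One JSON object per line loads directly with lines=True; no
# conversion step is needed before tabular analysis.
jsonl = '{"quant": "q4_0", "ts": 30.1}\n{"quant": "q8_0", "ts": 21.0}\n'
df = pd.read_json(io.StringIO(jsonl), lines=True)
print(df.groupby("quant")["ts"].mean().to_dict())
```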

Regardless of how this is implemented, I still think that it is important to have a standardized way to benchmark llama.cpp, and in my opinion the best way to do that is with a new example specifically for this purpose, so I am going to work on that. I don't think that there is a lot of overlap with what you want to do here, though.

@SlyEcho
Collaborator

SlyEcho commented Aug 14, 2023

I found my old patch and it looks like this:

@@ -12168,6 +12184,20 @@ void ggml_graph_compute(struct ggml_context * ctx, struct ggml_cgraph * cgraph)
             node->perf_runs++;
             node->perf_cycles  += perf_cycles_cur;
             node->perf_time_us += perf_time_us_cur;
+
+            if (node->op == GGML_OP_MUL_MAT) {
+                struct ggml_tensor *dst = node, *src0 = node->src0, *src1 = node->src1;
+                fprintf(timelog, "{"
+                    "\"ne0\":%ld,\"ne1\":%ld,\"ne2\":%ld,\"ne3\":%ld,\"d\":\"%s\","
+                    "\"ne00\":%ld,\"ne01\":%ld,\"ne02\":%ld,\"ne03\":%ld,\"d0\":\"%s\","
+                    "\"ne10\":%ld,\"ne11\":%ld,\"ne12\":%ld,\"ne13\":%ld,\"d1\":\"%s\","
+                    "\"time\":%ld"
+                "}\n",
+                dst->ne[0],  dst->ne[1],  dst->ne[2],  dst->ne[3], GGML_TYPE_NAME[ dst->type],
+                src0->ne[0], src0->ne[1], src0->ne[2], src0->ne[3], GGML_TYPE_NAME[src0->type],
+                src1->ne[0], src1->ne[1], src1->ne[2], src1->ne[3], GGML_TYPE_NAME[src1->type],
+                perf_time_us_cur);
+            }
         }
     }

I was trying to see how different sized matmuls were taking time, anyway I didn't develop this idea much further.

But you can see that it doesn't really need explicit beginnings and endings aside from the { }.
The file doesn't have to be a single array either, thanks to the JSONL convention of one JSON object per line.

@cebtenzzre
Collaborator

pipes are a feature that only exists on Unix-like systems

Windows has pipes and redirection too...

@JohannesGaessler
Collaborator Author

@slaren I've started working on logging runs. Among other things I am logging all user inputs and I plan to add a simple Python script like run_with_preset.py that will read in the log file and invoke the binary that produced it with the same inputs. This could also be used to define standardized tests: you would only need to write a simple config file that defines the relevant parameters and invoke the Python script.
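A minimal sketch of the replay idea, assuming the log stores the user inputs as key/value pairs. The parameter names and the key-to-flag mapping here are hypothetical, not the actual run_with_preset.py:

```python
import shlex

# Hypothetical replay helper: rebuild the command line that produced a
# logged run from its recorded key/value parameters.
def build_argv(binary, params):
    argv = [binary]
    for key, value in params.items():
        argv.append("--" + key.replace("_", "-"))  # assumed flag convention
        argv.append(str(value))
    return argv

logged = {"model": "models/7B/ggml-model-q4_0.bin", "n_gpu_layers": 35}
argv = build_argv("./main", logged)
print(shlex.join(argv))
# → ./main --model models/7B/ggml-model-q4_0.bin --n-gpu-layers 35
```

A standardized test would then just be a config file of such key/value pairs that the script feeds to `build_argv` and executes with subprocess.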

@slaren
Member

slaren commented Aug 15, 2023

I have also started working on an example to run benchmarks, I will open a PR later with more details. I think both options can coexist, there is value in recording the runs of main and other examples, but I think it is also good to have an example that can run benchmarks directly.

@slaren
Member

slaren commented Aug 15, 2023

I have opened PR #2626 with the benchmark example. It's still quite bare-bones, but it should be enough to show what I have in mind. I imagine that it is not exactly what you need, but let me know if there is anything that could be added to make it more useful for your case.

@JohannesGaessler
Collaborator Author

Superseded by #2657

7 participants