Logging generations with SQLite #2557

Closed
wants to merge 3 commits

Conversation

JohannesGaessler
Collaborator

I run a lot of benchmarks to test my implementations, but the process of manually running the program and copying the results from the console is kind of tedious. So I want to implement a way to automate benchmarking and transform the data into a human-readable format. This PR aims to do that: I added a very simple prototype for logging generations to a SQLite database, as well as two example scripts that let you run a benchmark and plot the results. Something like this should be the output after running benchmark.sh and plot_ts_per_ngl.py:

[attached image: benchmark plot produced by plot_ts_per_ngl.py]

I am willing to do a proper implementation for this but in order to avoid wasted effort please let me know:

  1. Is SQLite even desired? It would be my preferred way of logging results but something like JSON or YAML would also be an option.
  2. At what level should logging be integrated? It would be possible to bake it directly into ggml but my intuition is that this is not desirable and instead should be at the level of llama.cpp.
  3. How should compilation be handled? SQLite is public domain software so we could simply add a copy to the repository and I think this would be the easiest solution (currently you need to have it installed on your system).

@klosax
Contributor

klosax commented Aug 8, 2023

I really like the idea of good logging capabilities that could be used for testing and benchmarking purposes. But adding SQLite might be a bit of overkill; JSON, or even something as simple as writing to a CSV file, should be enough. Also, I think there should be a parameter to enable logging.

@Green-Sky
Collaborator

something as simple as writing to a csv file

Just wanted to add that SQLite can use CSV files as a backend :)

@slaren
Member

slaren commented Aug 9, 2023

A csv file would be more than enough; sqlite doesn't really add anything here. Instead of adding this to the already bloated main example, I think a new example that can perform a series of predefined benchmarks would be better. If it is simple enough to use, it may motivate others to run the benchmark and share the results.

@JohannesGaessler
Collaborator Author

A csv file would be more than enough; sqlite doesn't really add anything here.

Yes it does. SQLite has ACID transactions so the log file is not going to be corrupted if the program for some reason crashes mid-write or if multiple processes write to it simultaneously (which I would want to do when running batch jobs on my machine with 3x P40).

On the analysis side I would also be using something like SQLite regardless of the llama.cpp output format because SQL would allow me to query and process the data in the easiest way (e.g. for grouping data by git revision and quantization format and then calculating mean and standard deviation for each group).
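The kind of grouped query described above can be sketched with Python's built-in sqlite3 module. The schema and values here are made up for illustration, and since SQLite has no built-in standard-deviation aggregate, the variance is derived from AVG:

```python
import math
import sqlite3

# Hypothetical schema: one row per generation run (illustrative only).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE runs (git_rev TEXT, quant TEXT, ts REAL)")
con.executemany(
    "INSERT INTO runs VALUES (?, ?, ?)",
    [("abc123", "q4_0", 30.1), ("abc123", "q4_0", 29.9),
     ("abc123", "q8_0", 21.0), ("abc123", "q8_0", 23.0)],
)

# Group by git revision and quantization format, then compute the mean
# and the (population) variance per group; stddev is sqrt(variance).
rows = con.execute(
    "SELECT git_rev, quant, AVG(ts), AVG(ts*ts) - AVG(ts)*AVG(ts) "
    "FROM runs GROUP BY git_rev, quant ORDER BY git_rev, quant"
).fetchall()
for git_rev, quant, mean, var in rows:
    print(git_rev, quant, round(mean, 2), round(math.sqrt(max(var, 0.0)), 2))
```

The same GROUP BY works unchanged whether the data was logged directly to SQLite or imported from a CSV first.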

Instead of adding this to the already bloated main example, a new example that can perform a series of predefined benchmarks I think would be better.

That's not what I want or need. I don't want a binary that runs a pre-defined benchmark. I want the ability to write bash scripts that only need to call the pre-existing binaries with arbitrary parameters and to then run another script that analyzes the result (a dedicated benchmark binary would not be mutually exclusive with this though).

More generally, while I put the code for the prototype in the main example, this is not how I would do it in the actual implementation. I would instead put it into llama.cpp and add corresponding calls to main, perplexity, and server.

@slaren
Member

slaren commented Aug 9, 2023

The program is not going to crash during a write. For multiple processes, you can output to a different file for each. You can use whatever you want for the analysis, if you like SQL convert the csv to sqlite. If you want something specifically tailored to what you need and don't care about doing it in a way that may be useful to other people, then maybe there is no need to merge it.

@JohannesGaessler
Collaborator Author

The program is not going to crash during a write.

How do you know? Even if there is no bug in llama.cpp, there could always be, e.g., a power outage. For mere logs that's probably not too much of a problem, but the risk is not zero.

If you want something specifically tailored to what you need and don't care about doing it in a way that may be useful to other people, then maybe there is no need to merge it.

I'm fine with just doing a quick and dirty implementation for myself but I would argue that something like this is useful for anyone with even rudimentary scripting skills.

@slaren
Member

slaren commented Aug 9, 2023

The worst that can happen is that you may have to delete the last line of the csv; that's just not a good reason to add a dependency to the C/C++ code. Maybe if it was isolated to one specific example only.

I am not entirely sure what API you want to add to llama.cpp. I think that something that returns more detailed timings would be great, but a function that writes a sqlite or even a csv file is too high level and not generally useful enough to add to the library.

@JohannesGaessler
Collaborator Author

@ggerganov as the authority regarding llama.cpp scope, can you weigh in?

@SlyEcho
Collaborator

SlyEcho commented Aug 9, 2023

A database would be useful on the server. Saving and retrieving context data for multiple parallel clients. And maybe, in the future, chat sessions.

@ggerganov
Member

Benchmarking tools and examples are welcome, but an SQLite dependency in main or llama.cpp is not desirable.

An alternative approach is to have a tool, say to-sqlite, and you use it like this for example:

./main [some args] | to-sqlite mydb.sql [etc ..]

The to-sqlite tool can be a Python script, a C++ program, etc., and it will parse the text output from main to extract the necessary info. main can be extended to output more parse-able info if necessary.
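A sketch of what such a to-sqlite filter could look like as a Python script. The "eval time" pattern matched here is an assumption about main's timing output and would need to track the actual format:

```python
import re
import sqlite3

# Hypothetical to-sqlite filter: scan main's console output for timing
# lines and append the extracted values to a SQLite table.
PATTERN = re.compile(r"eval time =\s*([0-9.]+) ms /\s*(\d+) runs")

def ingest(lines, db_path=":memory:"):
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS timings (ms REAL, runs INTEGER)")
    for line in lines:
        m = PATTERN.search(line)
        if m:
            con.execute("INSERT INTO timings VALUES (?, ?)",
                        (float(m.group(1)), int(m.group(2))))
    con.commit()
    return con

# Demo on canned lines; a real filter would read sys.stdin instead.
sample = [
    "llama_print_timings: eval time =  1234.56 ms /   100 runs",
    "some unrelated output line",
]
con = ingest(sample)
print(con.execute("SELECT ms, runs FROM timings").fetchall())
```

In actual use the script would be invoked at the end of a pipe, passing `sys.stdin` to `ingest` and a real database path instead of `:memory:`.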

@JohannesGaessler
Collaborator Author

I've been thinking about parsing stdout/stderr but I don't think it would be a good solution. The first issue is that parsing text that is designed to be human-readable is just very fragile. The second issue is that pipes are a feature that only exists on Unix-like systems. I only ever boot up Windows once every few months but I would like to do some Windows vs. Linux testing if possible.

How about this: a command line option like --yaml <directory> that creates YAML files in said directory, one for each run of the program. The YAML syntax is extremely simple so it wouldn't take too much effort to write short pieces of code that dump the various structs/objects that llama.cpp uses as YAML and then you'd only need to write that to a file.

@SlyEcho
Collaborator

SlyEcho commented Aug 14, 2023

I did some experiments with logging JSON files: one object per line. You can do it with fprintf, no library needed. It's pretty easy to parse, even with command-line tools like jq.
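Reading such a one-object-per-line log back needs nothing more than the json standard library; the field names here are illustrative:

```python
import json

# JSONL: one self-contained JSON object per line, written with fprintf
# on the C side; parsing it back is a one-liner per line.
log = '{"ne0": 4096, "time": 120}\n{"ne0": 4096, "time": 118}\n'
records = [json.loads(line) for line in log.splitlines() if line.strip()]
print(len(records), sum(r["time"] for r in records))  # → 2 238
```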

@ggerganov
Member

How about this: a command line option like --yaml <directory> that creates YAML files in said directory, one for each run of the program. The YAML syntax is extremely simple so it wouldn't take too much effort to write short pieces of code that dump the various structs/objects that llama.cpp uses as YAML and then you'd only need to write that to a file.

Yes, this can work. The main requirements are to avoid 3rd-party libs and implement something simple.

@JohannesGaessler
Collaborator Author

JSON would also work, with the caveat that unlike YAML the data structures need explicit beginnings and endings. With YAML you could just dump a bunch of <key>: <value> lines completely independently of one another and be done. In any case, reading data always requires more work than writing it, so I would be fine with either format as long as we use a commonly used one that already has library support.
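The difference can be illustrated in a few lines of Python (the keys are illustrative, not llama.cpp's actual fields):

```python
import json

# YAML-style key/value lines can be appended one at a time, in any
# order, with no opening or closing delimiters:
yaml_lines = ["model: llama-7b", "n_gpu_layers: 35"]
parsed = dict(line.split(": ", 1) for line in yaml_lines)

# The JSON equivalent has to be emitted as one complete object:
as_json = json.dumps({"model": "llama-7b", "n_gpu_layers": 35})
print(parsed["model"], json.loads(as_json)["n_gpu_layers"])  # → llama-7b 35
```

Note the naive `split(": ", 1)` only handles flat scalar values; real YAML parsing would use a library.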

@slaren
Member

slaren commented Aug 14, 2023

JSON may have the advantage that it is more widely supported than YAML. For example, the Python library pandas can read JSON files directly, but not YAML. For tabular data, I think that csv would be perfectly adequate too.
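For instance, the JSONL form loads straight into a pandas DataFrame with `lines=True` (field names here are illustrative):

```python
import io
import pandas as pd

# One JSON object per line loads directly with lines=True; no
# conversion step is needed before tabular analysis.
jsonl = '{"quant": "q4_0", "ts": 30.1}\n{"quant": "q8_0", "ts": 21.0}\n'
df = pd.read_json(io.StringIO(jsonl), lines=True)
print(df.groupby("quant")["ts"].mean().to_dict())
```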

Regardless of how this is implemented, I still think that it is important to have a standardized way to benchmark llama.cpp, and in my opinion the best way to do that is with a new example specifically for this purpose, so I am going to work on that. I don't think that there is a lot of overlap with what you want to do here, though.

@SlyEcho
Collaborator

SlyEcho commented Aug 14, 2023

I found my old patch and it looks like this:

@@ -12168,6 +12184,20 @@ void ggml_graph_compute(struct ggml_context * ctx, struct ggml_cgraph * cgraph)
             node->perf_runs++;
             node->perf_cycles  += perf_cycles_cur;
             node->perf_time_us += perf_time_us_cur;
+
+            if (node->op == GGML_OP_MUL_MAT) {
+                struct ggml_tensor *dst = node, *src0 = node->src0, *src1 = node->src1;
+                fprintf(timelog, "{"
+                    "\"ne0\":%ld,\"ne1\":%ld,\"ne2\":%ld,\"ne3\":%ld,\"d\":\"%s\","
+                    "\"ne00\":%ld,\"ne01\":%ld,\"ne02\":%ld,\"ne03\":%ld,\"d0\":\"%s\","
+                    "\"ne10\":%ld,\"ne11\":%ld,\"ne12\":%ld,\"ne13\":%ld,\"d1\":\"%s\","
+                    "\"time\":%ld"
+                "}\n",
+                dst->ne[0],  dst->ne[1],  dst->ne[2],  dst->ne[3], GGML_TYPE_NAME[ dst->type],
+                src0->ne[0], src0->ne[1], src0->ne[2], src0->ne[3], GGML_TYPE_NAME[src0->type],
+                src1->ne[0], src1->ne[1], src1->ne[2], src1->ne[3], GGML_TYPE_NAME[src1->type],
+                perf_time_us_cur);
+            }
         }
     }

I was trying to see how different sized matmuls were taking time, anyway I didn't develop this idea much further.

But you can see that it doesn't really need explicit beginnings and endings aside from the { }.
The file doesn't have to be a single array either, thanks to the JSONL convention of one JSON object per line.

@cebtenzzre
Collaborator

pipes are a feature that only exists on Unix-like systems

Windows has pipes and redirection too...

@JohannesGaessler
Collaborator Author

@slaren I've started working on logging runs. Among other things I am logging all user inputs and I plan to add a simple Python script like run_with_preset.py that will read in the log file and invoke the binary that produced it with the same inputs. This could also be used to define standardized tests: you would only need to write a simple config file that defines the relevant parameters and invoke the Python script.
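A minimal sketch of the replay idea, assuming the log stores the user inputs as key/value pairs. The parameter names and the key-to-flag mapping here are hypothetical, not the actual run_with_preset.py:

```python
import shlex

# Hypothetical replay helper: rebuild the command line that produced a
# logged run from its recorded key/value parameters.
def build_argv(binary, params):
    argv = [binary]
    for key, value in params.items():
        argv.append("--" + key.replace("_", "-"))  # assumed flag convention
        argv.append(str(value))
    return argv

logged = {"model": "models/7B/ggml-model-q4_0.bin", "n_gpu_layers": 35}
argv = build_argv("./main", logged)
print(shlex.join(argv))
# → ./main --model models/7B/ggml-model-q4_0.bin --n-gpu-layers 35
```

A standardized test would then just be a config file of such key/value pairs that the script feeds to `build_argv` and executes with subprocess.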

@slaren
Member

slaren commented Aug 15, 2023

I have also started working on an example to run benchmarks, I will open a PR later with more details. I think both options can coexist, there is value in recording the runs of main and other examples, but I think it is also good to have an example that can run benchmarks directly.

@slaren
Member

slaren commented Aug 15, 2023

I have opened PR #2626 with the benchmark example. It's still quite bare-bones, but it should be enough to show what I have in mind. I imagine that it is not exactly what you need, but let me know if there is anything that could be added to make it more useful for your case.

@JohannesGaessler
Collaborator Author

Superseded by #2657

7 participants