Logging generations with SQLite #2557
I really like the idea of good logging capabilities that could be used for testing and benchmarking purposes. But adding SQLite is maybe a bit of overkill; JSON, or even something as simple as writing to a CSV file, should be enough. Also, I think there should be a parameter to enable logging.
Just wanted to add that SQLite can use CSV files as a backend :)
A CSV file would be more than enough; SQLite doesn't really add anything here. Instead of adding this to the already bloated main example, I think a new example that can perform a series of predefined benchmarks would be better. If it is simple enough to use, it may motivate others to run the benchmarks and share the results.
Yes it does. SQLite has ACID transactions, so the log file is not going to be corrupted if the program for some reason crashes mid-write, or if multiple processes write to it simultaneously (which I would want to do when running batch jobs on my machine with 3x P40). On the analysis side I would also be using something like SQLite regardless of the llama.cpp output format, because SQL would allow me to query and process the data in the easiest way, e.g. grouping the data by git revision and quantization format and then calculating the mean and standard deviation for each group (see the sketch after this comment).
That's not what I want or need. I don't want a binary that runs a pre-defined benchmark. I want the ability to write bash scripts that only need to call the pre-existing binaries with arbitrary parameters, and to then run another script that analyzes the results (a dedicated benchmark binary would not be mutually exclusive with this, though). More generally, while I put the code for the prototype in the …
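For illustration, a minimal sketch of the kind of grouping/aggregation meant above, assuming a hypothetical table `runs` with columns `git_revision`, `quant_format` and `tokens_per_second` (none of these names come from the actual prototype):

```python
import sqlite3
from collections import defaultdict
from statistics import mean, stdev

con = sqlite3.connect("llama_bench.sqlite")  # illustrative database name

# collect the measured speeds per (git revision, quantization format) group
groups = defaultdict(list)
for rev, quant, ts in con.execute(
        "SELECT git_revision, quant_format, tokens_per_second FROM runs"):
    groups[(rev, quant)].append(ts)

for (rev, quant), values in sorted(groups.items()):
    sd = stdev(values) if len(values) > 1 else 0.0
    print(f"{rev} {quant}: mean={mean(values):.2f} t/s, std={sd:.2f}")
```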
The program is not going to crash during a write. For multiple processes, you can output to a different file for each. You can use whatever you want for the analysis; if you like SQL, convert the CSV to SQLite. If you want something specifically tailored to what you need and don't care about doing it in a way that may be useful to other people, then maybe there is no need to merge it.
How do you know? Even if there is no bug in llama.cpp, there could always be, e.g., a power outage. For mere logs that's probably not too much of a problem, but the risk is not zero.
I'm fine with just doing a quick and dirty implementation for myself, but I would argue that something like this is useful for anyone with even rudimentary scripting skills.
The worst that can happen is that you may have to delete the last line of the CSV; that's just not a good reason to add a dependency to the C/C++ code. Maybe if it were isolated to one specific example only. I am not entirely sure what API you want to add to llama.cpp. I think that something that returns more detailed timings would be great, but a function that writes an SQLite or even a CSV file is too high-level and not generally useful enough to add to the library.
@ggerganov, as the authority regarding llama.cpp scope, can you weigh in?
A database would be useful on the server, for saving and retrieving context data for multiple parallel clients. And maybe, in the future, chat sessions.
Benchmarking tools and examples are welcome, but an SQLite dependency in `llama.cpp` is not desirable. An alternative approach is to have a tool, say `./main [some args] | to-sqlite mydb.sql [etc ..]`. The `to-sqlite` tool would then parse the output and write it to the database.
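A minimal sketch of what such a `to-sqlite` filter could look like, assuming a JSON-lines log format; the input format, table layout and field names are all assumptions, not part of llama.cpp:

```python
#!/usr/bin/env python3
import json
import sqlite3
import sys

con = sqlite3.connect(sys.argv[1])  # e.g. mydb.sql
con.execute(
    "CREATE TABLE IF NOT EXISTS log (n_gpu_layers INT, tokens_per_second REAL)")
with con:  # commit the whole batch in one ACID transaction
    for line in sys.stdin:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip ordinary human-readable output lines
        con.execute("INSERT INTO log VALUES (?, ?)",
                    (rec.get("n_gpu_layers"), rec.get("tokens_per_second")))
```

Such a filter would be invoked exactly as in the pipeline above: `./main [some args] | to-sqlite mydb.sql`.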
I've been thinking about parsing stdout/stderr but I don't think it would be a good solution. The first issue is that parsing text that is designed to be human-readable is just very fragile. The second issue is that pipes are a feature that only exists on Unix-like systems. I only ever boot up Windows once every few months, but I would like to do some Windows vs. Linux testing if possible. How about this: a command line option like …
I did some experiments with logging JSON files, one object per line. You can do it with fprintf, no library needed. It's pretty easy to parse even with command-line tools like jq, etc.
Yes, this can work. The main requirements are to avoid 3rd party libs and implement something simple.
JSON would also work, with the caveat that unlike YAML the data structures need to have explicit beginnings and endings. With YAML you could just dump a bunch of records and keep appending to the file (see the sketch below).
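To illustrate the append-only property (a minimal sketch; the file name and fields are made up, and PyYAML is assumed for reading):

```python
import yaml  # PyYAML

# each run appends one sequence element; nothing ever has to be "closed"
with open("runs.yml", "a") as f:
    f.write("- n_gpu_layers: 40\n  tokens_per_second: 55.3\n")

with open("runs.yml") as f:
    print(yaml.safe_load(f))  # parses the whole sequence of records in one go
```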
JSON may have the advantage that it is more widely supported than YAML. For example, the Python library pandas can read JSON files directly, but not YAML. For tabular data, I think that CSV would be perfectly adequate too. Regardless of how this is implemented, I still think that it is important to have a standardized way to benchmark llama.cpp, and in my opinion the best way to do that is with a new example specifically for this purpose, so I am going to work on that. I don't think that there is a lot of overlap with what you want to do here, though.
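For example, if the log were written as one JSON object per line, pandas could consume it directly (a sketch; the file and column names are hypothetical):

```python
import pandas as pd

# lines=True parses one JSON object per line ("JSON lines" format)
df = pd.read_json("generations.jsonl", lines=True)
print(df.groupby("n_gpu_layers")["tokens_per_second"].agg(["mean", "std"]))
```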
I found my old patch and it looks like this:

```diff
@@ -12168,6 +12184,20 @@ void ggml_graph_compute(struct ggml_context * ctx, struct ggml_cgraph * cgraph)
                 node->perf_runs++;
                 node->perf_cycles  += perf_cycles_cur;
                 node->perf_time_us += perf_time_us_cur;
+
+                if (node->op == GGML_OP_MUL_MAT) {
+                    struct ggml_tensor *dst = node, *src0 = node->src0, *src1 = node->src1;
+                    fprintf(timelog, "{"
+                        "\"ne0\":%ld,\"ne1\":%ld,\"ne2\":%ld,\"ne3\":%ld,\"d\":\"%s\","
+                        "\"ne00\":%ld,\"ne01\":%ld,\"ne02\":%ld,\"ne03\":%ld,\"d0\":\"%s\","
+                        "\"ne10\":%ld,\"ne11\":%ld,\"ne12\":%ld,\"ne13\":%ld,\"d1\":\"%s\","
+                        "\"time\":%ld"
+                        "}\n",
+                        dst->ne[0],  dst->ne[1],  dst->ne[2],  dst->ne[3],  GGML_TYPE_NAME[ dst->type],
+                        src0->ne[0], src0->ne[1], src0->ne[2], src0->ne[3], GGML_TYPE_NAME[src0->type],
+                        src1->ne[0], src1->ne[1], src1->ne[2], src1->ne[3], GGML_TYPE_NAME[src1->type],
+                        perf_time_us_cur);
+                }
             }
         }
     }
```

I was trying to see how much time differently sized matmuls were taking; anyway, I didn't develop this idea much further. But you can see that it doesn't really need explicit beginnings and endings aside from the `{` and `}` around each individual object.
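A log in this shape is also easy to aggregate with a few lines of Python; the sketch below assumes the fprintf output above was written to a file called timelog.json:

```python
import json
from collections import defaultdict

total_us = defaultdict(int)
with open("timelog.json") as f:  # assumed output file of the patch above
    for line in f:
        rec = json.loads(line)
        # group by the mat-mul operand shape: src0 rows/cols, src1 cols, src0 type
        key = (rec["ne00"], rec["ne01"], rec["ne11"], rec["d0"])
        total_us[key] += rec["time"]

# print the shapes that consumed the most total time first
for key, us in sorted(total_us.items(), key=lambda kv: -kv[1]):
    print(key, us, "us")
```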
Windows has pipes and redirection too...
@slaren I've started working on logging runs. Among other things I am logging all user inputs, and I plan to add a simple Python script like …
I have also started working on an example to run benchmarks; I will open a PR later with more details. I think both options can coexist, and there is value in recording the runs of …
I have opened a PR at #2626 with the benchmark example. It's still quite bare-bones, but it should be enough to show what I have in mind. I imagine that it is not exactly what you need, but let me know if there is anything that could be added to make it more useful for your case.
Superseded by #2657
I run a lot of benchmarks to test my implementations, but the process of manually running the program and copying the results from the console is kind of tedious. So I want to implement a way to automate benchmarking and to transform the data into a human-readable format. This PR aims to do that: I added a very simple prototype for logging generations to an SQLite database, as well as two example scripts that let you run a benchmark and plot the results. Something like this should be the output after running `benchmark.sh` and `plot_ts_per_ngl.py`:

[example plot omitted]

I am willing to do a proper implementation for this, but in order to avoid wasted effort please let me know:
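For reference, a purely hypothetical sketch of what a `plot_ts_per_ngl.py` could look like (the actual script in this PR may differ; the table and column names are assumptions):

```python
import sqlite3
import matplotlib.pyplot as plt

con = sqlite3.connect("llama_bench.sqlite")  # illustrative database name
rows = con.execute(
    "SELECT n_gpu_layers, AVG(tokens_per_second) FROM runs "
    "GROUP BY n_gpu_layers ORDER BY n_gpu_layers").fetchall()

ngl, ts = zip(*rows)
plt.plot(ngl, ts, marker="o")
plt.xlabel("n_gpu_layers (-ngl)")
plt.ylabel("tokens per second")
plt.title("Generation speed vs. GPU offload")
plt.savefig("ts_per_ngl.png")
```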