Optimize iter_index itertools recipe #102088

Closed

pochmann opened this issue Feb 20, 2023 · 12 comments

Labels: docs (Documentation in the Doc dir), performance (Performance or resource usage)

@pochmann (Contributor) commented Feb 20, 2023

Documentation

The "Slow path for general iterables" somewhat reinvents operator.indexOf. it seems faster to use it, and could show off an effective combination of the other itertools.

Benchmark with the current implementation and two new alternatives, finding the indices of 0 in a million random digits:

 108.1 ms ± 0.1 ms  iter_index_current
  39.7 ms ± 0.3 ms  iter_index_new1
  31.0 ms ± 0.1 ms  iter_index_new2

Python version:
3.10.8 (main, Oct 11 2022, 11:35:05) [GCC 11.3.0]

Code of all three (just the slow path portion):

def iter_index_current(iterable, value, start=0):
    it = islice(iterable, start, None)
    for i, element in enumerate(it, start):
        if element is value or element == value:
            yield i

def iter_index_new1(iterable, value, start=0):
    it = islice(iterable, start, None)
    i = start - 1
    try:
        while True:
            yield (i := i+1 + operator.indexOf(it, value))
    except ValueError:
        pass

def iter_index_new2(iterable, value, start=0):
    it = iter(iterable)
    consume(it, start)
    i = start - 1
    try:
        for d in starmap(operator.indexOf, repeat((it, value))):
            yield (i := i + (1 + d))
    except ValueError:
        pass
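
As a sanity check, all three agree on the docstring example (wrapping the string in iter() so that only this slow path is exercised; consume is the recipe included in the benchmark script below):

>>> list(iter_index_new2(iter('AABCADEAF'), 'A'))
[0, 1, 4, 7]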

The new1 version is written to be similar to the "Fast path for sequences". The new2 version has further optimizations and uses three more itertools/recipes. Besides using starmap(...), the optimizations are:

  • Not piping all values through an islice iterator; instead, only using consume to advance the direct iterator it, then iterating it directly.
  • Adding 1 to d rather than to i, since the result is more often one of CPython's cached small ints (up to 256); see the snippet after this list.
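
A quick illustration of that small-int point (a CPython implementation detail, not a language guarantee; results shown are from a line-by-line REPL session):

>>> x = 1 + 255
>>> y = 256
>>> x is y     # ints -5..256 are cached and shared
True
>>> x = 1 + 256
>>> y = 257
>>> x is y     # 257 is created fresh each time here
False
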
Rest of benchmark script
funcs = iter_index_current, iter_index_new1, iter_index_new2

from itertools import islice, repeat, starmap
from timeit import timeit
import collections
import random
from statistics import mean, stdev
import sys
import operator

# Existing recipe
def consume(iterator, n=None):
    if n is None:
        collections.deque(iterator, maxlen=0)
    else:
        next(islice(iterator, n, n), None)

# Test arguments
iterable = random.choices(range(10), k=10**6)
value = 0
start = 100
def args():
    return iter(iterable), value, start
 
# Correctness   
expect = list(funcs[0](*args()))
for f in funcs[1:]:
    print(list(f(*args())) == expect, f.__name__)

# For timing
times = {f: [] for f in funcs}
def stats(f):
    ts = [t*1e3 for t in sorted(times[f])[:5]]
    return f'{mean(ts):6.1f} ms ± {stdev(ts):3.1f} ms '

# Timing
number = 1
for _ in range(25):
    for f in funcs:
        t = timeit(lambda: consume(f(*args())), number=number) / number
        times[f].append(t)

# Timing results
for f in sorted(funcs, key=stats, reverse=True):
    print(stats(f), f.__name__)

print('\nPython version:')
print(sys.version)

@rhettinger (Contributor) commented:

Thanks for the suggestion. I'll run some of my own timings with the sieve() use case and mull it over for a while.

The new code looks a little gross in comparison to the existing code, but does have the advantage of showing off interesting ways to combine the existing tools.

The new code relies on an undocumented (and I believe untested) implicit property of indexOf: that it leaves the iterator at exactly the position after the match. This is obscure at best and unreliable at worst. Aside from the proposed recipe update, I haven't seen any precedent for using indexOf in this way.

FWIW, I'm still evaluating whether to make iter_index() into a regular itertool, at which point we would always prefer the simple code equivalent in the docs. Another way to go would be to have the concrete sequence types implement an iter_index() method, at which point I would drop this recipe entirely. The more-itertools project has included this recipe, so I'm waiting for people to adopt it so that we can get some good empirical data on how people use this in practice.

@pochmann (Contributor, Author) commented Feb 20, 2023

Thanks. Yes, the current version looks more "normal"/prettier, and takes fewer lines. Unless you let mine share the i = start - 1, try:, except ValueError: and pass lines with the "Fast path" code :-)

I'm sure I've used indexOf this way before, though I can't find examples now. It's rare that I get to use indexOf at all (but when I do, it's always a delight).

islice likewise isn't documented to leave the source iterator at exactly the position after where the islice iterator needed to go, is it? But the consume recipe relies on that. Or is that an indirect documentation of that property?

Never mind that last part; at the least, the "Roughly equivalent" Python code probably documents it. I'll read it more carefully.
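
For what it's worth, a quick check of that property (again CPython behavior):

>>> from itertools import islice
>>> it = iter(range(10))
>>> next(islice(it, 3, 3), None)   # consume(it, 3): advance three items, yield nothing
>>> next(it)                        # the source iterator is now at position 3
3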

@rhettinger (Contributor) commented Feb 21, 2023

On my M1 Max Pro, the gains are modest:

% python3.12 -m timeit -r 11 -s 'from tmp5 import iter_index_current, iter_index_new1, iter_index_new2, data' 'list(iter_index_current(data, "a"))'
10000 loops, best of 11: 35.1 usec per loop

% python3.12 -m timeit -r 11 -s 'from tmp5 import iter_index_current, iter_index_new1, iter_index_new2, data' 'list(iter_index_new1(data, "a"))'
10000 loops, best of 11: 30 usec per loop

% python3.12 -m timeit -r 11 -s 'from tmp5 import iter_index_current, iter_index_new1, iter_index_new2, data' 'list(iter_index_new2(data, "a"))'
10000 loops, best of 11: 27.8 usec per loop

The data was created with:

from random import seed, choices
seed('8675309' * 10)
data = choices('abc', k=1000)

Go ahead and make a PR for iter_index_new1(). Consider rearranging the yield line to yield (i := i + operator.indexOf(it, value) + 1), which I find to be nicer looking.
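
For concreteness, new1 with that suggested yield line substituted in would read:

def iter_index_new1(iterable, value, start=0):
    it = islice(iterable, start, None)
    i = start - 1
    try:
        while True:
            yield (i := i + operator.indexOf(it, value) + 1)
    except ValueError:
        pass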

In the PR, add a test to test_operator to cover how indexOf is being used. Let's guarantee (by testing) the iterator position subsequent to the call.

@rhettinger changed the title from "Improve iter_index itertools recipe" to "Optimize iter_index itertools recipe" on Feb 21, 2023
@rhettinger (Contributor) commented:

Changed the title from "improve", which is debatable, to "optimize", which is something we can quantify. We're sacrificing readability and making the logic more opaque in exchange for a modest speed improvement.

@pochmann (Contributor, Author) commented Feb 22, 2023

Ok, will do. I think the main reason you saw smaller gains is that your sought value occurred more often: 33% vs. my 10%. That means more Python-level work instead of indexOf just running through.

Question: what do you think about using contextlib.suppress? It could be used for both the slow and fast paths, and also for the iter_except recipe. It would look like this, which I find rather neat:

def iter_index(iterable, value, start=0):
    "Return indices where a value occurs in a sequence or iterable."
    # iter_index('AABCADEAF', 'A') --> 0 1 4 7
    try:
        seq_index = iterable.index
    except AttributeError:
        # Slow path for general iterables
        it = islice(iterable, start, None)
        i = start - 1
        with contextlib.suppress(ValueError):
            while True:
                yield (i := i + operator.indexOf(it, value) + 1)
    else:
        # Fast path for sequences
        i = start - 1
        with contextlib.suppress(ValueError):
            while True:
                yield (i := seq_index(value, i+1))

For fun/curiosity, I also took new2 further, replacing the loop, i, and the additions with accumulate, count, and operator.add. I don't expect you to like it :-) (It's about as fast as new2.)

def iter_index_new5(iterable, value, start=0):
    it = iter(iterable)
    consume(it, start)
    with contextlib.suppress(ValueError):
        deltas = starmap(operator.indexOf, repeat((it, value)))
        yield from map(operator.add, accumulate(deltas), count(start))
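
To trace the arithmetic on iter_index('AABCADEAF', 'A') with start=0: indexOf yields the gaps 0, 0, 2, 2; accumulate turns those into the running sums 0, 0, 2, 4; and adding count(0) = 0, 1, 2, 3 element-wise recovers the indices 0, 1, 4, 7.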

@rhettinger (Contributor) commented:

For our docs, I prefer a cleaned-up version of iter_index_new1 so that there is a well-understood proposal for promotion to a real itertool.

Consider proposing your fastest version for the more-itertools project. Users of that package just want results and typically don't look at the underlying code. This gives more freedom to try every trick in the book to eke out clock cycles for non-sequence inputs.

@pochmann (Contributor, Author) commented Feb 22, 2023

I just did a bigger benchmark, with 1000 letters 'a'/'b'/'c' as in your test, but with 'a' occurring at rates between 0% and 100%. When it's 10% of all elements, as in my test, we see my bigger gains; when it's 33%, as in your test, we see your modest gains. If all 1000 letters are 'a', then my solutions are slower than the current implementation. Do you still want to proceed?

[Plot: timings of the implementations vs. the density of 'a', from 0% to 100%]
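
For the curious, a minimal sketch of how such a density sweep could be generated (the function name, weights, and seed here are mine, not from the original benchmark):

import random

def data_with_density(density, k=1000, seed=8675309):
    "k letters from 'abc', with 'a' occurring at the given frequency."
    rng = random.Random(seed)
    rest = (1 - density) / 2
    return rng.choices('abc', weights=[density, rest, rest], k=k)

# e.g. time list(f(data_with_density(d / 100), 'a')) for d in range(0, 101, 5)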

@pochmann (Contributor, Author) commented Feb 23, 2023

I updated the above plot to include a new1_local_indexOf version that uses a local indexOf = operator.indexOf. That takes away much of the "slowness", so we (or users themselves) could do that for better gains, or if speed at >50% density is an issue (my belief is that such high density is rare; for primes, for example, it's only 1/log(n)). I thought the recipes text mentioned such local-variable optimizations, but I don't see it now; maybe it was somewhere else.
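
For reference, that variant is just new1 with the attribute lookup hoisted out of the loop, along these lines:

def iter_index_new1_local(iterable, value, start=0):
    indexOf = operator.indexOf  # local name avoids a module attribute lookup per match
    it = islice(iterable, start, None)
    i = start - 1
    try:
        while True:
            yield (i := i + indexOf(it, value) + 1)
    except ValueError:
        pass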

@rhettinger (Contributor) commented:

Do you still want to proceed?

In our dialogue I've communicated the relevant considerations: speed, beauty, conciseness, understandability, educational value, clarity, demonstrating the toolkit, being useful as a standalone tool, not using every trick in the book, etc.

So, what do you want? With the goal of making these docs maximally useful for readers, teaching them but not leaving them in the dust, which option is best?

There is no perfect answer. It is a matter of balance. My personal preference is to leave the recipe as-is, but I would really like to hear your best judgment as a person contributing to the core of a language used by millions of people.

@pochmann (Contributor, Author) commented:

Ok, my thoughts...

My main goals were speed and showcasing indexOf. Like I said, I always delight in using it. It's one of those tools that seem hardly known and which I like to point out whenever there's a good opportunity. I acknowledge the enumerate solution is easier, but everybody already knows enumerate.

The new2 version already went further than I originally wanted. When finalizing new1 for this issue, I realized that the recipes only import modules, so at the last minute I had to prepend operator. in my code. Then I made new2 to counter that slowdown, and perhaps show off a few more tools in the process. The matmul recipe also crams quite a bit into one line, so at least mine wouldn't be the first.

I also like using suppress, both to teach people about it and because it saves a few lines and gives what is, in my opinion, a more appealing, less scattered look than the try-except-pass. I think it helped make new1 look very similar to the fast-path code.

All that said, I have too little experience here to judge what's best for the recipes. I'll probably indeed suggest it in more-itertools instead. They don't seem super-eager to optimize, though; my partition improvement is also meeting resistance. I'll have to do more analysis.

@rhettinger (Contributor) commented:

My main goals were speed and showcasing indexOf.

Showcasing indexOf is a reasonable goal. Used in this way, it is very itertool-like.

I see no need to go further than new1 to accomplish that goal. This is no place for suppress or a convoluted starmap/repeat mix. So I suggest making a PR for new1 or opting to leave the recipe as-is. I'm fine with either of those paths.

I acknowledge the enumerate solution is easier, but everybody already knows enumerate.

You might think so, but most folks have never used enumerate() outside of a for-loop. I never fail to get amazement from a demo of dict(enumerate(seq)).
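
That demo, for anyone who hasn't seen it:

>>> dict(enumerate(['a', 'b', 'c']))
{0: 'a', 1: 'b', 2: 'c'}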

The matmul recipe also crams quite a bit into one line, so at least it wouldn't be the first.

It does, but it is at the outer limits of my tolerance and had enough benefits to make it worthwhile. In general, I don't aim for any of the recipes to go that far unless there is a sufficient payoff. For itertools, I had aspired to replicate the utility of features from APL but without incurring its enormous readability costs. At any rate, the choice is all about balance. Just because I once went out on a limb to show an amazing composition of itertools doesn't mean that "cramming it all into one line" is something we aspire to.

@pochmann (Contributor, Author) commented Mar 1, 2023

Ok, then I'd like to do it. The indexOf test I'd add:

        it = iter('leave the iterator at exactly the position after the match')
        self.assertEqual(operator.indexOf(it, 'a'), 2)
        self.assertEqual(next(it), 'v')

Does that suffice? Btw, I also saw indexOf tests in test_iter, including this one:

fiter = iter(f)
self.assertEqual(indexOf(fiter, "b\n"), 1)
self.assertEqual(indexOf(fiter, "d\n"), 1)
self.assertEqual(indexOf(fiter, "e\n"), 0)

That's over a file with lines "a\n" "b\n" "c\n" "d\n" "e\n", so the last two indexOf assertions already rely on it leaving the iterator at exactly the position after the match.

About `dict(enumerate(seq))` and `matmul` ...

I never fail to get amazement from a demo of dict(enumerate(seq))

Funny, I used that just recently in a new "sorting algorithm" I invented:

conversion sort


from collections import Counter

def conversion_sorted(a):
    a = enumerate(a)
    a = dict(a)
    a = Counter(a)
    a = str(a)
    a = eval(a)
    a = a.values()
    a = reversed(a)
    a = list(a)
    return a

print(*conversion_sorted('qwertyuiopasdfghjklzxcvbnm'))

Output:

a b c d e f g h i j k l m n o p q r s t u v w x y z

[the matmul recipe] had enough benefits to make it worthwhile

Yes, it's kinda neat, and straightforward once you understand it. Btw, I also wrote my own matmul, which yields the rows as iterators instead of tuples. So, for example, getting only the lower triangle does just half the work. But it's less clear:

`matmul` and `tril`


from itertools import islice, repeat, starmap
from math import sumprod  # Python 3.12+

# take and transpose as in the itertools recipes
def take(n, iterable):
    return list(islice(iterable, n))

def transpose(it):
    return zip(*it, strict=True)

def matmul(m1, m2):
    return map(
        map,
        repeat(sumprod),
        map(repeat, m1),
        repeat(list(transpose(m2)))
    )

def tril(matrix):
    return starmap(take, enumerate(matrix, 1))

m1, m2 = [(7, 5), (3, 5)], [[2, 5], [7, 9]]
print(*matmul(m1, m2))
print(*map(list, matmul(m1, m2)))
print(*tril(matmul(m1, m2)))

Output:

<map object at 0x7fdb0c0feef0> <map object at 0x7fdb0c0ff2b0>
[49, 80] [41, 60]
[49] [41, 60]
