Optimize iter_index itertools recipe #102088
Comments
Thanks for the suggestion. I'll run some of my own timings with the new version included.

The new code looks a little gross in comparison to the existing code, but it does have the advantage of showing off interesting ways to combine the existing tools.

The new code relies on an undocumented (and I believe untested) implicit property of operator.indexOf.

FWIW, I'm still evaluating whether to make the switch.
Thanks. Yes, the current version looks more "normal"/prettier, and takes fewer lines (unless you let mine share some of the common lines). I'm sure I've used that implicit property of operator.indexOf before.
Nevermind that last part; at least the "Roughly equivalent" Python code probably documents it. I'll read it more carefully.
On my M1 Max Pro, the gains are modest:
The data was created with:
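The snippet itself did not survive this export. A plausible sketch, assuming the 1000 'a'/'b'/'c' letters mentioned later in the thread (the seed and exact size are my guesses, not from the thread):

```python
import random

random.seed(42)  # reproducibility; the original run was presumably unseeded
# 1000 letters drawn uniformly from 'abc', so the sought value 'a'
# makes up roughly a third of the data:
data = random.choices('abc', k=1000)
print(data.count('a') / len(data))  # roughly 0.33
```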
Go ahead and make a PR for the recipe. In the PR, add a test verifying the operator.indexOf behavior that the recipe relies on.
Changed the title from "improve", which is debatable, to "optimize", which is something we can quantify. We're sacrificing readability and have made the logic more opaque in exchange for a modest speed improvement.
Ok, will do. I think the main reason you saw smaller gains was that your sought value occurred more often, 33% vs my 10%, which means more Python work instead of C work. Question: what do you think about this version?

```python
import contextlib
import operator
from itertools import islice

def iter_index(iterable, value, start=0):
    "Return indices where a value occurs in a sequence or iterable."
    # iter_index('AABCADEAF', 'A') --> 0 1 4 7
    try:
        seq_index = iterable.index
    except AttributeError:
        # Slow path for general iterables
        it = islice(iterable, start, None)
        i = start - 1
        with contextlib.suppress(ValueError):
            while True:
                yield (i := i + operator.indexOf(it, value) + 1)
    else:
        # Fast path for sequences
        i = start - 1
        with contextlib.suppress(ValueError):
            while True:
                yield (i := seq_index(value, i + 1))
```

For fun/curiosity, I also took it further and combined more of the tools:

```python
import contextlib
import operator
from itertools import accumulate, count, repeat, starmap

def iter_index_new5(iterable, value, start=0):
    it = iter(iterable)
    consume(it, start)  # consume() is the itertools recipe
    with contextlib.suppress(ValueError):
        deltas = starmap(operator.indexOf, repeat((it, value)))
        yield from map(operator.add, accumulate(deltas), count(start))
```
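A self-contained version of that last variant, with the consume recipe inlined, can be checked against a straightforward comprehension (this harness is mine, not from the thread):

```python
import contextlib
import operator
from itertools import accumulate, count, islice, repeat, starmap

def consume(iterator, n):
    # itertools recipe: advance the iterator n steps ahead
    next(islice(iterator, n, n), None)

def iter_index_new5(iterable, value, start=0):
    it = iter(iterable)
    consume(it, start)
    with contextlib.suppress(ValueError):
        # Each indexOf call resumes where the previous match left off,
        # so it yields the distance to the next match.
        deltas = starmap(operator.indexOf, repeat((it, value)))
        yield from map(operator.add, accumulate(deltas), count(start))

s = 'AABCADEAF'
print(list(iter_index_new5(s, 'A')))             # [0, 1, 4, 7]
print([i for i, c in enumerate(s) if c == 'A'])  # [0, 1, 4, 7]
```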
For our docs, I prefer a cleaned-up version of the simpler variant. Consider proposing your fastest version for the more-itertools project.
I just did a bigger benchmark, with 1000 letters 'a'/'b'/'c' as in your test, but with 'a' occurring between 0% and 100%. When it's 10% of all elements like in my test, we see my bigger gains, and when it's 33% like in your test, we see your modest gains. If all 1000 letters are 'a', then my solutions are slower than the current implementation. Do you still want to proceed?
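The benchmark script wasn't captured in this export, but the effect described here can be sketched: when matches are rare, operator.indexOf scans long stretches at C speed; when nearly every element matches, the per-match Python overhead dominates. A minimal timing sketch (mine, not the thread's script; names are hypothetical):

```python
import contextlib
import operator
import random
import timeit

def iter_index_plain(iterable, value):
    # straightforward Python-level loop, like the recipe's current slow path
    for i, element in enumerate(iterable):
        if element == value:
            yield i

def iter_index_indexof(iterable, value):
    # let operator.indexOf do the scanning at C speed between matches
    it = iter(iterable)
    i = -1
    with contextlib.suppress(ValueError):
        while True:
            yield (i := i + operator.indexOf(it, value) + 1)

random.seed(0)
for frac in (0.10, 0.33, 1.00):  # fraction of elements equal to 'a'
    data = random.choices('ab', weights=[frac, 1 - frac], k=1000)
    for func in (iter_index_plain, iter_index_indexof):
        t = timeit.timeit(lambda: list(func(data, 'a')), number=200)
        print(f"{frac:5.0%}  {func.__name__:20}  {t:.4f}s")
```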
I updated the above plot to include the newest variant as well.
In our dialogue I've communicated the relevant considerations: speed, beauty, conciseness, understandability, educational value, clarity, demonstrating the toolkit, being useful as a standalone tool, not using every trick in the book, etc. So what do you want? To make these docs maximally useful for readers, teaching them but not leaving them in the dust, which is best? There is no perfect answer; it is a matter of balance. My personal preference is to leave the recipe as-is, but I would really like to hear your best judgment as a person contributing to the core of a language used by millions of people.
Ok, my thoughts... My main goals were speed and showcasing combinations of the itertools. All that said, I have too little experience here to judge what's best for the recipes, so I'll instead probably indeed suggest my fastest version for more-itertools.
Showcasing the tools is worthwhile, but I see no need to go further than the existing recipes already do.
You might think so, but most folks have never used these tools that way.
It does, but it is at the outer limits of my tolerance and had enough benefits to make it worthwhile. In general, I don't aim for any of the recipes to go that far unless there is a sufficient payoff. For itertools, I had aspired to replicate the utility of features from APL but without incurring its enormous readability costs. At any rate, the choice is all about balance. Just because I once went out on a limb to show an amazing composition of itertools doesn't mean that "cramming it all into one line" is something we aspire to.
Ok, then I'd like to do it. The test could look like this:

```python
it = iter('leave the iterator at exactly the position after the match')
self.assertEqual(operator.indexOf(it, 'a'), 2)
self.assertEqual(next(it), 'v')
```

Does that suffice? Btw, I also saw this (Lines 840 to 843 in 360ef84):
That one goes over a file's lines. About `dict(enumerate(seq))` and `matmul` ...
Funny, I used that just recently in a new "sorting algorithm" I invented: conversion sort.

```python
from collections import Counter

def conversion_sorted(a):
    a = enumerate(a)   # (index, value) pairs
    a = dict(a)        # {index: value}
    a = Counter(a)     # a Counter whose "counts" are the values
    a = str(a)         # the repr lists items sorted by "count", descending
    a = eval(a)        # back to a Counter, now in that sorted order
    a = a.values()     # the values, descending
    a = reversed(a)    # ascending
    a = list(a)
    return a

print(*conversion_sorted('qwertyuiopasdfghjklzxcvbnm'))
```

Output:

```
a b c d e f g h i j k l m n o p q r s t u v w x y z
```
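The trick hinges on Counter's repr: it lists items via most_common(), i.e. sorted by "count" in descending order, and here the "counts" are the letters themselves. A small demo of just that step (this explanation is mine, not from the thread):

```python
from collections import Counter

# A Counter built from a plain dict; the "counts" are letters, not ints.
c = Counter({0: 'b', 1: 'a', 2: 'c'})
# The repr orders items by value, descending: 'c' > 'b' > 'a'.
print(c)  # Counter({2: 'c', 0: 'b', 1: 'a'})
```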
Yes, it's kinda neat, and straightforward once you understand it.

I btw also wrote my own `matmul` and `tril`:

```python
from itertools import islice, repeat, starmap
from math import sumprod  # Python 3.12+

def transpose(it):
    # itertools recipe
    return zip(*it, strict=True)

def take(n, iterable):
    # itertools recipe
    return list(islice(iterable, n))

def matmul(m1, m2):
    return map(
        map,
        repeat(sumprod),
        map(repeat, m1),
        repeat(list(transpose(m2)))
    )

def tril(matrix):
    return starmap(take, enumerate(matrix, 1))

m1, m2 = [(7, 5), (3, 5)], [[2, 5], [7, 9]]
print(*matmul(m1, m2))
print(*map(list, matmul(m1, m2)))
print(*tril(matmul(m1, m2)))
```

Output:

```
<map object at 0x...> <map object at 0x...>
[49, 80] [41, 60]
[49] [41, 60]
```
(cherry picked from commit eaae563)
Co-authored-by: Stefan Pochmann <[email protected]>
Documentation
The "Slow path for general iterables" somewhat reinvents `operator.indexOf`. It seems faster to use it, and it could show off an effective combination of the other itertools.

Benchmark with the current implementation and two new alternatives, finding the indices of `0` in a million random digits:

Code of all three (just the slow path portion):
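The actual "code of all three" block was not captured in this export. Based on the description that follows (consume instead of islice, starmap, adding 1 to the delta), the optimized slow path plausibly looked something like this hedged reconstruction, not the author's exact code (`iter_index_new2` is my name for it):

```python
import contextlib
import operator
from collections import deque
from itertools import islice, repeat, starmap

def consume(iterator, n=None):
    # itertools recipe: advance the iterator n steps (or exhaust it)
    if n is None:
        deque(iterator, maxlen=0)
    else:
        next(islice(iterator, n, n), None)

def iter_index_new2(iterable, value, start=0):
    it = iter(iterable)
    consume(it, start)           # no islice wrapper; advance it directly
    i = start - 1
    with contextlib.suppress(ValueError):
        for d in starmap(operator.indexOf, repeat((it, value))):
            # add 1 to the small delta d (often a cached small int)
            yield (i := i + (d + 1))

print(list(iter_index_new2('AABCADEAF', 'A')))  # [0, 1, 4, 7]
```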
The `new1` version is written to be similar to the "Fast path for sequences". The `new2` version has optimizations and uses three more itertools/recipes. Besides using `starmap(...)`, the optimizations are:

- No `islice` iterator; instead, only using `consume` to advance the direct iterator `it`, then using `it` itself.
- Adding `1` to `d`, which is more often one of the existing small ints (up to 256), rather than adding `1` to `i`.

Rest of benchmark script
Linked PRs