Added MLKR algorithm #28
Conversation
}

def _process_inputs(self, X, y):
    if X.ndim == 1:
It's usually a good idea to convert inputs to numpy arrays here, just in case the user provided a plain Python list, or some other sequence:
    self.X = np.array(X, copy=False)
    y = np.array(y, copy=False)
A good starting point is to add a test to the existing ones based on the Iris dataset, just to exercise the code and make sure that it produces reasonable-looking results. It might also be useful to try to replicate the paper's result on the "twin-peaks" synthetic dataset. (See figure 2 in the PDF.)
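A minimal sketch of such a smoke test could look like the following; the MLKR class name, import path, and transformer() accessor are assumptions about this PR's API, mirroring the other learners in the package:
    import numpy as np
    from sklearn.datasets import load_iris
    from metric_learn import MLKR  # assumed import path for this PR's class

    def test_mlkr_iris():
        iris = load_iris()
        X = iris.data
        y = iris.target.astype(float)  # treat the class labels as a regression target
        mlkr = MLKR()
        mlkr.fit(X, y)
        A = mlkr.transformer()  # assumed accessor for the learned matrix
        # just exercise the code: the learned matrix should be square and finite
        assert A.shape == (X.shape[1], X.shape[1])
        assert np.all(np.isfinite(A))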
Force-pushed from d73feac to 4a9818a
@perimosocordiae I've addressed the initial comments. I'll try to reproduce the twin-peaks dataset too. Any notes on how I should be going about it? We'll look into how we can vectorize the gradient descent further, since fitting iris with the original values for epsilon and alpha takes an awful lot of time. Can we switch to SGD?
I made some more minor comments.
Working from the "twin peaks" dataset description in the paper, you could generate it like this:
    def twin_peaks(n=8000):
        r = np.random.random(n)
        s = np.random.random(n)
        z = np.sin(r) * np.tanh(s)
        x = np.column_stack((r, s, z))
        y = np.linalg.norm(x, axis=1)
        noise = np.random.random((n, 7)) * 10
        X = np.concatenate((x, noise), axis=1)
        return X, y
    A = np.identity(d)  # Initialize A as eye matrix
else:
    A = self.params['A0']
assert A.shape == (d, d)
This assert should be a ValueError.
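For example, roughly (the message wording here is just illustrative):
    if A.shape != (d, d):
        raise ValueError('A0 must be a %d x %d matrix' % (d, d))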
assert A.shape == (d, d)
cost = np.Inf
# Gradient descent procedure
while cost > self.params['alpha']:
For efficiency, it's better to get alpha from the params dictionary once, rather than every iteration:
    alpha = self.params['alpha']
    while cost > alpha:
        ...
sum_j += diffK * x_ij.dot(x_ijT)
sum_i += (yhat[i] - y[i]) * sum_j
gradient = 4 * A.dot(sum_i)
A -= self.params['epsilon'] * gradient
Same thing for epsilon as above with alpha.
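I.e., hoist both lookups out of the loop; a sketch of the same pattern:
    epsilon = self.params['epsilon']
    alpha = self.params['alpha']
    while cost > alpha:
        # ... compute yhat, cost and gradient as before ...
        A -= epsilon * gradient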
dist_mat = pdist(X, metric='mahalanobis', VI=A.T.dot(A))
dist_mat = np.square(dist_mat)
dist_mat = squareform(dist_mat)
return np.exp(-dist_mat)
It's faster to do all the math on the distances before calling squareform:
    dist = pdist(X, metric='mahalanobis', VI=A.T.dot(A))
    return squareform(np.exp(-dist**2))
I just had a doubt about how to apply the rotation matrix in the final step after generating X. Is it necessary?
Oh right, I forgot about that part! That's the most important step:
    rotation = scipy.linalg.orth(np.random.normal(size=(10, 10)))
    return X.dot(rotation)
Thanks! Btw, what do you think about adding an "SGD mode"? Running on this dataset is taking forever!
It might be a good idea, but let's make sure the non-stochastic version works first (even if it's really slow). SGD is much more difficult to debug!
On twin-peaks, with the default values for alpha and epsilon, the cost is always increasing. Should we put a cap on the number of line searches or function evaluations in there? Would it be better to use
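For reference, a simple cap on the descent loop could look something like the sketch below; max_iter is an illustrative parameter name, not something already in the code:
    max_iter = self.params.get('max_iter', 1000)  # hypothetical new parameter
    for it in range(max_iter):
        # ... compute yhat, cost and gradient as before ...
        A -= epsilon * gradient
        if cost <= alpha:
            break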
ping @perimosocordiae :)
I was away last week, so I'm just getting to look at this now. If possible, it would be great to use
Alright awesome! I'll get to it in a while.
There seems to be some problem somewhere in my code. My cost keeps rising on the twin peaks dataset. Here's what my cost looks like on consecutive iterations on twin_peaks(100):
    import numpy as np
    import scipy.linalg

    def twin_peaks(n=8000):
        r = np.random.random(n)
        s = np.random.random(n)
        z = np.sin(r) * np.tanh(s)
        x = np.column_stack((r, s, z))
        y = np.linalg.norm(x, axis=1)
        noise = np.random.random((n, 7)) * 10
        X = np.concatenate((x, noise), axis=1)
        rotation = scipy.linalg.orth(np.random.normal(size=(10, 10)))
        return X.dot(rotation), y

    X, y = twin_peaks(100)
    mlkr = MLKR()
    mlkr.fit(X, y)
I've been trying to debug it but not getting anywhere. My gradient computation also seems alright :/
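One way to sanity-check the analytic gradient is scipy's finite-difference comparison; in this sketch, mlkr_cost and mlkr_gradient are hypothetical wrappers around the PR's cost and gradient computations for a given matrix A:
    import numpy as np
    from scipy.optimize import check_grad

    def cost_fn(a_flat):
        # hypothetical wrapper: reshape and evaluate the MLKR cost for this A
        return mlkr_cost(a_flat.reshape(d, d), X, y)

    def grad_fn(a_flat):
        # hypothetical wrapper: analytic gradient for this A, flattened
        return mlkr_gradient(a_flat.reshape(d, d), X, y).ravel()

    err = check_grad(cost_fn, grad_fn, np.identity(d).ravel())
    print('gradient check error:', err)  # should be close to zero if the gradient is right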
I'm looking into this now.
I'm also trying to switch to a maximum number of line searches rather than keeping an
I'm merging this now so that I can apply some fixups that I've made locally.
And thanks @dsquareindia for making this happen!
Oh, I was experimenting with the line searches on twin peaks and getting pretty good results :P
I've committed all of my changes now, so feel free to work with the current code.
Implements the MLKR algorithm mentioned in #13 (sorry, I've named the branch mklr :P). @perimosocordiae could you please review once? Sorry for the loops! I'll try to make it more vectorized. I'm also checking whether any mathematical mistakes have inadvertently crept in. Also, what would be a good test for this? Should it just be along the lines of testRCA? Sorry again for the poor quality of the code. This will surely go through many revisions.