
Added MLKR algorithm #28


Merged: 4 commits merged into scikit-learn-contrib:master from the mklr branch on Oct 28, 2016

Conversation

devashishd12
Contributor

Implements the MLKR algorithm mentioned in #13 (sorry, I've named the branch mklr :P). @perimosocordiae could you please review once? Sorry for the loops! I'll try to make it more vectorized. I'm also checking whether any mathematical mistakes have inadvertently crept in. Also, what would be a good test for this? Should it just be along the lines of testRCA? Sorry again for the poor quality of the code. This will surely go through many revisions.

def _process_inputs(self, X, y):
    if X.ndim == 1:

Contributor

It's usually a good idea to convert inputs to numpy arrays here, just in case the user provided a plain Python list, or some other sequence.

self.X = np.array(X, copy=False)
y = np.array(y, copy=False)

@perimosocordiae
Contributor

A good starting point is to add a test to the existing ones based on the Iris dataset, just to exercise the code and make sure that it produces reasonable-looking results.

It might also be useful to try to replicate the paper's result on the "twin-peaks" synthetic dataset. (See figure 2 in the PDF.)
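
A minimal sketch of such a smoke test, assuming the class ends up exported as metric_learn.MLKR with the usual fit/transform API of the other learners (the test name and layout here are placeholders, not the final test):

import numpy as np
from sklearn.datasets import load_iris
from metric_learn import MLKR  # assumed export path

def test_mlkr_iris_smoke():
    # exercise fit/transform on Iris and check the output looks reasonable
    iris = load_iris()
    X = iris.data
    y = iris.target.astype(float)  # treat the class labels as regression targets
    mlkr = MLKR()
    mlkr.fit(X, y)
    Xt = mlkr.transform(X)
    assert Xt.shape == X.shape
    assert np.all(np.isfinite(Xt))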

@perimosocordiae mentioned this pull request on Sep 15, 2016
@devashishd12 force-pushed the mklr branch 2 times, most recently from d73feac to 4a9818a on September 17, 2016
@devashishd12
Contributor Author

devashishd12 commented Sep 17, 2016

@perimosocordiae I've addressed the initial comments. I'll try to reproduce the twin-peaks dataset too. Any notes on how I should go about it? We'll look into how we can vectorize the gradient descent further, since fitting Iris with the original values for epsilon and alpha takes an awful lot of time. Can we switch to SGD?

@perimosocordiae left a comment (Contributor)


I made some more minor comments.

Working from the "twin peaks" dataset description in the paper, you could generate it like this:

def twin_peaks(n=8000):
    r = np.random.random(n)
    s = np.random.random(n)
    z = np.sin(r) * np.tanh(s)
    x = np.column_stack((r, s, z))
    y = np.linalg.norm(x, axis=1)
    noise = np.random.random((n, 7)) * 10
    X = np.concatenate((x, noise), axis=1)
    return X, y

    A = np.identity(d)  # Initialize A as eye matrix
else:
    A = self.params['A0']
    assert A.shape == (d, d)

Contributor

This assert should be a ValueError
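
A minimal sketch of that change, keeping the surrounding lines shown above:

else:
    A = self.params['A0']
    if A.shape != (d, d):
        raise ValueError('A0 must be a %d x %d matrix, got shape %s'
                         % (d, d, A.shape))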

    assert A.shape == (d, d)
cost = np.Inf
# Gradient descent procedure
while cost > self.params['alpha']:

Contributor

For efficiency, it's better to get alpha from the params dictionary once, rather than every iteration:

alpha = self.params['alpha']
while cost > alpha:
    ...

        sum_j += diffK * x_ij.dot(x_ijT)
    sum_i += (yhat[i] - y[i]) * sum_j
gradient = 4 * A.dot(sum_i)
A -= self.params['epsilon'] * gradient

Contributor

Same thing for epsilon as above with alpha.
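
A minimal sketch of the same hoisting, mirroring the alpha example above:

epsilon = self.params['epsilon']
while cost > alpha:
    ...
    A -= epsilon * gradient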

dist_mat = pdist(X, metric='mahalanobis', VI=A.T.dot(A))
dist_mat = np.square(dist_mat)
dist_mat = squareform(dist_mat)
return np.exp(-dist_mat)

Contributor

It's faster to do all the math on the distances before calling squareform:

dist = pdist(X, metric='mahalanobis', VI=A.T.dot(A))
return squareform(np.exp(-dist**2))

@devashishd12
Contributor Author

I just had a question about how to apply the rotation matrix in the final step after generating X. Is it necessary?

@perimosocordiae
Contributor

Oh right, I forgot about that part! That's the most important step:

rotation = scipy.linalg.orth(np.random.normal(size=(10,10)))
return X.dot(rotation)

@devashishd12
Contributor Author

Thanks! Btw what do you think about adding an "SGD mode"? Running on this dataset is taking forever!

@perimosocordiae
Contributor

It might be a good idea, but let's make sure the non-stochastic version works first (even if it's really slow). SGD is much more difficult to debug!

@devashishd12
Contributor Author

devashishd12 commented Sep 18, 2016

On twin-peaks, with the default values for alpha and epsilon, the cost is always increasing. Should we put in a cap on the maximum number of line searches or function evaluations? Would it be better to use scipy.optimize routines for the minimization?

@devashishd12
Contributor Author

ping @perimosocordiae :)

@perimosocordiae
Contributor

I was away last week, so I'm just getting to look at this now. If possible, it would be great to use scipy.optimize here. That would give us a lot of support for changing convergence conditions and iteration/evaluation limits.
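
A minimal sketch of what that could look like, using a hypothetical helper that computes the MLKR leave-one-out cost for a flattened A (in practice the analytic gradient would also be computed and passed via jac=True):

import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist, squareform

def _mlkr_cost(flatA, X, y):
    # rebuild the square matrix that scipy.optimize flattened for us
    d = X.shape[1]
    A = flatA.reshape((d, d))
    dist = pdist(X, metric='mahalanobis', VI=A.T.dot(A))
    K = squareform(np.exp(-dist ** 2))  # k_ii stays 0, so each point is left out
    yhat = K.dot(y) / np.maximum(K.sum(axis=1), 1e-12)
    return ((yhat - y) ** 2).sum()

X, y = twin_peaks(100)  # helper defined earlier in this thread
d = X.shape[1]
res = minimize(_mlkr_cost, np.identity(d).ravel(), args=(X, y),
               method='L-BFGS-B', options={'maxiter': 100})
A = res.x.reshape((d, d))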

@devashishd12
Contributor Author

Alright awesome! I'll get to it in a while.

@devashishd12
Contributor Author

devashishd12 commented Oct 27, 2016

There seems to be some problem somewhere in my code. My cost keeps rising on the twin peaks dataset. Here's what my cost looks like on consecutive iterations on twin_peaks(100):

def twin_peaks(n=8000):
    r = np.random.random(n)
    s = np.random.random(n)
    z = np.sin(r) * np.tanh(s)
    x = np.column_stack((r, s, z))
    y = np.linalg.norm(x, axis=1)
    noise = np.random.random((n, 7)) * 10
    X = np.concatenate((x, noise), axis=1)
    rotation = scipy.linalg.orth(np.random.normal(size=(10, 10)))
    return X.dot(rotation), y

X, y = twin_peaks(100)
mlkr = MLKR()
mlkr.fit(X, y)
inf
18.6375498396
18.638362056
18.6391722602
18.6399804524
18.6407866332
18.641590803
18.6423929623
18.6431931116
18.6439912514
18.6447873824
18.6455815052
18.6463736204
18.6471637286
18.6479518307
18.6487379272
18.6495220191
18.6503041071
18.651084192
18.6518622747
18.6526383561
18.6534124371
18.6541845186
18.6549546017
18.6557226873
18.6564887764
18.6572528701
18.6580149696
18.6587750759
18.6595331901
18.6602893135
18.6610434473
18.6617955926
18.6625457507
18.6632939229

I've been trying to debug it but not getting anywhere. My gradient computation also seems alright :/
@perimosocordiae could you please have a look? Thanks!
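
One way to sanity-check the analytic gradient against a numerical one is scipy.optimize.check_grad; the cost and gradient wrappers below are hypothetical stand-ins for whatever fit() computes internally:

import numpy as np
from scipy.optimize import check_grad

d = X.shape[1]

def cost_fn(flatA):
    # hypothetical wrapper around the MLKR cost used in fit()
    return mlkr_cost(flatA.reshape((d, d)), X, y)

def grad_fn(flatA):
    # hypothetical wrapper around the analytic gradient, flattened for check_grad
    return mlkr_gradient(flatA.reshape((d, d)), X, y).ravel()

err = check_grad(cost_fn, grad_fn, np.identity(d).ravel())
print(err)  # should be tiny; a large value points at a bug in the gradient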

@perimosocordiae
Contributor

I'm looking into this now.

@devashishd12
Contributor Author

I'm also trying to switch to a maximum number of line searches rather than keeping an epsilon. At least with that we'll be sure of getting out of the loop. This is what is done in the Octave code: it has an option for the number of line searches or the number of function evaluations, but not a minimum loss value.
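
With scipy.optimize those caps map onto the solver options; a sketch for L-BFGS-B, reusing the hypothetical _mlkr_cost from above:

res = minimize(_mlkr_cost, np.identity(d).ravel(), args=(X, y),
               method='L-BFGS-B',
               options={'maxiter': 200, 'maxfun': 1000})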

@perimosocordiae merged commit 54b06ae into scikit-learn-contrib:master on Oct 28, 2016
@perimosocordiae
Contributor

I'm merging this now so that I can apply some fixups that I've made locally.

@perimosocordiae
Contributor

And thanks @dsquareindia for making this happen!

@devashishd12
Contributor Author

Oh I was experimenting with the line searches on twin peaks and getting pretty good results :P
No problem and thanks a lot for the help! Once you merge in MLKR with your patches, I'll put in the PR for twin peaks (if you're not doing it already) using the final MLKR. Are you working on efficient optimisation too?

@devashishd12 deleted the mklr branch on October 28, 2016
@perimosocordiae
Contributor

I've committed all of my changes now, so feel free to work with the current code.
