CRISP-DM – Cross-Industry Standard Process for Data Mining – is the best-known data mining methodology out there. It's been around a long time, but ownership and management of the consortium that developed it have become complicated recently. For instance, the CRISP-DM.ORG site is down at present, though you can get some details in the CRISP-DM Wikipedia article.
As part of our work at Decision Management Solutions (the consulting company I run) on a more complete methodology, one that includes business rules and the integration of analytics and rules into Decision Management Systems, our intern created a version of CRISP-DM in the Eclipse Process Framework (http://www.eclipse.org/epf/). EPF is an open source tool for managing methodologies, letting methodology developers share them and companies customize them.
One of the nice things about this approach is that you can generate hyperlinked output. For CRISP-DM in this format, download this zip file, crisp_dmpub_v3.0, and open the top-level Index file. You should find the same CRISP-DM methodology as in the original publication, moved from a document into a repository. As soon as the CRISP-DM website is back up and/or we can find the current "owners," we plan to transfer these assets to them and, hopefully, post them as open source in the EPF libraries. In the meantime, just drop me a note with questions or comments. If you are intrigued by EPF, get in touch and I can send you the editable libraries.