First, the data stream mining tools.

MOA (MASSIVE ONLINE ANALYSIS)

MOA is the most popular open source framework for data stream mining, with a very active and growing community. It includes a collection of machine learning algorithms (classification, regression, clustering, outlier detection, concept drift detection, and recommender systems) and tools for evaluation. Related to the WEKA project, MOA is also written in Java, while scaling to more demanding problems.

Link: http://moa.cms.waikato.ac.nz/

WEKA

Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.

Note: WEKA also comes with visual analysis tools.

Link: http://www.cs.waikato.ac.nz/ml/weka/

To use WEKA in your own project, download the Linux version of Weka and locate weka.jar in it; that file is all you need.

Next, some commonly used tools.

scikit-learn

Scikit-learn (formerly scikits.learn) is a free software machine learning library for the Python programming language.

  • Simple and efficient tools for data mining and data analysis
  • Accessible to everybody, and reusable in various contexts
  • Built on NumPy, SciPy, and matplotlib
  • Open source, commercially usable - BSD license

Link: http://scikit-learn.org/stable/index.html
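
To give a feel for what "simple and efficient tools" looks like in practice, here is a minimal sketch of a typical scikit-learn workflow. The choice of the bundled iris dataset, a decision tree classifier, and a 70/30 split are assumptions made purely for illustration; any other estimator follows the same fit/score pattern.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small example dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out 30% of the samples for evaluation (illustrative split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fit a classifier on the training part and score it on the held-out part.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

print(f"test accuracy: {accuracy:.2f}")
```

The inputs and outputs here are plain NumPy arrays, which is what "Built on NumPy, SciPy, and matplotlib" means for day-to-day use.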

NumPy

NumPy is the fundamental package for scientific computing with Python. It contains among other things:

  • a powerful N-dimensional array object
  • sophisticated (broadcasting) functions
  • tools for integrating C/C++ and Fortran code
  • useful linear algebra, Fourier transform, and random number capabilities
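
The items above can be sketched in a few lines. This is only an illustrative sample with made-up values, not a tour of the whole library.

```python
import numpy as np

# N-dimensional array object: a 2x3 array of floats.
a = np.arange(6, dtype=float).reshape(2, 3)

# Broadcasting: the shape-(3,) vector of column means is subtracted
# from every row of the shape-(2, 3) array.
col_means = a.mean(axis=0)
centered = a - col_means

# Linear algebra: solve the system A @ x = b.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)

# Fourier transform of a constant real signal: all energy sits in bin 0.
spectrum = np.fft.rfft(np.ones(4))

# Random numbers from a seeded generator, so runs are reproducible.
rng = np.random.default_rng(0)
samples = rng.normal(size=5)

print(centered)
print(x)
print(spectrum)
```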

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
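
As a rough illustration of "arbitrary data-types", a structured dtype lets one array hold record-like rows, similar to rows of a database table. The field names and values below are hypothetical.

```python
import numpy as np

# A record type with a short Unicode string field and a float field.
# The field names "name" and "score" are made up for this example.
record_type = np.dtype([("name", "U10"), ("score", "f8")])

records = np.array([("alice", 1.5), ("bob", 2.5)], dtype=record_type)

# Fields are accessed by name and behave like ordinary arrays.
names = records["name"]
mean_score = records["score"].mean()

print(names, mean_score)
```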

SciPy

SciPy is a collection of scientific computing tools for Python. The name refers to several related but distinct entities:

  • The SciPy Stack, a collection of open source software for scientific computing in Python, and particularly a specified set of core packages.
  • The community of people who use and develop this stack.
  • Several conferences dedicated to scientific computing in Python - SciPy, EuroSciPy and SciPy.in.
  • The SciPy library, one component of the SciPy stack, providing many numerical routines.
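
For the last item, the SciPy library, here is a small sketch of two of its numerical routines: numerical integration and scalar minimization. The functions being integrated and minimized are arbitrary choices for illustration.

```python
import numpy as np
from scipy import integrate, optimize

# Numerically integrate sin(x) over [0, pi]; the exact value is 2.
# quad returns the estimate and an estimate of the absolute error.
area, abs_err = integrate.quad(np.sin, 0.0, np.pi)

# Minimize a simple one-dimensional function whose minimum is at x = 2.
result = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2)

print(area, result.x)
```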