1. Supervised learning
2. Operates on streams of big data
3. Fast, lightweight models
4. Small(er) RAM footprint
5. Updated continuously
6. Adaptable to changes in the environment
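The characteristics listed above can be illustrated with a minimal sketch. It assumes a simple linear model trained by stochastic gradient descent, which is one common online technique but not the only one. It is supervised (each example carries a target), consumes a stream one example at a time, keeps only two numbers in memory, and is updated continuously. The names `stream_examples` and `OnlineLinearRegressor` are hypothetical, chosen for illustration only.

```python
import random
from typing import Iterator, Tuple


def stream_examples(n: int, seed: int = 0) -> Iterator[Tuple[float, float]]:
    """Hypothetical data source: yields (x, y) pairs one at a time, y = 3x + 2 + noise."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = 3.0 * x + 2.0 + rng.gauss(0.0, 0.1)
        yield x, y


class OnlineLinearRegressor:
    """Linear model y ~ w*x + b, updated by stochastic gradient descent one example at a time."""

    def __init__(self, learning_rate: float = 0.05) -> None:
        # The entire model state: two parameters, regardless of how much data streams past.
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x: float) -> float:
        return self.w * x + self.b

    def update(self, x: float, y: float) -> None:
        # Gradient of the squared error 0.5 * (prediction - y)^2 with respect to w and b.
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error


model = OnlineLinearRegressor()
for x, y in stream_examples(20_000):
    model.update(x, y)  # each example is used once and then discarded

print(round(model.w, 2), round(model.b, 2))  # should land close to the true values 3 and 2
```

Note that the generator never materializes the data set; only the current `(x, y)` pair and the two parameters exist at any moment.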
Many machine learning algorithms train in batch mode: the model requires the entire batch of training data to be fed in at one time. To train, you select an algorithm, prepare your batch of data, train the model on the entire batch, and check the accuracy of its predictions. You then fine-tune the model by iterating this process and tweaking your data, inputs, and parameters. Most algorithms do not let new batches of data update and refine an existing model, so you may need to retrain your models periodically on the combined old and new data.
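As a minimal sketch of the batch workflow just described, the example below fits a line by ordinary least squares, a method that needs the whole batch at once, and then retrains from scratch when new data arrives. The function name `fit_batch` and the data values are illustrative assumptions, not taken from any particular library.

```python
from typing import List, Tuple


def fit_batch(data: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least squares for y ~ w*x + b, computed from the whole batch at once."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    var_x = sum((x - mean_x) ** 2 for x, _ in data)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


old_batch = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # illustrative data following y = 2x + 1
new_batch = [(3.0, 7.0), (4.0, 9.0)]

w, b = fit_batch(old_batch)
# fit_batch only accepts a complete data set, so when new_batch arrives
# the model is retrained from scratch on the old and new data together.
w, b = fit_batch(old_batch + new_batch)
print(w, b)  # 2.0 1.0
```

The retraining cost grows with the total amount of history kept, which is the pressure that motivates the online approach discussed below.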
There are a number of benefits to the batch approach:
- Many ML algorithms to choose from. Far more algorithms are available in batch form, because that is typically how they are developed at universities and because the batch approach aligns with traditional statistical practice.
- Better accuracy. Since the batch represents the "known universe", many mathematical techniques have been developed to improve model accuracy.
- Can be effective with smaller data sets. Hundreds or thousands of rows can result in good ML models. (Internally, many algorithms iterate over the data set repeatedly to learn the desired characteristics and improve the results.)
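The parenthetical point about iterating over the data set can be made concrete with a short sketch: full-batch gradient descent makes many passes (epochs) over the same 11 rows and still recovers the underlying line. The function `train_epochs`, the learning rate, and the epoch count are assumptions chosen for illustration; real libraries use more sophisticated optimizers.

```python
from typing import List, Tuple


def train_epochs(
    data: List[Tuple[float, float]], epochs: int, learning_rate: float = 0.1
) -> Tuple[float, float]:
    """Fit y ~ w*x + b by full-batch gradient descent, revisiting every row each epoch."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        # Gradient of half the mean squared error, averaged over the entire batch.
        grad_w = sum((w * x + b - y) * x for x, y in data) / n
        grad_b = sum((w * x + b - y) for x, y in data) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# A deliberately small data set: 11 rows on [0, 1] following y = 2x + 1.
small_data = [(k / 10, 2.0 * (k / 10) + 1.0) for k in range(11)]
w, b = train_epochs(small_data, epochs=2000)
print(round(w, 3), round(b, 3))
```

Each row is reused thousands of times here; an online learner, by contrast, would typically see each row once, which is why it needs far more data to reach a comparable fit.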
There are some advantages and a few drawbacks to the online learning approach.
Advantages:
- Big Data: Extremely large data sets are difficult to work with; model development and algorithm training are cumbersome. With online learning, you can wrestle the data down to manageable-sized chunks and feed them in one at a time.
- Small(er) RAM footprint: The model is updated from each example or chunk, after which that data can be discarded, so memory is needed mainly for the model itself rather than for the whole data set.
- Fast: Because they have to be. Each update must keep up with the incoming stream, so online algorithms are designed to do only a small amount of work per example.
- Adaptive: As new data arrives, the learning algorithm adjusts the model and automatically adapts to changes in the environment. This is useful for keeping your model in sync with shifting behavior such as click-through patterns or financial markets. With traditional batch algorithms, newer behavior is blended in with the older data, so these subtle changes are lost. With online learning, the model continuously moves toward the latest version of reality.
Drawbacks:
- It requires a lot of data. Since the learning happens as it goes along, model accuracy is developed over millions of rows, not thousands. (You should, of course, pre-train your model before production use.)
- Predictions are not as accurate. You give up some of the model's predictive power in exchange for the speed and small size of the solution.
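The "Adaptive" advantage can be illustrated with the simplest possible online learner: a one-parameter model updated by stochastic gradient descent with a constant learning rate, which is mathematically the same as an exponentially weighted moving average. The sketch assumes hypothetical hourly click-through rates that shift from 10% to 30% halfway through; the names `batch_estimate` and `OnlineEstimate` are illustrative, not from any library.

```python
from typing import List


def batch_estimate(history: List[float]) -> float:
    """Batch view: refit on all data seen so far, so old and new behavior are weighted equally."""
    return sum(history) / len(history)


class OnlineEstimate:
    """One-parameter model updated by SGD on squared error with a constant learning rate.

    Each update moves the estimate a fixed fraction toward the new observation,
    so recent data counts more and the influence of old data decays geometrically.
    """

    def __init__(self, learning_rate: float = 0.01) -> None:
        self.value = 0.0
        self.learning_rate = learning_rate

    def update(self, observation: float) -> None:
        self.value += self.learning_rate * (observation - self.value)


# Hypothetical hourly click-through rates: behavior shifts from 10% to 30% halfway through.
observations = [0.10] * 1000 + [0.30] * 1000

online = OnlineEstimate(learning_rate=0.01)
for obs in observations:
    online.update(obs)

print(f"batch:  {batch_estimate(observations):.3f}")  # blends old and new behavior
print(f"online: {online.value:.3f}")                  # tracks the latest behavior
```

Run as-is, the batch estimate prints 0.200 while the online estimate prints 0.300. The learning rate sets the trade-off from the drawbacks list: a larger value adapts faster but reacts more to noise, which costs accuracy.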
Meanwhile, here are some interesting links to learn more:
http://en.wikipedia.org/wiki/Online_machine_learning
http://www.youtube.com/watch?v=HvLJUsEc6dw
http://www.microsoft.com/en-us/showcase/details.aspx?uuid=436006d9-4cd5-44d4-b582-a4f6282846ee
Enjoy!