Dimos Anagnostopoulos
8 min read · Nov 3, 2023


DANAOS Waves for Data Science

Waves is a module of the Danaos Suite that you can use to perform data science work on your data. Briefly, you can load your data into a relational database, visualize it, perform analytics, build and evaluate machine learning models, and carry out anomaly detection, outlier detection, trend identification, forecasting and more. You can do that with data from your vessel, the engine, your office or any sensor; it is essentially a data-agnostic tool. Importantly, you can merge data from many disparate data sources, databases and silos and perform data science as if it were a single, ready dataframe. It is the perfect tool for the analytics professional, so let us dive in!

Data Preparation

As an analyst or support engineer, you receive data from companies in all sorts of formats. Let us focus on a use case where you receive data from a vessel’s sensors. This data may come through an API that you need to consume as a client, or it may be sent to you as a CSV, Excel, .dat, XML or JSON file; even within each file type, the data may need to be pivoted or transformed, and may arrive in a different format for every vessel. This is a problem that any data science tool on the market faces: the problem of dirty data. It is therefore paramount that an initial script be written as soon as you receive a new data file, to convert the data into a common SQL-table-like format. You then load the data into SQL tables (in our DANAOS case, Oracle tables) and have it ready for the Waves data science platform. Once the data is in Oracle tables where every column is one sensor, where you include a column that specifies the IMO of the vessel, and where you have a DATE column (because vessel sensor data are time series), you go to Waves and map these Oracle columns to MEASUREMENTS.
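As a rough sketch of that initial normalization script (the real scripts are vessel-specific, and the column names `timestamp`, `sensor` and `value` here are purely illustrative), the per-file conversion might look like this in Python with pandas:

```python
import pandas as pd

def normalize_sensor_file(path: str, imo: str) -> pd.DataFrame:
    """Load one vessel's sensor export and coerce it into the common
    layout: one row per timestamp, one column per sensor, plus IMO and
    DATE columns, ready to be loaded into an Oracle table."""
    if path.endswith(".csv"):
        df = pd.read_csv(path)
    elif path.endswith(".json"):
        df = pd.read_json(path)
    elif path.endswith((".xls", ".xlsx")):
        df = pd.read_excel(path)
    else:
        # .dat, .xml etc. would need their own parsers.
        raise ValueError(f"unsupported file type: {path}")

    # Some vessels send long-format data (timestamp, sensor, value);
    # pivot it so that every column is one sensor.
    if {"sensor", "value"}.issubset(df.columns):
        df = df.pivot_table(index="timestamp", columns="sensor",
                            values="value").reset_index()

    df = df.rename(columns={"timestamp": "DATE"})
    df["DATE"] = pd.to_datetime(df["DATE"])
    df["IMO"] = imo
    return df
```

From here, the dataframe can be bulk-inserted into the Oracle table that Waves will read from.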

Above, you can see that you name a measurement MEFOFLOW (which corresponds to the fuel consumption flow meter of a vessel’s engine and is measured in metric tons per day) and, below that, you specify the Oracle table where the actual data is stored. You effectively create a SQL query by populating the fields “table”, “date field” and so on. You can create as many measurements as you want, from all sorts of different databases and tables, and they will all be seamlessly blended in the analytics modules below.
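Conceptually, the mapping screen assembles a query from those form fields. A minimal sketch of what such a generated query could look like (the field names mirror the form fields; the actual SQL Waves builds is internal and may differ):

```python
def measurement_query(table: str, date_field: str,
                      value_field: str, imo_field: str) -> str:
    """Assemble the SELECT implied by one measurement mapping:
    one timestamped value per row, filtered to a single vessel
    via an :imo bind variable."""
    return (
        f"SELECT {date_field} AS measure_date, {value_field} AS value "
        f"FROM {table} "
        f"WHERE {imo_field} = :imo "
        f"ORDER BY {date_field}"
    )
```

Because every measurement reduces to a query of this shape, data from different tables and databases can be aligned on the date column and blended downstream.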

Data Visualization

You can then go to Data Graph in Waves and visualize the measurements of a vessel of your choice. We are currently focusing on a fuel consumption use case so we visualize data from sensors that are related to fuel consumption:

You can eyeball the data and see, for example, that Speed Over Ground, Power and Fuel Consumption have highly correlated time series, or that the range of values for fuel consumption is 0 to 38 metric tons per day. Alternatively, you can change the axes and see a scatter plot of Fuel Consumption vs Speed Over Ground, where the nonlinear relationship between fuel consumption and speed can be detected.
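That nonlinearity is why a scatter plot is worth checking: with synthetic numbers (assuming, as a common first-order rule of thumb in vessel performance work, that consumption grows roughly with the cube of speed), a cubic fit captures the curvature that a straight line misses:

```python
import numpy as np

# Illustrative synthetic data: fuel consumption ~ speed cubed plus noise,
# scaled so values stay in a 0-38 t/day ballpark like the real measurement.
rng = np.random.default_rng(0)
speed = np.linspace(8, 16, 50)                      # knots
fuel = 0.009 * speed**3 + rng.normal(0, 0.5, 50)    # metric tons/day

# Compare a linear fit against a cubic fit on the same points.
lin = np.polyfit(speed, fuel, 1)
cub = np.polyfit(speed, fuel, 3)
lin_err = float(np.mean((np.polyval(lin, speed) - fuel) ** 2))
cub_err = float(np.mean((np.polyval(cub, speed) - fuel) ** 2))
```

The cubic model's mean squared error comes out lower, which is the same curvature you would spot visually in the Waves scatter plot.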

Once you have a visual understanding of the data, have checked for outliers, and have seen which date periods have enough data, you can go ahead and use the machine learning tool of the Waves module to build a fuel consumption model.

Neural Network Training

We saw in Data Graph that we have enough data from June, July and August 2023 so we should use this date range to build a model:

You go to Neural Network Training (new) under Vessel Performance and the above screen opens. You select the vessel for which you want to build a model, name the model (in this case NEW_MODEL_FOR_DEMO_ML), select the date range we described, and choose the measurements that will be used as features in the model. Importantly, to define the target (dependent) variable of the model, you simply place it last in the list of input arguments. In the case above, MEFOFLOW is the target variable (the one we are trying to predict) and the other measurements are the features. Before clicking the Train button, we click the estimated correlations button to see the correlations of the features with the target variable. We then click “train”.

Behind the scenes, an AutoML pipeline tests many different machine learning and deep learning algorithms on the given dataset and selects the one with the least error on a hold-out set. In addition, it removes outliers, performs scaling and more, to ensure that the output model is robust.

Once the model is trained, we go back to Data Graph to evaluate it. For the given vessel, we select only two measurements to visualize: the actual values of fuel consumption for a given period, and the predictions of the fuel consumption model we just created for the same period. We want these two series to be as close as possible, which would imply very high accuracy for our model. The measurement name for the predicted values is the name we gave our model above, i.e. NEW_MODEL_FOR_DEMO_ML. Here is what the visualization looks like:
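The core of that selection step can be sketched in a few lines. This is only the "train each candidate, keep the lowest hold-out error" logic; the real Waves pipeline additionally removes outliers, scales features and searches deep learning architectures, none of which is shown here:

```python
import numpy as np

def select_model(X, y, candidates, holdout=0.2, seed=0):
    """Pick the candidate model with the lowest mean squared error on a
    random hold-out split. Each candidate is a (fit, predict) pair."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    cut = int(len(y) * (1 - holdout))
    tr, te = idx[:cut], idx[cut:]
    best_name, best_err = None, np.inf
    for name, (fit, predict) in candidates.items():
        params = fit(X[tr], y[tr])
        err = float(np.mean((predict(X[te], params) - y[te]) ** 2))
        if err < best_err:
            best_name, best_err = name, err
    return best_name, best_err

# Two toy candidates: ordinary least squares and a mean baseline.
def fit_linear(X, y):
    return np.linalg.lstsq(np.c_[X, np.ones(len(X))], y, rcond=None)[0]

def predict_linear(X, w):
    return np.c_[X, np.ones(len(X))] @ w

def fit_mean(X, y):
    return float(y.mean())

def predict_mean(X, m):
    return np.full(len(X), m)

CANDIDATES = {"linear": (fit_linear, predict_linear),
              "mean":   (fit_mean, predict_mean)}
```

On data with a genuine feature-target relationship, the regression beats the baseline and is the model that gets kept.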

The blue line shows the predicted values of fuel consumption given our features (speed over ground, draft etc.) and the orange time series shows the actual, true values. The model does pretty well! In places where the model has error, you can visualize the features’ time series to see whether the error is due to extreme values in one of the features. Again, you can use this capability to train any model you like from measurements stored in your database.
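The visual actual-vs-predicted check can also be quantified with standard error metrics (the metric names here are ours, not labels from the Waves screens):

```python
import numpy as np

def evaluate(actual, predicted):
    """Summarize how close predictions are to actuals: mean absolute
    error and RMSE in the measurement's own unit (metric tons/day for
    MEFOFLOW), plus mean absolute percentage error."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = predicted - actual
    return {
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mape_%": float(np.mean(np.abs(err) / np.abs(actual)) * 100),
    }
```

Plotting the two series tells you where the model struggles; numbers like these tell you by how much overall.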

Anomaly Detection

Another thing you can do in the Waves module is anomaly detection. You can use it to detect whether a sensor is suddenly acting oddly, implying a problem either in the sensor itself or in the underlying system being measured. This is very useful for detecting problems in the vessel engine very fast. You go to the Anomaly Detection program under Vessel Performance and input the measurements and the date range that you want to analyze.

What we do is select some measurements for a period where we know our data is correct, without any anomalies, and call that the benchmark period. The months of June, July and August above are one such period. We then select the data, in our case RPM-related measurements (STW, Speed Over Ground, MEFOFLOW, POWER, RPM), and run Train for that period. The tool identifies that all these measurements are highly correlated and places them in one group. If we also input weather data, for example, those measurements would be placed in a different group, as they are not so correlated with RPM. So, in the initial benchmark stage of anomaly detection, groups of highly correlated measurements are formed. In the example above, only one group was found.

Now, imagine we run the same group detection algorithm for September. As you can see in the Data Graph pane on the first page of the document, the fuel consumption measurement (in pink) and the power measurement (in green) are flat for September, for whatever reason. This means there is an anomaly in our data, as this is not normal. So how would the anomaly detection tool help us detect it? We would run the same “Train” job for the month of September and then compare the two periods. In the screen below, we can see that for September the group is missing the Power and Fuel Consumption measurements, as we expected given the visualization in Data Graph above. The tool also offers the “find anomaly” button to help you find the differences between the groups, which are shown below in RED.
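The grouping-and-compare idea can be illustrated with a simple correlation threshold. This greedy grouping is our stand-in for the benchmark "Train" step, not Waves' actual (undocumented) algorithm, and the 0.8 threshold is an arbitrary choice:

```python
import pandas as pd

def correlation_groups(df: pd.DataFrame, threshold: float = 0.8):
    """Greedily group measurements whose absolute pairwise correlation
    clears the threshold. A flat (constant) sensor has undefined
    correlation, so it falls out of its old group on its own."""
    corr = df.corr().abs()
    groups, assigned = [], set()
    for col in df.columns:
        if col in assigned:
            continue
        group = [c for c in df.columns if c not in assigned
                 and (c == col or corr.loc[col, c] >= threshold)]
        assigned.update(group)
        groups.append(sorted(group))
    return groups

def group_of(groups, measurement):
    """The group a measurement belongs to, as a set (empty if absent)."""
    for g in groups:
        if measurement in g:
            return set(g)
    return set()
```

Running `correlation_groups` on the benchmark months and on September, then diffing `group_of(..., "RPM")` between the two runs, surfaces exactly the measurements that dropped out of the RPM group, much like the differences the "find anomaly" button highlights in red.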

So, to bring it all together: you can use the anomaly detection tool to find that the MEFOFLOW measurement is missing for the month of September, then build a model that predicts MEFOFLOW in the Neural Network Training module, and use it to fill in the missing MEFOFLOW values for September given the other measurements as features (speed etc.). Instead of using a multivariate model, though, you can also forecast the missing values using another tool in Waves called the Time Series Forecast tool.

Time Series Forecast

This tool takes a time series and produces a forecast. Behind the scenes, we test many different parameterizations of seasonal ARIMA and regime-switching models to find the one with the least error. The module’s screen looks like this. You select the vessel, the date range for the training period, the date range for the prediction period, and the measurement in question, and click Forecast. As you can see, the forecasted values lie within the 0 to 38 fuel consumption range, consistent with the past values observed in the graphs above. The forecast also switches regimes between 0 fuel consumption and 20–35 fuel consumption, which means the model has learned from past data that the vessel alternates between stopping and travelling.
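The "try many parameterizations, keep the least error" idea can be shown in miniature with plain autoregressive models (a deliberate simplification: the real tool also searches seasonal and regime-switching parameterizations, which this sketch does not attempt):

```python
import numpy as np

def fit_ar(series: np.ndarray, p: int) -> np.ndarray:
    """Least-squares fit of an AR(p) model
    y_t = a1*y_{t-1} + ... + ap*y_{t-p} + c."""
    n = len(series)
    X = np.column_stack([series[p - 1 - i: n - 1 - i] for i in range(p)]
                        + [np.ones(n - p)])
    return np.linalg.lstsq(X, series[p:], rcond=None)[0]

def select_order(series: np.ndarray, orders=(1, 2, 3), holdout=60):
    """Fit each candidate order on the head of the series and keep the
    one with the lowest one-step-ahead error on the hold-out tail."""
    best_p, best_err = None, np.inf
    for p in orders:
        w = fit_ar(series[:-holdout], p)
        preds = np.array([w[:p] @ series[t - p:t][::-1] + w[p]
                          for t in range(len(series) - holdout, len(series))])
        err = float(np.mean((preds - series[-holdout:]) ** 2))
        if err < best_err:
            best_p, best_err = p, err
    return best_p, best_err
```

Given a series generated by a genuine second-order process, the search correctly rejects the too-simple AR(1) candidate.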

Find Trend

Additionally, Waves has a “find trend” module, which takes a time series and returns the trend found in it, i.e. it performs time series decomposition. For the Wave Height measurement, for example, from the end of July to the end of August, the trend is:
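A classical-decomposition trend can be approximated with nothing more than a centred moving average; this is our simple stand-in for the module's internals, with an arbitrary window length:

```python
import pandas as pd

def find_trend(series: pd.Series, window: int = 24) -> pd.Series:
    """Trend component as a centred rolling mean: short-period cycles
    and noise average out, leaving the slow-moving level."""
    return series.rolling(window, center=True, min_periods=1).mean()
```

Subtracting this trend from the series leaves the cyclical/residual part, which is the other half of the decomposition.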

Outlier Filtering

The outlier filtering module in Waves takes any time series and removes its outliers. In the graph below you can see the Wave Height measurement without outliers; you can also compare it with the trend identified in the “find trend” module above to better understand what the trend means: it is basically a smoothed version of the original time series.
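One common way to do this kind of filtering is a Hampel-style filter: flag points that sit far from a rolling median and replace them with it. This is an illustrative technique of our choosing, not necessarily what the Waves module implements:

```python
import pandas as pd

def filter_outliers(series: pd.Series, window: int = 15,
                    k: float = 3.0) -> pd.Series:
    """Replace points more than k robust standard deviations from the
    rolling median (1.4826 scales the median absolute deviation to be
    comparable to a standard deviation for Gaussian noise)."""
    med = series.rolling(window, center=True, min_periods=1).median()
    dev = (series - med).abs()
    mad = dev.rolling(window, center=True, min_periods=1).median()
    mask = dev > k * 1.4826 * mad
    return series.where(~mask, med)
```

Because both the centre and the spread estimates are medians, a single spike cannot drag the filter toward itself, which is what makes this approach robust.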

So, these are the capabilities of the Waves tool that make it a perfect fit for a data scientist. Similar model creation modules can also be found in the Weather Navigator product, with additional model evaluation checks tailored to the fuel consumption prediction use case.
