Python Time Series Forecasting Tutorial (2024)

InfluxData sponsored this post.

A consequence of living in a rapidly changing society is that the state of all systems changes just as rapidly, and with that come inconsistencies in operations. But what if you could foresee these inconsistencies? What if you could take a peek into the future? This is where time series data can help.

Time series data refers to a collection of data points that describe a particular system at various points in time. The time interval depends on the specific system, but typically, the data is arranged based on the date and/or time of each record. This means that the time series data of a system is a detailed record of its various states as time passes, and each data point can be uniquely identified by its timestamp. Some may argue that all data is time series data; however, not all records are kept in a way that preserves the details of a system’s transformation through time.

To take advantage of that record, you need time series forecasting: the use of available time records to predict the state of a system at a later time. Given records of a system from time immemorial until yesterday, predicting the state of the system today and tomorrow is time series forecasting. You can use time series forecasting to predict the weather or stock prices. It also helps with predictive maintenance in industrial setups and with predicting the usage of energy resources for proper management.

In this tutorial, you’ll learn more about time series forecasting using InfluxDB and how to build a time series forecaster to take a glance into the future.


Understanding Time Series Forecasting

As stated previously, time series forecasting is the process of using stored timestamped records of a particular system’s past to predict what happens to it in the future.

Note that the past and the future are used more loosely during development. Given a specific reference point, all data points that were recorded before it are in the “past” and all data recorded after are referred to as the “future.”

Using the past to predict the future might sound tricky: how do you segment data into features and targets without a target column? To start, a window is defined. This window essentially refers to how far back in time you need to look to make a prediction. It sets a cutoff point beyond which the impact of data points on the prediction is negligible. Using this window, you slide across the data set to generate the training data.

Say you have data about a company’s sales for the year with the window being sixty days: Jan. 1 to Feb. 29. These 60 days will be used to predict the sales on March 1 (assuming a leap year). Next, you would use data from Jan. 2 until March 1 to predict the sales on March 2. This is progressively done until you’re considering Nov. 1 to Dec. 30 to predict sales on Dec. 31.
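If it helps to see the windowing idea in code, here is a minimal sketch of that sliding-window split using pandas. The daily sales values, the column name and the 60-day window are assumptions purely for illustration and are not part of the data set used later in this tutorial.

import pandas as pd

# Hypothetical daily sales for one (leap) year, for illustration only.
dates = pd.date_range("2020-01-01", "2020-12-31", freq="D")
sales = pd.Series(range(len(dates)), index=dates, name="sales")

window = 60  # look back 60 days to predict the next day
features, targets = [], []
for end in range(window, len(sales)):
    features.append(sales.iloc[end - window:end].to_list())  # e.g., Jan. 1 to Feb. 29
    targets.append(sales.iloc[end])                          # e.g., March 1

print(len(features), "training examples, each built from", window, "past values")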

Aside from sales, time series forecasting can be used to predict the weather. With historical information about temperature, relative humidity and other weather-related parameters, the weather at a later date can be predicted.

Time series forecasting has also been used to predict stock prices of various organizations, given their historical data. Similarly, the prices of currencies, from standard currencies to cryptocurrency, can be predicted using time series forecasting.

On the more industrial end, with data about the working condition of equipment in all sorts of mechanical plants, future states can be predicted, which can help identify failure early so that maintenance can be done before pieces break down and halt operations. This is referred to as “predictive maintenance.”

In the field of energy efficiency, with information about the consumption of power in homes or specific households, distribution companies can efficiently supply power so that those who need more power at a specific time get enough. Organizations and households can use this information to understand their power consumption trends and impose regulations to save money.

The use cases for time series forecasting and other forms of analysis on time series data are endless, which is why it’s important to understand how to use time series forecasting effectively.

Implementing Time Series Forecasting Using InfluxDB

To demonstrate how time series forecasting can be effectively carried out, this tutorial walks through the data preparation and modeling process using InfluxDB. To keep things concrete, the sample problem is forecasting a household’s energy consumption from usage data recorded at 15-minute intervals.

Set up InfluxDB

To set up InfluxDB, navigate to the InfluxDB OSS documentation and click the Get started button. This is the open source version of InfluxDB, which can be set up on a local server. Follow the installation instructions on the Install InfluxDB page for your specific operating system (in this case, Windows) to install and start the InfluxDB OSS.

InfluxDB OSS installation instructions

At this point, you should have InfluxDB running on a local server. Enter the URL of this local server into your browser to access the InfluxDB interface. Then enter your name, password, organization and bucket name to complete the InfluxDB setup. Once completed, you’ll be taken to the InfluxDB OSS homepage:

InfluxDB OSS homepage

Another option for getting started with InfluxDB without having to set up anything on your computer is to use a free InfluxDB Cloud instance.

Load the Data into InfluxDB

With InfluxDB set up, now you need to load the data into the database. Begin by downloading the CSV data from this Kaggle page.

Before uploading a CSV file directly into InfluxDB, it must be annotated. Since this CSV file is not annotated, the InfluxDB Python client will be used to write the data to the database. This will give you a good idea of how data can be streamed into InfluxDB.

Next, navigate to the API Tokens tab and click the Generate API Token button if no API tokens exist yet. Then click the name of the newly created token to view and copy the all-access API token, as shown below. This token will be used to authenticate your connection to InfluxDB from a client:

InfluxDB OSS generate token

With the API token, bucket and organization known, navigate to your preferred code editor and create a folder for this project. Then transfer the previously downloaded CSV file into a new data/ directory in this parent folder.

Create and activate a Python virtual environment named .venv using the following Windows commands:

python -m venv .venv
.venv\Scripts\activate.bat


Next, install all the required libraries for this tutorial and set the API token as an environment variable:

pip install pandas influxdb-client matplotlib
pip install fbprophet
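These commands install the libraries; the API token itself still needs to be exposed to the script, which (as you’ll see shortly) reads it with os.getenv("INFLUX_TOKEN"). In the same Windows command prompt, you can set it for the current session as follows, where the value is a placeholder for your own token:

set INFLUX_TOKEN=paste-your-all-access-token-here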


Once that is done, create a script and import the necessary libraries:

import os
from datetime import datetime

import pandas as pd
from influxdb_client import InfluxDBClient, Point, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS

token = os.getenv("INFLUX_TOKEN")
organization = "forecasting"
bucket = "energy_consumption"


Here, you import the os module to read environment variables, the pandas library to load the CSV file and the InfluxDB client classes that facilitate the writing process. You then load the API token from the environment and store the organization and bucket names in appropriately named variables.

Now you need to create the InfluxDB client and instantiate the write API:

PORT = 8086
client = InfluxDBClient(url=f"http://127.0.0.1:{PORT}", token=token, org=organization)
write_api = client.write_api(write_options=SYNCHRONOUS)

df = pd.read_csv('data/D202.csv')


In the previous code, you define the PORT number your InfluxDB server is running on. Then you instantiate the InfluxDB client by passing in the URL of your running server, the API token and the organization name as parameters.
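Before writing anything, it can be worth confirming that the client can actually reach the server. One quick way to do that, assuming the health() helper exposed by the influxdb-client versions used here, looks like this:

# Sanity check: the status should be "pass" if the server at PORT is reachable.
health = client.health()
print(health.status)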

Next, the write_api method is called on the client instance previously defined. Finally, in this snippet, you load the CSV file as a Pandas DataFrame using the read_csv method. Here’s what the data set looks like:

A view of the data set


for index, row in df.iterrows():
    print(index, end=' ')
    stamp = datetime.strptime(f"{row['DATE']}, {row['START TIME']}",
                              "%m/%d/%Y, %H:%M")
    p = Point(row["TYPE"])\
        .time(stamp, WritePrecision.NS)\
        .field("usage(KWh)", row["USAGE"])\
        .tag("cost", row["COST"])
    write_api.write(bucket=bucket, org=organization, record=p)

This snippet contains the actual writing step. Here, you loop through the Pandas DataFrame and print the index to track progress. Next, you convert the start time to a datetime object. This start time is then used as the timestamp of that data point.

The InfluxDB Point object imported earlier is used to configure each row of data for upload. Each Point instance has a measurement name (here, the TYPE column), a timestamp, at least one field and optionally one or more tags. The timestamp marks when that data point was recorded, the field holds the usage data in kilowatt-hours (kWh), and the cost is stored as a tag.

Finally, in each iteration of the loop, the write API is called to write the data point to the bucket in the database.
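If you’re curious about what one of these points actually looks like on the wire, the Point class can render itself as InfluxDB line protocol. The following standalone sketch uses made-up values, but the measurement, field and tag names mirror the loop above:

from datetime import datetime
from influxdb_client import Point, WritePrecision

p = Point("Electric usage") \
    .time(datetime(2016, 10, 22, 0, 15), WritePrecision.NS) \
    .field("usage(KWh)", 0.24) \
    .tag("cost", "$0.03")
# Prints the measurement, tag, field and timestamp in line protocol form.
print(p.to_line_protocol())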

Read the Data from InfluxDB

Next, you have to read the data that is stored on InfluxDB into a Python environment for training:

query_api = client.query_api()

query = f'from(bucket:"{bucket}")' \
        ' |> range(start:2016-10-22T00:00:00Z, stop:2018-10-24T23:45:00Z)'\
        ' |> filter(fn: (r) => r._measurement == "Electric usage")' \
        ' |> filter(fn: (r) => r._field == "usage(KWh)")'


Here, you call the InfluxDB client instance again, but this time you pick the query API since your goal is to read from the database. Then you create the query. InfluxDB uses a simple scripting language known as Flux; in Python, you write the Flux query as a string. In this query string, you state the bucket that is to be read from. Next, you define the time range that you wish to query. Finally, you declare filters to select data points whose measurement and field attributes match the specified values, in this case "Electric usage" measurements and the "usage(KWh)" field:

result = query_api.query(org=organization, query=query)

data = {'y': [], 'ds': []}
for table in result:
    for record in table.records:
        data['y'].append(record.get_value())
        data['ds'].append(record.get_time())

print("Finished collecting query results")
df = pd.DataFrame(data=data)
df.to_csv('data/Processed_D202.csv', index=False)

In this snippet, you run the query by passing the query string and the organization to the `query_api.query` method. Then you create storage to hold the results, in this case a Python dictionary, and loop through the query results, collating them into that storage as you iterate. When this is done, the processed data is saved as a CSV file in the `data/` directory.
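As an aside, the influxdb-client library can also do this collation for you: in the versions current at the time of writing, the query API exposes a query_data_frame method that returns the results directly as a pandas DataFrame, using Flux’s own column names such as _time and _value. A sketch of that shortcut:

# Same Flux query, but the client assembles the DataFrame itself.
raw_df = query_api.query_data_frame(org=organization, query=query)
print(raw_df[['_time', '_value']].head())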

Forecast with Prophet

With the data loaded from InfluxDB, the next step is to build the forecasting model. Facebook created and released Prophet, an appropriately named and performant library for time series forecasting in Python and R. Prophet is designed to handle outliers, variations of all kinds (yearly, weekly and daily seasonality, among others) and missing data in any given time series. It also provides parameters that help with tuning the model to get better performance.

The input to a Prophet model is a DataFrame consisting of two columns, y and ds. y represents the variable of interest (energy usage), and ds refers to the datetime attribute. As you can see, the DataFrame columns in the previous section were not named randomly.
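In other words, whatever your raw data looks like, it only needs to be reshaped into those two columns before fitting. A hypothetical example, assuming raw columns named timestamp and consumption:

import pandas as pd

# Hypothetical raw data with arbitrary column names.
raw = pd.DataFrame({
    'timestamp': pd.date_range('2018-01-01', periods=3, freq='15min'),
    'consumption': [0.21, 0.18, 0.25],
})
prophet_input = raw.rename(columns={'timestamp': 'ds', 'consumption': 'y'})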

To get started, import the Prophet library and other libraries for data handling and visualization:

import fbprophet
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('data/Processed_D202.csv')
df['ds'] = pd.to_datetime(df['ds']).dt.tz_localize(None)
df_copy = df.set_index('ds')


After importing the required libraries, read the processed data into a DataFrame, convert the timestamp column to a datetime object and remove the time zone to avoid errors when plotting. Then create a copy of the DataFrame with the timestamp column set as the index:

df_copy.plot(kind='line',
             xlabel='Datetime',
             ylabel='Energy Consumption (KWh)',
             )
plt.title('Household Energy Consumption over Time', fontweight='bold', fontsize=20)
plt.show()


Here, you use the copy of the DataFrame you created to visualize the data, as shown below. You can see that there is a spike in energy consumption toward the end of the year and early in the next year. This is a seasonal variation, which you expect any well-trained model to identify:

Plot of energy consumption data


energy_prophet = fbprophet.Prophet(changepoint_prior_scale=0.0005)
energy_prophet.fit(df)

energy_forecast = energy_prophet.make_future_dataframe(periods=365, freq='D')
energy_forecast = energy_prophet.predict(energy_forecast)

energy_prophet.plot(energy_forecast, xlabel='Date', ylabel='Energy Usage (KWh)')
plt.title('Household Energy Usage')
plt.show()

Next, instantiate the Prophet model and fit it to the data. When instantiating Prophet, you pass the changepoint_prior_scale parameter to control the flexibility of the forecaster. Then you create a future DataFrame for predicting with Prophet; it extends 365 days beyond the last date in the input data at a daily frequency (freq='D'). This DataFrame is then passed into the model's .predict method to generate the forecast. Finally, you plot the result to see how the predictions match up with the training data. Smaller values of the changepoint_prior_scale parameter led to better predictions here:

Forecast results

Here, you see that the predicted values roughly follow the trends that occurred in the previous years. You can also view the component trends of the forecast to understand how the energy consumption changes over a day, week or year:

energy_prophet.plot_components(energy_forecast)
plt.show()

Forecast components

With information like this, you can then make important decisions about energy allocation or regulation of energy consumption.
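Before acting on forecasts like these, it is also worth quantifying how accurate they are, for instance when comparing different changepoint_prior_scale values. fbprophet ships a diagnostics module for this; the following sketch shows rolling-origin cross-validation, where the initial, period and horizon values are illustrative choices rather than tuned ones:

from fbprophet.diagnostics import cross_validation, performance_metrics

# Repeatedly fit on an initial window, then forecast over the next horizon.
cv_results = cross_validation(energy_prophet,
                              initial='365 days',
                              period='90 days',
                              horizon='30 days')
metrics = performance_metrics(cv_results)
print(metrics[['horizon', 'mae', 'rmse']].head())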

Conclusion

In this tutorial, you learned about the importance of time series data and forecasting. You also learned how to interact with InfluxDB via the Python client as well as how to build a forecaster using Prophet.

With InfluxDB, you can manage time series data in your applications and carry out analytics. Try out InfluxDB today.

Fortune Adekogbe is a Python developer at Josplay with a knack for working with data and building intelligent systems. He is also a process engineer and technical writer.