One of the many advantages of Python is its abundant and often powerful Libraries. For my research, besides plotting maps, I often play with time series. When it comes to manipulating and plotting time series, no other tools can beat python pandas.
pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
At the core of Pandas are the data structures: Series, DataFrame and Panel. The ones I used the most are the first two. A Series is an array labeled with timestamps, and a DataFrame consists of many Series. In a real-world use case, I use pandas to generate a range of time-axis, which is then attached to my Agulhas leakage time-series. After doing that, the value at a specific timestep can be easily retrieved by calling
Series['timestamp']. And to plot the whole time series is as simple as
For a DataFrame, to see the key statistics of a DataFrame with many columns, simply use
DataFrame.describe(). A table with mean, standard deviation, counts, and percentiles will then pop up. To compare multiple time series visually, naively put
Working with missing data
Recently, I am calculating the Atlantic Ocean Heat Content (OHC).
#headers=['date','OHC2000','OHC300','OHC700'] OHC_multilevels=DataFrame.from_csv('OHC_HRC07_1951-2002.csv') # If it's pandas generated, this is much easier. OHC_multilevels.plot()
Obviously, something fishy happened near 1952 and again in 1971. Several months have values close to zero, which is unlikely. Going back to the data, I confirmed that the temperature and salinity fields of those months are missing. To clean up the time series, I first assigned
None to those months, and interpolate linearly using the neighboring months. Three time series in the same DataFrame are processed using following two lines.
This is just a glimpse of the awesomness of pandas. More details can be found in the official documentation.