Statistical Analyst Projects

Most of my work at RootMetrics cannot be shared, but I will endeavor to share as much as I can, be it R packages, project descriptions, or bare repositories.

Fast Analytics

Fast Analytics is an R package that grew out of performing many small experimental analyses that all had roughly the same structure. One option was to write a set of monolithic functions with arguments for the different experimental parameters (e.g., split on carrier, aggregate over devices); however, these would always need adjusting for new experiments. Instead, Fast Analytics takes a minimal set of parameters and outputs an R script containing the necessary setup (database connections, other libraries), data pulls, processing, and plotting. All of the plots are simple and can then be adjusted in the generated script to fit the particular experiment.

As a bonus, the package can also run the script as soon as it is generated and compile the resulting figures into a Beamer slide deck.
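The package itself is written in R; purely to illustrate the template idea, here is a minimal Python sketch that fills a hypothetical R script skeleton from a handful of parameters. The parameter names (`name`, `dsn`, `table`, `group_by`, `metric`) and the DBI/ggplot2 calls inside the template are assumptions for illustration, not Fast Analytics' actual interface.

```python
from string import Template

# Hypothetical R script skeleton. Each $placeholder is filled from a minimal
# parameter set; everything else is boilerplate meant to be edited by hand.
R_TEMPLATE = Template("""\
# Auto-generated analysis script: $name
library(DBI)
library(ggplot2)

# Setup: database connection (the DSN is a placeholder)
con <- dbConnect(odbc::odbc(), dsn = "$dsn")

# Data pull
df <- dbGetQuery(con, "SELECT $group_by, $metric FROM $table")

# Processing: average the metric within each group
agg <- aggregate($metric ~ $group_by, data = df, FUN = mean)

# Plotting: a deliberately simple default plot
p <- ggplot(agg, aes(x = $group_by, y = $metric)) + geom_col()
ggsave("$name.pdf", p)
""")


def generate_script(name: str, dsn: str, table: str, group_by: str, metric: str) -> str:
    """Return the text of an R analysis script built from a few parameters."""
    return R_TEMPLATE.substitute(
        name=name, dsn=dsn, table=table, group_by=group_by, metric=metric
    )


if __name__ == "__main__":
    print(generate_script("carrier_test", "analytics", "tests", "carrier", "speed"))
```

Because the output is an ordinary script, experiment-specific changes (a different aggregation, a faceted plot) are made by editing the generated file rather than by adding more arguments to the generator.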

PhD Projects

Here are a few of the coding projects I worked on during my PhD research: some directly support WWLLN (the World Wide Lightning Location Network), others are new WWLLN products, and others are unrelated to my research. All of them are available on GitHub, either as repositories or as folders/files in existing repositories. Because I now work in industry, my work since my PhD is not publicly available on GitHub.

Whistler Detector using Neural Networks

An ongoing problem with many scientific experiments is a deluge of data that cannot easily be stored or processed. Currently, WWLLN stations neither save the wideband VLF waveform data nor have the bandwidth to send it to a central server; as a result, it is discarded after a few days. The stations are known to detect phenomena other than lightning, such as whistler waves. In the past, manual inspection of spectrograms from select stations was used to build a database of ground-observed whistlers. To improve on this, I created an automated whistler detector using a neural network.

Neural network implementation of a whistler wave classifier.

I trained the network on a manually collated archive of whistlers along with randomly sampled spectrograms. Training was performed in MATLAB using general code that can train on any collection of 2D images with specified network parameters. The trained network is implemented in a Python program that scans broadband VLF data files for potential whistlers. The system will be deployed at remote WWLLN stations to start automatic collection and archiving of whistlers at multiple locations.
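One common way to apply a fixed-size image classifier to long recordings is to slide a window along the time axis of the spectrogram and score each patch. The sketch below assumes that approach; the window width, step, threshold, and the toy `fraction_above` classifier are all hypothetical stand-ins for the trained network and its real settings.

```python
from typing import Callable, List, Sequence

# A spectrogram as nested lists: rows are frequency bins, columns are time bins.
Spectrogram = Sequence[Sequence[float]]
Classifier = Callable[[List[List[float]]], float]


def scan_for_whistlers(
    spec: Spectrogram,
    classify: Classifier,
    window: int,
    step: int,
    threshold: float = 0.5,
) -> List[int]:
    """Slide a fixed-width window along the time axis and return the starting
    column of every patch whose whistler score exceeds `threshold`."""
    n_time = len(spec[0])
    hits = []
    for start in range(0, n_time - window + 1, step):
        # Cut out all frequency rows for this span of time columns.
        patch = [list(row[start:start + window]) for row in spec]
        if classify(patch) > threshold:
            hits.append(start)
    return hits


def fraction_above(patch: List[List[float]], level: float = 1.0) -> float:
    """Hypothetical stand-in for the trained network: the fraction of
    spectrogram cells in the patch whose power exceeds `level`."""
    cells = [v for row in patch for v in row]
    return sum(v > level for v in cells) / len(cells)
```

In practice `classify` would wrap the trained network's forward pass on a normalized patch, and neighbouring hits would be merged into a single detection before archiving.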

Real-Time Lightning Map

WWLLN produces real-time lightning locations with a delay of less than a second between the lightning occurring and the network locating the stroke. I created a visualization of the real-time data using the Google Maps API that shows stroke locations with a 10-minute delay from real time. Due to contractual constraints on the data, only a small slice of archival data can be shown to those who are not subscribed to the real-time WWLLN data feeds. Currently, the demo is available on the beta version of the new WWLLN website.
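The 10-minute delay amounts to a simple time filter applied before strokes are handed to the map. Here is a minimal server-side sketch of that selection; the `Stroke` record and the 30-minute display window are assumptions for illustration, and the Google Maps drawing code is not shown.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List

DISPLAY_DELAY = timedelta(minutes=10)   # the map lags real time by 10 minutes
DISPLAY_WINDOW = timedelta(minutes=30)  # hypothetical: how much history is drawn


@dataclass
class Stroke:
    """Hypothetical stroke record: UTC time plus location in degrees."""
    time: datetime
    lat: float
    lon: float


def strokes_to_display(strokes: Iterable[Stroke], now: datetime) -> List[Stroke]:
    """Keep only strokes inside [now - delay - window, now - delay)."""
    end = now - DISPLAY_DELAY
    start = end - DISPLAY_WINDOW
    return [s for s in strokes if start <= s.time < end]
```

Anything newer than `now - DISPLAY_DELAY` is withheld, so the public map never shows data closer to real time than the stated delay.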

Clustering

I implemented the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm in MATLAB to cluster WWLLN lightning strokes into lightning flashes and thunderstorms. The implementation includes both the general algorithm and a version tailored to the WWLLN data. The clustering was validated against precipitation radar data, showing that WWLLN can find and track thunderstorms over their lifespan. Thunderstorm clustering enables further research into the relationship between thunderstorm parameters and the lightning they produce, such as the relation between storm area and flash rate.
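For readers unfamiliar with DBSCAN, here is a minimal Python sketch of the general algorithm applied to strokes in space and time. It is not the MATLAB implementation described above: the neighbourhood rule (within `eps_km` great-circle distance and `eps_s` seconds) and the parameter values in the example are assumptions, not the tuned WWLLN settings.

```python
import math
from typing import List, Optional, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0
NOISE = -1

# A stroke as (time in seconds, latitude in degrees, longitude in degrees).
StrokeTuple = Tuple[float, float, float]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def dbscan(strokes: Sequence[StrokeTuple], eps_km: float, eps_s: float,
           min_pts: int) -> List[int]:
    """Label each stroke with a cluster id (0, 1, ...) or NOISE (-1)."""
    n = len(strokes)

    def neighbors(i: int) -> List[int]:
        # Space-time neighbourhood (includes i itself).
        ti, lat_i, lon_i = strokes[i]
        return [j for j in range(n)
                if abs(strokes[j][0] - ti) <= eps_s
                and haversine_km(lat_i, lon_i, strokes[j][1], strokes[j][2]) <= eps_km]

    labels: List[Optional[int]] = [None] * n  # None = not yet visited
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be absorbed as a border point
            continue
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
        cluster += 1
    return [NOISE if lab is None else lab for lab in labels]
```

The neighbour search here is a brute-force O(n²) scan; at the scale of a global lightning dataset, a spatial or temporal index would be needed to make the neighbour queries fast.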

Gumstix Integration

As part of integrating the Gumstix ARM computers into the WWLLN service units, I wrote several programs to automate OS creation and to enable interaction between the computer and the other components of the system. makeSD.sh is a basic shell script that formats microSD cards and loads them with a customized Ångström Linux distribution for the stations. The process includes installing a new kernel and the corresponding modules to enable netfilter and iptables, applying a few minor distribution fixes, and loading WWLLN settings onto the machines.

readTSIP.py and sendTSIP.py are a pair of routines to read messages from, and send commands to, GPS units using the Trimble Standard Interface Protocol (TSIP). Both currently rely on the module tsip.py, which provides the functions to open the serial line, read the binary data, and interpret the messages. Testing has only been conducted with the Trimble ResolutionT, which has recently been declared an end-of-life product.
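TSIP frames each packet as a DLE byte (0x10), a packet ID, and the data, terminated by DLE ETX (0x10 0x03); any 0x10 inside the packet is sent doubled. The sketch below shows only that framing layer and is not the contents of tsip.py: serial I/O and per-packet interpretation are omitted, and the packet IDs in the example are arbitrary.

```python
from typing import List, Tuple

DLE, ETX = 0x10, 0x03


def encode_packet(packet_id: int, payload: bytes) -> bytes:
    """Frame a TSIP packet: DLE, id, byte-stuffed payload, DLE ETX."""
    stuffed = payload.replace(b"\x10", b"\x10\x10")  # double every DLE in the data
    return bytes([DLE, packet_id]) + stuffed + bytes([DLE, ETX])


def decode_stream(data: bytes) -> List[Tuple[int, bytes]]:
    """Split a raw TSIP byte stream into (packet_id, payload) tuples.

    Bytes before the first packet start are skipped, and a truncated packet
    at the end of `data` is dropped. Resynchronizing mid-stream is heuristic.
    """
    packets: List[Tuple[int, bytes]] = []
    buf = bytearray()
    in_packet = False
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        nxt = data[i + 1] if i + 1 < n else None
        if b != DLE:
            if in_packet:
                buf.append(b)  # ordinary id or payload byte
            i += 1
        elif in_packet and nxt == DLE:  # stuffed DLE -> one literal 0x10
            buf.append(DLE)
            i += 2
        elif in_packet and nxt == ETX:  # DLE ETX terminates the packet
            packets.append((buf[0], bytes(buf[1:])))
            in_packet = False
            i += 2
        elif nxt is not None and nxt not in (DLE, ETX):  # DLE + id starts a packet
            buf.clear()
            in_packet = True
            i += 1  # the id byte is appended on the next iteration
        else:  # stray DLE or truncated data: skip it
            i += 1
    return packets
```

Because the packet ID itself is never DLE or ETX, a DLE followed by any other byte unambiguously marks the start of a packet once the reader is in sync.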

Support Projects

LWPC Implementation

LWPC.m contains two functions, LWPC and LWPCpar, that run the U.S. Navy Long Wave Propagation Capability (LWPC) code directly from MATLAB. LWPCpar is a parallelized version of LWPC meant to be run with the MATLAB Parallel Computing Toolbox. The program requires an altered version of the LWPC v2.1 code and focuses on reporting the electric field at one location given a 100 kW transmitter at another. It can be run at any single frequency from 8 to 18 kHz for an all-day, all-night, or mixed day/night ionospheric path (determined by the input time). The code is slow (a limitation of LWPC itself) and is geared towards lightning frequencies and propagation in the Earth-ionosphere waveguide.

MATLAB Functions

Most of my research, for various reasons, has been performed in MATLAB. I have gradually built up a collection of MATLAB functions for routine tasks, including reading and writing WWLLN files. For example, one function implements vectorized day/night terminator calculations.
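A day/night terminator test reduces to checking whether the solar zenith angle at each point is below 90°. The Python sketch below illustrates the calculation over lists of points; it is not the MATLAB function itself, and it assumes a simple cosine approximation for solar declination, ignores the equation of time (an error of up to roughly 4° of longitude), and ignores atmospheric refraction.

```python
import math
from datetime import datetime
from typing import List, Sequence, Tuple


def subsolar_point(t: datetime) -> Tuple[float, float]:
    """Approximate subsolar latitude and longitude (degrees) for a UTC time."""
    day_of_year = t.timetuple().tm_yday
    # Cosine approximation of solar declination (about +23.44 deg near June 21).
    decl = -23.44 * math.cos(math.radians(360.0 / 365.0 * (day_of_year + 10)))
    hours = t.hour + t.minute / 60.0 + t.second / 3600.0
    # The sun is overhead at longitude 0 at 12:00 UTC and moves 15 deg west per hour.
    lon = -15.0 * (hours - 12.0)
    return decl, lon


def is_day(lats: Sequence[float], lons: Sequence[float], t: datetime) -> List[bool]:
    """True where the sun is above the horizon (solar zenith angle < 90 deg)."""
    decl, sub_lon = subsolar_point(t)
    d = math.radians(decl)
    result = []
    for lat, lon in zip(lats, lons):
        phi = math.radians(lat)
        cos_zenith = (math.sin(phi) * math.sin(d)
                      + math.cos(phi) * math.cos(d) * math.cos(math.radians(lon - sub_lon)))
        result.append(cos_zenith > 0.0)
    return result
```

In MATLAB the loop disappears: the same expression applied element-wise to latitude and longitude arrays gives the vectorized form.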

Other Projects

Website Development

I have been developing two websites: this one and an updated WWLLN website (currently in beta). The WWLLN website hosts a publicly available version of the WWLLN real-time data map.

Android Applications

Recently, I started writing simple Android applications as a step toward building a larger application. Currently there is a single repository for the small sample applications; eventually, another will be added for the more comprehensive program.