Astronomy surveys contain millions of objects, far too many to classify manually in a reasonable amount of time, so many stars remain unclassified. Each of these stars is represented as a time series. Making new discoveries will require scalable astroinformatics systems. We built a time series search engine that lets scientists explore a database, isolate interesting observations, and find similar time series.
We use a kernelized cross-correlation distance metric to compare light curves (astronomy time series) based on their morphology. We pre-process each light curve by folding it on its period, then interpolating and standardizing it to produce evenly spaced observations on a comparable scale. Finally, we compute the kernelized cross-correlation score between two light curves and maximize it over phase shifts, so that two curves with the same shape match even when they are offset in phase.
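The pre-processing and phase-shift steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the engine's actual code: the function names, the 64-bin phase grid, and the use of linear interpolation are all hypothetical, and a plain normalized circular cross-correlation stands in for the kernelized score, whose kernel is not specified here.

```python
import numpy as np

def fold_and_resample(times, mags, period, n_bins=64):
    """Fold a light curve on its period, resample it evenly in phase, standardize it.

    n_bins and linear interpolation are illustrative assumptions.
    """
    times = np.asarray(times, dtype=float)
    mags = np.asarray(mags, dtype=float)
    phase = (times % period) / period            # phase in [0, 1)
    grid = np.arange(n_bins) / n_bins            # evenly spaced phase grid
    # period=1.0 makes the interpolation wrap around the phase boundary
    resampled = np.interp(grid, phase, mags, period=1.0)
    std = resampled.std()
    if std == 0:
        raise ValueError("constant light curve cannot be standardized")
    return (resampled - resampled.mean()) / std

def max_cross_correlation(a, b):
    """Return (best score, best shift) over all circular shifts of b.

    Plain normalized cross-correlation; a stand-in for the kernelized score.
    """
    n = len(a)
    scores = np.array([np.dot(a, np.roll(b, k)) / n for k in range(n)])
    best = int(np.argmax(scores))
    return float(scores[best]), best

# Usage: two sinusoidal curves with the same period but offset in phase.
rng = np.random.default_rng(0)
period = 2.5
t = rng.uniform(0.0, 100.0, 400)                 # irregular sampling times
curve_a = fold_and_resample(t, np.sin(2 * np.pi * t / period), period)
curve_b = fold_and_resample(t, np.sin(2 * np.pi * (t / period - 0.25)), period)
score, shift = max_cross_correlation(curve_a, curve_b)
print(f"best score {score:.3f} at shift {shift} of {len(curve_a)} bins")
```

Because both inputs are standardized, the score of a curve against itself is 1, and searching over shifts lets curves with the same shape match regardless of where their phase zero happens to fall.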
Using the feature space of our time-series data, we can also compute a weighted arithmetic mean over the features as a distance metric. The feature space provides a scalable way to explore large time series databases. It allows us to enhance our similarity searches and identify interesting observations using a potentially lower-dimensional space. We can weight features according to a specific profile to steer the search results.
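One way to read the weighted feature-space distance is sketched below. It assumes the distance is the weighted arithmetic mean of absolute per-feature differences between a candidate and the target; the text does not specify that choice, and the function names and example weights are hypothetical.

```python
import numpy as np

def weighted_feature_distance(target, candidates, weights):
    """Weighted arithmetic mean of absolute per-feature differences.

    target:     shape (n_features,)
    candidates: shape (n_candidates, n_features)
    weights:    shape (n_features,), larger means the feature matters more
    Using absolute differences is an assumption made for this sketch.
    """
    target = np.asarray(target, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    weights = np.asarray(weights, dtype=float)
    diffs = np.abs(candidates - target)
    return np.average(diffs, axis=1, weights=weights)

def most_similar(target, candidates, weights, k=30):
    """Return the indices and distances of the k closest candidates."""
    distances = weighted_feature_distance(target, candidates, weights)
    order = np.argsort(distances)[:k]
    return order, distances[order]

# Usage: a search profile that weights the first feature three times as heavily.
target = [0.0, 0.0]
candidates = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
order, distances = most_similar(target, candidates, weights=[3.0, 1.0], k=2)
print(order, distances)
```

Features on very different scales would dominate such a mean, so in practice the features would likely need to be standardized first; a weight of zero removes a feature from the search profile entirely.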
We use parallel coordinates to explore the feature space of the database, where each column represents a feature. Ranges can be selected on each column to explore and isolate examples on which a similarity search can be based.
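Selecting ranges on several columns amounts to keeping only the observations that fall inside every selected range. The sketch below illustrates that filtering step; the function name, the feature names, and the values are hypothetical and are not taken from the application.

```python
import numpy as np

def select_by_ranges(features, ranges):
    """Return a boolean mask of observations inside every selected range.

    features: dict mapping feature name -> 1-D sequence of values
    ranges:   dict mapping feature name -> (low, high), inclusive
    Features without a selected range are left unconstrained.
    """
    n_obs = len(next(iter(features.values())))
    mask = np.ones(n_obs, dtype=bool)
    for name, (low, high) in ranges.items():
        column = np.asarray(features[name], dtype=float)
        mask &= (column >= low) & (column <= high)
    return mask

# Usage with made-up feature values for four observations.
features = {
    "period": [0.5, 1.2, 3.0, 0.8],
    "amplitude": [0.1, 0.4, 0.9, 0.6],
}
mask = select_by_ranges(features, {"period": (0.4, 1.5), "amplitude": (0.3, 1.0)})
print(np.flatnonzero(mask))  # indices of the isolated observations
```

Any observation that survives the filter can then serve as the target of a similarity search.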
Try it live (click the "Explore the Data" button)
First, select the features important for your desired search profile. Next, decide whether to compute similarity metrics using the standardized or non-standardized version of the light curves. Finally, pick a specific observation and search the database for the light curves most similar to it.
Try it live
The "target" light curve (the one the search was based on) is shown in green. The left half of the screen contains a cluster of search results at the top. The target is at the center, and nodes closer to it are more similar according to the cross-correlation metric. Each node in the cluster has a bar to its right that expresses its feature-space distance to the target.
Below the cluster, a one-to-many comparison chart shows the target's overall shape compared to the 30 most similar light curves in the database. To the right, a one-to-one comparison plots the target directly against the most recently hovered object from the cluster.
The right half of the screen is a table of the feature space for the 30 most similar observations in the database. The most recently hovered node on the cluster is highlighted in blue on the table, while the most recently hovered row on the table is highlighted in orange on the cluster.
You can click a node on the cluster or a row on the table to search with that observation as the new target.
Try it live (click "Get Time Series")
We built this time series search engine to enable target-based searches for similar time series in a database. We demonstrated its capabilities using two astronomy light curve surveys. To allow scientists to explore and isolate interesting examples, we provided parallel coordinates to probe the feature space and a visual analytics dashboard to interact with search results. Further, any light curve of interest in the results can be selected as the target of a new similarity search.
We feel that two improvements in particular are of the utmost importance. First, the current demo version uses only 2,000 light curves in total due to scalability issues. Parallelization and an improved database architecture are needed to scale to the billions of light curves expected to be available in the near future. Second, we have received positive feedback about this system, but we are always asked whether new light curves can be added to the database. Ultimately, users should be able to upload their own light curves to see what they are similar to in a massive database containing all available astronomy surveys.