May 18 2017
ICC Ghent - International Convention Center Ghent
Van Rysselberghedreef 2 - Citadelpark
9000 Gent, Belgium
Following last year’s success, we attended FlandersBio’s Life Sciences conference Knowledge for Growth at the ICC in Ghent again this year. (Curious to see what we did last year? Check out our blog post.) However, unlike last year, we not only held one of the booths on the main floor, we also hosted a Satellite Workshop. The workshop’s main themes were Machine Learning, Amazon Web Services (AWS), and High Performance Computing (HPC). For more information on AWS, read our three-post series on the AWS Summit XAOP attended last year.
To demonstrate these themes and how they work together, we set up a small website that uses them in tandem. In particular, we wanted to give our participants an introductory showcase of the technical possibilities of these principles and services. Machine Learning was our central focal point, with the technical setup and backend built on AWS and HPC.
Using PubMed abstracts from 2010 to 2017 as our public data source, we set up a search engine that returns words related to the search term using the Word2Vec model. Word2Vec is a model that creates word embeddings from a corpus - in our case, PubMed abstracts. Once words are embedded in a vector space, an interesting property emerges: given a word, the model returns related words (ranked by a distance metric, e.g. cosine similarity), thereby providing the user with some context.
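To illustrate the idea, here is a minimal Python sketch of ranking related words by cosine similarity. The vocabulary and vector values below are made up for illustration only; real Word2Vec embeddings are learned from the corpus and have hundreds of dimensions.

```python
import math

# Toy word vectors standing in for real Word2Vec embeddings
# (illustrative values only - actual embeddings are learned from text).
embeddings = {
    "depression": [0.90, 0.10, 0.30],
    "insomnia":   [0.80, 0.20, 0.40],
    "anxiety":    [0.85, 0.15, 0.35],
    "protein":    [0.10, 0.90, 0.20],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def related_words(query, top_n=2):
    """Rank every other word in the vocabulary by similarity to the query."""
    qv = embeddings[query]
    scores = [
        (word, cosine_similarity(qv, vec))
        for word, vec in embeddings.items()
        if word != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:top_n]

print(related_words("depression"))
```

With these toy vectors, "anxiety" and "insomnia" score highest for "depression", while the unrelated "protein" falls to the bottom - the same ranking mechanism our demo applies over the full PubMed vocabulary.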
The website that shows our demo is fully operational on AWS. Feel free to give it a try here:
Each related-words request is passed to a model running on an EC2 instance. The model itself was also trained on AWS, using autoscaling. This cut the training time down tremendously: one fully trained model takes about 48 hours, and by spinning up multiple EC2 instances we could train different models in parallel, each with different training parameters. This gave us multiple models with different settings, all within 48 hours.
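The parameter sweep above can be sketched as follows. `train_model` is a hypothetical stand-in for a full Word2Vec training run; on AWS, each configuration would run on its own autoscaled EC2 instance rather than a local thread, but the fan-out pattern is the same.

```python
from concurrent.futures import ThreadPoolExecutor
import itertools

def train_model(params):
    """Hypothetical stand-in for one 48-hour Word2Vec training job."""
    dim, window = params
    # ... a real job would stream the PubMed corpus and train here ...
    return {"dim": dim, "window": window, "status": "trained"}

# Hyperparameter grid: every combination is one independent training job.
dims = [100, 200, 300]
windows = [5, 10]
configs = list(itertools.product(dims, windows))

# Launch all configurations concurrently instead of one after another.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(train_model, configs))

for result in results:
    print(result)
```

Because the jobs are independent, six configurations finish in the wall-clock time of one, which is exactly what running them on parallel EC2 instances buys.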
The goal of this model was to return suggestions for related words, rather than the merely similar words PubMed currently suggests. Related in the sense that they have something in common without being too obvious - for example, searching for depression in PubMed gives results like postpartum depression and major depression, while our model returns insomnia, mdd, and anxiety. Our model therefore gives users the advantage of being able to broaden their search terms with some context, instead of narrowing them.
We’re already looking forward to next year’s Knowledge for Growth conference. In the meantime, here are some snapshots of our XAOP team present at KFG2017.