What does a data scientist do? An Interview with Dr. Liv Aleen Remez, Data Scientist, SecBI
Liv holds a doctorate in biology, specializing in genetics and bioinformatics, and currently works at SecBI as a data scientist.
How did you get into this field?
During my PhD I realized that I felt more connected to the world of computers and programming than to biology.
When I finished it, I heard about a field called data science, which sits somewhere between programming and research. In general it was also easier to find work there, because the demand in high tech is much higher than in biology.
How did you find the transition from biology into the cyber world?
When I arrived, I knew how to investigate biological systems. Cyber is a completely different field, but my research skills are reflected in my work – I have data and I need to research it, identify patterns, work out and understand what can be done. This is true across all domains.
What did you have to do and learn?
There was a basic algorithm when I started, but it wasn’t built for scale or efficiency. Since we analyze large quantities of data (one customer had billions of log events for us to analyze), we needed to redesign the algorithm to work efficiently and still generate accurate results.
The specific algorithm I’m dealing with at SecBI is called “Cluster Analysis”, and a similar type of algorithm is used in genome comparison, so I wasn’t totally lost…
What challenges did you have to overcome?
I’m a data scientist but not a domain expert, so it was hard for me to qualify the results and understand what makes sense. I was aided by SecBI’s CTO, who helped me refine the algorithm. Now that the algorithm is up and running, it’s a matter of refinement, adding analysis capabilities and ongoing analysis, which we refer to as “cluster evolution”.
How were you able to develop a cyber-specific algorithm?
I have worked closely with Alex, SecBI’s CTO. He’s the domain expert and an experienced analyst. I’ve designed the algorithm to emulate how he thinks and operates. The algorithm uses far more computing power, so it is able to conduct multiple queries with higher accuracy.
Alex’s feedback on the results helped me fine-tune the algorithm to produce results which make sense to an analyst.
So what is “Cluster Analysis”?
Clustering is essentially grouping data according to certain parameters. Every day terabytes of data are added, and the algorithm needs to decide whether each new piece of data belongs to an existing cluster. If this doesn’t work, we won’t be able to scale, or we will get skewed results. When it does work, new data points simply update existing clusters.
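The decision described above (does a new data point belong to an existing cluster or not) can be sketched in a few lines of Python. This is only an illustrative sketch of a simple online "leader"-style clustering, not SecBI's actual algorithm: the function name `assign_incrementally`, the Euclidean distance, the running-mean centroid update and the `threshold` parameter are all assumptions made for the example.

```python
import math

def assign_incrementally(points, threshold):
    """Online clustering sketch (hypothetical, not SecBI's method).

    Each incoming point joins the nearest existing cluster if that
    cluster's centroid is within `threshold`; otherwise it starts a
    new cluster. Joining a cluster updates its centroid in place.
    """
    centroids = []  # running mean of each cluster
    counts = []     # number of points absorbed by each cluster
    labels = []     # cluster index assigned to each input point

    for p in points:
        # Find the closest existing centroid, if any.
        best, best_dist = None, float("inf")
        for i, c in enumerate(centroids):
            d = math.dist(p, c)
            if d < best_dist:
                best, best_dist = i, d

        if best is not None and best_dist <= threshold:
            # The point updates an existing cluster (incremental mean).
            counts[best] += 1
            n = counts[best]
            centroids[best] = tuple(
                ci + (pi - ci) / n for ci, pi in zip(centroids[best], p)
            )
            labels.append(best)
        else:
            # No cluster is close enough: open a new one.
            centroids.append(tuple(p))
            counts.append(1)
            labels.append(len(centroids) - 1)

    return labels, centroids


# Toy data standing in for features extracted from log events.
events = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1)]
labels, centroids = assign_incrementally(events, threshold=1.0)
print(labels)  # [0, 0, 1, 1, 0]
```

Because each point is processed once and only the centroids are kept in memory, a scheme like this can absorb new data without re-clustering everything, which is the scaling property the answer above alludes to.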
How can this algorithm be improved?
We’re working on updating the algorithm and adding capabilities all the time. Our roadmap includes adding data sources, comparing against known databases of virus signatures, improving the prioritization of tasks, and obtaining more data from the logs according to expected behavior.
What is the greatest challenge of cybersecurity in your eyes?
Cybersecurity is a complex and evolving problem. Most solutions attempt to identify known patterns, but the requirement today is to identify new patterns and attack types. To do so today, an analyst writes one big query with multiple “Ifs”, then runs it and analyzes the results. We developed a machine-learning algorithm called “cluster analysis”. It automatically groups related data into clusters, which reduces analysis time and false positives.
In the eyes of a data scientist, cybersecurity is a multi-dimensional problem. The data translates into a mathematical problem, but when dealing with many dimensions all distances are relative.
The SecBI algorithm looks at certain aspects of the data set(s): location, address, time. If we do our job right, the solution will not only crunch large amounts of data but also reduce false positives. Accurate clustering will result in better detection and fewer alerts.
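One way to read the remark about distances being relative is the well-known "distance concentration" effect: as the number of dimensions grows, the nearest and farthest points from any query end up at almost the same distance. The sketch below illustrates that effect on uniformly random data. It is an assumption about what the remark refers to, and the function name `relative_contrast` and its parameters are hypothetical.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """Return (max_dist - min_dist) / min_dist from one random query
    point to `n_points` random points in the unit cube of `dim` dimensions.

    A large value means near and far neighbours are easy to tell apart;
    a value close to 0 means all points look roughly equally distant.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


# The contrast shrinks as the number of dimensions grows.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

In low dimensions the contrast is large, while in very high dimensions it shrinks toward zero, so a clustering algorithm cannot rely on raw distances alone and has to choose carefully which aspects of the data it compares.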
What is the future of the data scientist profession?
The profession has existed for only about five years, and the demand is only increasing. As more and more organizations deal with greater amounts of data, I think there will be a growing demand for more people like me.
How about the role of women in science / cybersecurity?
There are few of us, not nearly as many as there should be. We really need to work in order to change that…
What else does our audience need to know about you?
I love sports and animals.