I am a Senior Applied Data Scientist at causaLens aiming to revolutionise the world of AI by stepping away from
traditional "Correlation" Machine Learning and focusing on the future: causal AI. Previously, I worked as a Data
Scientist developing Machine Learning models for credit decisioning at NewDay. I
completed my Master's in Artificial Intelligence at the University of St Andrews, during which I earned
the all-time top grade in the Machine Learning module with a 99% average and graduated with the
Dean’s List Award. Before my Master’s, I graduated from the University of Bath with a First-Class Honours
degree in Computer Science, which included a 13-month placement as a software engineer in the fast-paced world
of Formula 1 with the Scuderia Toro Rosso team.
My goal is to combine the practical, theoretical and mathematical knowledge that machine learning
and data science entail in order to develop meaningful, efficient solutions that contribute directly
to real-life problems and to society. As a data scientist, I enjoy digging deep into the data I work
with to truly understand it and to build the best possible model for real-life scenarios.
As a software engineer, I enjoy fully automated pipelines covered by extensive test suites, well
version-controlled and documented code, developing with state-of-the-art tools, and always learning
more by exploring different projects.
My main programming languages are Python, Java, web languages (JavaScript/HTML/CSS) and SQL. I also have experience programming in C, Bash, GDScript, Haskell, MATLAB, Swift and BASIC (AGK v2).
I have worked with a variety of ML tools such as scikit-learn, Pandas, LightGBM, XGBoost, NumPy, Matplotlib, Keras/TensorFlow, PyTorch, Seaborn, OpenCV and NLTK, as well as general frameworks such as Django, Flask, Bootstrap, jQuery, Node.js, Jekyll, Highcharts and D3.js.
A few of the tools that I have used on a daily basis include JetBrains IDEs (PyCharm, WebStorm, IntelliJ IDEA), JupyterLab, Git (GitHub/Bitbucket), Vim, Travis CI, Heroku, and various operating systems (macOS, Ubuntu, Fedora, Debian, Windows).
Revolutionising the world of AI by stepping away from traditional Correlation Machine Learning and focusing on causal AI. Building causal AI-powered solutions for leading organizations across a wide range of industries.
My Master's thesis, in which CNNs are used as part of a deep learning pipeline on the Curated Breast Imaging Subset of DDSM (CBIS-DDSM) dataset. A divide-and-conquer approach is followed to analyse the effects on performance and efficiency of diverse deep learning techniques, such as varying network architectures (VGG19, ResNet50, InceptionV3, DenseNet121, MobileNetV2), class weights, input sizes, image ratios, pre-processing techniques, transfer learning, dropout rates, and types of mammogram projections.
This project presents the design concepts and implementation steps of a content-based video retrieval system. It was originally inspired by the famous music-matching mobile application Shazam, with the aim of creating a similar system for matching movies. Ultimately, a functional system was built by combining multiple methods into one pipeline and tested with a database of 50 short videos along with various videos recorded on mobile phones, reaching matching accuracies of 93%.
HacK-StacK is a web application designed to help people learn the basics of cyber security so they can defend themselves against online threats. Command-line and information security basics are covered through sets of increasingly challenging problems carried out by typing commands directly from the built-in HacK-StacK terminal. The users who complete the most challenges can reach the top of the leaderboards.
FCF Real-Estate is a catalogue for browsing FCF Real-Estate Monaco's properties available for sale and rent. It features a single XML feed for the data, property sorting, a favourites system, a search engine, an offline mode, multiple language support, an HD image gallery and in-app emailing.
A retro blog built with Hexo and EJS dedicated to my personal hobbies and activities outside of work.
Predicting short-term solar irradiance using deep learning and statistical methods on the Folsom dataset.
My slides and code solutions to the exercises for the 2023 edition of the Stanford Code in Place online course, which I am teaching for.
Building a 2D pixel platformer video game to learn game development. Built using the Godot Engine (3.5).
Python code and notes in the form of Jupyter Notebooks useful for general Machine Learning and Data Science projects.
Machine Learning & data visualisation/processing techniques for classifying seal pups from aerial imagery using Neural Networks.
A Node.js application deployed on Heroku showing the spread of Coronavirus through visualisations designed in D3.js.
Machine Learning & data visualisation techniques for predicting the critical temperatures of superconductors.
Part-of-Speech (POS) tagger for predicting POS tags in sentences from the Brown corpus using the Viterbi algorithm.
Ticket-routing agent using neural networks to learn from ticket data and route newly submitted tickets, tuned with optimal hyperparameters.
Implementation of a simple HTTP server in Java, supporting basic HTTP requests, binary images and multithreading.
Flight route planning agent using classic AI search algorithms (A*, Best-First, DFS, BFS).
The code for this very website, a one-page mobile-friendly website powered by Jekyll acting as an interactive online CV.
Meal tracker and rating iOS application built in Swift to learn the basics of iOS development.
Flask web application parsing F1 crash statistics from Wikipedia and displaying them in charts and tables.
Image matching using intensity-based and feature-based template matching (SIFT) algorithms.
A wall-following rover built using a LeJOS EV3, touch sensors and ultrasonic sensors for an intelligent system research project.
Relaxation technique using POSIX threads (shared memory) and MPI (distributed memory).
One-page mobile-friendly website built with Jekyll and Bootstrap showcasing my online filmmaking projects.
A prototype space-shooter video game created in AGK BASIC v2 to learn the basics of game development through a 2D game engine.
Binary image classifier using Gaussian Mixture Models and computer vision-based feature extraction, written in MATLAB.
Text-based adventure game played in the terminal written in a functional way in Haskell.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import RocCurveDisplay, roc_curve, auc
from xgboost import XGBClassifier  # xgboost version 1.7.6

# Data available on Kaggle here:
# https://www.kaggle.com/datasets/uciml/red-wine-quality-cortez-et-al-2009/
data = pd.read_csv('winequality-red.csv')
data.head()

# Separate targets
X = data.drop('quality', axis=1)
y = data['quality'].map(lambda x: 1 if x >= 7 else 0)  # wine quality >= 7 is good, the rest is not good

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create model and fit, tracking AUC on the test set at every boosting round
params = {
    'eval_metric': 'auc',
    'objective': 'binary:logistic',
}
model = XGBClassifier(**params)
model.fit(X_train, y_train, eval_set=[(X_test, y_test)])
# Plot the AUC recorded on the eval_set at each boosting round
results = model.evals_result()
auc_history = results['validation_0']['auc']
plt.plot(np.arange(len(auc_history)), auc_history)
plt.title("AUC from eval_set")
plt.xlabel("Estimator (boosting round)")
plt.ylabel("AUC")
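As a side note, here is a minimal early-stopping sketch under the same train/test split as above (assuming xgboost >= 1.6, where early_stopping_rounds is a constructor argument): boosting stops once the eval_set AUC has not improved for 10 consecutive rounds.

# Stop adding trees once the eval_set AUC stalls for 10 consecutive rounds
es_model = XGBClassifier(
    eval_metric='auc',
    objective='binary:logistic',
    early_stopping_rounds=10,
)
es_model.fit(X_train, y_train, eval_set=[(X_test, y_test)])
print(es_model.best_iteration, es_model.best_score)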
# ROC curve built from hard class labels: model.predict returns 0/1,
# so the resulting curve only has a single intermediate point
test_predictions = model.predict(X_test)
fpr, tpr, thresholds = roc_curve(y_true=y_test, y_score=test_predictions, pos_label=1)
roc_auc = auc(fpr, tpr)
display = RocCurveDisplay(roc_auc=roc_auc, fpr=fpr, tpr=tpr)
display.plot()
# ROC curve built from predicted probabilities of the positive class,
# which is what roc_curve expects as scores
test_probabs = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_true=y_test, y_score=test_probabs, pos_label=1)
roc_auc = auc(fpr, tpr)
display = RocCurveDisplay(roc_auc=roc_auc, fpr=fpr, tpr=tpr)
display.plot()
Morphological transformations are simple operations based on the image shape, normally performed on binary images. They need two inputs: the original image and a structuring element (or kernel) that decides the nature of the operation. The two basic morphological operators are erosion and dilation; variants such as opening, closing and gradient build on them. You could use a closing operator, which is:

Closing is the reverse of opening: dilation followed by erosion. It is useful for closing small holes inside foreground objects, or small black points on an object. The result would look something like this:
import cv2
import numpy as np

# Read the image in grayscale
img = cv2.imread('<path_to_your_image>', 0)

# 5x5 structuring element (kernel)
kernel = np.ones((5, 5), np.uint8)

# Closing = dilation followed by erosion
closing = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)
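For illustration, a minimal sketch of the equivalence described above (reusing img and kernel from the previous snippet): closing really is just a dilation followed by an erosion with the same kernel.

# Closing expressed explicitly: dilation followed by erosion with the same kernel
dilated = cv2.dilate(img, kernel, iterations=1)
closed_manually = cv2.erode(dilated, kernel, iterations=1)
# closed_manually matches the output of cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)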
dict = {"a" : 1, "b" : 2}
if 'Dictionary key' == "a": return True else: return False
my_dict = {"a" : 1, "b" : 2} for key in my_dict: print(key)
a b
my_dict = {"a" : 1, "b" : 2} for key in my_dict: if key == "a": return True else: return False
my_dict = {"a" : 1, "b" : 2} print("a" in my_dict.keys())
my_dict = {"a" : 1, "b" : 2} for key in my_dict: print(my_dict[key])
1 2
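If you need the keys and values together, dict.items() yields both at once:

my_dict = {"a": 1, "b": 2}
for key, value in my_dict.items():
    print(key, value)

a 1
b 2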
words = ['hello', 'world', 'name', '1', '2018']
words = ['hello', 'world', 'name', '1', '2018']
valid_years = {str(x) for x in range(2000, 2021)}
for word in words:
    if word in valid_years:
        print(word)
Sets let you test for membership in O(1) time, whereas a list has a linear O(length_of_list) lookup cost. As you can see in the comments, there are many different ways of generating the set of valid_years; as long as your data structure is a set, the membership test will be as fast as possible.
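As a rough illustration (a minimal sketch; absolute timings will vary by machine), timeit can show the gap between list and set membership tests:

import timeit

setup = "valid_list = [str(x) for x in range(2000, 2021)]; valid_set = set(valid_list)"

# One million membership tests against each structure
list_time = timeit.timeit("'2018' in valid_list", setup=setup, number=1_000_000)
set_time = timeit.timeit("'2018' in valid_set", setup=setup, number=1_000_000)
print(f"list: {list_time:.2f}s, set: {set_time:.2f}s")  # the set lookup is typically much faster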
When support for time zones is enabled, Django stores datetime information in UTC in the database, uses time-zone-aware datetime objects internally, and translates them to the end user’s time zone in templates and forms.
Time zone support is disabled by default.
# settings.py

# Enable time zone support
USE_TZ = True

# Select a time zone
TIME_ZONE = 'Europe/Rome'
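A minimal sketch of what this changes at runtime (assuming USE_TZ = True as above): timezone.now() returns an aware datetime in UTC, which can be converted to the configured TIME_ZONE with localtime().

from django.utils import timezone

now_utc = timezone.now()                  # aware datetime, handled internally in UTC
now_local = timezone.localtime(now_utc)   # converted to TIME_ZONE ('Europe/Rome')
print(now_utc.tzinfo, now_local.tzinfo)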
import sys

import django
from django.core.management import call_command

django.setup()
stdout = sys.stdout

conf = [
    {
        'file': 'myfile.yaml',
        'models': [
            dict(model='your.model', pks='your, primary, keys'),
            dict(model='your.model', pks='your, primary, keys'),
        ]
    }
]

for fixture in conf:
    print('Processing: %s' % fixture['file'])
    with open(fixture['file'], 'w') as f:
        # FixtureAnonymiser is the custom file-like wrapper defined elsewhere in this post,
        # which anonymises the dumped data before it is written out
        sys.stdout = FixtureAnonymiser(f)
        for model in fixture['models']:
            call_command('dumpdata', model.pop('model'), format='yaml', indent=4, **model)
            sys.stdout.flush()
    sys.stdout = stdout
from django.test import TestCase


class classTest(TestCase):
    # load the anonymised fixture before each test
    fixtures = ('myfile.yaml',)

    def setUp(self):
        """Set up test cases."""
        # create the objects you want to test here, using data from the fixtures

    def test_function(self):
        self.assertEqual(True, True)  # write your test here