I am a Senior Data Scientist at causaLens aiming to revolutionise the world of AI by creating cutting-edge AI
Agents to enable data science teams to develop faster and more effectively, and by stepping away from
"traditional correlation" Machine Learning and focusing on causal AI. On the side, I am also building small SaaS
startups, and am currently working on GlobeFlix.
Previously, I worked as a Data Scientist at NewDay developing Machine Learning models for credit decisioning. I hold a
Master's degree in Artificial Intelligence from the University of St Andrews, during which I earned the all-time top grade
in the Machine Learning module with a 99% average and graduated with the Dean’s List Award. Prior to
this, I earned a First-Class Honours degree in Computer Science from the University of Bath, which included a
13-month placement as a software engineer with the Formula 1 team Scuderia Toro Rosso.
My goal is to combine the practical, theoretical and mathematical knowledge that the fields of machine learning
and data science entail to develop meaningful and efficient solutions with direct contributions to real-life
problems and to society. As a software engineer, I truly enjoy creating fully automated pipelines covered by
extensive test suites, writing well-documented and version-controlled code, analysing complex datasets, and
leveraging state-of-the-art tools to deliver impactful results.
My main programming languages are Python, Java, web-based languages (TypeScript/JavaScript/HTML/CSS) and SQL. I also have experience programming in C, Bash, GodotScript, Haskell, MATLAB, Swift and Basic (AGKv2).
I have worked with a variety of ML tools such as Scikit-Learn, Pandas, LightGBM, XGBoost, NumPy, Matplotlib, Keras/TensorFlow, PyTorch, Seaborn, OpenCV and NLTK, and general frameworks such as Next.js, React, Django, Flask, Bootstrap, Hexo, Jekyll, Highcharts and D3.js.
A few of the tools that I have used on a daily basis include LLMs, JetBrains IDEs (PyCharm, WebStorm, IntelliJ IDEA), JupyterLab, Git (GitHub/Bitbucket), Vim, Travis CI, Heroku, and different operating systems (macOS, Ubuntu, Fedora, Debian, Windows).
Revolutionising the world of AI by creating cutting-edge AI Agents to enable data science teams to develop faster and more effectively, and by stepping away from traditional correlation ML and focusing on causal AI to build causal AI-powered solutions for leading organizations across a wide range of industries.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import RocCurveDisplay, roc_curve, auc
from xgboost import XGBClassifier  # xgboost version 1.7.6

# Data available on Kaggle here: https://www.kaggle.com/datasets/uciml/red-wine-quality-cortez-et-al-2009/
data = pd.read_csv('winequality-red.csv')
data.head()

# Separate features and target
X = data.drop('quality', axis=1)
y = data['quality'].map(lambda x: 1 if x >= 7 else 0)  # wine quality >= 7 is good, rest is not good

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create model and fit
params = {
    'eval_metric': 'auc',
    'objective': 'binary:logistic'
}
model = XGBClassifier(**params)
model.fit(
    X_train,
    y_train,
    eval_set=[(X_test, y_test)]
)
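import numpy as np

results = model.evals_result()
auc_per_round = results['validation_0']['auc']
# One AUC value per boosting round (100 rounds with the default n_estimators)
plt.plot(np.arange(len(auc_per_round)), auc_per_round)
plt.title("AUC from eval_set")
plt.xlabel("Estimator (boosting round)")
plt.ylabel("AUC")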
test_predictions = model.predict(X_test)
fpr, tpr, thresholds = roc_curve(y_true=y_test, y_score=test_predictions, pos_label=1)
roc_auc = auc(fpr, tpr)
display = RocCurveDisplay(roc_auc=roc_auc, fpr=fpr, tpr=tpr)
display.plot()
test_probabs = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_true=y_test, y_score=test_probabs, pos_label=1)
roc_auc = auc(fpr, tpr)
display = RocCurveDisplay(roc_auc=roc_auc, fpr=fpr, tpr=tpr)
display.plot()
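The last two snippets differ only in what is passed as y_score: hard 0/1 labels from predict versus class-1 probabilities from predict_proba. A minimal, library-free sketch of why that can change the AUC is given here. The labels and scores are made up for illustration (they are not the wine data), and pairwise_auc is a hypothetical helper, not a scikit-learn function.

```python
def pairwise_auc(y_true, y_score):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example data, for illustration only
y_true = [0, 0, 0, 1, 1, 1]
probabilities = [0.10, 0.30, 0.45, 0.40, 0.60, 0.90]

# Thresholding at 0.5 throws away the ranking within each predicted class
hard_labels = [1 if p >= 0.5 else 0 for p in probabilities]

auc_from_probabilities = pairwise_auc(y_true, probabilities)
auc_from_labels = pairwise_auc(y_true, hard_labels)
print(auc_from_probabilities, auc_from_labels)
```

With hard labels, many positive/negative pairs become ties, so the two values generally differ. This is why the curve built from predict_proba is the one comparable to the AUC reported through eval_set.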
Morphological transformations are simple operations based on the image shape. They are normally performed on binary images. They need two inputs: the original image, and a structuring element or kernel, which decides the nature of the operation. The two basic morphological operators are Erosion and Dilation. Variant forms such as Opening, Closing and Gradient also come into play. We will see them one-by-one with the help of the following image.
You could use a closing operator, which is:
Closing is the reverse of Opening: Dilation followed by Erosion. It is useful for closing small holes inside the foreground objects, or small black points on the object.
The result would look something like this:
import cv2
import numpy as np

img = cv2.imread('<path_to_your_image>', 0)  # flag 0 loads the image as grayscale
kernel = np.ones((5, 5), np.uint8)  # 5x5 square structuring element
closing = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)
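To make "dilation followed by erosion" concrete, here is a library-free sketch on a tiny made-up binary grid. It assumes a 3x3 square kernel and simply ignores pixels outside the image. The helper names (window, dilate, erode, closing) are hypothetical, and this is an illustration of the idea rather than how OpenCV implements it.

```python
def window(img, r, c):
    """Values in the 3x3 window centred on (r, c), skipping pixels outside the image."""
    rows, cols = len(img), len(img[0])
    return [
        img[i][j]
        for i in range(max(r - 1, 0), min(r + 2, rows))
        for j in range(max(c - 1, 0), min(c + 2, cols))
    ]


def dilate(img):
    """A pixel becomes 1 if any pixel in its window is 1."""
    return [[max(window(img, r, c)) for c in range(len(img[0]))] for r in range(len(img))]


def erode(img):
    """A pixel stays 1 only if every pixel in its window is 1."""
    return [[min(window(img, r, c)) for c in range(len(img[0]))] for r in range(len(img))]


def closing(img):
    """Closing: dilation followed by erosion."""
    return erode(dilate(img))


# Made-up 9x9 binary image: a 5x5 white square with a one-pixel black hole
image = [[1 if 2 <= r <= 6 and 2 <= c <= 6 else 0 for c in range(9)] for r in range(9)]
image[4][4] = 0

closed = closing(image)
for row in closed:
    print(''.join(str(v) for v in row))
```

The dilation fills the hole and grows the square by one pixel; the erosion then shrinks the square back to its original outline, leaving the hole filled.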
dict = {"a" : 1, "b" : 2}
if 'Dictionary key' == "a":
    return True
else:
    return False
my_dict = {"a" : 1, "b" : 2}
for key in my_dict:
    print(key)
a
b
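my_dict = {"a" : 1, "b" : 2}

# has_key_a is a hypothetical wrapper: `return` is only valid inside a function
def has_key_a(d):
    for key in d:
        if key == "a":
            return True
    # only report False once every key has been checked
    return False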
my_dict = {"a" : 1, "b" : 2}
print("a" in my_dict.keys())
my_dict = {"a" : 1, "b" : 2}
for key in my_dict:
    print(my_dict[key])
1
2
words = ['hello', 'world', 'name', '1', '2018']
words = ['hello', 'world', 'name', '1', '2018']
valid_years = {str(x) for x in range(2000, 2021)}
for word in words:
    if word in valid_years:
        print(word)
Sets let you test for membership in O(1) time on average, whereas using a list has a linear O(length_of_list) cost.
As you can see in the comments, there are a lot of different ways of generating the set of valid_years; as long as your data structure is a set, you will have the fastest way of doing what you want.
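As a rough sketch of the point above, here are two equivalent ways of building valid_years, followed by the same membership test. find_valid_years is a hypothetical helper name used only for this illustration.

```python
words = ['hello', 'world', 'name', '1', '2018']

# Two equivalent ways of building the same set of year strings
valid_years = {str(year) for year in range(2000, 2021)}
valid_years_alt = set(map(str, range(2000, 2021)))


def find_valid_years(candidates, years):
    """Return the candidates that are valid years; each `in` check on a set is O(1) on average."""
    return [word for word in candidates if word in years]


matches = find_valid_years(words, valid_years)
print(matches)  # ['2018']
```

However the set is built, the lookup cost stays the same; only the construction step changes.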
When support for time zones is enabled, Django stores datetime information in UTC in the database, uses time-zone-aware datetime objects internally, and translates them to the end user’s time zone in templates and forms.
Time zone support is disabled by default in Django versions before 5.0; from 5.0 onwards, USE_TZ defaults to True.
# enable time zone support
USE_TZ = True
# select a timezone
TIME_ZONE = 'Europe/Rome'
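What "store in UTC, translate to the end user's time zone" amounts to can be sketched with the standard library alone. This is not Django's internal code: the fixed +01:00 offset is an assumption standing in for Europe/Rome in winter, and the timestamp is made up.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed +01:00 offset stands in for Europe/Rome (winter time)
rome_winter = timezone(timedelta(hours=1), name="CET")

# A time-zone-aware datetime as it would be stored: in UTC
stored = datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc)

# Translate to the user's time zone only at display time
displayed = stored.astimezone(rome_winter)

print(stored.isoformat())     # 2024-01-15T09:30:00+00:00
print(displayed.isoformat())  # 2024-01-15T10:30:00+01:00
```

Both objects describe the same instant, so they compare equal; only the wall-clock representation differs.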
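import sys

import django
from django.core.management import call_command

# FixtureAnonymiser is assumed to be defined elsewhere: a file-like wrapper around f
django.setup()
stdout = sys.stdout

conf = [
    {
        'file': 'myfile.yaml',
        'models': [
            dict(model='your.model', pks='your, primary, keys'),
            dict(model='your.model', pks='your, primary, keys')
        ]
    }
]

for fixture in conf:
    print('Processing: %s' % fixture['file'])
    with open(fixture['file'], 'w') as f:
        # send everything dumpdata prints through the anonymiser into the file
        sys.stdout = FixtureAnonymiser(f)

        for model in fixture['models']:
            call_command('dumpdata', model.pop('model'), format='yaml', indent=4, **model)
            sys.stdout.flush()

        # restore the real stdout before processing the next fixture
        sys.stdout = stdout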
from django.test import TestCase


class classTest(TestCase):
    fixtures = ('myfile.yaml',)

    def setUp(self):
        """setup tests cases"""
        # create the object you want to test here, which will use data from the fixtures

    def test_function(self):
        self.assertEqual(True, True)  # write your test here