Module 1: The Role of a Data Scientist: Combining Technical and Non-Technical Skills
What is the required skillset of a Data Scientist?
Combining the technical and non-technical roles of a Data Scientist
The difference between a Data Scientist and a Data Engineer
Exploring the entire lifecycle of Data Science efforts within the organisation
Turning business questions into Machine Learning (ML) and Artificial Intelligence (AI) models
Exploring diverse and wide-ranging data sources that you can use to answer business questions
Examining the difference between Generative AI and Discriminative AI
Module 2: Data Manipulation and Visualisation using Python's Pandas and Matplotlib Libraries
Introducing the features of Python that are relevant to Data Scientists and Data Engineers
Viewing Data Sets using Python’s Pandas library
Importing, exporting, and working with all forms of data, from Relational Databases to Google Images
Selecting, Filtering, Combining, Grouping, and Applying Functions with Python's Pandas library
Dealing with Duplicates, Missing Values, Rescaling, Standardising, and Normalising Data
Visualising data for both exploration and communication with the Pandas, Matplotlib, and Seaborn Python libraries
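As a small illustration of the rescaling, standardising, and normalising topics in this module, the sketch below uses plain Python rather than Pandas so that the arithmetic is visible. The function names and the sample ages are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def min_max_normalise(values):
    """Rescale values linearly onto the range [0, 1].

    Assumes the values are not all identical (otherwise hi - lo is zero).
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardise(values):
    """Convert values to z-scores: mean 0, population standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


ages = [20, 30, 40, 50, 60]           # hypothetical sample data
normalised = min_max_normalise(ages)  # [0.0, 0.25, 0.5, 0.75, 1.0]
z_scores = standardise(ages)
```

In practice the same operations are usually applied to whole columns of a data set at once, but the formulas are the ones shown here.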
Module 3: Preprocessing and Analysing Unstructured Data with Natural Language Processing
Preprocessing Unstructured Data such as web adverts, emails, and blog posts for AI/ML models
Exploring the most popular approaches to Natural Language Processing (NLP), such as stemming and stop-word removal
Preparing a term-document matrix (TDM) of unstructured documents for analysis
Looking at how Data Scientists can integrate Large Language Models (LLMs) in their work
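The term-document matrix topic in this module can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions: the stop-word list is a tiny hypothetical one, tokenisation is a simple whitespace split, and stemming is left out entirely. The helper names are illustrative only.

```python
from collections import Counter

# Tiny hypothetical stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "is", "on", "and"}


def tokenise(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]


def term_document_matrix(documents):
    """Return (terms, matrix): one row per term, one column per document."""
    counts = [Counter(tokenise(d)) for d in documents]
    terms = sorted(set().union(*counts))
    matrix = [[c[t] for c in counts] for t in terms]
    return terms, matrix


docs = ["The cat sat on the mat.", "The dog sat on the cat!"]
terms, tdm = term_document_matrix(docs)
# terms -> ['cat', 'dog', 'mat', 'sat']
# tdm   -> [[1, 1], [0, 1], [1, 0], [1, 1]]
```

Each row counts how often one term appears in each document, which is the numeric form that downstream models can work with.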
Module 4: Linear Regression and Feature Engineering for Business Problem Solving
Expressing a business problem, such as customer revenue prediction, as a linear regression task
Assessing variables as potential Predictors of the required Target (e.g., Education as a predictor of Salary)
Interpreting and Evaluating a Linear Regression model in Python using measures such as RMSE
Exploring the Feature Engineering possibilities to improve the Linear Regression model
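To make the RMSE measure from this module concrete, the sketch below fits a one-variable least-squares line and scores it. It uses the Education and Salary example from the outline, but the numbers are invented for illustration and the function names are hypothetical. It is plain Python, not the library workflow a course would normally use.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def rmse(actual, predicted):
    """Root mean squared error: typical size of a prediction error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical data: years of education vs. salary in thousands.
education = [10, 12, 14, 16, 18]
salary = [25, 31, 34, 42, 48]

slope, intercept = fit_line(education, salary)
predictions = [slope * x + intercept for x in education]
error = rmse(salary, predictions)  # about 1.01 (thousand) on this toy data
```

Because RMSE is expressed in the same units as the Target, it can be read directly as a typical prediction error.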
Module 5: Classification Models and Evaluation for Predictive Analysis
Learning how AI/ML Classifiers are built and used to make predictions such as Customer Churn
Exploring how AI/ML Classification models are built using Training, Validation, and Test data sets
Evaluating the strength of a Decision Tree Classifier
Module 6: Alternative Approaches to Classification and Model Evaluation
Examining alternative approaches to classification
Considering how Activation Functions are integral to Logistic Regression Classifiers
Investigating how Neural Networks and Deep Learning are used to build self-driving cars
Exploring the probability foundations of Naive Bayes classifiers
Reviewing different approaches to measuring the performance of AI/ML Classification Models
Reviewing ROC curves, AUC measures, Precision, Recall, and Confusion Matrices
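The Confusion Matrix, Precision, and Recall measures listed in this module can be computed by hand, as in the sketch below. The churn labels are hypothetical (1 means the customer churned) and the function names are illustrative assumptions.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, fn, tn) for a binary classifier."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted churn, did churn
        elif p == positive:
            fp += 1          # predicted churn, stayed
        elif a == positive:
            fn += 1          # predicted stay, churned
        else:
            tn += 1          # predicted stay, stayed
    return tp, fp, fn, tn


def precision(tp, fp):
    """Share of positive predictions that were correct."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of actual positives that the model found."""
    return tp / (tp + fn)


# Hypothetical labels for ten customers.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)  # 3, 1, 1, 5
p = precision(tp, fp)  # 0.75
r = recall(tp, fn)     # 0.75
```

Precision and Recall answer different questions, so a model can score well on one and poorly on the other.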
Module 7: Clustering Techniques for Customer and Product Segmentation
Uncovering new ways of segmenting your customers, products, or services using clustering algorithms
Exploring what the concept of similarity means to humans and how you can implement it programmatically through distance measures on descriptive variables
Performing top-down clustering with Python’s Scikit-Learn K-Means algorithm
Performing bottom-up clustering with Scikit-Learn’s hierarchical clustering algorithm
Examining clustering techniques on unstructured data (e.g., Tweets, Emails, and Documents)
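The idea of similarity as a distance measure on descriptive variables can be sketched as follows. This shows only the assignment step that K-Means repeats (attach each point to its nearest centroid), not the full Scikit-Learn algorithm. The customer coordinates and centroids are hypothetical and assumed to be already rescaled to comparable units.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def assign_to_nearest(points, centroids):
    """Return, for each point, the index of its closest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        for p in points
    ]


# Hypothetical customers described by two rescaled variables.
customers = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centroids = [(1.0, 1.5), (8.5, 9.0)]

labels = assign_to_nearest(customers, centroids)  # [0, 0, 1, 1]
```

A full K-Means run alternates this step with recomputing each centroid as the mean of its assigned points until the labels stop changing.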
Module 8: Association Rules and Recommender Systems for Business Applications
Building models of customer behaviours or business events from logged data using Association Rules
Evaluating the strength of these models through probability measures of support, confidence, and lift
Employing feature engineering approaches to improve the models
Building a recommender for your customers that is unique to your product/service offering
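The support, confidence, and lift measures named in this module are simple ratios over logged transactions. The sketch below computes them for one rule on a hypothetical set of shopping baskets. The function names are illustrative assumptions, not a particular library's API.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence relative to the consequent's baseline support."""
    return confidence(transactions, antecedent, consequent) / support(
        transactions, consequent
    )


# Hypothetical logged baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread"},
    {"milk"},
    {"butter", "milk"},
]

s = support(baskets, {"bread", "butter"})       # 0.4
c = confidence(baskets, {"bread"}, {"butter"})  # about 0.67
l = lift(baskets, {"bread"}, {"butter"})        # about 1.11
```

A lift above 1 suggests the two items appear together more often than independence would predict, which is what makes a rule a candidate for a recommendation.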
Module 9: Network Analysis for Organisational Insights
Analysing your organisation, its people, and its environment as a network of inter-relationships
Visualising these relationships to uncover previously unseen business insights
Exploring ego-centric and socio-centric methods of analysing connections critical to your organisation
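An ego-centric view of a network can be sketched without any graph library: take one person (the ego), their direct contacts, and the ties among that group. The names and relationships below are hypothetical, and `ego_network` is an illustrative helper rather than an established API.

```python
def ego_network(edges, ego):
    """Return (members, ties) for the ego, its neighbours, and their ties.

    `edges` is a list of undirected (person, person) pairs.
    """
    neighbours = {b for a, b in edges if a == ego}
    neighbours |= {a for a, b in edges if b == ego}
    members = neighbours | {ego}
    ties = [(a, b) for a, b in edges if a in members and b in members]
    return members, ties


# Hypothetical working relationships inside an organisation.
edges = [
    ("Ana", "Ben"),
    ("Ana", "Cal"),
    ("Ben", "Cal"),
    ("Cal", "Dee"),
    ("Dee", "Eli"),
]

members, ties = ego_network(edges, "Ana")
# members -> {'Ana', 'Ben', 'Cal'}
# ties    -> [('Ana', 'Ben'), ('Ana', 'Cal'), ('Ben', 'Cal')]
```

A socio-centric analysis would instead look at the whole edge list at once, for example by counting how many ties each person has.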
Module 10: Big Data Analytics, Communication, and Ethics
Examining Cloud (Microsoft, Amazon, Google) approaches to handling Big Data analytics
Exploring the communications and ethics aspects of being a Data Scientist
Discussing the ethical implications of recent developments in AI
Surveying the paths of continual learning for a Data Scientist