Data is a major topic of conversation today because enormous amounts of it are generated every second. We generate data effortlessly in our day-to-day activities: creating or liking a post on social media, buying a product or adding it to a cart in an e-commerce store, receiving treatment in a health care facility, moving from one place to another, and so on. Hence, there is a need to make the most of this data. The field devoted to doing so is called Data Science.
Data Science is a field of study concerned with making the most out of any sample of data. It combines statistics, mathematics and programming with every process and action that leads to the extraction of meaningful and useful insight from raw data to solve real-world problems. It is a powerful tool for businesses and organisations because it enables them to make well-informed decisions based on what has happened in the past. Data science helps us to see the cause of a particular outcome, so that it can be repeated if wanted and avoided if unwanted. It explains the present condition of a company or organisation and also helps to predict its future condition.
Some of the popular data science subfields are data analytics, machine learning and artificial intelligence.
Data analytics: this is the process of probing a large amount of raw data and extracting reasonable, meaningful and actionable insight in response to the underlying questions. Businesses and companies don't just want to make business decisions; they want to make forward-looking decisions that will drive their growth. Hence, they have questions that need to be strategically answered. Data analysts provide answers to those questions.
Machine Learning: this is the part of AI (Artificial Intelligence) that focuses on developing systems that can learn and improve based on the data fed into them. They do this by identifying trends, patterns and relationships among the variables in a dataset, which then helps them to recognise patterns in, and make predictions on, new data. Machine learning has been used to build solutions such as search engines, fraud detection, disease prediction, detection and treatment, and product recommendation.
Artificial Intelligence: this is the field concerned with building systems that can learn and make decisions with little or no human intervention, as a result of the training they have received. In other words, AI systems are capable of simulating human capabilities. Closely related to AI is deep learning, the part of machine learning that focuses on developing systems that loosely mimic the way the human brain works in order to learn from complex data (images, text, sound, video, time series, etc.). Although it requires a large volume of training data, among other requirements, it is often the more effective way of building a model for such data. A popular example of AI in our world today is the self-driving car.
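To make the machine learning idea above a little more concrete, here is a minimal sketch of a system that "learns" a relationship from data and then predicts on new data. It fits a straight line by ordinary least squares using only plain Python. The data (hours studied versus exam score) and the helper names fit_line and predict are invented for this illustration; real projects would normally rely on a dedicated library rather than hand-written code.

```python
# Toy training data, made up for this illustration:
# hours studied (input) and exam score (value to predict).
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 56.0, 58.0, 60.0]


def fit_line(xs, ys):
    """Learn slope and intercept of y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Apply the learned relationship to a new input."""
    return slope * x + intercept


# "Training": learn the pattern from past data.
slope, intercept = fit_line(hours, scores)

# "Prediction": apply the pattern to an input the model has not seen.
new_prediction = predict(6.0, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"predicted score for 6 hours: {new_prediction:.1f}")
```

The same two-step shape (fit on known data, then predict on new data) carries over to the more advanced models used in practice.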
Get started with free resources
1. Python: learning the Python programming language is a good place to start your data science journey. Python is one of the most widely used programming languages for data science and AI, thanks to its robust libraries and frameworks that make data science projects easier to implement.
Resources:
- Introduction to Python Programming (Udacity)
- Introduction to Python (DataCamp)
- Python for Data Science (Great Learning)
- Python (Kaggle)
2. SQL (Structured Query Language): this is the language used to query and manage data stored in relational databases.
Resources:
- SQL for Data Science (Great Learning)
- SQL for Data Analysis (Udacity)
- SQL for Data Science (edX)
- SQL Tutorials (Mode)
3. Statistics: as stated above, a data scientist's core skills revolve around statistics and programming. Hence, a data scientist should have a working knowledge of statistics.
Resources:
- Statistics for Data Science (Analytics Vidhya)
- Introduction to Data Exploration and Distribution in Statistics
- Statistics for Data Science (Great Learning)
- Statistics for Data Science (freeCodeCamp)
4. Data Science fundamentals: these courses should cover data collection, exploratory data analysis and data cleaning.
Resources:
- Intro to Data Science (Udacity)
- Data Science Fundamentals (cognitiveclass.ai)
- Data Science Basics for Absolute Beginner (Great Learning)
5. Advanced data science courses (machine learning and deep learning)
Resources:
- Machine Learning (Simplilearn)
- Machine Learning Crash Course (Google)
- Machine Learning (Coursera)
- Machine Learning Scientist with Python (DataCamp)
6. Model Deployment: popular frameworks and tools used for deployment include Flask, Django and Streamlit.
Resources:
- Machine Learning Model Deployment Using Flask (Microsoft Power Tools on YouTube)
- Learning Django by building a Project (Raunak Josh on YouTube)
- Machine Learning Model Deployment Using Streamlit (Microsoft Power Tools on YouTube)
7. Non-technical skills: critical thinking, communication and domain knowledge are just as important as the technical skills above.
8. Hackathons: Zindi and Kaggle are great places to start participating in hackathons and competitions.
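To give a feel for how the first three skills in the list above (Python, SQL and statistics) fit together, here is a small sketch that uses only Python's standard library. It loads a few records into an in-memory SQLite database, answers a business-style question with a SQL query, and then summarises the data with basic statistics. The orders table, its columns and every value in it are made up for this illustration.

```python
import sqlite3
import statistics

# Hypothetical order records, invented for this illustration:
# (customer, product, amount spent).
orders = [
    ("Ada", "laptop", 1200.0),
    ("Ada", "mouse", 25.0),
    ("Bayo", "phone", 800.0),
    ("Chidi", "monitor", 300.0),
    ("Chidi", "keyboard", 75.0),
]

# SQL step: store the records in an in-memory database and ask
# "how much has each customer spent in total?"
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
totals = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()
conn.close()

# Statistics step: summarise the individual order amounts.
amounts = [amount for _, _, amount in orders]
mean_amount = statistics.mean(amounts)
median_amount = statistics.median(amounts)

for customer, total in totals:
    print(f"{customer}: {total:.2f}")
print(f"mean order: {mean_amount:.2f}, median order: {median_amount:.2f}")
```

Notice that the mean order amount is well above the median here, because one large purchase pulls the average up. Spotting that kind of difference is exactly what the statistics courses above prepare you for.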
There are also low-code and no-code platforms that enable developers to build solutions by writing little or no code. These platforms include Tableau, Power BI, Power Virtual Agents and BigML.
Tableau is used for building analytics dashboards. Resource: Data Visualisation in Tableau (Udacity).
Power BI is also used for building analytics dashboards targeted at solving business problems. Resources: Diploma in Microsoft Power BI for Beginners (Alison) and Learn Power BI.
Power Virtual Agents is used for developing chatbots. Resource: Create bots with Power Virtual Agents (Microsoft).
BigML is used for building machine learning and deep learning models. Resource: BigML Tutorial.
Welcome to the data world!