AI Expert Roadmap

Roadmap to becoming an Artificial Intelligence Expert in 2020

AMAI GmbH MIT License


Below you will find a set of charts demonstrating the paths that you can take and the technologies that you would want to adopt in order to become a data scientist, a machine learning expert or an AI expert. We created these charts to help our new employees become AI experts, but we also wanted to share them here to help the community.

If you are interested in becoming an AI expert at AMAI in Germany, or if you want to hire an AI expert, please say hi@am.ai.

# Note

👉 An interactive version, with links to further resources for each item on the roadmap, can be found at i.am.ai/roadmap 👈

To receive updates, star ⭐️ and watch 👀 the GitHub repo. You will be notified when we add new content, so you can stay on top of the most recent research.

# Disclaimer

The purpose of these roadmaps is to give you an idea of the landscape and to guide you if you are unsure what to learn next, not to encourage you to pick what is hip and trendy. You should develop an understanding of why one tool is better suited to some cases than another, and remember that hip and trendy does not mean best suited for the job.

# Introduction

AI Expert in 2020

  • Fundamentals
  • Choose your path: Data Scientist; Machine Learning; Deep Learning; Data Engineer; Big Data Engineer
  • Git - Version Control
  • Semantic Versioning
  • Keep a Changelog
  • Papers With Code

Legend: Personal Recommendation!; Available Options; Required for any path
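
Semantic Versioning appears in the chart next to Git and Keep a Changelog. As a rough illustration of why it is useful for tooling, the sketch below parses plain `MAJOR.MINOR.PATCH` strings and bumps them following the SemVer 2.0.0 rules. It deliberately ignores pre-release tags and build metadata (for example `1.0.0-rc.1`), which the full specification also defines; the function names `parse_version` and `bump` are hypothetical.

```python
import re
from typing import Tuple

# Plain MAJOR.MINOR.PATCH without leading zeros; pre-release/build suffixes are NOT handled.
SEMVER_CORE = re.compile(r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$")


def parse_version(version: str) -> Tuple[int, int, int]:
    """Return (major, minor, patch) as integers so versions compare numerically."""
    match = SEMVER_CORE.match(version)
    if match is None:
        raise ValueError(f"not a plain MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def bump(version: str, part: str) -> str:
    """Increment one component and reset the lower ones to zero."""
    major, minor, patch = parse_version(version)
    if part == "major":  # incompatible API change
        return f"{major + 1}.0.0"
    if part == "minor":  # backwards-compatible new functionality
        return f"{major}.{minor + 1}.0"
    if part == "patch":  # backwards-compatible bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


if __name__ == "__main__":
    # A plain string sort would wrongly put "1.10.0" before "1.9.3".
    print(sorted(["1.10.0", "1.2.0", "1.9.3"], key=parse_version))
    print(bump("1.4.2", "minor"))
```

Comparing integer tuples instead of raw strings is the key design choice: it makes `1.10.0` sort after `1.9.3`.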

# Fundamentals

  • Matrices & Linear Algebra Fundamentals

Tabular Data
  • Data Frames & Series
  • Extract, Transform, Load (ETL)
  • Reporting vs BI vs Analytics

Database Basics
  • Relational vs. non-relational databases
  • SQL + Joins (Inner, Outer, Cross, Theta Join)
  • NoSQL
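
To make the join types listed under Database Basics concrete, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The `customers` and `orders` tables and their rows are made up for illustration. A theta join is simply a join whose condition is an arbitrary comparison (here `>`) rather than an equality. Only the left outer join is shown because SQLite added `RIGHT` and `FULL OUTER JOIN` only in version 3.39.

```python
import sqlite3

# Hypothetical toy tables in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, credit_limit REAL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada', 100.0), (2, 'Alan', 50.0), (3, 'Grace', 80.0);
    INSERT INTO orders VALUES (10, 1, 30.0), (11, 1, 120.0), (12, 2, 40.0);
    """
)

# Inner join: only customers that have at least one matching order.
inner = conn.execute(
    "SELECT c.name, o.amount FROM customers c "
    "JOIN orders o ON o.customer_id = c.id ORDER BY o.id"
).fetchall()

# Left outer join: every customer, with NULL (None in Python) where no order matches.
left_outer = conn.execute(
    "SELECT c.name, o.amount FROM customers c "
    "LEFT JOIN orders o ON o.customer_id = c.id ORDER BY c.id, o.id"
).fetchall()

# Cross join: the Cartesian product, 3 customers x 3 orders = 9 rows.
cross = conn.execute("SELECT c.name, o.id FROM customers c CROSS JOIN orders o").fetchall()

# Theta join: pair each customer with every order larger than that customer's limit.
theta = conn.execute(
    "SELECT c.name, o.id FROM customers c "
    "JOIN orders o ON o.amount > c.credit_limit ORDER BY c.id, o.id"
).fetchall()

print(inner)       # [('Ada', 30.0), ('Ada', 120.0), ('Alan', 40.0)]
print(left_outer)  # [('Ada', 30.0), ('Ada', 120.0), ('Alan', 40.0), ('Grace', None)]
print(len(cross))  # 9
print(theta)       # [('Ada', 11), ('Alan', 11), ('Grace', 11)]
```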

Data Formats
  • JSON
  • XML

  • Regular Expressions (RegEx)
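
Regular expressions show up constantly when cleaning raw text. The sketch below uses Python's standard `re` module with named groups to pull fields out of log lines; the log format, the `LOG_LINE` pattern and the `parse_log` helper are invented for this example.

```python
import re

# Hypothetical log format: "<date> <level> <message>", e.g. "2020-11-30 ERROR disk full"
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<level>INFO|WARN|ERROR)\s+(?P<message>.+)$"
)


def parse_log(lines):
    """Return one dict per matching line; lines that do not match are skipped."""
    records = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            records.append(match.groupdict())
    return records


logs = [
    "2020-11-30 ERROR disk full",
    "not a log line",
    "2020-12-01 INFO   job finished",
]
print(parse_log(logs))
# [{'date': '2020-11-30', 'level': 'ERROR', 'message': 'disk full'},
#  {'date': '2020-12-01', 'level': 'INFO', 'message': 'job finished'}]
```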

Python Programming
  • Python Basics: Expressions; Variables; Data Structures; Functions
  • Install packages (via pip, conda or similar)
  • Virtual Environments
  • Important libraries: NumPy; Pandas
  • Code style, e.g. PEP 8

Exploratory Data Analysis / Data Munging / Wrangling
  • Dimensionality & Numerosity...
  • Principal Component Analysis (PCA)
  • Normalization
  • Data Scrubbing, Handling Missing Values
  • Unbiased Estimators
  • Binning sparse values
  • Feature Extraction
  • Denoising
  • Sampling
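
Two of the wrangling steps above, handling missing values and normalization, fit in a few lines of plain Python. This sketch assumes missing entries are encoded as `None` and fills them with the column mean (one common strategy among several, such as the median or model-based imputation), then applies min-max scaling and z-score standardization. The helper names are hypothetical; in practice pandas or scikit-learn would usually handle this.

```python
from statistics import mean, pstdev


def impute_mean(values):
    """Replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale to [0, 1]; assumes the values are not all identical."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Standardize to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


raw = [2.0, None, 4.0, 6.0, None, 8.0]
filled = impute_mean(raw)  # [2.0, 5.0, 4.0, 6.0, 5.0, 8.0]
print(min_max_scale(filled))
print(z_score(filled))
```

Imputing before scaling matters here: `min`, `max` and `pstdev` cannot handle `None` entries.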

  • CSV
  • Jupyter Notebooks / Lab

Data Sources
  • Data Mining
  • Web Scraping
  • Awesome Public Datasets
  • Kaggle

Continue with: Data Scientist or Data Engineer

# Data Science Roadmap

Statistics

Probability Theory
  • Probability distribution
  • Randomness, random variable and...
  • Conditional probability and...
  • (Statistical) independence
  • iid
  • Cumulative distribution function (cdf), probability density function (pdf), probability mass function (pmf)

Continuous distributions (pdfs)
  • Normal / Gaussian
  • Uniform (continuous)
  • Beta
  • Dirichlet
  • Exponential
  • χ² (chi-squared)

Discrete distributions (pmfs)
  • Uniform (discrete)
  • Binomial
  • Multinomial
  • Hypergeometric
  • Poisson
  • Geometric

Summary statistics
  • Expectation and mean
  • Variance and standard deviation (...
  • Covariance and correlation
  • Median, quartile
  • Interquartile range
  • Percentile / quantile
  • Mode

Important Laws
  • Law of large numbers (LLN)
  • Central limit theorem (CLT)

Estimation
  • Confidence Interval (CI)
  • Maximum Likelihood Estimation (MLE)
  • Kernel Density Estimation (KDE)

Hypothesis Testing
  • p-Value
  • Chi² test
  • F-test
  • t-test

  • Monte Carlo Method
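
Several of the statistics topics above meet in the Monte Carlo method. The sketch below estimates π by random sampling (by the law of large numbers, the fraction of points inside the quarter circle converges to π/4) and attaches a confidence interval based on the normal approximation that the central limit theorem justifies. The function name, sample size and seed are illustrative choices, not part of the roadmap.

```python
import math
import random
from statistics import NormalDist


def monte_carlo_pi(n_samples: int, seed: int = 0, confidence: float = 0.95):
    """Estimate pi plus a normal-approximation confidence interval."""
    rng = random.Random(seed)
    # Count points (x, y) drawn uniformly from the unit square that fall in the quarter circle.
    hits = sum(1 for _ in range(n_samples) if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    p_hat = hits / n_samples  # estimates pi / 4
    estimate = 4 * p_hat
    # CLT: p_hat is approximately normal with standard error sqrt(p(1 - p) / n).
    std_err = 4 * math.sqrt(p_hat * (1 - p_hat) / n_samples)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for a 95 % interval
    return estimate, (estimate - z * std_err, estimate + z * std_err)


estimate, (low, high) = monte_carlo_pi(100_000)
print(f"pi ~ {estimate:.4f}, 95% CI [{low:.4f}, {high:.4f}]")
```

Quadrupling the number of samples roughly halves the interval width, which is the usual 1/√n behaviour of Monte Carlo estimates.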

Visualization
  • Chart Suggestions thought starter
  • Python: Matplotlib; seaborn; plotnine (like ggplot in R); ipyvolume (3D data); Bokeh
  • Web: Vega-Lite; D3.js
  • Dashboards: Dash; Streamlit
  • BI: Tableau; Power BI

Continue with: Machine Learning

# Machine Learning Roadmap

Concepts, Inputs & Attributes
  • Categorical Variables
  • Ordinal Variables
  • Numerical Variables

General
  • Cost functions and gradient descent
  • Overfitting / Underfitting
  • Training, validation and test data
  • Precision vs Recall
  • Bias & Variance
  • Lift
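
Cost functions and gradient descent drive the training of many of the methods in this roadmap. As a minimal sketch, assuming a one-feature linear model and mean squared error (MSE) as the cost, batch gradient descent looks like this; the learning rate, epoch count and toy data are arbitrary illustrative values.

```python
def mse(w, b, xs, ys):
    """Cost function: mean squared error of the line y = w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, lr=0.05, epochs=2000):
    """Fit w and b by repeatedly stepping against the gradient of the MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))  # d(MSE)/dw
        grad_b = 2 / n * sum(errors)                             # d(MSE)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3), mse(w, b, xs, ys))
```

If the learning rate is too large (for this data, above roughly 0.15), the updates overshoot and diverge instead of converging.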

Methods

Supervised Learning
  • Regression: Linear Regression; Poisson Regression
  • Classification: Classification Rate; Decision Trees; Naïve Bayes Classifiers; K-Nearest Neighbour; SVM; Logistic Regression

Unsupervised Learning
  • Clustering: Hierarchical Clustering (Agglomerative); K-Means Clustering; DBSCAN; HDBSCAN; OPTICS; Fuzzy C-Means; Mean Shift; Gaussian Mixture Models
  • Association Rule Learning: Apriori Algorithm; ECLAT algorithm; FP Trees
  • Dimensionality Reduction: Principal Component Analysis (PCA); Random Projection; NMF; t-SNE; UMAP

Ensemble Learning
  • Boosting
  • Bagging
  • Stacking

Reinforcement Learning
  • Q-Learning

Use Cases
  • Sentiment Analysis
  • Collaborative Filtering
  • Tagging
  • Prediction

Tools
  • Important libraries: scikit-learn; spaCy (NLP)

Continue with: Deep Learning
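
K-Means, one of the clustering methods listed above, is short enough to write out in full. The sketch below is a plain-Python version of Lloyd's algorithm with random initialisation from the data points. It does not use k-means++ seeding or multiple restarts, which library implementations such as scikit-learn's `KMeans` add by default. The `kmeans` function and the toy points are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return centroids, clusters


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
```

Because the result depends on the initial centroids, real-world use usually runs several restarts and keeps the solution with the lowest within-cluster distance.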

# Deep Learning Roadmap

Papers
  • Deep Learning Papers Reading Roadmap
  • Papers with code
  • Papers with code - state of the art

Neural Networks
  • Understanding Neural Networks
  • Loss Functions
  • Activation Functions
  • Weight Initialization
  • Vanishing / Exploding Gradient Problem

Architectures
  • Feedforward neural network
  • Autoencoder
  • Convolutional Neural Network (CNN): Pooling
  • Recurrent Neural Network (RNN): LSTM; GRU
  • Transformer: Encoder; Decoder; Attention
  • Generative Adversarial Network (GAN)
  • Siamese Network
  • Residual Connections

Training
  • Optimizers: SGD; Momentum; Adam; AdaGrad; AdaDelta; Nadam; RMSProp
  • Learning Rate Schedule
  • Batch Normalization
  • Batch Size Effects
  • Regularization: Early Stopping; Dropout; Parameter Penalties; Data Augmentation; Adversarial Training
  • Multitask Learning
  • Transfer Learning
  • Curriculum Learning

Tools
  • Important Libraries: TensorFlow; PyTorch; Hugging Face Transformers
  • TensorBoard
  • MLflow

Model optimization (advanced)
  • Distillation
  • Quantization
  • Neural Architecture Search (NAS)
  • Evolving Architectures / NEAT

Keep exploring and stay up-to-date
  • Awesome Deep Learning
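
The optimizers listed under Training differ mainly in their update rule. As one example, the sketch below implements gradient descent with (heavy-ball) momentum, the idea behind the Momentum entry, in plain Python on a toy quadratic loss. The exact gradient stands in for the noisy mini-batch gradient that SGD would use, and the learning rate, momentum coefficient and step count are illustrative values. Frameworks such as PyTorch provide production versions (for example `torch.optim.SGD` with its `momentum` argument), with their own formulation details.

```python
def minimize_with_momentum(grad, start, lr=0.02, momentum=0.9, steps=500):
    """Heavy-ball gradient descent: a velocity term accumulates past gradients."""
    params = list(start)
    velocity = [0.0] * len(params)
    for _ in range(steps):
        g = grad(params)
        for i in range(len(params)):
            velocity[i] = momentum * velocity[i] - lr * g[i]
            params[i] += velocity[i]
    return params


# Hypothetical toy loss f(x, y) = (x - 3)^2 + 10 * (y + 1)^2 with its analytic gradient.
# Its minimum is at (3, -1); the two directions have very different curvature.
def toy_grad(p):
    x, y = p
    return [2 * (x - 3), 20 * (y + 1)]


print(minimize_with_momentum(toy_grad, start=[0.0, 0.0]))
```

Momentum helps most on such badly conditioned losses: plain gradient descent would need a learning rate small enough for the steep `y` direction and would then crawl along the flat `x` direction.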

# Data Engineer Roadmap

  • Summary of Data Formats
  • Data Discovery
  • Data Source & Acquisition
  • Data Integration
  • Data Fusion
  • Transformation & Enrichment
  • OpenRefine
  • Data Survey
  • How much data
  • Using ETL
  • Data Lake vs Data Warehouse
  • Dockerize your Python Application

Keep exploring and stay up-to-date
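
Using ETL boils down to three steps. The sketch below walks through extract, transform and load with only the standard library (`csv`, `sqlite3`); the inline CSV data, the `weather` table and the cleaning rules are invented for illustration, and a production pipeline would usually rely on a dedicated tool or orchestrator.

```python
import csv
import io
import sqlite3

# Extract source: in a real pipeline this would be a file, an API or a database;
# here a small inline CSV with a missing value and messy names stands in (hypothetical data).
RAW_CSV = """city,temperature_f
Berlin,50
Munich,
hamburg ,46.4
"""


def extract(text):
    """Read the raw CSV into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Normalize city names, drop rows with missing values, convert Fahrenheit to Celsius."""
    cleaned = []
    for row in rows:
        if not row["temperature_f"].strip():
            continue  # skip incomplete records
        celsius = (float(row["temperature_f"]) - 32) * 5 / 9
        cleaned.append((row["city"].strip().title(), round(celsius, 1)))
    return cleaned


def load(rows, conn):
    """Write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS weather (city TEXT, temperature_c REAL)")
    conn.executemany("INSERT INTO weather VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT * FROM weather ORDER BY city").fetchall())
# [('Berlin', 10.0), ('Hamburg', 8.0)]
```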

# Big Data Engineer Roadmap

Principles
  • Big Data Architectures
  • Architectural Patterns & Best Practices (video)
  • Horizontal vs vertical scaling
  • Map Reduce
  • Data Replication
  • Name & Data Nodes
  • Job & Task Tracker

Tools
  • Hadoop (large data)
  • HDFS
  • Loading data with Sqoop and Pig
  • Flume, Scribe: for unstructured data
  • Data Warehouse with Hive
  • Spark (in memory)
  • Storm: Hadoop realtime
  • Flink
  • Kafka & KSQL
  • Avro
  • Elastic (ELK) Stack: to get data (e.g. logging), search, analyze and visualize it in realtime
  • RAPIDS (on GPU)
  • Dask
  • Numba
  • ONNX
  • OpenVINO
  • MLflow

Databases
  • Cassandra
  • MongoDB, Neo4j

Scalability
  • ZooKeeper
  • Kubernetes

Cloud Services
  • AWS SageMaker
  • Google ML Engine
  • Microsoft Azure Machine Learning Studio

Keep exploring and stay up-to-date
  • Check the Awesome Big Data List
  • Awesome Production ML
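
Map Reduce is easiest to grasp on the classic word-count example. The sketch below runs the map, shuffle and reduce phases sequentially in a single Python process. A framework such as Hadoop distributes the mappers and reducers across nodes, moves the shuffled data over the network and adds fault tolerance, none of which is modelled here. The phase function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Mapper: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle: group all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reducer: combine the grouped values of each key (here, sum the counts)."""
    return {key: sum(values) for key, values in groups.items()}


documents = ["Spark and Hadoop", "Hadoop stores data", "spark processes data"]
mapped = chain.from_iterable(map(map_phase, documents))  # each mapper works independently
word_counts = reduce_phase(shuffle_phase(mapped))
print(word_counts["hadoop"], word_counts["spark"], word_counts["data"])  # 2 2 2
```

Because each mapper only sees its own split and each reducer only sees one key's values, both phases can run in parallel; only the shuffle needs to bring data together.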

# 🚦 Wrap Up

If you think any of the roadmaps can be improved, please do open a PR with any updates and submit any issues. Also, we will continue to improve this, so you might want to watch/star this repository to revisit.

# 🙌 Contribution

Have a look at the contribution docs for how to update any of the roadmaps.

  • Open pull request with improvements
  • Discuss ideas in issues
  • Spread the word
  • Reach out with any feedback

# Supported By

AMAI GmbH

Last Updated: 11/30/2020, 10:35:55 PM