Comprehensive list of data science resources

Originally posted here; this version is kept up to date.

We have gathered the best resources posted recently on DSC. Ideally they would be organized by category, but for now they are organized by date. This is useful too: you have likely seen the older entries already and can focus on the more recent ones. Starred entries have interesting charts.


October 26, 2016

  1. Authoring Books with R Markdown
  2. Feature Selection: The 10-dimensional burrito
  3. From scatter plot to slope chart
  4. Using Big Data for Machine Learning Analytics in Manufacturing 
  5. A Complete Tutorial on Linear Regression with R
  6. Statistical Computing with Stata
  7. Build an AI Writer – Machine Learning for Hackers – Video
  8. Demystifying linear regression and feature selection
  9. Monitoring A/B experiments in real-time
  10. JupyterLab: the next generation of the Jupyter Notebook
  11. Cheat Sheets for Web Developers
  12. How to Start Learning Deep Learning
  13. How to evaluate Data Science models?
  14. Variable selection vs Model selection
  15. Anomaly detection with normal distribution

May 6, 2016

  1. Linux Data Science Virtual Machine
  2. Deep Learning for Beginners
  3. Google BigQuery Public Datasets
  4. How to Remove Duplicates in Large Datasets
  5. When should a Data Scientist use Structural Equation Modeling?
  6. Categorisation of Machine Learning algorithms for business applicat…
  7. Must Know Tips for Deep Learning Neural Networks
  8. Kaggle Releases Data Sets About Global Warming
  9. Book: Mastering Python for Data Science
  10. Public beta of toolkit for developing machine learning for robots a…
  11. NoSQL Performance Benchmarks
  12. Survival Modeling Tutorial using R – Part 1
  13. From Algorithms to Z-Scores: Probabilistic and Statistical Modeling…– Free book
  14. Step by step Kaggle competition tutorial

March 14, 2016

  1. Spark Machine Learning Library Tutorial
  2. Math for Machine Learning – PDF document
  3. Combining the Strengths of MLlib, scikit-learn, and R
  4. Auto-scaling scikit-learn with Spark
  5. What is Machine Learning and Predictive Analytics? A Real World Exa…
  6. A bunch of free data science books
  7. Learning Python For Data Science
  8. How William Cleveland Turned Data Visualization Into a Science *
  9. OpenText Data Visualization – Red Carpet Edition **
  10. A Visual Introduction to Machine Learning
  11. Sharing & Preserving Beautiful Graphs With Your Data
  12. A Complete Tutorial to learn Data Science in R from Scratch
  13. How to create confounders with regression: a lesson from causal inf…

February 7, 2016

  1. Open Data: 7 V’s, and Thousands of Repositories
  2. MapReduce Use Case
  3. Common Probability Distributions: The Data Scientist’s Crib Sheet
  4. A Brief History of Neural Nets and Deep Learning
  5. Announcing the Machine Learning Quora Sessions Series
  6. A Step-By-Step Introduction to SAS Report Procedure
  7. Introduction to Statistics (Part – I)
  8. Top certifications for SAS, R, Python, Machine Learning
  9. 6 points to compare Python and Scala for data science using Spark
  10. Build your own neural network classifier in R

December 6, 2015

  1. Assumptions of Linear Regression **
  2. Support Vector Machines for beginners **
  3. Visualizing Representations: Deep Learning and Human Beings **
  4. Predictive Modeling in R & Topic Modeling in Python – Tutorials (Videos)
  5. Correlation and Linear Regression
  6. Lower Bound for the Number of Examples Needed for Learning – PDF document
  7. Plotly Offline for RStudio and Shiny
  8. Introduction to Circular Statistics – Rao’s Spacing Test
  9. 50+ free online resources to learn more about data science and anal…
  10. Anomaly Detection in Predictive Maintenance with Time Series Analysis
  11. Beautiful dendrogram visualizations in R **
  12. R Programming: 35 Job Interview Questions and Answers

November 16, 2015

  1. A Bayesian Model to Calculate Whether My Wife is Pregnant or Not
  2. Interpolation and smoothing functions in base R
  3. Tools in the data armoury: R vs Spark
  4. Best Big Data, Data Science, Data Mining, and Machine Learning podc…
  5. An Introduction to Linear Models for Data Science
  6. Statistics for Hackers
  7. Apache Spark – Executive Summary
  8. Random vs non-random: How to Tell the Difference
  9. A Visual Survey of Visualization Techniques for Time-Oriented Data **
  10. Top 20 Python Machine Learning Open Source Projects
  11. Identifying paid content (native advertising) from unpaid content
  12. 100 open source Big Data architecture papers for data professionals

September 24, 2015

  1. 10 Misconceptions about Neural Networks **
  2. Deep Learning Libraries by Language
  3. A Visual Introduction to Machine Learning
  4. A Byte of Python – Free Python eBook (Tutorial)
  5. TauCharts. Data focused javascript charting library *
  6. Bokeh is a Python interactive visualization library **
  7. Ebook: Data, all about Big Data ecosystem
  8. How a Kalman filter works, in pictures
  9. A Beginner’s Guide to SQL
  10. How to Become a Data Scientist for Free – By Nir Goldstein
  11. How to Become a Data Scientist for Free – By Zeeshan Usmani
  12. The advantage of short paper titles ** – Technical
  13. Staying in Control with Moving Averages *
  14. How does a relational database work
  15. Understanding the DNA of Data Science – Booz Allen Hamilton
  16. Machine Learning in 7 Pictures
  17. These 4 data sets all have the same mean and variance
  18. Data science projects using Python
  19. Five data science projects to learn data science
  20. A 6-Step Guide To Tracking Social Media In Google Analytics

July 21, 2015

  1. Comprehensive guide for Data Exploration in R
  2. Kaggle R Tutorial on Machine Learning
  3. What is Data Munging? An Example
  4. Building Analytics at 500px
  5. Interview Questions for Data Scientist Positions * – See chart at the bottom explaining isotonic (piecewise linear) regression (useless for predictive analytics)
  6. Analyzing and Visualizing Flows in Rivers and Lakes with MATLAB
  7. 50+ Data Science and Machine Learning Cheat Sheets
  8. Must read books for people interested in Analytics
  9. Step by Step guide to learn Time Series Modelling
  10. Comprehensive Guide to Data Visualization in R
  11. Data Exploration using Pandas in Python
  12. Data science book for ordinary people

June 22, 2015

  1. Predicting Heavy and Extreme Losses in Real-Time for Portfolio Hold…*
  2. htmlwidgets for Rich Data Visualization in R *
  3. Which Big Data, Data Mining, and Data Science Tools go together?  +
  4. Kaggle Ensembling Guide **
  5. Static and dynamic network visualization with R
  6. Correspondence Analysis in R
  7. SparkR will be included in Apache Spark 1.4
  8. So, You Need a Statistically Significant Sample?
  9. Feature Selection using Information Gain in R
  10. Distributed Cache in hadoop MR
  11. A Comparison of Open Source Tools for Sentiment Analysis *
  12. Most popular use cases for Hadoop
  13. 18 free and widely used Open Source NoSQL Databases 
  14. Signal Extraction Methodology (Deep Dive)
  15. The Grammar of Data Science *

June 8, 2015

  1. Lessons learned in high-performance R
  2. Quick R *
  3. R, Python, MATLAB, & Excel Dashboards & Graphs With D3.JS &…**
  4. Python: Why is [] 3 times faster than list()?
  5. 7 Techniques for Data Dimensionality Reduction ** + – Also google “Fast Combinatorial Feature Selection with New Definition of Predictive Power”
  6. Moving from structured database to big data analytics – PDF document (article)
  7. Web Scraping in R for Data Science Colleges 
  8. Data Science for Internet of Things Course
  9. Tools for writing a data science dissertation
  10. Integrating R and Python with Slack

May 12, 2015

  1. Algorithms detect and remove trolls, massively reducing costs * – PDF document
  2. 7 machine learning concepts – These are also statistics or data science techniques.
  3. Top 50 open source web crawlers for data mining
  4. A map of data mining algorithms
  5. Visualizing the ties between big pharma and doctors in France
  6. Binomial Logistic Regression – An Analysis of Stock Dividends
  7. Amazon Machine Learning: use cases and a real example in Python
  8. Tutorial: How to determine the quality and correctness of classific…
  9. Machine Learning Wars: Amazon vs Google vs BigML vs PredicSis
  10. 100x faster Plotly graphs in iPython notebook
  11. An Introduction to Deep Learning and its role for IoT/future cities
  12. 23 Resources for Finding Open Data
  13. Machine learning classes at Oxford University
  14. Columbia data science course, week 1: what is data science?
  15. Book: Beyond Basic Statistics
  16. Book: Data Science from Scratch – First Principles with Python
  17. R for more powerful clustering **

April 18, 2015

  1. Linear Regression Implementation in Python
  2. Various tutorials for data scientists (just started – add yours)
  3. Beginners guide to creating a REST API
  4. Common threads: Awk by example, Part 1
  5. Course: Introduction to C and C++
  6. Course: Probabilistic Systems Analysis and Applied Probability
  7. Understanding Hadoop’s Map-Reduce Application
  8. How To Run Linear Regression In Python SciKit-Learn
  9. Data tidying (R programming)
  10. Free access to 2015 stats and maths journal issues (Wiley)
  11. Causal Inference with Graphical Models
  12. Modern Methods for Sentiment Analysis *

April 11, 2015

  1. Evaluating RF for Survival Analysis Using Prediction Error Curves * – PDF document
  2. The Grammar of Data Science: Python vs R
  3. Turning the Database Inside-Out with Apache Samza
  4. Hierarchical Clustering with R (feat. D3.js and Shiny) **
  5. Does Balancing Classes Improve Classifier Performance? *
  6. Tuning Machine Learning Models Using the Caret R Package
  7. Building an NCAA Men’s Basketball Predictive Model
  8. Ultimate guide for Data Exploration in Python using NumPy, Matplotl…

March 31, 2015

  1. Common Pitfalls in Machine Learning
  2. launches data science platform to the public – Languages offered: R, Python, Julia, SQL
  3. Course: Introduction to Computer Science and Programming
  4. ML in the Valley
  5. Why and how to use random forest variable importance measures
  6. Frequently updated Machine Learning blogs
  7. The Basic Recipe For Machine Learning Explained In A Single PowerPo…
  8. Video: How to Train a Data Scientist
  9. Text Analytics 2015 *
  10. Deep Learning Gurus Talk about History and Future *
  11. Data Visualization cheatsheet

March 16, 2015

  1. Learning Python for Social Scientists
  2. Bayesian statistics tutorial – Very compact, 30 pages in PDF format, by my ex-colleague Sujit Sahu from the Stats Lab @ Cambridge University
  3. TDD makes the difference between Software Engineering and Programming – Video
  4. Test-Driven Development in Python
  5. pythex is a quick way to test your Python regular expressions
  6. Learn to program in Python, a powerful language used by sites like …
  7. Learning Spark Lightning-Fast Big Data Analysis
  8. The Revolution in Astronomy Education: Data Science for the Masses
  9. Text mining in R – Automatic categorization of Wikipedia articles
  10. A Survey of Statistical Methods and Computing for Big Data
  11. Mapping Your Music Collection
  12. How autoencoders defeat the curse of dimensionality
  13. Data Frames in Spark for Large Scale Data Science – Presentation
  14. Deep Learning for Text Understanding, From Scratch

March 2, 2015

  1. An Open-Source JavaScript Library for Mobile-Friendly Interactive Maps
  2. Beginners guide to creating a REST API
  3. MySQL Tutorial
  4. PMML 4.2 – Tree Models
  5. Exploring Machine Learning with Scikit-learn – PyCon 2014 – Video
  6. Whirlwind tour of pandas in 10 minutes
  7. Producing Simple Graphs with R
  8. Scraping Web Data from Trip Advisor with R
  9. Simple but fast reverse geocoding up to city granularity level
  10. Linear Regression In Python Using Stats Models
  11. A Brief Overview of Deep Learning
  12. Awesome Data Science Repository
  13. Hierarchical Clustering in Action – Clustering API
  14. Hierarchical Clustering with R (feat. D3.js and Shiny) **
  15. A Quick Guide to Free and Inexpensive Data Tools **
  16. Intro to pandas data structures
  17. Introduction to basic Text Mining in R
  18. Text Classification for Sentiment Analysis – Eliminate Low Informat…
  19. K-means clustering is not a free lunch
  20. Why and how to use random forest variable importance measures (and …
  21. Common Pitfalls in Machine Learning
  22. The Periodic table for machine learning libraries *
  23. Awesome 1-page R Survival Guides
  24. Distinguishing cause from effect using observational data: methods …– PDF document
  25. Building High-level Features Using Large Scale Unsupervised Learning – PDF document about Google’s deep learning, a nonlinear statistical model with 1 billion parameters. Note that adaptive kernel density estimators have an infinite number of parameters: the kernel bandwidth at each location.
  26. Introduction to supervised machine learning and pattern classification
  27. Avoiding a Common Mistake with Time Series *
  28. Facebook Open Sources deep-learning modules for Torch

February 5, 2015

  1. Code for learning the Structure of Graphical Models **
  2. PokitDok HealthGraph **
  3. Data Wrangling with dplyr and tidyr Cheat Sheet
  4. Deep Learning in a Nutshell
  5. Do we Need Hundreds of Classifiers to Solve Real World Classificati…– PDF document
  6. Video: Advanced Machine Learning with scikit-learn
  7. Predictive Modeling with R and the caret Package
  8. Protovis: A Graphical Toolkit for Visualization
  9. R Data: Data Analysis and Visualization Using R
  10. How to Choose Between Learning Python or R First
  11. Top 50 open source web crawlers for data mining
  12. Year 2014 in Review as Seen by a Event Detection System **
  13. Optimization Algorithms in Machine Learning
  14. Machine Learning course Video
  15. Course from Rice University: An Introduction to Interactive Program…
  16. MapReduce: Simplified Data Processing on Large Clusters
  17. MapReduce Online
  18. Distributed Hash Tables, Part I
  19. One Page R: A Survival Guide to Data Science with R
  20. Abridged List of Machine Learning Topics **
  21. Decision Tree Algorithms – Simplified
  22. DataQuest – Browser-based learning for data science

January 23, 2015

  1. How To Implement These 5 Powerful Probability Distributions In Python
  2. Median Selection Subset Aggregation for Parallel Inference
  3. The caret Package – Short for Classification And REgression Training
  4. Bayesian Machine Learning on Apache Spark
  5. How to Visualize Website Clickstream Data
  6. Practical Data Science in Python
  7. Starting data analysis/wrangling with R – Things I wish I’d been told
  8. Sibyl: A System for Large Scale Machine Learning at Google – Video
  9. Top 77 R posts for 2014
  10. Implementing K-means Clustering to Classify Bank Customer
  11. Data Animations With Python and MoviePy
  12. A Young Person’s Guide to C# Bond
  13. Video: Advanced Machine Learning with scikit-learn
  14. pbdR: programming with big data in R
  15. 14 Best Python Pandas Features
  16. Deep Learning in a Nutshell *
  17. Big Data for Predictive Machine Learning and Data Mining – Research paper, Cornell
  18. R Markdown – About reproducibility of research experiments

January 6, 2015

  1. Machine Learning Discussion Group – Deep Learning with Stanford AI Lab (Video 1 of 3)
  2. Univariate Distribution Relationships – 76 probability distributions
  3. Abridged List of Machine Learning Topics
  4. Deep Learning in Neural Networks: An Overview
  5. Using Word Clouds for Topic Modeling Results
  6. The Split-Apply-Combine Strategy for Data Analysis
  7. Open source dashboard templates 
  8. Configuring a Linux Virtual Machine for Data Science – Step-by-step guide, with Python, R and GIT
  9. Do-it-yourself Crawlers vs. Crawlers as Service
  10. Abridged List of Machine Learning Topics
  11. Recommender Systems 101 – a step by step practical example in R
  12. Programming tools: Adventures with R
  13. Introductory R Presentation
  14. What is a Bayesian Network?

December 24, 2014

  1. Map-Reduce for Machine Learning on Multicore – PDF document
  2. A Map-Reduce Algorithm for Matrix Multiplication
  3. HAMA: An Efficient Matrix Computation with the Map-Reduce Framework – PDF document
  4. Deep Neural Networks are Easily Fooled: High Confidence Predictions…
  5. Video: An Overview of Deep Learning and Its Challenges for Technica…
  6. Representation Learning: A Review and New Perspectives
  7. 5 Amazingly powerful Python libraries for data science
  8. DIY Crawlers vs. Crawlers as Service
  9. 20 new data viz tools and resources of 2014
  10. JavaScript data visualization for R
  11. Controversies in the Foundations of Statistics – Research paper (1978)

December 10, 2014

  1. An open source repository for responsive dashboard templates 
  2. Do we Need Hundreds of Classifiers to Solve Real World Classificati…– PDF document (MIT)
  3. A Dozen Informative Videos on Data Science
  4. An Introduction to Unsupervised Learning via Scikit Learn
  5. 30 data visualization tools
  6. 14 Best Python Pandas Features
  7. What do practitioners need to know about regression?
  8. 10 Big data and analytics tutorials in 2014 – From IBM
  9. Deep Neural Networks are Easily Fooled  – PDF document
  10. What is deep learning? – PDF document
  11. Book: Statistics with R
  12. Automatically making sense of data

December 3, 2014

  1. Simple CSV Data Wrangling with Python
  2. Best Practices for Hadoop. Learn the best practices for applying ad…
  3. Data Science in the Statistics Curricula: Preparing Students to “Th…
  4. R, an Integrated Statistical Programming Environment and GIS
  5. Video: Introduction to Deep Learning with Python
  6. Resources regarding the Julia programming language
  7. Interpreting Confidence Intervals
  8. The learning behind gmail priority inbox

November 17, 2014

  1. Geoffrey Hinton on Deep Learning
  2. Python Packages For Data Mining
  3. Deep Learning Tutorial
  4. Recommender Systems (Machine Learning Summer School 2014 @ CMU)
  5. Tuning Machine Learning Models Using the Caret R Package
  6. Getting Started with Deep Learning and Python

November 6, 2014

  1. Hacker’s guide to Neural Networks
  2. 2015: the Year of Big Data * – Warwick Data Science Institute
  3. Exercise to compare classifier performance
  4. 10 Tips for Better Deep Learning Models
  5. Running R in the Azure ML cloud
  6. Getting Started with Deep Learning and Python
  7. Foundations of Data Science by John Hopcroft & Ravindran Kannan
  8. Videos: 20th ACM SIGKDD Conference on Knowledge Discovery and Data …

October 24, 2014

  1. Tutorial about Deep Belief Network in Python
  2. Cheat sheets for developers
  3. In-depth introduction to machine learning in 15 hours of expert videos
  4. The Python Tutorial
  5. Meta-list of data set repositories for cool data science projects *
  6. K Means Clustering – Effect of random seed
  7. Demographic and lifestyle information by Zipcode – Interesting, but they don’t provide education or age breakdown
  8. Equitability, mutual information, and the maximal information coeff…(Research paper)
  9. New approach to engineering analytics for deployment in streaming a…(Research paper)
  10. 50 Face Recognition APIs
  11. Prediction intervals too narrow

October 16, 2014

  1. Tutorial To Implement k-Nearest Neighbors in Python From Scratch
  2. Exercise to detect Algorithmically Generated Domain Names
  3. Deep Learning Tutorials
  4. Popularity rankings: How to do it Right
  5. How to Prepare Data For Machine Learning
  6. swirl teaches you R programming and data science interactively 
  7. Data Science at the Command Line
  8. Overfitting and Machine Learning
  9. ADW, free software to measure semantic similarity
  10. In-depth introduction to machine learning in 15 hours of expert videos
  11. Hacker’s guide to Neural Networks
  12. Visualizing MNIST: An Exploration of Dimensionality Reduction

October 7, 2014

  1. LASTA: Large Scale Topic Assignment on Multiple Social Networks
  2. Getting Started with Deep Learning and Python
  3. DescTools: a new R “misc package”
  4. Free online access to International Statistical Review article
  5. Top 10 presentations about data science / big data on SlideShare
  6. Automotive Customer Churn Prediction using SVM and SOM
  7. Mirador, a free tool for visual exploration of complex datasets
  8. Scikit-learn: Machine Learning in Python
  9. Forecasting Civil Unrest using Open Source Indicators – Academic Paper

September 25, 2014

  1. Enhancing R with Advanced Compilation Tools and Methods (PDF)
  2. A new open-source package for estimating causal effects in time series
  3. Tutorial To Implement k-Nearest Neighbors in Python From Scratch *
  4. Machine learning resources for beginners 
  5. Mining Massive Datasets – Coursera Course (Stanford)
  6. D3.js Step by Step: A series of posts to get you started 
  7. Rattle package for Data Mining and Data Science in R
  8. Science to Data Science Project (training)

September 14, 2014

  1. Sampling for Big Data –  KDD Powerpoint Presentation
  2. Extracting opinion phrases from user reviews with Stanford CoreNLP
  3. Making interactive 3D plots in an IPython Notebook
  4. The recommender problem revisited
  5. Easy parallel loops in R, Python, Matlab, and Octave
  6. SFO City Crime Analysis with R
  7. Crime Analysis with Shiny & R
  8. Understanding Hadoop Clusters and the Network
  9. Google Chart API

September 7, 2014

  1. Forecasting: principles and practice
  2. How To Estimate Model Accuracy in R Using The Caret Package
  3. A Method of Grouping and Summarizing Data of Big Text Files in R La…
  4. Forecast package for R
  5. A fast learning algorithm for deep belief nets (PDF)
  6. Probabilistic programming in Python
  7. Statistics Functionality in Spark
  8. Most important APIs every Data Scientist should know – What about Google APIs?

August 25, 2014

  1. 100 most popular Machine Learning talks
  2. Baby steps in Python – Exploratory analysis in Python (using Pandas)
  3. Free online book: Neural Networks & Deep Learning
  4. A Large set of Machine Learning Resources for Beginners to Mavens
  5. Enigma Helps Businesses to Use Public Data
  6. Walking the Beat: Mining Seattle’s Police Report Data
  7. R packages
  8. Comparing Statistical Software
  9. 38 Seminal Articles Every Data Scientist Should Read
  10. Introducing tidyr
  11. CMU Machine Learning Summer School Videos

August 12, 2014

  1. 10 R packages I wish I knew about earlier
  2. Download one of the largest synonym dictionaries
  3. Accelerating R Applications with CUDA
  4. Using scikit-learn Pipelines and FeatureUnions
  5. 10 Great R Packages
  6. The Shogun Machine Learning Toolbox (video)
  7. Python for Data Science
  8. What is deep learning, and why should you care?
  9. Using Python’s sci-packages to prepare data for Machine Learning ta…
  10. Data Science 101: SparkR – Interactive R Programs at Scale
  11. Building Data Apps with Python Workshop
  12. How to Build Dashboards that Persuade, Inform, and Engage
  13. Differences between econometrics and statistics: From varying treat…

August 4, 2014

  1. Using Python’s sci-packages to prepare data for Machine Learning ta…
  2. Feature Scaling and Normalization and the effect of standardization…
  3. 10 R Packages to Win Kaggle Competitions
  4. Challenges of Big Data analysis
  5. 22 free tools for data visualization and analysis
  6. Python for Data Science
  7. 10 types of regressions. Which one to use?
  8. 16 analytic disciplines compared to data science

July 28, 2014

  1. Using Python’s sci-packages to prepare data for Machine Learning ta…
  2. An awesome GitHub list of Big Data frameworks, resources, and more
  3. Digital Dashboards: Strategic & Tactical: Best Practices, Tips,…
  4. 15 interviews with 15 data scientists (PDF)
  5. Apache Hadoop : Introduction to Zookeeper (Video)
  6. Comparing the top Hadoop distributions
  7. Scale Free Network *

July 21, 2014

  1. Plyrmr: a data manipulation DSL for big data (PDF)
  2. Parallel computing in R
  3. 10 books to get started with Hadoop
  4. 12 Books and other resources to learn R
  5. 35 books on Data Visualization
  6. Book: An Introduction to Statistical Learning, Using R
  7. Gradient boosting machines, a tutorial

July 14, 2014

  1. Simple script from setting up R, Git, and Jags on Amazon EC2 Ubuntu…
  2. Plyrmr: a data manipulation DSL for big data (PDF)
  3. Parallel computing in R
  4. 10 books to get started with Hadoop
  5. 12 Books and other resources to learn R
  6. Book: An Introduction to Statistical Learning, Using R
  7. Gradient boosting machines, a tutorial

July 6, 2014

  1. 35 books on Data Visualization ~ picture of the week
  2. Coding data and visualization with Javascript, d3.js, and JSON
  3. Must Read Before Attending Any Data Science Interview
  4. Text gender analyzer (API)
  5. Learn R : 12 Books and Online Resources

June 29, 2014

  1. Conjecture: Scalable Machine Learning in Hadoop with Scalding
  2. Challenge of the Week – Random Numbers
  3. Machine Learning is Fun! The world’s easiest introduction to Machin…
  4. Reducing overfitting by randomly dropping 50% of features (PDF)
  5. Mapmaking for R Programmers 
  6. Good list of R function to access, manipulate, summarize, plot and …
  7. Wikipedia Usage Statistics – analyze this 4TB data set, now in AWS …
  8. OrientDB vs MongoDB: OrientDB has a hybrid Document-Graph engine
  9. Federal R&D machine-readable data on 700 sites
  10. NASA opens data to the public as part of new challenge
  11. 35 invaluable books on Data Visualization

June 21, 2014

  1. Cluster analysis in R: determine the optimal number of clusters
  2. Glossary for Analysis Nerds 
  3. Big Data Analytics with Google Big Query and R
  4. Graph gallery
  5. Data Science on
  6. Must read before attending any data science interview
  7. 37 colleges to fulfil your dream of becoming a Data Scientist

June 15, 2014

  1. The Big Data Poster
  2. Machine learning library
  3. A good tutorial on web scraping using R
  4. 70+ websites to get large data repositories for free

June 10, 2014

  1. A tutorial on statistical-learning for scientific data processing
  2. R useful functions
  3. Numeric matrix manipulation: Cheat sheet for MATLAB, Python NumPy, …
  4. Is there a future for Map/Reduce?
  5. Performance improvements for R – PDF doc
  6. Statistical Language Wars – Infographics
  7. 50 selected papers in Data Mining and Machine Learning
  8. Top 10 Algorithms in Data Mining
  9. Build your own recommendation engine with R
  10. A Tour of Machine Learning Algorithms
  11. 100+ Interesting Data Sets for Data Science

June 3, 2014

  1. A Tour of Machine Learning Algorithms
  2. 100+ Interesting Data Sets for Statistics 
  3. Regular Expressions as used in R
  4. Tricking your elephant to do data manipulations 
  5. Step-by-Step Guide to Setting Up an R-Hadoop System
  6. openFDA provides open APIs, raw data downloads
  7. Living Social: What is Your Social Graph and How is it Used?
  8. Signi-Trend: Detecting Significant Trends in Text

May 28, 2014

  1. Advanced R 
  2. Fast out-of-core learning system Vowpal Wabbit 
  3. Where are the Deep Learning Courses? 
  4. 11 Data Science Research Papers 
  5. Calling R from other applications: ZeroMQ, R, PHP, Python 
  6. A Tour of Machine Learning Algorithms

May 3, 2014

  1. MLTK: Machine Learning Toolkit in Java – free download
  2. Python tutorials for data scientists
  3. Experfy, Big Data Consulting Marketplace in Harvard 
  4. Massachusetts releases Big Data Report 2014 – PDF document
  5. Visualizing supervised machine learning with association rules and …
  6. Large selection of courses for data scientists
  7. 9 Free Books for Learning Data Mining & Data Analysis
  8. Reliability and Reproducibility: Fraudulent p-values through multip…
  9. 16 resources to learn and understand Hadoop

March 21, 2014

  1. Machine Learning in 7 Pictures (Tutorial)
  2. Expand Your Data Science Toolbelt with 3 Predictive Model Tests
  3. Machine Learning in Parallel with Support Vector Machines, Generali…
  4. Jackknife logistic and linear regression for clustering and predict…
  5. Create a Mobile App in 3 Easy Steps

Replies to This Discussion

This is a great list of sources.

A good new blog I’ve been following is if want to add it.

