{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Introduction to Python and Jupyter 2: Random Uncertainties"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This week let's use Python to get a better handle on random uncertainties. Suppose that, as in Intro Physics, you are measuring the period of a pendulum of length $\\ell$ made from a mass hung on a string. For small swings, Newtonian theory predicts that this period will be \n",
    "$$\n",
    "T = 2\\pi \\sqrt{ \\frac{\\ell}{g}}. \n",
    "$$\n",
    "\n",
    "In the laboratory, you attempt to time exactly one period of the pendulum, that is, one back-and-forth trip. You use a stopwatch to time this trip repeatedly and get $N$ measurements of the period. Some of these measurements are probably shorter than the true period of the pendulum because you were a bit quick in stopping the watch, and others are probably a bit long. Let's assume you are careful in your measurements and that they are dominated by the random chance of landing a little on one side or the other of the true value. What will the distribution of your measurements look like? As you take more measurements, how will your estimate of the mean of the distribution change? These are the questions that you will take up in this second Python notebook exercise. \n"
   ]
  },
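  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To get a feel for the sizes involved, here is the formula evaluated once by hand. The values $\\ell = 1.00\\ \\mathrm{m}$ and $g \\approx 9.81\\ \\mathrm{m/s^2}$ are illustrative assumptions, not the parameters of the pendulum simulated below: \n",
    "$$\n",
    "T = 2\\pi \\sqrt{ \\frac{1.00\\ \\mathrm{m}}{9.81\\ \\mathrm{m/s^2}}} \\approx 2.01\\ \\mathrm{s}. \n",
    "$$\n",
    "\n",
    "A single measured period will scatter around a value like this one, and that scatter is the subject of this notebook."
   ]
  },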
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Below is a snippet of code that will generate a virtual set of measurements from your pendulum experiment. The number `N1` at the top of the code determines how many measurements you generate. Try generating 12 period measurements and looking at their values. "
   ]
  },
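  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "N1 = 12  # number of simulated period measurements\n",
    "# Draw N1 values from a normal distribution (loc = mean, scale = standard deviation)\n",
    "data1 = [np.random.normal(loc=10.0, scale=1.3) for _ in range(N1)]\n",
    "print(data1)"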
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What do you notice about these data? Suppose that each of these period measurements was made in seconds. Roughly what is the period of your pendulum? "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An invaluable tool in computer programming is the ability to define functions. A function takes inputs and returns a value after combining those inputs in the manner that you prescribe. The syntax for doing this in Python is the `def` keyword, followed by the name of your function with its arguments in parentheses. After these three ingredients you put a colon, which is followed by the indented body of the function. You can have several intermediate calculations on separate lines, as always in Python, but the last statement of a function is usually a `return` statement, which hands back the value computed on that line. For example, the function below computes the Euclidean distance between two points $(x_1,y_1)$ and $(x_2,y_2)$:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def EucDist(xc1,yc1,xc2,yc2):\n",
    "    return math.sqrt((xc1-xc2)**2+(yc1-yc2)**2)"
   ]
  },
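  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A quick way to check a function like this is to call it on inputs whose answer you can work out by hand. As an illustrative choice (these points are not from the exercise), take $(x_1,y_1)=(0,0)$ and $(x_2,y_2)=(3,4)$: \n",
    "$$\n",
    "d = \\sqrt{(0-3)^2+(0-4)^2} = \\sqrt{9+16} = 5, \n",
    "$$\n",
    "so `EucDist(0,0,3,4)` should return `5.0`."
   ]
  },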
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Practice this syntax by writing a function that computes the mean value of a list of data. (Hints: Python has built-in functions that add up the elements of a list and give the length of a list. If you don't know what these functions are called, you can always do an internet search to find them.)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Compute the mean of the list data1. This is the best estimate of the pendulum's period from this data set. Now suppose you made 100 measurements instead of 12: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "N2 = 100  # number of simulated period measurements\n",
    "data2 = [np.random.normal(loc=10.0, scale=1.3) for _ in range(N2)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What is the mean value of this larger data set? Why did it change? Roughly how much did it change? Increase the size of the data set over a few orders of magnitude and see if you can spot any patterns. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By now you will have noticed that your data all fall near one another. This makes sense: you keep measuring the period of the same pendulum. However, it does mean that you have to stop and think a bit about the best way to visualize these repeated measurements. If we labeled the data points 1, 2, 3, etc. and plotted the value of each data point on the y-axis against its label on the x-axis, the plot would be really boring. It would be roughly flat with some variation from point to point. A much more useful visualization is to imagine breaking the interval of your measurement into <i>bins</i> and counting how many of your data fall into each bin. In our current example, we are measuring a period, so we could break the region from 5 secs to 15 secs into ten equal-width bins and count how many of our measurements fall into each bin. These kinds of plots are called histograms. "
   ]
  },
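  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The binning described above can be written out explicitly. As a sketch, with the symbols $\\Delta$ and $k$ introduced here only for illustration, ten equal bins between 5 secs and 15 secs each have width \n",
    "$$\n",
    "\\Delta = \\frac{15\\ \\mathrm{s} - 5\\ \\mathrm{s}}{10} = 1\\ \\mathrm{s}, \n",
    "$$\n",
    "and a measurement $T$ lands in bin number \n",
    "$$\n",
    "k = \\left\\lfloor \\frac{T - 5\\ \\mathrm{s}}{\\Delta} \\right\\rfloor, \\qquad k = 0, 1, \\ldots, 9. \n",
    "$$\n",
    "\n",
    "For example, a hypothetical measurement $T = 10.7\\ \\mathrm{s}$ gives $k = \\lfloor 5.7 \\rfloor = 5$, the bin running from 10 secs to 11 secs. The histogram is then just the count of measurements in each bin."
   ]
  },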
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Generate a data set with 1000 measurements:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "N3 = 1000  # number of simulated period measurements\n",
    "data3 = [np.random.normal(loc=10.0, scale=1.3) for _ in range(N3)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The code below makes a histogram of the data set you have just generated. See if you can understand what each piece is doing. Feel free to do internet searches to figure out particular pieces, but you can also just change them and see what happens. "
   ]
  },
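  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "num_bins = 10\n",
    "# Set the style before creating the figure so that it applies to the whole plot.\n",
    "# Note: on matplotlib 3.6 and later this style is named 'seaborn-v0_8-whitegrid'.\n",
    "plt.style.use('seaborn-whitegrid')\n",
    "plt.figure(dpi=800)\n",
    "# plt.hist returns the counts per bin, the bin edges, and the drawn bar objects\n",
    "n, bins, patches = plt.hist(data3, num_bins, density=False, facecolor='blue', alpha=0.5, label='Counts for period T')\n",
    "plt.ylabel('Counts')\n",
    "plt.xlabel('Period T (secs)')\n",
    "plt.title('A histogram of randomly generated period measurements')\n",
    "plt.legend()\n",
    "#plt.savefig('Hw5Snapshots.png')\n",
    "plt.show()"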
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A striking feature of this distribution is its width. This width is not perfectly well defined, so we have to decide on a conventional way to characterize it. This is what the variance and standard deviation from your Lyons reading accomplish. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using Lyons to help you, define a function that computes the standard deviation of a set of measurements such as `data1`, `data2`, or `data3`. (Starting with the smallest data set will help you debug the code.) Once you've completed your standard deviation function, apply it to `data3` and see whether the result compares reasonably with the histogram you made above. Explain how you conclude that the histogram and the standard deviation function are giving consistent results. "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}