Problem 3

Analyze the height data of a population.

The data is available at: http://jse.amstat.org/v11n2/datasets.heinz.html.

Given a series of measurements in centimeters, compute:

  • The minimum, the maximum and the average height of the population,

  • A histogram of the heights given the number of bins.

Classes:

Measurement(value)

Represent a single measurement of a human height.

Range(start, end)

Represent a range of measurements.

BinRanges(bin_count, lower_bound, ...)

Represent the ranges of the histogram bins.

Histogram(ranges)

Represent a mutable histogram.

Functions:

compute_stats(measurements)

Compute the statistics of the given measurements.

bin_index(ranges, value)

Find the index of the bin range among ranges corresponding to value.

compute_histogram(measurements)

Compute the histogram over measurements.

class Measurement(value: float)

Represent a single measurement of a human height.

Methods:

__new__(cls, value)

Enforce the valid range on the measurement.

static __new__(cls, value: float) -> Measurement

Enforce the valid range on the measurement.

Requires
  • 0 < value < 400

    (Only valid value; the tallest man on earth ever measured was 251cm tall.)
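The range check above can be sketched as follows. This is a minimal illustration, not the documented implementation: it assumes that Measurement subclasses float (which the use of __new__ suggests but the text does not state) and it raises ValueError instead of using a contract library.

```python
class Measurement(float):
    """Hypothetical stand-in for the documented Measurement class."""

    def __new__(cls, value: float) -> "Measurement":
        # Mirror the documented precondition 0 < value < 400.
        # NaN fails both comparisons, so it is rejected as well.
        if not (0 < value < 400):
            raise ValueError(f"Height must lie in (0, 400) cm, got {value!r}")
        return super().__new__(cls, value)


height = Measurement(180.5)
```

Because the check lives in __new__, an out-of-range value never produces an instance at all.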

compute_stats(measurements: List[Measurement]) -> Tuple[float, float, float]

Compute the statistics of the given measurements.

Returns

Minimum, mean, maximum

Requires
  • len(measurements) > 0

Ensures
  • not (len(set(measurements)) != 1)
    or result[0] < result[1] < result[2]
    
  • not (len(set(measurements)) == 1)
    or result[0] == result[1] == result[2]
    

    (Identical measurements all give the same min, average, max)
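One way to satisfy both postconditions is sketched below. It is an assumption-laden illustration that works on plain floats and raises ValueError for the precondition; the early return for identical values is a choice made here so that rounding in the mean cannot break the equality postcondition.

```python
from typing import List, Tuple


def compute_stats(measurements: List[float]) -> Tuple[float, float, float]:
    """Return (minimum, mean, maximum) of a non-empty list."""
    if len(measurements) == 0:
        raise ValueError("At least one measurement is required")

    minimum = min(measurements)
    maximum = max(measurements)

    if minimum == maximum:
        # All values are identical. Returning the value itself avoids
        # rounding error from sum(...) / len(...), e.g. for [0.1, 0.1, 0.1].
        return minimum, minimum, maximum

    mean = sum(measurements) / len(measurements)
    return minimum, mean, maximum
```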

class Range(start: float, end: float)

Represent a range of measurements.

Methods:

__init__(start, end)

Initialize with the given values.

__repr__()

Represent as mathematical range for easier debugging.

__init__(start: float, end: float) -> None

Initialize with the given values.

Requires
  • not math.isnan(end)

  • not math.isnan(start)

  • start < end

__repr__() -> str

Represent as mathematical range for easier debugging.
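A minimal sketch of such a range class is given below. The half-open "[start, end)" notation in __repr__ is an assumption chosen to match the start <= value < end condition used for bin lookup; the exact output format of the documented class is not stated here.

```python
import math


class Range:
    """Hypothetical stand-in for the documented Range class."""

    def __init__(self, start: float, end: float) -> None:
        # Mirror the documented preconditions.
        if math.isnan(start) or math.isnan(end):
            raise ValueError("Range bounds must not be NaN")
        if not start < end:
            raise ValueError("start must be strictly smaller than end")
        self.start = start
        self.end = end

    def __repr__(self) -> str:
        # Assumed notation: closed on the left, open on the right.
        return f"[{self.start}, {self.end})"


example = repr(Range(1.0, 2.5))
```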

class BinRanges(bin_count: int, lower_bound: float, upper_bound: float, include_minus_inf: bool, include_inf: bool)

Represent the ranges of the histogram bins.

Methods:

__new__(cls, bin_count, lower_bound, ...)

Construct bin_count number of histogram bins between lower_bound and upper_bound.

__getitem__()

Get the bin range at the given index.

__len__()

Return the number of the bin ranges.

__iter__()

Iterate over the bin ranges.

static __new__(cls, bin_count: int, lower_bound: float, upper_bound: float, include_minus_inf: bool, include_inf: bool) -> BinRanges

Construct bin_count number of histogram bins between lower_bound and upper_bound.

If include_minus_inf, add a leading bin that extends to -∞. If include_inf, add a trailing bin that extends to +∞. These extra bins are not counted in bin_count.

Requires
  • (
            bin_width := (upper_bound - lower_bound) / bin_count,
            bin_width != 0
    )[1]
    

    (Bin width not numerically zero)

  • not math.isnan(lower_bound) and not math.isinf(lower_bound)

  • not math.isnan(upper_bound) and not math.isinf(upper_bound)

  • lower_bound < upper_bound

Ensures
  • include_inf and include_minus_inf ⇒ len(result) == bin_count + 2

    (bin_count does not refer to +/- inf bins)

  • not include_inf and include_minus_inf ⇒ len(result) == bin_count + 1

    (bin_count does not refer to +/- inf bins)

  • include_inf and not include_minus_inf ⇒ len(result) == bin_count + 1

    (bin_count does not refer to +/- inf bins)

  • not include_inf and not include_minus_inf ⇒ len(result) == bin_count

    (bin_count does not refer to +/- inf bins)

  • not (include_inf ^ math.isinf(result[-1].end))

    (include_inf <=> upper bound of the last bin is +inf)

  • not (include_minus_inf ^ math.isinf(result[0].start))

    (include_minus_inf <=> lower bound of the first bin is -inf)

  • all(
        previous.end == current.start
        for previous, current in common.pairwise(result)
    )
    

    (Bin ranges without a hole)
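The construction contract above can be illustrated with a small function. This is a sketch under assumptions: make_bin_ranges is a hypothetical helper, bins are plain (start, end) tuples rather than Range objects, and the bin_count > 0 check is added here because the division would otherwise fail.

```python
import math
from typing import List, Tuple


def make_bin_ranges(
    bin_count: int,
    lower_bound: float,
    upper_bound: float,
    include_minus_inf: bool,
    include_inf: bool,
) -> List[Tuple[float, float]]:
    """Build contiguous (start, end) bins; hypothetical helper."""
    if bin_count <= 0:  # assumption, not listed in the documented contract
        raise ValueError("bin_count must be positive")
    if not (math.isfinite(lower_bound) and math.isfinite(upper_bound)):
        raise ValueError("Bounds must be finite numbers")
    if not lower_bound < upper_bound:
        raise ValueError("lower_bound must be smaller than upper_bound")

    bin_width = (upper_bound - lower_bound) / bin_count
    if bin_width == 0:
        raise ValueError("Bin width is numerically zero")

    # Compute each edge from its index and pin the last edge to upper_bound.
    # Consecutive bins share an edge, so there is no hole between them.
    edges = [lower_bound + i * bin_width for i in range(bin_count)]
    edges.append(upper_bound)
    ranges = list(zip(edges[:-1], edges[1:]))

    if include_minus_inf:
        ranges.insert(0, (-math.inf, lower_bound))
    if include_inf:
        ranges.append((upper_bound, math.inf))
    return ranges


bins = make_bin_ranges(2, 0.0, 10.0, True, True)
```

With both flags set, two finite bins become four bins in total, matching the bin_count + 2 postcondition.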

__getitem__(index: int) -> Range
__getitem__(index: slice) -> BinRanges

Get the bin range at the given index.

__len__() -> int

Return the number of the bin ranges.

__iter__() -> Iterator[Range]

Iterate over the bin ranges.

bin_index(ranges: BinRanges, value: float) -> int

Find the index of the bin range among ranges corresponding to value.

Requires
  • not math.isnan(value)

Ensures
  • value < ranges[0].start ⇒ result == -1

    (Value not covered in ranges => bin not found)

  • value > ranges[-1].end ⇒ result == -1

    (Value not covered in ranges => bin not found)

  • ranges[0].start <= value <= ranges[-1].end ⇒ 0 <= result < len(ranges)

    (Value in the ranges => bin found)

  • result != -1 ⇒ ranges[result].start <= value < ranges[result].end

    (Index not found or it corresponds to the correct bin range)
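A lookup of this kind can be sketched with a binary search over the bin starts. The sketch makes several assumptions: bins are contiguous (start, end) tuples sorted by start, and every bin is treated as half-open, so a value equal to the end of the last bin returns -1. The listed postconditions appear to differ on that one boundary case; the half-open reading is chosen here.

```python
import bisect
import math
from typing import Sequence, Tuple


def bin_index(ranges: Sequence[Tuple[float, float]], value: float) -> int:
    """Return the index of the bin containing value, or -1 if none does."""
    if math.isnan(value):
        raise ValueError("value must not be NaN")

    # Outside the total span (half-open on the right): no bin.
    if value < ranges[0][0] or value >= ranges[-1][1]:
        return -1

    # The containing bin is the last one whose start is <= value.
    starts = [start for start, _ in ranges]
    return bisect.bisect_right(starts, value) - 1


example_ranges = [(0.0, 5.0), (5.0, 10.0)]
found = [bin_index(example_ranges, v) for v in (-1.0, 0.0, 4.9, 5.0, 10.0)]
```

A shared edge such as 5.0 belongs to the bin that starts there, not to the bin that ends there.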

class Histogram(ranges: BinRanges)

Represent a mutable histogram.

Establishes
  • all(count >= 0 for count in self.counts)

Methods:

__init__(ranges)

Initialize the histogram with zero counts for ranges.

add(value)

Count the value in the corresponding bin.

items()

Iterate over (bin range, count of observations).

Attributes:

ranges

Bin ranges

counts

Count of observations for the given bin

__init__(ranges: BinRanges) -> None

Initialize the histogram with zero counts for ranges.

Requires
  • len(ranges) > 0

ranges

Bin ranges

counts

Count of observations for the given bin

add(value: float) -> None

Count the value in the corresponding bin.

Requires
  • self.ranges[0].start <= value < self.ranges[-1].end

  • not math.isnan(value)

items() -> Iterator[Tuple[Range, int]]

Iterate over (bin range, count of observations).
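The mutable histogram described above might look roughly like the following sketch. It assumes bins are contiguous (start, end) tuples instead of a BinRanges object, uses a linear scan for the lookup to stay self-contained, and raises ValueError where the documented class states preconditions.

```python
import math
from typing import Iterator, List, Tuple

Bin = Tuple[float, float]  # hypothetical stand-in for Range


class Histogram:
    """Hypothetical stand-in for the documented Histogram class."""

    def __init__(self, ranges: List[Bin]) -> None:
        if len(ranges) == 0:
            raise ValueError("At least one bin range is required")
        self.ranges = ranges
        self.counts = [0] * len(ranges)  # one zero count per bin

    def add(self, value: float) -> None:
        # Mirror the documented preconditions of add().
        if math.isnan(value):
            raise ValueError("value must not be NaN")
        if not (self.ranges[0][0] <= value < self.ranges[-1][1]):
            raise ValueError("value lies outside the histogram ranges")

        for i, (start, end) in enumerate(self.ranges):
            if start <= value < end:
                self.counts[i] += 1
                return
        raise AssertionError("unreachable if the bins are contiguous")

    def items(self) -> Iterator[Tuple[Bin, int]]:
        return zip(self.ranges, self.counts)


histogram = Histogram([(0.0, 5.0), (5.0, 10.0)])
for observed in (1.0, 5.0, 7.5):
    histogram.add(observed)
```

Counts only ever increase from zero, which is what keeps the non-negativity invariant intact.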

compute_histogram(measurements: Sequence[Measurement]) -> List[Tuple[Range, int]]

Compute the histogram over measurements.

Returns

List of (bin range, count of observations for that bin)

Requires
  • len(measurements) > 0

Ensures
  • len(measurements) == sum(item[1] for item in result)