Overview

Dataset statistics

Number of variables: 1
Number of observations: 13880
Missing cells: 3254
Missing cells (%): 23.4%
Duplicate rows: 1311
Duplicate rows (%): 9.4%
Total size in memory: 216.9 KiB
Average record size in memory: 16.0 B
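
The percentages and memory figures above follow from the raw counts. A minimal check, assuming the percentages are taken relative to the number of observations and that 1 KiB = 1024 bytes (the variable names are illustrative, not part of the report):

```python
# Counts copied from the dataset statistics above.
n_observations = 13880
missing_cells = 3254
duplicate_rows = 1311
record_size_bytes = 16.0

# Percentages relative to the number of observations (assumption).
missing_pct = 100 * missing_cells / n_observations
duplicate_pct = 100 * duplicate_rows / n_observations

# Total size, assuming every record takes the average record size.
total_kib = n_observations * record_size_bytes / 1024

print(f"missing: {missing_pct:.2f}%")
print(f"duplicates: {duplicate_pct:.2f}%")
print(f"total size: {total_kib:.3f} KiB")
```

The results agree with the reported 23.4%, 9.4%, and 216.9 KiB to the precision shown in the report.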

Variable types

TimeSeries: 1

Timeseries statistics

Number of series: 1
Time series length: 13880
Starting point: 1983-01-01 00:00:00
Ending point: 2020-12-31 00:00:00
Period: 1 day
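
The reported length is consistent with a complete daily calendar between the starting and ending points. A short sketch using pandas (the variable names are illustrative):

```python
import pandas as pd

# Build the full daily index implied by the start, end, and period above.
expected_index = pd.date_range("1983-01-01", "2020-12-31", freq="D")
n_days = len(expected_index)

print(n_days)
```

This suggests the series has one row per calendar day, with missing flow values stored as NaN rather than as absent rows.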

Alerts

Dataset has 1311 (9.4%) duplicate rows (Duplicates)
Flow has 3254 (23.4%) missing values (Missing)

Reproduction

Analysis started: 2024-05-12 19:33:17.351815
Analysis finished: 2024-05-12 19:33:19.358309
Duration: 2.01 seconds
MissingQ_Station_NA_24017640_ok_Missing.csv
Download configuration: config.json

Variables

Flow
Numeric time series

MISSING 

Distinct: 8382
Distinct (%): 78.9%
Missing: 3254
Missing (%): 23.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.13673998
Minimum: -1740.8
Maximum: 1841.4
Zeros: 32
Zeros (%): 0.2%
Memory size: 216.9 KiB

Quantile statistics

Minimum: -1740.8
5-th percentile: -287.595
Q1: -44.32925
Median: 5.95
Q3: 62
95-th percentile: 258.675
Maximum: 1841.4
Range: 3582.2
Interquartile range (IQR): 106.32925
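
The range and interquartile range are derived from the other quantiles: range = maximum - minimum and IQR = Q3 - Q1. A minimal check with the values listed above (variable names are illustrative):

```python
import math

# Quantiles copied from the report.
minimum, maximum = -1740.8, 1841.4
q1, q3 = -44.32925, 62.0

value_range = maximum - minimum  # range = max - min
iqr = q3 - q1                    # IQR = Q3 - Q1

# Compare against the reported figures, allowing for float rounding.
print(math.isclose(value_range, 3582.2, abs_tol=1e-6))
print(math.isclose(iqr, 106.32925, abs_tol=1e-6))
```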

Descriptive statistics

Standard deviation: 193.65542
Coefficient of variation (CV): 1416.2312
Kurtosis: 13.300936
Mean: 0.13673998
Median Absolute Deviation (MAD): 53.55
Skewness: -1.1424242
Sum: 1452.999
Variance: 37502.421
Monotonicity: Not monotonic
Augmented Dickey-Fuller test p-value: 0
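
Several of these figures are tied together by simple identities, which helps when reading the very large coefficient of variation: CV = standard deviation / mean, and the mean here is close to zero. A minimal consistency check, assuming the mean and sum are computed over non-missing values only (variable names are illustrative):

```python
import math

# Values copied from the report.
n_observations, n_missing = 13880, 3254
mean, std = 0.13673998, 193.65542

n_valid = n_observations - n_missing  # non-missing observations
cv = std / mean                       # coefficient of variation
variance = std ** 2                   # variance is the squared std
total = mean * n_valid                # sum = mean * valid count

print(n_valid)
print(round(cv, 2), round(variance, 2), round(total, 2))
```

Under that assumption the results match the reported CV (1416.2312), variance (37502.421), and sum (1452.999) to rounding, so the large CV reflects the near-zero mean rather than an unusually large spread.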
Histogram with fixed size bins (bins=50)

Gap statistics

Number of gaps: 191
Min: 4 days
Max: 2 years and 1 week
Mean: 2 weeks, 4 days and 37 minutes
Std: 10 weeks, 1 day and 9 hours
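
Gap statistics of this kind can be approximated with pandas by measuring the spacing between consecutive non-missing observations. The sketch below uses toy data and treats any spacing longer than one day as a gap; the report does not state the threshold the profiler used, so that threshold and all names here are assumptions:

```python
import numpy as np
import pandas as pd

# Toy daily series with two runs of missing values (not the report's data).
index = pd.date_range("2020-01-01", periods=10, freq="D")
flow = pd.Series(
    [1.0, np.nan, np.nan, 2.0, 3.0, np.nan, 4.0, 5.0, 6.0, 7.0],
    index=index,
    name="Flow",
)

# Time between consecutive non-missing observations.
observed = flow.dropna().index
steps = observed.to_series().diff().dropna()

# Assumed gap definition: spacing longer than the nominal 1-day period.
gaps = steps[steps > pd.Timedelta(days=1)]

print(len(gaps), gaps.min(), gaps.max(), gaps.mean())
```

On the toy series this finds two gaps, of 2 and 3 days. The report's own minimum of 4 days suggests the profiler applies a stricter threshold than the one assumed here.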
Value                  Count  Frequency (%)
0                         32  0.2%
4                         18  0.1%
14                        12  0.1%
18                        12  0.1%
1                         12  0.1%
16                        11  0.1%
20                        11  0.1%
-12                       11  0.1%
23                        11  0.1%
-6                        11  0.1%
Other values (8372)    10485  75.5%
(Missing)               3254  23.4%

Value      Count  Frequency (%)
-1740.8        1  < 0.1%
-1725.2        1  < 0.1%
-1643.7        1  < 0.1%
-1589.6        1  < 0.1%
-1585.66       1  < 0.1%
-1566.6        1  < 0.1%
-1517.2        1  < 0.1%
-1509          1  < 0.1%
-1391.4        1  < 0.1%
-1386.1        1  < 0.1%

Value      Count  Frequency (%)
1841.4         1  < 0.1%
1319.8         1  < 0.1%
1315           1  < 0.1%
1209.68        1  < 0.1%
1169.6         1  < 0.1%
1167           1  < 0.1%
1166.2         1  < 0.1%
1128.6         1  < 0.1%
1126.4         1  < 0.1%
1110.81        1  < 0.1%

ACF and PACF

Interactions


Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.

Sample

Date        Flow
1983-01-01  NaN
1983-01-02  NaN
1983-01-03  8.000000e-01
1983-01-04  -4.700000e+00
1983-01-05  1.421085e-14
1983-01-06  1.460000e+01
1983-01-07  -1.650000e+01
1983-01-08  -4.000000e-01
1983-01-09  4.800000e+00
1983-01-10  -1.200000e+00

Date        Flow
2020-12-22  -60.850
2020-12-23  107.940
2020-12-24  -95.640
2020-12-25  187.020
2020-12-26  -234.030
2020-12-27  33.840
2020-12-28  38.443
2020-12-29  3.841
2020-12-30  29.295
2020-12-31  72.815

Duplicate rows

Most frequently occurring

        Flow  # duplicates
1310     NaN          3254
491      0.0            32
586      4.0            18
524      1.0            12
731     14.0            12
778     18.0            12
351    -12.0            11
374     -9.0            11
398     -6.0            11
451     -2.0            11