Wednesday, April 2, 2025

Variational convolutional neural networks with TensorFlow Probability

About a year ago, Nick Strayer showed how to classify daily activities from smartphone recordings of gyroscope and accelerometer data. Although accuracy was very good, Nick went on to inspect the classification results more closely: Were some activities more prone to misclassification than others? And did the network report erroneous results with just as much confidence as correct ones?

When talking about confidence in this way, we are referring to the probability computed for the winning class after applying the softmax activation. If that probability is 0.9, we might say: the network is pretty sure this is a gentoo; at 0.2, we would conclude: the network is unsure, with a slight lean towards cheetah.

However persuasive this sounds, this use of “confidence” has nothing to do with credibility, credible intervals, or prediction intervals. What we would really like to do is put probability distributions over the network’s weights. Using tfprobability’s variational Keras-compatible layers, this is something we can actually accomplish.

An earlier post showed how to use variational dense layers to obtain estimates of epistemic uncertainty. Here, we adapt the convnet presented by Nick to be variational throughout its architecture. Before we start, let’s briefly recap the task.

The task

The dataset covers various types of activities – walking, sitting, standing, and the transitions between them. Two types of smartphone sensors were used to record the movement data: accelerometers, which measure linear acceleration along three dimensions, and gyroscopes, which track angular velocity around the coordinate axes. Below are the raw sensor data for the six types of transition activities, as shown in Nick’s original post.

Like Nick, we will zoom in on those six types of transition activities and infer them from the sensor data. That requires wrangling the data into a suitable format first. Following Nick’s lead, we start from the data in preprocessed form, already partitioned into training and test sets.

Observations: 289
Variables: 6
$ experiment    <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 14, 17, 18, 19, 2…
$ userId        <int> 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 7, 7, 9, 9, 10, 10, 11…
$ activity      <int> 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7…
$ data          <list> [<data.frame[160 x 6]>, <data.frame[206 x 6]>, <dat…
$ activityName  <fct> STAND_TO_SIT, STAND_TO_SIT, STAND_TO_SIT, STAND_TO_S…
$ observationId <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 14, 17, 18, 19, 2…

Observations: 69
Variables: 6
$ experiment    <int> 11, 12, 15, 16, 32, 33, 42, 43, 52, 53, 56, 57, 11, …
$ userId        <int> 6, 6, 8, 8, 16, 16, 21, 21, 26, 26, 28, 28, 6, 6, 8,…
$ activity      <int> 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8…
$ data          <list> [<data.frame[185 x 6]>, <data.frame[151 x 6]>, <dat…
$ activityName  <fct> STAND_TO_SIT, STAND_TO_SIT, STAND_TO_SIT, STAND_TO_S…
$ observationId <int> 11, 12, 15, 16, 31, 32, 41, 42, 51, 52, 55, 56, 71, …

The code needed to get to this point, copied from Nick’s post, can be found in the appendix at the bottom of this page.

Training pipeline

The dataset in question is small enough to fit into memory – but yours might not be, so it cannot hurt to see some streaming in action. Besides, with recent versions of TensorFlow, Keras models can be trained directly on tfdatasets pipelines with ease.

Once the code listed in the appendix has run, the sensor data can be found in trainData$data, a list column containing data.frames where each row corresponds to a point in time and each column holds one of the measurements. However, not all time series are of the same length, so we pad all series to a common length pad_size (= 338). The expected shape of training batches will then be (batch_size, pad_size, 6).

First, we create the training dataset:

 
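The code for this step is not reproduced on this page; a sketch of what it could look like with keras and tfdatasets follows. The names trainData$data, activityName, and pad_size come from the text above; everything else (e.g., the helper pipeline itself) is an assumption.

```r
library(keras)
library(tfdatasets)
library(purrr)

n_classes <- 6
pad_size <- 338

# pad every time series to pad_size rows --> array of shape (n_obs, pad_size, 6)
train_x <- trainData$data %>%
  map(as.matrix) %>%
  pad_sequences(maxlen = pad_size, dtype = "float64")

# one-hot encode the activity labels --> matrix of shape (n_obs, 6)
train_y <- to_categorical(as.integer(trainData$activityName) - 1,
                          num_classes = n_classes)

# zip features and targets into a single dataset
train_dataset <- zip_datasets(
  tensor_slices_dataset(train_x),
  tensor_slices_dataset(train_y)
)
train_dataset
```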
<ZipDataset shapes: ((338, 6), (6,)), types: (tf.float64, tf.float64)>

Then shuffle and batch it:

 
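A possible way to do this, assuming the dataset created above is named train_dataset; the batch size of 32 is an illustrative assumption.

```r
library(tfdatasets)

batch_size <- 32
train_dataset <- train_dataset %>%
  # shuffle over the whole (small) training set
  dataset_shuffle(buffer_size = nrow(trainData)) %>%
  dataset_batch(batch_size)
train_dataset
```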
<BatchDataset shapes: ((None, 338, 6), (None, 6)), types: (tf.float64, tf.float64)>

Same for the test data.

 
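Again a sketch, mirroring the training pipeline; testData, pad_size, and n_classes are assumed from the surrounding text. Since the test set is small, we use a single batch containing all of it.

```r
library(keras)
library(tfdatasets)
library(purrr)

test_x <- testData$data %>%
  map(as.matrix) %>%
  pad_sequences(maxlen = pad_size, dtype = "float64")

test_y <- to_categorical(as.integer(testData$activityName) - 1,
                         num_classes = n_classes)

# one big batch holding the complete test set
test_dataset <- zip_datasets(
  tensor_slices_dataset(test_x),
  tensor_slices_dataset(test_y)
) %>%
  dataset_batch(nrow(testData))
```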

Using tfdatasets does not mean losing the ability to do a quick sanity check on our data:

 
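For instance, we can pull a single batch and look at the first observation – padding zeros at the top, actual measurements at the bottom. The iteration idiom below is one of several possibilities.

```r
# grab one batch from the pipeline and inspect the first observation
batch <- train_dataset %>%
  reticulate::as_iterator() %>%
  reticulate::iter_next()

batch[[1]][1, , ]
```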
tf.Tensor(
[[ 0.          0.          0.          0.          0.          0.        ]
 [ 0.          0.          0.          0.          0.          0.        ]
 [ 0.          0.          0.          0.          0.          0.        ]
 ...
 [-0.00416672  0.2375      0.12916666 -0.40225476 -0.20463985 -0.14782938]
 [-0.00333334  0.26944447  0.12777779 -0.26755899 -0.02779437 -0.1441642 ]
 [-0.0250001   0.27083334  0.15277778 -0.19639318  0.35094208 -0.16249016]],
 dtype=float64)

Now let’s build the network.

A variational convnet

Building on Nick’s straightforward convolutional architecture, we make minor tweaks to kernel sizes and filter counts. We also remove all dropout layers; no additional regularization is needed on top of the priors applied to the weights.

How does this “Bayesified” network differ from its deterministic counterpart? The following points stand out:

  • Every layer in the network is variational: the convolutional layers as well as the dense ones.

  • With variational layers, we can explicitly configure both the prior weight distribution and the form of the posterior; here, the defaults are used, resulting in a standard normal prior and a default mean-field posterior.

  • The divergence measure used to quantify the mismatch between prior and posterior is under the user’s control; here, we scale the default Kullback-Leibler divergence by the size of the training set.

  • The network’s final layer is special: it is a distribution layer, that is, a layer wrapping a distribution – where “wrapping” means: training the network is business as usual, but predictions are distributions, one for each data point.

 
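The model definition is not included on this page; the following sketch shows what a fully variational convnet along these lines can look like with tfprobability’s Flipout layers. Filter counts and kernel sizes are illustrative assumptions, as are the names kl_div and model; pad_size and trainData are assumed from above.

```r
library(keras)
library(tfprobability)

n_classes <- 6
n_train <- nrow(trainData)   # 289

# scale the KL divergence by the size of the training set
kl_div <- function(q, p, unused)
  tfd_kl_divergence(q, p) / n_train

model <- keras_model_sequential() %>%
  # variational convolutional layers
  layer_conv_1d_flipout(
    filters = 12, kernel_size = 3, activation = "relu",
    input_shape = c(pad_size, 6),
    kernel_divergence_fn = kl_div
  ) %>%
  layer_conv_1d_flipout(
    filters = 24, kernel_size = 5, activation = "relu",
    kernel_divergence_fn = kl_div
  ) %>%
  layer_global_average_pooling_1d() %>%
  # variational dense layers
  layer_dense_flipout(
    units = 48, activation = "relu",
    kernel_divergence_fn = kl_div
  ) %>%
  layer_dense_flipout(
    units = n_classes,
    kernel_divergence_fn = kl_div
  ) %>%
  # final layer wraps a distribution: predictions are distributions
  layer_one_hot_categorical(event_size = n_classes)
```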

The network is trained by minimizing the negative log likelihood of the data under the output distribution.

This quantity will automatically become part of the loss – but only a small fraction of it. The dominant contribution to the loss is the sum of the KL divergences, which is added via model$losses.

It is instructive to watch both components of the loss in isolation. To that end, we define two metrics:

 
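The original metric definitions are not reproduced here; one way to watch the two parts is sketched below. It assumes model is the variational network defined earlier, and that – the final layer being a distribution layer – the loss and metrics receive the output distribution as y_pred; the exact mechanics can depend on the Keras and TensorFlow Probability versions in use.

```r
library(keras)
library(tensorflow)
library(tfprobability)

# the KL part: the sum of the divergences the variational layers register
kl_part <- custom_metric("kl_part", function(y_true, y_pred) {
  tf$reduce_sum(model$losses)
})

# the NLL part: negative log probability of the true labels
# under the output distribution
nll_part <- custom_metric("nll_part", function(y_true, y_pred) {
  -tfd_log_prob(y_pred, y_true)
})
```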

In this setup, we allow for more training epochs than Nick did, while also enabling early stopping:

 
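A sketch of compilation and training; the epoch count and early-stopping patience are placeholders, not necessarily the values used in the post, and train_dataset / test_dataset are assumed from above.

```r
library(keras)
library(tfprobability)

# negative log likelihood of the labels under the predicted distribution
nll <- function(y_true, y_pred) -tfd_log_prob(y_pred, y_true)

model %>% compile(
  optimizer = "rmsprop",
  loss = nll,
  metrics = "accuracy"
)

history <- model %>% fit(
  train_dataset,
  epochs = 1000,
  validation_data = test_dataset,
  callbacks = list(
    callback_early_stopping(monitor = "val_loss", patience = 10)
  )
)
```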

As training proceeds, the overall loss decreases roughly linearly over epochs; the same cannot be said of classification accuracy and the negative log likelihood part of the loss, which follow patterns of their own.

Final accuracy in this variational setup is not quite as high as in the non-variational one, but it is still respectable for a six-class problem. We also see that, absent any additional regularization, there is very little overfitting to the training data.

Now, how do we obtain predictions from this model?

Probabilistic predictions

As an aside – we won’t pursue this here – the trained model gives us access to more than just output distributions: via the kernel_posterior attribute, we can conveniently inspect the hidden layers’ posterior weight distributions.

Given the small size of the test set, we compute all predictions at once. The predictions now come as a batch of categorical distributions, one for each sample.

 
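A sketch of this step: we collect the single test batch into memory and run one forward pass. The iteration idiom and the name test_x are assumptions; one_shot_preds follows the prefix mentioned in the text.

```r
# collect the single batch holding the complete test set
test_batch <- test_dataset %>%
  reticulate::as_iterator() %>%
  reticulate::iter_next()
test_x <- test_batch[[1]]

# one pass through the network: layer weights are sampled
# from their posteriors exactly once
one_shot_preds <- model(test_x)
one_shot_preds
```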
tfp.distributions.OneHotCategorical(
 "sequential_one_hot_categorical_OneHotCategorical_OneHotCategorical",
 batch_shape=[69], event_shape=[6], dtype=float32)

We prefixed these predictions with one_shot to indicate their noisy nature: they result from a single pass through the network, with all layer weights sampled from their respective posterior distributions.

From the predicted distributions, we compute means and standard deviations per (test) sample:

 
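With tfprobability, both quantities can be read off the predicted distributions directly; one_shot_preds is assumed from the previous step.

```r
library(tfprobability)

# per-sample, per-class means and standard deviations
# of the predicted one-hot-categorical distributions
probs <- one_shot_preds %>% tfd_mean() %>% as.matrix()
sds <- one_shot_preds %>% tfd_stddev() %>% as.matrix()
```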

The standard deviations thus obtained could be said to reflect the overall predictive uncertainty. We can isolate another kind of uncertainty – the epistemic part – by making repeated passes through the network and then, for each observation and class, computing the standard deviation of the predicted means across passes.

 
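A sketch of this Monte Carlo procedure; the number of passes (100) is an assumption, and test_x is assumed to hold the padded test inputs.

```r
library(purrr)
library(tfprobability)

# repeat the forward pass, each time with freshly sampled weights,
# and record the predicted means
n_passes <- 100
mc_means <- map(seq_len(n_passes), function(i) {
  model(test_x) %>% tfd_mean() %>% as.matrix()
})

# per-sample, per-class standard deviation of the means across passes
mc_sds <- apply(simplify2array(mc_means), c(1, 2), sd)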

Now we put everything – means, standard deviations, and the Monte Carlo standard deviations of the means, along with the activity name corresponding to each class – into a single data frame:

 
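One possible way to assemble that data frame, assuming probs, sds, and mc_sds are the 69 x 6 matrices computed above; the helper to_long is a hypothetical name, and the class-to-activity mapping is read off the output below.

```r
library(dplyr)
library(tidyr)
library(tibble)

activity_names <- c("STAND_TO_SIT", "SIT_TO_STAND", "SIT_TO_LIE",
                    "LIE_TO_SIT", "STAND_TO_LIE", "LIE_TO_STAND")

# reshape a 69 x 6 matrix into long format: one row per (obs, class)
to_long <- function(m, value_name) {
  colnames(m) <- paste0("V", 1:6)
  as_tibble(m) %>%
    mutate(obs = row_number()) %>%
    pivot_longer(-obs, names_to = "class", values_to = value_name)
}

all_preds <- to_long(probs, "mean") %>%
  inner_join(to_long(sds, "sd"), by = c("obs", "class")) %>%
  inner_join(to_long(mc_sds, "mc_sd"), by = c("obs", "class")) %>%
  mutate(label = factor(class, levels = paste0("V", 1:6),
                        labels = activity_names)) %>%
  select(obs, class, mean, sd, mc_sd, label)
all_preds
```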
# A tibble: 414 x 6
     obs class       mean      sd    mc_sd label
   <int> <chr>      <dbl>   <dbl>    <dbl> <fct>
 1     1 V1    0.945      0.227   0.0743   STAND_TO_SIT
 2     1 V2    0.0534     0.225   0.0675   SIT_TO_STAND
 3     1 V3    0.00114    0.0338  0.0346   SIT_TO_LIE
 4     1 V4    0.00000238 0.00154 0.000336 LIE_TO_SIT
 5     1 V5    0.0000132  0.00363 0.00164  STAND_TO_LIE
 6     1 V6    0.0000305  0.00553 0.00398  LIE_TO_STAND
 7     2 V1    0.993      0.0813  0.149    STAND_TO_SIT
 8     2 V2    0.00153    0.0390  0.102    SIT_TO_STAND
 9     2 V3    0.00476    0.0688  0.108    SIT_TO_LIE
10     2 V4    0.00000172 0.00131 0.000613 LIE_TO_SIT
# … with 404 extra rows

Comparing the predictions with the ground truth:

 
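A sketch of the comparison: per observation, keep the winning class along with its mean, standard deviation, and Monte Carlo standard deviation, and match it against the true label. all_preds and testData are assumed from above; the name preds_vs_truth is hypothetical.

```r
library(dplyr)

preds_vs_truth <- all_preds %>%
  group_by(obs) %>%
  summarise(
    maxprob = max(mean),
    maxprob_sd = sd[which.max(mean)],
    maxprob_mc_sd = mc_sd[which.max(mean)],
    predicted = label[which.max(mean)],
    .groups = "drop"
  ) %>%
  # the ground truth comes straight from the test set
  mutate(truth = testData$activityName,
         correct = predicted == truth)
preds_vs_truth
```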
# A tibble: 69 x 7
     obs maxprob maxprob_sd maxprob_mc_sd predicted    truth        correct
   <int>   <dbl>      <dbl>         <dbl> <fct>        <fct>        <lgl>
 1     1   0.945     0.227         0.0743 STAND_TO_SIT STAND_TO_SIT TRUE
 2     2   0.993     0.0813        0.149  STAND_TO_SIT STAND_TO_SIT TRUE
 3     3   0.733     0.443         0.131  STAND_TO_SIT STAND_TO_SIT TRUE
 4     4   0.796     0.403         0.138  STAND_TO_SIT STAND_TO_SIT TRUE
 5     5   0.843     0.364         0.358  SIT_TO_STAND STAND_TO_SIT FALSE
 6     6   0.816     0.387         0.176  SIT_TO_STAND STAND_TO_SIT FALSE
 7     7   0.600     0.490         0.370  STAND_TO_SIT STAND_TO_SIT TRUE
 8     8   0.941     0.236         0.0851 STAND_TO_SIT STAND_TO_SIT TRUE
 9     9   0.853     0.355         0.274  SIT_TO_STAND STAND_TO_SIT FALSE
10    10   0.961     0.195         0.195  STAND_TO_SIT STAND_TO_SIT TRUE
11    11   0.918     0.275         0.168  STAND_TO_SIT STAND_TO_SIT TRUE
12    12   0.957     0.203         0.150  STAND_TO_SIT STAND_TO_SIT TRUE
13    13   0.987     0.114         0.188  SIT_TO_STAND SIT_TO_STAND TRUE
14    14   0.974     0.160         0.248  SIT_TO_STAND SIT_TO_STAND TRUE
15    15   0.996     0.0657        0.0534 SIT_TO_STAND SIT_TO_STAND TRUE
16    16   0.886     0.318         0.0868 SIT_TO_STAND SIT_TO_STAND TRUE
17    17   0.773     0.419         0.173  SIT_TO_STAND SIT_TO_STAND TRUE
18    18   0.998     0.0444        0.222  SIT_TO_STAND SIT_TO_STAND TRUE
19    19   0.885     0.319         0.161  SIT_TO_STAND SIT_TO_STAND TRUE
20    20   0.930     0.255         0.271  SIT_TO_STAND SIT_TO_STAND TRUE
# … with 49 extra rows

Are standard deviations higher for misclassifications?

 
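A sketch of the aggregation, assuming the per-observation comparison data frame from above is named preds_vs_truth:

```r
library(dplyr)

preds_vs_truth %>%
  group_by(correct) %>%
  summarise(count = n(),
            avg_mean = mean(maxprob),
            avg_sd = mean(maxprob_sd),
            avg_mc_sd = mean(maxprob_mc_sd))
```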
# A tibble: 2 x 5
  correct count avg_mean avg_sd avg_mc_sd
  <lgl>   <int>    <dbl>  <dbl>     <dbl>
1 FALSE      19    0.775  0.380     0.237
2 TRUE       50    0.879  0.264     0.183

They are, although to a lesser degree than we might wish for.

With just six classes, we can also inspect standard deviations at the level of individual truth–prediction pairings:

 
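A sketch of this breakdown, again assuming the comparison data frame is named preds_vs_truth:

```r
library(dplyr)

preds_vs_truth %>%
  group_by(truth, predicted) %>%
  summarise(cnt = n(),
            avg_mean = mean(maxprob),
            avg_sd = mean(maxprob_sd),
            avg_mc_sd = mean(maxprob_mc_sd),
            correct = correct[1]) %>%
  arrange(desc(cnt))
```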
# A tibble: 14 x 7
# Groups:   truth [6]
   truth        predicted      cnt avg_mean avg_sd avg_mc_sd correct
   <fct>        <fct>        <int>    <dbl>  <dbl>     <dbl> <lgl>
 1 SIT_TO_STAND SIT_TO_STAND    12    0.935  0.205    0.184  TRUE
 2 STAND_TO_SIT STAND_TO_SIT     9    0.871  0.284    0.162  TRUE
 3 LIE_TO_SIT   LIE_TO_SIT       9    0.765  0.377    0.216  TRUE
 4 SIT_TO_LIE   SIT_TO_LIE       8    0.908  0.254    0.187  TRUE
 5 STAND_TO_LIE STAND_TO_LIE     7    0.956  0.144    0.132  TRUE
 6 LIE_TO_STAND LIE_TO_STAND     5    0.809  0.353    0.227  TRUE
 7 SIT_TO_LIE   STAND_TO_LIE     4    0.685  0.436    0.233  FALSE
 8 LIE_TO_STAND SIT_TO_STAND     4    0.909  0.271    0.282  FALSE
 9 STAND_TO_LIE SIT_TO_LIE       3    0.852  0.337    0.238  FALSE
10 STAND_TO_SIT SIT_TO_STAND     3    0.837  0.368    0.269  FALSE
11 LIE_TO_STAND LIE_TO_SIT       2    0.689  0.454    0.233  FALSE
12 LIE_TO_SIT   STAND_TO_SIT     1    0.548  0.498    0.0805 FALSE
13 SIT_TO_STAND LIE_TO_STAND     1    0.530  0.499    0.134  FALSE
14 LIE_TO_SIT   LIE_TO_STAND     1    0.824  0.381    0.231  FALSE

Here too, standard deviations are somewhat higher for wrong predictions, though they remain rather small overall.

Conclusion

We have shown how to build, train, and obtain predictions from a fully variational convnet. Clearly, there is room for experimentation: alternative layer implementations exist; a custom prior could be specified; the divergence could be calculated differently; and the usual neural network hyperparameter tuning options apply.

Then, there is the question of consequences – of decisions. What should happen in high-uncertainty cases, and what even counts as high uncertainty? Questions like these naturally fall outside the scope of this setup, but their importance in practical applications cannot be overstated.
Thanks for reading!

Appendix

The setup code below needs to be run before executing any of the code above. It is copied from Nick’s post.

 
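The copied code itself is not reproduced on this page. As a heavily simplified sketch of what that preprocessing involves – file names follow the UCI archive’s layout, the test users are read off the glimpse output above, and all helper names are assumptions, not necessarily Nick’s exact code:

```r
library(tidyverse)

# one row per labeled segment: experiment, user, activity,
# and start/end positions within the raw recording
labels <- read_table(
  "HAPT Data Set/RawData/labels.txt",
  col_names = c("experiment", "userId", "activity", "startPos", "endPos")
)

# read one experiment's accelerometer + gyroscope recordings
# and extract the rows belonging to a given segment
read_segment <- function(experiment, userId, startPos, endPos) {
  exp <- sprintf("%02d", experiment)
  usr <- sprintf("%02d", userId)
  acc <- read_table(
    file.path("HAPT Data Set/RawData", paste0("acc_exp", exp, "_user", usr, ".txt")),
    col_names = c("aX", "aY", "aZ"))
  gyro <- read_table(
    file.path("HAPT Data Set/RawData", paste0("gyro_exp", exp, "_user", usr, ".txt")),
    col_names = c("gX", "gY", "gZ"))
  bind_cols(acc, gyro)[startPos:endPos, ]
}

# keep only the six postural transitions (activity ids 7-12),
# nest the raw readings into a list column
transition_names <- c("STAND_TO_SIT", "SIT_TO_STAND", "SIT_TO_LIE",
                      "LIE_TO_SIT", "STAND_TO_LIE", "LIE_TO_STAND")

activity_data <- labels %>%
  filter(activity >= 7) %>%
  mutate(
    data = pmap(list(experiment, userId, startPos, endPos), read_segment),
    activityName = factor(activity, levels = 7:12, labels = transition_names),
    observationId = row_number()
  )

# hold out some users entirely for testing
test_users <- c(6, 8, 16, 21, 26, 28)
trainData <- activity_data %>% filter(!userId %in% test_users)
testData  <- activity_data %>% filter(userId %in% test_users)
```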
Reyes-Ortiz, J.-L., Oneto, L., Samà, A., Parra, X., & Anguita, D. 2016. “Transition-Aware Human Activity Recognition Using Smartphones.” Neurocomputing 171 (C): 754–67.
