Browse Source

hs-v3: BUG() on missing descriptors during rotation

Because of #25306 for which we are unable to reproduce nor understand how it
is possible, this commit removes the asserts() and BUG() on the missing
descriptors instead when rotating them.

This allows us to log more data on error but also to let tor recover
gracefully instead of dying.

Signed-off-by: David Goulet <dgoulet@torproject.org>
David Goulet 6 years ago
parent
commit
5804ccc907
2 changed files with 34 additions and 4 deletions
  1. 6 0
      changes/bug25306
  2. 28 4
      src/or/hs_service.c

+ 6 - 0
changes/bug25306

@@ -0,0 +1,6 @@
+  o Minor bugfixes (hidden service v3):
+    - Avoid asserting when building descriptors in the next rotation time is
+      out of sync with the consensus valid after time. Instead, log a bug
+      warning with extra information to hunt down the cause of this assert.
+      Fixes bug 25306; bugfix on 0.3.2.1-alpha.
+

+ 28 - 4
src/or/hs_service.c

@@ -1508,7 +1508,9 @@ build_all_descriptors(time_t now)
      * empty, we'll try to build it for the next time period. This only
      * happens when we rotate meaning that we are guaranteed to have a new SRV
      * at that point for the next time period. */
-    tor_assert(service->desc_current);
+    if (BUG(service->desc_current == NULL)) {
+      continue;
+    }
 
     if (service->desc_next == NULL) {
       build_service_descriptor(service, now, hs_get_next_time_period_num(0),
@@ -1925,6 +1927,31 @@ should_rotate_descriptors(hs_service_t *service, time_t now)
   }
 
   if (ns->valid_after >= service->state.next_rotation_time) {
+    /* In theory, we should never get here with no descriptors. We can never
+     * have a NULL current descriptor except when tor starts up. The next
+     * descriptor can be NULL after a rotation but we build a new one right
+     * after.
+     *
+     * So, when tor starts, the next rotation time is set to the start of the
+     * next SRV period using the consensus valid after time so it should
+     * always be set to a future time value. This means that we should never
+     * reach this point at bootup that is this check safeguards tor in never
+     * allowing a rotation if the valid after time is smaller than the next
+     * rotation time.
+     *
+     * This is all good in theory but we've had a NULL descriptor issue here
+     * so this is why we BUG() on both with extra logging to try to understand
+     * how this can possibly happens. We'll simply ignore and tor should
+     * recover from this by skipping rotation and building the missing
+     * descriptors just after this. */
+    if (BUG(service->desc_current == NULL || service->desc_next == NULL)) {
+      log_warn(LD_BUG, "Service descriptor is NULL (%p/%p). Next rotation "
+                       "time is %ld (now: %ld). Valid after time from "
+                       "consensus is %ld",
+               service->desc_current, service->desc_next,
+               service->state.next_rotation_time, now, ns->valid_after);
+      goto no_rotation;
+    }
     goto rotation;
   }
 
@@ -1977,9 +2004,6 @@ rotate_all_descriptors(time_t now)
       continue;
     }
 
-    tor_assert(service->desc_current);
-    tor_assert(service->desc_next);
-
     log_info(LD_REND, "Time to rotate our descriptors (%p / %p) for %s",
              service->desc_current, service->desc_next,
              safe_str_client(service->onion_address));