In case anyone was wondering: yes, we were down for about two hours. I apologize for the inconvenience.

We had a botched upgrade from 0.17.4 to 0.18.0. I spent some time debugging but eventually gave up and restored a snapshot (taken on Saturday, Jun 24, 2023 @ 11:00 UTC).

We’ll likely stick to 0.17.4 until I can figure out a safe path to upgrade to a bigger (and up-to-date) instance and carry over all the user data. Any help or advice is welcome. Hopefully this doesn’t happen again!

  • pohart@lemmyrs.org · 1 year ago

    I suspect almost everyone here understands how this happens and feels more sympathy than anything. We’ve been there, and with higher stakes. Thanks for taking the time to administer this instance.

  • SoundsLocke@lemmyrs.org · 1 year ago

    No problem at all, the transparency is appreciated and it’s understandable. There will be learning curves and hiccups with any new and evolving tech being pushed into the limelight (Lemmy, in this case).

  • Aloso@lemmyrs.org · 1 year ago

    thank you for maintaining this server, and being very transparent about it ❤️

  • lemmyrs@lemmyrs.org (OP) · 1 year ago

    Sigh, I tried it one more time. Same problem. Downtime this time was probably under 5 minutes, though, so hopefully it wasn’t noticeable. I’m wondering whether the database is in some bad state (no idea why or how). I guess we wait and retry with 0.18.1.

  • RustSlinger@lemmyrs.org · 1 year ago

    Sadness, Jerboa doesn’t work with lemmyrs.

    Is there any other native Android client that I can use for lemmyrs while waiting?

    • lemmyrs@lemmyrs.org (OP) · edited · 1 year ago

      Well, the lemmy container kept running into:

      lemmy  | 2023-06-24T23:28:52.716586Z  INFO lemmy_server::code_migrations: Running apub_columns_2021_02_02
      lemmy  | 2023-06-24T23:28:52.760510Z  INFO lemmy_server::code_migrations: Running instance_actor_2021_09_29
      lemmy  | 2023-06-24T23:28:52.763723Z  INFO lemmy_server::code_migrations: Running regenerate_public_keys_2022_07_05
      lemmy  | 2023-06-24T23:28:52.801409Z  INFO lemmy_server::code_migrations: Running initialize_local_site_2022_10_10
      lemmy  | 2023-06-24T23:28:52.803303Z  INFO lemmy_server::code_migrations: No Local Site found, creating it.
      lemmy  | thread 'main' panicked at 'couldnt create local user: DatabaseError(UniqueViolation, "duplicate key value violates unique constraint \"local_user_person_id_key\"")', crates/db_schema/src/impls/local_user.rs:157:8
      

      despite the fact that:

      lemmy=# select id, site_id from local_site;
       id | site_id
      ----+---------
        1 |       1
      (1 row)
      

      So even though a local_site row already exists, the migration still goes down the creation path and hits a DB constraint error. I narrowed it down to this piece of code:

      ///
      /// If a site already exists, the DB migration should generate a local_site row.
      /// This will only be run for brand new sites.
      async fn initialize_local_site_2022_10_10(
        pool: &DbPool,
        settings: &Settings,
      ) -> Result<(), LemmyError> {
        info!("Running initialize_local_site_2022_10_10");
      
        // Check to see if local_site exists
        if LocalSite::read(pool).await.is_ok() {
          return Ok(());
        }
        info!("No Local Site found, creating it.");
      

      At this point I gave up because I couldn’t really tell why LocalSite::read(pool).await.is_ok() was, well…not ok.
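
      As an aside (my own reading of the snippet above, not something confirmed in the thread): is_ok() is false for any error, not just a missing row. If LocalSite::read fails for some other reason, for example because the table's columns no longer line up with what the 0.18.0 Diesel schema expects, the migration would still log "No Local Site found, creating it." and head down the creation path even though the row is right there. Below is a minimal sketch of a guard that separates the two cases; it reuses DbPool, LocalSite, and LemmyError from the snippet above and assumes the read surfaces a diesel::result::Error that converts into LemmyError.

      use diesel::result::Error as DieselError;

      /// Hypothetical variant of the check in initialize_local_site_2022_10_10:
      /// treat only a genuine NotFound as "no local_site row" and bail out on
      /// any other error instead of falling through to the creation path.
      async fn local_site_is_missing(pool: &DbPool) -> Result<bool, LemmyError> {
        match LocalSite::read(pool).await {
          // A row exists, so the migration has nothing to do.
          Ok(_) => Ok(false),
          // The row genuinely isn't there; the migration should create it.
          Err(DieselError::NotFound) => Ok(true),
          // Anything else (connection trouble, schema drift, ...) is a real
          // failure and shouldn't be mistaken for "missing".
          Err(e) => Err(e.into()),
        }
      }

      If something like that is what happened here, the underlying error from the read would also be a much better clue than the downstream unique-constraint panic on local_user_person_id_key.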