Data Maintenance

This topic is closed.

  • Data Maintenance

    Difficult choices lie ahead. For those of you in the know, I run this place out of my pocket rather than relying on advertising revenue etc. We upgraded the database server a while back and that increased stability and performance.

    It's coming towards that time again. However, there are alternatives. The one I'm looking at right now is to reduce the number of posts/threads on the database.

    I'm looking into this, but right now the best option would be to remove older posts from Off Topic. The forum's been up for quite some time, so there are items in there that haven't really been accessed for years. There are nearly a quarter of a million posts in there, so I'm thinking along the lines of maybe removing a hundred thousand of them as a starter - the oldest stuff.

    The real downside to this? Some stuff will be lost forever. For the most part that isn't going to be an issue. However, there may be some items of sentimental value for people of this community. One that springs to mind is anything posted by Plaidman, if anyone wants to keep hold of memories. Maybe other notable posts by yourself or others.

    I'd like commentary in this thread, but I'm intending to trim in about two Sundays' time. That's Sunday the 6th January, a little over a week away. I could be persuaded to be lazy and leave it another week.

    Thoughts please.

    Rapscallion

  • #2
    I don't see why not. As long as people who want to archive old posts have time to do so (maybe ask for a volunteer to do it and ZIP/RAR an archive elsewhere, so you don't have 50 people slamming the Search system all at once?), nuking years-old posts would probably help.
    "For a musician, the SNES sound engine is like using Crayola Crayons. Nobuo Uematsu used Crayola Crayons to paint the Sistine Chapel." - Jeremy Jahns (re: "Dancing Mad")
    "The difference between an amateur and a master is that the master has failed way more times." - JoCat
    "Thinking is difficult, therefore let the herd pronounce judgment!" ~ Carl Jung
    "There's burning bridges, and then there's the lake just to fill it with gasoline." - Wiccy, reddit
    "Retail is a cruel master, and could very well be the most educational time of many people's lives, in its own twisted way." - me
    "Love keeps her in the air when she oughta fall down...tell you she's hurtin' 'fore she keens...makes her a home." - Capt. Malcolm Reynolds, "Serenity" (2005)
    Acts of Gord – Read it, Learn it, Love it!
    "Our psychic powers only work if the customer has a mind to read." - me



    • #3
      pk: I used to have reliable (paid) hosting, but I had to drop it a while back due to finances >_<


      • #4
        The hosting space isn't an issue - we can do that on here. It's getting the information into some format that doesn't impact the SQL database memory.

        Rapscallion



        • #5
          That's IMO not that difficult.

          pre-render each thread prior to thread #x, save as thread#.html

          Then in php, if thread < x, load html version.

          You may need to split the various threads among different directories to reduce the number of html files per directory for performance reasons; I'd recommend splitting them up according to the last two digits of the topic #.
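A minimal sketch of the scheme described above, written in Python rather than PHP purely for illustration. The names `ARCHIVE_ROOT`, `ARCHIVE_CUTOFF`, `archive_path` and `resolve` are hypothetical, and the cutoff of 5000 is an arbitrary stand-in for "thread #x"; none of them come from the forum software.

```python
from pathlib import Path

# Hypothetical settings: where pre-rendered pages live and which
# thread number plays the role of "thread #x" in the post above.
ARCHIVE_ROOT = Path("archive")
ARCHIVE_CUTOFF = 5000


def archive_path(thread_id: int) -> Path:
    """Return the pre-rendered file for a thread, sharded into one of
    100 directories named after the last two digits of the thread number."""
    shard = f"{thread_id % 100:02d}"
    return ARCHIVE_ROOT / shard / f"thread{thread_id}.html"


def resolve(thread_id: int) -> str:
    """Decide whether a request is served from the static archive
    or handed to the normal database-backed page."""
    if thread_id < ARCHIVE_CUTOFF:
        return f"static:{archive_path(thread_id).as_posix()}"
    return f"dynamic:{thread_id}"


if __name__ == "__main__":
    print(resolve(1234))
    print(resolve(6001))
```

Sharding on the last two digits keeps any one directory to roughly a hundredth of the archived files, which is the performance concern the post raises.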



          • #6
            You're right, in principle, that it's not that difficult. Here's the crux of the problem, though: Searches. Even with full text indexing turned on, searches really hammer the server.

            To put things in perspective, the directory that holds the full CS database (and only that) is currently sitting at 1.6G in use (and that's with the tables being optimized fairly regularly). That means that any search which winds up being deemed not able to use the full text search will result in a table scan, where everything winds up getting re-read. This blows away the operating system cache, resulting in the indexes for those tables having to be re-loaded. Sure, we can add more memory, but that costs more money every month.

            So, the next option, the one you've just mentioned (pre-rendering the pages), seems simple on the surface but causes new problems. Assume for a moment that the first 5000 threads were pre-rendered, each of them just to one long page containing all posts in that thread. Here's a list of things that aren't easily resolved (just in the order I think of them).
            • Some of those threads are in restricted groups. Mod only areas, or not for the general public (General Work Chat, for instance). Those have to be hidden in some fashion, depending on access of the user reading.
            • Search is now broken. Unless you want to provide a patch to modify vbulletin to search through all the archived html files, and then deal with those permissions I just mentioned above.
            • Some individual comments will provide issues. Suppose people want to be able to see the edit history of the comments? Should that be pre-rendered too? What if a mod edited those comments due to issues with the content? What then?
            • Some comments have been deleted, or at least hidden, by the mods. They should be able to see the full version. What about them?
            • Search engines are also going to be negatively impacted. Right now, if the site is crawled by Google/etc, then everything works as expected. With the pre-rendered idea, we have to handle the 302/301 redirects properly, or else our page rank will get dropped (possibly very much so).
            • What level of pre-render should we do? Should it include the whole theme? Should it use server side includes, and just write a page that has just the post details?
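To make two of the points in the list above concrete (restricted groups and redirects), here is a rough Python sketch of what a request handler for an archived thread would have to decide. Everything in it is an assumption for illustration: the `ArchivedThread` record, the `respond` function, the group names and the choice of 403 for refused access are hypothetical and are not how vBulletin works.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class ArchivedThread:
    thread_id: int
    required_group: Optional[str]  # None means open to everyone
    archive_url: str               # where the pre-rendered page lives


def respond(thread: ArchivedThread, user_groups: Set[str]) -> Tuple[int, str]:
    """Return (HTTP status, redirect target) for an archived thread.

    Restricted threads are refused unless the reader belongs to the
    required group; everything else gets a permanent (301) redirect so
    crawlers learn that the old URL has moved for good.
    """
    if thread.required_group is not None and thread.required_group not in user_groups:
        return 403, ""
    return 301, thread.archive_url


if __name__ == "__main__":
    public = ArchivedThread(12, None, "/archive/12/thread12.html")
    mods_only = ArchivedThread(13, "mods", "/archive/13/thread13.html")
    print(respond(public, set()))
    print(respond(mods_only, {"members"}))
    print(respond(mods_only, {"mods"}))
```

Even this toy version shows the catch: the permission check needs the user's group membership on every request, so the archived pages cannot simply be handed out as plain static files.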


            Those are just off the top of my head. There's probably others, too. I think the amount of work to make the pre-rendered threads work well is going to be higher than it's worth, honestly.



            • #7
              First, doesn't get said enough, thanks for the site Raps. And specifically, thanks for doing things like this - keeping us aware/giving a heads up.

              Since the decision is a time/money one, and it's not OUR time and/or money, I gotta go with the non-SC answer of "It's totally your call" but what you've outlined sounds like a pretty good solution. Maybe you could snapshot and offline archive the whole thing before you start the deletions? (That way, if one of us becomes global dictator some day, you've got a little inside info on them.)

              On the "people wanting to preserve" side - I can't imagine that bumping the search wait time up for a couple of weeks would kill anybody, or you could let us know when the best times to do extensive search and retrieves would be.

              And thanks for the info Peds. Always interesting to see the guts side of db use.



              • #8
                But... but... we won't be a megapost site anymore!

                I feel a shrinkage in the old bloviationer.
                I am not an a**hole. I am a hemorrhoid. I irritate a**holes!
                Procrastination: Forward planning to insure there is something to do tomorrow.
                Derails threads faster than a pocket nuke.



                • #9
                  I'm thinking in two Sundays' time I'll do some pruning of old stuff. Unless anyone comes up with a pertinent reason not to.

                  Rapscallion



                  • #10
                    Is there, perhaps, a way to make a sister-site or sub-site that holds just html-converted archived threads?

                    Only basic formatting required to make them readable (bold, italics, font size and quotes [if possible]), no search (forum or Google, etc), each thread has its own page and each page is an entire archived thread.

                    Deleted and hidden posts ignored and discarded since the archive is for nostalgia purposes only - same with edit information unless it can be easily included with other formatting info. And stick to only html-page archiving threads that are open to all forumites so that no admin-only information is released.
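As a rough idea of how small such a converter could be, here is a Python sketch of the rules described above: public threads only, deleted or hidden posts dropped, and only bold, italics and quotes converted. It assumes posts are stored with simple BBCode-style tags such as `[b]`, `[i]` and `[quote]`; the `Post` record and the `render_body` and `render_thread` functions are hypothetical, and font size and edit information are left out to keep it short.

```python
import html
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Post:
    author: str
    body: str             # raw text with simple BBCode-style tags (assumed)
    visible: bool = True  # False for deleted or hidden posts

# Only the basic tags asked for; anything else is left as literal text.
_TAGS = [
    (re.compile(r"\[b\](.*?)\[/b\]", re.S | re.I), r"<b>\1</b>"),
    (re.compile(r"\[i\](.*?)\[/i\]", re.S | re.I), r"<i>\1</i>"),
    (re.compile(r"\[quote\](.*?)\[/quote\]", re.S | re.I), r"<blockquote>\1</blockquote>"),
]


def render_body(body: str) -> str:
    """Escape raw HTML first, then convert the supported tags."""
    text = html.escape(body)
    for pattern, replacement in _TAGS:
        text = pattern.sub(replacement, text)
    return text


def render_thread(title: str, posts: List[Post], public: bool) -> Optional[str]:
    """Render one whole thread as a single page, or None if it is restricted."""
    if not public:
        return None
    parts = [f"<h1>{html.escape(title)}</h1>"]
    for post in posts:
        if not post.visible:
            continue  # deleted and hidden posts are discarded
        parts.append(
            f'<div class="post"><p><b>{html.escape(post.author)}</b></p>'
            f"<p>{render_body(post.body)}</p></div>"
        )
    return "\n".join(parts)


if __name__ == "__main__":
    posts = [Post("A", "[b]Hello[/b] world"), Post("B", "removed", visible=False)]
    print(render_thread("Old thread", posts, public=True))
```

Escaping before converting tags means any raw HTML in an old post is shown as text rather than executed, which matters for a static archive nobody will be moderating.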

                    I have absolutely no idea how difficult or not this would be, just throwing ideas out there.

                    ^-.-^
                    Faith is about what you do. It's about aspiring to be better and nobler and kinder than you are. It's about making sacrifices for the good of others. - Dresden



                    • #11
                      I like Andara's thoughts.
                      1129. I will refrain from casting Dimension Jump and Magnificent Mansion on every police box we pass.
                      -----
                      http://orchidcolors.livejournal.com (A blog about everything and nothing)
