Skill Sharing Session #7 (log)

<didleth> do you have session right now?
<Alster> ideally yes
<Alster> i guess txopi and harcesz are missing
* txopi has joined #kosmos
* ChanServ grants op status to txopi
* txopi grants op status to didleth
<txopi> hi
<Alster> hi again
<Alster> so we're 3 now
<Alster> didleth: any idea whether harcesz is planning to join us?
<didleth> Alster: i don't see him
<didleth> but he said every time is good for him
<didleth> he is on the list
<didleth> so he gets the mails
<didleth> so i believe we can start
<Alster> we could wait until .15?
<Alster> i don't mind
<Alster> (both is fine for me)
<Alster> txopi: any preference?
<txopi> no problem
<txopi> i just arrived at home so i'm gonna do some things (in just 5 minutes...)
<Alster> so i assume starting in 5 minutes is fine
<didleth> for me ok
* Alster has changed the topic to: https://we.riseup.net/kosmos | https://we.riseup.net/kosmos/skill-sharing-session-6-log https://we.riseup.net/kosmos/skill-sharing-session-7-agenda  | https://we.riseup.net/kosmos/meeting-planner
<Alster> txopi suggested to work on remote backups today, since there is now a remote location available for this. He wrote the agenda accordingly (which I recently added to): https://we.riseup.net/kosmos/skill-sharing-session-7-agenda 
<txopi> ok, we can start
<txopi> didleth, are you ready?
<txopi> your dog is ready?
<txopi> :-D
<didleth> lol
<didleth> yes she is
<didleth> i meant
<didleth> i was out with her before the meeting
<didleth> and can go again after
<didleth> it's snowing today :D
<Alster> hehe
<Alster> so, time constraints for this meeting
<Alster> how long shall it be?
<txopi> 00:00 max?
<didleth> for me ok
<txopi> if we finish sooner that's ok with me
<Alster> i'm not in a hurry today, but i'm afraid the backups can take a long time; we may not even finish with a working backup today, though in my opinion that should be our plan
<txopi> tomorrow will be a long day for me
<Alster> ok 00:00 max is fine with me
<txopi> ok
<txopi> 2. Do we want a summary or IRC logs of this meeting? If so, who volunteers to do it?
<Alster> but i'm open for longer if it turns out you would like it then
<txopi> i can't stay longer
<Alster> ok
<txopi> if you want you can finish after i am gone to bed
<Alster> i can do the logs today
<didleth> ok, thx Alster
<didleth> aaaa... sorry
<didleth> can i add something to the agenda now?
<txopi> NO!
<didleth> :-/
<txopi> yesssss :-)
<didleth> ok, sorry
<Alster> txopi: i'd prefer not to do any part of the backup alone
<txopi> Alster: i would like to see all the process
<Alster> :)
<txopi> that way i can make my own other times
<Alster> yes that's my point
<txopi> and not "ping Alster root" :-P
<txopi> didleth: what do you want to talk about?
<Alster> no such host
<didleth> i added it
<didleth> i meant
<txopi> 4. Meeting in the next weekend
<didleth> whether we want to meet next week and when - i believe the planner is not good in this situation
<txopi> planner is just to help, we don't HAVE TO use it
<Alster> right
<didleth> i mean, we can use it, but around christmas i believe it's not helpful, because there would have to be a lot of time slots every day and it would be too long
<didleth> ok, added to agenda
<Alster> ok, so to keep the agenda order: next meeting? saturday is fixed?
<Alster> it's fine with me, i'm just wondering whether it is for you
<didleth> saturday is the 6th day of the week, right?
<didleth> for me ok
<txopi> i don't understand
<didleth> (sorry for the stupid question, i always have problems with saturday/sunday, in germany too ;P)
<txopi> in the planner we chose saturday as a good day
<txopi> so, what's the problem?
<txopi> in england sunday is the first day of the week
<didleth> eh... ok wait, i'll look in the dictionary
<didleth> i did the planner with a dictionary ;P
<Alster> there's no problem.
<txopi> xDD
<txopi> didleth, you are the best :-P
<txopi> :-D
<txopi> sunday is "the day to go to the church"
<didleth> well... it's their fault... why do they call the weekdays by names starting with the same letter
<didleth> in polish it's 'sobota' and 'niedziela' and it's ok ;P
<Alster> So I take it the meeting at Saturday (19.12) 17.00 CET is fixed, unless I hear differently.
<didleth> for me ok
<txopi> you know the "saturday night fever" song? of course it is on the weekend, the day when all the people go out to dance at night
<Alster> I'm not sure what "4. Meeting in the next weekend" is about. It seems to be redundant with 3. then?
<didleth> well... not exactly
<Alster> Or is this about the weekend 26./27.?
<txopi> sobota (polish) = sabado (spanish)
<didleth> its the meeting after the next meeting
<didleth> txopi: you see? spanish is good :D
<didleth> ok what i mean
<txopi> didleth, the meeting after the next meeting? i think i'm missing something
<didleth> next week it's christmas
<didleth> so: the week is not as usual
<txopi> ah
<txopi> ok
<txopi> let me see
<didleth> usually in the planner you give mon-friday 19-21, and the weekend from 15-23
<didleth> but around the next weekend there are more work-free days
<txopi> but, the next meeting before this is this saturday
<didleth> but i have no idea how it looks in your country
<txopi> why do we have to talk about two meetings in the future?
<didleth> txopi: yes
<didleth> :/
<txopi> i'm just asking
<didleth> well...
<didleth> it just came into my head when i did the planner and didn't know how to do it for the christmas week
<txopi> ah
<didleth> we can talk about this in saturday if you want
<txopi> i think we can talk about it on saturday, if you don't mind
<txopi> perfect
<txopi> off-topic: didleth, polish and spanish are not very near languages: http://www.ikusimakusi.net/pub/2009/hizkuntzen-zuhaitza.png
<txopi> so:
<txopi> 3. Next meeting
<txopi> Saturday (19.12) 17.00 CET
<txopi> 4. Meeting in the next weekend
<Alster> I won't have any time between 22nd and 25th, 27th and 30th (CCC), nor on the 01st. I'd be happy to learn you're organising a meeting in between (if there's a topic). Personally I'll prefer to not have to attend a meeting between x-mas and January 01st 
<txopi> we will decide on saturday
<didleth> txopi: i know
<didleth> but - i suppose polish is nearer to spanish than basque ;P
<txopi> i will be very busy too, but i still don't have my agenda clear
<txopi> didleth, of course :-)
<Alster> deciding on the next-but-one meeting at the next meeting sounds good to me.
<txopi> "one meeting at the next meeting sounds good" ?
<didleth> i'm going to my parents' for christmas, and after that i will probably have too much time - maybe it's good motivation to sit down with drupal
<txopi> Alster, you can on the 26th
<txopi> on saturday we will talk about 26 if possible or not
<txopi> i don't even know where i will be that day :-/
<Alster> txopi: sorry for saying it in such a complicated way, i just agreed with you
<txopi> i prefer to let this point for a later day
<Alster> are we ready for the next topic?
<txopi> Alster, don't apologize. these days are very complicated for everybody. travels, family meetings, etc.
<Alster> yes that's right, txopi. ;-)
<didleth> hehe, so it's the same in the west?
<didleth> funny ;-)
<Alster> people are still too christian here, too
<didleth> i thought it was such a crazy time only in poland, because it's a catholic country ;-)
<didleth> anyway, let's go to another topic
<Alster> or coca cola driven or whatever it is
<txopi> Alster :-)
<Alster> # Server hosting (Nadir virtual server status)
<Alster> this surely doesn't make it any easier txopi
<Alster> any news on the server hosting?
<txopi> i have no news from last meeting
<Alster> oh i guess i'm supposed to report, too
<Alster> tachanka had a busy week because of COP15
<Alster> so we didn't really get around to discuss hosting you, yet
<txopi> i asked the people if someone could help by asking other servers that could host us, but i just got silence...
<txopi> ok
<didleth> maybe we should wait with this topic
<txopi> unless any of us has news about this, i think we can skip this point until... next year! :-)
<didleth> until next week?
<didleth> i believe we have no new information now
<txopi> until we have news
<Alster> that's a common issue, we've had the same problem amongst de.i.o when german people with tech background asked the other people without tech background to help them find new servers. no response.
<txopi> english knowledge is an additional problem
<didleth> the problem will be if one day all of indymedia has no servers left and dies ;-/
<txopi> if spanish law were not so bad, we could use some servers i know
<txopi> but we can't :-/
<txopi> ip logging is obligatory here
<Alster> didleth: this did not happen so far and I don't expect it to happen too soon. 
<txopi> sindominio, mundurat, etc.
<Alster> txopi: you could still provide backup servers
<Alster> or even backend servers
<txopi> Alster, yes. one of the backup places i want to use is mundurat
<txopi> hey_neken is the sysadmin of mundurat.net
<txopi> and we already have the permission to use it
<Alster> is it illegal to forget passphrases for encrypted hard disks?
<txopi> but hey_neken is so busy
<didleth> txopi: i believe ip logs are everywhere in europe anyway
<txopi> i will get mundurat for backups. give me more time...
<txopi> Alster: in england it is
<Alster> it's not obligatory to log ip addresses in germany, unless you're a larger provider
<txopi> in spain if you are an "information society provider" you have the legal responsibility to keep logs for 12 months
<Alster> gah this sucks
<txopi> if you don't have that information it can be much worse than the other option (fines of up to 600,000 euros)
<txopi> in spain the law is specifically made to include everything
<txopi> if you have a powerful enemy they can use the lssi law and fuck you
<Alster> if we're lucky the highest german court will have to ask the EU to make new data retention laws since they don't conform with german constitutional law. but this won't happen too soon.
<txopi> and now we have another law lmisi that is also very very bad
<didleth> txopi: in poland too - in 2 years as far as i know - but i'm not sure whether this law will come into force
<didleth> but the police have the right to check your laptop and surveil it without any reason
<Alster> this is very interesting, i'd love to have a collection of these different terms throughout the world.
<Alster> but right now we'll need to concentrate on this meeting ;-)
<didleth> we could make an orwell-map then ;]
<txopi> now, with the copyright excuse, the spanish government is going to create a new law to block a lot of websites without a judge
<Alster> so that we get our backups worked out
<didleth> ok
<txopi> tomorrow we have an online action in spain
<didleth> txopi: the polish government too!
<txopi> we have done a lot of demos
* Alster went off-topic, too ;-)
<txopi> and a lot of things i can't explain now because they are off-topic :-)
<didleth> they wrote it, voted on it - but this law will probably go to the constitutional tribunal
<Alster> maybe we can have this as a topic for one of the next meetings and take some notes
<didleth> maybe some time we could just make a society topic (as a party) and talk about everything we want to ;P
<txopi> the conclusion is: no indymedia servers in spain
<didleth> but now back to the agenda
<Alster> Are we ready for the Introduction to backup strategies, yet?
<txopi> ready!
<Alster> cool. ok, so let's start with what is a backup and what not, and what is a backup good for
<Alster> the idea of a backup is to be able to restore your data, and often also to be able to get back to a working system quickly (though this is already a bit out of scope of the original 'backup' idea, it's rather called 'disaster recovery').
<Alster> you basically need to think about risks: which risks is the system you want to be able to recover prone to?
<Alster> what can possibly go wrong?
<Alster> and which counter measures can you take against this in advance?
<didleth> yhm
<Alster> for example, in the case of kosmos, the inherent issue we face right now is that the hard disks are broken, and already so broken that they are losing data
<Alster> and while there is a copy of this data, it is on the same hard disk
<Alster> so this is not a real backup, it is just a copy, since it does not provide any additional safety
<Alster> if the HDDs fail, they fail, and both copies are gone.
<Alster> i'm putting this a little too simply, since a copy, even if it is stored on the same media, does in fact increase your chances of being able to recover the data, but still it is not a backup.
<txopi> aha
<Alster> kosmos is also setup with a so-called mirror RAID, which means that the data stored on one of the disks is also stored on the other disk.
<Alster> so if one of them fails there should still be a copy on the other drive.
<Alster> but this is not a backup either
<Alster> a so-called 'local backup' is a real backup, but this should be done on another system (server).
<txopi> RAID = redundant array of inexpensive disks (wikipedia
<didleth> so kosmos has mirror raid?
<txopi> )
<Alster> some of the partitions of kosmos are set up as a mirror raid, but not everything. i don't remember what exactly, and it does not matter much since at least 2 of the 3 HDDs in kosmos are broken.
<Alster> they have simply worn out, have exceeded their life span
<Alster> anyway, RAID is not a backup.
<didleth> yhm
<Alster> a local backup is a copy of all relevant data which is stored on another, separate system, at the same or nearby location.
<Alster> often in the same network segment
<Alster> but as you can imagine, this doesn't help if the place where both servers are located is set on fire, hit by a storm or flooded.
<Alster> In fact hamburg, where I assume kosmos is located (I do not know for sure, it could be almost anywhere really), is known to be endangered by flooding
<didleth> :-/
<didleth> do you want to scare us, Alster? ;D
<Alster> yes. it's very unlikely that it affects this colocation facility, I'm just discussing potential risks.
<didleth> 'a big flood will come and take kosmos away' - sounds like something from the bible ;P
<Alster> you will hardly find any location which is not subject to any risks
<Alster> hehe
<Alster> a comet can always crash into the earth somewhere
<didleth> yes - right where our backups are!
<txopi> the idea is to have various backups geographically distributed
<Alster> but a much more probable risk in our case is actually that the cops raid the colocation facility and take all the servers with them
<Alster> oops
<Alster> there goes our data
<txopi> didleth: bibel?
<didleth> so a good idea is to have backups on a different continent etc?
<didleth> the holy book for christians
<txopi> ah
<txopi> ok
<Alster> hehehe
<txopi> the Holy Bible!
<Alster> yes, exactly. a good backup is a remote backup. So that's a backup which is geographically separate from the hosting location
<txopi> i read it every night to sleep quicker :-P
<Alster> you could also have multiple remote backups at multiple remote locations, but that's a bit over the top.
<didleth> i believe our goal now should be - to make one remote backup on some non-broken disk
<didleth> and do the rest of the backups after this
<txopi> he he, i agree
<Alster> in our case it's not just geographical locations which you should take into account, but also different legal environments and mutual legal treaties between countries.
<didleth> Alster: now you know what to do when you have to get up early and can't sleep in the evening? ;D
<didleth> ihm
<Alster> i just do a backup
<didleth> but i think the first case is: the broken disk
<Alster> well you'd really want to consider both
<didleth> i suppose that if we can do backups from this
<Alster> since if we just do a local backup now it does help, but only until the cops raid nadir
<didleth> the other one will be easier to do because of tech reasons
<txopi> the machine we are going to use for the first backup is in the basque country (near to Bilbao)
<didleth> Alster: and how often is it necessary to make backups? once a day?
<txopi> the other backup machine i will get is located in Madrid
<Alster> so (common or different) legislations and mutual legal treaties between countries where the colocations are situated do matter, too. ideally you would not have your only remote backups in a country which has a mutual legal treaty with the country which your server runs in. this has mattered for indymedia before.
<Alster> so basque is in EU and germany is in EU so this is not perfect, but it sure is good enough for now
<Alster> more than good enough in our very case ;-)
<didleth> :]
<Alster> therE's more to think of about how to make backups
<Alster> of course, the bandwidth of the production ('live') server (the one which serves the actual content to the world, or at least your Mir backend server) and of the backup server are a limiting factor.
<txopi> does kosmos have any problem with bandwidth?
<Alster> in our case kosmos does not have good connectivity, and cannot push much data at a time (little bandwidth), and what's worse, its connections drop.
<txopi> ok
<didleth> yhm
<Alster> so the problem is on kosmos' end and it does not matter so much how well connected the backup server is as long as it is not as bad as kosmos
<Alster> or worse than kosmos rather
<Alster> so there are some counter measures for this issue:
<txopi> i have no idea about the bandwith of the backup server :-S
<txopi> the machine is in a rented flat so probably the bandwidth isn't too big...
<Alster> you can just transfer small files, and retransmit them if the transfer fails. that's better than transmitting large files since you'll have to retransmit less if the connection fails.
<txopi> shouldn't the transfer protocols manage these situations?
<Alster> most residential internet connections have a big downstream bandwidth, i.e. internet -> home is a large bandwidth
<txopi> ah, it's true
<Alster> txopi: ideally yes, it depends on the implementation how well this works.
<txopi> aha
<Alster> but if you have to transfer 10 million small files it also takes like forever
<Alster> but you also want to always have a somewhat current backup at your remote backup location
<didleth> so... is it possible to do it on kosmos in less than forever?
<Alster> so you must make sure the backup is transferred correctly before the next backup is created, otherwise it's not so useful
<Alster> luckily there are incremental backups
<Alster> they work like 'patches' or 'diffs', in case you heard of this before
<txopi> i have heard
<Alster> with incremental backups, only changed files are transferred
<txopi> $ diff file1 file2
<didleth> i haven't
<Alster> so you don't need to transfer all the files every time
<didleth> but feel free
<didleth> yhm
<didleth> ok
<txopi> didleth, imagine the poland imc web site
<Alster> 'diff' is actually somewhat misleading since it looks at the contents of files, but incremental backup mechanisms usually only look at changed files within a couple directories
<txopi> most of the files are the same (old articles don't change)
<Alster> please continue your explanation while i'm on the loo, txopi :)
<didleth> i believe i understand what you mean
<txopi> but the newest ones change (because of comments) and new ones are created
<didleth> so we transfer only the latest changes
<txopi> so, when you use rsync to copy them (i don't know if rsync is an incremental backup tool), 
<didleth> and not all the content
<didleth> right?
<txopi> it detects that most of the files haven't been changed since the last execution and just copies the changed files
<txopi> rsync copies incrementally
<txopi> scp copies always all
<txopi> END.
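[Note: a minimal illustration of the difference txopi describes, with a hypothetical mirror host name; these are not commands from the session:
    # scp re-copies every file on every run, even unchanged ones
    $ scp -r /var/www/pl.indymedia.org/site/ mirror.example.org:/var/www/site/
    # rsync compares source and destination and only transfers files that have changed
    # (-a preserves permissions and times, -v lists what it does, -z compresses in transit)
    $ rsync -avz /var/www/pl.indymedia.org/site/ mirror.example.org:/var/www/site/
Plain rsync like this keeps only one current copy on the far side, so it is a mirror, not yet a versioned backup.]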
<didleth> so in our case
<didleth> rsync is ok
<didleth> right?
<txopi> for updating the mirror site it is the best
<didleth> :-/
<txopi> if someone makes a comment on a very old article, rsync detects the change and updates from kosmos to the mirror server
<didleth> but last time we talked
<txopi> so, it is what we want
<didleth> it takes too much time and it's impossible
<didleth> because the mirror site doesn't have the filesystem and database like kosmos
<txopi> i don't remember what you are speaking about
<didleth> and it would have to copy every article one by one
* Alster is back but please go on
<didleth> (21:39:06) txopi: to update the mirror site is the best
<txopi> didleth, the mirror site has just the static content (shtml files, not the database)
<didleth> you meant that we should make the backup from the mirror server, right?
<didleth> yes, that is what i talk about
<txopi> rsync compares all the files to decide if it has to update the file or not
<didleth> so we have to make backup directly from kosmos
<txopi> if most of the content doesn't change, rsync is quick
<didleth> txopi: i'm not sure
<txopi> the first time you execute rsync (when all the files must be copied) it is slow
<didleth> you have to decide manually
<didleth> whether you want to update only the changes or everything
<didleth> in the polish mir installation
<txopi> if you regenerate a lot of files because a change in the mir templates or something like that, rsync must copy a lot of contents
<didleth> but it is possible that only the latest changes are the default
<Alster> you should always make your backups of what you want/need to restore. since web mirrors can be (mostly, aside from their server configuration) restored from a Mir production server, it is sufficient (and necessary) to back up the Mir production server
<txopi> incremental is good if you change just a piece of the contents
<txopi> if you change almost everything in each copy, it is better to copy everything
<Alster> you could also backup the mirror server if it takes too long to recreate its contents from the production server.
<Alster> but that would be in addition to the Mir backend server, not an alternative to backing up the Mir backend server
<didleth> :-/
* harcesz has joined #kosmos
* ChanServ grants op status to harcesz
<didleth> hi harcesz
<Alster> hello harcesz 
<txopi> didleth, imagine a text document
<txopi> a very long one
<Alster> sorry txopi, i should have waited until you're done ;-)
<txopi> with 200 pages
<didleth> i believe we can just leave the mirror server aside and continue with the production server
<didleth> :-/
<didleth> but are you sure
<didleth> I don't understand this? :-/
<didleth> i suppose i understand :-/
<txopi> wait
<didleth> ok
<txopi> can i put another example?
<Alster> fine with me!
<didleth> you can do everything ;D
<txopi> ok
<txopi> didleth, imagine a text document
<txopi> a very long one
<didleth> yhm
<txopi> with 200 pages
<didleth> yhm
<txopi> imagine you are writing that document and make a backup of it every day
<txopi> the simple method is just to copy the whole document each time to another directory/device/machine
<didleth> send it by e-mail, to a friend, etc
<txopi> yes :-)
<txopi> the idea is that mainly, you add content to the document
<txopi> sometimes you change some contents in the middle
<txopi> but just some little things
* didleth had her master's thesis in 20 copies ;]
<didleth> ok i understand 
<txopi> so you are copying 20MB every day and 19MB is always the same information
<didleth> (even if i don't practice this ;P)
* txopi hopes she distributed the copies geographically
<txopi> imagine that there is an easy method to detect which pages have been changed and just copy the changed pages to the other file (the backup file)
<txopi> that way when you execute the copy it just copies 1MB and you get the backup synchronized very quickly
<txopi> it is better, don't you think?
<didleth> well - for the text i don't think so
<didleth> but for kosmos it's a good idea
<txopi> you copy less information and the result is the same!
<txopi> the document looks good too
<txopi> as if you had copied the whole document
<didleth> yes, but - you know, you can change something
<txopi> you can change everything if you want
<didleth> and if you copy all text every time
<didleth> you can see all versions
<didleth> like in wikipedia
<txopi> and the backup tool will detect which pages (45, 100 and 201)
<txopi> didleth, don't think about versions now
<txopi> i will explain versions later
<didleth> :-/
<txopi> just imagine this example
<txopi> you have a document with 200 pages, and you change page 45, page 100 and add page 201
<txopi> if you make a total backup, you copy 201 pages
<txopi> if you make an incremental backup, you just copy 3 pages
<txopi> and the result is the same
<txopi> you get the same document in the other side
<didleth> yhm
<txopi> if you are making a backup of a content that changes not too much, it is better to make incremental backups, isn't it?
<didleth> yhm
<txopi> ok
<txopi> now imagine
<txopi> that you want to have all the versions of the document of each day
<txopi> the document size is 20MB and you want to save a backup of the last 100 days
<txopi> you need 2000MB, 2GB of disk space
<txopi> that's a lot!
<txopi> and a lot of the content inside the backup directory consists of files with almost the same information
<txopi> there is a lot of redundancy
<txopi> that doesn't sound like a very good backup method, does it?
<didleth> yhm
<txopi> the incremental backup systems just save the differences
<txopi> the original file (20MB) and then the differences of each day (1MB first day, 2MB second day...)
<txopi> and you can tell the tool: give me the file of day 36
<txopi> and it composes the document automatically
<didleth> yhm
<txopi> it reads the first file and adds the incremental changes
<txopi> so, you have a lot of much smaller backups
<txopi> and you can get a version of each day you want
<txopi> that is an incremental backup as far as i know
<txopi> do you understand?
<txopi> Alster, did i make any errors?
<didleth> yes i understand
* txopi doesn't know what really happens inside an incremental backup
<txopi> :-D
<Alster> txopi: no error, but i think there's a little bit which is not exactly how it works in the real world. but that's my fault since I started explaining it by talking about 'diff' and file contents.
<Alster> ok, so, to be really precise, I need to correct this a little. While most backup utilities use the principle txopi explained, they do not actually copy partial contents of changed files. Instead, they look at directories (equivalent to the text document in txopi's explanation) and see which entries in these directories (subdirectories or files) have changed and copy any added or modified files completely (always), or make a note 
<Alster> that some file was deleted.
<Alster> So they work on a higher level. They do not record the exact changes within files, but they record the exact changes within monitored file systems. 
<Alster> Example: if you had every page of the original text document in a single file, and edit only the files with pages 4 and 5, then the incremental backup will only copy the files with pages 4 and 5 next time it checks. so, yes rsync can be used as an incremental backup tool, but it needs a bit more than this.
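[Note: one common way to get that "bit more" out of rsync is dated snapshot directories with --link-dest, where unchanged files become hard links to the previous snapshot instead of new copies. A sketch with a hypothetical host and paths, not the setup chosen here:
    # yesterday's snapshot is the reference: unchanged files are hard-linked, changed files are copied
    $ rsync -a --delete --link-dest=/backups/kosmos/2009-12-16 \
        kosmos.example.org:/var/www/ /backups/kosmos/2009-12-17/
Each dated directory then looks like a complete backup, but only the changed files use new disk space.]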
<Alster> let me know when you're done reading this and whether it's understandable up to this point.
<didleth> understandable for me
<Alster> great, so I'll just assume it's the same for txopi :-)
<txopi> yes
<txopi> continue
<Alster> So that's where the versioning comes in. If you rsync two directories between two servers, you always have the latest version of all files on both servers. That's good. But this way you _only_ have the latest version on both sides. Now imagine someone hacks into kosmos and deletes all files (or someone does it accidentally). What will be on the 'backup server'? The same: nothing. That's not what we want!
<Alster> So you want to keep more than just the latest backup on the backup server, at least reaching back a couple days or weeks.
<Alster> Unfortunately this means you may need quite some space on the backup server. Incremental backups help a lot there. You can also compress your backups, which can also help a lot.
<didleth> yhm
<Alster> But there's a little problem with incremental backups: to restore them, you always need to go over all the incremental backups you have created since the last complete backup you did
<txopi> wich tools are used for backups?
<didleth> brb
<Alster> that's the next topic, txopi :)
<txopi> ok ok :-D
<Alster> i know we're slow
<txopi> no, i just wanted to put a name to the concept
<Alster> it's up to you whether we should stop explaining the backgrounds and start getting our hands dirty
<Alster> unless you two ask otherwise, i'll continue explaining, but it may mean we don't get to start doing the backups today
<Alster> so... up to you
<txopi> aha
<txopi> continue
<Alster> ...so if you created a complete backup 6 months ago and, since then, only created incremental backups, and your Mir backend server disks break completely, you will have to do the restore process starting with the 6 months old backup, going through all the incremental changes which happened since then (which can be a lot)
<Alster> ...until the point/date you actually want restored.
<Alster> so the (almost) ideal way is to do a mixture of full/complete and incremental backups
<Alster> good backup utilities will create a full backup from time to time, and incremental backups between those
<Alster> depending on how many changes you have, it can be good to have a full backup after 10 or 20 incremental backups
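[Note: as an illustration, a tool like duplicity implements exactly this mixture; the commands below are a hedged sketch with hypothetical paths and key id, not the tool or settings the group settled on:
    # incremental by default, but force a fresh full backup once a month;
    # archives are compressed and GPG-encrypted to the given public key
    $ duplicity --full-if-older-than 30D --encrypt-key DEADBEEF \
        /var/www scp://backupuser@backup.example.org//backups/kosmos
    # restore the state as it was 3 days ago into /tmp/restore
    $ duplicity restore --time 3D \
        scp://backupuser@backup.example.org//backups/kosmos /tmp/restore
The encryption part is discussed further down in the log.]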
<txopi> aha
<txopi> didleth, are you there?
<Alster> i guess the dog wants more snow
<didleth> txopi: yes i am
<txopi> :-)
<Alster> everything fine so far?
<Alster> there' one or two more things before we come to actual implementations
<didleth> Alster: sorry, i was just in the bathroom and then i read and tried to understand what you wrote :]
<Alster> i'll just go on for now, interrupt me if you need a break or have questions
<Alster> no problem
<didleth> yhm
<Alster> ok, sorry got distracted myself
<Alster> well, so what we have found out so far is that backups should be
<Alster> * remote (optionally local, too, so you can restore them more quickly if they are still available there)
<Alster> * a mixture of full and incremental backups
<Alster> * compressed so they don't take up so much space on the backup server and don't waste so much bandwidth while transferring
<Alster> and there's one more thing to this
<didleth> ?
<Alster> we may have some sensitive data in our backups, such as admin logins to the Mir moderation website, or IP addresses in the Mir posting filter, or hidden articles which contain sensitive information.
<Alster> so even when we took measures to secure our Mir (or whatever it is we use for our IMC website) server, it doesn't mean the backup location is as secure, too.
<didleth> Alster: i thought
<didleth> kosmos doesn't write IPs
<Alster> and we need to take precautions to not leak information through backups which we had safeguarded on the production server
<Alster> kosmos does not record IP addresses to log files, I don't think. But Mir can be configured to disallow certain IP addresses from connecting/posting while they are connecting. that's two things: whom do you allow access, and do you note down who has been accessing.
<txopi> Alster, you are rignt
<txopi> right
<txopi> but anyway you gave the example of the hidden articles, and is a very good example of why the backup must be secure
<Alster> but let's get back to securing backups: you will often not be able to check, reliably, for yourself whether the backup server is secure.
<Alster> so there are two options:
<Alster> * make sure you can trust your data to those who run your backup server, to protect it from cops and the like. Such as by using hard disk encryption, which may or may not suffice to protect against cops (hard disk encryption only safeguards you when the storage location is not mounted or the server is off, not while it is running!). (a sketch of this follows below.)
<Alster> * encrypt your backups before you send them to the backup server, and make sure they cannot be decrypted by anyone but you. This means you do not need to trust those who provide you with the backup location to safeguard your data against cop raids etc.
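[Note: for the first option, hard disk encryption on the backup machine would typically mean LUKS/dm-crypt. A rough sketch with a hypothetical device name, only to make the idea concrete (luksFormat wipes the partition, so this is not something to paste blindly):
    # create an encrypted container on the backup partition, open it, create a filesystem, mount it
    $ cryptsetup luksFormat /dev/sdb1
    $ cryptsetup luksOpen /dev/sdb1 backups
    $ mkfs.ext3 /dev/mapper/backups
    $ mount /dev/mapper/backups /home/indy
As Alster points out above, this only protects the data while the volume is not mounted or the machine is off.]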
<txopi> in the case of the machine we are going to use
<didleth> but in this second way
<didleth> you have to be sure you are not arrested etc ;?
<didleth> ;>
<didleth> so maybe a few people should have access?
<txopi> i don't think we have an encrypted partition, but the partition is just for us, so we can ask to encrypt it
<Alster> didleth: right, but this would be bad in the first case, too ;-)
<txopi> but i think it will be easier if we use the second method
<txopi> i suppose that the second method is slower, needs more processor power
<txopi> isn't it?
<Alster> yes, encryption is slower and more likely to cause problems, though just marginally.
<Alster> so it's fine
<Alster> however, if you choose the option of encrypting your backups, you need to think about how you will encrypt and decrypt it later, too.
<Alster> you could just encrypt with a password or passphrase. but those can be cracked AND must be stored on the production server so that the automatic encryption can actually take place.
<Alster> so if someone gets to know this password and has a copy of your encrypted backups that's all they need.
<Alster> (which already means they fucked you a lot, but still something to consider)
<Alster> the other option is public/private key pairs, such as those used by GPG
<txopi> if they get the password from the production server, you are probably fucked anyway
<txopi> aha
<Alster> yes
<txopi> using public key encryption sounds like a very good strategy
<txopi> i hadn't heard of it
<Alster> with public/private keys you can have just the public key on the mir server. it can be used to encrypt, but not to decrypt
<txopi> yes
<Alster> so whoever gets ahold of this key cannot decrypt your backups (you neither!)
<Alster> these 'keys' are actually just two small files, and they can be encrypted by a password, too.
<txopi> but if the attacker gets to the backup server, do they have all the information to read the data?!
<txopi> so in the end we trust that passphrase, don't we?
<Alster> in case of backups, you would put the public key on the mir server and create the encrypted backups with it. and you would create a corresponding private key file (ideally with a password on it) and distribute this amongst your collective. people would probably store it on their desktops or wherever they think it's safe.
<Alster> and they would know the password for the private key, too.
<Alster> and hopefully don't forget it or something
<didleth> :-/
<Alster> if you ever need to recover the backups, you need the private key to decrypt them
<txopi> aha
<Alster> this is the only way to recover them
<Alster> ...aside from brute forcing the key file (see you in 100 years!) or a possible yet unknown design error in the encryption scheme.
<Alster> so this is rather secure.
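[Note: a minimal GPG sketch of the scheme Alster describes, with a hypothetical key name and file names; the real key would be generated and handled by the collective:
    # on a collective member's own machine: generate the key pair and export the public half
    $ gpg --gen-key
    $ gpg --armor --export backups@example.org > backup-pubkey.asc
    # on the production server: import the public key and encrypt each backup with it
    $ gpg --import backup-pubkey.asc
    $ gpg --encrypt --recipient backups@example.org 2009-12-17-db-backup.tar
    # only someone holding the private key (and its password) can decrypt
    $ gpg --output 2009-12-17-db-backup.tar --decrypt 2009-12-17-db-backup.tar.gpg
The private key never needs to be on either server.]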
<txopi> aha
<Alster> you just need to be sure you don't shoot yourself in the foot and the cops don't get ahold of either A.1. your production server or B.1. your encrypted backups, 2. your private backup decryption key and 3. your private backup decryption key password.
* txopi is lost
<txopi> can you write the last sentence in some more sentences?
<Alster> of course. 
<txopi> get ahold or get hold?
<didleth> you just need to be sure you don't shoot yourself in the foot and the cops don't get ahold of either A.1. your production server or B.1. your encrypted backups, 2. your private backup decryption key and 3. your private backup decryption key password.
<didleth> not understandable for me either :-/
<Alster> to get ahold of something = to get your hands on something, to take possession / control of
<txopi> ok
<Alster> ok, if the risk you want to secure against is cops raiding places and taking things with them then you need to do both of this:
<Alster> A. make sure they do not get access to the raw/unencrypted files on your production/Mir/Drupal server
<Alster> B. they may not get all of this together (as long as they just get one or two of it it is still ok): 
<Alster>   1. your encrypted backups from the backup server
<Alster>   2. the private key files (which can be used to decrypt the encrypted backups) your collective members probably store at home
<Alster>   3. the password the private key file is encrypted with.
<didleth> Alster: this means - before we begin the backups i should secure my computer better so that nobody can read the logs?
<txopi> they need 1+2+3 isn't it?
<Alster> yes txopi, regarding the backups (B) they need all of these three parts 
<txopi> Alster, ok
<Alster> didleth: you should always try to make the locations where you store sensitive information (such as backups, passwords, key files, ...)  as secure as possible while not destroying their usability.
<Alster> i.e. if your computer is so safe you cannot browse the internet anymore it doesn't help you either.
<txopi> you are supposing that they know where the backup server is, and if they don't get into the production server it isn't so easy to find out
<didleth> i have no safe computer
<didleth> because i don't know how to secure it
<didleth> the main thing right now is - do i have to learn that before we start the backup?
<didleth> and do i have to install everything from the beginning?
<Alster> Whenever you have two or more factor authentication (such as PIN/TAN for online banking, key file + password, passport and airline ticket) you should try to store them at different locations.
<Alster> txopi: not necessarily. you need to think the other way around, too. IMC servers get raided regularly. One of the last servers they took had backups of several IMCs and other projects on it. If these backups were encrypted, the cops may have realized 'oh this looks like the backups of IMC Euskalherria (for example), let's tell and pass the files to our spanish friends'. And the spanish cops could then try to get the other two
<Alster> factors, too, namely the private key file and the password, by simply raiding those whom they already know do the moderation on IMC EH.
<Alster> didleth: not necessarily. even if the key file is not protected because you store it on your unencrypted hard disk, there is still the password needed to use it AND they also need to get the encrypted backup files from your backup server.
<Alster> this scenario of the cops getting all three things (encrypted backups from the backup server, private key file from your home PC, password for the private key file from... ideally... your brain (or your password manager software which has a master password on it which you can remember)) is quite unlikely.
<Alster> so this is almost theoretical
<Alster> it is MUCH more likely that they try to get the production server, since then they have all they need already.
<txopi> i understand
<Alster> :)
<Alster> how about you didleth?
<didleth> more or less
<didleth> i believe
<didleth> a good idea is
<didleth> to make a password
<didleth> which is long
<didleth> but has a meaning which we understand
<didleth> and don't have it written down
<didleth> only have it in your head
<Alster> yes, that's the ideal situation regarding passwords.
<txopi> yes, it is
<txopi> but think that the cops also know this
<txopi> in the case of ETA, the cops cracked information protected with gpg because the passphrase was a sentence from a song
<txopi> like "dashingthoughthesnowinaonehorseopensleight"
<Alster> :-/ that's not a good passphrase then
<txopi> no it isn't
<txopi> but i think we already know how to think of a good password
<Alster> it's not too easy really. I always try not to have to repeat it since I can hardly remember all the things you need to keep in mind to compose a secure password.
<didleth> i belive we can find a good password
<Alster> yes i'm sure
<didleth> but it should have numbers, big and small letters and punctuation signs, not only letters like in the example above, right?
<txopi> right
<txopi> and must be long
<Alster> it should be able to have that, i.e. if you use a password creation utility, then you should configure that to allow for such characters, too
<Alster> first of all you should decide whether to use a password or a passphrase
<txopi> if you use a password you must remember it
<txopi> or store it in a secure manner
<txopi> so you are in the same situation as in the beginning
<Alster> for private keys they usually suggest using passphrases, but personally i think a good password is better. But passphrases can provide sufficient security if they are long enough and not easily guessed, unlike the one txopi just told us about.
<Alster> a good password can hardly be remembered by more than one person
<Alster> if you have a shared password, which you will need to have if you distribute the private backup decryption key amongst several collective members, I'd choose one of these options:
<txopi> i suppose you can join all the passwords and encrypt them with a passphrase :-)
<Alster> * passphrase: have a good passphrase which everyone can and does remember
<Alster> * password: every one of you installs a password manager on their home PC and stores the password in it, and every one of you sets a secure master password or passphrase on their password manager which they can remember. (one way to generate such a password is sketched in the note below.)
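[Note: purely as an illustration of the password option, a random shared password could be generated with OpenSSL and then kept only inside each member's password manager, behind a memorable master passphrase:
    # 24 random bytes, base64-encoded (about 32 characters)
    $ openssl rand -base64 24
This is just one possible way to do it, not something decided in the session.]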
* didleth has no secure password manager probably :-/
<Alster> in both cases, you should ideally check regularly that there are still enough people who have access to the backup decryption key password/passphrase.
<Alster> you can install pwsafe (command line utility), PasswordSafeSWT (java utility with a graphical user interface) or... (looking it up now)
<Alster> keepassx
<Alster> I use PasswordSafeSWT currently. It allows me to have a graphical interface now, and I could also get access to the password database through the pwsafe command line interface (they use the same file format) (at least I _think_ so, have not verified it yet!) when I don't have access to a graphical user interface
<Alster> anyway, we're quite offtopic now ;-)
<didleth> yes
<didleth> today is an off-topic day i believe ; )
<Alster> well these are important things to know, kind of a foundation/basics, so it's good to talk about it.
<Alster> one last thing to mention about the backups. just really quickly....
<Alster> if someone hacks into your production server and wants to delete all your data, depending on how you create and transfer your backups to the backup server, they may be able to do so.
<txopi> aja
<Alster> i.e. they may get access to your production server AND your backup server at the same time
<Alster> just because you have a password-less ssh key pair installed so that the production server can push (copy) the backup files to the backup server.
<Alster> an attacker who hacked the production server could use it to connect to the backup server, delete all the backups there, and also delete all the files on the production server.
<Alster> there are some strategies to prevent this, as discussed here: https://wiki.boum.org/TechStdOut/EncryptedBackupsForParanoiacs
<Alster> this is a very good document which I recommend reading when you have moooore time.
<Alster> Most of the things we discussed today and many implementations (backup softwares) are discussed here: http://dev.riseup.net/grimoire/backup/
<Alster> Both documents may be a bit outdated by now. But not much.
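[Note: one common counter measure (likely among those discussed in the boum.org document) is to restrict what the production server's passphrase-less key may do on the backup server, so that even a compromised kosmos cannot delete old backups. A hedged sketch of such an entry in ~/.ssh/authorized_keys on the backup machine, where the forced command is a hypothetical wrapper script that depends on the backup tool chosen:
    command="/usr/local/bin/receive-backup.sh",no-pty,no-port-forwarding,no-X11-forwarding,no-agent-forwarding ssh-rsa AAAA... backup@kosmos
With this, the key can only run that one wrapper, never an interactive shell or arbitrary commands.]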
<Alster> so, that part is done. Any questions? Anyone still awake=?
<didleth> Alster: and is it possible to have remote backups on a server which is not running all the time, only at the time when the backups are made?
<Alster> didleth: sure, if you can ensure that the backup server will be available as long as the backup is running.
<txopi> Alster, i will read that pages other day. good references :-)
<Alster> didleth: personally I would try to use a backup location which is always online, though.
<Alster> Independently of the protocol you choose to copy the files from the production to the backup server (such as SCP, SFTP, FTP, RSYNC, HTTP, ...) there are basically two methods you can choose from:
<Alster> pull and push
<Alster> 'push' means the production server initiates the transfer of the backups.
<Alster> 'pull' means the backup server initiates the transfer of the backups.
<Alster> both options come with their very own advantages and disadvantages. Personally I prefer 'push' for a variety of reasons which are also discussed in the two documents I referred you to.
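[Note: the same transfer can be written both ways; hypothetical hosts and paths, just to show which side runs the command:
    # push: run on kosmos, so kosmos needs credentials for the backup server
    kosmos:~# rsync -az /var/backup/ backupuser@backup.example.org:/backups/kosmos/
    # pull: run on the backup server, so the backup server needs credentials for kosmos
    backup:~# rsync -az root@kosmos.example.org:/var/backup/ /backups/kosmos/
With pull, a hacked kosmos cannot touch old backups, but the backup machine holds a key to kosmos; with push it is the other way around.]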
<didleth> yhm
<txopi> ok
<Alster> so we have 20 minutes left
<didleth> i don't think that's enough to start making the backup ;D
<Alster> we could use them to define our requirements for the backup and to form our backup strategy
<Alster> other suggestions?
<txopi> no suggestions
<Alster> so, let's do it:
<Alster> is the backup location always online?
<Alster> how much space is available in the backup location (we discussed this before) and how much do we actually need?
<txopi> is always online and available
<Alster> how is the connectivity between the live and backup server (discussed earlier, too)?
<txopi> available space is 35GB
<txopi> i don't know how much we need
<txopi> i don't know what the connectivity is like
<txopi> i can give you both the information to login into the server
<Alster> is everyone who has physical access to the backup server trusted in that they will not delete the files on purpose?
<txopi> if you mean someone who breaks the door and steals the hd, no untrusted people can access the backup server
<didleth> :-/
<didleth> i dont understand the last
<Alster> is the backup location on an encrypted drive and all of those who have access to the backup server are trusted to know what they are doing (i.e. not create copies of unencrypted backups on unencrypted media)?
<txopi> i can give you the connections info by talk on kosmos
<txopi> the partition is ext3 now
<txopi> but i think we can change it
<Alster> by 'delete files on purpose' I mean that they could actually be friends with the police without you knowing, trying to work against you.
<txopi> we don't have permissions for that, but the owner of the computer is from eh-imc and perhaps knows how to do it or we can help him
<Alster> so that they could disable your access to both the production and the backup server in a concerted action at the same time.
<Alster> i don't think I need to know the authentication credentials nor the backup server location at this time.
<txopi> i think the machine is in a shared flat (i'm not sure of that), so we can't be completely sure nobody would delete files on purpose
<Alster> of course you cannot be 100% sure of any of this, but you need to make educated guesses on probabilities.
<Alster> even though it's a shared apartment/flat it may not actually be likely that one of the people living there actively cooperates with the cops, which in turn cooperate with the cops in germany (or wherever the production server is located)
<Alster> so I would not worry so much about this point, I just listed it for completeness, since it is a theoretically possible risk
<Alster> i'll be right back
<txopi> ok
<txopi> ten minutes left
<Alster> back
<Alster> i think we found out last time how much space a full backup would require, didn't we?
<Alster> also how much of it is composed of the database dump and the files in the file system matters.
<Alster> that's because the database backup will probably need to be copied completely every time.
* Alster is checking logs at the url in the topic
<txopi> but as you explained, incremental backups need a lot of space :-(
<didleth> need at least 750 MB RAM and 60 GB disk space, but more RAM and 100 GB would be much better
<txopi> 31 GB
<didleth> it's from the mail to nadir but it was not only about backups
<txopi> abe 09 23:21:14 <Alster> so: /dev/md0 100MB Used + /dev/md1 24MB Used + /dev/md2 572MB used + /dev/md4 133GB used = /dev/md* ~134 MB used
<txopi> abe 09 23:22:34 <Alster> that's total used disk space currently.
<txopi> abe 09 23:22:40 <Alster> minus 103GB /var/backup
<txopi> abe 09 23:22:58 <Alster> = 31 GB
<Alster> right
<Alster> a full DB backup is:
<Alster> kosmos:~# ls -lah /var/backup/December-2009/2009-12-17/
<Alster> total 3.5G
<Alster> drwx------  2 root root 4.0K Dec 17 06:37 .
<Alster> drwx------ 19 root root 4.0K Dec 17 03:00 ..
<Alster> -rw-------  1 root root 2.3G Dec 17 05:04 03:00-postgresql_database-eh_indy-backup.gz
<Alster> -rw-------  1 root root 1.2G Dec 17 06:26 03:00-postgresql_database-poland_00-backup.gz
<Alster> -rw-------  1 root root  486 Dec 17 06:37 03:00-postgresql_database-postgres-backup.gz
<Alster> kosmos:~# 
<Alster> so 3.5G for gzip compressed DB dumps.
<Alster> this probably needs to be transferred every night
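[Note: dumps with names like the ones above are typically produced by a nightly cron job along these lines; this is a hedged guess at what kosmos runs, with the database name taken from the listing:
    # dump one PostgreSQL database and compress it on the fly
    $ pg_dump eh_indy | gzip > /var/backup/December-2009/2009-12-17/03:00-postgresql_database-eh_indy-backup.gz
Because gzip output changes almost entirely whenever the input changes, such compressed dumps do not transfer well incrementally, which is the point made just below.]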
<txopi> aha
<Alster> It may turn out to be better to store those database dumps uncompressed and really use a utility which does incremental file content backups.
<Alster> I will need to determine which utility does this since I don't remember.
<txopi> 3.5 GB/backup * 20 backups = 70 GB
<Alster> of those 31GB, we do not need to backup all of it, such as the operating systems (we only need the configuration there, so mostly /etc which is not much, and changes rarely)
<txopi> we can't store 20 days of backups
<Alster> if it was just database backups, yes
<Alster> we also need to store backups of file uploads and Mir templates and Mir/operating system configuration, though
<Alster> kosmos:~# du -sch /var/www/
<Alster> 15G /var/www/
<Alster> 15G total
<Alster> kosmos:~# 
<txopi> i don't know where are the file uploads
<txopi> at /var/www/ ?
<txopi> (35GB - 15GB) / 3.5GB = 5.7
<txopi> we can store 5 days of backups?
<Alster> i think there is a lot of cruft in /var/www
<Alster> we will need to look more closely
<Alster> kosmos:~# du -sch /var/www/euskalherria.indymedia.org/mir/
<Alster> 72M /var/www/euskalherria.indymedia.org/mir/
<Alster> 72M total
<Alster> kosmos:~# 
<Alster> these should be the EH Mir templates
<Alster> this is just a symlink to /home/eh/mir
<Alster> kosmos:~# du -sch /home/eh/mir
<Alster> 72M /home/eh/mir
<Alster> 72M total
<Alster> kosmos:~# 
<Alster> kosmos:~# du -sch /var/www/euskalherria.indymedia.org/site/
<Alster> 8.5G /var/www/euskalherria.indymedia.org/site/
<Alster> 8.5G total
<Alster> kosmos:~# 
<Alster> this should be uploads and html files
<txopi> aha
<Alster> kosmos:~# du -sch /var/www/pl.indymedia.org/mir/
<Alster> 94M /var/www/pl.indymedia.org/mir/
<Alster> 94M total
<Alster> kosmos:~# 
<Alster> PL Mir templates ^^
<Alster> kosmos:~# du -sch /var/www/pl.indymedia.org/site/
<Alster> 5.1G /var/www/pl.indymedia.org/site/
<Alster> 5.1G total
<Alster> kosmos:~# 
<Alster> PL file uploads and html files ^
<txopi> aha
<Alster> kosmos:~# du -sch /etc
<Alster> 7.5M /etc
<Alster> 7.5M total
<Alster> kosmos:~# 
<Alster> operating system configuration (mostly)
<txopi> 72M + 8.5G + 94M + 5.1G + 7.5M = 13.7G
<txopi> more or less
<Alster> so we have 3.5G gzip compressed database dumps and 14 GB flat files
<Alster> yes, you're more precise
<txopi> yes, but we have to leave some space for the future
<Alster> we need to take into account that the data grows over time, too
<Alster> right
<txopi> 18G each backup
<txopi> but perhaps we don't need to back up everything at the same frequency
<Alster> so the backup space does not suffice for two complete backups
<txopi> i mean the gzip must be the file that changes the most
<txopi> the rest less
<txopi> or at least we can generate it (the static content) from the gzip
<Alster> what you're saying means incremental backups will not necessarily be large. but it doesn't change the size of a complete backup, unless I misunderstood you.
<txopi> if we backup the gzip every night and the rest every week, for one week we need:
<txopi> 3.5G * 7 + 14G = 48.5G
<txopi> i'm not thinking about incremental backups, just files copied :-)
<txopi> i have no idea of the space needed for incremental backups
<txopi> you said that the database must be copied entirely each time, didn't you?
<Alster> yes, unless we use a utility which does inspect in-file changes
<Alster> which would only work if the DB dumps were not encrypted
<Alster> which would only work if the DB dumps were not compressed
<txopi> (35G - 14G) / 3.5G = 6
<Alster> ('encrypted' was wrong, sorry)
<Alster> before we think about incremental backups, let's focus on full backups, since their size cannot be changed.
<txopi> the incremental backup only would work if the database is not compressed?
<txopi> ok
<txopi> (35G - 14G) / 3.5G = 6
<txopi> we have space for 6 days
<Alster> do you agree with this? <Alster> so the backup space does not suffice for two complete backups
<txopi> yes
<Alster> ok, I think this is a problem, since we should probably store at least two full backups
<txopi> but i proposed to you to make backups of the db every day and the rest every week
<Alster> well 2 full backups is fine, but less is not good.
<txopi> why is not fine?
<txopi> one sounds enough, not two
<Alster> it would mean you need to delete all the backups before you even start to create the next full backup
<txopi> ah
<Alster> so if something breaks in between you end up with a situation where you have no recoverable backup
<txopi> i understand
<txopi> so, there is not enough space for two
<Alster> so 2 full remote backups are not really optional but pretty much necessary
<txopi> the machine doesn't fit our needs?
<Alster> either that, or we need to find ways to decrease our data
<txopi> static contents can be generated from the db...
<Alster> I did not spend much time on determining these values, so they may be incorrect or exaggerated
<Alster> that's true, but it takes a loooooong time, I would not recommend relying on this.
<txopi> yes, but we also have the mirror server
<Alster> where looooooong is actually several days if you're generating multiple years of static files
<txopi> if we have a disaster on kosmos, at least we can compile the info from the backup and the mirror
<txopi> it isn't the best option but for now we don't have more
<Alster> using the mirror server as a backup works as long as you can _ensure_ that the files on the mirror server will not be overwritten by empty files or because of (accidentally) deleted files on the production server.
<txopi> aha
<txopi> and we can't ensure that
<Alster> I don't think so, not with rsync.
<Alster> And I don't know of a better option than rsync for the mirroring
<Alster> but you're right in that this is an option if there can be no other option
<txopi> 48.5G - 35G = 13.5G
<Alster> but i think in this case i'd prefer to be your backup location. And/or, that's the other option, to have separate backup locations for EH and PL
<txopi> if we had 13.5G more we could make two full backups
<Alster> <txopi> 18G each backup
<txopi> /dev/sda1              46G   20G   25G  45% /
<txopi> /dev/sda3              37G  177M   35G   1% /home/indy
<Alster> i think we said one full backup is 18G, so two are 36G
<Alster> oh, your calculation includes incremental backups, right?
<txopi> oooh
<txopi> we could try to make the 35G partition bigger, but 36G bigger is absolutely impossible
<txopi> ok
<Alster> hmm no i think the 48G you calculated were _only_ incremental backups.
<Alster> so yes, it would need to be remarkably more than 36G
<txopi> i don't know how to calculate incremental backups, so i don't think i calculated that
<txopi> 72M + 8.5G + 94M + 5.1G + 7.5M = 13.7G
<txopi> <Alster> so we have 3.5G gzip compressed database dumps and 14 GB flat files
<txopi> a full backup are 18GB, isn't it Alster?
* didleth is wondering - do you need me for anything right now?
<txopi> 18G * 2 = 36G
<txopi> we need one more giga?
<Alster> My rough bet for two months of daily backups incl. 2 full backups (all other incremental) would be 2 x 18G = 36 G full backups + 60 x 500M = 30G, which totals at 66G, and requires that we find a way for incremental database dump backups
<txopi> didleth, i don't think so. you can read the logs tomorrow
<txopi> i have to go to bed too...
<didleth> ok
<didleth> i will sleep i believe
<Alster> ok, go to sleep everyone, we will continue on saturday
<txopi> 500M why?
<Alster> just a guess, i don't expect that you change more data each day
<txopi> aaah
<txopi> ok
<txopi> i don't know
<txopi> we can continue on saturday
<txopi> ok?
<Alster> yes :)
<didleth> ok, so we will continue on saturday
<txopi> good night!
<Alster> bye bye
<didleth> good night and sleep well to you both :]
<txopi> Alster, don't forget the logs ok?
<Alster> of course not :)
<txopi> same didleth 
<didleth> thx :]
<txopi> great
* txopi has disconnected (Quit: Leaving)