FileHistory is skipping Files without notice / Microsoft hates The Crystal Method

August 17, 2017

So I have a Windows PC here, and I’ve installed Windows 8.1 instead of 7, basically because of one killer feature: FileHistory. After all, an almost fully automated backup is a very nice thing to have, especially if you got used to it with TimeMachine.

Now the only problem is that FileHistory is a buggy piece of shit. It silently skips files and doesn’t include them in the backup.

Which is in some sense kinda funny, because backing up your files is the SINGLE PURPOSE OF THE FRIKKIN APPLICATION!

I first thought that the issue was due to the MAX_PATH thing in Windows. Because of ancient compatibility layers, a file path (i.e. the full directory + filename) must not have more than 260 characters. MAX_PATH is/was a constant that lots of programmers included in their applications. When backing up files to the destination directory, FileHistory changes the filename: essentially filename.end becomes filename (2017_08_15 21_53_46 UTC).end – this is how FileHistory keeps track of different file versions. The timestamp alone adds 26 characters, and the destination prefix adds more, so the effective limit for the source path is well below 260 characters.

I have a complex file structure in my home directory, and my files look like this:

C:\Users\user\MyFiles\Musik\mp3\Alben\The.Crystal.Method-Legion.Of.Boom-Advance-2004\The_Crystal_Method-Legion_Of_Boom\00-the_crystal_method-legion_of_boom-advance-2004-(back)-ph.jpg

which should become

D:\FileHistory\user\EUROPA\Data\C\Users\user\MyFiles\Musik\mp3\Alben\The.Crystal.Method-Legion.Of.Boom-Advance-2004\The_Crystal_Method-Legion_Of_Boom\00-the_crystal_method-legion_of_boom-advance-2004-(back)-ph (2017_08_15 21_53_46 UTC).jpg

But the file is never copied. And it’s not just the cover jpg, all mp3 files are missing, too.

And indeed that destination is a long filename. But still only 243 characters, so no problem. In fact, I did have some files which were longer. They showed up as errors in the Event Log (that is, start the Event Viewer, then go to Applications and Services Logs, then Microsoft, Windows, FileHistory-Engine).
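To check path lengths before blaming MAX_PATH, something like the following sketch helps. It only models the renaming scheme described above; the target layout (target root + drive letter + original path) is an assumption read off my own example path, and the function names are made up for illustration.

```python
from datetime import datetime
from pathlib import PureWindowsPath

MAX_PATH = 260  # classic Windows path limit (includes the terminating NUL)


def versioned_name(filename: str, stamp: datetime) -> str:
    """Insert ' (YYYY_MM_DD HH_MM_SS UTC)' between stem and extension."""
    name = PureWindowsPath(filename)
    return name.stem + stamp.strftime(" (%Y_%m_%d %H_%M_%S UTC)") + name.suffix


def destination_path(source: str, target_root: str, stamp: datetime) -> str:
    """Map a source path to its assumed location inside the backup target."""
    src = PureWindowsPath(source)
    drive = src.drive.rstrip(":")            # "C:" -> "C"
    relative = src.relative_to(src.anchor)   # path without the leading "C:\"
    dest_dir = PureWindowsPath(target_root) / drive / relative.parent
    return str(dest_dir / versioned_name(src.name, stamp))


def fits(source: str, target_root: str, stamp: datetime) -> bool:
    """True if the destination path stays below MAX_PATH characters."""
    return len(destination_path(source, target_root, stamp)) < MAX_PATH
```

Feeding it the album path from above together with D:\FileHistory\user\EUROPA\Data as target root reports that the destination fits, which matches my observation that path length is not the culprit here.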

However, the whole album from Crystal Method was simply skipped. No error message at all. Just silently skipped. Actually it wasn’t just these mp3s; some important insurance contract files were missing, too. But hey, I have priorities. I mean I got fond of “Keep Hope Alive” from the album “Vegas”, which was used in the movie Replacement Killers. Quite average movie, but the music…

The really annoying thing is that you cannot file a bug report at Microsoft. They have no bug tracker. They do everything to not get bothered with bug reports.

Of course you can google and then find out that zillions of other people have the same problem, and you can go to answers.microsoft.com, ask your question, and then you will get an answer from some MVP/MCSE/BST (BST stands for Bullshit Talker), who will post a reply that completely ignores your question, so that he can claim he answered it in order to meet company goals on the number of answered questions. Even though he didn’t.

So this is really really annoying – a backup that you cannot trust is no backup at all.

A very neat application in Linux is BackInTime. It works similarly to FileHistory – and by similar I mean the backup strategy, not the skipping files at random thing – in that it copies all your files to your destination drive on the first run. On subsequent runs it checks whether there are new files or files have changed, and then copies these files. If a file did not change, a hard link is simply created that points to the file stored in the previous backup. For home users, who rarely have a lot of changing files (think an mp3 or movie collection), this is actually quite perfect.
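The strategy itself fits in a few lines. This is a minimal sketch of the idea only, not BackInTime's actual implementation (which, as far as I know, builds on rsync); the function name and the byte-by-byte comparison are my own choices, and symlinks, permissions and deleted files are ignored.

```python
import filecmp
import os
import shutil
from pathlib import Path
from typing import Optional


def snapshot(source: Path, previous: Optional[Path], target: Path) -> None:
    """Create a new snapshot of `source` in `target`.

    Files identical to their copy in `previous` become hard links,
    everything else is copied. All snapshots must live on one filesystem.
    """
    target.mkdir(parents=True, exist_ok=True)
    for src in source.rglob("*"):
        rel = src.relative_to(source)
        dst = target / rel
        if src.is_dir():
            dst.mkdir(parents=True, exist_ok=True)
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        old = previous / rel if previous is not None else None
        if old is not None and old.is_file() and filecmp.cmp(src, old, shallow=False):
            os.link(old, dst)        # unchanged: link to the previous snapshot
        else:
            shutil.copy2(src, dst)   # new or changed: store a fresh copy
```

Every snapshot directory then looks like a full copy, while unchanged files occupy disk space only once.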

A very appealing thing is also that in the worst case, if you really need your data, it’s all there – not stored in some proprietary backup container, it’s just there, and a simple file copy restores all your data.

Unfortunately I could not find any open source solution on Windows with the same functionality. There is a script by the German c’t magazine (the heise Backup Tool), but it’s very primitive and has no GUI – my wife wouldn’t be able to use it. Also it cannot copy files which are currently in use, which is a must-have if you want to back up your system while working (my browser and mail program are always running while backing up, and thus the files which store my mails are open, too).

Then there is True Image, but it seems they store their backups in a proprietary file format, i.e. you will need True Image in case of a recovery. And of course as Murphy dictates, just in that moment, you won’t have a working installation and the setup won’t work, and well, these things.

There is also HardlinkBackup which seems to do what I want, but it’s not open source, the free version also can’t copy open files, and the pro version is quite costly. Also, it seems to be a one man show, and I am always a little skeptical when building my whole strategy on a one man show. What if he retires – then I have to set up my workflow again from scratch.

To paraphrase the Hitchhiker’s Guide to the Galaxy, I am quite sure that the FileHistory guys will be first against the wall when the great revolution comes.

Update 25.08.2017: After evaluating various solutions, I finally settled on Acronis True Image. There are several reasons for that, but the main point is that Acronis works similarly to TimeMachine:

  • True Image creates a full system backup, i.e. you can completely recover your whole system.
  • The backup is stored in Acronis’ proprietary .tib file format. This is probably the only negative point. Nevertheless, as long as you have True Image installed (or created a rescue disk) you can mount *.tib files as drives and access/recover any file individually.
  • Despite the full system backup, files are saved incrementally, i.e. only changed files are copied. You can even go for differential backups, i.e. only the changed parts of a file are saved. (However, this takes more time and is of course more risky, because after several differential backups, all of these backups have to be intact to be able to restore a file.) You can also schedule 1 full + 5 incremental backups and then repeat, starting again with a full one.
  • Backups can be automated, scheduled, and stored to a USB drive or network drive.
  • It is amazingly fast. For my system the backup time of TrueImage was approximately half compared to the time required by FileHistory or HardlinkBackup (the latter was approx. 8 hours, vs. 4-5 hours for TrueImage).

After googling a lot, I was able to buy a 3-PC license for 35 Euro, which is definitely worth the money.

Interestingly, when it comes to convenience, this definitely tops BackInTime and anything else that I got used to on Linux…


Linux Wifi on Thinkpad E470

June 11, 2017

So wifey needed a new computer, and we bought a Thinkpad E470.

Which so far is running ok in Ubuntu 16.04 except the WIFI card. Which is a big showstopper.

It seems the Thinkpad E470 comes with different ath10k-based Atheros chipsets, and this one had a QCA9377. The brand name seems to be Atheros Killer N1525 Wireless-AC, but I am not 100% sure. Support for the QCA9377 WLAN was only recently added to various distributions; as far as I know you need at least kernel 4.8. However, even then the Wifi connection drops randomly. Googling revealed that this is apparently a common bug (link in German).

After a dropout, manual reconnection is then required – in everyday life this is quite annoying. This is apparently due to a firmware crash. You can check if you are affected by this bug by

dmesg | grep ath10k

which should yield something like:

[ 715.689510] ath10k_pci 0000:05:00.0: firmware crashed! (uuid bf474904-06ea-4611-80b3-949e2ac31e80)
[ 715.689560] ath10k_pci 0000:05:00.0: qca6174 hw3.2 target 0x05030000 chip_id 0x00340aff sub 17aa:0827
[ 715.689571] ath10k_pci 0000:05:00.0: kconfig debug 0 debugfs 1 tracing 1 dfs 0 testmode 0
[ 715.691880] ath10k_pci 0000:05:00.0: firmware ver WLAN.RM.2.0-00180-QCARMSWPZ-1 api 5 features wowlan,ignore-otp,no-4addr-pad crc32 75dee6c5
[ 715.692947] ath10k_pci 0000:05:00.0: board_file api 2 bmi_id N/A crc32 6fc88fe7
[ 715.692963] ath10k_pci 0000:05:00.0: htt-ver 3.26 wmi-op 4 htt-op 3 cal otp max-sta 32 raw 0 hwcrypto 1
[ 715.694993] ath10k_pci 0000:05:00.0: firmware register dump:
[ 715.695012] ath10k_pci 0000:05:00.0: [00]: 0x05030000 0x000015B3 0x009860FA 0x00955B31
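If you would rather count the crashes than eyeball the log, a small sketch like this one does the job. It assumes the kernel log has the exact line format shown above; the regular expression and the function name are mine.

```python
import re

# Matches lines such as:
# [ 715.689510] ath10k_pci 0000:05:00.0: firmware crashed! (uuid ...)
CRASH = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+ath10k_pci\s+\S+\s+firmware crashed!",
    re.MULTILINE,
)


def crash_times(log: str) -> list:
    """Return the kernel timestamps (in seconds) of all ath10k firmware crashes."""
    return [float(match.group("ts")) for match in CRASH.finditer(log)]
```

Save the output of dmesg to a file, read it in and pass the text to crash_times(); more than a handful of entries per hour of uptime means you are affected.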

This behaviour is also documented here, but it seems no patch has been issued. In the above linked thread on github, someone observed that these connection dropouts occur only when frequency changes are issued by the router. I set a fixed frequency (i.e. Wifi channel) on our AVM FRITZ!Box 7330, but this did not help at all. Running Ubuntu 16.04, I also tried updating to a very recent mainline kernel build (Kernel 4.11), however again this did not help at all.

In the end, I gave up and ordered an Intel Wireless 8265. There has been support for this one (link in German) in Linux since Kernel 4.6. Fortunately, the Wifi card can easily be replaced on the Thinkpad E470, since the card is socketed in an M.2 slot, and the notebook case itself is easy to open. Also, as the above link mentions, there seems to be no hardware white-list, i.e. no restriction w.r.t. the choice of the WLAN module.

Nevertheless, it is quite annoying that in the year 2017 there are still these issues. Reminds me of the old days with ndiswrapper. If you want to run Linux, you still need to do endless hours of research before buying a notebook.

Which in this case I did, e.g. this compatibility test (link in German), but they apparently missed that bug in the test. Note that the Thinkpad E470 is also Ubuntu certified. I guess that says a lot about the value of this kind of certification program.

Let’s see if the Intel Wifi will work…

Update: The Intel Wifi card works without problems. No connection losses anymore. Haven’t run any benchmarks w.r.t. WLAN performance, but everything seems to work fine! You can say what you want, but Intel’s open source policy is really excellent…

 

Baikal, CardDAV, CalDAV and Raspberry Pi

January 13, 2017

… so I gave it another try, and Baikal runs quite well, much better than radicale. The only drawback is that you can’t share calendars in the sense that you cannot give another user read-only access to your calendar. So basically that means that my wife and I each have one calendar, have both calendars added in DAVDroid on each of our cellphones, and have full access to each other’s calendar. Which is ok. Hope she won’t mess with my appointments 🙂

Again, my setup is: the Raspberry runs locally in my home network, hence not much security, no SSL or other shenanigans.

Here is the installation in short:

Install Lighttpd, php5, sqlite:

sudo apt-get install lighttpd
sudo apt-get install php5-common php5-cgi php5
sudo apt-get -y install sqlite3
sudo apt-get install php5-sqlite
sudo lighttpd-enable-mod fastcgi fastcgi-php

Download Baikal:

sudo -i
cd /var/www
wget https://github.com/fruux/Baikal/releases/download/0.4.6/baikal-0.4.6.zip
unzip baikal-0.4.6.zip 
cd baikal
touch Specific/ENABLE_INSTALL
cd Specific
mkdir db

Set ownership of the www directory:

chown -R www-data:www-data /var/www/baikal

Set the document root in the lighttpd config file:

cd /etc/lighttpd/
vim lighttpd.conf

server.document-root = "/var/www"

Then reload lighttpd:

sudo service lighttpd force-reload

Point your browser to

http://192.168.xx.xx/baikal/html/admin/

Create the admin account, then create at least one new user.
For the USERNAME-field I recommend not using an email address, since the
resulting URL for the calendar will be simpler.

Now for DAVDroid:
Create a new account and use „Use URL and username“. The URL is:

http://192.168.xx.xx/baikal/html/cal.php/calendars/USERNAME-field/default

For login, use the chosen USERNAME-field as username together with its password.

DAVDroid will scan and find the resulting address book. After enabling it, giving rights to
DAVDroid and syncing, the calendar will show up in the Android calendar app. Same for the Android Contacts app.

For Thunderbird, install Lightning, and use the same URL as above. When asked for the login, use the same data as above.

To sync contacts in Thunderbird, install the SoGo Connector addon from here:

https://sogo.nu/download.html#/frontends

Tools -> Address Book
Then File -> New -> Remote Address Book. URL is

http://192.168.XX.XX/baikal/html/card.php/addressbooks/USERNAME-field/default/

username and password for login as above.

Yay!

Happy syncing. If you would like to create backups (e.g. via cronjob), the database is in

/var/www/baikal/Specific/db/db.sqlite
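For the cronjob, a plain file copy can catch the database in the middle of a write. A slightly safer sketch uses SQLite's online backup API via Python's sqlite3 module (Python 3.7 or newer); the function name and the destination are assumptions, not part of Baikal.

```python
import sqlite3
from pathlib import Path


def backup_sqlite(db_path: Path, backup_path: Path) -> None:
    """Copy a live SQLite database to backup_path using the online backup API."""
    if not db_path.is_file():
        # sqlite3.connect() would silently create an empty database otherwise
        raise FileNotFoundError(db_path)
    src = sqlite3.connect(str(db_path))
    dst = sqlite3.connect(str(backup_path))
    try:
        src.backup(dst)  # consistent copy, even while the source is in use
    finally:
        dst.close()
        src.close()
```

Called with Path("/var/www/baikal/Specific/db/db.sqlite") and a destination of your choice (e.g. a mounted USB drive), this can run from cron as a user that is allowed to read the database.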

 

Setting up Radicale on a raspberry pi

December 24, 2016

… so I wanted to escape the monster called Google, that wants all my data but at the same time is so convenient. Setting up radicale for CalDAV was trickier than expected, since most tutorials are outdated. In particular, there is now a package available in raspbian, so there is no longer any need to pip-install:

sudo apt-get install radicale

And then

sudo vim /etc/default/radicale

remove the # before

#ENABLE_RADICALE=yes

This enables radicale to run as a server.

Next there are some permission issues with the raspbian-package. I don’t remember where I picked up the next lines, but it works:

sudo -i

service radicale stop
rm -rf /var/log/radicale
mkdir /var/log/radicale
touch /var/log/radicale/radicale.log
chown -R radicale:adm /var/log/radicale
service radicale start
exit

Since security is no issue for me (radicale runs only in my internal network, and I don’t sync while outside) the next very simple config file works for me:

sudo mv /etc/radicale/config /etc/radicale/config.backup

sudo vim /etc/radicale/config


[server]
hosts = localip:5232

[encoding]
request = utf-8
stock = utf-8

[auth]
type = htpasswd
private_users = alice, bob
htpasswd_filename = /etc/radicale/users
htpasswd_encryption = plain

[rights]
type = from_file
file = /etc/radicale/rights

[storage]
type = filesystem
filesystem_folder = /var/lib/radicale/collections

[logging]
config = /etc/radicale/logging
debug = TRUE

where localip is the ip of your raspberry within your local network, e.g. 192.168.0.10

The users file is simple:


alice:alicepassword
bob:bobpassword

And the rights file is roughly taken from the documentation:


[allow-everyone-read]
user: .*
collection: .*
permission: r

[owner-write]
user: .*
collection: ^%(login)s/.*$
permission: w

The idea is: Everyone can read everyone’s calendar, but only users themselves can edit.

Now this is not a _secure_ solution at all. When testing, it seems everyone can edit everyone’s calendar… So if you want a secure solution, just add an http proxy in between or … I don’t know, use owncloud or something. The important point for me was just that my wife and I have two separate calendars and we can view each other’s.
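To make clear what the two rules are supposed to express, here is a toy model of the intent. It is explicitly not Radicale's own code, and Radicale may well evaluate the sections differently (e.g. stop at the first matching section); the RULES list and the allowed() function are made up for illustration.

```python
import re

# (section, user regex, collection regex, permissions), mirroring the rights file
RULES = [
    ("allow-everyone-read", r".*", r".*", "r"),
    ("owner-write", r".*", r"^%(login)s/.*$", "w"),
]


def allowed(user: str, collection: str, permission: str) -> bool:
    """Toy model: a permission is granted if ANY matching section lists it."""
    for _section, user_re, collection_re, permissions in RULES:
        if not re.match(user_re, user):
            continue
        # substitute the login name into the collection pattern
        pattern = collection_re % {"login": re.escape(user)}
        if re.match(pattern, collection) and permission in permissions:
            return True
    return False
```

Under this reading, alice may write to alice/calendar.ics but only read bob/calendar.ics, which is the behaviour I was hoping for and did not get.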

Then
pi@raspberrypi:~ $ sudo service radicale status
● radicale.service - LSB: Radicale CalDAV and CardDAV server
Loaded: loaded (/etc/init.d/radicale)
Active: active (running) since Fri 2016-12-23 18:28:24 UTC; 2s ago
Process: 1148 ExecStop=/etc/init.d/radicale stop (code=exited, status=0/SUCCESS)
Process: 1157 ExecStart=/etc/init.d/radicale start (code=exited, status=0/SUCCESS)
CGroup: /system.slice/radicale.service
└─1176 /usr/bin/python /usr/bin/radicale --pid=/var/run/radicale/radicale.pid --daemo...

Dec 23 18:28:24 raspberrypi radicale[1157]: Starting Radicale CalDAV server : radicale.
Dec 23 18:28:24 raspberrypi systemd[1]: Started LSB: Radicale CalDAV and CardDAV server.

Great! If you go to http://your_local_ip:5232/ you should see

Radicale works!

Next, start e.g. Lightning, File, New Calendar, Network. Remote calendar access: Protocol is CalDAV. Location is http://your_raspi_ip:5232/alice/calendar.ics/ for Alice’s calendar, then add http://your_raspi_ip:5232/bob/calendar.ics/ for Bob’s. Calendars are automatically created. The trailing slash is important!

Next I downloaded DAVdroid. DAVdroid is GPL, but if you download from play store you have to donate approx 3 Euro. I instead use F-Droid, and download from the F-Droid store the GPL’ed version for free. Next open DAVdroid, add new account, check „login with URL and username“, then

a) on Alice’s smartphone, create one account with URL http://your_raspi_ip:5232/alice/calendar.ics/ and username „alice“ and password „alicepassword“, and another account with the URL http://your_raspi_ip:5232/bob/calendar.ics/ and username „alice“ and password „alicepassword“.

b) on Bob’s smartphone, do the same but with bob’s and alice’s urls and bob’s username and password.

If you sync locally like me, afterwards check settings for each account (the cog-wheel in the upper right), check „sync only via WLAN“ and limit the ssid to your home wlan.

Go to the Android calendar apps, go to settings (in Android 6 it’s the button on the upper left corner), uncheck Google Sync, and instead check „calendar.ics“ both for Alice and Bob.

You can now directly edit within the Android Calendar App, and everything is synced.

A few notes:

  • The lack of true open source apps for Android is disturbing. It seems everyone is going for the quick buck, either via apps loaded with ads, or by releasing GPL’ed apps for money (i.e. re-licensing via app store). Sure that’s legally perfectly fine, and I can understand it somehow (spend zillions of hours on open source, with zero donations in return) but still… where is the open-source spirit?
  • This whole CalDAV thing sucks big time. I spent a whole day getting radicale to run, the documentation is missing for the crucial things (like the config files), the package on debian is broken, you have to be extremely careful when entering URLs and stuff… seriously, just enabling „sync“ for Google Calendar is way easier. Sure, Google then again knows a little bit more about you, your hobbies, everything… but it’s so convenient.

Update: I reverted back to Google Calendar & Contacts. That’s two clicks. In particular:

  • Radicale seemed to ignore the rights settings no matter what I set, i.e. from_file with my own definitions, owner_write, etc. So Alice could edit Bob’s calendar at any time. For me, security wasn’t really a matter, however safety was – I wanted to prevent Alice from accidentally making changes to Bob’s calendar.
  • DAVDroid would suck too much battery time. At least that was my impression.
  • Syncing contacts was very annoying. Basically you first have to export your local (or Google stored) contacts and then re-import them. But re-import how and where? I finally managed to find a CardDAV add-on for Thunderbird that somehow worked, but it essentially boiled down to lots of manual re-edits. Moreover I couldn’t manage to sync with the add-on and radicale.
  • Radicale has no web-frontend. That however would be very convenient if one doesn’t want to rely on Thunderbird add-ons.
  • Radicale seemed to ignore the storage location I specified in the config file. That made backup quite difficult.

I could’ve tried more, maybe I’ll try nextcloud on my shared-hosting box in future. But right now I’ve already wasted almost two days on this, and it seems to me it just isn’t worth the effort…  not mentioning the administrative overhead in future. And yes, maybe DAVical or baikal or anything is easier to set up, but I doubt it.

The magical calming power of Takemoto Piano

December 4, 2016

The hzB and I are really pretty worn out at the moment. Constant sleep deprivation – never more than four hours at a stretch – and in any case, healthy infants always find a reason to scream: diaper full, hungry for the breast, want to be carried, tummy ache, or simply unhappy with the overall situation – it was more comfortable in the belly, out here you have to breathe and eat by yourself etc. – and that is annoying, of course.

From a computer scientist’s point of view it’s basically a non-deterministic state machine: you feed it all sorts of inputs (new diaper, holding it in your arms, breastfeeding, pacing back and forth in front of the dishwasher or the extractor hood) and hope to get a short break from the screaming in between.

In any case, the hzB then found this video by Takemoto Piano.

Takemoto is a used-piano dealer (buying and selling). Well, and they produced this jingle in which you are kindly asked to give them a call to sell your used piano. And some desperate mother or desperate father then discovered that, for some reason, this jingle calms the little ones down, and cut a one-hour YouTube video out of it. Why should that work? Maybe because there is a lot of black and white in it, which they can already perceive? Maybe because the jingle is so catchy? Maybe because, for no reason whatsoever, four women in skin-tight costumes jiggle their breasts?

Do I feel guilty for calming my child with YouTube videos at the age of four weeks?

A little. But the amazing thing is: IT REALLY WORKS.

The only question is what wears on my nerves more: an hour of screaming at the top of its lungs, or an hour of this stupid jingle…

My first Triathlon

September 26, 2016

So we expect our family to increase by one within the next five weeks.

“And then your life is over”, I was told by friends and family who had already been through that process. So it was time to do those things I always wanted to do while I still had time for them. Like a triathlon.

Well at least the sprint version, i.e. the triathlon for those folks who are actually not able to and probably never will be physically able to do a full triathlon (aka me). Meaning 500 meters of swimming, 16 kilometers cycling, and 6 kilometers running. For those of you who are not familiar with the rules, that’s on the same day.

And I even prepared for the whole thing. Thoroughly! With all my effort!

Like I bought some new goggles for swimming.

Perfect.

In all seriousness though, I actually went running after work approximately 3 times per week, 6 km each, and went swimming once or twice a week (approximately 30 mins). Just before the triathlon I measured my time on 6 km, which was 28:56, quite ok. Or so I thought…

I chose a triathlon at my home town, mostly because swimming was within a pool. And swimming was the part I feared most, since I have zero experience swimming competitively. Also my technique sucks, especially for crawl style (breast stroke is better).

On the other hand this was a cross-triathlon. Meaning cycling was through the woods. Originally I intended to cycle on the stylish dutch-bike I use to commute to work (after all, I signed up for this for the fun and to finish it, not to win anything). But the organizers didn’t allow participation without a mountain-bike.

 

I didn’t want to invest a lot for this one-time thing, and I figured that my sister still had my old mountain-bike I used as a teenager. Mint-condition she assured me, I use it for cycling tours once in a while, and of course you can have it for the weekend.

Now, I really like my sister, but my alarm bells should’ve gone off as she assessed a technical thing as mint-condition. For example in 2012 she once considered a Thinkpad T40 with a 30 GB hdd and 512 MB of RAM running Windows XP and loaded with bloat-ware in mint-condition, and when the Windows XP support ended and Microsoft Security Essentials refused to work, a friend of her husband recommended Kaspersky Internet Security as a replacement. I was asked to fix this mint-condition notebook since there was some issue with their WLAN… I mean the thing took like 10 minutes to boot up alone…

So it was no surprise that three days before the triathlon I got this message:

Uh, there was one thing I forgot to mention: There is a problem with the back cassette. The chain is stuck. So this means you can only switch the three gears with the front cassette. Well it was never a problem for me, so you should be ok.

Three Gears. Like my dutch bike has three frickin‘ gears!

Then I looked at the track profile which I simply didn’t do before. After all, this is my home town, I should kinda somehow remember the woods, right? Right?

[Figure: elevation profile of the cycling track]

FU#!§!“$!“$§!“

I really didn’t anticipate this. All my training was on a dutch bike on mostly flat surfaces, riding along rivers. And to add insult to injury, I had a 90’s style mountain-bike with three working gears. No suspension… well, removing a little bit of pressure from the tires… who needs more? Disc brakes? What is this? We have these shiny V-brakes… and three gears.

Well in any event, to sum up:

Swimming went far, far better than I thought. I guess it’s due to regularly going to the pool in Japan when I was doing my PhD. Since I didn’t know my expected time, I put in a very slow worst-case guess and was grouped with really, really bad swimmers. I was second in my lane, but if the lane hadn’t been so crowded (6 swimmers per lane) I could have easily overtaken the first guy too. I mean some of these guys’ swimming techniques were like… a dog or something… I almost felt pity…

The cycling part went horribly. Like really horribly. I finished in last place. I had three situations where I was milliseconds away from crashing into the next tree. Uphill I lacked stamina due to training on flat tracks and three!!! fscking gears. Downhill I was slow too: no suspension, and the track was way more technically difficult than I thought. I had to concentrate extremely hard, and sometimes I even had problems gripping the handlebar due to the heavy shaking. One girl on the track crashed pretty badly; she had to give up and couldn’t finish.

That’s why the cycling was way more exhausting than anything I experienced during training. Both physically and mentally. Starting into the running track in that state was difficult. And considering the height profile of the running track, that was no piece of cake, either.

[Figure: height profile of the running track]

So what did I learn?

  • swimming was easier than I thought
  • get your cardio by cycling. Cycling is by far the longest part of the whole event, so that should be the discipline one should prep most for. Especially if it’s a cross-triathlon.
  • prep your gear. Especially get used to your bike and absolutely make sure it’s in mint condition.

In the end I finished last among the males (two minutes behind the next guy, and it was a small event, maybe 150 male participants). All solely due to cycling; my swimming time and running time were ok.

So hey, I beat some of the girls…

I will surely make time to participate next year. The whole thing was a really fun experience. And plans to buy a proper bike are already made…

 

The sad state of PDF-Accessibility of LaTeX Documents

August 11, 2016

[I will use this blog as a dump for random things and thoughts from now on. German, Japanese, English – all mixed. Topic wise: Anything from computer science, life in general, up to things related to Japan].

Accessibility is becoming more important nowadays. Whereas the 90s saw a quick and uncoordinated development with various technologies that didn’t really account for folks with disabilities, fortunately nowadays developments take that into account.

For example PDF files. After being a half-open format for quite some time, PDF is nowadays an official ISO standard. And the specification of PDF requires accessible PDFs to be tagged. What this means is that in addition to the printable graphical description of the content, all content should be additionally included in a tagged, somewhat primitive XML-like structure. Something like (very much simplified here):

[element header]
 This is a very important document
[end header]
[table start]
    [table row 1]
        [table cell]
            blablabla
        [end table cell]
    [end table row]
[end table]
[graphic start]
    [alternative text]
        This image shows something beautiful
    [end alternative text]
[end graphic]

Well, you can see the idea here. This allows screen readers to extract information from the document and read it out loud to the visually impaired. It also makes it possible to display information about included elements, such as an alternative text for a graphic.
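As a toy illustration of why such a structure tree is useful, here is a sketch of how a consumer like a screen reader could walk it. This is a simplified model and not the PDF syntax itself; the tag names H1, Table, TR, TD and Figure are standard PDF structure types, while the Node class and the read_aloud function are invented for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One element of a (heavily simplified) tagged-PDF structure tree."""
    tag: str
    text: str = ""
    alt: Optional[str] = None          # alternative text, used for figures
    children: List["Node"] = field(default_factory=list)


def read_aloud(node: Node) -> List[str]:
    """Linearize the tree in reading order, the way a screen reader might."""
    lines = []
    if node.tag == "Figure":
        lines.append(f"Image: {node.alt or 'no description'}")
    elif node.text:
        lines.append(node.text)
    for child in node.children:
        lines.extend(read_aloud(child))
    return lines


document = Node("Document", children=[
    Node("H1", "This is a very important document"),
    Node("Table", children=[Node("TR", children=[Node("TD", "blablabla")])]),
    Node("Figure", alt="This image shows something beautiful"),
])
```

Without the tags, a reader only has glyphs at page coordinates and has to guess what is a heading, a table cell or a caption.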

It also allows for things like reflow, i.e. when you display a PDF on your kindle. Then the kindle can extract the text from the tagged PDF, reflow it according to your screen size, display it in a font you chose, and modify it in other ways suitable to the reader.

Sure this goes somehow against the original idea of a PDF (you see what you print), but then again, originally 86-DOS was thought of as a quick hack for a computer kit, and we know how it all ended.

Now where is the problem?

LaTeX or TeX in general cannot generate PDF documents with tags. pdfTeX is probably the right place to target, but I do not see this happening anytime soon.

And this is a very sad state of affairs.

A lot of institutions nowadays require accessible documents for publication. There are other requirements than just tagging alone, but this is the biggest obstacle.

The irony is that a LaTeX document itself is already a quite structured document. But translating the LaTeX syntax constructs into tags has never been done. It’s also probably a non-trivial task since a) there are a bazillion LaTeX packages out there which all use their own syntax constructs and b) TeX/LaTeX was never designed with a clear XML-like [tag] [/tag] structure in mind.

And that’s just one problem. TeX was designed when things like object oriented programming were virtually at a research stage, far from being common. Mainframes were the hot thing. And despite Knuth being a genius, look at this fscking mess that TeX is. Take your average computer science graduate from the last ten years. Do you think anyone would be remotely able to understand what is going on there?

Achim Blumensath understood this problem some 15 years ago (kinda funny to accidentally hit his name when writing up this article, as he happened to be one of the tutors of an undergrad logic course I took some 15 years ago), and wrote ANT as a TeX replacement, but the whole thing has been unmaintained since 2007. Guess he was busy kick-starting his university career, which is understandable. Sadly, like most one-man shows, that project never really took off.

My point being that if we didn’t rely on TeX itself and used ANT (or whatever alternative), which is written in the quite elegant OCaml, then hacking it would at least be possible for mere mortals. Although I have to admit, despite being in love with OCaml since my PhD days, it’s also quite a niche language. But imagine if the whole thing was written in Python, or at least C.

I haven’t looked at pdfTeX’s source, but it very much looks to me like development is not a huge common effort, but rather Hàn Thế Thành on his own, chronically overworked and left alone with this huge task. So we are stuck.

There are some folks who think that tagging with LaTeX can be done. For example the guys at CHI have some instructions for it, see here.

The problem is that they are all wrong, which is one motivation for this post.

Basically what they suggest is to add tags after PDF generation by Acrobat Pro. This doesn’t work of course for anything more complex than a simple single page. The detection of what is a header, what is a sub-header, what is a table, what is a table header – all this is impossible to do once the PDF is generated, because you will have to use heuristics, which will lead to issues. So kudos to the Adobe guys for trying, but weirdo bastard tags won’t help, they will lead to a bigger mess for screen readers than just trying to directly extract the text for reading. If you don’t believe that, just take a random paper from arXiv, run it through Acrobat Pro and add tags, and see the result.

There is Babette Schalitz‘ accessibility package, where she tried to hack the PDF generation in a way that tags are generated automatically. If you take a look into her source-code, you can see that this inevitably lead to a complete mess (no offense here – but I claim it is simply impossible to do in a clean way on the LaTeX level instead of below within pdfTeX). The package is unusable on modern TeX distributions and documents won’t compile because, well because the code does all sorts of nasty hacks which don’t work in current versions.

Andy Clifton tried to hack the package and fix these compilation issues, but again: run the result through Acrobat Pro’s accessibility checker, run it through PAC, or better, inspect the document manually using Acrobat Pro. The generated tag structure is completely broken. Spaces are missing, the structure is intertwined. It’s completely useless. You could as well manually add [tag] foobar [/tag] to the document. Sure, some tools like Acrobat Reader (not the Pro version) would then show “document is tagged”, but what is the point?

Ross Moore wrote some papers on tagged PDFs with LaTeX by directly hacking pdfTeX, but it seems to be a one-man show and a Sisyphean task. There seems to be nothing that is remotely production ready; it is more like super alpha-alpha stage.

ConTeXt made some efforts in that direction, but there seem to be all sorts of minor issues as well, and let’s face it: ConTeXt is an unpredictable one-man show. No defined APIs, documentation is a clusterfsck of entries on wikis here and there or on mailing lists, there are constant syntax changes (especially from MkII to MkIV), examples in the wiki don’t work, there are no books, the official manual is always behind… Despite being a really interesting approach, ConTeXt is PRAGMA’s in-house tool of choice and simply not production ready for outsiders.

And then there are numerous threads on tex.stackexchange with questions on what to do concerning accessibility and tagged PDFs, and the answer is always the same: It doesn’t work.

In some universities and government institutions it is legally mandatory to publish accessible documents, and essentially that rules out LaTeX for document creation. Did I mention that both Word and LibreOffice generate tagged PDFs? (not perfect, but usable).

That’s all in all a very sad state of affairs. But it kind of shows the underlying problem: from a coder’s perspective, (La)TeX is a big mess, there is incredible dirt under the carpet, and as such, development is driven by a few folks who are overworked. Since pdfTeX, there have been few substantial developments in the TeX world that address the real core functionality (yes, we have a better packaging system, yes, we have TikZ & beamer now; all nice, but they’re all built on top). And syntax-wise, btw, TikZ is horrible, too.

I sometimes miss WordPerfect. WYSIWYG approach, yet there was „reveal codes“. Still a word processor, and not remotely close to the typesetting quality of TeX, but still.

So? It means I have to stick to Word & LibreOffice in my daily life.

Oh what a mess…

Brexit

June 27, 2016
I really can’t stand the anti-British rhetoric anymore. It goes like “uh, it’s the old and stupid folks who voted Brexit; they just don’t get how incredibly awesome the EU really is.” Which is just emotional discrediting of legitimate concerns:
 
The current EU system is highly undemocratic. Most of the power is in the hands of the commission, an uncontrollable bureaucracy always eager to increase its power and influence. Its only democratic legitimation comes from the commissioners, who are nominated by representatives of the member states. There are so many in-betweens, and the legislative drafts (in the case of a regulation, something that will be immediately legally binding in all member states) are of course not written by the commissioners themselves, but by lower-ranking bureaucrats who somehow got into the system, seeking a cushy government job.

Then the commission will work out a legislative draft with representatives of the ministries of the member states, with some shady back-door deals. Some countries’ representatives will be missing (as their ministries aren’t large enough to send representatives to all the various groups), and the result is often the lowest common denominator, and not the best solution.

The parliament is toothless. And really no better than the commission, because whereas the commission is an uncontrollable bureaucracy, at least there is some independence from corporations. The members of parliament, on the other hand, are often under higher pressure and subject to lobbying by their countries’ corporations.
From the perspective of the commission, laws must and should just somehow be passed through the parliament, hopefully without representatives pushing too many corporate alterations into the drafts.
Often the press only notices at that point, and with it the European citizens. No public debate; it just came over us out of nowhere.
And the commission is always out to push the member states’ ministries out of jurisdiction (once European legislation is established, it trumps national law, cf. EU vs National Law), increase its power and get more money.

No, seriously, I can fully understand the British concerns. We are about to create, or have already created, a bureaucratic monstrosity that is really hard to stop once it gains traction.
Instead of mocking the British for their alleged „stupidity“ (we smart continental Europeans all get it, right?), these concerns should be addressed.
Disclaimer: As for the EU: Been there. Done that.

St. Martin

November 4, 2013

In the mailbox, a note from the daycare center across the street. The St. Martin’s procession would take place on Monday, and they would be glad if the residents once again put lanterns in their windows/gardens.

The hzB asks what a St. Martin’s procession is. I point her to Wikipedia and go down to the basement to empty the washing machine.

I come back.

I say, “Read it? He was the one with the cloak!”

She says:

“Yes…

But he could have given him the whole thing. The Germans …

so stingy!”

What the frack has happened?

October 2, 2013
Disclaimer: This text was written while drunk.

Something has gone so fundamentally wrong.

I mean, I don’t know how this always happens either. I only ever notice afterwards, when it’s already too late. I let myself be dazzled by the image, by the outward appearance, without seeing through the bullshit and the Kool-Aid to the reality.

In Japan I was committed to Drehrumdiebolzenengineering. Which happens to be my hobby. So why not make it my profession?

Well, because it’s just fun. Only I was surrounded by smart people who somehow were all better at it than I was. Which doesn’t mean I wasn’t good at it. But somehow you also need a position and some kind of future prospects.

Now I’m sitting in a meeting at the other end of the world, standardizing for all I’m worth. There are maybe thirty other people in the room with me, only one of whom also has a PhD in something similar to Drehrumdiebolzenengineering. The rest are…

well, standardizers. An absolute insider club. Not necessarily any real clue about technology, but in return they invent abbreviations, and procedures, and processes that exclude outsiders. That’s presumably how you secure your livelihood for the rest of your life… which in a certain way is not unclever. Because that makes you irreplaceable.

A wants X, but not if B wants Y, and C wants Z and maybe Y, but only if B wants Y and A doesn’t want X, and anyway only if he is also handed the shovel to play in the sandbox. Of course nobody says it that directly.

Yes, politician is definitely the better word. They talk about getting information from their “implementers”, and I always ask myself why I’m sitting here and not with these mysterious “implementers”.

I feel bitterly screwed over. The job description said something about Drehrumdiebolzenengineering and math bullshit, but as I then found out, nothing gets engineered here. Instead, everything that can be outsourced is outsourced. And the few interesting bits that remain are done by the other departments.

Only now, in retrospect, do I understand _what_ I really learned in Japan, and _how good_ the training was that my beloved and hated supervisor bestowed on me, drilled into me, indeed landed on me. He landed one on me, all right. And now I can’t get this attitude out of me anymore.

I don’t know how much longer I can stand it here. I’ve set myself a minimum of one year. I mean, you can’t quit right after the probation period, after six months. It would look as if you’d been fired… Right? Right???

Then again, after his master’s my supervisor worked at a company for exactly three months, then quit and went straight back to university.

What do I do, what do I do, what do I do. How do I get out of this hell, this wet dream of a Japanese salaryman? (They walk around here too, of course. Ojiisans, that is. I’d even say they are among those with the most clue. But when I see the Windows XP machines booting up with Word, I want to puke.)

Not a single one of them here has ever written a line of code. And neither will I.

… and now I’m sitting in the business lounge shortly before the flight back, after the social dinner… and maybe it’s all not sooo bad.

Still, I don’t know yet whether I can really do this for 30 years. It really has hardly anything to do with Drehrumdiebolzenengineering anymore. But with management… I need to think it over in peace.

M. would say I’ve turned into a tie-wearing wanker.

And I’d have nothing to say in reply. Nothing. Nothing… From Drehrumdiebolzen engineer to tie-wearing wanker… is there a worse descent?

Maybe from engineer to humanities scholar?

In any case, I’m now going to knock back a few in style in the lounge and in business class (did I mention that I fly business class? I fly business class!), finish off the Kurokirishima shochu with the Japanese at the next meeting (provided the hzB allows it), and think later about what went wrong in my life.

Oh, and posts about Japanese food will come at some point too… at some point…