Overview of all blog items

May 22, 2015

XML-driven Plone portal "Onkopedia" finally online

Onkopedia is a medical guideline portal in the field of hematology and oncology. It is based on the Plone content management system and driven by an XML publishing workflow with conversions from DOCX to XML/HTML and PDF.

I am pleased to announce the official (re)launch of the Onkopedia (www.onkopedia.com) portal after one year of hard conceptual and implementation work.

Onkopedia is a medical guideline portal in the field of hematology and oncology. It features official guidelines for the diagnosis and treatment of diseases. The guidelines and supplementary documents are grouped by audience:

  • Onkopedia for physicians
  • My Onkopedia for patients and their relatives
  • Onkopedia-P for caregivers

The Onkopedia project started in 2010 with a simple DOCX to HTML/PDF publishing workflow. After some years, with a growing amount of content and new external requirements, it became obvious that an updated infrastructure and a new publishing workflow would be necessary. XML as the document standard was the obvious choice, and ZOPYX started in 2014 with the conceptual design and architecture of the new Onkopedia.

The new system features a new, complex but easy-to-use conversion workflow with a DOCX to HTML+XML conversion built on top of the c-rex.net platform by Practice Innovation. The XML to PDF conversion is based on the Produce & Publish system in combination with the PDFreactor converter by RealObjects. The Plone content management system serves as the implementation platform for the complete system, in combination with the open-source XML database eXist-db version 2.2. The integration layer between Plone and eXist-db is available as the open-source project XML Director.
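
For readers unfamiliar with eXist-db: the guideline documents are plain XML resources stored in database collections, and eXist-db exposes them through a REST interface. A minimal sketch of storing and fetching a document (the collection path and credentials are illustrative, not the actual Onkopedia configuration):

# store a guideline document in an eXist-db collection via its REST interface
curl -X PUT -u admin:secret -H "Content-Type: application/xml" \
     --data-binary @guideline.xml \
     http://localhost:8080/exist/rest/db/onkopedia/guideline.xml

# fetch the stored document again
curl -u admin:secret http://localhost:8080/exist/rest/db/onkopedia/guideline.xml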



May 18, 2015

CSS Paged Media workshop @XML London 2015

Join me at the XML London 2015 conference for a hands-on training on generating high-quality PDF documents from XML/HTML.

I will attend the XML London 2015 conference from June 5th to 7th and give a hands-on training on

CSS Paged Media and generating high-quality PDF documents from XML/HTML

The training will involve a lot of live coding in order to show you what you ask for and what you want to see.

The training material will evolve over the next two weeks in our public repository https://github.com/zopyx/css-paged-media-tutorial.

The training consists of two slots:

  • slot one will teach you the CSS Paged Media basics (a small generic example of such rules follows after this list)
  • slot two involves styling a real-world content document (HTML source with lots of chapters, images and tables)
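
As a taste of slot one, here is a minimal, generic CSS Paged Media sketch (not taken from the actual training material): it defines the page geometry, a running page number in the footer and chapter breaks.

@page {
  size: A4;
  margin: 20mm 18mm 25mm 18mm;

  @bottom-center {
    /* running footer: current page and total number of pages */
    content: "Page " counter(page) " of " counter(pages);
  }
}

/* start every chapter on a new right-hand page */
h1 {
  page-break-before: right;
}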

Requirements

  • Participants must have either PrinceXML 10 or PDFreactor 7 installed on their systems. Both converters are available for free for private or evaluation purposes. The trainer can assist you with the installation on Mac or Linux (not with Windows), but please make sure that you install the converter before the tutorial in order to save time for the really cool stuff (a quick smoke test is shown after this list).
  • Participants must have basic skills in HTML and CSS.
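
If you use PrinceXML, a quick way to verify the installation is to render a small HTML file from the command line; the file names below are placeholders (PDFreactor ships with its own command-line and API integrations instead):

prince --version
prince -s print.css input.html -o output.pdf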

Trainer

Andreas Jung has been working in the electronic publishing business for almost 20 years. Andreas is a Python & Plone freelancer, works on large internet and web applications and publishing solutions, and is the founder of the Produce & Publish and XML Director projects.

Further information on CSS Paged Media



Mar 31, 2015

New hands-on training "Generating high-quality PDF documents from XML and HTML using CSS Paged Media"

Our hands-on training "Generating high-quality PDF documents from XML and HTML using CSS Paged Media" teaches you to generate high-quality PDF print layouts with HTML or XML as input and Cascading Stylesheets for the definition of print layouts and styling.

Introduction

"CSS Paged Media" turned into a serious solution for generating high-quality PDF documents from XML or HTML over the last years. The advantages of the CSS Paged Media approach obvious:

  • basic knowledge of XML/HTML and CSS is sufficient (you do not need to be an XML expert)
  • separation between content and layout/styling
  • easy to learn, easy to use
  • lower costs
  • higher flexibility 

Usecases

  • text-oriented publications (books, newspapers, documentation etc.)
  • layout-oriented publications (flyers, brochures, web-to-print applications)

Contents

  • Introduction CSS Paged Media
  • The region model of CSS Paged Media
  • Basic formatting
  • Multi-column layouts
  • Pagination
  • Images
  • Footnotes
  • Header and footer
  • Automatic table of contents generation (a small snippet illustrating the last two topics follows after this list)
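
To illustrate the last two topics, here is a small generic snippet of the kind of rules discussed, as supported by PrinceXML and PDFreactor (details may vary between converters):

/* pick up each chapter title and repeat it in the running page header */
h1 { string-set: chapter-title content(); }

@page {
  @top-center { content: string(chapter-title); }
}

/* table of contents entries: dot leaders followed by the target's page number */
ul.toc a::after {
  content: leader('.') target-counter(attr(href), page);
}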

Requirements

  • basic knowledge in XML/HTML and cascading stylesheets
  • basic knowledge in typography

Software

  • PDFreactor 7.0 (alternative: PrinceXML 9)

Price and Location

Our hands-on trainings usually take place at your company or organization. We teach in small groups of up to three people. We offer individual training based on your requirements, needs and the skills of your employees. The price depends on the number of training days and the location of the training. Contact us directly for further information and quotes.

Trainer

Andreas Jung has been working for more than 20 years in the field of electronic publishing and has developed several PDF generation solutions over the last ten years. Andreas Jung is chief developer and author of the Produce & Publish product family and founder of the XML Director project.


Mar 05, 2015

callas software GmbH releases pdfChip - a quick test

Quick test of a new PDF converter supporting CSS Paged Media.

callas software GmbH today released their pdfChip PDF converter, a product that follows the same CSS Paged Media approach. We have been using CSS Paged Media converters like PrinceXML or PDFreactor for years for generating high-quality PDF documents from XML and HTML content. As an expert in this field I did some quick tests with the new converter, based on real-world content of customers that have been running our Produce & Publish solution in production for years.

Quick results

  • the feature set of pdfChip is comparable to where PrinceXML and PDFreactor were two or three years ago
  • PDF quality is ok but behind other tools due to missing features like built-in hyphenation
  • no built-in hyphenation support (except using Javascript)
  • no multi-column support (documented)
  • no flexbox support (at least undocumented)
  • no support for page regions, named pages (at least undocumented)
  • no footnote support (at least undocumented)
  • does not seem to implement all the CSS (3) features that we see in recent versions of PrinceXML and PDFreactor (CSS dot leaders missing, no hyphenation, no repeating table headers when a table spans multiple pages; examples of these constructs are shown after this list)
  • Poor documentation
  • Worth the money? Definitely NO. This product is completely overpriced. The smallest version, pdfChip S, costs 5000 EUR + VAT and only allows you to generate documents with up to 25 (twenty-five) pages! The next bigger version supports up to 250 pages per document for 10.000 EUR + VAT. Our customers in general have documents between 1 and 250 pages...so 10.000 EUR is a huge investment. The unlimited version costs 25.000 EUR. Even Antenna House Formatter offers far more features and much better typographical quality at a better price. Our tools (PDFreactor, PrinceXML) are in the price range from 2.250 EUR to 3800 USD with almost no restrictions (except: requires a box with 4 or fewer CPUs). So you pay a three or four times higher price for a tool with fewer features and many restrictions? I think this is ridiculous.
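
For reference, these are the kinds of CSS constructs the test exercised and which pdfChip did not handle at the time; the snippets are generic examples, and some converters require vendor prefixes for hyphenation:

/* automatic hyphenation */
p { hyphens: auto; }

/* CSS dot leaders plus target page numbers in a table of contents */
.toc a::after { content: leader('.') target-counter(attr(href), page); }

/* mark the table header so it is repeated when a table breaks across pages */
thead { display: table-header-group; }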

Update

The statement "The Paged Media Module is currently not supported by pdfChip" makes it clear that pdfChip does not want to support the de facto standard for HTML/CSS based publishing, or only to a certain degree. pdfChip appears half-baked, and essential features that have existed in other tools for years are missing. Unfortunately this product is also marketed as being superior to all other tools. This is not the case. There are better and cheaper alternatives.

Update (20.10.2015)

Even half a year after the first evaluation there is no visible progress. The product remains completely overpriced. Interestingly enough, half of the documentation covers barcodes...well, this software seems to be a very expensive barcode generator. As written in my original posting: there are better alternatives. We really hope that alternative projects like Vivliostyle catch up fast in order to provide better and cheaper options. For now, PrinceXML and PDFreactor are obviously superior.

Feel free to contact us for CSS Paged Media consulting.



For the last two months I have been very busy finding a working combination of Linux kernel, hardware and Linux distribution that would actually be stable for running Docker in production. Only one out of eight combinations worked for me.

Feel the pain and frustration?

All tests were done with Docker 1.5.0 final without special configuration of the underlying storage driver (AUFS vs. device mapper).
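
If you want to check which combination your own setup is running on, the Docker daemon reports both values (field names as of Docker 1.5):

docker info | grep -iE 'storage driver|kernel version'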

  • Ubuntu 14.04, kernel 3.13, bare metal at Hosteurope: the only working combination (using AUFS). Status: WORKING
  • Ubuntu 14.04, kernel 3.13, VM at Hosteurope: hoster-patched kernel in order to fit its OpenVZ virtualization. Status: FAIL
  • OpenSuse 13.1, kernel 2.6, bare metal, self-hosted: kernel panics after some minutes during a Docker build. Status: FAIL
  • OpenSuse 13.1, kernel 3.11, bare metal, self-hosted: kernel panics once or twice a day during Docker builds (possibly related to BTRFS crashes). Status: FAIL
  • CentOS 6.6, kernel 2.6, VM at Hetzner: slow IO, long Docker builds (10 times slower than running the same build directly on the VM), likely related to CentOS and/or the Docker device mapper (although supported by Docker). Status: FAIL
  • CentOS 7.0, kernel >3.10, VM at Hetzner: slow IO, long Docker builds (10 times slower than running the same build directly on the VM), likely related to CentOS and/or the Docker device mapper (although supported by Docker). In addition, Docker did not play well with the CentOS 'firewalld': reconfiguring firewalld cut off the network of all Docker containers and the Docker daemon had to be restarted...a major fail. Status: FAIL
  • Ubuntu 14.04, kernel 3.13, VM at Contabo: Docker builds much slower than directly on the same VM (not as extreme as with CentOS), Docker container execution speed OK. Status: PARTLY WORKING
  • CentOS 7.0, kernel 3.13, VM at Contabo: same problems as with the other CentOS versions. Status: FAIL

Conclusions:

  • CentOS is completely unusable for running Docker - at least with the default device-mapper 
  • OpenSuse 13.1 problems likely related to BTRFS issues (in combination with the device-mapper)
  • running Docker on virtual machines in general does not seem to make much sense
  • Ubuntu 14.04 on real hardware seems to be the only reasonable combination right now
  • Docker does not perform any reasonable runtime checks for verifying the sanity of the Linux host (crashes and unrelated or meaningless error messages are the only thing you get from Docker)
  • The general attitude of the Docker devs: we-don't-care and works-for-us -> case closed
  • The monolithic design of Docker is broken. Restarting Docker - for whatever reason - implies a shutdown of all containers (using --restart you can have containers restarted automatically when the Docker daemon comes back up)
  • The Docker documentation lies about which distros and kernels are working and supported (see link above), and the Docker devs obviously do not care about fixing their documentation or, in particular, about testing Docker on different hardware and distros - apparently their testing procedures are broken from the ground up.

However there is hope...the upcoming CoreOS Rocket runtime engine looks very promising...however Rocket is still in its early stages. At least Rocket already supports loading Docker images. On the other side: the Rocket team submitted a pull request to Docker in order to achieve image compatibility between Docker and Rocket...but, typical for the ignorance and arrogance of the Docker devs: they don't give a shit and only care about their own thing. Unfortunately the Docker developers are corrupted by too much venture capital and have become ignorant through the Docker hype.


Feb 20, 2015

XML Director 0.4.0 release/Newsletter #4

XML Director is a Plone-based XML content-management-system (framework) backed by eXist-db or BaseX.


Feb 11, 2015

MongoDB gate

Yesterday researchers at my home university, Universität des Saarlandes, published a report about 40,000 MongoDB servers worldwide running on public ports and without authentication. This is kind of a nightmare: disclosure of customer data, credit card numbers etc....but whom to blame? Of course MongoDB is (technology-wise) a crappy database and it would be easy to blame MongoDB altogether.

There are only two minor problems with MongoDB here: 

  • the MongoDB daemon binds to all public IP addresses by default, depending on the distribution or download package. The standard installers are said to bind to localhost only, however the daemon distributed with the binary packages binds to 0.0.0.0 - BAD DESIGN DECISION
  • MongoDB does not require a password by default. So every MongoDB server is open without authentication by default - BAD DESIGN DECISION

However there is no direct technical exploit in MongoDB responsible for the disclosure of private data - just bad design decisions (having their impact here). Unfortunately the answer of MongoDB CTO Eliot Horowitz on this issue is cheap, weak and poor, and he does not seem to care about the implications of this report.

More important in this case is the human factor.

Obviously several thousand administrators are incompetent or incapable of performing very basic administration tasks like

  • configuring a daemon to listen on localhost or on a private IP only (see the example after this list)
  • configuring a firewall
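
For MongoDB specifically, the first point boils down to a few lines of configuration; a minimal sketch using the YAML config format of MongoDB 2.6+:

# /etc/mongod.conf: listen on localhost only and require authentication
net:
  bindIp: 127.0.0.1
  port: 27017
security:
  authorization: enabled

If the database has to be reachable from other hosts, the port should at least be restricted on the firewall to the application servers that actually need it.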

My theory on this is that more and more untalented IT workers are in charge of dealing with technology, networking and programming aspects that are far beyond their horizon. This is not only a problem of MongoDB but can also be observed with other IT technology. Watching mailing lists, IRC, Stackoverflow and other related media over the last years has become a growing pain. The technology is getting more and more diverse and complex, but the intelligence and motivation of the "typical" IT worker seems to go down year by year. Yes, this is a typical Andreas Jung rant, but many tasks in software development and system administration should be left to people that know what they are doing. Many IT departments apparently do not care about competence, security and privacy (any more). Mistakes happen every day - even to experienced IT workers and experts. However, this report with 40,000 open MongoDB installations indicates a more fundamental problem with how IT security is handled in organizations: badly. And my recommendation: unmotivated and untalented script kiddies should keep their fingers away from security-critical infrastructure and components.


Membership in the UNIX 'docker' group must be considered harmful. Being a member of the 'docker' group is not unusual because it gives you the right to build and execute containers as a normal user, but it also gives you full root access rights, which I consider a major security issue and a broken-by-design feature.

By default I cannot access /etc/shadow because it is only readable by root and the 'shadow' group:

ajung@demo:~$ who am i
ajung    pts/2        Feb  5 07:03 

ajung@demo:~$ groups
ajung docker


ajung@demo:~$ ls -la /etc/shadow
-rw-r----- 1 root shadow 897 Jan 25 10:05 /etc/shadow


ajung@demo:~$ cat /etc/shadow
cat: /etc/shadow: Permission denied

Now I create a simple Docker image that exposes /data as a mount point for a volume:

FROM phusion/baseimage
VOLUME /data

Now I can start the container and attach any local filesystem to the container and access it with full root rights.

In this case I can easily access the content of the formerly protected /etc/shadow file

ajung@demo:~$ docker run -v /etc:/data zopyx/test cat /data/shadow
root:$6$rnW9d.................awVOOsWtCb41DY01:16457:0:99999:7:::
daemon:*:16457:0:99999:7:::
bin:*:16457:0:99999:7:::
sys:*:16457:0:99999:7:::
sync:*:16457:0:99999:7:::
games:*:16457:0:99999:7:::

I can also create content on a root-owned filesystem as a standard user:

ajung@demo:~$ docker run -v /etc:/data zopyx/test touch /data/hello-world.txt
ajung@demo:~$ ls -la /etc/hello-world.txt
-rw-r--r-- 1 root root 0 Feb  5 07:36 /etc/hello-world.txt

The whole Docker security concept (is there a security concept?) appears completely broken.

So user accounts belonging to the UNIX group 'docker' are fully exploitable. Standard UNIX users can gain elevated rights on the local machine if they belong to the 'docker' group, and can perhaps exploit other machines as well by tampering with SSH keys etc....many attack vectors are possible.

Update (2015-02-05, 16:00 UTC)

The discovered behavior is in fact intentional and documented in the Docker security documentation. The first sentence is already completely broken.

"Only trusted users should be allowed to control your Docker daemon"

Building a secure IT system on human trust is fundamentally broken. A secure system must be built on best technology practices. A human is always a weak factor when it comes to security.

Another point: the default security policy (if there is one?) is: everything is allowed, you are root, and dropping privileges as needed is up to you. A completely improper approach. A secure system must be as closed as possible by default and give the container only the rights and capabilities that it really needs.
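
Docker does offer per-container capability flags for exactly this, but using them is left entirely to the user; a sketch (the image name is hypothetical):

# drop all capabilities and add back only what the service really needs
docker run --cap-drop ALL --cap-add NET_BIND_SERVICE -d example/webapp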

And yet another point: as a standard user it is by design not possible to gain root permissions (except using sudo). The problem once again arises from the 'docker' group being practically root. An attacker might get hold of my SSH keys and log in to a dockerized box. With the described attack vector an intruder has an easy game getting root through Docker. The assumption "Only trusted users should be allowed to control your Docker daemon" is therefore just wrong, and the security concept of Docker is broken.

Docker leaves security to the users and administrators instead of providing a secure way of building secure containers for deployment. As it stands, Docker is better compared to a rootkit generator.



Jan 22, 2015

The case against Docker

Trying to use Docker in production for several weeks finally ended in the decision to leave Docker alone for the moment.

Over the last few weeks I tried to use Docker in production for the following use cases:

  • putting eXist-db into a container for the Onkopedia relaunch in order to simplify the eXist-db installation and administration (prod, dev, staging)
  • putting Plone 4.3 and eXist-db into a container for having an easy way to manage a demo instance of XML Director
  • using eXist-db and BaseX in containers in order to test the XML Director backend against different XML database backends in an automated way

This blog post summarizes the former blog posts (link, link) about Docker.

Docker is not very developer friendly

For production we installed a decent VM running CentOS 7 with a recent kernel version (3.10) that is supported by Docker. I recreated the related Docker images on the deployment box from scratch. The bad experience discussed in my former blog posts remained. A typical build under Docker was 5-10 times slower than executing the same scripts and code on the same machine directly in the shell. Pushing the three images - each about 1.3 GB in size - took more than two hours. I have seen one layer of an image being pushed at a decent speed close to our bandwidth limit, while the next layer crawled over the net at 500 KB per second...completely unpredictable push behavior. The same behavior was reproducible on a different host in a different data center. Pulling the images on a different host showed the same downstream issues with the Docker registry - completely useless and time-consuming.

But anyway...running the Docker images on the host caused the next surprise. Starting eXist-db, executing a small Plone script for site setup and finally starting the Plone instance takes about ten minutes (under one minute without Docker). The complete virtual machine became very unresponsive during that time, with a CPU load going through the roof of up to 10 and no jobs running on the VM except this Docker container. But anyway...I proceeded to the next Docker container and tried to run BaseX and eXist-db. There was a mistake in one of the Dockerfiles and I had to re-run the build for eXist-db. This build suddenly failed while running apt-get inside the build...network issues. I checked the log and discovered some issues with iptables. Not being a network guru, I filed a bug report on the Docker repo @Github. It turned out that the Docker chain within the iptables configuration got lost and therefore the complete network functionality of the Docker build failed. Nobody could tell me where and why this happened. The only manual change done earlier was to add port 80 to the list of public ports - perhaps something happened there. The only solution to get around the problem is to restart Docker. But restarting Docker also means that your containers go away and need to be restarted - a major pain-in-the-ass factor...why is Docker so stupid and monolithic that containers can not continue to run? This is bad application design.

Containers are for software, not for data

Docker containers are consumables. Docker containers should be used to scale applications, for firing up more app servers with the same setup etc. However the Docker guys and fan boys want to put data into containers and speak of data containers. What the fuck? Data belongs on the filesystem, not into a container that can neither be cloned in an easy way nor incrementally backed up in a reasonable way. Containers are for software, not for data.

Docker made me inefficient, Docker blocks my daily work

The slowness of Docker is a big pain. Build and deployment procedures are not predictable. Even with only three or four images in use I end up with something like thirty containers and images on the system (docker ps -a, docker images). There is not even a procedure for cleaning up the mess on the system except fiddling around with something like

# remove all containers
docker rm $(docker ps -a -q)
# remove all images
docker rmi $(docker images -q)

Docker needed around 7-10 seconds per image/container removal. The overall cleanup operation took several minutes. Oh well, stopping the Docker daemon and removing /var/lib/docker manually is much faster in reality.

Conclusions

Containers are great and provide the right abstraction layers for running applications in production. 

The theory and ideas behind Docker are great, but its architecture and implementation are a mess. Docker is completely unusable in production. It is unreliable, it is unpredictable, it is flaky. The idea of working with filesystem layers is great, but in reality it sucks (push & pull of 30-40 layers takes a lot of time - at least with the current implementation). The idea of the Dockerfile is great, but in reality it sucks (you can not re-run the build from a certain step without fiddling inside the Dockerfile). Especially with Plone buildouts it takes a long time to re-run a dockerized buildout without the chance of using buildout caches in some way.

Other options? CoreOS came out with its Rocket approach some weeks ago...too new to consider it for production at this time. Rocket looks promising and well thought out (compared to Docker) but is far away from being ready for prime time. Vagrant is a nice way of deploying to virtual machines, however this is not the level of granularity we are all waiting for. NixOS with the Nix package manager? The Nix package manager looks nice and powerful and I have heard only good things about Nix, but I am not sure how it solves the issue of isolated environments and how it plays together with containers - especially NixOS is a black box for me right now and I need to look deeper into its functionality and features. For now: back to old-style deployments.

A big sigh.....



Jan 11, 2015

XML Director project website launched

XML Director finally has its own website!

I am pleased to announce that the XML Director project now has its own polished website with updated information about the project, its scope and development.

The site is available at https://xml-director.info

The development of XML Director made some internal progress over the last few weeks - basically the time over Christmas was used to think about design and technical issues. All of the internal code was refactored, the test coverage improved significantly and the documentation of the project was updated.
In addition we will release a Docker based demo shortly and provide access to an online demo showing the basic functionality of the current implementation:

  • programmatic content-types
  • through-the-web content-types
  • XSLT registry
  • custom views for XML content

If you are interested in XML Director then get in touch with me directly, either this month in Berlin (16-26.1.2015) or at XML Prague in February (where I will talk about PDF generation based on CSS Paged Media).