I am working on a mathematical modeling figure using Graphpad Prism. I got data from a collaborator. He works on the computational part. He sends me his model fitting and simulation results in a set of CSV files. The data contains:
- Experiment data: 2 genotypes x 5 cell species x 7 times points x 3 replicates
- Simulations: 4 simulations, each contains:
- 2 genotypes x 5 cell species x 200 simulated times points
- 2 genotypes x 10 ( fitted parameters + standard deviations)
- 2 genotypes x 4 residual errors measuring fitting goodness x 100 runs.
As you can see, I need to put all these data into Prism to make good looking figures. Initially, I was copy-pasting by hand. Soon, as the collaboration progress, I found it is unsustainable. Every time we change something or have a new idea, my collaborator sends me a new set of simulated data. It may only take his computer minutes to run, or hours I am not sure (but it is his computer, not his hand.) It takes me a lot of time to redo the copy-and-paste. And I often am not sure if I have copy the wrong data or pasted to the right place or not, or whether I completely skipped anything, because the data looks so similar.
After a couple of iterations, I realize this has to be automated.
I found out that since version 5, Graphpad Prism stores its data in XML format in its pzfx files. It is indeed intended for automated data manipulation.
Unfortunately, there is no python module for this purpose yet. Here is what I did:
from bs4 import BeautifulSoup # read and write XML
import pandas as pd # get data from CSVs
from lxml import etree # generate XML
The pzfx to start with should already contains the tables and graphs you want. We will replace existing table with new data. I have not tested generating new table.
# read the pzfx file and parse it to bs4 xml data
with open('YL524.pzfx', 'r') as f:
data = f.read()
Bs_data = BeautifulSoup(data, "xml")
You can open the same file with your favorite text editor (Sublime Text 3 for me) and Prism to check the table names and data structure. A simple representation is like this:
You can now check the element with BeautifulSoup. For example:
# Table34 - 5th YColumn - 1st Subcolumn
Bs_data.find('Table', {"ID": "Table34"}).find_all('YColumn')[4].find_all('Subcolumn')[0]
Now we have two ways to replace the old data with new data.
1. You can replace the individual d value with a new one in string format:
Bs_data.find('Table', {"ID": "Table{}".format(value + 3)}).find_all('YColumn')[
i
].find_all('Subcolumn')[0].find_all('d')[0].string.replace_with(str(wtProRates.iloc[i, 0]))
2. You can replace the entire Subcolumn with a new XML etree element:
x = etree.Element('Subcolumn')
for k, row in chi2[i].iteritems():
item = etree.SubElement(x, 'd').text = str(row)
Bs_data.find('Table', {"ID": "Table45"}).find_all('YColumn')[
l+i
].find_all('Subcolumn')[0].replace_with(
BeautifulSoup(etree.tostring(x, pretty_print=True), 'xml').Subcolumn)
Lots of variable names in the code above are used for proper looping and pandas iterations. You need to adapt to your specific needs.
Becareful, check your data often to see the data goes to the place you want. If you replace a Subcolumn with a Nonetype object. You will get one subcolumn deleted and your loops and data table will be off.
After everying confirmed OK, you can output the new file:
with open('YL524_out.pzfx', 'w') as f:
f.write(str(Bs_data))
Use str(Bs_data) rather than Bs_data.prettify(). The latter output will be unreadable by Prism.
Open the output file with Prism, you will see everything updated automatically with new data in there! Hooray!