How to parse dblp.xml?
The dblp.xml is a simple, plain ASCII XML file, using the named entities as given in the accompanying dblp.dtd file. A daily updated (but unversioned) XML dump can be found on the dblp web server:
Furthermore, each month, a persistent snapshot release is archived:
We strongly encourage you to use these snapshot releases for your experiments and to cite them by their persistent URLs in published articles. This will allow your experiments to be reproducible in the future.
Detailed information on the XML structure of the dblp records and several design decisions can be found in the following paper:
- Michael Ley: DBLP - Some Lessons Learned. Proc. VLDB Endow. 2(2): 1493-1500 (2009)
The dblp.xml file can be parsed by essentially any out-of-the-box XML parser.
- Note that since the referenced dblp.dtd is a "private" SYSTEM resource, you need to download the DTD file, too, and make it locally available to your application.
- Also note that due to the huge number of entities contained in dblp.xml, you may need to raise the security limit on the allowed number of entity expansions in your parsing software. E.g., in Java, you need to provide a command-line parameter like the following one to increase the limit to a sufficiently large number:
java -DentityExpansionLimit=2000000 SomeDblpParser
Alternatively, you can set this parameter programmatically by calling:
System.setProperty("entityExpansionLimit", "2000000");
- Furthermore, please be aware that due to the size of dblp.xml, it is usually infeasible to load the whole file into a Document Object Model (DOM) tree. Please consider using a Simple API for XML (SAX) or Streaming API for XML (StAX) parser instead, which processes the file as a stream.
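The caveats above can be combined into a small streaming parser. The following is a minimal sketch using the JDK's built-in SAX parser (it is not the dblp example parser): it raises the entity expansion limit, serves a locally downloaded DTD whenever the parser requests dblp.dtd, and counts the top-level records by element name. The class DblpSaxSketch and its method countRecords are hypothetical names, and the sketch assumes that records such as article or inproceedings are the direct children of the dblp root element.

```java
import java.io.File;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical SAX sketch: counts dblp records (children of the root element) by name. */
public class DblpSaxSketch {

    public static Map<String, Integer> countRecords(File xmlFile, File dtdFile)
            throws ParserConfigurationException, SAXException, IOException {
        // Same effect as -DentityExpansionLimit=2000000; must be set before the parser is created.
        System.setProperty("entityExpansionLimit", "2000000");
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        Map<String, Integer> counts = new TreeMap<>();

        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0; // current element nesting level

            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // Serve the locally downloaded DTD whenever the parser asks for "dblp.dtd".
                if (systemId != null && systemId.endsWith("dblp.dtd")) {
                    return new InputSource(dtdFile.toURI().toString());
                }
                return null; // default resolution for any other external entity
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                depth++;
                if (depth == 2) { // depth 1 is the root element, depth 2 are the records
                    counts.merge(qName, 1, Integer::sum);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                depth--;
            }
        };

        parser.parse(xmlFile, handler);
        return counts;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java DblpSaxSketch <dblp.xml> <dblp.dtd>");
            System.exit(1);
        }
        countRecords(new File(args[0]), new File(args[1]))
                .forEach((name, n) -> System.out.println(name + ": " + n));
    }
}
```

For example, `java DblpSaxSketch dblp-2019-04-01.xml dblp-2017-08-29.dtd` would print one line per record type. Because only a small map of counters is kept in memory, this kind of streaming approach should not need the large heap that a full in-memory representation requires.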
Example parser
As an example, we provide a simple main-memory data structure, written in Java, to parse and query the whole dblp dataset. The code in this section has been tested using the following environment:
- Ubuntu 16.04.6 LTS
- Java 8 (openjdk version "1.8.0_191")
- gunzip (gzip) 1.6
- Note that Java 8+ is required to run our example parser. The code will not work with earlier Java versions.
- Also note that more recent versions of dblp.xml have grown too large to fit into the standard 4 GB of memory allocated by the JVM. Try running java with a larger allocation, such as -Xmx8G.
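To check which heap limit the JVM actually applied (for instance, whether -Xmx8G took effect), a small helper along the following lines may be useful. This is a hypothetical sketch: the class name HeapCheck and the 7 GiB warning threshold are assumptions, not part of the dblp code. It relies only on the standard Runtime.maxMemory() method, which reports the maximum heap size in bytes.

```java
/** Hypothetical helper that reports the JVM's maximum heap size. */
public class HeapCheck {

    /** Maximum heap the JVM will attempt to use, in MiB (Runtime.maxMemory() returns bytes). */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        long mib = maxHeapMiB();
        System.out.println("max heap: " + mib + " MiB");
        // Hypothetical threshold: depending on the garbage collector, -Xmx8G may report
        // slightly less than 8192 MiB, so only warn below roughly 7 GiB.
        if (mib < 7L * 1024L) {
            System.out.println("warning: consider running java with a larger heap, e.g. -Xmx8G");
        }
    }
}
```

Running `java -Xmx8G HeapCheck` and `java HeapCheck` side by side shows whether the default allocation on your machine is likely to be too small.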
Running the parser
Please download the files from our web server into a local directory. For example, you may run the following command:
wget https://dblp.org/src/DblpExampleParser.java \
  https://dblp.org/src/mmdb-2019-04-29.jar \
  https://dblp.org/xml/release/dblp-2019-04-01.xml.gz \
  https://dblp.org/xml/release/dblp-2017-08-29.dtd
Unzip the dblp.xml.gz file using:
gunzip dblp-2019-04-01.xml.gz
Compile the parser:
javac -cp mmdb-2019-04-29.jar DblpExampleParser.java
Run the example application:
java -Xmx8G -cp mmdb-2019-04-29.jar:. DblpExampleParser dblp-2019-04-01.xml dblp-2017-08-29.dtd
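The steps above work on the unzipped XML file. For your own SAX-based code, you might instead decompress the release on the fly with java.util.zip.GZIPInputStream and avoid keeping the uncompressed file on disk. The following is a minimal sketch under stated assumptions: the class GzipDblpSketch and its method countElements are hypothetical, and the DTD is assumed to be available as dblp.dtd in the same directory as the .gz file (e.g., after `cp dblp-2017-08-29.dtd dblp.dtd`), so that the relative SYSTEM id in the DOCTYPE resolves against the system id set on the input source.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: streams a gzip-compressed dblp XML file through SAX without unzipping it. */
public class GzipDblpSketch {

    /** Counts start tags named elementName; assumes dblp.dtd sits next to the .gz file. */
    public static long countElements(File gzXml, String elementName)
            throws ParserConfigurationException, SAXException, IOException {
        // Same effect as -DentityExpansionLimit=2000000; set before the parser is created.
        System.setProperty("entityExpansionLimit", "2000000");
        long[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals(elementName)) {
                    count[0]++;
                }
            }
        };
        try (InputStream file = new FileInputStream(gzXml);
             InputStream in = new GZIPInputStream(file, 1 << 16)) {
            InputSource source = new InputSource(in);
            // Base URI against which the relative SYSTEM id "dblp.dtd" is resolved.
            source.setSystemId(gzXml.toURI().toString());
            SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        }
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        // e.g.: java GzipDblpSketch dblp-2019-04-01.xml.gz article
        System.out.println(countElements(new File(args[0]), args[1]));
    }
}
```

The trade-off is extra CPU time for decompression on every pass over the data, in exchange for disk space; if you parse the file repeatedly, unzipping it once as described above is probably the simpler choice.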
JavaDoc and sources
The JavaDoc pages and the sources for the org.dblp.mmdb package are also available for download: