# MARC Bibliographic data processing library for Java
This is a Java library for processing bibliographic data in the following formats:
- ISO 2709/Z39.2
Here is a code example for reading from an ISO 2709 stream and writing into a MarcXchange collection.
try (MarcXchangeWriter writer = new MarcXchangeWriter(out)) {
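As background for the example above, the following is a minimal, library-independent sketch of how an ISO 2709 record is laid out: a 24-character leader, a directory of fixed-width entries, and the field data. It assumes the MARC 21 entry map `4500` (3-character tag, 4-digit length, 5-digit start position). The class `Iso2709Sketch` and its methods are hypothetical names for illustration and are not part of this library's API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Illustrative ISO 2709 directory walker; not the xbib implementation. */
final class Iso2709Sketch {
    static final char SUBFIELD_DELIMITER = '\u001f';
    static final char FIELD_TERMINATOR = '\u001e';
    static final char RECORD_TERMINATOR = '\u001d';

    /** Returns "tag=data" strings for every field of a single record. */
    static List<String> fields(byte[] record) {
        // ISO-8859-1 maps one byte to one char, so byte offsets equal char offsets.
        String s = new String(record, StandardCharsets.ISO_8859_1);
        // Leader positions 12-16 hold the base address of the field data.
        int baseAddress = Integer.parseInt(s.substring(12, 17));
        List<String> out = new ArrayList<>();
        // The directory follows the 24-character leader and ends with a field terminator.
        for (int pos = 24; s.charAt(pos) != FIELD_TERMINATOR; pos += 12) {
            String tag = s.substring(pos, pos + 3);
            int length = Integer.parseInt(s.substring(pos + 3, pos + 7));
            int start = Integer.parseInt(s.substring(pos + 7, pos + 12));
            int from = baseAddress + start;
            // The field length includes the trailing field terminator, so drop it.
            String data = s.substring(from, from + length - 1);
            out.add(tag + "=" + data.replace(SUBFIELD_DELIMITER, '$'));
        }
        return out;
    }

    /** Builds a tiny one-field record so the sketch runs without a MARC file. */
    static byte[] sampleRecord() {
        String field = "  " + SUBFIELD_DELIMITER + "a0317-8471" + FIELD_TERMINATOR;
        String directory = "022" + String.format("%04d%05d", field.length(), 0) + FIELD_TERMINATOR;
        int base = 24 + directory.length();
        int total = base + field.length() + 1; // +1 for the record terminator
        String leader = String.format("%05d", total) + "nas a22" + String.format("%05d", base) + "   4500";
        return (leader + directory + field + RECORD_TERMINATOR).getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        fields(sampleRecord()).forEach(System.out::println);
    }
}
```

Real-world records may use other character encodings and entry maps, which is why a dedicated reader is preferable to hand-written parsing.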
### MARC to MODS
Here is an example that creates MODS from an ISO 2709 stream:
Marc marc = Marc.builder()
Result result = new StreamResult(sw);
System.setProperty("http.agent", "Java Agent");
marc.transform(new URL("http://www.loc.gov/standards/mods/v3/MARC21slim2MODS3.xsl"), result);
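The `Result`/`StreamResult` types in the snippet above come from the JDK's XSLT API (JAXP). As a rough, self-contained illustration of that mechanism only, the sketch below applies a toy stylesheet to a toy XML document and captures the output in a `StringWriter`. The stylesheet, the element names, and the class `XsltSketch` are hypothetical; this is not the Library of Congress stylesheet and not this library's internal code path.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Result;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Illustrative JAXP transformation; names and stylesheet are made up. */
final class XsltSketch {
    // Toy XSLT 1.0 stylesheet: wraps the record title in a MODS-like structure.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"/record\">"
      + "<mods><titleInfo><title><xsl:value-of select=\"title\"/></title></titleInfo></mods>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Transforms the given XML with the toy stylesheet and returns the output text. */
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter sw = new StringWriter();
            Result result = new StreamResult(sw);
            transformer.transform(new StreamSource(new StringReader(xml)), result);
            return sw.toString();
        } catch (TransformerException e) {
            // Wrapped so callers of this sketch need no checked-exception handling.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<record><title>Hello MARC</title></record>"));
    }
}
```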
### MARC to Aleph sequential
And here is an example showing how records in "Aleph Sequential" format can be parsed
and written into a MarcXchange collection:
try (MarcXchangeWriter writer = new MarcXchangeWriter(out, true)
.setFormat(MarcXchangeConstants.MARCXCHANGE_FORMAT)) {
Marc marc = Marc.builder()
### MARC in Elasticsearch
Another example, writing compressed Elasticsearch bulk format JSON from an ANSEL MARC input stream:
MarcValueTransformers marcValueTransformers = new MarcValueTransformers();
// normalize ANSEL diacritics
marcValueTransformers.setMarcValueTransformer(value -> Normalizer.normalize(value, Normalizer.Form.NFC));
where the result can be indexed by a simple bash script using `curl`, because our JSON
format is compatible with Elasticsearch JSON (a key/value format that serializes directly to JSON).
#!/usr/bin/env bash
# This example file sends compressed JSON lines formatted files to Elasticsearch bulk endpoint
# It assumes the index settings and the mappings are already created and configured.
curl -XPOST -H "Accept-Encoding: gzip" -H "Content-Encoding: gzip" \
--data-binary @$f --compressed localhost:9200/_bulk
The result is a very basic MARC field-based index, which is cumbersome to configure, search, and analyze.
In upcoming projects, I will show how to turn MARC into semantic data with context,
and indexing such data makes much more sense and is also more fun.
By executing `curl localhost:9200/_search?pretty` the result can be examined.
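To make the two ingredients of this section concrete without the library, here is a minimal sketch using only the JDK: NFC normalization with `java.text.Normalizer`, and gzip-compressed bulk lines in the Elasticsearch format (one action line followed by one source line per document). It assumes the decoded input contains base letters followed by combining marks. The class `BulkSketch`, the index name `marc`, and the sample document are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Illustrative bulk writer; not the writer class shipped with this library. */
final class BulkSketch {
    /** Composes base letter + combining mark sequences into precomposed characters. */
    static String nfc(String value) {
        return Normalizer.normalize(value, Normalizer.Form.NFC);
    }

    /** Writes an action line and a source line per JSON document, gzip-compressed. */
    static byte[] gzipBulk(String index, List<String> jsonDocs) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            for (String doc : jsonDocs) {
                String lines = "{\"index\":{\"_index\":\"" + index + "\"}}\n" + nfc(doc) + "\n";
                gz.write(lines.getBytes(StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed (and the gzip trailer written) before the bytes are read.
        return bytes.toByteArray();
    }

    /** Decompresses the bulk payload again, for inspection only. */
    static String gunzip(byte[] compressed) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = gzipBulk("marc", List.of("{\"245\":\"Jo\u0308rg\"}"));
        System.out.print(gunzip(payload));
    }
}
```

A payload produced this way can be sent with the `Content-Encoding: gzip` header, as the `curl` call above does.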
### Example: finding all ISSNs
This Java program scans through a MARC file, checks for ISSN values, and collects them in
JSON format (the library `org.xbib:content-core:1.0.7` is used for JSON formatting).
public void findISSNs() throws IOException {
// set up MARC listener
private static boolean matchISSNField(MarcField field, MarcField.Subfield subfie
return false;
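The matching code above is only partially shown here. As a complementary and purely illustrative step, a collected value can be checked against the ISSN check-digit rule (weights 8 down to 2 over the first seven digits, modulo 11, with `X` standing for 10). The class `IssnSketch` is a hypothetical helper and not part of this library or of the demo program.

```java
import java.util.regex.Pattern;

/** Illustrative ISSN check-digit validation; not taken from the demo above. */
final class IssnSketch {
    // Four digits, optional hyphen, three digits, then a digit or X as check character.
    private static final Pattern ISSN = Pattern.compile("\\d{4}-?\\d{3}[\\dXx]");

    /** Returns true if the candidate has ISSN shape and a correct check character. */
    static boolean isValid(String candidate) {
        if (candidate == null || !ISSN.matcher(candidate).matches()) {
            return false;
        }
        String digits = candidate.replace("-", "").toUpperCase();
        int sum = 0;
        for (int i = 0; i < 7; i++) {
            // Weights run from 8 (first digit) down to 2 (seventh digit).
            sum += (digits.charAt(i) - '0') * (8 - i);
        }
        int check = (11 - sum % 11) % 11;
        char expected = check == 10 ? 'X' : (char) ('0' + check);
        return digits.charAt(7) == expected;
    }

    public static void main(String[] args) {
        System.out.println(isValid("0317-8471"));
        System.out.println(isValid("0317-8472"));
    }
}
```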
## Bibliographic character sets
it is recommended to use http://github.com/xbib/bibliographic-character-sets if
The library can be used as a Gradle dependency
First, install OpenJDK 8. If in doubt, I recommend SDKMan http://sdkman.io/ for
Then clone the GitHub repository
git clone https://github.com/xbib/marc
Then change into the `marc` directory and run
./gradlew test -Dtest.single=MarcFieldFilterTest
to execute the ISSN demo.
It could be extended to include a command for finding ISSNs (essentially, by cop
You will find a file called `marc-{version}.jar` in the `build/libs` folder. To run this Java program,
the command would be something like
java -cp build/libs/marc-1.0.11.jar org.xbib.marc.tools.MarcTool
MarcTool is not perfect yet: it expects some arguments, and if they are missing,
it merely exits with an unfriendly `Exception in thread "main" java.lang.NullPointerException`.
must be on the runtime class path (e.g. `org.xbib:content-core:1.0.7`, `com.fast
In Gradle, the exact dependencies for the JSON format in the JUnit test class `MarcFieldFilterTest` can be listed with
./gradlew dependencies
Then, see section `testRuntime`.
implements modern Java features into the MARC4J code base.
For the curious, I tried to compile a feature comparison table to highlight some differences.
I am not very familiar with MARC4J, so I appreciate any hints, comments, or corrections.
| | MARC4J | xbib MARC |
|---|---|---|
| Started by | Bas Peters | Jörg Prante |
| Project start | 2001 | 2016 |
| Java | Java 5 | Java 17+ |
| Build | Ant | Gradle |
| Supported formats | ISO 2709/Z39.2, MARC (USMARC, MARC 21, MARC XML), tries to parse MARC-like formats with a "permissive" parser | ISO 2709/Z39.2, MARC (USMARC, MARC 21, MARC XML), MarcXchange (ISO 25577:2013), UNIMARC, MAB (MAB2, MAB XML), dialects of MARC (Aleph Sequential, Pica, SISIS format) |
| Bibliographic character set support | builtin, auto-detectable | dynamically, via Java `Charset` API, no autodetection |
| Processing | iterator-based | iterator-based, iterable-based, Java 8 streams for fields, records |
| Transformations | | on-the-fly, pattern-based filtering for tags/values, field key mapping, field value transformations |
| Cleaning | | substitute invalid characters with a pattern replacement input stream |
| Statistics | | can count tag/indicator/subfield combination occurrences |
| Concurrency support | | can write to handlers record by record, provides a `MarcRecordAdapter` to turn MARC field events into record events |
| JUnit test coverage | | extensive testing over all MARC dialects, >80% code coverage |
| Source Quality Profile | | |
| Jar size | 447 KB (2.7.0) | 150 KB (1.0.11), 194 KB (2.8.0) |
| License | LGPL | Apache |
# License
Copyright (C) 2016-2022 Jörg Prante
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
distributed under the License is distributed on an "AS IS" BASIS,
See the License for the specific language governing permissions and
limitations under the License.
