BALLAD – Technical Report

Batch Learners Evaluation for Link Discovery.

View on GitHub

BALLAD

BALLAD is the acronym of Batch Learners Evaluation for Link Discovery, a comparison of supervised machine-learning approaches for discovering new links in the Linked Data cloud. We generated this technical report to provide missing information, as for space reasons some content did not fit in the original published paper:

Tommaso Soru and Axel-Cyrille Ngonga Ngomo, "A Comparison of Supervised Learning Classifiers for Link Discovery", in proceeding of the 10th International Conference on Semantic Systems (SEMANTiCS), 2014 [PDF]

Pipeline overview

Execution runtimes

Execution runtimes for the evaluation are thus presented in the following table.

Classifier	D1	D2	D3	D4	D5	D6
Linear SVM	7.16	6.93	2.67	63.94	484.29	75.44
Linear SMO	17.07	12.93	3.77	113.40	369.20	37.16
Polynomial-3 SVM	5.67	6.18	2.63	162.82	1,091.10	103.89
Multilayer Perceptron	15.13	16.10	3.40	96.96	376.26	41.68
Logistic Regression	16.11	14.91	4.61	110.12	275.94	38.48
Linear Regression	16.04	16.21	5.02	120.54	497.43	44.50
Naive Bayes	17.34	17.09	4.39	105.31	375.91	43.79
Decision Table	16.68	16.44	3.78	90.99	389.35	48.87
Random Tree	12.02	11.16	2.24	53.67	347.36	34.11
J48	21.31	15.96	6.99	131.57	98.27	38.46

Comparison with the state of the art

This table shows a performance comparison overview on the six datasets.

Framework	D1	D2	D3	D4	D5	D6
refalign (IMEI2010)	100.00%	100.00%	100.00%	--	--	--
ASMOV	100.00%	93.73%	85.95%	--	--	--
AgreementMaker	98.99%	89.16%	69.92%	--	--	--
EAGLE (Unsupervised)	99.9%	94.23%	81.86%	98.20%	36.21%	45.32%
MARLIN AD-Tree	--	--	--	96.40%	50.50%	54.80%
MARLIN SVM	--	--	--	97.40%	59.90%	70.80%
FEBRL SVM	--	--	--	97.60%	60.10%	71.30%
PPJoin+	--	--	--	91.90%	41.90%	47.40%
[undisclosed]	--	--	--	96.20%	62.10%	70.70%
ACIDS	--	--	--	97.90%	--	--
Multilayer Perceptron	99.50%	99.50%	100.00%	97.43%	35.58%	43.49%