A Data Bootcamp in Taipei

I have enrolled in a Data Bootcamp in Taipei that starts next week. The program lasts 5 months and covers the basic toolbox that a data engineer typically needs. Admittedly, 5 months is quite long compared with bootcamps in the U.S. or Europe, which usually take 2-3 months. However, this program is quite cheap (3000€ for 5 months), and applicants can even be fully funded by the government if they are eligible (Taiwanese citizenship, under 30 years old, and unemployed for more than 3 months).

So, what is covered during these 5 months? In the first 17 weeks, the courses are organized mainly around specific languages and tools, for example Java, JSP, NoSQL, R (including some basics of Data Mining in R), Hadoop, Hive, Spark, HTML, etc., whereas in the last 3 weeks students are split into groups and expected to use what they’ve learned to carry out a complete data project (data retrieval, data analysis and data visualization).

Actually, at first glance I thought it was a joke that they offer no intensive courses or seminars on Machine Learning algorithms and techniques. Instead, they mention the very general term Data Mining, which covers Statistics, Machine Learning and much more. The basics of Data Mining, according to them, will be taught in only 7 days (42 hours). That seems thin, considering that the whole program spans 20 weeks and Machine Learning skills are vital for a data scientist. However, if you read the title of the program carefully, you will notice that it is a program for those who want to become Data Engineers rather than Data Scientists.

Hence, the question now is: what’s the difference between these two career paths? Here’s a quote from the Big Data University blog:

Data Engineers are the data professionals who prepare the “big data” infrastructure to be analyzed by Data Scientists.

Therefore, profound knowledge of Machine Learning is probably not necessary if you’d like to pursue a career in data engineering.

Then, what’s the problem? Well, I’m motivated to become a data scientist rather than a data engineer, because I enjoy not only programming but also uncovering patterns in unfamiliar data through analytical reasoning. This tendency is probably due to my background in physics. Hence, to fill the gap, I will have to teach myself Machine Learning on the side during these five months. Let’s see whether I can manage to learn it on my own while attending the intensive program. I think it’s probably doable, as the program is open to graduates of any background (so the program itself shouldn’t be too challenging for me, I assume).

For the next post, I’ll write something about my final project.

Hofstadter’s butterfly and the Harper Hamiltonian

Here, I will try to teach you some physics. I assume no knowledge of quantum physics, but some prior exposure to linear algebra. I hope you will pick up the basics of Hofstadter’s butterfly after reading this article.

[Figure: Rendering of the butterfly by Dr. Hofstadter (figure from Wikipedia)]

Imagine a 2D periodic lattice in the presence of a weak magnetic field. It was Dr. Hofstadter who did pioneering work on this physical system and obtained the famous Hofstadter butterfly, shown in the figure above. The figure can be obtained numerically by plotting the eigenspectrum of the Harper Hamiltonian against a parameter of the Hamiltonian that controls the magnetic flux. (If you don’t know what a Hamiltonian is, just think of it as an N\times N Hermitian matrix that describes the system under consideration. In quantum mechanics, once the Hamiltonian is known, the system is in principle fully determined and we can predict its dynamics. This is similar to Newtonian mechanics: once the forces are known, we know how a particle moves.)

Through analytical reasoning, one can show that the observed butterfly structure is fractal (i.e. when the magnetic flux is irrational, the spectrum is a Cantor set); see Dr. Hofstadter’s paper. In addition, one can easily observe self-similarity in the Hofstadter butterfly, which is a characteristic feature of fractal structures. The term “self-similarity” can be explained in one sentence from Wikipedia:

Self-similarity means small fragments of the structure contain a (distorted) copy of the entire structure.

[Figure: Self-similarity of a Koch curve (figure from Wikipedia)]

To discuss some facts about the Harper Hamiltonian, let’s first define it:

\mathbf{H}_{ij}=\lambda_j(\lambda,b,\phi)\delta_{i,j} +s(\delta_{i,j+1}+\delta_{i,j-1}),

where \lambda_j(\lambda,b,\phi)=\lambda\cos(2\pi b j+\phi) and b is a parameter that tunes the magnetic flux. Indeed, b is the variable that must be scanned in order to observe the famous butterfly, and the restriction \lambda=2s must be enforced if we want the Hamiltonian to describe a 2D periodic lattice in the presence of a weak magnetic field.
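As a concrete illustration, here is a minimal Python/NumPy sketch of how one might build this matrix; the function name and the choice of open boundary conditions are mine, not the post’s:

```python
import numpy as np

def harper_hamiltonian(N, lam, b, phi, s=1.0):
    """Harper Hamiltonian on an open chain of N sites.

    Diagonal entries: lam * cos(2*pi*b*j + phi) for j = 0..N-1;
    nearest-neighbour hopping of strength s on the off-diagonals.
    """
    j = np.arange(N)
    diag = lam * np.cos(2 * np.pi * b * j + phi)
    hop = s * np.ones(N - 1)
    return np.diag(diag) + np.diag(hop, 1) + np.diag(hop, -1)
```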

Next, we shall see how \lambda, b and \phi affect the system. When b is rational, i.e. b=\frac{p}{q}, the sequence \lambda_j is periodic, that is, \lambda_{j+q}=\lambda_{j} for some positive integer q. In this case, one can argue via Bloch’s theorem that the eigenspectrum of the Harper Hamiltonian has q bands. On the other hand, if b is irrational, the eigenspectrum is, surprisingly, known to be a Cantor set (of zero measure when \lambda=2). Non-rigorously speaking, the bands of the spectrum split into sub-bands, the sub-bands split into sub-sub-bands, and so on, just like a Cantor set.
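To make the butterfly construction concrete, here is a hedged sketch of the numerical recipe (matplotlib assumed; the chain length, the flux grid and \lambda=2 are my own illustrative choices, and open boundary conditions only approximate the ideal spectrum):

```python
import matplotlib.pyplot as plt

# Scatter the eigenvalues of the Harper Hamiltonian against the flux parameter b,
# sweeping b over rationals p/q in lowest terms. Reuses harper_hamiltonian() above.
N, lam, phi = 144, 2.0, 0.0
bs, energies = [], []
for q in range(2, 40):
    for p in range(1, q):
        if np.gcd(p, q) == 1:  # keep only fractions in lowest terms
            E = np.linalg.eigvalsh(harper_hamiltonian(N, lam, p / q, phi))
            bs.extend([p / q] * len(E))
            energies.extend(E)
plt.scatter(bs, energies, s=0.05, c="k")
plt.xlabel(r"$b$")
plt.ylabel(r"$E$")
plt.show()
```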

[Figure: Cantor set in seven iterations (figure from Wikipedia)]

Now, let’s consider how the spectral properties change as a function of \lambda. This can be seen by plotting the Participation Ratio (for an explanation of its inverse, see here) of the probability density of every eigenfunction as a function of \lambda. The Participation Ratio reflects how localized an eigenfunction is: it ranges from (nearly) 0 when the wave function is perfectly localized to 1 when the wave function is perfectly extended.
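In code, this quantity is straightforward; the normalization by the system size N (so that the value lies between roughly 0 and 1) follows the convention described above:

```python
def participation_ratio(psi):
    """Normalized Participation Ratio of a normalized eigenvector psi.

    Returns roughly 1/N (close to 0 for large N) for a perfectly localized
    state and 1 for a perfectly extended (uniform) state.
    """
    p = np.abs(psi) ** 2
    return 1.0 / (len(psi) * np.sum(p ** 2))
```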

The plot is shown below. Note that it is a 2D color map of three-dimensional data. The parameters are s=1, b=89/144 and N=987. Here b=89/144 is a rational approximation to an irrational flux (89 and 144 are consecutive Fibonacci numbers, so the ratio approximates the inverse golden mean), since a computer can never represent a truly irrational number. Also, the Participation Ratio shown is actually averaged over \phi; the purpose of this averaging will be explained later.

[Figure: The \phi-averaged Participation Ratio]

First of all, one can see that there seems to be a localization-delocalization transition around \lambda=2. That is to say, for \lambda<2 almost all the eigenstates are extended, and all the eigenstates become quite localized as soon as \lambda exceeds 2. Indeed, one can show analytically that this localization-delocalization transition happens exactly at \lambda=2. Another way to observe the phenomenon is to animate the Local Density of States, in which we can see a sharp transition at \lambda=2.
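As a hedged sketch of how one might see this transition numerically (parameter values are illustrative, not the post’s exact ones; it reuses the two functions defined above):

```python
# Mean Participation Ratio over all eigenstates as a function of lambda.
# A sharp drop near lambda = 2 signals the localization-delocalization transition.
N, b, phi = 144, 89 / 144, 0.0
lams = np.linspace(0.0, 4.0, 81)
mean_pr = []
for lam in lams:
    _, vecs = np.linalg.eigh(harper_hamiltonian(N, lam, b, phi))
    mean_pr.append(np.mean([participation_ratio(vecs[:, k]) for k in range(N)]))
plt.plot(lams, mean_pr)
plt.xlabel(r"$\lambda$")
plt.ylabel("mean PR")
plt.show()
```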


[Animation: The Local Density of States (increasing \lambda)]

Also, surprisingly, the eigenfunctions at \lambda=2 are known to be multifractal! I will write another article introducing what “fractal” and “multifractal” mean.


Now, let’s see what role \phi plays. If we fix \lambda<2 and scan \phi from 0 to 2\pi, we’ll see some localized eigenfunctions pop up and then disappear. Interestingly, when these eigenfunctions are localized, they are localized at the ends of the system, and their corresponding eigenvalues lie within the band gaps of the system (as you can see in the animation below). These localized eigenfunctions are known to have a topological origin.

[Animation: The Local Density of States (shifting \phi)]

Hence, if the Participation Ratio is averaged over \phi, the existence of in-gap localized eigenfunctions will be detected: such eigenfunctions become localized, with low Participation Ratios, at certain \phi values, whereas the extended eigenfunctions always stay in-band and keep higher Participation Ratios no matter how \phi changes. Thanks to this, we can count how many gaps or bands there are in the plot of the averaged Participation Ratio. For instance, at \lambda=0.5 there are 6 eigenstates with low averaged Participation Ratios, meaning that each of them becomes in-gap localized at certain (and possibly different) \phi values. In other words, there are 6 observable gaps (or equivalently, 7 bands) at \lambda=0.5.
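A minimal sketch of this \phi-averaging, under the same assumptions as the snippets above (the grid sizes are my own choices):

```python
# Average the Participation Ratio of each eigenstate over a grid of phi values.
# In-gap edge states acquire a low average; bulk extended states stay high.
N, b, lam = 144, 89 / 144, 0.5
phis = np.linspace(0.0, 2 * np.pi, 60, endpoint=False)
avg_pr = np.zeros(N)
for phi in phis:
    _, vecs = np.linalg.eigh(harper_hamiltonian(N, lam, b, phi))
    avg_pr += [participation_ratio(vecs[:, k]) for k in range(N)]
avg_pr /= len(phis)
plt.plot(range(N), avg_pr, ".")
plt.xlabel("eigenstate index")
plt.ylabel(r"$\phi$-averaged PR")
plt.show()
```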

Now, one can see from the plot of the averaged Participation Ratio that as \lambda increases, more and more gaps (or bands) emerge before \lambda reaches the localization-delocalization transition point. However, this does not mean the number of gaps (or bands) itself grows with \lambda (for \lambda<2): we have fixed b=89/144, which corresponds to a fixed number of bands and gaps, namely 144 bands (143 gaps).

Then a question arises: at \lambda=0.5, say, why do we see only 6 gaps in the plot instead of 143? The reason is that there are too few eigenvalues to resolve the very tiny gaps within the spectrum (the tiny gaps are there, but we don’t have enough eigenvalues to reveal them). If we chose the system size N (which is also the number of eigenvalues) to be extremely large, it would be easier to discover more, or even all, of the gaps at a fixed \lambda. On the other hand, for \lambda<2 the sizes of the gaps do grow in proportion to \lambda, so a larger \lambda makes the tiny gaps bigger and therefore easier to resolve with a finite number of eigenvalues. This is why more and more gaps become visible as \lambda increases.

One last question: does the plot reveal a fractal structure? Do the gaps emerge like a Cantor set as \lambda increases? Judging from the plot alone, it is really hard to tell, but I tend to say it is not fractal. The reasoning rests on the assumption that all the gaps grow uniformly as functions of \lambda (for \lambda<2), while the gap sizes themselves are not all equal; therefore the gaps of a kth-order Cantor-like set cannot all be observed at the same time.